AI Disclaimer: The part of this post called How certificates work was written by ChatGPT as I asked it to explain how certificates work with regards to PKI. I modified the text somewhat, but you can obviously tell that it is written by an AI because of the wording and structure. Also the featured image was generated by ChatGPT free.
TL;DR You can automate the management of your TLS certificates with Terraform and Let’s Encrypt. This post explains how to do it with Let’s Encrypt as a Proof-of-Concept.
The code 🚀
Certificates are a fundamental part of the security infrastructure of any organization. They are used to secure communication between clients and servers, authenticate users, and protect sensitive data. However, managing certificates can be a complex and time-consuming task, especially in large organizations with many different systems and applications.
Let’s Encrypt is a free, automated, and open Certificate Authority (CA) that provides TLS certificates to secure websites and applications. It simplifies the process of obtaining and renewing certificates, making it easier for organizations to implement secure communication. If you need certificates for applications that do not require extended validation, Let’s Encrypt is a great choice both for cost and ease of use.
PKI is a system that enables secure communication using asymmetric cryptography (public/private key pairs) and digital certificates.
Let’s Encrypt is a free, automated CA that issues X.509 certificates to websites, enabling HTTPS.
When you visit https://codewithme.cloud:
Server presents its certificate.
Browser checks if it’s signed by a trusted CA.
If valid, browser and server perform a TLS handshake and agree on shared session keys.
All communication is then encrypted with those session keys. The server proves that it owns the certificate by using the matching private key during the handshake, which nobody else can do.
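If you want to look at what a server actually presents, here is a minimal sketch in Terraform. It assumes you add the hashicorp/tls provider, which is not part of the configuration in this post, and the output name is just something I picked for illustration.

```hcl
# Sketch only: reads the certificate chain a server presents during the TLS handshake.
# Requires the hashicorp/tls provider (an assumption, not used elsewhere in this post).
data "tls_certificate" "site" {
  url = "https://codewithme.cloud"
}

# Hypothetical output name. Lists who each certificate was issued to,
# who signed it, and when it expires.
output "certificate_chain" {
  value = [
    for c in data.tls_certificate.site.certificates : {
      subject   = c.subject
      issuer    = c.issuer
      not_after = c.not_after
      is_ca     = c.is_ca
    }
  ]
}
```

The issuer of one entry should match the subject of the next one in the chain, which is the "signed by a trusted CA" check in practice.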
There are many ways to automate certificate renewal:
You should pick the one that fits your needs best! The following Terraform configuration is just my way of solving this for my use case.
In a recent project I had to set up an internal service that required a TLS certificate. Since certificate lifetimes keep shrinking (the CA/Browser Forum has voted to cut the maximum validity of public TLS certificates to 47 days by 2029), we need to automate the renewal process.
As I mentioned above, there are several ways to do this, but I wanted to use Terraform because it is my preferred IaC tool. Terraform has a provider for ACME certificates, which I used for…reasons. There is also a provider for Let’s Encrypt, but that is not in scope for this post. I assume the provider used in my example can be replaced with the Let’s Encrypt provider without much hassle if you have some experience with Terraform.
To test this configuration, you need:
Let’s jump right into the Terraform configuration!
First you need to set up the providers and the Terraform version requirement.
provider "azurerm" {
  features {}
}

# It is best practice to use the staging server for testing purposes.
# This will not issue a valid certificate, but it will allow you to test the configuration without hitting the rate limits of the production server.
provider "acme" {
  server_url = "https://acme-staging-v02.api.letsencrypt.org/directory"
}

terraform {
  required_version = ">= 1.9"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>4.0"
    }
    time = {
      source  = "hashicorp/time"
      version = "~>0.12"
    }
    acme = {
      source  = "vancluever/acme"
      version = "~>2.32"
    }
    external = {
      source  = "hashicorp/external"
      version = "~>2.3"
    }
    null = {
      source  = "hashicorp/null"
      version = "~>3.2"
    }
    # Used by random_pet to generate unique resource names
    random = {
      source  = "hashicorp/random"
      version = "~>3.6"
    }
  }
}
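When you are done testing, you point the ACME provider at the production directory instead. One way to do that without editing the provider block each time is a variable, sketched below. The variable name acme_server_url is my own invention for this example, and this provider block would replace the one above rather than sit next to it.

```hcl
# Hypothetical variable name, defaults to the Let's Encrypt staging directory.
variable "acme_server_url" {
  type        = string
  description = "ACME directory URL. Defaults to Let's Encrypt staging."
  default     = "https://acme-staging-v02.api.letsencrypt.org/directory"
}

# Replaces the hardcoded provider block shown earlier.
provider "acme" {
  server_url = var.acme_server_url
}
```

For production you would then pass https://acme-v02.api.letsencrypt.org/directory as the value. Staging and production keep separate accounts, so expect the ACME registration to be recreated when you switch.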
Next, we will define some local variables that will be used throughout the configuration. This is a good place to define tags and other reusable values.
locals {
  tags = {
    env     = "development"
    purpose = "auto certificates demo"
  }
  dns_zone_name = "contoso.com" # Replace with your actual DNS zone name
}

resource "random_pet" "unique" {
  length = 2
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "rg_demo" {
  location = "norwayeast"
  name     = "rg-demo-${random_pet.unique.id}"
}

locals {
  # One entry per certificate to issue; the map key becomes the Key Vault certificate name
  certificates = {
    demo-contoso-com = {
      subject = "CN=demo.contoso.com" # Replace with your actual domain name
      alternative_dns_names = [
        "demo.contoso.com",
        "contoso.com"
      ]
    }
  }
  script_path = "${path.module}/scripts"

  # CSRs read from the pending certificate operations
  pending_csr = { for k, v in data.external.get_csr : k => v.result.csr }
  # CSRs previously stored as Key Vault secrets
  stored_csr = { for k, v in azurerm_key_vault_secret.csr_storage : k => v.value }
  # On duplicate keys the pending CSR wins, because later arguments to merge() take precedence
  csr_lookup = merge(local.stored_csr, local.pending_csr)
}
You need to create a key vault to store some information including the certificate and some secrets. In this case I will utilize the Azure Verified Module. If you have your own internal module, or just want to use the resource directly, you can do that as well. Remember to update the references accordingly!
It goes without saying that you need to protect your Azure Key Vault both with network firewall and the correct RBAC, but I will mention it several times anyway 😉
module "avm-res-keyvault-vault" {
source = "Azure/avm-res-keyvault-vault/azurerm"
version = "~>0.10"
location = "norwayeast"
name = "kv-demo-${random_pet.unique.id}"
resource_group_name = azurerm_resource_group.rg_demo.name
tenant_id = data.azurerm_client_config.current.tenant_id
enable_telemetry = false
sku_name = "standard"
# FOR DEMO PURPOSES ONLY
# Do not use these settings in production
purge_protection_enabled = false
soft_delete_retention_days = null
public_network_access_enabled = true
network_acls = {
bypass = "AzureServices"
default_action = "Allow"
}
tags = local.tags
}
Automatic certificate renewal in this configuration will be triggered by a time_rotating resource.
# This resource drives the rotation of the Key Vault certificate.
# One hour is only suitable for development. In production, this should be set to a longer interval, such as 14 days.
resource "time_rotating" "cert_rotation" {
rotation_hours = 1
}
resource "null_resource" "rotate_certificates_trigger" {
triggers = {
value = time_rotating.cert_rotation.id
# Uncomment this to rotate certificates on each terraform run
#value2 = timestamp()
}
}
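For production, the same resource with a longer interval could look like the sketch below. It replaces the one-hour version above. Keep in mind that time_rotating is only evaluated when Terraform runs, so this assumes you have a scheduled pipeline that runs plan and apply regularly.

```hcl
# Sketch for production: rotate every 14 days instead of every hour.
# Nothing happens between Terraform runs, so a scheduled pipeline is assumed.
resource "time_rotating" "cert_rotation" {
  rotation_days = 14
}
```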
The key vault certificate resource will create a certificate in the key vault. This is where the certificate will be stored after it is issued by Let’s Encrypt. The Terraform resource will create a private key in the key vault and generate a CSR. The CSR will be used in the ACME provider to request a certificate from Let’s Encrypt.
resource "azurerm_key_vault_certificate" "certificates" {
  for_each     = local.certificates
  name         = each.key
  key_vault_id = module.avm-res-keyvault-vault.resource_id

  certificate_policy {...redacted for brevity...}

  # This lifecycle block is used to ensure that the certificates are rotated correctly
  lifecycle {
    replace_triggered_by = [null_resource.rotate_certificates_trigger]
  }
}
The ACME certificate resource is where the magic happens. This resource requests a certificate from Let’s Encrypt based on the CSR, and the issued certificate is then stored in the key vault.
# Create the certificates using the ACME provider
# This resource uses the ACME provider to submit the CSR to the Let's Encrypt CA and get a signed certificate back
# This resource will be replaced when automatic certificate renewal is run
resource "acme_certificate" "certificates" {
for_each = local.certificates
account_key_pem = acme_registration.reg.account_key_pem
revoke_certificate_on_destroy = true
certificate_request_pem = <<EOT
-----BEGIN CERTIFICATE REQUEST-----
${local.csr_lookup[each.key]}
-----END CERTIFICATE REQUEST-----
EOT
min_days_remaining = 33
dns_challenge {
provider = "azuredns"
config = {
AZURE_PRIVATE_ZONE = false
AZURE_RESOURCE_GROUP = "dns-zones-rg"
AZURE_ZONE_NAME = local.dns_zone_name
AZURE_AUTH_METHOD = "env"
AZURE_ENVIRONMENT = "public"
}
}
depends_on = [
data.external.get_csr
]
lifecycle {
ignore_changes = [
certificate_request_pem
]
replace_triggered_by = [null_resource.rotate_certificates_trigger]
}
}
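The resource above references acme_registration.reg, which is not shown in this post. A minimal sketch of what that registration could look like follows. It assumes you add the hashicorp/tls provider to generate the account key, and the email address is a placeholder.

```hcl
# Sketch only: account key for the ACME registration.
# Requires the hashicorp/tls provider (an assumption, not in the provider list above).
# Note that the private key ends up in the Terraform state.
resource "tls_private_key" "acme_account" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# Registers an account with the ACME server configured in the provider block.
resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.acme_account.private_key_pem
  email_address   = "admin@contoso.com" # Placeholder, replace with your own address
}
```

The account key is separate from the certificate's private key, which stays in the key vault.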
As we cannot get the CSR from Terraform directly, we need to use an external data source to fetch the pending CSR for the key vault certificate. The signed certificate from the ACME provider is then used in the next step to merge the pending certificate request in the key vault. To save space in this blog post, which is already turning out too long, I have opted to not include the output or merge code here. You can find the full code in the GitHub repository.
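To give an idea of the shape of that external data source, here is a rough sketch. The script name get_csr.sh and the query keys are hypothetical and only for illustration; the real script lives in the repository. The external provider sends the query to the program as JSON on stdin and expects a flat JSON object of strings on stdout.

```hcl
# Sketch only: one lookup per certificate defined in local.certificates.
data "external" "get_csr" {
  for_each = local.certificates

  # Hypothetical script name. The script must print JSON like {"csr": "<base64 CSR body>"}
  # without the BEGIN/END lines, since the acme_certificate resource adds those itself.
  program = ["bash", "${local.script_path}/get_csr.sh"]

  # Hypothetical query keys, passed to the script as JSON on stdin.
  query = {
    vault_name = "kv-demo-${random_pet.unique.id}"
    # Referencing the certificate resource makes Terraform create it first
    cert_name = azurerm_key_vault_certificate.certificates[each.key].name
  }
}
```

The result key csr matches what the locals block reads through v.result.csr.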
You need to store the CSR somewhere because it is not stored by the ACME provider or the key vault certificate resource, and you can’t really get it from anywhere once the certificate is merged. I chose to store it in the key vault to prevent the certificate from being replaced on every Terraform run.
resource "azurerm_key_vault_secret" "csr_storage" {
for_each = { for k, v in data.external.get_csr : k => v }
key_vault_id = module.avm-res-keyvault-vault.resource_id
name = "${each.key}-csr"
content_type = "text/plain"
expiration_date = timeadd(timestamp(), "1128h") # 47 days
value = each.value.result.csr
lifecycle {
ignore_changes = [
value,
expiration_date
]
# Forces replacement when the key vault certificate is replaced
replace_triggered_by = [azurerm_key_vault_certificate.certificates[each.key]]
}
}
This configuration uses Let’s Encrypt as a proof-of-concept, but I have successfully used a similar configuration with other ACME-compatible certificate authorities. I would recommend investigating if your preferred CA supports the ACME protocol for automatic renewal.
Certificates you create and renew with this process are actually deleted and recreated every time they are renewed. There are some automatic processes that will soft delete and recover the certificates, but they are not actually renewed in the traditional sense. It is also recommended to renew the private key each time you request a new certificate, but honestly this is up to you and your internal compliance policies.
Let’s Encrypt is maybe not considered a “real” certificate authority for critical services. Consider if you need to use DigiCert or GlobalSign, but in that case you can just add a configuration for an integrated Certification Authority in your key vault. This way you get automatic renewal of certificates from a trusted CA, with minimal manual effort.
Azure Key Vault is a great place to store certificates and secrets, but you need to protect it with private endpoints. This requires its own infrastructure like DNS and central connectivity configurations. This is not covered in my post, but you can find information about this online. I recommend Private endpoints over service endpoints, as they provide better security and isolation. You should also use RBAC with least privilege for anyone who needs to access the key vault.
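As a starting point, this is roughly how the demo settings from the key vault module could be tightened. It is a sketch that only flips the inputs already shown earlier in this post. The private endpoint itself and its DNS setup are not included, and with public access disabled the machine running Terraform needs private network access to the vault.

```hcl
# Sketch only: same module call as before, with production-leaning values.
module "avm-res-keyvault-vault" {
  source              = "Azure/avm-res-keyvault-vault/azurerm"
  version             = "~>0.10"
  location            = "norwayeast"
  name                = "kv-demo-${random_pet.unique.id}"
  resource_group_name = azurerm_resource_group.rg_demo.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  enable_telemetry    = false
  sku_name            = "standard"

  # Keep deleted vaults and secrets recoverable
  purge_protection_enabled   = true
  soft_delete_retention_days = 90

  # Block public traffic; access then has to come through a private endpoint
  public_network_access_enabled = false
  network_acls = {
    bypass         = "AzureServices"
    default_action = "Deny"
  }

  tags = local.tags
}
```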
DNS Zone is required for the ACME DNS challenge to work. For simplicity I recommend Azure DNS Zone, which is supported natively by the ACME provider. You need a publicly available DNS zone that you own. Your service principal needs permission to add and remove TXT records in the DNS zone. The service principal credentials are stored in environment variables, as described under DNS Challenge below. Creation and management of the DNS zone is not covered in this post.
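One way to grant that permission is a role assignment scoped to the zone, sketched below. It assumes the identity running Terraform is the same service principal that solves the DNS challenge, and that it is allowed to create role assignments. If that is not the case for you, swap in the object ID of the right principal.

```hcl
# Looks up the existing public DNS zone used for the challenge.
data "azurerm_dns_zone" "public" {
  name                = local.dns_zone_name
  resource_group_name = "dns-zones-rg"
}

# Sketch only: lets the principal create and delete records (including TXT) in this zone.
resource "azurerm_role_assignment" "acme_dns" {
  scope                = data.azurerm_dns_zone.public.id
  role_definition_name = "DNS Zone Contributor"
  # Assumption: the current Terraform identity is also used for the DNS challenge
  principal_id = data.azurerm_client_config.current.object_id
}
```

Scoping the role to the single zone instead of the resource group keeps the blast radius small.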
DNS Challenge is automatically handled by the ACME provider for supported DNS providers, but you need to create environment variables for the Azure credentials. NEVER STORE THESE IN GIT! CI/CD variables are a good place to keep them. Remember that values passed in as Terraform variables with the TF_VAR_ prefix can end up in plaintext in your state file, so plain environment variables are a better choice for sensitive information like this.
replace_triggered_by is a key component of this configuration. It ensures that the certificate and all related resources are replaced when the time_rotating resource is updated. Autorenew will not work in the key vault as we are not using an “integrated CA”.
If you followed the steps above, you should have a working Terraform configuration that automatically renews your TLS certificates using Let’s Encrypt. Once you replace the demo settings on the key vault, it should also be secure, and easy to use with supported resources (Application Gateway, API Management, App Service, etc.).
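To hand the certificates over to those services, an output like the following sketch can be useful. The output name is hypothetical. It exposes the versionless secret ID of each certificate, which is what for example Application Gateway takes when it references a certificate in a key vault.

```hcl
# Hypothetical output name. Maps each certificate name to its versionless secret ID,
# so consumers always resolve the latest version after a renewal.
output "certificate_secret_ids" {
  value = {
    for k, v in azurerm_key_vault_certificate.certificates : k => v.versionless_secret_id
  }
}
```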
Please leave a comment if you have any questions or suggestions for improvement. I would love to hear your feedback!