Automate your certificates with Terraform and Let’s Encrypt

AI Disclaimer: The section of this post called “How certificates work” was written by ChatGPT after I asked it to explain how certificates work in a PKI. I modified the text somewhat, but you can obviously tell that it was written by an AI from the wording and structure. The featured image was also generated by the free version of ChatGPT.

TL;DR: You can automate the management of your TLS certificates with Terraform and Let’s Encrypt. This post shows how, using Let’s Encrypt as a proof of concept.

The code 🚀

Certificates

Certificates are a fundamental part of the security infrastructure of any organization. They are used to secure communication between clients and servers, authenticate users, and protect sensitive data. However, managing certificates can be a complex and time-consuming task, especially in large organizations with many different systems and applications.

Let’s Encrypt is a free, automated, and open Certificate Authority (CA) that provides TLS certificates to secure websites and applications. It simplifies the process of obtaining and renewing certificates, making it easier for organizations to implement secure communication. If your applications only need domain-validated (DV) certificates rather than organization or extended validation (OV/EV), Let’s Encrypt is a great choice both for cost and ease of use.

How certificates work

Public Key Infrastructure (PKI)

PKI is a system that enables secure communication using asymmetric cryptography (public/private key pairs) and digital certificates.

  • Public Key: Shared openly, used to encrypt data or verify signatures.
  • Private Key: Kept secret, used to decrypt data or sign messages.
  • Certificate: A file that binds a public key to an entity (e.g., a website), signed by a Certificate Authority (CA).

Let’s Encrypt

Let’s Encrypt is a free, automated CA that issues X.509 certificates to websites, enabling HTTPS.

  1. The website generates a key pair.
  2. Let’s Encrypt verifies domain ownership via challenge (HTTP-01, DNS-01).
  3. It issues a certificate, signed with the CA’s private key.
  4. Browsers trust Let’s Encrypt, so they trust your certificate.

TLS/HTTPS Example

When you visit https://codewithme.cloud:

  1. Server presents its certificate.

  2. Browser checks if it’s signed by a trusted CA.

  3. If valid, browser and server complete the TLS handshake:

    • Exchange keys.
    • Agree on encryption method.
    • Establish a secure session.

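All communication is then encrypted with symmetric session keys agreed during the handshake. In modern TLS (TLS 1.3, and ECDHE cipher suites in TLS 1.2), the server uses its private key to sign the handshake, proving that it owns the certificate, rather than to decrypt data the browser encrypted with the public key.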

Certificate renewal

There are many ways to automate certificate renewal:

  • A server with a cron job that runs a script using certbot.
  • A container running acme.sh or certbot.
  • A serverless function of some sort that runs Posh-Acme or similar.
  • A managed service that handles certificate issuance and renewal for you.
  • A pipeline of some sort that runs a script or container process to renew the certificate.
  • A Kubernetes cluster with cert-manager.
  • Terraform with an ACME provider.

You should pick the one that fits your needs best! The following Terraform configuration is just my way of solving this for my use case.

My take on automatic renewal of certificates

In a recent project I had to set up an internal service that required a TLS certificate. Publicly trusted TLS certificates are valid for a limited time, and the CA/Browser Forum has voted to reduce the maximum lifetime in steps down to 47 days by 2029, so we need to automate the renewal process.

Like I mentioned above, there are several ways to do this, but I wanted to use Terraform because it is my preferred IaC tool. Terraform has a provider for ACME certificates (vancluever/acme), which I used for… reasons. There is also a provider for Let’s Encrypt, but that is not in scope for this post. I assume the provider used in my example can be replaced with the Let’s Encrypt provider without much hassle if you have some experience with Terraform.

Requirements for testing

To test this configuration, you need:

  • A domain name that you own, hosted by a DNS provider that the ACME provider supports for the DNS-01 challenge (this post uses Azure DNS).
  • An Azure subscription to host a key vault and, optionally, a DNS zone.
  • Access to a service principal with permission to manage records in the DNS zone.
  • Terraform installed on your local machine.
  • Azure CLI installed on your local machine.

Terraform configuration

Let’s jump right into the Terraform configuration!

Providers and Terraform version

First, set up the providers and the Terraform version requirement.

provider "azurerm" {
  # azurerm 4.x needs a subscription ID: set subscription_id here
  # or export the ARM_SUBSCRIPTION_ID environment variable.
  features {}
}

# It is best practice to use the staging server for testing purposes.
# Staging certificates are signed by an untrusted test CA, so browsers will not trust them,
# but you can test the configuration without hitting the rate limits of the production server.

provider "acme" {
  server_url = "https://acme-staging-v02.api.letsencrypt.org/directory"
}

terraform {
  required_version = ">= 1.9"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>4.0"
    }
    # Needed for random_pet, which generates unique resource names
    random = {
      source  = "hashicorp/random"
      version = "~>3.6"
    }
    time = {
      source  = "hashicorp/time"
      version = "~>0.12"
    }
    acme = {
      source  = "vancluever/acme"
      version = "~>2.32"
    }
    external = {
      source  = "hashicorp/external"
      version = "~>2.3"
    }
    null = {
      source  = "hashicorp/null"
      version = "~>3.2"
    }
  }
}
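Once the configuration works against staging, you switch to the Let’s Encrypt production directory. A minimal sketch, assuming you replace the acme provider block above with one driven by a boolean variable (the name use_production_acme is hypothetical). Staging and production use separate ACME accounts, so the account registration also has to exist on the production server; you may need to replace the acme_registration resource (for example with terraform apply -replace) when you switch.

```hcl
variable "use_production_acme" {
  description = "Use the Let's Encrypt production directory instead of staging (hypothetical variable)"
  type        = bool
  default     = false
}

# Replaces the acme provider block shown above: staging by default, production when enabled
provider "acme" {
  server_url = var.use_production_acme ? "https://acme-v02.api.letsencrypt.org/directory" : "https://acme-staging-v02.api.letsencrypt.org/directory"
}
```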

Locals and initialization

Next, we will define some local variables that will be used throughout the configuration. This is a good place to define tags and other reusable values.

locals {
  tags = {
    env     = "development"
    purpose = "auto certificates demo"
  }
  dns_zone_name = "contoso.com" # Replace with your actual DNS zone name

  # One entry per certificate; the map key is used as the Key Vault certificate name
  certificates = {
    demo-contoso-com = {
      subject = "CN=demo.contoso.com" # Replace with your actual domain name
      alternative_dns_names = [
        "demo.contoso.com",
        "contoso.com"
      ]
    }
  }

  script_path = "${path.module}/scripts"

  # CSRs read from pending Key Vault certificate operations (data.external.get_csr is in the full repository)
  pending_csr = { for k, v in data.external.get_csr : k => v.result.csr }
  # CSRs saved earlier as Key Vault secrets, so they are still available after the merge
  stored_csr  = { for k, v in azurerm_key_vault_secret.csr_storage : k => v.value }
  # merge() gives later arguments precedence, so a pending CSR overrides a stored one
  csr_lookup  = merge(local.stored_csr, local.pending_csr)
}

resource "random_pet" "unique" {
  length = 2
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "rg_demo" {
  location = "norwayeast"
  name     = "rg-demo-${random_pet.unique.id}"
}

Key vault

You need a key vault to store the certificate and a few secrets. Here I use the Azure Verified Module (AVM) for Key Vault. If you have your own internal module, or just want to use the resource directly, you can do that as well. Remember to update the references accordingly!

It goes without saying that you need to protect your Azure Key Vault both with network firewall and the correct RBAC, but I will mention it several times anyway 😉

module "avm-res-keyvault-vault" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "~>0.10"

  location            = "norwayeast"
  name                = "kv-demo-${random_pet.unique.id}"
  resource_group_name = azurerm_resource_group.rg_demo.name
  tenant_id           = data.azurerm_client_config.current.tenant_id

  enable_telemetry = false
  sku_name         = "standard"

  # FOR DEMO PURPOSES ONLY
  # Do not use these settings in production
  purge_protection_enabled      = false
  soft_delete_retention_days    = null
  public_network_access_enabled = true
  network_acls = {
    bypass         = "AzureServices"
    default_action = "Allow"
  }
  tags = local.tags
}
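Terraform itself also needs data-plane permissions on the vault to create the certificate and the CSR secret. A minimal sketch, assuming the vault uses Azure RBAC authorization rather than access policies, and that the identity running Terraform is allowed to create role assignments. The AVM module also appears to offer a role_assignments input that can do the same; the resource names below are illustrative. Role assignments can take a few minutes to propagate, so the first apply may need a retry.

```hcl
# Lets the identity running Terraform create and merge certificates in the vault
resource "azurerm_role_assignment" "kv_certificates_officer" {
  scope                = module.avm-res-keyvault-vault.resource_id
  role_definition_name = "Key Vault Certificates Officer"
  principal_id         = data.azurerm_client_config.current.object_id
}

# Lets the same identity write the CSR secret (csr_storage)
resource "azurerm_role_assignment" "kv_secrets_officer" {
  scope                = module.avm-res-keyvault-vault.resource_id
  role_definition_name = "Key Vault Secrets Officer"
  principal_id         = data.azurerm_client_config.current.object_id
}
```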

Time rotating for certificate renewal

Automatic certificate renewal in this configuration will be triggered by a time_rotating resource.

# Rotation interval for the Key Vault certificates. One hour is for development only.
# In production, set this to a longer interval, such as 14 days (rotation_days = 14).
resource "time_rotating" "cert_rotation" {
  rotation_hours = 1
}

resource "null_resource" "rotate_certificates_trigger" {
  triggers = {
    value  = time_rotating.cert_rotation.id
    # Uncomment this to rotate certificates on each terraform run
    #value2 = timestamp()
  }
}
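For production, here is a sketch of the same trigger with a 14-day interval. Since the configuration already requires Terraform 1.9, the built-in terraform_data resource (available since Terraform 1.4) could replace null_resource and drop the hashicorp/null provider; this is my suggestion, not part of the original setup. If you switch, point replace_triggered_by at terraform_data.rotate_certificates_trigger instead. Keep in mind that time_rotating is only evaluated when Terraform runs, so renewal depends on a regular plan/apply.

```hcl
# Production variant of the rotation (replaces the two resources above)
resource "time_rotating" "cert_rotation" {
  rotation_days = 14
}

resource "terraform_data" "rotate_certificates_trigger" {
  # A new time_rotating ID after each rotation forces this resource to be replaced,
  # which in turn fires replace_triggered_by on the certificate resources
  triggers_replace = time_rotating.cert_rotation.id
}
```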

Key Vault Certificate

The key vault certificate resource creates a certificate in the key vault. This is where the certificate is stored after Let’s Encrypt issues it. The Terraform resource creates a private key in the key vault and generates a certificate signing request (CSR). The ACME provider uses that CSR to request a certificate from Let’s Encrypt.

resource "azurerm_key_vault_certificate" "certificates" {
  for_each     = local.certificates
  name         = each.key
  key_vault_id = module.avm-res-keyvault-vault.resource_id

  certificate_policy {...redacted for brevity...}
  # This lifecycle block ensures that the certificates are replaced when rotation is triggered
  lifecycle {
    replace_triggered_by = [null_resource.rotate_certificates_trigger]
  }
}
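The certificate_policy is redacted above. As a starting point, here is a sketch of what a policy for an externally signed certificate could look like. It assumes the issuer name "Unknown", which tells Key Vault that an external CA will sign the CSR and leaves the certificate pending until the signed certificate is merged. The key type, key size, content type and validity are my assumptions, not the author’s values; reuse_key = false avoids reusing the previous key for new certificate versions.

```hcl
  # Sketch of the redacted certificate_policy block (assumed values)
  certificate_policy {
    issuer_parameters {
      # "Unknown" = signed by an external CA; Key Vault creates the key and a CSR
      name = "Unknown"
    }

    key_properties {
      exportable = true # assumption: consuming services read the PFX through the secret
      key_size   = 2048
      key_type   = "RSA"
      reuse_key  = false
    }

    secret_properties {
      content_type = "application/x-pkcs12"
    }

    x509_certificate_properties {
      subject            = each.value.subject
      validity_in_months = 3 # the issuing CA decides the actual lifetime
      key_usage          = ["digitalSignature", "keyEncipherment"]
      extended_key_usage = ["1.3.6.1.5.5.7.3.1"] # serverAuth

      subject_alternative_names {
        dns_names = each.value.alternative_dns_names
      }
    }
  }
```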

ACME certificate resource

The ACME certificate resource is where the magic happens. It submits the CSR to Let’s Encrypt, completes the DNS challenge, and returns the signed certificate, which is then merged into the pending certificate in the key vault.

# Create the certificates using the ACME provider
# This resource submits the Key Vault CSR to the Let's Encrypt CA and returns the signed certificate
# This resource will be replaced when automatic certificate renewal is run
resource "acme_certificate" "certificates" {
  for_each                      = local.certificates
  account_key_pem               = acme_registration.reg.account_key_pem
  revoke_certificate_on_destroy = true
  certificate_request_pem       = <<EOT
-----BEGIN CERTIFICATE REQUEST-----
${local.csr_lookup[each.key]}
-----END CERTIFICATE REQUEST-----
EOT
  min_days_remaining            = 33

  dns_challenge {
    provider = "azuredns"
    config = {
      AZURE_PRIVATE_ZONE   = false
      AZURE_RESOURCE_GROUP = "dns-zones-rg" # Replace with the resource group of your DNS zone
      AZURE_ZONE_NAME      = local.dns_zone_name
      AZURE_AUTH_METHOD    = "env"
      AZURE_ENVIRONMENT    = "public"
    }
  }

  depends_on = [
    data.external.get_csr
  ]

  lifecycle {
    ignore_changes = [
      certificate_request_pem
    ]
    replace_triggered_by = [null_resource.rotate_certificates_trigger]
  }
}
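The acme_certificate resource references acme_registration.reg, which is not shown in the post. A minimal sketch of an account registration, following the pattern in the acme provider documentation for 2.x (verify against the version you pin): it generates an account key with the hashicorp/tls provider, which you would add to required_providers, and registers it with the CA. The e-mail address is a placeholder. Note that tls_private_key stores the account key in the Terraform state, so protect the state accordingly.

```hcl
# ACME account key (stored in state)
resource "tls_private_key" "acme_account" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# Registers the account with the ACME server configured in the acme provider
resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.acme_account.private_key_pem
  email_address   = "admin@contoso.com" # placeholder contact address
}
```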

Merge pending and output csr

The azurerm provider does not expose the CSR of a pending Key Vault certificate, so an external data source reads it from the key vault and passes it to the ACME provider. After Let’s Encrypt has issued the certificate, it is merged into the pending certificate request in the key vault. To save space in this blog post, which is already turning out too long, I have opted not to include the output or merge code here. You can find the full code in the GitHub repository.
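For orientation only, here is a hypothetical sketch of how the two pieces could be wired up; it is not the code from the repository. The script names get_csr.sh and merge_cert.sh, their query keys and environment variables are assumptions. The external data source sends its query map to the script as JSON on stdin and expects a flat JSON object of strings on stdout (here {"csr": "..."}). The merge script is assumed to write the chain to a file and call az keyvault certificate pending merge.

```hcl
# Hypothetical: reads the CSR of the pending Key Vault certificate operation
data "external" "get_csr" {
  for_each = local.certificates
  program  = ["bash", "${local.script_path}/get_csr.sh"]

  # Passed to the script as JSON on stdin. The script must also handle the
  # case where no pending operation exists anymore (after the merge).
  query = {
    vault_name = "kv-demo-${random_pet.unique.id}"
    cert_name  = azurerm_key_vault_certificate.certificates[each.key].name
  }
}

# Hypothetical: merges the signed certificate and issuer chain into the pending request
resource "null_resource" "merge_certificate" {
  for_each = local.certificates

  # Re-run the merge whenever a new certificate is issued
  triggers = {
    certificate = acme_certificate.certificates[each.key].certificate_pem
  }

  provisioner "local-exec" {
    command = "bash ${local.script_path}/merge_cert.sh"
    environment = {
      VAULT_NAME = "kv-demo-${random_pet.unique.id}"
      CERT_NAME  = each.key
      CERT_CHAIN = "${acme_certificate.certificates[each.key].certificate_pem}${acme_certificate.certificates[each.key].issuer_pem}"
    }
  }
}
```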

Workaround for storing the CSR

You need to store the CSR somewhere, because neither the ACME provider nor the key vault certificate resource keeps it, and once the signed certificate is merged, the CSR can no longer be retrieved from the key vault. I chose to store it as a key vault secret to prevent the certificate from being replaced on every Terraform run.

resource "azurerm_key_vault_secret" "csr_storage" {
  for_each        = { for k, v in data.external.get_csr : k => v }
  key_vault_id    = module.avm-res-keyvault-vault.resource_id
  name            = "${each.key}-csr"
  content_type    = "text/plain"
  expiration_date = timeadd(timestamp(), "1128h") # 47 days
  value           = each.value.result.csr

  lifecycle {
    ignore_changes = [
      value,
      expiration_date
    ]
    # Forces replacement when the Key Vault certificate is replaced
    replace_triggered_by = [azurerm_key_vault_certificate.certificates[each.key]]
  }
}

Caveats and considerations

This configuration uses Let’s Encrypt as a proof-of-concept, but I have successfully used a similar configuration with other ACME-compatible certificate authorities. I would recommend investigating if your preferred CA supports the ACME protocol for automatic renewal.

Certificates you create and renew with this process are actually deleted and recreated every time they are renewed. Some soft-delete and recovery steps happen automatically along the way, but the certificate is not renewed in the traditional sense. It is also recommended to generate a new private key each time you request a new certificate, but honestly this is up to you and your internal compliance policies.

Let’s Encrypt may not be considered a “real” certificate authority for critical services. If you need a commercial CA such as DigiCert or GlobalSign, you can instead configure it as an integrated certificate authority in your key vault. This way you get automatic renewal of certificates from a trusted CA, with minimal manual effort.
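For the integrated-CA route, a rough sketch of registering DigiCert as an issuer in the key vault. The account and organization IDs, the variable name and the issuer name are hypothetical. The certificate policy would then reference this issuer instead of "Unknown" and could use an AutoRenew lifetime action, shown as comments below. Note that the API key passed as password still ends up in the state file.

```hcl
variable "digicert_api_key" {
  description = "DigiCert API key (hypothetical variable name)"
  type        = string
  sensitive   = true
}

resource "azurerm_key_vault_certificate_issuer" "digicert" {
  name          = "digicert-issuer"
  key_vault_id  = module.avm-res-keyvault-vault.resource_id
  provider_name = "DigiCert"
  account_id    = "0000000" # hypothetical DigiCert account ID
  org_id        = "000000"  # hypothetical DigiCert organization ID
  password      = var.digicert_api_key
}

# In azurerm_key_vault_certificate.certificate_policy:
#   issuer_parameters {
#     name = azurerm_key_vault_certificate_issuer.digicert.name
#   }
#   lifetime_action {
#     action {
#       action_type = "AutoRenew"
#     }
#     trigger {
#       days_before_expiry = 30
#     }
#   }
```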

Azure Key Vault is a great place to store certificates and secrets, but you need to protect it with private endpoints. This requires its own infrastructure, such as private DNS and central connectivity. This is not covered in my post, but you can find information about it online. I recommend private endpoints over service endpoints, as they provide better security and isolation. You should also use RBAC with least privilege for anyone who needs to access the key vault.
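A minimal sketch of a private endpoint for the vault, assuming an existing subnet and an existing privatelink.vaultcore.azure.net private DNS zone whose IDs are passed in through variables (the variable names are hypothetical). The AVM module also appears to accept a private_endpoints input that covers the same ground. Once it works, set public_network_access_enabled = false on the module; Terraform then has to run from a network that can reach the private endpoint, since it talks to the vault’s data plane.

```hcl
variable "private_endpoint_subnet_id" {
  type = string # hypothetical: subnet that hosts the private endpoint
}

variable "keyvault_private_dns_zone_id" {
  type = string # hypothetical: ID of the privatelink.vaultcore.azure.net zone
}

resource "azurerm_private_endpoint" "kv" {
  name                = "pe-kv-demo-${random_pet.unique.id}"
  location            = azurerm_resource_group.rg_demo.location
  resource_group_name = azurerm_resource_group.rg_demo.name
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-kv-demo"
    private_connection_resource_id = module.avm-res-keyvault-vault.resource_id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }

  # Registers the vault's private IP in the private DNS zone
  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [var.keyvault_private_dns_zone_id]
  }

  tags = local.tags
}
```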

DNS Zone is required for the ACME DNS challenge to work. For simplicity I recommend an Azure DNS zone, which the ACME provider supports natively through its azuredns challenge provider. You need a publicly available DNS zone that you own. Your service principal needs permission to add and remove TXT records in the DNS zone. The service principal credentials are passed as environment variables, as described under DNS Challenge below. Creation and management of the DNS zone is not covered in this post.
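A minimal sketch of granting the DNS challenge service principal access to the zone. It assumes the zone lives in the dns-zones-rg resource group used in the acme_certificate config, that the service principal’s object ID is passed in through a hypothetical variable, and that the identity running Terraform may create role assignments. DNS Zone Contributor covers the whole zone; a custom role limited to TXT record actions would be tighter.

```hcl
variable "dns_challenge_sp_object_id" {
  type = string # hypothetical: object ID of the DNS challenge service principal
}

data "azurerm_dns_zone" "zone" {
  name                = local.dns_zone_name
  resource_group_name = "dns-zones-rg"
}

# Allows the service principal to create and delete the _acme-challenge TXT records
resource "azurerm_role_assignment" "dns_challenge" {
  scope                = data.azurerm_dns_zone.zone.id
  role_definition_name = "DNS Zone Contributor"
  principal_id         = var.dns_challenge_sp_object_id
}
```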

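DNS Challenge is handled automatically by the ACME provider for supported DNS providers, but you need to set environment variables with the Azure credentials (with AZURE_AUTH_METHOD = "env" and a service principal secret, typically AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID and AZURE_SUBSCRIPTION_ID). NEVER STORE THESE IN GIT! CI/CD variables are a good place to keep them. Remember that if you pass the credentials into the dns_challenge config block through Terraform variables (for example TF_VAR_-prefixed environment variables), they are stored in plaintext in your state file. Letting the provider read them directly from environment variables keeps them out of the state.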

replace_triggered_by is a key component of this configuration. It ensures that the certificate and all related resources are replaced when the time_rotating resource rotates. Key Vault’s own autorenewal will not work here, because we are not using an “integrated CA”.

In summary

If you followed the steps above, you should have a working Terraform configuration that automatically renews your TLS certificates using Let’s Encrypt. It should also be secure by default, and easy to use with supported resources (Application Gateway, API Management, App Service, etc.).

  • The private key is stored in the Azure Key Vault, which is a secure place to store sensitive information.
  • If I am interpreting the state and documentation correctly, the private key never leaves the key vault, which means it is not exposed to the outside world.
  • The certificate will work on most web servers and applications that support TLS.
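  • With a 14-day rotation interval, the certificate is renewed automatically every 14 days, as long as Terraform is applied regularly (for example from a scheduled pipeline), so you don’t have to worry about it expiring.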
  • The configuration is easy to modify if you need to change the domain name or other settings.

Please leave a comment if you have any questions or suggestions for improvement. I would love to hear your feedback!