K3s TLS Automation via Traefik and Cert-Manager ACME Integration

The implementation of Transport Layer Security (TLS) within a K3s cluster is a critical requirement for any production-grade deployment, transforming an insecure HTTP stream into an encrypted HTTPS connection. K3s, a lightweight Kubernetes distribution, simplifies this process by bundling Traefik as the default Ingress controller. Traefik possesses native capabilities to interface with Let’s Encrypt using the Automated Certificate Management Environment (ACME) protocol. This integration allows the cluster to automatically request, prove ownership of, and renew SSL/TLS certificates without manual intervention.

At the foundational level, K3s manages its own internal Public Key Infrastructure (PKI). During the initial startup of the first server node, K3s generates self-signed Certificate Authority (CA) certificates. These internal certificates are valid for 10 years from the date of issuance and are not renewed automatically. The authoritative CA certificates and keys are stored in the datastore's bootstrap key, encrypted with AES256-GCM and HMAC-SHA1 using the server token as the PBKDF2 passphrase. While these internal certificates handle cluster-level communication and node joining, they are unsuitable for public-facing web traffic because browsers do not trust a private, self-signed CA. This necessitates the integration of a publicly trusted external CA, such as Let’s Encrypt.

The architectural choice between using Traefik's built-in ACME support and deploying a dedicated cert-manager instance represents a trade-off between simplicity and flexibility. Traefik’s native integration is streamlined, requiring only a few configuration arguments to begin issuing certificates. However, it stores the certificates and private keys in a JSON file on disk (acme.json) rather than in Kubernetes objects. In contrast, cert-manager is a specialized Kubernetes operator that manages the entire lifecycle of certificates, storing them as Kubernetes Secrets. This Secret-based storage is generally preferred in professional DevOps environments because it integrates with Kubernetes' native resource management, is easier to back up with Velero or similar tools, and supports DNS-01 challenge workflows for private clusters.

Prerequisites for Let's Encrypt Integration

Before attempting to enable TLS in a K3s environment, several infrastructure requirements must be satisfied to ensure the ACME challenge can be completed successfully.

  • Public Domain Name: A registered domain (e.g., me.example.com) is required. This domain serves as the identity that Let’s Encrypt will verify.
  • DNS Configuration: The DNS records for the domain must be configured to point to the external IP address of the K3s cluster. This can be achieved using standard A records or through dynamic DNS providers.
  • Network Access: Ports 80 and 443 must be open on the firewall and forwarded from the external router or cloud load balancer to the K3s nodes. Port 80 is essential for the HTTP-01 challenge, while Port 443 is required for the actual HTTPS traffic.
  • K3s Installation: The cluster must be running with Traefik enabled. If Traefik was disabled during installation via the --disable traefik flag, it must be reinstalled to serve as the Ingress controller.

The following table outlines the common DNS providers often used in conjunction with K3s for this purpose:

| Provider Type    | Examples                  | Primary Use Case                            |
|------------------|---------------------------|---------------------------------------------|
| Managed DNS      | Cloudflare, AWS Route53   | High reliability, API support for DNS-01    |
| Dynamic DNS      | DuckDNS                   | Home labs, residential IPs with no static IP |
| Custom Registrar | Namecheap, Google Domains | Professional corporate branding             |

Native Traefik ACME Configuration

For users seeking a rapid deployment, Traefik's built-in Let's Encrypt support is the shortest path. The configuration shown here uses the HTTP-01 challenge: Let’s Encrypt verifies domain ownership by requesting a token file under /.well-known/acme-challenge/ on the domain over port 80.

To initiate this configuration, a HelmChartConfig manifest must be created. This is performed on one of the control-plane nodes. The manifest should be placed at the specific path /var/lib/rancher/k3s/server/manifests/traefik-config.yaml. K3s employs a watcher mechanism that detects new files in this directory and automatically applies the configuration without requiring a manual cluster restart.

The configuration requires the definition of several additionalArguments to enable the ACME resolver:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      # Contact address for Let's Encrypt; replace the placeholder with a real address
      - "--certificatesresolvers.default.acme.email=<your-email>"
      - "--certificatesresolvers.default.acme.storage=/data/acme.json"
      - "--certificatesresolvers.default.acme.httpchallenge.entrypoint=web"
    ports:
      web:
        exposedPort: 80
      websecure:
        exposedPort: 443
```

The impact of these specific arguments is as follows:

  • acme.email: This provides Let’s Encrypt with a contact address to send notifications regarding certificate expiry or account issues.
  • acme.storage: This defines the file path /data/acme.json where the private keys and certificates are stored.
  • acme.httpchallenge.entrypoint: This instructs Traefik to use the web entrypoint (Port 80) to solve the ACME HTTP-01 challenge.

Once the file is saved, the Traefik pods will restart to apply the new settings. Verification can be performed using the following command:

```bash
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik
```

To monitor the actual ACME handshake and ensure the registration with https://acme-v02.api.letsencrypt.org is successful, users should stream the logs:

```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=100 -f
```
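While testing, it can help to point the resolver at the Let's Encrypt staging endpoint, which has much more relaxed rate limits but issues certificates that browsers do not trust. The following is a sketch of a variant of the traefik-config.yaml manifest, not a definitive configuration. The email address is a placeholder, and the persistence block is an assumption: it presumes the Traefik chart version bundled with your K3s release accepts these values, so that /data/acme.json survives pod restarts.

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      # Placeholder address (assumption); use your own
      - "--certificatesresolvers.default.acme.email=admin@example.com"
      # Staging directory: untrusted test certificates, relaxed rate limits
      - "--certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.default.acme.storage=/data/acme.json"
      - "--certificatesresolvers.default.acme.httpchallenge.entrypoint=web"
    # Assumed chart values: keep /data on a persistent volume
    persistence:
      enabled: true
      size: 128Mi
```

Because this manifest is also named traefik, it would replace the earlier traefik-config.yaml rather than sit alongside it. Removing the caserver line switches the resolver back to the production endpoint.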

Deploying the Ingress Resource for HTTPS

After the ACME resolver is configured in Traefik, an Ingress resource must be defined to tell Traefik which domains should use TLS and which backend service they should route to.

A sample Ingress manifest for a domain like me.example.com is structured as follows:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: me-ingress
  annotations:
    kubernetes.io/ingress.class: "traefik"
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls.certresolver: default
spec:
  rules:
    - host: me.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: me-loadbalancer
                port:
                  number: 9000
  tls:
    - hosts:
        - me.example.com
```

In this configuration, the annotation traefik.ingress.kubernetes.io/router.tls.certresolver: default is the critical link. It tells Traefik to use the default resolver defined in the HelmChartConfig to fetch the certificate. The tls block under spec explicitly declares that the host me.example.com must be served over HTTPS.

Once the manifest is applied using kubectl apply -f me-ingress.yaml, there is typically a 30 to 60 second delay while Traefik communicates with Let's Encrypt, performs the challenge, and installs the certificate.
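The Ingress routes traffic to a Service named me-loadbalancer on port 9000, which must already exist in the same namespace. A minimal sketch of such a Service follows; the app: me-app selector label and the container port are hypothetical and would need to match the actual workload.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: me-loadbalancer        # name referenced by the Ingress backend
spec:
  selector:
    app: me-app                # hypothetical pod label (assumption)
  ports:
    - name: http
      port: 9000               # port referenced by the Ingress backend
      targetPort: 9000         # assumed container port
```

Since Traefik terminates TLS and forwards to the Service inside the cluster, the default ClusterIP type is sufficient here.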

Advanced Certificate Management with Cert-Manager

While Traefik's built-in support is convenient, cert-manager provides a more robust framework for enterprise environments. The primary driver for choosing cert-manager is the ability to perform DNS-01 challenges.

DNS-01 verification is superior to HTTP-01 in several scenarios:

  • Private Clusters: If the K3s cluster is hosted in a private network with no external access to port 80, HTTP-01 will fail. DNS-01 allows verification by creating a TXT record in the DNS provider.
  • Firewall Restrictions: DNS-01 avoids issues with Web Application Firewalls (WAF) or corporate firewalls that might block ACME probe traffic.
  • Wildcard Certificates: Let's Encrypt only allows the issuance of wildcard certificates (e.g., *.example.com) via DNS-01 challenges.

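When cert-manager is used with a provider such as AWS Route53, it needs AWS credentials that are allowed to change records in the hosted zone. If the cluster nodes are EC2 instances, one way to provide them is to attach an IAM role to the nodes that grants permission to update DNS records.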
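As an illustration of the wildcard case, the sketch below requests a certificate covering both the apex domain and every first-level subdomain. The resource and Secret names are hypothetical, example.com stands in for a real domain, and the issuer name is borrowed from the Certificate example in this article; it assumes a ClusterIssuer with a DNS-01 solver already exists under that name.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com          # hypothetical name
  namespace: test-cert
spec:
  secretName: wildcard-example-com-tls  # hypothetical Secret name
  issuerRef:
    name: letsencrypt-dns01-staging-issuer
    kind: ClusterIssuer
  dnsNames:
    - "*.example.com"   # wildcard: only issuable via DNS-01
    - example.com       # the wildcard does not cover the apex itself
```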

Implementing the DNS-01 Workflow

To implement this, a ClusterIssuer is first created to define the Let's Encrypt account and the DNS provider credentials. Once the issuer is ready, a Certificate resource is requested:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-certificate
  namespace: test-cert
spec:
  secretName: test-example-tls
  issuerRef:
    name: letsencrypt-dns01-staging-issuer
    kind: ClusterIssuer
  dnsNames:
    - k3s.maxdon.tech
```

The secretName field is particularly important because it specifies where cert-manager will store the resulting certificate and private key as a Kubernetes Secret. This differs from Traefik's file-based storage.

To verify the status of the certificate issuance, use the following command:

```bash
kubectl get certificate -n test-cert
```

Expected output should show a READY status of True, indicating that the DNS challenge was successful and the secret test-example-tls has been populated.
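The ClusterIssuer that the Certificate refers to through issuerRef is not shown in the source. A minimal sketch of what it could look like for Route53 follows, assuming cert-manager picks up AWS credentials from an IAM role attached to the nodes. The email address, account-key Secret name, and region are placeholders chosen for illustration.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01-staging-issuer
spec:
  acme:
    # Let's Encrypt staging directory, matching the "staging" issuer name
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com                 # placeholder (assumption)
    privateKeySecretRef:
      name: letsencrypt-dns01-staging-key    # hypothetical Secret for the ACME account key
    solvers:
      - dns01:
          route53:
            region: us-east-1                # placeholder region (assumption)
```

With no access key configured in the solver, cert-manager falls back to ambient credentials, which is how a node IAM role would be used. Swapping the server URL for https://acme-v02.api.letsencrypt.org/directory would turn this into a production issuer.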

Integrating Cert-Manager Secrets with Traefik

When using cert-manager, Traefik does not learn about the certificate automatically, because it lives in a Kubernetes Secret rather than in Traefik's own ACME storage. The Ingress or the Traefik IngressRoute custom resource must therefore reference the Secret explicitly.

Example of a routing rule that links to a cert-manager secret:

```yaml
routes:
  - match: Host(`k3s.maxdon.tech`)
    kind: Rule
    services:
      - name: nginx-web-service
        port: 80
tls:
  secretName: k3s-maxdon-tech-tls
```

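The secretName must match the secretName defined in the Certificate resource spec, and the Secret must live in the same namespace as the route that references it. Note that the Certificate example above writes to test-example-tls while this routing snippet reads k3s-maxdon-tech-tls, so one of the two names has to be changed before they work together. This architecture decouples certificate issuance (handled by cert-manager) from certificate usage (handled by Traefik).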
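The same link can be made with a standard Ingress instead of the Traefik CRD. The sketch below is one possible shape, assuming the Secret test-example-tls produced by the Certificate example, the nginx-web-service backend from the routing snippet, and an IngressClass named traefik. The Ingress name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-web-ingress        # hypothetical name
  namespace: test-cert           # same namespace as the Secret
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik      # assumed IngressClass name
  rules:
    - host: k3s.maxdon.tech
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-web-service
                port:
                  number: 80
  tls:
    - hosts:
        - k3s.maxdon.tech
      secretName: test-example-tls   # Secret written by cert-manager
```

No certresolver annotation is needed in this variant, because Traefik only serves the certificate from the Secret and does not request one itself.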

Internal K3s PKI and CA Rotation

While Let's Encrypt handles external traffic, K3s maintains an internal PKI for cluster operations. Understanding the internal CA is vital for maintaining cluster health and security.

K3s generates its own self-signed CA certificates during the initial server startup. These are used to issue leaf certificates for any nodes that join the cluster. If the internal CA certificates need to be updated, K3s provides a specific utility:

```bash
k3s certificate rotate-ca
```

The rotation process follows a strict safety protocol:
1. Integrity Check: The command verifies that the updated certificates and keys are usable.
2. Atomic Update: If the validation is successful, the encrypted bootstrap key in the datastore is updated.
3. Activation: The new certificates and keys take effect upon the next K3s restart.
4. Rollback: If any errors occur during validation, the operation is cancelled, and the system log records the failure without applying changes.

Security Considerations for Cluster CA Management

A critical security warning exists regarding the reuse of Certificate Authorities across different environments. It is strongly recommended not to share Root or Intermediate CAs across multiple clusters, nor to use an existing private CA as the K3s cluster CA.

The risk associated with CA reuse stems from the "Root of Trust" concept. If two clusters share a common root CA, any client certificate issued by Cluster A will be inherently trusted by Cluster B. This creates a massive security vulnerability where a user with a valid kubeconfig for a development cluster could potentially authenticate to a production cluster if the RBAC (Role-Based Access Control) configurations are similar. To maintain strict isolation, every K3s cluster should maintain its own unique, independent CA.

Comparative Analysis of TLS Implementation Strategies

The choice of TLS strategy in K3s depends on the deployment environment and the desired level of automation.

| Feature           | Traefik Native ACME          | Cert-Manager + Traefik   | K3s Internal CA          |
|-------------------|------------------------------|--------------------------|--------------------------|
| Primary Use Case  | Simple public websites       | Complex/Private/Wildcard | Internal cluster traffic |
| Storage Mechanism | Local JSON file (acme.json)  | Kubernetes Secrets       | Datastore/Disk           |
| Challenge Type    | HTTP-01 (as configured here) | HTTP-01 and DNS-01       | Internal Issuance        |
| Setup Complexity  | Low                          | Medium to High           | Automatic (on start)     |
| Trust Level       | Publicly Trusted             | Publicly Trusted         | Privately Trusted        |
| Wildcard Support  | No (with HTTP-01)            | Yes (via DNS-01)         | N/A                      |

Conclusion

Securing a K3s cluster with Let's Encrypt requires a strategic decision between the convenience of Traefik's integrated ACME resolver and the power of cert-manager. For simple deployments where port 80 is accessible and only standard domain certificates are needed, Traefik's native implementation via HelmChartConfig is optimal, as it eliminates the overhead of managing additional operators.

However, for professional DevOps workflows, cert-manager is the superior choice. By leveraging DNS-01 challenges, it removes the requirement for open inbound ports, allowing for the securing of private clusters and the use of wildcard certificates. The shift from file-based storage to Kubernetes Secrets ensures that the security posture of the cluster is aligned with Kubernetes best practices.

Furthermore, it is essential to distinguish between external TLS (Let's Encrypt) and internal PKI (K3s CA). While Let's Encrypt provides the "face" of the application to the internet, the internal K3s CA ensures the integrity of the control plane and node communication. Regular maintenance of the internal CA via the rotate-ca command and strict adherence to the rule of not sharing CAs across clusters are mandatory for preventing unauthorized cross-cluster authentication. By combining these two layers of security, administrators can ensure that their K3s deployments are both accessible to the public via trusted HTTPS and internally secure against lateral movement.

