High Availability K3s Orchestration on DigitalOcean

Deploying a container orchestration platform means balancing resource efficiency against resilience. K3s, a lightweight Kubernetes distribution originally developed by Rancher Labs, is a certified Kubernetes distribution engineered for IoT and edge computing environments. It packages the Kubernetes components into a single binary, which reduces the effort of deploying, operating, and maintaining a cluster. Despite this simplification, the distribution remains fully conformant while carrying less overhead than upstream Kubernetes.

In its simplest form, a K3s cluster runs on a single host. For production-grade workloads, however, a single-node configuration is a single point of failure: if the virtual machine hosting the K3s server crashes, the entire application stack goes down with it. A High Availability (HA) configuration mitigates this risk by allowing the cluster to tolerate the failure of one or more nodes without interrupting service to end users. This resilience is achieved by deploying multiple server nodes that coordinate through a shared external datastore and sit behind a load balancer, so that the Kubernetes API server remains accessible regardless of the health of any individual node.

On DigitalOcean, K3s can be installed manually for low-cost personal projects or provisioned through Infrastructure as Code (IaC) with Terraform for scalable, reproducible environments. DigitalOcean provides the necessary building blocks, including Droplets, Virtual Private Clouds (VPC), Managed Databases, and Load Balancers, to support these architectures. DigitalOcean also offers a Managed Kubernetes service, but a self-hosted single-node K3s cluster avoids the costs of managed Persistent Volumes and Load Balancers, making it a cheaper option for those who prefer direct control over their infrastructure. The HA design described later does use a managed database and a load balancer, and is priced accordingly.

Architectural Components of K3s

K3s utilizes a specific "batteries included" stack that defines its operational behavior. This opinionated approach streamlines the process of moving from installation to application deployment.

  • Ingress: Traefik is the default ingress controller, managing how external traffic reaches services within the cluster.
  • Container Networking: Flannel is employed to handle the networking between pods across different nodes.
  • Persistent Storage: The Local Path Provisioner is utilized, which creates persistent volumes under a local path on the host system.

The use of these tools ensures that the cluster is functional immediately upon installation. For users seeking the lowest possible cost, a single droplet configuration—for example, one featuring 2 vCPUs, 4GB of memory, and 80GB of storage—can be used to run personal projects, such as a blog or a Ubiquiti UniFi Controller.

High Availability Infrastructure Design

The transition from a single-node setup to a High Availability cluster requires a shift in the underlying infrastructure. A reference architecture for HA K3s on DigitalOcean involves three primary tiers: the control plane (server nodes), the data layer (external datastore), and the traffic management layer (load balancer).

The control plane consists of two or more server nodes, so the loss of one server does not take down the Kubernetes API. These servers keep the cluster state in an external datastore rather than in a database embedded on a single node.

The data layer is critical for HA. K3s supports several external datastores, including MySQL and PostgreSQL; the Terraform-based deployment described here uses a managed PostgreSQL instance. Keeping the state in an external database decouples it from any single server node.

The traffic management layer utilizes a TCP load balancer. This provides a stable, single IP address for the Kubernetes API server, directing requests to the healthy server nodes in the cluster.
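
How the three tiers fit together can be sketched through the install commands that each node runs at first boot. The following is a minimal sketch, not a definitive implementation: the variable names (datastore_uri, cluster_token, api_lb_ip) and the local names are hypothetical, and passing the commands as Droplet user data is an assumption rather than something the deployment described here confirms. The install script URL and the K3s options (--datastore-endpoint, --token, --tls-san, K3S_URL, K3S_TOKEN) are standard K3s.

```hcl
# Hypothetical inputs: none of these names come from the original deployment.
variable "datastore_uri" {
  description = "PostgreSQL connection URI, e.g. postgres://user:password@host:port/dbname"
  type        = string
  sensitive   = true
}

variable "cluster_token" {
  description = "Shared secret used by servers and agents to join the cluster"
  type        = string
  sensitive   = true
}

variable "api_lb_ip" {
  description = "Public IP address of the load balancer in front of the API servers"
  type        = string
}

locals {
  # Every server points at the same external datastore and shares one token.
  # --tls-san adds the load balancer IP to the API server certificate.
  server_user_data = <<-EOT
    #!/bin/bash
    curl -sfL https://get.k3s.io | sh -s - server \
      --datastore-endpoint="${var.datastore_uri}" \
      --token="${var.cluster_token}" \
      --tls-san="${var.api_lb_ip}"
  EOT

  # Agents register through the load balancer rather than a specific server.
  agent_user_data = <<-EOT
    #!/bin/bash
    curl -sfL https://get.k3s.io | K3S_URL="https://${var.api_lb_ip}:6443" K3S_TOKEN="${var.cluster_token}" sh -
  EOT
}
```

Because every server uses the same datastore endpoint and token, any one of them can be replaced without losing cluster state, which is the property the external datastore is meant to provide.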

Terraform Implementation and Resource Provisioning

Deploying an HA K3s cluster via Terraform allows for the programmatic definition of the entire environment. This approach ensures that the infrastructure is version-controlled and reproducible across different regions or projects.
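
A Terraform configuration of this kind usually starts by declaring its providers. The block below is a minimal sketch: the version constraints and the do_token variable name are assumptions, while the provider sources are the published digitalocean/digitalocean and hashicorp/random providers (the latter is needed for the random_id resources discussed later).

```hcl
terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0" # assumed constraint; pin to whatever version you have tested
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0" # assumed constraint
    }
  }
}

# Hypothetical variable name; supply the token via TF_VAR_do_token or a tfvars file.
variable "do_token" {
  description = "DigitalOcean API token"
  type        = string
  sensitive   = true
}

provider "digitalocean" {
  token = var.do_token
}
```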

Network and Project Configuration

The foundational layer of the deployment begins with the project and network organization.

  • DigitalOcean Project: A digitalocean_project resource is created with the name k3s-cluster and the environment set to "Development". Grouping the resources under one project keeps them easy to track.
  • Virtual Private Cloud (VPC): A digitalocean_vpc resource named k3s-vpc-01 is provisioned. It utilizes the IP range 10.10.10.0/24 and is located in the fra1 region. The VPC provides a private networking layer, reducing latency and increasing security between the nodes.
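
The two resources above might be declared roughly as follows. This is a sketch under stated assumptions: the Terraform labels (k3s_cluster, k3s_vpc) and the description text are hypothetical, while the names, IP range, region, and environment are taken from the description above.

```hcl
resource "digitalocean_project" "k3s_cluster" {
  name        = "k3s-cluster"
  description = "HA K3s cluster resources" # hypothetical description
  environment = "Development"
  # Resources can be attached through the optional `resources` list of URNs,
  # for example the URNs of the Droplets and the load balancer.
}

resource "digitalocean_vpc" "k3s_vpc" {
  name     = "k3s-vpc-01"
  region   = "fra1"
  ip_range = "10.10.10.0/24"
}
```

Other resources can then join the private network by referencing digitalocean_vpc.k3s_vpc.id.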

Control Plane and Agent Nodes

The compute layer consists of server nodes (control plane) and agent nodes (worker nodes).

  • Server Nodes: Provisioned as digitalocean_droplet resources, these nodes use the ubuntu-20-04-x64 image. They are tagged with k3s_server to allow the load balancer and firewall to identify them.
  • Agent Nodes: Provisioned as digitalocean_droplet resources, these nodes also utilize the ubuntu-20-04-x64 image. They are configured with a size of s-1vcpu-2gb and are tagged as k3s_agent.
  • Node Identification: To ensure unique naming and identification, random_id resources are used for both server and agent nodes, utilizing a byte length of 2.
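
A sketch of the compute layer under these settings is shown below. The image, tags, agent size, and byte length come from the description above; the variable names, the node counts, the name format, and the server size (which is not stated here, so it is left as a required input) are assumptions made for illustration.

```hcl
# Hypothetical inputs; in a single configuration vpc_uuid would normally be a
# direct reference to the VPC resource's id.
variable "vpc_uuid" {
  type = string
}

variable "server_size" {
  description = "Droplet size slug for server nodes (not specified in the text)"
  type        = string
}

variable "server_count" {
  type    = number
  default = 2 # assumed; HA with an external datastore needs at least two servers
}

variable "agent_count" {
  type    = number
  default = 2 # assumed
}

# Two random bytes yield a four-character hex suffix per node.
resource "random_id" "server_node_id" {
  count       = var.server_count
  byte_length = 2
}

resource "random_id" "agent_node_id" {
  count       = var.agent_count
  byte_length = 2
}

resource "digitalocean_droplet" "k3s_server" {
  count    = var.server_count
  name     = "k3s-server-${random_id.server_node_id[count.index].hex}" # hypothetical name format
  image    = "ubuntu-20-04-x64"
  region   = "fra1"
  size     = var.server_size
  vpc_uuid = var.vpc_uuid
  tags     = ["k3s_server"] # matched by the load balancer and database firewall
}

resource "digitalocean_droplet" "k3s_agent" {
  count    = var.agent_count
  name     = "k3s-agent-${random_id.agent_node_id[count.index].hex}" # hypothetical name format
  image    = "ubuntu-20-04-x64"
  region   = "fra1"
  size     = "s-1vcpu-2gb"
  vpc_uuid = var.vpc_uuid
  tags     = ["k3s_agent"]
}
```

Selecting nodes by tag rather than by Droplet ID means that a replaced or added server is picked up by the load balancer and the database firewall without further changes.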

Database Layer Configuration

To support HA, a managed database is provisioned to serve as the external datastore.

  • Database Cluster: A digitalocean_database_cluster named k3s-ext-datastore is deployed. It uses the pg (PostgreSQL) engine, version 11, with a size of db-s-1vcpu-1gb and a node count of 1.
  • Database User: A digitalocean_database_user named k3s_default_user is created to provide the necessary credentials for the K3s servers to access the datastore.
  • Database Firewall: A digitalocean_database_firewall is implemented to restrict access. A rule is created to allow traffic only from resources tagged as k3s_server, ensuring that the database is not exposed to the public internet.
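
The data layer described above could be expressed along these lines. The resource names, engine, version, size, node count, and firewall rule follow the text; the Terraform labels are hypothetical, and the region is taken from the summary table later in this article. Version 11 is reproduced as stated, although DigitalOcean may no longer offer it, in which case a currently supported PostgreSQL version would be needed.

```hcl
resource "digitalocean_database_cluster" "k3s_datastore" {
  name       = "k3s-ext-datastore"
  engine     = "pg"
  version    = "11" # as stated in the text; may need a newer supported version
  size       = "db-s-1vcpu-1gb"
  region     = "fra1"
  node_count = 1
}

resource "digitalocean_database_user" "k3s_user" {
  cluster_id = digitalocean_database_cluster.k3s_datastore.id
  name       = "k3s_default_user"
}

# Only Droplets carrying the k3s_server tag may connect to the database.
resource "digitalocean_database_firewall" "k3s_datastore_fw" {
  cluster_id = digitalocean_database_cluster.k3s_datastore.id

  rule {
    type  = "tag"
    value = "k3s_server"
  }
}
```

With a node count of 1 the database has no standby node, so the datastore itself remains a single component whose availability depends on the managed service; raising node_count is one way to address that at additional cost.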

Load Balancer Specifications

The load balancer acts as the entry point for the Kubernetes API.

  • Load Balancer Resource: A digitalocean_loadbalancer named k3s-api-loadbalancer is created using the lb-small size in the fra1 region.
  • Algorithm: The load balancer uses a round_robin algorithm to distribute traffic across the server nodes.
  • Target Selection: The load balancer targets droplets based on the k3s_server tag.
  • Forwarding Rule: A forwarding rule is established with an entry port of 6443 and a target port of 6443 using the https protocol. tls_passthrough is enabled to allow the K3s API server to handle the TLS termination.
  • Health Checks: A TCP health check is configured on port 6443 with a check interval of 10 seconds, a healthy threshold of 5, and an unhealthy threshold of 3.
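
Put together, the load balancer settings listed above correspond to a resource roughly like the following. The Terraform label and the lb_vpc_uuid variable are hypothetical, and attaching the load balancer to the cluster VPC is an assumption not stated in the text. Newer versions of the DigitalOcean provider mark the algorithm argument as deprecated, so it may need to be dropped depending on the provider version in use.

```hcl
# Hypothetical input; typically the id of the VPC that holds the server Droplets.
variable "lb_vpc_uuid" {
  type = string
}

resource "digitalocean_loadbalancer" "k3s_api" {
  name        = "k3s-api-loadbalancer"
  region      = "fra1"
  size        = "lb-small"
  algorithm   = "round_robin"
  droplet_tag = "k3s_server" # targets are selected by tag, not by Droplet ID
  vpc_uuid    = var.lb_vpc_uuid

  forwarding_rule {
    entry_port      = 6443
    entry_protocol  = "https"
    target_port     = 6443
    target_protocol = "https"
    tls_passthrough = true # TLS is terminated by the K3s API server, not the load balancer
  }

  healthcheck {
    port                   = 6443
    protocol               = "tcp"
    check_interval_seconds = 10
    healthy_threshold      = 5
    unhealthy_threshold    = 3
  }
}
```

The load balancer's ip attribute is the stable address that kubeconfig files and agent nodes would point at.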

Technical Specifications Summary

The following table details the infrastructure components utilized in a Terraform-based HA K3s deployment on DigitalOcean.

Component       | Resource Name          | Specification/Value            | Region
VPC             | k3s-vpc-01             | 10.10.10.0/24                  | fra1
Database Engine | k3s-ext-datastore      | PostgreSQL 11 (db-s-1vcpu-1gb) | fra1
Node Image      | k3s_agent / k3s_server | ubuntu-20-04-x64               | fra1
Agent Size      | k3s_agent              | s-1vcpu-2gb                    | fra1
Load Balancer   | k3s-api-loadbalancer   | lb-small                       | fra1
API Port        | Forwarding Rule        | 6443 (HTTPS)                   | fra1

Manual Installation and Cluster Management

For users who do not require the complexity of Terraform or High Availability, K3s can be installed manually on a single DigitalOcean droplet.

Installation Process

The installation is performed as the root user on the target droplet by running the installation script from the official K3s documentation (published at https://get.k3s.io).

Once the installation is complete, the cluster is operational. By default, interaction happens on the droplet itself through the kubectl that is bundled into the k3s binary (k3s kubectl). For day-to-day management, a standalone kubectl on a local machine is recommended.

Accessing the Cluster

To manage the cluster from a local machine, the administrator must retrieve the kubeconfig file generated by the installation.

  • The kubeconfig file is located at /etc/rancher/k3s/k3s.yaml.
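  • The contents of this file must be copied to the local machine (typically to ~/.kube/config), and the server address in it changed from 127.0.0.1 to the droplet's public IP address or hostname.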
  • Users can then use kubectl to interact with the cluster.
  • For those managing multiple clusters, kubectx is recommended to switch between different cluster contexts efficiently.

Operational Analysis and Comparison

The choice between a single-node K3s deployment and an HA K3s deployment involves a trade-off between cost, complexity, and reliability.

Single-Node Analysis

A single-node deployment is the most cost-effective way to run containerized applications. By avoiding the costs of managed databases and load balancers, users can run services on a budget. However, this architecture is fragile: the failure of the single droplet causes an immediate outage of all hosted services. A single-node setup is therefore best suited to development, testing, or non-critical personal projects.

High Availability Analysis

The HA configuration removes the single point of failure. By distributing the control plane across multiple nodes and using an external database, the cluster maintains operational continuity. Even if a server node fails, the load balancer redirects API traffic to the remaining healthy nodes, and the external database ensures that the cluster state remains consistent.

While HA increases the monthly cost due to the requirement for additional droplets, a managed database, and a load balancer, it provides the stability required for production environments. The use of Terraform further enhances this by allowing the infrastructure to be torn down and rebuilt in minutes, reducing the risk of configuration drift.

Conclusion

The deployment of K3s on DigitalOcean represents a powerful intersection of lightweight orchestration and cloud scalability. Whether implemented as a single-node system for cost-efficiency or as a High Availability cluster for production resilience, K3s provides a fully conformant Kubernetes experience with significantly reduced operational overhead.

The transition to High Availability is not merely about adding more nodes, but about implementing a robust architectural pattern. By utilizing an external PostgreSQL datastore and a TCP load balancer, administrators can ensure that their API server remains accessible and their cluster state remains durable. The integration of Terraform transforms this process from a manual, error-prone exercise into a disciplined engineering workflow.

Ultimately, K3s on DigitalOcean lets users avoid a fully managed Kubernetes service while retaining the ability to scale. The "batteries included" stack of Traefik, Flannel, and the Local Path Provisioner makes a cluster usable immediately after installation, and the HA pattern described above adds the resilience that production workloads require.
