Engineering Scalable Lightweight Kubernetes Clusters with Ansible and K3s

The convergence of lightweight container orchestration and declarative configuration management has revolutionized the deployment of edge computing and home laboratory environments. K3s, a highly optimized distribution of Kubernetes, provides the necessary foundation for running containerized workloads on resource-constrained hardware, such as mini PCs, without the overhead associated with full-scale Kubernetes (K8s) deployments. When paired with Ansible, an agentless automation engine, the process of bootstrapping these clusters transforms from a manual, error-prone sequence of shell scripts into a repeatable, version-controlled infrastructure-as-code (IaC) pipeline. This synergy allows engineers to move beyond the basic installation scripts provided by the K3s project and instead implement a professional-grade deployment strategy that encompasses firewall configuration, precise versioning, and complex node topologies. By treating the cluster as a software artifact, operators can ensure that their environments are consistent across various Linux distributions, from the minimalist Alpine Linux to the enterprise-grade Rocky Linux and the declarative NixOS.

The K3s Ecosystem and Provisioning Landscape

K3s is designed to be an extensible distribution, and its utility is expanded through a variety of community-driven projects that aim to simplify high availability (HA) and multi-node orchestration. These projects serve as critical bridges between the raw K3s binary and the operational requirements of a production or development cluster.

Strategic Provisioning Tools

Depending on the operator's technical preferences and environment, several tools facilitate the deployment of K3s:

  • k3s-ansible: This is a specialized set of playbooks designed for users who are already familiar with the Ansible ecosystem. It allows the operator to shift their focus from the mechanical process of installation to the strategic configuration of the cluster.
  • k3sup: A tool written in Go (golang) that simplifies the setup process by requiring only SSH access to the target nodes. A primary advantage of k3sup is its ability to deploy K3s with external datastores, bypassing the necessity of the embedded etcd for state management.
  • autok3s: This project provides a Graphical User Interface (GUI), which suits users who prefer visual orchestration over command-line interfaces. It supports provisioning across cloud providers, virtual machines, and local hardware.
  • hetzner-k3s: Specifically engineered for the Hetzner Cloud environment, this CLI tool is written in the Crystal language. It automates the specific networking and infrastructure steps required to make K3s operational within the Hetzner ecosystem.

Deep Dive into the Ansible Role for K3s

The use of a dedicated Ansible role for K3s installation provides a structured framework for deploying the software as either a standalone server or a distributed cluster. This approach ensures that the installation is not just a "one-off" event but a documented process.
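As a minimal sketch of what applying such a role looks like, the playbook below targets one inventory group and applies the role to it. The role name `xanmanning.k3s` is the Ansible Galaxy name under which the PyratLabs role is published; it is not stated in this article, so treat it, the group name `k3s_cluster`, and the file name as assumptions to verify against your own setup.

```yaml
# site.yml (hypothetical file name)
---
- name: Install K3s on every host in the cluster group
  hosts: k3s_cluster          # assumed inventory group name
  roles:
    - role: xanmanning.k3s    # assumed Galaxy name of the PyratLabs role
```

Keeping this file in version control is what turns the installation into a documented, repeatable process rather than a one-off event.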

Controller Requirements and Python Dependencies

For the Ansible role to execute successfully, the control node (the machine running the playbooks) must meet specific software prerequisites. The dependency chain is critical for the handling of network addresses and the execution of the Ansible modules.

  • Python Version: The controller requires python >= 3.6.0. This ensures that the modern syntax and library support required by current Ansible collections are present.
  • Ansible Version: The environment must have ansible >= 2.9.16 or ansible-base >= 2.10.4. This versioning is necessary to support the specific module implementations used in the K3s role.
  • Netaddr: The netaddr >= 1.3.0 package is mandatory. This library is specifically utilized for dual-stack IP address handling, allowing the role to manage both IPv4 and IPv6 environments correctly.

These dependencies can be efficiently managed using a requirements.txt file via the command: pip3 install -r requirements.txt.

Target OS Compatibility and Testing

The robustness of the K3s Ansible role is evidenced by its extensive testing across a diverse array of Linux distributions. This compatibility ensures that the role can be used in heterogeneous environments.

| Distribution | Supported Version/Variant |
|---|---|
| Alpine Linux | Supported |
| Archlinux | Supported |
| CentOS | 8 |
| Debian | 11, 12 |
| Fedora | 41, 42 |
| openSUSE Leap | 15 |
| Rocky Linux | 9, 10 |
| Ubuntu | 22.04 LTS, 24.04 LTS |

Technical Configuration and Version Management

A critical aspect of managing K3s through Ansible is the ability to pin specific releases and to control how configuration is delivered, moving away from command-line arguments baked into the service definition toward persistent configuration files.

Versioning and Channel Control

The variable k3s_release_version allows the operator to dictate exactly which version of K3s is deployed. If this variable is omitted, the role defaults to the latest stable release.

  • Stable Channel: Setting k3s_release_version: stable or k3s_release_version: false installs the latest stable release.
  • Testing Channel: Setting k3s_release_version: testing deploys the latest version from the testing channel, which is useful for evaluating new features.
  • Version Pinning: An operator can specify a specific version, such as k3s_release_version: v1.19, to get the latest release in that branch, or a highly specific version like k3s_release_version: v1.19.3+k3s3.
  • Commit-Based Deployment: For extreme testing scenarios, a specific 40-character git commit hash can be used (e.g., k3s_release_version: 48ed47c4a3e420fa71c18b2ec97f13dc0659778b).
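The options above can be sketched as a single variables file. The file path is a hypothetical example; the values are the ones listed in this section.

```yaml
# group_vars/k3s_cluster.yml (hypothetical path)
---
# Pin an exact release so every node runs the same build.
k3s_release_version: v1.19.3+k3s3

# Alternatives (use one at a time):
# k3s_release_version: stable     # latest stable release
# k3s_release_version: testing    # latest release from the testing channel
# k3s_release_version: v1.19      # latest release in the v1.19 branch
```

Pinning an exact release is usually the safer choice for a cluster that must be rebuilt identically later, since channel names resolve to different versions over time.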

The Transition to Configuration Files

Historically, K3s was configured using command-line arguments passed to the systemd unit file. However, starting with K3s v1.19.1+k3s1, a configuration file method was introduced. The v2 release of the Ansible role has fully transitioned to this method. This change is significant because it separates the execution logic (the systemd unit) from the configuration logic (the YAML file), making the cluster easier to audit and modify without altering the service definition.
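To illustrate the difference, here is a small sketch of a K3s configuration file. K3s reads `/etc/rancher/k3s/config.yaml` by default, and each key is a flag name without the leading `--`. The exact file the role renders may differ, so this is an illustration of the format rather than the role's output.

```yaml
# /etc/rancher/k3s/config.yaml (K3s default configuration path)
# Equivalent to running:
#   k3s server --cluster-cidr 172.20.0.0/16 --disable traefik --disable coredns
cluster-cidr: 172.20.0.0/16
disable:          # repeatable flags become YAML lists
  - traefik
  - coredns
```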

Advanced Node Configuration and Orchestration

The Ansible role utilizes specific dictionary variables to define the behavior of the control plane and the worker nodes.

Control Plane Configuration (k3s_server)

The k3s_server variable is a dictionary that maps directly to K3s server flags. The -- prefix is removed when defining these in Ansible.

Example configuration for the control plane:

```yaml
k3s_server:
  datastore-endpoint: postgres://postgres:verybadpass@database:5432/postgres?sslmode=disable
  cluster-cidr: 172.20.0.0/16
  flannel-backend: 'none'
  disable:
    - traefik
    - coredns
```

In this configuration, the operator uses an external PostgreSQL database as the datastore instead of the embedded one (SQLite on a single server, or embedded etcd for HA), turns off the bundled Flannel backend, and disables the default Traefik ingress and CoreDNS in order to provide their own alternatives.

Worker Node Configuration (k3s_agent)

Similarly, the k3s_agent variable manages the worker nodes. This allows for granular control over how agents join the cluster.

Example configuration for agents:

```yaml
k3s_agent:
  with-node-id: true
  node-label:
    - "foo=bar"
    - "hello=world"
```

The use of node labels is essential for scheduling workloads to specific hardware based on capabilities or availability.
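To show how such a label is consumed, the following sketch is a standard Kubernetes Pod manifest that uses `nodeSelector` to land only on nodes labelled `foo=bar`. The Pod name and container image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: labelled-workload      # hypothetical name
spec:
  nodeSelector:
    foo: bar                   # matches the node-label set through k3s_agent
  containers:
    - name: app
      image: nginx:stable      # placeholder image
```

If no node carries the label, the Pod stays in the Pending state, which makes a mislabelled agent easy to spot.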

Externalizing Configuration

To maintain clean playbooks, the role supports reading configuration from external YAML files using the lookup plugin:

```yaml
k3s_server: "{{ lookup('file', 'path/to/k3s_server.yml') | from_yaml }}"
k3s_agent: "{{ lookup('file', 'path/to/k3s_agent.yml') | from_yaml }}"
```
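A sketch of what one of those external files might contain, reusing the agent settings from the earlier example. Because the result of `from_yaml` is assigned directly to the variable, the file is assumed to hold only the dictionary body, without a top-level `k3s_agent:` key.

```yaml
# path/to/k3s_agent.yml
# Only the dictionary body; the k3s_agent key itself stays in the playbook.
with-node-id: true
node-label:
  - "foo=bar"
  - "hello=world"
```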

Execution Logic and Privilege Management

The Ansible role includes several flags to control the execution flow, specifically regarding validation and privilege escalation.

| Variable | Description | Default Value |
|---|---|---|
| k3s_skip_validation | Skips tasks that validate the configuration | false |
| k3s_skip_env_checks | Skips environment configuration checks | false |
| k3s_skip_post_checks | Skips verification of post-execution state | false |
| k3s_become | Enables privilege escalation for root tasks | false |
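As a hedged sketch, these flags could be set in a group variables file like the one below. The variable names are written with the underscores that Ansible variable names use, and the file path is hypothetical. Since `k3s_become` defaults to false, a deployment that connects as a non-root user would typically need it enabled.

```yaml
# group_vars/k3s_cluster.yml (hypothetical path)
---
k3s_become: true              # escalate privileges for tasks that need root
k3s_skip_validation: false    # keep configuration validation enabled
k3s_skip_env_checks: false    # keep environment checks enabled
k3s_skip_post_checks: false   # keep post-execution verification enabled
```

Leaving the three skip flags at their defaults keeps the role's safety checks active; they are shown here only to make the available switches visible.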

Python Interpreter Enforcement

Since version 3 of the role, Python 3 is required on both the controller and the target system. Because some legacy Linux systems may still have Python 2 installed as the default /usr/bin/python, the operator must explicitly set the ansible_python_interpreter.

Example Inventory Configuration:

```yaml
k3s_cluster:
  hosts:
    kube-0:
      ansible_user: ansible
      ansible_host: 10.10.9.2
      ansible_python_interpreter: /usr/bin/python3
    kube-1:
      ansible_user: ansible
      ansible_host: 10.10.9.3
      ansible_python_interpreter: /usr/bin/python3
    kube-2:
      ansible_user: ansible
      ansible_host: 10.10.9.4
      ansible_python_interpreter: /usr/bin/python3
```
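When every host shares the same user and interpreter, the same inventory can be written more compactly with group-level `vars`, which is standard Ansible inventory behaviour. This is an alternative layout, not something the role requires.

```yaml
k3s_cluster:
  vars:
    # Applied to every host in the group unless a host overrides it.
    ansible_user: ansible
    ansible_python_interpreter: /usr/bin/python3
  hosts:
    kube-0:
      ansible_host: 10.10.9.2
    kube-1:
      ansible_host: 10.10.9.3
    kube-2:
      ansible_host: 10.10.9.4
```

Setting the interpreter once at group level reduces the chance that a newly added node silently falls back to a Python 2 default.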

Declarative Home Lab Implementations

Integrating K3s into a broader home lab strategy often involves combining it with other declarative tools to ensure the entire stack is reproducible.

NixOS and Rocky Linux Integration

NixOS provides a highly declarative approach to infrastructure, while Rocky Linux offers a stable, enterprise-grade base. When a lab mixes the two, Ansible serves as the common denominator for provisioning. In these environments, the low resource requirements of K3s make it a practical choice for container orchestration on mini PCs.

A critical real-world implementation detail when deploying K3s on Rocky Linux involves the firewall. While the K3s installation is a simple shell script, the network requires specific firewall rules to classify traffic originating from the container network as "trusted." Using Ansible to automate these rules prevents the manual overhead that occurs when scaling to multiple nodes.
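A minimal sketch of that automation, assuming firewalld hosts and the `ansible.posix` collection installed on the controller. The CIDRs are the K3s defaults for pods and services and must be changed if `cluster-cidr` or `service-cidr` is overridden; the group name and the variable `k3s_trusted_cidrs` are hypothetical.

```yaml
---
- name: Trust K3s pod and service networks on firewalld hosts
  hosts: k3s_cluster            # assumed inventory group name
  become: true
  vars:
    k3s_trusted_cidrs:          # hypothetical variable name
      - 10.42.0.0/16            # K3s default pod CIDR
      - 10.43.0.0/16            # K3s default service CIDR
  tasks:
    - name: Add cluster CIDRs to the trusted zone
      ansible.posix.firewalld:
        zone: trusted
        source: "{{ item }}"
        permanent: true         # persist across reboots
        immediate: true         # also apply to the running firewall
        state: enabled
      loop: "{{ k3s_trusted_cidrs }}"

    - name: Allow the Kubernetes API server port
      ansible.posix.firewalld:
        port: 6443/tcp
        permanent: true
        immediate: true
        state: enabled
```

Because the tasks are idempotent, rerunning the play against a newly added node brings its firewall in line without touching nodes that are already configured.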

The Role of Helm in K3s Management

While K3s manages the orchestration of containers, Helm serves as the package manager for Kubernetes. As the number of services in a lab grows, managing raw Kubernetes manifests becomes unmanageable. Helm allows the operator to define services as charts, simplifying the deployment and upgrading of complex applications.
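One way to drive Helm from the same Ansible workflow is the `kubernetes.core.helm` module, sketched below. This assumes the `kubernetes.core` collection on the controller and the `helm` binary on the target host. The chart (ingress-nginx) is only an illustrative choice, and running against `kube-0` assumes that host is a server node, where K3s writes its kubeconfig to `/etc/rancher/k3s/k3s.yaml`.

```yaml
---
- name: Deploy a chart with Helm
  hosts: kube-0                 # assumed to be a K3s server node
  become: true                  # the K3s kubeconfig is root-readable by default
  tasks:
    - name: Install the ingress-nginx chart
      kubernetes.core.helm:
        name: ingress-nginx                 # release name
        chart_ref: ingress-nginx            # chart name within the repo
        chart_repo_url: https://kubernetes.github.io/ingress-nginx
        release_namespace: ingress-nginx
        create_namespace: true
        kubeconfig: /etc/rancher/k3s/k3s.yaml
```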

Observability and System Maintenance

A production-ready K3s cluster requires a monitoring stack to ensure health and performance visibility.

The Observability Stack

The implementation of a Prometheus and Grafana stack is the most direct path to achieving observability. This allows the operator to monitor the resource usage of the mini PCs and the health of the K3s pods.
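As one possible sketch, K3s ships a Helm controller that installs charts described by `HelmChart` resources, including manifests placed in `/var/lib/rancher/k3s/server/manifests/` on a server node. The example below uses the community `kube-prometheus-stack` chart, which bundles Prometheus and Grafana. The namespace name and the values shown are assumptions, and `createNamespace` may not be available in older K3s releases.

```yaml
# /var/lib/rancher/k3s/server/manifests/monitoring.yaml (hypothetical file name)
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kube-prometheus-stack
  namespace: kube-system
spec:
  repo: https://prometheus-community.github.io/helm-charts
  chart: kube-prometheus-stack
  targetNamespace: monitoring   # assumed namespace name
  createNamespace: true
  valuesContent: |-
    grafana:
      enabled: true             # chart value; Grafana is deployed alongside Prometheus
```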

Critical Update Considerations

When the system-upgrade-controller manages K3s updates, the way the role links the installed binaries matters. The controller cannot follow symbolic links, so the role must be configured to install K3s using hard links; otherwise the upgrade process fails.
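A hedged sketch of that setting follows. The variable name `k3s_install_hard_links` comes from the upstream role documentation rather than from this article, so verify it against the role version in use; the file path is hypothetical.

```yaml
# group_vars/k3s_cluster.yml (hypothetical path)
---
# Install K3s with hard links instead of symbolic links so that
# the system-upgrade-controller can replace the binary in place.
k3s_install_hard_links: true   # assumed variable name; check the role's README
```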

Conclusion

The deployment of K3s via Ansible represents a shift toward professionalized home lab and edge infrastructure. By leveraging a dedicated Ansible role, operators can achieve a level of precision in versioning (down to the git commit), environment validation, and node configuration that is difficult to match with manual installation. The transition from command-line arguments to configuration files and the integration of Helm for manifest management further solidify this architecture as a scalable solution. Whether deploying on the immutable layers of NixOS or the stable environment of Rocky Linux, the combination of K3s and Ansible keeps the infrastructure both lightweight and declarative, allowing for rapid recovery and consistent scaling as nodes are added.

