SUSE Rancher Kubernetes Distribution Architecture: K3s and RKE2

Container orchestration has shifted from one-size-fits-all cluster deployments to specialized distributions tailored to specific environmental constraints. Within the SUSE Rancher ecosystem, two distributions serve as the standard for production workloads: K3s and RKE2. Both are built on the Kubernetes API and designed to be managed through Rancher, but they follow different operational philosophies. K3s was born out of a need for speed and minimalism, stripping away the "bloat" of stock Kubernetes to enable deployments in resource-constrained environments. RKE2, also known as RKE Government, was engineered to bring that same ease of deployment to highly regulated environments, with a focus on compliance, hardening, and federal standards. Understanding where the two overlap and where they diverge matters for any DevOps engineer or system architect whose deployments range from a single IoT sensor in a remote wind farm to a regulated government database in a secure data center.

The Genesis and Philosophy of K3s

The creation of K3s was driven by a practical frustration experienced by Darren Shepherd at Rancher Labs. During the development of a project called Rio, Shepherd found that the time required to spin up a new Kubernetes cluster for every test cycle was a significant bottleneck to productivity. The core hypothesis was simple: reducing the time to bring clusters online would directly accelerate the coding process, which in turn would accelerate the shipping of new features.

Leveraging existing knowledge of the Rancher Kubernetes Engine (RKE) and the underlying Kubernetes source code, Shepherd began a process of aggressive pruning. He identified and removed extraneous content and components that were not strictly necessary for running workloads in a streamlined environment. The result was not a fork of Kubernetes—which would have implied a divergent set of features or APIs—but rather a lightweight distribution. K3s remains a certified Kubernetes distribution, meaning it does not change core Kubernetes functionalities and stays closely aligned with stock Kubernetes, ensuring that manifests and tools designed for standard K8s work seamlessly on K3s.

The primary objective of K3s is to provide a "batteries-included" experience. It is designed for users who want to focus entirely on their applications rather than the intricacies of the environment running those applications. This philosophy manifests in its binary size, which is kept under 60MB, and its minimal resource requirements, making it the definitive choice for edge computing and developer workstations.

RKE2: The Security-First Evolution

While K3s optimizes for size and speed, RKE2 (Rancher Kubernetes Engine 2) optimizes for security and compliance. RKE2 evolved from the original RKE project but adopted the usability ethos established by K3s, specifically the single-binary deployment model. However, the target audience for RKE2 is significantly different. It is tailored for security-critical workloads and regulated environments, such as those operated by government agencies.

The defining characteristic of RKE2 is its commitment to a hardened security baseline. This is most evident in its support for Federal Information Processing Standard (FIPS) 140-2, a U.S. government standard that sets security requirements for cryptographic modules and therefore requires validated cryptography. RKE2 is also designed to meet the DISA STIG (Defense Information Systems Agency Security Technical Implementation Guide) for Kubernetes, so that the cluster is configured according to the security requirements of the Department of Defense.

RKE2 is not merely for the edge; it is equally at home in traditional data centers. It provides advanced networking capabilities, including built-in support for the Multus CNI plugin, which allows pods to attach to multiple network interfaces—a requirement for many advanced telecommunications and high-performance computing workloads.

Architectural Comparison and Deployment Models

Both K3s and RKE2 employ a single-binary deployment model. This means that all necessary dependencies are bundled into a single download, drastically lowering the barrier to entry for users with minimal Kubernetes experience. This approach eliminates the "dependency hell" often associated with manually installing various Kubernetes components like kubelet, kube-proxy, and the API server.

For organizations operating in high-security environments, both distributions support air-gapped deployments. In an air-gapped scenario, machines are physically separated from the internet or external networks to prevent unauthorized access or data exfiltration. SUSE provides air-gapped images as release artifacts. The deployment workflow involves transferring these images to the target machine and then running the standard installation script, which bootstraps the cluster using the locally available images.
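The air-gapped workflow above can be sketched as a small shell helper. This is a minimal sketch for K3s, assuming the image archive has already been copied to the node; the function name is hypothetical, and the default directory and the INSTALL_K3S_SKIP_DOWNLOAD variable follow the K3s air-gap documentation rather than anything stated in this article.

```shell
# Sketch: stage a K3s air-gap image archive where the agent imports images from.
# stage_airgap_images is a hypothetical helper name, not part of K3s.

stage_airgap_images() (
    archive="${1:-}"                                      # path to the image archive
    images_dir="${2:-/var/lib/rancher/k3s/agent/images}"  # documented K3s import directory

    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; exit 1; }
    mkdir -p "$images_dir"
    cp "$archive" "$images_dir/"
    echo "staged $(basename "$archive") in $images_dir"
)

# Typical use on the disconnected node (run as root, not executed here).
# The k3s binary itself must already be present at /usr/local/bin/k3s.
#   stage_airgap_images ./k3s-airgap-images-amd64.tar.zst
#   INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh
```

The second argument exists only so the helper can be exercised against a scratch directory before it is pointed at a real node.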

The following table provides a detailed comparison of the technical specifications and target use cases for K3s and RKE2.

Feature           | K3s                          | RKE2
------------------|------------------------------|-----------------------------------
Binary Size       | Under 60MB                   | Larger (includes security layers)
Primary Goal      | Resource Efficiency / Speed  | Security / Compliance
Compliance        | Standard K8s Certified       | FIPS 140-2 / DISA STIG
Ideal Environment | IoT, Edge, Dev Workstations  | Government, Secure Data Centers
Networking        | Standard K8s / Lightweight   | Advanced (e.g., Multus support)
Complexity        | Very Low (Stripped bloat)    | Low (Single binary, but hardened)
Management        | Rancher / Fleet              | Rancher / Fleet

K3s Operational Capabilities and Edge Integration

The lightweight architecture of K3s does more than save disk space. With fewer components competing for CPU and memory, more of a small node's resources remain available to the workloads themselves, and clusters come online faster. This efficiency lets K3s run on hardware that could not realistically host a stock Kubernetes control plane.

K3s is specifically engineered for "hardened equipment" used in harsh environments. These locations often suffer from limited or intermittent connectivity, which would cause a standard Kubernetes control plane to fail or behave erratically. K3s handles these constraints gracefully, making it viable for deployment in:

  • Satellites and aerospace applications
  • Airplanes and aviation systems
  • Submarines and maritime environments
  • Automotive vehicles and transit systems
  • Wind farms and remote energy grids
  • Retail locations and point-of-sale systems
  • Smart city infrastructure and urban sensors

To facilitate the rapid bootstrapping of these remote clusters, K3s includes an integrated Helm controller. This controller allows administrators to install Helm packages via a HelmChart manifest. Furthermore, any Kubernetes manifest placed in the auto-deploy directory on a server node (/var/lib/rancher/k3s/server/manifests by default), either before or after installation, is applied to the cluster automatically. This mechanism lets a cluster come online with every required application already scheduled, with no external intervention once the binary is executed.
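As an illustration of the HelmChart mechanism, the sketch below writes a manifest into the auto-deploy directory. It is only an example under stated assumptions: the helper name, the chart name demo-app, the repository URL, and the values are placeholders, and the default path is the K3s server manifests directory documented by the project.

```shell
# Sketch: drop a HelmChart manifest where the K3s Helm controller will find it.
# Chart name, repo URL, and values below are placeholders for illustration.

write_helmchart_manifest() (
    manifests_dir="${1:-/var/lib/rancher/k3s/server/manifests}"
    mkdir -p "$manifests_dir"
    cat > "$manifests_dir/demo-app.yaml" <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: demo-app
  namespace: kube-system
spec:
  repo: https://charts.example.com
  chart: demo-app
  targetNamespace: demo
  createNamespace: true
  valuesContent: |-
    replicaCount: 1
EOF
    # Print the path so callers can inspect or log what was written.
    echo "$manifests_dir/demo-app.yaml"
)
```

Because the file can be written before K3s is installed, the same helper could be baked into a node image so the application is installed on first boot.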

RKE2 Installation and File System Structure

Deploying RKE2 provides the user with a set of specific binaries and scripts designed for lifecycle management. After a successful installation, three primary components are made available to the user:

  • rke2: This is the primary RKE2 binary used to manage the cluster.
  • rke2-killall.sh: A specialized script used to terminate all RKE2-related processes, including the underlying container runtime.
  • rke2-uninstall.sh: A destructive script that completely wipes RKE2 from the node, including all downloaded images.

Beyond these primary scripts, additional binaries are located in the /var/lib/rancher/rke2/bin directory. These include:

  • kubectl: The standard Kubernetes command-line tool, version-matched to the installed Kubernetes release.
  • crictl: The command-line client for the Container Runtime Interface (CRI), used to inspect pods, containers, and images through the runtime.
  • ctr: The low-level containerd CLI.

A known operational quirk exists regarding crictl and ctr. By default, these tools may not function correctly because the configuration does not point to the correct containerd socket. To resolve this, users must manually export three environment variables:

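```bash
# crictl reads CONTAINER_RUNTIME_ENDPOINT and expects a URL-style value, hence the unix:// scheme
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
# ctr reads CONTAINERD_ADDRESS (a plain socket path) and CONTAINERD_NAMESPACE
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
export CONTAINERD_NAMESPACE=k8s.io
```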

It is important to note that the socket resides in /run/k3s even when using RKE2, which is a specific architectural detail of the Rancher implementation.
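For interactive sessions, the same settings can be wrapped in a function that is sourced from a shell profile. This is a sketch only: the function name is hypothetical, the paths are the ones quoted in this article, and the unix:// prefix reflects the URL-style endpoint format that crictl documents.

```shell
# Sketch: point crictl and ctr at the containerd instance started by RKE2.
# rke2_runtime_env is a hypothetical helper name.

rke2_runtime_env() {
    sock="${1:-/run/k3s/containerd/containerd.sock}"
    export CONTAINER_RUNTIME_ENDPOINT="unix://$sock"   # read by crictl
    export CONTAINERD_ADDRESS="$sock"                  # read by ctr
    export CONTAINERD_NAMESPACE="k8s.io"               # namespace holding Kubernetes containers
    # Add the bundled binaries to PATH once, without duplicating the entry.
    case ":$PATH:" in
        *:/var/lib/rancher/rke2/bin:*) ;;
        *) PATH="$PATH:/var/lib/rancher/rke2/bin"; export PATH ;;
    esac
}

# Typical use on an RKE2 node (not executed here):
#   rke2_runtime_env && crictl ps
```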

For configuration management, RKE2 uses a fixed file system layout. Administrators typically write one /etc/rancher/rke2/config.yaml per node, with contents that depend on the node's role, or split the settings across several drop-in files in /etc/rancher/rke2/config.yaml.d/. Container registry mirrors and credentials are configured in /etc/rancher/rke2/registries.yaml. Static manifests, which are applied on startup, are stored in /var/lib/rancher/rke2/server/manifests.
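A minimal sketch of generating such a file is shown below. The helper name and the hostname and label values are placeholders chosen for illustration; the keys themselves (write-kubeconfig-mode, tls-san, node-label) are existing rke2 server options, which the config file accepts without the leading dashes.

```shell
# Sketch: write a small RKE2 server config file.
# Hostname and label values are placeholders; the helper name is hypothetical.

write_rke2_server_config() (
    config_dir="${1:-/etc/rancher/rke2}"
    mkdir -p "$config_dir"
    cat > "$config_dir/config.yaml" <<'EOF'
# Keys mirror rke2 server CLI flags without the leading dashes.
write-kubeconfig-mode: "0640"
tls-san:
  - rke2.example.internal
node-label:
  - "environment=demo"
EOF
    # Print the path of the file that was written.
    echo "$config_dir/config.yaml"
)
```

Writing the file before the service first starts means the node comes up with these settings applied, which fits the article's emphasis on unattended bootstrapping.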

Data Persistence, Backups, and Cluster Recovery

One of the most significant challenges in Kubernetes operations is the reliable backup and restoration of the state store. K3s and RKE2 handle this the same way: when the cluster runs on embedded etcd, both distributions automatically write periodic snapshots of the cluster state and keep a number of them that the administrator can configure.

In the event of a catastrophic failure or the need to revert to a previous state, a cluster can be restored with the --cluster-reset flag, run on a server node while the K3s or RKE2 service is stopped. The command differs only in the binary name and the data directory:

For K3s restoration:
```bash
k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/etcd-old-<BACKUP_DATE>
```

For RKE2 restoration:
```bash
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/etcd-old-<BACKUP_DATE>
```

This snapshot mechanism means the embedded etcd datastore can be recovered without reinstalling the operating system or the Kubernetes binary, which reduces Mean Time to Recovery (MTTR) in production environments. K3s clusters that run on a different backing datastore, such as SQLite or an external database, are not covered by etcd snapshots and must be backed up with that datastore's own tools.
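When several snapshots are available, choosing the restore source can be scripted. The sketch below picks the most recently modified entry in a directory and prints, but does not run, the corresponding RKE2 restore command. Both function names are hypothetical, and the default directory is the snapshot location given in the RKE2 documentation, which is an assumption not taken from this article.

```shell
# Sketch: pick the newest snapshot in a directory and print a restore command.
# Nothing here touches a running cluster; the command is only echoed.

latest_snapshot() (
    snapshot_dir="${1:-/var/lib/rancher/rke2/server/db/snapshots}"
    # ls -t sorts by modification time, newest first; head keeps the first entry.
    newest="$(ls -t "$snapshot_dir" 2>/dev/null | head -n 1)"
    [ -n "$newest" ] || { echo "no snapshots in $snapshot_dir" >&2; exit 1; }
    echo "$snapshot_dir/$newest"
)

print_restore_command() (
    snapshot="$(latest_snapshot "${1:-}")" || exit 1
    echo "rke2 server --cluster-reset --cluster-reset-restore-path=$snapshot"
)
```

Printing the command instead of executing it leaves the destructive step to an operator, which seems the safer default for a restore.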

High Availability and Fleet Management

Both K3s and RKE2 are built for production use and support High Availability (HA) for the control plane. While K3s has a reputation as the ideal choice for single-node clusters due to its tiny footprint, it possesses robust multi-node management capabilities. This allows K3s to scale from a single Raspberry Pi to fleets of thousands of IoT devices.
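To make the HA control plane concrete, the following sketch prints the K3s server commands for an embedded-etcd cluster: the first server is started with --cluster-init and later servers join it with --server. The helper name, token, and URL are placeholders; the flags are existing k3s server options. An odd number of servers, typically three, is assumed so that etcd can keep quorum.

```shell
# Sketch: print the k3s server command for the first or a joining HA server.
# k3s_ha_server_command is a hypothetical helper; it only echoes the command.

k3s_ha_server_command() (
    role="${1:-}"          # "init" for the first server, "join" for the others
    token="${2:-}"         # shared cluster secret
    first_server="${3:-}"  # URL of the first server, required when joining

    [ -n "$token" ] || { echo "a token is required" >&2; exit 1; }
    case "$role" in
        init)
            echo "k3s server --cluster-init --token $token" ;;
        join)
            [ -n "$first_server" ] || { echo "join needs the first server URL" >&2; exit 1; }
            echo "k3s server --server $first_server --token $token" ;;
        *)
            echo "unknown role: $role" >&2; exit 1 ;;
    esac
)

# Example (placeholders):
#   k3s_ha_server_command init  MY_TOKEN
#   k3s_ha_server_command join  MY_TOKEN https://10.0.0.10:6443
```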

The integration with the broader Rancher ecosystem elevates both distributions from simple binaries to enterprise-grade platforms. By using Rancher, administrators can manage both K3s and RKE2 clusters through a single pane of glass. A critical component of this is Fleet, a GitOps-driven deployment tool. Fleet allows operators to deploy applications at scale across thousands of clusters using a "pull" model, where the clusters check into a central repository for their desired state. This combination of a lightweight or hardened distribution and a centralized management layer via Rancher and Fleet provides a complete lifecycle management solution for containerized workloads.
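In Fleet, the desired state is declared through a GitRepo resource that names a repository and the clusters it targets. The sketch below writes such a resource to a file; the helper name, repository URL, path, and the env: edge label are placeholders, while the resource kind and field names follow the Fleet GitRepo schema.

```shell
# Sketch: write a Fleet GitRepo resource that targets clusters labeled env=edge.
# Repository URL, path, and labels are placeholders for illustration.

write_fleet_gitrepo() (
    out_dir="${1:-.}"
    mkdir -p "$out_dir"
    cat > "$out_dir/gitrepo.yaml" <<'EOF'
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: edge-apps
  namespace: fleet-default
spec:
  repo: https://git.example.com/ops/edge-apps.git
  branch: main
  paths:
    - manifests
  targets:
    - name: edge
      clusterSelector:
        matchLabels:
          env: edge
EOF
    # Print the path of the generated file.
    echo "$out_dir/gitrepo.yaml"
)

# The generated file would then be applied on the Rancher management cluster:
#   kubectl apply -f gitrepo.yaml
```

Selecting clusters by label rather than by name is what lets one resource cover a fleet that grows over time.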

Decision Matrix: Selecting the Right Distribution

The choice between K3s and RKE2 depends entirely on the environmental constraints and the sensitivity of the workloads being deployed.

K3s should be the primary choice when:
- The target hardware is resource-constrained (low RAM, low CPU).
- The deployment target is at the "edge" (IoT, remote sensors, vehicles).
- Rapid iteration and developer productivity are the priority.
- The environment does not require strict federal or government security certifications.
- The goal is to focus on application development rather than infrastructure hardening.

RKE2 should be the primary choice when:
- The workload is security-critical or handles highly sensitive data.
- FIPS 140-2 compliance is a mandatory legal or contractual requirement.
- The deployment is destined for a government agency or a highly regulated industry (e.g., defense, banking).
- The infrastructure requires advanced networking via Multus.
- The deployment is located in a traditional data center but requires a hardened security baseline.

It is important to reiterate that K3s is not "less production-ready" than RKE2; rather, it is production-ready for different contexts. K3s is a "batteries-included" project for agility, whereas RKE2 is a "fortified" project for security.
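The criteria above can be condensed into a few lines of shell. This is a deliberately simplified sketch with a hypothetical function name and yes/no inputs; it encodes only three of the listed factors and is no substitute for a proper requirements review.

```shell
# Sketch: condense the selection criteria into a single recommendation.
# Inputs are "yes" or "no"; anything other than "yes" is treated as "no".

recommend_distribution() (
    needs_compliance="${1:-no}"   # FIPS 140-2 or similar certification required
    needs_multus="${2:-no}"       # pods need multiple network interfaces
    constrained_hw="${3:-no}"     # low RAM or CPU on the target nodes

    if [ "$needs_compliance" = yes ] || [ "$needs_multus" = yes ]; then
        echo rke2
        # Compliance wins, but flag the tension with small hardware.
        if [ "$constrained_hw" = yes ]; then
            echo "note: verify that the nodes have headroom for RKE2" >&2
        fi
    else
        echo k3s
    fi
)

# Example:
#   recommend_distribution no no yes    # edge device, no compliance mandate
```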

Conclusion

The divergence between K3s and RKE2 reflects how Kubernetes deployments have diversified. By splitting the "lightweight" and "hardened" paths, SUSE Rancher offers a specialized tool for each end of the deployment spectrum. K3s solves the problem of Kubernetes bloat, enabling the orchestration of containers in places that were previously out of reach, from submarines to satellites. Its integrated Helm controller and auto-deploy manifest directory create a bootstrapping experience that is essential for unmanned edge locations.

RKE2, on the other hand, solves the problem of trust and compliance. By incorporating FIPS 140-2 and DISA STIG standards into a single-binary deployment model, it removes the friction of securing a cluster manually, which is often an error-prone and tedious process. The ability to maintain a healthy security baseline out-of-the-box makes it the gold standard for government and defense sectors.

Ultimately, both distributions share a common lineage of simplicity and efficiency. Whether an organization is deploying a fleet of smart city sensors using K3s or a secure federal database using RKE2, the underlying synergy with the Rancher platform ensures that the operational overhead remains low. The ability to perform cluster resets from snapshots, deploy via air-gapped images, and manage everything through GitOps via Fleet ensures that regardless of the distribution chosen, the resulting infrastructure is resilient, scalable, and maintainable.
