K3s IPVS Integration and Network Fabric Optimization

Enabling IPVS (IP Virtual Server) in a K3s environment changes how Service traffic is load-balanced across the cluster. The default kube-proxy mode relies on iptables, whereas IPVS is designed to scale and perform better, particularly in clusters with a very large number of services. In iptables mode, a packet may be checked against a list of rules that grows linearly with the number of services, so latency can rise as the service count grows. IPVS instead looks services up in hash tables, so lookup time stays roughly constant regardless of how many services the cluster defines. The transition to IPVS in K3s still has critical pitfalls, particularly around version compatibility, kernel requirements, and the interplay between dual-stack networking and node-level IP prioritization.

IPVS Implementation and Version-Specific Critical Failures

The adoption of IPVS in K3s requires a precise alignment of software versions and system dependencies. A catastrophic failure has been documented in K3s version v1.26.3+k3s1 (built with go1.19.7) when running on Debian 11 (kernel 5.10.0-21-amd64) in specific cloud environments like Hetzner Cloud VPS.

When IPVS is enabled in this specific version, the cluster's traffic flow can be completely severed. This occurs as soon as the firewall rules and ipvsadm policies are enacted. The impact is comprehensive: internal cluster communication is broken, and external access to critical management ports is lost. Specifically, port 22 (SSH) and port 6443 (the Kubernetes API server) become unreachable. This renders the server inaccessible via remote shell and prevents any external orchestration tools from communicating with the cluster.

Beyond connectivity loss, the operational state of the cluster degrades to a point where pods cannot be scheduled or started. For instance, attempting to install a CNI (Container Network Interface) such as Calico during this failure state results in a total communication breakdown; the Calico pods are unable to reach the API server, leading to a largely failed cluster state.

To resolve this failure, the following paths are available:

  • Upgrade the cluster to version 1.27 or newer, where these regressions are addressed.
  • Apply a specific kubelet argument to prioritize traffic families to avoid the bug in version 1.26.

For those who must remain on the affected version, one of the following flags must be added to both K3s servers and agents:

To prioritize IPv4 traffic:
--kubelet-arg="node-ip=0.0.0.0"

To prioritize IPv6 traffic:
--kubelet-arg="node-ip=::"
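The same kubelet argument can also be set through the K3s config file instead of on the command line. The following is a minimal sketch, assuming the default config path /etc/rancher/k3s/config.yaml; the helper name write_node_ip_config is hypothetical, and it overwrites the target file rather than merging into it.

```shell
# Hypothetical helper: write a K3s config file that sets the kubelet node-ip
# argument. $1 is the target file, $2 is "ipv4" or "ipv6".
# Note: this overwrites the target file; merge by hand if one already exists.
write_node_ip_config() {
    target=$1
    case "$2" in
        ipv4) node_ip="0.0.0.0" ;;
        ipv6) node_ip="::" ;;
        *) echo "usage: write_node_ip_config FILE ipv4|ipv6" >&2; return 1 ;;
    esac
    mkdir -p "$(dirname "$target")" || return 1
    # K3s config keys mirror the CLI flag names; repeatable flags become lists.
    printf 'kubelet-arg:\n  - "node-ip=%s"\n' "$node_ip" > "$target"
}
```

For example, write_node_ip_config /etc/rancher/k3s/config.yaml ipv4 produces a two-line file equivalent to the IPv4 flag above. The K3s service has to be restarted on each node for the change to take effect.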

Prerequisites and Initial Installation Sequence

Before enabling IPVS mode in K3s, the underlying host operating system must be equipped with the necessary tooling to manage IPVS rules. Without these utilities, the kube-proxy cannot effectively manipulate the kernel's load-balancing tables.

On Debian-based systems, the required packages are ipset and ipvsadm. These tools allow the system to create sets of IP addresses and manage the virtual server entries that IPVS uses to redirect traffic.
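A quick pre-flight check can confirm that both tools are present before K3s is installed. This is a minimal sketch; the function name missing_tools is hypothetical, and it only checks that the binaries are on PATH, not that the kernel modules are available.

```shell
# Print the names of required tools that are not on PATH, one per line.
# With no arguments, the list defaults to the two packages named above.
missing_tools() {
    [ "$#" -gt 0 ] || set -- ipset ipvsadm
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

If missing_tools prints anything on a Debian-based host, running apt-get install -y ipset ipvsadm as root should install the missing packages.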

The installation of K3s with IPVS enabled is performed during the initial server bootstrap process using a specific kube-proxy argument. The command sequence is as follows:

curl -sfL https://get.k3s.io | sh -s - server --kube-proxy-arg proxy-mode=ipvs

This command instructs the K3s installer to set the proxy-mode of kube-proxy to ipvs instead of the default iptables. This changes how the cluster handles Service-to-Pod traffic: load balancing moves from chains of iptables rules to the IPVS kernel module.
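One way to check that IPVS mode is actually in effect after installation is to look for virtual services in the kernel's IPVS table. The sketch below counts them from ipvsadm -Ln output; the function name count_virtual_services is hypothetical, and it assumes the usual ipvsadm listing format in which each virtual service line begins with its protocol.

```shell
# Count IPVS virtual services in `ipvsadm -Ln` output read from stdin.
# Virtual-service lines start with the protocol (TCP, UDP or SCTP);
# header lines and real-server lines (starting with "->") are skipped.
count_virtual_services() {
    awk '$1 == "TCP" || $1 == "UDP" || $1 == "SCTP" { n++ } END { print n + 0 }'
}
```

Running ipvsadm -Ln | count_virtual_services as root should print a non-zero number once kube-proxy has programmed the cluster's Services. A result of 0 suggests kube-proxy is not running in IPVS mode.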

Advanced Network Fabric and Flannel Configuration

K3s utilizes Flannel as its default lightweight L3 network fabric, which implements the CNI standard. Flannel provides the necessary encapsulation to allow pods on different nodes to communicate.

Flannel Backend Options

The choice of Flannel backend significantly impacts the security and performance of the cluster.

  • vxlan: This is the default backend. It encapsulates Layer 2 Ethernet frames within UDP packets. It is widely compatible but provides no native encryption.
  • wireguard-native: This backend enables native encryption for the cluster network. Implementing this requires specific kernel modules to be present on every node (both servers and agents) before K3s is started. Users must follow a dedicated WireGuard installation guide to ensure the environment is prepared.
  • none: This option is used when the user intends to replace Flannel with a custom CNI provider.
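For the wireguard-native backend, each node can be checked for the kernel module before K3s is started. This is a minimal sketch: the function name module_loaded is hypothetical, and it reads /proc/modules, which does not list modules compiled directly into the kernel.

```shell
# Return 0 if the named kernel module appears in a /proc/modules-style file.
# $1 is the module name; $2 is the file to read (defaults to /proc/modules).
# Caveat: modules built into the kernel image are not listed there.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

A typical use would be module_loaded wireguard || modprobe wireguard, run as root on every server and agent. The backend itself is then selected on the servers with the --flannel-backend flag.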

Flannel Configuration Parameters

Flannel options can only be set on server nodes and must be identical on all servers to prevent routing inconsistencies.

CLI Flag                 Description
--flannel-ipv6-masq      Applies masquerading rules to IPv6 traffic. This is essential for dual-stack or IPv6-only clusters to ensure pods can communicate with the external internet.
--flannel-external-ip    Forces Flannel to use the node's external IP address for traffic destination rather than internal IPs, provided --node-external-ip is also configured.

Dual-Stack and Single-Stack IPv6 Architectures

Modern K3s deployments often require the ability to handle both IPv4 and IPv6 traffic simultaneously (dual-stack) or to operate exclusively on IPv6.

Dual-Stack Configuration Requirements

Dual-stack networking must be established at the moment of cluster creation. It is impossible to convert an existing IPv4-only cluster to a dual-stack cluster. This requires the definition of both a cluster-cidr and a service-cidr for both IP families.

The recommended prefix lengths are /16 (IPv4) and /56 (IPv6) for cluster-cidr, and /16 (IPv4) and /112 (IPv6) for service-cidr. Example configuration:

--cluster-cidr=10.42.0.0/16,2001:db8:42::/56 --service-cidr=10.43.0.0/16,2001:db8:43::/112

If the cluster-cidr mask is altered, the user must also adjust the following values to match the total node count and planned pods per node:

  • node-cidr-mask-size-ipv4
  • node-cidr-mask-size-ipv6

The largest supported service-cidr ranges are /12 for IPv4 and /112 for IPv6; shorter prefixes than these are not supported.
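The relationship between the cluster-cidr mask and the node mask sizes is simple arithmetic: each node receives one subnet of the node mask size, so the cluster CIDR holds 2^(node mask - cluster mask) nodes. The sketch below computes that figure; the function name node_subnet_count is hypothetical, and the values 24 and 64 used in the example are the upstream Kubernetes defaults for node-cidr-mask-size-ipv4 and node-cidr-mask-size-ipv6, assumed to apply here.

```shell
# Number of per-node pod subnets that fit in a cluster CIDR:
# 2^(node_mask - cluster_mask).
# $1 = cluster-cidr prefix length, $2 = node-cidr-mask-size (same IP family).
node_subnet_count() {
    if [ "$2" -lt "$1" ]; then
        echo "node mask must not be shorter than the cluster mask" >&2
        return 1
    fi
    echo $(( 1 << ($2 - $1) ))
}
```

With the recommended masks, node_subnet_count 16 24 and node_subnet_count 56 64 both give 256 nodes. Narrowing the cluster-cidr to /20 while leaving the IPv4 node mask at 24 would leave room for only 16 nodes, which is why the two settings need to be adjusted together.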

IPv6 Routing and Masquerading

In environments where IPv6 addresses are not publicly routed (such as the Unique Local Address or ULA range), the --flannel-ipv6-masq option is mandatory. Without this, pods using their pod IPv6 address for outgoing traffic will not receive responses because the external network will not know how to route the return packets back to the internal pod IP.

Conversely, if publicly routed IPv6 addresses are utilized, the external routing infrastructure must be configured to route those addresses toward the cluster.
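Whether masquerading is needed comes down to whether the pod range is a ULA prefix (fc00::/7). The check below is a rough sketch; the function name is_ula is hypothetical, and it assumes the first address group is written with all four hex digits, as ULA prefixes normally are.

```shell
# Return 0 if the IPv6 address or prefix ($1) falls in the ULA range fc00::/7.
# Assumes the first group is written in full (fcxx: or fdxx:).
is_ula() {
    case "$1" in
        [fF][cCdD][0-9a-fA-F][0-9a-fA-F]:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If is_ula succeeds for the IPv6 part of cluster-cidr, the servers should be started with --flannel-ipv6-masq.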

Single-Stack IPv6 Clusters

Supported as of version v1.22.9+k3s1, single-stack IPv6 clusters remove IPv4 entirely. The configuration requires only the IPv6 CIDRs:

--cluster-cidr=2001:db8:42::/56 --service-cidr=2001:db8:43::/112

In these environments, if the default route is provided via Router Advertisement (RA), a specific sysctl setting is required to prevent the node from dropping the route upon expiration:

sysctl net.ipv6.conf.all.accept_ra=2

Users must be aware that accepting RAs increases the potential risk of man-in-the-middle attacks. Furthermore, when IPv6 is the primary family, the node-ip of all members must be explicitly set, ensuring the desired IPv6 address is listed first, as the kubelet defaults to IPv4.
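The ordering requirement for node-ip is easy to get wrong when the value is assembled by a provisioning script. A small guard such as the following can catch it before K3s starts; the function name ipv6_listed_first is hypothetical, and the addresses in the usage note are documentation-range placeholders.

```shell
# Return 0 if the first entry of a comma-separated node-ip list is IPv6.
# An entry is treated as IPv6 when it contains a colon.
ipv6_listed_first() {
    first=${1%%,*}
    case "$first" in
        *:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, ipv6_listed_first "2001:db8::10,192.0.2.10" succeeds, while the same two addresses in the opposite order fail the check.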

Custom CNI Integration and Decommissioning

While Flannel is the default, many enterprise environments utilize Calico or Cilium for advanced network policy enforcement.

Calico Configuration

When integrating Calico, it is often necessary to ensure that IP forwarding is enabled within the container settings. This is done by modifying the Calico YAML to include:

"container_settings": {
    "allow_ip_forwarding": true
}

After applying the YAML, verification is performed on the host using:

cat /etc/cni/net.d/10-calico.conflist
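Rather than reading the file by eye, the setting can be checked with a pattern match. This is a minimal sketch with a hypothetical function name, ip_forwarding_allowed; it is a plain text match rather than a JSON parse, so it serves only as a quick sanity check.

```shell
# Return 0 if the CNI config file has allow_ip_forwarding set to true.
# $1 is the file to check (defaults to the Calico path shown above).
# This matches text only; it does not validate the JSON structure.
ip_forwarding_allowed() {
    grep -Eq '"allow_ip_forwarding"[[:space:]]*:[[:space:]]*true' \
        "${1:-/etc/cni/net.d/10-calico.conflist}"
}
```

Called with no arguments on a node, ip_forwarding_allowed && echo ok reports whether the setting reached the rendered CNI configuration.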

Cilium Decommissioning Procedures

Removing K3s when Cilium is used requires manual intervention to prevent host network lockout. Before running k3s-killall.sh or k3s-uninstall.sh, the following interfaces must be deleted:

ip link delete cilium_host
ip link delete cilium_net
ip link delete cilium_vxlan

Additionally, Cilium-specific iptables rules must be purged from the system:

iptables-save | grep -iv cilium | iptables-restore
ip6tables-save | grep -iv cilium | ip6tables-restore
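Because the grep -iv cilium filter drops every line that mentions "cilium" anywhere, including in comments on unrelated rules, it can be worth reviewing what would be removed before restoring. The sketch below prints exactly those lines; the function name cilium_rules_to_drop is hypothetical.

```shell
# Print the lines from an iptables-save dump (stdin) that the
# `grep -iv cilium` filter above would remove, for review before restoring.
cilium_rules_to_drop() {
    # `|| true` keeps the exit status 0 when nothing matches.
    grep -i cilium || true
}
```

Running iptables-save | cilium_rules_to_drop (and the ip6tables-save equivalent) as root lists the affected rules without changing anything.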

Infrastructure Control-Plane Communication

K3s manages communication between the control-plane (apiserver) and the agent (kubelet and containerd) via websocket tunnels. This mechanism is an integrated alternative to the external Konnectivity service used in standard Kubernetes.

These tunnels allow agents to operate without exposing the kubelet or container runtime streaming ports to incoming connections, which improves the security posture of the cluster. The behavior is controlled through the apiserver's egress selector, whose default mode is agent.

Node Identity and Hostname Resolution

In certain cloud environments, such as Linode, instances may be provisioned with "localhost" as the hostname or may lack a hostname entirely. This leads to failures in domain name resolution within the Kubernetes cluster.

To circumvent this, K3s provides two methods to explicitly define the node's identity:

  • Use the --node-name CLI flag during startup.
  • Set the K3S_NODE_NAME environment variable.

Either method ensures that the node is correctly identified within the cluster's internal registry, preventing resolution errors.
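A provisioning script can apply this only when the hostname is actually unusable. The following is a minimal sketch; the function name pick_node_name and the fallback value node-01 are hypothetical.

```shell
# Choose a node name: keep the hostname unless it is empty or "localhost",
# in which case fall back to the supplied name.
# $1 = current hostname, $2 = fallback name (hypothetical naming scheme).
pick_node_name() {
    case "$1" in
        ""|localhost|localhost.*) echo "$2" ;;
        *) echo "$1" ;;
    esac
}
```

For example, export K3S_NODE_NAME=$(pick_node_name "$(hostname)" node-01) before starting K3s sets the variable to node-01 only on hosts whose hostname is missing or localhost.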

Conclusion: The Strategic Trade-off of IPVS in K3s

The transition from iptables to IPVS in a K3s environment is fundamentally a trade-off between simplicity and scalability. For small-scale clusters or those with a limited number of services, the default iptables mode is sufficient and generally more stable across various Linux distributions. However, as the cluster expands, the linear search complexity of iptables becomes a performance bottleneck. IPVS solves this by implementing O(1) lookup time via hash tables, which is critical for high-throughput production environments.

The operational risk associated with IPVS is most evident in version-specific regressions, such as the critical failure seen in v1.26 on Debian 11. The lesson for K3s administration is that network-level changes, especially those involving the kernel's packet-handling logic, must be validated against the specific OS kernel and the cloud provider's network implementation. The fact that enabling IPVS can lock an administrator out of SSH (port 22) and the API server (port 6443) underscores the need for a precise installation sequence and, where required, kubelet arguments that stabilize IP family prioritization.

Ultimately, the success of an IPVS deployment in K3s depends on the rigorous application of prerequisites: the installation of ipvsadm and ipset, the correct selection of a CNI backend (Flannel, Calico, or Cilium), and the precise definition of CIDR blocks for dual-stack operations. When configured correctly, IPVS transforms the K3s network fabric from a basic routing system into a high-performance load-balancing infrastructure capable of sustaining massive service growth without sacrificing latency.

