K3S GlusterFS Distributed Storage Architecture

The intersection of lightweight orchestration and distributed storage represents a critical evolution in edge computing and small-scale cloud deployments. K3S, a highly optimized and production-grade Kubernetes distribution, is specifically engineered to run in resource-constrained environments, such as the Hetzner CX23 virtual machines. However, containers are ephemeral by nature; when a pod crashes or is rescheduled to another node, any data it wrote to its own filesystem or to node-local storage does not follow it. This is where GlusterFS enters the architectural stack. GlusterFS is a scalable, open-source network filesystem that aggregates storage from multiple servers into a single, unified volume. By integrating GlusterFS with K3S, administrators can create a resilient data layer where persistent volumes are replicated across multiple nodes. Wherever a pod is scheduled within the cluster, it keeps consistent access to its data, which greatly reduces the risk of data loss during node failures or pod migrations.

Infrastructure Prerequisites and Network Topology

Before deploying a distributed storage system like GlusterFS on K3S, a stable and secure networking foundation is mandatory. The architecture typically involves a combination of public-facing endpoints for external traffic and private networking for internal cluster communication.

In a professional deployment, such as one hosted on Hetzner Cloud, the use of a private network is non-negotiable. This private network allows K3S master and agent nodes to communicate using internal IP addresses, reducing latency and increasing security by keeping cluster-internal traffic off the public internet.

The following table outlines a standard IP assignment for a three-node cluster integrated with a cloud load balancer:

| Component           | Role                            | Private IP Address |
|---------------------|---------------------------------|--------------------|
| K3S Master          | Control Plane & Storage Manager | 10.0.0.2           |
| K3S Node 1          | Worker Node & Storage Brick     | 10.0.0.3           |
| K3S Node 2          | Worker Node & Storage Brick     | 10.0.0.4           |
| Cloud Load Balancer | SSL Offloading & Traffic Entry  | 10.0.0.254         |

To manage these resources programmatically, the hcloud CLI is used. This tool automates the creation of VMs and network configuration, keeping the environment reproducible and consistent.
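As a rough sketch of what such automation might look like, the script below prints (rather than runs) hcloud commands for the private network and the three servers listed in the table. The resource names (k3s-net, k3s-master, k3s-node-1, k3s-node-2), the eu-central network zone, the ubuntu-24.04 image and the my-key SSH key name are assumptions chosen for illustration, not values from the original setup.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the hcloud commands instead of executing them.
# All names, the image, the network zone and the SSH key are assumptions.

NETWORK="k3s-net"        # hypothetical network name
SERVER_TYPE="cx23"       # server type mentioned in the introduction
IMAGE="ubuntu-24.04"     # assumed OS image
SSH_KEY="my-key"         # hypothetical SSH key already uploaded to the project

plan_cluster() {
  echo "hcloud network create --name ${NETWORK} --ip-range 10.0.0.0/16"
  echo "hcloud network add-subnet ${NETWORK} --type cloud --network-zone eu-central --ip-range 10.0.0.0/24"
  local entry name ip
  for entry in "k3s-master:10.0.0.2" "k3s-node-1:10.0.0.3" "k3s-node-2:10.0.0.4"; do
    name="${entry%%:*}"   # text before the colon
    ip="${entry##*:}"     # text after the colon
    echo "hcloud server create --name ${name} --type ${SERVER_TYPE} --image ${IMAGE} --ssh-key ${SSH_KEY}"
    echo "hcloud server attach-to-network ${name} --network ${NETWORK} --ip ${ip}"
  done
}

plan_cluster
```

Printing the commands first makes it easy to review them before piping the output to a shell.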

K3S Cluster Deployment and Node Integration

The installation of K3S begins with the master node, which serves as the orchestration brain of the cluster. Once the server is operational, agent nodes must be joined to the cluster using a secure secret token to prevent unauthorized nodes from joining the network.
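The server installation itself might look like the following sketch, which only builds and prints the install command. It assumes the master's private address is 10.0.0.2 on interface ens10, and that the embedded cloud controller is switched off with --disable-cloud-controller because an external one is used; adjust both to the actual environment.

```bash
#!/usr/bin/env bash
# Sketch: build the K3S server install command for the master node.
# MASTER_IP and PRIVATE_IFACE are assumptions taken from the IP table.

MASTER_IP="10.0.0.2"
PRIVATE_IFACE="ens10"

server_install_cmd() {
  printf '%s' "curl -sfL https://get.k3s.io | sh -s - server"
  # printf reuses the format for every remaining argument
  printf ' %s' \
    "--node-ip=${MASTER_IP}" \
    "--flannel-iface=${PRIVATE_IFACE}" \
    "--disable-cloud-controller" \
    "--kubelet-arg=cloud-provider=external"
  printf '\n'
}

# Print the command for review before running it on the master.
server_install_cmd

# After installation, the join token for agents can be read on the master:
#   cat /var/lib/rancher/k3s/server/node-token
```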

To enhance the cluster's capability to manage its own lifecycle, the system-upgrade-controller can be installed. This is achieved by executing the following command on the master node:

```bash
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
```

When joining agent nodes to the master, specific flags are required to ensure the networking and cloud provider settings align with the infrastructure. The following command is used on each agent node:

```bash
curl -sfL https://get.k3s.io | K3S_URL=https://10.0.0.2:6443 K3S_TOKEN=<your_secret_value> sh -s - agent \
  --node-name="$(hostname -f)" \
  --kubelet-arg="cloud-provider=external" \
  --flannel-iface=ens10
```

The specific arguments used in this installation process serve critical functions:

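  • --kubelet-arg="cloud-provider=external": This tells the kubelet that an external cloud controller manager will initialize the node. This is necessary because a dedicated Cloud Controller Manager (such as Hetzner's) is used to handle the specific nuances of the underlying cloud provider. The cloud controller embedded in K3S is switched off separately on the server, using the --disable-cloud-controller flag.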
  • --flannel-iface=ens10: This forces Flannel, the default CNI (Container Network Interface), to use the ens10 interface. In this architecture, ens10 is the interface connected to the private network, ensuring that pod-to-pod communication remains internal.

To verify that all nodes have successfully joined the cluster and are in a "Ready" state, the administrator runs:

```bash
kubectl get nodes
```
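For scripted checks, the readiness test can be reduced to counting nodes whose STATUS column is not exactly "Ready". The sketch below is one way to do that; the node names and version strings in the sample are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: count nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.

count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Example with captured sample output (names and versions are illustrative):
sample='k3s-master   Ready      control-plane,master   10m   v1.30.4+k3s1
k3s-node-1   Ready      <none>                 8m    v1.30.4+k3s1
k3s-node-2   NotReady   <none>                 1m    v1.30.4+k3s1'

printf '%s\n' "$sample" | count_not_ready

# Against a live cluster:
#   kubectl get nodes --no-headers | count_not_ready
```

Because the comparison is exact, a cordoned node reported as "Ready,SchedulingDisabled" is also counted, which is usually the safer behavior before deploying storage.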

GlusterFS Installation and Volume Configuration

GlusterFS transforms ordinary storage on multiple VMs into a self-healing cluster. It uses a "brick" model, where a brick is a directory on a local disk that GlusterFS manages as part of a larger volume.

The first step in preparing the nodes for GlusterFS is creating the directory structure that will hold the actual data (the brick) and the mount point where the volume will be accessed.

```bash
mkdir -p /data/glusterfs/k8s/brick1
mkdir -p /mnt/gluster-k8s
```

Once the directories are prepared, the master node must be made aware of the other nodes in the storage pool through a process called peer probing.

```bash
gluster peer probe 10.0.0.3
gluster peer probe 10.0.0.4
```

After probing, the administrator must verify the health of the peer connections, because volume creation fails if a node that is meant to hold a brick is not connected to the pool.

```bash
gluster peer status
```
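The same check can be automated by scanning the State lines of the output. The following sketch assumes the usual "State: Peer in Cluster (Connected)" wording; the UUIDs in the sample are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: count peers that are not in the "Peer in Cluster (Connected)" state.
# Reads `gluster peer status` output on stdin.

count_unhealthy_peers() {
  awk '/^State:/ && !/Peer in Cluster \(Connected\)/ { n++ } END { print n + 0 }'
}

# Sample output with one disconnected peer (UUIDs are placeholders):
sample='Number of Peers: 2

Hostname: 10.0.0.3
Uuid: 00000000-0000-0000-0000-000000000001
State: Peer in Cluster (Connected)

Hostname: 10.0.0.4
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Disconnected)'

printf '%s\n' "$sample" | count_unhealthy_peers

# On a storage node:
#   gluster peer status | count_unhealthy_peers
```

A non-zero result is a signal to fix connectivity before creating or mounting volumes.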

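With the peers established, a replicated volume is created. Replication mirrors every file across the bricks, providing high availability. For a three-node cluster, a replica count of 3 keeps a full copy of the data on every node, so the data itself survives the loss of up to two nodes. Note, however, that when client quorum is enabled (typically the default for replica 3 volumes), the volume only stays writable while at least two of the three bricks are reachable, so in practice the cluster rides through one node failure without interruption.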

```bash
gluster volume create k8s replica 3 \
  10.0.0.2:/data/glusterfs/k8s/brick1/brick \
  10.0.0.3:/data/glusterfs/k8s/brick1/brick \
  10.0.0.4:/data/glusterfs/k8s/brick1/brick \
  force
```

Following the creation, the volume must be started and verified:

```bash
gluster volume start k8s
gluster volume info
```
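To reason about how many node failures a replicated volume can absorb while staying writable, a simple majority rule is a useful approximation. The sketch below assumes majority-based client quorum, which holds for odd replica counts; even replica counts follow additional tie-breaking rules that it does not model.

```bash
#!/usr/bin/env bash
# Sketch: majority-quorum arithmetic for a replicated volume.
# Assumes majority-based client quorum (odd replica counts).

# Minimum number of bricks that must be reachable for writes.
replica_quorum() {
  local replicas="$1"
  echo $(( replicas / 2 + 1 ))
}

# Number of bricks that may fail while the volume stays writable.
write_fault_tolerance() {
  local replicas="$1"
  echo $(( replicas - (replicas / 2 + 1) ))
}

echo "replica 3: quorum=$(replica_quorum 3), tolerated failures=$(write_fault_tolerance 3)"
```

For the replica 3 volume created here, this gives a quorum of 2 and one tolerated failure.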

To make this distributed storage accessible to the local operating system of each node, the volume is added to the /etc/fstab file. The _netdev option is critical here; it tells the system to wait for the network to be up before attempting to mount the GlusterFS volume.

```bash
echo "127.0.0.1:/k8s /mnt/gluster-k8s glusterfs defaults,_netdev 0 0" >> /etc/fstab
mount /mnt/gluster-k8s
```
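Because a plain append adds a duplicate line every time a provisioning script is re-run, an idempotent variant can be handy. The helper below is a minimal sketch; the function name add_fstab_entry is hypothetical, and the demonstration writes to a temporary file rather than to /etc/fstab.

```bash
#!/usr/bin/env bash
# Sketch: append an fstab entry only if the exact line is not already present.

add_fstab_entry() {
  local entry="$1" file="${2:-/etc/fstab}"
  # -x: whole-line match, -F: fixed string (no regex interpretation)
  if grep -qxF -- "$entry" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$file"
}

# Demonstration against a temporary file instead of the real /etc/fstab:
demo_file="$(mktemp)"
entry="127.0.0.1:/k8s /mnt/gluster-k8s glusterfs defaults,_netdev 0 0"
add_fstab_entry "$entry" "$demo_file"
add_fstab_entry "$entry" "$demo_file"   # second call is a no-op
grep -c '' "$demo_file"                 # number of lines in the file
rm -f "$demo_file"
```

On a real node the second argument would be omitted so that the default /etc/fstab is used, followed by mount /mnt/gluster-k8s.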

Integrating GlusterFS with Kubernetes Orchestration

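While the GlusterFS volume now exists at the OS level, K3S needs a way to map this storage to pods. This is achieved through the use of Kubernetes Endpoints and either direct Pod manifests or PersistentVolume (PV) resources. Note that both approaches rely on the in-tree glusterfs volume plugin, which was deprecated in Kubernetes 1.25 and removed in 1.26; on newer K3S releases the mounted path /mnt/gluster-k8s has to be consumed another way, for example through hostPath or local volumes.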

The glusterfs-client package must be installed on all Kubernetes worker nodes to allow them to mount the GlusterFS volumes.

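To enable K3S to discover the GlusterFS cluster, an Endpoints object is created. This object acts as a map, telling Kubernetes exactly which IP addresses belong to the GlusterFS storage pool. The example below uses addresses and hostnames from a separate sample environment; for the cluster described here, substitute the private IPs of the storage nodes (10.0.0.2 to 10.0.0.4). The port field is required by the API, but any valid port number is accepted.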

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
  labels:
    storage.k8s.io/name: glusterfs
    storage.k8s.io/part-of: kubernetes-complete-reference
    storage.k8s.io/created-by: ssbostan
subsets:
  - addresses:
      - ip: 192.168.12.7
        hostname: node004
      - ip: 192.168.12.8
        hostname: node005
      - ip: 192.168.12.9
        hostname: node006
    ports:
      - port: 1
```

There are two primary methods for connecting a pod to this storage:

Method 1: Direct Connection via Pod Manifest
This method uses the GlusterfsVolumeSource in the PodSpec. It is a more direct approach but less flexible than using PersistentVolumes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    app.kubernetes.io/name: alpine
    app.kubernetes.io/part-of: kubernetes-complete-reference
    app.kubernetes.io/created-by: ssbostan
spec:
  containers:
    - name: alpine
      image: alpine:latest
      command:
        - touch
        - /data/test
      volumeMounts:
        - name: glusterfs-volume
          mountPath: /data
  volumes:
    - name: glusterfs-volume
      glusterfs:
        endpoints: glusterfs-cluster   # name of the Endpoints object
        path: k8s-volume               # Gluster volume name; use "k8s" for the volume created earlier
        readOnly: false
```

Method 2: Connection via PersistentVolume (PV) and StorageClass
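In this more advanced method, a StorageClass is defined whose provisioner talks to the GlusterFS management API (Heketi, covered in the next section) rather than to the bricks directly. When a user creates a PersistentVolumeClaim (PVC), Kubernetes provisions a matching GlusterFS volume and binds the claim to it automatically, abstracting the storage details from the application developer. A PV can also be created by hand and bound statically when no provisioner is available.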
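As an illustration of the static variant, the sketch below writes a PersistentVolume and a matching PersistentVolumeClaim to a file. The object names (glusterfs-pv, glusterfs-pvc), the 5Gi size and the output path are assumptions; the manifest targets the k8s volume and the glusterfs-cluster Endpoints, which must exist in the claim's namespace. It only applies to Kubernetes releases that still ship the in-tree glusterfs volume type (before 1.26).

```bash
#!/usr/bin/env bash
# Sketch: write a statically bound PV/PVC pair for the "k8s" Gluster volume.
# Names, size and output path are assumptions for illustration.

write_pv_manifest() {
  local out="$1"
  cat > "$out" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: glusterfs-pv            # hypothetical name
spec:
  capacity:
    storage: 5Gi                # assumed size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""          # empty class: no dynamic provisioning
  glusterfs:
    endpoints: glusterfs-cluster
    path: k8s
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterfs-pvc           # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: glusterfs-pv      # bind explicitly to the PV above
  resources:
    requests:
      storage: 5Gi
EOF
}

manifest="${TMPDIR:-/tmp}/glusterfs-pv.yaml"
write_pv_manifest "$manifest"
echo "wrote ${manifest}; apply with: kubectl apply -f ${manifest}"
```

A pod then references the claim through persistentVolumeClaim.claimName instead of embedding the GlusterFS details in its own spec.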

Advanced Volume Management with Heketi

For larger or more dynamic clusters, managing volumes manually via the gluster CLI becomes cumbersome. Heketi is a RESTful volume management service for GlusterFS, driven through its HTTP API or the heketi-cli client, that provides a layer of abstraction for easier provisioning.

Heketi requires its own database to track the bricks and volumes it manages. To ensure the Heketi database is itself highly available, it should be stored on a replicated GlusterFS volume.

To create a dedicated volume for the Heketi database:

```bash
gluster volume create heketi-db-volume replica 3 transport tcp \
  node004:/gluster/heketi \
  node005:/gluster/heketi \
  node006:/gluster/heketi
gluster volume start heketi-db-volume
```

To allow Heketi to communicate with the GlusterFS nodes via SSH, a Kubernetes secret must be created to store the SSH private key:

```bash
kubectl create secret generic heketi-ssh-key-file \
  --from-file=heketi-ssh-key
```

Heketi's operation is governed by a config.json file, which defines the port, authentication mechanisms (JWT), and the SSH executor settings.

```json
{
  "_port_comment": "Heketi Server Port Number",
  "port": "8080",
  "_use_auth": "Enable JWT authorization.",
  "use_auth": true,
  "_jwt": "Private keys for access",
  "jwt": {
    "_admin": "Admin has access to all APIs",
    "admin": {
      "key": "ADMIN-HARD-SECRET"
    }
  },
  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs": {
    "executor": "ssh",
    "_sshexec_comment": "SSH username and private key file",
    "sshexec": {
      "keyfile": "/heketi/heketi-ssh-key",
      "user": "root",
      "port": "22"
    },
    "_db_comment": "Database file name",
    "db": "/var/lib/heketi/heketi.db",
    "loglevel": "debug"
  }
}
```

Load Balancing and Traffic Management

To expose the K3S applications to the internet, a Cloud Load Balancer is used. This component handles SSL offloading, which removes the burden of decrypting HTTPS traffic from the K3S nodes and moves it to the network edge.

For the application to receive the original client IP address rather than the load balancer's IP, the PROXY protocol must be enabled. This is configured via Traefik, the default ingress controller in K3S.

The following configuration is applied to the master node to trust the load balancer's IP:

```bash
cat <<EOF > /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--entryPoints.web.proxyProtocol.trustedIPs=10.0.0.254"
      - "--entryPoints.web.forwardedHeaders.trustedIPs=10.0.0.254"
EOF
```

This configuration makes Traefik accept the PROXY protocol header and forwarded headers such as X-Forwarded-For only from the load balancer's address, allowing for accurate logging and IP-based access control.

Troubleshooting and Operational Considerations

Running a distributed filesystem on a lightweight Kubernetes distribution requires a proactive approach to monitoring and maintenance.

Common Pitfalls

  • Stale Peers: If a node goes offline or a network partition occurs, GlusterFS peers can become "stale." This will cause significant delays or failures during the volume mount process for pods. Administrators should regularly run gluster peer status to ensure all nodes are communicating.
  • Brick Pathing: It is critical that the brick path used during volume creation exists on all participating nodes. Any mismatch in the directory structure will prevent the volume from starting.
  • Disk Space Exhaustion: Since GlusterFS replicates data, the available space is limited to the size of the smallest brick in the replica set. Monitoring disk usage on all nodes is essential to prevent filesystem read-only errors.
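The disk space pitfall in particular lends itself to a small scheduled check. The sketch below parses df -P output for the filesystem that holds a brick; the function names and the 80 percent default threshold are assumptions, and the brick path in the usage comment is the one created earlier.

```bash
#!/usr/bin/env bash
# Sketch: warn when the filesystem holding a brick passes a usage threshold.

# Reads `df -P <path>` output on stdin and prints the Capacity value without '%'.
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds (exit 0) when usage is at or above the threshold (default 80).
brick_nearly_full() {
  local path="$1" threshold="${2:-80}" used
  used="$(df -P "$path" | usage_percent)"
  [ "$used" -ge "$threshold" ]
}

# Parsing demonstration with captured sample output:
printf 'Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/sda1 39000000 31200000 7800000 80%% /\n' | usage_percent

# On a storage node:
#   brick_nearly_full /data/glusterfs/k8s/brick1 80 && echo "brick filesystem is nearly full"
```

Running the check on every node matters because the fullest brick determines when writes to the replicated volume start failing.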

Performance Tuning

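For small clusters, the peer model of GlusterFS is highly effective because it removes the need for a centralized metadata server, so capacity can grow by adding bricks without introducing a single coordination bottleneck. Replication has a cost, however: each write is sent to every brick in the replica set, so write throughput is bounded by the private network. To optimize performance, ensure that the network interface used for GlusterFS (e.g., ens10) offers high throughput and low latency, and keep storage traffic on the private network.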

Conclusion: Analysis of the K3S and GlusterFS Synergy

The integration of K3S and GlusterFS creates a symbiotic relationship that addresses the primary weakness of edge Kubernetes: storage persistence. By decoupling the data layer from the compute layer, GlusterFS provides a safety net that allows K3S to be truly agile. The ability to move pods across nodes without losing state is what elevates a simple K3S installation from a testing environment to a production-ready system.

From an architectural standpoint, the use of a three-node replica set provides the optimal balance between redundancy and resource consumption. While the overhead of replicating data across three VMs consumes more disk space, the trade-off is a system that can withstand the complete loss of a node without interrupting the availability of the application data.

Furthermore, the addition of Heketi for volume management and a Cloud Load Balancer for traffic entry completes the professional stack. This architecture mimics the capabilities of large-scale cloud providers but maintains the simplicity and low overhead of a lightweight distribution. The ultimate value of this setup is the abstraction of complexity; the application developer sees a standard Kubernetes volume, while the administrator manages a robust, distributed storage backend that ensures the "bytes" produced by the containers are guarded against failure.
