GlusterFS Integration for Kubernetes and OpenShift Persistent Storage

Modern container orchestration decouples compute resources from storage resources. Kubernetes, which builds on over fifteen years of production experience from Google and the contributions of a large global community, treats storage management as a problem distinct from compute management. To address it, Kubernetes provides the PersistentVolume subsystem, an API that abstracts how storage is provisioned from how it is consumed by end users. GlusterFS fits into this landscape as a scalable network filesystem designed to run on commodity off-the-shelf hardware. It targets data-intensive tasks, such as large-scale media streaming and data analysis, by providing distributed storage. As a free and open-source project available on GitHub, GlusterFS lets administrators build large storage pools that can be consumed across a cluster, so that data persists even when individual pods are destroyed or rescheduled.

The Mechanics of Persistent Volumes and Claims

In a standard Kubernetes environment, the lifecycle of a pod is ephemeral. When a pod is deleted, any data stored within its local container filesystem is lost. To prevent this catastrophic data loss, the PersistentVolume (PV) and PersistentVolumeClaim (PVC) system is employed. A PersistentVolume is a piece of networked storage that is provisioned by a cluster administrator. Unlike standard volumes, PVs are cluster resources, meaning they exist independently of any specific pod. This independence ensures that the storage lifecycle is not tied to the pod lifecycle, allowing data to remain intact across pod restarts or migrations.

The PersistentVolumeClaim acts as a request for storage by a user. While a PV is the actual storage resource, a PVC is the claim for a specific size and access mode. The Kubernetes control plane matches a PVC to an available PV that satisfies the request. When GlusterFS is used as the backend, the PV is configured to point to a GlusterFS volume, enabling the pod to mount a distributed filesystem that is accessible across multiple nodes in the cluster.
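As a rough illustration of such a request, a claim might look like the sketch below. The name gluster-claim and the 8Gi size are illustrative assumptions. The empty storageClassName is one way to ask Kubernetes to bind only to a pre-provisioned PV that has no storage class, rather than triggering dynamic provisioning.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim          # hypothetical name, not from the original text
spec:
  accessModes:
    - ReadWriteMany            # must be offered by the PV for the claim to bind
  storageClassName: ""         # empty string: match only PVs without a storage class
  resources:
    requests:
      storage: 8Gi             # the PV must provide at least this much capacity
```

The control plane binds the claim to a PV whose capacity and access modes satisfy it; until a suitable PV exists, the claim stays in the Pending phase.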

GlusterFS Architectural Overview and Positioning

GlusterFS utilizes a peer-to-peer architecture, distinguishing it from centralized storage solutions. This architecture allows it to scale out across multiple nodes without the need for a central metadata server, which often becomes a bottleneck in other distributed systems. In the context of the broader Kubernetes storage landscape, GlusterFS is often compared to other solutions like Ceph, Longhorn, and OpenEBS.

The following table provides a technical comparison of GlusterFS against its primary open-source competitors:

From a performance perspective, GlusterFS exhibits moderate performance levels, although it is prone to performance spikes during write-heavy operations. This makes it suitable for read-heavy workloads or general-purpose file storage, but potentially problematic for high-frequency transactional databases. Furthermore, while solutions like Ceph provide robust disaster recovery via mirroring and Longhorn offers integrated UI-based backups to S3 or NFS, GlusterFS lacks out-of-the-box backup or disaster recovery functionality, requiring manual intervention for these tasks.

Deployment and Environment Configuration

To integrate GlusterFS into a Kubernetes cluster, specific software requirements must be met across the infrastructure. The glusterfs-client package (named glusterfs-fuse on RHEL-family distributions) must be installed on all Kubernetes worker nodes. This installation is critical because the worker nodes are responsible for mounting the GlusterFS volumes into the containers. Without the client software, the node cannot communicate with the GlusterFS cluster, so the mount fails and the pod's containers never start.

Discovering GlusterFS via Endpoints

For Kubernetes to interact with a GlusterFS cluster, the cluster must be discoverable within the Kubernetes API. This is achieved by creating an Endpoints object. The Endpoints object serves as a map, pointing the Kubernetes volume plugin to the specific IP addresses and hostnames of the servers comprising the GlusterFS cluster.

The following configuration defines an Endpoints object for a GlusterFS cluster:

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
  labels:
    storage.k8s.io/name: glusterfs
    storage.k8s.io/part-of: kubernetes-complete-reference
    storage.k8s.io/created-by: ssbostan
subsets:
  - addresses:
      - ip: 192.168.12.7
        hostname: node004
      - ip: 192.168.12.8
        hostname: node005
      - ip: 192.168.12.9
        hostname: node006
    ports:
      - port: 1
```

In this configuration, the glusterfs-cluster endpoint maps the logical name to physical nodes (node004, node005, and node006). This abstraction allows the pod manifest or the PV definition to reference glusterfs-cluster without needing to hardcode individual IP addresses, facilitating easier cluster maintenance and scaling.
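The upstream Kubernetes GlusterFS example pairs such an Endpoints object with a selector-less Service of the same name so that the endpoints persist. A minimal sketch, assuming the same name and placeholder port as the Endpoints object above, could look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster      # must match the name of the Endpoints object
spec:
  # No selector: the Service is backed by the manually created Endpoints.
  ports:
    - port: 1                  # placeholder port, mirroring the Endpoints definition
```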

Implementation Methods for GlusterFS Connectivity

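There are two primary methods for connecting a pod to a GlusterFS volume: direct connection via the pod manifest and indirect connection using the PersistentVolume resource. Both rely on the in-tree glusterfs volume plugin, which was deprecated in Kubernetes v1.25 and removed in v1.26, so the manifests in this section apply only to clusters running earlier releases.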

Method 1: Direct Connection with Pod Manifest

Connecting directly to GlusterFS is an approach where the storage details are embedded within the PodSpec. This method uses the GlusterfsVolumeSource to define the connection. This is typically used in development or simple environments where the overhead of managing PVs and PVCs is not required.

Example pod manifest for direct GlusterFS connection:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    app.kubernetes.io/name: alpine
    app.kubernetes.io/part-of: kubernetes-complete-reference
    app.kubernetes.io/created-by: ssbostan
spec:
  containers:
    - name: alpine
      image: alpine:latest
      command:
        - touch
        - /data/test
      volumeMounts:
        - name: glusterfs-volume
          mountPath: /data
  volumes:
    - name: glusterfs-volume
      glusterfs:
        endpoints: glusterfs-cluster
        path: k8s-volume
        readOnly: false
```

In this scenario, the pod directly requests the glusterfs-cluster endpoint and specifies the volume path k8s-volume. The data is mounted at /data within the container.

Method 2: Connecting via PersistentVolume Resource

The preferred production method is using the PersistentVolume (PV) resource. This separates the storage provisioning from the application deployment, allowing administrators to manage storage capacity and reclaim policies independently.

To create a PV for a GlusterFS volume, the following manifest is used:

```yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "gluster-default-volume"
spec:
  capacity:
    storage: "8Gi"
  accessModes:
    - "ReadWriteMany"
  glusterfs:
    endpoints: "glusterfs-cluster"
    path: "gluster_vol"
    readOnly: false
  persistentVolumeReclaimPolicy: "Recycle"
```

Key attributes of this PV configuration include:
- Capacity: Set to 8Gi, defining the size of the volume available to the claim.
- AccessModes: ReadWriteMany (RWX) is utilized, which is a primary advantage of GlusterFS. This allows multiple pods on different nodes to read and write to the same volume simultaneously.
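- PersistentVolumeReclaimPolicy: Set to Recycle, which indicates that the volume should be scrubbed and made available again once the claim is deleted. Note that the Recycle policy is deprecated and the Kubernetes documentation lists only NFS and HostPath as supporting it, so Retain is the safer choice for GlusterFS-backed volumes.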
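Once a claim is bound to this PV, a pod consumes the storage by referencing the claim rather than the GlusterFS details. The sketch below assumes a bound PersistentVolumeClaim named gluster-claim in the pod's namespace; that name, the pod name, and the container command are illustrative and do not come from the original text.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gluster-pvc-demo       # hypothetical pod name
spec:
  containers:
    - name: alpine
      image: alpine:latest
      # Write a file to the shared volume, then stay alive for inspection.
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
  volumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: gluster-claim   # assumed claim bound to gluster-default-volume
```

Because the PV offers ReadWriteMany, several pods on different nodes could reference the same claim and see the same files.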

Advanced GlusterFS Deployments in Containers

GlusterFS can be deployed within Kubernetes or OpenShift using containerized pods. This approach allows the Gluster nodes themselves to be managed as containers, providing a layer of persistence for the overall cluster setup.

Deploying GlusterFS Pods

The deployment of a GlusterFS pod is performed using the oc create command in OpenShift. For example:

```bash
oc create -f gluster-1.yaml
```

To verify the status and configuration of the deployed GlusterFS pod, the oc describe command is used:

```bash
oc describe pod gluster-1
```

A typical output for a GlusterFS pod reveals the following technical details:
- Image: gluster/gluster-centos
- Node: Assigned to a specific node (e.g., atomic-node1/10.70.43.174).
- Status: Running.
- Volumes: The pod often utilizes a HostPath volume for the brick path (e.g., /mnt/brick1), which allows the container to use a directory on the bare host as a storage brick.
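A manifest along the lines of gluster-1.yaml might resemble the following sketch. Only the pod name, the image, and the brick path are taken from the details above. The container name, the volume name, and the hostNetwork and privileged settings are assumptions about what a containerized Gluster node typically needs, and should be checked against the actual deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gluster-1
spec:
  hostNetwork: true                  # assumption: expose Gluster on the node's own IP
  containers:
    - name: glusterfs                # hypothetical container name
      image: gluster/gluster-centos
      securityContext:
        privileged: true             # assumption: needed to manage bricks on the host
      volumeMounts:
        - name: brickpath
          mountPath: /mnt/brick1
  volumes:
    - name: brickpath
      hostPath:
        path: /mnt/brick1            # host directory used as the storage brick
```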

Heketi Integration for Volume Management

Heketi is a RESTful front-end for GlusterFS that simplifies the management of volumes. Without Heketi, administrators must manually create and manage volumes via the Gluster CLI. Heketi provides a more automated approach to volume provisioning.

Heketi Deployment

Heketi is deployed as a Deployment in Kubernetes. The following manifest outlines the Heketi configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heketi
  labels:
    app.kubernetes.io/name: heketi
    app.kubernetes.io/part-of: glusterfs
    app.kubernetes.io/origin: kubernetes-complete-reference
    app.kubernetes.io/created-by: ssbostan
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: heketi
      app.kubernetes.io/part-of: glusterfs
      app.kubernetes.io/origin: kubernetes-complete-reference
      app.kubernetes.io/created-by: ssbostan
  template:
    metadata:
      labels:
        app.kubernetes.io/name: heketi
        app.kubernetes.io/part-of: glusterfs
        app.kubernetes.io/origin: kubernetes-complete-reference
        app.kubernetes.io/created-by: ssbostan
    spec:
      containers:
        - name: heketi
          image: heketi/heketi:10
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: ssh-key-file
              mountPath: /heketi
            - name: config
              mountPath: /etc/heketi
            - name: data
              mountPath: /var/lib/heketi
      volumes:
        - name: ssh-key-file
          secret:
            secretName: heketi-ssh-key-file
        - name: config
          configMap:
            name: heketi-config
        - name: data
          glusterfs:
            endpoints: glusterfs-cluster
            path: heketi-db-volume
```

Cluster Topology Loading in Heketi

Once Heketi is deployed, the cluster topology must be loaded to enable Heketi to manage the GlusterFS nodes. This is done using the heketi-cli via kubectl exec.

To load the topology:

```bash
kubectl exec POD-NAME -- heketi-cli \
  --user admin \
  --secret ADMIN-HARD-SECRET \
  topology load --json /etc/heketi/topology.json
```
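Since the Deployment mounts the heketi-config ConfigMap at /etc/heketi, the topology file could plausibly be delivered through that ConfigMap. The sketch below shows only a topology.json key with a single node. The device path /dev/sdb and the zone number are assumptions, and a real ConfigMap would also need to carry Heketi's own configuration file and one node entry per GlusterFS server.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: heketi-config
data:
  # Heketi topology: clusters -> nodes -> hostnames and raw block devices.
  # Additional nodes (192.168.12.8, 192.168.12.9) would follow the same shape.
  topology.json: |
    {
      "clusters": [
        {
          "nodes": [
            {
              "node": {
                "hostnames": {
                  "manage": ["192.168.12.7"],
                  "storage": ["192.168.12.7"]
                },
                "zone": 1
              },
              "devices": ["/dev/sdb"]
            }
          ]
        }
      ]
    }
```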

To verify the cluster and retrieve the cluster ID:

```bash
kubectl exec POD-NAME -- heketi-cli \
  --user admin \
  --secret ADMIN-HARD-SECRET \
  cluster list
```

Successful execution will yield the cluster ID, such as Id:c63d60ee0ddf415097f4eb82d69f4e48 [file][block], confirming that Heketi now has control over the file and block storage resources of the GlusterFS cluster.
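With the cluster ID in hand, dynamic provisioning on older clusters can be wired up through a StorageClass that uses the in-tree kubernetes.io/glusterfs provisioner. The following is a sketch under stated assumptions: the Secret name, the StorageClass name, the namespace, and the resturl are all hypothetical, and the URL must be reachable from the component that runs the provisioner. The admin secret placeholder and the cluster ID are the ones shown above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: heketi-admin-secret        # hypothetical name
  namespace: default               # assumed namespace
type: kubernetes.io/glusterfs      # type expected by the in-tree provisioner
stringData:
  key: ADMIN-HARD-SECRET           # placeholder for the Heketi admin secret
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-heketi           # hypothetical name
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.default.svc.cluster.local:8080"   # assumed Heketi address
  restuser: "admin"
  secretName: "heketi-admin-secret"
  secretNamespace: "default"
  clusterid: "c63d60ee0ddf415097f4eb82d69f4e48"
```

A PersistentVolumeClaim that names this StorageClass would then cause Heketi to create a matching GlusterFS volume on demand instead of relying on a pre-created PV.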

Comparative Analysis of Storage Solutions

Selecting between GlusterFS and other options depends on the specific operational requirements of the organization. While GlusterFS is a powerful tool for distributed file systems, it is categorized as "legacy" in some modern comparisons due to the rise of Kubernetes-native storage solutions.

Disaster Recovery and Backup Analysis

A critical evaluation of disaster recovery (DR) reveals significant gaps in GlusterFS compared to its contemporaries:

- Ceph: Offers advanced DR capabilities, including asynchronous mirroring and geo-replication for object storage via RGW. It also supports snapshots and rbd export.
- Longhorn: Simplifies DR through a dedicated UI-based backup and remote restore support, utilizing NFS and S3.
- OpenEBS: Support is engine-specific; the cStor engine supports snapshots and backup to remote PVCs.
- GlusterFS: Lacks any out-of-the-box backup or DR functionality. All backup processes must be handled manually by the administrator.

Performance and Resource Utilization

Resource consumption varies significantly across the different storage backends:
- GlusterFS: Exhibits moderate resource usage, but experiences performance spikes specifically under write-heavy operations.
- Longhorn: Remains lightweight when at rest but shows moderate resource consumption under load.
- OpenEBS: Resource usage is highly dependent on the engine; the Jiva engine is considered low-impact, while the Mayastor engine is higher.

For those requiring high-performance block storage, Ceph RBD is a strong candidate, showing benchmarks such as IOPS=31.5k and BW=123MiB/s with low latency (2.2ms), whereas GlusterFS is better suited for scenarios requiring shared file access across multiple nodes.

Technical Analysis and Conclusion

The integration of GlusterFS into Kubernetes and OpenShift provides a robust mechanism for achieving shared, persistent storage across a distributed cluster. By utilizing the PersistentVolume and PersistentVolumeClaim APIs, GlusterFS abstracts the underlying physical storage, allowing developers to request storage based on capacity and access modes (specifically ReadWriteMany) rather than physical hardware locations. The use of Endpoints allows for a flexible mapping between the Kubernetes control plane and the GlusterFS nodes, facilitating scalability and simplifying the process of adding or removing nodes from the storage pool.

However, while GlusterFS is highly effective for specific use cases, namely large-scale distributed file systems on commodity hardware, it faces challenges in the cloud-native era. The lack of native Kubernetes integration (unlike Longhorn or OpenEBS) and the absence of integrated disaster recovery and backup tools make it a more labor-intensive solution. The transition toward "K8s-native" storage using Custom Resource Definitions (CRDs) has shifted industry preference toward solutions that can be managed via standard kubectl commands without requiring external CLI tools like heketi-cli.

Ultimately, GlusterFS remains a viable option for organizations that require a mature, peer-to-peer distributed filesystem and are capable of managing the manual overhead associated with backups and disaster recovery. For teams prioritizing simplicity and native integration, the trend is clearly shifting toward Longhorn and OpenEBS, while those requiring massive scale and robustness continue to lean toward Ceph. The choice of GlusterFS is therefore a strategic trade-off between the power of a distributed filesystem and the operational ease of a Kubernetes-native storage orchestrator.