Orchestrating Persistent Storage with High Availability NFS in Kubernetes

The integration of Network File System (NFS) within Kubernetes environments represents a fundamental architectural decision for organizations requiring persistent, shared storage. While the technology behind NFS is established, its application within the ephemeral and dynamic nature of Kubernetes clusters provides a critical bridge for applications that demand ReadWriteMany (RWX) access. In a standard Kubernetes deployment, many storage options are restricted to ReadWriteOnce (RWO), meaning a volume can be mounted as read-write by a single node. However, for distributed applications, AI/ML workloads, or legacy systems where multiple pods across different nodes must simultaneously read from and write to the same data source, NFS emerges as a primary solution. The challenge inherent in standard NFS is its lack of native high availability; if the single NFS server fails, all dependent pods experience a storage outage. To mitigate this, advanced implementations leverage synchronous replication and cluster resource managers to ensure that storage remains available even during hardware failure.

The Architecture of Shared Storage and RWX Requirements

In the Kubernetes ecosystem, the ability to share data across multiple pods is not merely a convenience but a requirement for specific workload patterns. When a pod is configured with a PersistentVolumeClaim (PVC) utilizing the ReadWriteMany (RWX) access mode, it signifies that the storage backend must support simultaneous access from multiple nodes in the cluster.

NFS is uniquely positioned for this role because it operates as a network-based file system, decoupling the physical storage location from the compute nodes. This allows any pod, regardless of which worker node it is scheduled on, to mount the same remote directory. The real-world impact for the user is the elimination of data silos. For example, a content management system where multiple frontend pods must serve the same set of uploaded images can rely on a single NFS export to ensure consistency across the entire application tier.

Within the broader context of Kubernetes storage, NFS serves as a flexible alternative to hyperconverged storage solutions. While hyperconverged systems often bind storage directly to the platform they are installed on, an HA NFS cluster can be shared across multiple diverse platforms on the same network. This provides a level of infrastructure agility that prevents vendor lock-in and allows organizations to utilize commercial off-the-shelf (COTS) hardware instead of expensive, proprietary NAS or SAN appliances.

High Availability NFS Frameworks and LINBIT Integration

Standard NFS installations create a single point of failure. To transform a basic NFS setup into a production-grade High Availability (HA) cluster, specialized software layers are required to handle data replication and service failover. LINBIT provides a suite of tools designed to ensure that NFS exports remain accessible even if a physical node crashes.

At the foundation of this architecture is DRBD (Distributed Replicated Block Device). DRBD acts as a block storage driver that enables the synchronous replication of data between cluster nodes. When data is written to a DRBD-backed NFS export on one node, it is simultaneously written to the peer node. This ensures that no data is lost during a failover event.

To manage the state of the cluster and the movement of services, different Cluster Resource Managers (CRM) can be employed depending on the complexity of the requirements:

  • DRBD Reactor: This is a LINBIT-developed CRM that utilizes a promoter plugin. It monitors the quorum state of the DRBD devices. If the primary node fails, DRBD Reactor promotes a secondary device to primary and restarts the dependent NFS services. It is designed as a simpler solution for those who do not require complex orchestration.
  • Pacemaker: A community-developed CRM used for building highly complex HA clusters. While it has a steeper learning curve due to its high configurability, Pacemaker is the preferred choice when an administrator needs precise control over the ordering and colocation of services.
  • LINBIT VSAN: This solution combines DRBD and LINSTOR with a web-based front end. It is delivered as a Linux distribution that can be installed on bare metal or virtual machines, providing an "easy mode" for engineers to deploy HA NFS exports without manual configuration of complex CRM rules.

The operational result of these technologies is the creation of a Virtual IP (VIP) address. Instead of pods connecting to a specific physical server IP, they connect to the VIP (e.g., 192.168.222.25). This address "floats" across the cluster, automatically migrating to whichever node is currently hosting the active NFS export. To the Kubernetes pods, a node failure is perceived merely as a brief network hiccup, with failover typically completing in a few seconds.

Implementation Strategies for DigitalOcean Kubernetes (DOKS)

For users leveraging managed services like DigitalOcean Kubernetes (DOKS), the process of integrating NFS is streamlined through the use of DigitalOcean NFS Shares. DOKS provides a managed control plane with built-in high availability and autoscaling, allowing users to connect their clusters to an external NFS share for specialized tasks, such as AI/ML workloads that require massive shared datasets.

To connect a DOKS cluster to an NFS share, the administrator must first retrieve the connection details from the DigitalOcean control panel.

Connection Detail Extraction

The connection string typically follows a specific format: ServerIP:MountPath. For instance, if the control panel displays 10.128.0.69:/123456/6160d138-60cb-4e61-9ff3-076eebed5c0f, the components are parsed as follows:

| Component | Value |
| --- | --- |
| Server IP Address | 10.128.0.69 |
| Mount Path | /123456/6160d138-60cb-4e61-9ff3-076eebed5c0f |

For automated environments, these values can be retrieved via the API by sending a GET request to the /v2/nfs endpoint.
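
As a minimal sketch of how the two parsed components are consumed, the manifest below places them in the nfs volume source of a PersistentVolume. The object name do-nfs-share and the 100Gi capacity are placeholders chosen for illustration, not values from DigitalOcean; only the server address and path come from the example connection string above.

```yaml
# Hypothetical PersistentVolume for the example DigitalOcean NFS share.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: do-nfs-share              # placeholder name (assumption)
spec:
  capacity:
    storage: 100Gi                # placeholder size (assumption); match your share
  accessModes:
    - ReadWriteMany               # NFS allows mounts from many nodes at once
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""            # no storage class: intended for static binding
  nfs:
    server: 10.128.0.69           # "Server IP Address" component
    path: /123456/6160d138-60cb-4e61-9ff3-076eebed5c0f   # "Mount Path" component
```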

Kubernetes Manifest Configuration and Deployment

There are two primary methods for integrating NFS into a Kubernetes workload: direct pod specification and the preferred method of using PersistentVolumes (PV) and PersistentVolumeClaims (PVC).

Direct Pod Mounts

While not the preferred method for production, it is possible to specify the NFS server and path directly within a pod manifest. This is useful for quick testing or specific legacy requirements.

The following configuration demonstrates a pod that mounts an HA NFS export:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: nfs-in-a-pod
spec:
  containers:
    - name: nfs-app
      image: alpine
      volumeMounts:
        - name: nfs-volume
          mountPath: /data
      # Append the hostname and a timestamp to a file on the share every 30 seconds.
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(hostname; date) >> /data/test-file.txt; sleep 30s; done"]
  volumes:
    - name: nfs-volume
      nfs:
        # Virtual IP of the HA NFS cluster, not a physical node address.
        server: 192.168.222.25
        path: /drbd/exports/nfs-app
```

To deploy this pod, the administrator uses the following command:

```bash
kubectl apply -f nfs-in-pod.yaml
```

To verify that the pod is successfully writing to the shared storage, the exec command is used to read the test file:

```bash
kubectl exec nfs-in-a-pod -- cat /data/test-file.txt
```

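A critical limitation of this approach is that NFS mount options cannot be specified within the pod spec. Administrators must either set these options on the server side or configure them within /etc/nfsmount.conf on the worker nodes. Volumes defined through a PersistentVolume do not share this restriction, because the PV accepts a mountOptions field.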

PersistentVolume and PersistentVolumeClaim Workflow

The industry-standard approach for managing storage in Kubernetes is the abstraction of the physical storage via PersistentVolumes. This separates the storage infrastructure details from the application requirements.

The workflow for implementing NFS via PVs consists of three distinct phases:

  1. Static Provisioning: The administrator creates a PersistentVolume (PV) object. This object contains the actual NFS server IP and the export path. The PV is configured with the nfs type and the ReadWriteMany access mode.
  2. Claim Binding: The developer creates a PersistentVolumeClaim (PVC). The PVC requests a certain amount of storage and specifies the ReadWriteMany access mode. Kubernetes then binds the PVC to the matching PV.
  3. Workload Mounting: The pod spec references the claimName of the PVC. Because the PVC is bound to an RWX volume, any number of pods can reference the same claim without needing to know the underlying NFS server IP or the specific export path.
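
The three phases can be sketched as a single multi-document manifest. This is an illustration under stated assumptions rather than a reference configuration: the object names (nfs-app-pv, nfs-app-pvc, nfs-app), the 10Gi size, the replica count, and the mount options are all hypothetical, while the server address and export path are reused from the pod example earlier in this article.

```yaml
# Phase 1 (administrator): static PersistentVolume pointing at the HA NFS virtual IP.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-app-pv                # hypothetical name
spec:
  capacity:
    storage: 10Gi                 # assumed size, used when matching claims
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""            # no storage class: static binding only
  mountOptions:                   # assumed options; use what your server supports
    - nfsvers=4.1
    - hard
  nfs:
    server: 192.168.222.25
    path: /drbd/exports/nfs-app
---
# Phase 2 (developer): claim that binds to the volume above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-app-pvc               # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # must match the PV's (empty) class
  volumeName: nfs-app-pv          # optional: pin the claim to this specific PV
  resources:
    requests:
      storage: 10Gi
---
# Phase 3 (workload): several replicas share the claim; none of them
# knows the NFS server address or export path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-app                   # hypothetical name
spec:
  replicas: 3                     # assumed count, to show shared RWX access
  selector:
    matchLabels:
      app: nfs-app
  template:
    metadata:
      labels:
        app: nfs-app
    spec:
      containers:
        - name: nfs-app
          image: alpine
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo $(hostname; date) >> /data/test-file.txt; sleep 30s; done"]
          volumeMounts:
            - name: shared-data
              mountPath: /data
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: nfs-app-pvc
```

Setting storageClassName to an empty string on both objects is a deliberate choice in this sketch: it keeps a cluster's default StorageClass from dynamically provisioning a different volume for the claim.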

This abstraction provides significant operational advantages. If the NFS server is migrated to a new IP address, the change is confined to the PV definition; the pod and PVC manifests remain unchanged, so the application's deployment logic is not disturbed. Note that the volume source of an existing PV is immutable, so in practice the PV is recreated with the new address rather than edited in place.

Technical Comparison of HA NFS Implementation Paths

The right implementation path depends on whether the organization uses managed cloud services or self-hosted COTS hardware.

| Feature | Managed NFS (e.g., DigitalOcean) | HA NFS via LINBIT/DRBD | LINSTOR Operator |
| --- | --- | --- | --- |
| Setup Complexity | Low | Medium to High | Low (K8s Native) |
| Hardware Control | Provider Managed | Full (COTS Hardware) | Full |
| Replication Method | Provider Proprietary | Synchronous (DRBD) | Distributed Block |
| Management Interface | Cloud Control Panel | CRM (Reactor/Pacemaker) | Kubernetes Operator |
| Primary Use Case | Rapid Cloud Deployment | Legacy/Cross-Platform HA | Kubernetes-Native RWX |
| Failover Speed | Managed by Provider | Seconds (via VIP) | Native K8s Orchestration |

Advanced Operational Considerations for NFS in Kubernetes

Deploying NFS requires a deep understanding of how the network and the file system interact. When using an HA NFS cluster, the "Virtual IP" is the linchpin of the entire system. If the VIP is not correctly configured across the subnet, pods will experience Connection Refused or Timeout errors during a failover.

Furthermore, using subdirectories within a root export is a recommended practice when many applications share one server. Rather than creating a separate NFS export for every pod, which would increase the overhead on the NFS server, administrators can export a single root directory (e.g., /drbd/exports/) and create a subdirectory for each application. Each pod then points to its own subdirectory, which keeps application data separated while the server configuration stays simple.
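
A minimal sketch of this layout follows, with two pods that each mount only their own subdirectory of the shared root export. The pod names and the subdirectories app-a and app-b are hypothetical, and the sketch assumes both directories already exist on the NFS server, since the pod spec does not create them.

```yaml
# Two hypothetical applications sharing the root export /drbd/exports/.
apiVersion: v1
kind: Pod
metadata:
  name: app-a                     # hypothetical name
spec:
  containers:
    - name: app
      image: alpine
      command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      nfs:
        server: 192.168.222.25
        path: /drbd/exports/app-a   # subdirectory reserved for app-a (assumption)
---
apiVersion: v1
kind: Pod
metadata:
  name: app-b                     # hypothetical name
spec:
  containers:
    - name: app
      image: alpine
      command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      nfs:
        server: 192.168.222.25
        path: /drbd/exports/app-b   # subdirectory reserved for app-b (assumption)
```

Separate paths keep each application's files apart, but they are an organizational boundary rather than an access control; restricting who can read which directory still depends on the export and file permissions set on the server.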

For environments where non-Kubernetes workloads also require access to the same data, the disaggregated HA NFS approach is superior. Because the NFS server exists as a separate entity (or a cluster of entities) on the network, a legacy VM or a physical server can mount the same NFS share using standard Linux mount commands, providing a unified data layer across the entire enterprise infrastructure.

Analysis of Storage Reliability and Performance

The transition from a single NFS server to an HA NFS cluster fundamentally changes the reliability profile of a Kubernetes deployment. In a non-HA scenario, the Mean Time Between Failures (MTBF) of the storage is tied to the MTBF of a single piece of hardware. In an HA cluster utilizing DRBD, the system can withstand the complete loss of a node without data loss, as the synchronous replication ensures the secondary node is an exact mirror of the primary.

The performance impact of synchronous replication is the primary trade-off. Because every write operation must be acknowledged by both the local and remote node before the operation is considered complete, there is an inherent latency penalty. This is why high-speed networking (10GbE or higher) is critical for HA NFS clusters.

However, for the vast majority of RWX workloads, the flexibility gained outweighs the added latency. The ability to use COTS hardware means organizations can grow their storage capacity by adding more disks to the DRBD nodes without being forced into the expensive pricing tiers of proprietary SAN vendors.

