vSAN Stretched Cluster Kubernetes Integration and Persistent Volume Orchestration

The integration of Kubernetes with vSAN stretched clusters represents a sophisticated approach to achieving high availability and disaster recovery for containerized workloads. By leveraging a vSAN stretched cluster, organizations can deploy a generic Kubernetes cluster where persistent volumes and node virtual machines are distributed across two distinct physical sites. This architecture ensures that the failure of an entire site does not result in the loss of data or the total unavailability of the Kubernetes control plane and worker nodes. The primary objective of this configuration is to create a seamless storage fabric that transcends physical location, allowing Kubernetes to operate as if it were on a single site while benefiting from the redundancy of a geographically dispersed infrastructure.

The deployment of Kubernetes on a vSAN stretched cluster requires a meticulous approach to storage policy management. Unlike standard Kubernetes deployments, where storage might be handled by a single-site provider, a stretched cluster introduces complexities regarding site affinity, replication, and fault domains. The vSphere Container Storage Interface (CSI) driver facilitates the communication between the Kubernetes orchestration layer and the vSAN storage layer, enabling the dynamic provisioning of Persistent Volumes (PVs). However, the responsibility for the correct alignment of storage policies remains with the vSphere administrator, as the Kubernetes cluster itself does not natively enforce identical storage policies across both the virtual machines (nodes) and the persistent volumes they consume.

Architectural Requirements for vSAN Stretched Cluster Deployment

To successfully deploy a Kubernetes environment on a vSAN stretched cluster, several foundational infrastructure components must be configured. These requirements ensure that the underlying virtualized layer can support the high-availability demands of a container orchestration system.

The initial step involves the creation and setup of the vSAN stretched cluster. This requires a specific physical and network configuration to allow the two sites to communicate and synchronize data. Once the cluster is established, the following system configurations are mandatory:

Distributed Resource Scheduler (DRS) must be enabled on the stretched cluster to ensure optimal placement of virtual machines across the available hosts.
vSphere High Availability (HA) must be turned on to facilitate the automatic restart of virtual machines on surviving hosts in the event of a failure.
Host Monitoring and VM Monitoring must be explicitly set up within the vSphere HA configuration to ensure the cluster can accurately detect failures and trigger failover mechanisms.

Beyond the general cluster settings, the storage policy configuration is the most critical element for ensuring data persistence and availability. A VM storage policy must be created that is fully compliant with the requirements of a vSAN stretched cluster. Within this policy, the site disaster tolerance must be configured. Specifically, the selection of Dual site mirroring ensures that data is mirrored across both sites of the stretched cluster, providing a redundant copy of the data at each location.

The policy must also specify the number of failures to tolerate. In the context of a stretched cluster, this setting defines how many disk or host failures a storage object can withstand for each site. For mirroring configurations, the number of required fault domains, or hosts within a site, to tolerate n failures is calculated using the formula 2n + 1.
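As a worked illustration of the 2n + 1 rule, the short Python sketch below computes the per-site host count for a few values of n. The function name hosts_required_per_site is purely illustrative and is not part of any VMware tooling.

```python
def hosts_required_per_site(failures_to_tolerate: int) -> int:
    """Hosts needed in each site to tolerate n failures with mirroring (2n + 1)."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


if __name__ == "__main__":
    # Print the per-site requirement for n = 1, 2, 3.
    for n in range(1, 4):
        print(f"n={n}: {hosts_required_per_site(n)} hosts per site")
```

For n = 1 the rule gives three hosts per site, so the two data sites together need six hosts in addition to the witness.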

Storage Policy Implementation and Replication Strategies

The selection of a replication method significantly impacts the performance and space efficiency of the Kubernetes cluster. vSAN provides several options for achieving failure tolerance, depending on the hardware and the specific needs of the workload.

RAID-1 mirroring is the primary recommendation for performance-sensitive environments. This method mirrors every block of data, providing rapid access and high reliability at the cost of more raw storage capacity. Alternatively, RAID-5 and RAID-6 provide failure tolerance using parity blocks. These options offer better space efficiency but are available only on all-flash clusters.

For the operational stability of the Kubernetes cluster, a uniform approach to storage policy application is required. It is mandatory to use the VM storage policy with the same replication and site affinity settings for all storage objects within the cluster. This uniformity must extend to:

All node virtual machines, including both the control plane nodes and the worker nodes.
All Persistent Volumes (PVs) provisioned for the workloads.

If the storage policies are inconsistent, the cluster may experience stability issues or fail to meet the disaster recovery objectives intended by the stretched cluster architecture.
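One way to catch such inconsistencies early is to compare the policy assigned to every object against the expected one. The sketch below is a minimal illustration that works on a hand-written inventory; the object names, the policy name "stretched-raid1", and the function find_policy_mismatches are all hypothetical, and in practice the inventory would have to be gathered from vCenter by whatever means the environment provides.

```python
from collections import defaultdict


def find_policy_mismatches(objects: dict[str, str], expected_policy: str) -> dict[str, list[str]]:
    """Group objects whose assigned policy differs from the expected one.

    `objects` maps an object name (node VM or PV) to its assigned policy name.
    The result maps each unexpected policy name to the objects that use it.
    """
    mismatches: dict[str, list[str]] = defaultdict(list)
    for name, policy in objects.items():
        if policy != expected_policy:
            mismatches[policy].append(name)
    return dict(mismatches)


# Hypothetical inventory: one worker node still uses a non-stretched policy.
inventory = {
    "k8s-control-plane-1": "stretched-raid1",
    "k8s-worker-1": "stretched-raid1",
    "k8s-worker-2": "vSAN Default Storage Policy",
    "pvc-data-0": "stretched-raid1",
}

if __name__ == "__main__":
    for policy, names in find_policy_mismatches(inventory, "stretched-raid1").items():
        print(f"{policy}: {', '.join(names)}")
```

An empty result means every node VM and volume in the inventory shares the expected policy.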

Kubernetes Deployment Scenarios and Topology Constraints

Organizations can deploy multiple Kubernetes clusters with varying storage requirements within the same vSAN stretched cluster, providing flexibility for different application tiers. However, there are strict limitations regarding how Kubernetes interacts with the underlying vSAN topology.

A critical constraint is that the topology feature cannot be used to provision a volume that belongs to a specific fault domain within the vSAN stretched cluster. This means that while vSAN manages the placement of data across the sites for redundancy, Kubernetes cannot use its own topology-aware scheduling to pin a volume to a specific site's fault domain.

To manage the placement of Kubernetes nodes, administrators should use VM-Host affinity rules. These rules allow the vSphere administrator to place Kubernetes nodes on a specific primary or secondary site, such as Site-A. This ensures that the compute resources are aligned with the desired operational model.

One common deployment scenario involves placing the control plane and worker nodes on the primary site. This configuration provides the flexibility to fail over to the secondary site if the primary site experiences a total failure. In this scenario, HAProxy is deployed on the primary site to manage traffic and provide a stable entry point for the cluster.

Persistent Volume Orchestration and CSI Integration

The provisioning of persistent volumes in a vSAN stretched cluster environment relies on the creation of a Storage Class that references the vSAN stretched cluster storage policy. This allows Kubernetes to request volumes that automatically inherit the mirroring and site affinity settings defined in vSphere.

The process for deploying persistent volumes follows a specific sequence:

The vSphere administrator creates a VM storage policy compliant with stretched cluster requirements.
A Kubernetes Storage Class is created using this specific storage policy.
Persistent Volumes are then deployed using the defined storage class.
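The last two steps of this sequence can be sketched as Kubernetes manifests. The Python snippet below builds a StorageClass and a PersistentVolumeClaim as plain dictionaries and prints them as JSON. It assumes the vSphere CSI provisioner name csi.vsphere.vmware.com and its storagepolicyname parameter; the object names, the policy name "stretched-raid1", and the 5Gi size are placeholders chosen for illustration.

```python
import json


def storage_class_manifest(name: str, policy_name: str) -> dict:
    """StorageClass that points the vSphere CSI driver at a VM storage policy."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.vsphere.vmware.com",
        # The policy must already exist in vCenter (step 1 of the sequence).
        "parameters": {"storagepolicyname": policy_name},
    }


def pvc_manifest(name: str, storage_class: str, size: str) -> dict:
    """PersistentVolumeClaim requesting a single-writer volume from the class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Placeholder names; substitute the real policy and claim names.
    sc = storage_class_manifest("vsan-stretched", "stretched-raid1")
    pvc = pvc_manifest("demo-data", "vsan-stretched", "5Gi")
    print(json.dumps(sc, indent=2))
    print(json.dumps(pvc, indent=2))
```

Each printed document can be saved to its own file and applied with kubectl apply -f, which accepts JSON as well as YAML.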

It is important to note that vSAN stretched clusters do not support ReadWriteMany volumes (abbreviated RWM here; Kubernetes abbreviates this access mode as RWX). This limitation means that such volumes cannot be mounted read-write by multiple pods on different nodes at the same time in the stretched configuration. However, vSAN stretched clusters do support file volumes backed by vSAN file shares, which provide an alternative for shared storage needs.

To ensure that volumes are provisioned even when the constraints of the storage policy cannot currently be satisfied, the Force provisioning option should be enabled in the storage policy.

Troubleshooting and Common Failure Modes

Despite the robustness of vSAN stretched clusters, integration with Kubernetes can encounter issues, particularly during the initial deployment of StatefulSets. A documented failure mode occurs when pods remain in a Pending state indefinitely during the creation of a test service.

In this scenario, the vSphere web UI may report that a container volume was requested, but no container volumes are actually listed in the targeted vSAN cluster. This often manifests as a "No compatible datastore found for storagePolicy" error. This issue has been observed in environments using vSAN 7.0 and vCenter 7.0, even when using a fresh cluster deployed via Kubespray with default values and an API account with full privileges.

The root cause of such failures typically relates to a mismatch between the requested storage policy and the available datastores that can satisfy the stretched cluster requirements. If the underlying vSAN cluster is not correctly configured to support the requested policy (e.g., insufficient hosts for the 2n + 1 mirroring requirement), the CSI driver will fail to provision the volume, leaving the Kubernetes pod in a pending state.
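When diagnosing this failure mode, a useful first step is to identify which claims are stuck and which storage class they reference. The sketch below assumes the output of kubectl get pvc --all-namespaces -o json has already been parsed into a dictionary (for example with json.load); the function name pending_claims and the sample data are hypothetical.

```python
def pending_claims(pvc_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, storageClassName) for each PVC still Pending."""
    pending = []
    for item in pvc_list.get("items", []):
        if item.get("status", {}).get("phase") == "Pending":
            meta = item.get("metadata", {})
            spec = item.get("spec", {})
            pending.append((
                meta.get("namespace", "default"),
                meta.get("name", ""),
                spec.get("storageClassName", ""),
            ))
    return pending


# Hand-written sample shaped like a parsed PVC list; not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "www-web-0"},
            "spec": {"storageClassName": "vsan-stretched"},
            "status": {"phase": "Pending"},
        },
        {
            "metadata": {"namespace": "default", "name": "logs"},
            "spec": {"storageClassName": "vsan-stretched"},
            "status": {"phase": "Bound"},
        },
    ]
}

if __name__ == "__main__":
    for namespace, name, storage_class in pending_claims(sample):
        print(f"{namespace}/{name} is Pending (storage class: {storage_class})")
```

The storage class of each pending claim points back to the storage policy it references, which can then be checked against the host count available in each site.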

Failure Analysis and Recovery Impact

The value of a vSAN stretched cluster is best understood by analyzing how it handles various failure scenarios. The following table outlines the impact of different failure types on a Kubernetes deployment.

Failure Scenario	Impact and System Response
Entire primary site failure	Control plane and worker nodes fail over to the secondary site; data remains available via mirrored copies.
Several hosts fail on secondary site	The cluster remains operational; data is still available on the primary site and surviving secondary hosts.
Entire secondary site failure	The cluster continues to operate on the primary site; data remains available as the primary copy is intact.
Intersite network failure	The cluster may enter a partitioned state; vSAN handles the split-brain scenario based on the witness and site configuration.
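
The site-level rows of this table can be approximated with a deliberately simplified quorum model. The sketch below assumes one vote each for the primary site, the secondary site, and the witness, and treats a mirrored object as accessible while a majority of those three is reachable. Real vSAN vote assignment is per component and also depends on the failures tolerated within each site, so this is an illustration of the principle rather than a description of the actual algorithm.

```python
def object_accessible(primary_up: bool, secondary_up: bool, witness_up: bool) -> bool:
    """Simplified model: accessible while at least two of three votes are reachable."""
    return sum([primary_up, secondary_up, witness_up]) >= 2


# Scenario name -> (primary_up, secondary_up, witness_up)
scenarios = {
    "all healthy": (True, True, True),
    "entire primary site failure": (False, True, True),
    "entire secondary site failure": (True, False, True),
    "primary site and witness both lost": (False, True, False),
}

if __name__ == "__main__":
    for label, state in scenarios.items():
        status = "accessible" if object_accessible(*state) else "inaccessible"
        print(f"{label}: {status}")
```

Under this model, losing either data site alone leaves the object accessible, while losing a data site together with the witness does not.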

Upgrading Kubernetes and Persistent Volumes

For environments that already have Kubernetes deployments running on a standard vSAN datastore, transitioning to a vSAN stretched cluster is possible through an upgrade process. This allows for the introduction of site-level redundancy without needing to rebuild the entire cluster from scratch.

The upgrade process involves the following steps:

The existing VM storage policy used for provisioning volumes and node VMs on the vSAN cluster must be edited to add the necessary stretched cluster parameters.
The updated storage policy must be applied to all objects within the cluster to ensure consistency across the infrastructure.
Specific attention must be given to persistent volumes that exhibit an Out of date status. The updated storage policy must be manually applied to these volumes to ensure they are correctly mirrored across the stretched sites.
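
The final step can be supported by a simple filter over a compliance report. The sketch below assumes the report has been exported as a list of records with "name" and "compliance" fields; that record shape, the sample data, and the function name volumes_needing_reapply are hypothetical, and the status string matches the Out of date status described above.

```python
def volumes_needing_reapply(volumes: list[dict]) -> list[str]:
    """Names of volumes whose compliance status is 'Out of date'."""
    return [v["name"] for v in volumes if v.get("compliance") == "Out of date"]


# Hand-written sample report; real data would come from the vSphere inventory.
report = [
    {"name": "pvc-1a2b", "compliance": "Compliant"},
    {"name": "pvc-3c4d", "compliance": "Out of date"},
    {"name": "pvc-5e6f", "compliance": "Out of date"},
]

if __name__ == "__main__":
    for name in volumes_needing_reapply(report):
        print(f"Reapply the updated storage policy to {name}")
```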

This upgrade path ensures that existing workloads are brought into compliance with the stretched cluster architecture, enabling the high-availability benefits of site mirroring for previously single-site volumes.

Summary of Technical Specifications

The following table summarizes the critical technical requirements and limitations for deploying Kubernetes on vSAN stretched clusters.

Feature	Requirement / Specification
Site Disaster Tolerance	Dual site mirroring
Host Requirement for Mirroring	2n + 1 hosts per site to tolerate n failures
Replication Options	RAID-1 (Performance), RAID-5/6 (Space Efficiency, All-Flash only)
Volume Support	RWM volumes not supported; vSAN file shares supported
Topology Awareness	Not supported for specific fault domain provisioning
Node Placement	Controlled via VM-Host affinity rules
Storage Policy Scope	Must be identical for node VMs and PVs

Conclusion

The deployment of Kubernetes on vSAN stretched clusters provides a powerful mechanism for ensuring business continuity. By integrating vSphere's site-level redundancy with Kubernetes' orchestration capabilities, organizations can build a resilient infrastructure capable of surviving the loss of an entire data center. The core of this success lies in the rigid application of VM storage policies. Because Kubernetes does not automatically synchronize these policies between the nodes and the volumes, the vSphere administrator must act as the orchestrator of storage consistency.

The transition from single-site to stretched cluster environments requires a deep understanding of mirroring, site affinity, and the limitations of the CSI driver, particularly regarding the lack of RWM support and topology-based provisioning. While challenges such as No compatible datastore errors may arise, they typically indicate a failure to align the physical host count with the requirements of the storage policy. Ultimately, the use of RAID-1 mirroring and the implementation of VM-Host affinity rules create a stable environment where the control plane and worker nodes can fail over seamlessly, ensuring that critical containerized services remain online regardless of site-specific catastrophes.