Deploying Elasticsearch and Kibana on Kubernetes changes how a distributed search and analytics engine is operated. By relying on Kubernetes for orchestration, organizations can move from manual, error-prone installations to a cloud-native operational model. This approach combines what Kubernetes provides (scaling, self-healing, and resource abstraction) with the indexing and search capabilities of the Elastic Stack. The integration is primarily handled by the Elastic Cloud on Kubernetes (ECK) operator, which turns the lifecycle of an Elasticsearch cluster into a set of declarative state definitions.
In a traditional deployment, an administrator would be responsible for the manual configuration of every node, the painstaking setup of TLS certificates, and the complex orchestration of cluster discovery. In contrast, the Kubernetes-native approach utilizes the Operator pattern. This pattern allows the system to watch for custom resources and automatically execute the necessary steps to reach the desired state. Whether deploying on vanilla Kubernetes or managed distributions such as Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service (AKS), or Red Hat OpenShift, the objective remains the same: to decouple the operational complexity of the software from the underlying infrastructure.
The Elastic Cloud on Kubernetes Operator Framework
The Elastic Cloud on Kubernetes (ECK) offering is built upon the Kubernetes Operator pattern. An operator is essentially a method of packaging, deploying, and managing a Kubernetes application. For the Elastic Stack, this means that the orchestration capabilities of Kubernetes are extended to specifically support the nuances of Elasticsearch and Kibana.
The operator functions by watching for specific custom resources. When a user submits a manifest defining an Elasticsearch cluster, the operator detects this change and begins the process of creating the necessary pods, services, and persistent volume claims. This automation extends beyond the initial deployment to include the entire lifecycle of the cluster.
The specific automation capabilities provided by ECK include:
- Cluster discovery: The operator manages how nodes find each other to form a cohesive cluster, removing the need for manual seed host configuration in many scenarios.
- TLS and cert management: Security is handled automatically, ensuring that communication between nodes and between the client and the cluster is encrypted.
- Upgrades: Version transitions are orchestrated by the operator to ensure minimal downtime and data integrity.
- Node scaling: Adding or removing capacity is as simple as updating the count in the cluster specification.
- Monitoring setup: When integrated with the rest of the Elastic Stack, monitoring for the cluster is automatically configured.
The impact of this framework is a significant reduction in the operational burden on DevOps teams. Instead of executing a series of imperative commands, engineers define the end state in a YAML file, and the operator ensures that state is maintained. In practice, the operator continuously reconciles the resources it sees through the Kubernetes API with the state reported by Elasticsearch: if a pod fails, Kubernetes reschedules it and Elasticsearch reallocates shards as needed.
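As a rough sketch of what such a declarative change looks like, the fragment below edits an existing Elasticsearch resource to scale out and upgrade in one step. The cluster name quickstart, the node set name default, and both version numbers are illustrative assumptions. The updateStrategy.changeBudget setting bounds how many pods the operator may add or take down at once; the values shown are not a recommendation.

```yaml
# Sketch only: names and versions are illustrative assumptions.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.11.1          # was 8.11.0; changing this triggers a rolling upgrade
  updateStrategy:
    changeBudget:
      maxSurge: 1          # at most one pod above the target count during changes
      maxUnavailable: 1    # at most one pod unavailable during changes
  nodeSets:
  - name: default
    count: 3               # was 1; the operator adds two nodes to reach this count
    # remaining nodeSet fields (config, podTemplate, ...) omitted for brevity
```

Re-applying the edited manifest is the whole procedure; the operator works out the individual steps needed to reach the new state.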
Deployment Methodologies and Comparative Analysis
Depending on the level of control required and the existing expertise of the operations team, there are several ways to deploy Elasticsearch on Kubernetes. These methods range from low-level primitive resources to high-level automated operators.
The following table provides a detailed comparison of the available deployment strategies:
| Method | Best For | Control Level | Complexity | Notes |
|---|---|---|---|---|
| Raw YAML | Custom setups, full control | High | Medium | Best for fine-tuned ops teams |
| Helm | Quick deployments, CI/CD integration | Medium | Low | Easy to install, flexible via values.yaml |
| ECK | Full Elastic Stack automation | Low | Low | Operator-managed, but opinionated |
The Raw YAML approach is ideal for teams that want to tune every aspect of the deployment, from the exact layout of the nodes to the specific volume provisioning, without relying on an external package manager. This typically involves using StatefulSets to ensure that pods maintain a stable identity and persistent storage.
Helm provides a middle ground, offering a standardized way to package the application. It is highly effective for CI/CD pipelines where different environments (development, staging, production) require different configurations, which can be easily managed via a values.yaml file.
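As an illustration of per-environment configuration, a values.yaml override for a production environment might look like the following. This is a sketch that assumes the key names used by Elastic's elasticsearch Helm chart (replicas, esJavaOpts, resources, volumeClaimTemplate); other charts use different keys, so the chart's own values.yaml should be checked before relying on it. The file name values-production.yaml and the sizes are hypothetical.

```yaml
# values-production.yaml (hypothetical file name)
# Key names assume Elastic's elasticsearch Helm chart; verify against the chart in use.
replicas: 3
esJavaOpts: "-Xms1g -Xmx1g"      # JVM heap, here half of the container memory
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "1"
    memory: 2Gi
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd     # assumes this StorageClass exists in the cluster
  resources:
    requests:
      storage: 100Gi
```

A development environment would use a second file with smaller values, and the pipeline would select between them with Helm's -f flag.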
The ECK operator is the most streamlined approach. It is designed for teams standardizing around the Elastic Stack who prioritize full lifecycle automation over granular, manual control. While it is more "opinionated" in how it manages the cluster, it significantly reduces the risk of human error during scaling and upgrades.
Technical Implementation via ECK
To implement an Elasticsearch cluster using the ECK operator, the operator must first be installed into the Kubernetes cluster. This process involves applying the Custom Resource Definitions (CRDs) and the operator deployment itself.
The installation is performed using the following commands:
kubectl apply -f https://download.elastic.co/downloads/eck/2.10.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.10.0/operator.yaml
Once the operator is active, a cluster can be defined. A basic quickstart manifest specifies the Elasticsearch version, the number of nodes, and any node-level settings.
For a simple single-node cluster:
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  # 8.11.0 matches the 3-node example in this article and the ECK 2.10.0 operator
  version: 8.11.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF
```
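In this configuration, node.store.allow_mmap: false disables memory-mapped index storage so that Elasticsearch can start without raising the kernel's vm.max_map_count setting on the host. This is convenient for a quick test, but it has performance implications and is generally not recommended for production workloads, where the virtual memory limit should be tuned instead.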
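For production, the usual alternative is to leave memory mapping enabled and raise vm.max_map_count on each host to at least 262144. One way to do that from the manifest, sketched below under the assumption that the cluster permits privileged init containers, is to add an init container to the pod template. Where privileged containers are not allowed, the setting has to be applied to the worker nodes by other means.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    # node.store.allow_mmap is left at its default (true) here
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          # Raises the mmap limit on the host before Elasticsearch starts
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
```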
For a more robust, production-ready 3-node cluster, the following specification is utilized:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 2Gi
              cpu: 1
            requests:
              memory: 2Gi
              cpu: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```
This configuration adds resource requests and limits, so each pod is guaranteed 2GiB of memory and one CPU. If no node in the Kubernetes cluster has that much unreserved capacity, the pod stays in the Pending state. The volumeClaimTemplates section requests 100GiB per Elasticsearch node from the fast-ssd storage class, replacing the default 1GiB volume used by the quickstart. Note that this manifest still sets node.store.allow_mmap: false; for real production use, that line would normally be removed once vm.max_map_count has been raised on the worker nodes.
Cluster Lifecycle and Monitoring
Once the Elasticsearch manifest is applied, the ECK operator begins the process of resource creation. This process is not instantaneous; it may take several minutes for all resources to be provisioned and for the cluster to become ready.
Monitoring the deployment progress can be achieved through several kubectl commands. To get an overview of the current Elasticsearch clusters, including their health and version, the following command is used:
kubectl get elasticsearch
During the initial startup, the PHASE column will be empty and there will be no HEALTH status. As the pods and services initialize, the PHASE will transition to Ready and the HEALTH status will become green. This health status is derived directly from Elasticsearch's internal cluster health API.
To monitor the status of the individual pods, the following command is used:
kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'
If a pod is still starting, it will report a status of Pending. To investigate the internal startup process or troubleshoot errors, the pod logs can be streamed:
kubectl logs -f quickstart-es-default-0
Additionally, the ECK operator automatically creates a ClusterIP service (named quickstart-es-http for this cluster), which provides HTTP access to the cluster from inside Kubernetes. Users can also inspect the Custom Resource Definition (CRD) to understand the full specification of the Elasticsearch resource:
kubectl describe crd elasticsearch
Storage Infrastructure and Volume Provisioning
Storage is the most critical component of an Elasticsearch deployment on Kubernetes. Because Elasticsearch is a stateful application, the choice of storage class and volume type directly impacts search performance, indexing throughput, and data durability.
The standard approach is to use a dedicated StorageClass to provision volumes. For example, using AWS EBS gp3 volumes provides a balance of cost and performance.
The following configuration defines a fast-ssd StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
When selecting storage, different options provide different trade-offs:
- EBS gp3: Provides predictable performance with the ability to set custom baseline IOPS and throughput.
- EBS io2: Best for large-scale indexing or search clusters with heavy disk activity, offering higher durability and guaranteed performance.
- Instance store (ephemeral): Extremely fast but non-persistent. This option is only viable when data can be fully rebuilt from upstream systems, such as log ingestion pipelines.
The use of ReadWriteOnce access mode is standard for Elasticsearch data volumes, ensuring that a volume is mounted by a single node at a time, which prevents data corruption in a distributed environment.
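The in-tree kubernetes.io/aws-ebs provisioner has been deprecated in favor of the AWS EBS CSI driver, so on recent clusters the fast-ssd class defined earlier is usually expressed against the CSI provisioner instead. The variant below is a sketch that assumes the EBS CSI driver is installed; the class name fast-ssd-csi is hypothetical and chosen only to avoid clashing with the earlier class.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-csi            # hypothetical name
provisioner: ebs.csi.aws.com    # requires the AWS EBS CSI driver
parameters:
  type: gp3
  iops: "3000"                  # gp3 baseline IOPS
  throughput: "125"             # MiB/s, gp3 baseline throughput
  encrypted: "true"             # optional: encrypt volumes at rest
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Keeping WaitForFirstConsumer means the volume is created in the availability zone where the pod is actually scheduled, which matters because an EBS volume can only attach to instances in its own zone.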
Networking, Node Awareness, and Resource Optimization
Elasticsearch is characterized as a "chatty" service. This means that inter-node communication for replication, cluster state updates, and shard allocation happens frequently and requires low latency.
To optimize networking, it is recommended to use instance types that support Enhanced Networking (ENA) and high network throughput. Examples include m5n, r5n, and c5n instances, which can provide 10 Gbps or more. To ensure these pods land on the correct hardware, nodeSelectors or affinity rules should be employed to pin critical pods to appropriate nodes.
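A sketch of how that pinning might look in an ECK node set is shown below. It assumes the worker nodes carry the standard node.kubernetes.io/instance-type label and that r5n.2xlarge is the chosen instance type; the instance size and the cluster name are illustrative.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            # Hard requirement: only schedule onto the listed instance types
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node.kubernetes.io/instance-type
                  operator: In
                  values:
                  - r5n.2xlarge   # illustrative instance type
```

A plain nodeSelector with the same label also works when only one instance type is acceptable; the affinity form allows several to be listed.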
A critical risk in Kubernetes deployments is the possibility of a primary shard and its replica being scheduled on the same physical node. If that node fails, both the primary and the replica are lost, leading to data loss. To mitigate this, shard allocation awareness must be configured.
The following settings should be applied to ensure replicas are distributed across distinct nodes:
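cluster.routing.allocation.awareness.attributes: k8s_node_name
node.attr.k8s_node_name: ${NODE_NAME}
This configuration tells Elasticsearch to use the Kubernetes node name as the awareness attribute, so that its shard allocator keeps a primary and its replicas on different physical machines. NODE_NAME is assumed here to be an environment variable populated from the pod's spec.nodeName field through the Downward API; the HOSTNAME variable inside a container holds the pod name, not the worker node name, so it would not group pods by machine.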
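With ECK, one way to make the worker node name available to Elasticsearch is the Kubernetes Downward API. Inside a container, HOSTNAME typically holds the pod name rather than the worker node name, so the sketch below assumes an environment variable named NODE_NAME (the name is arbitrary) populated from spec.nodeName. The cluster name and node count are illustrative, and an operator-managed cluster may already apply an equivalent default, which is worth checking before adding this by hand.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.11.0
  nodeSets:
  - name: default
    count: 3
    config:
      # Spread shard copies across distinct Kubernetes worker nodes
      cluster.routing.allocation.awareness.attributes: k8s_node_name
      node.attr.k8s_node_name: ${NODE_NAME}
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: NODE_NAME            # assumed variable name
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName   # the worker node running this pod
```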
Alternative Deployment via StatefulSets
For teams that prefer a manual approach over the ECK operator, a StatefulSet is the standard Kubernetes primitive. A StatefulSet provides guarantees about the ordering and uniqueness of pods, which is essential for Elasticsearch.
A basic 3-node cluster using a StatefulSet is defined as follows:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: elastic-system
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: inter-node
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms1g -Xmx1g"
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi
```
This configuration provides several key mechanisms:
- Stable Hostnames: Each pod is given a predictable name (e.g., elasticsearch-0), which is critical for the discovery.seed_hosts configuration.
- Inter-node Communication: Communication for cluster coordination is handled on port 9300.
- Memory Management: The ES_JAVA_OPTS environment variable is used to set the JVM heap size, in this case, to 1GiB.
- Data Persistence: The volumeClaimTemplates ensure that each pod gets its own 100GiB volume from the fast-ssd storage class.
To deploy this configuration, the following command is used:
kubectl apply -f elasticsearch-statefulset.yaml
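The StatefulSet's serviceName refers to a headless Service, which has to exist for the per-pod DNS names used in discovery.seed_hosts to resolve. The manifest above does not include one; a minimal sketch, assuming it lives in the same elastic-system namespace and selects the same app: elasticsearch label, could look like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch        # must match spec.serviceName in the StatefulSet
  namespace: elastic-system
spec:
  clusterIP: None            # headless: DNS resolves to individual pod addresses
  selector:
    app: elasticsearch
  ports:
  - name: rest
    port: 9200
  - name: inter-node
    port: 9300
```

With this in place, each pod is reachable as elasticsearch-N.elasticsearch within the namespace, which is the form the seed host list relies on.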
Integration with Kibana
Elasticsearch does not operate in a vacuum; it requires a visualization layer for data exploration and cluster management. Kibana serves as the primary UI for querying, visualizing, and debugging Elasticsearch data.
When deployed alongside Elasticsearch in a Kubernetes cluster, Kibana can be managed by the same ECK operator, which handles the connection between the visualization layer and the data indexing layer. This allows administrators to monitor the health of the cluster and visualize logs, metrics, and traces in near real time. For teams handling observability data, this integration makes it easier to spot problems such as high-cardinality fields and ingestion spikes.
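A minimal sketch of a Kibana resource managed by ECK is shown below. It assumes an Elasticsearch resource named quickstart already exists in the same namespace and that both run version 8.11.0; the operator uses elasticsearchRef to wire up the connection and credentials.

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
spec:
  version: 8.11.0          # should match the Elasticsearch version
  count: 1                 # number of Kibana instances
  elasticsearchRef:
    name: quickstart       # Elasticsearch resource in the same namespace
```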
Conclusion
The deployment of Elasticsearch on Kubernetes is a sophisticated operation that requires a balanced approach to storage, networking, and resource allocation. The transition from raw YAML and StatefulSets to the Elastic Cloud on Kubernetes (ECK) operator represents a move toward full lifecycle automation, reducing the risk of operational failure while increasing the agility of the deployment.
The success of such a deployment hinges on three primary factors: storage performance, network throughput, and high availability. Utilizing fast SSD storage via appropriate StorageClasses ensures that indexing and search operations do not become I/O bound. Ensuring high network throughput through ENA-enabled instances prevents the "chatty" nature of Elasticsearch from becoming a bottleneck. Finally, implementing shard allocation awareness prevents catastrophic data loss by ensuring that replicas are physically separated across Kubernetes worker nodes.
For teams starting out, the recommendation is to begin with a small deployment on EKS or a similar managed service to reduce the initial setup overhead. As the cluster grows and the data becomes more complex, particularly with observability logs and metrics, tuning the resource limits and moving toward a dedicated operator like ECK provides the necessary stability and scalability. When tuned correctly, Elasticsearch on Kubernetes provides a robust, self-healing infrastructure capable of handling demanding search and analytics workloads.