Modern microservices architectures generate a deluge of log data that demands centralized aggregation, parsing, and visualization. The ELK stack (Elasticsearch, Logstash, and Kibana) remains a widely used choice for handling these data streams, offering scalable search, log processing, and interactive dashboards. However, deploying this stack in a Kubernetes environment introduces specific complexities around stateful storage, resource allocation, and service discovery. This analysis details the methods for deploying ELK on Kubernetes, ranging from cloud-managed clusters with Helm charts to bare-metal or Minikube setups using native Kubernetes manifests. Understanding how Elasticsearch nodes and shards work, alongside the configuration of Logstash pipelines and Filebeat agents, is critical for establishing a robust observability infrastructure.
Architectural Foundations and Component Roles
Before initiating deployment, it is essential to understand the distinct roles and internal architecture of each ELK component, as these characteristics dictate their Kubernetes deployment strategies.
- Elasticsearch serves as the scalable search and analytics engine. It functions both as a log analytics tool and as a document-oriented data store, making it suitable for data-driven applications. A cluster is made up of nodes, each a running Elasticsearch instance that handles search and analytics tasks. Data is organized into indices, and each index is divided into shards that are distributed across the nodes, which speeds up data access and spreads the load across the cluster.
- Logstash acts as the log-processing intermediary. It collects logs from various sources, parses them into a structured format, and transmits them to Elasticsearch for storage and analysis. This component is crucial for transforming raw, unstructured log data into queryable information.
- Kibana provides the visualization layer. It is a powerful tool that allows users to explore and analyze data stored in Elasticsearch through interactive charts, graphs, and dashboards. This front-end interface is where the aggregated data becomes actionable intelligence for operations and development teams.
Establishing the Kubernetes Infrastructure
The deployment method varies significantly depending on whether the target environment is a cloud-managed service like Google Kubernetes Engine (GKE) or an on-premises or Minikube setup. Resource requirements, particularly memory, are a critical consideration for Elasticsearch.
Google Kubernetes Engine Configuration
Setting up a Kubernetes cluster on Google Cloud involves configuring specific environment variables and selecting appropriate machine types. By default, GKE creates clusters with nodes of the e2-medium machine type, which provides 2 vCPUs and 4 GB of memory. This configuration is insufficient for running Elasticsearch effectively. Therefore, administrators must select a machine type with higher memory capacity.
To list available machine types in a specific location, the following command is used:
```bash
gcloud compute machine-types list --filter="us-central1-a"
```
After selecting a suitable machine type, such as e2-standard-4 (4 vCPUs and 16 GB of memory), the cluster is created using the gcloud CLI. The process involves defining the cluster name and location:
```bash
LOCATION=us-central1-a
CLUSTER_NAME=kubetest
gcloud container clusters create $CLUSTER_NAME \
--zone $LOCATION \
--node-locations $LOCATION \
--machine-type e2-standard-4
```
This command results in a running cluster with the specified configuration. The output typically displays details such as the master version, master IP, machine type, node version, and status. For example, a successful creation might show a cluster named kubetest in us-central1-a with a master version of 1.28.8-gke.1095000 and a running status.
To remove the cluster after testing or decommissioning, the following command is executed:
```bash
gcloud container clusters delete $CLUSTER_NAME --location $LOCATION
```
Bare-Metal and Minikube Considerations
For environments outside of major cloud providers, such as bare-metal servers or Minikube, the setup process relies on native Kubernetes manifests rather than Helm. This approach has been tested on Minikube and bare-metal Kubernetes clusters. The initial step involves cloning the necessary repository to download the configuration files locally:
```bash
git clone https://github.com/hussainaphroj/ELK-kubernetes.git
```
Administrators may also need to set up the Kubernetes cluster itself using a separate setup repository if one does not already exist.
Deploying Elasticsearch
The deployment strategy for Elasticsearch differs based on the tooling available. Helm charts offer a streamlined approach for GKE, while native manifests are used for Minikube and bare-metal setups.
Helm-Based Deployment on GKE
When using GKE, Helm charts from the official Elastic repository are the preferred method for deployment. Before proceeding, Helm must be installed on the local machine. The first step is to add the Elastic Helm charts repository to the local Helm configuration.
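The repository step, and the install that follows it, might look like the sketch below. The URL https://helm.elastic.co is Elastic's published chart repository; the release name `elasticsearch` is an assumption, and the `monit` namespace is taken from the credential commands in this section. The Helm calls are wrapped in a function so that nothing runs until it is invoked.

```bash
# Sketch only: the release name is an assumption; the namespace follows
# the credential commands used elsewhere in this guide.
NAMESPACE="${NAMESPACE:-monit}"
RELEASE="${RELEASE:-elasticsearch}"

install_elasticsearch() {
  # Register Elastic's chart repository and refresh the local index.
  helm repo add elastic https://helm.elastic.co
  helm repo update
  # Install the chart into its own namespace, creating it if needed.
  helm install "$RELEASE" elastic/elasticsearch \
    --namespace "$NAMESPACE" --create-namespace
}

# Usage: install_elasticsearch
```

Chart values such as replica count, heap size, and storage class would normally be overridden with `--set` or a values file; they are left at their defaults here.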
Once the repository is added, Elasticsearch is deployed using Helm. After the deployment is complete, it is necessary to retrieve the credentials for accessing the Elasticsearch cluster. The username and password are stored in Kubernetes secrets and must be decoded from Base64 format.
To retrieve the username:
```bash
kubectl get secrets --namespace=monit elasticsearch-master-credentials -ojsonpath='{.data.username}' | base64 -d
```
To retrieve the password:
```bash
kubectl get secrets --namespace=monit elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
```
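The two lookups can be folded into a small helper and used to test the cluster. This is a minimal sketch, assuming the service has been forwarded locally (for example `kubectl port-forward -n monit svc/elasticsearch-master 9200:9200`, where the service name is an assumption based on the chart's defaults) and that the chart serves HTTPS with a self-signed certificate. The function names are hypothetical.

```bash
NAMESPACE="${NAMESPACE:-monit}"
SECRET="${SECRET:-elasticsearch-master-credentials}"
ES_URL="${ES_URL:-https://localhost:9200}"   # assumes a local port-forward

# Print one decoded field (username or password) from the secret.
secret_field() {
  kubectl get secret "$SECRET" --namespace "$NAMESPACE" \
    -o "jsonpath={.data.$1}" | base64 -d
}

# Call the cluster root endpoint with the decoded credentials.
# -k skips certificate verification, acceptable only for a local test.
es_ping() {
  curl -fsSk -u "$(secret_field username):$(secret_field password)" "$ES_URL"
}

# Usage: es_ping
```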
Native Manifest Deployment for Minikube and Bare-Metal
In environments without Helm, the deployment is handled through direct application of YAML manifests. The process begins by creating a service account with read access to services, endpoints, and namespaces. This is achieved by applying the rbac.yml file:
```bash
kubectl apply -f rbac.yml
```
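The contents of rbac.yml are not reproduced here, so the following is only a sketch of what a manifest granting that read access could look like. The resource name `elasticsearch-logging` and the `kube-system` namespace are assumptions inferred from the port-forward command used later; the file is written under a different name so the repository's own rbac.yml is left untouched.

```bash
# Illustrative only: writes a sketch next to, not over, the repository's rbac.yml.
cat > rbac-sketch.yml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elasticsearch-logging
rules:
  # Read-only verbs on the three resource types named in the text.
  - apiGroups: [""]
    resources: ["services", "endpoints", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elasticsearch-logging
subjects:
  - kind: ServiceAccount
    name: elasticsearch-logging
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-logging
EOF

# Validate client-side before applying:
#   kubectl apply --dry-run=client -f rbac-sketch.yml
```

A ClusterRole is used rather than a namespaced Role because namespaces are a cluster-scoped resource.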
Next, the Elasticsearch cluster is established using a StatefulSet. This ensures that persistent storage is correctly associated with each node. The deployment is executed with:
```bash
kubectl apply -f elastic.yml
```
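To illustrate how a StatefulSet ties storage to each Elasticsearch pod, here is a minimal single-node sketch. It is not the repository's elastic.yml: the image tag, heap size, storage size, labels, and resource names are all assumptions chosen for a small test cluster. The key part is `volumeClaimTemplates`, which gives every replica its own PersistentVolumeClaim.

```bash
# Illustrative only: image tag, heap size and storage size are assumptions.
ES_IMAGE="${ES_IMAGE:-docker.elastic.co/elasticsearch/elasticsearch:7.17.10}"

cat > elastic-sketch.yml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
spec:
  serviceName: elasticsearch-logging
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-logging
  template:
    metadata:
      labels:
        app: elasticsearch-logging
    spec:
      securityContext:
        fsGroup: 1000            # lets the elasticsearch user write to the volume
      containers:
        - name: elasticsearch
          image: ${ES_IMAGE}
          env:
            - name: discovery.type
              value: single-node
            - name: ES_JAVA_OPTS
              value: "-Xms1g -Xmx1g"
          ports:
            - containerPort: 9200
              name: http
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  # One PersistentVolumeClaim is created per replica from this template.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF

# Validate client-side before applying:
#   kubectl apply --dry-run=client -f elastic-sketch.yml
```

A multi-node cluster would replace `discovery.type: single-node` with seed-host and initial-master settings and raise `replicas`, which is beyond this sketch.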
Following the StatefulSet creation, an Elasticsearch service is created to facilitate network access within the cluster:
```bash
kubectl apply -f elastic-service.yml
```
To verify the deployment, administrators can forward the service ports to the local machine. This allows access to the Elasticsearch API via a web browser:
```bash
kubectl port-forward -n kube-system svc/elasticsearch-logging 9200:9200
```
Accessing http://localhost:9200 in a browser should return the cluster information, confirming that the Elasticsearch cluster is operational.
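The same check can be scripted from a terminal. The sketch below assumes the port-forward is still running and that the cluster answers over plain HTTP without authentication; the helper names are hypothetical. It reads the cluster health status and shows how the `_cat` endpoints expose the nodes and shards described earlier.

```bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumes the port-forward is running

# Fetch an Elasticsearch endpoint relative to ES_URL.
es_get() {
  curl -fsS "$ES_URL/$1"
}

# Extract the "status" value (green, yellow or red) from cluster health.
es_status() {
  es_get '_cluster/health' | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Usage, once the port-forward is active:
#   es_status                 # overall cluster state
#   es_get '_cat/nodes?v'     # one row per Elasticsearch node
#   es_get '_cat/shards?v'    # shard placement across nodes
```

A single-node test cluster commonly reports yellow once indices with replicas exist, because replica shards cannot be placed on the same node as their primaries.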
Integrating Logstash and Filebeat
With Elasticsearch running, the next phase involves setting up the data ingestion pipeline. Filebeat ships the logs to Logstash, which processes them and forwards them to Elasticsearch.
Logstash Deployment
Logstash is responsible for receiving logs from various sources and formatting them in a way that Elasticsearch can understand. In native manifest deployments, this involves applying both the configuration and the deployment files:
```bash
kubectl apply -f logstash-config.yml && kubectl apply -f logstash-deployment.yml
```
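For orientation, a Logstash pipeline for this flow typically has three stages: a Beats input, optional filters, and an Elasticsearch output. The ConfigMap below is a sketch rather than the repository's logstash-config.yml; the ConfigMap name, the `elasticsearch-logging` host, and the JSON filter are assumptions. The `logstash-%{+YYYY.MM.dd}` index name is what makes the `logstash-*` pattern in Kibana match.

```bash
# Illustrative only: names and the filter stage are assumptions.
cat > logstash-config-sketch.yml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config-sketch
  namespace: kube-system
data:
  logstash.conf: |
    input {
      # Filebeat connects here.
      beats {
        port => 5044
      }
    }
    filter {
      # Parse JSON log lines; leave non-JSON lines untouched.
      json {
        source => "message"
        skip_on_invalid_json => true
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch-logging:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }
EOF

# Validate client-side before applying:
#   kubectl apply --dry-run=client -f logstash-config-sketch.yml
```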
Filebeat Agent Deployment
Filebeat is a lightweight shipper for logs. To ensure that logs from all nodes in the cluster are collected, Filebeat is deployed as a DaemonSet. This Kubernetes resource ensures that a Filebeat pod runs on every node in the cluster.
In typical Kubernetes environments, such as Amazon EKS, container logs are stored in the /var/log/containers directory. Administrators should verify the existence of files in this directory on their nodes. If the directory is empty or logs are not appearing, the path configuration in the Filebeat manifest may need to be adjusted to match the specific node's log location.
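One way to perform that check is a small helper run directly on a node (for example inside a `minikube ssh` session). This is a sketch with a hypothetical function name; it counts `*.log` entries in the given directory and prints 0 if the directory is missing or empty.

```bash
# Count container log entries in a directory (defaults to /var/log/containers).
# Entries there are usually symlinks, so no -type filter is applied.
count_container_logs() {
  local dir="${1:-/var/log/containers}"
  find "$dir" -maxdepth 1 -name '*.log' 2>/dev/null | wc -l | tr -d ' '
}

# Usage on a node:
#   count_container_logs                  # default path
#   count_container_logs /var/log/pods    # try an alternative location
```

A result of 0 on the default path suggests the Filebeat input path needs to point somewhere else on that node.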
The Filebeat DaemonSet is deployed using:
```bash
kubectl apply -f filebeat-daemon-set.yml
```
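The Filebeat settings that matter most for this pipeline are the input path and the Logstash output. The ConfigMap below sketches them under stated assumptions: the ConfigMap name and the `logstash:5044` service address are hypothetical, and `NODE_NAME` is expected to be injected into the DaemonSet pod from `spec.nodeName`. It is not the repository's manifest.

```bash
# Illustrative only: names and the Logstash address are assumptions.
cat > filebeat-config-sketch.yml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config-sketch
  namespace: kube-system
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
    processors:
      # Enrich each event with pod, namespace and label metadata.
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
    output.logstash:
      hosts: ["logstash:5044"]
EOF

# Validate client-side before applying:
#   kubectl apply --dry-run=client -f filebeat-config-sketch.yml
```

The DaemonSet itself would mount this file and the node's log directories into the Filebeat container as read-only volumes.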
Visualizing Data with Kibana
The final component of the stack is Kibana, which provides the user interface for exploring the aggregated logs.
Kibana Deployment and Access
In native Kubernetes setups, Kibana is deployed using a dedicated manifest file:
```bash
kubectl apply -f kibana.yml
```
The service type used for Kibana determines how it is accessed. A LoadBalancer service type is commonly used, but a NodePort type can also be utilized. If using NodePort, the service is accessed via the node's IP address and the assigned port. On Minikube, where a LoadBalancer service does not receive an external IP unless minikube tunnel is running, the service can be exposed and its URL retrieved using:
```bash
minikube service kibana-logging -n kube-system
```
This command provides a URL that can be opened in a web browser to access the Kibana interface.
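For the NodePort alternative, a Service could be declared as in the sketch below. The `kibana-logging` name and `kube-system` namespace follow the command above; the selector label and the node port 30601 are assumptions (any free port in the default 30000-32767 range would do).

```bash
# Illustrative only: selector label and nodePort value are assumptions.
cat > kibana-service-sketch.yml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
spec:
  type: NodePort
  selector:
    app: kibana-logging
  ports:
    - port: 5601          # Kibana's default HTTP port
      targetPort: 5601
      nodePort: 30601     # reachable at http://<node-ip>:30601
EOF

# Validate client-side before applying:
#   kubectl apply --dry-run=client -f kibana-service-sketch.yml
```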
Index Pattern Creation and Log Discovery
Once Kibana is accessible, the next step is to configure it to read the data from Elasticsearch. This involves creating an index pattern, which recent Kibana versions call a data view. In the Kibana interface, navigate to the "Discover" console, create an index pattern, typically named logstash-*, and select the @timestamp field as the time filter.
With the index pattern in place, the logs can be explored and visualized. If logs do not appear immediately, it may be necessary to generate traffic to the deployed applications. A simple web application can be deployed to test the pipeline:
```bash
kubectl apply -f web-deployment.yml
```
Once the application is running, making requests to it will generate logs. These logs should then appear in the Kibana "Discover" tab. Users can filter logs based on Kubernetes label names and error types to troubleshoot issues or monitor application health.
Validation and Troubleshooting
A critical aspect of ELK deployment is validation. After all components are running, administrators should verify the end-to-end flow of data.
For EKS-based setups, administrators navigate to the eks/manifests folder in the cloned repository and deploy a test application:
```bash
kubectl apply -f app -n default
```
After the application is installed, revisiting Kibana allows the index pattern to be created over the newly populated indices. If logs are not visible, troubleshooting steps include checking the Filebeat configuration to ensure it is pointing to the correct log directory (e.g., /var/log/containers in EKS) and verifying that Logstash is successfully receiving and parsing the data before sending it to Elasticsearch.
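These checks can be scripted by working backwards through the pipeline: first ask Elasticsearch whether any logstash-* indices exist, then read the logs of Logstash and Filebeat. The sketch below assumes a local port-forward to Elasticsearch over plain HTTP and the kube-system namespace; the workload names in the usage comments are hypothetical and should be replaced with the names from the applied manifests.

```bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumes a local port-forward
NAMESPACE="${NAMESPACE:-kube-system}"

# List logstash-* indices with their document counts;
# an empty result means nothing has been indexed yet.
list_log_indices() {
  curl -fsS "$ES_URL/_cat/indices/logstash-*?h=index,docs.count"
}

# Show recent output from a workload given as TYPE/NAME.
tail_logs() {
  kubectl logs --namespace "$NAMESPACE" "$1" --tail=50
}

# Usage (workload names are assumptions):
#   list_log_indices
#   tail_logs deployment/logstash
#   tail_logs daemonset/filebeat
```

If indices exist but Kibana shows nothing, the index pattern or the selected time range is the more likely cause than the ingestion pipeline.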
Conclusion
Deploying the ELK stack on Kubernetes gives organizations centralized log analysis and data-driven insights. Whether utilizing Helm charts on a cloud-managed GKE cluster or native manifests on Minikube and bare-metal infrastructure, the core principles remain consistent: ensure adequate resources for Elasticsearch, configure secure access, and establish a reliable pipeline from Filebeat through Logstash to Elasticsearch. Kibana then transforms this raw data into meaningful visualizations, enabling efficient log management and troubleshooting. By understanding the architectural components (nodes, shards, and indices) and the deployment mechanics of each service, administrators can build a scalable and resilient observability platform that handles large data streams.