Orchestrating Centralized Logging: Deploying the ELK Stack on Kubernetes via Helm and Native Manifests

Modern microservices architectures generate a large volume of log data that needs centralized aggregation, parsing, and visualization. The ELK stack (Elasticsearch, Logstash, and Kibana) remains one of the most widely used toolsets for this job, offering scalable search, log processing, and interactive dashboards. Deploying it on Kubernetes, however, raises specific questions about stateful storage, resource allocation, and service discovery. This article walks through two deployment paths: Helm charts on a cloud-managed cluster, and native Kubernetes manifests on bare-metal or Minikube setups. Understanding how Elasticsearch nodes and shards work, and how Logstash pipelines and Filebeat agents are configured, is critical for building a robust observability infrastructure.

Architectural Foundations and Component Roles

Before initiating deployment, it is essential to understand the distinct roles and internal architecture of each ELK component, as these characteristics dictate their Kubernetes deployment strategies.

  • Elasticsearch serves as the scalable search and analytics engine. It works both as a log analytics tool and as a document-oriented data store, which makes it suitable for data-driven applications. A cluster is made up of nodes, each a server (or, on Kubernetes, a pod) running an Elasticsearch process that handles search and analytics tasks. Data is organized into indices, and each index is split into shards that are distributed across the nodes, which speeds up access and spreads the load.
  • Logstash acts as the log-processing intermediary. It collects logs from various sources, parses them into a structured format, and transmits them to Elasticsearch for storage and analysis. This component is crucial for transforming raw, unstructured log data into queryable information.
  • Kibana provides the visualization layer. It is a powerful tool that allows users to explore and analyze data stored in Elasticsearch through interactive charts, graphs, and dashboards. This front-end interface is where the aggregated data becomes actionable intelligence for operations and development teams.

Establishing the Kubernetes Infrastructure

The deployment method depends on whether the target environment is a cloud-managed service such as Google Kubernetes Engine (GKE) or an on-premises or Minikube setup. Resource requirements, particularly memory, are a critical consideration for Elasticsearch.

Google Kubernetes Engine Configuration

Setting up a Kubernetes cluster on Google Cloud involves configuring specific environment variables and selecting appropriate machine types. By default, GKE creates clusters with nodes of the e2-medium machine type, which provides 2 vCPUs and 4 GB of memory. This configuration is insufficient for running Elasticsearch effectively. Therefore, administrators must select a machine type with higher memory capacity.

To list available machine types in a specific location, the following command is used:

```bash
gcloud compute machine-types list --filter="us-central1-a"
```

After selecting a suitable machine type, such as e2-standard-4, the cluster is created using the GKE CLI. The process involves defining the cluster name and location:

```bash
LOCATION=us-central1-a
CLUSTER_NAME=kubetest

gcloud container clusters create $CLUSTER_NAME \
--zone $LOCATION \
--node-locations $LOCATION \
--machine-type e2-standard-4
```

This command results in a running cluster with the specified configuration. The output typically displays details such as the master version, master IP, machine type, node version, and status. For example, a successful creation might show a cluster named kubetest in us-central1-a with a master version of 1.28.8-gke.1095000 and a running status.

To remove the cluster after testing or decommissioning, the following command is executed:

```bash
gcloud container clusters delete $CLUSTER_NAME --location $LOCATION
```

Bare-Metal and Minikube Considerations

For environments outside of major cloud providers, such as bare-metal servers or Minikube, the setup process relies on native Kubernetes manifests rather than Helm. This approach has been tested on Minikube and bare-metal Kubernetes clusters. The initial step involves cloning the necessary repository to download the configuration files locally:

```bash
git clone https://github.com/hussainaphroj/ELK-kubernetes.git
```

Administrators may also need to set up the Kubernetes cluster itself using a separate setup repository if one does not already exist.

Deploying Elasticsearch

The deployment strategy for Elasticsearch differs based on the tooling available. Helm charts offer a streamlined approach for GKE, while native manifests are used for Minikube and bare-metal setups.

Helm-Based Deployment on GKE

When using GKE, Helm charts from the official Elastic repository are the preferred method for deployment. Before proceeding, Helm must be installed on the local machine. The first step is to add the Elastic Helm charts repository to the local Helm configuration.
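The repository and install steps might look like the following sketch. The URL is Elastic's public chart repository; the release name `elasticsearch` and the helper name `install_elasticsearch` are assumptions, and the `monit` namespace is taken from the credential commands used in this guide.

```bash
# Sketch: add the Elastic chart repository and install Elasticsearch.
# The release name "elasticsearch" is an assumption; the "monit" namespace
# matches the one used by the credential commands in this guide.
install_elasticsearch() {
  namespace="${1:-monit}"
  helm repo add elastic https://helm.elastic.co &&
  helm repo update &&
  helm install elasticsearch elastic/elasticsearch \
    --namespace "$namespace" --create-namespace
}
```

Calling `install_elasticsearch` with no argument targets the `monit` namespace; passing a different name as the first argument installs into that namespace instead.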

Once the repository is added, Elasticsearch is deployed using Helm. After the deployment is complete, it is necessary to retrieve the credentials for accessing the Elasticsearch cluster. The username and password are stored in Kubernetes secrets and must be decoded from Base64 format.

To retrieve the username:

```bash
kubectl get secrets --namespace=monit elasticsearch-master-credentials -ojsonpath='{.data.username}' | base64 -d
```

To retrieve the password:

```bash
kubectl get secrets --namespace=monit elasticsearch-master-credentials -ojsonpath='{.data.password}' | base64 -d
```
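Because the two commands differ only in the field name, they can be wrapped in one small helper. This is a minimal sketch: the secret name and namespace are the ones shown above, while the function name `es_secret_field` is hypothetical.

```bash
# Sketch: read one field of the Elasticsearch credentials secret and decode it.
# Secret name and default namespace are taken from the commands above;
# es_secret_field is a hypothetical helper name.
es_secret_field() {
  field="$1"                 # "username" or "password"
  namespace="${2:-monit}"
  kubectl get secrets --namespace="$namespace" elasticsearch-master-credentials \
    -ojsonpath="{.data.$field}" | base64 -d
}
```

A typical use would be `ES_USER=$(es_secret_field username)` and `ES_PASS=$(es_secret_field password)`, which keeps the decoded values in shell variables rather than printing them to the terminal.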

Native Manifest Deployment for Minikube and Bare-Metal

In environments without Helm, the deployment is handled through direct application of YAML manifests. The process begins by creating a service account with read access to services, endpoints, and namespaces. This is achieved by applying the rbac.yml file:

```bash
kubectl apply -f rbac.yml
```

Next, the Elasticsearch cluster is created as a StatefulSet. A StatefulSet gives each Elasticsearch pod a stable identity, so any persistent volume claimed by a pod stays associated with that pod across restarts. The deployment is executed with:

```bash
kubectl apply -f elastic.yml
```

Following the StatefulSet creation, an Elasticsearch service is created to facilitate network access within the cluster:

```bash
kubectl apply -f elastic-service.yml
```
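Since each manifest builds on the previous one, the three steps can be scripted so that a failure stops the sequence early. The sketch below uses the file names from the steps above; the function name `apply_elasticsearch` is hypothetical, and it assumes the files are in the current directory.

```bash
# Sketch: apply the three manifests in order and stop at the first failure.
# File names come from the steps above; apply_elasticsearch is a
# hypothetical helper name.
apply_elasticsearch() {
  for manifest in rbac.yml elastic.yml elastic-service.yml; do
    kubectl apply -f "$manifest" || {
      echo "failed to apply $manifest" >&2
      return 1
    }
  done
}
```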

To verify the deployment, administrators can forward the service ports to the local machine. This allows access to the Elasticsearch API via a web browser:

```bash
kubectl port-forward -n kube-system svc/elasticsearch-logging 9200:9200
```

Accessing http://localhost:9200 in a browser should return the cluster information, confirming that the Elasticsearch cluster is operational.
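The same check can be done from a terminal against the cluster health endpoint, which reports a status of green, yellow, or red. The helper below is a rough sketch that pulls that value out of the JSON response with `sed`; the name `cluster_status` is hypothetical, and a JSON-aware tool would be more robust.

```bash
# Sketch: extract the "status" value (green, yellow, or red) from the JSON
# returned by the _cluster/health endpoint, read from stdin.
# cluster_status is a hypothetical helper name.
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage, while the port-forward from the previous step is still running:
#   curl -s http://localhost:9200/_cluster/health | cluster_status
```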

Integrating Logstash and Filebeat

With Elasticsearch running, the next phase involves setting up the data ingestion pipeline. Logstash processes the logs, and Filebeat ships them to Logstash.

Logstash Deployment

Logstash is responsible for receiving logs from various sources and formatting them in a way that Elasticsearch can understand. In native manifest deployments, this involves applying both the configuration and the deployment files:

```bash
kubectl apply -f logstash-config.yml && kubectl apply -f logstash-deployment.yml
```
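The contents of logstash-config.yml are not reproduced here, so the following is only an illustration of the kind of pipeline such a configuration usually carries. It is a sketch under assumptions: port 5044 is the conventional Beats port, the Elasticsearch host reuses the `elasticsearch-logging` service name from the port-forward command, the daily `logstash-` index name is chosen to match the `logstash-*` pattern created in Kibana, and `write_pipeline` and the output path are hypothetical.

```bash
# Sketch: write a minimal Logstash pipeline to a local file for inspection.
# This is NOT the repository's logstash-config.yml; the port, host URL, and
# file path are assumptions made for illustration.
write_pipeline() {
  cat > "$1" <<'EOF'
input {
  beats { port => 5044 }                  # Filebeat connects here
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch-logging:9200"]
    index => "logstash-%{+YYYY.MM.dd}"    # matched by the logstash-* pattern
  }
}
EOF
}

write_pipeline "${TMPDIR:-/tmp}/logstash.conf"
```

Comparing a file like this with the pipeline section of the actual manifest is a quick way to confirm which port Filebeat must target and which index names to expect in Kibana.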

Filebeat Agent Deployment

Filebeat is a lightweight shipper for logs. To ensure that logs from all nodes in the cluster are collected, Filebeat is deployed as a DaemonSet. This Kubernetes resource ensures that a Filebeat pod runs on every node in the cluster.

In typical Kubernetes environments, such as Amazon EKS, container logs are stored in the /var/log/containers directory. Administrators should verify the existence of files in this directory on their nodes. If the directory is empty or logs are not appearing, the path configuration in the Filebeat manifest may need to be adjusted to match the specific node's log location.
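A quick way to perform that check is to count the log files in the directory from a shell on the node. This is a minimal sketch; `count_container_logs` is a hypothetical helper, and it assumes container logs use the usual `.log` suffix.

```bash
# Sketch: count the log files Filebeat would find in a directory.
# Run on a node, or in a pod that mounts the host's log path.
# count_container_logs is a hypothetical helper name.
count_container_logs() {
  dir="${1:-/var/log/containers}"
  find "$dir" -maxdepth 1 -name '*.log' 2>/dev/null | wc -l | tr -d ' '
}
```

A result of `0` suggests the node writes container logs elsewhere, in which case the path in the Filebeat manifest should be changed before the DaemonSet is applied.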

The Filebeat DaemonSet is deployed using:

```bash
kubectl apply -f filebeat-daemon-set.yml
```

Visualizing Data with Kibana

The final component of the stack is Kibana, which provides the user interface for exploring the aggregated logs.

Kibana Deployment and Access

In native Kubernetes setups, Kibana is deployed using a dedicated manifest file:

```bash
kubectl apply -f kibana.yml
```

The service type used for Kibana determines how it is accessed. A LoadBalancer service type is commonly used, but a NodePort type can also be utilized. If using NodePort, the service is accessed via the node's IP address and the assigned port. For Minikube users, the public IP of the LoadBalancer can be retrieved and the service exposed using:

```bash
minikube service kibana-logging -n kube-system
```

This command provides a URL that can be opened in a web browser to access the Kibana interface.
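For the NodePort case, the URL can be assembled by hand from a node IP and the port Kubernetes assigned to the service. The sketch below reuses the service name and namespace from the Minikube command; `kibana_url` is a hypothetical helper, and it assumes the first port listed on the service is the Kibana HTTP port.

```bash
# Sketch: build the Kibana URL for a NodePort service.
# Service name and namespace come from the minikube command above;
# kibana_url is a hypothetical helper and assumes the first service port
# is the Kibana HTTP port.
kibana_url() {
  node_ip="$1"
  port=$(kubectl get svc kibana-logging -n kube-system \
    -o jsonpath='{.spec.ports[0].nodePort}')
  echo "http://${node_ip}:${port}"
}
```

On Minikube, `kibana_url "$(minikube ip)"` would print the address to open; on bare metal, pass the IP of any reachable node.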

Index Pattern Creation and Log Discovery

Once Kibana is accessible, the next step is to configure it to read the data from Elasticsearch. This involves creating an index pattern. In the Kibana interface, navigate to the "Discover" console. Create an index pattern, typically named logstash-*, and select the @timestamp field as the time filter.

In recent Kibana versions, index patterns are called data views, so the same step may appear under that name. If logs do not appear immediately after the pattern is created, it may be necessary to generate traffic to the deployed applications. A simple web application can be deployed to test the pipeline:

```bash
kubectl apply -f web-deployment.yml
```

Once the application is running, making requests to it will generate logs. These logs should then appear in the Kibana "Discover" tab. Users can filter logs based on Kubernetes label names and error types to troubleshoot issues or monitor application health.

Validation and Troubleshooting

A critical aspect of ELK deployment is validation. After all components are running, administrators should verify the end-to-end flow of data.

In the setup described in the comprehensive guide listed under Sources, administrators navigate to the eks/manifests folder of the cloned repository and deploy a test application:

```bash
kubectl apply -f app -n default
```

After the application is installed, return to Kibana and create the index pattern for the new logs. If logs are not visible, check that the Filebeat configuration points to the correct log directory (e.g., /var/log/containers on EKS) and verify that Logstash is receiving and parsing the data before sending it to Elasticsearch.
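One way to narrow down where the pipeline breaks is to ask Elasticsearch directly whether any `logstash-` index exists yet. The helper below is a sketch that scans the output of the `_cat/indices` endpoint; the name `has_logstash_index` is hypothetical, and the usage comment assumes Elasticsearch is port-forwarded to localhost:9200 without authentication.

```bash
# Sketch: succeed if any logstash-* index appears in the output of the
# _cat/indices endpoint (read from stdin).
# has_logstash_index is a hypothetical helper name.
has_logstash_index() {
  grep -q 'logstash-'
}

# Usage, with Elasticsearch port-forwarded to localhost:9200:
#   curl -s 'http://localhost:9200/_cat/indices?v' | has_logstash_index \
#     && echo "pipeline is writing indices" \
#     || echo "no logstash-* index yet: check Filebeat and Logstash logs"
```

If an index exists but Kibana shows nothing, the problem is more likely the index pattern or time range than the ingestion path.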

Conclusion

Deploying the ELK stack on Kubernetes gives organizations centralized log analysis. Whether using Helm charts on a cloud-managed GKE cluster or native manifests on Minikube and bare-metal infrastructure, the core principles stay the same: give Elasticsearch adequate resources, configure secure access, and establish a reliable pipeline from Filebeat through Logstash to Elasticsearch. Kibana then turns this raw data into visualizations that support efficient log management and troubleshooting. By understanding the architectural components (nodes, shards, and indices) and the deployment mechanics of each service, administrators can build a scalable and resilient observability platform.

Sources

  1. ELK on Kubernetes with Helm Charts
  2. Setup the ELK stack for Kubernetes/microservices
  3. How to Deploy ELK Stack on Kubernetes - Comprehensive Guide
