Orchestrating Observability: A Comprehensive Guide to Implementing the EFK and ELK Stacks for Kubernetes Logging

The landscape of modern software engineering has been fundamentally transformed by Kubernetes, a Greek term meaning pilot. This orchestration platform offers a strong set of built-in operational features, including health checks, self-healing, and metrics endpoints. However, it has a critical weakness when it comes to log persistence: its resources are ephemeral by design. When a pod is deleted or a node is decommissioned, both perfectly normal events in a dynamic cloud environment, the log files kept for those containers on the node disappear along with them.

Without a deliberate logging strategy, engineers fall back on an archaic, 2005-style workflow: SSHing into individual servers to hunt for rotated log files. This approach is inefficient, and in managed environments where node access is restricted it is often impossible. Pulling logs through the Kubernetes API during a failure is no better, because it adds load to the very API server that is needed to orchestrate the application architecture, and kubectl logs only works while the pod still exists. For any organization managing a complex web of microservices, a persistent, reliable, and searchable logging strategy is not a luxury but a technical requirement. The solution lies in a centralized logging pipeline, typically built on the Elastic Stack in one of two forms: ELK (Elasticsearch, Logstash, Kibana) or EFK (Elasticsearch, Fluentd, Kibana).

The Architecture of Kubernetes Log Generation

Before implementing a collection strategy, it is essential to understand where logs originate within a cluster. Kubernetes generates several distinct categories of logs, each serving a different diagnostic purpose.

The primary layers of log generation include:

Audit logs: Recorded by the API server, these provide a global view of the changes being applied to the cluster and who made them. They are indispensable for troubleshooting security events and configuration drift.
OS and system-level logs: These are generated by the nodes themselves, including the kubelet and the container runtime, and provide insight into the health of the host and its kernel.
Events: These are high-level notifications regarding the state of the cluster and the lifecycle of its objects. They are stored as API objects and kept only briefly (one hour by default), which makes them another casualty of Kubernetes' ephemerality.
Application logs: These are the stdout and stderr streams produced by the containers running within the pods.

Application logging works as follows. The container runtime captures each container's stdout and stderr and, under the kubelet's direction, writes them to files on the node's filesystem under /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/, with convenience symlinks in /var/log/containers named <pod-name>_<namespace>_<container-name>-<container-id>.log. To link a specific log entry to a component, a collector must identify which of that component's pods are running on the host and correlate them with these file names. This is complicated by the fact that Kubernetes may scale applications in or out, so the set of pods representing a component is constantly changing.
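The correlation step can be sketched in Python. This is a minimal illustration rather than how any particular collector is implemented: it assumes the naming convention of the symlinks in /var/log/containers (<pod-name>_<namespace>_<container-name>-<64-hex container ID>.log) and uses a plain dictionary, pod_labels, as a hypothetical stand-in for the Kubernetes API lookup a real agent would perform.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed symlink naming in /var/log/containers:
#   <pod-name>_<namespace>_<container-name>-<64-hex container ID>.log
# Pod and namespace names cannot contain "_", so it is a safe separator.
CONTAINER_LOG = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


@dataclass(frozen=True)
class LogSource:
    pod: str
    namespace: str
    container: str
    container_id: str


def parse_log_filename(name: str) -> Optional[LogSource]:
    """Recover the pod, namespace, and container encoded in a log file name."""
    match = CONTAINER_LOG.match(name)
    return LogSource(**match.groupdict()) if match else None


def component_for(
    source: LogSource, pod_labels: Dict[Tuple[str, str], Dict[str, str]]
) -> Optional[str]:
    """Look up the pod's 'component' label.

    pod_labels is a hypothetical stand-in for a Kubernetes API query,
    keyed by (namespace, pod name).
    """
    return pod_labels.get((source.namespace, source.pod), {}).get("component")
```

For example, a file named etcd-node1_kube-system_etcd-<container ID>.log parses to pod etcd-node1 in namespace kube-system, and the label lookup then attributes it to the etcd component. Because the pod name is re-read from every file, newly scaled-out pods are picked up without any extra bookkeeping.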

Initializing a Log-Generating Environment

To demonstrate the transition from a "clear cluster" to a "well-oiled log collecting machine," a baseline application must be deployed. A common method for testing logging pipelines is the use of a busybox container designed to emit a steady stream of data.

The deployment requires a configuration file named busybox.yaml:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i: Hello"; i=$((i+1)); sleep 1; done']
```

To deploy this workload into the cluster, the following command is executed:

kubectl apply -f busybox.yaml

This creates a pod that writes one line per second to stdout, providing a consistent stream of data for the logging agent to ingest.
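What the agent actually reads from disk is not the bare "0: Hello" text. Assuming a containerd or CRI-O runtime (Docker's json-file driver uses a JSON format instead), each line is prefixed with a timestamp, the stream name, and a P (partial) or F (full) tag, and a long message may be split across several partial lines. The minimal sketch below parses that format and reassembles split messages per stream; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class CriLine:
    timestamp: str  # RFC 3339 timestamp written by the runtime
    stream: str     # "stdout" or "stderr"
    partial: bool   # True for a "P" fragment that continues on a later line
    message: str


def parse_cri_line(raw: str) -> CriLine:
    """Split '<timestamp> <stream> <P|F> <message>' into its four fields."""
    timestamp, stream, tag, message = raw.rstrip("\n").split(" ", 3)
    return CriLine(timestamp, stream, tag == "P", message)


def reassemble(lines: Iterable[CriLine]) -> Iterator[str]:
    """Join partial fragments per stream, emitting a message at each 'F' line."""
    pending: Dict[str, List[str]] = {}
    for line in lines:
        parts = pending.setdefault(line.stream, [])
        parts.append(line.message)
        if not line.partial:
            yield "".join(parts)
            del pending[line.stream]
```

Buffering per stream matters because stdout and stderr fragments can be interleaved in the same file.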

Implementing the EFK Stack with Fluentd

The EFK stack replaces Logstash with Fluentd, which is often preferred in Kubernetes environments due to its lightweight nature and specialized plugins for container orchestration.

The DaemonSet Deployment Model

The most common production method for deploying Fluentd is as a DaemonSet. A DaemonSet ensures that a copy of the Fluentd pod runs on every node in the cluster, including tainted control-plane nodes as long as the pod tolerates their taints. This is critical because logs are stored locally on each node; the collector must therefore have a local presence on every host to harvest those files.

The flow of data in this architecture is as follows:
1. Fluentd runs on the node.
2. It collects logs from the local server.
3. It pushes those logs to an external Elasticsearch cluster.
4. The logs are then indexed and made searchable via Kibana.
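Step 3 is typically a batched HTTP request to Elasticsearch's _bulk endpoint, whose body is newline-delimited JSON: an action line followed by the document itself. The sketch below builds such a body. It assumes Fluentd-style daily index names (logstash-YYYY.MM.DD, which its Elasticsearch output produces when logstash_format is enabled); the document fields shown in the comments are illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Tuple


def daily_index(ts: datetime, prefix: str = "logstash") -> str:
    """Daily index name computed in UTC, e.g. logstash-2024.01.02."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def bulk_body(records: Iterable[Tuple[datetime, Dict[str, Any]]]) -> str:
    """Build an NDJSON body for POST /_bulk from (timestamp, document) pairs.

    A document might look like {"log": "0: Hello", "kubernetes": {"pod_name": "counter"}}.
    """
    lines = []
    for ts, doc in records:
        # Action line: which index the following document goes into.
        lines.append(json.dumps({"index": {"_index": daily_index(ts)}}))
        # Source line: the log event itself, stamped for Kibana's time filter.
        lines.append(json.dumps({"@timestamp": ts.isoformat(), **doc}))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

Daily indices keep each index small and make retention simple: old days can be deleted wholesale.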

Deployment via Raw YAML

Deploying via raw YAML is the preferred method for those who require explicit control over every change made to the cluster. The manifest spells out exactly how the DaemonSet mounts the node's log directories through hostPath volumes, which permissions it has for looking up pod metadata, and how it communicates with the Elasticsearch backend.
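To make that control concrete, the sketch below builds the core of such a manifest as a Python dictionary and prints it as JSON, which kubectl apply -f accepts just like YAML. The image tag and the FLUENT_ELASTICSEARCH_HOST and FLUENT_ELASTICSEARCH_PORT variables follow the fluent/fluentd-kubernetes-daemonset project; the service account name and the Elasticsearch address are hypothetical, and the RBAC objects and resource limits a real deployment needs are omitted.

```python
import json
from typing import Any, Dict


def fluentd_daemonset(es_host: str, es_port: int = 9200,
                      namespace: str = "kube-system") -> Dict[str, Any]:
    """Minimal Fluentd DaemonSet manifest (RBAC and resource limits omitted)."""
    labels = {"k8s-app": "fluentd-logging"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "fluentd", "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Also schedule onto tainted control-plane nodes (e.g. to collect etcd logs).
                    "tolerations": [{
                        "key": "node-role.kubernetes.io/control-plane",
                        "operator": "Exists",
                        "effect": "NoSchedule",
                    }],
                    # Hypothetical account, bound elsewhere to read pods and namespaces.
                    "serviceAccountName": "fluentd",
                    "containers": [{
                        "name": "fluentd",
                        "image": "fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch",
                        "env": [
                            {"name": "FLUENT_ELASTICSEARCH_HOST", "value": es_host},
                            # Environment variable values must be strings.
                            {"name": "FLUENT_ELASTICSEARCH_PORT", "value": str(es_port)},
                        ],
                        # /var/log holds /var/log/pods and /var/log/containers; Docker-based
                        # nodes would also need /var/lib/docker/containers mounted.
                        "volumeMounts": [{"name": "varlog", "mountPath": "/var/log"}],
                    }],
                    "volumes": [{"name": "varlog", "hostPath": {"path": "/var/log"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical in-cluster Elasticsearch service address.
    print(json.dumps(fluentd_daemonset("elasticsearch.logging.svc"), indent=2))
```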

Deployment via Helm

For organizations seeking to reduce the complexity of YAML management, Helm provides a streamlined alternative. Helm packages the individual resource manifests into a chart and exposes every setting an operator is expected to change through a single values file.

To transition from a raw YAML deployment to a Helm-managed deployment, the existing DaemonSet must first be removed:

kubectl delete -f fluentd-daemonset.yaml

Following this, a fluentd-daemonset-values.yaml file is created to parameterize the deployment, allowing for easier updates and version control of the logging infrastructure.

Implementing the ELK Stack with Filebeat

An alternative to Fluentd is Filebeat, a lightweight log shipper from Elastic that forwards data to Elasticsearch, either directly or through Logstash. Filebeat is particularly effective because it is designed to handle "moving targets" like Kubernetes pods.

The Role of Filebeat as a DaemonSet

Filebeat is deployed as a DaemonSet to ensure total coverage of the cluster. Each Filebeat pod reads the container log files on its node and watches the Kubernetes API for the pods scheduled on that node, so that every log file can be matched to the pod that produced it.

The technical advantages of using Filebeat include:

Metadata Annotation: Filebeat automatically enriches each log event with Kubernetes metadata, including the pod name and UID, namespace, container name, labels, and annotations.
Module Integration: With hints-based autodiscover enabled, pod annotations such as co.elastic.logs/module tell Filebeat which logging module to apply to that pod's output.
NGINX Integration: When the nginx module is enabled for a pod, Filebeat parses its stdout and stderr streams as NGINX access and error logs, turning status codes, request paths, and error messages into structured fields that give immediate visibility into web server errors.
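The module-selection logic can be illustrated with a simplified sketch. The annotation keys (co.elastic.logs/enabled, co.elastic.logs/module, co.elastic.logs/fileset.stdout, co.elastic.logs/fileset.stderr) are ones Filebeat's hints-based autodiscover reads, but the function and the shape of the dictionary it returns are hypothetical and do not mirror Filebeat's internal code.

```python
from typing import Dict, Optional

HINT = "co.elastic.logs/"


def log_config_from_hints(annotations: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Decide how to treat a pod's logs from its co.elastic.logs/* annotations."""
    # Pods can opt out of collection entirely.
    if annotations.get(HINT + "enabled", "true").lower() == "false":
        return None
    module = annotations.get(HINT + "module")
    if module is None:
        # No module hint: ship the raw container lines unparsed.
        return {"type": "container"}
    # Optional per-stream filesets, e.g. stdout -> access and stderr -> error for nginx.
    filesets = {
        stream: annotations[HINT + "fileset." + stream]
        for stream in ("stdout", "stderr")
        if HINT + "fileset." + stream in annotations
    }
    return {"module": module, "filesets": filesets}
```

Annotating an NGINX pod with co.elastic.logs/module: nginx plus the two fileset hints is what routes its stdout to the access-log parser and its stderr to the error-log parser.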

Data Analysis and Visualization in Kibana

Once logs are ingested into Elasticsearch, Kibana serves as the visualization layer. This allows users to move from volatile, unstable log storage to an external, reliable, and highly searchable repository.

Querying Specific Application Logs

To locate the logs from the previously deployed counter application, users navigate to the Discover screen (represented by the compass icon) and utilize the following search query:

kubernetes.pod_name.keyword: counter

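The .keyword suffix targets the keyword sub-field that Elasticsearch's default dynamic mapping creates alongside each text field. Because keyword values are not analyzed, the query matches the pod name exactly, and only the logs from the counter application appear on screen.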
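The difference between the two field types can be shown with a rough Python model. The tokenizer below only approximates Elasticsearch's standard analyzer (lowercasing and splitting on non-alphanumeric characters), the pod name counter-7c9d is a hypothetical example, and exact_match returns an equivalent term query in Elasticsearch's query DSL.

```python
import re
from typing import Dict, List


def approx_tokens(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


def matches_text(stored: str, query: str) -> bool:
    """A match query on a text field hits if any query token appears in the stored value."""
    return bool(set(approx_tokens(query)) & set(approx_tokens(stored)))


def matches_keyword(stored: str, query: str) -> bool:
    """A keyword field is not analyzed, so only the exact value matches."""
    return stored == query


def exact_match(field: str, value: str) -> Dict[str, dict]:
    """Elasticsearch query DSL: a term query on the keyword sub-field."""
    return {"query": {"term": {f"{field}.keyword": value}}}
```

With these definitions, matches_text("counter-7c9d", "counter") is true while matches_keyword("counter-7c9d", "counter") is false, which is why the .keyword form isolates the counter pod from any similarly named pods.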

Leveraging Kubernetes Labels

One of the most powerful aspects of the EFK/ELK integration is the automatic ingestion of Kubernetes metadata, including pod labels. This metadata requires no manual configuration and allows for complex queries based on:

Namespace: Filtering logs by the logical partition of the cluster.
Host Server: Isolating logs to a specific physical or virtual node.
Component: Identifying logs from a specific microservice.
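Combining these dimensions in Elasticsearch's query DSL means placing one term clause per criterion inside a bool filter. The sketch below assumes the field names produced by Fluentd's kubernetes_metadata filter (kubernetes.namespace_name, kubernetes.host, kubernetes.labels.component); Filebeat names the equivalent fields differently (for example kubernetes.namespace and kubernetes.node.name), so the mapping would need adjusting there. The function name is illustrative.

```python
from typing import Any, Dict

# Assumed field names from Fluentd's kubernetes_metadata filter.
FIELDS: Dict[str, str] = {
    "namespace": "kubernetes.namespace_name.keyword",
    "host": "kubernetes.host.keyword",
    "component": "kubernetes.labels.component.keyword",
}


def metadata_filter(**criteria: str) -> Dict[str, Any]:
    """Build a bool query whose filter clauses must all match exactly."""
    unknown = set(criteria) - set(FIELDS)
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    clauses = [{"term": {FIELDS[name]: value}} for name, value in criteria.items()]
    return {"query": {"bool": {"filter": clauses}}}
```

For example, metadata_filter(namespace="kube-system", component="etcd") narrows the results to ETCD's logs. Filter clauses are not scored, so the query is a pure yes/no selection.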

Advanced Monitoring: The ETCD Case Study

A sophisticated logging strategy allows for the monitoring of low-level system components that are often overlooked, such as the ETCD database. ETCD is the primary key-value store for Kubernetes, and its health is vital for cluster stability.

Monitoring ETCD Compaction

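ETCD performs a process called "compaction" at regular intervals, discarding superseded revisions of keys so that its history does not grow without bound. In Kubernetes, the API server requests this compaction periodically (every five minutes by default), and ETCD logs a message each time a scheduled compaction finishes. A change in how often that message appears is an early warning sign that can be caught before it turns into a catastrophic failure.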

To visualize this in Kibana:
1. Navigate to Visualize.
2. Select the logstash-* index pattern.
3. Enter the following query in the search bar (it is valid in both KQL and Lucene syntax):

kubernetes.labels.component.keyword:"etcd" AND message:"finished scheduled compaction"

By adding a Date Histogram to the X-axis using the @timestamp field, a "saw-tooth" graph is generated. This visualization provides a direct insight into the frequency and success of the ETCD scheduled compaction, turning raw log data into a proactive monitoring tool.
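The same signal can be checked programmatically once the matching log lines have been exported. This sketch assumes each record has already been reduced to a (timestamp, message) pair and that compactions are expected every five minutes, the kube-apiserver's default --etcd-compaction-interval; the tolerance value is an arbitrary illustration. Instead of drawing a histogram, it reports every gap between consecutive compactions that deviates noticeably from the expected interval.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

MARKER = "finished scheduled compaction"


def compaction_times(records: Iterable[Tuple[datetime, str]]) -> List[datetime]:
    """Keep the timestamps of ETCD 'finished scheduled compaction' messages, sorted."""
    return sorted(ts for ts, message in records if MARKER in message)


def irregular_gaps(
    times: List[datetime],
    expected: timedelta = timedelta(minutes=5),  # assumed default compaction interval
    tolerance: float = 0.5,                      # illustrative: flag deviations above 50%
) -> List[Tuple[datetime, timedelta]]:
    """Return (start, gap) for each interval differing from `expected` by more than tolerance * expected."""
    flagged = []
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if abs(gap - expected) > expected * tolerance:
            flagged.append((earlier, gap))
    return flagged
```

A healthy cluster yields an empty list; a missed compaction shows up as a single long gap, which corresponds to a missing tooth in the saw-tooth graph.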

Technical Comparison of Logging Agents

The choice between Fluentd and Filebeat often depends on the specific needs of the infrastructure.

Feature	Fluentd (EFK)	Filebeat (ELK)
Deployment	DaemonSet	DaemonSet
Resource Usage	Moderate	Very Low
Configuration	Verbose directive-based config files	Simple YAML / Modules
Metadata Handling	Plugin-based (kubernetes_metadata filter)	Native (via Kubernetes API)
Primary Strength	Flexibility and Routing	Speed and Low Overhead

Conclusion: The Path to Full Observability

The transition from basic kubectl logs commands to a full-scale ELK or EFK implementation represents a fundamental shift in operational maturity. While basic Kubernetes logging is sufficient for a development environment, it is wholly inadequate for production. The volatility of pods means that without a centralized shipper like Filebeat or Fluentd, critical forensic data is lost during the very moments it is most needed—during a system crash or a pod restart.

A truly observable system does not stop at logs. To achieve complete visibility, the logging pipeline should be integrated with other observability pillars. This includes using Elastic Uptime to monitor host availability and instrumenting applications with Elastic APM (Application Performance Monitoring). By combining the searchability of Elasticsearch, the visualization of Kibana, and the specialized ingestion of Filebeat or Fluentd, organizations can move beyond simple troubleshooting and into the realm of predictive analysis, such as monitoring ETCD compaction trends to prevent database degradation. This integrated approach ensures that the "pilot" of the Kubernetes cluster has a clear, persistent, and detailed map of every event occurring within the environment.