Architecting Persistent Observability: The Definitive Guide to Kubernetes Logging via the ELK Stack

The orchestration of containerized workloads via Kubernetes introduces a fundamental paradox in system administration: while the platform provides unparalleled scalability and self-healing capabilities, it simultaneously introduces a volatile environment for telemetry. In the traditional monolithic era, logs were written to persistent disks on static servers, allowing administrators to utilize tools like SSH to navigate directory structures and grep through flat files. However, Kubernetes is built upon the principle of ephemerality. Pods are designed to be disposable; they are created, destroyed, and rescheduled across a cluster of nodes based on resource availability and health checks.

When a Kubernetes pod is terminated, its local file system is wiped clean. This creates a catastrophic failure point for observability: any logs stored within the container's writable layer disappear into the ether. Without a centralized logging strategy, critical diagnostic data, security audit trails, and application performance insights are lost the moment a pod is evicted or scaled down. This "volatile storage" problem makes the implementation of a robust logging pipeline not just a luxury, but a non-negotiable requirement for any production-grade microservices architecture.

Furthermore, attempting to manually retrieve logs using the Kubernetes API for a large-scale cluster places an unsustainable burden on the control plane. The API server is designed for orchestration, not for the heavy lifting of streaming gigabytes of log data. A sophisticated logging strategy decouples the collection of logs from the API, ensuring that the cluster remains stable while providing a searchable, persistent repository for all system and application events. The ELK stack, composed of Elasticsearch, Logstash (often replaced by Fluentd or Filebeat as the collector), and Kibana, is a widely adopted answer to this challenge: it transforms transient stdout/stderr streams into structured, queryable data.

The Mechanics of Log Collection: DaemonSets and Data Acquisition

To solve the problem of ephemeral logs, the collection mechanism must reside as close to the source as possible. In Kubernetes, this is achieved by deploying the log collector as a DaemonSet, which ensures that a copy of the collector pod runs on every eligible node in the cluster. This is critical because the stdout/stderr output of each container is written by the container runtime, under the kubelet's management, to the node's local filesystem, typically under /var/log/pods (with symlinks in /var/log/containers), in paths that encode the pod name, namespace, and pod ID.

The role of the collector is to tail these files and ship them to a remote indexing engine. Depending on the chosen tool, the mechanism for discovery varies:

  • Fluentd as a DaemonSet: This approach involves deploying a Fluentd pod to every node. Fluentd acts as a unified logging layer that collects logs from the server itself and pushes them toward an Elasticsearch cluster. By utilizing a DaemonSet, the system ensures that no node is left unmonitored, regardless of how the cluster scales.
  • Filebeat as a DaemonSet: Filebeat is designed specifically as a lightweight shipper. It queries the Kubernetes API for the pods currently running on its host and their associated IDs. This allows Filebeat to map the raw log file on the disk to the actual Kubernetes metadata, such as the container name, pod ID, and labels.

The technical necessity of using a DaemonSet is rooted in the need for local access to the node's filesystem. Because the kubelet manages the logs on the host, the collector must have the permissions and the physical presence on that host to read those files. If a collector were run as a single centralized service, it would lack the direct filesystem access required to tail logs from every node in a distributed cluster.
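The Filebeat discovery described above can be sketched as a filebeat.yml fragment. This is a minimal, untested sketch that assumes the Filebeat pod receives its node name in a NODE_NAME environment variable (for example via the downward API) and that the node's /var/log directory is mounted into the pod; the Elasticsearch address simply reuses the example host that appears later in this article.

```yaml
# filebeat.yml (fragment): Kubernetes autodiscover, one Filebeat pod per node
filebeat.autodiscover:
  providers:
    - type: kubernetes
      # Only watch pods scheduled on this node. NODE_NAME is an assumption:
      # it must be injected into the Filebeat pod's environment.
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          # Tail the log file that belongs to each discovered container.
          - /var/log/containers/*${data.kubernetes.container.id}.log

output.elasticsearch:
  # Example address only; replace with the real Elasticsearch endpoint.
  hosts: ["10.0.2.2:9200"]
```

With a configuration along these lines, each event should carry kubernetes.* fields (pod name, namespace, labels) without any change to the application.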

Deploying the Pipeline: From Raw YAML to Helm Abstractions

There are two primary methodologies for deploying the logging infrastructure: raw YAML manifests and Helm charts. Each approach offers different trade-offs regarding transparency and maintainability.

The Raw YAML Approach

Deploying via raw YAML is the most explicit method of cluster configuration. It involves defining the exact specifications of the DaemonSet, including the image, volume mounts (to access the node's /var/log directory), and environment variables.
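As an illustration of what such a manifest might contain, here is a minimal sketch of a Fluentd DaemonSet. The resource names, namespace, memory limit, and image tag are assumptions chosen for the example, and the Elasticsearch address reuses the example host from later in this article; treat it as a starting point rather than a production manifest.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd            # hypothetical name
  namespace: kube-system   # assumption: any namespace works
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # Metadata enrichment also needs a ServiceAccount that can read pods
      # and namespaces; it is omitted here to keep the sketch short.
      containers:
        - name: fluentd
          # Assumed image and tag; pin a specific version in practice.
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "10.0.2.2"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          resources:
            limits:
              memory: 200Mi   # illustrative limit
          volumeMounts:
            # Gives the collector access to the node's container logs.
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```

On nodes that use the Docker runtime, the files under /var/log/containers are symlinks into /var/lib/docker/containers, so that directory would need a second hostPath mount.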

The primary advantage of raw YAML is transparency. An engineer can see exactly what is being deployed, which is invaluable for debugging initial connectivity issues or fine-tuning resource limits. However, as a cluster grows and the number of microservices increases, managing hundreds of YAML files becomes an operational nightmare. This "YAML sprawl" leads to a lack of versioning and makes it difficult to maintain consistency across different environments (e.g., development, staging, and production).

To remove a raw YAML deployment, the following command is used:

kubectl delete -f fluentd-daemonset.yaml

The Helm Framework

Helm acts as the package manager for Kubernetes, abstracting the complex YAML manifests into "charts." Instead of managing dozens of separate files, an operator uses a single values.yaml file to tweak parameters.

For instance, when deploying Fluentd via the Kiwigrid repository, the process is streamlined into a few steps. First, the repository is added to the local Helm CLI:

helm repo add kiwigrid https://kiwigrid.github.io

Then, the installation is executed using a custom configuration file:

helm install fluentd-logging kiwigrid/fluentd-elasticsearch -f fluentd-daemonset-values.yaml

The fluentd-daemonset-values.yaml file allows the operator to specify the Elasticsearch host without touching the underlying deployment logic. For example:

elasticsearch:
  hosts: ["10.0.2.2:9200"]

The shift to Helm provides significant production-ready advantages. One such advantage is the implementation of Role-Based Access Control (RBAC). Raw YAML deployments often tempt users to give pods "god powers" (cluster-admin privileges) to simplify setup. Helm charts, however, typically include predefined RBAC permissions that adhere to the principle of least privilege, ensuring the logging pod can read the necessary metadata without posing a security risk to the entire cluster.
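To make the least-privilege idea concrete, the following sketch shows the kind of read-only RBAC objects a log collector typically needs. It is not copied from any chart; the names and the logging namespace are hypothetical, and the exact rules a given chart ships may differ.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-logging        # hypothetical name
  namespace: logging           # hypothetical namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-logging-read
rules:
  # Read-only access to the objects used for metadata enrichment.
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd-logging-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd-logging-read
subjects:
  - kind: ServiceAccount
    name: fluentd-logging
    namespace: logging
```

Compared with binding cluster-admin, a role like this lets the collector look up pod and namespace metadata but gives it no ability to modify or delete anything in the cluster.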

Data Indexing and Management with Elasticsearch

Once Fluentd or Filebeat collects the logs, they are shipped to Elasticsearch, a distributed search and analytics engine. Elasticsearch transforms the unstructured text of a log into a structured document.

The Challenge of Data Volume

In a high-scale environment, the volume of logs generated can quickly exhaust the available disk space of the Elasticsearch cluster. If left unchecked, nodes cross Elasticsearch's disk watermarks: first, new shards stop being allocated to the full node, and at the flood-stage watermark (95% disk usage by default) the affected indices are marked read-only, so new log data is rejected. To mitigate this, a data retention strategy must be implemented.

The Elasticsearch Curator is a tool used to manage indices by deleting data that is no longer useful. This creates a "sliding window" of observability. For example, a common configuration is to delete indices older than seven days. This ensures that the organization always has one week of historical logs available for troubleshooting without filling the disks.
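The seven-day window can be expressed as a Curator action file. The sketch below assumes daily indices named with a logstash- prefix and a YYYY.MM.DD date suffix, which is a common Fluentd default but may not match a given cluster; how the file is embedded in curator-values.yaml depends on the chart and is not shown.

```yaml
# Curator action file: delete daily indices older than seven days
actions:
  1:
    action: delete_indices
    description: "Delete log indices older than 7 days"
    options:
      # Do not treat an empty result (nothing old enough yet) as an error.
      ignore_empty_list: True
    filters:
      # Assumption: log indices are named logstash-YYYY.MM.DD.
      - filtertype: pattern
        kind: prefix
        value: logstash-
      # Age is derived from the date embedded in the index name.
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 7
```

Both filters must match for an index to be deleted, so indices with other prefixes (for example Kibana's own) are left alone.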

The Curator can be deployed via Helm as a CronJob:

helm install curator stable/elasticsearch-curator -f curator-values.yaml

Once deployed, the CronJob can be inspected with kubectl get cronjobs to confirm that it has been scheduled:

NAME                            SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
curator-elasticsearch-curator   0 1 * * *   False     0                        33s (varies)

This job runs daily, effectively automating the lifecycle of the log data. If an organization requires longer retention periods, they may need to move away from self-managed ELK to managed service options to avoid the operational overhead of managing massive indices.

Securing the Pipeline

Credentials for the Elasticsearch cluster must be handled securely: storing them in plain text in configuration files exposes them to anyone who can read the repository or the release. Instead of hardcoding usernames and passwords into the Helm values file, Kubernetes Secrets should be used.

The Curator Helm chart allows for the population of environment variables from secrets using the envFromSecrets property. The syntax for this configuration is as follows:

envFromSecrets:
  ES_USERNAME:
    from:
      secret: es-credentials
      key: 'ES_USERNAME'
  ES_PASSWORD:
    from:
      secret: es-credentials
      key: 'ES_PASSWORD'

This ensures that sensitive credentials are never stored in version control or visible in the pod specification, but are instead injected into the container at runtime.
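For completeness, the es-credentials Secret referenced above might look like the following sketch. The username and password are placeholders, not real values, and the manifest is shown only to illustrate the expected keys; in practice the Secret would be created outside version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: es-credentials
type: Opaque
stringData:
  # Placeholder values; never commit real credentials.
  ES_USERNAME: "elastic"
  ES_PASSWORD: "change-me"
```

The Secret must live in the same namespace as the Curator CronJob so that the pod can reference it.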

Log Visualization and Analysis via Kibana

Kibana is the visualization layer that sits atop Elasticsearch. It transforms the raw JSON documents stored in the index into human-readable dashboards and searchable tables.

The Power of Metadata and Keywords

One of the most potent features of the ELK stack in Kubernetes is the ability to filter logs based on metadata. Because Fluentd and Filebeat annotate logs with Kubernetes-specific information, users can perform highly targeted queries.

For example, to find logs specifically from a pod named "counter," a user can navigate to the Kibana "Discover" screen and enter the following query in the search bar:

kubernetes.pod_name.keyword: counter

This uses the .keyword sub-field, which is indexed as a single unanalyzed value, so the query matches the exact pod name rather than running a full-text search over tokenized text. The results are more precise and typically faster to retrieve.

Log Manipulation and Security Filtering

The pipeline allows for the modification of logs before they ever reach the index. This is a critical capability for security and compliance. Through Fluentd configuration, it is possible to implement filters that can:

  • Redact sensitive information: Automatically remove password fields or API keys from logs.
  • Delete problematic logs: Drop any log entry that contains a specific forbidden word (e.g., "password").

This creates a layer of security that protects the organization from accidental leakage of PII (Personally Identifiable Information) or secrets into the logging backend. However, a word of caution is necessary: increasing the complexity of these filters within the Helm values file can make the configuration unmaintainable. Over-engineering the logic in the values file can lead to "mysteriously lost logs," where logs are dropped by a filter that the engineer has forgotten exists.
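The two filter types listed above could be sketched as Fluentd configuration wrapped in a ConfigMap. This is an untested illustration: the ConfigMap name, the kubernetes.** tag, the log field name, and the api_key= pattern are all assumptions, and how the snippet is wired into the Fluentd image or the Helm values file depends on the deployment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-security-filters   # hypothetical name
data:
  security-filters.conf: |
    # 1. Drop any record whose "log" field contains the forbidden word.
    <filter kubernetes.**>
      @type grep
      <exclude>
        key log
        pattern /password/
      </exclude>
    </filter>

    # 2. Mask API keys in the records that remain.
    <filter kubernetes.**>
      @type record_transformer
      enable_ruby true
      <record>
        log ${record["log"].to_s.gsub(/api_key=[^ ]+/, "api_key=[REDACTED]")}
      </record>
    </filter>
```

Keeping each filter short and commented, as here, is one way to reduce the risk of a forgotten rule silently dropping logs.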

Comparative Analysis of Log Shippers

The choice between Fluentd and Filebeat often depends on the required balance between resource consumption and processing power.

Feature              Fluentd                            Filebeat
Resource Footprint   Moderate (Ruby-based)              Low (Go-based)
Processing Power     High (complex filtering/routing)   Low (primary focus is shipping)
K8s Integration      High (via DaemonSet/plugins)       Native (autodiscover via the Kubernetes API)
Use Case             Complex log transformation         High-performance, lightweight shipping

Fluentd is an ideal choice when the logs require significant transformation or need to be routed to multiple destinations (e.g., Elasticsearch and an S3 bucket). Filebeat is superior when the goal is to get logs from the node to Elasticsearch as quickly and efficiently as possible with minimal CPU overhead.

Conclusion: The Shift from Volatility to Reliability

The transition from basic kubectl logs commands to a full ELK stack represents a shift from reactive to proactive observability. By deploying collectors as DaemonSets, the system overcomes the ephemeral nature of Kubernetes pods: once a log line has been shipped, it survives pod rotations, evictions, and node failures.

The architectural journey from raw YAML to Helm abstractions reflects the natural evolution of a scaling cluster. While raw YAML provides the necessary transparency for initial setup, Helm provides the scalability and security (through RBAC and Secret management) required for production. The integration of the Elasticsearch Curator ensures that the logging infrastructure remains sustainable, preventing the "disk full" scenarios that often plague naive ELK implementations.

Ultimately, the value of this stack lies in the ability to query dimensions such as namespace, host server, and pod name without any manual configuration of the application code. This allows developers to focus on building features while providing operators with the tools to maintain a high-availability environment. For organizations that require even more advanced capabilities, such as machine learning-powered alerting, the transition from open-source Kibana to specialized platforms like Coralogix can further enhance the sophistication of the observability pipeline.

Sources

  1. Coralogix: Kubernetes Logging with Elasticsearch, Fluentd, and Kibana
  2. Elastic: Kubernetes Observability Tutorial - K8s Log Monitoring and Analysis
