Architecting Scalable ELK Stacks on Docker: From Single-Node to Clustered Deployments

The deployment of the Elastic Stack—comprising Elasticsearch, Logstash, and Kibana, collectively known as the ELK stack—within containerized environments presents distinct architectural challenges. While the components are often bundled for convenience, production-grade logging infrastructures require a nuanced understanding of container isolation, networking, and scaling properties. For environments ingesting significant log volumes, such as 20 gigabytes per day from approximately 80 server instances and multiple Rails applications, the choice between monolithic single-container setups and modular multi-container architectures is critical. The prevailing best practice for production environments dictates running one service per container, resulting in a minimum of three distinct containers. This approach decouples the scaling requirements of storage, ingestion, and visualization, allowing each component to be optimized independently.

Architectural Patterns: Monolithic vs. Modular Deployment

The foundational decision in deploying the ELK stack via Docker is whether to use a pre-built all-in-one image or to separate services into individual containers. A common approach for development or low-volume testing involves pulling a consolidated image, such as sebp/elk, which runs Elasticsearch, Logstash, and Kibana as separate processes inside a single container. This method simplifies initial setup by exposing ports 5601 (Kibana), 9200 (Elasticsearch), and 5044 (Logstash) simultaneously from one unit. However, because all three services share one container, none of them can be scaled, restarted, or resource-limited independently, which makes this architecture a poor fit for high-throughput production environments.

In contrast, a modular architecture deploys three separate containers, each dedicated to a single service. This separation is not merely a matter of organizational preference but a technical necessity for scaling. The three components possess divergent scaling properties. Elasticsearch containers may need to be scaled horizontally to increase storage capacity, ensure redundancy through replica shards, and improve query performance. Logstash containers scale based on the rate of data ingestion, requiring more resources as the volume of incoming logs increases. Kibana containers, by contrast, are scaled strictly to handle interactive queries and user interface loads. By isolating these services, administrators can add additional Elasticsearch nodes without inadvertently increasing Logstash or Kibana resource consumption, or vice versa.

Networking Strategies: User-Defined Networks and Legacy Links

Effective communication between the log-emitting application containers and the ELK stack containers relies on robust Docker networking configurations. Modern Docker environments use user-defined networks to facilitate container discovery and communication. When deploying an all-in-one ELK container, it should be assigned a specific name and attached to a dedicated network, such as elknet. The network must exist before the container starts (for example, created with docker network create -d bridge elknet). The command to start the container maps the necessary ports and specifies the network attachment:

sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk --network=elknet sebp/elk

Once the ELK container is operational on this network, log-emitting containers can be started on the same network. From the perspective of the application container, the ELK container is resolvable via the hostname elk, which must be configured in the filebeat.yml configuration file under the hosts parameter. This hostname resolution is automatic within the user-defined network, eliminating the need for static IP configuration.
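As a minimal sketch of the steps above, the following shell functions create the network if it is missing, start a log-emitting container on it, and write a Filebeat output section that points at the elk hostname. The function names, the detached -d flag, and the output file path are illustrative assumptions; your/image stands in for the application image, and any TLS settings the Logstash Beats input expects are omitted.

```shell
#!/usr/bin/env bash

# Create the user-defined bridge network only if it does not exist yet,
# then start the application container on it.
# (start_on_elknet is a hypothetical helper name.)
start_on_elknet() {
  local image="$1"
  docker network inspect elknet >/dev/null 2>&1 \
    || docker network create -d bridge elknet
  docker run -d --network=elknet "$image"
}

# Write a Filebeat output section that targets the ELK container by name.
# Port 5044 is the Logstash port published by the ELK container.
# (write_filebeat_output is a hypothetical helper name.)
write_filebeat_output() {
  local target="$1"
  cat > "$target" <<'EOF'
output.logstash:
  hosts: ["elk:5044"]
EOF
}
```

A call such as start_on_elknet your/image starts the application, and write_filebeat_output filebeat-output.yml produces a fragment intended to be merged into the application's filebeat.yml.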

Historically, Docker relied on the --link option to connect containers over the default bridge network. While this method remains functional, it is considered a deprecated legacy feature that may be removed in future Docker releases. In a linked setup, the ELK container is started with a name, and the application container uses the --link flag to establish a connection:

sudo docker run -p 80:80 -it --link elk:elk your/image

In Docker Compose configurations, this legacy linkage can be represented explicitly, though modern Compose files typically prefer the networks directive for cleaner separation. An example of a Compose file utilizing the legacy link structure includes:

```
yourapp:
  image: your/image
  ports:
    - "80:80"
  links:
    - elk

elk:
  image: sebp/elk
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"
```

Despite the availability of legacy links, the use of user-defined networks is the recommended standard for production stability and interoperability.
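For comparison, the same two services can be expressed with the networks directive instead of links. The sketch below writes such a Compose file from a shell function; it assumes the newer Compose format with a top-level services key, and the function name and target path are hypothetical. Within the shared network, the service name elk still resolves as a hostname.

```shell
#!/usr/bin/env bash

# Write a Compose file in which both services join the same user-defined
# network instead of relying on links.
# (write_network_compose is a hypothetical helper name.)
write_network_compose() {
  local target="$1"
  cat > "$target" <<'EOF'
services:
  yourapp:
    image: your/image
    ports:
      - "80:80"
    networks:
      - elknet

  elk:
    image: sebp/elk
    ports:
      - "5601:5601"
      - "9200:9200"
      - "5044:5044"
    networks:
      - elknet

networks:
  elknet:
    driver: bridge
EOF
}
```

Calling write_network_compose docker-compose.yaml produces a file that can be started with the usual Compose commands.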

Docker Compose Implementation for Modern Stacks

For production deployments, particularly those utilizing Elasticsearch 8.x, Docker Compose provides a structured method to orchestrate multiple containers. This approach allows for the definition of complex dependencies, environment variables, and data persistence strategies. A robust setup begins with creating a dedicated directory structure, often located outside the user’s home directory for better permission management, such as /opt/containers.

sudo mkdir -p /opt/containers && cd /opt/containers
sudo chown -R holu /opt/containers

Within this directory, a project folder is created to house the docker-compose.yaml file and a persistent data directory for Elasticsearch. Elasticsearch needs its data directory mounted from outside the container; otherwise the indexed data is lost whenever the container is removed or recreated. The directory must have correct ownership permissions, typically mapped to user ID 1000, which is the default user for the Elasticsearch process in the official images.

mkdir elk-stack && cd elk-stack && touch docker-compose.yaml && mkdir esdata && sudo chown -R 1000:1000 esdata

The docker-compose.yaml file defines the services required for the stack. A common pattern involves a setup service that runs a temporary container to initialize security credentials before the main services start. This setup container waits for Elasticsearch to become available and then sets the password for the kibana_system user.

```
version: "3"

services:
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.1
    environment:
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - KIBANA_PASSWORD=${KIBANA_PASSWORD}
    container_name: setup
    command:
      - bash
      - -c
      - |
        # Wait until Elasticsearch answers (an auth error means it is up)
        echo "Waiting for Elasticsearch availability";
        until curl -s http://elasticsearch:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        # Set the kibana_system password; an empty JSON object signals success
        echo "Setting kibana_system password";
        until curl -s -X POST -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" http://elasticsearch:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
```

This initialization step ensures that the subsequent Elasticsearch and Kibana containers can authenticate correctly, addressing the security requirements introduced in Elasticsearch 8.x.
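The remaining three services are not shown in the text. As a hedged sketch of how they might look, the function below appends Elasticsearch, Logstash, and Kibana definitions to a Compose file. Several details are assumptions rather than part of the original setup: the image tags for Logstash and Kibana, the ./logstash.conf pipeline file, the single-node discovery mode, plain HTTP between containers, and an .env file next to the Compose file that supplies ELASTIC_PASSWORD and KIBANA_PASSWORD. The fragment also assumes that services are indented two spaces under the services key.

```shell
#!/usr/bin/env bash

# Append Elasticsearch, Logstash, and Kibana service definitions to an
# existing Compose file whose last top-level key is "services:".
# The heredoc is quoted so ${...} stays literal for Compose to interpolate.
# (append_stack_services is a hypothetical helper name.)
append_stack_services() {
  local target="$1"
  cat >> "$target" <<'EOF'

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - xpack.security.http.ssl.enabled=false
    volumes:
      - ./esdata:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"

  logstash:
    image: docker.elastic.co/logstash/logstash:8.15.1
    container_name: logstash
    volumes:
      # Assumed pipeline file; it must exist next to the Compose file
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf:ro
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.15.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
EOF
}
```

Running append_stack_services docker-compose.yaml after the setup service has been written keeps each service in its own container, so any one of them can later be resized or replaced without touching the others.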

Selective Service Initialization and Cluster Expansion

While all-in-one images like sebp/elk start all three services by default, production optimization often requires starting only specific services. This is achieved through environment variables that control the startup behavior of the container. Setting ELASTICSEARCH_START or LOGSTASH_START to any value other than 1 prevents the respective service from launching. This capability is crucial when distributing the stack across multiple hosts, where one node might run the full stack while additional nodes run only Elasticsearch to expand storage capacity.
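A minimal sketch of an Elasticsearch-only node follows. It disables Logstash through LOGSTASH_START as described above and also sets KIBANA_START, the image's analogous switch for Kibana, which the text does not mention. The function name, the container name argument, and the -d flag are illustrative assumptions.

```shell
#!/usr/bin/env bash

# Start a container from the all-in-one image with only Elasticsearch
# enabled: any value other than 1 keeps the named service from starting.
# (start_elasticsearch_only is a hypothetical helper name.)
start_elasticsearch_only() {
  local name="$1"
  docker run -d --name "$name" \
    -p 9200:9200 -p 9300:9300 \
    -e LOGSTASH_START=0 \
    -e KIBANA_START=0 \
    sebp/elk
}
```

For example, start_elasticsearch_only es-node-2 would launch a storage-only node on a second host.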

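Expanding an Elasticsearch cluster beyond a single node requires careful configuration of discovery settings. In a multi-node setup, each node must be aware of the others to form a coherent cluster. This is traditionally managed via the elasticsearch.yml configuration file. For an additional node, the configuration must specify the network host, the address the node publishes to its peers, and the unicast hosts used for discovery. The discovery.zen.ping.unicast.hosts setting shown here belongs to older Elasticsearch releases; from version 7.0 onward its replacement is discovery.seed_hosts.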

network.host: 0.0.0.0
network.publish_host: <reachable IP address or FQDN>
discovery.zen.ping.unicast.hosts: ["elk-master.example.com"]

This configuration file is mounted into the container via volume binding when starting the service. The command to launch an additional Elasticsearch node using a custom configuration file appears as follows:

sudo docker run -it --rm=true -p 9200:9200 -p 9300:9300 \
  -v /home/elk/elasticsearch-slave.yml:/etc/elasticsearch/elasticsearch.yml \
  sebp/elk

Note that when adding additional Elasticsearch nodes, port 9200 may already be in use by the primary node on the host. Therefore, careful port mapping and network isolation are essential to avoid conflicts. The network.publish_host directive ensures that the node advertises a reachable address, facilitating cluster formation across different physical or virtual hosts.
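For Elasticsearch releases that use discovery.seed_hosts (7.0 and later), the additional-node configuration could be generated as in the sketch below. The function name and its three arguments are hypothetical, and the example assumes the new node shares the cluster name of the node it joins.

```shell
#!/usr/bin/env bash

# Write an elasticsearch.yml for an additional node.
#   $1: address this node advertises to its peers
#   $2: host name or address of an existing cluster node
#   $3: path of the file to write
# (write_node_config is a hypothetical helper name.)
write_node_config() {
  local publish_host="$1" seed_host="$2" target="$3"
  cat > "$target" <<EOF
network.host: 0.0.0.0
network.publish_host: ${publish_host}
discovery.seed_hosts: ["${seed_host}"]
EOF
}
```

A call such as write_node_config 10.0.0.5 elk-master.example.com elasticsearch-node2.yml yields a file that can be mounted into the container in the same way as the configuration above; the address 10.0.0.5 is only a placeholder.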

Cluster Health and Shard Management

After deploying multiple Elasticsearch nodes, verifying the health of the cluster is a mandatory step. The cluster health API provides insights into the status of shards and nodes. A typical health check via curl reveals the operational state of the cluster.

curl 'http://elk-master.example.com:9200/_cluster/health?pretty'

The response includes critical metrics such as the number of nodes, active primary shards, and unassigned shards. A status of yellow indicates that while all primary shards are active, not all replica shards are assigned. This is expected in a single-node cluster where replicas cannot be allocated due to the lack of additional nodes. As additional nodes are joined to the cluster, the status should transition to green, indicating that both primary and replica shards are active and the cluster is fully redundant.

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 6,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 6,
  "delayed_unassigned_shards" : 6,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

Understanding these metrics allows administrators to diagnose issues related to node discovery, network connectivity, or resource constraints. When the cluster expands and unassigned shards are successfully allocated to new nodes, the unassigned_shards count drops to zero, and the status shifts to green, confirming a healthy, redundant logging infrastructure.
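The status check can be scripted. The sketch below extracts the status field from a health response and polls until the cluster reports green; the function names, the ten-second interval, and the default of 30 attempts are illustrative assumptions, and the sed expression relies on the flat JSON layout shown above rather than on a full JSON parser.

```shell
#!/usr/bin/env bash

# Print the value of the "status" field from cluster health JSON on stdin.
# Works for both pretty-printed and single-line responses.
# (health_status is a hypothetical helper name.)
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll the cluster health API until the status is green.
#   $1: base URL of a cluster node, e.g. http://elk-master.example.com:9200
#   $2: maximum number of attempts (default 30)
# Returns 0 once green is seen, 1 if the attempts run out.
# (wait_for_green is a hypothetical helper name.)
wait_for_green() {
  local url="$1" attempts="${2:-30}" status
  while [ "$attempts" -gt 0 ]; do
    status="$(curl -s "${url}/_cluster/health" | health_status)"
    [ "$status" = "green" ] && return 0
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}
```

After joining a new node, wait_for_green http://elk-master.example.com:9200 gives a simple pass or fail signal that replica allocation has completed.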

Image Construction and ARM64 Support

For environments requiring custom configurations or specific platform support, building the Docker image from source is an option. The process involves cloning the Git repository containing the Dockerfile and executing the build command. For vanilla Docker usage, the command is:

sudo docker build -t <repository-name> .

In a Compose environment, the build is initiated by:

sudo docker-compose build elk

Once built, the image can be run using the standard docker run or docker-compose up commands. Additionally, support for ARM64 architectures, such as those found in Apple Silicon Macs or certain cloud instances, requires specific build considerations. The underlying image repositories often provide multi-architecture support, but explicit building may be necessary to ensure compatibility with non-x86 hardware.
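One way to make the target platform explicit is to derive it from the host and pass it to docker build, as sketched below. The function names are hypothetical, and the sketch covers only the --platform flag: the Dockerfile in question may need additional ARM64-specific build arguments, which are not shown because the text does not specify them.

```shell
#!/usr/bin/env bash

# Map a machine name as reported by `uname -m` to a Docker platform string.
# (platform_for_arch is a hypothetical helper name.)
platform_for_arch() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the image in the current directory for the host's architecture.
#   $1: tag to assign to the resulting image
# (build_for_host is a hypothetical helper name.)
build_for_host() {
  local tag="$1" platform
  platform="$(platform_for_arch "$(uname -m)")" || return 1
  docker build --platform "$platform" -t "$tag" .
}
```

Run from the directory that contains the Dockerfile, build_for_host with a tag of your choice builds for linux/arm64 on an ARM64 host and for linux/amd64 on an x86-64 host.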

Conclusion

Deploying the ELK stack on Docker requires moving beyond simplistic all-in-one containers to embrace modular, scalable architectures. For production environments handling substantial log volumes, separating Elasticsearch, Logstash, and Kibana into distinct containers enables independent scaling and optimized resource utilization. The transition from legacy container linking to user-defined networks ensures robust and maintainable inter-container communication. Furthermore, utilizing Docker Compose for orchestration, combined with proper volume management for data persistence and selective service initialization, provides a flexible foundation for expanding the cluster. Monitoring cluster health through shard analysis ensures that redundancy and data integrity are maintained as the infrastructure grows.