Architecting a Centralized Logging Ecosystem: The Definitive Guide to Deploying the Elastic Stack on Ubuntu 20.04

The modern digital infrastructure generates an astronomical volume of telemetry data, ranging from kernel logs and application traces to complex microservices interactions. For system administrators and DevOps engineers, the challenge is not merely the collection of this data, but the ability to search, analyze, and visualize it in real-time. This is where the Elastic Stack—formerly and still widely known as the ELK Stack—becomes indispensable. The Elastic Stack is a sophisticated collection of open-source software developed by Elastic that facilitates centralized logging. Centralized logging is the strategic practice of aggregating logs from disparate sources—regardless of their original format—into a single, searchable repository.

The primary utility of this architecture is the mitigation of "log fragmentation." In a traditional environment, an engineer would need to SSH into multiple individual servers to grep through local log files to find a specific error. By implementing the Elastic Stack, the engineer can correlate logs across multiple servers during a specific time frame, allowing for the identification of systemic issues that span the entire infrastructure. Whether the goal is to identify a memory leak in a Java application or to track a cascading failure across a Kubernetes cluster, the Elastic Stack provides the visibility required for rapid root cause analysis.

The Anatomy of the Elastic Stack

The Elastic Stack is not a single piece of software but a coordinated ecosystem of four primary components, each serving a distinct role in the data pipeline: from the edge of the network where data is generated, to the backend where it is indexed, and finally to the frontend where it is visualized.

The first core component is Elasticsearch. This is the heart of the stack, functioning as a distributed, RESTful search and analytics engine. Its primary role is to store the data and provide the computational power necessary to perform complex queries across millions of records with sub-second latency.

The second component is Kibana. This serves as the visualization layer and the primary dashboard for the user. Kibana allows administrators to transform raw JSON data from Elasticsearch into intuitive charts, maps, and heatmaps, providing a graphical interface to access and analyze the stored data.

The third component is Logstash. Logstash acts as the dynamic data collection pipeline. It is designed to ingest data from multiple sources simultaneously, transform and process that data using an extensive library of plugins, and then send it to a "sink," typically Elasticsearch.

The final component, and the most lightweight, is the Beats family. Beats are data shippers that reside on the edge machines (the servers producing the logs). Since Logstash can be resource-intensive, Beats are used to forward and centralize logs and files efficiently. A specific example is Filebeat, which monitors log files and ships them to either Logstash or directly to Elasticsearch.

Technical Requirements and Hardware Specifications

Deploying the Elastic Stack requires a careful balance of resources. Because Elasticsearch is memory-intensive due to its reliance on the Java Virtual Machine (JVM), under-provisioning the server will lead to instability or the dreaded "Out of Memory" (OOM) killer terminating the process.

The minimum baseline for a single-server installation on Ubuntu 20.04 requires 4GB of RAM and at least 2 CPU cores. However, for production environments or those handling high volumes of telemetry, 8GB of RAM is strongly recommended to ensure the JVM has enough headroom for indexing and searching.

The resource requirements can be broken down by component as follows:

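| Component     | RAM Requirement | CPU Requirement | Disk Requirement |
|---------------|-----------------|-----------------|------------------|
| Elasticsearch | 2GB+            | 2 cores         | 50GB+            |
| Logstash      | 1GB+            | 1 core          | 10GB             |
| Kibana        | 1GB+            | 1 core          | 1GB              |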
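As a rough illustration of the memory budgeting above, the following sketch derives a candidate JVM heap size from the machine's total RAM. The 50% ceiling and the cap just under 32 GB follow Elastic's general heap guidance; the function name `heap_mb_for` is hypothetical. Recent releases size the heap automatically, so an override like this is only relevant when the default does not fit, and a single server that also hosts Logstash and Kibana may need a smaller share.

```bash
#!/usr/bin/env bash
# Sketch: suggest an Elasticsearch heap size (in MB) from total RAM (in MB).
# Assumption: half of RAM, capped at 31 GB so compressed object pointers stay enabled.
heap_mb_for() {
  local total_mb="$1"
  local heap=$(( total_mb / 2 ))
  local cap=$(( 31 * 1024 ))
  if (( heap > cap )); then heap=$cap; fi
  echo "$heap"
}

# Read total memory from /proc/meminfo (reported in kB) when available.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  heap=$(heap_mb_for $(( total_kb / 1024 )))
  echo "Suggested heap: ${heap} MB"
  # To apply, equal minimum and maximum values could be placed in a file
  # under /etc/elasticsearch/jvm.options.d/, for example:
  #   -Xms${heap}m
  #   -Xmx${heap}m
fi
```

Keeping the calculation in a small function makes it easy to try different RAM sizes before committing to a server plan.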

Beyond hardware, the software prerequisites are strict. The system must be running Ubuntu 20.04 (though some configurations support 22.04). Elasticsearch and Logstash both run on the Java Virtual Machine. Current packages of both ship with a bundled OpenJDK, so a system-wide Java Runtime Environment (JRE) or Java Development Kit (JDK) is needed mainly for older releases or when the bundled runtime is deliberately overridden; in that case OpenJDK 11 or OpenJDK 17 is typical, depending on the version of the stack. Furthermore, the installation must be performed by a user with sudo privileges, and the server should have Nginx installed to act as a reverse proxy for Kibana, as Kibana is typically only accessible on localhost by default.

Establishing the Foundation: Java and Repository Configuration

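Before the Elastic components can be installed, the environment must be prepared. The first step is the installation of a Java runtime, the platform on which Elasticsearch and Logstash run. Because recent Elastic packages bundle their own JDK, this step matters mostly for older releases or for tooling that expects a system-wide Java.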

To install Java 17 on Ubuntu 20.04, the following sequence of commands is executed:

```bash
sudo apt update
sudo apt install openjdk-17-jre-headless -y
```

Once installed, the version must be verified to ensure the environment is correct:

```bash
java -version
```

With the runtime environment ready, the system must be pointed to the official Elastic repositories. This ensures that the software is authentic and that updates can be managed via the apt package manager. The process involves importing the GPG key to verify the integrity of the packages and then adding the repository list.

The GPG key is imported using the following command:

```bash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
```

Following the key import, the repository is added to the system:

```bash
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
```

Finally, the package cache must be updated to recognize the new repository:

```bash
sudo apt update
```

Deploying and Configuring Elasticsearch

Elasticsearch is the primary data store. When installing it, remember a golden rule of the Elastic Stack: use the same version across the entire stack. With the 8.x repository configured above, whichever 8.x release apt installs for Elasticsearch should also be the release installed for Kibana, Logstash, and Filebeat. Mixing versions can lead to API incompatibilities, components that refuse to connect to one another, and, in the worst case, unpredictable behavior.

The installation is triggered with:

```bash
sudo apt install elasticsearch -y
```
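Once the remaining packages are in place, the version rule can be checked mechanically. The sketch below compares the versions reported by `dpkg-query`; the helper names `versions_match` and `installed_version` are hypothetical, and the normalization step assumes that some packages carry a Debian epoch prefix or revision suffix that should be ignored when comparing.

```bash
#!/usr/bin/env bash
# Sketch: confirm that every installed Elastic package reports the same version.

# Succeeds (exit 0) only when all arguments are identical, non-empty strings.
versions_match() {
  local first="$1" v
  [ -n "$first" ] || return 1
  for v in "$@"; do
    [ "$v" = "$first" ] || return 1
  done
  return 0
}

# Print the installed version of a Debian package, or nothing if it is absent.
# The sed step strips an epoch prefix ("1:") and a revision suffix ("-1").
installed_version() {
  dpkg-query -W -f='${Version}' "$1" 2>/dev/null | sed -e 's/^[0-9]*://' -e 's/-.*$//'
}

# Usage on the server (package names as used in this guide):
#   versions_match "$(installed_version elasticsearch)" \
#                  "$(installed_version kibana)" \
#                  "$(installed_version logstash)" \
#                  "$(installed_version filebeat)" \
#     && echo "versions aligned" || echo "version mismatch"
```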

Once installed, the configuration is handled via the elasticsearch.yml file. This file defines how the node behaves within a cluster and how it handles data and logs.

The configuration file is edited using a text editor:

```bash
sudo nano /etc/elasticsearch/elasticsearch.yml
```

Within this file, several key parameters must be defined:

  • cluster.name: This identifies the cluster. For a basic setup, elk-cluster is common.
  • node.name: This identifies the specific server, such as node-1.
  • path.data: This specifies where the actual indexed data is stored, typically /var/lib/elasticsearch.
  • path.logs: This defines the location for the engine's own internal logs, usually /var/log/elasticsearch.
  • network.host: To restrict access for security, this is often set to localhost.
  • http.port: The default port for REST API communication is 9200.
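Put together, the parameters above might look like the following. This sketch writes them to a scratch file for review instead of touching the live configuration; the values `elk-cluster` and `node-1` are the examples from the list, not requirements, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: write the settings discussed above to a scratch file for review.
scratch_dir=$(mktemp -d)
es_conf="$scratch_dir/elasticsearch.yml"

cat > "$es_conf" <<'EOF'
cluster.name: elk-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
EOF

echo "Review $es_conf, then merge the values into /etc/elasticsearch/elasticsearch.yml"

# After editing the live file, the service would typically be enabled and started with:
#   sudo systemctl enable --now elasticsearch
```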

Implementing Kibana and the Nginx Reverse Proxy

Kibana provides the visual interface for the data stored in Elasticsearch. While the installation is straightforward via apt, the accessibility of the dashboard presents a technical challenge. By default, Kibana listens only on localhost (port 5601), meaning it is not reachable from an external browser.

To solve this, Nginx is deployed as a reverse proxy. Nginx intercepts requests from the web browser on port 80 or 443 and forwards them to the Kibana service running internally. This adds a layer of security and allows for the implementation of SSL/TLS certificates for encrypted traffic.

The general workflow for Kibana setup involves:

  1. Installing the Kibana package.
  2. Configuring the kibana.yml file to link to the Elasticsearch instance.
  3. Configuring Nginx to proxy requests to the Kibana port.
  4. Setting up a specific role for the Kibana user to ensure proper access control.
  5. Creating an index pattern (called a data view in Kibana 8.x), such as the Filebeat index pattern, so that Kibana knows which indices to query for the incoming log data.
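For step 3, a minimal Nginx server block could resemble the one generated below. It is a sketch under stated assumptions: Kibana is on its default port 5601, `example.com` is a placeholder domain, and TLS and authentication are left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch: generate a minimal Nginx server block that proxies to Kibana.
# Assumptions: Kibana listens on localhost:5601; "example.com" is a placeholder.
scratch_dir=$(mktemp -d)
nginx_site="$scratch_dir/kibana.conf"

# The quoted 'EOF' keeps the shell from expanding Nginx variables such as $host.
cat > "$nginx_site" <<'EOF'
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
EOF

echo "Wrote $nginx_site"

# On the server, the file would be copied to /etc/nginx/sites-available/, linked
# into /etc/nginx/sites-enabled/, and checked before reloading:
#   sudo nginx -t && sudo systemctl reload nginx
```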

Log Shipping with Filebeat and Logstash

While Elasticsearch stores the data and Kibana visualizes it, the data must actually be moved from the source to the store. This is the role of Filebeat and Logstash.

Filebeat is a lightweight shipper. It is designed to have a minimal footprint on the source server so it does not compete for resources with the actual application. Filebeat monitors specific log files and ships them forward.

Logstash, on the other hand, is a heavy-duty processor. It can take a raw log string and break it down into structured fields using Grok patterns. For example, a raw syslog entry can be transformed into a JSON object with separate fields for timestamp, severity, hostname, and message.

The architecture follows this flow:
Log Sources -> Filebeat -> Logstash -> Elasticsearch -> Kibana -> User.
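The middle of that flow can be sketched as a Logstash pipeline file. The example below generates one that accepts events from Filebeat, applies a Grok pattern to syslog lines, and forwards the result to Elasticsearch. The Beats port 5044, the field names, and the file name are assumptions chosen for illustration, and the authentication and TLS options that a secured cluster requires are deliberately omitted.

```bash
#!/usr/bin/env bash
# Sketch: generate a Logstash pipeline for Filebeat -> Grok -> Elasticsearch.
# Assumptions: port 5044 and the syslog_* field names are illustrative.
scratch_dir=$(mktemp -d)
pipeline="$scratch_dir/beats-syslog.conf"

cat > "$pipeline" <<'EOF'
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
  date {
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
EOF

echo "Wrote $pipeline"
# On the server this file would normally live under /etc/logstash/conf.d/.
```

On the Filebeat side, the matching change is to disable the direct Elasticsearch output in filebeat.yml and point the Logstash output at the same host and port.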

Advanced Data Management: Backups and Snapshots

In a production environment, data persistence is critical. Elasticsearch provides a snapshot and restore mechanism to ensure that logs are not lost in the event of a catastrophic hardware failure. This involves registering a backup repository, typically on a separate mount point or cloud storage.

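To register a filesystem-based backup repository, the target directory must first be listed under path.repo in elasticsearch.yml (followed by a restart of the service) and be writable by the elasticsearch user. A PUT request is then sent to the Elasticsearch API: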

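```bash
# Assumes the default 8.x package layout: auto-generated CA certificate and the
# built-in "elastic" user (curl prompts for the password).
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X PUT "https://localhost:9200/_snapshot/my_backup" \
  -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch"
  }
}'
```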

Once the repository is established, a snapshot can be triggered manually to create a point-in-time recovery image of the indices:

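```bash
# Same assumptions as the repository registration: default 8.x CA path and the "elastic" user.
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X PUT "https://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
```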
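Snapshot names must be unique within a repository, so a scheduled job cannot keep reusing `snapshot_1`. One possible approach, sketched here, is to embed the date in the name; the helper `snapshot_url` and the naming scheme are hypothetical, while the repository name `my_backup` comes from the commands above.

```bash
#!/usr/bin/env bash
# Sketch: build a dated snapshot URL for use in a scheduled job.
snapshot_url() {
  local base="$1" repo="$2" day="$3"
  printf '%s/_snapshot/%s/snapshot-%s?wait_for_completion=true\n' "$base" "$repo" "$day"
}

url=$(snapshot_url "https://localhost:9200" "my_backup" "$(date +%Y.%m.%d)")
echo "$url"

# The request itself would resemble the manual command, plus whatever
# authentication and CA options the cluster requires:
#   curl -X PUT "$url"
# A snapshot can later be restored through the _restore endpoint:
#   curl -X POST "https://localhost:9200/_snapshot/my_backup/snapshot_1/_restore"
```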

Troubleshooting and Validating the Installation

A common failure point in an ELK installation is the lack of data appearing in the Kibana dashboard. When querying the Elasticsearch API, if the output shows "0 total hits," it indicates that the system is not loading logs under the searched index.

This typically points to one of three issues:
- The Filebeat service is not running or does not have permission to read the log files.
- The Logstash pipeline is dropping or failing to parse events due to a configuration error in the filter section.
- The index pattern in Kibana does not match the names of the indices actually being written to Elasticsearch.
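Each of these causes can be checked from the command line. The sketch below lists one check per cause as comments and adds a small helper, `hits_total` (a hypothetical name), that pulls the hit count out of a search response without extra tooling. It assumes the response shape used by recent versions, where the count sits under `hits.total.value`, and the `filebeat-*` index name is likewise an assumption.

```bash
#!/usr/bin/env bash
# Sketch: extract the total hit count from a search response read on stdin.
# Assumption: the response has the shape {"hits":{"total":{"value":N,...}}}.
hits_total() {
  tr -d ' \n\t' | sed -n 's/.*"hits":{"total":{"value":\([0-9]*\).*/\1/p'
}

# Checks matching the three causes above (run on the relevant host; add the
# authentication and CA options your cluster requires to the curl calls):
#   sudo systemctl status filebeat      # is the shipper running?
#   sudo filebeat test config           # does filebeat.yml parse?
#   sudo filebeat test output           # can Filebeat reach its output?
#   sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
#   curl -s "https://localhost:9200/_cat/indices?v"
#   curl -s "https://localhost:9200/filebeat-*/_search" | hits_total

# Demonstration with a canned response that contains no documents.
sample='{"took":1,"hits":{"total":{"value":0,"relation":"eq"},"hits":[]}}'
printf '%s' "$sample" | hits_total
```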

To verify the connection, an administrator can check the hostname and ID of the incoming logs. A successful log entry will contain metadata such as:

  • hostname: The name of the server (e.g., june-ubuntu-20-04-elasticstack).
  • id: A unique identifier for the log event (e.g., fbd5956f-12ab-4227-9782-f8f1a19b7f32).

Conclusion

The deployment of the Elastic Stack on Ubuntu 20.04 transforms raw, chaotic log files into a structured, searchable asset. By integrating Elasticsearch for storage, Logstash for processing, Kibana for visualization, and Beats for transport, an organization gains consolidated visibility into its infrastructure. The transition from manual log searching to centralized, indexed search can reduce the Mean Time to Resolution (MTTR) for critical system failures. However, the success of this deployment relies heavily on strict version parity across the stack and the proper allocation of JVM memory. When configured correctly, with a reverse proxy like Nginx for secure access and a robust snapshot strategy for data integrity, the Elastic Stack provides a professional-grade telemetry pipeline capable of scaling from a single server to a large distributed cluster.
