Engineering the Elastic Stack: A Comprehensive Guide to Deploying ELK on Ubuntu 22.04

Centralized logging is a fundamental requirement for modern infrastructure management, replacing the inefficient practice of manually inspecting individual log files across disparate servers. The Elastic Stack, historically and commonly referred to as the ELK Stack, is an ecosystem of open-source software designed to facilitate the search, analysis, and visualization of logs generated from any source and in any format. By aggregating logs into a single, searchable repository, administrators can identify systemic failures, correlate events across multiple servers during specific time frames, and gain deep insights into application performance.

At its core, the Elastic Stack is composed of four primary pillars: Elasticsearch, Logstash, Kibana, and the Beats family (specifically Filebeat). This architecture transforms raw, unstructured data into actionable intelligence. Elasticsearch serves as the highly scalable analytics engine that stores and indexes the data. Logstash acts as the data processing pipeline, capable of ingesting data from multiple sources, transforming it, and sending it to the storage layer. Kibana provides the visualization layer, allowing users to create dashboards and query the data through a web interface. Finally, Beats, such as Filebeat, act as lightweight shippers that forward logs from the edge of the network to the central stack.

Deploying this stack on Ubuntu 22.04 requires a precise alignment of software versions. A critical operational requirement for the Elastic Stack is version parity: every component (Elasticsearch, Kibana, Logstash, and Filebeat) should be installed at the same version number, because mismatched versions can cause components to reject connections or mishandle data. Whether deploying legacy versions like 7.7.1 or the modern 8.x branch, consistency across the stack is essential.
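Version parity can be checked mechanically once the packages are installed. The following is a minimal sketch, assuming the components were installed from apt packages named elasticsearch, logstash, kibana, and filebeat; the function names are illustrative and not part of any Elastic tooling.

```bash
# Return 0 when every version string passed as an argument is identical.
versions_match() {
    local first="$1" v
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Print the installed version of each Elastic package known to dpkg,
# one per line. Packages that are not installed are skipped silently.
installed_elastic_versions() {
    local pkg
    for pkg in elasticsearch logstash kibana filebeat; do
        dpkg-query -W -f='${Version}\n' "$pkg" 2>/dev/null || true
    done
}
```

Running `versions_match $(installed_elastic_versions) && echo "versions aligned"` after installation prints the message only when all installed components report the same version string.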

Technical Prerequisites and Hardware Specifications

Before initiating the installation process, the underlying hardware must be provisioned to handle the resource-intensive nature of the Elastic Stack. While the system can run on minimum specifications, production environments typically require higher overhead to prevent Java Virtual Machine (JVM) crashes and disk I/O bottlenecks.

The following table outlines the minimum and recommended system requirements for a functional deployment.

| Component | Minimum RAM | Recommended RAM | CPU Cores | Disk Space |
|---|---|---|---|---|
| Elasticsearch | 2GB+ | 4GB - 8GB | 2 Cores | 50GB+ |
| Logstash | 1GB+ | 2GB | 1 Core | 10GB |
| Kibana | 1GB+ | 2GB | 1 Core | 1GB |
| Total Server | 4GB | 8GB | 2+ Cores | 62GB+ |

From a technical perspective, the 4GB RAM minimum is a hard floor. Elasticsearch is built on Java and relies heavily on the JVM heap. If the server lacks sufficient memory, the Linux Out-Of-Memory (OOM) killer may terminate the Elasticsearch process to protect the kernel, leading to unexpected downtime. The 2-core CPU requirement ensures that the indexing process does not starve the operating system of resources.
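The memory and CPU floor can be verified before anything is installed. This is a rough sketch rather than an official preflight check: the function name is made up for illustration, and the 3800000 kB threshold is an assumed tolerance, since a machine sold as 4GB reports slightly less usable memory than the nominal figure.

```bash
# Print "ok" when the host meets the stated minimums, otherwise a reason.
# Arguments: total memory in kB (as reported by /proc/meminfo), CPU cores.
check_minimums() {
    local mem_kb="$1" cores="$2"
    # Assumed tolerance: a "4GB" machine reports a little under
    # 4194304 kB because the kernel reserves some memory for itself.
    local min_mem_kb=3800000
    local min_cores=2
    if [ "$mem_kb" -lt "$min_mem_kb" ]; then
        echo "insufficient memory"
    elif [ "$cores" -lt "$min_cores" ]; then
        echo "insufficient cores"
    else
        echo "ok"
    fi
}

# Check the current host: MemTotal is in kB, nproc prints usable cores.
check_minimums "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)"
```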

For users deploying on Ubuntu 22.04, the installation should be carried out as a non-root user with sudo privileges rather than as root. This is a security best practice: administrative commands are elevated explicitly and one at a time, and the services themselves run under their own unprivileged accounts, which limits the damage if the software is compromised.

Establishing the Java Runtime Environment

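Elasticsearch and Logstash run on the Java Virtual Machine, so a compatible Java runtime must be available to them; Kibana (built on Node.js) and Beats (written in Go) do not need one. Recent Elasticsearch and Logstash packages ship with a bundled OpenJDK, so a system-wide installation is optional for those releases, but it remains useful for older versions and for other Java tooling on the host. Where a system runtime is used, Java 11 or Java 17 is required.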

The installation process begins with updating the local package index to ensure the latest versions of the software are fetched from the Ubuntu repositories.

```bash
sudo apt update
```

Once the update is complete, the OpenJDK 17 headless version is installed. The headless version is preferred for servers because it excludes the graphical user interface (GUI) components, thereby reducing the attack surface and saving disk space.

```bash
sudo apt install openjdk-17-jre-headless -y
```

To verify that the environment is correctly configured and the Java binary is accessible in the system path, the following command is executed:

```bash
java -version
```

The successful execution of this command confirms that the JVM is ready to host the Elastic Stack components.

Repository Configuration and GPG Key Integration

To ensure the installation of official and signed packages, the Elastic repository must be added to the Ubuntu system. This involves importing the GPG (GNU Privacy Guard) key to verify the authenticity of the downloaded packages.

First, the GPG key is downloaded and stored in the keyrings directory:

```bash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
```

Following the key import, the official Elastic 8.x repository is added to the system's source list. This tells the apt package manager where to find the latest binaries.

```bash
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
```

The package cache must then be updated again to recognize the newly added repository:

```bash
sudo apt update
```

Deploying and Configuring Elasticsearch

Elasticsearch is the heart of the stack. It is a distributed, RESTful search and analytics engine. The installation is performed via the package manager:

```bash
sudo apt install elasticsearch -y
```

Once installed, the configuration file located at /etc/elasticsearch/elasticsearch.yml must be modified to define the node's behavior and network identity.

```bash
sudo nano /etc/elasticsearch/elasticsearch.yml
```

The configuration requires the following parameters to be set for a single-node installation:

```yaml
cluster.name: elk-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node
```

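The discovery.type: single-node setting is important: without it, Elasticsearch may try to discover other nodes to form a cluster, and once the node is bound to a non-loopback address it fails its bootstrap checks unless discovery is configured. If the file already contains a cluster.initial_master_nodes entry, remove or comment it out, because Elasticsearch refuses to start when both settings are present.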

Security is a paramount concern. The Elastic Stack handles sensitive system logs that could reveal architectural vulnerabilities to an attacker. Consequently, X-Pack security features must be enabled:

```yaml
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/transport.p12
xpack.security.transport.ssl.truststore.path: certs/transport.p12
```

To optimize performance, the JVM heap size must be configured. This prevents the JVM from consuming all available system memory or, conversely, from being too small to handle the indexing load. This is configured in the heap.options file:

```bash
sudo nano /etc/elasticsearch/jvm.options.d/heap.options
```

The following settings should be applied, generally allocating half of the available system RAM (up to 31GB):

```text
-Xms2g
-Xmx2g
```
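The "half of RAM, capped at 31GB" rule is simple arithmetic, and a small helper makes it repeatable across hosts of different sizes. The sketch below is illustrative only: the function names are hypothetical, and sizes are expressed in megabytes so that shell integer arithmetic suffices.

```bash
# Print a heap size in MB: half of the given RAM (in MB), capped at 31 GB.
recommended_heap_mb() {
    local ram_mb="$1"
    local half=$(( ram_mb / 2 ))
    local cap=$(( 31 * 1024 ))
    if [ "$half" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$half"
    fi
}

# Print the two JVM flags for heap.options; minimum and maximum are
# kept equal so the heap is not resized at runtime.
heap_options() {
    local mb
    mb="$(recommended_heap_mb "$1")"
    printf -- '-Xms%sm\n-Xmx%sm\n' "$mb" "$mb"
}
```

For a 4GB server, `heap_options 4096` prints `-Xms2048m` and `-Xmx2048m`, equivalent to the 2g values shown above. The output can be piped to `sudo tee /etc/elasticsearch/jvm.options.d/heap.options`.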

With the configuration complete, the service is enabled to start on boot and initialized:

```bash
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
```

The current status of the service can be verified using:

```bash
sudo systemctl status elasticsearch
```

Authentication and Verification

Upon the first start, Elasticsearch generates security credentials. For administrative access, the password for the default elastic user must be reset or retrieved.

```bash
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
```

The generated password must be saved securely, as it is required for all subsequent configurations in Logstash, Kibana, and Filebeat.

To verify that the node is responding and that SSL is functioning correctly, a curl request is sent to the local endpoint. Since the service uses a self-signed certificate by default, the -k (insecure) flag is used to bypass certificate validation.

```bash
curl -k -u elastic:YOUR_PASSWORD https://localhost:9200
```

If SSL is disabled for testing purposes, the command is:

```bash
curl -u elastic:YOUR_PASSWORD http://localhost:9200
```
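As an alternative to bypassing validation with -k, curl can be pointed at the certificate authority that signed the node's HTTP certificate. The sketch below assumes an 8.x package installation, where the CA is typically written to /etc/elasticsearch/certs/http_ca.crt; the function name and argument order are illustrative, and the path should be adjusted if the certificates live elsewhere.

```bash
# Query cluster health over HTTPS, validating the server certificate
# against a CA file instead of passing -k.
# Arguments: elastic user's password, optional CA path, optional base URL.
check_cluster_health() {
    local password="$1"
    # Assumed default location for package installs of 8.x.
    local ca="${2:-/etc/elasticsearch/certs/http_ca.crt}"
    local url="${3:-https://localhost:9200}"
    curl --silent --show-error --fail \
         --cacert "$ca" \
         --user "elastic:${password}" \
         "${url}/_cluster/health?pretty"
}
```

The CA file is normally readable only by root and the elasticsearch group, so the function may need to run from a root shell or against a copy of the certificate. A status of green or yellow in the response indicates that the single node is serving requests.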

Deploying the Kibana Visualization Layer

Kibana provides the graphical interface for interacting with the data indexed in Elasticsearch. The installation is straightforward:

```bash
sudo apt install kibana -y
```

The configuration file is located at /etc/kibana/kibana.yml. This file defines how Kibana connects to the network and how it communicates with the Elasticsearch backend.

```bash
sudo nano /etc/kibana/kibana.yml
```

Key configuration parameters include:

```yaml
server.port: 5601
server.host: "0.0.0.0"
server.name: "kibana-server"
```

Setting server.host to 0.0.0.0 makes Kibana listen on all network interfaces, which exposes port 5601 directly to the network. Kibana binds only to localhost by default, so a safer arrangement is to leave it on localhost and place Nginx in front of it as a reverse proxy. Nginx can be configured to forward external web traffic to port 5601, allowing for the implementation of TLS/SSL certificates for secure browser access.
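A reverse proxy of this kind needs only a short Nginx server block. The following sketch generates one from the shell; the function name, the destination path, and the server name kibana.example.com are placeholders chosen for illustration, and the block listens on plain HTTP, so a TLS listener with real certificates would still have to be added for production use.

```bash
# Write an Nginx server block that proxies requests to a local Kibana.
# Arguments: destination file, public server name (both caller-chosen).
write_kibana_proxy() {
    local dest="$1" server_name="$2"
    # \$ keeps Nginx variables literal; ${server_name} is expanded here.
    cat > "$dest" <<EOF
server {
    listen 80;
    server_name ${server_name};

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}
```

On a typical Ubuntu layout the file would be written to /etc/nginx/sites-available/ with root privileges, linked into sites-enabled, checked with `sudo nginx -t`, and activated with `sudo systemctl reload nginx`.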

Integrating Filebeat for Log Shipping

Filebeat is a lightweight shipper for forwarding and centralizing logs. It is the "edge" component of the stack that resides on the servers producing the logs.

During the configuration of Filebeat, users must modify the /etc/filebeat/filebeat.yml file to point the data stream toward the Elasticsearch and Kibana instances. This involves replacing placeholders with actual network values.

In the configuration file, the following section must be updated:

```yaml
output.elasticsearch:
  hosts: ["<es_url>"]
  username: "elastic"
  password: "<password>"

setup.kibana:
  host: "<kibana_url>"
```

The <es_url> should be replaced with the IP address or hostname of the Elasticsearch server (e.g., https://192.168.1.10:9200). The <kibana_url> should be the address where Kibana is hosted (e.g., http://192.168.1.10:5601). The password is the one generated during the Elasticsearch password reset process.
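To make the substitution concrete, the sketch below renders the same two sections with real values filled in. The function is a hypothetical helper, not part of Filebeat, and it assumes the password contains no double quotes, which would break the YAML quoting.

```bash
# Print the output.elasticsearch and setup.kibana sections with the
# placeholders replaced by the given values.
# Arguments: Elasticsearch URL, Kibana URL, elastic user's password.
render_filebeat_output() {
    local es_url="$1" kibana_url="$2" password="$3"
    cat <<EOF
output.elasticsearch:
  hosts: ["${es_url}"]
  username: "elastic"
  password: "${password}"

setup.kibana:
  host: "${kibana_url}"
EOF
}
```

Calling `render_filebeat_output https://192.168.1.10:9200 http://192.168.1.10:5601 'YOUR_PASSWORD'` prints a fragment that can be merged into /etc/filebeat/filebeat.yml. After editing the file, `sudo filebeat test config` checks the syntax and `sudo filebeat test output` attempts a connection to the configured Elasticsearch host.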

Architecture Summary and Data Flow

The operational flow of the Elastic Stack follows a linear path from data generation to human interpretation.

- Log Sources: Applications, system kernels, or web servers (like Apache) generate raw logs.
- Beats/Logstash: Filebeat collects these logs and either ships them directly to Elasticsearch, as in the Filebeat configuration above, or sends them to Logstash. Logstash processes the data by filtering, parsing, and enriching it before passing it to the storage layer.
- Elasticsearch: The processed data is indexed, making it searchable in near real-time.
- Kibana: The user interacts with the Kibana dashboard, which queries Elasticsearch and displays the data visually.
- Users: Administrators and analysts view the dashboards to identify trends or errors.

Troubleshooting Common Installation Issues

A frequent point of confusion for new users is the distinction between the Kibana interface and the Elastic dashboard. When accessing http://localhost:5601/, the user may be greeted by the general Elastic home page. This is not an error; it is the entry point to the management console. To access the actual visualization tools, the user must navigate through the "Add Data" or "Management" sections.

Another common issue occurs when users are unable to connect Filebeat to Elasticsearch. This is usually caused by one of three things:
- Incorrect URL: Using http instead of https when X-Pack security is enabled.
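- Firewall or binding restrictions: Ports 9200 (Elasticsearch) and 5601 (Kibana) must be open in the Ubuntu firewall (ufw), and Elasticsearch must listen on an address the Filebeat host can reach. With network.host: localhost it accepts only local connections.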
- Authentication failure: Using an incorrect password for the elastic user.
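The firewall case can be separated from the other two causes by testing whether the port accepts a TCP connection at all. This is a minimal sketch with illustrative function names; it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command, and it says nothing about TLS or credentials.

```bash
# Succeed when a TCP connection to host:port opens within the timeout.
# Arguments: host, port, optional timeout in seconds (default 3).
port_open() {
    local host="$1" port="$2" seconds="${3:-3}"
    timeout "$seconds" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print "open" or "closed" for the given host and port.
report_port() {
    if port_open "$1" "$2"; then
        echo "open"
    else
        echo "closed"
    fi
}
```

Running `report_port 192.168.1.10 9200` from the Filebeat host shows whether Elasticsearch is reachable. If the result is closed, the ports can be opened on the server with `sudo ufw allow 9200/tcp` and `sudo ufw allow 5601/tcp`. If it is open, the URL scheme and the password are the next things to check.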

Conclusion

The deployment of the Elastic Stack on Ubuntu 22.04 is a sophisticated undertaking that requires strict adherence to versioning and resource management. By utilizing the JVM's heap options to stabilize memory and implementing X-Pack security for data integrity, an organization can build a robust observability platform. The transition from local log inspection to a centralized ELK architecture allows for a drastic reduction in Mean Time to Resolution (MTTR) when dealing with system failures. The integration of Nginx as a reverse proxy further secures the environment, ensuring that the visualization layer is not exposed directly to the public internet. Ultimately, the success of the installation depends on the precise synchronization of the four components and the correct configuration of the network and security layers.