Mastering the Elastic Stack Deployment on Ubuntu 22.04: A Comprehensive Engineering Guide

The Elastic Stack, historically and widely known as the ELK Stack, represents a sophisticated ecosystem of open-source software engineered by Elastic. This suite is meticulously designed to facilitate the process of centralized logging, a critical architectural pattern in modern infrastructure management. Centralized logging involves the systematic collection, searching, analysis, and visualization of logs generated from any source and in any format. In a distributed environment, this capability is indispensable; it allows system administrators and DevOps engineers to identify systemic failures or application bottlenecks by searching through a unified repository of logs rather than manually accessing individual servers via SSH. Furthermore, the stack enables complex event correlation, allowing an engineer to track a single request as it traverses multiple microservices by correlating logs across different servers within a specific time window.

The architecture of the Elastic Stack is composed of four primary pillars: Elasticsearch, Logstash, Kibana, and the Beats family (specifically Filebeat). Together, these components form a data pipeline. The workflow begins with data ingestion via Beats or Logstash, moves to the storage and indexing layer in Elasticsearch, and culminates in the presentation layer via Kibana. To ensure operational stability, the components should be kept at version parity: Elasticsearch, Kibana, Logstash, and Beats should all run the same version number. For instance, if a deployment uses Elasticsearch 9.3.3, then Kibana, Logstash, and Filebeat should also be version 9.3.3 to avoid API mismatches and integration failures.
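
The parity rule can be checked mechanically once the packages are installed. The following is a minimal sketch, assuming a Debian-based host where dpkg-query is available; the function names versions_match and installed_version are illustrative and not part of any Elastic tooling.

```shell
# versions_match VERSION... : succeeds only if every argument is the same
# non-empty string. An empty argument (package not installed) counts as a mismatch.
versions_match() {
    [ "$#" -gt 0 ] || return 1
    first="$1"
    for v in "$@"; do
        [ -n "$v" ] || return 1
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# installed_version PACKAGE : prints the installed .deb version, or nothing
# if the package is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Usage on the Elastic Stack server:
#   versions_match "$(installed_version elasticsearch)" \
#                  "$(installed_version kibana)" \
#                  "$(installed_version logstash)" \
#                  "$(installed_version filebeat)" \
#       && echo "versions match" || echo "version mismatch"
```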

Architectural Overview and Data Flow

The movement of data within the Elastic Stack follows a linear progression from the source of the log to the end-user's eyes. Understanding this flow is essential for troubleshooting connectivity issues between the components.

The conceptual pipeline is as follows:

Log Sources: These are the origin points of the data, such as system logs, application logs, or kernel messages.
Beats/Logstash: These act as the ingestion layer. Filebeat forwards logs, while Logstash processes and transforms them.
Elasticsearch: This is the heart of the stack, serving as the analytics engine that stores and indexes the data.
Kibana: This is the visualization interface that queries Elasticsearch to display data in dashboards.
Users: The final layer where engineers interact with the visualized data to perform root cause analysis.

In this specific deployment model, all components are installed on a single Ubuntu 22.04 server, creating what is known as an Elastic Stack server. While this is ideal for development or small-scale environments, the components remain decoupled in their functionality, allowing for future scaling into a multi-node cluster.

Comprehensive Hardware and Software Prerequisites

Before initiating the installation process, the underlying hardware must meet specific thresholds to prevent the Java Virtual Machine (JVM) from crashing due to Out-of-Memory (OOM) errors, which are common in Elasticsearch deployments.

The general system requirements for the server are as follows:

Operating System: Ubuntu 20.04 or Ubuntu 22.04 LTS.
CPU: A minimum of 2 CPU cores is required to handle the indexing and searching overhead.
RAM: A minimum of 4GB of RAM is required, although 8GB is strongly recommended for stability.
User Access: Root or sudo privileges are mandatory for package installation and systemd service management.
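Runtime Environment: Elasticsearch and Logstash run on the JVM, but their official packages ship with a bundled JDK, so a system-wide Java installation is optional. If a system JDK is preferred, Java 17 is a suitable choice for the 8.x series.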
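
A quick pre-flight check of the CPU and RAM minimums can save a failed installation. The sketch below assumes a Linux host with nproc and /proc/meminfo; the function names and the 3500 MB floor are assumptions made for illustration (the kernel reports somewhat less than the nominal 4096 MB on a 4GB machine).

```shell
# meets_minimum CORES MEM_MB : succeeds if the values reach the minimum of
# 2 CPU cores and roughly 4GB RAM (3500 MB floor, an assumption).
meets_minimum() {
    cores="$1"
    mem_mb="$2"
    [ "$cores" -ge 2 ] && [ "$mem_mb" -ge 3500 ]
}

# Read the live values from the running system (Linux only).
current_cores() { nproc; }
current_mem_mb() { awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo; }

# Usage:
#   meets_minimum "$(current_cores)" "$(current_mem_mb)" \
#       && echo "host meets the minimum" || echo "host is below the minimum"
```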

For a more granular understanding of resource allocation, the following table outlines the requirements per component:

Component	RAM	CPU	Disk
Elasticsearch	2GB+	2 cores	50GB+
Logstash	1GB+	1 core	10GB
Kibana	1GB+	1 core	1GB

The disk requirement for Elasticsearch is significantly higher because it must store the indices and the underlying Lucene segments. If disk usage on the node crosses the flood-stage watermark (95% by default), Elasticsearch marks the affected indices read-only (deletes are still allowed) until space is freed.

Phase 1: Java Runtime Environment Installation

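Elasticsearch runs on the JVM. The official packages include a bundled JDK, which Elasticsearch uses by default, so this phase is only needed when a system-wide Java is wanted for other tooling or for use through the ES_JAVA_HOME environment variable. In that case, OpenJDK 17 is a suitable choice.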

To install the Java runtime, execute the following commands:

sudo apt update

sudo apt install openjdk-17-jre-headless -y

The headless version of the JRE is chosen because the Elastic Stack server does not require a graphical user interface (GUI), thereby reducing the system footprint and attack surface. After installation, the version must be verified to ensure the binary is correctly mapped in the system path:

java -version

Phase 2: Elastic Repository Configuration

To ensure the installation of official, signed packages from Elastic, the system must be configured to trust the Elastic GPG key and point to the official artifact repository.

First, import the GPG key to ensure the integrity of the downloaded packages:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

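Next, add the repository to the APT sources list. This guide installs from the 8.x repository; to install a different major series (for example 9.x), substitute it in both the URL and the list filename. For version 8.x, the command is: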

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

Finally, update the package cache to recognize the new repository:

sudo apt update
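
Because the repository URL encodes the major series, it can help to generate the source entry from a single parameter. This is a small sketch; elastic_repo_line is a hypothetical helper name, and the keyring path is the one used in the import step above.

```shell
# elastic_repo_line MAJOR : prints the APT source entry for one major series.
elastic_repo_line() {
    major="$1"
    printf 'deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/%s.x/apt stable main\n' "$major"
}

# Usage (produces the same file as the manual step when MAJOR is 8):
#   elastic_repo_line 8 | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
#   sudo apt update
#   apt-cache policy elasticsearch   # should list artifacts.elastic.co as a source
```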

Phase 3: Elasticsearch Installation and Advanced Configuration

Elasticsearch is the distributed, RESTful search and analytics engine that powers the stack. It is responsible for indexing data in near real-time.

The installation is performed via the following command:

sudo apt install elasticsearch -y

Once installed, the configuration must be modified to define the cluster's behavior and security posture. The primary configuration file is located at /etc/elasticsearch/elasticsearch.yml.

sudo nano /etc/elasticsearch/elasticsearch.yml

The following parameters must be configured:

cluster.name: Set to elk-cluster to identify the cluster.
node.name: Set to node-1 for the individual node identifier.
path.data: Set to /var/lib/elasticsearch for the storage of indices.
path.logs: Set to /var/log/elasticsearch for the system logs.
network.host: Set to localhost for security in a single-node setup.
http.port: Set to 9200, which is the default REST API port.
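discovery.type: Set to single-node so the node does not attempt to discover or join other nodes. If the package installer has already written a cluster.initial_master_nodes line to the file, remove or comment it out, because Elasticsearch refuses to start when both settings are present.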
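
Written out as they would appear in elasticsearch.yml, the parameters above look like the fragment in the sketch below. The helper write_es_settings is hypothetical and deliberately writes to a scratch file for review rather than to the live configuration.

```shell
# write_es_settings FILE : writes the settings listed above to FILE for review.
# FILE is meant to be a scratch path, not /etc/elasticsearch/elasticsearch.yml.
write_es_settings() {
    cat > "$1" <<'EOF'
cluster.name: elk-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node
EOF
}

# Usage:
#   write_es_settings /tmp/es-settings.yml
#   # Show the active (non-comment, non-blank) lines of the live file to compare:
#   sudo grep -Ev '^[[:space:]]*(#|$)' /etc/elasticsearch/elasticsearch.yml
```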

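Security settings are paramount. In 8.x and later, security is enabled by default, and the package installer normally writes the following settings (and generates the certificates they reference) on first install. Confirm that they are present rather than adding them by hand: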

xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/transport.p12
xpack.security.transport.ssl.truststore.path: certs/transport.p12

JVM Heap Optimization

One of the most critical steps in deploying Elasticsearch is configuring the JVM heap size. By default, Elasticsearch sizes the heap automatically from the node's roles and total memory, but on a host shared with Kibana and Logstash an explicit setting keeps memory use predictable. The heap should generally be set to no more than half of the available physical RAM, and to no more than 31GB so the JVM can keep using compressed object pointers.

To configure the heap, edit the options file:

sudo nano /etc/elasticsearch/jvm.options.d/heap.options

Insert the following lines (for a system with 4GB RAM, allocating 2GB to the heap):

-Xms2g
-Xmx2g

The -Xms flag defines the initial heap size, and -Xmx defines the maximum heap size. Setting these to the same value prevents the JVM from dynamically resizing the heap, which can cause performance jitter.
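
The sizing rule (half of physical RAM, capped at 31GB) is simple arithmetic and can be scripted. The sketch below works in megabytes; heap_mb and heap_options are illustrative names, not Elasticsearch tooling.

```shell
# heap_mb TOTAL_MB : prints a heap size in MB, half of TOTAL_MB capped at 31GB.
heap_mb() {
    half=$(( $1 / 2 ))
    cap=$(( 31 * 1024 ))
    if [ "$half" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$half"
    fi
}

# heap_options TOTAL_MB : prints the two JVM flags with identical values,
# in the format expected by files under jvm.options.d.
heap_options() {
    size="$(heap_mb "$1")"
    printf '%s\n' "-Xms${size}m" "-Xmx${size}m"
}

# Usage (4096 MB of RAM yields a 2048 MB heap, equivalent to 2g):
#   heap_options 4096 | sudo tee /etc/elasticsearch/jvm.options.d/heap.options
```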

Service Activation and Security Initialization

After configuration, the service must be enabled to start at boot and then manually started:

sudo systemctl daemon-reload

sudo systemctl enable elasticsearch

sudo systemctl start elasticsearch

Verify the status of the service:

sudo systemctl status elasticsearch

Because security is enabled, the elastic superuser requires a password. To reset or generate the password, run:

sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

The generated password must be saved securely as it is required for Kibana and Logstash authentication.

To verify that Elasticsearch is responding correctly over HTTPS, use the following command:

curl -k -u elastic:YOUR_PASSWORD https://localhost:9200

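The -k flag tells curl to skip certificate verification, which is acceptable only for a quick local check against the self-signed certificate. To verify the certificate instead, replace -k with --cacert /etc/elasticsearch/certs/http_ca.crt (the CA file the 8.x package installer generates; reading it requires root).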
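
Beyond the banner response, the cluster health endpoint gives a one-word summary of the node's state. The following sketch extracts the status field with sed; health_status is a hypothetical helper, ES_PASSWORD is an assumed variable holding the elastic password, and the CA path assumes the default package layout.

```shell
# health_status : reads a _cluster/health JSON document on stdin and prints the
# value of its "status" field (green, yellow, or red).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage:
#   sudo curl -s --cacert /etc/elasticsearch/certs/http_ca.crt \
#        -u "elastic:${ES_PASSWORD}" \
#        https://localhost:9200/_cluster/health | health_status
```

On a single-node deployment a yellow status is common, since replica shards have no second node to be allocated to.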

Phase 4: Kibana Installation and Visualization Layer

Kibana serves as the window into the data stored in Elasticsearch. It allows users to create dashboards and visualize logs.

Install Kibana using the package manager:

sudo apt install kibana -y

The configuration file /etc/kibana/kibana.yml controls the address and port Kibana listens on:

sudo nano /etc/kibana/kibana.yml

The following settings are required:

server.port: 5601 (Default Kibana port).
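server.host: "localhost" (the default; keep it when Nginx on the same host proxies requests to Kibana. Use "0.0.0.0" only if Kibana must listen directly on all network interfaces).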
server.name: "kibana-server".
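
Kibana also needs credentials for the secured Elasticsearch node and has to be started as a service. One way to do this on 8.x and later is the enrollment-token flow, sketched here under the assumption of the default .deb install paths. The wrapper function enroll_kibana is hypothetical; it exists only so that nothing runs until it is called explicitly.

```shell
# enroll_kibana : connects Kibana to the local Elasticsearch node and starts it.
# Paths assume the default .deb package layout.
enroll_kibana() {
    # Ask Elasticsearch for a short-lived enrollment token scoped to Kibana.
    token="$(sudo /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana)" || return 1

    # Hand the token to Kibana, which stores the resulting connection settings.
    sudo /usr/share/kibana/bin/kibana-setup --enrollment-token "$token" || return 1

    # Start Kibana now and on every boot.
    sudo systemctl daemon-reload
    sudo systemctl enable --now kibana
}

# Usage:
#   enroll_kibana && sudo systemctl status kibana
```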

By default, Kibana listens only on localhost. To make it reachable from a browser on a remote machine, the recommended approach is to place Nginx in front of it as a reverse proxy: Nginx accepts the external request and forwards it to port 5601 on the loopback interface, so server.host can stay on localhost instead of 0.0.0.0. For production environments, a TLS/SSL certificate must also be installed on the proxy to encrypt the traffic between the user's browser and the server, as the Kibana interface exposes sensitive system and application data.
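
A reverse-proxy server block for this arrangement might look like the output of the sketch below. It assumes Nginx is installed and Kibana is listening on 127.0.0.1:5601; write_kibana_proxy and the hostname kibana.example.com are placeholders. The block listens on plain HTTP only, so a TLS listener would still need to be added before production use.

```shell
# write_kibana_proxy SERVER_NAME FILE : writes an Nginx server block that
# forwards requests for SERVER_NAME to Kibana on the loopback interface.
write_kibana_proxy() {
    server_name="$1"
    out_file="$2"
    cat > "$out_file" <<EOF
server {
    listen 80;
    server_name ${server_name};

    location / {
        proxy_pass http://127.0.0.1:5601;
        proxy_http_version 1.1;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# Usage:
#   write_kibana_proxy kibana.example.com /tmp/kibana.conf
#   sudo cp /tmp/kibana.conf /etc/nginx/sites-available/kibana
#   sudo ln -s /etc/nginx/sites-available/kibana /etc/nginx/sites-enabled/kibana
#   sudo nginx -t && sudo systemctl reload nginx
```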

Phase 5: Logstash and Filebeat Integration

While Elasticsearch stores data and Kibana visualizes it, Logstash (the "L" in ELK) and Filebeat (a member of the Beats family) are responsible for log processing and log shipping, respectively.

Logstash acts as a data processing pipeline. It can ingest data from multiple sources, transform it (such as parsing a raw string into structured JSON), and send it to Elasticsearch. Filebeat, on the other hand, is a lightweight shipper that resides on the edge of the network, reading log files and forwarding them to Logstash or directly to Elasticsearch.

In the deployment flow, Filebeat often acts as the first point of contact:

Filebeat reads /var/log/syslog.
Filebeat forwards the data to Logstash.
Logstash filters the data (e.g., using Grok patterns).
Logstash pushes the structured data into Elasticsearch.

This separation of concerns ensures that the heavy processing (filtering and transformation) happens in Logstash, while the lightweight shipping happens in Filebeat, minimizing the CPU impact on the source servers.
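
The four-step flow maps onto two small configuration files. The sketch below assumes both packages were installed from the same repository (for example with sudo apt install logstash filebeat) and writes the fragments to scratch files for review. The helper names, the ES_PASSWORD variable, and the CA copy at /etc/logstash/certs/http_ca.crt are assumptions; ssl_certificate_authorities is the option name in recent Logstash releases, while older ones use cacert.

```shell
# write_logstash_pipeline FILE : Beats input on 5044, a Grok filter for syslog
# lines, and an Elasticsearch output over HTTPS.
write_logstash_pipeline() {
    cat > "$1" <<'EOF'
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{SYSLOGLINE}" }
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    user => "elastic"
    password => "${ES_PASSWORD}"
    ssl_certificate_authorities => ["/etc/logstash/certs/http_ca.crt"]
  }
}
EOF
}

# write_filebeat_config FILE : reads /var/log/syslog and ships it to Logstash.
write_filebeat_config() {
    cat > "$1" <<'EOF'
filebeat.inputs:
  - type: filestream
    id: syslog
    paths:
      - /var/log/syslog

output.logstash:
  hosts: ["localhost:5044"]
EOF
}

# Usage:
#   write_logstash_pipeline /tmp/beats.conf   # review, then copy to /etc/logstash/conf.d/
#   write_filebeat_config /tmp/filebeat.yml   # review, then merge into /etc/filebeat/filebeat.yml
```

A dedicated, less privileged user is generally a better fit for the Logstash output than the elastic superuser; it is used here only to match the credentials created earlier.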

Deployment Alternatives: Docker and Orchestration

For users who prefer containerization over bare-metal installations, Elastic provides official Docker images via the Elastic Docker Registry. This approach is often preferred for scalability and ease of updates.

Docker Compose can start several containers, including multiple Elasticsearch nodes, with a single command, which is convenient for trying out a multi-node cluster. Note that Compose runs them all on one host; a highly available (HA) cluster needs nodes on separate hosts, which calls for an orchestrator or per-host deployments. The rule of version parity still applies: if the Docker image for Elasticsearch is version 9.3.3, the images for Kibana, Logstash, and Beats should also be 9.3.3.
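
With containers, parity is easiest to enforce by deriving every image reference from one version variable. The sketch below prints the references for the four components from the Elastic Docker Registry; elastic_images is a hypothetical helper, and the version passed to it is whatever release the deployment standardizes on.

```shell
# elastic_images VERSION : prints one image reference per component, all
# pinned to the same VERSION tag.
elastic_images() {
    version="$1"
    for repo in elasticsearch/elasticsearch kibana/kibana logstash/logstash beats/filebeat; do
        printf 'docker.elastic.co/%s:%s\n' "$repo" "$version"
    done
}

# Usage (STACK_VERSION is an assumed variable holding the chosen release):
#   elastic_images "$STACK_VERSION" | xargs -n 1 docker pull
```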

Summary of Operational Ports

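The stack uses the following ports. Open in the system firewall (e.g., UFW) only those that must be reachable from other machines; in the single-server setup described above, Elasticsearch is bound to localhost, so 9200 and 9300 do not need to be exposed: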

Port	Component	Purpose
9200	Elasticsearch	REST API / HTTP Communication
9300	Elasticsearch	Node-to-Node Transport
5601	Kibana	User Interface / Web Browser Access
5044	Logstash	Beats Input Port
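
As a starting point for the firewall, the sketch below prints (rather than runs) a pair of UFW commands so they can be reviewed first. ufw_rules is a hypothetical helper, and 203.0.113.0/24 stands in for whatever network the administrators connect from. If Nginx fronts Kibana, the proxy's port would be opened instead of 5601.

```shell
# ufw_rules ADMIN_CIDR : prints UFW commands for the ports that need remote
# access in a single-server deployment. Nothing is executed.
ufw_rules() {
    admin_cidr="$1"
    # Kibana UI: reachable only from the administrators' network.
    echo "ufw allow from ${admin_cidr} to any port 5601 proto tcp"
    # Logstash Beats input: remote Filebeat agents connect here.
    echo "ufw allow 5044/tcp"
    # 9200 and 9300 are intentionally omitted: with network.host set to
    # localhost, Elasticsearch only accepts local connections.
}

# Usage (review the output, then apply each rule):
#   ufw_rules 203.0.113.0/24
#   ufw_rules 203.0.113.0/24 | while read -r rule; do sudo $rule; done
```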

Conclusion: Strategic Analysis of the ELK Deployment

The deployment of the Elastic Stack on Ubuntu 22.04 is an exercise in resource management and security configuration. The two areas that most often decide whether an installation succeeds are the JVM heap settings and the security layer (TLS and authentication). A heap that is poorly matched to the available RAM (the -Xms and -Xmx settings) commonly causes the Elasticsearch service to stall or be killed during periods of heavy indexing.

From an architectural perspective, the transition from the traditional "ELK" terminology to the "Elastic Stack" reflects the integration of Beats, which solved the historical problem of Logstash's high resource consumption on client machines. By offloading the initial data collection to Filebeat, the system achieves a more efficient distribution of labor.

For those deploying in production, the use of Nginx as a reverse proxy for Kibana is not merely an option but a necessity for security and scalability. Implementing TLS/SSL certificates ensures that the administrative data flowing through the stack is not intercepted. Ultimately, the strength of the Elastic Stack is its ability to transform chaotic, unstructured log data into actionable intelligence, provided that the underlying infrastructure is configured with the precision detailed in this guide.