The Elastic Stack, historically referred to as the ELK stack (Elasticsearch, Logstash, Kibana), represents a robust, open-source ecosystem designed for the aggregation, storage, analysis, and visualization of massive volumes of log data. For system administrators and DevOps engineers managing Ubuntu 20.04 servers, deploying this stack requires precise configuration of underlying dependencies, repository integration, and security protocols. This deployment involves more than simple package installation; it demands a structured approach to network binding, JVM memory allocation, and role-based access control to ensure data integrity and system stability.
Architecture and Component Roles
The Elastic Stack operates on a distributed architecture where data flows from source systems through collection agents into a centralized search engine, finally visualized through a web interface. Understanding the specific role of each component is critical for proper resource allocation and troubleshooting.
- Elasticsearch: The core data store and search engine. It stores the collected data as JSON documents and exposes a distributed RESTful API for searching them. It acts as the central repository for all ingested logs.
- Logstash: A dynamic data processing pipeline. It collects and parses incoming data streams before forwarding them to Elasticsearch for indexing. It supports extensible plugins to handle diverse data formats.
- Kibana: The visualization dashboard. It provides a web interface for querying, exploring, and creating visualizations from the analyzed log data stored in Elasticsearch.
- Beats: A suite of lightweight data shippers. These agents reside on edge machines (clients) to aggregate application data and send it to Logstash or directly to Elasticsearch. Common variants include Filebeat for log files and Metricbeat for system metrics.
System Prerequisites and Java Installation
Before initiating the stack installation, the Ubuntu 20.04 server must meet specific hardware and software prerequisites. The system requires at least 4GB of RAM (8GB is recommended for production environments) and a minimum of two CPU cores. Root or sudo access is mandatory for system-level configurations.
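The minimums above can be checked before installing anything. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the 3800 MiB memory threshold is an assumption that allows for the small amount of RAM the kernel reserves on a nominal 4GB machine.

```bash
#!/usr/bin/env bash
# Sketch: compare a host against the stated minimums (2 CPU cores, 4GB RAM).

# Succeeds when the first integer is greater than or equal to the second.
at_least() {
  [ "$1" -ge "$2" ]
}

# Prints total physical memory in MiB, read from /proc/meminfo (Linux only).
total_mem_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Prints one OK/WARN line per requirement. Arguments: cores, memory in MiB.
report_prereqs() {
  local cores="$1" mem_mib="$2"
  if at_least "$cores" 2; then echo "OK cpu"; else echo "WARN cpu"; fi
  # 3800 rather than 4096 is an assumed tolerance, not an Elastic figure.
  if at_least "$mem_mib" 3800; then echo "OK memory"; else echo "WARN memory"; fi
}

# Usage on a live host:
#   report_prereqs "$(nproc)" "$(total_mem_mib)"
```

Keeping the comparison separate from the code that reads the host's values makes the thresholds easy to adjust for production sizing.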
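Elasticsearch runs on the Java Virtual Machine. The 7.x and 8.x packages ship with a bundled OpenJDK that Elasticsearch uses by default, so a separately installed Java runtime is not strictly required for Elasticsearch itself. Where a system-wide runtime is still wanted, for example for other JVM-based tooling on the host, earlier guides suggested OpenJDK 11, while current reference materials indicate support for Java 17. The installation updates the package index and installs the headless JRE to minimize overhead.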
bash
sudo apt update
sudo apt install openjdk-17-jre-headless -y
java -version
Repository Configuration and Package Installation
The Elastic Stack packages are not available in the standard Ubuntu repositories; they are published in a dedicated Elastic repository. Integrating this source requires importing the official GPG key, which lets apt verify package integrity, and then adding the repository URL to the apt sources list. This step applies to both 7.x and 8.x deployments, though the repository URL differs between major versions.
For Elasticsearch 8.x, the configuration involves creating a keyring and referencing it in the source list:
bash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch -y
For older versions like 7.x, the process uses apt-key (deprecated in newer apt releases but still available on Ubuntu 20.04) and a simpler repository string:
bash
sudo apt -y install gnupg
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt -y install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt -y install elasticsearch
Specific version pinning is supported by appending the version number to the install command, such as sudo apt -y install elasticsearch=7.10.2.
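Because the two repository strings differ only in the major version and the keyring reference, they can be produced by one small helper. This is a sketch under the assumption that the keyring path matches the 8.x commands above; `elastic_repo_line` is a hypothetical name, not part of any Elastic tooling.

```bash
#!/usr/bin/env bash
# Prints the apt source line for an Elastic major version (7 or 8),
# mirroring the two repository strings used in this guide.
elastic_repo_line() {
  local major="$1"
  case "$major" in
    8) echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" ;;
    7) echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" ;;
    *) echo "unsupported major version: $major" >&2; return 1 ;;
  esac
}

# Usage (overwrites rather than appends, so reruns do not duplicate the line):
#   elastic_repo_line 8 | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# After installing a pinned version, apt can be told not to upgrade it:
#   sudo apt-mark hold elasticsearch
```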
Elasticsearch Configuration and Resource Management
Proper configuration of Elasticsearch is vital for cluster stability. The primary configuration file is located at /etc/elasticsearch/elasticsearch.yml. Key parameters must be explicitly defined to ensure the service binds correctly to the network and identifies itself within the cluster.
yaml
cluster.name: elk-cluster
node.name: ELK20
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.16.0.3
http.port: 9200
cluster.initial_master_nodes: ["ELK20"]
xpack.security.enabled: true
The network.host parameter should be set to the server's internal IP address to restrict external access to the search engine. The node.name should match the hostname defined in the /etc/hosts file. Enabling xpack.security activates the built-in security features, including authentication and authorization.
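Before starting the service, it can help to confirm that the key settings are actually present and uncommented. The sketch below is one possible check; `check_es_config` is a hypothetical helper, and the list of required keys is an assumption based on the example configuration above.

```bash
#!/usr/bin/env bash
# Succeeds when each required key has a non-empty, uncommented value in the
# given settings file; prints the name of every key that is missing.
check_es_config() {
  local file="$1" missing=0 key
  for key in cluster.name node.name network.host http.port; do
    # Split each line at the first colon; commented lines start with "#"
    # and therefore never match the bare key name.
    if ! awk -v k="$key" -F': *' \
         '$1 == k && $2 != "" { found = 1 } END { exit !found }' "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the file is readable only by root and the elasticsearch group):
#   sudo bash -c "$(declare -f check_es_config); check_es_config /etc/elasticsearch/elasticsearch.yml"
```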
Memory management is another critical aspect. Depending on the version, Elasticsearch either ships a fixed default heap size in its JVM options or sizes the heap automatically from the memory available on the node. On servers with constrained resources, the heap can be set explicitly by editing the /etc/elasticsearch/jvm.options file:
bash
sudo nano /etc/elasticsearch/jvm.options
Locate the -Xms and -Xmx lines and set both to the same lower value, such as -Xms1g and -Xmx1g. After saving the file, reload the systemd unit files, then start the service and enable it at boot:
bash
sudo systemctl daemon-reload
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
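As a rough illustration of how a heap value might be chosen, the sketch below applies the commonly cited guidance of giving the heap no more than half of the machine's RAM and keeping it under roughly 31GB. Both function names are hypothetical, and the drop-in path in the usage comment assumes a release that reads /etc/elasticsearch/jvm.options.d/.

```bash
#!/usr/bin/env bash
# Prints a heap size in MiB: half of total RAM, capped at 31 GiB.
recommended_heap_mib() {
  local total_mib="$1"
  local cap_mib=$((31 * 1024))
  local half=$((total_mib / 2))
  if [ "$half" -gt "$cap_mib" ]; then echo "$cap_mib"; else echo "$half"; fi
}

# Prints matching -Xms/-Xmx lines for a heap size given in MiB.
# Minimum and maximum are set to the same value so the heap never resizes.
heap_options() {
  printf -- '-Xms%sm\n-Xmx%sm\n' "$1" "$1"
}

# Usage on a 4GB server (writes a drop-in file instead of editing jvm.options):
#   heap_options "$(recommended_heap_mib 4096)" | sudo tee /etc/elasticsearch/jvm.options.d/heap.options
```

A drop-in file has the advantage of surviving package upgrades that replace the main jvm.options file.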
Security Setup and Password Generation
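With security enabled, the next step is to initialize passwords for the built-in users. On 7.x this is done with the elasticsearch-setup-passwords utility; in 8.x that utility is deprecated, the elastic user's password is generated during package installation, and elasticsearch-reset-password is used to reset individual users. For 7.x, navigate to the Elasticsearch installation directory and run the auto-generation command: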
bash
cd /usr/share/elasticsearch/
bin/elasticsearch-setup-passwords auto -u "http://172.16.0.3:9200"
Upon confirmation, the utility generates random passwords for reserved users including elastic, apm_system, kibana_system, logstash_system, beats_system, and remote_monitoring_user. The passwords are displayed only once and must be stored securely, because later components such as Kibana and Logstash need them to authenticate against the Elasticsearch cluster.
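If the utility's output is captured to a file, individual passwords can be pulled out of it when configuring the other components. This sketch assumes the output contains lines of the form `PASSWORD <user> = <password>`; verify that against the actual output of the installed version. The helper name and the file path in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Prints the password recorded for one user in saved utility output.
# Assumed line format: "PASSWORD <user> = <password>".
extract_password() {
  local user="$1" file="$2"
  awk -v u="$user" '$1 == "PASSWORD" && $2 == u && $3 == "=" { print $4 }' "$file"
}

# Usage, assuming the output was saved to a root-only file:
#   sudo chmod 600 /root/es-passwords.txt
#   sudo bash -c "$(declare -f extract_password); extract_password kibana_system /root/es-passwords.txt"
```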
Kibana Deployment and Nginx Reverse Proxy
Kibana serves as the visualization interface. While Kibana can run directly, it is best practice to place it behind Nginx as a reverse proxy. This setup offloads SSL termination and provides an additional layer of security and performance optimization.
bash
sudo apt install kibana nginx -y
The Kibana configuration file (/etc/kibana/kibana.yml) must be configured to point to the Elasticsearch instance:
yaml
server.port: 5601
server.host: "172.16.0.3"
elasticsearch.hosts: ["http://172.16.0.3:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "generated_password_here"
After configuration, the Kibana service is enabled and started:
bash
sudo systemctl daemon-reload
sudo systemctl start kibana
sudo systemctl enable kibana
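With Kibana listening on port 5601, an Nginx server block can forward requests to it. The following is a minimal sketch rather than a hardened configuration: it listens on plain HTTP only, so the SSL termination mentioned earlier would still need a certificate and a `listen 443 ssl` block. The function name and the `kibana.example.com` server name are hypothetical.

```bash
#!/usr/bin/env bash
# Writes an Nginx server block that proxies all requests to Kibana.
# Arguments: destination file, server name, upstream host:port.
write_kibana_proxy() {
  local dest="$1" server_name="$2" upstream="$3"
  # Nginx variables are escaped so the shell leaves them for Nginx to expand.
  cat > "$dest" <<EOF
server {
    listen 80;
    server_name ${server_name};

    location / {
        proxy_pass http://${upstream};
        proxy_http_version 1.1;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# Usage:
#   write_kibana_proxy /tmp/kibana.conf kibana.example.com 172.16.0.3:5601
#   sudo install -m 644 /tmp/kibana.conf /etc/nginx/sites-available/kibana
#   sudo ln -s /etc/nginx/sites-available/kibana /etc/nginx/sites-enabled/kibana
#   sudo nginx -t && sudo systemctl reload nginx
```

Running `nginx -t` before the reload catches syntax errors without interrupting a running proxy.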
Logstash and Beats Integration
Logstash acts as the data pipeline. It ingests data from Beats agents (like Filebeat) or directly from syslog sources, processes the data through filters, and outputs it to Elasticsearch. The architecture follows a clear flow: Log Sources → Beats/Logstash → Elasticsearch → Kibana → Users.
Filebeat is configured to ship logs from client machines. A typical Filebeat configuration includes:
yaml
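filebeat.inputs:
  - type: log
    paths:
      - /var/log/auth.log
output.elasticsearch:
  hosts: ["http://172.16.0.3:9200"]
  # The built-in beats_system user is reserved for monitoring and cannot
  # write event indices. "filebeat_writer" is a hypothetical user that has
  # been granted write access to the filebeat-* indices.
  username: "filebeat_writer"
  password: "generated_password_here"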
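When Filebeat ships to Logstash rather than directly to Elasticsearch, Logstash needs a pipeline with a Beats input and an Elasticsearch output. The sketch below writes such a pipeline to a file. It omits the filter stage for brevity, and it assumes a hypothetical `logstash_writer` user with write access to the target indices whose password is supplied through a `LOGSTASH_WRITER_PASSWORD` environment variable. The function name and the destination file name in the usage comment are also illustrative.

```bash
#!/usr/bin/env bash
# Writes a minimal Logstash pipeline: Beats in, Elasticsearch out.
# Argument: destination file.
write_beats_pipeline() {
  local dest="$1"
  # The heredoc is quoted so the shell leaves ${...} and %{...} untouched;
  # Logstash resolves them when it loads the pipeline.
  cat > "$dest" <<'EOF'
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://172.16.0.3:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    # Hypothetical user; must be created with write access to these indices.
    user => "logstash_writer"
    password => "${LOGSTASH_WRITER_PASSWORD}"
  }
}
EOF
}

# Usage:
#   write_beats_pipeline /tmp/beats.conf
#   sudo install -m 644 /tmp/beats.conf /etc/logstash/conf.d/beats.conf
# Filebeat would then use output.logstash with hosts: ["172.16.0.3:5044"]
# in place of output.elasticsearch.
```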
Index Patterns and Data Visualization
Once data begins flowing from Filebeat to Elasticsearch, Kibana must be configured to recognize and visualize this data. This is done by creating an Index Pattern. In the Kibana interface:
- Navigate to Stack Management > Index Patterns (named Data Views in Kibana 8.x).
- Click Create index pattern.
- Enter filebeat-* in the index pattern field.
- Select @timestamp as the time filter field.
- Confirm creation.
To query specific data, use Kibana Query Language (KQL). For example, to filter SSH authentication logs from a specific host:
kql
host.name : client01 and log.file.path: "/var/log/auth.log"
This query returns every event that Filebeat read from /var/log/auth.log on the machine identified as client01. That file records SSH authentication events alongside other authentication activity such as sudo usage, so seeing results here confirms that the stack works end to end.
Conclusion
Deploying the Elastic Stack on Ubuntu 20.04 is a multi-faceted process that integrates system-level configuration, security enforcement, and data pipeline architecture. Success depends on precise repository management, accurate JVM resource allocation, and strict adherence to security protocols during password generation. By configuring Nginx as a reverse proxy and establishing correct index patterns, administrators create a robust observability platform capable of handling high-volume log data. This setup not only facilitates real-time monitoring but also provides the foundational infrastructure for advanced analytics and incident response in modern IT environments.