Deploying the Elastic Stack on Ubuntu 20.04: Architecture, Configuration, and Security

The Elastic Stack, historically referred to as the ELK stack (Elasticsearch, Logstash, Kibana), represents a robust, open-source ecosystem designed for the aggregation, storage, analysis, and visualization of massive volumes of log data. For system administrators and DevOps engineers managing Ubuntu 20.04 servers, deploying this stack requires precise configuration of underlying dependencies, repository integration, and security protocols. This deployment involves more than simple package installation; it demands a structured approach to network binding, JVM memory allocation, and role-based access control to ensure data integrity and system stability.

Architecture and Component Roles

The Elastic Stack operates on a distributed architecture where data flows from source systems through collection agents into a centralized search engine, finally visualized through a web interface. Understanding the specific role of each component is critical for proper resource allocation and troubleshooting.

Elasticsearch: The core database and search engine. It stores the collected data as JSON documents and exposes a distributed RESTful API for searching them. It acts as the central repository for all ingested logs.
Logstash: A dynamic data processing pipeline. It collects and parses incoming data streams before forwarding them to Elasticsearch for indexing. It supports extensible plugins to handle diverse data formats.
Kibana: The visualization dashboard. It provides a web interface for querying, exploring, and creating visualizations from the analyzed log data stored in Elasticsearch.
Beats: A suite of lightweight data shippers. These agents reside on edge machines (clients) to aggregate application data and send it to Logstash or directly to Elasticsearch. Common variants include Filebeat for log files and Metricbeat for system metrics.
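As a quick reference for the data flow described above, the following sketch maps each server-side component to the port it conventionally listens on. The function name default_port is made up for illustration. 9200 and 5601 are the package defaults for Elasticsearch and Kibana; 5044 is only the conventional port for a Logstash Beats input and has to be configured in a pipeline.

```bash
# default_port COMPONENT -> prints the conventional listening port.
# default_port is a hypothetical helper, not part of any Elastic tool.
default_port() {
  case "$1" in
    elasticsearch) echo 9200 ;;  # REST API; node-to-node transport uses 9300
    kibana)        echo 5601 ;;  # web interface
    logstash)      echo 5044 ;;  # conventional Beats input port (set in the pipeline)
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Example: list the ports that internal hosts need to reach.
#   for c in elasticsearch kibana logstash; do
#     echo "$c -> $(default_port "$c")"
#   done
```

A table like this is mainly useful when writing firewall rules between the clients running Beats and the central server.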

System Prerequisites and Java Installation

Before initiating the stack installation, the Ubuntu 20.04 server must meet specific hardware and software prerequisites. The system requires at least 4GB of RAM (8GB is recommended for production environments) and a minimum of two CPU cores. Root or sudo access is mandatory for system-level configurations.
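The hardware floor above can be checked before installing anything. The sketch below is a minimal example; the function name meets_minimum and the 3800 MB threshold are assumptions (a nominal 4 GB machine reports slightly less than 4096 MB of usable memory).

```bash
# meets_minimum MEM_MB CORES
# Returns 0 when the host meets the stated floor (about 4 GB RAM, 2 cores),
# otherwise prints each unmet requirement and returns 1.
# The 3800 MB threshold is an assumption that tolerates kernel-reserved memory.
meets_minimum() {
  local mem_mb=$1 cores=$2 status=0
  if [ "$mem_mb" -lt 3800 ]; then
    echo "insufficient RAM: ${mem_mb} MB"
    status=1
  fi
  if [ "$cores" -lt 2 ]; then
    echo "insufficient CPU cores: ${cores}"
    status=1
  fi
  return "$status"
}

# On the target server, feed in the real values:
#   mem_mb=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
#   meets_minimum "$mem_mb" "$(nproc)" && echo "prerequisites met"
```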

Elasticsearch runs on the Java Virtual Machine. The 7.x and 8.x packages ship with a bundled OpenJDK, so a system-wide Java installation is optional for Elasticsearch itself and mainly matters when other tooling on the host needs it. While earlier guides suggested OpenJDK 11, current reference materials indicate support for Java 17. Installing it involves updating the package index and installing the headless JRE to minimize overhead.

```bash
sudo apt update
sudo apt install openjdk-17-jre-headless -y
java -version
```

Repository Configuration and Package Installation

Unlike standard Ubuntu repositories, the Elastic Stack packages reside in a dedicated Elastic repository. Integrating this source requires importing the official GPG key to ensure package integrity, followed by adding the repository URL to the apt sources list. This step is critical for both version 7.x and 8.x deployments, though the repository URL structure differs slightly between major versions.

For Elasticsearch 8.x, the configuration involves creating a keyring and referencing it in the source list:

```bash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch -y
```

For older versions like 7.x, the process uses apt-key and a simpler repository string:

```bash
sudo apt -y install gnupg
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt -y install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt -y install elasticsearch
```

Specific version pinning is supported by appending the version number to the install command, such as sudo apt -y install elasticsearch=7.10.2.
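Installing a specific version does not stop a later apt upgrade from replacing it. One way to keep the version fixed is an apt preferences stanza; the sketch below prints one. The helper name pin_stanza and the priority of 1001 are choices made for this example, not something the Elastic packages require.

```bash
# pin_stanza PACKAGE VERSION -> prints an apt preferences stanza that
# pins PACKAGE to VERSION. A priority above 1000 makes apt keep that
# version even when a newer one is available in the repository.
pin_stanza() {
  printf 'Package: %s\nPin: version %s\nPin-Priority: 1001\n' "$1" "$2"
}

# Example, using the version from the install command above:
#   pin_stanza elasticsearch 7.10.2 | sudo tee /etc/apt/preferences.d/elasticsearch
# A simpler alternative is: sudo apt-mark hold elasticsearch
```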

Elasticsearch Configuration and Resource Management

Proper configuration of Elasticsearch is vital for cluster stability. The primary configuration file is located at /etc/elasticsearch/elasticsearch.yml. Key parameters must be explicitly defined to ensure the service binds correctly to the network and identifies itself within the cluster.

```yaml
cluster.name: elk-cluster
node.name: ELK20
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 172.16.0.3
http.port: 9200
cluster.initial_master_nodes: ["ELK20"]
xpack.security.enabled: true
```

The network.host parameter should be set to the server's internal IP address so that the search engine is not reachable on public interfaces. The node.name value is a free-form label, conventionally the server's hostname, and it must appear verbatim in cluster.initial_master_nodes for the node to bootstrap the cluster. Setting xpack.security.enabled to true activates the built-in security features, including authentication and authorization.
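A mismatch between node.name and cluster.initial_master_nodes is an easy mistake to make when copying this configuration to another host. The sketch below checks for it with plain sed; the helper names yml_value and check_master_nodes are hypothetical, and the parsing only handles flat "key: value" lines like the ones shown above, not general YAML.

```bash
# yml_value FILE KEY -> value of the first flat "KEY: value" line in FILE.
yml_value() {
  local key
  key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -n "/^${key}:/{s/^${key}:[[:space:]]*//p;q;}" "$1"
}

# check_master_nodes FILE -> returns 0 when node.name is listed (quoted)
# in cluster.initial_master_nodes, otherwise explains the mismatch.
check_master_nodes() {
  local file=$1 name masters
  name=$(yml_value "$file" node.name)
  masters=$(yml_value "$file" cluster.initial_master_nodes)
  if [ -z "$name" ]; then
    echo "node.name is not set in $file" >&2
    return 1
  fi
  case "$masters" in
    *"\"$name\""*) return 0 ;;
    *) echo "node.name '$name' not found in cluster.initial_master_nodes: $masters" >&2
       return 1 ;;
  esac
}

# Example:
#   sudo bash -c 'source ./check.sh; check_master_nodes /etc/elasticsearch/elasticsearch.yml'
```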

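Memory management is another critical aspect. The default JVM heap depends on the release: versions before 7.11 ship with a fixed 1 GB heap in jvm.options, while 7.11 and later size the heap automatically from the node's roles and total memory. To set the heap explicitly, for instance to cap it on a server with constrained resources, edit the /etc/elasticsearch/jvm.options file: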

```bash
sudo nano /etc/elasticsearch/jvm.options
```

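Locate the -Xms and -Xmx lines (uncomment them if they are commented out) and set both to the same value, such as -Xms1g and -Xmx1g, keeping the heap at no more than half of the server's RAM. After saving the file, reload systemd, then start and enable the service: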

```bash
sudo systemctl daemon-reload
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
```
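A common rule of thumb is to give the heap half of the machine's RAM, with equal minimum and maximum values. The sketch below computes that figure; the helper names and the 31 GB cap (chosen to stay under the compressed object pointer threshold) are assumptions for this example. Recent 7.x and 8.x packages also read override files from /etc/elasticsearch/jvm.options.d/, which avoids editing jvm.options itself; verify that the directory exists on the installed version before relying on it.

```bash
# heap_mb TOTAL_MB -> suggested heap in MB: half of RAM, capped at 31 GB.
# The cap is an assumption for this sketch.
heap_mb() {
  local half=$(( $1 / 2 )) cap=$(( 31 * 1024 ))
  if [ "$half" -gt "$cap" ]; then echo "$cap"; else echo "$half"; fi
}

# heap_options TOTAL_MB -> the two JVM flags with equal min and max heap.
heap_options() {
  local mb
  mb=$(heap_mb "$1")
  printf -- '-Xms%sm\n-Xmx%sm\n' "$mb" "$mb"
}

# Example for a 4 GB server:
#   heap_options 4096 | sudo tee /etc/elasticsearch/jvm.options.d/heap.options
#   sudo systemctl restart elasticsearch
```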

Security Setup and Password Generation

With security enabled, the next step is to initialize passwords for the built-in users. In 7.x this is done with the elasticsearch-setup-passwords utility (deprecated in 8.0 in favor of elasticsearch-reset-password). With Elasticsearch running, navigate to the installation directory and execute the auto-generation command:

```bash
cd /usr/share/elasticsearch/
sudo bin/elasticsearch-setup-passwords auto -u "http://172.16.0.3:9200"
```

Upon confirmation, the utility generates random passwords for the reserved users, including elastic, apm_system, kibana_system (formerly kibana), logstash_system, beats_system, and remote_monitoring_user. The passwords are printed only once, so they must be stored securely; subsequent components such as Kibana and Logstash need them to authenticate against the Elasticsearch cluster.
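Because the generated passwords are shown only once, it helps to capture the output in a root-only file and read individual credentials from it later. The sketch below assumes the utility prints lines of the form "PASSWORD user = value", as the 7.x releases do; the helper name extract_password and the file path /root/es-passwords.txt are hypothetical.

```bash
# extract_password USER -> reads setup-passwords output on stdin and
# prints the password generated for USER (nothing if USER is absent).
# Assumes lines shaped like: PASSWORD kibana_system = <value>
extract_password() {
  awk -v user="$1" '$1 == "PASSWORD" && $2 == user && $3 == "=" { print $4 }'
}

# Example: capture the output once with restrictive permissions, then query it.
#   cd /usr/share/elasticsearch/
#   sudo sh -c 'umask 077; bin/elasticsearch-setup-passwords auto -b \
#     -u "http://172.16.0.3:9200" > /root/es-passwords.txt'
#   sudo cat /root/es-passwords.txt | extract_password kibana_system
```

The -b flag runs the utility in batch mode so that it does not wait for interactive confirmation.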

Kibana Deployment and Nginx Reverse Proxy

Kibana serves as the visualization interface. While Kibana can run directly, it is best practice to place it behind Nginx as a reverse proxy. This setup offloads SSL termination and provides an additional layer of security and performance optimization.

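```bash
# Kibana is installed from the Elastic repository configured earlier
sudo apt install kibana -y
sudo apt install nginx -y
```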

The Kibana configuration file (/etc/kibana/kibana.yml) must be configured to point to the Elasticsearch instance:

```yaml
server.port: 5601
server.host: "172.16.0.3"
elasticsearch.hosts: ["http://172.16.0.3:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "generated_password_here"
```

After configuration, the Kibana service is enabled and started:

```bash
sudo systemctl daemon-reload
sudo systemctl start kibana
sudo systemctl enable kibana
```
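The reverse proxy itself still needs an Nginx server block that forwards requests to Kibana. The sketch below prints a minimal one. The helper name kibana_proxy_conf and the hostname kibana.example.com are placeholders, and the block listens on plain HTTP only; the TLS directives (certificate paths, listen 443 ssl) are omitted and would have to be added for SSL termination.

```bash
# kibana_proxy_conf SERVER_NAME UPSTREAM_URL -> prints an Nginx server
# block that proxies all requests to Kibana. HTTP only; TLS is omitted.
kibana_proxy_conf() {
  local server_name=$1 upstream=$2
  cat <<EOF
server {
    listen 80;
    server_name ${server_name};

    location / {
        proxy_pass ${upstream};
        proxy_http_version 1.1;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# Example, using the Kibana address from kibana.yml:
#   kibana_proxy_conf kibana.example.com http://172.16.0.3:5601 \
#     | sudo tee /etc/nginx/sites-available/kibana
#   sudo ln -s /etc/nginx/sites-available/kibana /etc/nginx/sites-enabled/kibana
#   sudo nginx -t && sudo systemctl reload nginx
```

Running nginx -t before the reload catches syntax errors without interrupting a working proxy.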

Logstash and Beats Integration

Logstash acts as the data pipeline. It ingests data from Beats agents (like Filebeat) or directly from syslog sources, processes the data through filters, and outputs it to Elasticsearch. The architecture follows a clear flow: Log Sources → Beats/Logstash → Elasticsearch → Kibana → Users.
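For the Beats to Logstash to Elasticsearch path, Logstash needs a pipeline definition. The sketch below prints a minimal one without a filter stage. The helper name logstash_beats_pipeline, the user logstash_writer, and the ES_PWD variable are all hypothetical: the built-in logstash_system user is meant for monitoring data, so a separate user with write privileges on the target indices is assumed, and its password is expected to come from the environment or the Logstash keystore rather than the file.

```bash
# logstash_beats_pipeline ES_URL USER -> prints a Logstash pipeline that
# accepts Beats connections on 5044 and indexes events into Elasticsearch.
# USER is assumed to have write privileges; \${ES_PWD} is emitted literally
# so that Logstash resolves it from its environment or keystore at startup.
logstash_beats_pipeline() {
  local es_url=$1 user=$2
  cat <<EOF
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["${es_url}"]
    user => "${user}"
    password => "\${ES_PWD}"
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
EOF
}

# Example:
#   logstash_beats_pipeline http://172.16.0.3:9200 logstash_writer \
#     | sudo tee /etc/logstash/conf.d/beats.conf
```

With this index setting, events shipped by Filebeat land in indices whose names start with filebeat-, the same prefix a direct Filebeat to Elasticsearch setup produces.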

Filebeat is configured to ship logs from client machines. A typical Filebeat configuration includes:

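```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/auth.log

output.elasticsearch:
  hosts: ["http://172.16.0.3:9200"]
  # The built-in beats_system user is intended for monitoring data. If
  # Elasticsearch rejects indexing requests with a 403, switch to a user
  # whose role is allowed to write to filebeat-* indices.
  username: "beats_system"
  password: "generated_password_here"
```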

Index Patterns and Data Visualization

Once data begins flowing from Filebeat to Elasticsearch, Kibana must be configured to recognize and visualize this data. This is done by creating an Index Pattern. In the Kibana interface:

Navigate to Stack Management > Index Patterns (Kibana 8.x renames these to Data Views).
Click Create index pattern.
Enter filebeat-* in the index pattern field.
Select @timestamp as the time filter field.
Confirm creation.

To query specific data, use Kibana Query Language (KQL). For example, to filter the authentication log of a specific host:

```kql
host.name : client01 and log.file.path: "/var/log/auth.log"
```

This query retrieves every event that Filebeat read from /var/log/auth.log on the machine identified as client01, which includes SSH authentication attempts alongside other authentication activity such as sudo, demonstrating the end-to-end functionality of the stack.
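The same filter can be sent straight to Elasticsearch, which is handy for checking that data arrived before any index pattern exists. The sketch below prints a query body that is roughly equivalent to the KQL expression, assuming host.name and log.file.path are mapped as keyword fields, as the default Filebeat template does. The helper name auth_log_query is hypothetical, and the arguments are assumed to contain no characters that need JSON escaping.

```bash
# auth_log_query HOST PATH -> prints an Elasticsearch query body that
# matches events from HOST whose source file is PATH.
# Assumes both fields are keyword-mapped, so exact term matching applies.
auth_log_query() {
  local host=$1 path=$2
  cat <<EOF
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "host.name": "${host}" } },
        { "term": { "log.file.path": "${path}" } }
      ]
    }
  }
}
EOF
}

# Example (curl prompts for the elastic user's password):
#   auth_log_query client01 /var/log/auth.log \
#     | curl -s -u elastic -H 'Content-Type: application/json' \
#         "http://172.16.0.3:9200/filebeat-*/_search?pretty" -d @-
```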

Conclusion

Deploying the Elastic Stack on Ubuntu 20.04 is a multi-faceted process that integrates system-level configuration, security enforcement, and data pipeline architecture. Success depends on precise repository management, accurate JVM resource allocation, and strict adherence to security protocols during password generation. By configuring Nginx as a reverse proxy and establishing correct index patterns, administrators create a robust observability platform capable of handling high-volume log data. This setup not only facilitates real-time monitoring but also provides the foundational infrastructure for advanced analytics and incident response in modern IT environments.