Centralized Observability Architecture for OPNsense via Grafana and Loki Integration

A robust monitoring and logging infrastructure is an important part of operating OPNsense firewalls securely. In enterprise networks and high-availability home labs alike, administrators need to move beyond the localized, ephemeral view of firewall logs toward a persistent, searchable, and visually aggregated telemetry stack. This architecture turns the OPNsense device from a standalone security appliance into a telemetry producer within a wider observability ecosystem. By applying the principles of the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir), and specifically by using Grafana for visualization and Loki for log aggregation, administrators gain visibility into traffic patterns, security events, and hardware health. The setup involves configuring a Prometheus exporter for metric collection and a Promtail agent for log ingestion, which requires an understanding of both the OPNsense logging mechanisms and the Prometheus/Loki pipeline. The objective is a single view in which firewall rule activity, interface statistics, and system resource utilization can be correlated in near real time, supporting rapid incident response and proactive capacity planning.

Telemetry Ingestion via Prometheus and Node Exporter

The foundation of metric-based monitoring for OPNsense lies in the implementation of a pull-based collection mechanism. This is achieved through the deployment of the Prometheus Node Exporter plugin directly on the OPNsense appliance. This plugin serves as the primary exporter of hardware and system-level metrics, exposing them via a web endpoint that the Prometheus server can scrape periodically.

The deployment process follows a structured three-step methodology to ensure the integrity of the data stream:

  1. Installation of the Prometheus Node Exporter Plugin on the OPNsense hardware.
  2. Modification of the Prometheus scrape configuration to include the new target.
  3. Implementation of the dashboard within the Grafana interface for visualization.

For the Prometheus server to successfully ingest these metrics, the configuration must be precisely defined. A critical requirement for multi-firewall environments is the naming convention of the scrape jobs. Each job name must begin with the prefix opnsense- followed by a unique identifier. This allows for the simultaneous monitoring of multiple firewall instances on a single dashboard without data collision.

The configuration fragment for the Prometheus prometheus.yml file must be structured as follows:

```yaml
scrape_configs:
  - job_name: opnsense-<OPNSense_NAME>
    static_configs:
      - targets: ['<OPNSense_IP_or_FQDN>:9100']
```

In this configuration, <OPNSense_NAME> must be substituted with the name of the firewall, so that the complete job name reads, for example, opnsense-edge-gateway. The <OPNSense_IP_or_FQDN> must be replaced with the actual IP address or Fully Qualified Domain Name of the device. Port 9100 is the default port on which the Node Exporter exposes its metrics.
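For deployments with several firewalls, the fragment above can be generated instead of written by hand. The following is a minimal sketch, not part of the original setup: the function name `render_scrape_configs` and the firewall names and addresses are hypothetical, and the output simply repeats the job layout shown above once per firewall.

```python
def render_scrape_configs(firewalls, port=9100):
    """Render a scrape_configs fragment with one opnsense-<name> job per firewall.

    `firewalls` maps a firewall name to its IP address or FQDN.
    """
    lines = ["scrape_configs:"]
    for name, host in sorted(firewalls.items()):
        # The dashboard convention requires every job name to start with "opnsense-".
        lines.append(f"  - job_name: opnsense-{name}")
        lines.append("    static_configs:")
        lines.append(f"      - targets: ['{host}:{port}']")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical firewall names and addresses, for illustration only.
    print(render_scrape_configs({"edge": "192.0.2.1", "lab": "fw-lab.example.net"}))
```

Sorting the names keeps the generated file stable between runs, which makes changes easier to review.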

The impact of this configuration extends beyond simple visibility; it enables the creation of dynamic dashboards that can iterate through different firewall targets using Grafana variables. This architecture supports complex metric tracking, including:

  • CPU load totals and per-core utilization percentages.
  • Memory and RAM utilization time-series graphs.
  • Disk utilization and storage health.
  • Load average and system uptime tracking.
  • CPU and ACPI temperature sensor readings for hardware thermal monitoring.
  • Gateway response times, specifically monitoring the dpinger process.
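Before building dashboards, it can help to confirm that the exporter actually exposes the expected series. The sketch below parses text in the Prometheus exposition format and picks out selected metrics. It assumes the text was fetched beforehand from the exporter's /metrics endpoint on port 9100, and that lines carry no trailing timestamp. The sample values and the helper name `read_samples` are illustrative only.

```python
def read_samples(exposition_text, wanted):
    """Return {series: value} for every series whose metric name is in `wanted`."""
    samples = {}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token (assumes no timestamp).
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in wanted:
            samples[series] = float(value)
    return samples


# Illustrative sample, not captured from a real appliance.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.25
"""

if __name__ == "__main__":
    print(read_samples(SAMPLE, {"node_load1", "node_cpu_seconds_total"}))
```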

Advanced Log Aggregation with Grafana Loki and Promtail

While Prometheus handles the quantitative metrics, the qualitative security data resides in the firewall logs. Traditional log viewing in OPNsense is limited to the "Live View" or the "Plain View" within the local web interface. To achieve long-term retention and advanced querying, logs must be forwarded to a centralized Grafana Loki instance.

The architecture for this log pipeline uses a multi-tier component stack, typically hosted on a dedicated Ubuntu server. The components function in a linear sequence:

  • OPNsense Firewall: Acts as the log generator and remote logging client.
  • Promtail: Acts as the intermediary agent, receiving logs from OPNsense and performing pre-processing.
  • Loki: Acts as the high-efficiency, long-term storage engine that indexes metadata.
  • Grafana: Acts as the visualization layer for querying the Loki data source.

Loki differs fundamentally from traditional logging systems like Elasticsearch because it does not index the entire content of the log lines. Instead, it indexes only the metadata (labels). This design philosophy significantly reduces the computational overhead and storage costs associated with high-volume logging, though it necessitates a more structured approach to label management.

To initiate the log forwarding, the OPNsense configuration must be adjusted under System -> Settings -> Logging. A new "Remote Target" must be created, specifying the IP address and port of the Promtail agent running on the Loki server.
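Once the remote target exists, a hand-crafted test message can confirm that the receiving port is reachable before real firewall traffic is involved. This is a minimal sketch under stated assumptions: it assumes the Promtail syslog listener is configured for UDP and accepts RFC5424-formatted messages, and the hostname, application name, and function names are hypothetical. A TCP listener would need a stream socket instead.

```python
import socket
from datetime import datetime, timezone


def build_syslog_message(msg, hostname="opnsense-test", app="filterlog",
                         facility=16, severity=6):
    """Build an RFC5424-style syslog message as bytes."""
    pri = facility * 8 + severity  # local0.info gives 134
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    return f"<{pri}>1 {ts} {hostname} {app} - - - {msg}".encode("utf-8")


def send_test_message(host, port, msg):
    """Send one test message over UDP and return the bytes that were sent."""
    payload = build_syslog_message(msg)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

If the message shows up in the Grafana Explore view, the network path and listener are working, and any remaining problems are likely in the OPNsense target settings or the pipeline stages.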

The heavy lifting of data structuring occurs within the Promtail configuration through the use of pipeline_stages. Because Loki relies on labels for efficient searching, Promtail must be configured to parse the raw, unstructured log lines and extract specific fields into labels. A typical OPNsense firewall log line follows a specific comma-separated format:

1142,,,03b6331b884ca335cbc0e2f022fe07a2,vlan0.100,match,pass,in,4,0x0,,64,9281,0,DF,6,tcp,60,192.168.100.15,192.168.1.251,38336,9100,0,S,2879130891,,64240,,mss;sackOK;TS;nop;wscale

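To make this data actionable in Grafana, a regex stage within the Promtail pipeline must map these comma-separated values to named fields. The field layout below shows which position each name occupies; in the regex itself, each name becomes a named capture group and the empty positions are skipped. Note that the direction field (in or out), shown here as <dir>, sits between <action> and <ipversion> in the sample line above:

```text
<rule>,,,<rid>,<interface>,<reason>,<action>,<dir>,<ipversion>,<tos>,,<ttl>,<id>,<offset>,<ipflags>,<protonum>,<proto>,<length>,<src>,<dst>,<srcport>,<dstport>,<datalen>,<tcpflags>,<sequence>,,
```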

By applying this pattern, fields such as <interface>, <action>, <proto>, <src>, and <dst> are converted from raw text into searchable labels. This enables an administrator to execute high-speed queries in the Grafana "Explore" view, such as finding all "pass" actions on the "vlan0.100" interface or filtering for all traffic originating from a specific source IP.
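The field extraction can be prototyped outside Promtail before it goes into the pipeline. The sketch below builds a regular expression with named capture groups from the field positions in the sample IPv4 TCP line and applies it to that line. It is an approximation, not the exact expression from any published configuration: the field list, including the name `dir` for the direction value that follows the action, is inferred from the sample, and other protocols produce a different number of fields. Python and Promtail's regex stage both accept the `(?P<name>...)` group syntax, so a pattern tested this way can serve as a starting point.

```python
import re

# Field names by position; None marks a position that is skipped.
# Inferred from the sample IPv4 TCP line; an assumption, not an official schema.
FIELDS = [
    "rule", None, None, "rid", "interface", "reason", "action", "dir",
    "ipversion", "tos", None, "ttl", "id", "offset", "ipflags", "protonum",
    "proto", "length", "src", "dst", "srcport", "dstport", "datalen",
    "tcpflags", "sequence",
]


def build_pattern(fields):
    """Join one comma-free matcher per position, named where a label is wanted."""
    parts = [f"(?P<{name}>[^,]*)" if name else "[^,]*" for name in fields]
    return "^" + ",".join(parts)


PATTERN = re.compile(build_pattern(FIELDS))

SAMPLE = (
    "1142,,,03b6331b884ca335cbc0e2f022fe07a2,vlan0.100,match,pass,in,4,0x0,,"
    "64,9281,0,DF,6,tcp,60,192.168.100.15,192.168.1.251,38336,9100,0,S,"
    "2879130891,,64240,,mss;sackOK;TS;nop;wscale"
)


def parse_filterlog(line):
    """Return a dict of named fields, or None if the line has too few fields."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    record = parse_filterlog(SAMPLE)
    print(record["interface"], record["action"], record["src"], record["dstport"])
```

Printing `build_pattern(FIELDS)` yields the expression text, which can be reviewed and adjusted before it is used in a pipeline stage.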

Dashboard Configuration and Metric Visualization

The final stage of the observability pipeline is the deployment of highly specialized Grafana dashboards. These dashboards serve as the visual interface for the entire telemetry stack, aggregating data from both Prometheus (metrics) and Loki (logs).

Effective OPNsense dashboards require specific configuration steps to handle the dynamic nature of network interfaces and firewall targets:

  • Import the dashboard.json file into the Grafana instance.
  • Configure the Prometheus data source to point to the local Prometheus server.
  • Configure the Loki data source to point to the Loki instance.
  • Configure Dashboard Variables to allow for switching between different interfaces (e.g., WAN1, WAN2, LAN1, LAN2).
  • Perform cleanup of unused panels that do not correspond to the specific hardware configuration of the user's deployment.

Advanced monitoring dashboards can be expanded to include more granular data by combining the os-node_exporter plugin with the opnsense-exporter repository. This allows for the inclusion of:

  • WAN and LAN statistics, including traffic throughput and volume.
  • Firewall-specific statistics, such as the number of blocked ports, protocol distribution, and event counts.
  • Geographical identification of blocked IP locations.
  • Top Blocked IP lists for identifying potential scanning or brute-force attempts.
  • Suricata IDS/IPS statistics, if the Suricata service is enabled on the OPNsense instance.

The following table summarizes the core components required for a complete OPNsense monitoring stack:

| Component  | Role             | Primary Function                                                        |
|------------|------------------|-------------------------------------------------------------------------|
| OPNsense   | Data Source      | Generates firewall logs and exposes system metrics via Node Exporter.   |
| Prometheus | Metric Collector | Scrapes and stores time-series data from the Node Exporter.             |
| Promtail   | Log Processor    | Receives logs from OPNsense, parses them via regex, and attaches labels. |
| Loki       | Log Engine       | Stores the processed log lines and indexes only their labels.           |
| Grafana    | Visualization    | Provides the dashboard interface, Explore view, and alerting.           |

Advanced Implementation Considerations

For engineers managing complex environments, the transition from standard monitoring to advanced observability involves more than just basic setup. It requires the conversion of legacy queries and the integration of secondary security services.

For instance, when migrating from older monitoring stacks, it may be necessary to convert InfluxQL queries to Flux for compatibility with InfluxDB 2.x environments, or to adapt pfSense-specific dashboard functions to work within the OPNsense context. Furthermore, supporting RFC5424 syslog formats can enhance the compatibility of logs across different collectors.

The architecture can also be augmented with Suricata integration. By directing Suricata alert logs through the same Promtail pipeline used for firewall logs, administrators can correlate intrusion detection events (IDS alerts) with traffic records (firewall logs) within a single Grafana dashboard.

The following checklist ensures a successful deployment of the monitoring architecture:

  • Verify that the Prometheus scrape job name starts with opnsense-.
  • Ensure the regex in the Promtail pipeline_stages accurately matches the OPNsense log format.
  • Confirm that the OPNsense Remote Logging target uses the correct port for the Promtail agent.
  • Validate that the Grafana dashboard variables are correctly mapped to the Prometheus targets.
  • Check that the os-node_exporter plugin is active on the OPNsense appliance.

Analysis of Observability Scalability

The architecture described herein represents a significant departure from reactive troubleshooting toward proactive network management. The use of Loki's metadata-centric indexing strategy provides a scalable foundation that can accommodate the massive log volumes generated by high-throughput firewalls without the linear increase in storage costs seen in full-text indexing engines. However, this scalability comes with the trade-off of increased complexity in the Promtail configuration. The administrator's ability to design effective regex patterns is the limiting factor in the depth of searchable intelligence.

Furthermore, the decoupling of the metric collection (Prometheus) from the log aggregation (Loki) allows for independent scaling of these two pipelines. In an environment where network traffic spikes, the Loki storage can be scaled on its own without needing to reconfigure the Prometheus scraper. This modularity is essential for modern DevOps-oriented network operations. The ultimate success of this implementation is measured not just by the ability to see the logs, but by the ability to correlate a spike in CPU load (from Prometheus) with a specific surge in blocked connection attempts (from Loki), thereby providing a holistic view of the network's security posture and operational health.
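The correlation described above can be sketched as a pair of range queries over the same time window, one against the Prometheus HTTP API and one against the Loki HTTP API. The base URLs, function names, and query strings below are assumptions for illustration: the PromQL expression assumes the job-name prefix convention and the standard `node_cpu_seconds_total` series, and the LogQL expression assumes that `action` was extracted as a label with the value `block`.

```python
from urllib.parse import urlencode

# Assumed queries: CPU busy percentage per firewall job, and blocked events per minute.
CPU_BUSY = (
    '100 * (1 - avg by (job) '
    '(rate(node_cpu_seconds_total{job=~"opnsense-.*",mode="idle"}[5m])))'
)
BLOCKS_PER_MINUTE = 'sum(count_over_time({action="block"}[1m]))'


def prometheus_range_url(base, query, start, end, step="60s"):
    """Build a Prometheus query_range URL; start and end are Unix seconds."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"


def loki_range_url(base, query, start, end):
    """Build a Loki query_range URL; Loki takes nanosecond Unix epochs."""
    params = {"query": query, "start": start * 10**9, "end": end * 10**9}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts and an arbitrary one-hour window.
    window = (1700000000, 1700003600)
    print(prometheus_range_url("http://prometheus.example:9090", CPU_BUSY, *window))
    print(loki_range_url("http://loki.example:3100", BLOCKS_PER_MINUTE, *window))
```

Using one shared window for both URLs is the point of the sketch: the two result series can then be overlaid on a common time axis, which is what a Grafana panel with mixed data sources does.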
