The deployment of observability stacks within a Kubernetes ecosystem presents a sophisticated networking challenge that extends far beyond simple service exposure. When engineers attempt to route traffic through an NGINX Ingress Controller to visualization tools like Grafana or data aggregators like Prometheus, they frequently encounter a specific class of failure characterized by 302 Found redirection loops, TOO_MANY_REDIRECTS browser errors, and 404 Not Found responses during the authentication handshake. These issues are rarely the result of a failure in the underlying containerized application itself, but rather a fundamental misalignment between the Ingress Controller's path-based routing logic, the NGINX rewrite-target annotations, and the internal root_url configurations of the application's web server. Achieving a seamless, production-grade exposure requires a granular understanding of how NGINX handles URI rewrites, how the Ingress Controller manages TLS termination, and how Grafana’s internal configuration must be synchronized with the externally visible URL to prevent the application from attempting to redirect users to non-existent sub-paths or non-HTTPS endpoints.
The Mechanics of NGINX Ingress Routing and Rewrite Annotations
In a Kubernetes environment, the NGINX Ingress Controller acts as a Layer 7 reverse proxy, intercepting incoming HTTP/HTTPS requests and distributing them to backend services based on defined Ingress resources. A critical component in complex routing scenarios is the use of rewrite annotations, which allow the controller to modify the request URI before it reaches the backend pod.
The nginx.ingress.kubernetes.io/rewrite-target annotation is frequently used when exposing services under a specific sub-path, such as /grafana. Without this annotation, the Ingress Controller passes the full path (e.g., /grafana/dashboard) to the backend service. If the backend service is not configured to serve that sub-path, it returns a 404 Not Found error. The directive nginx.ingress.kubernetes.io/rewrite-target: /$1 instructs NGINX to replace the request URI with the contents of the first capture group from the path's regular expression, so the path in the Ingress rule must define that group (for example, /grafana/(.*)).
The interplay between these annotations and the backend's expected path is where most configuration failures occur. For instance, nginx.ingress.kubernetes.io/use-regex: "true" is needed when the path definition relies on regular expression captures, and such paths are best declared with pathType: ImplementationSpecific. If the regex is misconfigured, the backend receives a path that does not match its internal routing table, leading to 404 errors or, in complex cluster topologies, intermittent 504 Gateway Time-out responses.
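The capture-group behaviour described above can be sketched as an Ingress manifest. This is a minimal example rather than a definitive configuration: the resource name grafana-rewrite is hypothetical, and the host, namespace, service name, and port are assumed to match the examples used elsewhere in this article.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-rewrite            # hypothetical name
  namespace: monitoring            # assumed namespace
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    # $2 is the second capture group of the path below:
    # everything that follows the /grafana prefix
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: dashboard.domain.com
      http:
        paths:
          # Group 1 matches "/" or the end of the path; group 2 captures the rest
          - path: /grafana(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```

With this rule, a request for /grafana/dashboard reaches the backend as /dashboard, and a request for /grafana reaches it as /.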
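Furthermore, Grafana's "Live" feature depends on WebSocket connections, which require the proxy to forward the Upgrade and Connection headers. In a hand-written NGINX configuration this is usually done with a map for connection upgrades. Note that map is only valid in NGINX's http context, so it cannot be placed in a per-Ingress nginx.ingress.kubernetes.io/server-snippet annotation; with ingress-nginx it would belong in the controller ConfigMap's http-snippet key, and the community controller already proxies WebSocket upgrades by default:

```nginx
# Valid in the http context only
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
```

The map only defines the $connection_upgrade variable. The Upgrade and Connection headers are handled correctly once the proxied location sets them from $http_upgrade and $connection_upgrade, which prevents the loss of streaming data during real-time dashboard updates.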
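Long-lived WebSocket sessions can also be cut short by the controller's proxy timeouts, which default to 60 seconds. As a hedged sketch, the following annotation fragment raises both timeouts to one hour on the Grafana Ingress; the value 3600 is an illustrative choice, not a requirement from the source.

```yaml
metadata:
  annotations:
    # Keep idle WebSocket connections open for up to one hour (illustrative value)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```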
Analyzing the 302 Redirect Loop and Authentication Failures
A common and frustrating phenomenon in NGINX Ingress deployments is the appearance of 302 status codes in the access logs, often followed by the browser displaying a TOO_MANY_REDIRECTS error. This occurs when there is a logical contradiction between the Ingress Controller's SSL enforcement and the application's internal URL generation.
In many scenarios, an Ingress resource is configured with nginx.ingress.kubernetes.io/force-ssl-redirect: "true". While this is a security best practice, it can trigger a loop if the backend service (Grafana) is unaware that the external connection is secured via HTTPS. If the Ingress terminates SSL and communicates with the backend via plain HTTP, and the backend is configured to redirect all HTTP traffic to HTTPS based on its internal root_url, the following sequence occurs:
- The client requests https://grafana.domain.com.
- The Ingress Controller receives the request and passes it to the Grafana service via HTTP.
- Grafana sees an incoming HTTP request and, following its internal root_url logic, issues a 302 redirect to the HTTPS version of the same URL.
- The client follows the redirect, and the cycle repeats indefinitely.
Repeated entries like the following in the Grafana logs are a strong indicator of this redirection failure:

```text
logger=context t=2022-04-05T03:40:56.28+0000 lvl=info msg="Request Completed" method=GET path=/ status=302 remote_addr=192.168.65.3 time_ms=0 size=29 referer=
logger=context t=2022-04-05T03:40:56.29+0000 lvl=info msg="Request Completed" method=GET path=/ status=302 remote_addr=192.168.65.3 time_ms=0 size=29 referer=
```
To resolve this, the grafana.ini ConfigMap must be perfectly aligned with the Ingress host and protocol.
Synchronizing Grafana Configuration with Ingress Resources
To prevent the aforementioned redirection loops and 404 errors, the Grafana ConfigMap must explicitly define the domain and root_url to match the external entry point. This ensures that when Grafana generates links for login pages or API endpoints, it uses the fully qualified domain name (FQDN) and the correct protocol (HTTPS).
The following table outlines the critical parameters within the grafana.ini configuration:
| Parameter | Description | Real-world Consequence of Misconfiguration |
|---|---|---|
| domain | The FQDN used to access the instance. | Incorrect host headers causing authentication failures. |
| root_url | The base URL for all links generated by Grafana. | Broken links to /login or /api when using sub-paths. |
| serve_from_sub_path | Indicates if the app is served from a sub-path. | If set to false while using /grafana in Ingress, 404s occur. |
A properly configured ConfigMap for a Grafana deployment in a namespace named grafana should look like this:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-ini
  namespace: grafana
data:
  # grafana.ini uses INI syntax: keys and values are separated by "="
  grafana.ini: |
    [server]
    domain = grafana.malcolmpereira.com
    root_url = https://grafana.malcolmpereira.com
    serve_from_sub_path = false
```
If the deployment uses a sub-path like /grafana, the root_url must reflect this (e.g., https://dashboard.domain.com/grafana), and serve_from_sub_path must be set to true when the Ingress forwards the /grafana prefix unchanged, so that the application understands its internal URI structure. Failure to do so results in the application redirecting users to /login instead of the expected /grafana/login, triggering a 404 error.
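For the sub-path case, the ConfigMap might look like the sketch below. It assumes the external URL https://dashboard.domain.com/grafana from the example above and an Ingress that passes the /grafana prefix to the pod without rewriting it; the namespace is an assumption and should match wherever Grafana is deployed.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-ini
  namespace: monitoring            # assumed namespace
data:
  grafana.ini: |
    [server]
    domain = dashboard.domain.com
    # The path component must match the prefix used in the Ingress rule
    root_url = https://dashboard.domain.com/grafana
    # Grafana itself serves requests under /grafana
    serve_from_sub_path = true
```

If the Ingress strips the prefix with a regex rewrite instead, root_url keeps the /grafana suffix while serve_from_sub_path stays false, because Grafana then receives root-relative paths.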
Sub-path Routing Challenges for Prometheus and Grafana
When managing a kube-prometheus-stack, engineers often attempt to expose both Prometheus and Grafana through a single Ingress resource using different paths. This introduces significant complexity regarding path-based routing.
A common error involves a configuration similar to the following:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dashboard.domain.com
      secretName: domain-com
  rules:
    - host: dashboard.domain.com
      http:
        paths:
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```
In this configuration, the rewrite-target: / instruction is dangerous. Because the annotation contains no capture group, every request under /grafana, including /grafana/login and static asset paths, is rewritten to / before it reaches the Grafana service. Grafana, unaware of the external /grafana prefix unless its root_url says so, generates redirects and asset links (JS, CSS, images) relative to the root. The browser then requests them outside the /grafana path, where no Ingress rule matches, leading to a cascade of 404 errors.
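One way to avoid the problem is to drop the rewrite entirely and let Grafana handle the prefix. The sketch below reuses the names from the manifest above and assumes Grafana is configured with a root_url ending in /grafana and serve_from_sub_path = true; it is one possible arrangement, not the only valid one.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    # No rewrite-target: the /grafana prefix reaches Grafana unchanged
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dashboard.domain.com
      secretName: domain-com
  rules:
    - host: dashboard.domain.com
      http:
        paths:
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```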
The conflict is even more pronounced when attempting to expose Prometheus alongside Grafana. For example, if Prometheus expects the path /graph but the Ingress routes traffic via /prometheus/graph, the application's internal links will break. This is a fundamental mismatch between the Ingress's path-stripping logic and the application's internal URL generation.
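For a kube-prometheus-stack installation, the mismatch can be addressed by telling each application its external URL through Helm values. The following is a hedged sketch: the /prometheus and /grafana prefixes and the host name are assumptions carried over from the examples in this article, and the exact value keys should be checked against the chart version in use.

```yaml
# values.yaml fragment for the kube-prometheus-stack chart (sketch)
prometheus:
  prometheusSpec:
    # External URL used when Prometheus generates links
    externalUrl: https://dashboard.domain.com/prometheus
    # Prefix under which Prometheus serves its own routes
    routePrefix: /prometheus
grafana:
  grafana.ini:
    server:
      domain: dashboard.domain.com
      root_url: https://dashboard.domain.com/grafana
      serve_from_sub_path: true
```

With these values both applications serve under their prefixes, so the Ingress paths /prometheus and /grafana can forward requests without any rewrite-target annotation.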
Monitoring the Ingress Controller with Prometheus and Grafana
To maintain the health of the NGINX Ingress Controller, it is essential to implement a monitoring loop. The Ingress Controller provides its own metrics, which can be scraped by Prometheus and visualized in Grafana. This provides visibility into:
- Controller Request Volume: Tracking the total number of incoming requests.
- Controller Connections: Monitoring active TCP connections to the controller.
- Controller Success Rate: Measuring the ratio of non-4xx/5xx responses.
- Config Reloads: Detecting when NGINX configuration changes trigger a reload.
- Ingress Request Volume and Success Rate: Analyzing traffic at the individual Ingress level.
- Resource Pressure: Tracking Network I/O, Average Memory Usage, and Average CPU Usage.
The data collection process follows a specific workflow:
- Ensure the NGINX Ingress Controller exposes its metrics and carries the scrape annotations that the Prometheus deployment uses to discover it.
- Configure the Prometheus data source within Grafana.
- Import a specialized dashboard, such as the Kubernetes Ingress Controller Dashboard (ID: 12575), to visualize these metrics.
This level of monitoring allows for the detection of "Config Reload" failures, which can happen if a malformed Ingress resource is applied, potentially breaking the entire routing plane for the cluster.
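The first step of the workflow above can be sketched as a values fragment for the ingress-nginx Helm chart. It assumes a Prometheus installation that discovers targets through prometheus.io annotations; port 10254 is the controller's default metrics port.

```yaml
# values.yaml fragment for the ingress-nginx chart (sketch)
controller:
  metrics:
    enabled: true                  # expose the controller's metrics endpoint
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10254"
```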
Deployment and Service Verification
When deploying these services, verifying the service type and port mapping is a critical step in troubleshooting connectivity. In a NodePort-based deployment, the following steps are necessary to ensure the dashboard is reachable:
- Apply the necessary Kustomize or YAML configurations for the stack.
- Inspect the services in the ingress-nginx namespace:

```bash
kubectl get svc -n ingress-nginx
```

- Identify the NodePort assigned to the Grafana and Prometheus services. For example, if grafana is mapped to 3000:31086/TCP, the dashboard can be accessed via http://{node_ip}:31086.
- Verify the endpoint readiness to ensure the Ingress Controller can successfully route traffic to the pods.
The following table summarizes the typical service structure in a monitoring deployment:
| Service Name | Type | Port | NodePort (Example) | Purpose |
|---|---|---|---|---|
| default-http-backend | ClusterIP | 80 | N/A | Default backend for unmatched requests |
| ingress-nginx | NodePort | 80, 443 | 30100, 30154 | The primary entry point for all traffic |
| prometheus-server | NodePort | 9090 | 32630 | Prometheus data aggregation |
| grafana | NodePort | 3000 | 31086 | Visualization and dashboarding |
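As an illustration of the grafana row in the table, a NodePort Service could be declared roughly as follows. The selector label is hypothetical and must match the labels on the actual Grafana pods; the namespace follows the kubectl command shown earlier.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: ingress-nginx
spec:
  type: NodePort
  selector:
    app: grafana                   # hypothetical pod label
  ports:
    - name: http
      protocol: TCP
      port: 3000                   # service port inside the cluster
      targetPort: 3000             # container port on the Grafana pod
      nodePort: 31086              # example value from the table
```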
Analytical Conclusion
The complexity of configuring Grafana and Prometheus behind an NGINX Ingress Controller stems from the requirement for absolute synchronization between three distinct layers: the Ingress Controller's rewrite rules, the NGINX server's header manipulation (specifically for WebSockets), and the application's internal root_url configuration. Redirection loops and 404 errors are not random failures but are predictable outcomes of a mismatch in how the URI is interpreted at each layer of the stack.
To build a resilient observability architecture, engineers must move away from simple path-based routing and instead adopt a configuration strategy where the root_url in the application's ConfigMap serves as the single source of truth. By ensuring that the application is aware of its external FQDN and protocol, and by configuring NGINX to respect these boundaries through precise rewrite-target and server-snippet annotations, the "Too Many Redirects" and "404 Not Found" errors can be systematically eliminated. Furthermore, the integration of NGINX Ingress metrics into the very Grafana instance being monitored creates a closed-loop observability system, allowing for the real-time detection of configuration-driven failures before they impact the broader cluster infrastructure.