The Lifecycle and Architectural State of the Grafana OnCall Open Source Repository

The landscape of incident response and on-call management underwent a significant structural transformation regarding its open-source availability on March 24, 2026. The Grafana OnCall Open Source Software (OSS) project, once a cornerstone for self-hosted alerting workflows, transitioned into a fully archived state. This transition was not an overnight occurrence but the culmination of a strategic pivot by the maintainers toward a unified Grafana Cloud Incident Response Management (IRM) experience. Understanding the current state of the grafana/oncall GitHub repository requires a granular examination of its maintenance timeline, the technical implications of its archival, and the operational methodologies required to maintain legacy deployments in a post-Cloud Connection era.

On March 11, 2025, the project officially entered maintenance mode. During this phase, the primary objective of the development team shifted away from feature innovation and toward the stability of the existing codebase. For approximately one year, the repository remained active solely for addressing critical vulnerabilities and high-severity bugs. Specifically, security patches for Common Vulnerabilities and Exposures (CVEs) carrying a Common Vulnerability Scoring System (CVSS) score of 7.0 or higher were prioritized so that existing users were not left exposed to significant exploits. Development of new functionality and experimental integrations was halted, signaling the beginning of the end for the OSS-specific roadmap.

The finality of this transition was reached on March 24, 2026, when the grafana/oncall repository was moved to a read-only status. This archival status has profound implications for DevOps engineers and site reliability engineers (SREs) who rely on this specific codebase. The cessation of active development means that the repository no longer receives updates for new features, and the infrastructure once provided by Grafana Cloud to support OSS users has been decommissioned. For organizations running the OSS version, this represents a shift from a managed-service-dependent model to a purely self-contained, isolated operational model.

Technical Implications of Repository Archival and Read-Only Status

The transition to a read-only state in the grafana/oncall GitHub repository dictates the boundaries of future technical interventions. When a repository is archived, the standard collaborative workflows (opening new issues, submitting pull requests, or initiating discussions) are disabled. This status serves as a permanent marker of the project's lifecycle stage, informing developers that the code they are interacting with is a historical artifact rather than an evolving product.

The loss of Cloud Connection support is perhaps the most disruptive technical consequence of the March 24, 2026 deadline. Prior to this date, Grafana OnCall OSS users could leverage Grafana Cloud services to bridge the gap between their local environments and global notification channels. This connection provided essential communication layers that were otherwise difficult to manage in a purely air-gapped or self-hosted environment.

The discontinuation of Cloud Connection support specifically impacts the following notification vectors:

  • Mobile app push notifications: Users of the mobile application who relied on the Cloud Connection bridge to receive alerts on their handheld devices can no longer receive notifications through this mechanism.
  • SMS notifications: The delivery of Short Message Service (SMS) alerts that were routed through the Grafana Cloud infrastructure has ceased.
  • Phone call notifications: Automated voice alerts and phone call escalations that utilized Cloud Connection for telephony integration are no longer functional.

For engineers maintaining these deployments, the operational burden has increased because the responsibility for providing these communication channels has shifted entirely back to the local infrastructure. To maintain a functional on-call rotation, teams must now implement alternative notification services. This might involve integrating third-party providers such as Twilio for SMS and voice, or configuring independent Telegram and Slack bots that do not rely on the decommissioned Grafana Cloud bridge.

Operational Continuity and Deployment Strategies for Legacy Environments

Despite the archival of the software, the technical reality is that existing deployments of Grafana OnCall OSS are not instantly rendered non-functional. The software remains executable within its host environment, provided the underlying dependencies and local configurations remain intact. The internal logic of on-call schedules, the configuration of integrations, and the execution of predefined workflows will continue to operate as long as the engine and the Grafana instance are running.

The primary challenge in maintaining these "zombie" deployments is the management of the notification layer. Since the dependency on Grafana Cloud has been severed, the following table outlines the shift in operational requirements for engineers:

| Feature | Pre-March 24, 2026 (Cloud Connected) | Post-March 24, 2026 (Self-Sustained) |
| :--- | :--- | :--- |
| Mobile Push | Supported via Cloud Connection | No longer supported via Cloud Connection |
| SMS/Voice | Supported via Cloud Connection | Requires local/third-party provider (e.g., Twilio) |
| Slack/Telegram | Supported via local/cloud bridge | Supported via direct local integration |
| Feature Development | Halted since maintenance mode began (March 11, 2025) | Read-only; no new features |
| Security Patches | Fixes for CVEs with a CVSS score of 7.0 or higher | None; repository archived |

To manage the deployment of the OnCall engine, particularly in a Docker-based environment, engineers often utilize docker-compose. For those attempting to maintain or update their hobby environments, the process involves pulling the latest available images for the engine, even if new features are absent.

The command to update the engine is:
docker-compose pull engine

Following the pull, the deployment must be re-initialized to apply the updated image:
docker-compose up -d

This process ensures that even in a read-only state, any critical security patches that were applied prior to the final archival can be propagated through the environment.
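
The two commands above can be wrapped in a small helper so that the restart only happens when the pull succeeds. This is a minimal sketch, not part of the OnCall tooling: the function name and the `COMPOSE_CMD` and `DRY_RUN` variables are hypothetical conveniences introduced here, and the service name `engine` is taken from the command shown above.

```bash
# Refresh the OnCall engine image and recreate the containers.
# COMPOSE_CMD and DRY_RUN are hypothetical knobs for this sketch, not OnCall settings.
update_engine() {
  local compose="${COMPOSE_CMD:-docker-compose}"
  local run="${DRY_RUN:+echo}"   # when DRY_RUN is set, print the commands instead of running them

  # Pull the newest published engine image; recreate containers only if the pull succeeded.
  $run $compose pull engine && $run $compose up -d
}
```

Running `DRY_RUN=1 update_engine` prints the two commands without touching the environment, which can be useful for checking which compose binary would be invoked. Setting `COMPOSE_CMD="docker compose"` switches the sketch to the Compose plugin form.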

Infrastructure as Code and Automation with Terraform and Act-Kit

The management of the Grafana OnCall ecosystem, particularly when utilizing tools like act-kit or the Terraform provider for OnCall, requires a highly structured approach to configuration. The automation of on-call schedules and escalation chains is typically handled through Terraform, which allows for the declaration of complex rotation logic.

The integration of the Grafana alerting system with OnCall is facilitated through the integration.tf file. This configuration file contains the resources necessary to link Grafana's contact points to the OnCall engine. A critical component of this architecture is the notification policy, which must be configured to route all incoming alerts to the OnCall service. In a well-architected system, the integration.tf file should remain largely untouched, acting as the bridge between the alerting source and the escalation logic.

The architecture of an alert flow can be visualized through the following logical progression:

  1. The Contact Point (CP) within the Grafana Alerting subsystem receives an alert.
  2. The CP notifies the OnCall engine.
  3. The OnCall Grafana Integration triggers the Escalation Chain (EC).
  4. The Escalation Chain executes a specific step to generate a Notification (N).
  5. The Notification process fetches the specific user currently on call from the Schedule.

The escalation chain itself is governed by the escalation.tf file. This file defines the behavior of the system once an alert is received. The default behavior is programmed to notify the individual currently assigned to the active rotation. This information is dynamically retrieved from the schedule defined in the infrastructure. For more complex requirements, such as time-zone-aware rotations, the system supports different schedule types:

  • simple-rotation: A fundamental rotation that switches the on-call person on a weekly basis.
  • timezone-based-rotation: A sophisticated rotation that utilizes timezone offsets to ensure that no individual is assigned to an on-call shift during late-night hours.
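
The arithmetic behind these two schedule types can be illustrated with a few lines of shell. This is only a sketch of the idea, not how the OnCall engine or the Terraform modules compute shifts: the function names, the whole-hour offsets, and the 22:00 to 06:59 night window are all assumptions made for the example.

```bash
WEEK_SECONDS=604800   # 7 * 24 * 3600

# simple-rotation: index of the person on call, given the rotation start,
# the current time (both as Unix epoch seconds, now >= start) and the team size.
rotation_index() {
  local start_epoch=$1 now_epoch=$2 team_size=$3
  echo $(( ( (now_epoch - start_epoch) / WEEK_SECONDS ) % team_size ))
}

# timezone-based-rotation: local hour (0-23) for a UTC hour and a whole-hour UTC offset.
local_hour() {
  local utc_hour=$1 offset=$2
  echo $(( (utc_hour + offset + 24) % 24 ))
}

# Succeeds when the local hour falls inside the assumed night window (22:00-06:59).
is_night() {
  local h
  h="$(local_hour "$1" "$2")"
  [ "$h" -ge 22 ] || [ "$h" -lt 7 ]
}
```

For example, `rotation_index 0 2419200 3` covers exactly four elapsed weeks in a team of three and selects index 1, while `is_night 2 1` succeeds because 02:00 UTC is 03:00 at UTC+1. A timezone-aware rotation would skip any candidate for whom `is_night` succeeds.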

For developers managing these configurations through GitHub Actions, the process of applying changes is automated. However, because the repository is now read-only, these actions are typically used within a user's own fork or organization-specific repository. To use a forked version of the configuration, several critical steps must be performed:

  • Fork the repository to a personal or organizational namespace.
  • Modify the Terraform backend within the main.tf file to point to a preferred state store (e.g., S3, GCS, or Terraform Cloud).
  • Configure the main.tf file as the primary entry point, ensuring it contains the correct provider, backend, and module directives for importing active schedules.

A robust deployment also requires the management of sensitive environment variables within the CI/CD pipeline or local environment. The following variables must be explicitly defined for the Terraform provider to authenticate correctly:

  • TF_VAR_grafana_access_token
  • TF_VAR_oncall_access_token
  • TF_VAR_oncall_url
  • TF_VAR_grafana_url
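
A pre-flight check can catch a missing variable before Terraform fails partway through a run. The sketch below uses the four variable names listed above; the function names are hypothetical, and running `terraform init` followed by `terraform plan` is one reasonable sequence rather than a workflow prescribed by the repository.

```bash
REQUIRED_TF_VARS="TF_VAR_grafana_access_token TF_VAR_oncall_access_token TF_VAR_oncall_url TF_VAR_grafana_url"

# Print the name of every required variable that is unset or empty.
missing_tf_vars() {
  local name value
  for name in $REQUIRED_TF_VARS; do
    # Indirect lookup; the names come from the fixed list above, so eval is safe here.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Run Terraform only when every required variable is present.
preflight_then_plan() {
  local missing
  missing="$(missing_tf_vars)"
  if [ -n "$missing" ]; then
    echo "Missing variables:" >&2
    echo "$missing" >&2
    return 1
  fi
  terraform init && terraform plan
}
```

In a CI/CD pipeline, calling `preflight_then_plan` as the first step gives a readable error listing the missing names instead of a provider authentication failure.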

Plugin Configuration and API-Driven Management

The integration between the Grafana UI and the OnCall engine is managed via specific API endpoints that allow for the configuration of plugin settings and the synchronization of user data. For engineers running Grafana and OnCall in a custom environment (not using the provided Docker Compose files), manual configuration via curl is often necessary to enable the plugin and point it to the correct engine URL.

The initialization of the plugin settings can be achieved with the following command, which sets the onCallApiUrl and grafanaUrl within the plugin's JSON configuration:

curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/settings' \
  -H "Content-Type: application/json" \
  -d '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"http://engine:8080", "grafanaUrl":"http://grafana:3000"}}'
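
When the engine or Grafana URL differs between environments, the settings request can be parameterized rather than edited by hand. The following is a minimal sketch: the function names and argument order are hypothetical, the `stackId` and `orgId` values are copied from the command above, and the URLs are inserted verbatim without JSON escaping, so they are assumed not to contain quotes or backslashes.

```bash
# Build the plugin settings payload.
# $1 = OnCall engine URL, $2 = Grafana URL as reachable from the engine.
plugin_settings_payload() {
  printf '{"enabled":true,"jsonData":{"stackId":5,"orgId":100,"onCallApiUrl":"%s","grafanaUrl":"%s"}}\n' "$1" "$2"
}

# Send the payload to the plugin settings endpoint.
# $1 = Grafana base URL, $2 = user:password, $3 = engine URL, $4 = Grafana URL.
configure_plugin() {
  curl --silent --show-error --fail -X POST \
    -u "$2" \
    -H "Content-Type: application/json" \
    -d "$(plugin_settings_payload "$3" "$4")" \
    "${1%/}/api/plugins/grafana-oncall-app/settings"
}
```

A call such as `configure_plugin http://localhost:3000 admin:admin http://engine:8080 http://grafana:3000` reproduces the request shown above, with the credentials passed through `-u` instead of being embedded in the URL.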

Furthermore, if the plugin is not yet installed in the environment, the following command can be used to trigger the installation resource:

curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/install'

In scenarios where user permissions have changed or new users have been added to the Grafana instance, the OnCall engine may not immediately reflect these changes due to the cached nature of the synchronization process. While the system is designed to sync automatically upon page refresh, there is an internal 5-minute timeout designed to prevent excessive load. In cases where an immediate update is required, engineers can manually trigger a synchronization via the following API call:

curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/sync'

To verify the current status of the connection between the Grafana plugin and the OnCall engine, a GET request to the status endpoint provides the most accurate diagnostic information:

curl -X GET 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/status'
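
Because the install, sync, and status calls share the same URL prefix, they can be grouped behind one helper. This sketch assumes the resource paths shown above; the `GRAFANA_BASE_URL` and `GRAFANA_CREDENTIALS` variables and the function names are hypothetical, and the `admin:admin` default simply mirrors the example commands and should be replaced in any real deployment.

```bash
# Hypothetical configuration for this sketch; defaults mirror the examples above.
GRAFANA_BASE_URL="${GRAFANA_BASE_URL:-http://localhost:3000}"
GRAFANA_CREDENTIALS="${GRAFANA_CREDENTIALS:-admin:admin}"

# Build the URL for a plugin resource ($1 = install, sync, or status).
plugin_resource_url() {
  printf '%s/api/plugins/grafana-oncall-app/resources/plugin/%s\n' "${GRAFANA_BASE_URL%/}" "$1"
}

# Call a plugin resource ($1 = HTTP method, $2 = resource name).
plugin_call() {
  curl --silent --show-error --fail -X "$1" \
    -u "$GRAFANA_CREDENTIALS" \
    "$(plugin_resource_url "$2")"
}

# Check the connection first; request a sync only if the status call succeeds.
status_then_sync() {
  plugin_call GET status && plugin_call POST sync
}
```

Because `--fail` makes curl return a non-zero exit code on HTTP errors, `status_then_sync` stops before the sync request when the plugin cannot reach the engine.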

Analytical Conclusion: The Future of Incident Response in the Post-OSS Era

The archival of the Grafana OnCall OSS project marks a definitive shift in the strategy of the Grafana ecosystem. The transition from a maintenance-mode state in March 2025 to a fully archived, read-only state in March 2026 represents the completion of a lifecycle move from open-source experimentation to a centralized, managed service model via Grafana Cloud IRM.

For the DevOps community, this creates a bifurcation of choice. On one hand, organizations with strict data sovereignty or air-gapped requirements can continue to operate their existing OSS deployments. However, they must now accept the full technical burden of managing notification gateways, replacing the lost Cloud Connection functionality with local Twilio or Slack integrations. The loss of mobile push and SMS/voice via the Cloud bridge necessitates a more complex, "do-it-yourself" approach to alerting.

On the other hand, the move toward Grafana Cloud IRM offers a streamlined, modern approach to incident response. By centralizing the complexity of integrations, escalations, and notification delivery, the Cloud IRM model eliminates the operational overhead of maintaining the engine, the database, and the notification bridges. For teams seeking a developer-friendly experience with integrated Slack, Telegram, and automated escalation chains without the burden of infrastructure management, the Cloud-native path is the intended destination.

Ultimately, the grafana/oncall repository remains a vital piece of technical history. While it no longer evolves, the patterns established within its Terraform configurations, its escalation logic, and its integration architecture continue to inform how modern engineers design resilient, automated, and highly available incident response systems. The legacy of the OSS version lives on in the architectural blueprints of those who continue to manage their own high-availability alerting stacks.

