Workflow Dispatch and Scalable Self-Hosted GitHub Action Runners

The modernization of continuous integration and continuous deployment (CI/CD) pipelines has shifted from static, scheduled execution toward dynamic, event-driven orchestration. Within the GitHub Actions ecosystem, the ability to trigger workflows on demand—rather than relying solely on code commits—provides developers and DevOps engineers with the flexibility to execute maintenance tasks, manual deployments, or specialized testing suites without polluting the git history with "bogus commits." This capability is anchored by the workflow_dispatch event, which transforms a passive automation script into an active tool that can be invoked via the GitHub user interface or API.

However, the efficiency of on-demand triggers is inextricably linked to the underlying compute infrastructure. While GitHub-hosted runners offer convenience, enterprise-scale operations often necessitate self-hosted runners to maintain control over the environment, reduce costs, and integrate with specific hardware or proprietary software. When scaling these runners on cloud platforms like Amazon Web Services (AWS), the challenge shifts from simple triggering to the management of ephemeral compute resources. The integration of OpenID Connect (OIDC), attribute-based access control (ABAC), and warm-pool scaling strategies ensures that on-demand workflows are not only flexible in their initiation but also performant in their execution, eliminating the latency typically associated with the cold start of a runner environment.

Manual Triggering via Workflow Dispatch

By default, most GitHub Actions workflows react to repository activity such as pushes and pull requests. While this is ideal for automated testing, it creates friction when a developer needs to run a workflow for a reason unrelated to a code change. The solution is the workflow_dispatch event.

Technical Implementation of the Trigger

To enable manual triggering, the workflow_dispatch event must be added to the on section of the workflow's YAML configuration file. This keyword signals to GitHub that the workflow is eligible for manual invocation.

```yaml
on:
  push:
    branches: [ "main" ]
  workflow_dispatch:
```

In the example above, the workflow maintains its automated trigger upon a push to the main branch but also permits manual activation. The ability to combine workflow_dispatch with other triggers is critical for versatile pipelines. By adding it to the list of triggers, the developer ensures the workflow remains automated for standard CI/CD paths while remaining accessible for on-demand execution.

Operational Execution Process

Once the YAML configuration is updated and committed to the default branch, the manual trigger becomes available through the GitHub web interface. The process for an operator is as follows:

  1. Navigate to the "Actions" tab of the repository.
  2. Select the specific workflow from the list of available workflows on the left sidebar.
  3. Locate the "Run workflow" dropdown menu that appears on the right side of the screen.
  4. Select the desired branch to run the workflow against.
  5. Click the button to initiate the execution.

This mechanism eliminates the need for "bogus commits"—meaningless changes made solely to trigger a pipeline—thereby keeping the project's commit history clean and focused on actual feature and bug fix increments.
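The same kind of run can also be started through the API rather than the web interface. The sketch below builds, but does not send, a request to GitHub's REST endpoint for creating a workflow dispatch event. The organization, repository, workflow file name, and token are placeholders chosen for illustration, and the optional inputs argument only applies if the workflow declares inputs.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token, inputs=None):
    """Build a POST request asking GitHub to run a workflow_dispatch workflow."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = {"ref": ref}  # branch or tag to run the workflow against
    if inputs:
        body["inputs"] = inputs
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values for illustration only.
request = build_dispatch_request(
    "example-org", "example-repo", "ci.yml", "main", "<token>"
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the call easy to inspect or test before a real token is involved.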

Architectural Scaling of Self-Hosted Runners on AWS

For organizations requiring higher control or specific toolsets not available in GitHub-hosted environments, self-hosted runners provide a customizable alternative. When these runners are deployed at scale on AWS, the focus shifts to balancing cost, security, and startup latency.

The Ephemeral Runner Model

A critical best practice for scaling is the use of ephemeral runners. In this configuration, runners are created and destroyed for every individual job.

  • Per-job isolation: Every build job is executed in a fresh, short-lived compute environment.
  • Data leakage prevention: Because the environment is destroyed after the job, there is no risk of sensitive data or build artifacts persisting from one job to another, even in multi-tenanted environments.
  • Auto-scaling simplification: Since jobs do not rely on a specific, long-running instance, the system does not need to wait for a specific "idle" runner to become available, allowing the infrastructure to scale dynamically based on the queue.
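As a rough illustration of what makes a runner ephemeral, the sketch below assembles the command a launch script might run to register a runner that accepts exactly one job. It assumes the standard runner package's config.sh script and its --ephemeral flag. The repository URL, token, and labels are placeholders, and the helper name is hypothetical.

```python
import shlex


def ephemeral_config_command(repo_url, registration_token, labels):
    """Return the argv list that registers a runner for a single job."""
    return [
        "./config.sh",
        "--url", repo_url,
        "--token", registration_token,
        "--labels", ",".join(labels),
        "--unattended",  # do not prompt for interactive input
        "--ephemeral",   # runner is removed from GitHub after one job
    ]


# Placeholder values for illustration only.
command = ephemeral_config_command(
    "https://github.com/example-org/example-repo",
    "<registration-token>",
    ["ephemeral", "aws"],
)
print(shlex.join(command))
```

In a setup like the one described, the instance would run this once at boot, start the runner, and be terminated when the job finishes.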

Addressing Startup Latency with Warm Pools

A significant drawback of purely on-demand ephemeral runners is the registration time. A new runner must launch and register itself with GitHub, a process that typically takes under two minutes. For high-velocity teams, this delay is often unacceptable. To mitigate this, a "warm pool" strategy is employed.

The warm pool consists of pre-registered ephemeral runners that are already active and listening for incoming GitHub workflow events. This eliminates the launch-and-register phase from the critical path of the job execution.
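One way to reason about pool size is as simple arithmetic over the queue and the idle fleet. The sketch below is an assumed sizing rule, not the article's exact logic: keep one idle runner per queued job plus a fixed warm buffer, capped by a fleet-wide maximum. All parameter names and numbers are illustrative.

```python
def scaling_delta(queued_jobs, idle_runners, warm_pool_target,
                  total_runners, max_runners):
    """Return runners to launch (positive) or idle runners to terminate (negative)."""
    # Idle runners wanted: one per queued job plus the warm buffer.
    desired_idle = queued_jobs + warm_pool_target
    delta = desired_idle - idle_runners
    if delta > 0:
        # Never grow past the fleet-wide cap.
        headroom = max(max_runners - total_runners, 0)
        return min(delta, headroom)
    # Zero or negative: surplus idle runners can be terminated.
    return delta


print(scaling_delta(queued_jobs=3, idle_runners=1, warm_pool_target=2,
                    total_runners=5, max_runners=20))   # 4
print(scaling_delta(queued_jobs=0, idle_runners=6, warm_pool_target=2,
                    total_runners=6, max_runners=20))   # -4
```

A larger warm_pool_target lowers wait time at the cost of paying for idle instances, which is the trade-off the queue-time metrics are meant to inform.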

Lambda-Driven Scaling Logic

To manage this warm pool efficiently, a serverless orchestration layer using AWS Lambda is recommended. The architectural flow operates as follows:

  1. A trigger event (such as a code push or a pull request) creates a GitHub workflow event.
  2. This event is sent via a webhook to an Amazon API Gateway endpoint.
  3. An AWS Lambda function receives the payload, validates it, and logs the event for observability.
  4. Based on the queue depth and current pool size, backend Lambda functions are triggered to either scale up (launch more EC2 instances) or scale down (terminate unused instances).
  5. The EC2 runners are registered with GitHub during their launch phase and immediately enter the warm pool to await assignment.
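The validation and logging in step 3 can be sketched as follows. This is a minimal illustration that assumes an API Gateway proxy integration delivering the body as a plain (not base64-encoded) string, and a webhook secret configured on the GitHub side. The handler name, the default secret, and the logged fields are hypothetical, and a real function would load the secret from a secure store.

```python
import hashlib
import hmac
import json


def is_valid_signature(secret, body, signature_header):
    """Compare GitHub's X-Hub-Signature-256 header with an HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")


def handler(event, context, secret=b"example-secret"):
    """Hypothetical Lambda entry point; the secret default is for illustration only."""
    body = (event.get("body") or "").encode("utf-8")
    # Header names can arrive in any case, so normalize them.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if not is_valid_signature(secret, body, headers.get("x-hub-signature-256")):
        return {"statusCode": 401, "body": "invalid signature"}
    payload = json.loads(body)
    # Log the fields that later feed observability and scaling decisions.
    print(json.dumps({
        "event": headers.get("x-github-event"),
        "action": payload.get("action"),
        "repository": payload.get("repository", {}).get("full_name"),
    }))
    return {"statusCode": 202, "body": "accepted"}
```

Rejecting unsigned or mis-signed payloads before any scaling logic runs keeps arbitrary callers of the public endpoint from launching instances.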

Identity and Access Management for Scale

When managing a vast number of repositories and runners, the mapping between GitHub identities and AWS IAM roles becomes a bottleneck. Creating a unique IAM role for every repository can run into IAM quotas, such as the limit on the number of roles per account.

Scaling Identity Strategies

To overcome these limits, three primary strategies are utilized:

  • Attribute Based Access Control (ABAC): This method matches claims within the GitHub OIDC token—such as the repository name, the branch, or the specific team—directly to tags on the AWS resources. This allows a single IAM role to be used across many repositories, with access controlled by the attributes of the requester.
  • Role Based Access Control (RBAC): Instead of per-repository roles, repositories are logically grouped into Teams or Applications. This creates a smaller subset of roles that are shared among related projects.
  • Identity Broker Pattern: A dedicated broker service is used to dynamically vend credentials based on the identity provided by the GitHub workflow, providing a layer of abstraction between the runner and the target AWS resource.

The Role of OIDC

GitHub Actions exposes an OpenID Connect (OIDC) provider to each run. This allows the runner to request a short-lived token from GitHub, which AWS can then validate to grant temporary security credentials. This removes the need to store long-lived AWS Access Keys as GitHub Secrets, significantly enhancing the security posture of the CI/CD pipeline.
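On the AWS side, this exchange is governed by the trust policy of the IAM role being assumed. The sketch below generates such a policy as a Python dictionary, assuming GitHub's OIDC provider (token.actions.githubusercontent.com) has already been registered in the account. The account ID, repository, and branch are placeholders, and the helper name is hypothetical.

```python
import json


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Trust policy letting one repository branch assume a role via GitHub OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # The token must be intended for AWS STS.
                "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                # The token's subject must match this repository and branch.
                "StringLike": {
                    f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}"
                },
            },
        }],
    }


# Placeholder account ID and repository for illustration only.
policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
print(json.dumps(policy, indent=2))
```

The workflow itself also needs the id-token: write permission so that the job is allowed to request the OIDC token.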

Observability and Performance Metrics

To optimize the efficiency of on-demand runners, organizations must quantify the "build wait time." This is the delta between when a job is queued and when it actually begins executing.

Metric Capture and Analysis

The GitHub workflow_job event payload contains timing fields, specifically started_at and completed_at. By capturing these fields as a job moves through its queued, in_progress, and completed events, DevOps teams can calculate how long the job waited for a runner and how long it ran.

Sample event payload for logging:

```json
{
  "hostname": "xxx.xxx.xxx.xxx",
  "requestId": "aafddsd55-fgcf555",
  "date": "2022-10-11T05:50:35.816Z",
  "logLevel": "info",
  "logLevelId": 3,
  "filePath": "index.js",
  "fullFilePath": "/var/task/index.js",
  "fileName": "index.js",
  "lineNumber": 83889,
  "columnNumber": 12,
  "isConstructor": false,
  "functionName": "handle",
  "argumentsArray": [
    "Processing Github event",
    "{\"event\":\"workflow_job\",\"repository\":\"testorg-poc/github-actions-test-repo\",\"action\":\"queued\",\"name\":\"jobname-buildanddeploy\",\"status\":\"queued\",\"started_at\":\"2022-10-11T05:50:33Z\",\"completed_at\":null,\"conclusion\":null}"
  ]
}
```
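A log record shaped like this sample can be turned into a wait-time figure with a few lines of parsing. The sketch below assumes the job summary is the JSON string in the second element of argumentsArray, and that the wait is the gap between started_at on the queued event and started_at on a later in_progress event for the same job. The in_progress record here is invented for illustration, and the function names are hypothetical.

```python
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as 2022-10-11T05:50:33Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def extract_job_event(log_record):
    """Decode the workflow_job summary embedded in the log record."""
    return json.loads(log_record["argumentsArray"][1])


def wait_seconds(queued_event, in_progress_event):
    """Seconds between the job being queued and a runner picking it up."""
    queued_at = parse_timestamp(queued_event["started_at"])
    picked_up_at = parse_timestamp(in_progress_event["started_at"])
    return (picked_up_at - queued_at).total_seconds()


queued_record = {"argumentsArray": ["Processing Github event", json.dumps({
    "event": "workflow_job", "action": "queued",
    "name": "jobname-buildanddeploy", "started_at": "2022-10-11T05:50:33Z",
})]}
# Hypothetical follow-up event, 30 seconds later.
in_progress_record = {"argumentsArray": ["Processing Github event", json.dumps({
    "event": "workflow_job", "action": "in_progress",
    "name": "jobname-buildanddeploy", "started_at": "2022-10-11T05:51:03Z",
})]}

queued = extract_job_event(queued_record)
running = extract_job_event(in_progress_record)
print(wait_seconds(queued, running))  # 30.0
```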

Integration with Amazon CloudWatch

To transform these logs into actionable metrics, the following process is implemented:

  • Log Parsing: The system captures specific fields such as status: queued, repository, and name from the JSON payload.
  • Embedded Metric Format (EMF): Using CloudWatch metrics client libraries, these log elements are mapped into dimension fields.
  • Visualization: Once mapped, CloudWatch generates metrics that allow engineers to visualize the queue time across different repositories and jobs, enabling data-driven decisions on the size of the warm pool.
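To show what the Embedded Metric Format step produces, the sketch below builds one EMF log line by hand rather than through a client library. The namespace and the BuildWaitTime metric name are assumptions chosen for illustration. The repository and name dimensions follow the fields captured from the payload.

```python
import json
import time


def emf_record(repository, job_name, wait_seconds, namespace="GitHubRunners"):
    """Build a CloudWatch Embedded Metric Format record for one job's wait time."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set: metrics are sliced by repository and job.
                "Dimensions": [["repository", "name"]],
                "Metrics": [{"Name": "BuildWaitTime", "Unit": "Seconds"}],
            }],
        },
        # Dimension values and the metric value live at the top level.
        "repository": repository,
        "name": job_name,
        "BuildWaitTime": wait_seconds,
    }


record = emf_record(
    "testorg-poc/github-actions-test-repo", "jobname-buildanddeploy", 30.0
)
print(json.dumps(record))
```

When a line like this is written to a Lambda function's log output, CloudWatch can extract the metric without a separate API call to publish it.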

Managed Alternatives: AWS CodeBuild

While manual orchestration of EC2 runners provides maximum control, AWS provides a managed alternative via AWS CodeBuild. CodeBuild can act as a managed self-hosted runner for GitHub Actions.

Advantages of Managed Runners

The transition to CodeBuild for GitHub Actions removes the operational overhead of maintaining the scaling logic and infrastructure.

  • Low Startup Latency: CodeBuild provides a highly optimized environment that reduces the time between a trigger and the actual execution of the job.
  • Strong Security Boundaries: Each runner operates in a strictly isolated environment.
  • Zero Infrastructure Management: There is no need to manage EC2 instances, Lambda functions for scaling, or warm pool replenishment logic.
  • Simple Integration: Setup is achieved by creating a webhook that automatically triggers a CodeBuild build when a GitHub Actions workflow job is queued.

Summary of Infrastructure Options

The following table compares the different approaches to executing GitHub Actions on demand.

| Feature | GitHub-Hosted | Self-Hosted (EC2) | Managed (CodeBuild) |
|---|---|---|---|
| Setup Effort | Minimal | High | Moderate |
| Control Over Environment | Low | Absolute | Moderate |
| Security Isolation | High | High (if ephemeral) | High |
| Startup Latency | Low | Moderate (High without warm pool) | Low |
| Cost Predictability | Fixed per minute | Variable (EC2 costs) | Variable (Pay-as-you-go) |
| Custom Tooling | Limited | Unlimited | High |

Conclusion

The transition to on-demand GitHub Actions, facilitated by workflow_dispatch, represents a shift toward more mature, operationalized CI/CD practices. By decoupling the execution of a workflow from the necessity of a code commit, teams gain a powerful tool for administrative and deployment tasks. However, the true value of on-demand triggering is only realized when paired with a scalable, secure, and low-latency infrastructure.

The implementation of ephemeral runners on AWS, supported by a Lambda-driven warm pool and OIDC-based identity management, addresses the primary bottlenecks of self-hosted environments: security risks and startup delays. While the complexity of this setup is significant, the resulting architecture provides a highly isolated, scalable system that can serve many concurrent jobs across multiple repositories. For organizations that cannot justify the operational overhead of managing such a system, AWS CodeBuild offers a compelling middle ground, providing the benefits of a managed environment without sacrificing the flexibility of the GitHub Actions ecosystem. Ultimately, the choice between these paths depends on the balance an organization requires between granular environmental control and operational simplicity.
