RunsOn High-Performance AWS Self-Hosted Runner Infrastructure

The landscape of Continuous Integration and Continuous Deployment (CI/CD) has long been dominated by the trade-off between the convenience of GitHub-hosted runners and the cost-efficiency of self-managed infrastructure. RunsOn bridges that gap by offering a managed experience for self-hosted GitHub Actions runners that run entirely within the user's own Amazon Web Services (AWS) account. By replacing standard labels such as ubuntu-latest with a dynamic, label-based runner assignment system, RunsOn allows organizations to cut operational expenditure substantially while also increasing raw compute performance. It is positioned as an alternative to the Actions Runner Controller (ARC) on Kubernetes, to the Philips Terraform module, and to third-party runner providers, which typically require intrusive access to an organization's private code and sensitive secrets.

The primary value proposition of RunsOn lies in its ability to provide an ephemeral Virtual Machine (VM) for every individual job. This gives each build a clean slate, eliminating the "dirty runner" problem in which files left behind by previous jobs cause non-deterministic build failures. By using AWS spot pricing with an automatic on-demand fallback, the system optimizes for cost without sacrificing reliability. The architecture also supports multi-Availability Zone (AZ) and multi-environment deployments, which helps CI pipelines stay available during localized AWS outages such as the loss of a single AZ.

Architectural Core and Deployment Framework

RunsOn is designed for rapid deployment and minimal maintenance overhead. Installation takes approximately 10 minutes and uses CloudFormation templates to provision the necessary infrastructure in the client's AWS environment. The user retains full ownership of that infrastructure, and the system integrates with GitHub through a private GitHub App created specifically for the organization.

Deployment is more flexible still thanks to a dedicated Terraform module, terraform-aws-runs-on, which lets DevOps engineers manage their runner infrastructure as code. This gives organizations that prefer HashiCorp's toolchain an alternative to AWS CloudFormation.

The system supports a wide array of compute environments, including:

  • Linux runners for standard build workloads.
  • Windows runners for .NET and legacy Windows-based applications.
  • GPU-enabled runners for machine learning, CUDA development, and high-performance compute tasks.

The performance figures are significant: raw CPU performance is up to 30% higher than on the official GitHub-hosted runners. Combined with the use of spot instances, this performance boost allows RunsOn to reduce GitHub Actions costs by a factor of 7 to 15 in many production scenarios.

The RunsOn Labeling System and Workflow Integration

Transitioning from GitHub-hosted runners to RunsOn requires changing the runs-on key in the GitHub Actions workflow YAML. Where a standard workflow uses a generic label like ubuntu-latest, a RunsOn workflow uses a compound label string that specifies the runner requirements and any extra features.

The transition follows this logic:

Before:
runs-on: ubuntu-latest

After:
runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64"

This specific syntax enables the RunsOn orchestrator to identify the exact hardware requirements for the job. The use of ${{ github.run_id }} ensures that the runner is uniquely tied to that specific execution instance, facilitating the ephemeral nature of the VM. Users can reference the job labels documentation to customize runner sizing, choose specific images, and define environment configurations.
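To show where the label sits in a complete workflow, here is a minimal sketch with two jobs that both request the same runner type. Because RunsOn provisions an ephemeral VM per job, the two jobs should each get their own machine even though their labels share the same run_id. The workflow name and the ./scripts/lint.sh and ./scripts/test.sh commands are hypothetical placeholders, not part of RunsOn.

```yaml
name: ci
on: push

jobs:
  lint:
    # Same label string as the "After" example; RunsOn starts a fresh VM for this job
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64"
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/lint.sh   # hypothetical lint script
  test:
    # Identical label, but this job still runs on its own ephemeral VM
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64"
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/test.sh   # hypothetical test script
```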

Magic Caching and S3 Backend Integration

One of the most critical bottlenecks in CI/CD pipelines is the speed of cache retrieval and storage. RunsOn addresses this through "Magic Caching," which utilizes a built-in S3 cache backend. This system is designed to be significantly faster and larger than the default GitHub Actions cache, which is often limited by strict size quotas and slower network throughput.

To enable this functionality, users add the extras=s3-cache label to the runs-on configuration and include the runs-on/action@v2 action among their workflow steps. The label prepares the runner infrastructure for high-speed S3 interactions.

Example configuration for magic caching:

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
      # other steps
```

This architecture allows for a "blazing fast" cache download speed, as noted by SREs in industry testimonials, reducing the overall CI runtime by up to 80% in some environments.
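Since existing caching strategies are expected to work without modification, a job can keep its usual actions/cache step and rely on magic caching to back it with S3. The sketch below assumes, as described above, that magic caching transparently serves the standard cache action; the npm project, the cached path, and the cache key are illustrative choices, not RunsOn requirements.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      # Required alongside the extras=s3-cache label for magic caching
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      # Unmodified cache step; assumed to be served from the RunsOn S3 backend
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-
      - run: npm ci
      - run: npm test
```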

Advanced Compiler Caching with sccache

For developers working with C, C++, Rust, or NVIDIA CUDA, RunsOn provides specialized support for sccache. This tool allows for the caching of compiled objects across different jobs and runners, which is essential for large-scale systems programming where compilation times can be prohibitive.

The runs-on/action@v2 action provides a dedicated parameter for this, sccache: s3. When it is enabled, the action automatically configures sccache to use the S3 cache bucket that is provisioned as part of the RunsOn installation.

What the action does is equivalent to running the following shell commands in a workflow step:

```bash
# Turn off sccache's GitHub Actions cache backend...
echo "SCCACHE_GHA_ENABLED=false" >> "$GITHUB_ENV"
# ...and point sccache at the RunsOn S3 cache bucket in the runner's region
echo "SCCACHE_BUCKET=$RUNS_ON_S3_BUCKET_CACHE" >> "$GITHUB_ENV"
echo "SCCACHE_REGION=$RUNS_ON_AWS_REGION" >> "$GITHUB_ENV"
```
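These variables only tell sccache where to keep its cache; the build tool must still route compiler calls through sccache. For a C or C++ project, a hedged sketch assuming a CMake build and a runner image that ships CMake and a compiler toolchain could use CMake's standard CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER variables (CMAKE_CUDA_COMPILER_LAUNCHER works the same way for CUDA sources). The v0.0.9 tag of mozilla-actions/sccache-action is an assumption; pin whichever release you have vetted.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: actions/checkout@v4
      # Configures sccache to use the RunsOn S3 cache bucket
      - uses: runs-on/action@v2
        with:
          sccache: s3
      # Installs the sccache binary (assumed tag; pin a vetted release)
      - uses: mozilla-actions/sccache-action@v0.0.9
      # Route C and C++ compiler invocations through sccache
      - run: cmake -S . -B build -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
      - run: cmake --build build --parallel
      # Print cache hit/miss counts to confirm the S3 cache is in use
      - run: sccache --show-stats
```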

An example of a Rust build utilizing this feature is as follows:

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
        with:
          sccache: s3
      - uses: mozilla-actions/sccache-action@vX.Y.Z  # pin a released tag
      # your slow Rust compilation steps
```
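The example above leaves the compilation step open. Cargo only calls rustc through sccache when the RUSTC_WRAPPER environment variable points at it, so a filled-in sketch, assuming a Cargo project and the v0.0.9 tag of mozilla-actions/sccache-action (pin a release you have vetted), might look like this. The final sccache --show-stats step prints hit and miss counts, a quick way to check that the S3 backend is actually being used.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    env:
      # Tell cargo to invoke rustc through sccache
      RUSTC_WRAPPER: sccache
    steps:
      - uses: actions/checkout@v4
      # Configures sccache to use the RunsOn S3 cache bucket
      - uses: runs-on/action@v2
        with:
          sccache: s3
      # Installs the sccache binary (assumed tag)
      - uses: mozilla-actions/sccache-action@v0.0.9
      - run: cargo build --release
      - run: sccache --show-stats
```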

Observability, Metrics, and Cost Analysis

RunsOn provides deep visibility into the execution of jobs through the runs-on/action@v2. This action serves multiple purposes: debugging, cost tracking, and performance monitoring.

Cost Reporting

The system integrates with https://ec2-pricing.runs-on.com to provide precise cost data for every workflow job. It calculates costs based on the actual instance type used, the region, the availability zone, and whether the instance was on-demand or a spot instance. A Beta feature also allows users to compare the RunsOn cost against the equivalent GitHub-hosted runner cost.

The cost parameter accepts two values:

  • inline: Displays costs directly in the action log output (default).
  • summary: Displays costs in both the action log and the GitHub job summary.
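A minimal sketch of selecting the summary mode, assuming the cost input is passed like the sccache input shown earlier. The runs-on label mirrors the other examples in this section, and ./build.sh is a hypothetical placeholder for the job's real work.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
        with:
          # "inline" (default) logs costs only; "summary" also writes them to the job summary
          cost: summary
      - uses: actions/checkout@v4
      - run: ./build.sh   # hypothetical build script
```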

A typical cost report output looks like this:

| Metric | Value |
|---|---|
| Instance Type | m7i-flex.large |
| Instance Lifecycle | on-demand |
| Region | us-east-1 |
| Duration | 2.06 minutes |
| Cost | $0.0040 |
| GitHub equivalent cost | $0.0240 |
| Savings | $0.0200 (82.8%) |

Performance Metrics

For users who need granular data on resource utilization, the metrics parameter sends additional telemetry through the CloudWatch agent. These metrics are then displayed as live charts in the post-execution summary.

The supported metrics are:

  • cpu: Tracks usage_user and usage_system.
  • network: Monitors bytes_recv and bytes_sent.
  • memory: Tracks used_percent.
  • disk: Monitors used_percent and inodes_used.
  • io: Tracks io_time, reads, and writes.

Implementation example:

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
        with:
          metrics: cpu,network,memory,disk,io
```

Environment Debugging

To assist in troubleshooting, the action can expose all available environment variables to the logs. This is achieved by setting show_env: true.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
        with:
          show_env: true
```
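Because cost, metrics, and show_env are separate inputs of runs-on/action@v2, a single step can presumably enable all of them together. A sketch under that assumption, collecting only a subset of the supported metrics; ./build.sh is a hypothetical placeholder.

```yaml
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache
    steps:
      - uses: runs-on/action@v2
        with:
          cost: summary        # costs in the log and the job summary
          metrics: cpu,memory  # subset of the supported metrics
          show_env: true       # print environment variables to the log
      - uses: actions/checkout@v4
      - run: ./build.sh        # hypothetical build script
```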

Feature Comparison and Technical Specifications

The following table provides a technical comparison between standard GitHub-hosted runners and the RunsOn AWS-based implementation.

| Feature | GitHub-Hosted Runners | RunsOn (AWS) |
|---|---|---|
| Cost Efficiency | Standard pricing | 7x to 15x cheaper |
| CPU Performance | Baseline | Up to 30% higher |
| Infrastructure Control | None (managed by GitHub) | Full (user's AWS account) |
| Caching Mechanism | GitHub cache (limited) | S3 backend (unlimited, fast) |
| VM Lifecycle | Ephemeral | Ephemeral VMs per job |
| Networking | Shared | Optional static IPs and SSH |
| Instance Choice | Fixed tiers | Flexible AWS instance types |
| Setup Time | Instant | ~10 minutes via CloudFormation |

User Experience and Industry Impact

The impact of migrating to RunsOn is most evident in large-scale environments with thousands of daily jobs. For instance, the CTO of Dashdoc reported that costs were divided by four after implementation. Similarly, the Lead DevOps Engineer at Lingoda observed a 70% reduction in GitHub Actions costs and an 80% improvement in CI runtime.

The ease of migration is a key highlight, as existing workflows, caching strategies, and actions typically function without modification. The only required change is the transition of the runs-on label.

Detailed Analysis of Operational Advantages

The transition to RunsOn represents a fundamental shift in how CI/CD infrastructure is consumed. By utilizing a private GitHub App for authentication and installation, RunsOn avoids the security risks associated with third-party providers that require broad read/write access to repositories.

Spot pricing with on-demand fallback addresses the primary risk associated with spot instances: limited or interrupted capacity. When spot capacity cannot be obtained, the system can launch the job on an on-demand instance instead, keeping queue times stable even during large bursts of activity.

From a networking perspective, static IPs and SSH access provide a level of control that standard GitHub-hosted runners do not offer. This is crucial for organizations that must allowlist CI runners in corporate firewalls or connect to private database clusters in specific VPCs.

The integration of sccache and S3 caching turns the build from a full recompilation on every run into an incremental process that reuses previously compiled artifacts. Because AWS S3 offers high throughput, the time spent downloading and uploading cache artifacts is minimized, which is likely a major contributor to the 80% runtime improvement reported above.
