Diagnosing GitHub Actions Health: Statistical Analysis and Workflow Control

GitHub Actions has become one of the most widely used tools for automating software development workflows: it builds, tests, and deploys code for many engineering teams. As product scope and codebases grow, these pipelines become more complex, which often makes test suites less stable and builds slower, and both cost developer time. The problem gets worse when teams cannot see how their workflows have performed over time. Without clear metrics, failing workflows and lengthening execution times are hard to spot early.

The GitHub API exposes the raw data needed to calculate success rates and execution times, but using it means writing custom code to make HTTP requests and aggregate the results. Shell scripts can call the API, but computing meaningful statistics in them adds complexity. To close this gap, specialized tools and configuration strategies provide quick, actionable insight into the health of a GitHub Actions setup.

Statistical Analysis with GitHub CLI Extensions

For teams seeking to monitor the reliability and efficiency of their CI/CD pipelines without building custom dashboards, the GitHub CLI (gh) offers a streamlined solution through community extensions. The gh-workflow-stats extension, developed by fchimpan, enables users to retrieve comprehensive statistical data regarding the success rate and average execution time of GitHub Actions workflows and their constituent jobs. This tool eliminates the need for manual API interrogation and complex scripting, allowing engineers to focus on interpreting the data rather than constructing the queries.

The extension queries the GitHub API and presents the results either in a human-readable format or as structured JSON. To use it, users must have the GitHub CLI installed, be authenticated via gh auth login, and have read access to the repository that contains the target workflows. Installation takes a single command:

```bash
gh extensions install fchimpan/gh-workflow-stats
```

Once installed, the primary command gh workflow-stats accepts parameters for the owner (-o), repository (-r), and workflow file name (-f). This command aggregates data from workflow runs, providing a summary that includes total run counts, success rates, failure rates, and other outcome categories. It also calculates execution time statistics, including minimum, maximum, average, median, and standard deviation values. This depth of statistical analysis allows teams to distinguish between occasional flaky failures and systemic performance issues.

```bash
gh workflow-stats -o $OWNER -r $REPO -f $WORKFLOW_FILE_NAME
```

The output typically presents a dashboard-like summary, such as a total of 100 runs with 79 successes (79.0%), 14 failures (14.0%), and 7 other outcomes (7.0%). Execution time metrics might show a minimum of 365.0 seconds, a maximum of 1080.0 seconds, an average of 505.5 seconds, a median of 464.0 seconds, and a standard deviation of 120.8 seconds. Additionally, the output identifies the top three jobs with the highest failure counts and the top three jobs with the longest average execution durations, broken down by operating system runners such as macos-latest, ubuntu-latest, and windows-latest.
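To make the execution-time figures concrete, here is a minimal sketch of how such statistics can be computed from a list of run durations. It is not the extension's implementation: the helper name duration_stats is hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which variant the tool reports.

```shell
# Hypothetical helper (not part of gh-workflow-stats): reads one duration
# in seconds per line on stdin and prints min, max, mean, median, and
# population standard deviation (assumed variant).
duration_stats() {
  sort -n | awk '
    { v[NR] = $1; sum += $1; sumsq += $1 * $1 }
    END {
      if (NR == 0) { print "no data"; exit }
      mean = sum / NR
      # Median: the middle value, or the mean of the two middle values.
      if (NR % 2) median = v[(NR + 1) / 2]
      else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      var = sumsq / NR - mean * mean
      if (var < 0) var = 0   # guard against floating-point rounding
      printf "min=%.1f max=%.1f avg=%.1f med=%.1f std=%.1f\n", v[1], v[NR], mean, median, sqrt(var)
    }'
}

# Example with made-up durations:
#   printf '2\n4\n4\n4\n5\n5\n7\n9\n' | duration_stats
#   prints: min=2.0 max=9.0 avg=5.0 med=4.5 std=2.0
```

A median well below the average, as in the sample figures above (464.0 versus 505.5 seconds), suggests that a few slow runs are pulling the average up.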

For integration into other automated systems or custom dashboards, the extension supports a --json flag. This outputs the data in a structured JSON format, including fields such as workflow_runs_stats_summary, total_runs_count, name, and rate. This flexibility allows teams to pipe the output into other tools for long-term historical tracking or alerting.
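One way to use the JSON output for historical tracking is to store a dated snapshot on a schedule. The sketch below assumes only the flags described in this article (-o, -r, -f, --json); the helper name snapshot_workflow_stats, the default directory name stats, and the file naming scheme are hypothetical.

```shell
# Hypothetical helper: store one JSON snapshot per day for later trend analysis.
# Arguments: owner, repository, workflow file name, output directory (default: stats).
snapshot_workflow_stats() {
  owner=$1 repo=$2 workflow=$3 out_dir=${4:-stats}
  mkdir -p "$out_dir" || return 1
  # Name the file after the current UTC date and the workflow file.
  out_file="$out_dir/$(date -u +%Y-%m-%d)-$workflow.json"
  gh workflow-stats -o "$owner" -r "$repo" -f "$workflow" --json > "$out_file" || return 1
  # Print the snapshot path so callers can pass it on to other tools.
  echo "$out_file"
}
```

Running this once a day, for example from a scheduled workflow, would build up a directory of snapshots that other tools can compare over time.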

Granular Job-Level Metrics

While workflow-level statistics provide a high-level view of pipeline health, the jobs subcommand within the gh-workflow-stats extension offers a more granular perspective. By appending jobs to the command, users can isolate the success rate and average execution time of individual jobs within a workflow. This is particularly useful for identifying specific bottlenecks or fragile steps that may not be apparent when viewing the entire workflow as a single unit.

```bash
gh workflow-stats jobs -o $OWNER -r $REPO -f $WORKFLOW_FILE_NAME
```

This command follows the same parameter structure as the parent command but focuses the aggregation on job-level data. The output mirrors the workflow-level format, listing total runs, success/failure counts, and execution time statistics for each job. It also highlights the top jobs with the highest failure counts and longest execution times. For instance, a build job running on windows-latest might exhibit a significantly longer average duration (480.86s) compared to macos-latest (301.65s) or ubuntu-latest (185.24s), indicating a potential need for optimization on that specific runner environment.

The extension also supports filtering results by the actor who triggered the workflow using the --actor option. This allows teams to investigate whether failures are correlated with specific contributors or automated triggers. Additional query parameters available in the GitHub API documentation can also be leveraged to further refine the data retrieval, ensuring that the statistics collected are relevant to the specific investigation at hand.
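As an illustration, the --actor option can be called in a loop to compare several contributors side by side. This is only a sketch: the helper name stats_by_actor is hypothetical, and the actor names in the usage line are placeholders.

```shell
# Hypothetical helper: print workflow stats once per actor for comparison.
# Usage: stats_by_actor OWNER REPO WORKFLOW_FILE ACTOR...
stats_by_actor() {
  owner=$1 repo=$2 workflow=$3
  shift 3
  for actor in "$@"; do
    echo "== $actor =="
    gh workflow-stats -o "$owner" -r "$repo" -f "$workflow" --actor "$actor"
  done
}

# Example (placeholder actor names):
#   stats_by_actor "$OWNER" "$REPO" "$WORKFLOW_FILE_NAME" alice bob
```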

Conditional Workflow Execution

Beyond statistical monitoring, controlling the flow of execution between workflows is a common requirement in complex CI/CD setups. Teams often need to trigger a subsequent workflow only after a previous one has completed, and potentially only if it succeeded. GitHub Actions provides the workflow_run event to facilitate this inter-workflow dependency.

To trigger a second workflow when a first one finishes, the workflow_run event can be configured with the types property set to completed. This ensures that the second workflow is dispatched only after the first one has reached a terminal state, regardless of whether that state was success or failure. The available activity types for this event include completed, requested, and in_progress.

```yaml
on:
  workflow_run:
    workflows: ["First action"]
    types: [completed]
```

However, in many scenarios, running the subsequent workflow after a failure is undesirable. To ensure that the second workflow only runs if the first one succeeds, a conditional statement must be added to the job definition within the second workflow. This condition checks the github.event.workflow_run.conclusion property. If the conclusion is success, the job proceeds; otherwise, it is skipped.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - run: echo "Previous workflow succeeded"
```

This pattern allows for precise control over workflow dependencies, ensuring that downstream actions, such as deployment or notification, only occur when upstream validations have passed. It is a crucial technique for maintaining the integrity of CI/CD pipelines and preventing the propagation of failures through automated processes.
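The trigger and the condition shown above can be combined in a single workflow file. The following sketch assumes the upstream workflow is named First action, as in the earlier example; the workflow name Second action and the job names on-success and on-failure are illustrative and not from the original.

```yaml
# Sketch of a complete downstream workflow (names are illustrative).
name: Second action

on:
  workflow_run:
    workflows: ["First action"]
    types: [completed]

jobs:
  # Runs only when the upstream workflow concluded successfully.
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - run: echo "First action succeeded"

  # Runs only when the upstream workflow failed, e.g. to send a notification.
  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - run: echo "First action failed"
```

Note that the workflow_run event only triggers a workflow whose file exists on the repository's default branch.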

Handling Step Failures and Job Status

A nuanced aspect of GitHub Actions is the distinction between step failures and job status. In standard configurations, a job is marked as failed if any step within it fails. However, certain configurations or misunderstandings can lead to jobs being marked as successful even when specific steps have failed. This discrepancy can be confusing, particularly when using third-party actions or complex shell scripts.

For example, users have reported issues where Cypress test suites marked as failed within a GitHub Action still resulted in an overall "Success" status for the workflow run. This can occur if the action is not properly configured to exit with a non-zero status code upon test failure, or if the failure is caught and suppressed by subsequent steps. Addressing such issues requires careful inspection of the action's YAML configuration and the underlying scripts to ensure that exit codes are correctly propagated.
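The effect of a suppressed exit code can be reproduced in plain shell. The sketch below uses a stand-in function instead of a real test runner, so every function name in it is hypothetical; it shows one way a failure gets masked and one way to post-process the result while still propagating the failure.

```shell
# Stand-in for a test runner that reports failure through its exit code.
failing_tests() {
  echo "1 failing"
  return 1
}

# Masked: "|| true" turns any failure into success, so the step would pass.
masked_by_or_true() {
  failing_tests || true
}

# Propagated: capture the exit code, do the follow-up work, then return it.
run_and_report() {
  status=0
  failing_tests || status=$?
  echo "tests exited with status $status"
  return "$status"
}
```

In a workflow step, the first pattern would leave the step green despite the failing tests, while the second would mark it as failed after printing the report line.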

Similarly, in monorepo setups using tools like pnpm and nx, developers may run lint and unit tests only on affected projects. If the step that derives the base and head SHAs fails, later steps may need to fall back to running the tests on all projects. This is often achieved with the if: ${{ failure() }} condition, which is true when any previous step in the job has failed. The condition only controls whether the fallback step runs; it does not change the job status. Because an earlier step failed, the job is still marked as failed even when the fallback steps succeed. For the job to end as successful, the step that may fail needs continue-on-error: true, and the fallback can then check that step's outcome (steps.<id>.outcome == 'failure') instead of failure().

```yaml
- name: Run lint & unit tests on ALL projects
  if: ${{ failure() }}
  run: |
    pnpm lint:all
    pnpm test:all
```
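If the job should finish as successful when the fallback succeeds, one option is to let the fragile step continue on error and branch on its outcome. This is a sketch under assumptions: the step id shas, the script path, and the lint:affected and test:affected scripts are hypothetical, while lint:all and test:all come from the example above.

```yaml
steps:
  - name: Derive base and head SHAs
    id: shas
    # A failure here no longer fails the job; it is recorded in steps.shas.outcome.
    continue-on-error: true
    run: ./scripts/derive-shas.sh   # hypothetical script

  - name: Run lint & unit tests on affected projects
    if: ${{ steps.shas.outcome == 'success' }}
    run: |
      pnpm lint:affected   # hypothetical script
      pnpm test:affected   # hypothetical script

  - name: Run lint & unit tests on ALL projects
    if: ${{ steps.shas.outcome == 'failure' }}
    run: |
      pnpm lint:all
      pnpm test:all
```

The outcome property holds the step result before continue-on-error is applied, which is why it still reads failure here even though the job carries on.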

Understanding these mechanics is essential for accurately interpreting workflow results and ensuring that failures are not inadvertently masked. Proper configuration of step conditions and exit codes is critical for maintaining the reliability of CI/CD pipelines.

Conclusion

The effective management of GitHub Actions requires more than just defining workflows; it demands continuous monitoring and precise control over execution flow. Tools like the gh-workflow-stats extension provide valuable statistical insights into workflow and job performance, enabling teams to identify and address instability and inefficiency. Meanwhile, understanding the nuances of conditional execution and step failure handling ensures that pipelines respond appropriately to success and failure conditions. By leveraging these techniques, development teams can maintain robust, reliable, and efficient CI/CD processes that support rapid and confident software delivery.

