Optimizing GitHub Actions Workflows with Conditional File Change Detection

In modern software engineering, continuous integration and continuous deployment (CI/CD) pipelines are the backbone of rapid delivery. However, as codebases grow in complexity, particularly within monorepo architectures, the default behavior of running entire workflows for every commit becomes a significant inefficiency. Executing full test suites or deploying microservices that have not been modified wastes computational resources, increases infrastructure costs, and delays feedback loops for developers. The solution lies in implementing conditional execution logic that triggers specific jobs or steps only when relevant files have changed. This can be done with a specialized GitHub Action, a custom reusable workflow, or native scripting, each of which filters changes by file path, extension, or repository structure.

The Monorepo Challenge and Resource Efficiency

Monorepos, which house multiple services, libraries, or documentation within a single repository, present unique challenges for automation. When a developer modifies a file in one directory, such as directory A, the entire workflow might trigger, attempting to build and deploy unrelated components like service B. This scenario is not only wasteful but can also introduce unnecessary risk and latency into the deployment pipeline. The goal is to ensure that service A deploys when files in its directory change, while service B remains untouched unless its specific files are modified.

This optimization is critical for saving time and resources. Integration tests, which are often slow and resource-intensive, should only run when the code they are testing has been altered. Similarly, deployment jobs should remain dormant unless the artifacts they are responsible for delivering have been updated. By implementing path-based filtering, engineering teams can achieve a more responsive CI/CD environment where the pipeline reacts intelligently to the scope of each change.

Using the paths-filter Action for Conditional Execution

One of the most established methods for detecting file changes is the dorny/paths-filter GitHub Action. This tool enables the conditional execution of workflow steps and jobs based on the files modified by a pull request, on a feature branch, or by recently pushed commits. It allows users to define specific filters for different file types or directories, outputting boolean values that other jobs can consume to determine whether they should execute.

The typical implementation involves creating a dedicated job, often named changed, that runs the paths-filter action. This job checks out the repository and examines the modified files against a predefined set of rules. The outputs from this job are then referenced in downstream jobs using the needs keyword to establish dependencies and the if keyword to apply conditions.

```yaml
jobs:
  changed:
    name: "Check what files changed"
    runs-on: ubuntu-latest
    outputs:
      python: ${{ steps.filter.outputs.python }}
      workflow: ${{ steps.filter.outputs.workflow }}
    steps:
      - name: "Check out the repo"
        uses: actions/checkout@v4
      - name: "Examine changed files"
        uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            python:
              - "**.py"
            workflow:
              - ".github/workflows/testsuite.yml"

  tests:
    # Don't run tests if the branch name includes "-notests".
    # Only run tests if Python files or this workflow changed.
    needs: changed
    # Written without ${{ }}: in a block scalar, text outside the braces
    # (such as the trailing newline) can turn the condition into a
    # non-empty string, which is always truthy.
    if: |
      !contains(github.ref, '-notests')
      && (
        needs.changed.outputs.python == 'true'
        || needs.changed.outputs.workflow == 'true'
      )
    # runs-on and the test steps follow here as in any other job.
```

In this configuration, the changed job identifies if any Python files (**.py) or the workflow file itself (.github/workflows/testsuite.yml) have been modified. The tests job depends on the changed job and will only run if either condition is true, provided the branch name does not contain the string -notests. This pattern allows for granular control, ensuring that tests run only when necessary. However, developers must be aware that test dependencies can be broad. For instance, changes to requirements files, test output files, or configuration files like tox.ini can also affect test results. Therefore, the filter definitions must be comprehensive to avoid skipping necessary validation steps.
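As a rough illustration of a more comprehensive filter, the step below extends the python filter with a few non-source files. The extra file names (requirements*.txt, tox.ini, setup.cfg) are assumptions about a typical Python project rather than anything the original workflow defines, and the unanchored patterns are meant to match files in the repository root.

```yaml
# Sketch only: the added patterns are hypothetical examples.
- name: "Examine changed files"
  uses: dorny/paths-filter@v3
  id: filter
  with:
    filters: |
      python:
        - "**.py"
        # Assumed dependency and tool configuration files
        - "requirements*.txt"
        - "tox.ini"
        - "setup.cfg"
      workflow:
        - ".github/workflows/testsuite.yml"
```

Because the python output is true when any pattern in its list matches, the condition on the tests job does not need to change when patterns are added.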

Implementing Reusable Workflows for Path Checking

For organizations with complex, multi-service architectures, embedding filtering logic directly into every workflow file can become repetitive and difficult to maintain. A more scalable approach is to create a reusable workflow that handles the path checking logic. This reusable workflow can be called by other workflows, passing in the specific directory or path to check as an input.

The reusable workflow, often named check-path-changes.yml, is triggered via the workflow_call event. It accepts an input for the path to check and outputs a boolean value indicating whether changes were detected in that path. The calling workflow then uses this output to decide whether to proceed with its specific tasks, such as deployment.

```yaml
name: Check Path Changes

on:
  workflow_call:
    inputs:
      path_to_check:
        required: true
        type: string
        description: "Path prefix to check for changes"
    outputs:
      should_run:
        description: "Whether the calling workflow should run based on path changes"
        value: ${{ jobs.check_changes.outputs.should_run }}

jobs:
  check_changes:
    name: "Check for changes in ${{ inputs.path_to_check }}"
    runs-on: ubuntu-latest
    outputs:
      should_run: ${{ steps.check.outputs.should_run }}
    steps:
      - name: Check for changes
        id: check
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # For manual triggers, always run
          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
            echo "should_run=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          # Logic to check for changes in the specified path would go here
          # This typically involves using git commands to compare commits
          # and checking if any changed file starts with the input path
```
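The change-detection logic left as a comment above could be filled in along the following lines. This is a minimal sketch, not the original author's implementation: it assumes that comparing HEAD with its first parent is sufficient, it uses a git pathspec in place of a manual prefix match on file names, and the PATH_TO_CHECK variable name is introduced here for illustration.

```yaml
    steps:
      - name: Check out the repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 2   # HEAD plus its parent, enough for a one-commit diff
      - name: Check for changes
        id: check
        env:
          # Pass the input through the environment instead of
          # interpolating it into the script text.
          PATH_TO_CHECK: ${{ inputs.path_to_check }}
        run: |
          # "git diff --quiet" exits 0 when nothing under the pathspec
          # differs between the two commits and non-zero otherwise.
          if git diff --quiet HEAD^ HEAD -- "$PATH_TO_CHECK"; then
            echo "should_run=false" >> "$GITHUB_OUTPUT"
          else
            echo "should_run=true" >> "$GITHUB_OUTPUT"
          fi
```

If git itself fails (for example, because HEAD has no parent), the command also exits non-zero, so this sketch falls back to should_run=true. A push that contains several commits is only compared against the commit directly before HEAD, which may or may not be acceptable for a given pipeline.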

When integrating this reusable workflow into a deployment pipeline, the calling workflow runs it as a job of its own (check_changes below), and the deploy job lists that job as a dependency using the needs property. The deploy job also includes an if condition that checks the should_run output. This ensures that the deployment job waits for the path check to complete and only executes if changes were detected in the target directory.

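```yaml
jobs:
  check_changes:
    name: "Check for changes in directory A"
    uses: ./.github/workflows/check-path-changes.yml
    permissions:
      actions: write        # Needed to cancel the workflow run
      contents: read        # Needed to access commit data
      pull-requests: read   # Needed to access Pull Requests
    with:
      path_to_check: "directory A"

  deploy:
    name: "Deploy"
    needs: [check_changes]
    if: ${{ needs.check_changes.outputs.should_run == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - name: "Run the deployment"
        run: |
          # your deploy logic goes here
          echo "Deploying..."
    # Note: "with:" (for example environment: staging) and "secrets: inherit"
    # are only valid on a job that calls a reusable workflow through "uses:".
    # If the deployment is itself a reusable workflow, replace runs-on/steps
    # with uses, with, and secrets.
```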

This architecture promotes modularity and consistency. The permissions required for the reusable workflow include actions: write to allow for workflow cancellation if necessary, contents: read to access commit data, and pull-requests: read to access pull request details. By abstracting the path checking logic, teams can easily apply the same filtering rules across multiple services and workflows without duplicating code.

Native Scripting Approaches with Git and PowerShell

Third-party actions and reusable workflows are powerful, but they are not the only option. GitHub Actions' built-in paths filter applies only to the workflow trigger as a whole, so it cannot switch individual jobs or steps on or off. Developers can, however, implement custom filtering logic using native tools like git, PowerShell Core, and GitHub Actions expressions. This approach provides maximum flexibility and does not require additional dependencies beyond the standard GitHub Actions runner environment.

The core of this method involves using the git diff command to retrieve a list of modified files. By checking out the repository with a sufficient fetch depth (typically 2 to compare HEAD with the previous commit), the script can generate a diff between the current commit and the previous one. This list of files can then be filtered using regular expressions or pattern matching to determine if specific files have changed.

```yaml
name: demo

on:
  push:
    branches:
      - 'main'

jobs:
  conditional_step:
    runs-on: 'ubuntu-latest'
    steps:
      - uses: actions/checkout@v4
        with:
          # Checkout as many commits as needed for the diff
          fetch-depth: 2
      - shell: pwsh
        # Give an id to the step, so we can reference it later
        id: check_file_changed
        run: |
          # Diff HEAD with the previous commit
          $diff = git diff --name-only HEAD^ HEAD
          # Check if a file under docs/ or with the .md extension has changed (added, modified, deleted)
          $SourceDiff = @($diff | Where-Object { $_ -match '^docs/' -or $_ -match '\.md$' })
          $HasDiff = $SourceDiff.Count -gt 0
          # Set the output named "docs_changed" ("True" or "False");
          # the older ::set-output command is deprecated
          "docs_changed=$HasDiff" >> $env:GITHUB_OUTPUT
      # Run the step only when "docs_changed" equals "True"
      - shell: pwsh
        if: steps.check_file_changed.outputs.docs_changed == 'True'
        run: echo publish docs
```

In this example, the PowerShell script calculates the difference between the current HEAD and the previous commit. It filters the results to find files in the docs/ directory or files with the .md extension. The result is stored in a step output, which is then used by subsequent steps to conditionally execute tasks, such as publishing documentation.

For conditional jobs, the same logic can be applied, but the output must be defined at the job level to be accessible by other jobs in the workflow.

```yaml
name: sample

on:
  push:
    branches:
      - 'main'

jobs:
  conditional_job_check_files:
    runs-on: 'ubuntu-latest'
    # Declare outputs for next jobs
    outputs:
      docs_changed: ${{ steps.check_file_changed.outputs.docs_changed }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - shell: pwsh
        id: check_file_changed
        run: |
          $diff = git diff --name-only HEAD^ HEAD
          $SourceDiff = @($diff | Where-Object { $_ -match '^docs/' -or $_ -match '\.md$' })
          $HasDiff = $SourceDiff.Count -gt 0
          # Write the step output; the older ::set-output command is deprecated
          "docs_changed=$HasDiff" >> $env:GITHUB_OUTPUT
```
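The workflow above only declares the job-level output. A downstream job that consumes it might look like the following sketch, which would be added under the same jobs key; the job name publish_docs and its single step are hypothetical and simply mirror the step-level example.

```yaml
  # Hypothetical downstream job; runs only when the check job reported a change.
  publish_docs:
    runs-on: 'ubuntu-latest'
    needs: conditional_job_check_files
    if: needs.conditional_job_check_files.outputs.docs_changed == 'True'
    steps:
      - run: echo "publish docs"
```

When the condition is false, the job is reported as skipped rather than failed, so the overall run still succeeds.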

This native approach is particularly useful for teams that prefer to minimize external dependencies or require highly specific filtering logic that may not be easily expressed through standard action configurations. It leverages the power of shell scripting to perform complex file matching and state management within the workflow.

Best Practices for Monorepo Path Filtering

Implementing file change filtering in GitHub Actions, especially within monorepo environments, requires careful consideration of best practices to ensure reliability and maintainability. One common challenge is determining the optimal way to trigger workflows based on paths. While the on.push trigger supports path filtering, it may not be the most flexible solution for all scenarios, particularly when dealing with pull requests or complex directory structures.

A common pattern, and one that comes up frequently in community discussions, is to combine path triggers with conditional jobs. For instance, a workflow might be triggered on pushes that touch specific folders such as backend/ or docs/, but additional logic is often required to handle pull requests or to differentiate between different services within the monorepo.
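A trigger-level filter for the two folders mentioned above might look like this sketch. The branch name main and the decision to apply the same patterns to pull requests are assumptions made for illustration.

```yaml
on:
  push:
    branches:
      - main
    # The workflow starts only if at least one changed file matches a pattern.
    paths:
      - "backend/**"
      - "docs/**"
  pull_request:
    paths:
      - "backend/**"
      - "docs/**"
```

Because this filter decides whether the whole workflow starts, it cannot tell a backend change from a docs change on its own; that distinction still needs job-level conditions like the ones shown earlier. One thing to check before relying on it: if a workflow that provides a required status check is skipped by a path filter, the check can remain pending on the pull request.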

Key considerations include:

  • Comprehensive Filter Definitions: Ensure that all files that could affect a job's outcome are included in the filter. This includes not only source code but also configuration files, dependency lists, and test utilities.
  • Branch Name Conditions: Implement logic to handle special branches, such as those with -notests in the name, to allow for temporary bypasses of certain checks during development or debugging.
  • Reusable Workflows: For complex setups, use reusable workflows to encapsulate path checking logic. This promotes consistency and reduces duplication across multiple workflow files.
  • Permissions: Configure the necessary permissions for jobs that need to access repository data or cancel other workflows. This typically includes contents: read, actions: write, and pull-requests: read.

By adhering to these practices, teams can build robust and efficient CI/CD pipelines that scale with the complexity of their codebases. For large repositories, conditionally running jobs based on file changes is less an optional optimization than a practical necessity for keeping build times and costs under control.

Conclusion

The optimization of GitHub Actions workflows through conditional file change detection represents a significant advancement in CI/CD efficiency. Whether through the use of dedicated actions like dorny/paths-filter, the creation of reusable workflows for modularity, or the implementation of native scripting with Git and PowerShell, developers have a variety of tools at their disposal to tailor their pipelines to the specific needs of their projects. In monorepo environments, where the cost of unnecessary builds and deployments is highest, these techniques are essential for maintaining a responsive and cost-effective development process. By carefully defining filter rules, managing job dependencies, and leveraging reusable components, engineering teams can ensure that their automation runs only when truly necessary, saving time, resources, and developer effort.

