Modularizing GitHub Actions with YAML Includes and Preprocessing

The challenge of maintaining scalable CI/CD pipelines often centers on the tension between repeatability and redundancy. GitHub Actions workflows are written in YAML, which has no built-in import or include mechanism, and this frequently leads to "configuration drift": logic that starts out identical is duplicated across multiple environment files, such as staging and production, and the copies gradually diverge. While GitHub provides certain native abstractions, the need for a more flexible, preprocessor-based approach has led to tools and strategies that "flatten" modular YAML sources into the static files the GitHub Actions runner requires. This modularization lets developers treat their infrastructure-as-code with the same rigor as application code, applying DRY (Don't Repeat Yourself) principles to reduce human error and maintenance overhead.

The Structural Limitations of Native GitHub Actions YAML

The fundamental struggle with GitHub Actions workflows stems from the fact that YAML, by its own specification, does not handle imports natively. When a project evolves to require multiple workflows—for instance, separating deployment logic by environment—developers often find themselves copying and pasting large blocks of configuration. This duplication creates a maintenance nightmare: a change in a single dependency or a tweak to a shell command must be manually replicated across every single .yml file in the .github/workflows/ directory.

To address this, several native and third-party strategies have emerged, each with distinct trade-offs.

| Method | Scope | Primary Use Case | Overhead |
| --- | --- | --- | --- |
| Reusable Workflows | Cross-repository / Cross-file | Organization-wide deployment patterns | High (requires separate files) |
| Composite Actions | Step-level encapsulation | Bundling sequences of steps | Medium (requires action.yml) |
| YAML Anchors | Single-file only | Eliminating duplicate variables in one file | Low (zero setup) |
| actions-includes | Pre-processed modularity | Complex local action nesting and script injection | Medium (requires pre-processing step) |

Deep Dive into actions-includes for Advanced Workflow Modularization

The actions-includes tool provides a powerful mechanism to bypass the native limitations of GitHub Actions by implementing a preprocessing layer. Instead of relying on the standard uses or run keywords for every interaction, developers utilize the includes keyword. This allows for the creation of a "source" workflow that is later compiled into a "flattened" workflow that GitHub can actually execute.

The Preprocessing Mechanism

The core functionality of actions-includes is to transform an input YAML file (e.g., workflow.in.yml) into an output YAML file (workflow.out.yml) that conforms to GitHub's strict requirements. This transformation is handled via a Python module.

The execution of this transformation can be performed in two primary ways:

  1. Local Python Execution:
    python -m actions_includes ./.github/workflows-src/workflow-a.yml ./.github/workflows/workflow-a.yml

  2. Dockerized Execution:
    docker container run --rm -it -v $(pwd):/github/workspace --entrypoint="" ghcr.io/mithro/actions-includes/image:main python -m actions_includes ./.github/workflows-src/workflow-a.yml ./.github/workflows/workflow-a.yml

This preprocessing step ensures that the final file uploaded to GitHub is a standard, static YAML file, while the developer maintains a modular, easy-to-manage source structure.
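The single-file commands above extend naturally to a whole directory of sources. The following is a minimal sketch, assuming the actions_includes module is installed and that sources live in .github/workflows-src/ as in the example paths. The helper names build_commands and preprocess_all are hypothetical and are not part of the tool.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(src_dir: Path, out_dir: Path) -> list[list[str]]:
    """Return one `python -m actions_includes SRC OUT` command per source file."""
    commands = []
    for src in sorted(src_dir.glob("*.yml")):
        out = out_dir / src.name
        commands.append([sys.executable, "-m", "actions_includes", str(src), str(out)])
    return commands


def preprocess_all(src_dir: Path, out_dir: Path) -> None:
    """Run the preprocessor for every source workflow, stopping on the first failure."""
    for cmd in build_commands(src_dir, out_dir):
        subprocess.run(cmd, check=True)


# Example call (assumed directory layout):
# preprocess_all(Path(".github/workflows-src"), Path(".github/workflows"))
```

Keeping command construction separate from execution makes it possible to inspect or log the commands before anything is written to .github/workflows/.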

Expanded Syntax for Local and Public Actions

The actions-includes tool extends the standard action naming syntax to make referencing local components more intuitive. This is particularly critical for teams that want to maintain a library of internal utility actions without the overhead of creating full-blown public repositories for every small task.

  • Public Actions:
    The syntax {owner}/{repo}@{ref} is used to reference a public action hosted on GitHub.
    The syntax {owner}/{repo}/{path}@{ref} is used for public actions located within a specific sub-path of a repository.

  • Local Actions:
    The syntax ./{path} references a local action at the given path within the repository, for example ./.github/actions/{action-name}.
    The tool specifically introduces the /{name} syntax, which targets local actions stored under ./.github/includes/actions/{name}. This is presented as a corrective measure for how composite actions should ideally function by providing a streamlined path for internal includes.
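
The reference forms listed above can be told apart with a few string checks. The sketch below is illustrative only and does not reproduce the tool's own parser; the names IncludeTarget and resolve_include are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncludeTarget:
    kind: str                  # "public" or "local"
    location: str              # "owner/repo" for public, a path for local
    subpath: str = ""          # path inside the repository (public only)
    ref: Optional[str] = None  # git ref (public only)


def resolve_include(name: str) -> IncludeTarget:
    """Classify an include reference according to the forms listed above."""
    if name.startswith("/"):
        # /{name} maps to ./.github/includes/actions/{name}
        return IncludeTarget("local", "./.github/includes/actions/" + name[1:])
    if name.startswith(("./", "../")):
        # A relative path points at a local action in the same repository.
        return IncludeTarget("local", name)
    if "@" not in name:
        raise ValueError(f"public action reference needs an @ref: {name!r}")
    target, ref = name.rsplit("@", 1)
    parts = target.split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected owner/repo[/path]@ref, got {name!r}")
    return IncludeTarget("public", "/".join(parts[:2]), "/".join(parts[2:]), ref)
```

For instance, resolve_include("/build") yields a local target at ./.github/includes/actions/build, while resolve_include("actions/checkout@v2") yields a public target with ref v2.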

Script Injection via includes-script

One of the most potent features of this system is the includes-script step. This allows a developer to extract a script—whether written in Python or a shell language—into a separate file and reference it within the workflow.

For example, a developer creates a file named script.py with the following content:
print('Hello world')

In the workflow.in.yml source file, the step is defined as:
    steps:
      - name: Hello
        includes-script: script.py

When the preprocessing command python -m actions_includes workflow.in.yml workflow.out.yml is run, the tool inspects the file extension of the script. For a .py file, it deduces that the shell should be python. The resulting output in workflow.out.yml becomes:
    steps:
      - name: Hello
        shell: python
        run: |
          print('Hello world')

This allows for a clean separation of logic (the script) and orchestration (the YAML), while still maintaining the ability to manually override the shell parameter if a different interpreter is required.
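
The expansion described above can be approximated in a few lines. This is a minimal sketch of the idea, not the tool's actual implementation: the function name expand_includes_script and the extension table are hypothetical, and only the .py to python pairing comes from the text.

```python
from pathlib import Path

# Extension-to-shell mapping. Only the .py -> python pairing is described in
# the text; the other entries are illustrative assumptions.
SHELL_BY_EXTENSION = {".py": "python", ".sh": "bash", ".ps1": "pwsh"}


def expand_includes_script(step: dict, base_dir: Path) -> dict:
    """Return a copy of `step` with `includes-script` replaced by `shell` and `run`."""
    if "includes-script" not in step:
        return dict(step)
    expanded = {k: v for k, v in step.items() if k != "includes-script"}
    script = base_dir / step["includes-script"]
    if "shell" not in expanded:
        # An explicit shell in the source step takes precedence over deduction.
        try:
            expanded["shell"] = SHELL_BY_EXTENSION[script.suffix]
        except KeyError:
            raise ValueError(f"cannot deduce shell for {script.name}") from None
    # Inline the script body as the step's run content.
    expanded["run"] = script.read_text()
    return expanded
```

Because an existing shell key is left untouched, a step that sets its own interpreter keeps it, which mirrors the manual override mentioned above.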

Implementation of Pre-commit Hooks for Workflow Automation

To prevent the risk of committing an un-processed "source" YAML file to the .github/workflows/ directory—which would cause the GitHub Actions runner to fail—it is recommended to use a pre-commit hook. This ensures that every time a developer runs git commit, the preprocessing tool automatically flattens the source files into the final output files.

Utilizing the pre-commit Framework

The pre-commit package is a widely used framework for managing these hooks. To integrate actions-includes, a developer can add a local hook entry under the repos: key of their .pre-commit-config.yaml file:

    - repo: local
      hooks:
        - id: preprocess-workflows
          name: Preprocess workflow.yml
          entry: python -m actions_includes workflow.in.yml workflow.out.yml
          language: system        # required for local hooks; runs the command as-is
          pass_filenames: false   # the entry already names its input and output files

Manual Git Hook Setup

For those not using the pre-commit framework, a manual shell script can be placed in .git/hooks/pre-commit. This script should be designed to fail the commit if the preprocessing fails, ensuring that no invalid YAML reaches the remote repository.

The content of the .git/hooks/pre-commit file would be:
    #!/bin/bash
    # Abort the commit if preprocessing fails.
    python -m actions_includes workflow.in.yml workflow.out.yml || { echo "Failed to preprocess workflow file." >&2; exit 1; }

Because the .git folder is not tracked by version control, this script must be stored in a non-ignored file within the project (such as a setup.sh script) and copied into the hooks directory during the initial project environment setup.
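
The copy step can be scripted so that every contributor installs the hook the same way. The sketch below is one possible Python equivalent of such a setup script, assuming an ordinary checkout where .git is a directory. The function name install_pre_commit_hook and the location of the tracked hook file are hypothetical.

```python
import shutil
import stat
from pathlib import Path


def install_pre_commit_hook(repo_root: Path, tracked_hook: Path) -> Path:
    """Copy a version-controlled hook script into .git/hooks/pre-commit."""
    hooks_dir = repo_root / ".git" / "hooks"
    if not hooks_dir.parent.is_dir():
        raise FileNotFoundError(f"{repo_root} does not look like a git checkout")
    hooks_dir.mkdir(exist_ok=True)
    target = hooks_dir / "pre-commit"
    shutil.copyfile(tracked_hook, target)
    # Git only runs hooks that are executable.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


# Example call (hypothetical path for the tracked copy of the hook):
# install_pre_commit_hook(Path("."), Path("scripts/pre-commit"))
```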

Implementing a GitHub Actions Shell for Manual Execution

Beyond modularization, there is a need for "ad-hoc" execution of commands without the need to modify a YAML file, commit a change, and push to a branch. This is solved by creating a "GitHub Actions Shell" workflow.

By utilizing the workflow_dispatch event, a developer can create a dedicated .yml file (e.g., .github/workflows/shell.yml) that accepts a command as an input.

The Shell Workflow Configuration

The configuration for this manual shell is as follows:

    name: "GitHub Actions Shell"
    on:
      workflow_dispatch:
        inputs:
          command:
            description: 'The command to run'
            required: true
    jobs:
      run:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - run: ${{ github.event.inputs.command }}
            env:
              GIT_COMMITTER_NAME: GitHub Actions Shell
              GIT_AUTHOR_NAME: GitHub Actions Shell
              EMAIL: github-actions[bot]@users.noreply.github.com

Real-World Applications of the Manual Shell

This setup eliminates the "context switching cost" where a developer would normally have to stash changes, switch branches, run a script locally, and then switch back. Instead, they can trigger this workflow from the "Actions" tab in the GitHub UI.

Common use cases include:
- Code Formatting: Running prettier to format the README file and pushing the changes back to the repo.
- Maintenance: Batch-optimizing images or updating website screenshots.
- Package Management: Executing npm publish for manual releases.
- Prototyping: Testing new action logic manually before committing it to a formal, automated workflow file.
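
Besides the Actions tab, a workflow_dispatch workflow can be started through GitHub's REST endpoint for workflow dispatch events. The sketch below only builds the request and does not send it. It assumes the workflow file is named shell.yml with a single command input, as in the configuration above. The function name build_dispatch_request is hypothetical, and the token must be supplied by the caller.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           ref: str, command: str, token: str) -> urllib.request.Request:
    """Build a POST request that triggers the shell workflow with one command."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # `ref` selects the branch or tag whose copy of the workflow file is run.
    body = json.dumps({"ref": ref, "inputs": {"command": command}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# To send it: urllib.request.urlopen(build_dispatch_request(...))
```

Separating request construction from sending keeps the token handling and the network call under the caller's control.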

Analysis of Pathing and Root Directory Conflicts

A common point of failure in GitHub Actions configuration involves the interpretation of the root directory and relative paths. This is often seen when configuring deployment actions (e.g., static.yml).

When a workflow uses path: '.', it instructs the action to upload the current directory or the entire repository. However, a critical failure occurs when a user changes the path to /. While conceptually both point to the root, GitHub Actions interprets / in a way that can lead to an infinite loop during the upload process, preventing the page from ever being deployed.

Furthermore, the use of . (current directory) necessitates relative paths (e.g., ../) when referencing images or assets in parent directories. This creates debugging overhead, because the relative path in the YAML configuration may not match the relative path expected by the deployed website's file structure.

Comparative Analysis of Duplication Elimination Strategies

The choice between YAML anchors, composite actions, and reusable workflows depends on the specific scope of the duplication.

YAML Anchors: The Minimalist Approach

YAML anchors allow for the definition of a block of data that can be referenced elsewhere in the same file. This is the most efficient method for eliminating duplicate environment variables within a single workflow file because it requires zero setup, no separate files, and no shell specifications. However, it is strictly limited to a single file; it cannot share data across different .yml files.

Composite Actions: The Encapsulation Approach

Composite actions are used to bundle a sequence of steps into a reusable unit stored in .github/actions/.

  • Advantages: They work across different repositories and provide encapsulation via defined inputs and outputs.
  • Disadvantages: They require the creation of a separate action.yml file, they cannot specify the runner environment (which must be done in the calling workflow), and they introduce invocation overhead. For simple duplication within a single file, they are considered heavyweight.

Reusable Workflows: The Organizational Approach

Reusable workflows are designed for sharing entire workflow structures across an organization. While perfect for standardizing deployment patterns across 50 different microservices, they are overkill for removing a few duplicate lines in one file. A significant limitation is that users cannot inject steps before or after a call to a reusable workflow; the call is an all-or-nothing execution of the target workflow.

Conclusion: The Architecture of Modern Workflow Management

The evolution of GitHub Actions from simple linear scripts to complex, modularized pipelines requires a shift in how developers approach YAML. The native lack of import functionality creates a gap that must be filled by either high-level abstractions (Reusable Workflows) or low-level preprocessing (actions-includes).

The most robust architecture for a professional project involves a three-tier strategy:
1. Use YAML Anchors for intra-file variable deduplication.
2. Use Composite Actions for cross-repository step sequences.
3. Use a pre-processing tool like actions-includes combined with git pre-commit hooks to manage complex, nested local dependencies and externalize script logic into standalone files.

By implementing a preprocessing pipeline, teams can maintain the readability of "source" YAML while providing GitHub with the "flattened" static files it requires. This, combined with a manual dispatch shell for prototyping, creates a development environment that is both agile and stable, significantly reducing the risk of configuration errors during the deployment lifecycle.

