In modern continuous integration and continuous delivery (CI/CD) pipelines, the ability to share data between discrete execution units is critical for modular, efficient, and secure workflows. GitHub Actions, by design, executes jobs in parallel to maximize throughput and minimize total execution time. However, this parallelism introduces a fundamental constraint: environment variables and state defined within one job are ephemeral and isolated to that specific runner session. They do not automatically propagate to subsequent jobs. To resolve this, developers must utilize specific mechanisms such as job outputs, the needs dependency keyword, or third-party persistence actions. Understanding the architecture of job isolation, the syntax for data transfer, and the limitations regarding data size and security is essential for constructing robust automation pipelines.
The Architecture of Job Isolation and Parallelism
GitHub Actions workflows are composed of jobs, each of which declares a runs-on value that specifies the runner label or set of labels required to execute it. By default, jobs within a workflow run in parallel. As soon as a suitable runner with the matching labels is available, the job begins execution. This parallelization reduces overall workflow duration, allowing independent tasks to proceed simultaneously without waiting for one another.
However, this default behavior creates a stateless environment. If Job A generates a dynamic value, such as a build version number or a database connection string, Job B cannot access it directly through standard environment variables because Job B is executing on a separate runner instance, potentially in a different execution context. To enforce a specific order of execution and enable data transfer, developers must explicitly define dependencies using the needs keyword.
The needs value is assigned to a job and contains a job ID or a list of job IDs that must complete successfully before the dependent job can begin. This establishes a sequential execution path for that segment of the workflow. For example, if job2 requires data generated by job1, job2 must declare needs: [job1]. This ensures that job1 has finished and published its outputs before job2 attempts to read them.
```yaml
jobs:
  job1:
    runs-on: macos-latest
    steps:
      - run: echo "Hello"
  job2:
    # job2 waits until job1 has completed successfully.
    needs: [job1]
    runs-on: macos-latest
    steps:
      - run: echo "World"
```
In this configuration, the execution order is strictly enforced. job2 will not start until job1 has finished, and by default it is skipped if job1 fails. This example defines no outputs, but the same dependency is what makes any outputs of job1 available to job2.
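The needs keyword also accepts several job IDs, which produces a fan-in. The following is a minimal sketch rather than a recommended layout; the job names build, test, and report are hypothetical. It assumes that the final job should run even when an upstream job fails, which is why it adds an if: always() condition.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building"
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing"
  report:
    # Fan-in: starts only after both build and test have finished.
    needs: [build, test]
    # always() lets this job run even if build or test failed.
    if: ${{ always() }}
    runs-on: ubuntu-latest
    steps:
      # needs.<job_id>.result is success, failure, cancelled, or skipped.
      - run: echo "build=${{ needs.build.result }} test=${{ needs.test.result }}"
```

Without the if condition, report would be skipped whenever build or test did not succeed.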
Defining and Accessing Job Outputs
The primary native mechanism for sharing data between jobs in GitHub Actions is the use of job outputs. Outputs are defined within the job that generates the data and are evaluated on the runner when that job finishes. These outputs are always strings and serve as the bridge between the step-level execution context and the job-level context, which can then be consumed by dependent jobs.
To define an output, add the outputs field to the job definition. This field maps output names to step output references. The source of the output is typically a step that writes to the file whose path is stored in the GITHUB_OUTPUT environment variable. The runner reads that file after the step finishes, and each name=value line appended to it becomes a step output.
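```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    # Map the job-level output to the step-level output.
    outputs:
      output1: ${{ steps.step1.outputs.result }}
    steps:
      - id: step1
        run: echo "result=hello" >> "$GITHUB_OUTPUT"
  job2:
    runs-on: ubuntu-latest
    needs: job1
    steps:
      # A block scalar is used because ": " inside a plain YAML scalar is a syntax error.
      - run: |
          echo "Job 1 output: ${{ needs.job1.outputs.output1 }}"
```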
In this example, job1 defines an output named output1 which is mapped to the result output of step1. The step executes a shell command that appends result=hello to the $GITHUB_OUTPUT file. Once job1 completes, job2 can access this value using the needs context, specifically ${{ needs.job1.outputs.output1 }}. This approach provides a structured way to carry dynamic information, such as tokens, IDs, or version numbers, across dependent jobs.
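Values that span several lines cannot be written as a single name=value line. The sketch below uses the delimiter form that the GITHUB_OUTPUT file accepts; the step id notes, the output names, and the delimiter EOF_NOTES are all hypothetical, and the delimiter is assumed not to occur in the value itself.

```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    outputs:
      notes: ${{ steps.notes.outputs.text }}
    steps:
      - id: notes
        run: |
          # "name<<DELIMITER" opens a multi-line value; a line that
          # contains only the delimiter closes it.
          {
            echo "text<<EOF_NOTES"
            echo "first line"
            echo "second line"
            echo "EOF_NOTES"
          } >> "$GITHUB_OUTPUT"
  job2:
    runs-on: ubuntu-latest
    needs: job1
    steps:
      # Passing the value through env avoids interpolating it into the script.
      - env:
          NOTES: ${{ needs.job1.outputs.notes }}
        run: printf '%s\n' "$NOTES"
```

Reading the output through an environment variable rather than inlining the expression in the script also keeps quotes or shell metacharacters in the value from being interpreted by the shell.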
Handling Sensitive Data and Masking
A critical aspect of passing data between jobs is security. Job outputs are visible in workflow logs unless explicitly masked. If sensitive information such as API keys, passwords, or tokens is passed as an output, it must be protected to prevent accidental leakage. GitHub Actions automatically redacts stored secrets in logs, but for dynamically generated secrets or values that are not stored as repository secrets, developers must use the add-mask command.
The add-mask command instructs the runner to mask the value in all subsequent log output of the job that issues it, so it must run before the value is first printed. This is particularly important when passing secrets between jobs, because an unmasked value will appear in the logs of the job that generates it and potentially in the logs of the job that consumes it.
```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    outputs:
      api_key: ${{ steps.generate_key.outputs.key }}
    steps:
      - id: generate_key
        run: |
          key="my-sensitive-api-key"
          # Register the mask before the value can reach the log.
          echo "::add-mask::$key"
          echo "key=$key" >> "$GITHUB_OUTPUT"
  job2:
    runs-on: ubuntu-latest
    needs: job1
    steps:
      - env:
          API_KEY: ${{ needs.job1.outputs.api_key }}
        run: |
          echo "Using API key: $API_KEY"
```
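In this scenario, the API key is generated in job1. Before the key is written to the $GITHUB_OUTPUT file, the echo "::add-mask::$key" command registers it as a masked value, so it is redacted in the logs of job1. Two caveats apply once the value crosses a job boundary. First, GitHub's documentation states that outputs containing secrets are redacted on the runner and not sent to GitHub Actions, so needs.job1.outputs.api_key may arrive in job2 as an empty string. Second, a mask registered with add-mask applies to the job that issued it, so a consuming job that does receive a sensitive value should register the mask again before using it. For real credentials, storing the value as a repository secret and reading it in each job is the more dependable route.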
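Because GitHub may withhold job outputs that contain masked values, one alternative for long-lived credentials is to read the same stored secret in every job that needs it. This is a sketch under the assumption that a repository secret named API_KEY exists; the name is hypothetical.

```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    steps:
      - env:
          # Stored secrets are redacted from logs automatically.
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          echo "job1 received a key of length ${#API_KEY}"
  job2:
    runs-on: ubuntu-latest
    needs: job1
    steps:
      - env:
          # Each job reads the secret itself; nothing is passed via outputs.
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          echo "job2 received a key of length ${#API_KEY}"
```

This does not help with secrets generated at run time, which is the case the add-mask command addresses.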
Limitations of Job Outputs
While job outputs are effective for passing small amounts of dynamic data, they come with specific limitations that developers must consider. First, all outputs are strictly strings. If complex data structures need to be passed, they must be serialized or encoded, such as using Base64 encoding, before being written to the output.
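Besides Base64, structured data can be serialized as a JSON string and decoded in the consuming job with the built-in fromJSON expression function. The following is a minimal sketch; the step id make, the output name config, and the JSON fields are hypothetical.

```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    outputs:
      config: ${{ steps.make.outputs.config }}
    steps:
      - id: make
        run: |
          # The whole JSON document is stored as a single-line string.
          echo 'config={"version":"1.2.3","targets":["linux","macos"]}' >> "$GITHUB_OUTPUT"
  job2:
    runs-on: ubuntu-latest
    needs: job1
    steps:
      # fromJSON turns the string back into an object that expressions can index.
      - run: echo "Version ${{ fromJSON(needs.job1.outputs.config).version }}"
      - run: echo "First target ${{ fromJSON(needs.job1.outputs.config).targets[0] }}"
```

The same pattern is often used to feed a dynamically generated list into a matrix definition in a later job.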
Second, there are hard limits on the size of data that can be passed via outputs. According to GitHub's documentation, the outputs of a single job cannot exceed 1 MB, and the total size of all outputs across the entire workflow run cannot exceed 50 MB. These constraints make job outputs unsuitable for transferring large files, build artifacts, or extensive datasets. For larger data, developers should use GitHub Actions artifacts, which are designed to store and retrieve files between jobs.
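For files, the artifact route can be sketched as follows, assuming the official actions/upload-artifact and actions/download-artifact actions at version v4. The artifact name app-build and the dist/ path are hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: |
          mkdir -p dist
          echo "build output" > dist/app.txt
      # Store the contents of dist/ under a named artifact.
      - uses: actions/upload-artifact@v4
        with:
          name: app-build
          path: dist/
  deploy:
    runs-on: ubuntu-latest
    needs: build
    steps:
      # Fetch the artifact by name and unpack it into dist/.
      - uses: actions/download-artifact@v4
        with:
          name: app-build
          path: dist/
      - run: cat dist/app.txt
```

The needs dependency is still required here, since the artifact only exists once the build job has uploaded it.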
Third-Party Persistence Solutions
For use cases where native job outputs are insufficient or where a more flexible approach to sharing state is desired, third-party actions can be employed. One such solution is the nick-fields/persist-action-data action, which allows data to be shared between jobs and accessed via environment variables and step outputs. This action was originally maintained under the nick-invision account but was transferred to the personal account nick-fields in February 2022 due to the author leaving InVision. The transfer was handled seamlessly by GitHub, ensuring that existing workflow references continue to function, though they now pull from the new repository location.
The persist-action-data action provides a way to persist data from one job and retrieve it in another. It supports defining data to persist, specifying a variable name for access, and retrieving multiple variables in a subsequent job.
```yaml
# Input and variable names below should be checked against the action's README
# for the version you pin.
# In the producing job
- uses: nick-fields/persist-action-data@v1
  with:
    data: ${{ steps.some-step.outputs.some-output }}
    variable: SOME_STEP_OUTPUT

# In the consuming job
- uses: nick-fields/persist-action-data@v1
  id: global-data
  with:
    retrieve_variables: SOME_STEP_OUTPUT, SOME_OTHER_STEP_OUTPUT
- run: echo ${{ steps.global-data.outputs.SOME_STEP_OUTPUT }}
```
This approach can be useful for storing values like release URLs or complex configuration states that need to be shared across multiple jobs without counting against the 1 MB limit on a job's outputs, although it still relies on the underlying storage mechanisms of the action itself. Relying on a third-party action introduces an external dependency, so its maintenance status should be monitored.
Advanced Considerations: Matrix Builds and Unique Names
When employing matrix strategies in GitHub Actions, where a job runs multiple times with different parameters, special care must be taken with output names. All matrix combinations share the same job-level outputs map, and GitHub Actions does not namespace outputs by matrix value automatically. If every combination writes the same output name, the value from the combination that finishes last overwrites the others. To keep each result, give every combination a distinct output name, for example by including the matrix value in the name.
If a matrix job generates outputs, the consuming job must reference these outputs correctly, often by iterating over the matrix values or using specific output keys that correspond to the matrix dimension. Failure to use unique names can result in overwritten data or inability to retrieve the correct value for a specific matrix configuration.
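One way to keep the outputs of each matrix combination apart is to embed the matrix value in the output name, along the lines of the pattern shown in GitHub's documentation. In this sketch the names gen_output and output_1 through output_3 are illustrative, and every possible name has to be declared up front in the outputs map.

```yaml
jobs:
  job1:
    runs-on: ubuntu-latest
    # One declared output per matrix value.
    outputs:
      output_1: ${{ steps.gen_output.outputs.output_1 }}
      output_2: ${{ steps.gen_output.outputs.output_2 }}
      output_3: ${{ steps.gen_output.outputs.output_3 }}
    strategy:
      matrix:
        version: [1, 2, 3]
    steps:
      - id: gen_output
        run: |
          version="${{ matrix.version }}"
          # Each combination writes only the output named after its own value.
          echo "output_${version}=${version}" >> "$GITHUB_OUTPUT"
  job2:
    runs-on: ubuntu-latest
    needs: [job1]
    steps:
      # Dump every output collected from the matrix job as JSON.
      - run: echo '${{ toJSON(needs.job1.outputs) }}'
```

Each combination leaves the other declared outputs empty, so the values do not collide when the runs finish in a different order.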
Workflow Optimization and Analysis
The ability to pass data between jobs cleanly contributes to modular and maintainable CI/CD pipelines. By decoupling jobs and passing only necessary data via outputs, workflows become more flexible and easier to debug. Tools like CICube can assist in optimizing these workflows by tracking runs, identifying bottlenecks, and providing insights into build times. While not a native GitHub feature, integrating such observability tools can help teams understand the impact of job dependencies and data transfer on overall pipeline efficiency.
Conclusion
Passing variables between jobs in GitHub Actions requires a deliberate approach that respects the platform's parallel execution model. Native job outputs, combined with the needs dependency keyword, provide a robust, built-in mechanism for sharing small amounts of dynamic data such as IDs and version strings. The add-mask command helps keep sensitive values out of logs, although masked values may not be forwarded as job outputs, so stored secrets remain the safer channel for credentials. For larger datasets, artifacts remain the preferred solution, while third-party actions like persist-action-data offer alternative flexibility for specific state-management needs. Understanding the 1 MB per-job and 50 MB per-run output limits is crucial for designing scalable pipelines. By adhering to these patterns, developers can create secure, efficient, and modular CI/CD workflows.