GitLab CI/CD Execution Logic and the Mechanics of when: on_success

The orchestration of automated software delivery in GitLab CI/CD relies heavily on precise control of job execution policies. At the core of this control lies the when keyword, which determines the conditions under which a job runs relative to the status of jobs in earlier stages. Among the available values, on_success serves as the baseline for pipeline stability and predictable deployment flows. Understanding on_success takes more than a superficial grasp of its definition. It requires understanding how GitLab evaluates job dependencies, manages stage transitions, and handles the interplay between success, failure, and manual intervention. A job configured with when: on_success runs only if no job in an earlier stage has failed. Jobs marked allow_failure: true are not counted as failures. This creates a sequential, gated mechanism that ensures code is validated before it progresses to more sensitive stages such as deployment or production releases.

The Fundamental Definition and Default Behavior of on_success

In the hierarchy of GitLab CI/CD execution policies, on_success is the standard operating procedure for most automated workflows. It is the implicit default for every job unless a different when value is set, either directly on the job or inside a rules clause. The legacy only/except keywords decide whether a job is added to a pipeline, but they do not set a when value themselves.

The technical specification of on_success dictates that a job runs only when no job in an earlier stage has failed. The impact on a developer's workflow is significant: it acts as a safety barrier that prevents broken builds from reaching downstream environments. For example, suppose a build stage contains three jobs and one of them fails. Every job in a later stage that uses on_success is then skipped. This prevents the "cascading failure" scenario, in which a deployment job attempts to package a binary that the failed build job never produced.
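The sketch below shows the three-job build stage described above. The job names, make targets, and deploy.sh are hypothetical. If build_app or build_docs fails, deploy is skipped. The lint job is marked allow_failure: true, so its failure alone does not block deploy; GitLab instead reports the pipeline as passed with warnings.

```yaml
stages:
  - build
  - deploy

build_app:
  stage: build
  script: make app            # hypothetical build command

build_docs:
  stage: build
  script: make docs           # hypothetical build command

lint:
  stage: build
  script: make lint           # hypothetical; a failure here does not block deploy
  allow_failure: true

deploy:
  stage: deploy
  script: ./deploy.sh         # hypothetical deploy script
  when: on_success            # the default; written out only for clarity
```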

The relationship between on_success and the pipeline structure is defined by the following attributes:

  • Execution Requirement: No job in any stage before the current job's stage may have failed. A failed job marked allow_failure: true does not count.
  • Default State: If no when keyword is specified, GitLab applies on_success automatically.
  • Sequential Dependency: It enforces a logical progression from packaging (build) to validation (test) to delivery (deploy), matching GitLab's default stage order.
  • Failure Impact: A single failure in an upstream stage skips all downstream on_success jobs, preserving the integrity of the environment.

| Policy value | Execution condition | Default |
|---|---|---|
| on_success | No job in an earlier stage has failed | Yes (implicit) |
| on_failure | At least one job in an earlier stage has failed | No |
| always | Runs regardless of the status of earlier jobs | No |
| manual | Runs only when a user triggers it | No |
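The sketch below places each policy from the table in a single pipeline. All job names and scripts (deploy.sh, notify.sh, collect_reports.sh, make test) are hypothetical placeholders. Without rules, a when: manual job defaults to allow_failure: true, so leaving it unplayed does not stop the cleanup stage from running.

```yaml
stages:
  - test
  - deploy
  - cleanup

unit_tests:
  stage: test
  script: make test                  # hypothetical test command

deploy_staging:
  stage: deploy
  script: ./deploy.sh staging        # implicit on_success: runs only if no test job failed

deploy_production:
  stage: deploy
  script: ./deploy.sh production
  when: manual                       # offered only if no test job failed

notify_failure:
  stage: cleanup
  script: ./notify.sh                # hypothetical; runs only if a job in an earlier stage failed
  when: on_failure

collect_reports:
  stage: cleanup
  script: ./collect_reports.sh       # hypothetical; runs whatever happened earlier
  when: always
```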

Advanced Implementation via the Rules Keyword

Modern GitLab CI/CD configurations increasingly replace static job definitions with conditional logic built on the rules keyword. Within this framework, on_success is often used as a fallback or as the outcome of a specific rule. The rules keyword lets engineers pair when values with if expressions that evaluate CI/CD variables such as $CI_PIPELINE_SOURCE or $CI_COMMIT_BRANCH.

When using rules, the order of the list is critical. GitLab evaluates rules from top to bottom and stops at the first rule that matches. This means that on_success can be used as a "catch-all" at the end of a rule list to ensure that if none of the specific exclusion criteria are met, the job remains a standard part of the pipeline.
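The deliberately misordered sketch below illustrates why the catch-all belongs last. The job name and message are placeholders. A rule that contains only a when clause always matches, and evaluation stops at the first match, so the schedule exclusion is never reached.

```yaml
# Misordered: the first rule matches every pipeline, so evaluation stops there
job:
  script: echo "Added to every pipeline, including scheduled ones"
  rules:
    - when: on_success                       # always matches
    - if: $CI_PIPELINE_SOURCE == "schedule"  # never evaluated
      when: never
```

Moving `- when: on_success` to the end of the list restores the intended exclusion.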

A common implementation pattern involves excluding specific pipeline types to prevent duplicate or unnecessary executions. Consider a scenario where a job should not run during merge request events or scheduled tasks, but should run normally for all other events:

```yaml
job:
  script: echo "Hello, Rules!"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: never
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - when: on_success
```

This configuration has two effects. First, it avoids wasting resources on jobs in contexts where they are not needed. Second, it uses on_success as the final rule. For any other pipeline source (such as a standard push), the job is therefore added to the pipeline with the default execution requirement. Note, however, that ending a rules list with a bare when clause without also defining workflow: rules can produce a GitLab warning about possible duplicate pipelines. The risk is that a single push to a branch with an open merge request can start both a branch pipeline and a merge request pipeline.
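The sketch below is a minimal workflow: rules block, similar to the pattern GitLab documents for avoiding duplicate pipelines. It runs a merge request pipeline when a merge request is open for the branch and a branch pipeline otherwise. It assumes tag pipelines are not needed, because none of these rules match a tag push.

```yaml
workflow:
  rules:
    # Create merge request pipelines
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # Suppress the branch pipeline when a merge request is already open for the branch
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS
      when: never
    # Otherwise create ordinary branch pipelines
    - if: $CI_COMMIT_BRANCH
```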

Interaction with Failure States and Manual Actions

One of the most complex areas of GitLab CI/CD is the interaction between on_success and jobs configured with on_failure or manual. A common point of confusion for engineers is how a job configured with on_success reacts to a job that was triggered via an on_failure policy.

Consider a pipeline with three jobs. A job in an early stage fails. A job (Job B) set to when: on_failure then runs and completes successfully, for example by running a cleanup script after the failed test. A later job (Job C) is set to when: on_success. Job C is skipped. on_success checks every job in all earlier stages, so Job B's success does not cancel out the original failure, and the pipeline's overall status remains "failed". Engineers often expect Job C to run because Job B was a "success", only to find it skipped because a job in an earlier stage is still marked as failed.
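The pipeline below reproduces this scenario using hypothetical job names. job_a_tests fails on purpose, and job_b_cleanup (the on_failure job) runs and succeeds. job_c_deploy is still skipped, and the pipeline ends as failed.

```yaml
stages:
  - test
  - recover
  - deploy

job_a_tests:
  stage: test
  script: exit 1                 # simulates the original failure

job_b_cleanup:
  stage: recover
  script: echo "cleaning up"     # runs because job_a_tests failed, and succeeds
  when: on_failure

job_c_deploy:
  stage: deploy
  script: echo "deploying"       # skipped: job_a_tests in an earlier stage still failed
  when: on_success
```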

The limitations of manual actions also intersect here. A job has only one when value, so a job set to when: manual cannot also be set to always. Manual jobs are gated in the same way as on_success jobs: a manual job is normally available to run only if no job in an earlier stage has failed. This constraint makes it difficult to provide a "manual failure recovery" job that stays available even when the build fails.

To address the need for more flexibility, there have been proposals to decouple the manual trigger from the success requirement, potentially allowing for a configuration such as:

```yaml
# Proposed syntax only: manual: is not a keyword that GitLab currently accepts
deploy:
  script: cap production deploy
  manual: blocking
  when: always
```

Such a change would allow a manual deployment job to be available regardless of whether the testing stage passed or failed, providing a more robust mechanism for emergency hotfixes or manual overrides.

Artifact Management and Runner Dynamics

A critical detail that is often overlooked is how on_success relates to the movement of artifacts between jobs. In GitLab CI/CD, jobs within a single stage can run in parallel, possibly on different runners, and by default there is no sequencing between them. The when: on_success directive therefore governs the relationship between stages, not between individual jobs within the same stage.

By default, a job's artifacts are zipped and uploaded to the GitLab server only when the job succeeds, because artifacts:when defaults to on_success. Jobs in later stages then download these artifacts automatically. If a job in a stage fails, the on_success jobs in the following stages do not run. This prevents them from trying to download artifacts that were never created or uploaded.
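The sketch below shows this stage-based artifact gating. It assumes a hypothetical `make build` that writes its output into bin/. The package job downloads compile's artifacts automatically and runs only if compile succeeded.

```yaml
stages:
  - build
  - package

compile:
  stage: build
  script: make build                 # hypothetical; assumed to write the binary into bin/
  artifacts:
    paths:
      - bin/
    when: on_success                 # the default for artifacts:when, shown explicitly

package:
  stage: package
  script: tar czf app.tar.gz bin/    # runs only if compile succeeded, so bin/ was uploaded
```

If logs from a failed build are needed, setting artifacts:when to on_failure or always uploads them anyway. The on_success jobs in later stages still will not run to consume them.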

The utility of on_failure in this context is often related to runner maintenance. For example, a job configured with when: on_failure might be used to clean up the local disk space on a shell runner after a build fails. This ensures that a failed job does not leave behind massive, corrupted files that could impact the disk capacity for subsequent, unrelated pipeline runs on that same runner.
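The sketch below shows such a cleanup job. An on_failure job is only useful here if it lands on the same machine as the failed build. This sketch assumes a single shell runner carries the hypothetical tag `shell-build-host`; the scratch directory is also hypothetical. GIT_STRATEGY: none skips the repository checkout, since the job only deletes files.

```yaml
stages:
  - build
  - cleanup

build:
  stage: build
  tags:
    - shell-build-host            # hypothetical tag held by one shell runner
  script:
    - mkdir -p /tmp/ci-scratch    # hypothetical scratch directory
    - make build                  # hypothetical build command

cleanup_after_failure:
  stage: cleanup
  tags:
    - shell-build-host            # same tag, so the job runs on the same machine
  variables:
    GIT_STRATEGY: none            # no checkout needed just to delete files
  script:
    - rm -rf /tmp/ci-scratch
  when: on_failure                # runs only if a job in an earlier stage failed
```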

| Concept | Interaction with on_success | Real-world consequence |
|---|---|---|
| Artifacts | Later-stage jobs run only if no earlier-stage job failed | Prevents downloading corrupted or missing files |
| Parallelism | No effect on jobs within the same stage | Jobs in the same stage can all run at the same time |
| Shell runners | Cleanup jobs use on_failure instead | Prevents disk exhaustion on local runners |
| Manual jobs | Gated like on_success | Manual actions are available only if no earlier-stage job failed |

Troubleshooting Common Logic Errors

Engineers frequently run into problems when they mix legacy syntax with rule-based logic. A useful rule of thumb for a stable pipeline is never to mix only/except jobs with rules jobs in the same pipeline. Within a single job, GitLab rejects the combination outright. Across different jobs it is not a YAML error, but the differing default behaviors of the two systems make the resulting pipelines hard to troubleshoot.
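The sketch below illustrates one such difference in defaults, using two alternative definitions of the same hypothetical job. A real pipeline should contain only one of them. An only/except job never runs in merge request pipelines unless that is explicitly requested. A rules job ending in a bare when clause does run there, so the rewrite needs an extra exclusion to stay approximately equivalent.

```yaml
# Legacy version: only/except jobs do not run in merge request pipelines by default
test_legacy:
  script: make test                  # hypothetical test command
  except:
    - schedules

# Approximate rules equivalent: the merge request exclusion must be stated explicitly
test_rules:
  script: make test
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: never
    - when: on_success
```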

Common troubleshooting scenarios include:

  • Duplicate Pipelines: This occurs when a job is defined with rules that match both a branch push and a merge request event. This results in two separate pipelines starting for a single action. The solution is to implement workflow: rules to explicitly define which pipeline type takes precedence.
  • The "Skipped" Deployment: An engineer may set a job to when: on_success but find that it never runs. This is often because an upstream job was not actually "successful" in the eyes of the GitLab orchestrator. A typical case is a job intended to be optional that failed without allow_failure: true applied to it. The opposite mistake also occurs: a command that exits with a non-zero code but is masked by a subsequent command lets a broken job report success. Downstream on_success jobs then run when they should not.
  • Variable-based Toggles: An engineer may want a deployment to run automatically only some of the time and otherwise remain manual. The usual workaround is a custom CI/CD variable.

To implement an auto-deploy toggle, one would use a variable like AUTO_DEPLOY:

```yaml
deploy_job:
  script: ./deploy.sh
  rules:
    - if: $AUTO_DEPLOY == "true"
      when: on_success
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: manual
```

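The variable must be set when the pipeline is started, for example in the "Run pipeline" form. This is itself a manual step, but it provides conditional automation that when: on_success alone does not offer. If neither rule matches, for example in a push pipeline without AUTO_DEPLOY set, the job is not added to the pipeline at all.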
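The variation below adds a default value and a description, which GitLab uses to prefill the "Run pipeline" form. It assumes a GitLab version that supports variables:description. The final `- when: manual` fallback is a hypothetical change from the example above: it offers a manual button in every other pipeline instead of adding the job only to merge request pipelines.

```yaml
variables:
  AUTO_DEPLOY:
    value: "false"                   # default unless overridden when starting the pipeline
    description: "Set to true to deploy without a manual click"

deploy_job:
  stage: deploy                      # deploy is one of GitLab's default stages
  script: ./deploy.sh
  rules:
    - if: $AUTO_DEPLOY == "true"
      when: on_success               # deploy automatically once earlier stages pass
    - when: manual                   # hypothetical fallback: otherwise offer a manual button
```

A manual job created through rules defaults to allow_failure: false, so later stages wait for it. Add allow_failure: true to the rule if that is not wanted.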

Analytical Conclusion on Execution Orchestration

The when: on_success directive is the cornerstone of disciplined CI/CD pipelines, providing the necessary rigor to ensure that only validated code proceeds through the software development lifecycle. However, its simplicity masks a complex layer of dependency management that requires a deep understanding of how GitLab evaluates stage transitions, rule sets, and job statuses.

The limitations of the current implementation, specifically the difficulty of decoupling manual actions from the success of previous stages and the potential for confusion when using on_failure jobs, highlight the ongoing evolution of the GitLab CI/CD engine. For the expert engineer, mastering on_success involves more than knowing it is the default. It requires the ability to architect rules that prevent duplicate pipelines, to manage artifact integrity through stage-based gating, and to implement workarounds for conditional automation using CI/CD variables. As pipelines grow more complex, precise control over the "when" of a job becomes the difference between a resilient, self-healing delivery system and a fragile sequence of unpredictable executions.

Sources

  1. GitLab Issue 17759: Make it possible to define when policy for manual actions
  2. Microfluidics GitLab Help: Job Rules
  3. GitLab Forum: Using on_success and on_failure or conditions
  4. GitLab Forum: When on_failure and on_success
  5. GitLab Forum: Deploy on success
