Securing Non-Human Identities through GitLab CI/CD SSH Integration

Modern software delivery pipelines have to balance seamless automation against rigorous security. In GitLab CI/CD, Secure Shell (SSH) keys are the primary mechanism for establishing trust between the ephemeral build environment (the GitLab Runner) and external entities such as private repositories, remote servers, and network devices. This is not merely a technical configuration but an exercise in Machine Identity Management (MIM). When a pipeline executes, it often operates as a Non-Human Identity (NHI): a machine-driven entity that needs specific permissions to check out internal submodules, deploy applications to platforms such as Heroku, run commands on remote hosts, or synchronize files with rsync. Because these identities lack the judgment of a human operator, over-privileged or mismanaged credentials can lead to serious security failures.

The inherent risk associated with NHIs is highlighted by historical industry failures. In 2023, Microsoft experienced a significant exposure where a Shared Access Signature (SAS) token—a form of NHI—granted full access to 38TB of internal data, including documents and personal chats. This breach was attributed to a lack of expiration and improper access scoping for the long-lived token. Similarly, BeyondTrust suffered a 2024 breach resulting from an overprivileged API key, which allowed attackers to escalate privileges across multiple systems. These examples underscore the necessity of treating SSH keys in GitLab not as static passwords, but as dynamic, scoped credentials that must be rotated and audited.

Architecting SSH Access for GitLab CI/CD

Implementing SSH keys within a GitLab pipeline is necessary when the build environment needs to interact with resources outside its immediate scope. GitLab does not provide built-in, automated management for SSH keys within the runner's environment; therefore, the responsibility falls on the platform engineer to inject these credentials securely. This injection is typically achieved by extending the .gitlab-ci.yml configuration, ensuring compatibility across various executors, including Docker and shell-based runners.

The primary use cases for this integration include:

Checking out internal submodules that reside in private repositories.
Downloading private packages through managers such as Bundler.
Deploying application binaries or static sites to proprietary servers or cloud platforms like Heroku.
Executing remote shell commands on a target server from the build environment.
Using rsync to synchronize files from the runner to a remote destination.

To implement this, a new SSH key pair must be generated. Ed25519 is the generally recommended algorithm because it offers strong security with short keys and fast operations. The command to generate this key is:

ssh-keygen -t ed25519 -C "[email protected]"

In environments where Ed25519 is unsupported, RSA is the usual fallback, generated here with a 4096-bit key:

ssh-keygen -t rsa -b 4096 -C "[email protected]"

This process generates a private key (e.g., id_ed25519), which must be kept strictly confidential, and a public key (e.g., id_ed25519.pub), which is shared with the target system. When managing multiple keys, it is a professional best practice to use custom filenames (such as id_gitlab) to prevent overwriting default keys in the ~/.ssh directory.
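A minimal sketch of generating a dedicated key pair under a custom filename, so that a default key such as ~/.ssh/id_ed25519 is never touched. The temporary directory, the id_gitlab filename, and the comment string are illustrative assumptions; the empty passphrase (-N "") is assumed because a CI job cannot answer a passphrase prompt.

```shell
# Work in a throwaway directory so no existing key is overwritten.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_gitlab"   # hypothetical filename

# -f sets the output path, -N "" sets an empty passphrase, -q silences the banner.
ssh-keygen -q -t ed25519 -f "$key_file" -N "" -C "gitlab-ci-deploy"

# The private key goes into a CI/CD variable; only the .pub file is shared.
chmod 600 "$key_file"
ssh-keygen -l -f "$key_file.pub"   # prints the fingerprint, never the private key
```

The fingerprint printed on the last line is what can later be compared against the key list on the target system.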

Variable Management and Secure Key Injection

The secure transmission of the private key from GitLab's encrypted storage to the runner's memory is a critical security boundary. GitLab provides CI/CD variables to facilitate this, offering two primary methods: file-type variables and regular variables.

File-Type CI/CD Variables

Adding an SSH key as a file-type variable is the recommended approach, because private keys contain whitespace and line breaks. With this type, the runner writes the value to a temporary file and the variable holds that file's path rather than the key text. When configuring this in the GitLab UI, the "Visibility" must be set to "Visible", because the "Masked" and "Masked and hidden" settings are incompatible with the whitespace characters present in SSH private keys.

The configuration process involves:

Setting the Key name to SSH_PRIVATE_KEY.
Pasting the private key content into the Value box.
Ensuring the value ends with a newline (LF character) by pressing Enter at the end of the last line before saving.

A critical security warning applies here: because the variable is not masked, commands such as cat or tee must never be executed on the variable, as this would print the private key directly into the job logs, exposing it to anyone with read access to the pipeline.
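One way to respect that warning is to inspect the key file without ever reading its contents to standard output. The sketch below assumes SSH_PRIVATE_KEY is a file-type variable, so it holds a path; check_key_file is a hypothetical helper name, not a GitLab feature.

```shell
# With a file-type variable, $SSH_PRIVATE_KEY holds a PATH, not the key text.
# check_key_file is a hypothetical helper: it inspects the file without
# ever printing its contents to the job log.
check_key_file() (
  path="$1"
  [ -s "$path" ] || { echo "key file missing or empty" >&2; exit 1; }
  # Command substitution strips a trailing LF, so an empty result
  # means the last byte of the file is a newline.
  [ -z "$(tail -c 1 "$path")" ] || { echo "key file lacks trailing newline" >&2; exit 1; }
  chmod 400 "$path"   # ssh-add refuses key files that other users can read
)

# Typical use inside a job (not executed here):
#   check_key_file "$SSH_PRIVATE_KEY" && ssh-add "$SSH_PRIVATE_KEY"
```

The error messages name the problem only, so a failed check leaves nothing sensitive in the log.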

Regular CI/CD Variables

If file-type variables are not utilized, regular variables can be used, though they require different handling during the job execution to ensure the key is correctly formatted and loaded into the SSH agent.
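As a rough illustration of that different handling: a key pasted into a regular variable may carry Windows line endings or lack a final newline, either of which can stop ssh-add from parsing it. The helper name normalize_key below is an assumption made for this sketch.

```shell
# normalize_key is a hypothetical helper: it strips carriage returns
# and makes the key text end with exactly one LF.
normalize_key() {
  # $(...) drops any trailing newlines; printf then adds exactly one back.
  printf '%s\n' "$(printf '%s' "$1" | tr -d '\r')"
}

# Typical use inside a job (not executed here); "ssh-add -" reads the
# key from standard input, so nothing is written to disk:
#   normalize_key "$SSH_PRIVATE_KEY" | ssh-add -
```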

Pipeline Implementation and Execution Flow

The actual utilization of the SSH key within a job requires the initialization of an SSH agent to hold the key in memory and the configuration of known hosts to prevent Man-in-the-Middle (MitM) attacks.

The Role of the SSH Agent

To use the injected key, the ssh-agent must be started within the job. This allows the private key to be loaded and used for subsequent authentication attempts without needing to write the key to a physical file on the runner's disk. The typical sequence in a .gitlab-ci.yml script involves:

eval $(ssh-agent -s)

Following the start of the agent, the key is added using:

bash -c 'ssh-add <(echo "${SSH_PRIVATE_KEY}")'

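Both this process-substitution form and the equivalent ssh-add - form, which reads the key from standard input, keep the value of the $SSH_PRIVATE_KEY variable out of the job logs, although this protection is bypassed if debug logging is enabled. Echoing the variable only works when SSH_PRIVATE_KEY is a regular variable; a file-type variable holds a file path, which is passed to ssh-add directly.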
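The agent's full lifecycle can be sketched end to end with a throwaway key. Everything here is illustrative: the demo key stands in for the CI/CD variable, and stopping the agent at the end is a design choice for long-lived shell runners rather than something the source prescribes.

```shell
# Throwaway key so the sketch runs anywhere; a real job would load
# the key from its CI/CD variable instead.
demo_dir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -f "$demo_dir/id_demo" -N "" -C "agent-demo"

# Start the agent; eval imports SSH_AUTH_SOCK and SSH_AGENT_PID.
eval "$(ssh-agent -s)" >/dev/null

# Feed the key through standard input, as a job would with a regular variable.
ssh-add - < "$demo_dir/id_demo" 2>/dev/null

# List fingerprints only; the private key itself is never printed.
loaded_keys="$(ssh-add -l)"

# Stop the agent so the key does not outlive the job.
ssh-agent -k >/dev/null
```

On a Docker executor the container is discarded after the job, so the final step matters mostly for shell executors that reuse the same host.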

Managing Known Hosts and Host Keys

For a pipeline to connect to a remote server, the server's public host key must be trusted. This is handled via the KNOWN_HOSTS variable. In a network-aware CI/CD pipeline, such as one utilizing Ansible, the host key is retrieved using the ssh-keyscan utility:

ssh-keyscan 172.20.20.2

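The output of this command is then stored in a GitLab CI/CD variable named KNOWN_HOSTS. During the job execution, this value is written to the ~/.ssh/known_hosts file so that the identity of the remote server is verified before any data is transmitted. Because ssh-keyscan accepts whatever host answers at scan time, the scan should be run from a trusted network and the resulting fingerprint compared against one obtained directly from the server.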
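Writing the variable into known_hosts can be sketched as a small function. The name install_known_hosts and its two-argument form are assumptions made so the logic can be exercised against any directory; the permission values follow common OpenSSH practice.

```shell
# install_known_hosts is a hypothetical helper. It appends the value of a
# known-hosts variable to <home>/.ssh/known_hosts.
# Arguments: $1 = home directory, $2 = host key lines.
install_known_hosts() (
  ssh_dir="$1/.ssh"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || exit 1
  printf '%s\n' "$2" >> "$ssh_dir/known_hosts" || exit 1
  chmod 644 "$ssh_dir/known_hosts"
)

# Typical use inside a job (not executed here):
#   install_known_hosts "$HOME" "$KNOWN_HOSTS"
```

Host keys are public, so printing them in a job log is harmless, unlike the private key.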

Advanced Integration: Ansible and Network-Aware Pipelines

In complex infrastructure deployments, GitLab CI/CD is often paired with Ansible to manage network devices and servers. This requires a specialized setup where the Ansible container, spawned by a Linux runner using a Docker executor, utilizes the previously defined SSH variables.

The pipeline architecture for such an environment involves:

Pre-check jobs: Validating the state of the target system.
Deploy jobs: Executing the Ansible playbooks.
Post-check jobs: Verifying the success of the deployment.
Rollback jobs: Reverting changes in case of failure.
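The four job types imply a simple control flow: nothing is deployed unless the pre-check passes, and a rollback runs if either the deploy or the post-check fails. The sketch below models that flow with plain shell functions instead of GitLab jobs; run_pipeline and its argument order are assumptions, and in a real pipeline the same logic would live in stage ordering and job conditions.

```shell
# run_pipeline is a hypothetical stand-in for GitLab's stage ordering.
# Arguments are command names: $1 pre-check, $2 deploy, $3 post-check, $4 rollback.
run_pipeline() {
  "$1" || { echo "pre-check failed; nothing deployed" >&2; return 1; }
  if "$2" && "$3"; then
    echo "deployed and verified"
  else
    # Deploy or post-check failed: revert, then report failure.
    "$4"
    return 1
  fi
}
```

Returning a non-zero status after the rollback keeps the failure visible instead of letting a successful revert mask it.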

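To authenticate against the Git repository itself from within these scripts, an access token can be embedded in the remote URL. The following example comes from a lab setup and uses plain HTTP; outside a lab, HTTPS keeps the token from crossing the network in clear text: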

git remote set-url origin http://root:$ACCESS_TOKEN@gitlab/root/containerlab-project.git

Practical Deployment Example: Static Site via Rsync

A common real-world application of these principles is the deployment of a static site (e.g., Hugo) using a lightweight Alpine Linux image. The use of Alpine is preferred due to its minimal footprint, which reduces build times and attack surfaces.

The configuration for a deployment stage typically includes:

Image: alpine:latest
Tags: private (to ensure the job runs on a specific, secured runner)
Before Script: apk update && apk add openssh-client bash rsync

The deployment script follows this logical flow:

eval $(ssh-agent -s)
bash -c 'ssh-add <(echo "${SSH_PRIVATE_KEY}")'
mkdir -p ~/.ssh
echo "${SSH_HOST_KEY}" > ~/.ssh/known_hosts
rsync -hrvz --delete --exclude=_ -e 'ssh -p 2468' public_html/ "${SSH_USER_HOST_LOCATION}"

In this scenario, the use of a private runner tag is a critical security measure. By limiting the job to a specific Docker instance of a gitlab-runner (e.g., in a home lab), the risk of the SSH private key being leaked or intercepted by other jobs on shared infrastructure is significantly reduced.
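A deploy script like this fails in confusing ways when one of its variables is missing, for example when a protected variable is not exposed to an unprotected branch. A hedged sketch of a fail-fast guard follows; require_vars is a hypothetical helper, and it deliberately reports only variable names.

```shell
# require_vars is a hypothetical guard: it fails if any named variable
# is unset or empty, and it prints only the NAME, never the value.
require_vars() (
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || { echo "missing CI/CD variable: $name" >&2; exit 1; }
  done
)

# Typical use at the top of the deploy script (not executed here):
#   require_vars SSH_PRIVATE_KEY SSH_HOST_KEY SSH_USER_HOST_LOCATION || exit 1
```

The function body runs in a subshell, so the temporary copy of each value disappears as soon as the check returns.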

SSH Key Governance and Lifecycle Management

Maintaining "SSH key hygiene" is essential to prevent credential abuse and unauthorized code modifications. Because SSH keys are long-lived, they become high-value targets if not managed through a rigorous lifecycle.

Deploy Keys and Scoping

GitLab provides "Deploy Keys," which are tied to individual projects rather than user accounts. This allows for a more granular approach to permissions:

Read-only permissions: Suitable for pulling code for builds.
Read/write permissions: Necessary for automation that pushes tags or updates versions.

Caution must be exercised when granting write access to automation tasks, as a compromised pipeline could lead to unauthorized code injection.

Monitoring and Auditing

Continuous auditing is required to ensure that only authorized keys have access to the infrastructure. This involves:

Regular review of the SSH key list via User > Preferences > SSH Keys.
Utilizing GitLab's API endpoints to automate the inventory of keys.
Setting up alerts for unfamiliar or outdated entries.
Implementing a strict rotation policy where keys are replaced at regular intervals.
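The inventory and rotation steps can be combined into a small filter that flags keys older than a cutoff date. This is a sketch under stated assumptions: stale_keys is a hypothetical helper, the tab-separated input format is chosen for this example, and the commented pipeline assumes jq is installed and that GITLAB_URL, PROJECT_ID, and API_TOKEN are defined by the operator.

```shell
# stale_keys is a hypothetical filter. Input: tab-separated lines of
# "id<TAB>title<TAB>created_at" (ISO 8601, UTC). Output: the ids and
# titles of keys created before the cutoff date given as $1.
stale_keys() {
  # Appending "" forces awk to compare as strings; ISO 8601 timestamps
  # in the same zone sort chronologically when compared that way.
  awk -F '\t' -v cutoff="$1" '($3 "") < (cutoff "") { print $1 "\t" $2 }'
}

# One way to produce the input from the project deploy keys endpoint
# (not executed here; the three variable names are assumptions):
#   curl --silent --header "PRIVATE-TOKEN: $API_TOKEN" \
#     "$GITLAB_URL/api/v4/projects/$PROJECT_ID/deploy_keys" |
#     jq -r '.[] | [.id, .title, .created_at] | @tsv' |
#     stale_keys "2024-01-01"
```

Running such a filter on a schedule turns the rotation policy into something that can raise an alert instead of relying on manual review.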

For organizations requiring higher levels of security, platforms like Apono can be integrated to provide "just-enough," time-bound permissions. This moves the security model from static, persistent keys to dynamic access, reducing the window of opportunity for an attacker.

Commit Authenticity

While SSH keys provide secure transport and authentication, they do not verify the identity of the person making a change to the code. To ensure commit authenticity, GPG signing should be enabled alongside SSH-based repository access. This creates a cryptographic link between the commit and the verified identity of the developer.
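Enabling GPG signing for a repository comes down to two Git settings. The sketch below wraps them in a hypothetical helper, enable_signing; the key id is a placeholder that the developer supplies from their own keyring.

```shell
# enable_signing is a hypothetical helper that turns on GPG commit
# signing for one repository. $1 = repository path, $2 = GPG key id.
enable_signing() {
  git -C "$1" config user.signingkey "$2" &&
  git -C "$1" config commit.gpgsign true
}

# Typical use (not executed here); the key id can be read from
# `gpg --list-secret-keys --keyid-format=long`:
#   enable_signing . "<your-key-id>"
```

The settings are written to the repository's local configuration, so other repositories on the same machine are unaffected.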

Comparative Analysis of SSH Key Configuration Methods

Feature	File-Type Variable	Regular Variable	Deploy Key
Primary Use	Private Keys with whitespace	Simple strings/tokens	Project-level access
Visibility	Must be "Visible"	Can be "Masked"	Public key registered on the project
Storage	Stored as file on runner	Stored as environment variable	Stored in GitLab database
Security	Risk of log exposure via `cat`	Risk of log exposure via `echo`	Scoped to specific project
Best For	SSH Private Keys	API Tokens / Passwords	Bot/Machine account access

Conclusion: The Imperative of Machine Identity Management

The integration of SSH keys within GitLab CI/CD pipelines is a powerful enablement for automation, but it introduces a significant security vector. The transition from human-managed credentials to Non-Human Identities (NHIs) requires a shift in perspective: the private key is no longer just a tool for a developer, but a critical piece of infrastructure identity.

The failures at Microsoft and BeyondTrust show that "long-lived" and "overprivileged" are among the most dangerous characteristics of a machine identity. To mitigate these risks, engineers must move toward a model of least privilege. This means using project-specific deploy keys instead of user-level keys, utilizing ephemeral runners to isolate sensitive operations, and implementing automated auditing via APIs to ensure no "ghost keys" remain in the system.

Furthermore, the technical implementation (starting the ssh-agent, managing known_hosts, and using secure injection methods) must be viewed as a layered defense. Lightweight images like Alpine shrink the attack surface, and private runners isolate the job from workloads on shared infrastructure. Ultimately, the goal is to ensure that the pipeline can perform its function with the minimum amount of access required for the shortest amount of time possible.