Architecting Elastic CI/CD Pipelines with GitLab Runner on Kubernetes

The transition from static virtual machine-based build agents to a container-orchestrated environment represents a fundamental shift in the delivery lifecycle. Running GitLab Runners on Kubernetes transforms the CI/CD pipeline from a rigid, often underutilized resource into an elastic, auto-scaling powerhouse. In traditional deployments, runners operate on dedicated servers that either sit idle during low demand or become critical bottlenecks during peak development cycles. By leveraging Kubernetes, organizations can instantiate runner pods on demand, executing a specific job and immediately releasing those resources back to the cluster upon completion. This architectural shift ensures that every single job begins within a fresh, identical environment, eliminating the "snowflake server" problem where incremental state changes on a long-lived VM lead to non-deterministic build failures.

The integration of GitLab Runner with Kubernetes provides a level of resource isolation and multi-tenancy that is unattainable with shell or docker executors on standalone hosts. Each job is encapsulated within its own pod, allowing administrators to define strict CPU and memory limits. This prevents a single runaway build process from consuming all available resources on a host and impacting other concurrent pipelines. Furthermore, the ability to scale dynamically means that the compute cost is directly proportional to the actual workload, drastically increasing cost efficiency by eliminating the need to over-provision hardware for peak loads.

Deployment Methodologies: Operator versus Helm Chart

When deploying GitLab Runners into a Kubernetes cluster, architects must choose between the GitLab Runner Operator and the Helm Chart. These two paths offer different trade-offs regarding control, complexity, and lifecycle management.

The GitLab Runner Operator is designed for simplicity and automation. It utilizes Custom Resource Definitions (CRDs) to manage the runner lifecycle, meaning the operator handles the underlying pods and configurations based on a high-level specification. This is ideal for environments where the goal is to minimize manual overhead and let GitLab manage the runner lifecycle automatically.

Conversely, the Helm Chart is the usual choice for production environments. It exposes extensive customization options that the operator may abstract away, allowing fine-grained control over the deployment specification. Although installation is moderately more complex than with the operator, the resulting flexibility is essential for enterprise-grade security and performance tuning.

The following table delineates the core differences between these two deployment strategies:

Feature	Operator	Helm Chart
Installation complexity	Lower	Moderate
Customization	Limited	Extensive
GitLab version coupling	Tighter	Looser
CRD management	Automatic	Manual
Recommended for	Simple setups	Production environments

Implementing the GitLab Runner Operator

To deploy using the operator, the administrative process begins with adding the official GitLab Helm repository and updating the local cache to ensure the latest version of the operator is retrieved.

```bash
helm repo add gitlab https://charts.gitlab.io
helm repo update
```

Once the repository is configured, the operator is installed into a dedicated namespace:

```bash
helm install gitlab-runner-operator gitlab/gitlab-runner-operator \
  --namespace gitlab-runner \
  --create-namespace
```

After the operator is active, a Runner resource must be created via a YAML manifest. This Custom Resource tells the operator exactly how to manage the runner, specifying the GitLab instance URL and the secret containing the registration token.

```yaml
apiVersion: apps.gitlab.com/v1beta2
kind: Runner
metadata:
  name: gitlab-runner
  namespace: gitlab-runner
spec:
  gitlabUrl: https://gitlab.com
  token: gitlab-runner-secret
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-runner"
        image = "alpine:latest"
```

To support this configuration, a Kubernetes secret must be created to hold the runner registration token obtained from the GitLab UI (Settings > CI/CD > Runners):

```bash
kubectl create secret generic gitlab-runner-secret \
  --namespace gitlab-runner \
  --from-literal=runner-registration-token="YOUR_REGISTRATION_TOKEN"
```
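Where the secret is generated at deploy time by a script rather than typed at a prompt, the same object can be built programmatically. The following Python sketch shows what the `kubectl create secret generic --from-literal` call stores: an `Opaque` Secret whose value is base64-encoded. The function name `build_secret` is hypothetical, and the secret name, namespace, and key simply repeat the values used in the command above.

```python
import base64
import json


def build_secret(name: str, namespace: str, token: str) -> dict:
    """Build a Secret manifest equivalent to the kubectl command above."""
    # Kubernetes stores Secret values base64-encoded under the "data" field.
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"runner-registration-token": encoded},
    }


if __name__ == "__main__":
    manifest = build_secret(
        "gitlab-runner-secret", "gitlab-runner", "YOUR_REGISTRATION_TOKEN"
    )
    # JSON is accepted by `kubectl apply -f -`, so the output can be piped to it.
    print(json.dumps(manifest, indent=2))
```

Base64 is an encoding, not encryption, so the generated manifest should be treated as sensitive and kept out of version control.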

Finally, the configuration is applied to the cluster:

```bash
kubectl apply -f gitlab-runner-cr.yaml
```

Advanced Configuration and Resource Management

The config.toml (or the config block in YAML) is the heart of the Kubernetes executor. It defines how the runner interacts with the cluster and how the resulting pods are constrained.

Resource Allocation and Limits

Proper resource management is critical to prevent cluster instability. The configuration allows for specific limits and requests for different components of the job lifecycle.

CPU and Memory Limits: These define the maximum amount of resources a job container can consume. A container that exceeds its memory limit is terminated by Kubernetes (OOMKilled), while a container that reaches its CPU limit is throttled rather than killed.
CPU and Memory Requests: These are the minimum resources guaranteed to the pod, used by the Kubernetes scheduler to place the pod on a node with sufficient capacity.
Service and Helper Containers: Specialized limits are applied to sidecar service containers and the helper container (which handles Git cloning and artifact uploading), ensuring they do not starve the primary build container of resources.

The following configuration snippet demonstrates a robust resource definition:

```toml
[[runners]]
name = "kubernetes-runner"
executor = "kubernetes"

[runners.kubernetes]
cpu_limit = "2"
cpu_request = "500m"
memory_limit = "4Gi"
memory_request = "1Gi"
service_cpu_limit = "1"
service_memory_limit = "1Gi"
helper_cpu_limit = "500m"
helper_memory_limit = "256Mi"

[runners.kubernetes.resources]
limits = { cpu = "500m", memory = "256Mi" }
requests = { cpu = "100m", memory = "128Mi" }
```
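Kubernetes rejects a container whose request exceeds its limit, so it can be worth linting these values before rolling out a runner configuration. The following is a minimal Python sketch and is not part of GitLab Runner: `parse_cpu`, `parse_memory`, and `check_resources` are hypothetical helper names, and the parser covers only a subset of the Kubernetes quantity format (the `m` suffix for CPU, and `Ki`/`Mi`/`Gi`/`Ti` or `k`/`M`/`G`/`T` for memory).

```python
from decimal import Decimal

# Binary and decimal suffixes for memory quantities (a subset of the full format).
_MEMORY_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(value: str) -> Decimal:
    """Return CPU cores as a Decimal, so "500m" becomes 0.5 and "2" stays 2."""
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


def parse_memory(value: str) -> int:
    """Return a byte count for quantities such as "256Mi" or "4Gi"."""
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(Decimal(value[: -len(suffix)]) * _MEMORY_UNITS[suffix])
    return int(Decimal(value))  # no suffix: plain bytes


def check_resources(cpu_request, cpu_limit, memory_request, memory_limit):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if parse_cpu(cpu_request) > parse_cpu(cpu_limit):
        problems.append(f"CPU request {cpu_request} exceeds limit {cpu_limit}")
    if parse_memory(memory_request) > parse_memory(memory_limit):
        problems.append(
            f"memory request {memory_request} exceeds limit {memory_limit}"
        )
    return problems


if __name__ == "__main__":
    # Build-container values from the snippet above.
    print(check_resources("500m", "2", "1Gi", "4Gi"))
    # A deliberately inconsistent pair, for comparison.
    print(check_resources("3", "2", "1Gi", "512Mi"))
```

A check like this could run in the pipeline that deploys the runner itself, so that an inconsistent pair is caught before any job pod is scheduled.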

Scheduling and Affinity

To optimize performance and avoid resource contention, pod affinity and anti-affinity rules can be applied. This ensures that runner pods are distributed across the cluster rather than bunching up on a single node.

```toml
[runners.kubernetes]
affinity = "podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: [{ weight: 100, podAffinityTerm: { labelSelector: { matchLabels: { app: gitlab-runner } }, topologyKey: 'kubernetes.io/hostname' } }]"
```

This specific configuration tells the Kubernetes scheduler to prefer placing new runner pods on nodes that do not already host a gitlab-runner pod, maximizing the use of available cluster hardware.

Security Architecture and Contexts

Security in a Kubernetes-based CI/CD environment requires a multi-layered approach, focusing on identity, access control, and network isolation.

User and Group Validation

The Kubernetes executor provides a mechanism to restrict which users and groups are allowed to execute within the pods. This is managed via the allowed_users and allowed_groups lists.

```toml
[runners.kubernetes]
allowed_users = ["1000", "1001", "65534"]
allowed_groups = ["1001", "65534"]
```

Overriding Security Contexts

One of the most powerful features for administrators is the ability to override security contexts at multiple levels. This bypasses the standard allowlist validation, granting unrestricted override control.

Pod Security Context: Sets the default identity for all containers in the pod.
Build Container Security Context: Overrides the pod defaults specifically for the main build container.
Helper Container Security Context: Overrides the pod defaults for the helper container.
Service Container Security Context: Overrides the pod defaults for any sidecar services.

Example configuration for hierarchical security overrides:

```toml
[runners.kubernetes.pod_security_context]
run_as_user = 1500
run_as_group = 1500

[runners.kubernetes.build_container_security_context]
run_as_user = 2000
run_as_group = 2001

[runners.kubernetes.helper_container_security_context]
run_as_user = 3000
run_as_group = 3001

[runners.kubernetes.service_container_security_context]
run_as_user = 4000
run_as_group = 4001
```

In this scenario, the pod defaults are 1500:1500, but the build container will run as 2000:2001. Even though these IDs are not in the allowed_users list, the security context overrides are permitted, allowing administrators to enforce specific UID/GID requirements for specialized build tools.
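The override hierarchy can be modelled as a simple merge: a container-level value wins, and anything left unset falls back to the pod-level default. The Python sketch below illustrates that resolution order only. It is not the runner's implementation, and `resolve_identity` and the dictionary layout are hypothetical.

```python
# Pod-level defaults and per-container overrides, using the IDs from the example above.
POD_DEFAULT = {"run_as_user": 1500, "run_as_group": 1500}
CONTAINER_OVERRIDES = {
    "build": {"run_as_user": 2000, "run_as_group": 2001},
    "helper": {"run_as_user": 3000, "run_as_group": 3001},
    "service": {"run_as_user": 4000, "run_as_group": 4001},
}


def resolve_identity(container: str, pod_default: dict, overrides: dict) -> tuple:
    """Return the (uid, gid) pair a container would run as."""
    override = overrides.get(container, {})
    # A container-level setting takes precedence; otherwise use the pod default.
    uid = override.get("run_as_user", pod_default.get("run_as_user"))
    gid = override.get("run_as_group", pod_default.get("run_as_group"))
    return uid, gid


if __name__ == "__main__":
    for name in ("build", "helper", "service", "unlisted"):
        print(name, resolve_identity(name, POD_DEFAULT, CONTAINER_OVERRIDES))
```

A container with no override of its own, such as the hypothetical `unlisted` entry, inherits the pod-level 1500:1500 identity.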

Per-Job User Specification

Beyond the global runner configuration, individual jobs can specify their own execution user within the .gitlab-ci.yml file. This allows different stages of a pipeline to run with different privileges.

```yaml
job:
  image:
    name: alpine:latest
    kubernetes:
      user: "1000"
  script:
    - whoami
    - id
```

Network Isolation via NetworkPolicy

To prevent CI jobs from accessing sensitive internal services or attacking other pods in the cluster, a NetworkPolicy should be implemented. This restricts the ingress and egress traffic of the job pods.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: gitlab-runner-jobs
  namespace: gitlab-runner
spec:
  podSelector:
    matchLabels:
      app: gitlab-runner-job
  policyTypes:
    - Ingress
    - Egress
  ingress: []
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 5000
```

This policy ensures that:
- No inbound traffic is allowed to the job pods.
- DNS resolution is permitted via ports 53 (UDP/TCP).
- Outbound HTTPS on port 443 is allowed to any address, which covers the GitLab API.
- Outbound TCP on port 5000 is allowed to any address, which covers the container registry.
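To reason about what such a policy permits, the egress rules can be reduced to a small lookup. The Python sketch below is a simplified model and not how Kubernetes or any CNI plugin evaluates policies: it matches only on destination CIDR, protocol, and port, and it treats the DNS rule's `namespaceSelector: {}` as "any destination" because the model has no notion of pods or namespaces. The names `EGRESS_RULES` and `egress_allowed` are hypothetical.

```python
import ipaddress

# Egress rules from the policy above, reduced to (cidr, protocol, port) tuples.
EGRESS_RULES = [
    ("0.0.0.0/0", "UDP", 53),    # DNS (namespaceSelector modelled as any destination)
    ("0.0.0.0/0", "TCP", 53),    # DNS over TCP
    ("0.0.0.0/0", "TCP", 443),   # HTTPS, including the GitLab API
    ("0.0.0.0/0", "TCP", 5000),  # container registry
]


def egress_allowed(dest_ip: str, protocol: str, port: int, rules=EGRESS_RULES) -> bool:
    """Return True if any rule matches the destination, protocol, and port."""
    address = ipaddress.ip_address(dest_ip)
    for cidr, rule_protocol, rule_port in rules:
        if (
            address in ipaddress.ip_network(cidr)
            and protocol.upper() == rule_protocol
            and port == rule_port
        ):
            return True
    # Once a policy selects a pod, anything not explicitly allowed is denied.
    return False


if __name__ == "__main__":
    print(egress_allowed("10.0.0.5", "TCP", 443))   # HTTPS to an in-cluster address
    print(egress_allowed("10.0.0.5", "TCP", 5432))  # a database port
```

The model makes one property of the policy easy to see: a database port such as 5432 is blocked, but any HTTPS endpoint is reachable, including in-cluster ones. Narrowing the `0.0.0.0/0` CIDR blocks to the GitLab and registry address ranges would tighten this.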

Secrets Management and Volume Mounts

Handling sensitive data in Kubernetes requires moving away from plain-text environment variables toward volume-based secret injection.

Secret Volume Mounts

The Kubernetes executor allows mounting secrets as volumes, which is more secure than environment variables because secrets are not exposed in the pod description or logs.

```toml
[[runners.kubernetes.volumes.secret]]
name = "ci-secrets"
mount_path = "/secrets"
read_only = true
secret_name = "gitlab-ci-secrets"
```

Projected Volumes for Complex Configurations

For scenarios requiring multiple secrets or configuration files (such as Docker and NPM configs), projected volumes are used. This allows multiple secrets to be mapped into a single directory.

```toml
[[runners.kubernetes.volumes.projected]]
name = "credentials"
mount_path = "/credentials"

[[runners.kubernetes.volumes.projected.sources.secret]]
name = "docker-config"
items = [
{ key = "config.json", path = "docker/config.json" }
]

[[runners.kubernetes.volumes.projected.sources.secret]]
name = "npm-config"
items = [
{ key = ".npmrc", path = "npm/.npmrc" }
]
```

Runner Tagging and Job Routing

Tags are the primary mechanism for routing jobs to specific runners. In a Kubernetes environment, tagging becomes complex when using automated deployments like FluxCD or Helm, where pods are frequently recreated.

The Tagging Paradox in Helm

In older versions of the GitLab Runner Helm chart, the tags: field was used for registration. However, this has been deprecated. When using a runner-token, specifying runners.tags in the Helm values can cause the runner to fail to start in GitLab Runner 18.0 and later.

The following fields are strictly deprecated and will lead to failures if specified alongside a runner authentication token:
- runnerRegistrationToken
- locked
- tags
- maximumTimeout
- runUntagged
- protected

To maintain tags in a GitOps workflow (e.g., using FluxCD), tags should be managed through the GitLab UI or via the API to ensure they persist across pod recreations, rather than relying on deprecated Helm values.
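In a GitOps repository, a small pre-merge check can flag these fields before a release is reconciled. The following Python sketch walks a values mapping and reports any key from the list above. The function name `find_deprecated` and the sample values layout are hypothetical, and the sketch assumes the values file has already been loaded into a dictionary.

```python
# Field names taken from the deprecation list above.
DEPRECATED_FIELDS = {
    "runnerRegistrationToken",
    "locked",
    "tags",
    "maximumTimeout",
    "runUntagged",
    "protected",
}


def find_deprecated(values: dict, path: str = "") -> list:
    """Recursively collect dotted paths of deprecated keys in a values mapping."""
    found = []
    for key, value in values.items():
        current = f"{path}.{key}" if path else key
        if key in DEPRECATED_FIELDS:
            found.append(current)
        if isinstance(value, dict):
            found.extend(find_deprecated(value, current))
    return found


if __name__ == "__main__":
    # Hypothetical values layout, for illustration only.
    values = {
        "gitlabUrl": "https://gitlab.com",
        "runners": {"tags": "k8s-default", "runUntagged": True},
    }
    for dotted_path in find_deprecated(values):
        print(f"deprecated field set: {dotted_path}")
```

Because the check matches on key names alone, it may also flag an unrelated key that happens to be called `tags` elsewhere in the values, so its findings are best treated as prompts for review rather than hard failures.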

Routing Strategies and Migration

During the migration from legacy Docker runners to Kubernetes runners, organizations often use a dual-tagging strategy to ensure stability. By maintaining both docker and kubernetes tags, teams can gradually shift workloads.

Docker Tag: Jobs with the docker tag are routed to legacy runners.
Kubernetes Tag: Jobs with the k8s-default tag are routed to the new infrastructure.
Untagged Jobs: By setting untagged: true in the runner configuration, the Kubernetes runner can pick up any job that does not specify a tag, facilitating a forced migration to the new environment.
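The routing rules above follow GitLab's tag-matching behaviour: a runner can pick up a job only if it carries every tag the job requests, and untagged jobs go only to runners that accept them. The Python sketch below models that rule for the dual-tagging setup. The runner names and the `can_run` and `eligible_runners` helpers are hypothetical.

```python
def can_run(job_tags, runner_tags, run_untagged=False) -> bool:
    """Return True if a runner with these tags may pick up the job."""
    job, runner = set(job_tags), set(runner_tags)
    if not job:
        # Untagged jobs are accepted only by runners configured to take them.
        return run_untagged
    # Every tag requested by the job must be present on the runner.
    return job <= runner


def eligible_runners(job_tags, runners) -> list:
    """Return the names of all runners that could execute the job."""
    return [
        name
        for name, (tags, run_untagged) in runners.items()
        if can_run(job_tags, tags, run_untagged)
    ]


# Hypothetical fleet during the migration: one legacy runner, one Kubernetes runner.
RUNNERS = {
    "legacy-docker": (["docker"], False),
    "k8s": (["k8s-default"], True),
}

if __name__ == "__main__":
    for tags in (["docker"], ["k8s-default"], [], ["docker", "k8s-default"]):
        print(tags, "->", eligible_runners(tags, RUNNERS))
```

A job that requests both tags matches neither runner in this fleet, which is a common source of pipelines stuck in the pending state during a migration.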

Troubleshooting and Operational Pitfalls

Migrating to Kubernetes runners involves several "tricky" problems that can disrupt the pipeline if not addressed early.

The Ping Problem

A common failure point in new Kubernetes deployments is the "Ping" mechanism. If the runner cannot reach the GitLab instance, for example because outbound access is disabled by default or blocked by network policies, the runner is marked as offline and jobs fail to start. Ensuring the runner has a clear network path to the GitLab API is mandatory.

Pull Policies and Image Management

The pull_policy determines how the runner fetches images. For Kubernetes, the if-not-present policy is generally preferred to optimize speed by using cached images on the node.

```toml
[runners.kubernetes]
pull_policy = ["if-not-present"]
```

Privileged Mode and Docker-in-Docker (DinD)

Some CI jobs require building Docker images, which necessitates "Docker-in-Docker." This requires setting privileged = true in the runner configuration.

```toml
[runners.kubernetes]
privileged = true
```

Warning: Enabling privileged mode is a significant security risk, as it grants the container nearly the same access to the host as the root user. It should only be enabled if absolutely necessary and combined with strict NetworkPolicy constraints.

Conclusion

The implementation of GitLab Runners on Kubernetes is not merely a change in infrastructure but a strategic move toward a more scalable and secure software delivery model. By transitioning from static agents to a pod-based execution model, organizations achieve true elasticity, where compute resources are consumed only during the active phase of a build. The ability to leverage Kubernetes-native features—such as NetworkPolicy for isolation, PodAntiAffinity for performance, and Projected Volumes for secret management—transforms the CI/CD pipeline into a hardened, professional-grade environment. While the transition involves navigating deprecated Helm configurations and managing complex security contexts, the resulting architecture eliminates the operational overhead of VM maintenance and provides a deterministic, reproducible environment for every commit. The synergy between GitLab's orchestration and Kubernetes' agility allows for a seamless flow from code to production, provided that resource limits and security boundaries are rigorously enforced.