GitLab CI PostgreSQL Integration Architecture

The integration of PostgreSQL within GitLab CI represents a critical component for modern software development lifecycles, particularly for applications that rely on relational database persistence for integration testing and verification. The ability to spin up a transient, isolated database instance for every pipeline execution ensures that tests are reproducible, independent, and do not suffer from state pollution across different build jobs. This architecture is supported across various GitLab tiers, including Free, Premium, and Ultimate, and is available regardless of whether the instance is hosted on GitLab.com, GitLab Self-Managed, or GitLab Dedicated. The primary objective is to provide a controlled environment where the application can interact with a real database engine, rather than relying solely on mocked data, thereby increasing the reliability of the test suite.

Docker Executor Configuration and Service Implementation

When employing the Docker executor, GitLab Runner utilizes the concept of services to provide sidecar containers that run alongside the main job container. PostgreSQL is implemented as a service, allowing the primary job container to communicate with the database instance over a shared network.

To initialize this environment, the .gitlab-ci.yml configuration file must explicitly define the service. A basic implementation involves adding the following structure:

yaml default: services: - postgres

The use of the postgres image allows for version-specific deployments. For instance, if a project requires PostgreSQL 16.10, the service definition is modified to postgres:16.10. Similarly, for those utilizing an Alpine-based image for reduced footprint, postgres:12.2-alpine can be specified. The ability to choose specific versions ensures that the CI environment mirrors the production environment exactly, preventing "it works on my machine" discrepancies.

Environment Variable Management and Propagation

A critical technical nuance in GitLab CI is that variables configured within the GitLab UI (Project or Group level variables) are not automatically passed down to service containers. This architectural limitation means that if a user defines POSTGRES_DB in the UI, the PostgreSQL service container will not perceive this value unless it is explicitly mapped in the .gitlab-ci.yml file.

To bridge this gap, variables must be defined within the YAML configuration to act as a conduit. The following configuration is required to ensure the service container receives the necessary credentials:

yaml variables: POSTGRES_DB: $POSTGRES_DB POSTGRES_USER: $POSTGRES_USER POSTGRES_PASSWORD: $POSTGRES_PASSWORD POSTGRES_HOST_AUTH_METHOD: trust

Failure to configure these variables correctly can lead to significant errors. If CI variables are not properly mapped, PostgreSQL may treat the variable name as a literal string. For example, if the configuration results in POSTGRES_USER: $USER, PostgreSQL interprets the username as the literal string $USER. In PostgreSQL 15.4 and later versions, this becomes particularly problematic as the engine does not substitute schemas or owner names into extension scripts if they contain quote ("), backslash (\), or dollar sign ($) symbols. This results in a "Fatal: invalid character in extension" error, which halts the pipeline.

Alternatively, developers may choose to hardcode these values directly as strings within the .gitlab-ci.yml file for simplicity, although this is less secure than using UI-managed variables:

yaml variables: POSTGRES_DB: DB_name POSTGRES_USER: username POSTGRES_PASSWORD: password POSTGRES_HOST_AUTH_METHOD: trust

Application Connectivity and Networking

Once the PostgreSQL service is defined and the variables are passed, the application residing in the primary job container must be configured to connect to the database. In the GitLab CI service architecture, the hostname for the database is simply postgres.

The connection parameters used by the application should be mapped as follows:

Parameter Value
Host postgres
User $POSTGRES_USER
Password $POSTGRES_PASSWORD
Database $POSTGRES_DB

The use of postgres as the host is a result of how GitLab links services to the job. This networking allows the main container and the service container to communicate seamlessly. If the application is configured to look for localhost, the connection will fail because the database is running in a separate container, not within the primary job container itself.

Database Initialization and SQL Pre-filling

A common requirement in integration testing is the need to pre-fill the database with seed data, look-up tables, or specific state configurations before the tests execute. Since the postgres:latest service starts as a blank instance, developers must implement a method to inject SQL data.

For jobs using Ubuntu-based Docker images, the process involves installing the postgresql-client and executing the SQL file via the psql command. The following configuration demonstrates the implementation of this workflow:

```yaml
image: ubuntu:18.04
stages:
- test

test-task:
stage: test
services:
- postgres:latest
variables:
POSTGRESDB: postgres
POSTGRES
USER: postgres
POSTGRES_PASSWORD: "yourPassword"
script:
- apt update
- apt install -y postgresql-client
- env PGPASSWORD=yourPassword psql -h postgres -U postgres -w postgres < path/to/your/initial.sql
```

In this workflow, the env PGPASSWORD command is used to pass the password to the psql utility without requiring interactive input. It is imperative that the variables defined in the variables block match the values used in the psql call. For environments using Debian or Alpine-based images, apk add may be used as the equivalent to apt install to obtain the necessary postgresql-client binaries.

Shell Executor Implementation

While Docker is the most common method, PostgreSQL can also be integrated into GitLab Runner environments using the Shell executor. This requires the PostgreSQL server to be manually installed and configured on the host machine where the runner is operating.

The installation process on a Linux-based runner is as follows:

bash sudo apt-get install -y postgresql postgresql-client libpq-dev

After installation, a specific database user must be created to be used by the application during the CI process. This is done by signing into the PostgreSQL instance using the default postgres administrative user:

bash sudo -u postgres psql -d template1

Once inside the PostgreSQL prompt, the user (for example, a user named runner) is created. This user then serves as the identity used by the application to authenticate and perform database operations during the pipeline execution.

Advanced Orchestration with Docker-in-Docker (DinD)

In complex scenarios involving multiple interdependent services, such as a REST API that interacts with both a database and other utility containers (e.g., MinIO or Tusd), standard GitLab services may be insufficient. In these cases, Docker-in-Docker (DinD) is utilized to create custom networks.

A common architectural pattern involves creating a bridge network to allow multiple containers to communicate. For example, a pipeline may define a docker:dind service and then manually orchestrate the database and API containers:

yaml build: stage: build variables: DOCKER_DRIVER: overlay2 DOCKER_HOST: tcp://docker:2375 SHARED_PATH: ${CI_PROJECT_DIR}/fileserver POSTGRES_DB: ${PG_DB} POSTGRES_PASSWORD: ${PG_PASSWORD} POSTGRES_USER: ${PG_USER} services: - docker:dind script: - docker -H $DOCKER_HOST network create -d bridge localnet - docker -H $DOCKER_HOST run -d --env POSTGRES_DB=${POSTGRES_DB} --env POSTGRES_USER=${POSTGRES_USER} --env POSTGRES_PASSWORD=${POSTGRES_PASSWORD} --name postgres registry.gitlab.com/plantoeducate/api-db:latest - docker network connect localnet postgres - docker -H $DOCKER_HOST build -t restapi:latest --build-arg ARG_CI_JOB_TOKEN=${CI_JOB_TOKEN} --build-arg ARG_CI_REGISTRY=${CI_REGISTRY} --build-arg PG_USER=${PG_USER} --build-arg PG_PASSWORD=${PG_PASSWORD} --build-arg PG_DB=${PG_DB} --build-arg TOKEN=${TOKEN} . - docker -H $DOCKER_HOST run -v /var/run/docker.sock:/var/run/docker.sock --network localnet --privileged=true restapi:latest

In this high-complexity model, the database is not a GitLab service but a container managed by the job script. This allows the database to be pulled from a custom project registry, ensuring that the initial state (including look-up table scripts) is baked into the image.

Network Isolation and Compose Integration

When using docker-compose within a GitLab CI pipeline, networking becomes the primary point of failure. A common issue occurs when a postgres service is defined within a docker-compose.yml file; the containers within that compose instance can communicate with each other using their service names because Docker creates a default network for the compose project.

However, if a pytest instance is running as a separate container (outside the compose network), it will be unable to resolve the postgres hostname. This creates a timeout error when the pytest container attempts to reach the database.

To resolve this, two primary strategies are identified:

  • Use the host network: This allows the pytest container to access services on the host, though it may have higher impact on configuration.
  • Integrate the database into the compose file: By including the postgres service directly in the docker-compose.yml, all other services (such as a listener or minio container) can communicate with the database via the internal network. This is the preferred option for ensuring a unique, isolated instance of PostgreSQL for each pipeline run.

Summary Analysis of PostgreSQL Deployment Strategies

The selection of a PostgreSQL integration strategy in GitLab CI depends heavily on the required level of isolation and the complexity of the application architecture. For standard unit and integration tests, the Docker executor with defined services is the most efficient approach, provided that the variable propagation gap is managed via the .gitlab-ci.yml mapping. This method provides a clean, ephemeral environment with minimal configuration overhead.

For projects requiring pre-populated data, the use of postgresql-client within the script phase allows for dynamic initialization, bridging the gap between a blank container and a production-like state. In contrast, for highly complex systems involving multiple containers, transitioning to a DinD architecture with custom bridge networks or docker-compose is necessary. This allows for the use of custom-built database images that contain pre-initialized schemas, reducing the time spent on setup during the pipeline execution.

Ultimately, the failure points in these integrations typically center on networking (hostname resolution) and environment variable passing. By ensuring that POSTGRES_HOST_AUTH_METHOD: trust is used for internal CI communication and that variables are explicitly declared in the YAML, developers can create a robust, automated testing environment that leverages the full power of PostgreSQL within the GitLab CI ecosystem.

Sources

  1. GitLab Documentation - PostgreSQL Service
  2. The Code Campus - Fill GitLab-CI Postgres Database Service with SQL
  3. Diffblue GitLab CI PostgreSQL Guide
  4. Docker Forums - Accessing Postgres Service from Child Container in GitLab CI

Related Posts