Architecting High-Performance Compute: The Definitive Guide to Docker GPU Integration and Orchestration

Running Graphics Processing Units (GPUs) inside Docker containers changes how compute-intensive workloads are deployed and scaled. By drawing on the parallel processing capabilities of NVIDIA GPUs, developers can accelerate artificial intelligence (AI), machine learning (ML), and video processing tasks while packaging the CUDA user-space libraries with the application; the GPU driver itself stays on the host. This approach meets the performance requirements of CUDA-based applications while keeping the portability and isolation benefits of containerization. Whether deploying on a Windows workstation via WSL 2 or orchestrating microservices on a Linux production cluster with Docker Compose, passing GPU resources into a container is what delivers the GFLOPS (billions of floating-point operations per second) that modern generative AI and scientific simulations require.

GPU Acceleration in Docker Desktop for Windows

Docker Desktop for Windows supports GPU access through NVIDIA GPU Paravirtualization (GPU-PV). This technology lets containers use the physical GPU of the host machine, so compute-heavy workloads can run without the overhead of traditional virtualization.

This functionality depends on the Windows Subsystem for Linux (WSL 2) backend. GPU support is available only when Docker Desktop is configured to use WSL 2, because WSL 2 provides the Linux kernel interface that the NVIDIA drivers need to communicate with the containerized environment.

To successfully implement WSL 2 GPU Paravirtualization, the following technical requirements must be satisfied:

A Windows hardware environment equipped with a compatible NVIDIA GPU.
A fully updated installation of Windows 10 or Windows 11.
NVIDIA drivers that specifically support WSL 2 GPU Paravirtualization.
The most recent version of the WSL 2 Linux kernel, which can be ensured by executing the following command:
wsl --update

Skipping these requirements has visible consequences: with an outdated kernel or driver, the container often fails to detect the GPU and reports device-not-found errors at run time. Linux hosts have an analogous dependency on a working NVIDIA Container Toolkit installation, since device mapping follows similar logic on both platforms.

To verify that GPU access is functioning correctly within the Docker Desktop environment, users can execute a specialized n-body simulation benchmark. This process validates the entire chain from the hardware driver through the WSL 2 layer into the container. The command to perform this verification is:
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

The n-body simulation tool provides several configuration flags to tune the benchmark and test different aspects of the GPU performance:

-benchmark: This flag triggers the performance measurement mode to quantify the GPU's processing power.
-fullscreen: This allows the simulation to be rendered in full-screen mode.
-fp64: This forces the use of double precision floating point values, which is critical for high-accuracy scientific simulations.
-hostmem: This instructs the simulation to store data in the host memory rather than the GPU VRAM.
-numbodies=<N>: This defines the number of bodies (where N must be greater than or equal to 1) to be processed in the simulation.
-device=<d>: This specifies which CUDA device to use (e.g., 0, 1, 2) in multi-GPU setups.
-cpu: This runs the simulation on the CPU to provide a baseline for performance comparison.
-compare: This executes the simulation on both the default GPU and the CPU to provide a direct performance comparison.
-tipsy=<file.bin>: This allows the user to load a specific tipsy model file for simulation.
-numdevices=<i>: This specifies the number of CUDA devices to be utilized.
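
The flags above can be combined on one command line. As a minimal sketch, the following Python helper assembles the verification command with a few of them. The function name `nbody_command` and its parameter names are hypothetical; only the image name and the flags themselves come from the text above.

```python
import shlex

# Image name taken from the verification command shown earlier.
NBODY_IMAGE = "nvcr.io/nvidia/k8s/cuda-sample:nbody"


def nbody_command(num_bodies=None, fp64=False, device=None):
    """Build the argv list for a benchmark run of the n-body sample.

    Hypothetical helper: it only assembles strings and does not run Docker.
    """
    argv = ["docker", "run", "--rm", "-it", "--gpus=all", NBODY_IMAGE,
            "nbody", "-gpu", "-benchmark"]
    if num_bodies is not None:
        # The flag list states that N must be at least 1.
        if num_bodies < 1:
            raise ValueError("-numbodies requires N >= 1")
        argv.append(f"-numbodies={num_bodies}")
    if fp64:
        argv.append("-fp64")
    if device is not None:
        argv.append(f"-device={device}")
    return argv


# Print a copy-pasteable command line for a double-precision run.
print(shlex.join(nbody_command(num_bodies=65536, fp64=True)))
```

Returning a list rather than a single string keeps the result directly usable with `subprocess.run`, should the command later be executed from a script.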

A successful execution on a device such as a GeForce RTX 2060 with Max-Q Design (Compute Capability 7.5) can report throughput in the range of billions of interactions per second, confirming that the simulation is running on the GPU rather than falling back to the CPU.
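
To make the "interactions per second" figure concrete, here is a rough sketch of the arithmetic. It assumes an all-pairs simulation, where each iteration evaluates N × N interactions, and a convention of 20 floating-point operations per single-precision interaction. Both are assumptions about how the benchmark counts work, and the input numbers are purely illustrative rather than measurements from this guide.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions per second, GFLOP/s) for an all-pairs run.

    Assumes N * N interactions per iteration; flops_per_interaction is an
    assumed counting convention, not a measured quantity.
    """
    seconds = total_ms / 1000.0
    interactions = num_bodies * num_bodies * iterations
    giga_interactions_per_s = interactions / seconds / 1e9
    return giga_interactions_per_s, giga_interactions_per_s * flops_per_interaction


# Illustrative inputs: 30720 bodies, 10 iterations, 69.28 ms in total.
rate, gflops = nbody_throughput(30720, 10, 69.28)
print(f"{rate:.1f} billion interactions/s, {gflops:.0f} GFLOP/s")
```

With these inputs the rate comes out at roughly 136 billion interactions per second, which shows how a run of a few tens of milliseconds translates into a throughput figure in the billions.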

Orchestrating GPUs with Docker Compose

For complex applications involving multiple services, Docker Compose provides a structured method to reserve and allocate GPU devices. This is managed through the devices attribute under deploy.resources.reservations in the compose.yaml file, allowing for granular control over how hardware is distributed among containers.

The Docker Compose specification requires a precise configuration of the reservations block. A failure to define the capabilities field will result in a deployment error, as the Docker daemon needs to know specifically what features of the device are being requested.

The following table details the specific properties available for GPU reservation in Docker Compose:

Property	Type	Description	Requirement/Constraint
`capabilities`	List of Strings	Defines the device capabilities (e.g., `[gpu]`)	Mandatory; failure to set causes deployment errors.
`count`	Integer/String	Number of GPUs to reserve (e.g., `1` or `all`)	Mutually exclusive with `device_ids`.
`device_ids`	List of Strings	Specific IDs of GPUs from the host (found via `nvidia-smi`)	Mutually exclusive with `count`.
`driver`	String	The driver type, typically `'nvidia'`	Used to specify the hardware driver.
`options`	Key-Value Pairs	Driver-specific configuration options	Optional.

In a practical scenario, a compose.yaml file for a CUDA-based service would be structured as follows:

services:
  test:
    image: nvidia/cuda:12.9.0-base-ubuntu22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

When executing docker compose up, the system creates the network and attaches the container to the specified GPU. The resulting output of the nvidia-smi command within the container confirms the driver version and CUDA version, ensuring that the containerized environment is correctly mapped to the physical hardware.
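
The constraints from the property table (mandatory capabilities, count and device_ids mutually exclusive) can be checked before a file is ever handed to Docker. The following is a minimal sketch in Python; the helper names `gpu_reservation` and `compose_service` are hypothetical, and the output is printed as JSON on the assumption that a JSON document is also acceptable as YAML input, which is worth verifying for your Compose version.

```python
import json


def gpu_reservation(count=None, device_ids=None, driver="nvidia",
                    capabilities=("gpu",)):
    """Return one entry for deploy.resources.reservations.devices.

    Hypothetical helper that enforces the constraints listed in the table.
    """
    if not capabilities:
        raise ValueError("capabilities is mandatory")
    if count is not None and device_ids is not None:
        raise ValueError("count and device_ids are mutually exclusive")
    entry = {"driver": driver, "capabilities": list(capabilities)}
    if count is not None:
        entry["count"] = count
    if device_ids is not None:
        # Device IDs are a list of strings, as reported by nvidia-smi.
        entry["device_ids"] = [str(i) for i in device_ids]
    return entry


def compose_service(image, command, reservation):
    """Wrap a reservation in the same service layout as the example above."""
    return {"services": {"test": {
        "image": image,
        "command": command,
        "deploy": {"resources": {"reservations": {"devices": [reservation]}}},
    }}}


doc = compose_service("nvidia/cuda:12.9.0-base-ubuntu22.04", "nvidia-smi",
                      gpu_reservation(device_ids=["0"]))
print(json.dumps(doc, indent=2))
```

This variant pins the service to the GPU with ID 0 instead of requesting a count, which is the other way of filling in the reservation.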

Troubleshooting GPU Detection and Runtime Failures

A common failure point in GPU-enabled Docker environments is the "could not select device driver" error. This typically occurs when the Docker daemon is unable to locate the NVIDIA runtime, even if the NVIDIA drivers are installed on the host.

A specific instance of this issue involves users on Ubuntu 24.04 running Docker version 27.5.0. When attempting to run a container with the --gpus all flag, users may encounter the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

The technical cause of this failure is often a missing or misconfigured NVIDIA Container Runtime. To diagnose this, users should check the available runtimes using the following command:
docker info | grep Runtimes

If the output only shows runc and io.containerd.runc.v2 without mentioning nvidia, it indicates that the NVIDIA runtime is not registered with the Docker daemon. The impact is a total inability to utilize GPU acceleration, regardless of whether the hardware is present or the drivers are updated.
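
The same check can be scripted. As a minimal sketch, the following Python functions look for the Runtimes line in text captured from docker info and report whether nvidia is listed. The function names are hypothetical, and the sample strings are illustrative stand-ins for real output.

```python
def registered_runtimes(docker_info_text):
    """Extract runtime names from the 'Runtimes:' line of `docker info` text."""
    for line in docker_info_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Runtimes:"):
            return stripped[len("Runtimes:"):].split()
    return []


def has_nvidia_runtime(docker_info_text):
    """True when the NVIDIA runtime appears among the registered runtimes."""
    return "nvidia" in registered_runtimes(docker_info_text)


# Illustrative samples of the relevant line, before and after registration.
missing = " Default Runtime: runc\n Runtimes: io.containerd.runc.v2 runc\n"
present = " Default Runtime: runc\n Runtimes: io.containerd.runc.v2 nvidia runc\n"

print(has_nvidia_runtime(missing))
print(has_nvidia_runtime(present))
```

In practice the input text would come from running docker info, for example through `subprocess.run(["docker", "info"], capture_output=True, text=True).stdout`.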

To resolve this, the NVIDIA Container Toolkit must be installed and configured. The process involves:

Installing the Docker Engine according to official Ubuntu guidelines.
Enabling the NVIDIA package repository.
Installing the NVIDIA Container Toolkit.
Configuring Docker to use the NVIDIA container runtime and restarting the Docker daemon so the change takes effect.

Once the toolkit is installed, the runtime is registered, and the --gpus flag will correctly trigger the NVIDIA driver to expose the GPU to the container.
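
One way to confirm that the configuration step took effect is to inspect the Docker daemon configuration. The sketch below assumes that the toolkit's configure step (in NVIDIA's documentation, `nvidia-ctk runtime configure --runtime=docker`) records an nvidia entry under the runtimes key of /etc/docker/daemon.json. The command, the file path, and the key layout are assumptions not stated in this guide, and the function name is hypothetical.

```python
import json


def nvidia_runtime_configured(daemon_json_text):
    """Check daemon.json content for a registered 'nvidia' runtime entry.

    Assumes the entry lives under the top-level "runtimes" key.
    """
    text = daemon_json_text.strip()
    config = json.loads(text) if text else {}
    return "nvidia" in config.get("runtimes", {})


# Illustrative file contents; a real check would read /etc/docker/daemon.json.
sample = '{"runtimes": {"nvidia": {"path": "nvidia-container-runtime", "args": []}}}'
print(nvidia_runtime_configured(sample))
print(nvidia_runtime_configured("{}"))
```

A missing or empty file is treated as "not configured", which matches the situation in which docker info lists only the default runtimes.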

Advanced Implementation: Docker Model Runner and CUDA Images

For those deploying Large Language Models (LLMs), Docker has introduced the Model Runner, which simplifies the process of utilizing GPU acceleration with backends like vLLM. This removes the need for manual compose.yaml configurations for simple model inference.

To install a runner with vLLM and CUDA acceleration, the following command is used:
docker model install-runner --backend vllm --gpu cuda

Users can verify the status of the runner with:
docker model status

This will display the running versions of backends, such as vllm version: 0.11.0 or llama.cpp, confirming that the GPU-accelerated environment is active. To execute a model, such as SmolLM2, the command is:
docker model run ai/smollm2-vllm hi

Furthermore, selecting the correct base image is paramount for stability. The nvidia/cuda repository on Docker Hub provides a vast array of images tailored to different operating systems and requirements. The selection of an image depends on whether the user needs the base toolkit, the runtime, or the development environment.

The following list represents the variety of available CUDA image configurations for different base OS flavors:

UBI10 (Universal Base Image):
- 13.2.1-cudnn-runtime-ubi10
- 13.2.1-runtime-ubi10
- 13.2.1-cudnn-devel-ubi10
- 13.2.1-devel-ubi10
- 13.2.1-base-ubi10
SUSE 16:
- 13.2.1-runtime-suse16
- 13.2.1-devel-suse16
- 13.2.1-base-suse16
Rocky Linux 9:
- 13.2.1-cudnn-runtime-rockylinux9
- 13.2.1-runtime-rockylinux9
- 13.2.1-cudnn-devel-rockylinux9
- 13.2.1-devel-rockylinux9
- 13.2.1-base-rockylinux9
Rocky Linux 8:
- 13.2.1-cudnn-runtime-rockylinux8
- 13.2.1-runtime-rockylinux8
- 13.2.1-cudnn-devel-rockylinux8
- 13.2.1-devel-rockylinux8
- 13.2.1-base-rockylinux8
Rocky Linux 10:
- 13.2.1-cudnn-runtime-rockylinux10
- 13.2.1-runtime-rockylinux10
- 13.2.1-cudnn-devel-rockylinux10
- 13.2.1-devel-rockylinux10
- 13.2.1-base-rockylinux10

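The "base" images contain only the minimal CUDA runtime and suit applications that bring their own libraries. The "runtime" images add the CUDA libraries needed to execute pre-compiled binaries, which keeps image size and attack surface smaller than a full toolchain. The "devel" images add headers and development tools and are necessary for compiling CUDA code inside the container. Tags containing "cudnn" additionally ship the cuDNN library.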
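
The tags listed above follow a regular pattern of version, optional cudnn marker, flavor, and OS suffix. As a minimal sketch, a small Python helper can assemble such a tag; `cuda_tag` is a hypothetical name, and the helper does not verify that the resulting tag is actually published (the SUSE 16 list above, for instance, has no cudnn variants).

```python
FLAVORS = ("base", "runtime", "devel")


def cuda_tag(version, flavor, os_name, cudnn=False):
    """Compose an nvidia/cuda tag following the naming pattern in the list."""
    if flavor not in FLAVORS:
        raise ValueError(f"flavor must be one of {FLAVORS}")
    if cudnn and flavor == "base":
        # The list above shows cudnn only with runtime and devel flavors.
        raise ValueError("no cudnn variant of the base flavor is listed")
    prefix = "cudnn-" if cudnn else ""
    return f"{version}-{prefix}{flavor}-{os_name}"


print("nvidia/cuda:" + cuda_tag("13.2.1", "runtime", "ubi10", cudnn=True))
print("nvidia/cuda:" + cuda_tag("13.2.1", "devel", "rockylinux9"))
```

Building the tag from its parts makes it straightforward to use a devel image in a build stage and the matching runtime image in the final stage with the same version and OS suffix.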

Conclusion

Running GPUs within Docker brings together hardware virtualization and container orchestration. With the NVIDIA Container Toolkit installed and the runtime correctly registered, users can move from standard CPU execution to GPU acceleration. On Windows, this is streamlined through WSL 2 and GPU Paravirtualization, while on Linux it requires the Docker daemon and the NVIDIA runtime to be configured together. Docker Compose adds a declarative approach to hardware allocation, in which resources are reserved and capabilities are explicitly defined. Whether through the Docker Model Runner for rapid AI deployment or a specific nvidia/cuda base image for custom development, the goal remains the same: maximizing hardware throughput while keeping the agility of the container ecosystem. Neglecting these prerequisites, for example omitting the capabilities field in Compose or skipping wsl --update on Windows, commonly leaves the container unable to reach the GPU, which underlines how much depends on the driver-to-runtime communication chain.