Docker Compose Does Not Automatically Use the GPU

[Figure: Docker Compose GPU access configuration. Left panel: a compose file without the deploy.resources block; the container starts, nvidia-smi fails, and the workload falls back to CPU. Right panel: a compose file with the deploy.resources.reservations.devices block (driver: nvidia, count: 1); the container starts, nvidia-smi works, and CUDA is available. Bottom strip, six checks: compose file defines GPU, driver: nvidia, count or device_ids, nvidia-smi from inside, no extra flags, predictable behavior.]

On Linux GPU servers, Docker Compose does not use the NVIDIA GPU automatically. The service starts, nothing obviously fails, and the workload quietly falls back to CPU. The fix is a few lines in the compose file, but only if you know to look for them.

Having a GPU in the server is not enough for Compose to use it. Missing this is a common mistake in AI, CUDA, and local LLM environments.

You may test the server directly with Docker and everything works:

bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

But then the real service runs with Docker Compose, and the container starts without GPU access. The application may still run, but it falls back to CPU. That means slow inference, poor performance, and a lot of wasted time troubleshooting the wrong thing.

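The reason is simple. In the docker run test, the --gpus all flag is what requests the GPU. Docker Compose adds no such request on its own: GPU access has to be defined in the compose file. A basic service like this starts without errors, but it does not request the GPU: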

yaml
services:
  llm:
    image: my-local-llm:latest
    ports:
      - "8080:8080"

For GPU workloads, the compose file must explicitly request NVIDIA GPU access. The standard way is a device reservation under the deploy block, where the capabilities field is required. This configuration gives the service one GPU:

yaml
services:
  llm:
    image: my-local-llm:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
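
To confirm that Compose actually passed the reservation to Docker, one option is to inspect the running container. The sketch below assumes the service is named llm as in the example, is already running, and that the command is run from the directory holding the compose file. gpu_requests is a hypothetical helper name, not part of Docker. It reads the container's HostConfig.DeviceRequests field, which is typically null for a container started without a GPU request and lists an nvidia entry otherwise.

```shell
# Hypothetical helper: show the GPU device requests Docker recorded
# for a Compose service (defaults to the "llm" service from the example).
gpu_requests() {
  service="${1:-llm}"
  # Resolve the service name to its container ID.
  cid="$(docker compose ps -q "$service")"
  if [ -z "$cid" ]; then
    echo "service '$service' is not running" >&2
    return 1
  fi
  # Print the device requests as JSON; typically "null" when no GPU was requested.
  docker inspect --format '{{json .HostConfig.DeviceRequests}}' "$cid"
}
```

Calling gpu_requests llm before and after adding the deploy block makes the difference visible without touching the application itself.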

With count: 1, the container gets one GPU, but you do not choose which one. When multiple containers, users, or AI workloads share the same server, you may want to pin a service to a specific GPU. Use device_ids instead of count (the two fields are mutually exclusive):

yaml
services:
  llm:
    image: my-local-llm:latest
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
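
The values for device_ids come from the host. As a rough sketch, assuming nvidia-smi is installed on the host, the helper below lists each GPU index next to its name, formatted as a line that can be pasted into the compose file. print_device_ids is a made-up name used only for illustration.

```shell
# Hypothetical helper: print one ready-to-paste device_ids line per host GPU.
print_device_ids() {
  # Ask nvidia-smi for "index, name" pairs without a header row.
  nvidia-smi --query-gpu=index,name --format=csv,noheader |
    while IFS=, read -r index name; do
      # The GPU name is kept as a trailing comment for orientation.
      printf 'device_ids: ["%s"]  #%s\n' "$index" "$name"
    done
}
```

Run it on the host, not inside a container, so the indexes match what Docker sees.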

Without these limits, one container may take more GPU resources than expected: a reservation with neither count nor device_ids defaults to all GPUs on the host. With clear GPU allocation in the compose file, the environment becomes easier to manage, easier to troubleshoot, and more predictable.

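After updating the compose file, recreate the service so the change takes effect. Running docker compose up -d does this; docker compose restart does not pick up changes to the compose file. Then test GPU access from inside the running container. If nvidia-smi returns the expected output from inside the container, with no extra flags, the configuration is correct.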
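
The recreate-and-check step can be wrapped in a small function. This is a minimal sketch, assuming the service is named llm as in the examples and that it is run from the directory holding the compose file. verify_gpu is a hypothetical name, and the messages it prints are illustrative only.

```shell
# Hypothetical helper: recreate a Compose service and check GPU visibility inside it.
verify_gpu() {
  service="${1:-llm}"
  # Recreate the container so changes to the compose file take effect.
  docker compose up -d --force-recreate "$service" || return 1
  # Run nvidia-smi inside the container (-T disables TTY allocation for scripts).
  # A non-zero exit usually means the container has no GPU access.
  if docker compose exec -T "$service" nvidia-smi; then
    echo "GPU is visible inside '$service'"
  else
    echo "GPU is NOT visible inside '$service'" >&2
    return 1
  fi
}
```

Because the function returns non-zero on failure, it can also serve as a gate in a deployment script, for example verify_gpu llm || exit 1.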
