
RDMA: Why It Matters for AI Infrastructure

May 31, 2026·2 min read

Figure: An AI cluster built around an RDMA fabric (high bandwidth, low latency, CPU bypass) that provides GPU-to-GPU memory access between three four-GPU nodes and connects them to a fast storage tier for datasets, checkpoints, and models. Below it, the RDMA principle: without RDMA, data moves server A memory → CPU → NIC → NIC → CPU → server B memory, with the CPU in the path; with RDMA, it moves server A memory → NIC → NIC → server B memory, bypassing the CPU.

Modern AI workloads run across many GPUs, many servers, and large datasets. At that scale the network is just as important as the compute. RDMA lets one server access memory on another with very low latency and minimal CPU involvement, so GPUs can spend their time on math instead of waiting on the network. It is more than a checkbox on a spec sheet: it can be the difference between a fast GPU cluster and an expensive one that sits mostly idle.

AI workloads no longer run on a single server.

Modern AI environments often combine multiple GPUs, multiple servers, large datasets, distributed training, and fast storage. In that kind of environment, the network matters just as much as the compute.

This is where RDMA comes in.

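RDMA stands for Remote Direct Memory Access. In simple terms, it lets one server read from or write to memory on another server with very low latency and little CPU involvement. The network adapter moves the data directly between the two machines' memory, skipping the extra copies and operating-system processing that a conventional TCP/IP path adds.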

That means data can move faster between systems.

For AI workloads, this matters a lot.

When GPUs train a model together or process large datasets in parallel, they constantly exchange data, such as gradients during distributed training. If the network is slow, or if the CPU becomes a bottleneck, the GPUs can end up waiting instead of computing.

Idle GPUs are expensive.
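To put rough numbers on that, here is a minimal back-of-envelope sketch. It assumes data-parallel training where the GPUs synchronize gradients with a ring all-reduce after every step, that communication does not overlap with compute, and that link bandwidth is the only bottleneck. The gradient size, step time, GPU count, and link speeds are hypothetical values chosen for illustration, not measurements.

```python
"""Back-of-envelope model: how network speed affects GPU utilization.

Assumptions (hypothetical, for illustration only):
- data-parallel training, gradients synchronized with a ring all-reduce
- communication does NOT overlap with compute (worst case)
- per-GPU link bandwidth is the bottleneck; latency and protocol overhead ignored
"""


def ring_allreduce_bytes_per_gpu(grad_bytes: float, num_gpus: int) -> float:
    """Bytes each GPU sends (and also receives) in one ring all-reduce.

    Reduce-scatter and all-gather each move (N - 1) / N of the buffer.
    """
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes


def gpu_utilization(compute_s: float, grad_bytes: float, num_gpus: int,
                    link_bytes_per_s: float) -> float:
    """Fraction of each training step the GPU spends computing."""
    comm_s = ring_allreduce_bytes_per_gpu(grad_bytes, num_gpus) / link_bytes_per_s
    return compute_s / (compute_s + comm_s)


if __name__ == "__main__":
    grad_bytes = 2e9   # hypothetical: ~1B parameters in 16-bit precision
    compute_s = 0.5    # hypothetical compute time per step, in seconds
    num_gpus = 16      # hypothetical cluster size
    for gbit in (10, 100, 400):
        link = gbit * 1e9 / 8  # convert Gbit/s to bytes/s
        u = gpu_utilization(compute_s, grad_bytes, num_gpus, link)
        print(f"{gbit:>4} Gbit/s link: GPUs busy {u:.0%} of each step")
```

Under these assumptions, the same GPUs spend most of each step waiting on a slow link and most of it computing on a fast one. Real training frameworks overlap communication with compute, so treat this as a worst case rather than a prediction.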

RDMA helps by reducing latency, lowering CPU overhead, and speeding up data movement between servers, GPUs, and storage systems.

It is especially important for distributed training, high-performance storage, large model workloads, and AI clusters where many nodes need to work together.

The main idea is simple.

In AI infrastructure, fast GPUs are not enough.

The network must be fast enough to keep them busy.
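One rough way to ask how fast is "fast enough" is to find the link speed at which the gradient exchange takes no longer than the compute it can overlap with. This sketch assumes a hypothetical setup (ring all-reduce, 2 GB of gradients, 16 GPUs, 0.5 s of compute per step) and perfect overlap. It ignores latency, protocol overhead, and CPU cost, which are the costs RDMA is meant to reduce, so the real requirement is higher.

```python
def min_link_gbit_per_s(grad_bytes: float, num_gpus: int, compute_s: float) -> float:
    """Smallest per-GPU link speed (Gbit/s) that hides a ring all-reduce
    entirely behind compute, assuming perfect overlap and no overhead."""
    # Each GPU sends 2 * (N - 1) / N of the gradient buffer per all-reduce.
    bytes_per_gpu = 2 * (num_gpus - 1) / num_gpus * grad_bytes
    # The exchange must finish within the compute time of one step.
    bytes_per_s = bytes_per_gpu / compute_s
    return bytes_per_s * 8 / 1e9  # convert bytes/s to Gbit/s


if __name__ == "__main__":
    # Hypothetical inputs: 2 GB of gradients, 16 GPUs, 0.5 s compute per step.
    print(f"{min_link_gbit_per_s(2e9, 16, 0.5):.0f} Gbit/s per GPU")
```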

RDMA helps AI systems move data across the cluster with less delay and less CPU overhead, so the GPUs spend more of their time on useful work.
