As a backend engineer working primarily in payments, distributed systems, and data infrastructure, I recently had a discussion with a friend about deploying stateful services such as MySQL on Kubernetes. During the conversation, a term kept appearing that sounded almost magical:
RDMA (Remote Direct Memory Access).
At first glance, RDMA seems like another networking optimization. After digging deeper, I realized it represents a fundamentally different way of thinking about communication between machines.
This article summarizes my learning journey.
What Is RDMA?
RDMA stands for Remote Direct Memory Access.
Traditionally, when one server sends data to another, the operating system and TCP/IP stack are heavily involved:
Application
↓
Kernel
↓
TCP/IP Stack
↓
NIC
══════ Network ══════
NIC
↓
TCP/IP Stack
↓
Kernel
↓
Application
With RDMA, a machine can directly read or write memory on another machine:
Application
↓
RDMA NIC
══════ Network ══════
RDMA NIC
↓
Remote Memory
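For one-sided operations (RDMA read and write), the remote CPU does not take part in the data transfer at all. The NIC moves the data directly into or out of memory that the remote application registered in advance.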
This dramatically reduces:
- Context switching
- Kernel overhead
- Data copies
- CPU utilization
Why Is RDMA Fast?
The biggest benefit is not necessarily bandwidth.
The real advantages are:
- Kernel bypass
- Zero-copy communication
- Hardware offloading to the network card
Instead of:
User Space
↔ Kernel Space
↔ TCP/IP Stack
↔ Interrupt
↔ Context Switch
RDMA allows:
NIC DMA Engine
↓
Remote Memory
In modern datacenters, this can cut network latency from tens of microseconds to only a few microseconds.
Typical numbers:
| Technology | Approximate Latency |
|---|---|
| Local RAM | ~100 ns |
| NVMe SSD | ~100 μs |
| TCP Network | 20–100 μs |
| RDMA | 1–5 μs |
The exact numbers depend on hardware and workload, but the order-of-magnitude difference between TCP and RDMA is real.
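To make the table more concrete, here is a small sketch that converts each latency into the number of strictly sequential operations that fit into one millisecond. The figures are copied from the table above; the dictionary and function names are my own, and the one-millisecond budget is just an illustrative choice.

```python
# Approximate latencies from the table above, in nanoseconds.
# Ranges are stored as (low, high); single values repeat the same number.
LATENCY_NS = {
    "Local RAM": (100, 100),
    "NVMe SSD": (100_000, 100_000),
    "TCP Network": (20_000, 100_000),
    "RDMA": (1_000, 5_000),
}


def ops_per_millisecond(name: str) -> tuple[int, int]:
    """Return (worst, best) count of strictly sequential operations in 1 ms."""
    low, high = LATENCY_NS[name]
    budget_ns = 1_000_000  # one millisecond
    return budget_ns // high, budget_ns // low


for name in LATENCY_NS:
    worst, best = ops_per_millisecond(name)
    print(f"{name:12s} {worst:>6d} to {best:>6d} sequential ops per ms")
```

With these rough numbers, a chain of dependent network calls gets 10 to 50 hops per millisecond over TCP and 200 to 1000 over RDMA, which is why the difference matters most for workloads that cannot parallelize their round trips.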
The Most Important Benefit: CPU Savings
Initially I thought RDMA was mainly about latency.
In reality, many production systems adopt RDMA because it dramatically reduces CPU consumption.
Instead of the CPU spending cycles on:
Packet Processing
TCP/IP Handling
Interrupts
Buffer Copies
the RDMA-capable NIC performs much of this work.
For large distributed systems, reducing network-related CPU overhead can be more valuable than shaving a few microseconds off latency.
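A back-of-the-envelope estimate helps show why this matters. The sketch below computes how many cores are kept busy if every packet is handled in software. The function name and all inputs are hypothetical; in particular, the cycles-per-packet figure is a placeholder I picked for illustration, not a measurement.

```python
def cores_for_packet_processing(link_gbps: float,
                                packet_bytes: int,
                                cycles_per_packet: float,
                                cpu_ghz: float) -> float:
    """Estimate how many CPU cores software packet handling keeps busy."""
    packets_per_second = link_gbps * 1e9 / 8 / packet_bytes
    cycles_per_second = packets_per_second * cycles_per_packet
    return cycles_per_second / (cpu_ghz * 1e9)


# Hypothetical inputs: a saturated 100 Gb/s link, 1500-byte packets,
# 3 GHz cores, and a placeholder cost of 2000 cycles per packet.
busy_cores = cores_for_packet_processing(100, 1500, 2000, 3.0)
print(f"about {busy_cores:.1f} cores spent on networking")
```

The absolute result depends entirely on the assumed per-packet cost, so treat it as a way to plug in your own measurements rather than as a benchmark.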
Why RDMA Appears in Kubernetes Discussions
Kubernetes itself does not require RDMA.
The topic usually comes up when people build architectures that separate compute from storage.
Traditional deployment:
MySQL
+
Local SSD
Everything runs on the same machine.
Modern architecture:
MySQL Pod
↓
Distributed Storage
↓
Multiple Storage Nodes
Now storage access travels through the network.
Network latency suddenly becomes part of the database hot path.
This is where RDMA becomes attractive.
Typical examples include:
- Ceph
- NVMe over Fabrics (NVMe-oF)
- BeeGFS
- Lustre
- Other distributed storage systems
The goal is simple:
Make remote storage feel as close as possible to local storage.
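To see what "close to local" means in numbers, here is a rough sketch that reuses the approximate figures from the latency table earlier in this article. It assumes, as a simplification of my own, that a remote read costs one local NVMe access plus the network latency from the table, and it ignores any software overhead on the storage nodes.

```python
# Approximate figures from the latency table, in microseconds.
NVME_US = 100
NETWORK_US = {"TCP": (20, 100), "RDMA": (1, 5)}


def remote_read_overhead(transport: str) -> tuple[float, float]:
    """Return (min, max) added latency of a remote read as a fraction of local NVMe."""
    low, high = NETWORK_US[transport]
    return low / NVME_US, high / NVME_US


for transport in NETWORK_US:
    lo, hi = remote_read_overhead(transport)
    print(f"{transport}: remote read adds {lo:.0%} to {hi:.0%} on top of local NVMe")
```

Under these assumptions TCP adds 20% to 100% to every read, while RDMA adds 1% to 5%, which is small enough that the remote disk starts to look like a local one.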
HPC: The Original Home of RDMA
Another term that came up during the discussion was HPC.
HPC stands for:
High Performance Computing
Before cloud computing and AI became popular, HPC clusters were already solving problems such as:
- Weather prediction
- Aircraft simulation
- Computational fluid dynamics
- Genomics
- Drug discovery
- Scientific research
Unlike traditional distributed systems, where each node processes independent requests:
Node A -> Transaction A
Node B -> Transaction B
Node C -> Transaction C
HPC systems typically have thousands of nodes collaborating on a single computation:
1000 Nodes
↓
One Massive Job
Communication becomes the bottleneck.
This is why HPC adopted RDMA decades ago.
AI Has Made RDMA Popular Again
Today, AI training clusters are effectively modern HPC systems.
Imagine:
8000 GPUs
training a large language model.
Every training step requires exchanging gradients across thousands of GPUs.
Without RDMA:
GPU
↓
CPU
↓
TCP
↓
CPU
↓
GPU
With RDMA (specifically GPUDirect RDMA, where the NIC reads and writes GPU memory directly):
GPU
↓
NIC
══════ Network ══════
NIC
↓
GPU
Technologies such as:
- InfiniBand
- RoCE
- GPUDirect RDMA
exist specifically to optimize this communication.
Many AI engineers are now rediscovering concepts that have existed in the HPC world for decades.
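To give a feel for what "exchanging gradients" involves, here is a minimal in-process simulation of ring all-reduce, one common pattern for summing gradients across workers. The article above does not name a specific algorithm, so this choice is my assumption. There is no network or GPU here: each list assignment stands in for a transfer that a real cluster would send over the fabric.

```python
def ring_all_reduce(grads: list[list[float]]) -> list[list[float]]:
    """Simulate a ring all-reduce (sum) over per-worker gradient vectors.

    grads[w] is worker w's local gradient. All vectors have the same length,
    which must be divisible by the number of workers.
    """
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "vector length must be divisible by worker count"
    chunk = size // n
    data = [list(g) for g in grads]  # each worker's private buffer

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker w sends chunk (w - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(w, (w - s) % n, data[w][span((w - s) % n)]) for w in range(n)]
        for w, c, payload in sends:
            dst = (w + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Worker w now holds the fully reduced chunk (w + 1) mod n.
    # Phase 2, all-gather: in step s, worker w forwards chunk (w + 1 - s) mod n.
    for s in range(n - 1):
        sends = [(w, (w + 1 - s) % n, data[w][span((w + 1 - s) % n)]) for w in range(n)]
        for w, c, payload in sends:
            data[(w + 1) % n][span(c)] = payload

    return data


# Four simulated workers, each with an 8-element gradient.
workers = [[float(w * 10 + i) for i in range(8)] for w in range(4)]
print(ring_all_reduce(workers)[0])
```

Every worker sends and receives in each of the 2 × (n − 1) steps, so per-transfer latency and CPU overhead are multiplied across the whole ring. That is the part RDMA-style transports aim to shrink.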
Common RDMA Technologies
InfiniBand
The traditional HPC solution.
Characteristics:
- Lowest latency
- Highest performance
- Most expensive
Widely used in supercomputers and large AI clusters.
RoCE
RDMA over Converged Ethernet.
Characteristics:
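- Runs on Ethernet, though it usually needs a lossless or carefully congestion-managed network configuration (for example PFC and ECN)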
- More enterprise-friendly
- Increasingly common in datacenters
Many organizations choose RoCE because it combines RDMA capabilities with existing Ethernet infrastructure.
iWARP
RDMA over TCP.
Historically important, but less common today.
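If you want to check whether a Linux machine has any RDMA-capable device at all, a quick sketch like the following can help. To my understanding, the kernel exposes RDMA devices under `/sys/class/infiniband` regardless of whether the link layer is InfiniBand or Ethernet. The function name and the default-path parameter are my own.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the names of RDMA devices registered with the Linux kernel, if any."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # no RDMA stack loaded, or not running on Linux
    return sorted(entry.name for entry in root.iterdir())


devices = list_rdma_devices()
print(devices if devices else "no RDMA devices found")
```

On a typical laptop or a cloud VM without RDMA hardware this prints the "not found" message, which is itself a useful reminder that RDMA is a hardware feature and not something you can enable purely in software configuration.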
Does MySQL Automatically Benefit from RDMA?
Not necessarily.
Many MySQL workloads are limited by:
- Poor indexing
- SQL design
- Lock contention
- Storage engine behavior
- Buffer pool efficiency
rather than network performance.
If MySQL is running on:
MySQL
+
Local NVMe
RDMA will probably provide little value.
RDMA becomes interesting when:
MySQL
↓
Remote Storage
↓
Distributed Storage Cluster
or when using distributed databases with heavy cross-node communication.
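One way to reason about "not necessarily" is Amdahl's law: a faster network only helps in proportion to the share of request time that is actually spent on the network. The sketch below applies the standard formula. The two workloads and their network fractions are hypothetical values I chose for illustration, not measurements of any MySQL deployment.

```python
def overall_speedup(network_fraction: float, network_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the network part gets faster."""
    if not 0.0 <= network_fraction <= 1.0:
        raise ValueError("network_fraction must be between 0 and 1")
    if network_speedup <= 0:
        raise ValueError("network_speedup must be positive")
    return 1.0 / ((1.0 - network_fraction) + network_fraction / network_speedup)


# Hypothetical workloads: fraction of request latency spent on the network.
for label, fraction in [("local NVMe, query-bound", 0.02),
                        ("remote storage on the hot path", 0.50)]:
    print(f"{label}: {overall_speedup(fraction, 10):.2f}x")
```

Even with a 10x faster network, the query-bound case barely moves, while the remote-storage case gets close to a 2x improvement. Measuring where the time goes comes before buying RDMA hardware.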
A Useful Mental Model
The most helpful way I have found to think about RDMA is to compare it with ordinary request/response communication.
Traditional distributed systems:
Call API
↓
Serialize
↓
TCP
↓
Deserialize
RDMA systems:
Read Remote Memory
Write Remote Memory
Instead of treating the network as message passing, RDMA starts treating the network as an extension of memory.
That mental shift explains why RDMA is so important in modern storage systems, AI clusters, and HPC environments.
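The contrast can be sketched in a few lines. This is only an in-process analogy, not real RDMA: a function call stands in for the TCP round trip, and a `memoryview` over a `bytearray` stands in for a registered memory region. All names here are hypothetical.

```python
import json
import struct


# Message-passing style: "server" code runs for every single request.
def handle_request(raw: bytes, store: dict) -> bytes:
    request = json.loads(raw)                      # deserialize the request
    value = store[request["key"]]                  # server-side lookup
    return json.dumps({"value": value}).encode()   # serialize the reply


def rpc_get(key: str, store: dict) -> int:
    raw_request = json.dumps({"key": key}).encode()
    raw_reply = handle_request(raw_request, store)  # stands in for a TCP round trip
    return json.loads(raw_reply)["value"]


# Memory style: the "server" only exposes a buffer with an agreed layout.
SLOT = struct.Struct("<q")  # one little-endian signed 64-bit integer per slot


def remote_write(region: memoryview, slot: int, value: int) -> None:
    SLOT.pack_into(region, slot * SLOT.size, value)


def remote_read(region: memoryview, slot: int) -> int:
    return SLOT.unpack_from(region, slot * SLOT.size)[0]


store = {"balance": 42}
region = memoryview(bytearray(SLOT.size * 4))  # stands in for a registered region
remote_write(region, 0, 42)
print(rpc_get("balance", store), remote_read(region, 0))
```

In the first style the remote side parses, looks up, and replies on every call. In the second, both sides agree on a layout up front and the reader simply fetches bytes at an offset, which is the part a NIC can do without waking the remote CPU.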
My Takeaway
As someone coming from a payments and distributed backend background, I don’t think RDMA should be the first topic engineers learn.
The learning order that currently makes the most sense to me is:
- Database internals (InnoDB)
- Kafka internals
- Distributed consensus (Raft)
- Spark/Flink
- Distributed storage systems (Ceph)
- RDMA
- InfiniBand internals
Once you’ve experienced the pain of moving data across machines, RDMA stops looking like black magic and starts looking like a very practical engineering solution.
And perhaps that’s the most interesting realization:
RDMA is not about making the network faster. It’s about making remote resources feel local.