The Two-Pointer Pattern: How Distributed Systems Guarantee No Data Loss

Kafka, RocketMQ, Redis, MySQL, and etcd are tools many of us use every day. They solve different problems with different approaches, yet they share an analogous pattern.

Every serious distributed middleware — Kafka, RocketMQ, Redis, MySQL, etcd — solves the same fundamental problem. They just arrived at the solution from different directions.


The Problem: You Can’t Commit and Replicate Atomically

Imagine a leader node receiving a write. It has two goals that are in direct tension:

  1. Return fast — the client is waiting, latency matters.
  2. Never lose data — if the leader crashes right after returning, a promoted replica must have the same data.

You cannot satisfy both perfectly at the same time. Acknowledging before replication is fast but unsafe. Waiting for every replica to confirm is safe but slow. Every distributed storage system in production today is, at its core, a negotiation between these two forces.

The solution that emerged — independently, across many systems — is what I call the two-pointer pattern.


The Core Idea: Two Pointers, One Gap

Each node tracks two positions in its data log:

| Pointer | Meaning |
|---|---|
| Local write pointer | How far this node has written. Data exists here locally. |
| Safe commit pointer | How far it’s safe to expose to readers. Enough replicas have confirmed. |

The gap between them is the replication in-flight zone — data that exists on the leader but isn’t yet safe to serve. Consumers and readers are only allowed to see data up to the safe commit pointer.

Log: [ A ][ B ][ C ][ D ][ E ][ F ]
                          ^         ^
                     safe commit   local write
                     pointer       pointer
                     (HW)          (LEO)

    Readers see: A B C D
    Replicating: E F  ← not yet safe

This simple invariant is what prevents data loss on failover. A newly elected leader only serves data up to the point that enough replicas had confirmed, and nothing from the in-flight zone.
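As a minimal sketch of this invariant, the toy leader log below tracks both pointers and only serves reads below the safe commit pointer. The class, method, and parameter names are illustrative assumptions and are not taken from any of the systems discussed here.

```python
class LeaderLog:
    """Toy leader log with a local write pointer and a safe commit pointer."""

    def __init__(self, replica_ids, required_acks):
        self.entries = []                          # the log itself
        self.acked = {r: 0 for r in replica_ids}   # next offset each replica still needs
        self.required_acks = required_acks         # how many replicas count as "enough"
        self.commit_pointer = 0                    # safe commit pointer (exclusive)

    @property
    def write_pointer(self):
        # Local write pointer: the next offset this node will write to.
        return len(self.entries)

    def append(self, record):
        # A local write advances only the local write pointer.
        self.entries.append(record)
        return self.write_pointer - 1

    def on_replica_ack(self, replica_id, offset):
        # The replica reports that it holds every entry below `offset`.
        self.acked[replica_id] = max(self.acked[replica_id], offset)
        # The k-th largest confirmed offset is held by at least k replicas.
        confirmed = sorted(self.acked.values(), reverse=True)
        candidate = confirmed[self.required_acks - 1]
        self.commit_pointer = max(self.commit_pointer,
                                  min(candidate, self.write_pointer))

    def read_committed(self):
        # Readers never see the in-flight zone.
        return self.entries[:self.commit_pointer]


log = LeaderLog(replica_ids=["r1", "r2"], required_acks=2)
for record in "ABCDEF":
    log.append(record)
log.on_replica_ack("r1", 5)   # r1 holds A..E
log.on_replica_ack("r2", 4)   # r2 holds A..D
```

With both replicas required, the commit pointer stops at the slower replica, so readers see A through D while E and F stay in flight, matching the diagram above. The sketch assumes `required_acks` is between 1 and the number of replicas.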


Kafka: Where the Terminology Comes From

Kafka made this pattern explicit with named concepts that are now widely referenced.

LEO (Log End Offset) is the local write pointer — the next offset a partition replica will write to. Every replica (leader and follower alike) has its own LEO. It advances every time a message is appended locally.

HW (High Watermark) is the safe commit pointer — the highest offset that has been replicated to all in-sync replicas (ISR). Consumers can only read up to the HW. Even if the leader has written further, those messages are invisible until all ISRs catch up.

ISR (In-Sync Replicas) is the dynamic set of replicas considered current. If a replica falls too far behind, Kafka removes it from the ISR. This matters because HW = min(LEO across all ISR replicas). Shrinking the ISR raises the HW faster; it also reduces durability guarantees.

Leader:   LEO = offset 100,  HW = 95
Replica1: LEO = 97
Replica2: LEO = 95

HW = min(100, 97, 95) = 95  ← consumers read up to here

The elegance of Kafka’s design is that the HW needs no extra agreement round: the leader derives it as the minimum LEO across the ISR, using the offsets that followers already report in their fetch requests. No additional coordination, just arithmetic.
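The arithmetic can be sketched in a few lines. This is only an illustration of the min-over-ISR rule using the numbers from the example above; the function and replica names are hypothetical and this is not Kafka's broker code.

```python
def high_watermark(leo_by_replica, isr):
    """Return the minimum LEO over the replicas currently in the ISR."""
    return min(leo_by_replica[replica] for replica in isr)


leo = {"leader": 100, "replica1": 97, "replica2": 95}

# Full ISR: the slowest in-sync replica holds the HW back.
hw_full = high_watermark(leo, isr={"leader", "replica1", "replica2"})

# replica2 is dropped from the ISR: the HW jumps forward.
hw_shrunk = high_watermark(leo, isr={"leader", "replica1"})

# Only the leader remains: the HW equals the leader's own LEO.
hw_alone = high_watermark(leo, isr={"leader"})
```

The last case shows why a shrinking ISR trades durability for progress: once the leader is alone, the HW advances without any replication at all.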


RocketMQ: The Same Pattern, Different Names

RocketMQ’s broker architecture maps directly onto the same two-pointer model.

The CommitLog is RocketMQ’s append-only write log (analogous to Kafka’s partition log). The master broker writes to it and slaves pull or receive pushes of the data.

  • maxOffset — the local write pointer. How far the master’s CommitLog has advanced. Equivalent to Kafka’s LEO.
  • The slave’s confirmed offset — how far the slave has durably written. The safe commit pointer (HW-equivalent) is derived from this.

The key configuration is the brokerRole:

| Role | Behavior |
|---|---|
| ASYNC_MASTER | Master commits before slave confirms. Gap between pointers can be large. Fast, but data in the gap is at risk. |
| SYNC_MASTER | Master blocks until at least one slave acknowledges. The two pointers are kept close together. Safer, higher latency. |

In SYNC_MASTER mode, the safe commit pointer is effectively gated by the slave, much like Kafka’s HW is gated by the ISR. In ASYNC_MASTER mode, the master acknowledges at its local write pointer without waiting for any slave, so the acknowledgement carries no replication guarantee: anything the slave has not yet received can be lost on failover.

RocketMQ’s DLedger mode (available since 4.5) goes further, adopting a Raft-based replicated commit log, which brings it in line with etcd’s approach described below.
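The difference between the two roles comes down to when the master may acknowledge a producer. The sketch below is a simplified model under stated assumptions: the function names are hypothetical, offsets are plain integers, and timeouts and the case where no slave is connected are ignored.

```python
from enum import Enum


class BrokerRole(Enum):
    ASYNC_MASTER = "ASYNC_MASTER"
    SYNC_MASTER = "SYNC_MASTER"


def can_ack_producer(role, write_end_offset, slave_acked_offset):
    """Decide whether the master may acknowledge a write ending at write_end_offset."""
    if role is BrokerRole.ASYNC_MASTER:
        # Acknowledge as soon as the local CommitLog append has succeeded.
        return True
    # SYNC_MASTER: wait until a slave has confirmed everything up to this write.
    return slave_acked_offset >= write_end_offset


def bytes_at_risk(max_offset, slave_acked_offset):
    """Size of the gap between the master's write pointer and the slave's confirmed offset."""
    return max(0, max_offset - slave_acked_offset)
```

For example, with the master at offset 1000 and the slave confirmed up to 400, an ASYNC_MASTER has already acknowledged writes covering 600 bytes that exist on one node only, while a SYNC_MASTER would still be holding those acknowledgements back.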


MySQL: An Evolutionary Story

MySQL’s replication history is perhaps the clearest illustration of how the distributed systems community gradually discovered the two-pointer pattern through operational pain.

Phase 1 — Async Replication (no HW concept)

Classic MySQL replication is fully asynchronous. The primary writes to the binlog and returns success. Replicas pull the binlog in the background. There is no safe commit pointer — the primary simply does not track where replicas are relative to its own writes.

This works well under normal conditions. Under failover, you may promote a replica that is 10, 100, or 10,000 transactions behind the primary. Those transactions are gone.

Phase 2 — Semi-Sync Replication (HW introduced)

Introduced as a plugin in MySQL 5.5, semi-sync replication adds a crucial step: the primary waits for at least one replica to acknowledge receipt of the binlog event before returning success to the client.

This introduces a safe commit pointer for the first time. The primary’s effective commit point is now gated by replica confirmation — conceptually identical to Kafka’s HW gated by one ISR member.

However, the original AFTER_COMMIT variant had a subtle flaw: the primary committed to the storage engine before waiting for the slave ack. If the primary crashed between engine commit and slave ack, the transaction was committed (and already visible to other sessions) on the primary but not replicated. A promoted slave would be missing it.

Phase 3 — Lossless Semi-Sync / AFTER_SYNC (HW before commit)

MySQL 5.7 introduced rpl_semi_sync_master_wait_point = AFTER_SYNC. Now the sequence is:

  1. Write to binlog
  2. Wait for slave ack ← safe commit pointer advances here
  3. Commit to storage engine
  4. Return to client

The slave ack now happens before the engine commit. If the primary crashes after step 2, the slave already has the data. If it crashes before step 2 completes, the transaction was never acknowledged to the client and never became visible to other sessions on the primary. Either way, no client has observed data that a promoted replica lacks. This is the lossless variant, and it maps precisely to how Kafka’s HW works: readers only see data after the safe pointer advances.
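The effect of moving the wait point can be seen in a toy model of a single transaction. The step labels and the `simulate` function are made up for illustration; the model assumes the primary crashes while waiting for the ack, before the slave has received the event.

```python
def simulate(wait_point, crash_before_ack):
    """Return (visible_on_primary, present_on_replica) when the transaction stops.

    Models one transaction. If crash_before_ack is True, the primary dies
    while waiting for the slave ack and the slave never receives the event.
    """
    steps = {
        "AFTER_COMMIT": ["binlog", "engine_commit", "wait_ack", "reply"],
        "AFTER_SYNC": ["binlog", "wait_ack", "engine_commit", "reply"],
    }[wait_point]

    visible_on_primary = False
    present_on_replica = False
    for step in steps:
        if step == "wait_ack":
            if crash_before_ack:
                break  # primary dies here
            present_on_replica = True
        elif step == "engine_commit":
            visible_on_primary = True  # other sessions can now read the change
    return visible_on_primary, present_on_replica


after_commit = simulate("AFTER_COMMIT", crash_before_ack=True)
after_sync = simulate("AFTER_SYNC", crash_before_ack=True)
```

Under AFTER_COMMIT the crash leaves a change that was visible on the primary but is absent from the slave. Under AFTER_SYNC the same crash leaves nothing visible anywhere, so there is nothing for a promoted slave to be missing.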

Phase 4 — Group Replication / MGR (quorum-based HW)

MySQL Group Replication (MGR), introduced in 5.7 and refined in 8.0, replaces the single-slave ack model with a Paxos-based quorum. A transaction must be certified and acknowledged by a majority of group members before it commits.

This is no longer just a two-pointer model between primary and one slave — it is a distributed commit log with majority quorum, equivalent to how Raft-based systems (etcd, TiKV, CockroachDB) operate.

Async        →  Semi-sync       →  Lossless semi-sync  →  MGR
(no HW)         (HW = 1 slave)     (HW before commit)     (HW = quorum)
             ← increasingly tight gap between pointers →

Redis: Configurable Gap

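Redis replication uses a continuous replication stream from primary to replicas. The primary maintains a replication offset (reported as master_repl_offset by INFO replication, abbreviated repl_offset here) that records how far it has written into the replication stream. Each replica tracks its own processed offset and reports it back to the primary.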

By default, Redis replication is fully asynchronous — the primary does not wait for replicas, and the gap between the two pointers is unbounded. A primary crash can lose all unacknowledged writes.

The WAIT command and min-replicas-to-write / min-replicas-max-lag configuration bring the pattern back:

  • min-replicas-to-write N (together with min-replicas-max-lag M): the primary rejects writes with an error unless at least N replicas have acknowledged within the last M seconds. This bounds the gap in time rather than in bytes.
  • WAIT N timeout: the client explicitly blocks until N replicas have acknowledged up to the current offset, or until the timeout (in milliseconds) expires. This is a manual HW gate that the application controls.

Redis Sentinel and Redis Cluster do not automatically prevent data loss on failover. They rely on the operator having configured min-replicas-to-write and min-replicas-max-lag appropriately, and even then these settings only bound the window of writes that can be lost. Without them, the primary acknowledges at its local write pointer with no separate safe commit pointer, which means the safety guarantee disappears.
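An application-controlled gate built on WAIT might look like the sketch below. The helper name `durable_set` is hypothetical. The `client` argument is assumed to behave like a redis-py `redis.Redis` instance, whose `set(key, value)` and `wait(num_replicas, timeout)` methods issue SET and WAIT; nothing here imports redis-py, so any object with those two methods works.

```python
def durable_set(client, key, value, min_replicas=1, timeout_ms=100):
    """Write a key, then block until min_replicas replicas have acknowledged it.

    Returns the number of replicas that acknowledged. Raises RuntimeError
    if fewer than min_replicas acknowledged before the timeout.
    """
    client.set(key, value)
    # WAIT returns how many replicas have acknowledged the writes so far.
    acked = client.wait(min_replicas, timeout_ms)
    if acked < min_replicas:
        # The write exists on the primary but is still in the in-flight zone.
        raise RuntimeError(
            f"only {acked} of {min_replicas} replicas acknowledged {key!r}"
        )
    return acked
```

Note that a failed wait does not undo the write: the key is already on the primary, and the caller only learns that it has not been confirmed by enough replicas yet.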


Raft / etcd: The Formal Model

Raft makes the two-pointer pattern a first-class part of the protocol specification.

  • lastLogIndex — the index of the last entry the leader has appended to its local log. This is the local write pointer (LEO-equivalent).
  • commitIndex — the index of the last log entry known to be durably replicated on a majority of servers. This is the safe commit pointer (HW-equivalent).

The rule is precise: a log entry at index i can only be applied to the state machine (made visible) when commitIndex >= i. Until then, it exists in the leader’s log but is invisible to clients.

Raft’s quorum requirement (⌊N/2⌋ + 1 nodes must confirm) gives a stronger guarantee than Kafka’s ISR. In Kafka, if the ISR shrinks to 1 (just the leader) due to replica lag and min.insync.replicas is left at its default of 1, the HW advances as fast as the leader writes, with no actual replication required. Raft never allows a “quorum of one” unless the cluster size is 1.

This makes Raft-based systems (etcd, TiKV, CockroachDB, YugabyteDB) the most conservative implementors of the two-pointer pattern. The gap between lastLogIndex and commitIndex can only close after a true majority has responded.
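A leader's commit rule can be sketched as taking the majority-th largest replicated index. The function below is a simplified illustration with hypothetical names, and it deliberately omits one condition from the Raft paper: a leader only advances commitIndex to an entry from its own current term.

```python
def advance_commit_index(commit_index, match_index, last_log_index, cluster_size):
    """Return the new commitIndex for a leader.

    match_index maps each follower to the highest log index known to be
    replicated on it. The leader itself counts as holding last_log_index.
    Simplification: the current-term check required by Raft is omitted.
    """
    majority = cluster_size // 2 + 1
    indexes = sorted(list(match_index.values()) + [last_log_index], reverse=True)
    # The majority-th largest index is held by at least `majority` servers.
    candidate = indexes[majority - 1]
    # commitIndex never moves backwards.
    return max(commit_index, candidate)


# Five-node cluster: leader at index 10, followers at 10, 8, 5 and 3.
new_commit = advance_commit_index(
    commit_index=5,
    match_index={"a": 10, "b": 8, "c": 5, "d": 3},
    last_log_index=10,
    cluster_size=5,
)
```

Here three of the five servers hold index 8 or higher, so the commit index moves to 8 while entries 9 and 10 remain in the gap. Unlike the ISR case, dropping slow followers from consideration does not lower the majority threshold.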


Comparison Table

| System | Local write pointer | Safe commit pointer | “Enough” definition |
|---|---|---|---|
| Kafka | LEO (per replica) | High Watermark | All ISR replicas |
| RocketMQ | maxOffset in CommitLog | Slave confirmed offset | ≥1 slave (SYNC_MASTER) |
| MySQL async | Binlog position | same (no gap enforced) | Nobody |
| MySQL semi-sync | Binlog position | Replica ack position | ≥1 replica (AFTER_SYNC) |
| MySQL MGR | Binlog position | Paxos certified position | Majority of group |
| Redis (default) | repl_offset | same (no gap enforced) | Nobody |
| Redis (WAIT) | repl_offset | Replica acked offset | Configurable N |
| Raft / etcd | lastLogIndex | commitIndex | Majority quorum |

The Tradeoff Is Always the Same

No matter how the two pointers are named, the operator tradeoff is identical across every system:

How large can the gap between the two pointers be?

  • Large gap (async) → low write latency, risk of data loss on failover
  • Small gap (semi-sync / quorum) → higher write latency, strong durability
  • Zero gap (synchronous everywhere) → maximum latency, operational fragility

Every acks, sync, min-replicas, or quorum setting in every distributed system is ultimately an answer to this one question.


Why This Pattern Persists

The two-pointer pattern survives because it is the minimal mechanism that satisfies both constraints simultaneously:

  1. The local write pointer advances immediately on write — keeping write latency low and the leader unblocked.
  2. The safe commit pointer advances only after replication, which guarantees that a promoted replica already holds everything that was exposed as committed.

Any system that needs both low write latency and no-data-loss failover will converge on some version of this. The names change (LEO/HW, lastLogIndex/commitIndex, maxOffset/slave confirmed offset), but the invariant is always the same: serve reads only from committed data, write optimistically, commit conservatively.

Once you see it in Kafka, you see it everywhere.


References and further reading: Kafka replication design, Raft paper (Ongaro & Ousterhout, 2014), MySQL semi-sync replication, Redis replication