System DesignApr 22, 2026·9 min read·

OT vs CRDT — how two clients converge without a referee

Two users, one document, concurrent edits, no fighting. Every collaborative product resolves this with either Operational Transformation or a CRDT. Here's what each actually does — with a step-by-step animation you can scrub, the real ops on the wire, and the reason you'd pick one over the other.

Twitter / X LinkedIn

Play it through — watch both users converge to the same text under each algorithm:

01/06Shared starting state: both clients see "cat".

User A

cat

→ "cat"

User B

cat

→ "cat"

primary sources · Operational Transformation (Wikipedia) · crdt.tech · Automerge concepts

Two people open the same document. Both type. Both edits reach the other side. Somehow, both sides converge to the same result — without any referee telling them "you go first".

This is the central magic of collaborative editing. It comes from one of two algorithms: Operational Transformation ( — what Google Docs uses) or Conflict-free Replicated Data Types (CRDTs — what and Automerge use). Both solve the convergence problem. They pick different points on the complexity / capability / decentralisation curve.

This post walks through the actual mechanics. Not "CRDTs use ID tags" (everyone knows that) — why the ID tags make convergence automatic, and what the transform function actually does in OT.

tl;dr

OT: edits are operations like insert("x", 3). The server receives concurrent ops, transforms each against the ones already applied (e.g. "you wanted to insert at position 3, but someone already inserted one char before that, so your effective position is 4"), and broadcasts. Clients stay thin; server does the hard math. Used by Google Docs. CRDT: every character carries a unique ID. Merging means inserting after the op's referenced left-neighbour ID. Order-independent — any client can apply any subset of ops and converge. Used by Yjs, Automerge. The tradeoff: OT needs a server but the document is lean; CRDT works peer-to-peer but carries per-character metadata.

The convergence problem, stated precisely

Two clients, one shared document. Both apply their own edits immediately (optimistic UI) so the user feels instant response. Then ops propagate to the other side. Question: after all the ops have been exchanged, do both clients end up with the same document?

The answer isn't obvious. Here's why: the same operation, applied to different base states, produces different results.

Starting state: "cat"

User A applies locally:  insert("h", 0)    →  "hcat"
User B applies locally:  insert("s", 3)    →  "cats"

Now A's op arrives at B. B's current state is "cats" — where does position 0 even mean? (At the start, "before the c", same as before B's edit.) B applies: "hcats". ✓

Now B's op arrives at A. A's current state is "hcat". If A naively applies insert("s", 3), the character at position 3 is "a" (shifted by A's earlier insert). A ends up with "hcaste". ✗

Same op, different result. The two clients have diverged.

The job of both OT and is to prevent exactly this divergence. They do it in completely different ways.

Operational Transform — the server does the math

OT's insight: when an op arrives that was authored against a stale base state, transform it before applying.

In our scenario, when B's op insert("s", 3) arrives at A:

A has already applied insert("h", 0), which shifted every later position right by 1.
A's side computes: for B's op to produce the correct result on A's current state, position 3 must be transformed to position 4.
A applies the transformed op: insert("s", 4) → "hcats". ✓

Both clients end up with "hcats". Same state. Converged.

The transform function (sometimes called T or the "transform operator") is deceptively simple in this toy case — shift position by the number of characters inserted before yours. The real implementation is harder because every pair of op types needs its own transform:

T(insert_a, insert_b)  — shift if needed
T(insert, delete)      — shift or split
T(delete, insert)      — shift; may partially cancel
T(delete, delete)      — overlap → clamp, cancel, or shorten

Comprehensive OT libraries (share.js, ot.js, the Google Docs internal version) ship ~20 transform functions to cover every pairing. Each one has edge cases around overlapping ranges. Each one has to respect convergence invariants (TP1 — any order of transforms applied to an op produces the same result — and TP2 — concurrent ops commute after transform).

Google famously had to iterate multiple times on their OT implementation for Docs. It's a well-known engineering hazard. If you're picking OT, use an existing library.

Why OT wants a server

OT's transforms depend on knowing the history of applied ops. In the two-client toy scenario above, it's easy: A knows what A applied, B knows what B applied, they can each transform the other's op. But at three or more clients, each client would need to track everyone else's state globally.

A server simplifies this by being the single linearisation point. The server receives ops from all clients, orders them, transforms each against the tail of the log, and broadcasts transformed ops to other clients. Clients only see already-correctly-transformed ops — they don't need to do transform math themselves.

This is the Google Docs architecture: server-side OT, thin clients, central authority. It's also why OT doesn't work peer-to-peer without a coordinator.

CRDTs — each character carries its identity

CRDT's insight: if every character has a globally-unique ID, merging becomes pointer lookup, not position arithmetic.

Consider the same scenario, but under a CRDT:

Starting state: each char has an ID. "cat" is [A1:c, A2:a, A3:t].
User A inserts "h". A assigns it a new globally-unique ID — say (ClientA, 1) — and notes its left neighbour: null (because "h" is at position 0).
- A's document: [ClientA-1:h, A1:c, A2:a, A3:t].
- A sends the op: { id: ClientA-1, char: "h", left: null }.
User B inserts "s". B assigns (ClientB, 1) and left = A3 (because "s" is after "t").
- B's document: [A1:c, A2:a, A3:t, ClientB-1:s].
- B sends: { id: ClientB-1, char: "s", left: A3 }.

Now the ops propagate.

A receives B's op. A looks up left: A3 in its local document, finds it, and inserts after it. A's document becomes [ClientA-1:h, A1:c, A2:a, A3:t, ClientB-1:s]. ✓

B receives A's op. B looks up left: null → that means "at the start". B inserts at the beginning. B's document becomes [ClientA-1:h, A1:c, A2:a, A3:t, ClientB-1:s]. ✓

Converged. No transform function. No position arithmetic. Pointer lookup + insert.

The magic property: this works regardless of the order ops arrive. If A's op arrives at B before B applied its own op, the result is the same. If a third client applies A's op first, then B's, then a fourth, the result is still the same. That's what makes CRDTs conflict-free.

Concurrent insert at the same position — the tiebreak

The above example had A insert at position 0 and B insert at position 3. What if both insert at the same position (say, both at position 3)?

Real CRDT implementations handle this with a tiebreak rule: both inserts have the same left neighbour, so sort them by their IDs (typically lexicographic on clientId). Every client applies the same rule, so every client converges to the same order. Deterministic.

This is why CRDTs need client IDs that every client agrees on. (clientId, counter) tuples are the canonical shape: clientId is stable per user/device, counter is a monotonically-increasing per-client number.

The cost CRDTs carry

Everything in CRDTs is paid for in metadata. Every character has an ID (typically 10–20 bytes of overhead per character for a text CRDT). Deleted characters can't actually be deleted — they become tombstones because future ops might still reference them as left-neighbours. A long-lived document can carry tens of megabytes of metadata.

Modern CRDT libraries (Yjs especially) compress aggressively:

Runs of characters from the same client with sequential counters are stored as a struct rather than N separate entries.
Tombstones can often be garbage-collected when the library detects no live ops reference them.
Yjs uses integer indexing within its internal "struct store" that keeps the practical overhead to maybe 2–3x the raw text size.

That's workable. A 100k-char document with 5 active editors might use 300–500kB in CRDT state. Not bad for the convergence guarantees.

When to pick which

The short answer:

Server-authoritative + text editor → OT. Google Docs-shaped products. Server does the hard math; clients are thin. Use an existing library (share.js, hosted OT services).
Any P2P / local-first / offline-heavy → CRDT. The whole reason CRDTs exist is the order-independence property that makes P2P feasible. Yjs and Automerge are the production choices.
Server-authoritative + structured content (not plain text) → probably LWW at the right granularity, not OT or CRDT. See the collaborative editor system-design post — Figma uses property-level LWW, Notion uses block-level, Linear uses document-level for things like issue descriptions.

The longer answer: the algorithm is maybe 15% of the hardness. The other 85% is presence, undo, schema migration, access control, persistence. Pick the one that fits your authority model and move on to those.

The thing every post gets wrong about CRDTs

One common oversimplification: "CRDTs are better than OT because they don't need a server." That's true and also misleading.

CRDTs being P2P-capable doesn't mean CRDT-backed products don't use servers. Notion, Linear, Obsidian Sync — they all use servers, and several use CRDTs. A server here is a coordinator and durability layer, not an authority. It doesn't transform ops (the CRDT math runs client-side); it just stores and forwards.

The difference from OT's server:

OT server: CPU-heavy (runs transform functions), stateful (holds the ops log, must order them), hard to scale horizontally.
CRDT server: CPU-light (just forwards ops), relatively stateless (holds the canonical CRDT blob; any client can rebuild it from ops), easy to scale.

This is part of why newer products (Linear, Obsidian) tend toward CRDT even when they're server-hosted: the server role is cheaper to operate.

Primary sources

Operational Transformation — Wikipedia — decent foundational overview with references to the original Ellis & Gibbs paper and the TP1/TP2 conditions.
Google Drive Realtime API architecture — the public doc on Google's OT-based Realtime API (deprecated, but the architecture write-up still reads well).
Automerge — Concepts — the reference CRDT library. Good intuitions for ID generation and merge semantics.
Yjs — Internal data structures — production-grade CRDT used by Obsidian, Relm, JupyterLab. Their compression techniques are the state of the art.
Martin Kleppmann — A critique of the CAP theorem + CRDT papers — academic grounding.
Figma — How Figma's multiplayer technology works — not CRDT/OT, but explicitly explains why they chose property-level LWW instead.
Local-first software — Ink & Switch — the seven ideals and the argument for why P2P + CRDT is worth the complexity.