CRDTs go brrr | Notion

A few years ago I was really bothered by an academic paper.

Some researchers in France put together a comparison showing lots of ways you could implement realtime collaborative editing (like Google Docs). They implemented lots of algorithms - CRDTs and OT algorithms and stuff. And they benchmarked them all to see how they perform. (Cool!!) Some algorithms worked reasonably well. But others took upwards of 3 seconds to process simple paste operations from their editing sessions. Yikes!

Which algorithm was that? Well, this is awkward but .. it was mine. I mean, I didn't invent it - but it was the algorithm I was using for ShareJS. The algorithm we used for Google Wave. The algorithm which - hang on - I knew for a fact didn't take 3 seconds to process large paste events. Whats going on here?

I took a closer look at the paper. In their implementation when a user pasted a big chunk of text (like 1000 characters), instead of creating 1 operation with 1000 characters, their code split the insert into 1000 individual operations. And each of those operations needed to be processed separately. Do'h - of course it'll be slow if you do that! This isn't a problem with the operational transformation algorithm. This is just a problem with their particular implementation.

The infuriating part was that several people sent me links to the paper and (pointedly) asked me what I think about it. Written up as a Published Science Paper, these speed comparisons seemed like a Fact About The Universe. And not what they really were - implementation details of some java code, written by a probably overstretched grad student. One of a whole bunch of implementations that they needed to code up.

"Nooo! The peer reviewed science isn't right everybody! Please believe me!". But I didn't have a published paper justifying my claims. I had working code but it felt like none of the smart computer science people cared about that. Who was I? I was nobody.

Even talking about this stuff we have a language problem. We describe each system as an "algorithm". Jupiter is an Algorithm. RGA is an Algorithm. But really there are two very separate aspects:

The black-box behaviour of concurrent edits. When two clients edit the same region of text at the same time, what happens? Are they merged, and if so in what order? What are the rules?
The white-box implementation of the system. What programming language are we using? What data structures? How well optimized is the code?

If some academic's code runs slowly, what does that actually teach us? Maybe it's like tests. A passing test suite suggests, but can never prove that there are no bugs. Likewise a slow implementation suggests, but can never prove that every implementation of the system will be slow. If you wait long enough, somebody will find more bugs. And, maybe, someone out there can design a faster implementation.

Years ago I translated my old text OT code into C, Javascript, Go, Rust and Swift. Each implementation has the same behaviour, and the same algorithm. But the performance is not even close. In javascript my transform function ran about 100 000 times per second. Not bad! But the same function in C does 20M iterations per second. That's 200x faster. Wow!

Were the academics testing a slow version or the fast version of this code? Maybe, without noticing, they had fast versions of some algorithms and slow versions of others. It's impossible to tell from the paper!

Making CRDTs fast

So as you may know, I've been getting interested in CRDTs lately. For the uninitiated, CRDTs (Conflict-Free Replicated Data types) are fancy programming tools which let multiple users edit the same data at the same time. They let you work locally with no lag. (You don't even have to be online). And when you do sync up with other users & devices, everything just magically syncs up and becomes eventually consistent. The best part of CRDTs is that they can do all that without even needing a centralized computer in the cloud to monitor and control everything.

I want Google Docs without google. I want my apps to seamlessly share data between all my devices, without me needing to rely on some flakey startup's servers to still be around in another decade. I think they're the future of collaborative editing. And maybe the future of all software - but I'm not ready to talk about that yet.

But most CRDTs you read about in academic papers are crazy slow. A decade ago I decided to stop reading academic papers and dismissed them. I assumed CRDTs had some inherent problem. A GUID for every character? Nought but madness comes from those strange lands! But - and this is awkward to admit - I think I've been making the same mistake as those researchers. I was reading papers which described the behaviour of different systems. And I assumed that meant we knew how the best way to implement those systems. And wow, I was super wrong.

How wrong? Well. Running this editing trace, Automerge (a popular CRDT, written by a popular researcher) takes nearly 5 minutes to run. I have a new implementation that can process the same editing trace in 56 milliseconds. Thats 0.056 seconds, which is over 5000x faster. It's the largest speed up I've ever gotten from optimization work - and I'm utterly delighted by it.

Lets talk about why automerge is currently slow, and I'll take you through all the steps toward making it super fast.

Wait, no. First we need to start with:

Making CRDTs fast

What is automerge?