This post describes the considerations that came up in designing the document-change data structure and built-in collaborative editing feature in the upcoming version of CodeMirror (a code editor system). It is something of a followup to the Collaborative Editing in ProseMirror post.
I won't introduce anything new or exciting here—the design I ended up with is a very boring non-distributed operational transformation. In a way, this post is publishing a negative result: I looked into a bunch of interesting alternative approaches, but found they didn't meet the requirements for this system.
Since collaborative editing is a tricky field with a lot of different solutions, all of which have their awkward trade-offs, I think the path towards this boring end result might still provide a useful resource for people working on similar systems.
There's quite a disconnect between the scientific literature on collaborative editing and what most collaborative editors are doing. The literature is largely concerned with truly distributed collaboration, where a number of peers, taking equivalent roles, directly exchange updates among themselves and still somehow converge on the same document. A typical web system, on the other hand, has clients talking to a server, which orchestrates the exchange of updates.
These problems are very different, and if you're aiming to implement the latter, about 95% of collaborative editing literature is discussing a problem you do not have. Working in a truly distributed fashion is very attractive in principle, of course. It is strictly more general, and has connotations of escaping the over-centralized modern web. But it does drag in a lot of problems, even outside of the document convergence—peers have to store the document along with, depending on the technique used, its entire history. They have to somehow discover each other, maintain connections, and so on.
So for the core of a JavaScript-based library, I decided that support for a distributed model wasn't important enough to justify the additional complexity. I'll get back to what that complexity looks like later.
(I'm sorry, I'm going to explain operational transformation again, just like a hundred other blog posts. Hang tight.)
Operational transformation involves, in its simplest form, a transformation function that takes two changes A and B, which both apply to the same document, and produces a new pair Aᴮ (a version of A that applies to the document produced by B) and Bᴬ (B but applies to the document created by A), such that A + Bᴬ and B + Aᴮ (where + indicates change composition) produce the same document.
This can be visualized as a diagram like this:
Docˢ
A ↙ ↘ B
Docᴬ Docᴮ
Bᴬ ↘ ↙ Aᴮ
Docᴬᴮ
You can roughly think of the transformation as moving a change through another change. Sometimes this just involves adjusting the positions it refers to (moving “insert A at 5” through “delete 0-1” yields “insert A at 4”), sometimes it it is more involved (two changes can't delete the same content, so in that case some of the deletions have to be dropped in the transformed change).
For a plain-text document model, such a transformation function is not very hard to define.
The other part of an operational transformation system is its control part, which determines how changes are shared and when they are transformed. A simple centralized setup can do something like this:
Keep a list of unsent local changes.
Periodically try to send those to the server, which will either accept them or, if another client sent a change first, reject them because your changes start in a base document that doesn't match the current central document.
When the server sends you changes, transform your unsent changes and the remote change over each other. Apply the transformed remote change locally, and replace your stored local changes with the transformed version, so that the next time you try to send changes you send those.
Because this makes everything proceed in lockstep, it is easy to see that this converges—the only difference in order in the way clients apply changes is when remote changes come while there are local changes pending. This is exactly the scenario that the transform function's convergence handles.
When there is more than one remote or local change involved, the situation is slightly more tricky. If the change representation that your transform function operates on allows series of changes to be composed into a single change, you can first do that and then perform a single transform.