A bit more than four years ago I started the xi-editor project. Now I have placed it on the back burner (though there is still some activity from the open source community).

The original goal was to deliver a very high quality editing experience. To this end, the project spent a rather large number of “novelty points”:

I still believe it would be possible to build a high quality editor based on the original design. But I also believe that this would be quite a complex system, and require significantly more work than necessary.

I’ve written the CRDT part of this retrospective already, as a comment in response to a Github issue. That prompted good discussion on Hacker News. In this post, I will touch again on CRDT but will focus on the other aspects of the system design.

Origins

The original motivation for xi came from working on the Android text stack, and confronting two problems in particular. One, text editing would become very slow as the text buffer got bigger. Two, there were a number of concurrency bugs in the interface between the EditText widget and the keyboard (input method editor).

The culprit of the first problem turned out to be the SpanWatcher interface, combined with the fact that modern keyboards like to put a spelling correction span on each word. When you insert a character, all the successive spans bump their locations up by one, and then you have to send onSpanChanged for each of those spans to all the watchers. Combined with the fact that the spans data structure had a naive O(n) implementation, and the whole thing was quadratic or worse.

The concurrency bugs boil down to synchronizing edits across two different processes, because the keyboard is a different process than the application hosting the EditText widget. Thus, when you send an update (to move the cursor, for example) and the text on the other side is changing concurrently, it’s ambiguous whether it refers to the old or new location. This was handled in an “almost correct” style, with timeouts for housekeeping updates to minimize the chance of a race. A nice manifestation of that is that swiping the cursor slowly through text containing complex emoji could cause flashes of the emoji breaking.

These problems have a unifying thread: in both cases there are small diffs to the text, but then the data structures and protocols handled these diffs in a less than optimal way, leading to both performance and correctness bugs.

To a large extent, xi started as an exploration into the “right way” to handle text editing operations. In the case of the concurrency bugs, I was hoping to find a general, powerful technique to facilitate concurrent text editing in a distributed-ish system. While most of the Operational Transformation literature is focused on multiple users collaboratively editing a document, I was hoping that other text manipulations (like an application enforcing credit card formatting on a text input field) could fit into the general framework.

That was also the time I was starting to get heavily into Rust, so it made natural sense to start prototyping a new green-field text editing engine. How would you “solve text” if you were free of backwards compatibility constraints (a huge problem in Android)?

When I started, I knew that Operational Transformation was a solution for collaborative editing, but had a reputation for being complex and finicky. I had no idea how deep the rabbithole would be of OT and then CRDT. Much of that story is told in the CRDT discussion previously linked.

The lure of modular software

There is an extremely long history of people trying to build software as composable modules connected by some kind of inter-module communication fabric. Historical examples include DCE/RPC, Corba, Bonobo, and more recently things like Sandstorm and Fuchsia Modular. There are some partial successes, including Binder on Android, but this is still mostly an unrealized vision. (Regarding Binder, it evolved from a much more idealistic vision, and I strongly recommend reading this 2006 interview about OpenBinder).

When I started xi, there were signs we were getting there. Microservices were becoming popular in the Internet world, and of course all Web apps have a client/server boundary. Within Google, gRPC was working fairly well, as was the internal process separation within Chrome. In Unix land, there’s a long history of the terminal itself presenting a GUI (if primitive, though gaining features such as color and mouse). There’s also the tradition of Blit and then, of course, NeWS and X11.