A recent paper suggested a new mathematical point of view on version control. I first found out about it from pijul, a new version control system (VCS) that is loosely inspired by that paper. But if you poke around the pijul home page, you won’t find many details about what makes it different from existing VCSes. So I did a bit of digging, and this series of blog posts is the result.
In the first part (i.e. this one), I’ll go over some of the theory developed in the paper. In particular, I’ll describe a way to think about patches and merging that is guaranteed to never, ever have a merge conflict. In the second part, I’ll show how pijul puts that theory into action, and in the third part I’ll dig into pijul’s implementation.
Before getting into some patch theory, a quick caveat: any real VCS needs to deal with a lot of tedious details (directories, binary files, file renaming, etc.). In order to get straight to the interesting new ideas, I’ll be skipping all that. For the purposes of these posts, a VCS only needs to keep track of a single file, which you should think of as a list of lines.
A patch is the difference between two files. Later in this series we’ll be looking at some wild new ideas, so let’s start with something familiar and comforting. The kind of patches we’ll discuss here go back to the early days of Unix:
In order to actually have a useful VCS, you need to be able to delete lines also. But deleting lines turns out to add some complications, so we’ll deal with them later.
For an example, let’s start with a simple file: my to-do list for this morning.
Looking back at the list, I realize that I forgot something important. Here’s the new one:
To go from the original to-do list to the new one, I added the line with the socks. In the format of the original Unix “diff” utility, the patch would look like this:
The “1a2” line is a code saying that we’re going to add something after line 1 of the input file, and the next bit is obviously telling us what to insert.
Since this blog isn’t a command line tool, we’ll represent patches with pretty diagrams instead of flat files. Here’s how we’ll draw the patch above:
Hopefully it’s self-explanatory, but just in case: an arrow goes from left to right to indicate that the line on the right is the same as the one on the left. Lines on the right with no arrow coming in are the ones that got added. Since patches aren’t allowed to re-order the lines, the lines are guaranteed not to cross.
There’s something implicit in our notation that really needs to be said out loud: for us, a patch is tied to a specific input file. This is the first point where we diverge from the classic Unix ways: the classic Unix patch that we produced using “diff” could in principle be applied to any input file, and it would still insert “* put on socks” after the first line. In many cases that wouldn’t be what you want, but sometimes it is.
The best thing about patches is that they can enable multiple people to edit the same file and then merge their changes afterwards. Let’s suppose that my wife also decides to put things on my to-do list: she takes the original file and adds a line:
Now there are two new versions of my to-do list: mine with the socks, and my wife’s with the garbage. Let’s draw them all together:
This brings us to merging: since I’d prefer to have my to-do list as a single file, I want to merge my wife’s changes and my own. In this example, it’s pretty obvious what the result should be, but let’s look at the general problem of merging. We’ll do this slowly and carefully, and our endpoint might be different from what you’re used to.