Download PDF

Understanding Paxos

The Consensus Problem

Suppose you have a collection of computers and want them all to agree on something. This is what consensus is about; consensus means agreement.

Consensus comes up frequently in distributed systems design. There are various reasons why we want it: to agree on who gets access to a resource (mutual exclusion), agree on who is in charge (elections), or to agree on a common ordering of events among a collection of computers (e.g., what action to take next, or state machine replication).

Replication may be the most common use of consensus. We set up collections (clusters) of servers, all of which will have replicated content. This provides fault tolerance: if any server dies, others are still running. It also provides scale: clients can read data from any available server although writing data will generally require the use of all servers and not scale as well. For many applications, though, reads far outnumber writes. To keep the content replicated, we have two design choices. We can funnel all write requests through one coordinator, which will ensure in-order delivery to all replicas or we can send updates to any system but that system will coordinate those updates with its replicas to ensure that all updates are applied in the same order on each replica. In the first case, we need consensus to elect that coordinator. In the second case, we need to run a consensus algorithm for each update to ensure that everyone agrees on the order.

The consensus problem can be stated in a basic, generic manner: One or more systems may propose some value. How do we get a collection of computers to agree on exactly one of those proposed values?

The formal properties for asynchronous consensus are:


Paxos is an algorithm that is used to achieve consensus among a distributed set of computers that communicate via an asynchronous network. One or more clients proposes a value to Paxos and we have consensus when a majority of systems running Paxos agrees on one of the proposed values. Paxos is widely used and is legendary in computer science since it is the first consensus algorithm that has been rigorously proved to be correct.

Paxos simply selects a single value from one or more values that are proposed to it and lets everyone know what that value is. A run of the Paxos protocol results in the selection of single proposed value. If you need to use Paxos to create a replicated log (for a replicated state machine, for example), then you need to run Paxos repeatedly. This is called multi-Paxos. There are some optimizations that could be implemented for multi-Paxos but we will not discuss those here.

Paxos provides abortable consensus. This means that some processes abort the consensus if there is contention while others decide on the value. Those processes that decide have to agree on the same value. Aborting allows a process to terminate rather than be blocked indefinitely. When a client proposes a value to Paxos, it is possible that the proposed value might fail if there was a competing concurrent proposal that won. The client will then have to propose the value again to another run of the Paxos algorithm.

Our assumptions for the algorithm are: