An Important Problem and an Important Algorithm – Consensus and Paxos

Consensus is such an important problem and Paxos is such an important algorithm that I have to talk about them. If you ever see a web site or blog talking about distributed fundamentals and systems but not mentioning Consensus and Paxos you had better skip that web site or blog.

Why is it important? First it is so important in real world. We live under so many laws which were passed by consensus. We have so many meetings and emails at work mainly to reach agreement on something. In computer world processes/machines need to agree on many things such as committing or aborting a transaction, choosing a leader, granting a lock, replicated state machine, atomic broadcast and so on. These things are often the foundation of distributed systems and critical to higher level applications and services.

Then you might wonder why we need another post about Consensus and Paxos since there are already so many of them. The funny thing is that the reason we need another post is exactly because there are so many. More specifically here are my reasons.

  1. It is easy to get confused since there are so many articles. Even worse some articles have wrong information.
  2. An interesting observation is that some articles which explain Paxos are more complicated than the paper itself
  3. I won’t be able to explain Paxos very well in one post but I will point out one way by which if one follows he/she should be able to understand Paxos algorithm well but the route might not an easy one to everybody.
  4. I will probably talk about the implementation of Paxos a little bit.

I also have the below suggestions.

  1. Set right expectation. To mortals including me Consensus is a hard problem and Paxos is a complicated algorithm. Some posts do give people wrong impression that it is easy then people might not take enough time and efforts to understand it or work on it.
  2. Don’t read too many articles/papers (unfortunately I already did). You just need to read the good and essential ones. I will let you know what are the good and essential ones.
  3. If you get questions you can’t figure out after trying hard, ask people around or ask on Quora (http://www.quora.com). There are quite a few folks there who understand distributed systems very well.

Consensus problem itself is easy to understand. Basically how do multiple people/parties/participants/processes/nodes/… reach agreement? Let’s look at real world examples first. You are Friend A. You and Friend B want to set a time to have dinner together. It is easy. You can just call B to confirm a time which works for both of you. However please try to answer the below questions.

  • What if B is not available when you call?
  • What if you can’t call and you can only send messages?
  • How do you know your message reaches B?
  • If you ask B to send acknowledge back how does B know you receive his/her acknowledgement?

Let’s add Friend C to the case. Then one person among A,B and C can be the coordinator to communicate with the other two and negotiate a time. This is more work but still easy in real life. But I can ask the same questions earlier and more questions as below

  • How do you decide who will be the coordinator?
  • How do you want to reach consensus? by everyone agreeing on the same time or the majority agreeing on the same time is enough or something else?

Finally let’s say that 100 friends want to set a time to have a dinner party. This may need a dedicated coordinator to do the work. The coordinator will probably need to keep a list of all people. This is even not easy in real life and the process may need a few days or even more. I will ask all the questions earlier and also more questions as below.

  • What if the dedicated coordinator becomes sick and can’t perform the duty?
  • What if there are multiple coordinators and they work at the same time?
  • Who keep the list of all people and what if the list is missing?

How do we make sure that consensus can be made in the end?

Answering all the questions will help you understand how to solve the consensus problem in computer world.A few observation about computer world as below

  1. You can set simple rules to force everyone/every process/… to agree on the same thing. Let’s say that Process A and B agree on something. We can set a requirement and force C to agree on that if A and B agrees. In real life it might not work well that way
  2. You cannot make phone calls. Most likely “people” communicate with messages.
  3. The algorithm has to be efficient. You can’t use a few days to reach consensus on the time on the dinner party.
  4. It has to fault tolerant. One or more processes crashing should not block the consensus process if possible
  5. One process might take arbitrary time to process information. A message might take arbitrary time to deliver.

Let’s formalize what properties/requirements a consensus algorithm should satisfy.

  1. Agreement: everyone agrees on the same time for the dinner party even if he/she might not be able to come (in computer world a process might agree on something and then crash)
  2. Validity: the time should not be a random one from nowhere but be some value actually proposed by someone (thus the time makes sense)
  3. Termination: it can’t take forever to decide what time to have dinner. Otherwise it is meaningless. Interestingly it looks like it is happening in US politics. It seems that it will take forever for the GOP and democratic party to reach consensus on many issues like the immigration reform.

The agreement property and the validity property are called safety properties and the termination property is called liveness property. If you don’t understand safety and liveness, don’t worry. I just want to mention them since they will be useful concepts in my future posts. But simply speaking safety is about what is allowed and what is not allowed or nothing ‘bad’ happens. Liveness is about something must happen or something ‘good’ eventually happen.

Now you might have a natural question. Can consensus problem be solved? In real life it is possible in most cases but it looks like that it is not possible in some cases (e.g. no peace in some places of the world due to endless war/conflicts) but you are not sure. Optimistic people may say that 20 years later the war will be over and the peace will come. The pessimistic people may say that it is doomed. But scientifically speaking is consensus possible in all cases or not?

To answer that question we need to look at two system models as below. Basically how do we model the real world in computer world? There are two kinds of system models: synchronous systems and asynchronous systems. Something to keep in mind is that both are trying to model the real world but both are not accurate representation of real world. There is always gap between research/theory and practice.

A synchronous system has the below properties [3].

  • Every message is delivered within a fixed and known amount of time.
  • Every process takes steps at a fixed and know rate.
  • Everyone process has a clock, and all the clocks are synchronized.

On the contrary an asynchronous system doesn’t have these properties.

Here comes another natural question. Are the real world systems synchronous or asynchronous? Just look at the properties and think about it.

Can consensus be solved in synchronous system? The good news is yes. It can be solved even if there are server crashes. I will leave how to solve it as an exercise.

Can consensus be solved in asynchronous system? Unfortunately it is proved [1] that it is not possible. The famous result and proof is called FLP impossibility result. FLP are the acronym of the three authors’ last names. This result is not trivial and it won Dijkstra Prize. The result can be expressed very simply as below. If you want to understand more please feel free to read the paper [3] which is very easy to read.

A consensus is impossible in an asynchronous system if there is even one crash failure.

If consensus can’t be solved in asynchronous system then why do we still strive to find good distributed algorithms for it? it turned out that the systems in reality are not completely asynchronous. The important algorithm Paxos can actually guarantee that consensus can be reached as long as the system is eventually asynchronous (a system model closer to real world). Simply put it works very well in reality. It looks that this post is getting long and explaining Paxos may take even more text so I will stop here and continue the talk in my next Post.

[1] Michael J. Fischer, Nancy Lynch, and Mike Paterson | Impossibility of Distributed Consensus with One Faulty Process | April 1985 | http://discolab.rutgers.edu/classes/cs519-old/papers/p374-fischer.pdf

[2] wiki page for Consensus problem http://en.wikipedia.org/wiki/Consensus_(computer_science)

[3] Seth Gilbert, Nancy Lynch | Perspectives on the CAP Theorem | 2012 | http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6122006