This is my attempt to describe the basic paxos algorithm in a programmer friendly way. I try to merge the description in Leslie Lamport's The Part-Time Parliament paper and subsequent TLA+ specification.
Process- a Paxos participant (priest in PTP paper).Ballot- a referendum on a single decree. Each Ballot is identified by a unique ballot number, and Ballots are ordered by the ballot number.Decree- represents the value being agreed upon, i.e. the value being voted on.Ballot Number- a unique number made up of a pair - proposal number, and process id. Ballot Numbers are sorted by proposal number, followed by process id.Process id- a unique id given to each process.Proposal number- a monotonically increasing sequence number maintained by each process,-1indicates none, valid values are>= 0.Vote v- a vote cast by a process, is a tuple containing the process id of the voter, the ballot number, and the decree being voted. Votes are ordered by ballot numbers.Ledger- each process must maintain some data in persistent storage - the ledger represents the storage data structure.quorumSize-(Number of Participants + 1) / 2, where Number of participants is an odd number.
MaxVote(votes)- function that returns the max vote cast, wheremax()is based on the ordering of votes by ballot number.owner(b)- returns the process id that initiated the ballot numberb.
outcome- this is value of thedecreein the ledger, or NULL if there is nothing written yet.lastTried- The ballot number that the processplast tried to initiate, or(-1,p.id)if none.maxBal- The maximum ballot number that processpever agreed to participate in, or(-1,p.id)ifphas never agreed to participate in a ballot.maxVBal- The ballot number in whichplast voted or(-1,p.id)ifpnever voted.maxVal- The value of the decree associated withmaxVBal, i.e. the decree thatplast voted, or NULL ifpnever voted.
-
status- the status of processp, which can be one of the following:idle- not conducting or trying to begin a ballottrying- trying to begin ballot numberledger.lastTriedpolling- Conducting ballot numberledger.lastTried- On startup the status is assumed to be
idle.
-
prevVotes- the set of votes received inLastVotemessages for the current ballot (i.e. ballot number inledger.lastTried). -
quorum- the set of processes includingp, that responded withLastVotemessages for current ballot, only meaningful whenstatus == polling. -
voters- the set of processes includingp, from whomphas receivedVotedmessages in the current ballot, only meaningful whenstatus == polling. -
decree- ifstatus == polling, then the decree of the current ballot, otherwise meaningless.
NextBallot- aka PREPARE 1a - message sent by the ballot conductor.LastVote- aka PROMISE 1b - message sent by participant to ballot conductor.BeginBallot- aka ACCEPT 2a - messages sent by the ballot conductor.Voted- aka ACCEPTED 2b - message sent by participant to ballot conductor.Success- message sent by ballot conductor to all processes once the ballot is successfully completed.
The content and timing of each message is described below.
The following algorithm must be implemented for each process that is participating in a Basic Paxos run.
This step is invoked when a new ballot must be started, perhaps on client request.
- If
status!=idle, reject request. Ifstatus!=idle,pis already conducting a ballot. - Let
b = ledger.lastTried + 1, where1is added to the proposal number. First valid proposal number is thus0, as initial value ofledger.lastTriedis(-1, p.id). - Set
ledger.lastTriedtob. - Set
p.statustotrying. - Set
p.prevVotesto empty set. - To each process participating in basic paxos, send
NextBallot(ledger.lastTried)PREPARE 1a message, including to itself.
This is executed by each process that receives the NextBallot message.
- Let
b=NextBallot.b. - if
b > ledger.maxBalthen- Set
ledger.maxBaltob - To the sender of ballot
b, i.e.owner(b), sendLastVote(PROMISE 1b) message with following contentsvoter-p.id(i.e. the id of the process sending theLastVote)vVote containingb- ballot numberledger.maxVBalledger.maxVal
- Set
- if
b == ledger.lastTriedandstatus == tryingthen- Set add vote
vto the setprevVotes. Note that two votes that are from different participants (i.e.LastVote.voteris different), must be considered as distinct votes here. - If count of
prevVoteswith ballotbis>=toquorumSizethen start polling.
- Set add vote
This step is enabled when status=trying and there is a quorum of votes in prevVotes as described above.
- Set
statustopolling - Set
quorumto the set of processes inprevVoteswherev.b=ledger.lastTried - Set
votersto the empty set. - Let
maxVote = MaxVote(prevVotes) - If
maxVotehas a ballotbwith proposal number-1, then setdecreeto the value requested by client. - Else if
maxVotehas a ballotbwith proposal number>=0then setdecreetomaxVote.decree. - Send
BeginBallot(b,decree)ACCEPT 2a to all the participants, including itself.
This is executed by each process that receives the BeginBallot message.
- if
BeginBallot.b >= ledger.maxBal- Set
ledger.maxBaltoBeginBallot.b - Set
ledger.maxVBaltoBeginBallot.b - Set
ledger.maxValtoBeginBallot.decree - Send to the owner of ballot
owner(BeginBallot.b), aVoted(b, id)ACCEPTED 2b message wherebisledger.maxBal, andidis the process sending theVotedmessage.
- Set
This step is enabled when status=polling.
- if
Voted.b == ledger.lastTriedandstatus == polling- Add voter process
Voted.voterto the set ofvoters. - if count of
votersis>=toquorumSizethen- If
ledger.outcomeis NULL then setledger.outcometodecree - Send
Success(ledger.outcome)to all participants, including itself - Set
statustoidle
- If
- Add voter process
- If
ledger.outcomeis NULL then setledger.outcometoSuccess.outcome