0:11

In this lecture, we are going to discuss another classical algorithm for

distributed mutual exclusion, and

we also wrap up our discussion of the mutual exclusion topic.

So the Ricart-Agrawala Algorithm that you've already seen for mutual

exclusion requires replies from all of the processes in the group,

which is one of the reasons that it has a high bandwidth:

its bandwidth is order N.

The key idea in Maekawa's Algorithm, the new algorithm,

is that you need to get replies from only some of the processes in the group,

not all the processes, but only some.

But you also need to ensure that only one process is given access to the critical

section at any point in time, which means that safety is guaranteed.

0:45

So Maekawa's Algorithm works as follows:

each process Pi is associated with a voting set called Vi.

A voting set consists of a few other processes from the group, and

each process must belong to its own voting set Vi.

Now, the voting set is not all the other processes in the group, which is

essentially the approach of the Ricart-Agrawala Algorithm; instead,

the voting set is only a small subset of the group.

However, you need to make sure that, given any two voting sets Vi and

Vj of processes Pi and Pj respectively,

the intersection of these two voting sets is non-empty.

This means that there's at least one process that belongs to

both Vi as well as Vj.

So given any pair of voting sets, Vi and Vj, you need to make sure that at least

one process is in common between those two voting sets.

This might sound familiar to you; in fact, it is the same concept, the same idea,

as quorums, which you have seen elsewhere in the course.

However, here the usage is slightly different, of course.

2:46

So let's see an example.

So suppose I have four processes in the system; of course,

this is a perfect square.

So we can put them in a matrix as shown here: p1, p2, p3, p4.

And if you select a process's row and column as its voting set,

then you get the voting sets shown on the left side of the slide.

So consider the process p1: its row consists of the processes p1 and p2, and

its column consists of the processes p1 and p3.

So its voting set consists of those three processes, p1, p2, and

p3, as shown by this circle over here.

Similarly, V2 which is the voting set for p2 consists of the processes p1,

p2 and p4, and so on and so forth.

You notice that any pair of voting sets intersects in at least one,

and in many cases two, of the processes in the system.

For instance, V1 and V4 intersect in two processes, p2 and p3,

and similarly for the other pairs.
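To make the row-plus-column construction concrete, here is a short Python sketch. This is my own illustration, not code from the lecture: the function name and the 0-based process indices are invented for the example. It builds these voting sets for any perfect-square N and checks the pairwise-intersection property:

```python
import math

def voting_sets(n):
    """Maekawa-style voting sets for n = k*k processes in a k x k grid:
    V[i] = (i's row) union (i's column), which always includes i itself."""
    k = math.isqrt(n)
    assert k * k == n, "this construction assumes n is a perfect square"
    sets = {}
    for i in range(n):
        r, c = divmod(i, k)
        row = {r * k + j for j in range(k)}   # all processes in i's row
        col = {j * k + c for j in range(k)}   # all processes in i's column
        sets[i] = row | col
    return sets

V = voting_sets(4)   # processes 0..3 stand in for p1..p4
# V[0] == {0, 1, 2}: p1's row {p1, p2} union its column {p1, p3}
# V[3] == {1, 2, 3}: p4's row union column; V[0] & V[3] == {1, 2}
assert all(V[i] & V[j] for i in V for j in V)   # every pair intersects
```

Any pair of row-plus-column sets must share at least one process, because Pi's row crosses Pj's column (and vice versa) somewhere in the grid.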

4:08

Okay, so those are the key differences from the Ricart-Agrawala Algorithm.

Anyway, let's look at how the algorithm actually works.

So first, the initial state of a given process.

Again, we describe the algorithm by describing what happens at a given

process.

The state of a given process is initially Released, which means that it is

currently not holding the lock and is not in the critical section.

Also, its voted variable is false, meaning that it has not yet

voted for any other process to get access to the critical section.

Now, when this process Pi wants to enter the critical section, it first sets its

state to Wanted, meaning it wants to enter the critical section.

Then it multicasts a request message to all the processes in its own voting set Vi.

Notice that this voting set Vi also includes Pi itself.

So when Pi sends out the request, it also receives its own request, which it has to process.

And I'll describe how this processing is done on the next slide.

Then the process, after sending out this request message, waits for

a reply, or vote, message from all the processes in its own voting set.

And that includes a vote from itself.

When it has all these votes, it can then set its state to Held and

proceed into the critical section.

5:58

Now, suppose a process Pi receives a release message from a process Pj.

Recall from the previous slide that a release message is sent

when Pj has just exited the critical section,

informing all of Pj's voting set members that it has exited.

Pi then looks at its queue of waiting requests.

Remember that Pi may be not just in Pj's voting set, but

also in the voting sets of other processes.

And these other processes may be waiting for a vote from Pi.

If Pi's queue is empty, it means that Pi doesn't have any outstanding requests,

so it sets its voted variable to false and stops at that point.

However, if there's something in the queue,

then Pi dequeues the head of the queue, say a request from a process Pk.

Remember, this means that Pk's voting set contains Pi and

that Pk has previously sent a request to Pi, which was queued.

That queueing happened earlier,

when Pk's request arrived while Pi was unable to vote for it.

In this case, the entry for Pk is dequeued and a reply message is

sent to Pk, meaning that Pi has now voted for Pk to enter the critical section.

And so Pi sets its voted variable to true.
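The enter and release steps above can be sketched as a small Python simulation. This is my own illustration, not code from the lecture: message delivery is replaced by direct method calls, the class and method names are invented, and the rule for handling an incoming request (queue it if the receiver is in the critical section or has already voted, otherwise vote immediately) is the standard Maekawa rule that the queueing described above assumes.

```python
from collections import deque

class MaekawaProcess:
    def __init__(self, pid, voting_set):
        self.pid = pid
        self.voting_set = voting_set   # includes this process itself
        self.state = "RELEASED"
        self.voted = False
        self.queue = deque()           # requests queued while we can't vote
        self.votes = set()             # replies (votes) collected so far

    def request_cs(self, procs):
        self.state = "WANTED"
        self.votes.clear()
        for p in self.voting_set:      # multicast request, including to self
            procs[p].on_request(self.pid, procs)

    def on_request(self, from_pid, procs):
        # Standard rule: vote only if not in the CS and not already voted.
        if self.state == "HELD" or self.voted:
            self.queue.append(from_pid)
        else:
            self.voted = True
            procs[from_pid].on_reply(self.pid)

    def on_reply(self, from_pid):
        self.votes.add(from_pid)
        if self.votes == self.voting_set:
            self.state = "HELD"        # all votes in: enter the CS

    def release_cs(self, procs):
        self.state = "RELEASED"
        for p in self.voting_set:      # multicast release to the voting set
            procs[p].on_release(self.pid, procs)

    def on_release(self, from_pid, procs):
        if self.queue:                 # pass the vote to the next requester
            self.voted = True
            procs[self.queue.popleft()].on_reply(self.pid)
        else:
            self.voted = False
```

Because each process casts at most one vote at a time, two processes can never both assemble a full voting set, which is exactly the safety argument discussed next. With concurrent requests interleaved differently, this same voting rule can also produce the deadlock shown later.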

So why does this algorithm guarantee safety?

Well, when a process Pi receives replies from all its voting set members in Vi,

including itself, no other process Pj could have received replies from

all its voting set members in Vj.

This is because Vj intersects Vi in at least one process Pk,

and Pk can cast only one reply, or vote, at a time.

Pk already sent its vote to Pi, which means that it could not have voted for

Pj as well.

And so this means that safety is guaranteed by the Maekawa Algorithm.

For liveness, a process needs to wait for

at most N-1 other processes to finish the critical section,

and you might think this guarantees liveness.

However, Maekawa's Algorithm has a subtle behavior in its original form

that can actually violate liveness because it can result in a deadlock.

Here is an example of a deadlock.

Again, consider another example with four processes, P1, P2, P3, P4.

Suppose all four processes request access to the critical section.

P1 gets some replies but is waiting for P3's reply.

P3, in turn, is waiting for P4's reply, since P4 is in its voting set.

P4 is waiting for P2's reply, and P2, in turn, is waiting for P1's reply.

Now we've completed a cycle, or a circle, among these processes, and

this is called a deadlock.

There is not going to be any more progress in the system,

because these processes are going to be waiting for these replies forever.

So the original Maekawa's Algorithm, as we've just seen, is deadlock prone, and

these deadlocks can occur.

There are of course variants of Maekawa's Algorithm that have been published

that address this issue and that are free from deadlocks.

Returning to the original Maekawa's Algorithm, let's analyze its performance.

The bandwidth involved in entering the critical section is the square root of

N request messages that you send out to your voting set members.

And then there are square root of N replies that you receive back, so

that's 2 times the square root of N messages.

For the exit operation, you're simply sending a release message to all your voting

set members, and that's just square root of N messages.

These numbers are better than the Ricart-Agrawala Algorithm,

which had bandwidth of order N.

You might think square root of N is still a large number.

It's not constant for sure, but it can be fairly small.

So if N is a million, for instance,

then the square root of N is 1,000, which is a fairly small number.
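As a quick sanity check on those numbers, here is a tiny Python calculation. This is my own, not from the lecture; the function names are invented, and the Ricart-Agrawala count uses the usual convention that the N-1 deferred replies are charged to the entry, leaving the exit free:

```python
import math

def maekawa_cost(n):
    """Messages per critical-section entry/exit with sqrt(N) voting sets."""
    k = math.isqrt(n)              # voting-set size, about sqrt(N)
    return {"enter": 2 * k,        # k requests out + k votes back
            "exit": k}             # k release messages

def ricart_agrawala_cost(n):
    """Messages per entry with Ricart-Agrawala: order N."""
    return {"enter": 2 * (n - 1),  # N-1 requests + N-1 replies
            "exit": 0}             # deferred replies are counted on entry

print(maekawa_cost(1_000_000))          # {'enter': 2000, 'exit': 1000}
print(ricart_agrawala_cost(1_000_000))  # {'enter': 1999998, 'exit': 0}
```

For a million processes, Maekawa exchanges a few thousand messages per critical-section use, versus roughly two million for Ricart-Agrawala.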

10:14

So why is the square root of N the right size for

the voting sets in Maekawa's Algorithm?

Once again,

each voting set is of size K, and each process belongs to M voting sets.

Now, the total number of voting set members is K times N,

because each of the N processes has K members in its voting set.

And so, K times N is the total number of voting set members.

Of course, because the voting sets overlap and

intersect, processes may be repeated in this count of K times N.

But each process belongs to exactly M voting sets, so K times N divided by M

should be equal to N, because each process appears exactly M times in this K times N.

So K*N/M = N.

And if you evaluate this by canceling N out on both sides, you get that K = M.

This means that the size of the voting set should equal the number

of voting sets that each process is in.

Okay, so let's take that equation and hold it to the side.

Now, consider a process Pi.

Right, so we count the number of voting sets in the system as follows.

First, there is Pi's own voting set; that's the +1 over here.

There are K members in Pi's voting set, that is, K processes,

and each of those processes has its own voting sets.

Together, that gives the total number of voting sets in the system.

11:36

Each of these K voting set members of Pi is involved

in M voting sets in total.

But one of them is Pi's own voting set,

so the remaining voting sets are simply (M-1)*K.

Okay, so that's the total number of voting sets present in the system.

However, the number of voting sets should equal the number of processes in

the system, so we set N equal to this value: N = (M-1)*K + 1.

Now, we substitute in the value M = K from the earlier equation, and

you get N = (K-1)*K + 1.

Solving this, you will see that K equals about the square root of N, give or

take a little bit.

And this explains why K = M = square root of N is the right size for the voting

sets, minimizing the message overhead: to minimize K, we set N = (M-1)*K + 1,

and then we use the earlier equation, K = M, to get K = square root of N.
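The last step, solving N = (K-1)*K + 1 for K, can be checked numerically. This snippet is mine, not from the lecture; it just applies the quadratic formula to that equation:

```python
import math

def voting_set_size(n):
    # N = (K - 1) * K + 1  =>  K^2 - K + (1 - N) = 0
    # Taking the positive root of this quadratic in K:
    return (1 + math.sqrt(4 * n - 3)) / 2

for n in (4, 100, 1_000_000):
    k = voting_set_size(n)
    # K satisfies the original equation ...
    assert abs(k * (k - 1) + 1 - n) < 1e-6
    # ... and is sqrt(N), give or take a little bit
    assert abs(k - math.sqrt(n)) < 1
```

For N equal to one million, K comes out to about 1000.5, confirming that the voting sets should have roughly a thousand members each.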

Now, Maekawa's Algorithm of course does not handle failures.

Neither does the ring-based algorithm, the centralized algorithm, or

the Ricart-Agrawala Algorithm.

Of course, there are fault-tolerant versions of all of these algorithms that

exist out there in the literature.

13:14

One such industry system, Google's Chubby, is what we're going to discuss

briefly in this slide and the next.

Chubby provides only advisory locks. This means that clients must themselves

acquire the lock before they access the critical resource, or the object itself.

So if clients forget to acquire the lock,

then mutual exclusion may be violated.

That is why these are known as advisory locks.

So Chubby, of course, uses a cluster of servers, and

there are five servers by default in a Chubby cell.

One of the servers is marked as the Master.

Again, this Master is elected using one of the leader election protocols that we've

seen before.

In this particular figure, Server D is the Master.

Chubby allows clients not only to do locking,

but also to write small configuration files.

And Chubby relies on the Paxos consensus algorithm.

Chubby, again, is a group of servers with one elected as the Master;

the other, non-master servers are just slaves.

They replicate the same information as the Master.

When a client wants to read one of these small configuration files

or a lock file, it sends a read request to the Master, which can serve it locally.

So the Master serves all read requests locally.

When a client wants to write, however, it sends a write request to the Master,

which then forwards it to all the servers in the group.

When a majority of them respond back to the Master,

you reach a quorum in the system.

Then the Master can send back a response to the client acknowledging that the write

has been done.

And this is where mutual exclusion comes into play.

If some other client has already gotten access to the critical section,

it means that it has permissions

from at least a quorum of the servers, via the Master.

The next request that goes to the Master will not reach a quorum, and

it will wait until that first process exits its critical section.
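The write path just described can be sketched as follows. This is a toy illustration of majority quorums, not Chubby's actual code: the Server class, its methods, and the key names are all invented for the example.

```python
class Server:
    """Toy stand-in for one server in a five-server Chubby cell."""
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def apply(self, key, value):
        # A live server applies the write and acknowledges it.
        if self.up:
            self.data[key] = value
        return self.up

def quorum_write(servers, key, value):
    """The Master forwards the write to every server and acknowledges the
    client only once a majority of the cell (a quorum) has responded."""
    acks = sum(1 for s in servers if s.apply(key, value))
    return acks > len(servers) // 2

# A write succeeds even with one server down: 4 of 5 still form a quorum.
cell = [Server(), Server(), Server(up=False), Server(), Server()]
assert quorum_write(cell, "lockfile", "holder=client-1")
```

Because any two majorities of the same five servers must overlap in at least one server, two conflicting lock writes cannot both reach a quorum, which is the same intersection idea as the voting sets earlier in this lecture.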

15:27

Now, wrapping up our discussion of mutual exclusion:

it is a very important problem in cloud computing systems.

There are a variety of classical algorithms.

We've discussed four.

The Central, the Ring-based, the Ricart-Agrawala Algorithm and

the Maekawa Algorithm.

They are not all fault-tolerant but

they are fairly efficient and they all try to guarantee safety and liveness.

And as we progress down from Central to Maekawa,

the algorithms have gotten better and better in terms of performance.

Industry uses a variety of systems for coordination and for

mutual exclusion, these include Chubby at Google which is used for

locking and also to maintain small configuration files.

And also Apache ZooKeeper, which is an open-source system that is used for

coordination, and I encourage you to look up Apache ZooKeeper on the web.
