Distributed mutual exclusion
WORK IN PROGRESS
This page is an example of a design doc for a distributed protocol. The protocol
ensures mutual exclusion—each node can acquire the global lock and be
guaranteed that nobody else holds the lock. (For example, this might be useful
if the node needs to perform some actions with the guarantee that no other nodes
are performing similar actions at the same time.)
Preface
- Goals:
- Clients can acquire a global lock with the guarantee that no other client
currently holds the lock. When they are done with the lock, they can
release it so that other clients may acquire it.
- Desired fault model:
- Asynchronous unreliable message delivery with fail-stop node crashes.
- (This protocol does not actually tolerate fail-stop node crashes.)
- Challenges:
- The unreliable network means that can be difficult for nodes to tell
whether a message is "current" or not.
Protocol
- Kinds of node:
- One unique server node that manages the lock state.
- Any number of client nodes that wish to acquire the lock.
- State at each kind of node:
- At the server:
- epoch number: int
- current holder: client id who currently holds the lock or None if nobody holds the lock
- At the client:
- epoch number: int
- mode: {None, Held, Releasing, Released}
- init: None
- None means this client did not participate in the current epoch
- The client does not know whether other clients hold the lock or not.
- Held means this client is the unique lock holder for the current epoch
- In this state, the client knows that no other client currently holds the lock in any epoch.
- Releasing means this client was the unique lock holder for the current epoch, but has begun the process of releasing the lock back to the server.
- The client does not know whether other clients hold the lock or not.
- Released means this client was the unique lock holder for the current epoch, but has successfully finished releasing the lock back to the server.
- The client does not know whether other clients hold the lock or not.
- Messages:
- Request
- Source: Client
- Destination: Server
- Contents: none
- Sent whenever the client wants to acquire the lock.
- What happens at the destination when it is received?
- The server checks whether the lock is currently held.
- If yes, it ignores the Request.
- If no, it sets the current holder to the sender of the request and
sends a Grant message with the current epoch number back to the
sender. The server also sets a GrantRetransmit timer with the
current epoch number.
- Grant
- Source: Server
- Destination: Client
- Contents: epoch number (int)
- Sent when the server decides which client will get the lock in the
given epoch number. Also retransmitted until the server receives a
Release message from the client in that epoch number.
- What happens at the destination when it is received?
- The client checks the epoch number. If it is less than or equal to
client's current epoch number, the message is ignored.
- (It is an invariant of the system that a Grant message cannot be
received for a greater epoch number unless this client is in mode
None or Released.)
- Otherwise, the client sets its current epoch number to the Grant
message's epoch number, and sets mode = Held.
- The client can now perform whatever critical section it needed to
hold the lock for. When the client is done with its critical
section, it should arrange that a ClientRelease timer gets set for
the current epoch number.
- Release
- Source:
- Destination:
- Contents:
- When is it sent?
- What happens at the destination when it is received?
- ReleaseAck
- Source:
- Destination:
- Contents:
- When is it sent?
- What happens at the destination when it is received?
-
Timers:
Conclusion
- Goals achieved:
- For each goal, explain why the protocol and analysis (invariants,
stable properties, fault tolerance analysis, and performance analysis
together) guarantee that the system achieves the goal.
-- Limitations:
- What goals are only achieved sometimes?