Distributed mutual exclusion

WORK IN PROGRESS

This page is an example of a design doc for a distributed protocol. The protocol ensures mutual exclusion—each node can acquire the global lock and be guaranteed that nobody else holds the lock. (For example, this might be useful if the node needs to perform some actions with the guarantee that no other nodes are performing similar actions at the same time.)

Preface

  • Goals:
    • Clients can acquire a global lock with the guarantee that no other client currently holds the lock. When they are done with the lock, they can release it so that other clients may acquire it.
  • Desired fault model:
    • Asynchronous unreliable message delivery with fail-stop node crashes.
      • (This protocol does not actually tolerate fail-stop node crashes.)
  • Challenges:
    • The unreliable network means that can be difficult for nodes to tell whether a message is "current" or not.

Protocol

  • Kinds of node:
    • One unique server node that manages the lock state.
    • Any number of client nodes that wish to acquire the lock.
  • State at each kind of node:
    • At the server:
      • epoch number: int
        • init: 0
      • current holder: client id who currently holds the lock or None if nobody holds the lock
        • init: None
    • At the client:
      • epoch number: int
        • init: 0
      • mode: {None, Held, Releasing, Released}
        • init: None
        • None means this client did not participate in the current epoch
          • The client does not know whether other clients hold the lock or not.
        • Held means this client is the unique lock holder for the current epoch
          • In this state, the client knows that no other client currently holds the lock in any epoch.
        • Releasing means this client was the unique lock holder for the current epoch, but has begun the process of releasing the lock back to the server.
          • The client does not know whether other clients hold the lock or not.
        • Released means this client was the unique lock holder for the current epoch, but has successfully finished releasing the lock back to the server.
          • The client does not know whether other clients hold the lock or not.
  • Messages:
    • Request
      • Source: Client
      • Destination: Server
      • Contents: none
      • Sent whenever the client wants to acquire the lock.
      • What happens at the destination when it is received?
        • The server checks whether the lock is currently held.
        • If yes, it ignores the Request.
        • If no, it sets the current holder to the sender of the request and sends a Grant message with the current epoch number back to the sender. The server also sets a GrantRetransmit timer with the current epoch number.
    • Grant
      • Source: Server
      • Destination: Client
      • Contents: epoch number (int)
      • Sent when the server decides which client will get the lock in the given epoch number. Also retransmitted until the server receives a Release message from the client in that epoch number.
      • What happens at the destination when it is received?
        • The client checks the epoch number. If it is less than or equal to client's current epoch number, the message is ignored.
        • (It is an invariant of the system that a Grant message cannot be received for a greater epoch number unless this client is in mode None or Released.)
        • Otherwise, the client sets its current epoch number to the Grant message's epoch number, and sets mode = Held.
        • The client can now perform whatever critical section it needed to hold the lock for. When the client is done with its critical section, it should arrange that a ClientRelease timer gets set for the current epoch number.
    • Release
      • Source:
      • Destination:
      • Contents:
      • When is it sent?
      • What happens at the destination when it is received?
    • ReleaseAck
      • Source:
      • Destination:
      • Contents:
      • When is it sent?
      • What happens at the destination when it is received?
  • Timers:

    • GrantRetransmit

    • List all the kinds of timers

    • For each kind of timer, list:
      • What kind of node sets it?
      • Contents:
      • When is it set?
      • What happens when it fires?

Conclusion

  • Goals achieved:
    • For each goal, explain why the protocol and analysis (invariants, stable properties, fault tolerance analysis, and performance analysis together) guarantee that the system achieves the goal. -- Limitations:
    • What goals are only achieved sometimes?