Problem Set 2: Primary-backup

Due: Friday, October 20, 11:59pm via Gradescope

Each problem is worth 7 points.

Instructions

An event is either a message delivery, a timer firing, or a message failure or node crash.

If a problem says to "describe an infinite execution", you should describe the infinite sequence of events that occur. Since it is infinite, you cannot just list them, so you have to describe them more generally than that. An event in an infinite execution can be a message delivery, a timer firing, a network failure (drop, duplicate), or a node crashing. Your infinite executions should satisfy "timer fairness", which means that every timer that gets set by a node that doesn't crash eventually gets fired. You do not need to describe the states of your execution, just the events.

If a problem says to "describe a finite execution at a high level", you should describe a finite sequence of events. You do not necessarily need to list out each event explicitly: instead, you can just summarize several events that lead to a certain intermediate state. Use your judgement and try to focus on the most relevant parts of the execution to the problem. Feel free to start your execution in a "fully initialized view" with a primary and a backup, instead of the initial state. You do not need to worry about timer fairness in finite executions. Also, you do not need to describe the states of your execution, just the events.

Unless stated otherwise, assume our default fault model for this course: asynchronous unreliable message delivery with fail-stop node crashes.

The Problems

The first two problems are about just the view server and the ping messages sent to the view server by all other servers. We call this the View Server Ping protocol:

  • 2 kinds of nodes: one view server and any number of other servers
  • State:
    • view server
      • pingSet: set of servers that have pinged since the last PingCheckTimer
      • aliveSet: set of servers considered alive
    • other servers: no relevant state
  • Messages
    • Ping: from server to view server. No data.
  • Timers
    • SendPingTimer:
      • Set by other (non-view) servers. No data.
      • When it fires, the server sends a ping message to the view server and resets the timer.
    • PingCheckTimer
      • Set by the view server. No data.
      • When it fires, the view server overwrites aliveSet with pingSet, and then empties pingSet. Then it resets the timer.

(Note that while this is similar to part of what you will implement in lab 2, you may have to extend it in some ways. Don't necessarily take anything above as instructions for "you should do it this way in lab 2". For example, view numbers are missing on Ping messages in this protocol, but you will need them in lab 2.)

  1. Describe an infinite execution of the View Server Ping protocol where a node crashes but the view server detects it as up forever.

  2. Describe an infinite execution of the View Server Ping protocol where a node does not crash, but the view server detects it as down forever.

In lecture and section we have described all the relevant pieces of the "full" primary-backup protocol, including the view server, state transfer, request forwarding, etc.

  1. For this problem only, consider a modified version of the full primary-backup protocol where we remove state transfer. We call this protocol "primary-backup without state transfer". Instead, as soon as the primary and backup hear from the view server about the new view, they just start accepting requests.

    Describe at a high level a finite execution of the system with only one client that submits only append operations on the key "foo" (and only one outstanding request at a time), where a later append request returns a result that contradicts an earlier append request to the same key. (Surprise Pikachu.)

The next three questions are about the full primary-backup protocol.

Suppose we are in view 3 with a primary \(S_1\) and backup \(S_2\), and state transfer is complete. Client \(C_1\) submits a request \(r\) to \(S_1\) who forwards it to \(S_2\). Call this state the "pre-continuation point" in the execution. Now consider two different continuations of this execution.

  1. Starting from the pre-continuation point, suppose \(S_2\) executes the request and sends an acknowledgement to \(S_1\), but before the acknowledgement arrives, \(S_1\) crashes. The view server detects this and wants to move to the next view. Suppose that \(S_3\) is the only idle server available. Answer the following questions about the new view:

    • Who will be selected as primary in the next view? Who will be selected as backup?
    • After state transfer completes in the new view, has request \(r\) been executed or not?
    • Depending on your answer to the previous bullet point, answer the corresponding question below:
      • If you said that \(r\) has been executed, has \(C_1\) received a response yet? If yes, describe when this response was sent. If no, describe what actions \(C_1\) could take to ensure it gets a response in the future.
      • On the other hand, if you said \(r\) has not been executed, how would \(C_1\) proceed to get it executed?

  1. Starting from the pre-continuation point, now suppose \(S_2\) crashes before executing \(r\). The view server detects this and wants to move to the next view. Suppose that \(S_3\) is the only idle server available. Answer the following questions about the new view:

    • Who will be selected as primary in the next view? Who will be selected as backup?
    • After state transfer completes in the new view, has request \(r\) been executed or not?
    • Depending on your answer to the previous bullet point, answer the corresponding question below:
      • If you said \(r\) has been executed, has \(C_1\) received a response yet? If yes, describe when this response was sent. If no, describe what actions \(C_1\) could take to ensure it gets a response in the future.
      • On the other hand, if you said \(r\) has not been executed, how would \(C_1\) proceed to get it executed?

  1. Using your knowledge of the scenarios from the previous two problems, can \(C_1\) tell whether or not its request has been executed or not? If yes, describe how. If not, explain why not and how \(C_1\) can still make progress without knowing which scenario occurred.

The next three questions consider a few "optimizations" to the full primary-backup protocol.

  1. Suppose we "optimize" the full primary-backup protocol by allowing the view server to pick any idle server as the primary for view \(i + 1\) (even a server that was not the primary or backup of view \(i\)). The rest of the protocol remains unchanged.

    Describe at a high level a finite execution of the system with only one client that submits only append operations on the key "foo" (and only one outstanding request at a time), where a later append request returns a result that contradicts an earlier append request to the same key. (Surprise Pikachu.)

  2. Suppose we "optimize" the full primary-backup protocol by allowing the view server to change the view even if the previous view has not been acknowledged by the primary.

    Describe at a high level a finite execution of the system with only one client that submits only append operations on the key "foo" (and only one outstanding request at a time), where a later append request returns a result that contradicts an earlier append request to the same key. (Surprise Pikachu.)

  3. Suppose we "optimize" the full primary-backup protocol so that when the primary receives a client request it first executes that request locally, immediately replies to the client, and then forwards it to the backup and waits for the backup's acknowledgement before executing any further requests.

    Describe at a high level a finite execution of the system with only one client that submits only append operations on the key "foo" (and only one outstanding request at a time), where a later append request returns a result that contradicts an earlier append request to the same key. (Surprise Pikachu.)

The next two questions ask you determine whether certain scenarios are possible in the full primary-backup protocol or not. Given a property \(P\), we will ask you to categorize it as one of the following:

  • invariant: "always true"
    • On every execution of the system, \(P\) is true in every state throughout that execution.
  • stable: "sometimes false, but once it becomes true it remains true"
    • On every execution of the system, if \(P\) becomes true at some point, it will remain true for the rest of this execution.
    • \(P\) is false at the beginning of some executions (otherwise it would be an invariant).
  • unstable: "sometimes false, and it can go from true to false"
    • There is an execution where \(P\) is true at some point in time and then later false.

  1. For each of the following properties, categorize it as invariant, stable, or unstable.

    If you answer "unstable", justify your answer by describing at a high level a finite execution of the system where the property is true at one point but later false.

    If you answer "invariant" or "stable", you do not need to explain or justify your answer in any way.

    We use the phrase "current view" to mean "the view that the view server is currently advertising". Notice that the "current view" changes when the view server decides to do a view change.

    • "Server \(S_1\) is the primary in view 3".
    • "Server \(S_1\) is the primary of the current view".
    • "If the current view number is greater than 0, then the current view has a primary".
    • "If the current view number is greater than 0, then the current view has a backup".
    • "If the current view number is greater than 0, then the primary of the current view is alive."

  1. For each of the following properties, categorize it as invariant, stable, or unstable.

    If you answer "unstable", justify your answer by describing at a high level a finite execution of the system where the property is true at one point but later false.

    If you answer "invariant" or "stable", you do not need to explain or justify your answer in any way.

    We use the phrase "current view" to mean "the view that the view server is currently advertising". Notice that the "current view" changes when the view server decides to do a view change.

    • "If there is a response message in the network to client \(c\) with sequence number \(n\), then the corresponding request has been executed on the current view's primary's application."
    • "If \(S_1\) is the primary of view \(i\) and \(S_2\) is the backup of view \(i\), and there is a forwarded-request-acknowledgement message with view number \(i\) for request \(r\) in the network from \(S_2\) to \(S_1\), then \(r\) has been executed on \(S_2\)'s application."
    • "If \(S_1\) is the primary of view \(i\) and \(S_2\) is the backup of view \(i\), and there is a forwarded-request-acknowledgement message with view number \(i\) for request \(r\) in the network from \(S_2\) to \(S_1\), then \(r\) has been executed on the current view's backup's application."

  1. Suppose we are in a view with primary \(S_1\) and backup \(S_2\) and that state transfer has completed. Then the view server thinks that \(S_1\) is down, so it moves to the next view, promotes \(S_2\) to primary, and selects idle server \(S_3\) as the new backup. But the message containing information about the new view takes a long time to arrive at \(S_2\). Since \(S_1\) is not actually crashed, clients can continue to execute requests by sending them to \(S_1\), who forwards to \(S_2\), who acknowledges, and so on, as normal.

    Explain briefly (1 to 5 sentences) why is it not a bug that the old view can keep executing even after the view server has moved to a new view. In particular, explain why the new view and the old view cannot operate independently and diverge from each other.