Design questions for Lab 4

Part 2

ShardStoreServer:

  • How are your ShardStoreServers going to keep track of whether they are currently reconfiguring or not?
  • Is it possible for clients to send a key-value operation to the wrong group? If not, why not? If yes, how will you handle this case?
  • How do servers find out about new configurations? What about clients?
  • How will you make sure a server does not skip a configuration?
  • Are there any messages received by the ShardStoreServer that don't need to be replicated?
  • If a ShardMove message gets dropped, who is responsible for detecting and retransmitting? How do you handle duplicate ShardMove messages?
  • How are you going to handle the issue of repeated Query(-1) commands from clients? (See section slide 20.)
  • When are all the AMOApplications for shards initialized? Which node(s) initializes the AMOApplications?
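
The questions above about tracking reconfiguration, not skipping a configuration, and duplicate ShardMove messages can be explored with a small state sketch. This is one possible design under stated assumptions, not the expected answer. The class ConfigTracker and all of its method names are hypothetical and are not part of the lab framework. It assumes configuration numbers increase by one and that 0 means "no configuration yet".

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the lab framework. */
class ConfigTracker {
    // Assumption: 0 means "no configuration installed yet".
    private int currentConfigNum = 0;
    // Shards whose transfer for the current configuration is still outstanding.
    private final Set<Integer> pendingShards = new HashSet<>();

    /** Ask for the direct successor instead of the latest configuration. */
    int nextConfigToQuery() {
        return currentConfigNum + 1;
    }

    /** The group is reconfiguring while any shard transfer is outstanding. */
    boolean reconfiguring() {
        return !pendingShards.isEmpty();
    }

    /** Accepts only the direct successor, and only when idle. */
    boolean tryStart(int configNum, Set<Integer> shardsInFlight) {
        if (reconfiguring() || configNum != currentConfigNum + 1) {
            return false; // would skip a configuration, or one is in progress
        }
        currentConfigNum = configNum;
        pendingShards.addAll(shardsInFlight);
        return true;
    }

    /** Returns true only the first time a shard move for this configuration is seen. */
    boolean shardMoved(int configNum, int shard) {
        if (configNum != currentConfigNum) {
            return false; // stale message, or one from a configuration not yet started
        }
        return pendingShards.remove(shard); // false for a duplicate
    }
}
```

In a replicated group, a sketch like this would only stay consistent if every call that changes it happens in the same order on every replica, which ties back to the question of which messages need to be replicated.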

Part 3

Transactions:

  • How to handle transactions that both read and write the same key (e.g., swap)?
    • More specifically, in a swap operation, you need to read both keys and then write each value to the other key. How will your system read the values and gather them together? How will your system then communicate the values to the participants who need to do the writes?
  • How to handle transactions that abort and need to retry?
    • The application layer (i.e., the test framework) should never observe aborts. Your system must hide them by retrying. Which part of your system will be responsible for retrying? (Participants, Coordinators, ShardMaster, Clients?)
  • Is it possible for a transaction to abort in one configuration and then be retried in a later configuration?
  • How to handle reconfiguration during a transaction?
  • At what point in the protocol can we respond to the client with the result? Who is responsible for sending this response?
    • Hint: The answer is not "after phase 2 is complete".
  • Given your answer to the previous question, what is the point of phase 2? Can we get rid of phase 2 entirely?
  • How to handle a dropped PrepareOk message?
  • How to handle the case where you receive an Abort for a transaction you've never heard of before? What about the case where that happens and then later you receive a Prepare for that same transaction?
  • How to avoid livelock, where a continuous stream of transactions prevents reconfiguration because the group is always executing a transaction?
  • How to handle a dropped client response message across a reconfiguration boundary?
    • How is this different (if at all) from the corresponding single-key scenario from part 2?
  • How to handle a dropped CommitOK / AbortOK?
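
For the swap question near the top of this list, the gather step can be sketched in isolation. This is a minimal illustration under assumptions, not a prescribed design. The class SwapGather and its methods are hypothetical. It assumes that some single node collects the values read for both keys before any write is issued, and it leaves open which node that is and which messages carry the values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the lab framework. */
class SwapGather {
    private final String keyA;
    private final String keyB;
    // Values reported so far, keyed by the key that was read.
    private final Map<String, String> reads = new HashMap<>();

    SwapGather(String keyA, String keyB) {
        this.keyA = keyA;
        this.keyB = keyB;
    }

    /** Records a reported read; a duplicate report simply overwrites the earlier one. */
    void onRead(String key, String value) {
        if (key.equals(keyA) || key.equals(keyB)) {
            reads.put(key, value);
        }
    }

    /** True once a value has been reported for both keys. */
    boolean ready() {
        return reads.containsKey(keyA) && reads.containsKey(keyB);
    }

    /** Computes the writes for the swap: each key receives the other key's value. */
    Map<String, String> writes() {
        if (!ready()) {
            throw new IllegalStateException("missing reads");
        }
        Map<String, String> result = new HashMap<>();
        result.put(keyA, reads.get(keyB));
        result.put(keyB, reads.get(keyA));
        return result;
    }
}
```

The sketch makes one constraint visible: no participant can apply its write until the value held by the other participant has reached it, so the values have to travel in some message after the reads and before the writes.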