How are your ShardStoreServers going to keep track of whether they are
currently reconfiguring or not?
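One possible way to track this is shown below, written in Python for brevity. The names ReconfigState, shards_needed, and shards_to_send are hypothetical and not part of the lab's API. In this sketch, a group counts as reconfiguring while it is still waiting for an incoming shard or for an acknowledgment of an outgoing one. It ignores the very first configuration, where incoming shards have no previous owner.

```python
from dataclasses import dataclass, field


@dataclass
class ReconfigState:
    """Hypothetical per-group bookkeeping for an in-progress reconfiguration."""
    config_num: int = 0
    shards_needed: set = field(default_factory=set)   # incoming shards not yet received
    shards_to_send: set = field(default_factory=set)  # outgoing shards not yet acknowledged

    def reconfiguring(self) -> bool:
        # The move into config_num is finished only when nothing is outstanding.
        return bool(self.shards_needed or self.shards_to_send)

    def start(self, new_num: int, old_owned: set, new_owned: set) -> None:
        # Begin moving from the current config to config new_num.
        self.config_num = new_num
        self.shards_needed = set(new_owned) - set(old_owned)
        self.shards_to_send = set(old_owned) - set(new_owned)

    def shard_received(self, shard: int) -> None:
        self.shards_needed.discard(shard)

    def shard_acked(self, shard: int) -> None:
        self.shards_to_send.discard(shard)


state = ReconfigState()
state.start(1, old_owned={1, 2}, new_owned={2, 3})
print(state.reconfiguring())  # True: shard 3 is still incoming, shard 1 unacknowledged
```

Every replica in the group has to agree on this state. For that reason, these transitions would presumably be applied as commands through the group's replicated log rather than directly when a message is received.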
Is it possible for clients to send a key-value operation to the wrong group?
If not, why not? If yes, how will you handle this case?
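A client may be acting on a stale configuration, so a server needs to check ownership before executing an operation. Below is a minimal sketch of that check. Config, shard_for_key, and can_serve are hypothetical names. When can_serve returns False, the server would reply with some wrong-group error so the client can fetch a newer configuration and retry.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical configuration: a number and a shard -> group id map."""
    num: int
    shard_to_group: dict


def shard_for_key(key: str, num_shards: int) -> int:
    # Hypothetical deterministic mapping (shards numbered from 1); the lab
    # provides its own key-to-shard function.
    return sum(key.encode()) % num_shards + 1


def can_serve(key: str, config: Config, my_group: int, held_shards: set, num_shards: int) -> bool:
    shard = shard_for_key(key, num_shards)
    # Serve the key only if our current config assigns its shard to us AND the
    # shard's data has actually arrived (it may still be in flight mid-move).
    return config.shard_to_group.get(shard) == my_group and shard in held_shards


config = Config(num=1, shard_to_group={1: 100, 2: 200})
print(can_serve("a", config, my_group=200, held_shards={2}, num_shards=2))  # True
```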
How do servers find out about new configurations? What about clients?
How will you make sure a server does not skip a configuration?
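One common approach is for a server to poll for the configuration numbered one past its current one, rather than Query(-1), and to adopt it only when it is not already reconfiguring. A sketch with hypothetical helper names:

```python
def next_query_num(current_num: int) -> int:
    # Ask the ShardMaster for the immediate successor of the current config
    # rather than the latest one, so no number can be jumped over.
    return current_num + 1


def should_adopt(current_num: int, reconfiguring: bool, new_num: int) -> bool:
    # Start a new config only after the previous move has completed, and
    # only if it is exactly the next one.
    return not reconfiguring and new_num == current_num + 1


print(should_adopt(3, False, 4), should_adopt(3, False, 5))  # True False
```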
Are there any messages received by the ShardStoreServer that don't need to be replicated?
If a ShardMove message gets dropped, who is responsible for detecting the
loss and retransmitting it? How do you handle duplicate ShardMove messages?
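Here is one possible division of responsibility. The sending group owns retransmission: it resends on a timer until it sees an ack. The receiving group deduplicates by configuration number. The Sender and Receiver classes and the ShardMove fields below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShardMove:
    config_num: int  # config in which the shard changes owner
    shard: int
    data: dict


class Sender:
    """Old owner: keeps resending each move until it is acknowledged."""

    def __init__(self):
        self.pending = {}  # shard -> ShardMove awaiting an ack

    def send(self, move: ShardMove) -> None:
        self.pending[move.shard] = move

    def on_timer(self) -> list:
        return list(self.pending.values())  # moves to retransmit

    def on_ack(self, config_num: int, shard: int) -> None:
        move = self.pending.get(shard)
        if move is not None and move.config_num == config_num:
            del self.pending[shard]


class Receiver:
    """New owner: installs the first copy of each move and acks every copy."""

    def __init__(self, config_num: int, needed: set):
        self.config_num = config_num
        self.needed = set(needed)
        self.installed = {}

    def on_shard_move(self, move: ShardMove) -> bool:
        """Return True if an ack should be sent."""
        if move.config_num > self.config_num:
            return False  # not in that config yet; the sender will retry later
        if move.config_num == self.config_num and move.shard in self.needed:
            self.installed[move.shard] = dict(move.data)
            self.needed.discard(move.shard)
        # Duplicates are not reinstalled but are still acked, in case the
        # earlier ack was the message that got dropped.
        return True
```

In a replicated group, the install step would presumably be decided through the group's log. Only then would the ack go out.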
How are you going to handle the issue of repeated Query(-1) commands from
clients? (See slide 20 of the section slides.)
When are the AMOApplications for all shards initialized? Which node(s)
initialize the AMOApplications?
Part 3
Transactions:
How to handle transactions that both read and write the same key (e.g., swap)?
More specifically, in a swap operation, you need to read both keys and
then write each value to the other key. How will your system read the
values and gather them together? How will your system then communicate the
values to the participants who need to do the writes?
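Below is a sketch of one possible design. It assumes that each participant returns the values it read in its PrepareOk, and that the coordinator ships the computed writes out with the commit decision. The helper names are hypothetical.

```python
def gather_reads(prepare_oks: list) -> dict:
    """Merge the key -> value reads carried back by each participant's PrepareOk."""
    reads = {}
    for ok in prepare_oks:
        reads.update(ok)
    return reads


def swap_writes(key1: str, key2: str, reads: dict) -> dict:
    # Each key receives the value that was read from the other key.
    return {key1: reads[key2], key2: reads[key1]}


def writes_for(writes: dict, owned_keys: set) -> dict:
    # Each participant only needs the writes that fall in its own shards.
    return {k: v for k, v in writes.items() if k in owned_keys}


reads = gather_reads([{"x": "1"}, {"y": "2"}])  # from two different groups
writes = swap_writes("x", "y", reads)
print(writes_for(writes, {"x"}))  # {'x': '2'}
```

This relies on participants holding their locks from Prepare until the decision arrives. That way, the values read during Prepare are still the current ones when the writes are applied.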
How to handle transactions that abort and need to retry?
The application layer (i.e., the test framework) should never observe
aborts. Your system must hide them by retrying. Which part of your system
will be responsible for retrying? (Participants, Coordinators,
ShardMaster, Clients?)
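Wherever the retry lives, the logic amounts to a loop like the one below. This is a blocking simplification: a real node would be event-driven and would retry on a timer or on receiving an abort. The names run_until_committed and execute are hypothetical.

```python
from typing import Any, Callable, Tuple


def run_until_committed(execute: Callable[[int], Tuple[bool, Any]]) -> Any:
    """Retry one transaction until it commits, so the caller never sees an abort.

    `execute(attempt)` is a hypothetical hook that runs one attempt of the
    transaction and returns (committed, result).
    """
    attempt = 0
    while True:
        committed, result = execute(attempt)
        if committed:
            return result
        # Give each retry a fresh attempt number so messages from an aborted
        # attempt cannot be confused with messages from the new one.
        attempt += 1


outcomes = iter([(False, None), (False, None), (True, "ok")])
print(run_until_committed(lambda attempt: next(outcomes)))  # ok
```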
Is it possible for a transaction to abort in one configuration and then be
retried in a later configuration?
How to handle reconfiguration during a transaction?
At what point in the protocol can we respond to the client with the result?
Who is responsible for sending this response?
Hint: The answer is not "after phase 2 is complete".
Given your answer to the previous question, what is the point of phase 2? Can
we get rid of phase 2 entirely?
How to handle a dropped PrepareOk message?
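Suppose the coordinator retransmits Prepare until it hears back. The participant then only has to answer a repeated Prepare with the vote it already gave. Below is a minimal sketch with hypothetical names and a simple per-key lock table. Releasing locks on commit or abort is omitted.

```python
class Participant:
    def __init__(self):
        self.votes = {}   # txn_id -> vote already sent (True = PrepareOk)
        self.locked = {}  # key -> txn_id holding its lock

    def on_prepare(self, txn_id: str, keys: list) -> bool:
        # A retransmitted Prepare gets the same answer as the original, so a
        # dropped PrepareOk is repaired simply by replying again.
        if txn_id in self.votes:
            return self.votes[txn_id]
        ok = all(k not in self.locked for k in keys)
        if ok:
            for k in keys:
                self.locked[k] = txn_id
        self.votes[txn_id] = ok
        return ok


p = Participant()
print(p.on_prepare("t1", ["a"]), p.on_prepare("t1", ["a"]), p.on_prepare("t2", ["a"]))
# prints: True True False
```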
How to handle the case where you receive an Abort for a transaction you've
never heard of before? What about the case where that happens and then later
you receive a Prepare for that same transaction?
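One way to make an early Abort safe is to remember it. A Prepare for the same transaction that arrives later is then refused, rather than locking keys that nobody will ever release. The sketch below uses hypothetical names. It assumes each retry of a transaction carries a fresh id, since otherwise the remembered abort would also block the retry.

```python
class Participant:
    def __init__(self):
        self.prepared = {}    # txn_id -> set of keys it has locked
        self.aborted = set()  # ids of transactions known to be aborted

    def on_abort(self, txn_id: str) -> None:
        # Release locks if we prepared; either way, remember the outcome.
        self.prepared.pop(txn_id, None)
        self.aborted.add(txn_id)

    def on_prepare(self, txn_id: str, keys: list) -> bool:
        if txn_id in self.aborted:
            return False  # the Abort overtook this Prepare: vote no
        if txn_id in self.prepared:
            return True   # duplicate Prepare
        locked = set().union(*self.prepared.values())
        if locked & set(keys):
            return False
        self.prepared[txn_id] = set(keys)
        return True


p = Participant()
p.on_abort("t1")                  # Abort for a transaction never seen before
print(p.on_prepare("t1", ["a"]))  # False
```

The aborted set grows without bound here. A real design would need some way to garbage-collect it.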
How to avoid livelock where a continuous stream of transactions prevents
reconfiguration because the group is always executing a transaction?
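One option is to give reconfiguration priority. Once a newer configuration is known, the group stops admitting new transactions, lets the ones already in progress finish, and only then starts the move. Refused transactions abort and are retried, possibly in the new configuration. A sketch with hypothetical names:

```python
class Group:
    def __init__(self):
        self.pending_config = None  # number of a newer config waiting to start
        self.active = set()         # transactions prepared but not yet decided

    def on_new_config(self, num: int) -> None:
        self.pending_config = num

    def on_prepare(self, txn_id: str) -> bool:
        # Refuse new work while a reconfiguration is waiting, so the set of
        # active transactions can only shrink.
        if self.pending_config is not None:
            return False
        self.active.add(txn_id)
        return True

    def on_decided(self, txn_id: str) -> None:
        self.active.discard(txn_id)

    def can_start_reconfig(self) -> bool:
        return self.pending_config is not None and not self.active


g = Group()
g.on_prepare("t1")
g.on_new_config(2)
print(g.on_prepare("t2"), g.can_start_reconfig())  # False False
g.on_decided("t1")
print(g.can_start_reconfig())  # True
```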
How to handle a dropped client response message across a reconfiguration boundary?
How is this different (if at all) from the corresponding single-key
scenario from part 2?