_____________

				SECTION 3

			     Raymond Cheng & Irene Zhang
			     _____________


Table of Contents
_________________

1 Lab 2b architecture
2 Client
.. 2.1 State
.. 2.2 Functions
.. 2.3 Tasks
3 Servers
.. 3.1 State
.. 3.2 Functions
.. 3.3 Primary
.. 3.4 Backup
4 At-most-once RPC design
5 Ordering of events


1 Lab 2b architecture
=====================

Overview:
- In this part of the lab, you are building key-value servers
- Correctness must be maintained as long as (1) the view service is
  never down, and (2) at least one server is up at all times
- Liveness/progress is guaranteed as long as servers are up and resent
  messages are eventually delivered (common asynchronous network model)
- Works with network partitions, packet drops
- Only 1 primary active at a time

Assumptions:
- Viewservice never crashes
- Each client has only 1 outstanding put/get

2 Client
========

2.1 State
~~~~~~~~~

  - current view
  - op id
  - no lock! Clients are closed loop, so a client only executes its
    next op after the last one returns


2.2 Functions
~~~~~~~~~~~~~

  - MakeClerk (initialization)
  - Get
  - Put


2.3 Tasks
~~~~~~~~~

  1. Get view from view service
     - When the client has no view yet
     - When an operation has failed
  2. Send gets and puts to primary
     - What happens if the primary doesn't respond or responds that it
       is not the primary? Sleep for a while and check the view service
       again
     - How does a server deduplicate 2 identical requests (retries)?
       Label operations with a unique ID
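
The client tasks above can be sketched as a retry loop. This is only a
minimal sketch, not the lab's reference code: PutArgs, ErrWrongServer,
and the getView/sendPut hooks are hypothetical names standing in for
the real RPCs so that the example runs on its own.

```go
package main

import (
	"errors"
	"time"
)

// View is the view handed out by the view service.
type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

// PutArgs is a hypothetical request format. ClientID and OpID together
// identify the operation, so a retry looks identical to the first attempt.
type PutArgs struct {
	Key, Value string
	ClientID   int64
	OpID       int64
}

// ErrWrongServer is what a server returns when it is not the primary.
var ErrWrongServer = errors.New("wrong server")

// Clerk holds the client state from 2.1. There is no lock because the
// client is closed loop.
type Clerk struct {
	id        int64
	opID      int64
	view      View
	retryWait time.Duration

	// Hypothetical hooks standing in for the real RPCs.
	getView func() View
	sendPut func(primary string, args PutArgs) error
}

// Put retries until some primary accepts the operation.
func (ck *Clerk) Put(key, value string) {
	ck.opID++ // new op: the id is chosen once, before the first attempt
	args := PutArgs{Key: key, Value: value, ClientID: ck.id, OpID: ck.opID}
	for {
		if ck.view.Primary == "" {
			ck.view = ck.getView() // task 1: no view yet
		}
		if ck.view.Primary != "" {
			if err := ck.sendPut(ck.view.Primary, args); err == nil {
				return
			}
		}
		// No reply, or the server said it is not the primary:
		// sleep for a while, then check the view service again.
		time.Sleep(ck.retryWait)
		ck.view = ck.getView()
	}
}
```

The point to notice is that args is built outside the loop, so every
retry carries the same op id and the server can recognize it.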


3 Servers
=========

- State machine is a cross product of the states of the nodes (view
  service, primary, backup, idle servers, client)
- Any client op causes state changes at each node and messages from one
  node to another (with possible retries)

3.1 State
~~~~~~~~~

- current view
- lock
- data - map[string]string
- log of operations

3.2 Functions
~~~~~~~~~~~~~

  - primary put handler, backup put handler
  - primary get handler, backup get handler
  - primary update view, backup update view
  - tick


3.3 Primary
~~~~~~~~~~~

  1. Find view from view service. Do this in the tick() function.
     - What if the primary is out of date between ticks? Stop
       processing ops! But it doesn't need to update its view
       immediately.
     - How does the primary find out that it is out of date? The
       backup may refuse an op.
  2. Respond to view service with the view.
     - What happens if the primary does not respond? The view service
       might declare the primary dead, e.g. if something holds the
       lock for too long
     - When does the primary switch to the new view?
  3. On new view with new backup, send the complete key/value database
     to the backup
     - What else might have to be transferred? operation log, which we'll
       cover later
     - What happens if it can't contact the backup? Can't make progress
  4. Handle Get and Put. Store keys and values in a map[string]string
     - What about concurrent puts and gets?
  5. Forward ops to backup.
     - What if the primary can't reach the backup? Primary can't proceed
       because the backup might have been promoted!
     - What about gets? They need to be forwarded too otherwise an old
       primary might serve out-of-date gets. Backing them up ensures
       this "split brain" case doesn't happen. Also easy for ensuring
       that the primary and backup make same response to read if they
       get them at different times (although not crucial for this lab)
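
A minimal sketch of steps 4 and 5, assuming a single handler for both
gets and puts. The Op type, the Handle method, and the forward hook are
hypothetical names; forward stands in for the RPC to the backup.

```go
package main

import (
	"errors"
	"sync"
)

// View is the view handed out by the view service.
type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

// ErrWrongServer is returned when this server is not the primary.
var ErrWrongServer = errors.New("wrong server")

// Op is a hypothetical request: Kind is "Get" or "Put".
type Op struct {
	Kind       string
	Key, Value string
}

// Primary holds the server state from 3.1 (the op log is left out here).
type Primary struct {
	mu   sync.Mutex
	me   string
	view View
	data map[string]string

	// forward is a hypothetical stand-in for the RPC to the backup.
	forward func(backup string, viewnum uint, op Op) error
}

// Handle serves one Get or Put.
func (p *Primary) Handle(op Op) (string, error) {
	p.mu.Lock() // one op at a time, so the backup sees the same order
	defer p.mu.Unlock()

	if p.view.Primary != p.me {
		return "", ErrWrongServer
	}
	// Forward gets as well as puts. If the backup refuses or cannot be
	// reached, do not apply the op and do not answer from local state.
	if p.view.Backup != "" {
		if err := p.forward(p.view.Backup, p.view.Viewnum, op); err != nil {
			return "", err
		}
	}
	if op.Kind == "Put" {
		p.data[op.Key] = op.Value
	}
	return p.data[op.Key], nil
}
```

Holding the lock across the forward call keeps the sketch short and
serializes ops, but it is a design choice with a cost: a slow backup
delays everything else that needs the lock.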


3.4 Backup
~~~~~~~~~~

  1. Find view from view service. Do this in the tick() function.
     - What if the backup is out of date between ticks? Can ignore or
       check view (doesn't matter)
     - How does the backup find out that it is out of date? Might get a
       message from a client for the primary and find out that it has
       been promoted
  2. Respond to view service with the view.
     - When does the backup switch its view? Only after it finishes
       getting the latest state from the primary
  3. Handle Get and Put. Store keys and values in a map[string]string
     - What about concurrent puts and gets? Primary should only send one
       at a time
  4. Tell the primary when the primary has been demoted but is still
     sending ops
     - In what case does this happen? When the primary can contact the
       backup but not the view service
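
A minimal sketch of step 4, assuming the primary stamps every forwarded
op with its view number. ForwardArgs and HandleForward are hypothetical
names, and comparing view numbers for strict equality is just one
simple way to decide; it also makes a backup that is behind refuse ops
until its next tick.

```go
package main

import (
	"errors"
	"sync"
)

// View is the view handed out by the view service.
type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

// ErrWrongServer tells the sender that it is not this server's primary.
var ErrWrongServer = errors.New("wrong server")

// ForwardArgs is a hypothetical format for ops forwarded by the primary.
type ForwardArgs struct {
	Viewnum    uint // the sender's view number
	Kind       string
	Key, Value string
}

// Backup holds the backup's copy of the state.
type Backup struct {
	mu   sync.Mutex
	me   string
	view View
	data map[string]string
}

// HandleForward applies an op forwarded by the primary, or refuses it.
func (b *Backup) HandleForward(args ForwardArgs) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refuse if this server is no longer the backup (for example, it
	// was promoted) or if the sender is working from a different view.
	if b.view.Backup != b.me || args.Viewnum != b.view.Viewnum {
		return ErrWrongServer
	}
	if args.Kind == "Put" {
		b.data[args.Key] = args.Value
	}
	return nil
}
```

The refusal is how an old primary that can no longer reach the view
service learns that it should stop processing ops.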


4 At-most-once RPC design
=========================

  1. Use a unique client ID and an op id for each put or get
  2. Have servers store the response to each op
  3. Have the primary check whether it has already seen and processed
     each op. If so, return the last response
     - What happens when primary fails? The backup should have all of
       the op state, so it can detect the same set of duplicates
     - What state does the backup need to store? The primary needs to
       send the operation log in the transfer. Then the backup needs to
       update its operation log when getting backup ops from primary
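
A minimal sketch of the duplicate check, assuming op ids increase per
client and (as stated in the assumptions) each client has only one
outstanding op, so remembering the last op per client is enough. Store,
lastOp, and Clone are hypothetical names.

```go
package main

// lastOp records the most recent operation processed for one client.
type lastOp struct {
	OpID  int64
	Reply string
}

// Store is the replicated state: the key/value map plus the op state
// needed to detect duplicates.
type Store struct {
	data map[string]string
	last map[int64]lastOp // client ID -> last op processed and its reply
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{data: map[string]string{}, last: map[int64]lastOp{}}
}

// Put applies the op at most once and returns the reply.
func (s *Store) Put(clientID, opID int64, key, value string) string {
	// An id at or below the last one seen can only be a retry or a
	// delayed copy, so return the stored reply without applying it.
	if prev, ok := s.last[clientID]; ok && opID <= prev.OpID {
		return prev.Reply
	}
	s.data[key] = value
	reply := "OK"
	s.last[clientID] = lastOp{OpID: opID, Reply: reply}
	return reply
}

// Clone copies both maps. A transfer to a new backup has to carry the
// op state as well as the data, or the backup cannot detect duplicates
// after it is promoted.
func (s *Store) Clone() *Store {
	c := NewStore()
	for k, v := range s.data {
		c.data[k] = v
	}
	for id, op := range s.last {
		c.last[id] = op
	}
	return c
}
```

Without the check, a delayed copy of an old put could overwrite a newer
value written by another client in the meantime.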

5 Ordering of events
====================

When to contact the view service:
1. The server should only Ping periodically (in tick()),
   never in the Put/Get RPC handlers
2. The client should only ask for a new view if it:
   - doesn't know who the primary is
   - is talking to a server that is not the primary
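
A minimal sketch of a tick() that is the only place the server talks to
the view service. The ping and transfer hooks are hypothetical
stand-ins for the real RPCs. Adopting a view with a new backup only
after the transfer succeeds is an assumption of this sketch, one
possible ordering rather than the required one.

```go
package main

import (
	"errors"
	"sync"
)

// View is the view handed out by the view service.
type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

// ErrUnreachable is a sample error the hypothetical hooks may return.
var ErrUnreachable = errors.New("unreachable")

// Server holds the state needed by tick().
type Server struct {
	mu   sync.Mutex
	me   string
	view View
	data map[string]string

	// Hypothetical hooks standing in for the real RPCs.
	ping     func(me string, viewnum uint) (View, error)
	transfer func(backup string, data map[string]string) error
}

// tick is called periodically. The Put/Get handlers never ping; they
// only read the view cached here.
func (s *Server) tick() {
	s.mu.Lock()
	cur := s.view.Viewnum
	s.mu.Unlock()

	next, err := s.ping(s.me, cur) // no lock held during the RPC
	if err != nil {
		return // keep the old view and try again on the next tick
	}

	s.mu.Lock()
	defer s.mu.Unlock()
	if next.Viewnum == s.view.Viewnum {
		return
	}
	// New view with a new backup: send the complete database first.
	if next.Primary == s.me && next.Backup != "" && next.Backup != s.view.Backup {
		snapshot := make(map[string]string, len(s.data))
		for k, v := range s.data {
			snapshot[k] = v
		}
		if err := s.transfer(next.Backup, snapshot); err != nil {
			return // stay in the old view until the backup has the state
		}
	}
	s.view = next
}
```

In this sketch the transfer runs with the lock held, which blocks
handlers until it finishes. That keeps ops from slipping in between the
snapshot and the view change, at the price of a longer pause.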