
Lecture 2: More RPC — Whiteboard Descriptions

These are text descriptions of the whiteboard PDF from this lecture.

These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.

Protocol

  • Specifies what code to run and what data to store on each machine
  • Has the goal of solving some problem
    • in a particular fault model
  • Correctness: works on any execution in the fault model

Key-Value Store

HashMap<String, String> with operations:

  • get(key)
  • put(key, value)
  • append(key, new-stuff)

A Key-Value Store is just a HashMap available over the network: it makes the same get/put/append operations available to remote clients.
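To make the three operations concrete, here is a minimal local (non-networked) sketch in Python. The class name `KVStore` is hypothetical. Two behaviors are assumptions not specified in the notes: `get` on a missing key returns `None`, and `append` to a missing key treats the old value as `""` and returns the new value.

```python
from typing import Dict, Optional


class KVStore:
    """Hypothetical in-memory key-value store: a dict exposing get/put/append."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Assumption: a missing key returns None.
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        # Overwrites any existing value for the key.
        self.data[key] = value

    def append(self, key: str, new_stuff: str) -> str:
        # Assumption: a missing key is treated as "", and the new value is returned.
        self.data[key] = self.data.get(key, "") + new_stuff
        return self.data[key]


store = KVStore()
store.put("foo", "bar")
print(store.get("foo"))  # bar
store.append("foo", "baz")
print(store.get("foo"))  # barbaz
```

The networked version exposes exactly these methods; the rest of the lecture is about what changes when the calls cross an unreliable network.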

Diagram: A Client box on the left sends get("foo") to a KV Store (Server) box on the right. The server responds with "bar". Note: serialization converts the request and response data into bytes so they can be sent over the network.

Diagram: Client sends put("foo", "baz") to KV Store.

Remote Procedure Call (RPC)

An RPC abstracts a network interaction as a method call. It consists of two messages:

  • Request message
    • name of method to call
    • all the arguments
  • Response message
    • how the method terminated
    • success/failure + return value
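The two messages above, plus the serialization step from the key-value diagram, can be sketched in Python as follows. The names `Request`, `Response`, `serialize`, and the `deserialize_*` helpers are hypothetical, and JSON over UTF-8 is an assumed wire format chosen for readability; real RPC systems use many different encodings.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List, Optional


@dataclass
class Request:
    method: str            # name of the method to call, e.g. "get"
    args: List[Any]        # all the arguments, in order


@dataclass
class Response:
    ok: bool               # how the method terminated: success or failure
    value: Optional[Any]   # return value (or an error description on failure)


def serialize(msg: Any) -> bytes:
    # Assumption: JSON encoded as UTF-8 bytes.
    return json.dumps(asdict(msg)).encode("utf-8")


def deserialize_request(data: bytes) -> Request:
    return Request(**json.loads(data.decode("utf-8")))


def deserialize_response(data: bytes) -> Response:
    return Response(**json.loads(data.decode("utf-8")))


wire = serialize(Request("get", ["foo"]))
print(wire)  # b'{"method": "get", "args": ["foo"]}'
print(deserialize_request(wire))
```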

Naive RPC

  • Allows calling remote methods
  • Failure model: no failures allowed

Naive RPC works like a local method call — send a request, get a response. But this only works if nothing goes wrong with the network or the server.
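A minimal Python sketch of naive RPC, assuming the "network" is just a direct function call, so every request is delivered exactly once and every response comes back. The server-side `handle` function dispatches on the method name carried in the request; `NaiveClient` is a hypothetical client stub that turns each method call into one request and one response.

```python
from typing import Any, Dict, List, Optional


def handle(data: Dict[str, str], method: str, args: List[str]) -> Any:
    """Server side: dispatch on the method name carried in the request."""
    if method == "get":
        return data.get(args[0])
    if method == "put":
        data[args[0]] = args[1]
        return None
    if method == "append":
        data[args[0]] = data.get(args[0], "") + args[1]
        return data[args[0]]
    raise ValueError(f"unknown method {method!r}")


class NaiveClient:
    """Client stub: each call sends one request and waits for one response.

    Assumption: calling handle() directly stands in for the network, so
    nothing is ever dropped, delayed, or duplicated (no failures allowed).
    """

    def __init__(self, server_data: Dict[str, str]) -> None:
        self.server_data = server_data

    def get(self, key: str) -> Optional[str]:
        return handle(self.server_data, "get", [key])

    def put(self, key: str, value: str) -> None:
        handle(self.server_data, "put", [key, value])

    def append(self, key: str, new_stuff: str) -> str:
        return handle(self.server_data, "append", [key, new_stuff])


client = NaiveClient({})
client.put("foo", "bar")
print(client.get("foo"))  # bar
```

This is only correct under the "no failures" model; the next section lists what breaks that assumption.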


Standard Fault Model

What can go wrong in the standard fault model?

  • Drop — messages can be lost entirely
  • Delay / reorder — messages can arrive late or out of order
  • Duplicate — the same message can arrive more than once
  • Machine crashes — a machine can stop running at any time
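One way to get a feel for these faults is a toy delivery function. This Python sketch is a hypothetical model, not part of the lecture: each message may be dropped or duplicated with the given probabilities, the surviving messages are shuffled (modeling delay and reordering only as arbitrary arrival order), and an optional `crash_after` models the receiver stopping after processing that many messages. A seeded `random.Random` makes runs reproducible.

```python
import random
from typing import List, Optional


def unreliable_deliver(sent: List[str], rng: random.Random,
                       p_drop: float = 0.2, p_dup: float = 0.2,
                       crash_after: Optional[int] = None) -> List[str]:
    """Return the messages in the order (and multiplicity) the receiver sees them."""
    delivered: List[str] = []
    for msg in sent:
        if rng.random() < p_drop:
            continue                         # drop: the message is lost entirely
        delivered.append(msg)
        if rng.random() < p_dup:
            delivered.append(msg)            # duplicate: the same message arrives twice
    rng.shuffle(delivered)                   # delay/reorder: arrival order is arbitrary
    if crash_after is not None:
        delivered = delivered[:crash_after]  # crash: the receiver stops processing
    return delivered


rng = random.Random(452)
print(unreliable_deliver(["m1", "m2", "m3", "m4"], rng))
```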

Failure Scenarios

Under the standard fault model, even a simple RPC can fail in multiple ways:

Diagram (Scenario 1 — request dropped): Client sends a request ("req") toward Server. The request is dropped (marked with a red X) before reaching the server. The server never sees the request.

Diagram (Scenario 2 — response dropped): Client sends a request that successfully reaches the Server. The server processes it and sends a response, but the response is dropped (marked with a red X) before reaching the client.

Note that the two scenarios are indistinguishable from the client's perspective — in both cases, the client sent a request and never got a response. But the outcomes are very different: in scenario 1 the server never executed the request, while in scenario 2 it did.
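The indistinguishability argument can be checked with a tiny Python simulation of a single append RPC. The function `run_scenario` is hypothetical, and `None` stands for "the client got no response". Both scenarios give the client the same observation, yet the server's state differs.

```python
from typing import Dict, Optional, Tuple


def run_scenario(drop_request: bool, drop_response: bool) -> Tuple[Optional[str], Dict[str, str]]:
    """Simulate one append("foo", "x") RPC; return (client's view, server state)."""
    server_data: Dict[str, str] = {}
    client_saw: Optional[str] = None        # None models "no response arrived"
    if not drop_request:
        # The request reaches the server, which executes it.
        server_data["foo"] = server_data.get("foo", "") + "x"
        if not drop_response:
            client_saw = server_data["foo"]
    return client_saw, server_data


s1 = run_scenario(drop_request=True, drop_response=False)   # scenario 1
s2 = run_scenario(drop_request=False, drop_response=True)   # scenario 2
print(s1)  # (None, {})
print(s2)  # (None, {'foo': 'x'})
```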

Retry

The client's solution: if you don't get a response, retry — send the request again.
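A client-side retry loop might look like this Python sketch. The helper `call_with_retry` is hypothetical: `send` makes one attempt and returns `None` when no response arrives (for example, after a timeout). Capping the number of attempts is an assumption; a real client might retry indefinitely or back off between attempts.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], Optional[T]], max_attempts: int = 5) -> T:
    """Resend the same request until a response arrives (or give up)."""
    for _ in range(max_attempts):
        response = send()
        if response is not None:
            return response
    raise TimeoutError(f"no response after {max_attempts} attempts")


# Simulated network: the first attempt gets no response, the second succeeds.
attempts = iter([None, "bar"])
print(call_with_retry(lambda: next(attempts)))  # bar
```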

Diagram (Scenario 1 with retry — request dropped): Client sends a request toward Server. The request is dropped (red X). The client retries: it sends the request again, and this time it reaches the Server. The server executes it and responds. (Circled in blue: the client's retry logic.)

Diagram (Scenario 2 with retry — response dropped): Client sends a request that reaches the Server. The server processes it, but the response is dropped (red X). The client retries: it sends the request again, and the server receives it a second time. (Circled in blue: the client's retry logic.)

The problem: in scenario 2, the server executes the request twice. For an operation like put, this may be harmless, but for an operation like append, executing it twice appends the new data twice and changes the result.
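A small Python simulation of scenario 2 shows the double execution. Everything here is hypothetical scaffolding: `AppendServer` counts how many times append actually runs, and `send_once` delivers the request every time but drops the first response. The client simply retries until it gets a reply.

```python
from typing import Dict, Optional


class AppendServer:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.executions = 0                  # how many times append actually ran

    def append(self, key: str, new_stuff: str) -> str:
        self.executions += 1
        self.data[key] = self.data.get(key, "") + new_stuff
        return self.data[key]


server = AppendServer()
drops_left = [1]                             # scenario 2: the first response is lost


def send_once() -> Optional[str]:
    result = server.append("foo", "x")       # the request always reaches the server
    if drops_left[0] > 0:
        drops_left[0] -= 1
        return None                          # the response is dropped on the way back
    return result


# Naive client: keep resending until a response arrives.
reply = send_once()
while reply is None:
    reply = send_once()

print(reply, server.executions)  # xx 2
```

The client asked for one append of "x" but the stored value is "xx".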

Idempotence

  • f() is equivalent to f(); f()
  • An idempotent operation can be safely retried because executing it multiple times has the same effect as executing it once.
  • If your operations are idempotent, then you can just retry them directly with no extra server logic.
  • We will be interested in making it safe to retry non-idempotent operations.
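The definition f() ≡ f(); f() can be tested directly against a given starting state. This Python sketch uses hypothetical helpers `put`, `append`, and `is_idempotent_on`; note that checking a single state can only exhibit a counterexample, not prove idempotence in general.

```python
from typing import Callable, Dict

Store = Dict[str, str]


def put(store: Store, key: str, value: str) -> None:
    store[key] = value


def append(store: Store, key: str, new_stuff: str) -> None:
    store[key] = store.get(key, "") + new_stuff


def is_idempotent_on(op: Callable[[Store], None], initial: Store) -> bool:
    """Compare the effect of f() with f(); f() from one starting state."""
    once = dict(initial)
    op(once)
    twice = dict(initial)
    op(twice)
    op(twice)
    return once == twice


start = {"foo": "bar"}
print(is_idempotent_on(lambda s: put(s, "foo", "baz"), start))     # True
print(is_idempotent_on(lambda s: append(s, "foo", "baz"), start))  # False
```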

Naive RPC + Sequence Numbers

  • Unique id (sequence number) on every (new) message
  • Server stores a set of executed sequence numbers (grows without bound)
  • To handle multiple clients: server stores a set of (client_id, seq num, response) tuples
    • Optimization: only store the highest sequence number per client (requires no concurrent requests from one client)
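The optimization in the last bullet can be sketched in Python as below. `PerClientDedupServer` and `handle_append` are hypothetical names, only append is shown, and sequence numbers are assumed to start at 1 (0 means "nothing executed yet"). Per the bullet's requirement, each client is assumed to have at most one outstanding request, so it sends seq n+1 only after receiving the response to seq n.

```python
from typing import Any, Dict, Optional, Tuple


class PerClientDedupServer:
    """Stores only the highest executed seq num (and its response) per client."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # client_id -> (highest executed seq num, response to that request)
        self.last: Dict[str, Tuple[int, Any]] = {}

    def handle_append(self, client_id: str, seq: int,
                      key: str, new_stuff: str) -> Optional[str]:
        last_seq, last_resp = self.last.get(client_id, (0, None))
        if seq == last_seq:
            return last_resp      # retry of the latest request: resend stored response
        if seq < last_seq:
            return None           # stale duplicate: the client already has its response
        self.data[key] = self.data.get(key, "") + new_stuff
        resp = self.data[key]
        self.last[client_id] = (seq, resp)
        return resp
```

Storing one entry per client bounds memory by the number of clients, at the cost of forbidding concurrent requests from a single client.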

Diagram: Client sends a request to Server (executed, marked with a bracket). The response is dropped (red X). Client retries by sending the same request again. Server recognizes the duplicate sequence number. If we are not careful, we might think the server can just ignore the duplicate, because it was already executed. But the client has no way to know that, since the response was dropped; if the server ignored the retry, the client would never get a response. So the server should retransmit the stored response without re-executing the request (marked with a bracket). This is safe even for non-idempotent operations.
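The scenario in the diagram can be replayed in Python with a server that keeps a table of (client_id, seq num) → response. `DedupKVServer`, `send_once`, and the client id `"client-1"` are hypothetical. The first response is dropped, the client retries with the same sequence number, and the server returns the cached response instead of appending again.

```python
from typing import Any, Dict, Optional, Tuple


class DedupKVServer:
    """Executes each (client_id, seq_num) at most once and caches its response."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.responses: Dict[Tuple[str, int], Any] = {}   # grows without bound
        self.executions = 0

    def append(self, client_id: str, seq: int, key: str, new_stuff: str) -> str:
        if (client_id, seq) in self.responses:
            return self.responses[(client_id, seq)]       # retransmit, don't re-execute
        self.executions += 1
        self.data[key] = self.data.get(key, "") + new_stuff
        self.responses[(client_id, seq)] = self.data[key]
        return self.data[key]


server = DedupKVServer()
drops_left = [1]                      # the first response is lost in the network


def send_once(seq: int) -> Optional[str]:
    resp = server.append("client-1", seq, "foo", "x")
    if drops_left[0] > 0:
        drops_left[0] -= 1
        return None                   # response dropped
    return resp


# The client retries with the SAME sequence number until it gets a response.
reply = send_once(1)
while reply is None:
    reply = send_once(1)

print(reply, server.executions)  # x 1
```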