What do we mean by the phrase "distributed system"?
We want to make a set of multiple computers work together with these properties:
reliably (the system produces correct results in a timely manner)
efficiently (using resources well, and using a reasonable amount of resources)
at huge scale (serving billions of users (6 billion smart phone users globally!),
ideally "just add more computers" to serve more users)
with high availability (almost never "down", "lots of nines")
Why might we want this?
Scale. Early websites started with one server doing everything.
As you get more users, you eventually run out of resources on the one server.
How to scale further? Distributed systems. (sharding)
Availability. In many domains, if the system is unavailable, we lose money (or worse).
If we run our system on one machine, and it needs to be upgraded
or repaired, we have to take it down temporarily.
How to get more availability? Distributed systems. (replication)
Locality: do computation near users to decrease latency
Or because our computation is inherently geographic
Lamport's definition of a distributed system (half joking)
"A distributed system is one where you can't get your work done because some
machine you've never heard of is broken."
(Leslie Lamport is a Turing award winner for his groundbreaking work in
distributed systems. Most of the first half of the class is stuff invented by
him.)
We've made some progress since Lamport's joke.
Today, we think of a distributed system as one where you can get your work done (almost always):
wherever you are
whenever you want
even if some components in the system's implementation are broken
no matter how many other people are using the system
as if it was a single dedicated system just for you
How is that possible? We will learn :)
Another definition
Distributed systems are characterized by concurrency and partial failure
Concurrency: doing more than one thing
Partial failure: some parts are not working even though others are
Concurrency
Concurrency is fundamental to building systems used by multiple people to work on multiple tasks.
This is a tautology, but it has important consequences for many types of systems
Operating Systems
A single computer is used by multiple users, and each user runs multiple processes.
If one program has a null pointer exception, the entire computer does not crash.
Databases
How to manage (big) data reliably and efficiently, accessed by multiple users (transactions)
Lots to worry about (including concurrency!) in the single-node setting, but there are also distributed databases.
Partial failure
If we're Google and we have a million machines, some of them are going to be broken at any given time.
But we still want to get our work done.
How to serve a billion clients?
Straw proposal
(A straw proposal is one intended to illustrate its disadvantages.)
Just pay Intel/Apple/IBM/DEC/whoever to build a single really big computer that can serve all of our clients.
This is pretty much how we did it for the first 60 years of computing up to the late 90s!
Had some great help from our friends the computer architects. We just wait a few years and:
Our system would get faster because CPUs get more sophisticated (smaller transistors -> can use more of them to implement CPU)
Our system would get faster because clock frequencies go up (smaller transistors -> smaller chips -> less speed-of-light delay across chip -> can increase clock)
Our system would get more power efficient (smaller transistors require less power, it turns out)
But this doesn't work any more, for a few reasons:
First, many of the architectural free lunches are over.
Clock speeds are constant, power savings have run out, transistors getting smaller much more slowly.
(In fact, lots of exciting recent computer architecture is about how to scale chips further by making them more like networked systems!)
Second, and more importantly, to serve billions of clients, there is no single computer anywhere close to the horizon that can handle this.
The biggest "single" machines are incredibly expensive (e.g., national lab supercomputers), and if the machine
crashes, everybody is out of luck, so they have to be manufactured to higher-than-normal standards.
So the cheapest way to build these big "single" machines is usually to
treat them more like distributed systems of lots of smaller components.
So this straw proposal doesn't really work. What should we do instead to serve billions of clients?
Using more than one machine
We simply have too much work to be done on a single machine. Let's think through using more than one machine.
If we had 10 servers, we could have each of them be responsible for a tenth of the clients.
If we add more servers, can handle even more clients.
-> To serve a huge number of clients requires many components working together.
Suppose we decide we need 1000 servers to serve our clients (small by today's standards)
And suppose each of our servers experiences a hard drive failure about once every 3 years.
Then across our fleet, we expect about one hard drive failure per day (since \(1000 / (3 \times 365) \approx 0.9\) failures per day)
-> Individual component failures are common in large-scale systems.
If each of these failures had to be handled manually, we'd be working around the
clock just to keep the system up.
-> Large-scale systems must tolerate failures automatically.
The primary technique we will study for tolerating failures is replication.
The idea is that you run two or more copies of your system at the same time
That way, if some of the copies fail, the other copies are still around and clients can make progress.
-> Failures can be handled automatically through replication.
While replication is great for fault tolerance, it has a couple negative consequences.
First, we need a lot more machines! (At least a factor of 2 more.)
And they should probably be in different cities so that all the replicas don't get taken out at once
(And that means it's going to take a lot longer for messages to get between replicas)
Second, and more importantly, replication introduces a new obligation on us, the system designers:
We must keep the different replicas in sync with each other. (We call this consistency.)
Otherwise, clients could observe contradictory or wrong results when interacting with different replicas.
-> Any system that uses replication must have a story for how the replicas are kept consistent,
so that clients are blissfully unaware that our service is replicated internally.
But, this consistency strategy has other consequences:
In fact, we may find that the performance of our consistent replicated system
(that uses 2000 or more servers) is so much worse than our original unreplicated
(not fault tolerant) 1000 servers plan, that we need to again increase the
number of servers, say by another factor of 2, to get back that performance.
The fundamental tension
To summarize our discussion above, here is the fundamental tension in distributed systems:
To serve many clients, need many servers.
With many servers, failures are common.
To tolerate failures, replicate.
Replication introduces possibility of inconsistency.
Implement consistency protocol.
Consistency protocol is slow; can't serve all our clients.
Add more servers...
Even rarer failures become common...
Increase replication factor...
Consistency is even more expensive to achieve...
Performance is still unacceptable...
Add even more servers...
In other words:
-> There is a tradeoff between fault-tolerance, consistency, and performance.
We will study several ways of navigating this tradeoff throughout the quarter.
Give up on fault tolerance: use sharding to get high performance, no replication means no consistency worries.
Give up on consistency: can use a cheaper consistency protocol to get better performance while retaining fault tolerance.
Give up on performance: use expensive consistency protocol and assume workload will be relatively low (often makes sense for "administrative" parts of a system).
Challenges
Why is this any different from "normal" programming?
All of us already know how to program. Isn't it just a matter of running a
program on multiple machines?
Yes and no, but mostly no :)
Remember: concurrency and partial failure.
Concurrency is hard in any scenario.
Partial failure is also hard.
machines crash
machines reboot (losing contents of memory)
disks fail (losing their contents)
network packets are delayed and lost
network itself goes down in part or whole
machines misbehave (have bugs, get hacked)
send messages they aren't supposed to
networks misbehave (have bugs, get hacked)
corrupt packet contents in transit
inject packets that were never sent
people make mistakes
misconfigure machines
misconfigure network
It's super important when designing a system to be really clear about what failures you are trying to handle automatically, and what failures you are willing to handle manually.
Both parts are important! What you leave out means a lot.
We will call this choice a "failure model" or "fault model".
Common failure models
Asynchronous unreliable message delivery with fail-stop nodes:
The network can arbitrarily delay, reorder, drop, and duplicate messages.
Nodes can crash, but if they do, they never come back.
This is the model for the labs
Asynchronous reliable in-order message delivery with fail-stop nodes:
The network can delay messages arbitrarily, but they are delivered in order and eventually exactly once.
Nodes same as above
This is the model for some important distributed algorithms we will study but not implement.
Can be implemented on top of the previous model with sequence numbers and retransmission.
"Byzantine" models refer to models that allow misbehavior
For example:
The network delivers a message to N2 as if it was sent by node N1, but N1 never sent that message.
The network corrupts a message.
A node does not follow the protocol (it got hacked, or cosmic ray, or something)
We will not study these models much, but fun fact:
Surprisingly, you can actually build systems that tolerate some number of Byzantine failures
Want to be correct under all possible behaviors allowed by our fault model.
Distributed systems are usually very challenging to test because there are many combinations of failures to consider.
(This course does a particularly good job at testing!)
Remote Procedure Call (RPC)
Executive Summary
RPCs allow nodes to call functions that execute on other nodes using convenient syntax
The key difference from a local function call is what happens when things fail
To tolerate failures, use sequence numbers and retransmissions
Intro to RPC
What is it?
It's a programming model for distributed computation
"Like a procedure call, but remote"
The client wants to invoke some procedure (function/method), but wants it to run on the server
To the client, it's going to look just like calling a function
To the server, it's going to look just like implementing a function that gets called
Whatever RPC framework we use will handle the work of actually calling the server's implementation when the client asks
For context, Google does about \(10^{10}\) RPCs per second
Local procedure call recap
Remember roughly how local function calls work:
In the high-level language (e.g., C :joy:), we write the name of the function and some expressions to pass as arguments
The compiler orchestrates the assembly code to:
in the caller, evaluate the arguments to values
arrange the arguments into registers/the stack according to the calling convention
use a call instruction of some kind to jump to the label of the callee function
caller instruction pointer (return address) is saved somewhere, typically on the stack
callee executes, can get at its arguments according to calling convention
callee might call other functions
callee eventually returns by jumping back to return address, passing returned values according to calling convention
C programmers rarely have to think about these details (good!)
RPC workflow
Key difference: function to call is on a different machine
Goal: make calling remote functions as easy as local functions
Want the application programmer to be able to just say f(10) or whatever, and have that automatically invoke f on a remote machine, if that's where f lives.
Mechanism for achieving this: an RPC framework that will send request messages to the server that provides f and response messages back to the caller
Most of the same details from the local case apply
Need to handle arguments, which function to call (the label), where to return to, and returned values.
Instead of jumping, we're going to send network messages.
RPC from a programmer's perspective
Here's how it's done:
In the high-level language, we write the name of the function and some expressions to pass as arguments
The compiler and RPC framework:
orchestrate evaluating the arguments on the client
the client makes a normal, local procedure call to something called a "stub"
the stub is a normal local function implemented/autogenerated by the RPC framework that:
serializes the arguments (converts them into an array of bytes that can be shipped over the network)
sends the function name being called and the arguments to the server (this is called a request message)
waits for the server to respond with the returned value (this is called a response message)
deserializes the returned value and then does a normal, local return from the stub to return that value to the caller
The server sits there waiting for requests. When it receives one, it:
parses the message to figure out what function is being requested
deserializes the arguments buffer according to the type signature of the function
invokes the requested function on the deserialized arguments
serializes the returned value and sends a response to the client
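To make the stub idea concrete, here is a minimal sketch of what a hand-written client stub might look like. The wire format (a function name followed by one int argument), the socket-based transport, and the names are illustrative assumptions, not any real RPC framework's API; a generated stub does essentially this work for you.

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    // A hand-written approximation of a generated client stub (illustrative only).
    class IncrementByStub {
        private final String host;
        private final int port;

        IncrementByStub(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // To the caller, this looks like a normal local method call.
        int incrementBy(int amount) throws IOException {
            try (Socket socket = new Socket(host, port);
                 DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                 DataInputStream in = new DataInputStream(socket.getInputStream())) {
                out.writeUTF("incrementBy");  // which function to call
                out.writeInt(amount);         // the serialized argument
                out.flush();
                return in.readInt();          // wait for the response, deserialize the result
            }
        }
    }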
Here is a diagram of what happens when a client invokes an RPC:
Now is a good time to get familiar with this style of diagram. We will be seeing them a lot this quarter. Key points:
Time flows down
Columns are nodes
Diagonal arrows are messages (they arrive after they were sent, hence pointing slightly down)
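The walkthrough below refers to a client code listing with a line marked *; that listing isn't reproduced here, so the following is a hypothetical sketch of what the application-level code might look like (it reuses the IncrementByStub sketched above; the names are made up).

    // Server-side application code: a normal function that happens to be exposed over RPC.
    // It is stateful: it updates a variable that lives on the server.
    class CounterServer {
        private int total = 0;

        int incrementBy(int amount) {
            total += amount;
            return total;
        }
    }

    // Client-side application code: an ordinary-looking call that the framework turns into an RPC.
    class ClientApp {
        static void run(IncrementByStub server) throws java.io.IOException {
            int result = server.incrementBy(10);   // (*) the RPC happens here
            System.out.println("total is now " + result);
        }
    }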
And suppose we set up our RPC framework on the client and told it where to find the server, and we've told the framework about incrementBy and its type signature.
Here is what will happen:
The client starts running until line *.
When the client invokes incrementBy, it's really invoking a stub
The client stub for incrementBy will:
Construct a message containing something like "Please call incrementBy on argument 10"
This involves serializing all the arguments (in this case, just 10)
Send this message to the server and wait for a response
When the request message arrives at the server, the RPC framework will:
Parse the message, figure out what function is being requested, and deserialize the data to the right argument types
Invoke the "real" implementation of incrementBy on the provided arguments
Construct a message containing the result
This involves serializing the result (in this case, just 10,
assuming this is the first time incrementBy has run, so total was still 0)
Send this message back to the client
Such messages are called "responses"
When the response arrives back on the client, the stub continues executing:
Parse the response message
Return the result from the stub to the caller.
The key points to notice about RPCs are:
When we implement incrementBy on the server, we just write a normal function
When we want to call incrementBy on the client, we just write a normal function call
The RPC library has to do a bunch of work to make this nice interface possible
Finally, notice that this RPC is stateful: it causes the server to update some state
In this very simple example, just a global variable on the server
But state is very common, and often the server would actually store the state in a database or some other backend
(and the server would communicate with that backend via further RPCs)
If you have an RPC that doesn't manipulate state on the server, then several
optimizations (e.g., caching) become possible that are not possible for
general stateful RPCs.
"Naive" RPC as a distributed protocol
So far we've focused on the programmer's perspective.
Now let's look at our RPC mechanism as a distributed protocol.
We will call the RPC mechanism we've described above "naive RPC",
because as it turns out, it is lacking in several respects, and we
will need to improve it further for it to be useful.
Naive RPC as a distributed protocol:
Preface:
Goals:
Allow clients to call functions that are executed on the server.
This is challenging because the network is unreliable and the server
might crash, so it is difficult for the clients to tell whether their
requests have been executed by the server or not.
(Handling server crashes will take us a few weeks to work up to
properly, but it is an important eventual goal! For lab 1, we will
mostly focus on tolerating network failures.)
Protocol:
Kinds of node:
There are two kinds of nodes: clients and servers.
There can be any number of clients and any number of servers.
State at each kind of node:
Client: None.
Server: None beyond what is required to execute the functions provided over RPC.
Messages:
Request message
Source: Clients
Destination: Servers
Contents:
What function the client is requesting to call.
The (serialized) arguments to pass to that function.
When is it sent?
Whenever a client wants to invoke an RPC.
What happens at the destination when it is received?
The server calls the (local) function on the (deserialized)
provided arguments and sends a response message back to the client
with the (serialized) return value.
Response message
Source: Servers
Destination: Clients
Contents:
The (serialized) return value from the function.
When is it sent?
When a server finishes executing a request.
What happens at the destination when it is received?
The client deserializes the return value and returns it to the application layer.
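To make the message contents concrete, here is one possible way to represent the two message kinds as Java classes. The field names are illustrative assumptions; a real framework would also fix how these objects themselves get serialized onto the wire.

    import java.io.Serializable;

    // One possible (illustrative) representation of the naive-RPC messages.
    class Request implements Serializable {
        String functionName;   // which function the client is requesting to call
        byte[] arguments;      // the serialized arguments to pass to that function

        Request(String functionName, byte[] arguments) {
            this.functionName = functionName;
            this.arguments = arguments;
        }
    }

    class Response implements Serializable {
        byte[] returnValue;    // the serialized return value from the function

        Response(byte[] returnValue) {
            this.returnValue = returnValue;
        }
    }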
This is our first example of a distributed protocol. A few things to notice:
The description of the protocol is detailed, but higher level than code.
It should contain enough information that another 552 student could
implement the protocol without having to make any "distributed" design
decisions.
(They might have to make some "local" design decisions about what data
structure to use to store state on each node. That's fine and in fact
should not appear in the protocol.)
The description follows a structured format. We will talk more about this
later. Most importantly, it lists the kinds of nodes, what state they store,
what messages they exchange, and what they do when each message is delivered.
(Later protocols will use timers as well as messages.)
Our next question is: Does the protocol meet its goal?
Well, yes and no.
If there are no failures and the network works reliably, then this protocol
works fine and achieves its goal.
What about if there are failures or the network is not reliable?
That's the fault model we actually care about!
It turns out the answer is no. This protocol tolerates almost no failures.
To see why, we need to do a fault tolerance analysis.
Failures in Naive RPC
A fault tolerance analysis consists of answers to the following questions:
For each message, how does the protocol handle delays, reorderings,
drops, and duplicates?
For each node, how does the protocol handle the case where that node crashes
fail-stop?
Some important failure cases for RPC
What kinds of failures can happen in our failure model?
Messages delayed, reordered, dropped, and duplicated.
Nodes crash.
Ok, so what are our messages, and what are our nodes?
Messages: Request and Response
Nodes: Client and Server
So we have a bunch of questions to answer—one for each combination of
message and network failure, and one for each node crashing.
Here are some good ones to start with for our discussion:
What if the request message gets dropped?
What if the server crashes fail-stop?
Before the request is sent?
After the request is sent but before it arrives at the server?
After the request is received by the server but before a response is sent?
After the response is sent?
What if the response message gets dropped?
What if a very old request message gets delivered later?
What if a very old response message gets delivered later?
The naive protocol fails in many of these cases:
What if the request message gets dropped?
The client waits forever and never gets its RPC executed. Bad.
What if the server crashes fail-stop?
Before the request is sent?
The client waits forever.
After the request is sent but before it arrives at the server?
The client waits forever.
After the request is received by the server but before a response is sent?
The client waits forever.
After the response is sent?
If the response is delivered to the client, life is good. No thanks to the protocol though :)
What if the response message gets dropped?
The client waits forever.
What if a very old request message gets delivered later?
The server (mistakenly?) executes it again. Bad if RPCs manipulate state.
What if a very old response message gets delivered later?
The client might (mistakenly?) assume the response is for the current
request it sent rather than an old one, and return the wrong result to the
application layer.
Detecting failures
In order to recover from failures, systems often take some corrective action
(retransmit the message, talk to somebody else instead, start up a new machine,
etc.).
A fundamental limitation in distributed computing is that it is impossible to
accurately detect many kinds of failure:
Some are easy: if messages arrive out of order, you can tell by just numbering
them sequentially.
Some are hard: if a node is really slow, another node might think it has
crashed.
The only way to check if a node is up is to send it a message and hope for
a response
But if the node is just super slow, you won't get a response for a
while, and during that time you have no way to know if it's because
the node crashed or is slow (or maybe the network is slow or
dropped your message or the response)
Another hard one: "network partition"
Nodes 1 and 2 can talk to each other just fine, and nodes 3 and 4 can do
the same among themselves, but neither 1 nor 2 can talk to 3 or 4 or vice
versa.
Very confusing if your mental model is "a node is either up or down"
In a sense, nodes 1 and 2 are "down" from the perspective of 3 and 4.
Important to realize that this is not a "new" kind of failure:
If you assume the network can drop any message, then the network can
"simulate" a network partition.
The takeaway here is that nodes have only partial information about the global state of the system.
We can't know if the server received our message unless we hear back from the server.
Towards tolerating failures in RPC
The first step is figuring out what we want to happen.
When a client sends a request message, what guarantees does it have about the RPC getting executed?
We call these guarantees "RPC semantics" because they define what an RPC means.
"semantics" is a fancy word for "meaning".
Four options for RPC semantics:
Naive (above, broken, no guarantees)
At least once (NFS, DNS, lab 1b, only possible if you are willing to block forever in the case that the network goes down permanently or the server goes down permanently)
At most once (common)
Exactly once (lab 1c, only possible if you are willing to block forever in the case that the network goes down permanently or the server goes down permanently)
Identifying requests
The most basic problem with Naive RPC is that the client cannot distinguish two
different response messages from the server.
Consider this scenario from the client's perspective:
client sends request f(10)
client receives a response with result 20
client sends request f(15)
client receives a response with result 20
If we don't know anything about f, how would we know whether the second
response is for f(15) or a late-arriving duplicate of the first response?
We can't!
To solve this problem, we need to uniquely identify requests, and then tie the
response message to the request using the identifier.
Typical solution is to use per-client sequence numbers:
Each client numbers its requests with integers starting at 0.
Server includes the request number in its responses.
This solves the above scenario like this:
client sends request f(10) with sequence number 0
client receives a response with result 20 for sequence number 0
client sends request f(15) with sequence number 1
client receives a response with result 20 for sequence number 0
Now we can tell that the response was actually a duplicate.
If instead the client received a response that the result for sequence number 1 is 20
then we would know that it wasn't a duplicate and that the result just
happened to be 20 again.
Naive RPC with request identifiers
Here is an updated protocol now that we have added request identifiers to Naive RPC.
Protocol:
Kinds of node:
There are two kinds of nodes: clients and servers.
There can be any number of clients and any number of servers.
State at each kind of node:
Client:
current sequence number, an integer, initially 0.
set of outstanding requests, a set of sequence numbers, initially empty.
Server: None beyond what is required to execute the functions provided over RPC.
Messages:
Request message
Source: Clients
Destination: Servers
Contents:
What function the client is requesting to call.
The (serialized) arguments to pass to that function.
A sequence number (integer)
When is it sent?
Whenever a client wants to invoke an RPC.
To start a new RPC, the client takes its current sequence
number, \(n\), and sends a Request message to the server with
\(n\) (and the function name and arguments). It then adds
\(n\) to its set of outstanding requests and finally
increments its current sequence number.
What happens at the destination when it is received?
The server calls the (local) function on the (deserialized)
provided arguments and sends a response message back to the client
with the (serialized) return value and the same sequence number as the request.
Response message
Source: Servers
Destination: Clients
Contents:
The (serialized) return value from the function.
A sequence number (integer)
When is it sent?
When a server finishes executing a request.
What happens at the destination when it is received?
The client checks if the sequence number is in its set of
outstanding requests. If not, the message is ignored.
Otherwise, the client removes the message's sequence number
from its set of outstanding requests, deserializes the return
value and returns it to the application layer.
This protocol solves the problem of confusing response messages from different requests.
However, it still suffers from many other problems, such as the client waiting
forever if certain messages get dropped, and the server executing the same
request multiple times.
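Here is a minimal sketch of the client-side state and message handling described by this protocol. The message classes are the earlier Request/Response idea extended with a sequence number, and send() / deliverToApplication() are placeholders for the network and application layers, not a real API.

    import java.util.HashSet;
    import java.util.Set;

    // Messages now carry a sequence number (illustrative field names).
    class SeqRequest {
        String functionName;
        byte[] arguments;
        int seqNum;
        SeqRequest(String functionName, byte[] arguments, int seqNum) {
            this.functionName = functionName;
            this.arguments = arguments;
            this.seqNum = seqNum;
        }
    }

    class SeqResponse {
        byte[] returnValue;
        int seqNum;
        SeqResponse(byte[] returnValue, int seqNum) {
            this.returnValue = returnValue;
            this.seqNum = seqNum;
        }
    }

    // Client state and handlers for "Naive RPC with request identifiers".
    class RpcClient {
        private int currentSeqNum = 0;                              // current sequence number
        private final Set<Integer> outstanding = new HashSet<>();  // outstanding requests

        void call(String functionName, byte[] args) {
            int n = currentSeqNum;
            send(new SeqRequest(functionName, args, n));
            outstanding.add(n);
            currentSeqNum++;
        }

        void onResponse(SeqResponse r) {
            if (!outstanding.contains(r.seqNum)) {
                return;  // stale or duplicate response: ignore it
            }
            outstanding.remove(r.seqNum);
            deliverToApplication(r.returnValue);
        }

        private void send(SeqRequest r) { /* hand the message to the network */ }
        private void deliverToApplication(byte[] v) { /* return the value to the application layer */ }
    }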
RPC Semantics
Our options in more detail
Naive
The client might never receive a response (e.g., request gets dropped)
If the client receives a response message to a request:
we know nothing because we cannot distinguish response messages from different requests, so it's possible this response was to a previous request.
(technically, if this is the first request the client has sent, then it
knows the request has been executed at least once, because there are no
other previous response messages that this one could be a duplicate of.)
If the client does not receive a response message:
we know nothing: the request might have been executed (or it might not, or it might have been executed more than once)
Naive with request identifiers
The client might never receive a response (e.g., request gets dropped)
If the client receives a response message to a request:
then the request was executed at least one time (perhaps more than once)
If the client does not receive a response message:
we know nothing: the request might have been executed (or it might not, or it might have been executed more than once)
At least once
The client will eventually receive a response message (or die trying).
If the client receives a response message to a request:
then the request was executed at least one time (perhaps more than once)
If the client does not receive a response message:
it blocks forever retransmitting the request and waiting until it receives a response
At most once
The client might never receive a response (e.g., request gets dropped)
If the client receives a response message to a request:
then the request was executed exactly once
If the client does not receive a response message:
then the request might have been executed (or it might not, but definitely not more than once)
Exactly once
The client will eventually receive a response message (or die trying).
If the client receives a response message to a request:
then the request was executed exactly once
If the client does not receive a response message:
it blocks forever retransmitting the request and waiting until it receives a response
Implementing at least once
send request and wait for response
if you wait for a while and hear nothing, re-transmit request
if you re-transmit a few times and still hear nothing, keep trying
In practice, give up at some point and return error to the calling application
(Note that if you give up, then this is not really "at least once",
since we're not sure whether the request was executed or not.)
Typically an RPC framework would throw some kind of exception from the
stub, or maybe return a special error value to indicate that the call failed
This is pretty different from a local function call! It's an error that
says "I couldn't call the function (I think, but I might have)"
Advantages:
The server keeps no state (beyond what is required to execute the functions locally) -- just executes the requests it receives
Disadvantages:
Can be difficult to build applications that tolerate operations happening more than once
When to use it:
Operations are pure (no side effects) or idempotent (doing it more than once is the same as doing it once)
For example:
reading some data
taking the max with some new data
say n is a global int on the server, then the operation n = max(n, x) is idempotent (where x is an argument to the request)
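A tiny sketch contrasting the idempotent max example above with a non-idempotent operation; if the network duplicates a request, re-executing the first is harmless while re-executing the second corrupts the result.

    // Idempotent vs. non-idempotent server operations (illustrative).
    class ExampleServer {
        private int n = 0;   // some global state on the server

        // Idempotent: executing a duplicate request leaves n unchanged,
        // so at-least-once semantics are fine here.
        void setToMax(int x) {
            n = Math.max(n, x);
        }

        // Not idempotent: executing a duplicate request changes n again.
        void incrementBy(int x) {
            n = n + x;
        }
    }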
Implementing at most once and exactly once
Key ideas: filter out duplicate requests on server; retransmit forever to get exactly once.
Client sends request and waits for response
For exactly once semantics:
If client doesn't hear response after a while, re-transmit request with same sequence number
If still nothing after a few retries, give up and return error
If you keep retrying forever, that's "exactly once (or die trying)"
On the server, keep track of which (client, sequence number) pairs have been
executed, and don't re-execute duplicate requests. Instead, return
previously-computed response.
Important to realize that two different clients can use the same sequence
number (they all start at 0!), so you need to store client names along
with the sequence numbers
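(A code sketch of this server-side duplicate filtering appears at the end of this section.)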
Advantages:
Usually easier to program on top of
Works well for stateful operations
Disadvantages:
Server has to keep state proportional to the number of requests (but see below for optimizations)
Implementation challenges:
If the client crashes and reboots, it may lose track of the last sequence number it used, and restart numbering at 0, which would cause bugs.
In the labs, we assume a fail-stop model where nodes crash but don't reboot.
In practice, one way to handle this is to change the client's name every time it restarts.
How can we reduce the server state?
Option 1: client tells server "I have heard your response to all sequence numbers \(\le x\)"
server can discard responses for those requests (and remember that it discarded them somehow!)
Option 2: only allow one outstanding RPC per client at a time
when request numbered \(x + 1\) arrives, can discard all previous state about that client
The labs use option 2.
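Here is a minimal sketch of the server-side duplicate filtering described above, using option 2 (at most one outstanding RPC per client), so the server only keeps the latest sequence number and cached response for each client. Client names are part of the key because every client starts numbering at 0. The execute() hook and the names are illustrative.

    import java.util.HashMap;
    import java.util.Map;

    // Server-side duplicate filtering for at-most-once / exactly-once RPC (option 2).
    class AtMostOnceServer {
        private final Map<String, Integer> lastSeqNum = new HashMap<>();
        private final Map<String, byte[]> lastResponse = new HashMap<>();

        byte[] handleRequest(String clientName, int seqNum, byte[] args) {
            Integer last = lastSeqNum.get(clientName);
            if (last != null && seqNum < last) {
                return null;                          // older than the latest request: drop it
            }
            if (last != null && seqNum == last) {
                return lastResponse.get(clientName);  // duplicate: resend the cached response
            }
            // New request: option 2 lets us discard all previous state for this client.
            byte[] result = execute(args);
            lastSeqNum.put(clientName, seqNum);
            lastResponse.put(clientName, result);
            return result;
        }

        private byte[] execute(byte[] args) {
            return args;  // placeholder for actually running the requested function
        }
    }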
Protocol for at least once
Here is a written out protocol for At Least Once RPC.
The changes relative to Naive RPC with Request Identifiers are marked with "NEW".
Protocol:
Kinds of node:
There are two kinds of nodes: clients and servers.
There can be any number of clients and any number of servers.
State at each kind of node:
Client:
current sequence number, an integer, initially 0.
set of outstanding requests, a map from sequence numbers to request messages, initially empty.
NEW: changed from a set to a map, so that we can store the requests. We'll need them to retransmit later.
Server: None beyond what is required to execute the functions provided over RPC.
Messages:
Request message
Source: Clients
Destination: Servers
Contents:
What function the client is requesting to call.
The (serialized) arguments to pass to that function.
A sequence number (integer)
When is it sent?
Whenever a client wants to invoke an RPC.
To start a new RPC, the client takes its current sequence
number, \(n\), and sends a Request message to the server with
\(n\) (and the function name and arguments). It then adds
\(n \mapsto Request(n, f, x)\) to its map of outstanding requests and
increments its current sequence number. Finally, the client
also sets a RequestRetransmit timer with sequence number \(n\).
NEW: changed the set to a map, also added setting the timer
What happens at the destination when it is received?
The server calls the (local) function on the (deserialized)
provided arguments and sends a response message back to the client
with the (serialized) return value and the same sequence number as the request.
Response message
Source: Servers
Destination: Clients
Contents:
The (serialized) return value from the function.
A sequence number (integer)
When is it sent?
When a server finishes executing a request.
What happens at the destination when it is received?
The client checks if the sequence number is in its map of
outstanding requests. If not, the message is ignored.
Otherwise, the client removes the message's sequence number
from its map of outstanding requests, deserializes the return
value and returns it to the application layer.
Timers:
RequestRetransmit
Set by clients
Contents: a sequence number (integer)
Set whenever a client sends a new RPC
What happens when it fires?
The client checks if the timer's sequence number is still in
its map of outstanding requests. If not, the timer is ignored.
Otherwise, the client retransmits the request message stored
in the outstanding request map for this sequence number, and
then resets the timer again (with same sequence number).
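Here is a minimal sketch of the client side of this at-least-once protocol. It reuses the SeqRequest/SeqResponse sketches from earlier and models the RequestRetransmit timer with a ScheduledExecutorService; the labs provide their own timer mechanism, so this is just an illustration of the logic.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Client side of at-least-once RPC: retransmit until a response arrives.
    class AtLeastOnceClient {
        private int currentSeqNum = 0;
        private final Map<Integer, SeqRequest> outstanding = new HashMap<>();
        private final ScheduledExecutorService timers =
            Executors.newSingleThreadScheduledExecutor();

        synchronized void call(String functionName, byte[] args) {
            int n = currentSeqNum;
            SeqRequest request = new SeqRequest(functionName, args, n);
            send(request);
            outstanding.put(n, request);
            currentSeqNum++;
            scheduleRetransmit(n);   // corresponds to setting the RequestRetransmit timer
        }

        synchronized void onResponse(SeqResponse r) {
            SeqRequest request = outstanding.remove(r.seqNum);
            if (request == null) {
                return;  // not outstanding: ignore (duplicate or stale response)
            }
            // deserialize r.returnValue and return it to the application layer
        }

        private void scheduleRetransmit(int seqNum) {
            timers.schedule(() -> {
                synchronized (this) {
                    SeqRequest request = outstanding.get(seqNum);
                    if (request == null) {
                        return;                      // already answered: let the timer lapse
                    }
                    send(request);                   // still waiting: retransmit
                    scheduleRetransmit(seqNum);      // and reset the timer
                }
            }, 100, TimeUnit.MILLISECONDS);
        }

        private void send(SeqRequest r) { /* hand the message to the network */ }
    }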
Does TCP solve all our problems?
TCP: reliable bi-directional byte stream between two nodes
Retransmit lost packets
Detect and filter out duplicate packets
Useful! Most RPCs sent over TCP in practice
But TCP itself can time out
For example, if the server crashes or the network goes down for long enough.
Usually TCP will retransmit a few times, but after say 30 seconds, it gives up and returns an error.
The application needs to be able to recover, which usually involves establishing a new TCP connection.
Question: on reconnection, were my old requests executed or not?
Answer: TCP sure won't tell you that, so you need to implement an
application-layer mechanism (including application-layer sequence numbers,
probably) to figure it out.
What if the server crashes?
If the list of all previous responses is stored in server memory, it will be lost on reboot
After reboot, server would incorrectly re-execute old requests if it received a duplicate.
One option would be to keep the state in non-volatile storage (hdd, ssd)
Another option is to give the server a new address on reboot (fail-stop model)
this is what the labs do
More about RPC in practice
Serialization
Refers to converting the in-memory representation of some (typed) data into a linear sequence of bytes
Usually so we can write those bytes to the network or to disk.
Above we did an example that took an int and returned an int.
Serialization just encodes the int as its four bytes.
Other primitive types (float, long, etc.) work similarly.
What about pointers?
Should we just send the pointer over the network?
No, doesn't make sense because the server won't be able to dereference it
Whatever that pointer points to on the client is probably not even in the server's memory, much less at the same address.
Could convert to a "global reference", or use some kind of global addressing system
Definitely possible! Complicated.
Instead, most of the time what we do is pass a copy of the data pointed to by the pointer.
For example, if we have an RPC to write to a file on the server like void write(char* data) or something,
the client will send the server a copy of the data buffer.
Similarly, if we have an RPC to read the file like char* read(), then we want to get a copy of that returned data back on the client.
More interestingly, what if we had a void read(char* dest) function that wrote the answer into its argument
The dest pointer is an "out pointer", it is passed just so that the function read can write to it.
Then we don't need to send anything, really, to the server
But we need the server to send us back the contents of dest after the function call!
Such "out pointers" have to be handled specially by RPC frameworks
In the labs, we use Java, which has serialization built in. This will make implementing our RPCs relatively easy.
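A small example of Java's built-in serialization: turning an arguments object into a flat byte array (the kind of thing that goes into a request message) and back again. The WriteArgs type is made up for illustration; note that it carries a copy of the data rather than a pointer.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class SerializationDemo {
        // A made-up argument type for a write(...) RPC.
        static class WriteArgs implements Serializable {
            String filename;
            byte[] data;
            WriteArgs(String filename, byte[] data) {
                this.filename = filename;
                this.data = data;
            }
        }

        public static void main(String[] args) throws Exception {
            WriteArgs original = new WriteArgs("notes.txt", "hello".getBytes());

            // Serialize: in-memory object -> flat sequence of bytes.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            byte[] wireFormat = buffer.toByteArray();   // this is what goes in the message

            // Deserialize: bytes -> a copy of the object (e.g., on the server).
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wireFormat))) {
                WriteArgs copy = (WriteArgs) in.readObject();
                System.out.println(copy.filename + " has " + copy.data.length + " bytes");
            }
        }
    }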
RPC vs procedure calls
From the application programmer's perspective, very similar to a normal (local) procedure call.
Some additional complexities under the hood that don't show up with local procedure calls.
"Binding": the client RPC library needs to know where to find the server
Need to make sure the function we want to call actually exists on the server
And that the server is running the version of the software we expect
Binding is often solved through "service discovery"
Have one well-known server whose job it is to keep track of all the servers, their names/addresses, what RPCs they support, what version they're running, etc.
Then clients first talk to the service discovery server to find a server that will work for them.
Implementing the stubs
The RPC framework often has a compiler-like thing that will autogenerate code to do serialization, send, recv, deserialization, etc.
Takes as input the signatures of the procedures.
Performance
A local procedure call is very fast, on the order of 10 instructions (a few nanoseconds)
RPC to a machine in the same data center: about 100 microseconds (10k times slower than local call)
RPC to a machine on other side of planet: about 100 milliseconds (10 million times slower than a local call)