From: Cem Paya 98 (Cem.Paya.98_at_Alum.Dartmouth.ORG)
Date: Mon Feb 02 2004 - 15:04:05 PST
Paper review: RPC (Andrew Birrell, Bruce Jay Nelson)
Cem Paya
This paper gives a high level overview of one particular
implementation for remote procedure calls. RPC extends the
concept of procedure call to the scenario where the caller
and callee may reside on different machines. The authors
implemented the proposed design using the Mesa programming
language on Dorado machines (which, according to the
claimed figures, were fast machines for their time) running
an operating system called Cedar, connected over Ethernet.
Their proposal is conceptually very simple and elegant,
but involves devising or modifying quite a few components
along the way. For example, in devising a new network
transport protocol they reinvent a variant of UDP, change
the Ethernet driver to route RPC messages directly to the
correct process, and write a code generator that produces
stubs from Mesa interface descriptions.
Much of the complexity derives from satisfying two
requirements: transparent semantics that approximate a local
procedure call as closely as possible, and high performance
that allows use in practical distributed systems.
Given the paper’s vintage, it’s amazing that the design
decisions made here are not too far from those of
contemporary RPC systems, although reading the paper
suggests the answers were not obvious at the time. For
example, the authors point out that they considered using a
remote fork() instead of a procedure call as the primitive
operation, and debated using shared address spaces with VM
support. In the end the
architecture is relatively simple. Suppose the client
(caller) is trying to invoke a procedure at the server
(callee). The client calls into stub code using local
procedure call semantics. The stub in turn calls the RPC
runtime, passing along any parameters, and the runtime uses
a highly tweaked network protocol to communicate with the
instance of the RPC runtime on the server machine. The
server runtime dispatches the call to the server stub, which
uses a local procedure call to invoke the actual server code
implementing the functionality. There is additional
infrastructure to support this: databases with redundancy
keep track of which machines have procedures that can be
remotely invoked. Servers publish information about the
methods they support, which is called “exporting”. The
client in turn has to bind to a remote server by importing
an interface, which can be done very flexibly at runtime
without knowing the address or name of an instance in
advance. Again the import/export paradigm mirrors Mesa’s
semantics. The
registry of exported interfaces is stored in the Grapevine
DB, using some hacks to identify different machines and
interfaces. Here the system has a centralized single point
of failure: without the correct mapping in the DB, no RPC
binding can occur. This registry also serves a function
equivalent to DNS, mapping user-friendly names to low-level
network addresses, so its design conflates multiple roles.
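
To make the call path and the export/import binding described above
concrete, here is a minimal sketch in Python. It is not the paper's
Mesa implementation: the names (REGISTRY, ServerRuntime, ClientStub,
import_interface) are made up for illustration, JSON stands in for
the real packet encoding, a plain dictionary stands in for Grapevine,
and a direct function call stands in for the network round trip.

```python
import json

# Hypothetical in-memory stand-in for the binding registry (Grapevine in the paper).
REGISTRY = {}  # interface name -> server runtime that exported it


class ServerRuntime:
    """Receives request bytes and dispatches them to exported procedures."""

    def __init__(self):
        self.procedures = {}

    def export(self, interface, procedures):
        # "Exporting": publish the interface so that clients can find it.
        for name, fn in procedures.items():
            self.procedures[(interface, name)] = fn
        REGISTRY[interface] = self

    def handle(self, request_bytes):
        # Server stub: unmarshal, make an ordinary local call, marshal the result.
        request = json.loads(request_bytes.decode("utf-8"))
        fn = self.procedures[(request["interface"], request["proc"])]
        result = fn(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Looks like a local object; every method call becomes a request message."""

    def __init__(self, interface, transport):
        self._interface = interface
        self._transport = transport

    def __getattr__(self, proc):
        def call(*args):
            request = json.dumps(
                {"interface": self._interface, "proc": proc, "args": list(args)}
            ).encode("utf-8")
            reply = self._transport(request)  # stands in for the network round trip
            return json.loads(reply.decode("utf-8"))["result"]

        return call


def import_interface(interface):
    # "Importing": bind at run time by interface name, not by machine address.
    if interface not in REGISTRY:
        raise LookupError(f"no exporter registered for {interface!r}")
    return ClientStub(interface, REGISTRY[interface].handle)


server = ServerRuntime()
server.export("Calculator", {"add": lambda a, b: a + b})
calc = import_interface("Calculator")
total = calc.add(2, 3)  # reads like a local call, but goes through both stubs
```

The caller only ever sees calc.add(2, 3); marshalling, lookup and
dispatch are hidden in the stubs and the runtime, which is the
transparency the paper is after. The sketch also shows the failure
mode noted above: if the registry has no entry for an interface, the
import fails and no call can be made.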
The most complicated piece is the custom transport protocol,
which ends up reinventing UDP with additional state
management. The authors make two observations: most calls
(a) pass in and return only a few bytes of data and (b)
execute in less time than a network round trip. So they
optimize for that common scenario by having the call and
response each fit inside a single packet smaller than the
MTU (maximum transmission unit) of the underlying network.
For Ethernet this is around 1.5 KB, so most RPCs are
efficient. But
they also add confirmation packets and retransmission with
increasing backoff, reminiscent of TCP. Whereas UDP has
fire-and-forget semantics, the RPC proposed here has the
notion of a connection. If the server crashes, a new
connection is started and the calls outstanding on the
existing one all fail, under the assumption that the new
server may not share the same state and may behave
differently, which is reasonable. The whole design
suffers from not having TCP and UDP as primitives. The
third section is full of Rube Goldberg contraptions that
emulate sequence numbers for detecting replays and ports
for distinguishing between multiple processes receiving
calls at the same server. Using a clean transport layer
could conceptually simplify the design here. Instead the
authors opt for building a one-off solution to get
connection semantics without paying the cost, and end up
worrying about edge cases such as bulk data. There is also
additional complexity in trying to support Mesa
programming language constructs, in particular exception
handling with the option of resuming control or unwinding
the stack. This is the one area where modern RPC systems
have typically cut corners and simplified things, even when
the OS itself supported structured exception handling
natively, as in NT.
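
Going back to the transport protocol, the retransmission and
duplicate-detection behaviour discussed above can be sketched roughly
as follows. This is a simplified illustration in Python, not the
paper's packet format: the class and function names are invented, the
doubling of the timeout is my assumption (the paper is only
summarized here as using increasing backoff), and the lossy link is
simulated deterministically instead of using a real network.

```python
class Server:
    """Tracks the last sequence number seen per caller to filter duplicates."""

    def __init__(self):
        self.last_seq = {}    # caller id -> highest sequence number executed
        self.last_reply = {}  # caller id -> reply sent for that sequence number
        self.executions = 0

    def receive(self, caller, seq, payload):
        seen = self.last_seq.get(caller, -1)
        if seq < seen:
            return None                     # stale packet: ignore it
        if seq == seen:
            return self.last_reply[caller]  # retransmission: resend, do not re-run
        self.executions += 1
        reply = payload.upper()             # stand-in for the real procedure
        self.last_seq[caller] = seq
        self.last_reply[caller] = reply
        return reply


def call_with_retry(send, caller, seq, payload, max_tries=5, first_timeout=0.1):
    """Retransmit the same call packet, doubling the timeout after each loss."""
    timeout = first_timeout
    timeouts_used = []
    for _ in range(max_tries):
        reply = send(caller, seq, payload)
        if reply is not None:
            return reply, timeouts_used
        timeouts_used.append(timeout)  # a real client would wait this long here
        timeout *= 2                   # assumed backoff policy
    raise TimeoutError("call failed: no reply from server")


def make_lossy_link(server, replies_to_drop):
    """Deliver every request but lose the first few replies (deterministic)."""
    state = {"dropped": 0}

    def send(caller, seq, payload):
        reply = server.receive(caller, seq, payload)
        if state["dropped"] < replies_to_drop:
            state["dropped"] += 1
            return None
        return reply

    return send


server = Server()
link = make_lossy_link(server, replies_to_drop=2)
reply, waits = call_with_retry(link, caller="client-1", seq=0, payload="ping")
```

Here the request reaches the server three times because two replies
are lost, but the procedure runs only once: the repeated sequence
number lets the server resend the saved reply instead of executing
the call again. That is the piece of connection state plain UDP does
not give you.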