From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Wed Jan 28 2004 - 17:04:24 PST
This paper discusses the now all-too-familiar RPC protocol.
The authors make the point that network protocols designed for bulk data
transfers are not suitable for RPC communication (which is chatty) due
to connection-management and acknowledgement overhead. One of the goals
the authors work towards is keeping RPC call semantics very close to
local call semantics. This, as we now know, has proven tremendously
useful and given rise to an explosion of distributed computing.
The authors start out by stating the challenges in the implementation of
RPC; these are:
* precise semantics of a call in the case of machine/communication failure
* passing address-containing arguments in the absence of a shared address
space
* integration of remote calls into existing systems
* binding (how does the caller determine the location and identity of the
callee)
* suitable protocols for transfer of data and control between caller and
callee
* data integrity and security in an open communication network
They build their RPC system on top of the PUP internetwork, which
provides a simple unreliable (but high-probability) datagram service and
reliable flow-controlled byte streams. PUP also allows a raw Ethernet
packet format between two computers on the same Ethernet.
The authors set out to build a flexible RPC system that removes all the
"unnecessary" difficulties of RPC and leaves only the fundamental ones:
timing, independent failures, and the coexistence of independent
execution environments.
The authors note a natural tension between powerful call semantics and
efficiency. They discuss the alternatives and why they were not chosen.
Among the alternatives were message passing and parallel paradigms,
which the authors argue do not change the problem in any fundamental
way, while keeping local call semantics is more natural and powerful.
The other alternative is a shared address space between remote
computations; the authors rule this out over concerns about efficiency
and the difficulty of incorporating remote addressing into programming
languages.
Lupine is used to generate stubs, which make the remote calls
transparent to callers and callees. The Grapevine database is used for
naming and binding; Grapevine is also used to enforce security. The
system allows flexibility when a client (importer) is trying to find a
server (exporter): the importer can specify only the type, not the name,
and the system can find the most suitable exporter (e.g., by network
proximity).
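The type-versus-instance binding described above can be sketched as a
simple lookup. This is a minimal illustration assuming a hypothetical
in-memory registry standing in for Grapevine; all names, fields, and the
hop-distance proximity measure are my own illustrative choices, not the
paper's actual data model.

```python
class Registry:
    def __init__(self):
        # interface type -> list of (instance name, address, hop distance)
        self._exporters = {}

    def export(self, iface_type, instance, address, distance):
        """An exporter registers an interface instance under its type."""
        self._exporters.setdefault(iface_type, []).append(
            (instance, address, distance))

    def bind(self, iface_type, instance=None):
        """An importer binds by type; if no instance name is given,
        pick the nearest exporter of that type."""
        candidates = self._exporters.get(iface_type, [])
        if instance is not None:
            for name, addr, _ in candidates:
                if name == instance:
                    return addr
            raise LookupError("no such instance")
        if not candidates:
            raise LookupError("no exporter of this type")
        # emulate "network proximity" by choosing the smallest hop distance
        return min(candidates, key=lambda c: c[2])[1]

registry = Registry()
registry.export("FileService", "ivy", "10.0.0.7", distance=3)
registry.export("FileService", "ebbets", "10.0.0.2", distance=1)
print(registry.bind("FileService"))         # nearest: "10.0.0.2"
print(registry.bind("FileService", "ivy"))  # named instance: "10.0.0.7"
```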
The authors discuss the protocol, which optimizes simple calls (wherein
arguments/return values fit in a single network packet) at the cost of
bulk transfer. The authors mention the possibility of a protocol that
incorporates both and internally switches appropriately, and cite that
as a topic for further research. I don't know of such a protocol that is
widely used, even today; users are forced to use different protocols for
RPC and bulk transfer.
In the case of simple calls, the client just sends a request packet with
the arguments and the server replies with the result in a single packet.
No acks are exchanged: for the client, the result itself serves as an
ack; for the server, the subsequent call from the client serves as an
ack. If the call lasts longer than the retransmission interval, there
can be two additional packets - one retransmission of the call (which
also indicates that it needs an ack) and one explicit ack packet. In
this case communication costs are not the limiting factor on
performance, so these additional packets do not degrade it. Packets
include a call identifier, which is (machine identifier + process
identifier + sequence number). I think the assumption is that process
identifiers are not reused on the system; otherwise this could be a
problem.
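The callee's side of this scheme - using the call identifier to drop
duplicates, with a new call implicitly acknowledging the previous result
- can be sketched as follows. This is a simplified illustration assuming
the identifier structure the review describes; the class and field names
are mine, not the paper's.

```python
class Callee:
    def __init__(self):
        # activity (machine id, process id) -> last sequence number seen
        self.last_seq = {}

    def handle(self, machine, process, seq, args):
        activity = (machine, process)
        last = self.last_seq.get(activity, -1)
        if seq <= last:
            return None  # duplicate or stale retransmission: drop it
        # a new, higher sequence number implicitly acks the previous result
        self.last_seq[activity] = seq
        return sum(args)  # stand-in for the real remote procedure

c = Callee()
print(c.handle("M1", "P1", 0, [1, 2]))  # 3 (fresh call)
print(c.handle("M1", "P1", 0, [1, 2]))  # None (duplicate dropped)
print(c.handle("M1", "P1", 1, [4, 5]))  # 9 (next call acks the previous)
```

Note how the reviewer's worry shows up directly here: if a process
identifier were reused, a new process's calls could collide with the old
activity's sequence numbers and be silently dropped.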
[machine id + process id] is called an activity. An activity has at most
one outstanding call. This seems restrictive to me.
After a certain timeout the callee can discard the state for the caller,
obviating the need for explicit connection establishment and
termination. While waiting for the result, the caller can send probe
packets to make sure the callee is alive. The interval between probes
increases with time (a sort of backoff). This takes care of
communication failures but not deadlocks in the callee (comparable
semantics to a local procedure call).
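The increasing probe interval can be sketched as a simple generator.
The doubling factor and the cap are my own assumed constants for
illustration; the paper does not specify this exact schedule.

```python
import itertools

def probe_intervals(initial=1.0, factor=2.0, cap=60.0):
    """Yield ever-longer waits between probe packets (capped backoff)."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, cap)

# the first eight waits between probes, in seconds
print(list(itertools.islice(probe_intervals(), 8)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```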
If the arguments are too large to fit in a single packet, each packet
but the last requests an ack. The implementation can then use only one
packet buffer, as opposed to full buffering and flow control. To
eliminate duplicates, a call-relative sequence number is used.
Consequently the protocol is not optimized for bulk data transfer - it
uses more packets (more acks) than logically required, and it doesn't
transmit the next packet until the previous one is acked.
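The stop-and-wait behavior of this multi-packet transfer can be sketched
as below. The packet fields and the transmit callback are illustrative
assumptions; the point is only that every packet except the last
requests an ack and that call-relative sequence numbers order the
chunks.

```python
def send_large_call(payload, packet_size, transmit):
    """Split payload into packets and send them one at a time.

    transmit(packet) stands in for sending on the network and blocking
    until the requested ack arrives, so only one packet buffer is needed.
    """
    chunks = [payload[i:i + packet_size]
              for i in range(0, len(payload), packet_size)]
    for seq, chunk in enumerate(chunks):
        last = (seq == len(chunks) - 1)
        # call-relative sequence number lets the receiver drop duplicates;
        # the last packet needs no ack - the result will serve as one
        transmit({"seq": seq, "data": chunk, "wants_ack": not last})

sent = []
send_large_call(b"abcdefgh", 3, sent.append)
print([p["seq"] for p in sent])        # [0, 1, 2]
print([p["wants_ack"] for p in sent])  # [True, True, False]
```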
The authors suggest a workaround for bulk data transfer wherein the
transfer is broken up among multiple processes, each handling a small
chunk - essentially transforming the bulk transfer into simple calls.
They experimented with this and obtained decent performance. From the
semantics point of view this seems cumbersome to me, and I haven't heard
of such usage in real life.
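The workaround can be sketched with a thread pool standing in for the
paper's multiple processes: each worker issues an independent "simple
call" for one chunk. The function names and chunking scheme are my own
illustration, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(source, offset, size):
    """One 'simple call': arguments and result each fit in one packet."""
    return source[offset:offset + size]

def bulk_fetch(source, chunk_size, workers=4):
    """Turn a bulk transfer into many concurrent simple calls."""
    offsets = range(0, len(source), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves order, so the chunks reassemble correctly
        chunks = pool.map(lambda o: fetch_chunk(source, o, chunk_size),
                          offsets)
    return b"".join(chunks)

data = bytes(range(256)) * 4
print(bulk_fetch(data, 100) == data)  # True
```

The cumbersomeness the review notes is visible even here: the caller
must manage chunking and reassembly itself instead of making one call.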
The notion of an exception is much like that in a local call: the
exception travels through the stubs back to the caller.
Optimizations
* A process pool is used. If a packet is addressed to a particular
process, the Ethernet handler routes it to that process directly;
otherwise a new process is picked. The client can address the next call
to the server process of the previous call, avoiding a process swap.
* If caller and callee are on the same network, the protocol layers are
bypassed.
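The process-pool dispatch in the first optimization can be sketched as
follows. The packet fields and dispatcher structure are illustrative
assumptions, not the paper's packet format.

```python
class Dispatcher:
    def __init__(self, pool):
        self.idle = list(pool)  # server processes with no call assigned
        self.busy = {}          # process id -> process handling calls

    def route(self, packet):
        """Hand the packet to the process it names, else pick an idle one."""
        target = packet.get("process")
        if target in self.busy:
            # the named process already holds this caller's state: no swap
            return self.busy[target]
        proc = self.idle.pop() if self.idle else None
        if proc is not None:
            self.busy[proc] = proc
        return proc

d = Dispatcher(["srv-a", "srv-b"])
p1 = d.route({"process": None})  # first call: any idle process
p2 = d.route({"process": p1})    # next call names that same process
print(p1 == p2)  # True
```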
The system uses encryption-based security; Grapevine is used as the
authentication server.
This archive was generated by hypermail 2.1.6 : Wed Jan 28 2004 - 17:04:31 PST