From: Cem Paya 98 (Cem.Paya.98_at_Alum.Dartmouth.ORG)
Date: Wed Feb 04 2004 - 14:04:51 PST
Review: Fine-grained mobility in the Emerald system
Cem Paya, CSE551P

This paper describes Emerald, a distributed, object-oriented
programming system with its own compiler, kernel, network
protocols and even a garbage collector (the last one
sketched out but not implemented at the time of writing).
The main distinguishing feature of Emerald is that it
supports moving objects across nodes in the network at a
very fine granularity, including objects that have active
method invocations in progress. The entire process is
frozen and a minimal snapshot, as small as a few hundred
bytes in one example rather than a massive memory
footprint, is sent to another node to continue execution.
All of this behavior is transparent to programmers
(although there are APIs to explicitly locate and move
objects), and the compiler/runtime combination is
responsible for optimally managing the location and
addressing of objects.

One intriguing idea in the paper is the notion of
call-by-move and call-by-visit semantics. When objects are
passed as arguments in a remote method invocation, they
stay on the caller's node by default, so they are
virtually guaranteed to cause additional network traffic
when the callee attempts to access them. Call-by-move
addresses the scenario where such accesses are frequent
and it makes more sense to preemptively transport the
entire object to the remote node. Call-by-visit is
call-by-move followed by moving the object back upon
completion. This is all transparent to users, owing to a
uniform addressing scheme for all objects based on
universal OIDs.
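
The trade-off between these parameter-passing modes can be
illustrated with a toy cost model. This is only a sketch,
not Emerald code: the names (Obj, remote_call) and the
costs (2 messages per remote round trip, 1 extra message
per object move) are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    node: str  # name of the node currently hosting the object


def remote_call(caller: str, target: Obj, arg: Obj,
                accesses: int, mode: str) -> int:
    """Count network messages for one invocation (assumed cost model).

    mode is "reference" (argument stays put), "move" or "visit".
    """
    messages = 0
    if target.node != caller:
        messages += 2  # invocation request + reply
    home = arg.node
    if mode in ("move", "visit") and arg.node != target.node:
        arg.node = target.node  # ship the argument to the callee's node
        messages += 1
    if arg.node != target.node:
        messages += 2 * accesses  # each remote access is a round trip
    if mode == "visit" and arg.node != home:
        arg.node = home  # send the argument back after the call
        messages += 1
    return messages


if __name__ == "__main__":
    for mode in ("reference", "move", "visit"):
        target, arg = Obj("mailbox", "B"), Obj("message", "A")
        count = remote_call("A", target, arg, accesses=5, mode=mode)
        print(mode, count, "argument now on", arg.node)
```

Under these assumptions, moving pays off as soon as the
callee touches the argument more than once; with a single
access or none, plain reference passing is cheaper.
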

The Emerald design also incorporates performance
considerations, which lead to special-casing local access
and small data types. For example, there are three
different object storage modes (global, local and direct),
each accessed differently. The last one is similar to how
virtual machines such as the JVM special-case primitives
(integer, float etc.) as value types when everything else
in the object system is a reference type. Local access
occurs in user mode while global access traps to the
kernel.

There is an interesting analog to HTTP redirects (the 3xx
series of status codes) for locating objects. Since
objects move around, references can become stale. This is
solved by keeping track of forwarding addresses. The
difference from HTTP is that the node originally contacted
for the invocation itself forwards the request on to where
it believes the object currently resides; the final node
holding the object replies directly to the caller, who
updates their location table. This involves fewer messages
than HTTP, where the first node would simply have
responded to the caller with the forwarding address, but
it doesn’t scale as well because nodes remain responsible
for forwarding requests for objects they used to host.
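
The message-count difference between the two schemes can
be sketched as follows. This is a simplified model, not
the actual Emerald protocol: the Node class, the function
names and the one-message-per-hop accounting are all
assumptions for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.resident = set()  # OIDs of objects living on this node
        self.forward = {}      # OID -> Node believed to host the object


def move(oid, src, dst):
    """Move an object and leave a forwarding address behind."""
    src.resident.discard(oid)
    src.forward[oid] = dst
    dst.resident.add(oid)
    dst.forward.pop(oid, None)  # the object is here, no forwarding needed


def invoke_forwarding(caller, oid):
    """Chain-forwarding: nodes pass the request along, last one replies."""
    node = caller.forward[oid]
    messages = 1  # caller -> last known location
    while oid not in node.resident:
        node = node.forward[oid]
        messages += 1  # request forwarded one hop
    messages += 1  # direct reply from the final node
    caller.forward[oid] = node  # caller refreshes its location table
    return messages


def invoke_redirect(caller, oid):
    """HTTP-style: every stale node answers the caller with a redirect."""
    node = caller.forward[oid]
    messages = 0
    while True:
        messages += 2  # request + response (redirect or result)
        if oid in node.resident:
            break
        node = node.forward[oid]
    caller.forward[oid] = node
    return messages


def demo_network():
    """Caller C knows the object on A; it has since moved A -> B -> D."""
    a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
    a.resident.add("oid1")
    c.forward["oid1"] = a
    move("oid1", a, b)
    move("oid1", b, d)
    return c


if __name__ == "__main__":
    print("forwarding:", invoke_forwarding(demo_network(), "oid1"))
    print("redirect:  ", invoke_redirect(demo_network(), "oid1"))
```

In this model a chain of k stale hops costs k + 2 messages
with forwarding versus 2(k + 1) with redirects, but the
forwarding scheme relies on A and B staying up and
relaying traffic for an object they no longer host.
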

The most impressive part of the implementation is the
mechanism for moving objects around. Since the address
space isn’t shared between nodes, all pointers have to be
remapped. Tracking pointers also has implications for
garbage collection. This is where the strongly-typed
language and compiler support come in: templates generated
for each type keep track of where pointers are in the data
section. (Contrast with conservative collectors, which do
not know what is a pointer and must treat every word in
memory as potentially holding a pointer.) One problem is
that pointers in registers must be identified as well,
which means register assignments can’t change during a
method invocation. On architectures with few registers,
such as x86, this would probably increase register
pressure and slow things down. This explicit awareness of
pointers in memory and the register set is one of the
weaknesses, and suggests the sophisticated object
management in Emerald is difficult to decouple from the
language. For example, using objects from C or doing
low-level hacks with pointers would be impossible.
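
The role of the templates can be sketched roughly as
below. This is an illustration of the general idea
(precise pointer maps used to swizzle pointers on a move),
not Emerald's actual data layout: the TEMPLATES table, the
"Message" type, the slot layout and the address values are
all hypothetical.

```python
# Hypothetical compiler-generated template: for each type, the indices of
# the slots in an object's data area that hold pointers.
TEMPLATES = {"Message": (1, 2)}  # slot 0 is a plain integer


def marshal(type_name, slots, addr_to_oid):
    """Before a move: replace local addresses in pointer slots with OIDs."""
    pointer_slots = TEMPLATES[type_name]
    return [addr_to_oid[v] if i in pointer_slots else v
            for i, v in enumerate(slots)]


def unmarshal(type_name, slots, oid_to_addr):
    """After a move: replace OIDs with addresses valid on the new node."""
    pointer_slots = TEMPLATES[type_name]
    return [oid_to_addr[v] if i in pointer_slots else v
            for i, v in enumerate(slots)]


if __name__ == "__main__":
    # Slot 0 is an integer that happens to equal a valid address (0x1000).
    # A conservative scan could not tell it apart from a pointer; the
    # template says it is data, so it is left untouched.
    source_slots = [0x1000, 0x1000, 0x2000]
    source_table = {0x1000: "oid:sender", 0x2000: "oid:body"}
    dest_table = {"oid:sender": 0x8000, "oid:body": 0x9000}

    wire = marshal("Message", source_slots, source_table)
    print(wire)
    print(unmarshal("Message", wire, dest_table))
```

The same map that drives relocation is what a precise
garbage collector would walk, which is why both features
depend on the compiler knowing the layout of every type.
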

The last section describes a novel way of implementing
email: the message object moves between servers. Instead
of a separate copy being delivered to each recipient’s
inbox, there is a single object created by the sender,
which moves on demand between the nodes hosting different
mailboxes. Compared to accessing the mail object directly
(without moving it), call-by-move semantics end up
reducing execution time by about 20% and network traffic
by 7%. Missing from the comparison is the usual approach
where a pointer to the message is sent and recipients
download the entire message at once. Since an email is
generally treated as a single unified document, accessing
its fields individually, as in this case, may not be a
good example.