CSE584: Software Engineering

CSE503: Software Engineering
Lecture 5 (January 13, 1999)

David Notkin

Today: Open implementation or, "clients know best"

One particular tenet of information hiding is that clients should not know how a module is implemented, and that implementations should not know how clients will use them

One implication is that clients should all be satisfied with the same (i.e., any) implementation

Clearly, this is not so

In particular (as we have discussed), performance does in fact matter
Even if we make performance part of the interface, there is a serious downside: a proliferation of interfaces (multiple interfaces might be identical except for their performance guarantees, which has problems of its own)

Example (due to Kiczales, as is much of this)

A window system provides a mkwindow function that creates a window of a specified size at a specified location on the screen

It also provides mouse-tracking, manages the shared screen, etc., while hiding internal data structures, implementation details, etc.

Somebody who wants to build a spreadsheet might write the following code:

for i := 1 to 100
    for j := 1 to 100
        mkwindow(100,100,i*100,j*100)
    end
end

"Yet few experienced programmers would be surprised to learn that…its performance may be so bad as to render it useless. Why? Because the window-system implementation may not be tuned for this kind of use." (The implementor, for example, may have assumed O(10) windows, not O(1000) windows.)

This specific, very regular use of windows also admits a far more optimized implementation than may be feasible for a general-purpose window system

What are possible solutions to this shortcoming of classic information hiding (sometimes called black-box abstraction)?

Break up into small groups for 5 minutes to develop potential solutions

Multiple interfaces, each with their own performance guarantees

This expands the name space, at the very least

Multiple implementations of a single interface, allowing the client to select which implementation is desired

How does the client specify which implementation is to be selected? (One approach is pragmas.)
This also expands the name space, although perhaps indirectly

A smart implementation that knows how to provide effective performance for any possible client

Not generally feasible

Clients can overcome the provision of a single implementation in two different ways

"Hematomas of duplication"

Essentially, use the basic abstraction that is provided and build another one on top of this

In this case, allocate one big window and build another window manager that breaks that big window up into the little spreadsheet-cell-sized windows
This happens a lot with memory managers
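
The "one big window" workaround can be sketched as follows. This is a minimal illustration, not code from any real window system: the names Rect, CellGrid, grid_create, grid_cell_rect and grid_hit_test are all hypothetical. The client asks the window system for a single window and then computes cell geometry and mouse hit-testing itself, which is exactly the functionality the window system already provides and the client is now duplicating.

```c
/* Hypothetical client-side layer: one big window carved into cells.
   These names are illustrative and belong to no real window system. */
typedef struct {
    int x, y;          /* top-left corner, in screen coordinates */
    int width, height; /* size in pixels */
} Rect;

typedef struct {
    Rect bounds;        /* the one window obtained from the window system */
    int cell_w, cell_h; /* size of each spreadsheet cell */
    int rows, cols;
} CellGrid;

/* Describe a rows x cols grid of cells inside one big window at (x, y). */
CellGrid grid_create(int x, int y, int cell_w, int cell_h, int rows, int cols)
{
    CellGrid g;
    g.bounds.x = x;
    g.bounds.y = y;
    g.bounds.width = cell_w * cols;
    g.bounds.height = cell_h * rows;
    g.cell_w = cell_w;
    g.cell_h = cell_h;
    g.rows = rows;
    g.cols = cols;
    return g;
}

/* Compute the rectangle of cell (row, col) by arithmetic alone; no
   per-cell window is created. Returns 0 on success, -1 if out of range. */
int grid_cell_rect(const CellGrid *g, int row, int col, Rect *out)
{
    if (row < 0 || row >= g->rows || col < 0 || col >= g->cols)
        return -1;
    out->x = g->bounds.x + col * g->cell_w;
    out->y = g->bounds.y + row * g->cell_h;
    out->width = g->cell_w;
    out->height = g->cell_h;
    return 0;
}

/* Map a screen point (e.g., a mouse click) back to a cell: the client's
   own copy of mouse tracking. Returns 0 on a hit, -1 if outside the grid. */
int grid_hit_test(const CellGrid *g, int px, int py, int *row, int *col)
{
    if (px < g->bounds.x || py < g->bounds.y ||
        px >= g->bounds.x + g->bounds.width ||
        py >= g->bounds.y + g->bounds.height)
        return -1;
    *col = (px - g->bounds.x) / g->cell_w;
    *row = (py - g->bounds.y) / g->cell_h;
    return 0;
}
```

The arithmetic is cheap, which is why clients do this, but every such client ends up reimplementing part of the window system.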

"Coding between the lines"

The client writes code in a contorted way to get the desired performance

The window example offers no good way of doing this, but a common case is a programmer who allocates objects in whatever order gives good performance through an interface, even if this is not the "natural" way to allocate them

Kiczales and others have proposed an alternative approach, called open implementation (OI)

Modules provide two interfaces: the base interface, which provides the standard functionality of the module; and the meta interface, which allows clients to control the module's implementation (in varying degrees)
This is a limited form of reflective computing, in which the underlying operation of a system is essentially fully programmable

The semantics of interfaces can be changed in a programmatic way
An aggressive example (and there are many) of reflective computing would be changing the meta-object protocol in Smalltalk-80
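
One way to picture the base/meta split for the window example is the sketch below. It is an assumption-laden toy, not Kiczales's design: WindowSystem, ws_init, ws_set_strategy, ws_mkwindow and the WS_STRATEGY_* constants are hypothetical names, and the records_allocated counter is only a stand-in for real implementation cost. The point is that the base call stays the same while a separate meta call lets the client choose the implementation strategy.

```c
/* Hypothetical open-implementation window module (illustrative only). */
typedef enum {
    WS_STRATEGY_GENERAL, /* default: tuned for a few arbitrary windows */
    WS_STRATEGY_GRID     /* client hint: many same-sized, tiled windows */
} WsStrategy;

typedef struct {
    WsStrategy strategy;
    int window_count;
    int records_allocated; /* stand-in for implementation cost */
} WindowSystem;

void ws_init(WindowSystem *ws)
{
    ws->strategy = WS_STRATEGY_GENERAL;
    ws->window_count = 0;
    ws->records_allocated = 0;
}

/* Meta interface: the client tells the module how it will be used. */
void ws_set_strategy(WindowSystem *ws, WsStrategy s)
{
    ws->strategy = s;
}

/* Base interface: the same call whatever strategy is in force.
   Returns the id of the new window. */
int ws_mkwindow(WindowSystem *ws, int w, int h, int x, int y)
{
    (void)w; (void)h; (void)x; (void)y; /* geometry is ignored in this toy */
    if (ws->strategy == WS_STRATEGY_GENERAL) {
        ws->records_allocated += 1;  /* one bookkeeping record per window */
    } else if (ws->window_count == 0) {
        ws->records_allocated = 1;   /* one shared record for the whole grid */
    }
    return ws->window_count++;
}
```

A spreadsheet client would call ws_set_strategy(&ws, WS_STRATEGY_GRID) once and then run its mkwindow loop unchanged; a client that never touches the meta interface gets the default behavior.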

OI modules already exist in many domains; Kiczales et al. have identified it as a design principle

Some examples include

Virtual memory systems have a simple basic function: a bunch of memory addresses that can be read or written. The mapping dilemmas have primarily to do with how to map virtual addresses to pages, and how to map those pages to physical memory at any given time. A classic mapping conflict happens when a client, such as a database system, does a "scan" of one portion of its memory, while doing random access to another portion. A virtual memory implementation based on an LRU replacement policy will perform poorly in this case, because the scanned pages, each touched only once, push out the randomly accessed pages that will be needed again.

One simple approach to this is captured by the Unix madvise system call:

int madvise(addr, len, behav)
    caddr_t addr;
    size_t len;
    int behav;

"The madvise subroutine permits a process to advise the system about its expected future behavior in referencing a mapped file region or anonymous memory region."

MADV_NORMAL: The system provides no further special treatment for the memory region

MADV_RANDOM: The system expects random page references to that memory region.

MADV_SEQUENTIAL: The system expects sequential page references to that memory region.

MADV_WILLNEED: The system expects the process will need these pages.

MADV_DONTNEED: The system expects the process does not need these pages

MADV_SPACEAVAIL: The system will ensure that memory resources are reserved
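
For the database example, a client might advise the system differently about its two regions. The following is a minimal sketch, assuming a Unix-like system that provides mmap with MAP_ANONYMOUS and the MADV_SEQUENTIAL and MADV_RANDOM constants (the set of MADV_* values varies between Unix variants). The helper name map_with_advice is hypothetical.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose madvise and MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous read/write region of len bytes and advise the system
   about the expected access pattern. Returns the mapping, or NULL if
   either the mapping or the advice call fails. */
void *map_with_advice(size_t len, int advice)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (madvise(p, len, advice) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

A database might then call map_with_advice(len, MADV_SEQUENTIAL) for the region it scans and map_with_advice(len, MADV_RANDOM) for the region it probes at random. The advice is only a hint: the base behavior of reads and writes is unchanged, and the system is free to ignore it.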

Parallel programming languages provide a natural way of expressing a computation to be executed in parallel. There are three principal mapping dilemmas: how to distribute the computation across the physical processors, how to distribute the data, and how to perform synchronization. A common source of mapping conflicts is the layout of matrices. As a simple example, consider that if the compiler distributes the i-th rows of two matrices A and B to the same processor, then matrix addition will require no communication overhead, but matrix multiplication will be communication intensive

HPF (High-Performance Fortran) provides some support for this using pragmas that allow the programmer to direct the compiler to use specific layouts and distributions to processors

This is, of course, not very different from register and inline in C/C++

A dissertation from UW about 8 years ago, by Gail Alverson, was similar in many ways to this. A key point of her work was that, for some parallel language constructs, the only way to get efficient implementations on some classes of machines was to keep multiple, independent base interfaces while letting the implementations of those modules interact closely and directly
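
The matrix-layout conflict can be made concrete by counting remote reads under a row distribution. The sketch below is a simplified model and rests on several assumptions: rows are assigned cyclically (row i lives on processor i mod p), the processor that owns row i of the result computes it, and every operand read from another processor counts as one remote read, with no caching or batching. The function names are hypothetical.

```c
/* Owner of a row under a cyclic row distribution over nprocs processors. */
static int owner(int row, int nprocs)
{
    return row % nprocs;
}

/* Remote operand reads for C = A + B on n x n matrices. */
long remote_reads_add(int n, int nprocs)
{
    long remote = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int home = owner(i, nprocs);          /* computes C[i][j] */
            remote += (owner(i, nprocs) != home); /* reads A[i][j] */
            remote += (owner(i, nprocs) != home); /* reads B[i][j] */
        }
    }
    return remote;
}

/* Remote operand reads for C = A * B on n x n matrices. */
long remote_reads_mul(int n, int nprocs)
{
    long remote = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int home = owner(i, nprocs);              /* computes C[i][j] */
            for (int k = 0; k < n; k++) {
                remote += (owner(i, nprocs) != home); /* reads A[i][k] */
                remote += (owner(k, nprocs) != home); /* reads B[k][j] */
            }
        }
    }
    return remote;
}
```

With n = 4 and four processors, addition needs no remote reads in this model, while multiplication needs 48: each of the 16 result elements reads three elements of B from rows held elsewhere. The same layout is ideal for one operation and poor for the other, which is why the client wants a say in it.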

There are numerous other examples from communications, from databases, more from OS, etc.

Key questions include

What are appropriate modules for OI?

I think one absolute requirement is that the function of the module is extremely well-understood; it's hard enough to build effective base interfaces at first, so trying to build a meta-interface from the beginning, without experience in terms of the functions provided, is likely to fail

Are there design techniques for producing OI modules?