CSE503: Software Engineering
Lecture 5 (April 5, 2000)

David Notkin

 

v     Today: Open implementation or, clients know best

v     One particular tenet of information hiding is the principle that clients not know about how implementations are implemented and that implementations not know about how clients will use them

v     One implication is that clients should all be satisfied with the same (e.g., any) implementation

v     Clearly, this is not so

      In particular, (as we have discussed) performance does in fact matter

      Even if we make performance a part of the interface, as we have discussed, there is a serious downside: the broad proliferation of interfaces (multiple interfaces might be identical except for performance guarantees, and this has problems of its own)

v     Example (due to Kiczales, as is much of this)

      A window system provides a mkwindow function, that creates a window of a specified size at a specified location on the screen

        It also provides mouse-tracking, manages the shared screen, etc., while hiding internal data structures, implementation details, etc.

      Somebody who wants to build a spreadsheet might write the following code:

       for i := 1 to 100
for j := 1 to 100
mkwindow(100,100,i*100,j*100)
end
end

     "Yet few experienced programmers would be surprised to learn thatits performance may be so bad as to render it useless. Why? Because the window-system implementation may not be tuned for this kind of use." (The implementor, for example, may have assumed a O(10) windows, not O(1000) windows.)

     This specific, very regularized use, of windows also admits the potential for a far more optimized implementation than may be feasible for generalized window systems

v     What are possible solutions to this shortcoming of classic information hiding (sometimes called black-box abstraction)?

      Break up into small groups for 5 minutes to develop potential solutions

      Multiple interfaces, each with their own performance guarantees

        This expands the name space, at the very least

      Multiple implementations of a single interface, allowing the client to select which implementation is desired

        How does the client specify which implementation is to be selected? (One approach is pragmas.)

        This also expands the name space, although perhaps indirectly

      A smart implementation that knows how to provide effective performance for any possible client

        Not generally feasible

      Clients can overcome the provision of a single implementation in two different ways

        "Hematomas of duplication"

       Essentially, use the basic abstraction that is provided and build another one on top of this

      In this case, allocate one big window and build another window manager that breaks that big window up into the little spreadsheet-cell-sized windows

      This happens a lot with memory managers

        "Coding between the lines"

       The client writes code in a contorted way to get the desired performance

      This example doesn't have a good way of doing it, but a common example is where the programmer allocates objects in a way that provides good performance through an interface, even if this is not the "natural" way to allocate objects

v     Kiczales and others have proposed an alternative approach, called open implementation (OI)

      Modules provide two interfaces: the base interface, which provides the standard functionality of the module; the meta interface, which allows clients to control the module's implementation (in varying degrees)

      This is a limited form of reflective computing, in which the underlying operation of a system is essentially fully-programmable

        The semantics of interfaces can be changed in a programmatic way

        An aggressive example (and there are many) of reflective computing would be changing the meta-object protocol in Smalltalk-80

v     OI modules already exist in many domains; Kiczales et al. have identified it as a design principle

v     Some examples include

      Virtual memory systems have a simple basic function: a bunch of memory addresses that can be read or written. The mapping dilemmas have primarily to do with how to map virtual addresses to pages, and how to map those pages to physical memory at any given time. A classic mapping conflict happens when a client, such as a database system, does a "scan" of one portion of its memory, while doing random access to another portion. A virtual memory implementation based on an LRU will perform poorly in this case.

        One simple approach to this is captured by the Unix madvise system call:

int madvise(addr, len, behav)

caddr_t addr;

size_t len;

int behav;

        "The madvise subroutine permits a process to advise the system about its expected future behavior in referencing a mapped file region or anonymous memory region."

        MADV_NORMAL: The system provides no further special treatment for the memory region

        MADV_RANDOM: The system expects random page references to that memory region.

        MADV_SEQUENTIAL: The system expects sequential page references to that memory region.

        MADV_WILLNEED: The system expects the process will need these pages.

        MADV_DONTNEED: The system expects the process does not need these pages

       MADV_SPACEAVAIL: The system will ensure that memory resources are reserved

      Parallel programming languages provide a natural way of expressing a computation to be executed in parallel. There are three principle mapping dilemmas: how to distribute the computation across the physical processors, how to distribute the data and how to perform synchronization. A common source of mapping conflicts is in the layout of matrices. As a simple example consider that if the compiler distributes the ith rows of two matrices A and B to the same processor, then matrix addition will require no communications overhead, but matrix multiplication will be communication intensive

        HPF (High-Performance Fortran) provides some support for this using pragmas that allow the programmer to direct the compiler to use specific layouts and distributions to processors

       This is, of course, not very different from register and inline in C/C++

        A dissertation from UW about 8 years ago, by Gail Alverson, was similar in many ways to this; a key point about Gail's work was that for some parallel language constructs, the only way to get efficient implementations on some classes of machines was to have multiple, independent base interfaces but to have the implementations of those modules interact with each other closely and directly

      There are numerous other examples from communications, from databases, more from OS, etc.

v     Key questions include

      What are appropriate modules for OI?

        I think one absolute requirement is that the function of the module is extremely well-understood; it's hard enough to build effective base interfaces at first, so trying to build a meta-interface from the beginning, without experience in terms of the functions provided, is likely to fail

      Are there design techniques for producing OI modules?

 

v     Today

      Layering

      Possible projects

v     Layering

      Layering is a standard software structuring technique

      If youve taken OS, youve seen it in its classic form in Dijkstras T.H.E. system

        Level 5: User Programs

        Level 4: Buffering for input and output devices

        Level 3: Operator Console Device Driver

        Level 2: Memory Management (Software Paging)

        Level 1: CPU Scheduling (priority) P and V operations

        Level 0: Hardware

      Parnas paper discusses layering in terms of the potential ease of extension and contraction of systems

      Strictly, one layer can only use operations at the next lower level

        This constrains, in the extreme, invocations between components at a given level

        This constrains recursion

        It prohibits calling down multiple levels

      But layering does allow for a nice style of building: a lower layer can be debugged and tested without concern for the higher layers, since there are by definition no dependences

v     Invokes vs. uses

      Parnas discusses how the uses relation is distinct from the invokes relation

        A program A uses a program B if the correctness of A depends on the presence of a correct version of B

        Requires specification and implementation of A and the specification of B

        Again, what is the specification? The interface? Implied or informal semantics?

      Here's an example where invokes does not imply uses

        Invocation without use: name service with cached hints

       ipAddr := cache(hostName);
if not(ping(ipAddr))
ipAddr := lookup(hostName)
endif

v     What is the relationship, if any, between layering and information hiding modules?

      HW2 has one question in this regard

      Strict application of layering in operating systems is hard; for example, if you have lots of processes, you may not be able to store all your process tables in memory, so you might have a dependency from process table management to virtual memory

      Here's a small example of this, taken from the FAMOS operating system (family of operating systems) research project from the early 1980's

        An OS might have both a "process ADT" and a "segment ADT", each of which hides secrets such as the data structures to support these functions

        But in terms of layering, you might have the structure:

       Level 3: Segment management
Level 2: Process management
Level 1: Segment creation
Level 0: Process creation

        So, the information hiding modules (the ADTs) span levels in this situation

v     Language support for layering

      Why do we have tons of language support for information hiding modules, but essentially none for layering?