Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 1: Design
Reconciling Environment Integration and Component Independence by Sullivan
and Notkin.
This paper presents an interesting design technique based on mediators and
events. The goal is to isolate decisions that are likely to change as the
project evolves. At a high level, an environment is a collection of tools and
the relationships between them. Either set can change, so the two should be
designed as separate components. The common practice is to give each tool its
own component, but the relationships end up dispersed throughout those tool
objects. Components should be
kept independent of relationships in three ways - they should be able to
execute without having to participate in particular relationships, their
source should be defined without reference to relationships, and
participation in relationships should not prevent other components from
accessing them. Relationships commonly are defined in separate objects that
hide the components they relate - this prevents independent access to the
components and makes it difficult to add relationships involving them.
Otherwise, relationships are embedded in their component objects and this
produces components designed to participate in particular relationships
only. The authors then survey several common design approaches and point out
the problems with each. Encapsulation creates relationship objects that hide
the components they relate; hardwiring distributes relationship code across
the components, making it hard to evolve existing relationships or add new
ones. A purely event-based design is then presented; it does make components
independent of the relationships they participate in, but without separate
relationship objects the relationship code is still scattered across the
component objects.
So the authors present their solution of mediators and events. Mediators are
first class objects designed to represent and maintain relationships amongst
other components (these could be mediators too). The mediators have to
remain independent of the relationships in which they participate. Events
increase independence by enabling communication without requiring statically
defined connections. Event mechanisms should satisfy four requirements: events
should be declared explicitly; any component (including a mediator) should be
able to declare events; event names and signatures should not be system
defined; and the parameters passed should be definable at registration time.
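To make the idea concrete, here is a minimal sketch of my own (not the
authors' implementation): each component declares and announces its own
events, and a mediator registers for them to maintain a relationship between
two otherwise independent components.

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // A component declares its own event and announces it to registered
    // handlers; it neither knows nor cares who is listening.
    class Editor {
    public:
        using ChangedHandler = std::function<void(const std::string&)>;
        void registerChanged(ChangedHandler h) { handlers_.push_back(std::move(h)); }
        void edit(const std::string& text) {
            // ... perform the edit, then announce the event
            for (auto& h : handlers_) h(text);
        }
    private:
        std::vector<ChangedHandler> handlers_;
    };

    class Compiler {
    public:
        void recompile(const std::string& text) {
            std::cout << "recompiling: " << text << "\n";
        }
    };

    // The relationship "keep the compiler up to date with the editor" lives
    // in a first-class mediator rather than inside either component.
    class BuildMediator {
    public:
        BuildMediator(Editor& e, Compiler& c) {
            e.registerChanged([&c](const std::string& text) { c.recompile(text); });
        }
    };

    int main() {
        Editor editor;
        Compiler compiler;
        BuildMediator mediator(editor, compiler);
        editor.edit("int main() {}");
    }
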
The authors then describe their lightweight design of such a system and the
applications it has been used in. They also point out that most existing
systems supporting similar constructs fail to meet all the requirements laid
out in the paper. Problems remaining to be solved include designing
asynchronous event mechanisms and handling multiple address spaces and
distribution. Open questions include guaranteeing global consistency and
controlling concurrency, since any component can be accessed by any other
component. Another potential problem is the understandability of the system's
code, since events give the reader no indication of who a component's clients
are.
Components, Frameworks, Patterns by Johnson
Frameworks can be defined as "a reusable design of all or part of a system
that is represented by a set of abstract classes and the way their instances
interact" or "the skeleton of an application that can be customized by an
application developer" depending on whether we are defining their structure
or purpose. The author claims we always have to trade simplicity for power
in assembling systems. Software reuse can be design or code reuse. One of
the main problems with design reuse is capturing and expressing it.
Frameworks are in between these two forms. They use an object-oriented
language as the design notation. The motivations for using frameworks are to
save time and money during development and to get uniformity in areas such as
the UI and network protocols, since the framework embodies standards for
these. Uniformity also reduces maintenance costs, because maintenance
programmers can move between different framework-based apps without having to
learn a new design every time.
Frameworks also allow the building of open systems, since components built on
the framework can be mixed and matched. Frameworks enable the reuse of
analysis as well, by providing a common vocabulary for discussing problems.
The key idea
underlying a framework is the abstract class. The abstract class can leave
some methods unimplemented (abstract methods) or provide a replaceable
default implementation (hook methods). Frameworks use the three main
features of OO languages, namely data abstraction (abstract classes),
polymorphism (allowing an object to change its collaborators at run time)
and inheritance (making it easy to build a new component). The framework
describes the system architecture, the objects in it and their interactions.
Compared with traditional library use, there is an inversion of control: the
framework determines the flow of control, and the developer's code gets
called by the framework. There are three ways to use a framework: by simply
connecting existing components, by defining new subclasses, and by extending
the abstract classes that form the framework itself. The best way to learn a
framework is by studying examples of its usage. Several factors
need to be considered in evaluating whether a framework is suited to a
project, such as the platform, programming language, standards supported,
reliability, performance, learning costs, and ease of customizing and
extending. Ways of testing for these qualities include talking to existing
customers of the framework, developing some in-house apps to build expertise
and experience, and talking to consultants, although they may have their own
biases. Developing a framework is difficult and usually takes several
iterations. Iterations are necessary because on the first pass designers
usually do domain analysis through toy examples. The framework is then used
to build real applications, and this exposes unanticipated problems. A
framework must make explicit the things that are likely to change, and
experience is the surest way of identifying them. Every additional example
considered over time makes the framework more general and reusable. Problems
with
frameworks are that because they are powerful and complex, they are hard to
learn. They require better documentation than other systems and longer
training. They are also very difficult to develop. Use of a framework
restricts the system to the language of the framework. The framework also
reflects all the problems of its underlying language.
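To illustrate the abstract-class and inversion-of-control ideas, here is a
small sketch of my own (not taken from the paper): the framework class drives
the flow of control and calls back into the application writer's subclass
through an abstract method and a hook method.

    #include <iostream>
    #include <string>

    // Framework-supplied abstract class.
    class Application {
    public:
        virtual ~Application() = default;
        // Inversion of control: the framework determines the flow of control
        // and calls the developer's code.
        void run() {
            std::cout << "== " << title() << " ==\n";
            openDocument();
        }
    protected:
        virtual void openDocument() = 0;                    // abstract method
        virtual std::string title() { return "Untitled"; }  // hook method with a default
    };

    // Developer's code: customizes the framework by subclassing.
    class DrawingApp : public Application {
    protected:
        void openDocument() override { std::cout << "opening a drawing\n"; }
        std::string title() override { return "Draw"; }
    };

    int main() {
        DrawingApp app;
        app.run();
    }
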
On the Criteria to Be Used in Decomposing Systems into Modules by Parnas
A system should be decomposed so that every task represents a module. Each
module, its inputs and outputs and its interfaces with other modules should
be well defined. The benefits of modular programming include shorter
development time, because separate groups can work on their modules with
little need for communication; greater flexibility, because one module can be
changed drastically without affecting the others; and better
comprehensibility, because a module can be studied and understood without
having to understand the entire system. Two possible decompositions of
an example system are then considered. The first one breaks the system down
by control flow and the second one uses the information hiding principle.
Each module in the second design is characterized by its knowledge of a
design decision, which it hides from all others. In the first design the
data format is used by all the modules. This implies that all development
groups would have to participate in its design. This is inefficient by
itself, and if the format had to change, all the modules would have to be
updated. Other advisable features of a decomposition are: a data structure,
its internal linking, and its accessing and modifying procedures should be
part of the same module; the sequence of instructions needed to call a
routine and the routine itself should be in the same module; the formats of
control blocks must be hidden in a control block module, since these change
frequently; character codes, sort orders, and similar data should be hidden
in a module for greatest flexibility; and the sequence in which certain items
will be processed should be hidden. The paper then states that clean
decomposition and hierarchy are two desirable but independent properties of a
system structure. Hierarchy gives two additional benefits. First, the upper
parts of the system are simplified because they can use the services of the
lower layers. Second, it is possible to cut off the upper levels and still
have a usable and useful product that can also be used in other products. The
big contribution of the paper is the
idea that instead of decomposing a system into modules based on its
flowchart, one should begin with a list of difficult design decisions and
decisions that are likely to change and design modules to hide such
decisions from other modules.
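A small sketch of my own along these lines (not from the paper): the module
below hides one design decision - how lines are stored - behind accessing and
modifying procedures, so that decision can change without touching any other
module.

    #include <cstddef>
    #include <string>
    #include <vector>

    // The module's interface exposes operations on lines; the storage format
    // is the module's "secret" and can be replaced without affecting clients.
    class LineStorage {
    public:
        void addLine(const std::string& line) { lines_.push_back(line); }
        std::string line(std::size_t index) const { return lines_.at(index); }
        std::size_t count() const { return lines_.size(); }
    private:
        std::vector<std::string> lines_;  // hidden design decision: how lines are stored
    };
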
Designing Software for Ease of Extension and Contraction by Parnas
Software is usually designed as if we were designing a single product. In
reality we are designing a family of products. We want to exploit
commonalities, share code and reduce maintenance costs. Members of a program
family can differ in hardware configurations, input and output data formats,
data structures and algorithms, data set sizes and feature sets being
subsets or supersets of the other members. Designers must be taught to try
to anticipate changes and design for easy alteration of the system when the
change does occur. The obstacles encountered in trying to expand or shrink
typically fall into four categories. Excessive information distribution
causes too many components to be written with the same assumptions built in.
A chain of data-transforming components makes it difficult to remove a link
in the chain. Components that perform more than one function make it
difficult to use any one of those functions by itself. Loops in the "uses"
relation cause interdependencies among modules, as a result of which all the
modules must work before any one of them works fully. While designing a
system, one
should first search for the minimal subset that might conceivably perform a
usable service and then look for a set of minimal increments to the system.
This avoids components that perform more than one function. Information
hiding should then be built into the system. This involves identification of
items that are likely to change ("secrets"). These are then located in
separate modules and intermodule interfaces are defined so that they are
insensitive to the anticipated changes i.e. the secrets of the module are
not revealed by the interface. There is never any reason for a component to
know how many other programs use it. The Virtual Machine concept is also
useful in building a system. The VM instructions are designed to be
generally useful but if a particular program does not use them, they can be
left out. To achieve a true VM the hardware instructions must be unavailable
to the VM client. The VM should be built incrementally and each increment is
usually a useful subset of the system. The "Uses" structure of a system must
be designed carefully. A uses B if the correct functioning of A depends on
the availability of a correct implementation of B. Unrestrained usage of
other modules leads to a system with modules that are highly interdependent.
The uses hierarchy should be loop free in order to reap the benefits of the
uses relationship. If such a hierarchy exists then each level offers a
testable and usable subset of the system. A should be allowed to use B only
if A is essentially simpler because it uses B, B is not substantially more
complex because it is not allowed to use A, there is a useful subset
containing B and not A, and there is no conceivably usable subset containing
A and not B. At the end of the paper there is an interesting point made
about software generality vs. flexibility. Software is general if it can be
used without change in a variety of situations; it is flexible if it can be
easily changed for use in a variety of situations. It appears unavoidable
that generality carries a run-time cost, while flexibility incurs a
design-time cost. One should incur the design-time cost only if one expects
to recover it when changes are made.
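Here is a rough sketch of my own (not from the paper) of a loop-free "uses"
hierarchy as described above: the lower level is a usable subset by itself,
and the upper level uses it without the dependency ever running the other
way, so the system can be contracted to the lower level or extended above the
upper one.

    #include <map>
    #include <string>

    // Level 0: a minimal service that is usable by itself.
    class KeyValueStore {
    public:
        void put(const std::string& key, const std::string& value) { data_[key] = value; }
        std::string get(const std::string& key) const {
            auto it = data_.find(key);
            return it == data_.end() ? std::string() : it->second;
        }
    private:
        std::map<std::string, std::string> data_;
    };

    // Level 1: an increment that uses Level 0; Level 0 never uses Level 1,
    // so there is no loop in the "uses" relation.
    class ConfigService {
    public:
        explicit ConfigService(KeyValueStore& store) : store_(store) {}
        void setOption(const std::string& name, const std::string& value) {
            store_.put("config/" + name, value);
        }
        std::string option(const std::string& name) const {
            return store_.get("config/" + name);
        }
    private:
        KeyValueStore& store_;
    };
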
Design Patterns: Abstraction and Reuse of Object-Oriented Design by Gamma,
Helm, Johnson and Vlissides.
Design patterns are a new mechanism for expressing design structures. They
identify, name and abstract common themes in object oriented design. Design
patterns are useful because they provide a common vocabulary for designers,
they constitute a reusable base of experience for building reusable
software, they reduce the learning time for a class library and provide a
target for the reorganization of class hierarchies. This paper defines
design patterns, provides a means to describe them, defines a system for
their classification and presents a catalog of patterns discovered by the
authors. Three essential parts make up a design pattern: an abstract
description of a class or object collaboration and its structure, the issue
in system design addressed by that structure, and the consequences of
applying the structure to a system's architecture. Design patterns are
classified by two orthogonal criteria: jurisdiction and characterization.
Jurisdiction is
the domain over which a pattern applies. Patterns having class jurisdiction
deal with the relationships between base classes and their subclasses,
covering static semantics. The object jurisdiction concerns relationships
between peer objects. Compound jurisdiction deals with recursive object
structures. Characterization reflects what a pattern does. Creational
patterns concern the process of object creation. Structural patterns deal
with the composition of classes or objects. Behavioral patterns characterize
the ways in which classes or objects interact and distribute responsibility.
The paper concludes with a summary of observations. Design patterns motivate
developers to go beyond concrete objects. Design patterns can help name
classes (e.g. embedding the pattern name in the class name) and this
enhances readability of code. Patterns can often be applied after the first
implementation of an architecture to improve its design. Patterns are an
effective way to teach object oriented design. Patterns are suited to reuse
because they are abstract. They also reduce the effort needed to learn a
class library.
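As one concrete illustration (my own example, using the well-known Composite
pattern from the design patterns literature), a pattern with compound
jurisdiction captures a recursive object structure in which groups and
individual objects share a single interface:

    #include <memory>
    #include <vector>

    // Common interface shared by leaf objects and composites.
    class Graphic {
    public:
        virtual ~Graphic() = default;
        virtual void draw() const = 0;
    };

    // A leaf object.
    class Line : public Graphic {
    public:
        void draw() const override { /* draw a single line */ }
    };

    // A composite that treats its children uniformly through the Graphic interface.
    class Picture : public Graphic {
    public:
        void add(std::unique_ptr<Graphic> child) { children_.push_back(std::move(child)); }
        void draw() const override {
            for (const auto& child : children_) child->draw();
        }
    private:
        std::vector<std::unique_ptr<Graphic>> children_;
    };
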
Experience Assessing an Architectural Approach to Large-Scale Systematic
Reuse by Sullivan and Knight.
This paper addresses the important technical barrier to large-scale reuse
called architectural mismatch and evaluates OLE to see if it ameliorates
this problem by using it to develop a fault tree analysis tool. Garlan et
al. identified four categories of architectural mismatch: incompatible
assumptions about the nature of the components, the nature of the connectors,
the global architectural structure, and the construction process. The paper
first
gives a brief overview of OLE and OLE automation (an object-based framework,
support for explicit and implicit invocation across process boundaries, a
binary standard, a multiple-interface model, a single-interface model for
automation, and both compile-time and run-time binding). Most applications
devote less than 10% of their code to the overt function of the system; the
other 90% goes into system or administrative code such as I/O, the GUI, text
editing, dialogs and standard graphics, communications, data validation,
audit trails, and so on. This high cost of commercial software delivery
vehicles -
of all the superstructure needed to make a new technique truly useful in
practice - impedes the transfer of innovations to the market. The authors
implemented a tool for fault tree analysis. The analysis function itself is
under 10,000 lines of C++ code. However, industrial users would look for
features such as a database for fault tree storage, graphical rendering and
direct manipulation interfaces, graphics and database views and analysis
reports prepared for technical and managerial audiences and clean
integration of the tool into the organization's overall process. The tool
thus needed a point and click interface, node annotation capability, tree
layout algorithms, storage and retrieval facilities, manipulation of
collections of trees, simple analysis support, ability to invoke specialized
analysis packages etc. The tool was designed to represent the tree nodes as
Access records and shapes in Visio, and it needs to ensure the consistency
of the data between the two apps. A mediator was designed for this. The
mediator interacts with the two apps both explicitly and implicitly
(procedure call and event notification) and also provides a UI for the system
as a whole. The authors succeeded in building most of the system they
designed and most of the problems they encountered were not problems with
OLE itself but rather, the design of the component applications. Access 3.0
did not support OLE automation (fixed in Access 4). VB 3.0 did not support
being an OLE automation server (fixed in VB4) and the event interface
exposed by Visio was not rich enough. Observations from the experiment were:
OLE did enable a very high level of productivity; the fault analysis code
itself was only 10,000 lines, whereas the functionality of the overall tool
encompasses several million lines of code. The tool demonstrates
industrial-strength capability, provides the familiar Windows look and feel,
and achieves adequate performance. Interesting design issues encountered
were: Visio's event interface was incomplete. Component interfaces should
provide complete, consistent and timely information about their state via
events. Object naming should be consistent. Visio shapes failed to provide
lightweight object IDs. Component adaptability was insufficient, in that it
was not possible to disable extra functionality in the components. Visio's
shapes, however, do provide extra slots in which client information can be
stored; making components extensible in this way is important. The conclusion
of the paper is that OLE
avoids the complex architectural mismatch problems and defines a complete
framework of design standards intended to support integration. However, if
components are to be composable, they have to be designed for it.
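The overall structure of the tool, as I understand it from the paper, is
roughly sketched below. GraphView and RecordStore are hypothetical stand-ins
for the Visio and Access automation objects (not real OLE interfaces); the
mediator calls the components explicitly and registers for their events to
keep the drawing and the database consistent.

    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the drawing component's automation interface.
    class GraphView {
    public:
        using ShapeAddedHandler = std::function<void(int, const std::string&)>;
        void onShapeAdded(ShapeAddedHandler h) { handlers_.push_back(std::move(h)); }
        void addShape(int id, const std::string& label) {   // explicit invocation
            for (auto& h : handlers_) h(id, label);          // implicit invocation (event)
        }
    private:
        std::vector<ShapeAddedHandler> handlers_;
    };

    // Hypothetical stand-in for the database component's automation interface.
    class RecordStore {
    public:
        void insertNode(int id, const std::string& label) { /* store the record */ }
    };

    // The mediator keeps the two representations of a fault tree node consistent.
    class FaultTreeMediator {
    public:
        FaultTreeMediator(GraphView& view, RecordStore& db) {
            view.onShapeAdded([&db](int id, const std::string& label) {
                db.insertNode(id, label);
            });
        }
    };
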
Beyond the Black Box: Open Implementation by Kiczales
The traditional black-box view of abstraction holds that a module should
expose its functionality but hide its implementation. This approach has several
advantages but it can lead to serious performance difficulties. Open
implementation is the principle that a module can be more useful and more
reusable if it allows clients to control its implementation strategy. In
some cases, the best implementation strategy for a module cannot be
determined unless the implementer knows just how the module will be used.
Open implementation is based on the conclusion that it is impossible to hide
all implementation issues behind a module interface. Module implementations
must be opened up to allow clients control over implementation details. The
separation of control principle leads to a design where the client requests
functionality through a primary interface and there also exists a secondary
(but coupled) meta-interface through which the client tunes the
implementation underlying the primary interface. This enhances readability
of the code too since the reader can focus on the functionality and ignore
the secondary interface to start with. The goal of open implementation is to
allow the client to use the primary interface alone when the default
implementation is sufficient, control the module's implementation strategy
decisions when necessary and deal with functionality and implementation
strategy decisions in largely separate ways. Research has focused on
computational reflection, which explores issues of how modules can provide
interfaces for examining and adjusting themselves. Other active research
issues are: how to design meta-interfaces that give clients control over
implementation strategy decisions without drowning them in details, how to
decide what implementation strategy decisions to expose, technologies to
support open implementation, and finding more examples of existing ad-hoc
open implementations and studying them to learn more about the approach.
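A minimal sketch of my own (not from the paper) of the primary/meta-interface
split: clients that are happy with the default implementation never touch the
meta-interface, while clients that know their usage pattern can select a
different implementation strategy.

    #include <set>
    #include <string>
    #include <unordered_set>

    class SymbolTable {
    public:
        enum class Strategy { Ordered, Hashed };

        // Meta-interface: lets the client choose the implementation strategy.
        explicit SymbolTable(Strategy strategy = Strategy::Ordered) : strategy_(strategy) {}

        // Primary interface: the functionality clients normally program against.
        void insert(const std::string& symbol) {
            if (strategy_ == Strategy::Ordered) ordered_.insert(symbol);
            else hashed_.insert(symbol);
        }
        bool contains(const std::string& symbol) const {
            return strategy_ == Strategy::Ordered ? ordered_.count(symbol) > 0
                                                  : hashed_.count(symbol) > 0;
        }

    private:
        Strategy strategy_;
        std::set<std::string> ordered_;          // good for ordered traversal
        std::unordered_set<std::string> hashed_; // good for large, lookup-heavy use
    };
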
An Introduction to Software Architecture by Garlan and Shaw
As the size of software systems increases, the algorithms and data
structures no longer constitute the major design problems. The organization of
the overall system - the architecture - presents a new set of design
problems. Structural issues include gross organization and global control
structure, protocols for communication, synchronization, data access,
assignment of functionality to design elements, physical distribution,
composition of design elements, scaling and performance, and selection among
design alternatives. Effective software engineering needs knowledge of
architectural design, for several reasons. First, it lets designers recognize
common paradigms, so that high-level relationships among systems can be
understood and new systems can be built as variations of old ones. Second,
getting the architecture right is
often crucial to the success of the system. Third, understanding of
architectures allows the designer to make principled choices among the
design alternatives. Fourth, an architectural system representation is often
essential to the analysis and description of a complex system. Common
architectural styles include pipes and filters, data abstraction and
object-oriented organization, event-based implicit invocation, layered
systems, repositories, and table-driven interpreters. The paper then presents
the pros and cons of each of these styles and outlines case studies involving
different architectures.
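As a small illustration of one of these styles, here is a pipe-and-filter
sketch of my own (not from the paper) in which each filter transforms the
data independently and the pipeline composes them without the filters knowing
about one another.

    #include <algorithm>
    #include <cctype>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    using Filter = std::function<std::string(const std::string&)>;

    // The pipeline passes the output of each filter to the next.
    std::string runPipeline(std::string data, const std::vector<Filter>& filters) {
        for (const auto& filter : filters) data = filter(data);
        return data;
    }

    int main() {
        std::vector<Filter> pipeline = {
            [](const std::string& s) {             // filter 1: upper-case the text
                std::string out = s;
                std::transform(out.begin(), out.end(), out.begin(),
                               [](unsigned char c) { return std::toupper(c); });
                return out;
            },
            [](const std::string& s) {             // filter 2: reverse the text
                return std::string(s.rbegin(), s.rend());
            }
        };
        std::cout << runPipeline("pipes and filters", pipeline) << "\n";
    }
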
The readings present many very interesting ideas. Some of these are
principles that I have applied only vaguely, since they get handed around by
word of mouth. Design for change, information hiding, and the like are all
mantras we have grown up with, but it is very instructive to read how these
ideas were formed and to see that a deliberate process can be applied to
designing with these principles in mind. I work in Microsoft Office, one of
the largest code
bases around at Microsoft. It will be interesting to apply some of these
principles to the Office code. A lot of these concepts apply particularly to
the teams I work with because we write shared code that is implemented in
one DLL and used by all the Office apps (Word, Excel, PowerPoint etc). A lot
of times we end up writing code that is closely tied to the peculiarities of
each app, and when a new client comes along we do not have a good way of
extending the implementation. Putting careful thought into the design pays
off greatly for us. For example, in Office 97 the new command bars were
implemented as shared code. The only clients initially were Word and Excel.
However, the list of clients gradually grew to encompass most Microsoft apps
and the command bar code was eventually split off into a DLL of its own. If
we had not put in the design time to make this code independent of its
clients, we would have had to reimplement the entire command bar code base
when the new clients came on board. Another design-for-change situation that
arose for me in the release of Office we are currently working on involved
storage (and layering, which Parnas also discusses in his paper).
I had to implement a feature that used an OLE standard called document
properties to persist its information in each client. I could simply have
assumed that every client supported document properties and written to them
directly, but I decided to abstract the reading and writing of persistent
storage into a layer of its own. Late in the development cycle we discovered
that one of our clients
could not write this structure to its document properties. So we ended up
with a different solution for persisting to this client. If the persisting
code had been dispersed through the entire code for the feature, this would
have caused a schedule slip. Luckily, my layer saved me and I only had to
implement the client part of the persistence differently for this client.
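The shape of that layer was roughly as follows (a sketch with hypothetical
names, not the actual Office code): the feature programs against an abstract
store, and each client plugs in whatever persistence mechanism it can
support.

    #include <string>

    // The abstraction the feature codes against.
    class PropertyStore {
    public:
        virtual ~PropertyStore() = default;
        virtual void write(const std::string& key, const std::string& value) = 0;
        virtual std::string read(const std::string& key) = 0;
    };

    // Default implementation: persist via the client's document properties.
    class DocPropertiesStore : public PropertyStore {
    public:
        void write(const std::string& key, const std::string& value) override {
            // ... write to the host document's properties
        }
        std::string read(const std::string& key) override {
            // ... read back from the host document's properties
            return std::string();
        }
    };

    // Alternate implementation for the client that could not use document properties.
    class CustomStreamStore : public PropertyStore {
    public:
        void write(const std::string& key, const std::string& value) override {
            // ... persist to a client-specific location instead
        }
        std::string read(const std::string& key) override { return std::string(); }
    };
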
We have used events as a client-independent way of relaying information to
the clients in several of our larger features, and it is a pretty popular
design approach here. It is also useful when investigating bugs, because the
event handlers are good places to set breakpoints for studying interactions
between the client and server code. The event handlers are also well-known
entry points and so are good starting points for an investigation. We use
Parnas's VM concept extensively to write platform-independent code. However,
the biggest problem is dealing with a huge code base that has evolved over
more than a decade. How does one keep it structured and educate all the new
programmers who start adding code to it every year? How does one retrofit
new features into the existing code base with minimal regression of
functionality while keeping the structural assumptions valid? I would be
very interested in reading what researchers have to say about these
problems.