Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 1: Design
Reconciling Environment Integration and Component Independence by Sullivan
and Notkin.
This paper presents an interesting design technique based on mediators and
events. The goal is to isolate decisions that are likely to change as the
project evolves. At a high level, an environment is a collection of tools and
the relationships between them. Either set can change, so the two should be
designed as separate components. The common practice is to give each tool its
own component, but the relationships end up dispersed throughout those tool
objects. Components should be
kept independent of relationships in three ways - they should be able to
execute without having to participate in particular relationships, their
source should be defined without reference to relationships, and
participation in relationships should not prevent other components from
accessing them. Relationships commonly are defined in separate objects that
hide the components they relate - this prevents independent access to the
components and makes it difficult to add relationships involving them.
Otherwise, relationships are embedded in their component objects and this
produces components designed to participate in particular relationships
only. The authors then survey several common design approaches and point out
the problems with each. Encapsulation creates relationship objects that hide
the components they relate; hardwiring distributes relationship code across
the components, making it hard to evolve existing relationships or add new
ones. A purely event-based design is then presented; it does make components
independent of the relationships they participate in, but without separate
relationship objects the relationship code is still scattered across the
component objects.
So the authors present their solution of mediators and events. Mediators are
first class objects designed to represent and maintain relationships amongst
other components (these could be mediators too). The mediators have to
remain independent of the relationships in which they participate. Events
increase independence by enabling communication without requiring statically
defined connections. Event mechanisms should satisfy four requirements: events
should be declared explicitly; any component (including a mediator) should be
able to declare events; event names and signatures should not be system
defined; and the parameters passed should be definable at registration time.
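To make the idea concrete, here is a minimal sketch of my own (not the
authors' implementation): each component declares and announces its own
events, and a mediator registers for them to maintain a relationship between
two otherwise independent components.

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // A component declares its own event and announces it to registered
    // handlers; it neither knows nor cares who is listening.
    class Editor {
    public:
        using ChangedHandler = std::function<void(const std::string&)>;
        void registerChanged(ChangedHandler h) { handlers_.push_back(std::move(h)); }
        void edit(const std::string& text) {
            // ... perform the edit, then announce the event
            for (auto& h : handlers_) h(text);
        }
    private:
        std::vector<ChangedHandler> handlers_;
    };

    class Compiler {
    public:
        void recompile(const std::string& text) {
            std::cout << "recompiling: " << text << "\n";
        }
    };

    // The relationship "keep the compiler up to date with the editor" lives
    // in a first-class mediator rather than inside either component.
    class BuildMediator {
    public:
        BuildMediator(Editor& e, Compiler& c) {
            e.registerChanged([&c](const std::string& text) { c.recompile(text); });
        }
    };

    int main() {
        Editor editor;
        Compiler compiler;
        BuildMediator mediator(editor, compiler);
        editor.edit("int main() {}");
    }
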
The authors then describe their lightweight design of such a system and the
applications it has been used in. They also point out that most existing
systems supporting similar constructs fail to meet all the requirements laid
out in the paper. Problems remaining to be solved include designing
asynchronous event mechanisms and handling multiple address spaces and
distribution. Open questions include guaranteeing global consistency and
controlling concurrency, since any component can be accessed by any other
component. Another potential problem is the understandability of the system's
code, since events give the reader no indication of who a component's clients
are.
Components, Frameworks, Patterns by Johnson
Frameworks can be defined as "a reusable design of all or part of a system
that is represented by a set of abstract classes and the way their instances
interact" or "the skeleton of an application that can be customized by an
application developer" depending on whether we are defining their structure
or purpose. The author claims we always have to trade simplicity for power
in assembling systems. Software reuse can be design or code reuse. One of
the main problems with design reuse is capturing and expressing it.
Frameworks are in between these two forms. They use an object-oriented
language as the design notation. The motivations for using frameworks are to
save time and money during development and to get uniformity in areas such as
the UI and network protocols, since the framework embodies standards for
these. Uniformity also reduces maintenance costs, because maintenance
programmers can move between different framework-based apps without having to
learn a new design every time.
Frameworks also allow the building of open systems, since components built on
the framework can be mixed and matched. Frameworks enable the reuse of
analysis as well, by providing a common vocabulary for discussing problems.
The key idea
underlying a framework is the abstract class. The abstract class can leave
some methods unimplemented (abstract methods) or provide a replaceable
default implementation (hook methods). Frameworks use the three main
features of OO languages, namely data abstraction (abstract classes),
polymorphism (allowing an object to change its collaborators at run time)
and inheritance (making it easy to build a new component). The framework
describes the system architecture, the objects in it and their interactions.
Compared with traditional library use, there is an inversion of control: the
framework determines the flow of control, and the developer's code gets
called by the framework. There are three ways to use a framework: by simply
connecting existing components, by defining new subclasses, and by extending
the abstract classes that form the framework itself. The best way to learn a
framework is by studying examples of its usage. Several factors
need to be considered in evaluating whether a framework is suited to a
project, such as the platform, programming language, standards supported,
reliability, performance, learning costs, and ease of customizing and
extending. Ways of testing for these qualities include talking to existing
customers of the framework, developing some in-house apps to build expertise
and experience, and talking to consultants, although they may have their own
biases. Developing a framework is difficult and usually takes several
iterations. Iterations are necessary because on the first pass designers
usually do domain analysis through toy examples. The framework is then used
to build real applications, and this exposes unanticipated problems. A
framework must make explicit the things that are likely to change, and
experience is the surest way of identifying them. Every additional example
considered over time makes the framework more general and reusable. Problems
with
frameworks are that because they are powerful and complex, they are hard to
learn. They require better documentation than other systems and longer
training. They are also very difficult to develop. Use of a framework
restricts the system to the language of the framework. The framework also
reflects all the problems of its underlying language.
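To illustrate the abstract-class and inversion-of-control ideas, here is a
small sketch of my own (not taken from the paper): the framework class drives
the flow of control and calls back into the application writer's subclass
through an abstract method and a hook method.

    #include <iostream>
    #include <string>

    // Framework-supplied abstract class.
    class Application {
    public:
        virtual ~Application() = default;
        // Inversion of control: the framework determines the flow of control
        // and calls the developer's code.
        void run() {
            std::cout << "== " << title() << " ==\n";
            openDocument();
        }
    protected:
        virtual void openDocument() = 0;                    // abstract method
        virtual std::string title() { return "Untitled"; }  // hook method with a default
    };

    // Developer's code: customizes the framework by subclassing.
    class DrawingApp : public Application {
    protected:
        void openDocument() override { std::cout << "opening a drawing\n"; }
        std::string title() override { return "Draw"; }
    };

    int main() {
        DrawingApp app;
        app.run();
    }
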
On the Criteria to Be Used in Decomposing Systems into Modules by Parnas
A system should be decomposed so that every task represents a module. Each
module, its inputs and outputs and its interfaces with other modules should
be well defined. The benefits of modular programming include shorter
development time, because separate groups can work on their modules with
little need for communication; greater flexibility, because one module can be
changed drastically without affecting the others; and better
comprehensibility, because a module can be studied and understood without
having to understand the entire system. Two possible decompositions of
an example system are then considered. The first one breaks the system down
by control flow and the second one uses the information hiding principle.
Each module in the second design is characterized by its knowledge of a
design decision, which it hides from all others. In the first design the
data format is used by all the modules. This implies that all development
groups would have to participate in its design. This is inefficient by
itself, and if the format had to change, all the modules would have to be
updated. Other advisable features of a decomposition are: a data structure,
its internal linking, and its accessing and modifying procedures should be
part of the same module; the sequence of instructions needed to call a
routine and the routine itself should be in the same module; the formats of
control blocks must be hidden in a control block module, since these change
frequently; character codes, sort orders, and similar data should be hidden
in a module for greatest flexibility; and the sequence in which certain items
will be processed should be hidden. The paper then states that clean
decomposition and hierarchy are two desirable but independent properties of a
system structure. Hierarchy gives two additional benefits. First, the upper
parts of the system are simplified because they can use the services of the
lower layers. Second, it is possible to cut off the upper levels and still
have a usable and useful product that can also be used in other products. The
big contribution of the paper is the
idea that instead of decomposing a system into modules based on its
flowchart, one should begin with a list of difficult design decisions and
decisions that are likely to change and design modules to hide such
decisions from other modules.
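A small sketch of my own along these lines (not from the paper): the module
below hides one design decision - how lines are stored - behind accessing and
modifying procedures, so that decision can change without touching any other
module.

    #include <cstddef>
    #include <string>
    #include <vector>

    // The module's interface exposes operations on lines; the storage format
    // is the module's "secret" and can be replaced without affecting clients.
    class LineStorage {
    public:
        void addLine(const std::string& line) { lines_.push_back(line); }
        std::string line(std::size_t index) const { return lines_.at(index); }
        std::size_t count() const { return lines_.size(); }
    private:
        std::vector<std::string> lines_;  // hidden design decision: how lines are stored
    };
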
Designing Software for Ease of Extension and Contraction by Parnas
Software is usually designed as if we were designing a single product. In
reality we are designing a family of products. We want to exploit
commonalities, share code and reduce maintenance costs. Members of a program
family can differ in hardware configurations, input and output data formats,
data structures and algorithms, data set sizes and feature sets being
subsets or supersets of the other members. Designers must be taught to try
to anticipate changes and design for easy alteration of the system when the
change does occur. The obstacles encountered in trying to expand or shrink
typically fall into four categories. Excessive information distribution
causes too many components to be written with the same assumptions built in.
A chain of data-transforming components makes it difficult to remove a link
in the chain. Components that perform more than one function make it
difficult to use any one of those functions by itself. Loops in the "uses"
relation cause interdependencies among modules, as a result of which all the
modules must work before any one of them works fully. While designing a
system, one
should first search for the minimal subset that might conceivably perform a
usable service and then look for a set of minimal increments to the system.
This avoids components that perform more than one function. Information
hiding should then be built into the system. This involves identification of
items that are likely to change ("secrets"). These are then located in
separate modules and intermodule interfaces are defined so that they are
insensitive to the anticipated changes i.e. the secrets of the module are
not revealed by the interface. There is never any reason for a component to
know how many other programs use it. The Virtual Machine concept is also
useful in building a system. The VM instructions are designed to be
generally useful but if a particular program does not use them, they can be
left out. To achieve a true VM the hardware instructions must be unavailable
to the VM client. The VM should be built incrementally and each increment is
usually a useful subset of the system. The "Uses" structure of a system must
be designed carefully. A uses B if the correct functioning of A depends on
the availability of a correct implementation of B. Unrestrained usage of
other modules leads to a system with modules that are highly interdependent.
The uses hierarchy should be loop free in order to reap the benefits of the
uses relationship. If such a hierarchy exists then each level offers a
testable and usable subset of the system. A should be allowed to use B only
if A is essentially simpler because it uses B, B is not substantially more
complex because it is not allowed to use A, there is a useful subset
containing B and not A, and there is no conceivably usable subset containing
A and not B. At the end of the paper there is an interesting point made
about software generality vs. flexibility. Software is general if it can be
used without change in a variety of situations; it is flexible if it can be
easily changed for use in a variety of situations. It appears unavoidable
that generality carries a run-time cost, while flexibility incurs a
design-time cost. One should incur the design-time cost only if one expects
to recover it when changes are made.
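Here is a rough sketch of my own (not from the paper) of a loop-free "uses"
hierarchy as described above: the lower level is a usable subset by itself,
and the upper level uses it without the dependency ever running the other
way, so the system can be contracted to the lower level or extended above the
upper one.

    #include <map>
    #include <string>

    // Level 0: a minimal service that is usable by itself.
    class KeyValueStore {
    public:
        void put(const std::string& key, const std::string& value) { data_[key] = value; }
        std::string get(const std::string& key) const {
            auto it = data_.find(key);
            return it == data_.end() ? std::string() : it->second;
        }
    private:
        std::map<std::string, std::string> data_;
    };

    // Level 1: an increment that uses Level 0; Level 0 never uses Level 1,
    // so there is no loop in the "uses" relation.
    class ConfigService {
    public:
        explicit ConfigService(KeyValueStore& store) : store_(store) {}
        void setOption(const std::string& name, const std::string& value) {
            store_.put("config/" + name, value);
        }
        std::string option(const std::string& name) const {
            return store_.get("config/" + name);
        }
    private:
        KeyValueStore& store_;
    };
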
Design Patterns: Abstraction and Reuse of Object-Oriented Design by Gamma,
Helm, Johnson and Vlissides.
Design patterns are a new mechanism for expressing design structures. They
identify, name and abstract common themes in object oriented design. Design
patterns are useful because they provide a common vocabulary for designers,
they constitute a reusable base of experience for building reusable
software, they reduce the learning time for a class library and provide a
target for the reorganization of class hierarchies. This paper defines
design patterns, provides a means to describe them, defines a system for
their classification and presents a catalog of patterns discovered by the
authors. Three essential parts make up a design pattern: an abstract
description of a class or object collaboration and its structure, the issue
in system design addressed by that structure, and the consequences of
applying the structure to a system's architecture. Design patterns are
classified by two orthogonal criteria: jurisdiction and characterization.
Jurisdiction is
the domain over which a pattern applies. Patterns having class jurisdiction
deal with the relationships between base classes and their subclasses,
covering static semantics. The object jurisdiction concerns relationships
between peer objects. Compound jurisdiction deals with recursive object
structures. Characterization reflects what a pattern does. Creational
patterns concern the process of object creation. Structural patterns deal
with the composition of classes or objects. Behavioral patterns characterize
the ways in which classes or objects interact and distribute responsibility.
The paper concludes with a summary of observations. Design patterns motivate
developers to go beyond concrete objects. Design patterns can help name
classes (e.g. embedding the pattern name in the class name) and this
enhances readability of code. Patterns can often be applied after the first
implementation of an architecture to improve its design. Patterns are an
effective way to teach object oriented design. Patterns are suited to reuse
because they are abstract. They also reduce the effort needed to learn a
class library.
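As one concrete illustration (my own example, using the well-known Composite
pattern from the design patterns literature), a pattern with compound
jurisdiction captures a recursive object structure in which groups and
individual objects share a single interface:

    #include <memory>
    #include <vector>

    // Common interface shared by leaf objects and composites.
    class Graphic {
    public:
        virtual ~Graphic() = default;
        virtual void draw() const = 0;
    };

    // A leaf object.
    class Line : public Graphic {
    public:
        void draw() const override { /* draw a single line */ }
    };

    // A composite that treats its children uniformly through the Graphic interface.
    class Picture : public Graphic {
    public:
        void add(std::unique_ptr<Graphic> child) { children_.push_back(std::move(child)); }
        void draw() const override {
            for (const auto& child : children_) child->draw();
        }
    private:
        std::vector<std::unique_ptr<Graphic>> children_;
    };
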
Experience Assessing an Architectural Approach to Large-Scale Systematic
Reuse by Sullivan and Knight.
This paper addresses the important technical barrier to large-scale reuse
called architectural mismatch and evaluates OLE to see if it ameliorates
this problem by using it to develop a fault tree analysis tool. Garlan et
al. identified four categories of architectural mismatch: incompatible
assumptions about the nature of the components, the nature of the connectors,
the global architectural structure, and the construction process. The paper
first
gives a brief overview of OLE and OLE automation (an object-based framework,
support for explicit and implicit invocation across process boundaries, a
binary standard, a multiple-interface model, a single-interface model for
automation, and both compile-time and run-time binding). Most applications
devote less than 10% of their code to the overt function of the system; the
other 90% goes into system or administrative code such as I/O, the GUI, text
editing, dialogs and standard graphics, communications, data validation,
audit trails, and so on. This high cost of commercial software delivery
vehicles -
of all the superstructure needed to make a new technique truly useful in
practice - impedes the transfer of innovations to the market. The authors
implemented a tool for fault tree analysis. The analysis function itself is
under 10,000 lines of C++ code. However, industrial users would look for
features such as a database for fault tree storage, graphical rendering and
direct manipulation interfaces, graphics and database views and analysis
reports prepared for technical and managerial audiences and clean
integration of the tool into the organization's overall process. The tool
thus needed a point and click interface, node annotation capability, tree
layout algorithms, storage and retrieval facilities, manipulation of
collections of trees, simple analysis support, ability to invoke specialized
analysis packages etc. The tool was designed to represent the tree nodes as
Access records and shapes in Visio, and it needs to ensure the consistency
of the data between the two apps. A mediator was designed for this. The
mediator interacts with the two apps both explicitly and implicitly
(procedure call and event notification) and also provides a UI for the system
as a whole. The authors succeeded in building most of the system they
designed and most of the problems they encountered were not problems with
OLE itself but rather, the design of the component applications. Access 3.0
did not support OLE automation (fixed in Access 4). VB 3.0 did not support
being an OLE automation server (fixed in VB4) and the event interface
exposed by Visio was not rich enough. Observations from the experiment were:
OLE did enable a very high level of productivity; the fault analysis code
itself was only 10,000 lines, whereas the functionality of the overall tool
encompasses several million lines of code. The tool demonstrates
industrial-strength capability, provides the familiar Windows look and feel,
and achieves adequate performance. Interesting design issues encountered
were: Visio's event interface was incomplete. Component interfaces should
provide complete, consistent and timely information about their state via
events. Object naming should be consistent. Visio shapes failed to provide
lightweight object IDs. Component adaptability was insufficient, in that it
was not possible to disable extra functionality in the components. Visio's
shapes, however, do provide extra slots in which client information can be
stored; making components extensible in this way is important. The conclusion
of the paper is that OLE
avoids the complex architectural mismatch problems and defines a complete
framework of design standards intended to support integration. However, if
components are to be composable, they have to be designed for it.
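The overall structure of the tool, as I understand it from the paper, is
roughly sketched below. GraphView and RecordStore are hypothetical stand-ins
for the Visio and Access automation objects (not real OLE interfaces); the
mediator calls the components explicitly and registers for their events to
keep the drawing and the database consistent.

    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the drawing component's automation interface.
    class GraphView {
    public:
        using ShapeAddedHandler = std::function<void(int, const std::string&)>;
        void onShapeAdded(ShapeAddedHandler h) { handlers_.push_back(std::move(h)); }
        void addShape(int id, const std::string& label) {   // explicit invocation
            for (auto& h : handlers_) h(id, label);          // implicit invocation (event)
        }
    private:
        std::vector<ShapeAddedHandler> handlers_;
    };

    // Hypothetical stand-in for the database component's automation interface.
    class RecordStore {
    public:
        void insertNode(int id, const std::string& label) { /* store the record */ }
    };

    // The mediator keeps the two representations of a fault tree node consistent.
    class FaultTreeMediator {
    public:
        FaultTreeMediator(GraphView& view, RecordStore& db) {
            view.onShapeAdded([&db](int id, const std::string& label) {
                db.insertNode(id, label);
            });
        }
    };
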
Beyond the Black Box: Open Implementation by Kiczales
The traditional black-box view of abstraction holds that a module should
expose its functionality but hide its implementation. This approach has several
advantages but it can lead to serious performance difficulties. Open
implementation is the principle that a module can be more useful and more
reusable if it allows clients to control its implementation strategy. In
some cases, the best implementation strategy for a module cannot be
determined unless the implementer knows just how the module will be used.
Open implementation is based on the conclusion that it is impossible to hide
all implementation issues behind a module interface. Module implementations
must be opened up to allow clients control over implementation details. The
separation of control principle leads to a design where the client requests
functionality through a primary interface and there also exists a secondary
(but coupled) meta-interface through which the client tunes the
implementation underlying the primary interface. This enhances readability
of the code too since the reader can focus on the functionality and ignore
the secondary interface to start with. The goal of open implementation is to
allow the client to use the primary interface alone when the default
implementation is sufficient, control the module's implementation strategy
decisions when necessary and deal with functionality and implementation
strategy decisions in largely separate ways. Research has focused on
computational reflection, which explores issues of how modules can provide
interfaces for examining and adjusting themselves. Other active research
issues are: how to design meta-interfaces that give clients control over
implementation strategy decisions without drowning them in details, how to
decide what implementation strategy decisions to expose, technologies to
support open implementation, and finding more examples of existing ad-hoc
open implementations and studying them to learn more about the approach.
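A minimal sketch of my own (not from the paper) of the primary/meta-interface
split: clients that are happy with the default implementation never touch the
meta-interface, while clients that know their usage pattern can select a
different implementation strategy.

    #include <set>
    #include <string>
    #include <unordered_set>

    class SymbolTable {
    public:
        enum class Strategy { Ordered, Hashed };

        // Meta-interface: lets the client choose the implementation strategy.
        explicit SymbolTable(Strategy strategy = Strategy::Ordered) : strategy_(strategy) {}

        // Primary interface: the functionality clients normally program against.
        void insert(const std::string& symbol) {
            if (strategy_ == Strategy::Ordered) ordered_.insert(symbol);
            else hashed_.insert(symbol);
        }
        bool contains(const std::string& symbol) const {
            return strategy_ == Strategy::Ordered ? ordered_.count(symbol) > 0
                                                  : hashed_.count(symbol) > 0;
        }

    private:
        Strategy strategy_;
        std::set<std::string> ordered_;          // good for ordered traversal
        std::unordered_set<std::string> hashed_; // good for large, lookup-heavy use
    };
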
An Introduction to Software Architecture by Garlan and Shaw
As the size of software systems increases, the algorithms and data
structures no longer constitute the major design problems. The organization of
the overall system - the architecture - presents a new set of design
problems. Structural issues include gross organization and global control
structure, protocols for communication, synchronization, data access,
assignment of functionality to design elements, physical distribution,
composition of design elements, scaling and performance, and selection among
design alternatives. Effective software engineering needs knowledge of
architectural design, for several reasons. First, it lets designers recognize
common paradigms, so that high-level relationships among systems can be
understood and new systems can be built as variations of old ones. Second,
getting the architecture right is
often crucial to the success of the system. Third, understanding of
architectures allows the designer to make principled choices among the
design alternatives. Fourth, an architectural system representation is often
essential to the analysis and description of a complex system. Common
architectural styles include pipes and filters, data abstraction and
object-oriented organization, event-based implicit invocation, layered
systems, repositories, and table-driven interpreters. The paper then presents
the pros and cons of each of these styles and outlines case studies involving
different architectures.
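As a small illustration of one of these styles, here is a pipe-and-filter
sketch of my own (not from the paper) in which each filter transforms the
data independently and the pipeline composes them without the filters knowing
about one another.

    #include <algorithm>
    #include <cctype>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    using Filter = std::function<std::string(const std::string&)>;

    // The pipeline passes the output of each filter to the next.
    std::string runPipeline(std::string data, const std::vector<Filter>& filters) {
        for (const auto& filter : filters) data = filter(data);
        return data;
    }

    int main() {
        std::vector<Filter> pipeline = {
            [](const std::string& s) {             // filter 1: upper-case the text
                std::string out = s;
                std::transform(out.begin(), out.end(), out.begin(),
                               [](unsigned char c) { return std::toupper(c); });
                return out;
            },
            [](const std::string& s) {             // filter 2: reverse the text
                return std::string(s.rbegin(), s.rend());
            }
        };
        std::cout << runPipeline("pipes and filters", pipeline) << "\n";
    }
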
The readings present many very interesting ideas. Some of these are
principles that I have applied only vaguely, since they get handed around by
word of mouth. Design for change, information hiding, and the like are all
mantras we have grown up with, but it is very instructive to read how these
ideas were formed and to see that a deliberate process can be applied to
designing with these principles in mind. I work in Microsoft Office, one of
the largest code
bases around at Microsoft. It will be interesting to apply some of these
principles to the Office code. A lot of these concepts apply particularly to
the teams I work with because we write shared code that is implemented in
one DLL and used by all the Office apps (Word, Excel, PowerPoint etc). A lot
of times we end up writing code that is closely tied to the peculiarities of
each app, and when a new client comes along we do not have a good way of
extending the implementation. Putting careful thought into the design pays
off greatly for us. For example, in Office 97 the new command bars were
implemented as shared code. The only clients initially were Word and Excel.
However, the list of clients gradually grew to encompass most Microsoft apps
and the command bar code was eventually split off into a DLL of its own. If
we had not put in the design time to make this code independent of its
clients, we would have had to reimplement the entire command bar code base
when the new clients came on board. Another design-for-change situation that
arose for me in the release of Office we are currently working on involved
storage (and layering, which Parnas also discusses in his paper).
I had to implement a feature that used an OLE standard called document
properties to persist its information in each client. I could simply have
assumed that every client supported document properties and written to them
directly, but I decided to abstract the reading and writing of persistent
storage into a layer of its own. Late in the development cycle we discovered
that one of our clients
could not write this structure to its document properties. So we ended up
with a different solution for persisting to this client. If the persisting
code had been dispersed through the entire code for the feature, this would
have caused a schedule slip. Luckily, my layer saved me and I only had to
implement the client part of the persistence differently for this client.
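The shape of that layer was roughly as follows (a sketch with hypothetical
names, not the actual Office code): the feature programs against an abstract
store, and each client plugs in whatever persistence mechanism it can
support.

    #include <string>

    // The abstraction the feature codes against.
    class PropertyStore {
    public:
        virtual ~PropertyStore() = default;
        virtual void write(const std::string& key, const std::string& value) = 0;
        virtual std::string read(const std::string& key) = 0;
    };

    // Default implementation: persist via the client's document properties.
    class DocPropertiesStore : public PropertyStore {
    public:
        void write(const std::string& key, const std::string& value) override {
            // ... write to the host document's properties
        }
        std::string read(const std::string& key) override {
            // ... read back from the host document's properties
            return std::string();
        }
    };

    // Alternate implementation for the client that could not use document properties.
    class CustomStreamStore : public PropertyStore {
    public:
        void write(const std::string& key, const std::string& value) override {
            // ... persist to a client-specific location instead
        }
        std::string read(const std::string& key) override { return std::string(); }
    };
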
We have used events as a client-independent way of relaying information to
the clients in several of our larger features, and it is a pretty popular
design approach here. It is also useful when investigating bugs, because the
event handlers are good places to set breakpoints for studying interactions
between the client and server code. The event handlers are also well-known
entry points and so are good starting points for an investigation. We use
Parnas's VM concept extensively to write platform-independent code. However,
the biggest problem is dealing with a huge code base that has evolved over
more than a decade. How does one keep it structured and educate all the new
programmers who start adding code to it every year? How does one retrofit
new features into the existing code base with minimal regression of
functionality while keeping the structural assumptions valid? I would be
very interested in reading what researchers have to say about these
problems.