Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 2: Evolution
Automated support for Encapsulating Abstract Data Types by Bowdidge and
Griswold
As repeated modifications are made to a system, the design and
implementation become increasingly hard to understand and maintenance
becomes more expensive. At some point the only solutions are to reimplement
or restructure the system. One method of easing maintenance is to separate
it into 2 tasks - restructuring the program without changing its meaning, to
localize the scope of modification, and then inserting the modifications
into the restructured program. An earlier, text-based implementation of this
approach proved inadequate because the programmer must search the program to
identify all computations on a data structure, which can lead to poor design
decisions. The authors' solution was a star diagram that provides graphical
assistance for encapsulation. A single root node on the left denotes the
data structure itself. Operators that directly reference the data structure
are represented by nodes connected to the root by edges denoting the reference. In
turn, for each node the operator consuming its result is represented as a
node connected to it. When 2 nodes denoting the same operator are connected
to the same node, they are stacked on top of each other. The tree is
terminated at the level of a function body. Nodes toward the left of the
tree represent low-level operations while those at the right represent
higher-level operations. Overlapping nodes provides 2 advantages - it
reduces the size of the tree, and a stack of overlapped nodes can be
identified as a candidate for a new function definition. The star diagram
also provides additional visual cues to the user, such as thick borders
where related computations may be hidden. There
are 6 basic restructuring tasks that the user can perform - extract a
function, inline a function, extract parameter, inline a parameter, move
into interface, and move out of interface.
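As a concrete illustration (my own sketch, not an example from the paper; all names are invented), the "extract a function" task pulls direct manipulations of a data structure into named operations, which is the first step toward encapsulating it as an abstract data type:

```python
# Hypothetical sketch of the "extract a function" restructuring.
# Before: client code manipulates the stack's representation (a list) directly,
# so every such site must be found and changed if the representation changes.
def sum_and_clear_direct(stack):
    total = 0
    while len(stack) > 0:
        total += stack[-1]   # direct access to the representation
        del stack[-1]        # direct access to the representation
    return total

# After: the manipulations are extracted into named operations, so the
# representation is referenced in exactly one place per operation.
def top(stack):
    return stack[-1]

def pop(stack):
    del stack[-1]

def is_empty(stack):
    return len(stack) == 0

def sum_and_clear(stack):
    total = 0
    while not is_empty(stack):
        total += top(stack)
        pop(stack)
    return total
```

In star-diagram terms, the stacked nodes for the repeated `stack[-1]` and `del stack[-1]` operations are what suggest `top` and `pop` as function candidates.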
A Reverse engineering approach to subsystem structure identification by
Muller et al
This paper shows how a top-down decomposition can be constructed via
bottom-up subsystem composition. The premise is that, given sufficient time, an
experienced engineer is usually able to decompose a system better than an
automated procedure can. However, automated procedures can help with the
tedious portions of the task. The first phase of reverse engineering is
language dependent and involves parsing the source code and storing the
artifacts in a repository. The second phase involves a semi-automatic
language-independent subsystem composition methodology to construct
hierarchical subsystem structures. The 2 primary models used to describe
software structure are the unit interconnection model (files, subsystems,
classes, etc.) and the syntactic interconnection model (procedures,
functions, etc.). The authors treat these models as directed, weighted
resource-flow graphs (RFGs). They propose 2 sets of similarity measures for
the edges of the RFG.
The interconnection strength is defined as the exact number of syntactic
objects exchanged between the nodes. 2 components are strongly coupled if
their interconnection strength is greater than the high-strength threshold
(there's a corresponding low-strength threshold). In the common
clients/suppliers measure, 2 components are similar if and only if they
provide objects to similar sets of clients. This measure captures the
notion that fewer interfaces are better. The Rigi system provides a UI for the
subsystem composition operations of removing omnipresent nodes, composing by
interconnection strength, composing by common clients/suppliers, composing
by centricity, and composing by name. A case study of reverse engineering a
ray tracing system is then presented. Dead code and omnipresent objects such
as debugging and error reporting functions are first identified. All
functions and data types that begin with certain prefixes can be identified
as belonging to components. Various other components are found by using the
2 similarity measures and varying the thresholds.
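The two similarity measures can be sketched on a toy resource-flow graph (the component names, exchanged objects, and threshold below are my own invented assumptions, not values from the paper):

```python
# Toy resource-flow graph: edge (supplier, client) -> set of syntactic
# objects the supplier provides to the client. All names are invented.
rfg = {
    ("parser", "evaluator"): {"Token", "Ast", "parse"},
    ("lexer",  "evaluator"): {"Token"},
    ("parser", "printer"):   {"Ast"},
    ("lexer",  "printer"):   {"Token"},
}

def interconnection_strength(p, q):
    """Exact number of syntactic objects exchanged between p and q
    (counting both directions of the edge)."""
    return len(rfg.get((p, q), set())) + len(rfg.get((q, p), set()))

def clients(p):
    """Components that consume objects supplied by p."""
    return {c for (s, c) in rfg if s == p}

def client_similarity(p1, p2):
    """Common clients measure: overlap of the two components' client
    sets (the suppliers side is analogous)."""
    c1, c2 = clients(p1), clients(p2)
    return len(c1 & c2) / len(c1 | c2) if (c1 | c2) else 0.0

HIGH_STRENGTH = 2  # invented high-strength threshold
strong = interconnection_strength("parser", "evaluator")  # 3 objects exchanged
alike = client_similarity("parser", "lexer")  # 1.0: identical client sets
```

Raising or lowering `HIGH_STRENGTH` is the "varying the thresholds" step of the case study: a higher threshold yields fewer, more tightly coupled candidate subsystems.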
Software aging by Parnas
A sign that software engineering has matured will be that we lose our
preoccupation with the first release and focus on the long term health of
our products. There are 2 types of software aging. The first is caused by a
failure of its owners to modify it to meet changing needs, and the second is
the result of the changes that are made. Changes made by people who do not
understand the original design almost always cause the structure of the
program to degrade. The symptoms of software aging are: owners of aging
software find it hard to keep up with the market; aging software often
degrades in space/time performance as a result of a deteriorating structure;
aging software often becomes buggy. The weight gain of aging software is due
to the fact that the easiest way to add a feature is to add new code. This
in turn makes changes more difficult. First, there is more code to change; a
change that might have been made in one place only now needs to be made in
more places and it is harder to find all the routines that need to be
changed. In order to prevent aging, one must design for change. The designer
must identify the pieces that are likely to change and also estimate the
probability of each change. There are many reasons why this rule has not
won widespread acceptance. Textbooks cover it only in a superficial
manner; management is so concerned with deadlines and the next release that
future maintenance costs do not get top priority; designs that result from a
careful application of information hiding are quite different from the
designs that are a result of the programmer's most intuitive work; designers
tend to mimic other designs they've seen; design principles are often
confused with language; programmers have been educated in fields other than
software engineering; software engineering researchers preach to the
converted and ignore the industry. Programmers often think their code is so
good it will not have to be changed. On the contrary, only code that is so
bad that nobody wants to touch it will never get changed. Design principles
and decisions must be recorded in a form that is useful to future
maintainers. When documentation is written, it is usually poorly organized,
incomplete and imprecise. It is not an attractive research topic either.
Documentation does not speed up the immediate next release. Reviews are
another important aspect that is frequently ignored. This is usually
because: many programmers have no professional training in software; even
computer science graduates usually have an education that neglected
professional concerns such as documentation and reviews; many practitioners
do not know how to write a design document; software is usually produced
under time pressures that do not allow time for reviews; many programmers
resent the idea that anyone should review the work they have done. Software
aging is inevitable. Our ability to design for change depends on our ability
to predict the future and we can do so only approximately. Getting the code
to run is not the only thing that matters. Code is being written in a lot of
different industries and the same problems are being solved differently in
each of them. There needs to be more communication. Final conclusions in the
paper are that we cannot assume that old stuff is known and did not work; we
cannot assume that old stuff just works either; we cannot ignore the
splinter software groups; and model products must be created.
In Office we spend the down time following each release doing post-mortems
and writing docs.
Reengineering with Reflexion Models by Murphy and Notkin
The reflexion model allows engineers to rapidly gain task-specific knowledge
about a system's source code. The technique involves 5 steps. The user
first defines a high-level model by reviewing artifacts, interviewing
experts, and other methods. He then applies a tool like a call graph
extractor or a file dependency extractor to extract interaction information
from the source code. The user then defines a map that describes how
entities in the source and high-level models relate. He then invokes a set
of tools to compute a reflexion model. This model lets the user see
interactions in the source code from the viewpoint of the high-level model.
The whole process is iterative: once the reflexion model is computed, the
user investigates the divergences and refines the model and the map.
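The computation at the heart of step 4 can be sketched as simple set arithmetic (a hedged sketch: the entity names and map below are invented, and the real tool also reports which source-level interactions contribute to each high-level arc):

```python
# Invented inputs: (a) source-level call edges from an extractor,
# (b) a map from source entities to high-level entities,
# (c) the user's high-level model of expected interactions.
source_calls = {("file_read", "buf_alloc"),
                ("parse_expr", "buf_alloc"),
                ("parse_expr", "file_read")}
mapping = {"file_read": "IO", "buf_alloc": "Memory", "parse_expr": "Parser"}
high_level = {("IO", "Memory"), ("Parser", "IO"), ("Parser", "Lexer")}

# Lift each source interaction into the high-level model via the map,
# dropping interactions internal to a single high-level entity.
lifted = {(mapping[a], mapping[b]) for (a, b) in source_calls
          if a in mapping and b in mapping and mapping[a] != mapping[b]}

convergences = lifted & high_level   # predicted and present in the source
divergences  = lifted - high_level   # in the source but not predicted
absences     = high_level - lifted   # predicted but not found in the source
```

The divergences and absences are precisely what the user investigates in the iterative refinement step.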
I've worked on the Excel and Office code base in general for the last 4
years and I always feel like I'm going through a huge dark mansion looking
for light switches and flipping them on. I regularly find huge areas of
functionality that I was completely ignorant about. The fact that this model
helped the user come up with a detailed model of the Excel code base in a
month just blows me away. I actually now remember walking through a hallway
and seeing charts of calls between the "layer" in Excel and the code that
sits on top of it. That was probably this very project. The unfortunate
thing is that all that work seems to have completely disappeared now. No one
has ever told me about this model that we came up with. People still just
read the Excel Internals doc, talk to some old timers and single step their
way through the maze to try to figure it out.
Another problem of maintenance and evolution is how to figure out feature
interactions. This is by far the largest problem in the Office code base. No
one knows all the features in the Office apps. We add the functionality of
the new feature that we have come up with and then start discovering
interactions with features we never knew about. I have single-handedly
generated more than a thousand bugs in the last few months due to feature
interactions. How does one figure these out in advance to make a decent
schedule? Old file formats to be supported, backward compatibility of every
feature, embedded and automation scenarios, cross-platform issues - the
list goes on. The big question is whether one should do all the detailed
work beforehand to figure out the interactions and make a decent schedule,
or just blast through, fixing bugs as they arise while keeping a general
buffer of time for bug fixing. On the other hand, sometimes issues
come up really late in the process that turn out to be impossible to fix at
that point. Is there a reasonably cost effective way to anticipate these?
The reflexion model of Excel turned out to be huge, and figuring out
beforehand exactly how a new feature would fit into the model would take a
long time.
I guess it does come down to Parnas' observation that people have to look
beyond the next release and think long term. But once the software grows
beyond a certain level of complexity, the amount of extra time needed to do
things right goes up exponentially. Competitive pressures in the industry
make it really difficult for people to schedule things to get done the right
way.