Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 2: Evolution
Automated support for Encapsulating Abstract Data Types by Bowdidge and
Griswold
As repeated modifications are made to a system, the design and
implementation become increasingly hard to understand and maintenance
becomes more expensive. At some point the only solutions are to reimplement
or restructure the system. One method of easing maintenance is to separate
it into 2 tasks - restructuring the program without changing its meaning, to
localize the scope of modification, and then inserting the modifications
into the restructured program. An earlier, text-based implementation of this
approach proved inadequate because the programmer must search the program to
identify all computations on a data structure, which can lead to poor design
decisions. The authors' solution was a star diagram that provides graphical
assistance for encapsulation. A single root node on the left denotes the
data structure itself. Operators that directly reference the data structure
are represented by nodes connected to the root by edges denoting the reference. In
turn, for each node the operator consuming its result is represented as a
node connected to it. When 2 nodes denoting the same operator are connected
to the same node, they are stacked on top of each other. The tree is
terminated at the level of a function body. Nodes toward the left of the
tree represent low-level operations while those at the right represent
higher-level operations. Overlapping nodes provides 2 advantages - it
reduces the size of the tree, and a stack of overlapped nodes can be
identified as a candidate for a new function definition. The star diagram
also provides additional visual cues to the user, such as thick borders
where related computations may be hidden. There
are 6 basic restructuring tasks that the user can perform - extract a
function, inline a function, extract parameter, inline a parameter, move
into interface, and move out of interface.
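As a concrete illustration (my own sketch, not an example from the paper; all names are invented), the "extract a function" task pulls direct manipulations of a data structure into named operations, which is the first step toward encapsulating it as an abstract data type:

```python
# Hypothetical sketch of the "extract a function" restructuring.
# Before: client code manipulates the stack's representation (a list) directly,
# so every such site must be found and changed if the representation changes.
def sum_and_clear_direct(stack):
    total = 0
    while len(stack) > 0:
        total += stack[-1]   # direct access to the representation
        del stack[-1]        # direct access to the representation
    return total

# After: the manipulations are extracted into named operations, so the
# representation is referenced in exactly one place per operation.
def top(stack):
    return stack[-1]

def pop(stack):
    del stack[-1]

def is_empty(stack):
    return len(stack) == 0

def sum_and_clear(stack):
    total = 0
    while not is_empty(stack):
        total += top(stack)
        pop(stack)
    return total
```

In star-diagram terms, the stacked nodes for the repeated `stack[-1]` and `del stack[-1]` operations are what suggest `top` and `pop` as function candidates.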
A Reverse engineering approach to subsystem structure identification by
Muller et al
This paper shows how a top-down decomposition can be constructed via
bottom-up subsystem composition. The premise is that, given sufficient time, an
experienced engineer is usually able to decompose a system better than an
automated procedure can. However, automated procedures can help with the
tedious portions of the task. The first phase of reverse engineering is
language dependent and involves parsing the source code and storing the
artifacts in a repository. The second phase involves a semi-automatic
language-independent subsystem composition methodology to construct
hierarchical subsystem structures. The 2 primary models used to describe
software structure are the unit interconnection model (files, subsystems,
classes, etc.) and the syntactic interconnection model (procedures,
functions, etc.). The authors treat these models as directed, weighted
resource-flow graphs (RFGs). They propose 2 sets of similarity measures for
the edges of the RFG.
The interconnection strength is defined as the exact number of syntactic
objects exchanged between the nodes. 2 components are strongly coupled if
their interconnection strength is greater than the high-strength threshold
(there's a corresponding low-strength threshold). In the common
clients/suppliers measure, 2 components are similar if and only if they
provide objects to similar sets of clients. This measure captures the
notion that fewer interfaces are better. The Rigi system provides a UI for the
subsystem composition operations of removing omnipresent nodes, composing by
interconnection strength, composing by common clients/suppliers, composing
by centricity, and composing by name. A case study of reverse engineering a
ray tracing system is then presented. Dead code and omnipresent objects such
as debugging and error reporting functions are first identified. All
functions and data types that begin with certain prefixes can be identified
as belonging to components. Various other components are found by using the
2 similarity measures and varying the thresholds.
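The two similarity measures can be sketched on a toy resource-flow graph (the component names, exchanged objects, and threshold below are my own invented assumptions, not values from the paper):

```python
# Toy resource-flow graph: edge (supplier, client) -> set of syntactic
# objects the supplier provides to the client. All names are invented.
rfg = {
    ("parser", "evaluator"): {"Token", "Ast", "parse"},
    ("lexer",  "evaluator"): {"Token"},
    ("parser", "printer"):   {"Ast"},
    ("lexer",  "printer"):   {"Token"},
}

def interconnection_strength(p, q):
    """Exact number of syntactic objects exchanged between p and q
    (counting both directions of the edge)."""
    return len(rfg.get((p, q), set())) + len(rfg.get((q, p), set()))

def clients(p):
    """Components that consume objects supplied by p."""
    return {c for (s, c) in rfg if s == p}

def client_similarity(p1, p2):
    """Common clients measure: overlap of the two components' client
    sets (the suppliers side is analogous)."""
    c1, c2 = clients(p1), clients(p2)
    return len(c1 & c2) / len(c1 | c2) if (c1 | c2) else 0.0

HIGH_STRENGTH = 2  # invented high-strength threshold
strong = interconnection_strength("parser", "evaluator")  # 3 objects exchanged
alike = client_similarity("parser", "lexer")  # 1.0: identical client sets
```

Raising or lowering `HIGH_STRENGTH` is the "varying the thresholds" step of the case study: a higher threshold yields fewer, more tightly coupled candidate subsystems.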
Software aging by Parnas
A sign that software engineering has matured will be that we lose our
preoccupation with the first release and focus on the long term health of
our products. There are 2 types of software aging. The first is caused by a
failure of its owners to modify it to meet changing needs, and the second is
the result of the changes that are made. Changes made by people who do not
understand the original design almost always cause the structure of the
program to degrade. The symptoms of software aging are: owners of aging
software find it hard to keep up with the market; aging software often
degrades in space/time performance as a result of a deteriorating structure;
aging software often becomes buggy. The weight gain of aging software is due
to the fact that the easiest way to add a feature is to add new code. This
in turn makes changes more difficult. First, there is more code to change; a
change that might have been made in one place only now needs to be made in
more places and it is harder to find all the routines that need to be
changed. In order to prevent aging, one must design for change. The designer
must identify the pieces that are likely to change and also estimate the
probability of each change. There are many reasons why this rule has not
won widespread acceptance. Textbooks cover it only in a superficial
manner; management is so concerned with deadlines and the next release that
future maintenance costs do not get top priority; designs that result from a
careful application of information hiding are quite different from the
designs that are a result of the programmer's most intuitive work; designers
tend to mimic other designs they've seen; design principles are often
confused with language; programmers have been educated in fields other than
software engineering; software engineering researchers preach to the
converted and ignore the industry. Programmers often think their code is so
good it will not have to be changed. On the contrary, only code that is so
bad that nobody wants to touch it will never get changed. Design principles
and decisions must be recorded in a form that is useful to future
maintainers. When documentation is written, it is usually poorly organized,
incomplete and imprecise. It is not an attractive research topic either.
Documentation does not speed up the immediate next release. Reviews are
another important aspect that is frequently ignored. This is usually
because: many programmers have no professional training in software; even
computer science graduates usually have an education that neglected
professional concerns such as documentation and reviews; many practitioners
do not know how to write a design document; software is usually produced
under time pressures that do not allow time for reviews; many programmers
resent the idea that anyone should review the work they have done. Software
aging is inevitable. Our ability to design for change depends on our ability
to predict the future and we can do so only approximately. Getting the code
to run is not the only thing that matters. Code is being written in a lot of
different industries and the same problems are being solved differently in
each of them. There needs to be more communication. Final conclusions in the
paper are that we cannot assume that old stuff is known and did not work; we
cannot assume that old stuff just works either; we cannot ignore the
splinter software groups; and model products must be created.
In Office we spend the down time following each release doing post-mortems
and writing docs.
Reengineering with Reflexion Models by Murphy and Notkin
The reflexion model allows engineers to rapidly gain task-specific knowledge
about a system's source code. The technique involves 5 steps. The user
first defines a high-level model by reviewing artifacts, interviewing
experts, and other methods. He then applies a tool like a call graph
extractor or a file dependency extractor to extract interaction information
from the source code. The user then defines a map that describes how
entities in the source and high-level models relate. He then invokes a set
of tools to compute a reflexion model. This model lets the user see
interactions in the source code from the viewpoint of the high-level model.
The whole process is iterative: once the reflexion model is computed, the
user investigates the divergences and refines the model and the map.
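The computation at the heart of step 4 can be sketched as simple set arithmetic (a hedged sketch: the entity names and map below are invented, and the real tool also reports which source-level interactions contribute to each high-level arc):

```python
# Invented inputs: (a) source-level call edges from an extractor,
# (b) a map from source entities to high-level entities,
# (c) the user's high-level model of expected interactions.
source_calls = {("file_read", "buf_alloc"),
                ("parse_expr", "buf_alloc"),
                ("parse_expr", "file_read")}
mapping = {"file_read": "IO", "buf_alloc": "Memory", "parse_expr": "Parser"}
high_level = {("IO", "Memory"), ("Parser", "IO"), ("Parser", "Lexer")}

# Lift each source interaction into the high-level model via the map,
# dropping interactions internal to a single high-level entity.
lifted = {(mapping[a], mapping[b]) for (a, b) in source_calls
          if a in mapping and b in mapping and mapping[a] != mapping[b]}

convergences = lifted & high_level   # predicted and present in the source
divergences  = lifted - high_level   # in the source but not predicted
absences     = high_level - lifted   # predicted but not found in the source
```

The divergences and absences are precisely what the user investigates in the iterative refinement step.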
I've worked on the Excel and Office code base in general for the last 4
years and I always feel like I'm going through a huge dark mansion looking
for light switches and flipping them on. I regularly find huge areas of
functionality that I was completely ignorant about. The fact that this model
helped the user come up with a detailed model of the Excel code base in a
month just blows me away. I actually now remember walking through a hallway
and seeing charts of calls between the "layer" in Excel and the code that
sits on top of it. That was probably this very project. The unfortunate
thing is that all that work seems to have completely disappeared now. No one
has ever told me about this model that we came up with. People still just
read the Excel Internals doc, talk to some old timers and single step their
way through the maze to try to figure it out.
Another problem of maintenance and evolution is how to figure out feature
interactions. This is by far the largest problem in the Office code base. No
one knows all the features in the Office apps. We add the functionality of
the new feature that we have come up with and then start discovering
interactions with features we never knew about. I have single-handedly
generated more than a thousand bugs in the last few months due to feature
interactions. How does one figure these out in advance to make a decent
schedule? Old file formats to be supported, backward compatibility of every
feature, embedded and automation scenarios, cross-platform issues - the
list goes on. The big question is whether one should do all the detailed
work beforehand to figure out the interactions and make a decent schedule,
or just blast through, fixing bugs as they arise while keeping a general
buffer of time for bug fixing. On the other hand, sometimes issues
come up really late in the process that turn out to be impossible to fix at
that point. Is there a reasonably cost effective way to anticipate these?
The reflexion model of Excel turned out to be huge, and figuring out
beforehand exactly how a new feature would fit into the model would take a
long time.
I guess it does come down to Parnas' observation that people have to look
beyond the next release and think long term. But once the software grows
beyond a certain level of complexity, the amount of extra time needed to do
things right goes up exponentially. Competitive pressures in the industry
make it really difficult for people to schedule things to get done the right
way.