CSE 584 Software Engineering Fall 1998,
Reading 4: Tools
Ben Kaizi-Lutu
Lackwit: A Program Understanding Tool Based on Type Inference
The Lackwit tool is introduced to fill the niche of program discovery and
understanding. The entire scheme hinges on type inference: a new type system is
introduced in which representation information is encoded within the types, and
this richer, more expressive type system enables the computation of
representation sharing (the sketch at the end of this section illustrates the
kind of distinction this enables). I think one of the very attractive things
about Lackwit is that it is highly automated. This is because the type inference
procedure is fully automatic. I especially liked the fact that this tool has
been used to analyze a project of about 17K lines of code to provide
information useful enough for migration. It would be interesting to see how the
tool would fare with a project of about half a million lines of code, which
tends to be a reasonable measure of performance for an industrial-strength
tool. One of the
common problems with this type of tool is how awkward it is to gather the
actual results from the tool. Generally, you get too much information out of
the tool, and it is hard to know what part is the real information you are
interested in and what part is simply
redundant. This problem is compounded with graphical displays, which tend to
become so overpopulated that human inspection is practically impossible.
Lackwit addresses this problem by only showing nodes of interest in the
results graph. As for how much detail there should be in the node itself, the
authors hope to add an interactive feature that would allow the user to specify
the level of detail they want to see. Overall, I like the ideas behind the
tool, and it is the kind of tool I would not mind spending about a day trying.
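To make the type-inference idea concrete, here is a small C sketch of my own
(the code and names are hypothetical, not from the paper). Both globals below
have the same C type, int, yet a Lackwit-style analysis would infer two
distinct representation types for them, because no value ever flows between
them:

    #include <stdio.h>

    static int bytes_read;   /* only ever holds byte counts  */
    static int error_code;   /* only ever holds status codes */

    static int read_chunk(int status) {
        bytes_read += 512;    /* values flow only into bytes_read */
        error_code = status;  /* values flow only into error_code */
        return error_code;    /* the return value is unified with
                                 error_code's representation type */
    }

    int main(void) {
        int rc = read_chunk(0);  /* 'rc' shares error_code's inferred type */
        printf("%d bytes, rc=%d\n", bytes_read, rc);
        return 0;
    }

A query about bytes_read would therefore never drag error_code's uses into the
results graph, which is exactly the kind of pruning that keeps the displays
from becoming overpopulated.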
Dynamically Discovering Likely Program Invariants to Support
Program Evolution
The subject matter addresses the growing need for program migration assistance
tools. In my experience, more than half of a program's development is spent
fixing bugs. I think the primary reason for this imbalance between true
development and program correction is that many of the corrective measures
themselves introduce more bugs into the program. This is caused by, among other
things, the inability to easily identify and specify what should not change
during bug fixing. The method outlined is based on the discovery of invariants
within a program. In order to 'spy' on the program, there is a need for
instrumentation (the sketch at the end of this section shows the basic idea).
Generally, the problem with instrumentation is deciding how much to instrument,
and where, within the program. It does seem that the authors' choice of
procedure entry and exit points and loop heads would at least provide a minimal
set of instrumentation points. For a real working tool,
I would like to see a higher degree of flexibility, where I could, say, apply
different levels of instrumentation to different modules or classes. One of the
drawbacks of this method is that it requires the use of a test suite to
exercise the instrumentation points. For one, it is not clear how easy it would
be to come up with representative test cases; second, the method suffers from
the common problem of all dynamic analyses in that it only exercises the code
paths exposed by the given set of inputs and no more. However, as
the authors point out, the method can be used in conjunction with other forms
of static analysis in order to understand the invariants' implications. Again,
as with the other tools, the presentation of results is critical, and the
authors address it well, making good suggestions for summarizing and trimming
information in a working version of the tool. It would be great to see these
ideas developed into a more comprehensive and concrete prototype encapsulating
all the ideas presented into a tool that can be applied to large-scale code
bases.
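To make the instrumentation idea concrete, here is a minimal C sketch of my own
(not the authors' implementation): values are recorded at procedure entry, the
loop head, and exit, and an offline step would then propose invariants, such as
0 <= i <= n, that hold over every recorded trace:

    #include <stdio.h>

    static FILE *trace;

    /* One trace line per program point; the inference step reads these. */
    static void record(const char *point, int i, int n, int sum) {
        fprintf(trace, "%s i=%d n=%d sum=%d\n", point, i, n, sum);
    }

    int array_sum(const int *a, int n) {
        int sum = 0;
        record("array_sum:ENTRY", 0, n, sum);
        for (int i = 0; i < n; i++) {
            record("array_sum:LOOP", i, n, sum);  /* loop head */
            sum += a[i];
        }
        record("array_sum:EXIT", n, n, sum);
        return sum;
    }

    int main(void) {
        int a[] = {1, 2, 3, 4};
        trace = fopen("trace.txt", "w");
        if (trace == NULL)
            return 1;
        printf("%d\n", array_sum(a, 4));
        fclose(trace);
        return 0;
    }

The quality of whatever invariants come out clearly depends on how well the
test inputs exercise these program points, which is the dynamic-analysis
limitation noted above.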
Static Detection of Dynamic Memory Errors
The paper deals with the extension of LCLint to uncover memory management
related errors in programs. I was not so interested in the discussion of memory
leak detection, because there has been a lot of work done in this area and it
is pretty well understood. I was more drawn to problems like misuse of
null pointers and dangerous aliasing. The authors use annotations to state
assumptions about function interfaces, variables, and types (the sketch at the
end of this section shows the annotation style). The constraints necessary to
satisfy these assumptions are checked at compile time, revealing any violations
of the constraints and thus exposing potential bugs. This method has
a great advantage in that it is built on a tool that has been used with success
on large programs. This means that the learning curve for users who are
already familiar with LCLint would not be steep at all. However, adding
annotations is an iterative process: with each iteration, LCLint detects some
anomalies, annotations are added or discovered bugs are fixed, and LCLint is
run again to propagate the new annotations. I think this could be cumbersome
for a large program. The authors agree that the method would benefit from work
done to automate the annotation process. It also appears that the method is
still lacking in the presentation of results. There is a need to manage the
number of spurious messages, and it is not clear how to determine which
messages are indeed indications of bugs and which ones are spurious. And again,
since the method employs static analysis, there will be bugs that are not
uncovered simply because they are dynamic in nature and would need a dynamic
approach. Something mentioned in closing that could be a great idea is adding
annotations while a program is being developed. This approach might be the most
practical for a method of this nature and might very well be a worthwhile
investment.
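To show the annotation style, here is a small sketch of my own; the only and
null annotations are real LCLint annotations from the paper, while the code
around them is hypothetical. Because the annotations live inside stylized
comments, an ordinary C compiler simply ignores them:

    #include <stdlib.h>
    #include <string.h>

    /* 'only': the caller receives sole ownership of the result and must
       release it.  'null': the result may be NULL, since malloc can fail. */
    /*@only@*/ /*@null@*/ char *copy_name(const char *s) {
        char *r = (char *) malloc(strlen(s) + 1);
        if (r != NULL)
            strcpy(r, s);
        return r;
    }

    void client(void) {
        char *name = copy_name("lclint");
        name[0] = 'L';   /* LCLint would warn here: possibly null
                            dereference, since the result was not checked */
        free(name);      /* releases the ownership 'only' requires */
    }

This also illustrates the iteration described above: the warning would be fixed
by adding a null check, and LCLint would then be rerun to propagate any new
annotations.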
The Use of Program Dependence Graphs in Software Engineering
The program dependence graph (PDG) is a language-independent program
representation. It can be a great tool for providing program understanding,
determining differences between program versions, and creating new programs by
joining pieces of old ones (the slicing sketch at the end of this section gives
a flavor of the representation). Overall, the concepts seem to be more
complex and involved than was the case for the other three tools/methods. There
was a general pattern of doing something simple for single-procedure programs
and then having to do something much more involved for multi-procedure
programs. Seeing as there are very few single-procedure programs in the real
world that would need an understanding tool, the method is most complex where
it is most needed. The authors seem to tackle quite a few topics
in one paper, and honestly I kept losing track of whether I was slicing,
differencing, or integrating. Apparently the work has been implemented in a
tool, but we are not told about the tool's performance or ease of use, only
that even more techniques are implemented in it. My guess is that
the tool is very experimental and very difficult to use because it tries to do
too much. That said, the program dependence graph appears to be a powerful
construct that could be used in many ways to ease program understanding and
evolution.
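As a flavor of what the PDG supports, here is the classic slicing example in C
(my own rendering, in the spirit of the paper's examples): the backward slice
with respect to the final use of prod keeps exactly the statements that prod
depends on through data or control dependences:

    #include <stdio.h>

    int main(void) {
        int n = 5;                /* in slice: used by the loop test */
        int i = 1;                /* in slice: prod depends on i     */
        int sum = 0;              /* NOT in the slice of prod        */
        int prod = 1;             /* in slice                        */
        while (i <= n) {          /* in slice: control dependence    */
            sum = sum + i;        /* NOT in the slice of prod        */
            prod = prod * i;      /* in slice                        */
            i = i + 1;            /* in slice                        */
        }
        printf("%d\n", sum);      /* NOT in the slice of prod        */
        printf("%d\n", prod);     /* the slicing criterion           */
        return 0;
    }

Differencing and integration then build on the same dependences: two versions
are compared by the behavior their slices capture rather than by their text.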