Radu Bacioiu, CSE 584, Fall 1998. Assigned Readings - Analysis and Tools
1. Static Detection of Dynamic Memory Errors
David Evans
This paper describes an extension of the LCLint tool
that's geared towards finding memory management bugs (referencing
invalid storage, leaking allocated memory, etc.). New annotations
(in the style of LCLint) are added so that assumptions about the
ownership, validity and allocation state of various pointer
variables are made clear. The annotations are added at what the
author calls "interface points," i.e., at function
entry and exit points. The paper sets a goal for the LCLint
extension to "detect as many real bugs as possible
efficiently and with no programmer interaction" and accepts
the tradeoff of "an analysis that is neither sound nor
complete." From my experience, however, I believe the classes
of bugs addressed by this extension are by far the most common
ones, and therefore the value added to the LCLint tool is very
real. The paper makes the point that memory management bugs can
exist even in an environment where garbage collection is used.
This is often forgotten when people talk about garbage
collection.
Memory management bugs are a tricky and well-represented class of bugs. Traditional testing methods for this class are usually fairly random: for instance, running "stress" tests is a way to throw lots of commands/data/whatever at your program and see if and where it breaks. One could benefit from more precise methods. As the author points out, finding bugs at run time depends on how good the test cases are.
I was pleasantly surprised at the attention given in this paper to real-life problems. The author emphasizes the efficiency of the algorithms employed and makes some drastic tradeoffs in favor of efficiency and simplicity (e.g. treating WHILE loops as IF loops). Also, LCLint annotations in general are very well suited to incremental usage (I can't think of any other tool that's as good at rewarding its user so immediately). On the other hand, I wish I hadn't found the various relaxations of conditions described in the paper, although I realize that in practice one can't do without them. The paper correctly points out that "There were situations where simple annotations were not expressive enough to describe constraints, so checking needed to be relaxed to eliminate spurious messages." My concern, however, is that developers using LCLint will be tempted to take the easy way out of a non-trivial problem by relaxing conditions instead of thinking a little harder about how to rewrite the code.
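To make the flavor of these annotations concrete, here is a small sketch (the function name is mine, not the paper's; the /*@only@*/ and /*@null@*/ annotations are from LCLint's documented vocabulary). An "only" result transfers sole ownership to the caller, who then becomes obligated to free it; "null" warns callers that the result may be NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The annotations live inside comments, so a plain C compiler
   ignores them; LCLint reads them at the interface points.
   The function itself is a hypothetical example. */

/*@null@*/ /*@only@*/ char *copy_name(const char *src) {
    /* caller becomes the sole owner of the result and must free it */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;       /* possible NULL result, hence the null annotation */
    strcpy(dst, src);
    return dst;
}
```

Because the checking happens at function entry and exit, a caller that drops the returned pointer without freeing it (a leak) or uses it without a NULL check can be flagged without any whole-program analysis.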
2. Dynamically Discovering Likely Program
Invariants to Support Program Evolution M.
Ernst, J. Cockrell, W.G. Griswold, D. Notkin
Programmers are lazy; they don't like to write
asserts in their code. Even when they do, they don't write
all possible asserts. This paper describes a tool that is trying
to help. The tool will infer simple relationships between
(tracked) variables based on values of these variables gathered
from a series of runs. The authors use statistical methods to decide
when a certain relationship is more than a mere coincidence and
is probably a correlation. As with any dynamic approach, the
validity of the inferred relationships is heavily dependent on
the quality of the input data used for the repeated executions of
the routine or program in question.
The code is first instrumented so that the values of the tracked variables are recorded for later analysis. The authors enumerate a number of relationship types that the tool checks for. However, there is no mention of extending the set of relationships. For instance, let's assume we're dealing with a piece of software for an MRI scanner. One would expect to find lots of exponential decay calculations, so adding formulae like x = a*ln(y) or x = a*exp(y) to the set of inferred relationships could be beneficial.
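One step of this kind of inference can be sketched as follows. This is a caricature, not the paper's implementation: hypothesize a linear relation y = a*x + b from the first two recorded samples, then try to falsify it against the rest of the trace (the real tool tracks many relation templates and adds statistical confidence tests on top).

```c
#include <assert.h>

/* Hedged sketch of linear-invariant inference over a trace of n
   recorded (x, y) pairs.  Returns 1 and fills in a, b if the
   relation y = a*x + b holds for every sample, 0 otherwise. */
static int infer_linear(const double *x, const double *y, int n,
                        double *a, double *b) {
    if (n < 2 || x[1] == x[0])
        return 0;                  /* not enough data to hypothesize */
    *a = (y[1] - y[0]) / (x[1] - x[0]);
    *b = y[0] - *a * x[0];
    for (int i = 2; i < n; i++)    /* falsify against remaining samples */
        if (y[i] != *a * x[i] + *b)
            return 0;
    return 1;                      /* invariant survived every sample */
}
```

The sketch makes the dependence on input quality obvious: a trace with too few or unrepresentative samples will happily "confirm" a coincidental relation.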
The user of the tool has leverage over the output in various ways. First and foremost, selecting a representative set of inputs is crucial. Second, the user has to define the "confidence levels" for when to assume an inferred relationship is correct. The tool is efficient enough that the set of runs can be adjusted and re-run as appropriate in order to address ambiguous or incorrectly inferred invariants. Invalid inferred invariants could also be instrumental in detecting skewed or poor test cases. Since in general the same test cases are used to perform a wide array of dynamic tests and analyses, improving the set of test cases could be a very important positive side effect of using this tool.
The paper mentions that "both variable derivation and invariant inference benefit from access to previously computed invariants." However there is no mention of whether and how the tool is making use of the few explicit program invariants (commonly known as asserts) already present in the code.
3. Lackwit: A Program Understanding Tool
Based on Type Inference R. O'Callahan and
D. Jackson
Lackwit is a static tool designed to identify abstract
data types used in an application and help programmers detect and
hopefully fix "unclean usages" of these data types. The
tool is fully automatic and is able to handle C's pointers
to structures and functions (for many tools pointers seem to be
the most delicate to deal with). For the purpose of
exemplification the authors consider a trivial program starting
from the premise that each variable of type "int" can
really be of a different type and then try to identify which
ints are of the same type. I have to admit that I
didn't quite grasp how much this tool can help in writing an
application that uses a strongly typed language. The tool's
implementers also make the choice of displaying the results
graphically but they recognize that this makes presenting useful
information to the user a rather thorny and unscalable problem.
Ironically, they blame this problem on the limitations of the
graph drawing tool employed rather than on their choice of output
presentation.
The tool proves to be instrumental in identifying dead variables and, to some extent, in statically detecting memory leaks.
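The heart of this style of analysis can be caricatured as computing equivalence classes of variables whose values flow into one another: every int starts as its own abstract type, and each assignment or parameter pass merges two classes. A union-find sketch of that partitioning (the data structure and driver are my assumptions, not code from the paper):

```c
#include <assert.h>

/* Hedged caricature of Lackwit-style type partitioning.  Each of
   NVARS int variables begins in its own abstract-type class; a value
   flow between two variables unifies their classes.  Variables left
   in different classes never mix, suggesting they stand for distinct
   abstract types even though C declares them all "int". */
#define NVARS 8
static int parent[NVARS];

static void init(void) {
    for (int i = 0; i < NVARS; i++)
        parent[i] = i;                       /* singleton classes */
}

static int find(int v) {                     /* class representative */
    while (parent[v] != v)
        v = parent[v] = parent[parent[v]];   /* path halving */
    return v;
}

static void unify(int a, int b) {            /* a value flows a -> b */
    parent[find(a)] = find(b);
}
```

For example, if variable 0 is assigned to 1 and 1 to 2, all three end up in one class, while an untouched variable 3 stays separate; a variable whose class nothing else ever joins is a candidate dead variable, which matches the dead-variable detection noted above.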
4. The Use of Program Dependence Graphs in
Software Engineering S. Horwitz and T. Reps
In this paper, the authors define a heavy formalism and
construct a Program Dependence Graph. The vertices in this graph
are operations on variables and the edges represent data and
control dependencies between operations. The graph unfortunately
ends up being way more complex than the code itself. For
instance, a simple stand-alone function eight instructions long
generates a graph with 10 vertices and 20 edges, and two
extremely trivial functions sharing ten lines of code between
them generate a graph with 18 vertices and too many edges to
count. At this point any graphical rendering to the user is
impractical. The authors start by studying the purely academic
example of "single procedure programs" and extrapolate
rather carelessly, from a scalability point of view, to the more
down-to-earth case of a "multi procedure program." The
major applicability of the research seems to be automatic program
integration, described in section 5 of the paper. The authors claim
that "anyone who has had to reconcile divergent lines of
development will recognize that it is a tedious and time
consuming task to merge programs by hand and will appreciate the
need for automatic assistance." I'm not sure I'd subscribe
to this statement. The simple tools available in the industry are
not as bad as the paper tries to suggest. These tools are not
infallible but they excel in simplicity of concepts and ease of
use. In the overwhelming majority of the cases they do their job
really well. In my experience, the worst integration (merging)
conflicts happen when two or more people try to fix the same
problem. If people work on separate assignments/bugs on the same
source code then program integration is almost always handled
correctly by the simple tools we have.
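To illustrate what the graph actually records, here is a toy fragment of my own (not one of the paper's examples) with the PDG edges marked in comments: data-dependence edges run from the statement that defines a variable to the statements that use it, and control-dependence edges run from a predicate to the statements whose execution it governs.

```c
#include <assert.h>

/* Toy fragment annotated with its program dependence graph edges. */
static int toy(int n) {
    int sum = 0;          /* S1: defines sum                                    */
    int i = 1;            /* S2: defines i                                      */
    while (i <= n) {      /* P:  data dep on S2/S4 (i)                          */
        sum = sum + i;    /* S3: control dep on P; data deps on S1/S3, S2/S4    */
        i = i + 1;        /* S4: control dep on P; data dep on S2/S4            */
    }
    return sum;           /* S5: data dep on S1/S3 (sum)                        */
}
```

Even this five-statement loop already needs six vertices and roughly ten edges, which is consistent with the growth rates quoted above.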
To exemplify a situation where a simple merging tool wouldn't do the right thing, here's a case when two people try to fix the same problem in different ways. The function Next is supposed to return the next element in a list (represented internally as an array) and advance the internal "index" pointer.
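A minimal sketch of such a function, before either fix, might look like this (the struct layout and the -1 end-of-list convention are my assumptions for illustration):

```c
#include <assert.h>

/* Hypothetical layout for the list described above: a fixed array
   plus an internal cursor that Next advances on every call. */
typedef struct {
    int items[16];
    int count;    /* number of valid elements       */
    int index;    /* cursor: next element to return */
} List;

/* Returns the next element and advances the cursor; returns -1
   when the list is exhausted (a convention assumed for the sketch). */
static int Next(List *l) {
    if (l->index >= l->count)
        return -1;
    return l->items[l->index++];
}
```

If two developers each repair a bug in this function in textually different but semantically overlapping ways, a line-based merge can accept both edits and produce a function that applies the fix twice, which is exactly the kind of conflict the paper's semantics-based integration targets.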