Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 4: Tools
Use of program dependence graphs in Software Engineering by Horwitz and Reps
The problems the authors are concerned with fall into 3 categories. Slicing
problems involve just 1 program. The problems in this category are: Given a
program point p in program P, find the points q of P such that if the
behavior at q were different, then the behavior at p would be different.
Given p, find the points q of P such that if the behavior at p were
different, the behavior at q would be different. Find a projection Q of P
such that when P and Q are run on the same initial state, the behaviors at p
in P and Q are identical. The second category of problems is differencing
problems and these involve 2 programs. The problems in this category are:
Given programs Old and New, and a correspondence between their components,
find the components of New whose behaviors differ from the corresponding
components of Old. Given no correspondence between the components of Old and
New, find the components of New for which there are no components in Old
with the same behavior. Find a program Changes that exhibits all changed
behaviors of New with respect to Old. The third category is integration
problems and these involve 3 or more programs. Given a program Base and 2 or
more variants, together with a correspondence between the programs'
components, determine whether the changes made to create the variants from
Base are compatible. If the changes are compatible, combine the variants to
form a single integrated program, otherwise identify the source of the
interference. Another problem in this category is the same problem without
any correspondence given between the components. The authors then introduce
the concept of the program dependence graph (PDG), a directed multigraph
whose vertices represent the program's assignment statements and predicates,
plus a special Entry vertex, and whose edges represent control and data
dependencies. System dependence graphs (SDGs) are the multi-procedure
version of PDGs. Program representation graphs (PRGs) represent
procedureless programs just as PDGs do, but PRGs include special phi
vertices that remove the need for def-order edges.
Algorithms are then presented to solve the different problems described
above.
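To make the slicing problem concrete, here is a small C example of my own
(the program and variable names are mine, not taken from the paper). The
slicing criterion is the final print of sum; the backward slice consists of
exactly those statements on which that print is data- or control-dependent.

    #include <stdio.h>

    int main(void) {
        int i = 1, sum = 0, prod = 1;
        while (i <= 10) {        /* in the slice: this predicate controls sum's updates */
            sum = sum + i;       /* in the slice: defines sum                           */
            prod = prod * i;     /* not in the slice: sum never depends on prod         */
            i = i + 1;           /* in the slice: feeds the predicate and sum           */
        }
        printf("%d\n", sum);     /* the slicing criterion, point p                      */
        return 0;
    }

Deleting the prod statements yields a projection Q that behaves identically
to P at the print statement, which is exactly the third slicing problem
listed above.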
Dynamically discovering likely program invariants to support program
evolution by Ernst et al
Program invariants are usually implicit. This paper describes techniques for
dynamically discovering invariants, presents an instrumentation and
inference engine that embodies these techniques, and applies the engine to
two sets of programs. Invariants are very useful because they protect a
programmer from making changes that inadvertently violate assumptions upon
which the program's correct behavior depends. The accuracy of the inferred
invariants depends on the quality and completeness of the test cases. The
technique presented in this paper involves instrumenting the source to trace
the variables of interest, running the instrumented program over a set of
test cases and inferring invariants from the gathered data. The 2 primary
decisions in the instrumenting process are selecting the program points at
which to insert instrumentation and the variables to examine at those
points. The prototype instruments procedure entry and exit and loop heads.
The instrumented program is then run over the test suite. Invariant
inference needs repeated execution in order to get statistically valid
inferences. The invariant detector then uses the output of the instrumented
program to list the invariants. The system checks for constants or
limited-range variables, linear relationships, functions, sequences, etc.
Negative invariants are also considered: the probability that such a
property would hold purely by chance is computed, and if it is smaller than
a user-defined threshold, the property is reported as a negative invariant.
In addition to variables explicitly
passed to the engine, derived variables like the first and last elements and
the length of an array are also tracked. Detection is also staged, so that
the search for new invariants can make use of invariants that have already
been detected.
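As a rough illustration of the front end of this pipeline, here is my own C
sketch of instrumentation at procedure entry and exit (the function name,
trace format, and example are mine, not the paper's actual instrumenter
output):

    #include <stdio.h>

    /* Sums the first n elements of a. */
    int array_sum(const int *a, int n) {
        /* entry-point instrumentation: record the traced variables on every call */
        fprintf(stderr, "array_sum:ENTER n=%d\n", n);

        int sum = 0;
        for (int i = 0; i < n; i++)   /* a loop head could be instrumented too */
            sum += a[i];

        /* exit-point instrumentation: record results before returning */
        fprintf(stderr, "array_sum:EXIT n=%d sum=%d\n", n, sum);
        return sum;
    }

    int main(void) {
        int data[] = {1, 2, 3, 4};
        printf("%d\n", array_sum(data, 4));
        return 0;
    }

The inference step then scans the accumulated traces for properties that
were never falsified. Over a test suite that only ever passes a positive n,
for instance, it might report n >= 1 at entry, which says as much about the
test suite as about the program; derived variables such as a[0], a[n-1], and
the array's length would be examined as well.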
This technique does seem like it would be incredibly useful in deciphering
new codebases where the assumptions made by past programmers are unknown. I
have violated many assumptions I had no idea existed in the Office codebase.
However, the technique is very dependent on the completeness of the test
suite. I don't see a way around this problem, but many of the assumptions
I've violated were violated precisely because I did not know about a
particular code path, and hence my existing test suite didn't cover it. For
example, I assume a pointer cannot be null at a certain point in the code
based on my research, but then it turns out that if the app is booted with a
filename from the command line instead of through the file open dialog, the
pointer is in fact null. A big positive of the approach, on the other hand,
is that it is incremental: the user only needs to identify the relevant
parts of the program for analysis.
Lackwit by O'Callahan and Jackson
This paper, in sharp contrast to the previous one, adopts a static approach,
as opposed to a dynamic one, to determine where the structure of a program
requires sets of variables to share a common representation. Abstract data
types, abstraction violations, unused variables and functions, and simple
errors (like a failure to close a resource after opening it) can be detected
using this method. The paper defines a new type system very different from the
underlying language's type system. Type inference is an attractive basis for
analysis because it is fully automated and elegantly handles rich features
like recursive pointer-based data structures and function pointers. The
algorithm has doubly exponential worst-case time complexity, but in practice
it takes little more than linear space and time in the size of the program.
A big negative of this approach in my mind is that it seems to be all or
nothing: the entire code base has to be analyzed at once, otherwise variable
assignments can be missed, leading to wrong results. On large code bases the
amount of data generated may become overwhelming.
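To illustrate the kind of fact such a representation-based analysis reports,
here is a small C fragment of my own (not Lackwit's actual output or
notation). Every variable below is declared int, but only i ever flows into
the offset parameter, so i and offset end up in the same representation
class while unrelated does not:

    #include <stdio.h>

    /* offset is used only to index into buf */
    static char read_at(const char *buf, int offset) {
        return buf[offset];
    }

    int main(void) {
        char buf[8] = "abcdefg";
        int i = 3;           /* flows to offset: same representation class */
        int unrelated = 42;  /* never flows to offset: a separate class    */
        printf("%c %d\n", read_at(buf, i), unrelated);
        return 0;
    }

Grouping variables by inferred representation in this way is what lets the
tool expose abstract data types and flag variables that are never really
used.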
Static detection of dynamic memory errors by Evans
This paper introduces annotations to make assumptions explicit at interface
points. A static checking tool that exploits these annotations can detect a
broad range of errors including misuses of null pointers, use of dead
storage, memory leaks, and dangerous aliasing. Run-time checking suffers from
the flaw that its effectiveness depends on running the right test cases.
This paper's technique involves adding annotations in the source as
syntactic comments. More than one annotation may be used with a declaration
although certain combinations are disallowed. Full interprocedural analysis
at compile time is too expensive to be practical and does not scale. The
annotations provide enough information to do a good job while checking each
procedure independently. When a function body is checked, annotations on its
parameters and globals are assumed to be true. At all return points the
function must satisfy all given constraints. When a function call is
encountered, the arguments and globals are checked to make sure they satisfy
the constraints. This approach is not complete: it makes many simplifying
assumptions that allow some bugs to slip through.
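Here is a short sketch of what such interface annotations look like, written
in the /*@...@*/ comment style the paper's tool uses; the function is my own
example, and the exact annotation names and defaults are from memory, so
they may differ slightly from the real tool:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* The result may be NULL, and the caller receives the only reference
       to the new storage, so the caller must eventually free it. The
       parameter must not be NULL at any call site.                        */
    /*@null@*/ /*@only@*/ char *copy_name(/*@notnull@*/ const char *src) {
        char *dst = malloc(strlen(src) + 1);
        if (dst == NULL)
            return NULL;     /* allowed because the result is annotated null */
        strcpy(dst, src);
        return dst;
    }

    int main(void) {
        char *name = copy_name("shamik");
        if (name != NULL) {  /* respects the null annotation on the result   */
            puts(name);
            free(name);      /* respects the only (ownership) annotation     */
        }
        return 0;
    }

Checking proceeds exactly as summarized above: inside copy_name the checker
assumes src is non-null, so the strlen call is fine, while at each call site
it checks that the argument cannot be null and that the returned storage is
checked against NULL and eventually released.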