Shamik Basu
CSE 584 Software Engineering Autumn '98
Paper 4: Tools
Use of program dependence graphs in Software Engineering by Horwitz and Reps
The problems the authors are concerned with fall into 3 categories. Slicing
problems involve just 1 program. The problems in this category are: Given a
program point p in program P, find the points q of P such that if the
behavior at q were different, then the behavior at p would be different.
Given p, find the points q of P such that if the behavior at p were
different, the behavior at q would be different. Find a projection Q of P
such that when P and Q are run on the same initial state, the behaviors at p
in P and Q are identical. The second category of problems is differencing
problems and these involve 2 programs. The problems in this category are:
Given programs Old and New, and a correspondence between their components,
find the components of New whose behaviors differ from the corresponding
components of Old. Given no correspondence between the components of Old and
New, find the components of New for which there are no components in Old
with the same behavior. Find a program Changes that exhibits all changed
behaviors of New with respect to Old. The third category is integration
problems and these involve 3 or more programs. Given a program Base and 2 or
more variants, together with a correspondence between the programs'
components, determine whether the changes made to create the variants from
Base are compatible. If the changes are compatible, combine the variants to
form a single integrated program, otherwise identify the source of the
interference. Another problem in this category is the same problem without
any correspondence given between the components. The authors then introduce
the concept of the program dependence graph (PDG), a directed multigraph
whose vertices represent the program's assignment statements and predicates,
plus a special Entry vertex, and whose edges represent control and data
dependencies. System dependence graphs (SDGs) are the multi-procedure
version of PDGs. Program representation graphs (PRGs) represent
procedureless programs just as PDGs do, but PRGs include special phi
vertices that remove the need for def-order edges.
Algorithms are then presented to solve the different problems described
above.
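To make the slicing problem concrete, here is a small C example of my own
(the program and variable names are mine, not taken from the paper). The
slicing criterion is the final print of sum; the backward slice consists of
exactly those statements on which that print is data- or control-dependent.

    #include <stdio.h>

    int main(void) {
        int i = 1, sum = 0, prod = 1;
        while (i <= 10) {        /* in the slice: this predicate controls sum's updates */
            sum = sum + i;       /* in the slice: defines sum                           */
            prod = prod * i;     /* not in the slice: sum never depends on prod         */
            i = i + 1;           /* in the slice: feeds the predicate and sum           */
        }
        printf("%d\n", sum);     /* the slicing criterion, point p                      */
        return 0;
    }

Deleting the prod statements yields a projection Q that behaves identically
to P at the print statement, which is exactly the third slicing problem
listed above.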
Dynamically discovering likely program invariants to support program
evolution by Ernst et al
Program invariants are usually implicit. This paper describes techniques for
dynamically discovering invariants, presents an instrumentation and
inference engine that embodies these techniques, and applies the engine to
two sets of programs. Invariants are very useful because they protect a
programmer from making changes that inadvertently violate assumptions upon
which the program's correct behavior depends. The accuracy of the inferred
invariants depends on the quality and completeness of the test cases. The
technique presented in this paper involves instrumenting the source to trace
the variables of interest, running the instrumented program over a set of
test cases and inferring invariants from the gathered data. The 2 primary
decisions in the instrumenting process are selecting the program points at
which to insert instrumentation and the variables to examine at those
points. The prototype instruments procedure entry and exit and loop heads.
The instrumented program is then run over the test suite. Invariant
inference needs repeated execution in order to get statistically valid
inferences. The invariant detector then uses the output of the instrumented
program to list the invariants. The system checks for constants or
limited-range variables, linear relationships, functions, sequences, etc.
Negative invariants are also considered: the probability that such a
property would hold purely by chance is computed, and if it is smaller than
a user-defined threshold, the property is reported as a negative invariant.
In addition to variables explicitly
passed to the engine, derived variables like the first and last elements and
the length of an array are also tracked. Detection is also staged, so that
the search for new invariants can make use of invariants that have already
been detected.
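As a rough illustration of the front end of this pipeline, here is my own C
sketch of instrumentation at procedure entry and exit (the function name,
trace format, and example are mine, not the paper's actual instrumenter
output):

    #include <stdio.h>

    /* Sums the first n elements of a. */
    int array_sum(const int *a, int n) {
        /* entry-point instrumentation: record the traced variables on every call */
        fprintf(stderr, "array_sum:ENTER n=%d\n", n);

        int sum = 0;
        for (int i = 0; i < n; i++)   /* a loop head could be instrumented too */
            sum += a[i];

        /* exit-point instrumentation: record results before returning */
        fprintf(stderr, "array_sum:EXIT n=%d sum=%d\n", n, sum);
        return sum;
    }

    int main(void) {
        int data[] = {1, 2, 3, 4};
        printf("%d\n", array_sum(data, 4));
        return 0;
    }

The inference step then scans the accumulated traces for properties that
were never falsified. Over a test suite that only ever passes a positive n,
for instance, it might report n >= 1 at entry, which says as much about the
test suite as about the program; derived variables such as a[0], a[n-1], and
the array's length would be examined as well.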
This technique does seem like it would be incredibly useful in deciphering
new codebases where the assumptions made by past programmers are unknown. I
have violated many assumptions I had no idea existed in the Office codebase.
However, the technique is very dependent on the completeness of the test
suite. I don't see a way around this problem, but many of the assumptions
I've violated were violated precisely because I did not know about a
particular code path, and hence my existing test suite didn't cover it. For
example, I assume a pointer cannot be null at a certain point in the code
based on my research, but then it turns out that if the app is booted with a
filename from the command line instead of through the file open dialog, the
pointer is in fact null. A big positive of the approach, on the other hand,
is that it is incremental: the user only needs to identify the relevant
parts of the program for analysis.
Lackwit by O'Callahan and Jackson
This paper, in sharp contrast to the previous one, adopts a static approach,
as opposed to a dynamic one, to determine where the structure of a program
requires sets of variables to share a common representation. Abstract data
types, abstraction violations, unused variables and functions, and simple
errors (like a failure to close a resource after opening it) can be detected
using this method. The paper defines a new type system very different from the
underlying language's type system. Type inference is an attractive basis for
analysis because it is fully automated and elegantly handles rich features
like recursive pointer-based data structures and function pointers. The
algorithm has doubly exponential worst-case time complexity, but in practice
it takes little more than linear space and time in the size of the program.
A big negative of this approach in my mind is that it seems to be all or
nothing: the entire code base has to be analyzed at once, otherwise variable
assignments can be missed, leading to wrong results. On large code bases the
amount of data generated may become overwhelming.
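To illustrate the kind of fact such a representation-based analysis reports,
here is a small C fragment of my own (not Lackwit's actual output or
notation). Every variable below is declared int, but only i ever flows into
the offset parameter, so i and offset end up in the same representation
class while unrelated does not:

    #include <stdio.h>

    /* offset is used only to index into buf */
    static char read_at(const char *buf, int offset) {
        return buf[offset];
    }

    int main(void) {
        char buf[8] = "abcdefg";
        int i = 3;           /* flows to offset: same representation class */
        int unrelated = 42;  /* never flows to offset: a separate class    */
        printf("%c %d\n", read_at(buf, i), unrelated);
        return 0;
    }

Grouping variables by inferred representation in this way is what lets the
tool expose abstract data types and flag variables that are never really
used.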
Static detection of dynamic memory errors by Evans
This paper introduces annotations to make assumptions explicit at interface
points. A static checking tool that exploits these annotations can detect a
broad range of errors including misuses of null pointers, use of dead
storage, memory leaks, and dangerous aliasing. Run-time checking suffers from
the flaw that its effectiveness depends on running the right test cases.
This paper's technique involves adding annotations in the source as
syntactic comments. More than one annotation may be used with a declaration
although certain combinations are disallowed. Full interprocedural analysis
at compile time is too expensive to be practical and does not scale. The
annotations provide enough information to do a good job while checking each
procedure independently. When a function body is checked, annotations on its
parameters and globals are assumed to be true. At all return points the
function must satisfy all given constraints. When a function call is
encountered, the arguments and globals are checked to make sure they satisfy
the constraints. This approach is not complete: it makes many simplifying
assumptions that allow some bugs to slip through.
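Here is a short sketch of what such interface annotations look like, written
in the /*@...@*/ comment style the paper's tool uses; the function is my own
example, and the exact annotation names and defaults are from memory, so
they may differ slightly from the real tool:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* The result may be NULL, and the caller receives the only reference
       to the new storage, so the caller must eventually free it. The
       parameter must not be NULL at any call site.                        */
    /*@null@*/ /*@only@*/ char *copy_name(/*@notnull@*/ const char *src) {
        char *dst = malloc(strlen(src) + 1);
        if (dst == NULL)
            return NULL;     /* allowed because the result is annotated null */
        strcpy(dst, src);
        return dst;
    }

    int main(void) {
        char *name = copy_name("shamik");
        if (name != NULL) {  /* respects the null annotation on the result   */
            puts(name);
            free(name);      /* respects the only (ownership) annotation     */
        }
        return 0;
    }

Checking proceeds exactly as summarized above: inside copy_name the checker
assumes src is non-null, so the strlen call is fine, while at each call site
it checks that the argument cannot be null and that the returned storage is
checked against NULL and eventually released.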