**Key Concept:** program dependence graphs
(PDGs)

**Summary:** The paper uses PDGs for programs
(single-procedure programs) and systems (multi-procedure programs) to present
algorithms to solve eight problems related to the structure of programs:
three slicing problems, three differencing problems, and two integration
problems. Solutions to these problems are potentially useful in a
variety of software engineering tasks, such as program understanding, system
evolution, and version integration. The techniques are static.
In general, the solutions are complete, but not sound.
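
The three slicing problems are, at bottom, reachability questions over dependence edges. As a rough illustration only (not the paper's algorithms, which also handle interprocedural structure), a backward slice over a toy single-procedure PDG, given as an adjacency map, might look like:

```python
from collections import deque

def backward_slice(pdg, criterion):
    """Backward slice: every node the criterion depends on, i.e. all
    nodes reaching it along control/data dependence edges.
    `pdg` maps each node to the set of nodes it directly depends on."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        n = work.popleft()
        for dep in pdg.get(n, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen

# Toy PDG for:  s1: x = 1;  s2: y = 2;  s3: z = x + 1
pdg = {"s3": {"s1"}, "s2": set(), "s1": set()}
print(sorted(backward_slice(pdg, "s3")))  # ['s1', 's3'] -- s2 is sliced away
```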

**Assessment:** The ideas and approach
presented in this paper sound great in theory. But, as of the writing
of this paper (1992), the implementation of the ideas had not progressed
beyond a toy domain. The language constructs supported are a subset
of those of a useful language and the scale of programs analyzed is very
small. These comments are observations, not criticisms; it
is, of course, a reasonable progression to validate an approach in a toy
domain during early research, and then scale to more useful domains as
the work progresses. A key question for this work is whether it can
be scaled to useful domains. The alarming number of occurrences
of the words "intractable" and "undecidable" in the text's discussion of the
algorithms does not bode well for the scalability of the approach -- nor
does the graphical complexity of PDGs of even the simplest of programs.

**Mini-Soapbox.**
Assessing this work as research, one probably needs to be judiciously agnostic
about whether it will ever lead to useful tools for practicing software
engineers. In the field of computer science, at least, research ranges
from esoteric mathematical formalism to unapologetically pragmatic commercialism.
As a formal approach to characterizing certain properties of programs and
their variations, the work in this paper is quite appealing, and intuitively
instructive -- that is, it instructs the intuition. But if we assume
that the goal of this research (as well as some of the other research efforts
we have read about this term) is to develop useful tools, then the following
comment is a reasonable one.

In a previous incarnation, I spent a lot of time doing back-of-the-envelope and order-of-magnitude calculations about different phenomena. It seems to me that, with the well-developed understanding of computational complexity in computer science, extensive system parameterization -- processor speed, memory size, etc. -- and sophisticated modeling and simulation techniques, it is possible, and would be extremely useful, to sketch very roughly the resource profile of a prototype version of this kind of approach. If useful functionality looks like it exceeds current technology by five or six orders of magnitude, then you have a problem -- and probably not a useful tool. If you can guesstimate useful functionality within two or three orders of magnitude of what is currently available, then you may have something useful. The other half of the back-of-the-envelope calculation is, of course, to play what-if with the feature set in an attempt to speculate on what kind of configuration would supply what kind of useful functionality. These comments are way over on the development side of R&D, but in industry, the project would probably live or die by such a pitch.

-- Love the approach, though.

**Ernst, Cockrell, Griswold,
and Notkin. 1997. "Dynamically Discovering Likely Program Invariants
to Support Program Evolution."**

**Key Concept:** inferred invariants in
programs

**Summary:** The paper describes
dynamic inference of program invariants from execution traces. The
basic approach is to instrument a program to capture variable values at
key program points in a trace for a suite of test runs, and then to use
an inference engine to determine **likely** invariants.
The engine works with a reasonably short, but extensible, set of possible
invariants to test. The thesis is that the invariants so discovered
are potentially useful in a variety of software engineering tasks, such
as program understanding and system modification.
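
To make the basic loop concrete: a minimal sketch of the check-candidates-against-traces idea, with an invented candidate list (the paper's actual invariant set and inference engine are richer):

```python
def infer_invariants(samples):
    """Given values observed for a variable at one program point,
    report which candidate invariants survived every sample.
    The candidate list here is illustrative, not the paper's."""
    candidates = {
        "x >= 0":        lambda x: x >= 0,
        "x != 0":        lambda x: x != 0,
        "x is constant": lambda x, c=samples[0]: x == c,
        "x is even":     lambda x: x % 2 == 0,
    }
    return [name for name, test in candidates.items()
            if all(test(x) for x in samples)]

print(infer_invariants([2, 4, 8, 4]))  # ['x >= 0', 'x != 0', 'x is even']
```

Each observed value falsifies some candidates; whatever survives the whole suite is reported as a *likely* invariant.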

**Assessment:** I have a couple
of orthogonal (that is, unrelated) reactions to the work described in this
paper. The first of these reactions is the same as the soapbox comments
made about [Horwitz and Reps]: Make me believe that this system has the
potential to scale to something with useful functionality. For this
work, though, the need is much stronger than for [Horwitz and Reps].
Since the basic approach is thoroughly empirical, with very little theoretical
basis (any?), it needs to be justified by potential usefulness. Question:
If a 100 KLOC program were instrumented as described in this paper and
traces generated for a reasonable number of runs, how big would the trace
database be? A few MBs? A few GBs? 1,000 TBs? A
few extremely rough calculations like this would provide significant guidance
about what this approach can reasonably aspire to, and therefore, help
the researchers figure out where to (try to) go next in developing the
approach. For example, if the trace size will be unmanageable for
100 KLOC programs, then it is a priority to be more selective about the
collection of raw data. And there would be a need to develop well-motivated
criteria for making the selections.
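
For what it is worth, the calculation takes about ten lines; every number below is an assumed ballpark, not data from the paper:

```python
# Every number here is an assumed ballpark, not measured data.
kloc            = 100      # program size: 100 KLOC
points_per_kloc = 20       # instrumented program points per KLOC
hits_per_point  = 10_000   # executions of each point per test run
vars_per_point  = 5        # variables recorded at each point
bytes_per_value = 8
runs            = 50       # runs in the test suite

trace_bytes = (kloc * points_per_kloc * hits_per_point
               * vars_per_point * bytes_per_value * runs)
print(f"{trace_bytes / 1e9:.0f} GB")  # ~40 GB under these guesses
```

Under these guesses the answer lands in the "few GBs" range; changing any one parameter by an order of magnitude moves the conclusion correspondingly, which is exactly why the exercise is worth doing.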

Second, the question that stayed at the forefront of my mind while reading this paper was: Isn't an inferred invariant "just" a statistic about a program/test suite? The point being, "inferred invariant" sounds esoteric and significant, but "statistic" sounds entirely more pedestrian and even intellectually suspect. More aggressively, if you are simply working from a laundry list of possible invariants that you walk through -- looking for statistical significance with or without formal motivation -- you are basically mucking around in the data, and you leave yourself wide open to discovering spurious correlations -- things that happen to be true statistically, but do not really mean anything, and may be actively misleading. To use a CS idiom, you may have little more than a pathologically eclectic rubbish lister. To use another analogy, the discovery of invariants may be a lot like poorly designed data mining. As I understand the present system, you do not use the spurious correlations generated, because the database of invariants is queried with (hopefully meaningful) hypotheses. Nevertheless, there may be a lot of spurious correlations in the database masquerading as invariants. At the least, this makes me uneasy, because a bad hypothesis can be supported by a spurious correlation.
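
The worry is easy to demonstrate: test enough candidate "invariants" against a modest sample of pure noise and most of them will hold. A toy illustration (seeded for reproducibility; the candidate list is invented):

```python
import random

random.seed(0)
samples = [random.randint(1, 1000) for _ in range(30)]  # pure noise

# A "laundry list" of 50 candidate invariants, none of them meaningful here.
candidates = {f"x != {k}": (lambda k: lambda x: x != k)(k) for k in range(50)}
spurious = [name for name, test in candidates.items()
            if all(test(x) for x in samples)]
print(f"{len(spurious)} of 50 candidates hold on pure noise")
```

Thirty draws from 1..1000 miss almost every small constant, so nearly all fifty "x != k" candidates survive -- none of them means anything.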

Third, my statistical skills are not everything they might be, but I get the impression that the statistical analysis employed in this work is pretty basic stuff . . . and I am not convinced that this is all that is needed. I do not know how high the confidence levels are, but it seems to me that in real systems many of the most troubling bugs result from (statistically) highly unlikely events -- and that is how they find their way into shipped products. If the bugs you want to chase down or avoid are manifest in highly unlikely events to begin with, exactly how much confidence can you place in (even) very high confidence levels? (How) Can you justify appropriate confidence levels? The same observation holds for unlikely but correctly handled cases: based on empirical traces, the engine may infer an invariant where one does not exist -- where, in fact, the variant case is correctly handled by the program. So a statistical invariant may lead the analyst to misunderstand the program.
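
The arithmetic behind this worry is short. Under assumed numbers (a corner case hit with probability 10^-6 per execution of the relevant program point, and that point observed 10,000 times across the suite), the corner case is very likely never to appear in the traces at all:

```python
# Assumed numbers: a corner case hit with probability 1e-6 per execution
# of the relevant program point, observed 10,000 times in the test suite.
p_hit = 1e-6
observations = 10_000

p_never_seen = (1 - p_hit) ** observations
print(f"P(corner case never observed) = {p_never_seen:.3f}")  # ~0.990
```

So with roughly 99% probability the engine sees nothing that contradicts the (false) invariant, no matter how high a confidence level it attaches to it.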

Fourth, discovering invariants with this
kind of approach reminds me of the confusion -- so I argued -- in [Muller
et al.] over the difference between **recovering** program structure
and **creating** it.

So far, I have made a long-winded argument
that I am more of a rationalist than an empiricist. That much firmly
established, my impression is that this approach does, in fact, have considerable
potential usefulness. The key consideration is something like this:
This term we have read about a number of potentially useful approaches
and projects that address different software engineering problems.
Almost all of them seem to be partial solutions. My naive impression
is that if they were used in concert, they would have pretty significant
leverage on some very practical problems. So, for example, the inferred
invariant engine seems like a natural complement to the reflexion model
tool. The reflexion model tool can be used to isolate the structural components
of a system that are of interest for a particular task; the invariant engine
could be used to explore the behavior of these structural components in
more detail -- to instrument variables at the interface of the ad hoc subsystems,
or to understand the internal behavior.

**O'Callahan and Jackson.
1997. "Lackwit: A Program Understanding Tool Based on Type Inference."**

**Key Concept:** static type inference

**Summary:** The paper describes
a tool -- Lackwit -- to infer constraints on and relationships among data
value representations in a program. My understanding of the basic
idea is something like the following: Given an n-variable program, begin
with n different representations, one for each variable. Then by
parsing the structural relationships among the variables in the code base,
and applying a few rules about what the relationships imply about the type
representations, the n type representations can be reduced (by unification)
to some smaller number, k. This k-partition of the n variables not
only provides the type (in)compatibility information of a typical type
system, but, because of how the partition is generated, it also implies
that value flows among the variables in a partition connect all of them.
Graphically, this scheme would be something like the following. If
the n variables are nodes, then each rule application unifies two (or more)
type representations by inserting an undirected edge between them.
When all instances of the rules derivable from the code have been applied,
a k-partition of the graph (a k-forest?) has been generated, with each
partition corresponding to an abstract type. [This probably
obscures more than it summarizes, but it depicts how I understand the system
to work.] The technique does not provide dataflow info. The
authors claim that the method is "efficient, fully automatic, and smoothly
integrates pointer aliasing and higher-order functions." They illustrate
the tool by applying it to a 10+ KLOC C program.
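
My graph reading of the scheme amounts to union-find: each rule application merges two representation classes, and the final classes are the abstract types. A sketch under that reading (the rule instances below are invented, and this is my gloss, not the paper's algorithm):

```python
class UnionFind:
    """Minimal union-find over representation classes."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Each assignment/call in the code yields a unification of representations.
# Hypothetical rule instances for:  b = a;  c = b;  e = d   (f untouched)
uf = UnionFind()
for lhs, rhs in [("b", "a"), ("c", "b"), ("e", "d")]:
    uf.union(lhs, rhs)

groups = {}
for v in "abcdef":
    groups.setdefault(uf.find(v), set()).add(v)
print(sorted(map(sorted, groups.values())))  # three abstract types: {a,b,c}, {d,e}, {f}
```

Six variables collapse to k = 3 abstract types; variables in the same class are exactly those connected by a chain of value flows.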

**Assessment:** I like the approach.
I particularly like that the authors take the time to present and discuss
the platform and performance issues -- that is, they convince me that the
technique is reasonably scalable and practically useful.

Three further observations:

(1) The notion of an underlying representation of data values is reasonably intuitive, but I would have been happier if the paper had spelled it out explicitly.

(2) A number of the systems discussed in the Introduction and Related Work are ones that we have looked at. Wish I had time to follow through in detail a comparison of the systems -- problem domains, relative strengths and weaknesses, etc. A useful summary slide for the next time the course is taught?

(3) Related to the last point, the tool sounds
both elegant and practical. But is this the tool that you would
reach for, even when it is more fully developed? The flag was
raised on this question when I read, "[LCLint's] checks incorporate more
information, such as flow-sensitive dataflow analysis, so they will catch
__many__ errors that we cannot. The two tools are complementary."
(emphasis added). I guess this is an empirical question: you'd have
to play with both to see the comparative usefulness -- but the time to
do that playing was the original concern that I raised.

**Evans. 1996.
"Static Detection of Dynamic Memory Errors."**

**Key Concept:** interface annotations

**Summary:** Describes an extension of
the LCLint static checking tool to handle memory allocation issues more
effectively. (LCLint is lint++.) The strategy is to use interface
annotations which are similar to C type qualifiers. Constraining
annotations handle: permissibility of null-valued pointers; undefined,
partially defined, completely defined variables; allocation/deallocation
obligations; and aliasing issues. The paper reports experience annotating an
existing program, but none with new programs. Not all known bugs were
detected in the case study.
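
As a toy model of the flavor of these checks (LCLint's annotations are written as stylized C comments such as /*@null@*/; the function names and the Python rendering here are invented for illustration):

```python
# Toy model of one annotation class only: may a parameter receive null?
# Function names and this Python rendering are invented; LCLint itself
# reads annotations like /*@null@*/ from C source comments.
annotations = {("strcpy_wrap", 0): "notnull",  # param 0 must not be null
               ("lookup", 1): "null"}          # param 1 may be null

def check_call(func, args):
    """Report diagnostics for one call site, given a possibly-null
    flag for each argument."""
    issues = []
    for i, maybe_null in enumerate(args):
        if annotations.get((func, i)) == "notnull" and maybe_null:
            issues.append(f"{func}: arg {i} may be null, declared notnull")
    return issues

print(check_call("strcpy_wrap", [True]))    # one diagnostic
print(check_call("lookup", [False, True]))  # clean: param 1 may be null
```

The real tool carries several such annotation classes (definedness, ownership, aliasing) through the same style of local, per-call-site checking.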

**Assessment:** This paper reports
on a thoroughly pragmatic tool extension. It works. It adds
value. It scales. No theoretical antecedents, references, rationale,
or conclusions are offered. No apologies. The reader can, of
course, attempt to generalize or formalize the approach. The author
does not. If you can, and choose to, use LCLint, the extension is
probably worthwhile.

The paper was most useful to me as a quick
refresher on memory bugs.