Horowitz and Reps.  1992.  "The Use of Program Dependence Graphs in Software Engineering."

Key Concept: program dependence graphs (PDGs)

Summary: The paper uses PDGs for programs (single-procedure programs) and systems (multi-procedure programs) to present algorithms to solve eight problems related to the structure of programs: three slicing problems, three differencing problems, and two integration problems.  Solutions to these problems are potentially useful in a variety of software engineering tasks, such as program understanding, system evolution, and version integration.  The techniques are static.  In general, the solutions are complete, but not sound.

Assessment:  The ideas and approach presented in this paper sound great in theory.  But, as of the writing of this paper (1992), the implementation of the ideas had not progressed beyond a toy domain.  The language constructs supported are a subset of those of a useful language and the scale of programs analyzed is very small.   These comments are observations, not criticisms; it is, of course, a reasonable progression to validate an approach in a toy domain during early research, and then scale to more useful domains as the work progresses.  A key question for this work is whether it can be scaled to useful domains.   The alarming number of occurrences of the words "intractable" and "undecidable" in the text discussion of algorithms does not bode well for the scalability of the approach -- nor does the graphical complexity of PDGs of even the simplest of programs.

Mini-Soapbox.  Assessing this work as research, one probably needs to be judiciously agnostic about whether it will ever lead to useful tools for practicing software engineers.  In the field of computer science, at least, research ranges from esoteric mathematical formalism to unapologetically pragmatic commercialism.  As a formal approach to characterizing certain properties of programs and their variations, the work in this paper is quite appealing, and intuitively instructive -- that is, it instructs the intuition.  But if we assume that the goal of this research (as well as some of the other research efforts we have read about this term) is to develop useful tools, then the following comment is a reasonable one.

In a previous incarnation, I spent alot of time doing back-of-envelope and order-of-magnitude calculations about different phenomena.  It seems to me, that with the well-developed understanding of computational complexity in computer science, extensive system parameterization -- processor speed, memory size, etc. -- and sophisticated modeling and simulation techniques, it is possible and would be extremely useful to sketch very roughly the resource profile of a prototype version of this kind of approach.  If useful functionality looks like it exceeds current technology by five or six orders of magnitude, then you have a problem -- and probably not a useful tool.  If you can guesstimate useful functionality within two or three orders of magnitude of what is currently available, then you may have something useful.  The other half of the back-of-the-envelope calculation is, of course, to play what-if with the feature set in an attempt to speculate on what kind of configuration would supply what kind of useful functionality.  These comments are way over on the development side of R & D, but in industry, the project would probably live or die by such a pitch.

-- Love the approach, though.

Ernst, Cockrell, Griswold, and Notkin.  1997.  "Dynamically Discovering Likely Program Invariants to Support Program Evolution."

Key Concept: inferred invariants in programs

Summary:  The paper describes dynamic inference of program invariants from execution traces.  The basic approach is to instrument a program to capture variable values at key program points in a trace for a suite of test runs, and then to use an inference engine to determine likely invariants.  The engine works with a reasonably short, but extensible, set of possible invariants to test.  The thesis is that the invariants so discovered are potentially useful in a variety of software engineering tasks, such as program understanding and system modification.

Assessment:  I have a couple of orthogonal (that is, unrelated) reactions to the work described in this paper.  The first of these reactions is the same as the soapbox comments made about [Horowitz and Reps]: Make me believe that this system has the potential to scale to something with useful functionality.  For this work, though, the need is much stronger than for [Horowitz and Reps].  Since the basic approach is thoroughly empirical, with very little theoretical basis (any?), it needs to be justified by potential usefulness.  Question: If a 100 KLOC program were instrumented as described in this paper and traces generated for a reasonable number of runs, how big would the trace database be?  A few MBs?  A few GBs?  1,000 TBs?  A few extremely rough calculations like this would provide significant guidance about what this approach can reasonably aspire to, and therefore, help the researchers figure out where to (try to) go next in developing the approach.  For example, if the trace size will be unmanageable for 100 KLOC programs, then it is a priority to be more selective about the collection of raw data.  And there would be a need to develop well-motivated criteria for making the selections.

Second, the question that stayed at the forefront of my mind while reading this paper was: Isn't an inferred invariant "just" a statistic about a program/test suite?  The point being, "inferred invariant" sounds esoteric and significant, but "statistic" sounds entirely more pedestrian and even intellectually suspect.  More aggressively, if you are simply working from a laundry list of possible invariants that you walk through -- looking for statistical significance with or without formal motivation -- you are basically mucking around in the data and leave yourself wide-open to discovering spurious correlations -- things that happen to be true statistically, but do not really mean anything, and may be actively misleading.  To use a CS idiom, you may have little more than a pathologically eclectic rubbish lister.  To use another analogy, the discovery of invariants may be alot like poorly designed data-mining.  As I understand the present system, you do not use the spurious correlations generated because the database of invariants is queried with (hopefully meaningful) hypotheses.  Nevertheless, there may be alot of spurious correlations in the database masquerading as invariants.  At the least, this makes me uneasy because a bad hypothesis can be supported by a spurious correlation.

Third, my statistical skills are not everything they might be, but I get the impression that the statistical analysis employed in this work is pretty basic stuff, . . .  and I am not convinced that this is all that is needed.  I do not know how high the confidence levels are, but it seems to me that in real systems many of the most troubling bugs result from (statistically) highly unlikely events -- and that is how they find their way into shipped products.   If the bugs you want to chase down or avoid are manifest in highly unlikely events to begin with, exactly how much confidence can you place in (even) very high confidence levels?  (How) Can you justify appropriate confidence levels?  The same observation holds for unlikely but correctly handled cases: Based on empirical traces, the engine may infer an invariant where one does not exist, and where, in fact, the variant case is correctly handled by the program.  So a statistical invariant may lead the analyst to misunderstand the program.

Fourth, discovering invariants with this kind of approach reminds me of the confusion -- so I argued -- in [Muller et al.] over the difference between recovering program structure and imputing it.  This approach does not discover invariants; it hypothesizes their existence from the empirical evidence.  The only thing the system can establish rigorously is the falsification of a hypothesized invariant through an empirical counter-example.  The "true" invariants in the program can be recovered -- but not with this approach.

So far, I have made a long-winded argument that I am more of a rationalist than an empiricist.  That much firmly established, my impression is that this approach does, in fact, have considerable potential usefulness.  The key consideration is something like this: This term we have read about a number of potentially useful approaches and projects that address different software engineering problems.  Almost all of them seem to be partial solutions.  My naive impression is that if they were used in concert, they would have pretty significant leverage on some very practical problems.  So, for example, the inferred invariant engine seems like a natural complement to the reflection model tool.  The reflection tool can be used to isolate the structural components of a system that are of interest for a particular task; the invariant engine could be used to explore the behavior of these structural components in more detail -- to instrument variables at the interface of the ad hoc subsystems, or to understand the internal behavior.

O'Callahan and Jackson.  1997.  "Lackwit: A Program Understanding Tool Based on Type Inference."

Key Concept: static type inference

Summary:  The paper describes a tool -- Lackwit -- to infer constraints on and relationships among data value representations in a program.  My understanding of the basic idea is something like the following: Given an n-variable program, begin with n different representations, one for each variable.  Then by parsing the structural relationships among the variables in the code base, and applying a few rules about what the relationships imply about the type representations, the n type representations can be reduced (by unification) to some smaller number, k.  This k-partition of the n variables not only provides the type in-/compatibility information of a typical type system, but because of how the partition is generated, it also implies that value flows among the variables in a partition connect all the variables,  Graphically, this scheme would be something like the following.  If the n variables are nodes, then each rule application unifies two (or more) type representations by inserting an undirected edge between them.  When all instances of the rules derivable from the code have been applied, a k-partition of the graph (a k-forest?) has been generated, with each partition corresponding to an abstract type.   [This probably obscures more than it summarizes, but it depicts how I understand the system to work.]  The technique does not provide dataflow info.  The authors claim that the method is "efficient, fully automatic, and smoothly integrates pointer aliasing and higher-order functions."  They illustrate the tool by applying it to a 10+ KLOC C program.

Assessment:  I like the approach.  I particularly like that the authors take the time to present and discuss the platform and performance issues -- that is, they convince me that the technique is reasonably scalable and practically useful.

Three further observations:

(1) The notion of an underlying representation of data values is reasonably intuitive, but I would have been happier if the paper had spelled it out explicitly.

(2) A number of the systems discussed in the Introduction and Related Work are ones that we have looked at.  Wish I had time to follow through in detail a comparison of the systems -- problem domains, relative strengths and weaknesses, etc.  A useful summary slide for the next time the course is taught?

(3) Related to the last point, the tool sounds both elegant and practical.  But, is this the tool that you would reach for? -- even when it is more fully developed.  The flag was raised on this question when I read, "[LCLint's] checks incorporate more information, such as flow-sensitive dataflow analysis, so they will catch many errors that we cannot.  The two tools are complementary." (emphasis added)  I guess this is an empirical question: you'd have to play with both to see the comparative usefulness -- but the time to do that playing was the original concern that I raised.

Evans.  1996.  "Static Detection of Dynamic Memory Errors."

Key Concept: interface annotations

Summary: Describes an extension of the LCLint static checking tool to handle memory allocation issues more effectively.  (LCLint is lint++.)  The strategy is to use interface annotations which are similar to C type qualifiers.  Constraining annotations handle: permissibility of null-valued pointers; undefined, partially defined, completely defined variables; allocation/deallocation obligations; aliasing issues.  Paper reports annotation of an existing program and no experience with new programs.  Not all bugs were detected in the case study -- as detected.

Assessment:  This paper reports on a thoroughly pragmatic tool extension.  It works.  It adds value.  It scales.  No theoretical antecedents, references, rationale, or conclusions are offered.  No apologies.  The reader can, of course, attempt to generalize or formalize the approach.  The author does not.  If you can, and choose to, use LCLint, the extension is probably worthwhile.

The paper was most useful to me as a quick refresher on memory bugs.