[This is a laundry list of potential uses for invariants.  I'm trying to motivate why one would care about invariants.] Programs are better when invariants are used in their design.  But here I will focus on uses of invariants in already-constructed programs.
Documentation is guaranteed to be up-to-date; when expressed as assert statements, it is machine-checkable.
Invariants established at one point may be depended upon elsewhere.  This was the original motivation for the system.  There’s evidence that maintenance introduces many bugs (and anecdotal evidence that some are due to violating invariants).
The next four are arguably more useful with dynamic than with static invariants:
Unusual (exceptional) conditions may indicate bugs or special cases that should be brought to a programmer’s attention.  (But finding bugs isn’t my focus or what I consider the tool’s strength, though it was effective at that task.  I would rather prevent a bug than have to find it any day.)
Value coverage may be inadequate even if a test suite covers every line of a program.  [Example:  only small values.]
Use invariants like profile-directed compilation, but at a higher and possibly more valuable level (cf. “Value Profiling and Optimization” by Calder et al.).
Bootstrap proofs:  or, formally prove the invariants.  In theorem-proving, many consider it harder to determine what to prove than to actually do the proof.  [Don’t say “turn the crank”; it belittles theorem-proving.]  [If you prove the property, the system becomes sound; but probably don’t raise this red flag yet.]
Now, I’ll try to convince you that this is at least possible.
Say “at a program point.”
…or in formal specifications.
(The last three are not all that likely in an assert statement…)
Segue:  why would we want to find them?  Now I will motivate that.
[Don’t go too fast.]
Important:  Daikon successfully recovered the formal specifications from all the formally-specified programs in the first two chapters that contain such programs (chapters 14 and 15).  Conclusion:  I succeeded.
The point of using these programs to test the tool is to know what the result should be:  I have a gold standard for validating the output.
This is an example simple enough to fit on a slide.
Emphasize that the precondition, postcondition, and loop invariant come from Gries; it’s our goal to discover them.
[Don’t gloss over this too quickly.]
[Don’t talk about adequacy; or don’t talk about it much yet.  It’s premature.]
[Do not hedge, here or elsewhere.  Be specific and straightforward.]
This is the result from experiment 1; make that clear.
This is actual, complete output from the system.  (Everything else you will see is also actual output.)
Precondition: n ≥ 0
Postcondition: s = (Σ j : 0 ≤ j < n : b[j])
Extra properties reported beyond Gries’s invariants give information about the data set.  (If data set is sufficiently broad, the resulting invariants are info about the function in isolation, or about the program.)
Conclusion:  the system is accurate.
But it doesn’t matter how accurate the results are if they aren’t good for anything.  (Now I’ll address that issue.)
Skip this slide.  It’s here as a backup for questions if necessary.
Loop invariant: 0 ≤ i ≤ n and s = (Σ j : 0 ≤ j < i : b[j])
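The Gries summation example on these slides can be written out as executable code, with the precondition, loop invariant, and postcondition as assertions (a minimal sketch; the function name and use of Python are mine, but the properties are exactly the ones from the slides):

```python
def array_sum(b, n):
    """Sum the first n elements of b, carrying Gries's invariants as asserts."""
    assert n >= 0                                 # precondition: n >= 0
    i, s = 0, 0
    while i != n:
        # loop invariant: 0 <= i <= n  and  s = (sum j : 0 <= j < i : b[j])
        assert 0 <= i <= n and s == sum(b[:i])
        s += b[i]
        i += 1
    assert s == sum(b[:n])                        # postcondition
    return s
```

For example, array_sum([1, 2, 3], 3) returns 6, and every assertion holds along the way.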
This is realistic code.
The program has been used in testing research, so it comes with a good test suite.
Be explicit:
2 programmers in the study
they were my co-authors -- not me.
I provided the Daikon output, then was completely hands off
[Maybe mention that more details are available in my ICSE 99 paper.]
The program author didn’t waste any of those 563 lines on comments.
I decided that adding Kleene + was a useful and realistic task.
I provided the programmers with the detected invariants, and also permitted them their usual debugging and code-understanding tools.  (They often found themselves using those instead of the invariants:  invariants aren’t a panacea for all software engineering tasks.)
“Traditional tools”:  Be specific about what the programmers did:  eyeballed the code; grep; search.  (Ran the program?)
[Use “they”, never “we”.]
[Don’t go too fast or assume audience understands the code.]
The point here is that invariants are useful and helpful.  They clarify the code. These are insights obtained by the programmers from the invariants without any interaction with me.
Two forms of regular expression:  raw and compiled.
[Say the function name first (and clarify that it is a function)]
Anticipated lj < j because they thought that “lj” might stand for “last j”, which would be less.
[Do not mention yet how small the subset of the test suite was.]
Maintainer expectations:  Indeed, lastj < j; however, there is no relation between lj and j, so they avoided introducing a bug.
Bug detection:  an unexpected invariant revealed that the loop might not be entered (they expected it always to be), which was an uncaught array bounds error.
[Clearly state the conclusion, not just the cause (which is on the slide).  Maybe put the conclusion on the slide, too.]
Limited procedure use: makepat has 5 parameters, 2 of which were always 0.  The programmers reported that they could perform “mental partial evaluation” rather than coping with its full generality.  Conclusion:  easier to understand.
“Test suite inadequacy” conclusion:  a test case in which in_set_2 is not called from stclose should be added.
(only used 300 of 5500 test cases in this particular experiment).
Test suite:  in_set_2 never returned false because calls(in_set_2) == calls(stclose).  We only used 100 of the 5542 test cases supplied with the program.
Conclusion of invariant diff:  changes were correct.
These are at the exits of the two routines.
Changes in invariants when run using the modified program (on the original and modified test suites):  e.g., stclose increases length by 1; plclose increases length by at least 2.
[Do not drop into “we”; keep me separate from the programmers, who made these judgments on their own.]
Make it clear that this is all good.  “Invariants had a number of positive uses” or some such.
Leading programmers to think in terms of invariants:  this is a “guerrilla approach” to introducing invariants to programmers.
The useful components were used by the programmers in the study.
The useful components are beyond invariant detection.
Use the word “serendipitous”.  That’s the point:  if you know exactly what you are looking for, or you want to verify a specific property, you are unlikely to get as much from Daikon, which merely automates your task rather than providing entirely new information and functionality.  We want to raise the programmer’s suspicions and draw his/her attention to code or facts that might have otherwise been overlooked.  There are things you expect but didn’t write down (and didn’t want to be bothered to write down).
Seven of these ten experiments are suites of 5-200 programs, each of the indicated size.
If I had time, I could tell, about any of these ten experiments, a story similar to the ones that I have just told you.
[Joke] I have heard that some programs lack exhaustively-specified invariants; we’ll assume for the rest of the talk that we have such invariants.
Claim:  programmers have these in mind, consciously or unconsciously, when they write code.  We want access to that hidden part of the design space.
We might want to check/refine/strengthen programmer-supplied invariants, and the program may be modified/maintained.  (Leveson has noted that program self-checks are often ineffective.)
May want to check/strengthen programmer-supplied assertions.
Static analysis:
For instance, abstract interpretation.
The results of a conservative, sound analysis are guaranteed to be true.
Static techniques can’t report true but undecidable properties (or properties of a program context).
Pointers:  it is problematic to represent program state and the heap; the analysis must approximate, resulting in too-weak results.
Dynamic techniques are complementary to static ones.
[No setup needed; dive right into this.]
[Remain excited!  (Here and on next slide.)]
All of these steps are automatic.
Instrumenters are source-to-source translators.
Define the front end here, or say “instrumenter” instead of “front end”.  Possibly define back end here; but I think I don’t use that term.
About 20,000 lines of code in about 100 files.
That doesn’t include 5,000 lines of general-purpose utilities I wrote for this project, or the considerably larger amount of others’ code that I was able to take advantage of.  Nor does it include 9,000 lines of additions to Jikes for the Java front end.  (Some of that has since been folded back into the main distribution.)
C/C++ front end based on EDG
Java front end based on Jikes
The list of tools doesn’t include some stuff like regular expressions and getopt, nor does it include my own utilities.
[Don’t be too harsh in the comment about test suites here.]
If you don’t have a test suite -- that is, if you can’t run your program or you never have -- then you are likely to have a number of other problems that you should solve first using standard techniques.  (Another way of saying this is that you have to be able to run your program on some inputs.)
I’m not sure of the exact properties that make a test suite good for invariant detection.  We want good value coverage.  A test suite that strives for efficiency by executing each statement a minimal number of times would be bad for invariant detection:  Daikon needs multiple executions as a basis for generalization so that there is statistical support for its inferences.
Test suite size:  I did experiments regarding test suite size and noticed a knee in the curve comparing Daikon output; the output is also not very sensitive to the exact cases in the suite so long as it is large enough.  (I found these experiments somewhat surprising.)
Like any dynamic analysis such as testing, there is no guarantee of completeness or soundness.
Explain why it’s
 not complete:  infinitely many potential invariants
 not sound: no guarantee that test suite fully characterizes all executions:  the fact that something was true for the first 10,000 executions doesn’t guarantee that it will be true on the next one.  However, it is sound on the training data (all the data presented to it).
It may not be complete or sound, but it is useful, which is more important.
Useful nonetheless:  consider testing, Purify, PREfix (an unsound static tool recently bought by Microsoft for $60 million).  PREfix is symbolic execution, not “abstract interpretation”.
This is the set of templates for invariants.
Be specific about sequence invariants:  for instance, determining that the elements of an array are sorted or that a particular element is always a member of a collection.
How I came up with these invariants:
Made up invariants that seemed basic and natural, based on my experience as a programmer
Analyzed some programs and added invariants that appeared in them and which I believed to be sufficiently general to appear in, and be helpful for understanding, other programs as well
[This is not so different from the way that axioms/lemmas are added to theorem provers (is it?).]
Say how you add new invariants to the system.  The interface is simple:  four methods:  instantiate, add_sample, format, and confidence.  [But don’t apologize for the lack of a declarative notation.]
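The four-method interface could be sketched as follows.  (The method names come from this slide; the class body and the lower-bound example are my own illustration, not Daikon’s actual code, and the confidence formula is only indicative.)

```python
class LowerBound:
    """Illustrative invariant template: 'var >= smallest observed value'."""

    def __init__(self, varname):          # instantiate: one copy per variable
        self.var = varname
        self.min = None
        self.samples = 0

    def add_sample(self, value):          # update state on each observed value
        self.min = value if self.min is None else min(self.min, value)
        self.samples += 1

    def format(self):                     # human-readable rendering
        return f"{self.var} >= {self.min}"

    def confidence(self, value_range=1000):
        # Illustrative only: grows with the number of samples, approaching 1
        # as repeated observations make a chance bound increasingly unlikely.
        return 1.0 - (1.0 - 1.0 / value_range) ** self.samples
```

A new invariant kind is added by writing one such class and registering it with the engine.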
I hope that by iterating this process, I will approach a set of universally useful invariants.  There is evidence that I succeeded in obtaining a useful, basic set.  (replace, Weiss).
These can be checked efficiently; see next slide.
All three steps are inexpensive:  instantiation, checking, and number of checks (most checks fail quickly).
There are many invariants, but each one is cheap.
Can think of the property as a generalization from the first few values; then check the rest of the values. Example:  2 degrees of freedom in y=ax+b, so two (x,y) pairs uniquely determine a and b.  (I don’t actually do it that way for reasons of numerical stability, but that is the idea.)
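The “generalize from the first few values, then check the rest” idea can be sketched for y = ax + b.  (As the note says, the real implementation computes this differently for numerical stability; this sketch, with names of my choosing, just shows the idea.)

```python
def infer_linear(pairs):
    """Hypothesize y = a*x + b from the first two samples; check the rest."""
    (x0, y0), (x1, y1) = pairs[0], pairs[1]
    if x1 == x0:
        return None                 # slope undetermined; would need more data
    a = (y1 - y0) / (x1 - x0)       # two points fix the two degrees of freedom
    b = y0 - a * x0
    for x, y in pairs[2:]:
        if y != a * x + b:          # a single counterexample falsifies it
            return None
    return a, b
```

For example, infer_linear([(0, 1), (1, 3), (2, 5)]) yields (2.0, 1.0), while changing the last pair to (2, 6) falsifies the hypothesis and yields None.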
Potentially cubic.  But the number of variables at a program point is generally small.  Cubic in the number of variables really means linear in the number of invariants checked; and it really means linear in the number of invariants discovered, since invariants that are falsified tend to be falsified early, and only a few invariants are checked for each sample.
Each instrumented program point is processed independently.  The choice of program points examined is made by the front end.  The program points currently examined are procedure entries, exits, and in some cases loop heads.  User can suppress arbitrary classes, functions, etc.  [but can’t add arbitrary new instrumentation points].
The point of this is to let the programmer estimate runtime based on factors under his or her control. The claims of this slide are supported both algorithmically (by the structure of the code) and experimentally (by a set of experiments).
For test suite size, the O(n) constant is small.
“A few minutes”:  10,000 samples for 70 variables.
In its current incarnation, the implementation is slow but usable.  There are a number of reasons to believe that it will get substantially faster.  (Omit the specific mention of absolute runtime?)
[Make it clear that this is all completed work; I have solved these problems.] A naive implementation wouldn’t (and didn’t) produce that exact output.  The simplified picture I have given you so far isn’t enough; it wouldn’t produce all the output you saw in the initial examples, and it would produce other undesirable output.  I will now give some additional details.
Discuss what relevance is.
Unused polymorphism:  overly wide types.
I will discuss each of these in turn.
Collectively/overall, how much improvement?  I can’t say, because these are not just quantitative but qualitative improvements.
[Do not say “expressions not appearing in the source”.]
[We saw some of these in the previous output.]
These are a few examples.
[Don’t say “false positive”; it has a technical meaning and people won’t know whether the false positive is regarding truth or relevance.]
The solution to “slowdown” is here on this slide.
The solution to “irrelevant invariants” will come soon.
[Slowly and clearly explain the example.]
Polymorphism available in the source code but not used during execution.
“In the interest of time”, skip this section.
[Slowly and clearly explain the example.]
Also works for subtype polymorphism!
Polymorphism available in the source code but not used during execution.
This is actual code, not a contrived example (actually it’s simplified).
(MyInteger is used instead of Integer because Java 1.1’s Integer doesn’t support the Comparable interface, but MyInteger does (though I haven’t shown that on the slide).)
The Daikon implementation is designed to be simple, basic, and language-independent, to reduce errors and increase robustness.  As a result, Daikon has no access to run-time types; in fact, the only types seen by the engine are integers, strings, and sequences, for predeclared variables.  The decision about what fields to output is made statically, based on what fields the object is known to contain.
The essential idea is to rewrite the program with stricter (and more accurate) types.  We determine those stricter types by doing a dynamic analysis, of course!
[The feedback mechanism isn’t completely automatic.  The annotation is currently done by hand -- and intentionally so.  When the output may be inaccurate, I choose to put a human in the loop.]
Slide used to say “Annotate with refined type discovered by pass 1”.
Pass 2:  the front end reads source comments and proceeds under the assumption that the object is actually of the specified type.
Sound for deterministic programs:  the attempted type downcast (the attempt to access a specific field) will never fail.
But Daikon traps any such errors that occur anyway.
“Given three samples for x”:  a particular program point is executed just three times by a test suite.
These are true but not justified.
“Not the sort of conclusion you or I would draw.”
Statistical test over hypothesized distribution (are the hypothesized distributions meaningful?)
Much of the tool can be viewed as tests of hypothesized distributions.
x ≠ 0: say range is from -r/2 to r/2 (approximately); assumes uniform distribution.
[The hypothesized distribution is always:  the uniform distribution.  Don’t say that early in the slide, only mention uniform distribution when discussing range limit test.]
Recall that this is over multiple tests in a test suite, not a single test.
There is a test like this for every invariant.
This indicates the value of reporting the invariant; it is not a statistical test of the probability that it is really true over all possible executions.
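The range-limit style of test for x ≠ 0 could be sketched like this.  (A hedged illustration of the idea on the slide -- a uniform distribution over the observed range -- not Daikon’s exact formula; the function name and the 0.01 threshold are my assumptions.)

```python
def report_nonzero(samples, threshold=0.01):
    """Report 'x != 0' only when it is statistically justified."""
    lo, hi = min(samples), max(samples)
    if any(v == 0 for v in samples) or not lo <= 0 <= hi:
        return False          # falsified outright, or 0 isn't even in range
    r = hi - lo + 1           # size of the observed range
    n = len(samples)
    # Under a uniform distribution over the range, the chance that all n
    # samples happened to miss 0 is (1 - 1/r)^n; report only if that is small.
    return (1 - 1 / r) ** n < threshold
```

With only a handful of samples the invariant is suppressed as unjustified; with many samples over the same range it is reported.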
This solves much of the problem, but some such undesirable invariants remain.
These were eliminated from the procedure entry and exit, but remain at the loop head.
These values (–556, 539, –88, 99) happened to be the extrema -- the largest and smallest values seen on this particular set of runs of the program.
A variable value contributes to invariant confidence at a program point if the variable was assigned to since the last time the program point was executed.
[Tell the concept here, not just the implementation.]
You can think of this as a dirty bit.
[No need to avoid the term “lvalue”.  Used to say “at each assignment, timestamp the lvalue”.]
Actually there is a bit per lvalue/program point pair.
[Used to say “if small test suite, eliminates desired invariants”; but don’t raise that red flag.]
Don’t check just whether the value is the same, but whether the variable holding that value was assigned to.
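The per-lvalue/per-program-point “dirty bit” could be implemented with timestamps, as in this sketch.  (The class and method names are mine, for illustration; the policy is the one stated above: a sample counts toward confidence only if the variable was assigned to since the last time the program point executed.)

```python
class ModificationTracker:
    """Track whether an lvalue was assigned since a program point last ran."""

    def __init__(self):
        self.assign_stamp = {}   # lvalue -> time of its last assignment
        self.visit_stamp = {}    # (point, lvalue) -> time of last visit
        self.clock = 0

    def record_assignment(self, lvalue):
        self.clock += 1
        self.assign_stamp[lvalue] = self.clock

    def counts_toward_confidence(self, point, lvalue):
        # True iff lvalue was assigned after this point last observed it.
        self.clock += 1
        fresh = (self.assign_stamp.get(lvalue, 0)
                 > self.visit_stamp.get((point, lvalue), 0))
        self.visit_stamp[(point, lvalue)] = self.clock
        return fresh
```

Re-observing an unchanged value thus adds no confidence, while each fresh assignment does.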
[Say “greater than or equal to”, not “greater than”.]
May be implied by one other invariant or by a collection of other invariants.
[Don’t say “tautologically equal”.]
Make it clear that all of this is a good thing.
First two are runtime improvements; third is UI improvement.
Equal to another variable.  EXAMPLE.
Nonsensical:  a[i] when i<0  (or when i>length(a) )
Staged inference and derivation is key to avoiding deriving variables; that is why this works.
All this is a substantial chunk of code.
All of the percentages are cumulative.  It’s not just a quantitative but a qualitative improvement; system wouldn’t run without staging.
Here are two examples of variables that shouldn’t be compared to one another.
“non-null pointer p”.
These are true and completely uninteresting.
Evaluated several solutions for limiting comparisons; here I will discuss two of them.
Repeat the birthyear/weight example here, for concreteness.
[This slide used to say “potentially miss serendipitous invariants”.  I don’t want to get into that here.  In any event, the technique is already incomplete, and this is a tradeoff between completeness and efficiency.]
Declared types are worse for larger procedures
Lackwit types are better for larger procedures
“Few differences”:  among any of the three techniques.
This is a success and a failure:
comparison numbers are new (and gratifyingly low)
runtime improvement
the system already does a good job of eliminating irrelevant invariants (Lackwit is largely, but not entirely, irrelevant).  All the invariants eliminated by Lackwit were uninteresting (so it improved output a bit, but not a huge amount).
In this talk I will discuss three other types of invariant.
[Make it clear that this is past work.  Not “I will need to” but “I had to” or “I did”.]
Object/class invariants are well-formedness conditions for data structures.  They require examining multiple program points, not just one.  They are true at entry to and exit from each public method.
Pointers are a challenge only in the context of recursive data structures, where the instrumentation may have to traverse arbitrarily many links.  Otherwise we can just view a pointer as a record with two fields:  the address and the content.
Disjunctions are a variety of conditional, because implications and disjunctions are interconvertible.
Goal is to incorporate this within the current infrastructure (with minimal change), subject to constraints on resources (primarily my time).
First five data structures from the text.
Found places where the “expected” invariant doesn’t hold.  For instance, one data structure constructor initialized an index to an out-of-bounds value that it never took on again, because the first operation was sure to increment it before using it.
Output includes both functional invariants and usage properties.
I didn’t tune the engine to this set of invariants -- just the invariants that were already in the engine. I did have to enhance the redundant invariant elimination rules (the logical implication tests) to avoid introducing many irrelevant invariants.
This is an actual invariant that Daikon detected.
Object invariants are checked at multiple program points.
This is an actual invariant that Daikon detected.
There are other invariants at that program point, but I don’t show them here.
“This is a complex but practical invariant.”
Explain it:  walk people through it.
Now I will demonstrate how Daikon detects invariants like the ones I just showed you.
[Don’t say “serialize”, which means something in Java; say “linearize”.]
[Don’t say “optimized”, say “designed”.]
No change to engine required
Need not be frequent:
 stop traversing the data structure when not of interest
 incrementally maintain
 backoff (less frequent checking once it has been satisfied many times)
 integrate with garbage collection
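The backoff item in this list could be sketched as follows.  (Entirely my illustration; the names and the doubling-style policy are hypothetical, chosen only to show checking becoming less frequent as the invariant keeps holding.)

```python
class BackoffChecker:
    """Run a costly data-structure check less often as it keeps succeeding."""

    def __init__(self, check_fn):
        self.check_fn = check_fn   # the (possibly expensive) traversal/check
        self.successes = 0
        self.skip = 0              # visits to skip before the next real check

    def maybe_check(self, data):
        if self.skip > 0:
            self.skip -= 1
            return None            # skipped this time
        ok = self.check_fn(data)
        if ok:
            self.successes += 1
            self.skip = self.successes   # back off: check less frequently
        else:
            self.successes = 0           # falsified: resume frequent checking
        return ok
```

Each successful check widens the gap before the next one, so a long-satisfied invariant costs almost nothing per visit.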
Such invariants require traversal; the same is true for arrays, but there the size is known or at least is generally not enormous.
In the typical case, there will be no significant differences between the invariants detected in the two parts of the data.
[Don’t be defensive about how simple my static analysis is.]
Say that this is a policy decision.
I chose to use a trivial static analysis:  boolean conditions and functions.
No sense doing something complicated until you have demonstrated that simple approaches are insufficient; and on simple examples, this was sufficient.  I will revisit this decision if necessary.