===========================================================================

Delta debugging
 * input and output
 * guarantee
 * run time
 * assumptions

Test outcomes

===========================================================================

Recall that a test case consists of
 * input
 * oracle

Example: input = a file; oracle tests whether the output is the same as a
goal file.
Example: input = a sequence of calls; oracle = an assert statement.

===========================================================================

The goal of delta debugging is to reduce the amount of the program that you
have to look at.  Other techniques with the same goal:
 * coverage
 * slicing
    * static vs. dynamic
    * forward vs. backward
 * version control history (filtering the part of the program to examine,
   which is what we really care about even when minimizing the input)
 * stack trace
 * fault localization
    * compute how often each line is:
       * covered by passing tests -- more means the line is less suspicious
       * covered by failing tests -- more means the line is more suspicious
      and combine the two counts to rank every line by suspiciousness

===========================================================================

Delta debugging

----------------

Formulation 1:
Input:
 * program
 * failing test case
Output:
 * failing test case that is as small as possible

Formulation 2:
Input:
 * passing test case
    * The algorithm works with any test case whatsoever.  Why is it
      important to use a passing test case, in terms of the problem that
      the programmer is trying to solve?
 * failing test case
 * program
Output:
 * test case that fails in the same way as the input failing test case did,
   and that is similar to the input passing test case
 * alternate output: a pair of test cases that are close to one another,
   such that the "passing" one passes whereas the "failing" one fails.
   This is the notion of minimizing the diff, as opposed to minimizing the
   input.

----------------

Starting point: an input c on which your test fails.
Goal: find a minimal test case that yields the *same* failure.

There are two quite different "tests" here.

The first is the test for correctness of your program.
Possible outcomes:
  success
  failure
(We won't discuss this one any more.)  From the starting point, we assume
that
  originalsuite(empty_input) = success
  originalsuite(c)           = failure

The second is the test for yielding the same result as a given failure.
Call this "test".
Three possible outcomes:
  success (check):   passes the original test
  failure (x):       fails the original test, in the same way
  indeterminate (?): fails the original test, in a different way (such as
                     "invalid input")

I presented this by showing a specific example and asking for debugging
ideas from the floor, then asking for suggestions for the algorithm.
This was somewhat of a success.

A trivial algorithm:
  Try every subset of c, and output the smallest one satisfying the
  criterion.
  Problem: far too many subsets to try.

Delta debugging:
  Input: failing test case c of size s
  Split the input c into p parts c1, ..., cp, each of size s/p.
  If some part fails (test(ci) = failure), use it:
    return dd(ci)               // new size = s/p; restart with 2 parts
  Else if some complement fails (test(\overbar{ci}) = failure), use it:
    return dd(\overbar{ci})     // new size = s - s/p; number of parts = p-1
  Else if the granularity can be increased (p < s):
    return dd(c)                // same size s; number of parts = 2p
  Else:
    done; return c
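
To make the recursion above concrete, here is a minimal Python sketch of
this procedure (essentially Zeller's ddmin).  It assumes the input is a
list of elements (characters, lines, ...) and that "test" is a function
returning one of the strings "PASS", "FAIL", or "UNRESOLVED", matching the
three outcomes above.  The function names and the toy oracle at the end
are made up for illustration; this is a sketch under those assumptions,
not a reference implementation.

  def split(c, p):
      """Split the list c into p contiguous chunks of roughly equal size."""
      size, rem = divmod(len(c), p)
      chunks, start = [], 0
      for i in range(p):
          end = start + size + (1 if i < rem else 0)
          chunks.append(c[start:end])
          start = end
      return chunks

  def ddmin(c, test):
      """Shrink the failing input c (a list) while test(c) == "FAIL".
      The result is 1-minimal: removing any single remaining element
      makes the failure go away (or turn into a different failure)."""
      assert test(c) == "FAIL"
      p = 2                                   # current number of parts
      while len(c) >= 2:
          chunks = split(c, p)
          # 1. Some part alone still fails: recurse on it with 2 parts.
          failing_chunk = next((ch for ch in chunks if test(ch) == "FAIL"), None)
          if failing_chunk is not None:
              c, p = failing_chunk, 2
              continue
          # 2. Some complement still fails: keep it, with p-1 parts.
          for i in range(p):
              complement = [x for j, ch in enumerate(chunks) if j != i for x in ch]
              if test(complement) == "FAIL":
                  c, p = complement, max(p - 1, 2)
                  break
          else:
              # 3. Neither worked: increase granularity, or stop.
              if p < len(c):
                  p = min(2 * p, len(c))
              else:
                  break
      return c

  # Toy usage (the oracle below is an assumption, purely for illustration):
  # the "program" fails whenever the input contains all of 'b', 'u', 'g'.
  def toy_test(chars):
      return "FAIL" if set("bug") <= set(chars) else "PASS"

  failing_input = list("a big ugly chunk of text")
  print("".join(ddmin(failing_input, toy_test)))   # e.g. "bgu": a 1-minimal input

The toy oracle never returns "UNRESOLVED"; a realistic one usually does,
which is exactly why the algorithm has the "increase granularity" case.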
What guarantee do you get?
  The result is 1-minimal.  This is a sort of local minimum, but definitely
  not a global minimum.
  c is n-minimal if forall c' subset c : (|c| - |c'| \le n  =>  test(c') != failure)

What is the worst-case running time?
  3|c| + |c|^2 tests
   * all test outcomes are indeterminate (inconsistent) until the number of
     parts reaches |c|
   * after that, the last complement tried is the one that fails, at every step

When is this applicable?

When does it fail to remove elements that could be removed?
  When the partition function splits apart two elements that are related to
  one another.

What are the assumptions?  How reasonable/realistic are they?  When is it
incomplete?
 * input can be divided into parts (eg, via parsing)
 * input parts are not independent
 * input parts are somewhat independent (no global length/checksum, unless
   your parser recreates it)
 * deterministic and reproducible test failures (not "flaky")
 * can distinguish failures from one another

You can use delta debugging to:
 * make two things more similar
 * make one thing smaller (that is, more similar to an empty thing)

Other applications:  you can use delta debugging on:
 * inputs:  minimal difference between a succeeding and a failing input
 * programs:  apply it to the program, not to the input
 * program states

----

The key idea of Delta Debugging is to find a small difference.  The thing
the result differs from could be:
 * zero -- the result is small
 * the original -- the result is only slightly different
 * something in between

The paper applies Delta Debugging to inputs.  Delta Debugging has also been
applied to:
 * data on the heap
 * program source code
    * problems:
       * comments are elided or are out of date; you have to
         reverse-engineer 1 or even 2 new programs
       * it may be non-trivial to transplant a fix for the minimized
         program into the original program (it might need to maintain
         extra state, for example)
       * it is more effective on test cases than on full program text

For example, the latter could be...

In practice, Delta Debugging is rarely used on program source code.
Give three reasons.  Give reasons that are as different from one another as
possible.  Give the best reasons you can think of.

[A not-very-good justification is that you'll get something you don't
understand; it's a subset of something you do understand.]
[You need a specialized parser -- working by characters or lines doesn't
work.  Even a specialized parser doesn't work so well, because you cannot
remove the declaration of a variable without removing its uses.]
[A reason is that you can use version control history to minimize, in most
cases.  It's not addressing the real problem.]

===========================================================================
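
A concrete note on the assumptions above ("deterministic and reproducible
test failures" and "can distinguish failures from one another"): in
practice the "test" function has to decide whether a run reproduces the
*same* failure.  Below is one hedged sketch of such a function, of the
shape expected by the "test" argument in the earlier ddmin sketch.  The
command, the timeout, and the choice of (exit code, last stderr line) as
the failure signature are all assumptions for illustration, not part of
delta debugging itself.

  import subprocess

  CMD = ["./myprogram"]                        # hypothetical program under test
  # Signature of the original failing run (assumed): (exit code, last stderr line).
  ORIGINAL_SIGNATURE = (1, "AssertionError: render() overflow")

  def same_failure_test(chars):
      """Classify one run: "PASS", "FAIL" (same failure as the original run),
      or "UNRESOLVED" (a different failure, e.g. "invalid input" or a hang)."""
      try:
          result = subprocess.run(CMD, input="".join(chars), capture_output=True,
                                  text=True, timeout=10)
      except subprocess.TimeoutExpired:
          return "UNRESOLVED"                  # a hang counts as a different failure
      if result.returncode == 0:
          return "PASS"
      stderr_lines = result.stderr.strip().splitlines()
      signature = (result.returncode, stderr_lines[-1] if stderr_lines else "")
      # Same exit code and same final stderr line => "fails in the same way";
      # any other nonzero outcome is indeterminate.
      return "FAIL" if signature == ORIGINAL_SIGNATURE else "UNRESOLVED"

The stricter the signature comparison, the more likely distinct bugs are
kept apart; the looser it is, the more likely the minimized input ends up
triggering a different bug than the one being chased.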