===========================================================================

Delta debugging
 * input and output
 * guarantee
 * run time
 * assumptions

Test outcomes

===========================================================================

Recall that a test case consists of
 * input
 * oracle

Example: input = a file; oracle tests whether the output is the same as a
goal file.
Example: input = a sequence of calls; oracle = an assert statement.

===========================================================================

The goal of delta debugging is to reduce the amount of the program that you
have to look at.  Other techniques with the same goal:
 * coverage
 * slicing
    * static vs. dynamic
    * forward vs. backward
 * version control history (filtering the part of the program to examine,
   which is what we really care about even when minimizing the input)
 * stack trace
 * fault localization
    * compute how often each line is:
       * covered by passing tests -- more means the line is less suspicious
       * covered by failing tests -- more means the line is more suspicious
      and combine the two counts to rank every line by suspiciousness

===========================================================================

Delta debugging

----------------

Formulation 1:
Input:
 * program
 * failing test case
Output:
 * failing test case that is as small as possible

Formulation 2:
Input:
 * passing test case
    * The algorithm works with any test case whatsoever.  Why is it
      important to use a passing test case, in terms of the problem that
      the programmer is trying to solve?
 * failing test case
 * program
Output:
 * test case that fails in the same way as the input failing test case did,
   and that is similar to the input passing test case
 * alternate output: a pair of test cases that are close to one another,
   such that the "passing" one passes whereas the "failing" one fails.
   This is the notion of minimizing the diff, as opposed to minimizing the
   input.

----------------

Starting point: an input c on which your test fails.
Goal: find a minimal test case that yields the *same* failure.

There are two quite different "tests" here.

The first is the test for correctness of your program.
Possible outcomes:
  success
  failure
(We won't discuss this one any more.)  From the starting point, we assume
that
  originalsuite(empty_input) = success
  originalsuite(c)           = failure

The second is the test for yielding the same result as a given failure.
Call this "test".
Three possible outcomes:
  success (check):   passes the original test
  failure (x):       fails the original test, in the same way
  indeterminate (?): fails the original test, in a different way (such as
                     "invalid input")

I presented this by showing a specific example and asking for debugging
ideas from the floor, then asking for suggestions for the algorithm.
This was somewhat of a success.

A trivial algorithm:
  Try every subset of c, and output the smallest one satisfying the
  criterion.
  Problem: far too many subsets to try.

Delta debugging:
  Input: failing test case c of size s
  Split the input c into p parts c1, ..., cp, each of size s/p.
  If some part fails (test(ci) = failure), use it:
    return dd(ci)               // new size = s/p; restart with 2 parts
  Else if some complement fails (test(\overbar{ci}) = failure), use it:
    return dd(\overbar{ci})     // new size = s - s/p; number of parts = p-1
  Else if the granularity can be increased (p < s):
    return dd(c)                // same size s; number of parts = 2p
  Else:
    done; return c
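
To make the recursion above concrete, here is a minimal Python sketch of
this procedure (essentially Zeller's ddmin).  It assumes the input is a
list of elements (characters, lines, ...) and that "test" is a function
returning one of the strings "PASS", "FAIL", or "UNRESOLVED", matching the
three outcomes above.  The function names and the toy oracle at the end
are made up for illustration; this is a sketch under those assumptions,
not a reference implementation.

  def split(c, p):
      """Split the list c into p contiguous chunks of roughly equal size."""
      size, rem = divmod(len(c), p)
      chunks, start = [], 0
      for i in range(p):
          end = start + size + (1 if i < rem else 0)
          chunks.append(c[start:end])
          start = end
      return chunks

  def ddmin(c, test):
      """Shrink the failing input c (a list) while test(c) == "FAIL".
      The result is 1-minimal: removing any single remaining element
      makes the failure go away (or turn into a different failure)."""
      assert test(c) == "FAIL"
      p = 2                                   # current number of parts
      while len(c) >= 2:
          chunks = split(c, p)
          # 1. Some part alone still fails: recurse on it with 2 parts.
          failing_chunk = next((ch for ch in chunks if test(ch) == "FAIL"), None)
          if failing_chunk is not None:
              c, p = failing_chunk, 2
              continue
          # 2. Some complement still fails: keep it, with p-1 parts.
          for i in range(p):
              complement = [x for j, ch in enumerate(chunks) if j != i for x in ch]
              if test(complement) == "FAIL":
                  c, p = complement, max(p - 1, 2)
                  break
          else:
              # 3. Neither worked: increase granularity, or stop.
              if p < len(c):
                  p = min(2 * p, len(c))
              else:
                  break
      return c

  # Toy usage (the oracle below is an assumption, purely for illustration):
  # the "program" fails whenever the input contains all of 'b', 'u', 'g'.
  def toy_test(chars):
      return "FAIL" if set("bug") <= set(chars) else "PASS"

  failing_input = list("a big ugly chunk of text")
  print("".join(ddmin(failing_input, toy_test)))   # e.g. "bgu": a 1-minimal input

The toy oracle never returns "UNRESOLVED"; a realistic one usually does,
which is exactly why the algorithm has the "increase granularity" case.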
What guarantee do you get?
  The result is 1-minimal.  This is a sort of local minimum, but definitely
  not a global minimum.
  c is n-minimal if forall c' subset c : (|c| - |c'| \le n  =>  test(c') != failure)

What is the worst-case running time?
  3|c| + |c|^2 tests
   * all test outcomes are indeterminate (inconsistent) until the number of
     parts reaches |c|
   * after that, the last complement tried is the one that fails, at every step

When is this applicable?

When does it fail to remove elements that could be removed?
  When the partition function splits apart two elements that are related to
  one another.

What are the assumptions?  How reasonable/realistic are they?  When is it
incomplete?
 * input can be divided into parts (eg, via parsing)
 * input parts are not independent
 * input parts are somewhat independent (no global length/checksum, unless
   your parser recreates it)
 * deterministic and reproducible test failures (not "flaky")
 * can distinguish failures from one another

You can use delta debugging to:
 * make two things more similar
 * make one thing smaller (that is, more similar to an empty thing)

Other applications:  you can use delta debugging on:
 * inputs:  minimal difference between a succeeding and a failing input
 * programs:  apply it to the program, not to the input
 * program states

----

The key idea of Delta Debugging is to find a small difference.  The thing
the result differs from could be:
 * zero -- the result is small
 * the original -- the result is only slightly different
 * something in between

The paper applies Delta Debugging to inputs.  Delta Debugging has also been
applied to:
 * data on the heap
 * program source code
    * problems:
       * comments are elided or are out of date; you have to
         reverse-engineer 1 or even 2 new programs
       * it may be non-trivial to transplant a fix for the minimized
         program into the original program (it might need to maintain
         extra state, for example)
       * it is more effective on test cases than on full program text

For example, the latter could be...

In practice, Delta Debugging is rarely used on program source code.
Give three reasons.  Give reasons that are as different from one another as
possible.  Give the best reasons you can think of.

[A not-very-good justification is that you'll get something you don't
understand; it's a subset of something you do understand.]
[You need a specialized parser -- working by characters or lines doesn't
work.  Even a specialized parser doesn't work so well, because you cannot
remove the declaration of a variable without removing its uses.]
[A reason is that you can use version control history to minimize, in most
cases.  It's not addressing the real problem.]

===========================================================================
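
A concrete note on the assumptions above ("deterministic and reproducible
test failures" and "can distinguish failures from one another"): in
practice the "test" function has to decide whether a run reproduces the
*same* failure.  Below is one hedged sketch of such a function, of the
shape expected by the "test" argument in the earlier ddmin sketch.  The
command, the timeout, and the choice of (exit code, last stderr line) as
the failure signature are all assumptions for illustration, not part of
delta debugging itself.

  import subprocess

  CMD = ["./myprogram"]                        # hypothetical program under test
  # Signature of the original failing run (assumed): (exit code, last stderr line).
  ORIGINAL_SIGNATURE = (1, "AssertionError: render() overflow")

  def same_failure_test(chars):
      """Classify one run: "PASS", "FAIL" (same failure as the original run),
      or "UNRESOLVED" (a different failure, e.g. "invalid input" or a hang)."""
      try:
          result = subprocess.run(CMD, input="".join(chars), capture_output=True,
                                  text=True, timeout=10)
      except subprocess.TimeoutExpired:
          return "UNRESOLVED"                  # a hang counts as a different failure
      if result.returncode == 0:
          return "PASS"
      stderr_lines = result.stderr.strip().splitlines()
      signature = (result.returncode, stderr_lines[-1] if stderr_lines else "")
      # Same exit code and same final stderr line => "fails in the same way";
      # any other nonzero outcome is indeterminate.
      return "FAIL" if signature == ORIGINAL_SIGNATURE else "UNRESOLVED"

The stricter the signature comparison, the more likely distinct bugs are
kept apart; the looser it is, the more likely the minimized input ends up
triggering a different bug than the one being chased.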