PATHS: path profiling and achieving path coverage

The DART and path profiling papers are assigned in the same lecture
because they are both about paths:
 * profiling paths
 * covering paths
 * uses for paths
   * optimization
   * test coverage metric
   * fault localization (example: Y2K bug)
   * prioritize effort

A general theme is comparing the profiles produced in two different
scenarios.  For example:
 * different programs, same input (fault localization)
 * different input, same program (test coverage, fault localization)
 * different environment, same program and input (Y2K)

Teaser:
Test quality metrics include "structural metrics" that have to do with
the structure of the program:
 * line coverage
 * branch coverage
 * decision coverage
 * path coverage
Why is path coverage any better than the others?  Yes, it's more
fine-grained and thus it leads to bigger test suites (and a bigger
denominator when you compute the coverage value), but why do we
believe that it would be better?
It aims to capture correlations among distant pieces of information
that would not be obvious from more local metrics.
Example: After the year 2000, maybe there is someone who is both a
school pupil and eligible for social security (age 10 in one
computation and 90 in another).  Beforehand, that never happened, even
though there were 10-year-olds and 90-year-olds.

===========================================================================

Teaser: what is a path?
A sequence of dynamically executed instructions.
There are infinitely many of them.

Teaser:
We often think of path profiling as a task that is mainly useful for
compiler optimization:
 * Lay out the most frequently-executed paths so that they require
   taking fewer branches (avoid the cost of a branch).
 * Copy them into a big block and do optimizations that might degrade
   other branches/paths, such as keeping values in registers or moving
   instructions from one block to another.
How can path profiling be useful for software engineering tasks?
Here are some examples:
 * How can path profiling be used in determining test suite quality?
 * How can path profiling be used in detecting errors such as the Y2K
   bug, where a date is stored as only 2 digits instead of 4?

----------------

Teaser:
Give a 1-sentence description of what DART does, from a user point of
view.  That is, how would you describe it to a programmer who doesn't
care about its implementation details?
Give the DART algorithm at a high level.  (You can do this as a
sequence of 4 or so steps, each explained by a sentence.)

===========================================================================
===========================================================================
===========================================================================

Path profiling

Teaser:
Give the path profiling algorithm at a high level.  (You can do this
as a sequence of 4 or so steps, each explained by a sentence.)

Note the slightly non-standard definition of a path.  This paper
computes *loop-free* paths.  There are finitely many of them, unlike
paths in general.  How does this limit the usefulness or applicability
of the work?

Key idea: How are paths represented?
As a number instead of as a list of statements.

Algorithm:
 1. Count the number of paths.
 2. Give each one a unique number.  It will be computed in a global
    register that is initialized to 0.
    * Bottom-up algorithm: arrange that all paths from here forward
      compute a unique number.
    * If this is not a branch, then don't change the number; it's
      already correct (if there is only one path from here to the
      exit, it gets implicitly numbered 0).
    * If a branch has m paths downstream on one side and n on the
      other, then the downstream instrumentation already computes the
      values 0..m-1 and 0..n-1.  This branch needs to assign
      0..m+n-1, so do so by adding m whenever the branch takes the
      false branch.
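The numbering step above can be sketched on a tiny acyclic CFG.  This
is a toy sketch, not the paper's implementation: the node names and
the dictionary representation of the CFG are invented for
illustration, and loops are assumed to have been removed already.

```python
# Toy acyclic CFG: each node maps to its list of successors.
# (Node names are invented; a real tool works on compiler IR.)
cfg = {
    "entry": ["A"],
    "A": ["B", "C"],   # the only branch: 2 paths to the exit
    "B": ["D"],
    "C": ["D"],
    "D": ["exit"],
    "exit": [],
}

def number_paths(cfg):
    """Bottom-up path numbering, as in the sketch above.

    Returns (num_paths, edge_inc) where num_paths[v] is the number of
    paths from v to the exit, and edge_inc[(v, w)] is the value added
    to the global path register along edge v -> w, chosen so that
    every complete path computes a unique sum in 0..num_paths[entry]-1.
    """
    num_paths, edge_inc = {}, {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if not cfg[v]:          # the exit: exactly one (empty) path
            num_paths[v] = 1
            return 1
        total = 0
        for w in cfg[v]:
            n = visit(w)
            # Offset this successor's 0..n-1 numbers past the
            # numbers already claimed by earlier successors.
            edge_inc[(v, w)] = total
            total += n
        num_paths[v] = total
        return total

    visit("entry")
    return num_paths, edge_inc

num_paths, edge_inc = number_paths(cfg)

def path_id(path):
    """The number a run computes: sum of increments along its edges."""
    return sum(edge_inc[e] for e in zip(path, path[1:]))
```

With this CFG there are 2 paths; the path through B sums to 0 and the
path through C sums to 1, so each loop-free path gets a distinct
number without storing the statement list.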
===========================================================================
===========================================================================
===========================================================================

Concolic testing = "whitebox fuzzing"
Should be called "concolic execution"; testing is one application.

DART paper by Patrice Godefroid, Nils Klarlund, and Koushik Sen.

----------------

Teaser:
Give a 1-sentence description of what DART does, from a user point of
view.  That is, how would you describe it to a programmer who doesn't
care about its implementation details?
Give the DART algorithm at a high level.  (You can do this as a
sequence of 4 or so steps, each explained by a sentence.)

----------------

A good way to understand it is to discuss each of:
 * Directed
 * Automated
 * Random
 * Testing
The randomly-generated state may not be sensible.

----------------

The goal of DART is to create test inputs (not tests!), with the aim
of achieving complete branch coverage.
DART uses implicit oracles: it reports program crashes, which can be
due to segmentation faults or to assertions in the program that fail.

Here is a naive algorithm:
For every possible path p:
 * determine its path condition (the sequence of if tests, and their
   outcomes, that causes this path to be taken)
 * solve that path condition with a solver, yielding an input that
   executes the path
This algorithm yields a set of inputs that, collectively, execute
every path.

[This could be a teaser.]  What could go wrong?
 * there may be infeasible paths -- the path condition has no solution
 * the solver may not be powerful enough to solve the path condition

DART's key idea is to do a hybrid concrete-symbolic analysis: if the
solver fails, then use data from a concrete execution in its place.
Given a concrete execution, DART tries to create another concrete
execution that is similar, but not identical, to it.
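The concrete-for-symbolic substitution can be illustrated with a toy
example.  Everything here is invented for illustration (the program,
the values, and the assumption that the solver handles only linear
constraints); it is a sketch of the idea, not DART's implementation.

```python
# Suppose a concrete run with x = 3, y = 5 took the branch
#     if (x*x + y == 14)
# in the true direction (9 + 5 == 14).  The symbolic condition is
# nonlinear in x; assume our solver only handles linear constraints,
# so it cannot negate the condition symbolically.
#
# Fallback: substitute the concrete value of the unsolvable subterm
# x*x (observed to be 9), leaving a linear constraint on y alone:
#     9 + y != 14,   i.e.   y != 5

concrete = {"x": 3, "y": 5, "x*x": 9}   # values observed in the run

def simplified_negated_condition(y):
    """The negated branch condition after substituting x*x by its
    concrete value: 9 + y != 14."""
    return concrete["x*x"] + y != 14

# Keep x fixed at its concrete value; any y other than 5 satisfies
# the simplified condition, forcing the other side of the branch.
new_input = {"x": concrete["x"], "y": 6}
```

Note the trade-off this makes: pinning x*x to 9 may rule out
solutions that change x, so the simplified condition can be
infeasible even when the original was not.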
DART algorithm:
 * Run the program once (randomly, or using an existing test or
   historical input).
 * Collect the path condition for the specific path that was executed.
 * For each condition c in the path condition, try to find a new test
   input that takes the branch in the opposite direction:
   * create a new path condition that negates c and discards
     everything after c
   * try to solve the new path condition
   * if some parts of the new path condition are not solvable, replace
     them by values from the concrete execution
     * this simplified condition may be unsolvable or infeasible even
       if the original condition was not

What are the pros and cons of the new test input being similar to an
existing execution?
 + it is more likely to be realistic, if the original input was
   realistic
 + it is more likely to be feasible, since all but the last condition
   is definitely feasible
 + it can be used for fault localization: there are two inputs (which
   may or may not be similar) that execute similar paths
 - it provides less new coverage
   * same amount of path coverage: 1 new path
   * probably less of other types of coverage, compared to an
     arbitrary new path

===========================================================================
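The loop above can be sketched in miniature.  This is a toy sketch,
not the DART tool: DART instruments C programs and calls a real
constraint solver, whereas here the program under test records its
own path condition, and a brute-force search over a small integer
range stands in for the solver.  The program, ranges, and helper
names are all invented for illustration.

```python
def program(x, trace):
    """Toy program under test: crashes on even inputs greater than 10."""
    if x > 10:
        trace.append(("x > 10", True))
        if x % 2 == 0:
            trace.append(("x % 2 == 0", True))
            assert False, "bug: even x > 10"
        else:
            trace.append(("x % 2 == 0", False))
    else:
        trace.append(("x > 10", False))

def run(x):
    """Concrete execution: returns the path condition (a list of
    (condition, outcome) pairs) and whether the run crashed."""
    trace = []
    try:
        program(x, trace)
        return trace, False
    except AssertionError:
        return trace, True

def solve(prefix):
    """Brute-force stand-in for the constraint solver: find an input
    whose path condition starts with the given prefix."""
    for x in range(-50, 51):
        trace, _ = run(x)
        if trace[:len(prefix)] == prefix:
            return x
    return None

def dart_loop(seed):
    """Repeatedly negate one condition of an executed path (discarding
    everything after it) to steer later runs down new paths; report
    the inputs that crash."""
    worklist, seen, crashes = [seed], set(), []
    while worklist:
        x = worklist.pop()
        trace, crashed = run(x)
        if tuple(trace) in seen:      # already explored this path
            continue
        seen.add(tuple(trace))
        if crashed:
            crashes.append(x)
        for i, (cond, outcome) in enumerate(trace):
            # Negate condition i; keep only the prefix before it.
            new_x = solve(trace[:i] + [(cond, not outcome)])
            if new_x is not None:
                worklist.append(new_x)
    return crashes
```

Starting from a seed like 0 (which takes the x > 10 branch in the
false direction), the loop flips that condition to reach the x > 10
region, then flips the parity test and reaches the crash, without
ever sampling inputs blindly.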