Improving test suites via mutation

Mutation testing is a way of evaluating a test suite. It measures how well the test suite detects small defects, and it shows you where the test suite can be improved. First, some terminology:

A “mutant” is a slight variation of a program, for instance one produced by changing + to - or changing 1 to 0. To see examples, run these commands in this repository:

show_mutant.sh 54
show_mutant.sh 101

When a program is mutated, the mutant might be:

“equivalent”: The mutant always behaves exactly like the original program, even though its source code differs. For example, changing b = a; x = a + **b**; to b = a; x = a + **a**; yields an equivalent mutant, because b always equals a at that point.
“non-equivalent”: There exists some input for which the mutant’s behavior differs from that of the original program.
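
To make the distinction concrete, here is a minimal Python sketch. The function names (original, equivalent_mutant, non_equivalent_mutant) are hypothetical and are not part of this repository. Note that trying sample inputs can only demonstrate non-equivalence, by finding an input where the outputs differ. Equivalence has to be argued from the code.

```python
def original(a):
    b = a
    x = a + b
    return x

def equivalent_mutant(a):
    b = a
    x = a + a  # mutation: b replaced by a; b == a here, so behavior is unchanged
    return x

def non_equivalent_mutant(a):
    b = a
    x = a - b  # mutation: + replaced by -; result differs whenever a != 0
    return x

inputs = range(-3, 4)

# No sampled input separates the equivalent mutant from the original.
same_as_equivalent = all(original(a) == equivalent_mutant(a) for a in inputs)

# Inputs on which the non-equivalent mutant behaves differently.
differing_inputs = [a for a in inputs if original(a) != non_equivalent_mutant(a)]
```

A single entry in differing_inputs is enough to prove that a mutant is non-equivalent.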

Because a mutant’s interface (the structure of its inputs and outputs) is the same as that of the original program, every test case for the original program is also a test case for the mutant. To evaluate a test suite, run it against many mutants. A mutant is “detected” or “discovered” or “killed” if the mutant fails the test suite. Otherwise, the mutant is “live”. A mutant that is equivalent will always be live, because no test can distinguish it from the original program. A mutant is “covered” if the test suite executes the mutated code. Every uncovered mutant is live, because the mutated code never runs.

If a mutant is live but not equivalent, that indicates a problem with the test suite: there exists a small change to the program (namely, the mutation) that introduces a defect that the test suite does not detect. The “mutation score” is the proportion of mutants that were killed. For example, if the test suite failed for 65 out of 100 mutants, then the mutation score of the test suite is 0.65. Higher numbers are better. A perfect score is 1.0, in which case we say the test suite is “mutation-adequate”. (All of these terms are with respect to a particular set of mutants, but that set is usually left implicit.)
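
As a worked example of the arithmetic, the following Python sketch computes the score from counts of killed and total mutants. The helper names mutation_score and is_mutation_adequate are hypothetical.

```python
def mutation_score(killed, total):
    """Proportion of mutants killed, out of all mutants considered."""
    if total <= 0:
        raise ValueError("need at least one mutant")
    if not 0 <= killed <= total:
        raise ValueError("killed must be between 0 and total")
    return killed / total

def is_mutation_adequate(killed, total):
    """A test suite is mutation-adequate when every mutant is killed."""
    return mutation_score(killed, total) == 1.0

score = mutation_score(65, 100)  # the example from the text
```

Because an equivalent mutant can never be killed, a mutant set that contains one makes a score of 1.0 unreachable under this definition.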

If you augment a test suite and its mutation score goes up, then the augmented test suite is better than the original, because it detects more defects. (A mutation score of 1.0 does not guarantee that the test suite will find all defects, though.)
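
The following Python sketch illustrates both points on a toy function. Everything in it (is_adult, the three hand-written mutants, the two suites, and subtle_defect) is hypothetical and chosen only for illustration.

```python
def is_adult(age):
    return age >= 18

mutants = [
    lambda age: age > 18,   # >= changed to >
    lambda age: age <= 18,  # >= changed to <=
    lambda age: age >= 0,   # 18 changed to 0
]

# Each test case is an (input, expected output) pair.
original_suite = [(30, True)]
augmented_suite = original_suite + [(5, False), (18, True)]

def passes(impl, suite):
    """Return True if impl gives the expected output on every test case."""
    return all(impl(arg) == expected for arg, expected in suite)

def mutation_score(suite):
    """Proportion of the mutants above that the suite kills."""
    assert passes(is_adult, suite), "suite must pass on the original program"
    killed = sum(not passes(mutant, suite) for mutant in mutants)
    return killed / len(mutants)

score_before = mutation_score(original_suite)
score_after = mutation_score(augmented_suite)

# A defect that is not one of the mutants: it is wrong only for age 40.
subtle_defect = lambda age: age >= 18 and age != 40
defect_goes_unnoticed = passes(subtle_defect, augmented_suite)
```

The original suite kills only one of the three mutants. The augmented suite kills all three, for a score of 1.0 with respect to this mutant set, yet it still passes on subtle_defect because no test exercises age 40.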