In-class exercise: Improving test suites via mutation

High-level goal

The high-level goals of this exercise are to (1) learn about mutation testing and (2) reason about test-goal utility.

Background

Mutation testing is a way of evaluating a test suite. It tells you how good your test suite is, and it helps you improve your test suite. First, some terminology:

A “mutant” is a slight variation of a program, for instance changing + to - or changing 1 to -1. For examples, run in this repository:

show_mutant.sh 54
show_mutant.sh 101

When a program is mutated, the mutant might be:

“equivalent”: This means that the mutant is equivalent to the original program. The mutant always behaves exactly like the original program did, even though its source code differs. For example, a change from b = a; x = a + b; to b = a; x = a + a;.
“non-equivalent”: There exists some input such that the mutant’s behavior differs from the original program.
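
To make the distinction concrete, here is a minimal Java sketch. The class and method names are hypothetical and do not come from the exercise repository. It places the example above next to a non-equivalent variant of the same code.

```java
// Hypothetical illustration; none of these methods come from the exercise repository.
class MutantKinds {
    // Original program: b is a copy of a, so the result is a + a.
    static int original(int a) {
        int b = a;
        return a + b;
    }

    // Equivalent mutant: "a + b" replaced by "a + a".
    // Since b == a at this point, the result is the same for every input.
    static int equivalentMutant(int a) {
        int b = a;
        return a + a;
    }

    // Non-equivalent mutant: "+" replaced by "-".
    // For a = 1 the original returns 2 but this mutant returns 0.
    static int nonEquivalentMutant(int a) {
        int b = a;
        return a - b;
    }

    public static void main(String[] args) {
        for (int a : new int[] {-3, 0, 7}) {
            System.out.println("a=" + a
                + " original=" + original(a)
                + " equivalent=" + equivalentMutant(a)
                + " nonEquivalent=" + nonEquivalentMutant(a));
        }
    }
}
```

Note that the non-equivalent mutant still agrees with the original on some inputs (a = 0, for instance), so a test suite that only uses such inputs would not detect it.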

Because a mutant’s interface (the structure of its inputs and outputs) is the same as the original program’s, every test case for the original program is also a test case for the mutant. To evaluate a test suite, run it against many mutants. A mutant is “detected” or “discovered” or “killed” if the mutant fails the test suite. Otherwise, the mutant is “live”. A mutant that is equivalent will always be live. A mutant is “covered” if the mutated code is executed by the test suite. Every uncovered mutant is live.
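
The following Java sketch illustrates "killed" versus "live" under simplifying assumptions: the program under test is a hypothetical max-of-two function, and the test suite is a small table of inputs and expected outputs. None of these names come from the exercise repository or from Major.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical illustration of running one test suite against several mutants.
class KillCheck {
    // Assumed program under test: returns the larger of two ints.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;
    // Mutant 1: ">" replaced by "<" (now returns the smaller value).
    static final IntBinaryOperator MUTANT_LT = (a, b) -> a < b ? a : b;
    // Mutant 2: ">" replaced by ">=". This one is equivalent:
    // the only affected case is a == b, where both branches return the same value.
    static final IntBinaryOperator MUTANT_GE = (a, b) -> a >= b ? a : b;

    // Each row is {a, b, expected}.
    static final int[][] TESTS = { {1, 2, 2}, {5, 3, 5} };

    // A program passes the suite if every test yields the expected value.
    static boolean passes(IntBinaryOperator program) {
        for (int[] t : TESTS) {
            if (program.applyAsInt(t[0], t[1]) != t[2]) {
                return false;
            }
        }
        return true;
    }

    // A mutant is killed when it fails the suite; otherwise it is live.
    static boolean isKilled(IntBinaryOperator mutant) {
        return !passes(mutant);
    }

    public static void main(String[] args) {
        System.out.println("original passes: " + passes(ORIGINAL));
        System.out.println("MUTANT_LT killed: " + isKilled(MUTANT_LT));
        System.out.println("MUTANT_GE killed: " + isKilled(MUTANT_GE));
    }
}
```

In this sketch MUTANT_LT is killed by the first row, while MUTANT_GE stays live no matter which rows are added, because it is equivalent.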

If a mutant is live but not equivalent, that indicates a problem with the test suite: there is a small change to the program that introduces a defect that the test suite does not detect. The mutation score is the proportion of mutants that were killed. For example, if the test suite failed for 65 out of 100 mutants, then the mutation score of the test suite is 0.65. Higher numbers are better. A perfect score is 1.0, in which case we say the test suite is “mutation-adequate”. (All of these terms are with respect to a particular set of mutants, but that set is usually left implicit.)
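
As a small worked example of the arithmetic, the following Java sketch computes the mutation score as defined above. The class and method names are hypothetical, and the sketch does not reflect how Major reports its results.

```java
// Hypothetical helper that mirrors the definition of the mutation score.
class MutationScore {
    // Proportion of mutants that were killed, in the range [0.0, 1.0].
    static double score(int killed, int total) {
        if (total <= 0 || killed < 0 || killed > total) {
            throw new IllegalArgumentException("need 0 <= killed <= total and total > 0");
        }
        return (double) killed / total;
    }

    // A suite is mutation-adequate (for this set of mutants) when every mutant is killed.
    static boolean isMutationAdequate(int killed, int total) {
        return score(killed, total) == 1.0;
    }

    public static void main(String[] args) {
        // The example from the text: 65 of 100 mutants killed.
        System.out.println(score(65, 100));
        System.out.println(isMutationAdequate(65, 100));
        System.out.println(isMutationAdequate(100, 100));
    }
}
```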

If you augment a test suite and its mutation score goes up, then the augmented test suite is better than the original, because it detects more defects. (A mutation score of 1.0 does not guarantee that the test suite will find all defects, though.)

Setup

Team up in groups of size 2.
Assign yourself to the correct (in-class-3-testing) group on Canvas. (You may work and submit alone, but you must still self-assign to a group on Canvas!)
Use a Unix environment or Git bash on Windows for this exercise. Make sure a Java 8+ JDK and Git are installed. The required software is already installed on attu.cs.washington.edu, if you prefer to do the exercise there.
Clone the following git repository and read its README.md file: https://bitbucket.org/rjust/mutation
Test your setup: compile and test the Triangle program.
Run mutation.sh. The last line printed should start with: Live mutants: 2 3 4 7 8 9 ...

You may use an experimental IntelliJ plugin for this exercise. See the detailed instructions for more information.

Instructions

Read the entire assignment and ask any clarifying questions that you might have.
Run mutation.sh and note the number of covered and detected mutants (see Question 1 below). Note that Major (the mutation tool) refers to “detected mutants” as “killed mutants”.
Add tests to testTable in file test/triangle/TriangleTest.java to satisfy mutation adequacy – that is, until your test suite detects all non-equivalent mutants:
1. Select a live mutant (for which you have not proven equivalence) for analysis.
2. Examine its source code (for example, by running show_mutant.sh ID, where ID is the mutant's number) to determine whether it is equivalent.
  - If it is equivalent, provide an argument to establish that fact.
  - If it is not equivalent, write a test that the original program passes but the mutant fails. Run ./gradlew clean test after adding the new test to ensure that it passes on the original program.
3. Run mutation.sh and continue with step 1.
You may find the show_mutant.sh script useful for reasoning about a mutant. For example, you can run: show_mutant.sh 45

Note that you will likely observe certain patterns (i.e., similar mutants requiring similar tests) because of the systematic mutation of the source code – adding multiple tests at once may speed up your testing process. Likewise, some mutants are easier to resolve than others – triaging the set of live mutants and selecting mutants out of order may speed up your testing process.
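
The loop of picking a live mutant and adding a test row that kills it can be sketched in plain Java as follows. Everything here is an assumption made for illustration: the classifier, the mutant, and the table layout are hypothetical and may differ from the Triangle program and the testTable in the repository.

```java
// Hypothetical sketch: the classifier, the mutant, and the table layout are
// illustrative assumptions, not the code in the exercise repository.
class BoundaryRowDemo {
    enum Type { INVALID, SCALENE, ISOSCELES, EQUILATERAL }

    interface Classifier { Type classify(int a, int b, int c); }

    // Assumed original program (ignores int overflow for brevity).
    static Type original(int a, int b, int c) {
        if (a <= 0 || b <= 0 || c <= 0) return Type.INVALID;
        if (a + b <= c || a + c <= b || b + c <= a) return Type.INVALID;
        if (a == b && b == c) return Type.EQUILATERAL;
        if (a == b || b == c || a == c) return Type.ISOSCELES;
        return Type.SCALENE;
    }

    // Assumed mutant: the first "<=" in the triangle inequality became "<".
    static Type mutant(int a, int b, int c) {
        if (a <= 0 || b <= 0 || c <= 0) return Type.INVALID;
        if (a + b < c || a + c <= b || b + c <= a) return Type.INVALID;
        if (a == b && b == c) return Type.EQUILATERAL;
        if (a == b || b == c || a == c) return Type.ISOSCELES;
        return Type.SCALENE;
    }

    // One table row: three side lengths and the expected classification.
    static final class Row {
        final int a, b, c;
        final Type expected;
        Row(int a, int b, int c, Type expected) {
            this.a = a; this.b = b; this.c = c; this.expected = expected;
        }
    }

    // A program passes the table if every row yields the expected type.
    static boolean passes(Classifier program, Row[] table) {
        for (Row r : table) {
            if (program.classify(r.a, r.b, r.c) != r.expected) return false;
        }
        return true;
    }

    // Initial table: no row sits exactly on the boundary a + b == c.
    static final Row[] BEFORE = {
        new Row(1, 2, 4, Type.INVALID),
        new Row(3, 4, 5, Type.SCALENE),
    };

    // Augmented table: the new boundary row (1, 2, 3) separates "<=" from "<".
    static final Row[] AFTER = {
        new Row(1, 2, 4, Type.INVALID),
        new Row(3, 4, 5, Type.SCALENE),
        new Row(1, 2, 3, Type.INVALID),
    };

    public static void main(String[] args) {
        System.out.println("mutant live before: " + passes(BoundaryRowDemo::mutant, BEFORE));
        System.out.println("original passes after: " + passes(BoundaryRowDemo::original, AFTER));
        System.out.println("mutant killed after: " + !passes(BoundaryRowDemo::mutant, AFTER));
    }
}
```

In this sketch, mutants of the other two comparisons in the same condition would call for analogous boundary rows, which is one way the patterns mentioned above can show up.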

If you get stuck on a mutant, take notes, move on, and revisit unresolved mutants later.
Disable (comment out) the assertEquals statement on line 45 in the testTriangle method and run ./gradlew jacocoTestReport and mutation.sh. Note the code coverage ratio(s) and mutation score (see Question 4 below).
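
To see why disabling an assertion can matter, consider this minimal Java sketch. It is a hypothetical stand-in, not the repository's testTriangle method, and the call counter is only a crude proxy for what a coverage tool such as JaCoCo measures. A test that executes the code without checking the result still covers the code, but it cannot fail, so it cannot kill any mutant.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration: the same test with and without its assertion.
class CoverageVsMutation {
    // Counts executions of the program under test (a crude stand-in for coverage).
    static int executed = 0;

    // Assumed original program and a mutant with "+" replaced by "-".
    static final IntUnaryOperator ORIGINAL = x -> { executed++; return x + 1; };
    static final IntUnaryOperator MUTANT = x -> { executed++; return x - 1; };

    // Test with an assertion: runs the program and checks the result.
    static boolean testWithAssertion(IntUnaryOperator program) {
        return program.applyAsInt(1) == 2;
    }

    // Same test with the assertion disabled: runs the program, checks nothing,
    // and therefore always passes.
    static boolean testWithoutAssertion(IntUnaryOperator program) {
        program.applyAsInt(1);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("with assertion, mutant passes: " + testWithAssertion(MUTANT));
        System.out.println("without assertion, mutant passes: " + testWithoutAssertion(MUTANT));
        System.out.println("program executions: " + executed);
    }
}
```

Both versions of the test execute the mutated code, but only the version with the assertion fails on the mutant.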

Questions

How many mutants does the initial test suite (1) cover and (2) detect (result from your first run of mutation.sh in the instructions)?
How many mutants are equivalent (to the original program)? Justify your judgments by providing a proof or argument for each equivalent mutant. You do not need to provide a formal proof, but you should demonstrate proper reasoning and provide a valid argument. You may group equivalent mutants in your answer if the reason for equivalence is the same for all mutants in a group (e.g., mutants 4711, 4712, 4713 are equivalent because they exist in dead code).
Were any of the generated mutants unproductive? Briefly explain your answer.
What changes in code coverage ratio and mutation score did you observe after disabling the assertEquals statement in the testTriangle method? What are the implications for using the code coverage ratio as an adequacy criterion?

Deliverables

A plain-text file with your answers to the four questions above. Please list all group members.
Your mutation-adequate TriangleTest.java test suite.
Your <timestamp>.csv file, if you used the IntelliJ plugin.

Steps for turn-in

One team member should upload the deliverables to Canvas, via the Canvas submission site.

Hints

It is possible to write test cases that detect (“kill”) every non-equivalent mutant. If you are not able to create tests for a few of the mutants, don’t sweat it. Turn in what you have.