CSE 374, Lecture 18: Testing

Writing correct code

A good approach to writing correct code can be represented by these ordered steps:

Choose the right language. If possible, choose languages that prevent certain types of bugs. For example, if you don't need the lower-level performance or control, pick a language like Java rather than C to avoid memory-related bugs.
Think before you code. Avoid writing bugs by understanding how the program will work before going and implementing it. Draw out any data structures and how you will modify them over the course of the program. Write pseudocode and consider all of the different cases you might encounter.
Make defects visible. Use "assert" statements and exceptions (if they exist in your language) to crash your program while developing if something is not valid.
Test the code. Ensure proper behavior by writing another program to exercise the code completely.
Debug. As a last resort, if your program is not correct, go through the debugging process to find the issues. Debugging techniques include adding print statements, running tools like gdb or valgrind, and adding more test cases.
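
As a small illustration of the "make defects visible" step, here is a minimal sketch in C. The function max_value is hypothetical (it is not part of any assignment); the point is only that the asserts turn a bad call into an immediate, obvious crash during development.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns the largest value in arr[0..length-1].
// The asserts state the preconditions, so a bad call stops the
// program right away instead of quietly reading invalid memory.
int max_value(const int* arr, int length) {
  assert(arr != NULL);  // caller must pass a real array
  assert(length > 0);   // an empty array has no maximum

  int best = arr[0];
  for (int i = 1; i < length; i++) {
    if (arr[i] > best) {
      best = arr[i];
    }
  }
  return best;
}
```

Note that assert is compiled out when NDEBUG is defined, so the condition inside an assert should never have side effects that the program relies on.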

Today we'll discuss the fourth step: how to write tests to verify that your program is correct.

Some quotes

"Test your software or your users will."

Hunt & Thomas
The Pragmatic Programmer
------------------------------------------------
"There are two ways of constructing a software design:

One way is to make it so simple that there are obviously no deficiencies, and
the other way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult."

Sir C. A. R. Hoare
1980 Turing Award winner
Invented "quicksort"
------------------------------------------------
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Brian Kernighan
Wrote THE BOOK on C (our book!)
------------------------------------------------
"Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence."

Edsger Dijkstra
1972 Turing Award lecture

What is testing?

Testing is a systematic way to reveal errors in code, generally by writing another program to test the code. Testing is a very difficult problem in its own right, and its power is by nature limited:

You can only possibly test a relatively small number of inputs.
You can only test a small number of calling contexts, environments, compilers, etc.
Test code is more code that you have to get right, and it presents more opportunities for bugs.
If you're testing your own code, you may have biases that prevent you from thinking of the right tests.

There is a perception in computer science that testing is a novice's job, that it is less illustrious and an afterthought. In my opinion, this is WRONG! Testing is a vital part of the software engineering process, and everyone should be capable of and responsible for writing tests.

Types of tests

Unit testing looks for errors in subsystems in isolation (where a "subsystem" means a function, a file, or a class). Unit testing has a number of benefits:
- Small number of things to test.
- Easier to find faults when errors occur
- Can test many components in parallel
Integration testing combines different subsystems or units of the code and tests the interactions between them. Integration tests can catch errors that unit tests will not surface, since while each subsystem may be correct separately, they may not work together properly.
Continuous integration testing is the process of running integration tests and unit tests on every commit to a shared repository in an automated fashion to ensure that nothing is broken.
Performance testing is about measuring the performance of a program: how much memory it takes, how fast or slow it is, and how much CPU (processing power) it requires. Performance tests can reveal errors in how the program's resources are managed - bottlenecks in the system.
Reliability testing is similar to performance testing, but it involves subjecting the program to a high and consistent level of work. This will expose bugs that might not show up during normal usage (for example, concurrency issues - which we'll talk about in a few weeks).
Security testing is about making sure that there are no vulnerabilities in the system that a bad actor could abuse.
UI testing simulates usage of an actual user interface in order to validate that it performs as expected. This is a higher level of testing that mimics how someone would actually use the program.

For the rest of this lecture, we'll focus mainly on unit tests, since they are closest to the kind of programming that we've been doing.

Coverage

Unit tests often seek to thoroughly exercise a piece of code by attempting to provide complete "coverage" over that code. There are several types of "coverage":

Statement coverage is the fraction of lines of code from the original program that are executed in test cases.
Branch coverage is the fraction of if/else branches that are covered (i.e., is there a test that goes into the if-branch and a second test that goes into the else-branch).
Path coverage is the fraction of all combinations of branches that are executed. For example, if there are two if/else statements, then you should have at least 4 test cases: one that goes into both the first and second if, one that goes into the first if but the second else, one that goes into the first else but the second if, and one that goes into both else branches.

An example

This code is supposed to compute something resembling C's logical "or" operator (a || b). Remember that in C, 0 is false and all other integers are true. How do we test it? How many tests do we need? What kinds of tests should they be?

    int f(int a, int b) {
        int ans = 0;
        if (a) {
            ans += a;
        }
        if (b) {
            ans += b;
        }
        return ans;
    }

Let's consider the different types of coverage in order to come up with some test cases.

Statement coverage: f(1,1) is sufficient - that exercises every single line in the function.
Branch coverage: f(1,1) and f(0,0) are sufficient - they exercise both entering and not-entering both if statements.
Path coverage: f(0,0), f(1,0), f(0,1), and f(1,1) are sufficient - this exercises all combinations of entering and not-entering the if statements.

But f passes even the example path-coverage test suite, which suggests that it is a correct "or" function for C; it is not! Coverage only tracks which code runs, not which values run through it: we haven't considered how the two additions interact, and we've forgotten the possibility of negative integers! When testing, coverage is an important thing to consider, but 100% coverage does NOT mean that your program is fully tested.

In this example, f(-1, 1) would show that our function is not correct: both arguments are true, but f returns -1 + 1 = 0, which is false.
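
For comparison, here is one possible repair, sketched under the assumption that the goal is exactly C's logical "or". The names f_fixed and test_cancelling_inputs are hypothetical and do not appear in the lecture code.

```c
#include <assert.h>

// One possible repair. Each argument is reduced to true (1) or
// false (0) before the two are combined, so two nonzero values can
// no longer cancel each other out.
int f_fixed(int a, int b) {
  return (a != 0) || (b != 0);
}

// Checks the kind of input that the coverage-driven tests missed.
void test_cancelling_inputs(void) {
  assert(f_fixed(-1, 1));  // the original f returns -1 + 1 == 0 here
  assert(f_fixed(5, -5));
  assert(!f_fixed(0, 0));  // still false when both inputs are false
}
```

Because this version never adds its arguments, it also avoids the signed integer overflow that the additions in f could hit for very large inputs.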

How could we write an actual test program? There are a number of frameworks that you can use to help you unit test in C, and you are encouraged to explore them, but the basic form of unit testing is using "assert" statements like the following:

    #include <assert.h>
    #include <stdlib.h>

    #include "f.h"

    // assert aborts the program with a message if its condition is false.
    int main(int argc, char** argv) {
      // Test case 1: f(0,0) => 0
      assert(!f(0, 0));

      // Test case 2: f(0,1) => not-0
      assert(f(0, 1));

      // Test case 3: f(1,0) => not-0
      assert(f(1, 0));

      // Test case 4: f(1,1) => not-0
      assert(f(1, 1));

      // Test case 5: f(-1,1) => not-0
      // (fails for the f shown above, exposing the bug)
      assert(f(-1, 1));
      return EXIT_SUCCESS;
    }

Black box testing

The exercise we just went through is an example of what is called "white-box" testing, in which you write a unit test while looking at the implementation of the function that you want to test. The other approach is "black-box" testing, in which you write a unit test WITHOUT looking at the implementation.

The pros of black-box testing? You probably won't make the same mistakes as the implementation and you'll think independently in terms of the interface, not the details of the code. However, you might miss some weird internal cases that really should be checked. Conversely, the pros of white-box testing are that you can be efficient and find the corner-cases of the implementation easily. The con is that you can be biased by the same assumptions that the implementation makes, and so not think to check them (such as negative inputs in our example above).

In either case, you should think about edge cases and come up with tests to exercise those, like loop boundaries, "special constants", max values, empty/full data structures, etc.

As an exercise, what tests might you write for this sample function?

    // Sorts the values in the given array from
    // low to high.
    void sort(int* arr, int length);

Some possibilities:

    Test Input                 Array after the call    Why?
    sort([1, 5, 3, 2], 4)      [1, 2, 3, 5]            basic case
    sort([1, 5, 3, 2], 0)      (same)                  length 0
    sort(NULL, 2)              (no crash)              null input
    sort([1, 2, 2, 5, 3], 5)   [1, 2, 2, 3, 5]         duplicates
    sort([ 1, 3, 2 ], -3)      (same)                  negative length
    sort([-1, 5, -3, 2], 4)    [-3, -1, 2, 5]          negative values
    sort([1, 2, 3, 5], 4)      (same)                  already sorted
    sort([4], 1)               (same)                  length 1 array
    sort([really long], long)  (sorted)                really long array

Stubbing

What if your file A depends on some other functions in file B, but you want to test a function in file A without depending on file B? Take the following example:

We have a program to curve students' grades. In file db.h/db.c, we have code that will read a student's grade from the database and save the grade for a student back to the database. In file curve.c/curve.h, we have a function to curve students' existing grades by a certain number of points. In this situation, curve.c depends on db.h in order to get the grades to curve and save the results.

db.h:

    /**
     * Performs database operations on the student grade
     * database.
     */
    int getGradeForStudent(int studentId, int hwNum);
    void saveStudentGrade(int studentId, int hwNum, int grade);

curve.h:

    /**
     * Curves the grades of all provided students for the
     * given homework by the given number of points, capped
     * at 100 points.
     */
    void curve(int* allStudentIds,
               int numStudents,
               int hwNum,
               int numPoints);

curve.c:

    #include "curve.h"
    #include "db.h"

    void curve(int* allStudentIds,
               int numStudents,
               int hwNum,
               int numPoints) {
      for (int i = 0; i < numStudents; i++) {
        int studentId = allStudentIds[i];
        int currentGrade = getGradeForStudent(studentId, hwNum);
        currentGrade += numPoints;
        if (currentGrade > 100) {
          currentGrade = 100;
        }
        saveStudentGrade(studentId, hwNum, currentGrade);
      }
    }

If we wanted to test the curve function, we quickly run into a problem - we don't actually want to modify real students or real grades in the database! We want to write a test that doesn't actually modify the database, or rely on any database operations.

To accomplish this, we will do something known as "stubbing": we will provide a FAKE IMPLEMENTATION of the functions described in db.h that work well enough for the tests. This implementation should be as small as possible. We call these fake implementations "stubs", and they can be saved in a DIFFERENT FILE from db.c; when we compile the test program, if we include the stub implementation instead of db.c, the stubs will be used instead of the actual db.c implementation.

Example: a test file for curve.c:

    #include <assert.h>
    #include <stdlib.h>

    #include "curve.h"
    #include "db.h"

    // Stub for getGradeForStudent - counts the total number of
    // times that it is called.
    int numGradeCalls = 0;
    int getGradeForStudent(int studentId, int hwNum) {
      numGradeCalls++;
      return 0;
    }

    // Stub for saveStudentGrade does nothing.
    void saveStudentGrade(int studentId, int hwNum, int grade) {}

    int main(int argc, char** argv) {
      int students[] = { 1, 2, 3, 4, 5 };

      // Test case - 5 students, getGradeForStudent should be called
      // 5 times.
      curve(students, 5, 1, 12);
      assert(numGradeCalls == 5);

      return EXIT_SUCCESS;
    }

To compile, we can use:

    $ gcc -Wall -std=c11 -g -o test test.c curve.c  # DOESN'T INCLUDE db.c

You should use stubs if the stubbed code doesn't exist, is buggy, is large and slow, or has side effects (like saving to a database) that are undesirable. There are unit testing frameworks that provide more structured and easier stubbing support, like JUnit for Java (usually paired with a mocking library) - take advantage of these frameworks where they make sense. Some suggestions for what code to put into a stub:

Instead of computing a function, use a small table of pre-encoded answers.
Return wrong answers that won't mess up what you're testing.
Skip work (e.g., printing) that won't be missed.
Use a slower but simpler algorithm.
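
To illustrate the first suggestion, here is a sketch of table-based stubs for the db.h functions. The grade values, the array names, and the test function name are all made up for the example. curve is repeated from curve.c only so that the sketch is self-contained; in a real test it would be linked in from curve.c as in the compile command above.

```c
#include <assert.h>

#define NUM_FAKE_STUDENTS 3

// Pre-encoded answers, indexed by studentId (made-up test data).
static const int fakeGrades[NUM_FAKE_STUDENTS] = {50, 95, 100};

// Records what curve() tried to save, so the test can inspect it.
static int savedGrades[NUM_FAKE_STUDENTS];

// Stub: looks the grade up in the table instead of a database.
int getGradeForStudent(int studentId, int hwNum) {
  (void)hwNum;  // the table holds a single homework
  return fakeGrades[studentId];
}

// Stub: remembers the grade in memory instead of writing it out.
void saveStudentGrade(int studentId, int hwNum, int grade) {
  (void)hwNum;
  savedGrades[studentId] = grade;
}

// Copied from curve.c so that this sketch compiles on its own.
void curve(int* allStudentIds, int numStudents, int hwNum,
           int numPoints) {
  for (int i = 0; i < numStudents; i++) {
    int studentId = allStudentIds[i];
    int currentGrade = getGradeForStudent(studentId, hwNum);
    currentGrade += numPoints;
    if (currentGrade > 100) {
      currentGrade = 100;
    }
    saveStudentGrade(studentId, hwNum, currentGrade);
  }
}

// Test case: a 10-point curve is applied and capped at 100.
void test_curve_caps_at_100(void) {
  int students[] = {0, 1, 2};
  curve(students, 3, 1, 10);
  assert(savedGrades[0] == 60);   // 50 + 10
  assert(savedGrades[1] == 100);  // 95 + 10, capped
  assert(savedGrades[2] == 100);  // 100 + 10, capped
}
```

Unlike a stub that always returns 0, the table lets the test check the values that curve computes, including the cap, while still never touching a real database.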

Testing rules of thumb

Make tests early - even before writing any code of the actual program.
Make tests easy to run - add one or more targets to your Makefile.
One thing per test (keep each test small).
Tests should be independent - the order in which they are run shouldn't matter.
Tests should be commented, so you know what they test.
Rerun tests all the time, and especially before commit.
Expect to write more code than the assignment requires: the tests are extra code, and they are how you ensure the program works properly.

Debugging

Debugging is not testing! It is not a systematic approach to validating that your program is correct; it is a set of methods for determining what went wrong when something did go wrong. Adding tests can be part of your debugging method, however. Treat debugging as a scientific experiment:

Observation: I see a problem when I do ...
Hypothesis: The problem is because ...
Experiment: Design tests to verify the hypothesis.
- Not verified? Start over with a new hypothesis.
- Verified? Bug found! Fix it, test it, and add the test that demonstrated the bug to your collection.
Alternative experiments: gdb, print debugging, etc.

When you find and fix a bug, it is always recommended that you add a test that would have caught the bug, so that you don't make that mistake again.

Summary

Testing has some concepts worth knowing and using:

Types of tests
Unit test coverage
White-box vs black-box
Stubbing