CSE 374, Lecture 18: Testing

Writing correct code

A good approach to writing correct code can be represented by these ordered steps:

  1. Choose the right language. If possible, choose languages that prevent certain types of bugs. For example, if you don't need the lower-level performance or control, pick a language like Java rather than C to avoid memory-related bugs.
  2. Think before you code. Avoid writing bugs by understanding how the program will work before implementing it. Draw out any data structures and how you will modify them over the course of the program. Write pseudocode and consider all of the different cases you might encounter.
  3. Make defects visible. Use "assert" statements and exceptions (if they exist in your language) to make your program crash during development as soon as something is not valid.
  4. Test the code. Ensure proper behavior by writing another program to exercise the code completely.
  5. Debug. As a last resort, if your program is still not correct, go through the debugging process to find the issues. Debugging techniques include adding print statements, using tools such as gdb or valgrind, and adding more test cases.
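
Step 3 can be sketched in C as follows. This is a minimal illustration, not part of the original notes: the function name average and its preconditions are assumptions chosen for the example. The assertions abort the program during development as soon as a caller violates a precondition, rather than letting a bad value propagate.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns the mean of the first n values in arr.
// The asserts document and enforce the preconditions while developing;
// compiling with -DNDEBUG removes them.
double average(const int* arr, int n) {
  assert(arr != NULL);  // a NULL array is a caller bug
  assert(n > 0);        // avoids dividing by zero below
  long sum = 0;
  for (int i = 0; i < n; i++) {
    sum += arr[i];
  }
  return (double) sum / n;
}
```

A call such as average(NULL, 3) stops immediately at the first assert, pointing at the line where the assumption was broken.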

Today we'll discuss the fourth step: how to write tests to verify that your program is correct.

Some quotes

"Test your software or your users will."

Hunt & Thomas
The Pragmatic Programmer

------------------------------------------------

"There are two ways of constructing a software design:

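One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.

The first method is far more difficult."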

Sir C. A. R. Hoare
1980 Turing Award winner
Invented "quicksort"

------------------------------------------------

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Brian Kernighan
Wrote THE BOOK on C (our book!)

------------------------------------------------

"Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence."

Edsger Dijkstra
1972 Turing Award lecture

What is testing?

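Testing is a systematic way to reveal errors in code, generally by writing another program to exercise the code. Testing is a very difficult problem in its own right, and its power is by nature limited: as the Dijkstra quote above puts it, tests can show the presence of bugs but not their absence.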

There is a perception in computer science that testing is a novice's job, that it is less illustrious and an afterthought. In my opinion, this is WRONG! Testing is a vital part of the software engineering process, and everyone should be capable of and responsible for writing tests.

Types of tests

For the rest of this lecture, we'll focus mainly on unit tests, since they are closest to the kind of programming that we've been doing.

Coverage

Unit tests often seek to thoroughly exercise a piece of code by attempting to provide complete "coverage" over that code. There are several types of "coverage":

An example

This code is supposed to compute something resembling C's logical "a or b" (a || b). Remember that in C, 0 is false and all other integers are true. How do we test it? How many tests do we need? What kinds of tests should they be?

    int f(int a, int b) {
        int ans = 0;
        if (a) {
            ans += a;
        }
        if (b) {
            ans += b;
        }
        return ans;
    }

Let's consider the different types of coverage in order to come up with some test cases.
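
As a rough sketch of how the coverage levels translate into inputs for f, assuming the usual definitions of statement, branch, and path coverage (the grouping and the helper name run_coverage_suites are illustrative, not from the lecture): one call with both arguments nonzero executes every statement, adding f(0, 0) takes both outcomes of each "if", and the four combinations of zero and nonzero arguments take every path.

```c
#include <assert.h>

// The function under test, copied from above.
int f(int a, int b) {
  int ans = 0;
  if (a) {
    ans += a;
  }
  if (b) {
    ans += b;
  }
  return ans;
}

// Hypothetical driver; returns the number of checks performed.
int run_coverage_suites(void) {
  int checks = 0;

  // Statement coverage: both if-bodies run, so every statement executes.
  assert(f(1, 1));
  checks++;

  // Branch coverage: together with f(1, 1), each condition is seen
  // both true and false.
  assert(!f(0, 0));
  checks++;

  // Path coverage: the remaining two of the four true/false combinations.
  assert(f(0, 1));
  checks++;
  assert(f(1, 0));
  checks++;

  return checks;
}
```

All four assertions pass, which is exactly why coverage alone can be misleading here.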

But a test suite with full path coverage would still suggest that f is a correct "or" function for C; it is not! There are interactions between cases that we haven't considered. We've also forgotten the possibility of negative integers! When testing, coverage is an important thing to consider, but 100% coverage does NOT mean that your program is fully tested.

In this example, f(-1, 1) would show that our function is not correct: both arguments are true, but -1 + 1 is 0, so f returns false.

How could we write an actual test program? There are a number of frameworks that you can use to help you unit test in C, and you are encouraged to explore them, but the basic form of unit testing is using "assert" statements like the following:

    #include <assert.h>
    #include <stdlib.h>

    #include "f.h"

    // assert aborts the program with a message if its condition is false.
    int main(int argc, char** argv) {
      // Test case 1: f(0,0) => 0
      assert(!f(0, 0));

      // Test case 2: f(0,1) => not-0
      assert(f(0, 1));

      // Test case 3: f(1,0) => not-0
      assert(f(1, 0));

      // Test case 4: f(1,1) => not-0
      assert(f(1,1));

      // Test case 5: f(-1,1) => not-0
      // (this assertion fails for the f above, since -1 + 1 == 0)
      assert(f(-1,1));
      return EXIT_SUCCESS;
    }
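
One possible fix, offered as a sketch rather than the lecture's own solution: test each argument for truth instead of summing, so that opposite-signed inputs cannot cancel. The names f_fixed and test_f_fixed are hypothetical.

```c
#include <assert.h>

// Hypothetical corrected version of f: returns 1 if either argument is
// nonzero and 0 otherwise. Nothing is added together, so inputs such as
// (-1, 1) cannot cancel out.
int f_fixed(int a, int b) {
  return (a != 0) || (b != 0);
}

// Runs the five test cases from the test program above against f_fixed.
void test_f_fixed(void) {
  assert(!f_fixed(0, 0));
  assert(f_fixed(0, 1));
  assert(f_fixed(1, 0));
  assert(f_fixed(1, 1));
  assert(f_fixed(-1, 1));  // the case that exposes the bug in f
}
```

Avoiding the addition also sidesteps signed overflow for large inputs, which the summing version could hit.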

Black box testing

The exercise we just went through is an example of what is called "white-box" testing, in which you write a unit test while looking at the implementation of the function that you want to test. In "black-box" testing, by contrast, you write a unit test WITHOUT looking at the implementation, working only from the function's interface.

The pros of black-box testing? You probably won't make the same mistakes as the implementation, and you'll think independently in terms of the interface, not the details of the code. However, you might miss some weird internal cases that really should be checked. Conversely, the pros of white-box testing are that you can be efficient and find the corner cases of the implementation easily, but you can be biased by the implementation's assumptions and fail to check the cases it overlooks (such as negative inputs in our example above).

In either case, you should think about edge cases and come up with tests to exercise those, like loop boundaries, "special constants", max values, empty/full data structures, etc.

As an exercise, what tests might you write for this sample function?

    // Sorts the values in the given array from
    // low to high.
    void sort(int* arr, int length);

Some possibilities:

    Test Input                 Array after the call    Why?
    sort([1, 5, 3, 2], 4)      [1, 2, 3, 5]            basic case
    sort([1, 5, 3, 2], 0)      (same)                  length 0
    sort(NULL, 2)              (no crash)              null input
    sort([1, 2, 2, 5, 3], 5)   [1, 2, 2, 3, 5]         duplicates
    sort([ 1, 3, 2 ], -3)      (same)                  negative length
    sort([-1, 5, -3, 2], 4)    [-3, -1, 2, 5]          negative values
    sort([1, 2, 3, 5], 4)      (same)                  already sorted
    sort([4], 1)               (same)                  length 1 array
    sort([really long], long)  (sorted)                really long array
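
A few rows of the table might be turned into a test program along these lines. This is a sketch under assumptions: the lecture gives only the declaration of sort, so the implementation below is a stand-in built on the standard library's qsort, and treating a NULL array or a non-positive length as a no-op is an assumption about the intended contract. The helper names compare_ints, arrays_equal, and run_sort_tests are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Comparison callback for qsort; avoids overflow from subtracting.
static int compare_ints(const void* a, const void* b) {
  int x = *(const int*) a;
  int y = *(const int*) b;
  return (x > y) - (x < y);
}

// Stand-in implementation (an assumption, see the lead-in).
void sort(int* arr, int length) {
  if (arr == NULL || length <= 0) {
    return;
  }
  qsort(arr, (size_t) length, sizeof(int), compare_ints);
}

// Returns 1 if the first n elements of a and b match, else 0.
static int arrays_equal(const int* a, const int* b, int n) {
  for (int i = 0; i < n; i++) {
    if (a[i] != b[i]) {
      return 0;
    }
  }
  return 1;
}

// Exercises six rows of the table; returns the number of rows checked.
int run_sort_tests(void) {
  int rows = 0;

  int basic[] = {1, 5, 3, 2};
  int basic_want[] = {1, 2, 3, 5};
  sort(basic, 4);
  assert(arrays_equal(basic, basic_want, 4));  // basic case
  rows++;

  int zero[] = {1, 5, 3, 2};
  int zero_want[] = {1, 5, 3, 2};
  sort(zero, 0);
  assert(arrays_equal(zero, zero_want, 4));  // length 0: unchanged
  rows++;

  sort(NULL, 2);  // null input: must not crash
  rows++;

  int dups[] = {1, 2, 2, 5, 3};
  int dups_want[] = {1, 2, 2, 3, 5};
  sort(dups, 5);
  assert(arrays_equal(dups, dups_want, 5));  // duplicates
  rows++;

  int neg_len[] = {1, 3, 2};
  int neg_len_want[] = {1, 3, 2};
  sort(neg_len, -3);
  assert(arrays_equal(neg_len, neg_len_want, 3));  // negative length
  rows++;

  int negs[] = {-1, 5, -3, 2};
  int negs_want[] = {-3, -1, 2, 5};
  sort(negs, 4);
  assert(arrays_equal(negs, negs_want, 4));  // negative values
  rows++;

  return rows;
}
```

Because these tests were written from the declaration and comment alone, they are black-box tests: they would apply unchanged to any other implementation of sort with the same contract.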

Stubbing

What if file A depends on functions in file B, but you want to test a function in file A without depending on file B? Take the following example:

We have a program to curve students' grades. In file db.h/db.c, we have code that will read a student's grade from the database and save the grade for a student back to the database. In file curve.c/curve.h, we have a function to curve students' existing grades by a certain number of points. In this situation, curve.c depends on db.h in order to get the grades to curve and save the results.

db.h:

    /**
     * Performs database operations on the student grade
     * database.
     */
    int getGradeForStudent(int studentId, int hwNum);
    void saveStudentGrade(int studentId, int hwNum, int grade);

curve.h:

    /**
     * Curves the grades of all provided students for the
     * given homework by the given number of points, capped
     * at 100 points.
     */
    void curve(int* allStudentIds,
               int numStudents,
               int hwNum,
               int numPoints);

curve.c:

    #include "curve.h"
    #include "db.h"

    void curve(int* allStudentIds,
               int numStudents,
               int hwNum,
               int numPoints) {
      for (int i = 0; i < numStudents; i++) {
        int studentId = allStudentIds[i];
        int currentGrade = getGradeForStudent(studentId, hwNum);
        currentGrade += numPoints;
        if (currentGrade > 100) {
          currentGrade = 100;
        }
        saveStudentGrade(studentId, hwNum, currentGrade);
      }
    }

If we wanted to test the curve function, we quickly run into a problem - we don't actually want to modify real students or real grades in the database! We want to write a test that doesn't actually modify the database, or rely on any database operations.

To accomplish this, we will do something known as "stubbing": we will provide a FAKE IMPLEMENTATION of the functions described in db.h that work well enough for the tests. This implementation should be as small as possible. We call these fake implementations "stubs", and they can be saved in a DIFFERENT FILE from db.c; when we compile the test program, if we include the stub implementation instead of db.c, the stubs will be used instead of the actual db.c implementation.

Example: a test file for curve.c:

    #include <assert.h>
    #include <stdlib.h>

    #include "curve.h"
    #include "db.h"

    // Stub for getGradeForStudent - counts the total number of
    // times that it is called.
    int numGradeCalls = 0;
    int getGradeForStudent(int studentId, int hwNum) {
      numGradeCalls++;
      return 0;
    }

    // Stub for saveStudentGrade does nothing.
    void saveStudentGrade(int studentId, int hwNum, int grade) {}

    int main(int argc, char** argv) {
      int students[] = { 1, 2, 3, 4, 5 };

      // Test case - 5 students, getGradeForStudent should be called
      // 5 times.
      curve(students, 5, 1, 12);
      assert(numGradeCalls == 5);

      return EXIT_SUCCESS;
    }

To compile, we can use:

    $ gcc -Wall -std=c11 -g -o test test.c curve.c  # DOESN'T INCLUDE db.c
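
The stub above only counts calls, so it cannot check the cap at 100. A slightly richer sketch, with hypothetical names (fakeGrade, lastSavedGrade, numSaveCalls, testCurve) that are not part of the lecture's code, lets the test choose what the fake database returns and records what curve tries to save. The curve function is repeated here only so the block compiles on its own; in the real setup it would come from curve.c.

```c
#include <assert.h>

// Stub state (hypothetical names): the grade every lookup returns, the
// most recent grade passed to saveStudentGrade, and the number of saves.
static int fakeGrade = 0;
static int lastSavedGrade = -1;
static int numSaveCalls = 0;

// Stub: returns whatever grade the test configured.
int getGradeForStudent(int studentId, int hwNum) {
  (void) studentId;
  (void) hwNum;
  return fakeGrade;
}

// Stub: records the grade instead of writing to a database.
void saveStudentGrade(int studentId, int hwNum, int grade) {
  (void) studentId;
  (void) hwNum;
  lastSavedGrade = grade;
  numSaveCalls++;
}

// Copy of curve from curve.c, included so this sketch is self-contained.
void curve(int* allStudentIds, int numStudents, int hwNum, int numPoints) {
  for (int i = 0; i < numStudents; i++) {
    int studentId = allStudentIds[i];
    int currentGrade = getGradeForStudent(studentId, hwNum);
    currentGrade += numPoints;
    if (currentGrade > 100) {
      currentGrade = 100;
    }
    saveStudentGrade(studentId, hwNum, currentGrade);
  }
}

// Runs two cases and returns the total number of saves observed.
int testCurve(void) {
  int students[] = { 1, 2, 3 };

  // Below the cap: 80 + 12 should be saved as 92.
  fakeGrade = 80;
  curve(students, 3, 1, 12);
  assert(lastSavedGrade == 92);

  // Above the cap: 95 + 12 should be saved as 100.
  fakeGrade = 95;
  curve(students, 3, 1, 12);
  assert(lastSavedGrade == 100);

  return numSaveCalls;
}
```

Keeping the stub's state in a few plain variables keeps it small while still letting the test observe the side effects that matter.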

You should use stubs if the stubbed code doesn't exist, is buggy, is large and slow, or has undesirable side effects (like saving to a database). There are unit testing frameworks that provide more structured and easier stubbing support, like JUnit for Java - take advantage of these frameworks where they make sense. Some suggestions for what code to put into a stub:

Testing rules of thumb

Debugging

Debugging is not testing! It is not a systematic approach to validating that your program is correct; it is a set of methods for determining what went wrong when something did go wrong. Adding tests can be part of your debugging method, however. Treat debugging as a scientific experiment:

When you find and fix a bug, it is always recommended that you add a test that would have caught the bug, so that you don't make that mistake again.

Summary

Testing has some concepts worth knowing and using: