
Testing and Debugging

Complete the Reading Quiz by 3:00pm before lecture.

Table of contents

  1. reproduce your bug quickly
  2. accept that it’s probably your code’s fault
  3. start doing experiments
  4. change one thing at a time
  5. check your assumptions
  6. be noisy
  7. be noisy, quickly
  8. understand what the error messages mean
  9. write your code so it’s easier to test
  10. further reading

What does debugging a program look like?

This reading reproduces and adapts Julia Evans’ blog post, What does debugging a program look like? [1]

There will probably be some jargon you’re not familiar with since she discusses debugging in a variety of programming contexts. That’s okay. Here’s a glossary of the particularly pertinent concepts that we’ll be exploring in class.

Unit Test
Software that verifies that a specific piece of code (“the unit”) works as intended; the unit is usually a single class such as LinkedIntList but sometimes a single method if the method is very complex. These are usually run automatically; for example, you could re-run tests every time you save your edited file. We might write one unit test to check that add behaves correctly, another unit test to check that remove behaves correctly, and so forth. Unit tests are frequently automated because they’re easy to set up: instantiate the unit and you’re ready to test.
Test Case
The smallest component of a unit test, which verifies a single piece of behaviour. For example, your unit test might have 3 test cases: one which verifies add can handle null values, a second test case to verify add with non-null values, and a third to verify that remove is well-defined for non-empty lists.
Test Suite
A collection of unit tests. Your test suite might contain unit tests for LinkedList, ArrayList, ArrayQueue, and LinkedQueue. As with unit tests, these are usually run automatically; however, since they encompass more code, they are sometimes run less frequently. For example, you might run them every time code is checked in.
Library
A collection of resources used to support software development. For example, ArrayList is part of the Java standard library.
Debugger
A tool that can pause a program at any point during execution, allowing the programmer to inspect the exact values of different variables.
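
To make the first two terms concrete, here is a minimal sketch of one unit test containing the three test cases described above. It is only an illustration: it uses java.util.LinkedList from the standard library as the unit, and it uses plain methods that return true or false instead of a testing framework. The class and method names are made up for this example.

```java
import java.util.LinkedList;
import java.util.List;

// One unit test (for the unit LinkedList) made up of three test cases.
class LinkedListUnitTest {
    // Test case 1: add accepts a null element.
    static boolean addHandlesNull() {
        List<Integer> list = new LinkedList<>();
        list.add(null);
        return list.size() == 1 && list.get(0) == null;
    }

    // Test case 2: add appends non-null values in order.
    static boolean addHandlesNonNull() {
        List<Integer> list = new LinkedList<>();
        list.add(3);
        list.add(7);
        return list.equals(List.of(3, 7));
    }

    // Test case 3: remove on a non-empty list returns the removed
    // element and leaves the remaining elements in place.
    static boolean removeFromNonEmptyList() {
        List<Integer> list = new LinkedList<>(List.of(3, 7));
        Integer removed = list.remove(0); // removes the element at index 0
        return removed == 3 && list.equals(List.of(7));
    }

    public static void main(String[] args) {
        System.out.println("addHandlesNull: " + addHandlesNull());
        System.out.println("addHandlesNonNull: " + addHandlesNonNull());
        System.out.println("removeFromNonEmptyList: " + removeFromNonEmptyList());
    }
}
```

Each test case builds its own fresh list, so the cases can run in any order and a failure points at exactly one behaviour.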

reproduce your bug quickly

Everybody agrees that being able to consistently reproduce a bug is important if you want to figure out what’s going on.

Everybody also agrees that it’s extremely useful to be able to reproduce the bug quickly (if it takes you 3 minutes to check if every change helped, iterating is VERY SLOW).

A suggested approach is to get your bug down to its minimal working example and then write a test case which consistently reproduces the bug. Bonus: you can add this to your test suite later if it makes sense.
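
As a rough sketch of what that can look like, suppose the bug is in a hypothetical midpoint helper (the method names and the bug itself are invented for this example, not taken from the course code). The reproduction is a single test case with one input and one expected value, and it runs in milliseconds:

```java
import java.util.function.IntBinaryOperator;

class MidpointRepro {
    // Hypothetical buggy unit: lo + hi overflows when both values are large.
    static int buggyMidpoint(int lo, int hi) {
        return (lo + hi) / 2;
    }

    // Candidate fix: never computes a value larger than hi.
    static int fixedMidpoint(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // The minimal reproducing test case. It takes the implementation as a
    // parameter so the same check can be run before and after the fix.
    static boolean midpointOfLargeValues(IntBinaryOperator midpoint) {
        int lo = Integer.MAX_VALUE - 1;
        int hi = Integer.MAX_VALUE;
        return midpoint.applyAsInt(lo, hi) == lo;
    }

    public static void main(String[] args) {
        // prints "buggy passes: false"
        System.out.println("buggy passes: " + midpointOfLargeValues(MidpointRepro::buggyMidpoint));
        // prints "fixed passes: true"
        System.out.println("fixed passes: " + midpointOfLargeValues(MidpointRepro::fixedMidpoint));
    }
}
```

Once the fix is in, the same test case can stay in the test suite to catch the bug if it ever comes back.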

accept that it’s probably your code’s fault

Sometimes I see a problem and I’m like “oh, library X has a bug”, “oh, it’s DNS”, “oh, SOME OTHER THING THAT IS NOT MY CODE is broken”. And sometimes it’s not my code! But, in general, between an established library and my code that I wrote last month, usually it’s my code that’s the problem :).

start doing experiments

@act_gardner gave a nice, short explanation of what you have to do after you reproduce your bug:

I try to encourage people to first fully understand the bug - What’s happening? What do you expect to happen? When does it happen? When does it not happen? Then apply their mental model of the system to guess at what could be breaking and come up with experiments.

Experiments could be changing or removing code, making API calls from a REPL, trying new inputs, poking at memory values with a debugger or print statements.

I think the loop here may be:

  • make a guess about one aspect of what might be happening (“this variable is set to X where it should be Y”, “this code is never running at all”)
  • do experiment to check that guess
  • repeat until you understand what’s going on

change one thing at a time

Everybody definitely agrees that it is important to change one thing at a time when doing an experiment to verify an assumption.

check your assumptions

A lot of debugging is realizing that something you were sure was true (“wait this request is going to the new server, right, not the old one???”) is actually… not true. I made an attempt to list some common incorrect assumptions. Here are some examples:

  • this variable is set to X (“that filename is definitely right”)
  • that variable’s value can’t possibly have changed between X and Y
  • this code was doing the right thing before
  • this function does X
  • I’m editing the right file
  • there can’t be any typos in that line I wrote (“it is just 1 line of code”)
  • the documentation is correct
  • the code I’m looking at is being executed at some point
  • the compiler is not buggy (though, this is last on purpose; the compiler is only very rarely to blame :))

be noisy

By “noisy”, I mean “every single time there’s an error, the program reports to you exactly what happened in an easy-to-understand way”. Whenever my program has a problem and says [something] “error: failure to connect to SOME_IP port 443: connection timeout” I’m like THANK YOU THAT IS THE KIND OF THING I WANTED TO KNOW and I can check if I need to fix a firewall thing or if I got the wrong IP for some reason or what.
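
Here is one way that kind of message might be produced in Java. This is a sketch under assumptions: the Connector interface is a hypothetical stand-in for whatever really opens the connection, so the example runs without touching the network, and 192.0.2.1 is just a documentation-only placeholder address.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class NoisyErrors {
    // Hypothetical stand-in for the code that actually opens a connection.
    interface Connector {
        void connect(String host, int port) throws IOException;
    }

    // Wraps a low-level failure in a message that says what we were trying
    // to do (which host, which port) and why it failed.
    static void connectNoisily(Connector connector, String host, int port) {
        try {
            connector.connect(host, port);
        } catch (IOException e) {
            throw new UncheckedIOException(
                "error: failure to connect to " + host + " port " + port + ": " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // A fake connector that always fails, so the error path is easy to see.
        Connector alwaysTimesOut = (host, port) -> {
            throw new IOException("connection timeout");
        };
        try {
            connectNoisily(alwaysTimesOut, "192.0.2.1", 443);
        } catch (UncheckedIOException e) {
            // prints "error: failure to connect to 192.0.2.1 port 443: connection timeout"
            System.out.println(e.getMessage());
        }
    }
}
```

The original exception is passed along as the cause, so the full stack trace is still there when the one-line message isn’t enough.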

be noisy, quickly

Have you ever had a million compiler error messages scroll past? Should you try to fix the first error or the last error? The answer is almost always the first error, because sometimes a single missing character – such as forgetting to close your brace } – can create a cascade of parser failures later.

To get closer to the dream of “every single time there’s an error, the program reports to you exactly what happened in an easy-to-understand way” you also need to be disciplined about immediately returning an error message instead of silently writing incorrect data / passing a nonsense value to another function which will do WHO KNOWS WHAT with it and cause you a gigantic headache. This isn’t easy to get right (it’s not always obvious where you should be raising errors!) but it really helps a lot.
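
To illustrate the difference between failing silently and failing immediately, here is a small sketch with two versions of a hypothetical port-parsing helper (both method names are invented for this example). The quiet version turns bad input into a nonsense value; the noisy version refuses to continue.

```java
import java.util.Objects;

class FailFast {
    // Quiet version: bad input silently becomes 0, and the caller carries on
    // with a nonsense port until something breaks far away from here.
    static int parsePortQuietly(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Noisy version: complains right where the bad value first shows up.
    static int parsePort(String text) {
        Objects.requireNonNull(text, "port text must not be null");
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port is not a number: \"" + text + "\"", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range 1-65535: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(parsePortQuietly("4 43")); // prints 0, no hint that anything went wrong
        System.out.println(parsePort("443"));         // prints 443
        System.out.println(parsePort("4 43"));        // throws IllegalArgumentException
    }
}
```

Where exactly to put checks like these is a judgment call, but the boundary where outside input enters your code is often a reasonable place to start.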

understand what the error messages mean

One debugging sub-skill that I take for granted a lot of the time is understanding what error messages mean! I came across this nice graphic explaining common Python errors and what they mean, which breaks down things like NameError, IOError, etc.

I think a reason interpreting error messages is hard is that understanding a new error message might mean learning a new concept – NameError can mean “Your code uses a variable outside the scope where it’s defined”, but to really understand that you need to understand what variable scope is! I ran into this a lot when learning Rust – the Rust compiler would be like “you have a weird lifetime error” and I’d be like “ugh ok Rust I get it I will go actually learn about how lifetimes work now!”

write your code so it’s easier to test

Once you figure out what your bug is, what’s the easiest way to prevent it from ever happening again? Write a test! And not just any test, but a minimal working test, so that it can be run quickly and automatically. Tests that can’t be run quickly and automatically – the kind of test that involves setting up a database, creating 300 user accounts, and then trying to log in as the 301st user – simply don’t get run.

In a well-factored test suite, you should be able to reduce your minimal working example into a test case in the relevant unit test. Moreover, not only should your test code be written with an eye towards adding future tests, but your code-under-test – the “unit” you’re testing – should be, too! This means writing units that each do exactly one thing. For example, if you want to write a program that opens a text file, breaks the text into words separated by whitespace, and then inserts every word into a HashSet, you could structure your code like this:

  • “File Opener”: checks to see that the file exists, is openable, opens it, then returns its contents as a single string.
  • “Word Normalizer”: takes the whitespace-separated words and “normalizes” them into a canonical format (eg, making everything lowercase).
  • “Word Collector”: takes the File Opener’s resultant string, breaks it into whitespace-separated words, normalizes those words, and inserts them into the HashSet.

This allows you to write code that verifies you can handle different types of whitespace (in your word collector) without also needing to set up a test file on local disk. It is much easier to hardcode a couple of strings in your test cases than it is to create a couple of test files.
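
The three units above might be sketched in Java roughly as follows. The class and method names are assumptions made up for this illustration, and Files.readString requires Java 11 or newer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class WordCounting {
    // "File Opener": reads the whole file into one string, and says which
    // file it was trying to read if that fails.
    static String openFile(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            throw new UncheckedIOException("could not read " + path.toAbsolutePath(), e);
        }
    }

    // "Word Normalizer": puts a single word into a canonical form.
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // "Word Collector": splits the contents on runs of whitespace,
    // normalizes each word, and collects the results in a HashSet.
    static Set<String> collectWords(String contents) {
        Set<String> words = new HashSet<>();
        for (String word : contents.trim().split("\\s+")) {
            if (!word.isEmpty()) { // an all-whitespace input splits into one empty string
                words.add(normalize(word));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A test case for the collector needs only a hardcoded string, no file on disk.
        Set<String> words = collectWords("The  quick\tTHE\nquick ");
        System.out.println(words.equals(Set.of("the", "quick"))); // prints true
    }
}
```

Because collectWords takes a string rather than a file path, the whitespace handling can be tested on its own, and openFile only needs a small number of tests that actually touch the disk.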

further reading

This reading focuses on the philosophy of testing and debugging. If you’d prefer to dive into the mechanics of IntelliJ’s debugger, as well as more tips and tricks, these slides from 373 19sp are great.

An important part of the debugging process is knowing when you’ve reached the end of what you can do, and knowing what kind of help you now need. A good supplemental reading is Adam Blank’s How To Ask for Help.

In lecture, we will introduce (but not name) two principles for organizing code-under-test: the Law of Demeter and Dependency Injection. We will also refer to Apple’s “goto fail” bug and Doug Zongker’s “Chicken paper” (UW CSE PocSCI 2002).


Reading Quiz

  1. Evans, Julia. 2019. What does debugging a program look like? https://jvns.ca/blog/2019/06/23/a-few-debugging-resources/