Debugging a Python program

First, see the CSE 160 lecture on debugging (PDF, PowerPoint). This document provides more specific tips and tricks about debugging your code.

Contents:

Start with English

Before you write any Python code, you should always write an English description of your algorithm — the instructions you would give someone to do the data processing manually. This is helpful in understanding whether your error is with your algorithm or with your implementation of the algorithm.

If you're confused about where to start, write out some sample input and produce the output by hand. Observe what you did, and then write English that describes that process.

Play computer

Sometimes your program will produce unexpected results. One of the best ways to understand why your program behaved the way it did is to play computer: simulate the program. This will help you to find and understand the point at which the actual computation diverged from your expectations.

You will often find it helpful to play computer with the English description of your approach. You should really do this before writing code, but it's useful even if your code already exists. Draw out a small example, such as one of the staff-provided test cases, and go through it on paper showing how your algorithm is supposed to work and produce the correct answer. You may discover what incorrect assignment is occurring this way. If not, and your manual computation provided the correct answer, then proceed to running your code.

Play computer with the same example and your actual code. You will observe how the program actually executes and produces an incorrect answer. You will also see the point at which it diverges from what you wanted it to do, which will indicate to you where you need to change your program.

Tools exist to relieve the tedium of playing computer (but you still have to understand the semantics of Python execution even if you use a tool). The Online Python Tutor lets you execute your code line-by-line, all the while provides a graphical representation of your code and the data.
The Python Tutor has some limitations, such as not being able to read files stored on your hard drive. When you paste your code, replace its call to the main routine by a call to the function you're trying to test. Its argument can be some sample input from a test case that your program is failing.

Another tool is the print statement. You can easily change a print statement and re-run the program, rather than having to step through your program. The disadvantage is that if you want to know something else, you need to change your print statements. So, use both approaches in tandem.

Think backward

Playing computer involves starting at the beginning of your program and waiting until something goes wrong. But, you sometimes know (from an incorrect result or, especially, a crash) where your program had a problem and even what the problem was. Working backward from there can be a more efficient way to localize the defect in your program.

For instance, suppose that you have a function row_to_edge that is being passed a string when it should be passed a dictionary. Where is the function call expression that passed a string to row_to_edge as a parameter? (The stack trace will sometimes tell you this.) Why did it pass a string when it should have passed a dictionary?

You can follow that chain of reasoning back until you find a function whose input parameters were valid and correct, but whose body made an incorrect function call (it passed an incorrect argument). The bug is in or related to this function.

Performance debugging

If your code produces the correct result, but does so too slowly (or if it never produces any result when you run it on large inputs), then you will need to diagnose a peformance problem.

The concepts in the lecture on algorithmic speed (PDF, PowerPoint, Python code) may be helpful.

Where is the problem?

It's critical that you first determine where things are going wrong, and only afterward determine what is going wrong. In other words, don't spend a lot of time optimizing a part of your code that is not the performance bottleneck. Don't just guess at the problem: get data about it. This is similar to our recommended strategy for debugging: perform divide and conquer to localize the problem in the program text, or in the data, or in the time spent during the run of your program.

One approach is to run your program on progressively larger datasets to see how the running time changes — this is similar to the graphs that we drew in the lecture on algorithmic speed. For example, if you expect your program to take ten times as long to compute ten times as much data, but your program actually takes 30 or 100 times as long, then you have uncovered a slow algorithm.

Another approach is to write print statements to determine how long different parts of your program run.

Common preformance problems

Here are some things that might have gone wrong.

Redundant function calls
If you call the same function with the same arguments multiple times, then you can instead store the result in a variable and use the variable instead of re-doing the computation. This is especially important if the function call is expensive or is inside a loop that iterates many times.
Redundant computation
A repeated computation doesn't have to be a function call. Any expression that is repeated multiple times on the same operands can be re-used rather than recomputed. Be sure to do this only for operations that are costly.
Poor algorithm
Perhaps the most common problem is a high-level algorithm that computes a problem in an inefficient way without necessarily repeating work at the statement-by-statement level. To address this, you can rewrite your slowest or most complex functions from scratch. Rewriting the code lets you leverage the lessons you learned from writing it the first time, and you often avoid the same mistakes.

Asking the course staff

When you report any problem, it is essential that you provide complete information and that you indicate what you understand and what you have tried. Providing complete information means you should send the staff your code, cut-and-paste the exact command you ran, and cut-and-paste the exact, complete output — don't give a vague "I ran it and it didn't work". When you write what you understand and what you have tried (again, be specific!), you are likely to understand your problem better and may even solve it on your own. Even if not, then the staff won't waste time repeating things you have already tried (and they'll know that you did try rather than just using them as a crutch).

One excellent way to ask for help is to explain why you think the observed result is wrong or impossible. Why do you think your program should work, or why do you think that the output you see cannot happen? This often lets the staff quickly identify and correct a fundamental misunderstanding.

Don't expect the staff to just give you all the answers — that wouldn't be fair to you. But the staff will give you pointers in the right directions so that you can get unstuck and can learn on your own.