
This code style guide reproduces common conventions for Python programming. While rules can be occasionally bent with justification, most programming work in this course should follow these guidelines.

Naming

Variable names

Variable names should be descriptive, concise, and lowercase with words separated by underscores as in snake_case. Avoid single-letter names, except for loop variables, because a single letter usually fails to describe the values contained in a variable. Avoid Python keywords, such as class, import, or return, and names of built-in functions or types, such as print or len.
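
As a minimal sketch of the naming guidance above (all names and values here are hypothetical, chosen only for illustration):

```python
# Hard to read: single-letter names hide what each value means.
t = 98.6
n = 3

# Clearer: descriptive snake_case names for the same values.
body_temperature = 98.6
num_patients = 3

# A short loop variable is acceptable because it only lives inside the loop.
squares = []
for i in range(num_patients):
    squares.append(i ** 2)

# Avoid: an assignment such as len = 5 would shadow the built-in len function.
```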

Function names

Function names should be descriptive, concise, and lowercase with words separated by underscores as in snake_case. As with variable names, avoid names of built-in Python functions. We usually specify the function names in the specification, so make sure to name your functions accordingly.

When writing test functions, the name of your test function should clearly indicate which function it is testing. For example, a test for the function add_two_numbers might be named test_add_two_numbers.

Class names

Class names should be in CapitalCase.

Constant names

Constant names should be in ALL_CAPITAL_CASE_WITH_UNDERSCORES.

Constants are values defined outside the function declarations whose values are manually assigned by programmers and infrequently changed. Constants usually replace “magic values” to make code more readable. Constants should only represent simple values, not large objects such as dataframes. Constants are optional in test programs.
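
For instance, a constant might replace a magic value like this. This is only a sketch: the name PASSING_SCORE, its value, and the function count_passing are assumptions made up for illustration.

```python
# Hypothetical constant replacing the "magic value" 70.
PASSING_SCORE = 70


def count_passing(scores: list[int]) -> int:
    """
    Returns the number of scores that are at least PASSING_SCORE.
    """
    count = 0
    for score in scores:
        if score >= PASSING_SCORE:
            count += 1
    return count
```

If the threshold ever changes, only the line defining PASSING_SCORE needs to be edited.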

File names

Python (py) files or module names should be in snake_case, such as main.py or dataset_analysis.py.

Jupyter Notebook (ipynb) files do not have a particularly universal naming convention. Course materials use lowercase names with words separated by hyphens as in kebab-case.

Whitespace

Indentation

Unlike other languages that use explicit delimiters to determine what goes inside a loop or function (such as {} in Java), Python only uses indentation. Indentation affects the behavior of programs. IndentationError: unexpected indent means there is an error in code indentation.
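
To illustrate how indentation changes behavior, the two hypothetical functions below differ only in the indentation of one line, yet they compute different results:

```python
def sum_then_double(values: list[int]) -> int:
    """
    Returns twice the sum of the given values.
    """
    total = 0
    for value in values:
        total += value
    # Not indented under the loop: runs once, after the loop finishes.
    total *= 2
    return total


def sum_doubling_each_step(values: list[int]) -> int:
    """
    Returns the running total, doubled after each value is added.
    """
    total = 0
    for value in values:
        total += value
        # Indented under the loop: runs on every iteration.
        total *= 2
    return total
```

For the list [1, 2, 3], the first function returns 12 and the second returns 22.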

Blank lines

Leave two blank lines separating code for different function definitions, class definitions, and other code. The only exception is for methods defined within the same class, in which case a single blank line between methods is preferred.

Within a function definition, avoid introducing more than a single blank line at a time and only for separating complex chunks of code for readability. Careful and consistent use of two blank lines can help communicate the start of a significant, new definition.
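
A rough sketch of this spacing, using hypothetical functions and a hypothetical Counter class:

```python
def first_function() -> int:
    """
    Returns the number 1.
    """
    return 1


def second_function() -> int:
    """
    Returns the number 2.
    """
    return 2


class Counter:
    """
    Represents a count that starts at zero.
    """

    def __init__(self) -> None:
        """
        Initializes a Counter object with a count of zero.
        """
        self._count: int = 0

    def increment(self) -> None:
        """
        Increases the count by one.
        """
        self._count += 1

    def get_count(self) -> int:
        """
        Returns the current count.
        """
        return self._count
```

Two blank lines separate the top-level definitions, while a single blank line separates the methods inside the class.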

Between operators

Include spaces between operators and other elements in an expression. Mathematical operators include +, -, *, /, and **. Comparison operators include ==, <=, >=, <, and >. Limit space delimiters to 1 space to avoid unnecessary whitespace. The exception to this is parentheses, which can be directly adjacent to whatever they are enclosing.

This does not apply when specifying default parameter values or keyword arguments. Avoid spaces around the assignment operator = for those cases.
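
The spacing rules above might look like this in practice. The function scale and the variable names are hypothetical. Note one assumption borrowed from PEP 8 rather than from this guide: when a default value follows a type annotation, PEP 8 does put spaces around =.

```python
def scale(value: float, factor: float = 2.0) -> float:
    """
    Returns value multiplied by factor.
    """
    # PEP 8 keeps spaces around = when a default follows a type annotation.
    return value * factor


# One space around each operator, none just inside the parentheses.
area = (3 + 4) * 2 ** 2
is_large = area >= 20

# No spaces around = when passing a keyword argument.
scaled = scale(1.5, factor=4.0)
```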

Function calls

Include spaces between arguments in a function call. Avoid space(s) between the function name and its argument list. Adding space(s) incorrectly suggests that the parentheses are for grouping an expression when they actually indicate a function call.
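
A small sketch of the difference between call parentheses and grouping parentheses (the variable names are hypothetical):

```python
# Preferred: no space before the parenthesis, one space after each comma.
largest = max(3, 7, 5)

# Avoid: max (3,7,5) still runs, but the spacing hides that this is a call.

# Here the parentheses group an expression, so a space before them is fine.
average = (3 + 7 + 5) / 3
```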

Line length

In general, lines should not exceed 100 characters. Occasional exceptions are permitted when a longer line would be clearer and easier to understand than two or more shorter lines. Avoid long lines with the following strategies.

Function calls
some_function(first_arg="This is a function",
              second_arg="With many arguments",
              third_arg="indent until everything lines up")
Expressions
total = (first_num + second_num + third_num
         + fourth_num + fifth_num)
Strings
print("For very very very very very very long strings, "
      "this is how we might break it up to two lines")
DataFrames
# This filter is too long and combines multiple steps
data = data[(data['primary_color'] == 'Magenta') & (data['size'] != 'Large')]

# Instead, assign names to each step of the filter
is_magenta = data['primary_color'] == 'Magenta'
is_not_large = data['size'] != 'Large'

data = data[is_magenta & is_not_large]

Documentation

Each function should contain a docstring describing what the function does right below the function definition. It should describe any parameters the function takes and values it returns (if any). If the code needs to handle some special cases (like case sensitivity, or special return values for a certain input), include that in the comment as well. The main method does not require a comment.

Long blocks of particularly complex code can be explained by including a comment # on the preceding line(s).

def add_two_numbers(a, b, return_zero=False):
    """
    This function returns the sum of the two given numbers a, b
    if return_zero is False. Otherwise returns 0.
    return_zero has False as its default value.
    """
    if return_zero:
        return 0
    else:
        return a + b

Avoid information that is unnecessary or irrelevant to clients (end-users of your work), such as “initialize variable” or “nested for loop” or “increment count”. Function comments should describe what result the program achieves rather than how it achieves that result.

Consider what a client would find useful. Documentation should contain all the necessary information needed to use a function. The exact details of how the function solves the problem are irrelevant, but the client needs to know the purpose of the parameters and the expected behavior. Avoid mentioning “the spec” or “the assessment” because clients won’t know what this means unless they’ve also taken this course.

The documentation for test functions can be relatively simple. For example, the test function test_add_two_numbers could include the docstring, “Tests the function add_two_numbers”.
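
Put together, a documented test function might look like the sketch below. It assumes a simplified two-parameter version of add_two_numbers and uses plain assert statements; a course assignment may provide its own testing helpers instead.

```python
def add_two_numbers(a: int, b: int) -> int:
    """
    Returns the sum of the two given numbers a and b.
    """
    return a + b


def test_add_two_numbers() -> None:
    """
    Tests the function add_two_numbers.
    """
    assert add_two_numbers(1, 2) == 3
    assert add_two_numbers(-1, 1) == 0
```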

Type annotations

Every function should have its parameter types and return types annotated. Fields of objects should also have their types annotated. Annotations are not required for local variables.

def function_example(a: str) -> str:
    return "example" + a


class ClassExample:
    def __init__(self, param: str) -> None:
        self.field: str = param

    def method(self, param: str) -> int:
        return len(self.field + param)

If a function doesn’t return anything, indicate its return type is None. You should annotate every function you write, including test functions.

Class definitions

Include a docstring comment describing the class right below the class definition.

class Dog:
    """
    Represents a dog with a name.
    """

    def __init__(self, name: str) -> None:
        """
        Initializes a Dog object with the given name.
        """
        self._name: str = name

    # Other methods for the Dog class

Logical refactoring

If the same lines of code appear in multiple places, reduce redundancy by refactoring.
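
As one possible illustration, suppose two functions both needed to format a name the same way. Rather than repeating the formatting code in each, it can be moved into a shared helper. All function names below are hypothetical.

```python
def format_name(first_name: str, last_name: str) -> str:
    """
    Returns the name as "Last, First" with each part capitalized.
    """
    return last_name.capitalize() + ", " + first_name.capitalize()


def greeting(first_name: str, last_name: str) -> str:
    """
    Returns a greeting for the person with the given name.
    """
    return "Hello, " + format_name(first_name, last_name) + "!"


def farewell(first_name: str, last_name: str) -> str:
    """
    Returns a farewell for the person with the given name.
    """
    return "Goodbye, " + format_name(first_name, last_name) + "!"
```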

Conditional logic

Boolean zen

Avoid unnecessary comparisons to True and False. Remember to negate boolean values with the not operator.
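
A brief sketch of boolean zen, using hypothetical helper functions:

```python
def is_even(number: int) -> bool:
    """
    Returns True if number is even and False otherwise.
    """
    # Return the comparison directly instead of using if/else
    # to return True or False.
    return number % 2 == 0


def describe(number: int) -> str:
    """
    Returns "even" if number is even and "odd" otherwise.
    """
    # Avoid: if is_even(number) == True:
    if is_even(number):
        return "even"
    return "odd"


def is_odd(number: int) -> bool:
    """
    Returns True if number is odd and False otherwise.
    """
    # Avoid: is_even(number) == False
    return not is_even(number)
```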

Loop zen

Choose loop bounds or loop conditions that generalize code the best.

Computations that only need to happen once should not be recomputed inside a loop.

Avoid unnecessary looping.
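
One way the second point could look in code is sketched below. The function count_above_average is hypothetical; the average never changes during the loop, so it is computed once beforehand.

```python
def count_above_average(values: list[float]) -> int:
    """
    Returns how many of the given values are strictly greater than
    their average. Assumes values is not empty.
    """
    # Computed once, before the loop, because it never changes.
    average = sum(values) / len(values)

    count = 0
    for value in values:
        # Avoid: recomputing sum(values) / len(values) on every iteration.
        if value > average:
            count += 1
    return count
```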

Redundant cases

Avoid redundant cases. Before introducing a special case, think about whether a general case could compute the same result.
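
For example, in the hypothetical function below, a special case for an empty list would be redundant because the general loop already produces the right answer:

```python
def total_length(words: list[str]) -> int:
    """
    Returns the total number of characters across all the given words.
    """
    # A special case such as "if len(words) == 0: return 0" is not needed:
    # the loop body never runs for an empty list, so total stays 0.
    total = 0
    for word in words:
        total += len(word)
    return total
```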

Program design

Global variables

Only constants should be defined as global variables in the top level of indentation.

Argument immutability

Unless it is an explicit part of the specification, avoid modifying the arguments passed to your functions. While many built-in types, such as int, float, bool, str, and tuple, are immutable, other common types are mutable, such as lists, dictionaries, DataFrames, and numpy arrays.
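
A minimal sketch of leaving a list argument unmodified (the function name with_item_appended is hypothetical):

```python
def with_item_appended(items: list[int], new_item: int) -> list[int]:
    """
    Returns a new list containing the given items followed by new_item.
    The given list is not modified.
    """
    # Avoid: items.append(new_item), which would change the caller's list.
    return items + [new_item]


original = [1, 2]
extended = with_item_appended(original, 3)
```

After this code runs, original still contains only 1 and 2, while extended contains 1, 2, and 3.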

Private fields

In class definitions, all fields should be initialized in the __init__ method. Fields should be declared private by prefixing their names with an underscore _. Rather than self.field_name, use self._field_name.

Avoid unnecessary fields. The more fields in a class definition, the more difficult it becomes to maintain and reason about code. Additionally, each instance of the class will also consume more memory. Watch for fields that are only used in one public method or recomputed/cleared every call to a specific method. These can likely be simplified to local variables or function parameters. Deciding on what fields to include can be tricky: it might be necessary to make something a field to improve efficiency at the cost of simplicity.

Avoid accessing private fields from outside the class except for testing purposes. To allow client access to a private field, declare a public “getter” method that returns the value of the field and use it instead.
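
Building on the Dog class shown earlier, a getter might be sketched as follows. The method name get_name is an assumption for illustration.

```python
class Dog:
    """
    Represents a dog with a name.
    """

    def __init__(self, name: str) -> None:
        """
        Initializes a Dog object with the given name.
        """
        self._name: str = name

    def get_name(self) -> str:
        """
        Returns the name of this dog.
        """
        return self._name


dog = Dog("Rex")
# Preferred: use the getter rather than reading dog._name directly.
dog_name = dog.get_name()
```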

Private methods

“Helper” methods should be declared private by prefixing their names with an underscore, such as _private_method. Avoid calling private methods from outside the class except for testing purposes.
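
A short sketch of a private helper method, using a hypothetical Temperature class:

```python
class Temperature:
    """
    Represents a temperature measured in degrees Celsius.
    """

    def __init__(self, celsius: float) -> None:
        """
        Initializes a Temperature object with the given degrees Celsius.
        """
        self._celsius: float = celsius

    def _to_fahrenheit(self) -> float:
        """
        Returns this temperature converted to degrees Fahrenheit.
        """
        return self._celsius * 9 / 5 + 32

    def describe(self) -> str:
        """
        Returns a description of this temperature in degrees Fahrenheit.
        """
        # The private helper is only called from inside the class.
        return str(self._to_fahrenheit()) + " degrees Fahrenheit"
```

Clients would call describe and never call _to_fahrenheit directly.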

Pandas zen

When working with pandas objects, avoid loops since pandas methods are often much faster to run.
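
As a rough illustration, a total that could be accumulated with a loop over rows can usually be computed with column operations instead. The DataFrame contents and column names below are made up for this sketch.

```python
import pandas as pd

# Hypothetical data used only for illustration.
data = pd.DataFrame({'price': [2.0, 3.5, 1.5], 'quantity': [3, 2, 4]})

# Avoid: looping over the rows and adding price * quantity to a total.
# Prefer: multiply the two columns element-wise, then sum the result.
total_cost = (data['price'] * data['quantity']).sum()
```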

Notebook zen

When working in a Jupyter Notebook, ensure that each code cell represents a logical step in a procedure. Avoid writing all the code in a single code cell and instead spread code out across multiple code cells with Markdown cells to explain each step.

The notebook should be top-to-bottom executable: after restarting the Python kernel, we can reproduce your outputs by running all cells from the top of the notebook to the bottom of the notebook.

Unnecessary computation

Avoid computing unnecessary values.

Fit and finish

Remove all debugging print statements and fix all warnings and errors.