Quality

How to assess and evaluate code quality.

Submissions should pass flake8 and meet both the housekeeping and code quality guidelines.

Housekeeping

No global variables
No global variables except for constants.
No debugging print statements
Remove debugging print statements.
No warnings or errors
Fix all program warnings or errors.

Code quality

  1. Naming
    1. Variable names
    2. Function names
    3. Class names
    4. Constant names
    5. File names
  2. Whitespace
    1. Indentation
    2. Blank lines
    3. Operators
    4. Function calls
  3. Line length
  4. Main method pattern
  5. Documentation
  6. Class definitions
  7. Refactoring
    1. Conditional logic
    2. Boolean zen
    3. Loop zen
    4. Pandas zen
    5. Redundant cases
    6. Unnecessary computation

Naming

Other programming languages such as Java have different conventions for naming. Since we are using Python in this class, use the Python naming conventions.

Variable names

Variable names should be descriptive, concise, and lowercase with words separated by underscores, as in snake_case. Avoid single-letter names because a single letter usually doesn’t describe the value a variable holds very well; loop variables are the allowable exception. Avoid Python keywords, such as class, import, or return, and names of built-in functions or types, such as print or len.

Good
factor
total_weight
data1        # Okay to have a number right after a word
Bad
x            # Most likely not descriptive enough,
             # except for something like an (x, y) coordinate
totalWeight  # Not using snake_case
NewImage     # Not using snake_case
newresult    # Not separating words with underscores
return       # Python keyword
len          # Python built-in function
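
To see why built-in names are off limits, here is a minimal sketch (the function name shadowing_demo is hypothetical, made up for this illustration). It shows that a variable named len hides the built-in function for the rest of its scope, so a later call to len fails.

```python
def shadowing_demo():
    """
    Assigns a number to the name len, then tries to call len.
    Returns the resulting error message.
    """
    len = 3  # Bad: this local variable now hides the built-in len
    try:
        len([1, 2, 3])
    except TypeError as error:
        return str(error)


print(shadowing_demo())
```

Renaming the variable to something descriptive, such as num_rows, avoids the problem entirely.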

Function names

Function names should be concise, descriptive, and lowercase with words separated by underscores as in snake_case. As with variable names, avoid names of built-in Python functions. We usually specify the function names in the specification, so make sure to name your functions accordingly.

Good
get_height(person)
plot_population(df)
Bad
setScore(record)  # not using snake_case
max()             # will overwrite the built-in max function in Python

When writing test functions, the name of your test function should clearly indicate which function it is testing. For example, a test for the function add_two_numbers might be named test_add_two_numbers.
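
As a rough illustration of this naming convention, the sketch below pairs a simplified add_two_numbers with its test. It uses plain assert statements as an assumption; your assignment may provide its own comparison helper instead.

```python
def add_two_numbers(a, b):
    """
    Returns the sum of the two given numbers a and b.
    """
    return a + b


def test_add_two_numbers():
    """
    Tests the function add_two_numbers.
    """
    assert add_two_numbers(1, 2) == 3
    assert add_two_numbers(-1, 1) == 0


test_add_two_numbers()
```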

Class names

Class names should be in CapitalCase (also known as CapWords or PascalCase).

Good
University
DataFrame
EdPost
Bad
dataFrame
ed_post

Constant names

Constant names should be in ALL_CAPITAL_CASE_WITH_UNDERSCORES.

Constants are values defined outside the function declarations whose values are not meant to be changed. Constants usually replace “magic values” to make code more readable. Constants should only represent simple values, not large objects such as dataframes. Constants are optional in test programs.

Good
import pandas as pd

DATA = "data.csv"


def main():
    data = pd.read_csv(DATA)
    # Do something with data


if __name__ == '__main__':
    main()
Bad
DATA = pd.read_csv("data.csv")
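
For the “magic values” mentioned above, here is a minimal sketch of a constant replacing a bare number. The names SECONDS_PER_MINUTE and to_seconds are hypothetical and exist only for this illustration.

```python
SECONDS_PER_MINUTE = 60


def to_seconds(minutes):
    """
    Returns the number of seconds in the given number of minutes.
    """
    # Without the constant, this line would read minutes * 60,
    # leaving the reader to guess what 60 means
    return minutes * SECONDS_PER_MINUTE


print(to_seconds(3))
```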

File names

File or module names should be in snake_case, such as main.py or ed_post.py.

Whitespace

flake8 checks all of the whitespace requirements below.

Indentation

Unlike other languages that use explicit delimiters to determine what goes inside a loop or function (such as {} in Java), Python only uses indentation. Indentation affects the behavior of programs. IndentationError: unexpected indent means there is an error in code indentation.
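
The sketch below illustrates how indentation can change behavior without causing any error. Both functions are hypothetical examples; the only difference between them is how far the return statement is indented.

```python
def count_positive(nums):
    """
    Returns the number of positive values in the given list nums.
    """
    count = 0
    for num in nums:
        if num > 0:
            count += 1
    return count  # Outside the loop: runs after every value is checked


def count_positive_misindented(nums):
    """
    Intended to do the same, but the return is indented one level too far.
    """
    count = 0
    for num in nums:
        if num > 0:
            count += 1
        return count  # Inside the loop: runs after the first value only


print(count_positive([-1, 2, 3]))
print(count_positive_misindented([-1, 2, 3]))
```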

Blank lines

Leave two blank lines separating code for different function definitions and other code. The only exception is for methods defined within the same class, which are separated by a single blank line. Try to minimize blank lines within function definitions except when separating complex chunks of code for readability.

Good
# Other code


def the_first_function():
    # Implementation for the first function
    pass


def the_second_function():
    # Implementation for the second function
    pass


# Other code
Good
def compute_avg(x, y, z):
    """
    Sums the given three numbers and returns the average.
    """
    sum_val = x + y + z

    # Compute and return the average value
    result = sum_val / 3.0
    return result


# Other code

Operators

Include spaces between operators and other elements in an expression. Arithmetic operators include +, -, *, /, and **. Comparison operators include ==, <=, >=, <, and >. Use exactly one space on each side of an operator to avoid unnecessary whitespace. The exception is parentheses, which can be directly adjacent to whatever they enclose.

Good
x + y
(total ** 2) + 4 * val - 1
x * (4 + 6)
b + math.sqrt(4 * max_val)
Bad
x+y                             # not enough spaces
(total**2)+4*val-1              # not enough spaces
x * ( 4 + 6 )                   # unnecessary spaces around parentheses
b +  math.sqrt( 4 *   max_val)  # inconsistent spacing

This does not apply when specifying default parameter values or keyword arguments. Avoid spaces around the assignment operator = for those cases.

Good
def add_three_nums(a, b, c=10):
    return a + b + c


add_three_nums(b=20, a=20)

Function calls

Include spaces between arguments in a function call. Avoid space(s) between the function name and its argument list. Adding space(s) incorrectly suggests that the parentheses are for grouping an expression when they actually indicate a function call.

Good
x = math.sqrt(n)
range_vals = range(n, 4)
Bad
x = math.sqrt (n)        # too much space before the parenthesis
range_vals = range(n,4)  # no spaces between arguments

Line length

Each line should contain no more than 79 characters. Avoid long lines with the following strategies.

Function calls
some_function(first_arg="This is a function",
              second_arg="With many arguments",
              third_arg="indent until everything lines up")
Expressions
total = (first_num + second_num + third_num
         + fourth_num + fifth_num)
Strings
print("For very very very very very very long strings, "
      "this is how we might break it up to two lines")
DataFrames
# This filter is too long and combines multiple steps
data = data[(data['primary_color'] == 'Magenta') & (data['size'] != 'Large')]

# Instead, assign names to each step of the filter
is_magenta = data['primary_color'] == 'Magenta'
is_not_large = data['size'] != 'Large'

data = data[is_magenta & is_not_large]

Main method pattern

Runnable Python programs should follow the main method pattern.

# Implementation for functions specified in the HW spec


def main():
    # Do the actual work here


if __name__ == '__main__':
    main()
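
As a concrete instance of the pattern, here is a small sketch with a hypothetical function count_words. When the file is run directly, __name__ is '__main__' and main() runs; when another file imports it (for example, a test program), main() does not run.

```python
def count_words(text):
    """
    Returns the number of whitespace-separated words in the given text.
    """
    return len(text.split())


def main():
    print(count_words("the quick brown fox"))


if __name__ == '__main__':
    main()
```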

Documentation

Include a header comment at the top of a file with your name, section, and a brief description of the program in docstring """ or ''' format.

Each function should contain a docstring describing what the function does right below the function definition. It should describe any parameters the function takes and any values it returns. If the code needs to handle special cases (such as case sensitivity or special return values for certain inputs), include that in the comment as well. The main method does not require a comment.

Long blocks of particularly complex code can be explained by including a comment # on the preceding line(s).

"""
Hunter Schafer
CSE 163 AX
This program implements a function that adds two numbers
"""


def add_two_numbers(a, b, return_zero=False):
    """
    This function returns the sum of the two given numbers a, b
    if return_zero is False. Otherwise returns 0.
    return_zero has False as its default value.
    """
    if return_zero:
        return 0
    else:
        return a + b

Avoid unnecessary information or too much implementation detail, such as “initialize variable”, “nested for loop”, or “increment count”. Function comments should describe what result the function achieves rather than how it achieves that result.

Bad
# Note that the usage of a set and a loop are implementation details,
# as in they describe HOW the method works.
# Returning the length of the set doesn't mean much to the clients.
def count_unique_characters(filename):
    """
    Opens the given file, loops through each character and adds them
    to a set. Returns the length of the set.
    """
    result = set()
    with open(filename) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)
Good
def count_unique_characters(filename):
    """
    Takes in a filename and returns the number of unique characters within
    the given file.
    """
    result = set()
    with open(filename) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)

Consider what a client would find useful. Clients only care about the function declaration and comment, so the comment should contain all the information needed to use the function properly. The exact details of how the function solves the problem are irrelevant, but the client needs to know the purpose of the parameters and the expected behavior.

The documentation for test functions can be relatively simple. For example, the test function test_add_two_numbers could include the docstring, “Tests the function add_two_numbers”.

Class definitions

When defining classes in Python, follow all code quality guidelines in addition to these extra requirements.

Include a docstring comment describing the class right below the class definition.

class EdPost:
    """
    This class represents an Ed post with a title, a tag, and a list
    of comments
    """
    # Methods for EdPost

Fields should be declared private by prefixing their names with an underscore _. Rather than self.field_name, use self._field_name.

Avoid unnecessary fields. The more fields in a class definition, the more difficult it becomes to maintain and reason about code. Additionally, each instance of the class will also consume more memory. Watch for fields that are only used in one public method or recomputed/cleared every call to a specific method. These can likely be simplified to local variables or function parameters. Deciding on what fields to include can be tricky: it might be necessary to make something a field to improve efficiency at the cost of simplicity.

Avoid accessing private fields from outside the class except for testing purposes. Declare a public “getter” method that returns the value of the field and use it instead.

“Helper” methods should be declared private by prefixing their names with an underscore, such as _private_method. Avoid calling private methods from outside the class except for testing purposes.
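
The class guidelines above can be sketched together as follows. This is a simplified EdPost with only a title and comments; the method names get_title, add_comment, get_comment_count, and _clean are hypothetical choices for this illustration.

```python
class EdPost:
    """
    This class represents an Ed post with a title and a list of comments.
    """

    def __init__(self, title):
        """
        Initializes a new EdPost with the given title and no comments.
        """
        self._title = title  # Private field: note the leading underscore
        self._comments = []

    def get_title(self):
        """
        Returns the title of this post.
        """
        return self._title

    def add_comment(self, comment):
        """
        Adds the given comment to this post, ignoring any whitespace
        at the start or end of the comment.
        """
        self._comments.append(self._clean(comment))

    def get_comment_count(self):
        """
        Returns the number of comments on this post.
        """
        return len(self._comments)

    def _clean(self, text):
        """
        Private helper that returns the text without surrounding whitespace.
        """
        return text.strip()


post = EdPost("Question about HW1")
post.add_comment("  Same question here!  ")
print(post.get_title(), post.get_comment_count())
```

Client code calls get_title() rather than reading post._title, so the class can change how it stores the title without breaking its clients.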

Refactoring

If the same lines of code appear in multiple places, reduce redundancy by refactoring.
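
One common way to do this is to move the repeated lines into a helper function. The sketch below is a hypothetical example in which the averaging logic is written once and reused.

```python
def average(nums):
    """
    Returns the average of the given non-empty list of numbers nums.
    """
    return sum(nums) / len(nums)


def summarize_scores(quiz_scores, exam_scores):
    """
    Takes in two non-empty lists of scores and returns a tuple in the
    format of (quiz average, exam average).
    """
    # Without the helper, sum(...) / len(...) would be written out twice
    return average(quiz_scores), average(exam_scores)


print(summarize_scores([1, 2, 3], [10, 20]))
```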

Conditional logic

Bad
# Note that there are repeated lines of logic that actually always happen,
# instead of conditionally like how our structure is set up. We can factor
# these out to simplify and clean our code.
if x % 2 == 0:
    print("Hello!")
    print("I love even numbers too.")
    print("See you later!")
else:
    print("Hello!")
    print("I don't like even numbers either.")
    print("See you later!")
Good
print("Hello!")
if x % 2 == 0:
    print("I love even numbers too.")
else:
    print("I don't like even numbers either.")
print("See you later!")

Boolean zen

Avoid unnecessary comparisons to True and False. Remember to negate boolean values with the not operator.

Bad
if is_sunny:
    return True
else:
    return False
Good
return is_sunny
Bad
if is_sunny == True:
    go_hiking()
Good
if is_sunny:
    go_hiking()
Bad
return is_sunny == False
Good
return not is_sunny
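
The same idea extends to compound conditions. In this minimal sketch (the function and parameter names are hypothetical), the boolean expression is returned directly instead of being wrapped in an if/else that returns True or False.

```python
def can_go_hiking(is_sunny, is_raining):
    """
    Returns True if it is sunny and not raining, and False otherwise.
    """
    return is_sunny and not is_raining


print(can_go_hiking(True, False))
```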

Loop zen

Choose loop bounds or loop conditions that generalize code the best.

Bad
nums = [1, 2, 3]
total = 0
total += nums[0]
for i in range(1, len(nums)):
    total += nums[i]
Good
nums = [1, 2, 3]
total = 0
for i in range(len(nums)):
    total += nums[i]

Computations that only need to happen once should not be recomputed inside a loop.

Bad
# Note that the mean of the whole list remains unchanged for each loop
# iteration, so it should be outside the loop to avoid unnecessary computation
def demean(nums):
    """
    Takes in a list of numbers nums and returns a new list with the mean
    value of the original list subtracted from each corresponding value
    """
    result = []
    for i in range(len(nums)):
        mean = sum(nums) / len(nums)
        result.append(nums[i] - mean)
    return result
Good
def demean(nums):
    result = []
    mean = sum(nums) / len(nums)
    for i in range(len(nums)):
        result.append(nums[i] - mean)
    return result

Avoid unnecessary looping.

Bad
# Note that the total score for each sex could be computed at the same time
def get_total_for_each_sex(data):
    """
    Takes in data containing scores of students as a list of dictionaries.
    Returns a tuple containing the total score for each sex in the format
    of (male total, female total).
    """
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
    for row in data:
        if row['sex'] == 'F':
            female_total += row['score']
    return male_total, female_total
Good
def get_total_for_each_sex(data):
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
        elif row['sex'] == 'F':
            female_total += row['score']
    return male_total, female_total

Pandas zen

When working with pandas objects, avoid loops since pandas methods are much better optimized.

Bad
total = 0
for _, row in data.iterrows():
    total += row['score']
Good
total = data['score'].sum()

Redundant cases

Avoid redundant cases. Before introducing a special case, think about whether a general case could compute the same result.

Bad
# Note the first case is unnecessary since if a == 0 and b != 0, a / b will
# still return 0.
def divide(a, b):
    """
    Takes in two numbers a, b and returns the result of a / b.
    Returns 0 if b == 0.
    """
    if a == 0:
        return 0
    elif b == 0:
        return 0
    else:
        return a / b
Good
def divide(a, b):
    if b == 0:
        return 0
    else:
        return a / b
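
One way to convince yourself that a special case is redundant is to compare both versions on a handful of inputs. The sketch below is only an illustration; the name divide_with_extra_case is hypothetical and stands in for the Bad version above.

```python
def divide_with_extra_case(a, b):
    """
    Takes in two numbers a, b and returns the result of a / b.
    Returns 0 if b == 0. Includes the redundant a == 0 case.
    """
    if a == 0:
        return 0
    elif b == 0:
        return 0
    else:
        return a / b


def divide(a, b):
    """
    Takes in two numbers a, b and returns the result of a / b.
    Returns 0 if b == 0.
    """
    if b == 0:
        return 0
    else:
        return a / b


# Both versions agree on every pair, including a == 0 and b == 0
for a in [0, 1, -4, 2.5]:
    for b in [0, 1, -2, 0.5]:
        assert divide_with_extra_case(a, b) == divide(a, b)
```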

Unnecessary computation

Avoid precomputing values unless necessary.

Bad
def get_average_score_for_sex(data, sex):
    """
    Takes in a DataFrame data containing scores for students (assumed to
    be in a 'score' column) and a sex and returns the average score for the
    given sex. Assume sex only takes the value 'M' (male) or 'F' (female).
    """
    male_avg = data[data['sex'] == 'M']['score'].mean()
    female_avg = data[data['sex'] == 'F']['score'].mean()
    if sex == 'M':
        return male_avg
    else:
        return female_avg
Good
def get_average_score_for_sex(data, sex):
    return data[data['sex'] == sex]['score'].mean()