CSE 163, Winter 2021: Code Quality Guide

Some of the requirements are checked by flake8, while others require you to double-check them manually. All the code files you submit are expected to pass flake8 and follow the code quality guidelines outlined below.

Naming Conventions

Note: Other programming languages (e.g. Java) have different conventions for naming. Since we are using Python in this class, you are expected to use the Python conventions.

Variable Names

Your variable names should be descriptive, concise, and lowercase with words separated by underscores (snake_case). Try not to use single-letter names for variables, because a single letter usually doesn't describe the value stored in a variable very well; single letters are fine for loop variables, though. Python keywords, such as class, import, or return, cannot be used as names at all, and you should also avoid the names of built-in functions or types, such as print or len (those will be highlighted in the code editor).

A list of keywords can be found by typing the following into the Python interpreter:

import keyword
keyword.kwlist

Examples:

# Good variable names
factor
total_weight
data1         # Okay to have a number right after a word
        
# Bad variable names
x             # Most likely not descriptive enough,
              # except for something like an x, y coordinate
totalWeight   # Not using snake_case
NewImage      # Not using snake_case
newresult     # Not separating words with underscores
return        # Python keyword
len           # Python built-in function
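
As a rough way to check a candidate name before using it, the sketch below combines keyword.iskeyword with a lookup in the builtins module. The helper name shadows_python_name is hypothetical and is not part of the course materials.

```python
import builtins
import keyword


def shadows_python_name(name):
    """
    Returns True if the given name is a Python keyword or the name of a
    built-in function or type, and False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    return keyword.iskeyword(name) or hasattr(builtins, name)


print(shadows_python_name('return'))        # True (keyword)
print(shadows_python_name('len'))           # True (built-in function)
print(shadows_python_name('total_weight'))  # False (safe to use)
```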

Function Names

Your function names should be concise, descriptive, and lowercase with words separated by underscores (snake_case). As with variable names, you should avoid using the names of built-in Python functions. We usually specify the function names in the spec, and you should make sure that you name your functions accordingly.

In some of the assignments, we ask you to write test functions to test your code. The name of your test function should clearly indicate which function it is testing.

Examples:

# Good function names
get_height(person)
plot_population(df)

# Bad function names
setScore(record)   # not using snake_case
max()              # will overwrite the built-in max function in Python

Class Names

Your class names should be in CapitalCase (each word capitalized, with no underscores).

Examples:

# Good class names
University
DataFrame
EdPost

# Bad class names
dataFrame
ed_post

File Names

Your file/module names should be in snake_case. Examples: main.py, ed_post.py.

Constant Names

Your constant names should be in ALL_CAPITAL_CASE_WITH_UNDERSCORES. Constants are values defined outside of any function, and their values are not meant to be changed. People usually define constants with descriptive names to replace "magic values" in their code and make it more readable. Note that constants should be light-weight: they are not supposed to store big objects (e.g. DataFrames). In CSE 163, using constants in test files is optional.

Examples:

import pandas as pd

# Bad: constants should be light-weight
DATA = pd.read_csv("data.csv")

# Good
DATA = "data.csv"


def main():
    data = pd.read_csv(DATA)
    # Do something with data


if __name__ == '__main__':
    main()

White Space

Note: most of the whitespace requirements below are checked by flake8; you will receive a warning if you are not meeting some of the requirements.

Indentation

Unlike other languages that use explicit delimiters (like {} in Java) to determine what goes inside a loop or function, Python only uses indentation. It is extremely important to indent your code properly, because incorrect indentation can lead to unexpected bugs in your program. If you see an error saying IndentationError: unexpected indent, it means there is an error in your indentation.
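
To see how indentation alone can change behavior, consider the following sketch. The two hypothetical functions differ only in how far one return statement is indented.

```python
def sum_all(nums):
    """
    Returns the sum of all numbers in the given list.
    """
    total = 0
    for num in nums:
        total += num
    return total  # runs once, after the loop finishes


def sum_first_only(nums):
    """
    Intended to sum the list, but returns after the first element
    because a return statement is indented inside the loop.
    """
    total = 0
    for num in nums:
        total += num
        return total  # runs during the first iteration
    return total


print(sum_all([1, 2, 3]))         # 6
print(sum_first_only([1, 2, 3]))  # 1
```

Neither version raises an error, which is why this kind of bug is easy to miss.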

Blank Lines

In general, there should be two blank lines separating code for different functions and function definitions from other code. The only exception is methods defined within the same class (see Classes and Objects below). Try to minimize blank lines within function definitions, except when separating complex chunks of code/logic to improve readability.

Good Example 1 (two blank lines separating functions):

# Other code


def the_first_function():
    # Implementation for the first function


def the_second_function():
    # Implementation for the second function


# Other code

Good Example 2 (one blank line separating "complex" chunks of code):

def compute_avg(x, y, z):
    """
    Sums the given three numbers and returns the average.
    """
    sum_val = x + y + z

    # compute and return the average value
    result = sum_val / 3.0
    return result

    
# other code

Space Between Operators

Include spaces between mathematical and comparison operators and other elements in an expression. Mathematical operators include +, -, *, /, and **. Comparison operators include ==, !=, <=, >=, <, and >. Limit space delimiters to 1 space, to avoid unnecessary whitespace. The exception to this is parentheses, which can be directly adjacent to whatever they are enclosing.

Examples:

# Good
x + y
(total ** 2) + 4 * val - 1
x * (4 + 6)
b + math.sqrt(4 * max_val)

# Bad
x+y                                  # not enough spaces
(total**2)+4*val-1                   # not enough spaces
x * ( 4 + 6 )                        # unnecessary spaces around parentheses
b +   math.sqrt( 4 *     max_val)    # inconsistent spacing

Note that this convention does not apply when specifying the value for a default parameter or passing in parameters by name when calling a function. In general, you should not put spaces around the = in those cases.

def add_three_nums(a, b, c=10):
    return a + b + c


add_three_nums(b=20, a=20, c=30)

Space Between Function Names and Parameter List

Avoid adding extra space(s) between a function name and its associated parameter list. Using the space suggests--incorrectly--that the parentheses are for grouping an expression, when in fact they are for calling the function.

You should, however, include spaces between individual parameters in the parameter list. This makes your function definitions and calls more readable.

Examples:

# Good
x = math.sqrt(n)
range_vals = range(n, 4)

# Bad
x = math.sqrt (n)        # too much space before the parenthesis
range_vals = range(n,4)  # no spaces between parameters

Line Length

According to flake8, the maximum number of characters that you should have on a given line is 79 characters. You should try to avoid writing code with long lines, but here are some common ways to break a long line:

# Calling a function with many arguments
some_function(first_arg="This is a function",
              second_arg="With many arguments",
              third_arg="indent until everything lines up")

# Breaking a long expression
# (use \ at the end and indent once on the second line)
total = first_num + second_num + third_num + \
    fourth_num

# Breaking a long string
print("When you have a very very very long string, "
      "this is how you could break it properly")

# Breaking a long DataFrame filter into separate variables
# This line is too long and less readable
data = data[(data['primary_color'] == 'Magenta') & (data['size'] != 'Large')]

# You can do the following instead
is_magenta = data['primary_color'] == 'Magenta'
is_not_large = data['size'] != 'Large'

data = data[is_magenta & is_not_large]
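
Another option, not listed above, is to wrap a long expression in parentheses so that it can continue across lines without a trailing backslash. This should also pass flake8 with its default settings. The variable names and values below are placeholders chosen for this sketch.

```python
first_num = 1
second_num = 2
third_num = 3
fourth_num = 4

# Python keeps reading until the closing parenthesis,
# so no \ is needed at the end of the first line
total = (first_num + second_num + third_num +
         fourth_num)
print(total)  # 10
```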

Main Method Pattern

For most of the assignments, when we say that certain files should use the main method pattern, it means those files should follow the structure below:

# Implementation for functions specified in the HW spec


def main():
    # Code calling all functions that you implemented


if __name__ == '__main__':
    main()
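
As a concrete illustration, here is a minimal sketch of a complete file that follows this structure. The function add_two_numbers is a stand-in for whatever the spec asks you to implement. Because of the check on __name__, main runs only when the file is executed directly, not when it is imported by another file such as a test program.

```python
"""
A small example of the main method pattern.
"""


def add_two_numbers(a, b):
    """
    Returns the sum of the two given numbers a and b.
    """
    return a + b


def main():
    print(add_two_numbers(1, 2))  # 3


if __name__ == '__main__':
    main()
```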

Documentation

For every Python file you write, you should include a header comment at the top of the file with your name, section, and a brief description of what the program in the file does. The header comment should be in doc-string (""" or ''') format.

Each function should contain a doc-string describing what the function does, right below the function definition. It should describe any parameters the function takes and values it returns (if any). If the spec requires you to handle some special cases (like case sensitivity, or special return values for a certain input), you should mention that in the comment as well. It is okay to not include a comment for your main method.

Also, long blocks of code with a particular purpose or small bits of particularly complex code could be labeled/explained by a comment (using #) on the preceding line(s) to help with readability.

Example:

"""
Hunter Schafer
CSE 163 AX
This program implements a function that adds two numbers
"""


def add_two_numbers(a, b, return_zero=False):
    """
    This function returns the sum of the two given numbers a, b
    if return_zero is False. Otherwise returns 0.
    return_zero has False as its default value.
    """
    if return_zero:
        return 0
    else:
        return a + b

Try to avoid comments that state unnecessary information or contain too much implementation detail (e.g. "initialize variable" or "nested for loop" or "increment count"). Your function comments should describe what the function does (its behavior), not how it does it.

Example:

# Bad
# Note that the usage of a set and a loop are implementation details,
# as in they tell you HOW the method works.
# Returning the length of the set doesn't mean much to the clients.
def count_unique_characters(file_name):
    """
    Opens the given file, loops through each character and adds them
    to a set. Returns the length of the set.
    """
    result = set()
    with open(file_name) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)


# Good
def count_unique_characters(file_name):
    """
    Takes in a file name and returns the number of unique characters within
    the given file.
    """
    result = set()
    with open(file_name) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)

One way to think about it is if you are the client of the code and you can only see the function declaration and the comment, the comment should contain all the necessary information that you need in order to properly use the function. You probably won't care whether the function is using a for loop, while loop, or just some if-else statements, but you will need to know the input parameters and the expected behavior of the function, especially under special cases.

Testing

In some of the assignments, we ask you to write a test program to test your implementation. Your test files should meet the same style requirements as specified above, including:

  • When we run your code, it should produce no errors or warnings. The point of writing a test program is that you can use it to verify the correctness of your code, so you should make sure to run it before turning in your assignments.
  • There is a file header comment at the top of the file with your name, section, and a brief description of what that program does.
  • Use the main method pattern.
  • Good naming convention of variables and functions (snake_case).
  • Each of the test functions should have a descriptive name that indicates which function is being tested. For example, if it is testing the function add_two_numbers, the test function should be named test_add_two_numbers.
  • Each of the test functions should be commented in doc-string format. Unlike comments for the main solution of your take-home assessment, the comments for test functions can be relatively simple. For example, if the function is testing add_two_numbers, the comment can simply be "Tests the function add_two_numbers".
  • Each test function should contain the required number of tests from the spec and additional tests that you come up with. A single test is one call to assert_equals. You are highly encouraged to add more test cases to test your functions more comprehensively.
  • All test functions should be called from main.
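
Putting the points above together, a test file might look roughly like the sketch below. The local assert_equals is a simplified stand-in written for this example so that it runs on its own; in an assignment you would use the course-provided version, and add_two_numbers would be imported from your solution file instead of being defined here. The argument order (expected, received) is an assumption.

```python
"""
Your Name
CSE 163 AX
Tests the function add_two_numbers.
"""


def assert_equals(expected, received):
    """
    Simplified stand-in for the course-provided assert_equals.
    Raises an AssertionError if the two values differ.
    """
    assert expected == received, \
        f'Expected {expected}, but received {received}'


def add_two_numbers(a, b):
    """
    Returns the sum of the two given numbers a and b.
    """
    return a + b


def test_add_two_numbers():
    """
    Tests the function add_two_numbers.
    """
    assert_equals(3, add_two_numbers(1, 2))
    assert_equals(0, add_two_numbers(-1, 1))
    assert_equals(2.5, add_two_numbers(1, 1.5))


def main():
    test_add_two_numbers()


if __name__ == '__main__':
    main()
```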

Classes and Objects

When implementing classes and objects in Python, you should follow the style requirements as specified above with some extra requirements:

Class Comment

When you are implementing a class, you should also include a doc-string comment describing the class right below the class definition.

Example:

class EdPost:
    """
    This class represents an Ed post with a title, a tag, and a list
    of comments
    """
    # Methods for EdPost

Private Fields

All your fields should be declared as private. In Python, it means starting your field name with an underscore (_). For example, instead of self.field_name, you should always have self._field_name.

You should also avoid making extra fields. The more fields your class has, the more difficult it becomes to maintain and reason about your code. With more fields, each instance of the class will also take up more memory. When revising your code, watch out for fields that are only used in one public method, and/or that are recomputed / cleared every call to a specific method. These can likely be simplified to local variables within the scope of that public method / its private helper methods. Also watch out for redundant fields that can be inferred from the properties of other fields with a simple method call.

Designing a class always comes with its own set of tradeoffs, and deciding on what fields to include can be particularly tricky. In some situations, it might be necessary to make something a field to improve efficiency. You should try your best to choose an appropriate solution that matches what the spec emphasizes and requires.

You should not access private fields from outside the class, except for testing purposes. Instead, you can declare a public getter method that returns the value of the field and use that.

Private Methods

You might need some private helper methods when implementing your class and you should also name them starting with an underscore (e.g. _private_method). You should not use private methods from outside the class except for testing purposes.
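
The sketch below shows one way these conventions could fit together, reusing the EdPost example from above. The specific methods (get_title, add_comment, get_num_comments, _clean) are hypothetical and chosen only for illustration. Note that there is no separate field for the number of comments, since it can be inferred from the list of comments.

```python
class EdPost:
    """
    This class represents an Ed post with a title and a list of comments.
    """

    def __init__(self, title):
        """
        Creates a new EdPost with the given title and no comments.
        """
        self._title = title
        self._comments = []

    def get_title(self):
        """
        Returns the title of this post.
        """
        return self._title

    def add_comment(self, comment):
        """
        Adds the given comment to this post, ignoring leading and
        trailing whitespace in the comment.
        """
        self._comments.append(self._clean(comment))

    def get_num_comments(self):
        """
        Returns the number of comments on this post.
        """
        return len(self._comments)

    def _clean(self, text):
        """
        Private helper that returns the given text without leading or
        trailing whitespace.
        """
        return text.strip()


post = EdPost('HW3 question')
post.add_comment('  Same question here!  ')
print(post.get_title())         # HW3 question
print(post.get_num_comments())  # 1
```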

Imports

All packages that you need for completing the take-home assessments will be stated in the spec, and you should try to avoid importing extra packages since they might include advanced material that makes the problem trivial to solve (see Advanced Material below).

All import statements should be located at the top of each file, below your file header comment. You should also remove unused imports as warned by flake8.

Efficiency and Redundancy

In general, your code should avoid redundancy and unnecessary computation.

Factoring

If you have lines of code that appear in multiple places in your program, you should consider trying to cut down on redundancy with some kind of factoring. Don't write the same code again if you already have a function that performs that action. Just call the function.
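
As a small sketch of this idea, the repeated string-building logic below is written once in a function and then called wherever it is needed. The function name format_greeting and the greeting text are made up for this example.

```python
def format_greeting(name):
    """
    Takes in a name and returns a greeting message for that name.
    """
    return 'Hello, ' + name + '! Welcome to CSE 163.'


def main():
    # Without format_greeting, the string-building expression would
    # have to be copied once for every name
    print(format_greeting('Ada'))
    print(format_greeting('Grace'))


if __name__ == '__main__':
    main()
```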

if/else Factoring

# Bad
# Note that there are repeated lines of logic that actually always happen,
# instead of conditionally like how our structure is set up. We can factor
# these out to simplify and clean our code.
if x % 2 == 0:
    print("Hello!")
    print("I love even numbers too.")
    print("See you later!")
else:
    print("Hello!")
    print("I don't like even numbers either.")
    print("See you later!")

# Good
print("Hello!")
if x % 2 == 0:
    print("I love even numbers too.")
else:
    print("I don't like even numbers either.")
print("See you later!")

Boolean Zen

When working with bool values, you should treat them like the True and False that they are instead of comparing them with == and !=. Remember that you can use not to negate a boolean value.

# Bad Example 1
if is_sunny:
    return True
else:
    return False

# Good Example 1
return is_sunny


# Bad Example 2
if is_sunny == True:
    go_hiking()

# Good Example 2
if is_sunny:
    go_hiking()


# Bad Example 3
return is_sunny == False

# Good Example 3
return not is_sunny

Loop Zen

Loop Bounds

When writing loops, choose loop bounds or loop conditions that generalize the code best. For example, in the bad version below, the line that handles the first element before the for loop is unnecessary.

# Bad
nums = [1, 2, 3]
total = 0
total += nums[0]
for i in range(1, len(nums)):
    total += nums[i]

# Good (should just loop over the whole list instead)
total = 0
for i in range(len(nums)):
    total += nums[i]

Only Repeated Tasks

If you have something that only happens once, then don't put the code for it inside of your loop.

# Bad
# Note that the mean of the whole list remains unchanged for each loop iteration
# so it should be outside the loop to avoid unnecessary computation
def demean(nums):
    """
    Takes in a list of numbers nums and returns a new list with the mean
    value of the original list subtracted from each corresponding value
    """
    result = []
    for i in range(len(nums)):
        mean = sum(nums) / len(nums)
        result.append(nums[i] - mean)
    return result


# Good
def demean(nums):
    result = []
    mean = sum(nums) / len(nums)
    for i in range(len(nums)):
        result.append(nums[i] - mean)
    return result

Avoid Unnecessary Cases

Try to avoid making something a special case if unnecessary. Before you make something a special case, think about whether a more general operation in other cases could yield the same result. If so, you should combine those cases.

# Bad
# Note the first case is unnecessary since if a == 0 and b != 0, a / b will
# still return 0.
def divide(a, b):
    """
    Takes in two numbers a, b and returns the result of a / b.
    Returns 0 if b == 0.
    """
    if a == 0:
        return 0
    elif b == 0:
        return 0
    else:
        return a / b


# Good
def divide(a, b):
    if b == 0:
        return 0
    else:
        return a / b

Avoid Looping Over the Same Data Repeatedly

If there are values that you could compute within the same loop iteration, you should avoid writing an extra loop to compute them separately.

# Bad
# Note that the total score for each sex could be computed at the same time
def get_total_for_each_sex(data):
    """
    Takes in data containing scores of students as a list of dictionaries.
    Returns a tuple containing the total score for each sex in the format
    of (male total, female total).
    """
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
    for row in data:
        if row['sex'] == 'F':
            female_total += row['score']
    return male_total, female_total


# Good
def get_total_for_each_sex(data):
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
        elif row['sex'] == 'F':
            female_total += row['score']
    return male_total, female_total

Avoid Unnecessary Precomputation

Try to avoid precomputing values unless necessary, especially for functions taking in parameters; you should make use of the given parameter to compute the desired value.

# Bad
def get_average_score_for_sex(data, sex):
    """
    Takes in a DataFrame data containing scores for students and a sex and
    returns the average score for the given sex. Assume sex only takes the
    value 'M' (male) or 'F' (female).
    """
    male_avg = data[data['sex'] == 'M']['score'].mean()
    female_avg = data[data['sex'] == 'F']['score'].mean()
    if sex == 'M':
        return male_avg
    else:
        return female_avg


# Good
def get_average_score_for_sex(data, sex):
    return data[data['sex'] == sex]['score'].mean()

Avoid Looping with Pandas

When working with pandas, you should not use any loops since loops in Python are inefficient compared to built-in pandas functions. In general, if you feel like using loops when working with pandas, there is probably a way to solve it with the pandas functions introduced in the word bank we provided with the assignments.

# Bad
def get_sum(data):
    """
    Takes in a DataFrame data storing score information and returns the sum of
    all scores.
    """
    total = 0
    for score in data['score']:
        total += score
    return total


# Good
def get_sum(data):
    return data['score'].sum()

Miscellaneous

The following section contains some other miscellaneous requirements.

Global Variable Usage

No global variables are allowed in CSE 163. If you intend to use constants, you should name them following the naming convention specified above.
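
The sketch below illustrates the difference under made-up names: MAX_SCORE is a constant that is defined once and never reassigned, while the running count lives in a local variable and is returned to the caller instead of being stored in a global.

```python
MAX_SCORE = 100  # constant: defined once and never reassigned


def count_perfect_scores(scores):
    """
    Takes in a list of scores and returns how many of them are equal
    to MAX_SCORE.
    """
    count = 0  # local variable instead of a global counter
    for score in scores:
        if score == MAX_SCORE:
            count += 1
    return count


def main():
    print(count_perfect_scores([100, 85, 100]))  # 2


if __name__ == '__main__':
    main()
```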

Remove Debugging Print Statements

You should remove all debugging print statements from your final take-home assessment submission rather than commenting them out.

Fix All Warnings and Errors in Code

Your code should generate no warnings or errors when run. You should ask for help during office hours if you encounter any error or warning that you don't know how to resolve. You can also refer to this page for a list of common Python errors students encounter and their possible causes.

Avoid Modifying Input

Unless otherwise stated, you should avoid modifying the objects passed into your functions. Most of the built-in types (int, float, bool, str, tuple) are immutable, so you don't need to worry about those, but you should avoid modifying mutable data structures (lists of dictionaries, DataFrames, numpy arrays, etc.). Whether a data structure will be modified by a certain function call will be covered in lessons, and you can always ask clarifying questions on the discussion board if unsure.

# data is a DataFrame
data['new_col'] = ...   # caution: will change the DataFrame stored in data

# data is a list of dictionaries
data.append(...)        # will change the list

# data is a 2D numpy array
data[:, :] = ...        # will change the numpy array

# okay: won't change the original structure, only assigning a new value
# to store in data
data = ...
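
One common way to follow this guideline with lists is to build and return a new list rather than changing the one passed in. The function add_bonus below is a hypothetical example, not something from an assignment.

```python
def add_bonus(scores, bonus):
    """
    Takes in a list of numbers scores and a number bonus. Returns a new
    list with bonus added to each score. Does not modify scores.
    """
    result = []
    for score in scores:
        result.append(score + bonus)
    return result


original = [80, 90]
updated = add_bonus(original, 5)
print(original)  # [80, 90]
print(updated)   # [85, 95]
```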

Advanced Material

There is no automatic deduction for using some advanced feature or using material that we have not covered in class yet, but if it violates the restrictions of the assignment, it is possible you will lose points. It's not possible for us to list out every possible thing you can't use on the assignment, but we can say for sure that you are safe to use anything we have covered in class so far as long as it meets what the specification asks and you are appropriately using it as we showed in class.

For example, some things that are probably okay to use even though we didn't cover them:

  • Using the update method on the set class even though I didn't show it in lecture. It was clear we talked about sets and that you are allowed to use them on future assignments and if you found a method on them that does what you need, it's probably fine as long as it isn't violating some explicit restriction on that assignment.
  • Using something like a ternary operator in Python. This doesn't make a problem any easier, it's just syntax.
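
For reference, the two features mentioned above look roughly like this. The variable names and values are placeholders.

```python
# set.update adds every element of another collection to the set
seen = {'a', 'b'}
seen.update(['b', 'c'])
print(sorted(seen))  # ['a', 'b', 'c']

# A ternary (conditional) expression picks one of two values
x = 4
parity = 'even' if x % 2 == 0 else 'odd'
print(parity)  # even
```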

For example, some things that are probably not okay to use:

  • Importing some random library that can solve the problem we ask you to solve in one line.
  • If the problem says "don't use a loop" to solve it, it would not be appropriate to use some advanced programming concept like recursion to "get around" that restriction.

These are not allowed because they might make the problem trivially easy or violate the learning objective of the problem.

You should think about what the spec is asking you to do and as long as you are meeting those requirements, we will award credit. If you are concerned that an advanced feature you want to use falls in that second category above and might cost you points, then you should just not use it! These problems are designed to be solvable with the material we have learned so far, so it's not necessary at all to go look up a bunch of advanced material to solve them.

tl;dr: We will not be answering every question of "Can I use X" or "Will I lose points if I use Y" because the general answer is "You are not forbidden from using anything as long as it meets the spec requirements. If you're unsure if it violates a spec restriction, don't use it and just stick to what we learned before the assignment was released."