Some of the requirements are checked by flake8, while others require you to double-check them manually. All the code files you submit are expected to pass flake8 and follow the code quality guidelines outlined below.
Note: Other programming languages (e.g. Java) have different conventions for naming. Since we are using Python in this class, you are expected to use the Python conventions.
Your variable names should be descriptive, concise, and lowercase with words separated by underscores (snake_case). You should try not to use single-letter names for variables because a single letter usually doesn't describe the value contained in a variable very well, but they are fine for loop variables. You should also avoid using Python keywords, such as class, import, or return, or names of built-in functions or types, such as print or len (those will be highlighted in the code editor).
A list of keywords can be found by typing the following into the Python interpreter:
import keyword
keyword.kwlist
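To check a single name instead of scanning the whole list, something like the following sketch could be used. The helper name is_risky_name is made up for illustration; keyword.iskeyword and the builtins module are part of the Python standard library.

```python
import builtins
import keyword


def is_risky_name(name):
    """
    Returns True if the given name is a Python keyword or the name of
    a built-in function or type, and False otherwise.
    (Hypothetical helper, not part of the course materials.)
    """
    return keyword.iskeyword(name) or hasattr(builtins, name)


print(is_risky_name("return"))        # a keyword
print(is_risky_name("len"))           # a built-in function
print(is_risky_name("total_weight"))  # neither
```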
Examples:
# Good variable names
factor
total_weight
data1 # Okay to have a number right after a word
# Bad variable names
x # Most likely not descriptive enough,
# except for something like an x, y coordinate
totalWeight # Not using snake_case
NewImage # Not using snake_case
newresult # Not separating words with underscores
return # Python keyword
len # Python built-in function
Your function names should be concise, descriptive, and lowercase with words separated by underscores (snake_case). Just like variable names, you should also try to avoid using names of built-in Python functions. We usually specify the function names in the spec, and you should make sure that you name your functions accordingly.
In some of the assignments, we ask you to write test functions to test your code. The name of your test function should clearly indicate which function it is testing.
Examples:
# Good function names
get_height(person)
plot_population(df)
# Bad function names
setScore(record) # not using snake_case
max() # will overwrite the built-in max function in Python
Your class names should be in CapitalCase.
Examples:
# Good class names
University
DataFrame
EdPost
# Bad class names
dataFrame
ed_post
Your file/module names should be in snake_case.
Examples: main.py, ed_post.py.
Your constant names should be in ALL_CAPITAL_CASE_WITH_UNDERSCORES.
Constants are values defined outside of function definitions whose values are not meant to be changed. People usually define constants with descriptive names to replace "magic values" in their code to make it more readable. Note that constants should be light-weight in the sense that they are not supposed to store big objects (e.g. DataFrames). In CSE 163, the use of constants is optional in test files; they are not required.
Examples:
# Bad - constants should be light-weight
DATA = pd.read_csv("data.csv")

# Good
DATA = "data.csv"


def main():
    data = pd.read_csv(DATA)
    # Do something with data


if __name__ == '__main__':
    main()
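As a further sketch of a constant replacing a "magic value", the example below names a threshold once instead of repeating a bare number throughout the code. The name PASSING_SCORE, its value, and the function count_passing are assumptions chosen only for illustration.

```python
PASSING_SCORE = 60  # hypothetical threshold, chosen only for illustration


def count_passing(scores):
    """
    Takes in a list of numeric scores and returns how many of them are
    at least PASSING_SCORE.
    """
    count = 0
    for score in scores:
        if score >= PASSING_SCORE:
            count += 1
    return count


def main():
    print(count_passing([45, 60, 88]))


if __name__ == '__main__':
    main()
```

If the threshold ever changes, only the constant needs to be updated.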
Note: most of the whitespace requirements below should be handled by flake8; you will receive a warning if you are not meeting some of the requirements.
Unlike other languages that use explicit delimiters (like {} in Java) to determine what goes inside a loop or function, Python only uses indentation. It is extremely important to indent your code properly because incorrect indentation might lead to unexpected bugs in your program. If you see an error saying IndentationError: unexpected indent, it means there is an error in your indentation.
In general, there should be two blank lines separating the code for different functions, and separating function definitions from other code. The only exception is methods defined within the same class (see Classes and Objects below). Try to minimize blank lines within function definitions, except when separating complex chunks of code/logic to improve readability.
Good Example 1 (two blank lines separating functions):
# Other code


def the_first_function():
    # Implementation for the first function


def the_second_function():
    # Implementation for the second function


# Other code
Good Example 2 (one blank line separating "complex" chunks of code):
def compute_avg(x, y, z):
    """
    Sums the given three numbers and returns the average.
    """
    sum_val = x + y + z

    # compute and return the average value
    result = sum_val / 3.0
    return result


# other code
Include spaces between mathematical and comparison operators and other elements in an expression. Mathematical operators include +, -, *, /, and **. Comparison operators include ==, <=, >=, <, and >.
Limit space delimiters to 1 space, to avoid unnecessary whitespace. The exception to this is parentheses, which can be directly adjacent to whatever they are enclosing.
Examples:
# Good
x + y
(total ** 2) + 4 * val - 1
x * (4 + 6)
b + math.sqrt(4 * max_val)
# Bad
x+y  # not enough spaces
(total**2)+4*val-1  # not enough spaces
x * ( 4 + 6 )  # unnecessary spaces around parentheses
b + math.sqrt( 4 * max_val)  # inconsistent spacing
Note that this convention does not apply when specifying the value for a default parameter or passing in parameters by name when calling a function. In general, you should not put spaces around the = in those cases.
def add_three_nums(a, b, c=10):
    return a + b + c


add_three_nums(b=20, a=20, c=5)
Avoid adding extra space(s) between a function name and its associated parameter list. Using a space there suggests, incorrectly, that the parentheses are for grouping an expression, when in fact they are for calling the function.
You should, however, include spaces between individual parameters in the parameter list. This makes your function definitions and calls more readable.
Examples:
# Good
x = math.sqrt(n)
range_vals = range(n, 4)
# Bad
x = math.sqrt (n) # too much space before the parenthesis
range_vals = range(n,4) # no spaces between parameters
According to flake8, the maximum number of characters that you should have on a given line is 79. You should try to avoid writing code with long lines, but here are some common ways to break a long line:
# Calling a function with many arguments
some_function(first_arg="This is a function",
              second_arg="With many arguments",
              third_arg="indent until everything lines up")
# Breaking a long expression
# (use \ at the end and indent once on the second line)
total = first_num + second_num + third_num + \
    fourth_num
# Breaking a long string
print("When you have a very very very long string, "
      "this is how you could break it properly")
# Breaking a long DataFrame filter into separate variables
# This line is too long and less readable
data = data[(data['primary_color'] == 'Magenta') & (data['size'] != 'Large')]
# You can do the following instead
is_magenta = data['primary_color'] == 'Magenta'
is_not_large = data['size'] != 'Large'
data = data[is_magenta & is_not_large]
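As an alternative to the backslash, Python also lets an expression continue across lines while it is inside parentheses, brackets, or braces. A minimal sketch, reusing the variable names from the example above with made-up values:

```python
first_num = 1
second_num = 2
third_num = 3
fourth_num = 4

# Wrapping the expression in parentheses lets it span several lines
# without a trailing backslash.
total = (first_num + second_num + third_num
         + fourth_num)
print(total)
```

Either form keeps each line within the length limit; the parenthesized form avoids errors caused by stray whitespace after a backslash.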
For most of the assignments, when we say that certain files should use the main method pattern, it means those files should follow the structure below:
# Implementation for functions specified in the HW spec


def main():
    # Code calling all functions that you implemented


if __name__ == '__main__':
    main()
For every Python file you write, you should include a header comment at the top of the file with your name, section, and a brief description of what the program in the file does. The header comment should be in doc-string (""" or ''') format.
Each function should contain a doc-string describing what the function does, right below the function definition. It should describe any parameters the function takes and any values it returns. If the spec requires you to handle some special cases (like case sensitivity, or special return values for a certain input), you should mention that in the comment as well. It is okay to not include a comment for your main method.
Also, long blocks of code with a particular purpose or small bits of particularly complex code can be labeled/explained by a comment (using #) on the preceding line(s) to help with readability.
Example:
"""
Soham Pardeshi
CSE 163 AX
This program implements a function that adds two numbers
"""


def add_two_numbers(a, b, return_zero=False):
    """
    This function returns the sum of the two given numbers a, b
    if return_zero is False. Otherwise returns 0.
    return_zero has False as its default value.
    """
    if return_zero:
        return 0
    else:
        return a + b
Try to avoid commenting on information that is unnecessary or that contains too much implementation detail (e.g. "initialize variable", "nested for loop", or "increment count"). Your function comments should describe what the function does (its behavior), not how.
Example:
# Bad
# Note that the usage of a set and a loop are implementation details,
# as in they tell you HOW the method works.
# Returning the length of the set doesn't mean much to the clients.
def count_unique_characters(file_name):
    """
    Opens the given file, loops through each character and adds them
    to a set. Returns the length of the set.
    """
    result = set()
    with open(file_name) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)


# Good
def count_unique_characters(file_name):
    """
    Takes in a file name and returns the number of unique characters within
    the given file.
    """
    result = set()
    with open(file_name) as f:
        content = f.read()
        for c in content:
            result.add(c)
    return len(result)
One way to think about it is if you are the client of the code and you can only see the function declaration and the comment, the comment should contain all the necessary information that you need in order to properly use the function. You probably won't care whether the function is using a for loop, while loop, or just some if-else statements, but you will need to know the input parameters and the expected behavior of the function, especially under special cases.
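As a sketch of a doc-string that gives a client everything needed to call a function, including its special cases, consider the hypothetical count_word below. The function and its behavior are invented for illustration and are not taken from any assignment.

```python
def count_word(words, target):
    """
    Takes in a list of strings words and a string target and returns
    the number of times target appears in words. The comparison is
    case-insensitive, so "Cat" and "cat" count as the same word.
    Returns 0 if words is empty.
    """
    count = 0
    for word in words:
        if word.lower() == target.lower():
            count += 1
    return count
```

Note that the comment states the parameters, the return value, and the special cases, but says nothing about the loop or the counter.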
In some of the assignments, we ask you to write a test program to test your implementation. Your test files should meet the same style requirements as specified above, including:
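- Use the naming conventions described above (snake_case).
- Name each test function after the function it tests: to test add_two_numbers, the test function should be named test_add_two_numbers.
- Give each test function a doc-string comment: when testing add_two_numbers, the comment can simply be "Tests the function add_two_numbers".
- Check results with assert_equals. You are highly encouraged to add more test cases to test your functions more comprehensively.
- Call your test functions from main.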
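Putting these points together, a test file might look roughly like the sketch below. The assert_equals defined here is a simplified stand-in written for this example (its argument order is an assumption); in your assignments, use the version provided by the course.

```python
def assert_equals(expected, received):
    """
    Simplified stand-in for the course-provided assert_equals: raises
    an AssertionError if expected and received differ.
    """
    assert expected == received, \
        f"Expected {expected!r}, but received {received!r}"


def add_two_numbers(a, b):
    """
    Returns the sum of the two given numbers a and b.
    """
    return a + b


def test_add_two_numbers():
    """
    Tests the function add_two_numbers
    """
    assert_equals(5, add_two_numbers(2, 3))
    assert_equals(0, add_two_numbers(-1, 1))
    assert_equals(-4, add_two_numbers(-1, -3))


def main():
    test_add_two_numbers()


if __name__ == '__main__':
    main()
```

In a real submission, add_two_numbers would normally be imported from the file being tested instead of being defined in the test file.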
When implementing classes and objects in Python, you should follow the style requirements as specified above with some extra requirements:
When you are implementing a class, you should also include a doc-string comment describing the class right below the class definition.
Example:
"""
Soham Pardeshi
Section AA
This file contains the EdPost class
"""


class EdPost:
    """
    Represents an Ed post with a title, a tag, and a list
    of comments. Contains functionality to allow users to
    modify this post.
    """
    # Methods for EdPost
All your fields should be declared as private. In Python, this means starting your field name with an underscore (_). For example, instead of self.field_name, you should always have self._field_name.
You should also avoid making extra fields. The more fields your class has, the more difficult it becomes to maintain and reason about your code. With more fields, each instance of the class will also take up more memory. When revising your code, watch out for fields that are only used in one public method, and/or that are recomputed / cleared every call to a specific method. These can likely be simplified to local variables within the scope of that public method / its private helper methods. Also watch out for redundant fields that can be inferred from the properties of other fields with a simple method call.
Designing a class always comes with its own set of tradeoffs, and deciding on what fields to include can be particularly tricky. In some situations, it might be necessary to make something a field to improve efficiency. You should try your best to choose an appropriate solution that matches what the spec emphasizes and requires.
You should not access private fields from outside the class, except for testing purposes. Instead, you could declare a public getter method that returns the value of the field and use that.
You might need some private helper methods when implementing your class, and you should also name them starting with an underscore (e.g. _private_method). You should not use private methods from outside the class except for testing purposes.
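The conventions for fields, getters, and private helpers can be sketched with a small class. ScoreBook and all of its methods are hypothetical names used only for illustration. Note that the average is computed on request instead of being stored in an extra field.

```python
class ScoreBook:
    """
    Represents a collection of numeric scores for one student.
    (Hypothetical class, used only to illustrate the conventions.)
    """

    def __init__(self, scores):
        """
        Initializes a new ScoreBook with a copy of the given list
        of scores.
        """
        self._scores = list(scores)

    def get_scores(self):
        """
        Returns a copy of the scores stored in this ScoreBook.
        """
        return list(self._scores)

    def get_average(self):
        """
        Returns the average score, or 0 if there are no scores.
        """
        if len(self._scores) == 0:
            return 0
        return self._total() / len(self._scores)

    def _total(self):
        """
        Private helper that returns the sum of all scores.
        """
        return sum(self._scores)
```

Client code would call book.get_scores() or book.get_average() and never touch book._scores or book._total() directly.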
All packages that you need for completing the take-home assessments will be stated in the spec, and you should try to avoid importing extra packages since they might include advanced material that makes the problem trivial to solve (see Advanced Material below).
All import statements should be located at the top of each file, below your file header comment. You should also remove unused imports as warned by flake8.
In general, your code should avoid redundancy and unnecessary computation.
if/else Factoring
# Bad
# Note that there are repeated lines of logic that actually always happen,
# instead of conditionally like how our structure is set up. We can factor
# these out to simplify and clean our code.
if x % 2 == 0:
    print("Hello!")
    print("I love even numbers too.")
    print("See you later!")
else:
    print("Hello!")
    print("I don't like even numbers either.")
    print("See you later!")
# Good
print("Hello!")
if x % 2 == 0:
    print("I love even numbers too.")
else:
    print("I don't like even numbers either.")
print("See you later!")
When working with bool values, you should treat them like the True and False that they are instead of comparing them with == and !=. Remember that you can use not to negate a boolean value.
# Bad Example 1
if is_sunny:
    return True
else:
    return False
# Good Example 1
return is_sunny
# Bad Example 2
if is_sunny == True:
    go_hiking()
# Good Example 2
if is_sunny:
    go_hiking()
# Bad Example 3
return is_sunny == False
# Good Example 3
return not is_sunny
When writing loops, choose loop bounds or loop conditions that help generalize code the best. For example, the code before this for loop is unnecessary.
# Bad
nums = [1, 2, 3]
total = 0
total += nums[0]
for i in range(1, len(nums)):
    total += nums[i]
# Good (should just loop over the whole list instead)
total = 0
for i in range(len(nums)):
    total += nums[i]
If you have something that only happens once, then don't put the code for it inside of your loop.
# Bad
# Note that the mean of the whole list remains unchanged for each loop
# iteration, so it should be computed outside the loop to avoid
# unnecessary computation
def demean(nums):
    """
    Takes in a list of numbers nums and returns a new list with the mean
    value of the original list subtracted from each corresponding value
    """
    result = []
    for i in range(len(nums)):
        mean = sum(nums) / len(nums)
        result.append(nums[i] - mean)
    return result


# Good
def demean(nums):
    result = []
    mean = sum(nums) / len(nums)
    for i in range(len(nums)):
        result.append(nums[i] - mean)
    return result
When working with pandas, you should not use any loops since loops in Python are inefficient compared to built-in pandas functions. In general, if you feel like using loops when working with pandas, there is probably a way to solve it with the pandas functions introduced in the word bank we provided with the assignments.
# Bad
def get_sum(data):
    """
    Takes in a DataFrame data storing score information and returns the sum of
    all scores.
    """
    total = 0
    for score in data['score']:
        total += score
    return total


# Good
def get_sum(data):
    return data['score'].sum()
Try to avoid making something a special case if unnecessary. Before you make something a special case, think about whether a more general operation in other cases could yield the same result. If so, you should combine those cases.
# Bad
# Note the first case is unnecessary since if a == 0 and b != 0, a / b will
# still return 0.
def divide(a, b):
    """
    Takes in two numbers a, b and returns the result of a / b.
    Returns 0 if b == 0.
    """
    if a == 0:
        return 0
    elif b == 0:
        return 0
    else:
        return a / b


# Good
def divide(a, b):
    if b == 0:
        return 0
    else:
        return a / b
If there are values that you could compute within the same loop iteration, you should avoid writing an extra loop to compute them separately.
# Bad
# Note that the total score for each sex could be computed at the same time
def get_total_for_each_sex(data):
    """
    Takes in data containing scores of students as a list of dictionaries.
    Returns a tuple containing the total score for each sex in the format
    of (male total, female total).
    """
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
    for row in data:
        if row['sex'] == 'F':
            female_total += row['score']
    return male_total, female_total


# Good
def get_total_for_each_sex(data):
    male_total = 0
    female_total = 0
    for row in data:
        if row['sex'] == 'M':
            male_total += row['score']
        else:
            female_total += row['score']
    return male_total, female_total
Try to avoid precomputing values unless necessary, especially for functions taking in parameters; you should make use of the given parameter to compute the desired value.
# Bad
def get_average_score_for_sex(data, sex):
    """
    Takes in a DataFrame data containing scores for students and a sex and
    returns the average score for the given sex. Assume sex only takes the
    values 'M' (male) and 'F' (female).
    """
    male_avg = data[data['sex'] == 'M']['score'].mean()
    female_avg = data[data['sex'] == 'F']['score'].mean()
    if sex == 'M':
        return male_avg
    else:
        return female_avg


# Good
def get_average_score_for_sex(data, sex):
    return data[data['sex'] == sex]['score'].mean()
The following section contains some other miscellaneous requirements.
No global variables are allowed in CSE 163. If you intend to use constants, you should name them following the naming convention specified above.
global_variable = "cat"  # BAD!
GLOBAL_CONSTANT = "dog"  # OK sometimes (see above)


def main():
    print(global_variable)  # this is forbidden
    print(GLOBAL_CONSTANT)  # this is ok if the constant is light-weight


if __name__ == '__main__':
    main()
You should remove all print statements for debugging instead of commenting them out in your final take-home assessment submission.
Your code should generate no warnings or errors when run. You should ask for help during office hours if you encounter any error or warning that you don't know how to resolve. You can also refer to this page for a common list of Python errors students encounter and possible causes.
Unless otherwise stated, you should avoid modifying the objects passed into your functions. Most of the built-in types (int, float, bool, str, tuple) are immutable, so you don't need to worry about those, but you should try to avoid modifying mutable data structures (lists of dictionaries, DataFrames, numpy arrays, etc.). The details of whether a data structure will be modified by certain function calls will be introduced in lessons, but you can always ask clarifying questions on the discussion board if unsure.
# data is a DataFrame
data['new_col'] = ... # caution: will change the DataFrame stored in data
# data is a list of dictionaries
data.append(...) # will change the list
# data is a 2D numpy array
data[:, :] = ... # will change the numpy array
# okay: won't change the original structure, only assigning a new value
# to store in data
data = ...
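One way to avoid modifying a parameter is to build and return a new structure. The sketch below does this for a list of dictionaries; the function add_bonus and the 'score' key are assumptions made for illustration.

```python
def add_bonus(students, bonus):
    """
    Takes in a list of dictionaries students, each with a 'score' key,
    and a number bonus. Returns a new list of dictionaries with bonus
    added to every score. The given list and its dictionaries are not
    modified.
    """
    result = []
    for student in students:
        updated = dict(student)  # shallow copy of one dictionary
        updated['score'] = student['score'] + bonus
        result.append(updated)
    return result


original = [{'name': 'A', 'score': 70}]
curved = add_bonus(original, 5)
print(original[0]['score'])  # the original score is unchanged
print(curved[0]['score'])    # the new list holds the updated score
```

Copying each dictionary matters here: appending the original dictionaries to a new list and then changing their scores would still modify the caller's data.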
There is no automatic deduction for using some advanced feature or using material that we have not covered in class yet, but if it violates the restrictions of the assignment, it is possible you will lose points. It's not possible for us to list out every possible thing you can't use on the assignment, but we can say for sure that you are safe to use anything we have covered in class so far as long as it meets what the specification asks and you are appropriately using it as we showed in class.
For example, some things that are probably okay to use even though we didn't cover them:
The update method on the set class, even though I didn't show it in lecture. It was clear we talked about sets and that you are allowed to use them on future assignments, and if you found a method on them that does what you need, it's probably fine as long as it isn't violating some explicit restriction on that assignment.
For example, some things that are probably not okay to use:
These are not allowed because they might make the problem trivially easy or violate what the learning objective of the problem is.
You should think about what the spec is asking you to do, and as long as you are meeting those requirements, we will award credit. If you are concerned that an advanced feature you want to use falls in that second category above and might cost you points, then you should just not use it! These problems are designed to be solvable with the material we have learned so far, so it's not at all necessary to go look up a bunch of advanced material to solve them.
tl;dr: We will not be answering every question of "Can I use X" or "Will I lose points if I use Y" because the general answer is "You are not forbidden from using anything as long as it meets the spec requirements. If you're unsure if it violates a spec restriction, don't use it and just stick to what we learned before the assignment was released."