Learning objective: Read, process, and group CSV data to compute descriptive statistics with and without Pandas.

Setting Up

You can find the starter code for this homework on JupyterHub.

Context

In the Pokémon video game series, the player catches pokemon, fictional creatures that are trained to battle each other as a sport. Pokémon has had a significant cultural influence on people who grew up in the late 1990s and early 2000s, both in its country of origin, Japan, and around the world. More recently, Pokémon Go became a viral hit: hundreds of millions of people played the augmented-reality game at its peak in the summer of 2016. You do not need to understand the details of Pokémon or have played the game to do this assignment. All you need to understand are the statistics our dataset provides about each pokemon.

The pokemon_box.csv file stores some imagined data about a player’s pokemon in the following format.

id,name,level,personality,type,weakness,atk,def,hp,stage
1,Bulbasaur,12,Jolly,Grass,Fire,45,50,112,1
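To make the "list of dictionaries" representation concrete, here is a minimal sketch of turning CSV text in this format into one dictionary per row. The helper parse_rows is hypothetical and is not the provided parse function. It assumes the numeric columns (id, level, atk, def, hp, stage) are converted to int, which is consistent with max_level returning a level like 72 rather than the string '72'.

```python
import csv
import io

# Columns assumed to hold integers (an assumption about the dataset).
INT_COLUMNS = {'id', 'level', 'atk', 'def', 'hp', 'stage'}


def parse_rows(csv_text: str) -> list[dict]:
    """Hypothetical helper: returns one dictionary per CSV row,
    converting the assumed integer columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({key: int(value) if key in INT_COLUMNS else value
                     for key, value in row.items()})
    return rows


sample = (
    'id,name,level,personality,type,weakness,atk,def,hp,stage\n'
    '1,Bulbasaur,12,Jolly,Grass,Fire,45,50,112,1\n'
)
data = parse_rows(sample)
print(data[0]['name'], data[0]['level'])  # Bulbasaur 12
```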

In the Charmander species, Charmander begins at stage 1, evolves into a Charmeleon at stage 2, and finally evolves into Charizard at stage 3.

Charmander species evolution

The problems in this homework focus on computing descriptive statistics that summarize the pokemon dataset, such as the mean or count of a column. You will solve each problem in two ways: once in plain Python (in hw2_python.py) and once with Pandas (in hw2_pandas.py).

Here are some examples of how these functions can be called (you don't need to add them to your files):

Python functions:

import hw2_python

# parse is the CSV-reading helper that the provided main in hw2_test.py uses.
data = parse('pokemon_box.csv')

print('Number of species:', hw2_python.species_count(data))
print('Highest level pokemon:', hw2_python.max_level(data))
print('Low-level Pokemon:', hw2_python.filter_range(data, 1, 9))
print('Average attack for fire types:', hw2_python.mean_attack_for_type(data, 'fire'))

Pandas functions:

import pandas as pd

import hw2_pandas

data = pd.read_csv('pokemon_box.csv')

print('Number of species:', hw2_pandas.species_count(data))
print('Highest level pokemon:', hw2_pandas.max_level(data))
print('Low-level Pokemon:', hw2_pandas.filter_range(data, 1, 9))
print('Average attack for fire types:', hw2_pandas.mean_attack_for_type(data, 'fire'))

Assume the data is never empty (there’s at least one pokemon) and that there’s no missing data (each pokemon has every attribute).

Note: Canonically, Pokémon cannot have an attack stat of 0. For this homework, however, you should NOT make that assumption.

Hint: Remember to use the PandasWordbank.ipynb Jupyter Notebook to explore the data and work in a more interactive environment. You might even prefer writing all of your code in the notebook and periodically transferring functions over to your submission files to run the automated checks.

Testing

As with Homework 1, you’ll also check your solutions by adding tests to hw2_test.py using assert statements.

Note: Do not use the pokemon_box.csv file in your own test cases. The file is too large for you to work out the correct answer by hand, and it's not valid to run your code and then paste its output in as the expected value.

Programming

main

Task: Begin by taking a look at how the provided main in hw2_test.py is set up. This function calls both the parse and pd.read_csv functions on one of the Pokemon datasets. You can change this file name, or add more file name constants, at the top of hw2_test.py.

Once you begin writing your testing functions for each task, you can pass these two results (the parsed list of dictionaries and the DataFrame) as parameters to each testing function. As a reminder, you should have a single testing function that covers both the Python and Pandas versions of each task. Inside each testing function, pass the result of parse to your Python solution and the result of pd.read_csv to your Pandas solution.

Note: You have already been provided with the function headers and type annotations in hw2_python.py and hw2_pandas.py due to their complexity. Use these, along with the spec, as guidance on the behavior of each function and on what each one takes in and returns. However, you still need to write the function headers and type annotations in hw2_test.py yourself. In future homeworks, you will be expected to provide the headers and annotations for a larger portion of the assignment. You must write docstrings for all functions, since none have been provided for you! Don't forget to include a module header at the top of each file as well.

species_count

Task: Without using Pandas, write in hw2_python.py a function species_count that takes a parsed pokemon dataset (a list of dictionaries) and returns the number of unique pokemon species in the dataset, as determined by the name attribute.

For the pokemon_test.csv file, species_count(data) should return 3.
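One way to count species without Pandas, sketched under the assumption that each dictionary has a 'name' key: collect the names into a set, which discards duplicates, and take its length. The rows below are made up.

```python
def species_count(data: list[dict]) -> int:
    """Returns the number of distinct pokemon names in data."""
    # A set stores each name once, so its size is the number of species.
    return len({pokemon['name'] for pokemon in data})


# Made-up rows (only the fields this function reads).
data = [
    {'name': 'Pikachu', 'level': 5},
    {'name': 'Eevee', 'level': 9},
    {'name': 'Pikachu', 'level': 30},
]
print(species_count(data))  # 2
```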

Task: Write in hw2_pandas.py the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame.

Info: For type annotating a parameter of type DataFrame, use the type pd.DataFrame (assuming pandas is imported as pd).
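A Pandas version of the same idea could rely on Series.nunique, which counts the distinct values in a column. This is a sketch on a made-up DataFrame, not the only valid approach (len(data['name'].unique()) would also work).

```python
import pandas as pd


def species_count(data: pd.DataFrame) -> int:
    """Returns the number of distinct pokemon names in data."""
    # nunique counts the distinct values in the 'name' column.
    return data['name'].nunique()


data = pd.DataFrame({
    'name': ['Pikachu', 'Eevee', 'Pikachu'],
    'level': [5, 9, 30],
})
print(species_count(data))  # 2
```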

Task: Write in hw2_test.py one or more test functions that call species_count with some CSV files and compare the output of the program with the expected value using assert. In addition to the pokemon_test.csv file, add one additional test case for both the Python and Pandas implementations by creating your own CSV file.
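As one hedged illustration of building your own test file, the sketch below writes a few made-up rows to a hypothetical my_test.csv and reads it back with pd.read_csv. The temporary directory only keeps the example self-contained; in your submission you would save the CSV alongside hw2_test.py and compare your functions' output against values you worked out by hand.

```python
import os
import tempfile

import pandas as pd

# A small, hand-checkable test file (made-up rows). In the homework you
# would save a file like this next to hw2_test.py instead.
CSV_TEXT = (
    'id,name,level,personality,type,weakness,atk,def,hp,stage\n'
    '1,Pikachu,5,Jolly,electric,ground,55,40,35,1\n'
    '2,Pikachu,8,Calm,electric,ground,56,41,37,1\n'
    '3,Eevee,9,Bold,normal,fighting,55,50,55,1\n'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_test.csv')
    with open(path, 'w') as f:
        f.write(CSV_TEXT)
    df = pd.read_csv(path)

# Expected values worked out by hand from the three rows above:
# two distinct names, and Eevee has the highest level.
assert df['name'].nunique() == 2
assert df.loc[df['level'].idxmax(), 'name'] == 'Eevee'
```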

max_level

Task: Write in hw2_python.py a function max_level that takes a parsed pokemon dataset and returns a 2-element tuple of the (name, level) of the pokemon with the highest level in the dataset. If there is more than one pokemon with the highest level, return the pokemon that appears first in the file.

For the pokemon_test.csv file, max_level(data) should return the 2-element tuple, ('Lapras', 72).
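A sketch of the tie-breaking rule in plain Python: scan the rows in order and replace the current best only when a level is strictly greater, so an equal level later in the file never wins. The rows are made up and include a tie.

```python
def max_level(data: list[dict]) -> tuple[str, int]:
    """Returns (name, level) of the highest-level pokemon; ties go to
    the pokemon that appears first."""
    best = data[0]  # the spec guarantees at least one pokemon
    for pokemon in data[1:]:
        # Strict '>' means a later pokemon with an equal level never
        # replaces the earlier one.
        if pokemon['level'] > best['level']:
            best = pokemon
    return best['name'], best['level']


# Made-up rows with a tie at level 30.
data = [
    {'name': 'Eevee', 'level': 9},
    {'name': 'Pikachu', 'level': 30},
    {'name': 'Psyduck', 'level': 30},
]
print(max_level(data))  # ('Pikachu', 30)
```

The built-in max(data, key=...) also returns the first maximal item it encounters, so it would satisfy the same rule.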

Task: Write in hw2_pandas.py the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame.
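In Pandas, one option is Series.idxmax, which returns the index label of the first occurrence of the maximum, matching the "first in the file" rule. This sketch assumes the index labels are unique (as with the default index that pd.read_csv creates), and it converts the level with int() so the tuple holds a plain Python int.

```python
import pandas as pd


def max_level(data: pd.DataFrame) -> tuple[str, int]:
    """Returns (name, level) of the highest-level pokemon; ties go to
    the row that appears first."""
    # idxmax gives the index label of the first occurrence of the maximum.
    row = data.loc[data['level'].idxmax()]
    return row['name'], int(row['level'])


data = pd.DataFrame({
    'name': ['Eevee', 'Pikachu', 'Psyduck'],
    'level': [9, 30, 30],
})
print(max_level(data))  # ('Pikachu', 30)
```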

Task: Write in hw2_test.py one or more test functions that call max_level with some CSV files and compare the output of the program with the expected value using assert. In addition to the pokemon_test.csv file, add one additional test case for both the Python and Pandas implementations by creating your own CSV file.

filter_range

Task: Write in hw2_python.py a function filter_range that takes a parsed pokemon dataset and two integers: a lower bound (inclusive) and an upper bound (exclusive). The function returns a list of the names of pokemon whose level falls within the bounds, in the same order that they appear in the dataset.

For the pokemon_test.csv file, filter_range(data, 35, 72) should return ['Arcanine', 'Arcanine', 'Starmie']. Note that Lapras is not included: it is exactly level 72, and the upper bound is exclusive.
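A minimal plain-Python sketch of the bounds check uses a chained comparison, lower <= level < upper, inside a list comprehension so the names stay in file order. The rows are made up.

```python
def filter_range(data: list[dict], lower: int, upper: int) -> list[str]:
    """Returns names of pokemon with lower <= level < upper, in file order."""
    # Chained comparison: lower bound inclusive, upper bound exclusive.
    return [pokemon['name'] for pokemon in data
            if lower <= pokemon['level'] < upper]


data = [
    {'name': 'Pikachu', 'level': 5},
    {'name': 'Eevee', 'level': 9},
    {'name': 'Psyduck', 'level': 30},
    {'name': 'Eevee', 'level': 40},
]
print(filter_range(data, 9, 40))  # ['Eevee', 'Psyduck']
```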

Task: Write in hw2_pandas.py the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame. To convert a Pandas Series to a list, use the built-in list function.

For example:

import io

import pandas as pd

# data is a DataFrame storing the following info:
data = pd.read_csv(io.StringIO(
    'name,age,species\n'
    'Fido,4,dog\n'
    'Meowrty,6,cat\n'
    'Chester,1,dog\n'
    'Phil,1,axolotl\n'
))
names = data['name']  # Series
list(names)  # ['Fido', 'Meowrty', 'Chester', 'Phil']
row = data.loc[1]  # Series
list(row)  # ['Meowrty', 6, 'cat']
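Combining a boolean mask with the list conversion shown above gives one possible Pandas sketch. Each comparison must be wrapped in parentheses because & binds more tightly than >= and <. The DataFrame is made up.

```python
import pandas as pd


def filter_range(data: pd.DataFrame, lower: int, upper: int) -> list[str]:
    """Returns names of pokemon with lower <= level < upper, in row order."""
    # Parentheses are required because & binds tighter than >= and <.
    mask = (data['level'] >= lower) & (data['level'] < upper)
    return list(data.loc[mask, 'name'])


data = pd.DataFrame({
    'name': ['Pikachu', 'Eevee', 'Psyduck', 'Eevee'],
    'level': [5, 9, 30, 40],
})
print(filter_range(data, 9, 40))  # ['Eevee', 'Psyduck']
```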

Task: Write in hw2_test.py one or more test functions that call filter_range with some CSV files and compare the output of the program with the expected value using assert. In addition to the pokemon_test.csv file, add one additional test case for both the Python and Pandas implementations by creating your own CSV file.

mean_attack_for_type

Task: Write in hw2_python.py a function mean_attack_for_type that takes a parsed pokemon dataset and a str representing the pokemon type. The function returns the average atk for all the pokemon in the dataset with the given type. If there are no pokemon of the given type, return None.

For the pokemon_test.csv file, mean_attack_for_type(data, 'fire') should return 47.5.
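A plain-Python sketch that accumulates a running total and count. It decides whether to return None based on the count rather than the total because, as noted earlier, an attack stat of 0 is possible. The rows are made up, and the type comparison is case-sensitive, so 'fire' would not match 'Fire'.

```python
from typing import Optional


def mean_attack_for_type(data: list[dict],
                         pokemon_type: str) -> Optional[float]:
    """Returns the mean atk of pokemon of the given type, or None if
    no pokemon has that type."""
    total = 0
    count = 0
    for pokemon in data:
        if pokemon['type'] == pokemon_type:
            total += pokemon['atk']
            count += 1
    # Check count rather than total: matching pokemon may have atk 0.
    if count == 0:
        return None
    return total / count


data = [
    {'name': 'Vulpix', 'type': 'fire', 'atk': 40},
    {'name': 'Ponyta', 'type': 'fire', 'atk': 50},
    {'name': 'Magikarp', 'type': 'water', 'atk': 0},
]
print(mean_attack_for_type(data, 'fire'))   # 45.0
print(mean_attack_for_type(data, 'water'))  # 0.0
print(mean_attack_for_type(data, 'grass'))  # None
```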

Task: Write in hw2_pandas.py the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame.
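Since this homework is about grouping, one Pandas option is to compute the mean atk for every type with groupby and then look up the requested type; filtering with a boolean mask and calling mean would work as well. This is a sketch on a made-up DataFrame.

```python
from typing import Optional

import pandas as pd


def mean_attack_for_type(data: pd.DataFrame,
                         pokemon_type: str) -> Optional[float]:
    """Returns the mean atk of pokemon of the given type, or None if
    no pokemon has that type."""
    # groupby computes one mean per type; the index holds the type names.
    means = data.groupby('type')['atk'].mean()
    if pokemon_type not in means.index:
        return None
    return float(means[pokemon_type])


data = pd.DataFrame({
    'name': ['Vulpix', 'Ponyta', 'Magikarp'],
    'type': ['fire', 'fire', 'water'],
    'atk': [40, 50, 0],
})
print(mean_attack_for_type(data, 'fire'))   # 45.0
print(mean_attack_for_type(data, 'grass'))  # None
```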

Task: Write in hw2_test.py one or more test functions that call mean_attack_for_type with some CSV files and compare the output of the program with the expected value using assert. In addition to the pokemon_test.csv file, add one additional test case for both the Python and Pandas implementations by creating your own CSV file.

Quality

Homework submissions should pass these checks: flake8 and the code quality guidelines. The code quality guidelines are very thorough. For this homework, the most relevant rules can be found in these sections (new sections bolded):

Reminder: Make sure to provide descriptive comments in docstring format including at the top of every module you write, as well as docstrings for each function you write!

Submission

Submit your work by uploading the following files to Gradescope:

Submit as often as you want until the deadline for the initial submission. Note that we will only grade your most recent submission.

Please make sure you are familiar with the resources and policies outlined in the syllabus.