Useful CSE 163 Resources¶
Learning objective: Read, process, and group CSV data to compute descriptive statistics with and without Pandas.
-
hw2_manual.py
is the file for you to put your implementations for solving each problem without Pandas. -
hw2_pandas.py
is the file for you to put your implementations for solving each problem with Pandas. -
hw2_test.py
is the file for you to put your own tests. The Run button executes this program. -
cse163_utils.py
is a helper file that has code to help you test your code. -
pokemon_box.csv
is a large CSV file that stores information about pokemon. -
pokemon_test.csv
is a very small CSV file that stores information about pokemon used for the example cases. -
PandasWordbank.ipynb
is a Jupyter Notebook that reviews all of the Pandas features needed for this assessment. Feel free to edit this Jupyter Notebook to prototype ideas and explore the data. -
pets.csv
is the tiny example dataset used only by the PandasWordbank.ipynb
.
Context¶
In the Pokémon video game series, the player catches pokemon, fictional creatures trained to battle each other as part of a sport franchise. Pokémon exerted significant cultural influence on people who grew up in the late 1990s and early 2000s not only in its country of origin, Japan, but also around the world. More recently, Pokémon Go became a viral hit as hundreds of millions of people played the augmented-reality game at its peak during the summer of 2016. You do not need to understand the details of Pokémon or need to have played the game to do the assignment. All you need to understand is the statistics we provide in our dataset about each pokemon.
The pokemon_box.csv
file stores some imagined data about a player’s pokemon in the following format.
id | name | level | personality | type | weakness | atk | def | hp | stage |
---|---|---|---|---|---|---|---|---|---|
1 | Bulbasaur | 12 | Jolly | Grass | Fire | 45 | 50 | 112 | 1 |
-
id
is a unique numeric identifier corresponding to the species of a pokemon. All pokemon of the same species share the same id
. -
name
is the name of the species of pokemon, such as Bulbasaur. -
level
is the integer level of the pokemon. -
personality
is a one-word string describing the personality of the pokemon, such as Jolly. -
type
is a one-word string describing the type of the pokemon, such as Grass. -
weakness
is the enemy type that this pokemon is weak toward. Bulbasaur is weak to fire-type pokemon. -
atk, def, hp
are integers that indicate the attack power, defense power, and hit points of the pokemon. -
stage
is an integer that indicates the particular developmental stage of the pokemon.
In the Charmander species, Charmander begins at stage 1, evolves into a Charmeleon at stage 2, and finally evolves into Charizard at stage 3.
The problems in this assessment focus on providing descriptive statistics for summarizing the pokemon dataset, such as computing the mean or count of a certain column. Solve each problem in two ways. The examples below show how these functions can be called; you don’t need to add them to your files.
Without Pandas (using Python built-in data structures plus Python math functions) in hw2_manual.py
. These 6 functions take a list of dictionaries representing the parsed pokemon dataset.
data = parse('pokemon_box.csv')
print('Number of species:', hw2_manual.species_count(data))
print('Highest level pokemon:', hw2_manual.max_level(data))
print('Low-level Pokemon:', hw2_manual.filter_range(data, 1, 9))
print('Average attack for fire types:', hw2_manual.mean_attack_for_type(data, 'fire'))
print('Count of each Pokemon type:')
print(hw2_manual.count_types(data))
print('Average attack for each Pokemon type:')
print(hw2_manual.mean_attack_per_type(data))
With Pandas (plus Python math functions) in hw2_pandas.py
. Do not use any loops or list/dictionary comprehensions. These 6 functions take a Pandas DataFrame
representing the parsed pokemon dataset.
data = pd.read_csv('pokemon_box.csv')
print('Number of species:', hw2_pandas.species_count(data))
print('Highest level pokemon:', hw2_pandas.max_level(data))
print('Low-level Pokemon:', hw2_pandas.filter_range(data, 1, 9))
print('Average attack for fire types:', hw2_pandas.mean_attack_for_type(data, 'fire'))
print('Count of each Pokemon type:')
print(hw2_pandas.count_types(data))
print('Average attack for each Pokemon type:')
print(hw2_pandas.mean_attack_per_type(data))
Assume the data is never empty (there’s at least one pokemon) and that there’s no missing data (each pokemon has every attribute).
Note
Canonically, Pokémon cannot have an attack stat of 0. For this assessment, however, you should NOT make that assumption.
Hint
Remember to use the PandasWordbank.ipynb
Jupyter Notebook to explore the data and work in a more interactive environment. You might even prefer writing all of your code in the notebook and periodically transferring functions over to hw2_pandas.py
to run the automated checks.
As with previous assessments, you’ll also check your solutions by adding tests to hw2_test.py
using the assert_equals
function.
-
To help test functions solved without Pandas,
cse163_utils.py
defines a parse
function that takes a filename and returns the dataset as a list of dictionaries. To help test functions solved with Pandas, call pd.read_csv
to return the dataset as a DataFrame
. You can see examples of both functions being used in hw2_test.py.
-
Create your own testing CSV file by pressing the + icon in the editor and selecting New file. When specifying file names, use absolute paths, such as
/home/pokemon_test.csv
. Do not use the pokemon_box.csv
file in your own test cases. The file is too large for you to work out the correct answer by hand, and it’s not valid to run your code and then paste its output as the expected output. -
Write at least one test function for each problem and give it a descriptive name that indicates the function being tested, such as
test_species_count
. One test function per problem is fine since both ways of solving the problem should compute the same result. In addition to the provided pokemon_test.csv
example, add one additional test case for each problem (both manual and pandas
implementation) using your own data.
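The testing workflow described above can be sketched roughly as follows. Everything here is a hypothetical stand-in: parse_sketch and assert_equals_sketch are not the functions in cse163_utils.py (their exact behavior is not shown in this document), the data is the small pets example inlined as a string rather than read from a file, and converting all-digit values to int is an assumption.

```python
import csv
import io

# Hypothetical inlined data; a real test would read a CSV file instead.
PETS_CSV = """name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""


def parse_sketch(csv_text):
    """
    Returns the rows of the given CSV text as a list of dictionaries.
    Values made up only of digits are converted to int (an assumption).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        converted = {}
        for key, value in row.items():
            converted[key] = int(value) if value.isdigit() else value
        rows.append(converted)
    return rows


def assert_equals_sketch(expected, received):
    """
    Raises an AssertionError with a readable message if the values differ.
    """
    assert expected == received, f'Expected {expected}, got {received}'


def test_row_count():
    """
    Tests that the parsed data has the number of rows counted by hand.
    """
    data = parse_sketch(PETS_CSV)
    assert_equals_sketch(4, len(data))


test_row_count()
```

The expected value (4) comes from counting the rows by hand, which is only practical because the test data is tiny.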
species_count
¶
Task: Write in hw2_manual.py
a function species_count
that takes a parsed pokemon dataset and returns the number of unique pokemon species in the dataset as determined by the name
attribute without using Pandas.
For the pokemon_test.csv
file, species_count(data)
should return 3.
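One common way to count distinct values without Pandas is to collect them in a set. The sketch below shows the idea on the pets example data rather than the pokemon data; the helper name count_unique and the hard-coded rows are hypothetical and only for illustration.

```python
# Hypothetical rows in the list-of-dictionaries format (pets example data).
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Meowrty', 'age': 6, 'species': 'cat'},
    {'name': 'Chester', 'age': 1, 'species': 'dog'},
    {'name': 'Phil', 'age': 1, 'species': 'axolotl'},
]


def count_unique(rows, column):
    """
    Returns the number of distinct values stored in the given column.
    """
    seen = set()
    for row in rows:
        seen.add(row[column])  # adding a duplicate leaves the set unchanged
    return len(seen)


print(count_unique(pets, 'species'))  # 3
```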
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
.
Task: Write in hw2_test.py
one or more test functions that call species_count
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
max_level
¶
Task: Write in hw2_manual.py
a function max_level
that takes a parsed pokemon dataset and returns a 2-element tuple of the (name, level)
of the pokemon with the highest level
in the dataset. If there is more than one pokemon with the highest level
, return the pokemon that appears first in the file.
For the pokemon_test.csv
file, max_level(data)
should return the 2-element tuple, ('Lapras', 72)
.
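The tie-breaking rule (the first row wins) falls out naturally from a strict comparison while scanning rows in order. Here is a minimal sketch on made-up pets data with a deliberate tie; the function name oldest_pet and the rows are hypothetical.

```python
# Hypothetical rows; Meowrty and Chester are tied for the highest age.
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Meowrty', 'age': 6, 'species': 'cat'},
    {'name': 'Chester', 'age': 6, 'species': 'dog'},
]


def oldest_pet(rows):
    """
    Returns a (name, age) tuple for the oldest pet. On a tie, returns
    the pet that appears first. Assumes rows is not empty.
    """
    best = rows[0]
    for row in rows[1:]:
        if row['age'] > best['age']:  # strict >, so an earlier row wins ties
            best = row
    return (best['name'], best['age'])


print(oldest_pet(pets))  # ('Meowrty', 6)
```

Using >= instead of > would return the last tied row rather than the first.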
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
.
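In Pandas, one possible approach to "first row with the largest value" is Series.idxmax, which returns the index label of the first occurrence of the maximum. The sketch below uses made-up pets data with a tie; wrapping the value in int() to get a plain Python integer is a choice made for this illustration, not a requirement stated in this document.

```python
import pandas as pd

# Hypothetical data; rows 1 and 2 are tied for the highest age.
pets = pd.DataFrame({
    'name': ['Fido', 'Meowrty', 'Chester', 'Phil'],
    'age': [4, 6, 6, 1],
    'species': ['dog', 'cat', 'dog', 'axolotl'],
})

# idxmax gives the label of the first row holding the maximum age.
first_oldest = pets.loc[pets['age'].idxmax()]

# int() converts the NumPy integer to a plain Python int.
result = (first_oldest['name'], int(first_oldest['age']))
print(result)  # ('Meowrty', 6)
```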
Task: Write in hw2_test.py
one or more test functions that call max_level
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
filter_range
¶
Task: Write in hw2_manual.py
a function filter_range
that takes a parsed pokemon dataset and two numbers: a lower bound (inclusive) and upper bound (exclusive). The function returns a list of the names of pokemon whose level
falls within the bounds, in the same order that they appear in the dataset.
For the pokemon_test.csv
file, filter_range(data, 35, 72)
should return ['Arcanine', 'Arcanine', 'Starmie']
. Note that Lapras is not included because the upper bound is exclusive and Lapras is exactly level 72.
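An inclusive lower bound with an exclusive upper bound can be written as a chained comparison. The sketch below applies the idea to the pets example data; the function name names_in_age_range and the hard-coded rows are hypothetical.

```python
# Hypothetical rows in the list-of-dictionaries format (pets example data).
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Meowrty', 'age': 6, 'species': 'cat'},
    {'name': 'Chester', 'age': 1, 'species': 'dog'},
    {'name': 'Phil', 'age': 1, 'species': 'axolotl'},
]


def names_in_age_range(rows, lower, upper):
    """
    Returns the names of pets whose age is at least lower and strictly
    less than upper, in the order they appear in rows.
    """
    names = []
    for row in rows:
        if lower <= row['age'] < upper:  # lower inclusive, upper exclusive
            names.append(row['name'])
    return names


print(names_in_age_range(pets, 1, 6))  # ['Fido', 'Chester', 'Phil']
```

Meowrty is left out because an age of exactly 6 does not satisfy age < 6.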
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
. To convert a Pandas Series
to a list
, use the built-in list
function.
For example:
# data is a DataFrame storing the following info:
# name,age,species
# Fido,4,dog
# Meowrty,6,cat
# Chester,1,dog
# Phil,1,axolotl
names = data['name'] # Series
list(names) # ['Fido', 'Meowrty', 'Chester', 'Phil']
row = data.loc[1] # Series
list(row) # ['Meowrty', 6, 'cat']
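Building on the Series-to-list conversion above, a loop-free range filter can be sketched with a boolean mask. This example rebuilds the pets data inline (an assumption, so the block runs on its own) and combines two conditions with &, each wrapped in parentheses.

```python
import pandas as pd

# Pets example data rebuilt inline so the sketch is self-contained.
pets = pd.DataFrame({
    'name': ['Fido', 'Meowrty', 'Chester', 'Phil'],
    'age': [4, 6, 1, 1],
    'species': ['dog', 'cat', 'dog', 'axolotl'],
})

# Each comparison yields a Series of booleans; & combines them row by row.
in_range = (pets['age'] >= 1) & (pets['age'] < 6)

# Select the name column for the matching rows, then convert to a list.
names = list(pets.loc[in_range, 'name'])
print(names)  # ['Fido', 'Chester', 'Phil']
```

The parentheses matter because & binds more tightly than the comparison operators.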
Task: Write in hw2_test.py
one or more test functions that call filter_range
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
mean_attack_for_type
¶
Task: Write in hw2_manual.py
a function mean_attack_for_type
that takes a parsed pokemon dataset and a str
representing the pokemon type
. The function returns the average atk
for all the pokemon in the dataset with the given type
. If there are no pokemon of the given type
, return None
.
For the pokemon_test.csv
file, mean_attack_for_type(data, 'fire')
should return 47.5.
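When an average may be taken over zero matching rows, it is safer to decide "no matches" from a count than from the running total, since a total of 0 can be legitimate. The sketch below illustrates this on made-up pets data that includes an age of 0; the function name mean_age_for_species and the rows are hypothetical.

```python
# Hypothetical rows; the chick has a legitimate age of 0.
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Chester', 'age': 1, 'species': 'dog'},
    {'name': 'Peep', 'age': 0, 'species': 'chicken'},
]


def mean_age_for_species(rows, species):
    """
    Returns the average age of pets of the given species, or None if
    there are no pets of that species.
    """
    total = 0
    count = 0
    for row in rows:
        if row['species'] == species:
            total += row['age']
            count += 1
    if count == 0:  # test the count, not the total
        return None
    return total / count


print(mean_age_for_species(pets, 'dog'))      # 2.5
print(mean_age_for_species(pets, 'chicken'))  # 0.0
print(mean_age_for_species(pets, 'hamster'))  # None
```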
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
.
Task: Write in hw2_test.py
one or more test functions that call mean_attack_for_type
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
count_types
¶
Task: Write in hw2_manual.py
a function count_types
that takes a parsed pokemon dataset and returns a dictionary mapping each pokemon type
to the number of pokemon of that type
. The order of the keys in the returned dictionary does not matter.
For the pokemon_test.csv
file, count_types(data)
should return {'fire': 2, 'water': 2}
.
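A dictionary of counts is typically built by starting each new key at 0 and incrementing as rows are visited. The sketch below shows the pattern on the pets example data; the helper name count_by and the hard-coded rows are hypothetical.

```python
# Hypothetical rows in the list-of-dictionaries format (pets example data).
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Meowrty', 'age': 6, 'species': 'cat'},
    {'name': 'Chester', 'age': 1, 'species': 'dog'},
    {'name': 'Phil', 'age': 1, 'species': 'axolotl'},
]


def count_by(rows, column):
    """
    Returns a dictionary mapping each distinct value in the given column
    to the number of rows that have that value.
    """
    counts = {}
    for row in rows:
        key = row[column]
        if key not in counts:
            counts[key] = 0  # first time this value has been seen
        counts[key] += 1
    return counts


print(count_by(pets, 'species'))  # {'dog': 2, 'cat': 1, 'axolotl': 1}
```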
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
. To convert a Pandas Series
to a dict
, use the built-in dict
function. The dictionary keys are determined by the series index.
For example:
# data is a DataFrame storing the following info:
# name,age,species
# Fido,4,dog
# Meowrty,6,cat
# Chester,1,dog
# Phil,1,axolotl
names = data['name'] # Series
dict(names) # {0: 'Fido', 1: 'Meowrty', 2: 'Chester', 3: 'Phil'}
row = data.loc[1] # Series
dict(row) # {'name': 'Meowrty', 'age': 6, 'species': 'cat'}
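Because dict uses the Series index as keys, a grouped result converts directly into a dictionary keyed by group. The sketch below shows one possible approach on the pets data, rebuilt inline so the block is self-contained; counting the name column per group is a choice made for this illustration.

```python
import pandas as pd

# Pets example data rebuilt inline so the sketch is self-contained.
pets = pd.DataFrame({
    'name': ['Fido', 'Meowrty', 'Chester', 'Phil'],
    'age': [4, 6, 1, 1],
    'species': ['dog', 'cat', 'dog', 'axolotl'],
})

# Grouping by species produces a Series whose index is the species name.
counts = pets.groupby('species')['name'].count()

# dict() turns the index into keys and the counts into values.
result = dict(counts)
print(result == {'dog': 2, 'cat': 1, 'axolotl': 1})  # True
```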
Task: Write in hw2_test.py
one or more test functions that call count_types
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
mean_attack_per_type
¶
Task: Write in hw2_manual.py
a function mean_attack_per_type
that takes a parsed pokemon dataset and returns a dictionary mapping each pokemon type
to the average atk
of pokemon of that type
. The order of the keys in the returned dictionary does not matter.
For the pokemon_test.csv
file, mean_attack_per_type(data)
should return {'fire': 47.5, 'water': 140.5}
.
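A per-group average without Pandas can be sketched by tracking a running total and a count for each group, then dividing at the end. The example below uses the pets example data; the helper name mean_by and the hard-coded rows are hypothetical.

```python
# Hypothetical rows in the list-of-dictionaries format (pets example data).
pets = [
    {'name': 'Fido', 'age': 4, 'species': 'dog'},
    {'name': 'Meowrty', 'age': 6, 'species': 'cat'},
    {'name': 'Chester', 'age': 1, 'species': 'dog'},
    {'name': 'Phil', 'age': 1, 'species': 'axolotl'},
]


def mean_by(rows, group_column, value_column):
    """
    Returns a dictionary mapping each distinct value in group_column to
    the average of value_column over the rows in that group.
    """
    totals = {}
    counts = {}
    for row in rows:
        key = row[group_column]
        if key not in totals:
            totals[key] = 0
            counts[key] = 0
        totals[key] += row[value_column]
        counts[key] += 1
    means = {}
    for key in totals:
        means[key] = totals[key] / counts[key]  # counts[key] is at least 1
    return means


print(mean_by(pets, 'species', 'age'))
# {'dog': 2.5, 'cat': 6.0, 'axolotl': 1.0}
```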
Task: Write in hw2_pandas.py
the same function using Pandas. Rather than take a list of dictionaries, this takes a Pandas DataFrame
.
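With Pandas, a per-group average is commonly expressed with groupby followed by mean. The sketch below shows this on the pets data, rebuilt inline as an assumption so the block runs on its own.

```python
import pandas as pd

# Pets example data rebuilt inline so the sketch is self-contained.
pets = pd.DataFrame({
    'name': ['Fido', 'Meowrty', 'Chester', 'Phil'],
    'age': [4, 6, 1, 1],
    'species': ['dog', 'cat', 'dog', 'axolotl'],
})

# One mean per species; the resulting Series is indexed by species.
means = pets.groupby('species')['age'].mean()

result = dict(means)
print(result == {'dog': 2.5, 'cat': 6.0, 'axolotl': 1.0})  # True
```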
Task: Write in hw2_test.py
one or more test functions that call mean_attack_per_type
with some CSV files and compare the output of the program with the expected value using assert_equals
. In addition to the pokemon_test.csv
file, add one additional test case for both the manual and pandas
implementation by creating your own CSV file.
Quality¶
Assessment submissions should pass these checks: flake8
and the code quality guidelines. The code quality guidelines are very thorough. For this assessment, the most relevant rules can be found in these sections (new sections bolded):
-
Boolean Zen
-
Loop Zen
-
Factoring
-
Unnecessary Cases
-
Avoid Looping with Pandas
Reminder
Make sure to provide descriptive comments in docstring format!
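As one possible illustration of a comment in docstring format, the hypothetical function below (not part of this assessment) describes its parameters, its return value, and its behavior on an edge case. The exact wording expected by the code quality guidelines is not shown in this document, so treat the style as an assumption.

```python
def count_long_names(names, min_length):
    """
    Takes a list of str names and an int min_length. Returns the number
    of names that have at least min_length characters. Returns 0 if the
    list is empty.
    """
    count = 0
    for name in names:
        if len(name) >= min_length:
            count += 1
    return count


print(count_long_names(['Fido', 'Meowrty', 'Phil'], 5))  # 1
```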
Submission¶
Submit your work by pressing the Mark button. Submit as often as you want until the deadline for the initial submission. Note that we will only grade your most recent submission. You can view your past submissions using the “Submissions” button.
Please make sure you are familiar with the resources and policies outlined in the syllabus and the take-home assessments page.
THA 2 - Pokemon
Initial Submission by Thursday 07/14 at 11:59 pm.
Submit on Ed