In this part of the homework, you will write code to parse the data from a file and perform various analytical operations on that data. Sounds a bit like déjà vu! The functions you write for this part are exactly the same as before, but this time you will use the pandas library to implement them.
Write your solutions in hw2_pandas.py. You may use the math and pandas modules, but you may not use any other imports to solve these problems. Your solutions should rely on pandas objects rather than loops. The one exception is when a function asks you to return a Python list/dictionary and you need to convert from a pandas object to a Python list/dictionary; in that case you may use one loop at the end of the function to build up the list/dictionary, but this loop should contain no real logic beyond moving values from the pandas structure to the list/dictionary. The goal of this part of the assignment is to use pandas as a tool to help answer questions about your dataset.

First, you will implement a function called parse
to help process the CSV file into a more useful structure for analysis. This function should take the file name of a CSV as an argument and should return a DataFrame with the contents of the file. This function is different from the one in Part 0 since it does not need to take a list of integer columns.
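As a minimal sketch of why no list of integer columns is needed: pandas infers column types while reading a CSV, so a thin wrapper around pd.read_csv may be enough. The sample CSV contents below are made up for illustration, and an in-memory buffer stands in for a real file name.

```python
import io

import pandas as pd


def parse(file_name):
    """Return the contents of the CSV at file_name as a DataFrame."""
    # read_csv infers numeric columns on its own, so the caller
    # does not have to say which columns hold integers.
    return pd.read_csv(file_name)


# Hypothetical data: read_csv also accepts file-like objects,
# which keeps this example self-contained.
sample_csv = io.StringIO("name,level\nPikachu,12\nCharizard,36\n")
sample = parse(sample_csv)
# The level column comes back with an integer dtype, not as strings.
```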
For this step of the assignment, you will implement various functions to answer questions about the dataset. All of the problems are the same as in Part 0, except that the input is now a DataFrame; you can check back to that page to see any examples you might need for this part.

Each function should take the DataFrame returned by the parse function as its first argument, along with any other arguments specified in each problem. For example, for the third function, we would call filter_range(data, 1, 10), where data is the DataFrame returned by parse. No function you write should modify this data structure. Every problem that deals with strings should be case-sensitive (this means "chArIzard" is a different species than "Charizard"). You may assume the DataFrame is non-empty for all functions you implement. For each problem, you may assume we pass parameters of the expected types described for that problem and that those parameters are not None. You should make no other assumptions about the parameters or the data.
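The two rules about case sensitivity and not modifying the input can be illustrated with a small sketch. The DataFrame below is made-up sample data; selecting rows with a boolean mask produces a new object and leaves the original untouched, and string comparison with == is case-sensitive by default.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
data = pd.DataFrame({
    "name": ["Charizard", "chArIzard", "Pikachu"],
    "level": [36, 36, 12],
})
before = data.copy()

# == compares strings exactly, so "chArIzard" does not match.
matches = data[data["name"] == "Charizard"]

# Boolean-mask selection returned a new DataFrame; data is unchanged.
unchanged = data.equals(before)
```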
species_count
Write a function species_count that returns the number of unique Pokemon species (determined by the name attribute) found in the dataset. You may assume that the data is well formatted in the sense that you don't have to transform any values in the name column.
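One possible approach, shown here only as a sketch on made-up data, is Series.nunique, which counts the distinct values in a column.

```python
import pandas as pd


def species_count(data):
    """Return the number of distinct values in the name column."""
    # nunique counts each distinct string once; the comparison is
    # case-sensitive, so "chArIzard" and "Charizard" are separate.
    return data["name"].nunique()


# Hypothetical sample data for illustration only.
data = pd.DataFrame(
    {"name": ["Pikachu", "Charizard", "Pikachu", "chArIzard"]}
)
count = species_count(data)
```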
max_level
Write a function max_level that finds the Pokemon with the max level and returns a tuple of length 2, where the first element is the name of the Pokemon and the second is its level. If there is a tie, the Pokemon that appears earlier in the file should be returned.
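The tie-breaking rule lines up with how Series.idxmax behaves: when several rows share the maximum, it returns the label of the first one. The following is a sketch on made-up data, assuming the DataFrame keeps the default index in file order.

```python
import pandas as pd


def max_level(data):
    """Return (name, level) for the first row with the highest level."""
    # idxmax returns the index label of the first occurrence of the
    # maximum, so earlier rows win ties.
    row = data.loc[data["level"].idxmax()]
    return (row["name"], int(row["level"]))


# Hypothetical sample data: Lapras and Kakuna tie at level 72.
data = pd.DataFrame({
    "name": ["Arcanine", "Lapras", "Kakuna"],
    "level": [35, 72, 72],
})
result = max_level(data)
```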
filter_range
Write a function called filter_range that takes as arguments a smallest (inclusive) and largest (exclusive) level value and returns a list of Pokemon names having a level within that range. The list should contain the species names in the same order that they appear in the provided DataFrame.
Note that you will want to return a Python list for this function, so you will have to convert from a pandas object to a list.
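A half-open range like this can be expressed as a boolean mask combining two comparisons. The sketch below uses made-up data and the parameter names lower and upper, which are assumptions; list() performs the conversion from a pandas Series to a Python list.

```python
import pandas as pd


def filter_range(data, lower, upper):
    """Return names whose level is in [lower, upper), in row order."""
    # >= makes the lower bound inclusive, < makes the upper exclusive.
    in_range = (data["level"] >= lower) & (data["level"] < upper)
    # Masking preserves the original row order.
    return list(data.loc[in_range, "name"])


# Hypothetical sample data for illustration only.
data = pd.DataFrame({
    "name": ["Arcanine", "Lapras", "Kakuna", "Persian"],
    "level": [35, 72, 14, 35],
})
names = filter_range(data, 35, 72)
```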
mean_attack_for_type
Write a function called mean_attack_for_type that takes a Pokemon type (string) as an argument and that returns the average attack stat for all the Pokemon in the dataset with that type.
If there are no Pokemon of the given type, this function should return None.
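The empty case needs explicit handling, because the mean of an empty Series is NaN rather than None. This sketch assumes the columns are named type and attack, which the text does not confirm, and uses made-up data.

```python
import pandas as pd


def mean_attack_for_type(data, pokemon_type):
    """Return the mean attack for one type, or None if it is absent."""
    # Column names "type" and "attack" are assumptions.
    attacks = data.loc[data["type"] == pokemon_type, "attack"]
    if attacks.empty:
        # mean() of an empty Series would be NaN, not None.
        return None
    return attacks.mean()


# Hypothetical sample data for illustration only.
data = pd.DataFrame({
    "type": ["fire", "water", "bug", "fire"],
    "attack": [50, 186, 20, 60],
})
fire_mean = mean_attack_for_type(data, "fire")
missing = mean_attack_for_type(data, "dragon")
```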
count_types
Write a function called count_types that returns a dictionary whose keys are Pokemon types and whose values are the number of times each type appears in the dataset.
The order of the keys in the dictionary does not matter.
Note that you will want to return a Python dictionary for this function, so you will have to convert from a pandas object to a dictionary.
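Counting occurrences per value is what Series.value_counts does, and Series.to_dict handles the conversion to a Python dictionary without a loop. This is a sketch on made-up data; the column name type is an assumption.

```python
import pandas as pd


def count_types(data):
    """Return a dict mapping each type to its number of rows."""
    # value_counts yields a Series indexed by the distinct types;
    # the column name "type" is an assumption.
    return data["type"].value_counts().to_dict()


# Hypothetical sample data for illustration only.
data = pd.DataFrame({"type": ["fire", "water", "bug", "fire"]})
counts = count_types(data)
```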
highest_stage_per_type
Write a function called highest_stage_per_type that calculates the largest stage reached for each type of Pokemon in the dataset. This function should return a dictionary whose keys are the Pokemon types and whose values are the highest value of the stage column for that type of Pokemon.
The order of the keys in the returned dictionary does not matter.
Note that you will want to return a Python dictionary for this function, so you will have to convert from a pandas object to a dictionary.
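A per-group maximum is a natural fit for groupby. The sketch below groups made-up rows by an assumed type column, takes the maximum of the stage column within each group, and converts the result to a dictionary.

```python
import pandas as pd


def highest_stage_per_type(data):
    """Return a dict mapping each type to its largest stage."""
    # groupby splits rows by type; max() then reduces the stage
    # column within each group. "type" is an assumed column name.
    return data.groupby("type")["stage"].max().to_dict()


# Hypothetical sample data for illustration only.
data = pd.DataFrame({
    "type": ["fire", "water", "bug", "fire"],
    "stage": [2, 1, 1, 1],
})
stages = highest_stage_per_type(data)
```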
mean_attack_per_type
Write a function called mean_attack_per_type that calculates the average attack for every type of Pokemon in the dataset. This function should return a dictionary whose keys are the Pokemon types and whose values are the average attack for that Pokemon type.
The order of the keys in the dictionary does not matter.
Note that you will want to return a Python dictionary for this function, so you will have to convert from a pandas object to a dictionary.
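The same groupby pattern extends to averages by swapping the reduction. As before, this is only a sketch on made-up data, and the column names type and attack are assumptions.

```python
import pandas as pd


def mean_attack_per_type(data):
    """Return a dict mapping each type to its mean attack."""
    # Same split-then-reduce shape as a per-group max, with mean()
    # as the reduction. Column names are assumptions.
    return data.groupby("type")["attack"].mean().to_dict()


# Hypothetical sample data for illustration only.
data = pd.DataFrame({
    "type": ["fire", "water", "bug", "fire"],
    "attack": [50, 186, 20, 60],
})
means = mean_attack_per_type(data)
```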