Pandas Practice Continued & Data Visualization
In this lesson, we will practice using `groupby` and the data visualization tools learned this week.
```python
import pandas as pd
import seaborn as sns
```
Our Dataset!
Run the cell below to see the DataFrame you'll be working with today.
```python
scoreboard = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal",
               "Jasmine", "Kevin", "Alessia"],
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue", "Cheese Land",
                      "Toad Harbor", "Mario Circuit", "Bowser's Castle", "Mario Circuit",
                      "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Mushrooms": [2, 0, 3, 1, 2, 2, 0, 3, 3, 1, 2, 3],
    "TopSpeed": [150, 70, 60, 125, 30, 20, 80, 94, 10, 77, 23, 49],
    "Character": ["Monty Mole", "Yoshi", "Luigi", "Blue Toad", "Toadette", "Princess Peach",
                  "Princess Daisy", "Waluigi", "King Boo", "Bowser", "Mario", "Wario"],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler", "Car", "Bike",
                   "Stroller", "4 wheeler", "Bike", "Bike"],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful", "Resourceful",
                  "Balanced", "Aggressive", "Balanced", "Balanced", "Balanced", "Resourceful"]
})
scoreboard
```
| | Player | FavoriteTrack | Coins | Mushrooms | TopSpeed | Character | Drivetrain | Playstyle |
|---|---|---|---|---|---|---|---|---|
| 0 | Arona | Bowser's Castle | 9 | 2 | 150 | Monty Mole | Bike | Aggressive |
| 1 | Hannah | Rainbow Road | 8 | 0 | 70 | Yoshi | Car | Aggressive |
| 2 | Renusree | Toad Harbor | 8 | 3 | 60 | Luigi | 4 wheeler | Resourceful |
| 3 | Arpan | Big Blue | 9 | 1 | 125 | Blue Toad | Car | Speedster |
| 4 | Mia | Cheese Land | 9 | 2 | 30 | Toadette | Stroller | Resourceful |
| 5 | Asmi | Toad Harbor | 10 | 2 | 20 | Princess Peach | 4 wheeler | Resourceful |
| 6 | Alyssa | Mario Circuit | 9 | 0 | 80 | Princess Daisy | Car | Balanced |
| 7 | Vani | Bowser's Castle | 7 | 3 | 94 | Waluigi | Bike | Aggressive |
| 8 | Vatsal | Mario Circuit | 9 | 3 | 10 | King Boo | Stroller | Balanced |
| 9 | Jasmine | Coconut Mall | 8 | 1 | 77 | Bowser | 4 wheeler | Balanced |
| 10 | Kevin | Mario Circuit | 9 | 2 | 23 | Mario | Bike | Balanced |
| 11 | Alessia | Cheese Land | 10 | 3 | 49 | Wario | Bike | Resourceful |
Group Activity
Find the `Player`s with the most `Coins` in each `Drivetrain`.
```python
# TODO: Who has the most coins per Drivetrain group?
scoreboard.loc[scoreboard.groupby('Drivetrain')['Coins'].idxmax(), ['Player', 'Drivetrain', 'Coins']]
```
| | Player | Drivetrain | Coins |
|---|---|---|---|
| 5 | Asmi | 4 wheeler | 10 |
| 11 | Alessia | Bike | 10 |
| 3 | Arpan | Car | 9 |
| 4 | Mia | Stroller | 9 |
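`idxmax` returns only the first row label that reaches each group's maximum, so tied players are silently dropped. In the table above, Alyssa (Car, 9 coins) ties with Arpan and Vatsal (Stroller, 9 coins) ties with Mia, but neither appears. A minimal tie-aware sketch compares each row against its group's maximum, broadcast with `groupby(...).transform('max')`. So that it runs on its own, the snippet rebuilds only the columns it needs from `scoreboard`.

```python
import pandas as pd

# Assumption: only the columns needed here, copied from the scoreboard above
scoreboard = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani",
               "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler", "Car", "Bike",
                   "Stroller", "4 wheeler", "Bike", "Bike"],
})

# transform('max') gives every row the maximum Coins of its own Drivetrain group
group_max = scoreboard.groupby("Drivetrain")["Coins"].transform("max")

# Keep every row that equals its group's maximum, so all tied players survive
top_coins = scoreboard.loc[scoreboard["Coins"] == group_max, ["Player", "Drivetrain", "Coins"]]
print(top_coins)
```

Choose `idxmax` when exactly one player per group is wanted. Choose the `transform` mask when "the Players with the most Coins" should include everyone who ties.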
Count how many `Player`s in each `Playstyle` category like each `FavoriteTrack`.
```python
# TODO: Which Playstyle likes Cheese Land the most?
scoreboard.groupby(['Playstyle', 'FavoriteTrack']).size()
```
```
Playstyle    FavoriteTrack
Aggressive   Bowser's Castle    2
             Rainbow Road       1
Balanced     Coconut Mall       1
             Mario Circuit      3
Resourceful  Cheese Land        2
             Toad Harbor        2
Speedster    Big Blue           1
dtype: int64
```
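The Cheese Land question can also be answered without reading the grouped counts by eye. One option, not necessarily the intended solution, is to `unstack` the counts into a Playstyle-by-FavoriteTrack table, fill missing pairs with 0, and take `idxmax` of the Cheese Land column. `pd.crosstab(scoreboard['Playstyle'], scoreboard['FavoriteTrack'])` should build an equivalent table. The snippet rebuilds only the two columns it needs so that it runs on its own.

```python
import pandas as pd

# Assumption: only the columns needed here, copied from the scoreboard above
scoreboard = pd.DataFrame({
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue", "Cheese Land",
                      "Toad Harbor", "Mario Circuit", "Bowser's Castle", "Mario Circuit",
                      "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced", "Balanced",
                  "Resourceful"],
})

# Rows: Playstyle, columns: FavoriteTrack, cells: number of players (0 where nobody)
track_counts = scoreboard.groupby(["Playstyle", "FavoriteTrack"]).size().unstack(fill_value=0)

# Playstyle with the largest Cheese Land count (idxmax picks the first one if several tie)
cheese_fans = track_counts["Cheese Land"].idxmax()
print(cheese_fans)
```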
Create both a line plot and a scatter plot to visualize `TopSpeed` trends by `Playstyle`. Compare how effective each one is at revealing patterns or outliers.
```python
# TODO: Visualize!
sns.relplot(data=scoreboard, x='Playstyle', y='TopSpeed', kind='line')
sns.relplot(data=scoreboard, x='Playstyle', y='TopSpeed', kind='scatter')
```
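By default, seaborn's `kind='line'` aggregates repeated x values to their mean and draws a confidence band around it, while `kind='scatter'` draws one point per player. A numeric summary can help check what each plot is actually showing. A minimal sketch with `groupby(...).agg`, again rebuilding only the needed columns:

```python
import pandas as pd

# Assumption: only the columns needed here, copied from the scoreboard above
scoreboard = pd.DataFrame({
    "TopSpeed": [150, 70, 60, 125, 30, 20, 80, 94, 10, 77, 23, 49],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced", "Balanced",
                  "Resourceful"],
})

# mean: the value the line plot connects; count/min/max: group size and the spread the scatter plot shows
speed_summary = scoreboard.groupby("Playstyle")["TopSpeed"].agg(["count", "mean", "min", "max"])
print(speed_summary)
```

The Speedster group has a single player, so whatever either plot shows for it rests on one value.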
What if we wanted to set the index of our DataFrame to be two columns? Set the index to `Drivetrain` and `FavoriteTrack`, and then find all the `Bike` riders who like `Bowser's Castle`.
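```python
# TODO: Which Bikers like Bowser's?
# Index a copy by (Drivetrain, FavoriteTrack); sort_index() avoids a PerformanceWarning on lookup
by_track = scoreboard.set_index(['Drivetrain', 'FavoriteTrack']).sort_index()
by_track.loc[('Bike', "Bowser's Castle")]
```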
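A boolean mask on the original columns answers the same question without touching the index. The sketch below runs both the two-level index lookup and the mask, and checks that they find the same players. Sorting the MultiIndex first is optional. An unsorted MultiIndex can make `.loc` emit pandas' "indexing past lexsort depth" PerformanceWarning. The snippet rebuilds only the needed columns.

```python
import pandas as pd

# Assumption: only the columns needed here, copied from the scoreboard above
scoreboard = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani",
               "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue", "Cheese Land",
                      "Toad Harbor", "Mario Circuit", "Bowser's Castle", "Mario Circuit",
                      "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler", "Car", "Bike",
                   "Stroller", "4 wheeler", "Bike", "Bike"],
})

# Approach 1: two-level index, then look up a single (Drivetrain, FavoriteTrack) pair
by_track = scoreboard.set_index(["Drivetrain", "FavoriteTrack"]).sort_index()
multiindex_riders = by_track.loc[("Bike", "Bowser's Castle"), "Player"]

# Approach 2: boolean mask on the unchanged columns
mask = (scoreboard["Drivetrain"] == "Bike") & (scoreboard["FavoriteTrack"] == "Bowser's Castle")
mask_riders = scoreboard.loc[mask, "Player"]

print(sorted(multiindex_riders), sorted(mask_riders))
```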
Whole Class Activity
Write a function `players_above_average` that calculates the average `Coins` for each `Playstyle`, then lists the `Player`s whose `Coins` are above the average for the given `Playstyle`.
```python
# Run this cell once before writing your function!
scoreboard.reset_index(inplace=True)
```
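`reset_index` moves the current index into ordinary columns. If the frame still has its default integer index (0 to 11 here), that index becomes a new column named `index`. This is consistent with the extra `index` column in the final output of this section. If an `index` column already exists, pandas names the new column `level_0` instead, which is presumably why the cell says to run it only once. The sketch uses a hypothetical three-row frame to show both cases, plus `drop=True`, which discards the index instead of keeping it.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate reset_index
df = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree"],
    "Drivetrain": ["Bike", "Car", "4 wheeler"],
    "Coins": [9, 8, 8],
})

# Default RangeIndex: reset_index inserts it as a column called "index"
with_default = df.reset_index()
print(list(with_default.columns))  # ['index', 'Player', 'Drivetrain', 'Coins']

# Drivetrain moved into the index: reset_index turns it back into a column
restored = df.set_index("Drivetrain").reset_index()
print(list(restored.columns))  # ['Drivetrain', 'Player', 'Coins']

# drop=True throws the index away instead of inserting it as a column
dropped = df.reset_index(drop=True)
print(list(dropped.columns))  # ['Player', 'Drivetrain', 'Coins']
```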
```python
# TODO: Identify standout TAs!
def players_above_average(data, playstyle):
    # find average coins for each playstyle
    avg_score = data.groupby("Playstyle")["Coins"].mean()
    # find average of the inputted playstyle
    playstyle_avg_score = avg_score[playstyle]
    # return filtered dataframe
    above_avg_players = data[
        (data["Playstyle"] == playstyle) & (data["Coins"] > playstyle_avg_score)
    ]
    return above_avg_players

players_above_average(scoreboard, "Resourceful")
```
| | index | Player | FavoriteTrack | Coins | Mushrooms | TopSpeed | Character | Drivetrain | Playstyle |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 5 | Asmi | Toad Harbor | 10 | 2 | 20 | Princess Peach | 4 wheeler | Resourceful |
| 11 | 11 | Alessia | Cheese Land | 10 | 3 | 49 | Wario | Bike | Resourceful |
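`players_above_average` handles one playstyle per call. A minimal alternative sketch, not the required solution, flags above-average players for every playstyle at once. It broadcasts each group's mean onto its rows with `transform('mean')`. Like the function, it uses a strict `>`, so a player exactly at the average is excluded. For example, the lone Speedster's 9 coins are the group mean. The snippet rebuilds only the needed columns.

```python
import pandas as pd

# Assumption: only the columns needed here, copied from the scoreboard above
scoreboard = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani",
               "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced", "Balanced",
                  "Resourceful"],
})

# Each row gets the mean Coins of its own Playstyle
playstyle_mean = scoreboard.groupby("Playstyle")["Coins"].transform("mean")

# Strictly above the group's average, matching the comparison in players_above_average
above_avg = scoreboard[scoreboard["Coins"] > playstyle_mean]
print(above_avg[["Player", "Playstyle", "Coins"]])

# Filtering to one playstyle reproduces a single players_above_average call
resourceful = above_avg[above_avg["Playstyle"] == "Resourceful"]
```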