Pandas Practice Continued & Data Visualization

In this lesson, we will practice using groupby and the data visualization tools we learned this week.

In [12]:
import pandas as pd
import seaborn as sns

Our Dataset!

Run the cell below to see the DataFrame you'll be working with today.

In [13]:
scoreboard = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal",
               "Jasmine", "Kevin", "Alessia"],
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue", "Cheese Land",
                      "Toad Harbor", "Mario Circuit", "Bowser's Castle", "Mario Circuit",
                      "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Mushrooms": [2, 0, 3, 1, 2, 2, 0, 3, 3, 1, 2, 3],
    "TopSpeed": [150, 70, 60, 125, 30, 20, 80, 94, 10, 77, 23, 49],
    "Character": ["Monty Mole", "Yoshi", "Luigi", "Blue Toad", "Toadette", "Princess Peach", 
                  "Princess Daisy", "Waluigi", "King Boo", "Bowser", "Mario", "Wario"],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler", "Car", "Bike", 
                   "Stroller", "4 wheeler", "Bike", "Bike"],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful", "Resourceful",
                  "Balanced", "Aggressive", "Balanced", "Balanced", "Balanced", "Resourceful"]
})

scoreboard
Out[13]:
Player FavoriteTrack Coins Mushrooms TopSpeed Character Drivetrain Playstyle
0 Arona Bowser's Castle 9 2 150 Monty Mole Bike Aggressive
1 Hannah Rainbow Road 8 0 70 Yoshi Car Aggressive
2 Renusree Toad Harbor 8 3 60 Luigi 4 wheeler Resourceful
3 Arpan Big Blue 9 1 125 Blue Toad Car Speedster
4 Mia Cheese Land 9 2 30 Toadette Stroller Resourceful
5 Asmi Toad Harbor 10 2 20 Princess Peach 4 wheeler Resourceful
6 Alyssa Mario Circuit 9 0 80 Princess Daisy Car Balanced
7 Vani Bowser's Castle 7 3 94 Waluigi Bike Aggressive
8 Vatsal Mario Circuit 9 3 10 King Boo Stroller Balanced
9 Jasmine Coconut Mall 8 1 77 Bowser 4 wheeler Balanced
10 Kevin Mario Circuit 9 2 23 Mario Bike Balanced
11 Alessia Cheese Land 10 3 49 Wario Bike Resourceful

Group Activity

Find the Players with the most Coins in each Drivetrain.

In [14]:
# TODO: Who has the most coins per Drivetrain group?
scoreboard.loc[scoreboard.groupby('Drivetrain')['Coins'].idxmax(), ['Player', 'Drivetrain', 'Coins']]
Out[14]:
Player Drivetrain Coins
5 Asmi 4 wheeler 10
11 Alessia Bike 10
3 Arpan Car 9
4 Mia Stroller 9
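
Note that idxmax returns only the first row holding each group's maximum, so players tied for the top spot are left out. A minimal sketch of a tie-inclusive variant follows; it assumes a trimmed copy of the scoreboard (the name `mini` is hypothetical) holding only the columns needed here.

```python
import pandas as pd

# Trimmed, hypothetical copy of the scoreboard with only the relevant columns.
mini = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa",
               "Vani", "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler",
                   "Car", "Bike", "Stroller", "4 wheeler", "Bike", "Bike"],
})

# Broadcast each Drivetrain's maximum Coins back onto every row of that group.
group_max = mini.groupby("Drivetrain")["Coins"].transform("max")

# Keep every row that equals its group's maximum, so ties are retained.
top_players = (
    mini.loc[mini["Coins"] == group_max, ["Player", "Drivetrain", "Coins"]]
    .sort_values(["Drivetrain", "Player"])
)
print(top_players)
```

With this data, the Car and Stroller groups each have two players tied at 9 Coins, so the result has six rows instead of four.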

Count how many Players in each Playstyle category like each FavoriteTrack.

In [15]:
# TODO: Which Playstyle likes Cheese Land the most?
scoreboard.groupby(['Playstyle', 'FavoriteTrack']).size()
Out[15]:
Playstyle    FavoriteTrack  
Aggressive   Bowser's Castle    2
             Rainbow Road       1
Balanced     Coconut Mall       1
             Mario Circuit      3
Resourceful  Cheese Land        2
             Toad Harbor        2
Speedster    Big Blue           1
dtype: int64
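
To answer the Cheese Land question directly, one option is to lay the counts out as a table and read off a single column. The sketch below assumes a trimmed copy of the scoreboard (the name `mini` is hypothetical) and uses pd.crosstab, which fills missing Playstyle and FavoriteTrack combinations with 0.

```python
import pandas as pd

# Trimmed, hypothetical copy of the scoreboard with only the relevant columns.
mini = pd.DataFrame({
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue",
                      "Cheese Land", "Toad Harbor", "Mario Circuit", "Bowser's Castle",
                      "Mario Circuit", "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced",
                  "Balanced", "Resourceful"],
})

# Rows are Playstyles, columns are FavoriteTracks, cells are player counts.
counts = pd.crosstab(mini["Playstyle"], mini["FavoriteTrack"])

# Playstyle with the largest count in the Cheese Land column.
cheese_land_fans = counts["Cheese Land"]
top_playstyle = cheese_land_fans.idxmax()
print(cheese_land_fans)
print(top_playstyle)
```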

Create both a line plot and a scatter plot to visualize TopSpeed trends by Playstyle. Compare the effectiveness of each in identifying patterns or outliers.

In [16]:
# TODO: Visualize!
sns.relplot(data=scoreboard, x='Playstyle', y='TopSpeed', kind='line')
sns.relplot(data=scoreboard, x='Playstyle', y='TopSpeed', kind='scatter')
Out[16]:
<seaborn.axisgrid.FacetGrid at 0x7f65efcbc890>
[Figure: line plot of TopSpeed by Playstyle]
[Figure: scatter plot of TopSpeed by Playstyle]
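
When comparing the two plots, it helps to remember that, by default, a seaborn line plot collapses repeated x values to their mean (with a confidence band), while a scatter plot draws every player. The sketch below, which assumes a trimmed copy of the scoreboard (the name `mini` is hypothetical), tabulates the per-Playstyle summary so the mean that the line follows can be checked against the spread that the scatter shows.

```python
import pandas as pd

# Trimmed, hypothetical copy of the scoreboard with only the relevant columns.
mini = pd.DataFrame({
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced",
                  "Balanced", "Resourceful"],
    "TopSpeed": [150, 70, 60, 125, 30, 20, 80, 94, 10, 77, 23, 49],
})

# "mean" is the value a default line plot draws for each Playstyle;
# "min" and "max" bound the individual points a scatter plot shows.
summary = mini.groupby("Playstyle")["TopSpeed"].agg(["count", "mean", "min", "max"])
print(summary)
```

Speedster has only one player, so its point on the line is a single observation rather than an average, which is easy to miss on the line plot and obvious on the scatter plot.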

What if we wanted to index our DataFrame by two columns at once? Set the index to Drivetrain and FavoriteTrack, then find all the Bikers who like Bowser's Castle.

In [17]:
# TODO: Which Bikers like Bowser's?
sns.catplot(data=scoreboard, x='Playstyle', y='TopSpeed', kind='bar', hue='Drivetrain')
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7f65efd31290>
[Figure: bar plot of TopSpeed by Playstyle, colored by Drivetrain]
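
A minimal sketch of the two-column index described in the prompt is shown here. It assumes a trimmed copy of the scoreboard (the name `mini` is hypothetical) and uses set_index without inplace, so the original frame keeps its default index.

```python
import pandas as pd

# Trimmed, hypothetical copy of the scoreboard with only the relevant columns.
mini = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa",
               "Vani", "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "FavoriteTrack": ["Bowser's Castle", "Rainbow Road", "Toad Harbor", "Big Blue",
                      "Cheese Land", "Toad Harbor", "Mario Circuit", "Bowser's Castle",
                      "Mario Circuit", "Coconut Mall", "Mario Circuit", "Cheese Land"],
    "Drivetrain": ["Bike", "Car", "4 wheeler", "Car", "Stroller", "4 wheeler",
                   "Car", "Bike", "Stroller", "4 wheeler", "Bike", "Bike"],
})

# Build a two-level (Drivetrain, FavoriteTrack) index; sorting keeps lookups efficient.
indexed = mini.set_index(["Drivetrain", "FavoriteTrack"]).sort_index()

# Select rows by a tuple with one value per index level; ":" keeps all columns.
bowser_bikers = indexed.loc[("Bike", "Bowser's Castle"), :]
print(bowser_bikers)
```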

Whole Class Activity

Write a function players_above_average that calculates the average Coins for each Playstyle, then lists the Players whose Coins are above the average of the inputted Playstyle.

In [18]:
# Run this cell once before writing your function!
scoreboard.reset_index(inplace=True)
In [19]:
# TODO: Identify standout TAs!

def players_above_average(data, playstyle):
    # find average coins for each playstyle
    avg_score = data.groupby("Playstyle")["Coins"].mean()

    # find average of the inputted playstyle
    playstyle_avg_score = avg_score[playstyle]

    # return filtered dataframe
    above_avg_players = data[
        (data["Playstyle"] == playstyle) & (data["Coins"] > playstyle_avg_score)
    ]

    return above_avg_players



players_above_average(scoreboard, "Resourceful")
Out[19]:
index Player FavoriteTrack Coins Mushrooms TopSpeed Character Drivetrain Playstyle
5 5 Asmi Toad Harbor 10 2 20 Princess Peach 4 wheeler Resourceful
11 11 Alessia Cheese Land 10 3 49 Wario Bike Resourceful
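
As a cross-check on the function, the same comparison can be run for every Playstyle at once with a groupby transform. This is a sketch of an alternative, not a replacement for the function above, and it assumes a trimmed copy of the scoreboard (the name `mini` is hypothetical).

```python
import pandas as pd

# Trimmed, hypothetical copy of the scoreboard with only the relevant columns.
mini = pd.DataFrame({
    "Player": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa",
               "Vani", "Vatsal", "Jasmine", "Kevin", "Alessia"],
    "Coins": [9, 8, 8, 9, 9, 10, 9, 7, 9, 8, 9, 10],
    "Playstyle": ["Aggressive", "Aggressive", "Resourceful", "Speedster", "Resourceful",
                  "Resourceful", "Balanced", "Aggressive", "Balanced", "Balanced",
                  "Balanced", "Resourceful"],
})

# Attach each player's own Playstyle average to their row.
playstyle_mean = mini.groupby("Playstyle")["Coins"].transform("mean")

# Keep players whose Coins are strictly above that average.
standouts = mini[mini["Coins"] > playstyle_mean]
print(standouts)
```

The Resourceful rows agree with the function's output (Asmi and Alessia). Speedster contributes nobody, because its only player sits exactly at the group average.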