Pandas Practice¶
In this lesson, we will practice working with DataFrames.
import pandas as pd
import doctest
Our Dataset!¶
Run the cell below to see the DataFrame you'll be working with today. It is a dataset about the competing bakers at the Great Seattle Bake Off.
great_seattle_bakeoff = pd.DataFrame({
"BakerID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111],
"Name": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal", "Laura", "Suh Young"],
"DessertBaked": ["red velvet cake", "basque cheesecake", "baumkuchen", "pound cake", "german chocolate cake", "victoria sponge cake", "birthday cake", "carrot cake", "matcha cake", "tres leches", "fruit cake"],
"FlavorScore": [60, 92, 78, 40, 38, 73, 80, 59, 75, 99, 0],
"PresentationScore": [95, 80, 88, 92, 98, 100, 98, 60, 77, 100, 100],
"CityOfOrigin": ["Paris", "New York City", "Florence", "Seattle", "Seattle", "Paris", "New York City", "Seattle", "Paris", "New York City", "Florence"],
"StartedBaking": [2006, 2007, 2010, 2020, 2020, 2015, 2019, 2013, 2018, 2018, 1325]
})
great_seattle_bakeoff
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | |
---|---|---|---|---|---|---|---|
0 | 101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 |
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 |
3 | 104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 |
4 | 105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 |
6 | 107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 |
7 | 108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 |
10 | 111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 |
Individual Activity¶
Display all the column names in the cell below.
great_seattle_bakeoff.columns
Index(['BakerID', 'Name', 'DessertBaked', 'FlavorScore', 'PresentationScore', 'CityOfOrigin', 'StartedBaking'], dtype='object')
Then, select the bakers who recieved more than 70 points on their cake's flavor.
great_seattle_bakeoff.loc[great_seattle_bakeoff['FlavorScore'] > 70]
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | |
---|---|---|---|---|---|---|---|
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 |
6 | 107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 |
Group Activity¶
Add a new column called CreativityScore
with the values [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff['CreativityScore'] = [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | CreativityScore | |
---|---|---|---|---|---|---|---|---|
0 | 101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 | 100 |
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 | 80 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 | 70 |
3 | 104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 | 90 |
4 | 105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 | 100 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 | 20 |
6 | 107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 | 80 |
7 | 108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 | 90 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 | 100 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 | 70 |
10 | 111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 | 70 |
Now, calculate the mean of the FlavorScore
, PresentationScore
, and CreativityScore
column, and store that value under a new column, TotalScore
.
great_seattle_bakeoff['TotalScore'] = great_seattle_bakeoff[["FlavorScore",
"PresentationScore",
"CreativityScore"]].mean(axis=1)
great_seattle_bakeoff
BakerID | Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | CreativityScore | TotalScore | |
---|---|---|---|---|---|---|---|---|---|
0 | 101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 | 100 | 85.000000 |
1 | 102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 | 80 | 84.000000 |
2 | 103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 | 70 | 78.666667 |
3 | 104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 | 90 | 74.000000 |
4 | 105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 | 100 | 78.666667 |
5 | 106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 | 20 | 64.333333 |
6 | 107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 | 80 | 86.000000 |
7 | 108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 | 90 | 69.666667 |
8 | 109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 | 100 | 84.000000 |
9 | 110 | Laura | tres leches | 99 | 100 | New York City | 2018 | 70 | 89.666667 |
10 | 111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 | 70 | 56.666667 |
Next, change the index of the DataFrame to the BakerID
and find what Baker 105's CityOfOrigin
is. What about their StartedBaking
year?
great_seattle_bakeoff = great_seattle_bakeoff.set_index("BakerID")
great_seattle_bakeoff
Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | CreativityScore | TotalScore | |
---|---|---|---|---|---|---|---|---|
BakerID | ||||||||
101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 | 100 | 85.000000 |
102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 | 80 | 84.000000 |
103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 | 70 | 78.666667 |
104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 | 90 | 74.000000 |
105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 | 100 | 78.666667 |
106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 | 20 | 64.333333 |
107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 | 80 | 86.000000 |
108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 | 90 | 69.666667 |
109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 | 100 | 84.000000 |
110 | Laura | tres leches | 99 | 100 | New York City | 2018 | 70 | 89.666667 |
111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 | 70 | 56.666667 |
# What is Baker 105's city of origin and year they started baking?
great_seattle_bakeoff.loc[[105]][["CityOfOrigin", "StartedBaking"]]
CityOfOrigin | StartedBaking | |
---|---|---|
BakerID | ||
105 | Seattle | 2020 |
Whole Class Activity¶
Write a function advancing_bakers
that advances bakers into "Advanced" if their TotalScore
is 85 or above, and "Eliminated" otherwise. Apply this function to create a new column CompetitionStatus
.
def advancing_bakers(data):
"""
Returns the bakers who will advance to the next round of the Great Seattle Bake Off.
Parameters:
- DataFrame: data of a round of the Great Seattle Bake Off
>>> advancing_bakers(great_seattle_bakeoff).loc[101]["CompetitionStatus"]
'Advanced'
"""
advanced = data["TotalScore"] >= 85
eliminated = data["TotalScore"] < 85
data.loc[advanced, "CompetitionStatus"] = "Advanced"
data.loc[eliminated, "CompetitionStatus"] = "Eliminated"
return data
doctest.run_docstring_examples(advancing_bakers, globals())
great_seattle_bakeoff
Name | DessertBaked | FlavorScore | PresentationScore | CityOfOrigin | StartedBaking | CreativityScore | TotalScore | CompetitionStatus | |
---|---|---|---|---|---|---|---|---|---|
BakerID | |||||||||
101 | Arona | red velvet cake | 60 | 95 | Paris | 2006 | 100 | 85.000000 | Advanced |
102 | Hannah | basque cheesecake | 92 | 80 | New York City | 2007 | 80 | 84.000000 | Eliminated |
103 | Renusree | baumkuchen | 78 | 88 | Florence | 2010 | 70 | 78.666667 | Eliminated |
104 | Arpan | pound cake | 40 | 92 | Seattle | 2020 | 90 | 74.000000 | Eliminated |
105 | Mia | german chocolate cake | 38 | 98 | Seattle | 2020 | 100 | 78.666667 | Eliminated |
106 | Asmi | victoria sponge cake | 73 | 100 | Paris | 2015 | 20 | 64.333333 | Eliminated |
107 | Alyssa | birthday cake | 80 | 98 | New York City | 2019 | 80 | 86.000000 | Advanced |
108 | Vani | carrot cake | 59 | 60 | Seattle | 2013 | 90 | 69.666667 | Eliminated |
109 | Vatsal | matcha cake | 75 | 77 | Paris | 2018 | 100 | 84.000000 | Eliminated |
110 | Laura | tres leches | 99 | 100 | New York City | 2018 | 70 | 89.666667 | Advanced |
111 | Suh Young | fruit cake | 0 | 100 | Florence | 1325 | 70 | 56.666667 | Eliminated |