Pandas Practice¶

In this lesson, we will practice working with DataFrames.

In [1]:
import pandas as pd
import doctest

Our Dataset!¶

Run the cell below to see the DataFrame you'll be working with today. It is a dataset about the competing bakers at the Great Seattle Bake Off.

In [2]:
great_seattle_bakeoff = pd.DataFrame({
    "BakerID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111],
    "Name": ["Arona", "Hannah", "Renusree", "Arpan", "Mia", "Asmi", "Alyssa", "Vani", "Vatsal", "Laura", "Suh Young"],
    "DessertBaked": ["red velvet cake", "basque cheesecake", "baumkuchen", "pound cake", "german chocolate cake", "victoria sponge cake", "birthday cake", "carrot cake", "matcha cake", "tres leches", "fruit cake"],
    "FlavorScore": [60, 92, 78, 40, 38, 73, 80, 59, 75, 99, 0],
    "PresentationScore": [95, 80, 88, 92, 98, 100, 98, 60, 77, 100, 100],
    "CityOfOrigin": ["Paris", "New York City", "Florence", "Seattle", "Seattle", "Paris", "New York City", "Seattle", "Paris", "New York City", "Florence"],
    "StartedBaking": [2006, 2007, 2010, 2020, 2020, 2015, 2019, 2013, 2018, 2018, 1325]
})

great_seattle_bakeoff
Out[2]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking
0 101 Arona red velvet cake 60 95 Paris 2006
1 102 Hannah basque cheesecake 92 80 New York City 2007
2 103 Renusree baumkuchen 78 88 Florence 2010
3 104 Arpan pound cake 40 92 Seattle 2020
4 105 Mia german chocolate cake 38 98 Seattle 2020
5 106 Asmi victoria sponge cake 73 100 Paris 2015
6 107 Alyssa birthday cake 80 98 New York City 2019
7 108 Vani carrot cake 59 60 Seattle 2013
8 109 Vatsal matcha cake 75 77 Paris 2018
9 110 Laura tres leches 99 100 New York City 2018
10 111 Suh Young fruit cake 0 100 Florence 1325

Individual Activity¶

Display all the column names in the cell below.

In [3]:
great_seattle_bakeoff.columns
Out[3]:
Index(['BakerID', 'Name', 'DessertBaked', 'FlavorScore', 'PresentationScore',
       'CityOfOrigin', 'StartedBaking'],
      dtype='object')

Then, select the bakers who recieved more than 70 points on their cake's flavor.

In [4]:
great_seattle_bakeoff.loc[great_seattle_bakeoff['FlavorScore'] > 70]
Out[4]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking
1 102 Hannah basque cheesecake 92 80 New York City 2007
2 103 Renusree baumkuchen 78 88 Florence 2010
5 106 Asmi victoria sponge cake 73 100 Paris 2015
6 107 Alyssa birthday cake 80 98 New York City 2019
8 109 Vatsal matcha cake 75 77 Paris 2018
9 110 Laura tres leches 99 100 New York City 2018

Group Activity¶

Add a new column called CreativityScore with the values [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]

In [5]:
great_seattle_bakeoff['CreativityScore'] = [100, 80, 70, 90, 100, 20, 80, 90, 100, 70, 70]
great_seattle_bakeoff
Out[5]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking CreativityScore
0 101 Arona red velvet cake 60 95 Paris 2006 100
1 102 Hannah basque cheesecake 92 80 New York City 2007 80
2 103 Renusree baumkuchen 78 88 Florence 2010 70
3 104 Arpan pound cake 40 92 Seattle 2020 90
4 105 Mia german chocolate cake 38 98 Seattle 2020 100
5 106 Asmi victoria sponge cake 73 100 Paris 2015 20
6 107 Alyssa birthday cake 80 98 New York City 2019 80
7 108 Vani carrot cake 59 60 Seattle 2013 90
8 109 Vatsal matcha cake 75 77 Paris 2018 100
9 110 Laura tres leches 99 100 New York City 2018 70
10 111 Suh Young fruit cake 0 100 Florence 1325 70

Now, calculate the mean of the FlavorScore, PresentationScore, and CreativityScore column, and store that value under a new column, TotalScore.

In [6]:
great_seattle_bakeoff['TotalScore'] = great_seattle_bakeoff[["FlavorScore", 
                                                             "PresentationScore", 
                                                             "CreativityScore"]].mean(axis=1)
great_seattle_bakeoff
Out[6]:
BakerID Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking CreativityScore TotalScore
0 101 Arona red velvet cake 60 95 Paris 2006 100 85.000000
1 102 Hannah basque cheesecake 92 80 New York City 2007 80 84.000000
2 103 Renusree baumkuchen 78 88 Florence 2010 70 78.666667
3 104 Arpan pound cake 40 92 Seattle 2020 90 74.000000
4 105 Mia german chocolate cake 38 98 Seattle 2020 100 78.666667
5 106 Asmi victoria sponge cake 73 100 Paris 2015 20 64.333333
6 107 Alyssa birthday cake 80 98 New York City 2019 80 86.000000
7 108 Vani carrot cake 59 60 Seattle 2013 90 69.666667
8 109 Vatsal matcha cake 75 77 Paris 2018 100 84.000000
9 110 Laura tres leches 99 100 New York City 2018 70 89.666667
10 111 Suh Young fruit cake 0 100 Florence 1325 70 56.666667

Next, change the index of the DataFrame to the BakerID and find what Baker 105's CityOfOrigin is. What about their StartedBaking year?

In [7]:
great_seattle_bakeoff = great_seattle_bakeoff.set_index("BakerID")
great_seattle_bakeoff
Out[7]:
Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking CreativityScore TotalScore
BakerID
101 Arona red velvet cake 60 95 Paris 2006 100 85.000000
102 Hannah basque cheesecake 92 80 New York City 2007 80 84.000000
103 Renusree baumkuchen 78 88 Florence 2010 70 78.666667
104 Arpan pound cake 40 92 Seattle 2020 90 74.000000
105 Mia german chocolate cake 38 98 Seattle 2020 100 78.666667
106 Asmi victoria sponge cake 73 100 Paris 2015 20 64.333333
107 Alyssa birthday cake 80 98 New York City 2019 80 86.000000
108 Vani carrot cake 59 60 Seattle 2013 90 69.666667
109 Vatsal matcha cake 75 77 Paris 2018 100 84.000000
110 Laura tres leches 99 100 New York City 2018 70 89.666667
111 Suh Young fruit cake 0 100 Florence 1325 70 56.666667
In [8]:
# What is Baker 105's city of origin and year they started baking?
great_seattle_bakeoff.loc[[105]][["CityOfOrigin", "StartedBaking"]]
Out[8]:
CityOfOrigin StartedBaking
BakerID
105 Seattle 2020

Whole Class Activity¶

Write a function advancing_bakers that advances bakers into "Advanced" if their TotalScore is 85 or above, and "Eliminated" otherwise. Apply this function to create a new column CompetitionStatus.

In [9]:
def advancing_bakers(data):
    """
    Returns the bakers who will advance to the next round of the Great Seattle Bake Off.
    Parameters:
    - DataFrame: data of a round of the Great Seattle Bake Off
    
    >>> advancing_bakers(great_seattle_bakeoff).loc[101]["CompetitionStatus"]
    'Advanced'
    """
    advanced = data["TotalScore"] >= 85
    eliminated = data["TotalScore"] < 85
    data.loc[advanced, "CompetitionStatus"] = "Advanced"
    data.loc[eliminated, "CompetitionStatus"] = "Eliminated"
    return data

doctest.run_docstring_examples(advancing_bakers, globals())
In [10]:
great_seattle_bakeoff
Out[10]:
Name DessertBaked FlavorScore PresentationScore CityOfOrigin StartedBaking CreativityScore TotalScore CompetitionStatus
BakerID
101 Arona red velvet cake 60 95 Paris 2006 100 85.000000 Advanced
102 Hannah basque cheesecake 92 80 New York City 2007 80 84.000000 Eliminated
103 Renusree baumkuchen 78 88 Florence 2010 70 78.666667 Eliminated
104 Arpan pound cake 40 92 Seattle 2020 90 74.000000 Eliminated
105 Mia german chocolate cake 38 98 Seattle 2020 100 78.666667 Eliminated
106 Asmi victoria sponge cake 73 100 Paris 2015 20 64.333333 Eliminated
107 Alyssa birthday cake 80 98 New York City 2019 80 86.000000 Advanced
108 Vani carrot cake 59 60 Seattle 2013 90 69.666667 Eliminated
109 Vatsal matcha cake 75 77 Paris 2018 100 84.000000 Eliminated
110 Laura tres leches 99 100 New York City 2018 70 89.666667 Advanced
111 Suh Young fruit cake 0 100 Florence 1325 70 56.666667 Eliminated