{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 7: `groupby` and Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives\n", "\n", "`pandas` is a very popular library used by data scientists to process **tabular data**: data organized into rows and columns like a spreadsheet. In this lesson, we'll discuss two key `DataFrame` operations: the `groupby` operation and the `apply` operation. We'll also discuss how `groupby` introduces an extra level of complexity towards indexing and slicing values. By the end of this lesson, students will be able to:\n", "\n", "* Explain in plain English the outcome of a `pandas` `DataFrame` `groupby` operation.\n", "* Apply the `groupby` operation to a `pandas` `DataFrame`.\n", "* Apply the `apply` operation to a `pandas` `DataFrame`.\n", "* Select values from a hierarchical index using tuples and the `slice` object as keys.\n", "\n", "\n", "## `groupby`\n", "\n", "Last time, we computed the \"shakiness\" of each location as the sum of all the magnitudes of earthquakes in that location across all time. For instructional purposes, let's review how to compute shakiness without `pandas` (using a list of dictionaries) and with `pandas` (using a `DataFrame`)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Processing a List of Dictionaries\n", "For this problem, we want to process the `earthquakes.csv` as a list of dictionaries and return the result as a dictionary (where the keys are place names and the values are ShakeFactor)." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import cse163_utils" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 'nc72666881',\n", " 'year': 2016,\n", " 'month': 7,\n", " 'day': 27,\n", " 'latitude': 37.6723333,\n", " 'longitude': -121.619,\n", " 'name': 'California',\n", " 'magnitude': 1.43},\n", " {'id': 'us20006i0y',\n", " 'year': 2016,\n", " 'month': 7,\n", " 'day': 27,\n", " 'latitude': 21.5146,\n", " 'longitude': 94.5721,\n", " 'name': 'Burma',\n", " 'magnitude': 4.9},\n", " {'id': 'nc72666891',\n", " 'year': 2016,\n", " 'month': 7,\n", " 'day': 27,\n", " 'latitude': 37.5765,\n", " 'longitude': -118.85916670000002,\n", " 'name': 'California',\n", " 'magnitude': 0.06},\n", " {'id': 'nc72666896',\n", " 'year': 2016,\n", " 'month': 7,\n", " 'day': 27,\n", " 'latitude': 37.5958333,\n", " 'longitude': -118.99483329999998,\n", " 'name': 'California',\n", " 'magnitude': 0.4},\n", " {'id': 'nn00553447',\n", " 'year': 2016,\n", " 'month': 7,\n", " 'day': 27,\n", " 'latitude': 39.3775,\n", " 'longitude': -119.845,\n", " 'name': 'Nevada',\n", " 'magnitude': 0.3}]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using the code we wrote to parse the CSV to a list of dictionaries\n", "data = cse163_utils.parse('earthquakes.csv')\n", "data[:5] # First 5 rows-- food for thought: how would we get the first five rows from a DataFrame?" 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To compute shakiness, we'll follow these steps:\n", "\n", "* Create a result dictionary\n", "* For each earthquake, find its `name` and `magnitude` and add the `magnitude` to the corresponding `name`.\n", " " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "'California'", "output_type": "error", "traceback": [ "\u001b[31m---------------------------------------------------------------------------\u001b[39m", "\u001b[31mKeyError\u001b[39m Traceback (most recent call last)", "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[25]\u001b[39m\u001b[32m, line 5\u001b[39m\n\u001b[32m 3\u001b[39m name = earthquake[\u001b[33m'\u001b[39m\u001b[33mname\u001b[39m\u001b[33m'\u001b[39m]\n\u001b[32m 4\u001b[39m magnitude = earthquake[\u001b[33m'\u001b[39m\u001b[33mmagnitude\u001b[39m\u001b[33m'\u001b[39m]\n\u001b[32m----> \u001b[39m\u001b[32m5\u001b[39m \u001b[43mshakiness\u001b[49m\u001b[43m[\u001b[49m\u001b[43mname\u001b[49m\u001b[43m]\u001b[49m += magnitude\n\u001b[32m 7\u001b[39m shakiness\n", "\u001b[31mKeyError\u001b[39m: 'California'" ] } ], "source": [ "shakiness = {} # Empty dictionary\n", "for earthquake in data:\n", " name = earthquake['name']\n", " magnitude = earthquake['magnitude']\n", " shakiness[name] += magnitude\n", "\n", "shakiness" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Uh-oh! We get a `KeyError` because we are trying to set values of a dictionary before we've initialized any keys! When using a `dict` as a counter, remember that we have to check for the key before trying to modify its value. Let's try again." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'California': 3638.4900000000084,\n", " 'Burma': 22.200000000000003,\n", " 'Nevada': 496.6400000000006,\n", " 'Alaska': 3407.0000000000005,\n", " 'Hawaii': 375.2099999999998,\n", " 'Montana': 91.19000000000001,\n", " 'Puerto Rico': 385.99999999999983,\n", " 'Chile': 157.79999999999998,\n", " 'Dominican Republic': 97.70000000000002,\n", " 'British Virgin Islands': 230.09999999999994,\n", " 'Indonesia': 298.60000000000014,\n", " 'Washington': 176.2199999999999,\n", " 'Southern East Pacific Rise': 15.799999999999999,\n", " 'Argentina': 49.3,\n", " 'Philippines': 85.19999999999999,\n", " 'Canada': 99.96,\n", " 'Papua New Guinea': 130.3,\n", " 'Afghanistan': 43.6,\n", " 'Oregon': 107.23000000000002,\n", " 'South of Africa': 4.5,\n", " 'Peru': 63.800000000000004,\n", " 'Fiji': 75.0,\n", " 'Japan': 192.60000000000002,\n", " 'Oklahoma': 192.79999999999993,\n", " 'Mexico': 111.45,\n", " 'Kyrgyzstan': 39.8,\n", " 'Tennessee': 36.24,\n", " 'Tonga': 66.5,\n", " 'Arkansas': 6.66,\n", " 'South of the Fiji Islands': 77.00000000000001,\n", " 'Utah': 71.59,\n", " 'Georgia': 4.2,\n", " 'U.S. 
Virgin Islands': 84.49999999999999,\n", " 'Idaho': 27.930000000000003,\n", " 'Wyoming': 46.29999999999999,\n", " 'Iran': 13.299999999999999,\n", " 'Syria': 4.4,\n", " 'Russia': 122.60000000000001,\n", " 'Tajikistan': 53.80000000000001,\n", " 'Southwest Indian Ridge': 41.6,\n", " 'Anguilla': 2.4,\n", " 'Panama': 13.399999999999999,\n", " 'Kansas': 47.489999999999995,\n", " 'Northern Mariana Islands': 239.00000000000003,\n", " 'Christmas Island': 5.0,\n", " 'China': 63.900000000000006,\n", " 'New Zealand': 88.40000000000002,\n", " 'Vanuatu': 43.1,\n", " 'Guatemala': 26.300000000000004,\n", " 'Greece': 9.4,\n", " 'Poland': 4.2,\n", " 'Chagos Archipelago region': 4.4,\n", " 'Italy': 64.3,\n", " 'Virgin Islands region': 1.8,\n", " 'New Jersey': 2.33,\n", " 'Northern California': 2.45,\n", " 'Southern Mid-Atlantic Ridge': 13.6,\n", " 'South Sandwich Islands': 19.9,\n", " 'South Georgia and the South Sandwich Islands': 99.7,\n", " 'Northwest of Australia': 4.1,\n", " 'South Indian Ocean': 26.900000000000002,\n", " 'Solomon Islands': 45.5,\n", " 'Mid-Indian Ridge': 4.9,\n", " 'Portugal': 12.999999999999998,\n", " 'Ascension Island region': 4.7,\n", " 'Azerbaijan': 5.0,\n", " 'India': 9.3,\n", " 'Kiribati region': 4.6,\n", " 'Martinique': 4.6,\n", " 'Venezuela': 9.0,\n", " 'Bolivia': 12.9,\n", " 'Turkey': 8.0,\n", " 'Vanuatu region': 4.5,\n", " 'Missouri': 20.469999999999995,\n", " 'Guam': 9.0,\n", " 'Ohio': 1.98,\n", " 'Nicaragua': 18.5,\n", " 'East Timor': 4.5,\n", " 'Northern Mid-Atlantic Ridge': 9.899999999999999,\n", " 'Palau': 5.3,\n", " 'Colorado': 19.4,\n", " 'West Virginia': 2.31,\n", " 'New Caledonia': 157.6,\n", " 'Australia': 14.5,\n", " 'Off the coast of Oregon': 7.199999999999999,\n", " 'Virginia': 1.94,\n", " 'Costa Rica': 17.7,\n", " 'Ukraine': 4.8,\n", " 'Colombia': 8.899999999999999,\n", " 'East of the Kuril Islands': 4.7,\n", " 'Cyprus': 4.1,\n", " 'Pacific-Antarctic Ridge': 9.399999999999999,\n", " 'Uzbekistan': 4.6,\n", " 'Illinois': 
7.289999999999999,\n", " 'Central Mid-Atlantic Ridge': 4.3,\n", " 'Western Indian-Antarctic Ridge': 4.5,\n", " 'Ecuador': 8.5,\n", " 'South of Panama': 4.6,\n", " 'El Salvador': 9.399999999999999,\n", " 'Western Xizang': 9.4,\n", " 'Azores-Cape St. Vincent Ridge': 4.7,\n", " 'North Carolina': 1.92,\n", " 'North of Svalbard': 5.0,\n", " 'Texas': 2.5,\n", " 'Fiji region': 14.600000000000001,\n", " 'Reykjanes Ridge': 4.5,\n", " 'Arizona': 4.39,\n", " 'Pakistan': 4.4,\n", " 'Greenland Sea': 4.5,\n", " 'New Hampshire': 1.5,\n", " 'South Georgia Island region': 86.39999999999999,\n", " 'New York': 1.48,\n", " 'Central East Pacific Rise': 4.6,\n", " 'North of Ascension Island': 9.5,\n", " 'Pennsylvania': 1.37,\n", " 'Japan region': 11.5,\n", " 'Taiwan': 4.2,\n", " 'Kuril Islands': 4.6}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shakiness = {} # Empty dictionary\n", "for earthquake in data:\n", " name = earthquake['name']\n", " magnitude = earthquake['magnitude']\n", " if name not in shakiness:\n", " shakiness[name] = 0\n", " shakiness[name] += magnitude\n", "\n", "shakiness" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And then we are done! It looks like California is the shakiest place on Earth!\n", "\n", "What we have just done is a **group by** operation on the data. We put all the earthquakes into groups *by* some column label such as `name`, *selected* column values of interest such as `magnitude`, and *aggregated* (by summing with `+`) the values in each group.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Processing a `pandas` `DataFrame`\n", "Since group by is such a common operation, `pandas` has its own syntax. Understanding how the list of dictionaries approach works is a good starting point to understand what happens in the background when using `pandas`." 
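, "\n", "\n", "To make that connection concrete, the recipe above can be packaged as a reusable plain-Python function (the name `group_and_sum` is our own, for illustration only). This is exactly the work that `groupby` will do for us in one line:\n", "\n", "```python\n", "def group_and_sum(rows, key, value):\n", "    # Group a list of dictionaries by the key column, summing the value column\n", "    totals = {}\n", "    for row in rows:\n", "        if row[key] not in totals:\n", "            totals[row[key]] = 0\n", "        totals[row[key]] += row[value]\n", "    return totals\n", "\n", "# group_and_sum(data, 'name', 'magnitude') reproduces the shakiness dictionary\n", "```"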
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idyearmonthdaylatitudelongitudenamemagnitude
0nc72666881201672737.672333-121.619000California1.43
1us20006i0y201672721.51460094.572100Burma4.90
2nc72666891201672737.576500-118.859167California0.06
3nc72666896201672737.595833-118.994833California0.40
4nn00553447201672739.377500-119.845000Nevada0.30
...........................
8389nc72685246201682536.515499-121.099831California2.42
8390ak13879193201682561.498400-149.862700Alaska1.40
8391nc72685251201682538.805000-122.821503California1.06
8392ci37672328201682534.308000-118.635333California1.55
8393ci37672360201682534.119167-116.933667California0.89
\n", "

8394 rows × 8 columns

\n", "
" ], "text/plain": [ " id year month day latitude longitude name \\\n", "0 nc72666881 2016 7 27 37.672333 -121.619000 California \n", "1 us20006i0y 2016 7 27 21.514600 94.572100 Burma \n", "2 nc72666891 2016 7 27 37.576500 -118.859167 California \n", "3 nc72666896 2016 7 27 37.595833 -118.994833 California \n", "4 nn00553447 2016 7 27 39.377500 -119.845000 Nevada \n", "... ... ... ... ... ... ... ... \n", "8389 nc72685246 2016 8 25 36.515499 -121.099831 California \n", "8390 ak13879193 2016 8 25 61.498400 -149.862700 Alaska \n", "8391 nc72685251 2016 8 25 38.805000 -122.821503 California \n", "8392 ci37672328 2016 8 25 34.308000 -118.635333 California \n", "8393 ci37672360 2016 8 25 34.119167 -116.933667 California \n", "\n", " magnitude \n", "0 1.43 \n", "1 4.90 \n", "2 0.06 \n", "3 0.40 \n", "4 0.30 \n", "... ... \n", "8389 2.42 \n", "8390 1.40 \n", "8391 1.06 \n", "8392 1.55 \n", "8393 0.89 \n", "\n", "[8394 rows x 8 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('earthquakes.csv')\n", "df" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We first show how to do this in pandas, and then we will go back and explain what the syntax is saying." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "name\n", "Afghanistan 43.60\n", "Alaska 3407.00\n", "Anguilla 2.40\n", "Argentina 49.30\n", "Arizona 4.39\n", " ... 
\n", "Washington 176.22\n", "West Virginia 2.31\n", "Western Indian-Antarctic Ridge 4.50\n", "Western Xizang 9.40\n", "Wyoming 46.30\n", "Name: magnitude, Length: 118, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('name')['magnitude'].sum()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The way to read this line of code uses the following logic:\n", "* `df` is the `DataFrame` we want to process\n", "* `.groupby('name')` tells the `DataFrame` we want it to group all of its rows **`by`** the `name` attribute. Conceptually, this step splits all the data into groups by `name`.\n", "* **For each group**, `['magnitude'].sum()` selects the `magnitude` from all rows in that group and aggregates the data by taking the `sum` of the rows.\n", "\n", "We can substitute `sum` aggregation operation for any other `Series` methods like `min`, `max`, and `mean`. You can also group by any column or use any column for your computations!\n", "\n", "The final result is a `Series` that uses the `name` as its index. Remember that most of the datasets we've seen had integer indices, but we noted that his was not always the case. The result of a groupby has the values you used for group (in this case the string names of locations) as the indices. This makes it convenient to find the shakiness of California specifically, for example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3638.4900000000084" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shakiness = df.groupby('name')['magnitude'].sum()\n", "shakiness['California']" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### What's going on here?\n", "It's pretty incredible that such a short line of code can do something so complex. 
In the bullet list above, I described the steps that happen here, but I find it easier to think about it with a small example. The figure below shows a visual explanation of what is going on when you do a `groupby`.\n", "\n", "1. **Teal, `data`**. Specify the `DataFrame`.\n", "2. **Purple, `.groupby('col1')`**. Call the `groupby` function on that `DataFrame`, passing `col1` as the column for forming groups. This splits the data up into groups based on the provided column.\n", "    * This calls the `DataFrame.groupby(...)` method, **not** indexing with `[]`! Note the parentheses rather than square brackets.\n", "3. **Red, `['col2'].sum()`**. Specify the aggregation column, `col2`, and the aggregation function, `.sum()`. The values in each group are aggregated into a single value.\n", "    * This selects a particular column by indexing with `[]`.\n", "4. **Green, `result = `**. The final result evaluates to a `Series` with `col1` as the index and the aggregation value for each group.\n", "\n", "![Groupby Diagram, as explained by the numbered list above](groupby.png)\n", "\n", "To better understand what's going on, let's freeze at step 3 (before aggregating results into a `Series`) to see how `pandas` represents the `groupby` operation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
0A1
1B2
2C3
3A4
4C5
\n", "
" ], "text/plain": [ " col1 col2\n", "0 A 1\n", "1 B 2\n", "2 C 3\n", "3 A 4\n", "4 C 5" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make a DataFrame by giving it a list of dictionaries\n", "example_data = pd.DataFrame([\n", " {'col1': 'A', 'col2': 1},\n", " {'col1': 'B', 'col2': 2},\n", " {'col1': 'C', 'col2': 3},\n", " {'col1': 'A', 'col2': 4},\n", " {'col1': 'C', 'col2': 5},\n", "])\n", "\n", "example_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groupby = example_data.groupby('col1') # Notice no aggregate\n", "groupby" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This returns a special `pandas` object called a `DataFrameGroupBy`. This represents an \"unfinished\" `groupby` operation since we have not computed an aggregate yet. One feature that is kind of helpful for seeing what's going on (but you will probably rarely use in practice) is the fact you can loop over a `DataFrameGroupBy` to inspect the groups." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
0A1
3A4
\n", "
" ], "text/plain": [ " col1 col2\n", "0 A 1\n", "3 A 4" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
1B2
\n", "
" ], "text/plain": [ " col1 col2\n", "1 B 2" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
2C3
4C5
\n", "
" ], "text/plain": [ " col1 col2\n", "2 C 3\n", "4 C 5" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for key, group in groupby: \n", " display(group) # To get the fancy Jupyter Notebook display of table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can use the `.describe()` function, which gives several different summary statistics." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
col1
A2.02.52.1213201.01.752.53.254.0
B1.02.0NaN2.02.002.02.002.0
C2.04.01.4142143.03.504.04.505.0
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "col1 \n", "A 2.0 2.5 2.121320 1.0 1.75 2.5 3.25 4.0\n", "B 1.0 2.0 NaN 2.0 2.00 2.0 2.00 2.0\n", "C 2.0 4.0 1.414214 3.0 3.50 4.0 4.50 5.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groupby['col2'].describe()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is why when we do finish the computation, we get the sum of all these rows that fell into each group" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "col1\n", "A 5\n", "B 2\n", "C 8\n", "Name: col2, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = example_data.groupby('col1')['col2'].sum()\n", "result" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "It's important to highlight that **`result` is a `Series` with the step 2 group-by names as its index**. You can access the underlying `index` of a `Series` or `DataFrame` using its `.index` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Index(['A', 'B', 'C'], dtype='object', name='col1')\n", "8\n" ] } ], "source": [ "print(type(result))\n", "print(result.index)\n", "print(result['C'])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### When to use this?\n", "Whenever you want to separate your data into groups to get some value for each one, this is the natural time to ues a `groupby`! Whenever you see \"for each\" or \"per\" in a question, it's likely you will want to use a `groupby`. \n", "\n", "Here are some example questions you can solve with a groupby. 
We provide some \"code\" to compute each one, assuming the data had columns with the relevant names.\n", "* Compute the number of students per section\n", " ```\n", " grades.groupby('Section')['StudentID'].count()\n", " ```\n", "* Compute the average grade of students in each section\n", " ```\n", " grades.groupby('Section')['Grade'].mean()\n", " ```\n", "* Compute the total number of people in each country\n", " ```\n", " cities.groupby('country')['population'].sum()\n", " ```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Food for thought:** What would happen if we wanted to use `groupby` with multiple columns? For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
yearmonthday
2016727272.01.6175741.0953490.060.90001.4002.00005.60
28308.01.4481490.8968510.100.87751.2401.80005.10
29309.01.6401291.1659520.010.88001.4501.90007.70
30329.01.6150761.2626180.030.70001.2402.00005.70
31278.01.7508271.2615770.100.90001.5002.14755.90
81356.01.5200561.1573260.040.80001.2451.80256.10
2292.01.5394181.0899460.050.80001.3001.90005.50
3294.01.5563271.1473650.010.83001.2001.81505.10
4420.01.2491901.0347380.050.60001.0001.58256.30
5256.01.4287891.1442440.100.62001.1851.71505.70
6316.01.3132281.0655870.090.56001.1001.62005.40
7316.01.3569941.0785560.100.60001.1201.74255.10
8335.01.4849251.1314950.020.63001.2001.99005.10
9272.01.6147791.1641860.100.80751.3001.90005.09
10329.01.4047421.0387010.020.77001.1701.74005.40
11356.01.3905341.1591470.040.67751.1001.70007.20
12326.01.5332821.1586960.040.74001.2001.90005.40
13284.01.4219011.0803380.100.70001.1051.77755.20
14231.01.6926841.3721910.050.82001.2001.90005.90
15222.01.5839641.1575530.070.85501.3001.84005.40
16223.01.6299101.2231310.100.73001.3002.00005.10
17220.01.5836821.2036170.100.80001.2101.96755.20
18219.01.4997721.1594970.040.76001.2001.80505.90
19226.01.8194691.4162910.110.89251.3402.19007.40
20237.01.5532071.2962620.020.70001.2002.00006.40
21266.01.3323681.0322100.030.60501.1401.69255.10
22215.01.4514881.1856570.010.70001.1801.80005.60
23233.01.6433911.2456610.040.82001.3002.10006.20
24216.01.5531941.1440540.030.83751.2901.90006.80
25238.01.5193280.9260280.100.89001.4001.99505.90
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "year month day \n", "2016 7 27 272.0 1.617574 1.095349 0.06 0.9000 1.400 2.0000 5.60\n", " 28 308.0 1.448149 0.896851 0.10 0.8775 1.240 1.8000 5.10\n", " 29 309.0 1.640129 1.165952 0.01 0.8800 1.450 1.9000 7.70\n", " 30 329.0 1.615076 1.262618 0.03 0.7000 1.240 2.0000 5.70\n", " 31 278.0 1.750827 1.261577 0.10 0.9000 1.500 2.1475 5.90\n", " 8 1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", " 2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", " 3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", " 4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", " 5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", " 6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", " 7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", " 8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", " 9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", " 10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", " 11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", " 12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", " 13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", " 14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", " 15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", " 16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", " 17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", " 18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", " 19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", " 20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", " 21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", " 22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", " 23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 6.20\n", " 24 216.0 1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", " 25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 21, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(['year', 'month', 'day'])['magnitude'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `apply`\n", "\n", "Earlier, we saw how arithmetic operators like `+`, `-`, `*`, and `/` apply element-wise to `pandas` `DataFrames` or `Series`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['magnitude'] * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we wanted to find the length of each value in the name column? You might try something like the following and hope it does an element-wise computation as well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(df['name'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is evaluating to the number of elements in the entire `Series`, not the length of each place `name`! Only certain operators support element-wise computation.\n", "* Arithmetic operators such as `+`, `-`, `*`, `/`, etc.\n", "* Comparison operators such as `==`, `<`, etc.\n", "* Logical operators such as `&`, `|`, `~`.\n", "\n", "Other functions will act on the entire `Series`, just as we saw with `len`!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in Functions on `pandas`\n", "To call the `len` function on each `str`, use the magic syntax below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['name'].str.len()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This reads, \"Take the name column, apply the `len` function defined for `str`s to each element in the `Series`.\" The syntax looks unusual at first, but it helps programmers be clear about the data type and the function to apply to each element! 
We won't look at other data types now, but there is similar syntax for them as well.\n", "\n", "You aren't limited to just calling `len` here; you can call pretty much any `str` function using this syntax.\n", "\n", "* `str.lower()` to lower-case each string and return them as a new `Series`.\n", "* `str.upper()` to upper-case each string and return them as a new `Series`.\n", "* `str.len()` to compute the length of each string and return them as a new `Series`.\n", "* `str.strip()`, `str.lstrip()`, `str.rstrip()` to trim whitespace from each string and return them as a new `Series`.\n", "\n", "For example, the following cell shows how to convert each name to its upper-case version." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['name'].str.upper()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this operation does not modify the original `name` column but rather returns a new `Series` with all the names upper-cased.\n", "\n", "What if you wanted to write your own function to transform a value and apply it to each element in a `Series`? For example, what if I wanted to grab the first two characters from each name? \n", "\n", "This is where we will need the more general `apply` function defined for `pandas` objects. `apply` is more general than using the specific `str` functions we saw above since it will let you call your own functions on each element. Before we show how to do the specific example of grabbing the first two characters from the names, let's use this new approach to find the `len` of each name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['name'].apply(len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first part, `df['name'].apply(`, should probably make some sense to you. We are calling some function named `apply` on the `Series` `df['name']`. 
What's very strange about this is that it seems to be passing `len` **as a parameter** to the `apply` function! `len` is not being called directly here but passed as a parameter so that it can later be applied to each element.\n", "\n", "This is probably the first time that you've seen this syntax. While this does look very strange, it is a feature of Python. A function is, in some sense, just like any other value in Python. But unlike `int`, `str`, or `list`, a Python function doesn't represent data (like numbers, strings, or collections of such data). A Python function represents an algorithm that can later be applied to some data!\n", "\n", "The authors of `pandas` wrote the `apply` function to take a parameter that is **another function**, like `len`. They then call that function on each element in the `Series`. As Python programmers, we can write our own functions that take other functions as parameters too." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def list_apply(values, function):\n", " \"\"\"\n", " Takes a list of values and a function, and applies that function\n", " to each value in values. The given function must take one parameter\n", " as input and the returned list will be the result of calling that\n", " function once for each value in the list.\n", " \"\"\"\n", " # It's not necessary to use a list comprehension here, \n", " # but it's the easiest way to write this function!\n", " return [function(v) for v in values]\n", "\n", "list_apply(['I', 'love', 'dogs'], len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You aren't restricted to passing in the `len` function here. You can pass any function that takes a single argument.\n", "\n", "So if we write a function `first_two` that takes a `str` and returns the first two characters, then we can pass `first_two` to `apply`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def first_two(s):\n", " \"\"\"\n", " Returns the first two characters of the given str as a str.\n", " \n", " Assumes there are at least two characters in s.\n", " \"\"\"\n", " return s[:2]\n", "\n", "df['name'].apply(first_two)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Food for thought:** What happens to the parameter in the function definition? What if we had multiple parameters?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Saving Results\n", "Remember, the `apply` function doesn't modify any data in the `DataFrame` or `Series`, but rather returns a new one. It's common to want to save the result of an `apply` to your dataset to use those values later. Just like how you can use the `[]` assignment syntax for `list` and `dict` types in Python, we can also assign columns directly to a `pandas` `DataFrame`.\n", "\n", "We can use this syntax to update an existing column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['name'] = df['name'].str.upper()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or use it to introduce a new column to the `DataFrame`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['first_two_letters'] = df['name'].apply(first_two)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hierarchical Indexing\n", "\n", "Let's revisit `groupby` to take note of something interesting. Take a look at the indices that result from the following `groupby` call:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "year month day \n", "2016 7 27 272.0 1.617574 1.095349 0.06 0.9000 1.400 2.0000 5.60\n", " 28 308.0 1.448149 0.896851 0.10 0.8775 1.240 1.8000 5.10\n", " 29 309.0 1.640129 1.165952 0.01 0.8800 1.450 1.9000 7.70\n", " 30 329.0 1.615076 1.262618 0.03 0.7000 1.240 2.0000 5.70\n", " 31 278.0 1.750827 1.261577 0.10 0.9000 1.500 2.1475 5.90\n", " 8 1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", " 2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", " 3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", " 4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", " 5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", " 6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", " 7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", " 8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", " 9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", " 10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", " 11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", " 12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", " 13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", " 14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", " 15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", " 16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", " 17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", " 18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", " 19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", " 20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", " 21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", " 22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", " 23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 6.20\n", " 24 216.0 1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", " 25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 28, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes = df.groupby(['year', 'month', 'day'])['magnitude'].describe()\n", "magnitudes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `pandas`, a `DataFrame` can have a **hierarchical index** (or, a multi-level index) called a `MultiIndex`. We can use the `.index` property to investigate this further." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([(2016, 7, 27),\n", " (2016, 7, 28),\n", " (2016, 7, 29),\n", " (2016, 7, 30),\n", " (2016, 7, 31),\n", " (2016, 8, 1),\n", " (2016, 8, 2),\n", " (2016, 8, 3),\n", " (2016, 8, 4),\n", " (2016, 8, 5),\n", " (2016, 8, 6),\n", " (2016, 8, 7),\n", " (2016, 8, 8),\n", " (2016, 8, 9),\n", " (2016, 8, 10),\n", " (2016, 8, 11),\n", " (2016, 8, 12),\n", " (2016, 8, 13),\n", " (2016, 8, 14),\n", " (2016, 8, 15),\n", " (2016, 8, 16),\n", " (2016, 8, 17),\n", " (2016, 8, 18),\n", " (2016, 8, 19),\n", " (2016, 8, 20),\n", " (2016, 8, 21),\n", " (2016, 8, 22),\n", " (2016, 8, 23),\n", " (2016, 8, 24),\n", " (2016, 8, 25)],\n", " names=['year', 'month', 'day'])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Food for thought:** See a familiar data structure?\n", "\n", "The `MultiIndex` comes in handy when your data might have multiple categorical dimensions. It allows us to work with and manipulate data with multiple levels as though it were one. 
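As a quick sketch of that idea on a toy frame (a hypothetical `toy` `DataFrame` standing in for the earthquake rows), a grouped result's `MultiIndex` behaves like an index of tuples:

```python
import pandas as pd

# Toy data standing in for the earthquake rows (not the real dataset).
toy = pd.DataFrame({
    'year':  [2016, 2016, 2016],
    'month': [7, 7, 8],
    'magnitude': [1.4, 4.9, 0.3],
})
sums = toy.groupby(['year', 'month'])['magnitude'].sum()

print(sums.index[0])        # (2016, 7): each label is a (year, month) tuple
print(sums.loc[(2016, 7)])  # the two July rows combined
```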
Some common use cases are:\n", "\n", "* Time-series data with multiple columns for time units (like the year, month, and day in `earthquakes.csv`)\n", "* Experimental data with multiple factors that are difficult to represent in two dimensions\n", "* Geographical data with nested location columns (such as country, state, city)\n", "* Grouping data on multiple dimensions (hence why we see it with multi-column `groupby`!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing a `MultiIndex`\n", "\n", "We can use `.loc` with `MultiIndex`, but the syntax will look a bit strange. Recall that `.loc[]` takes in a row indexer and a column indexer, like so:\n", "\n", "```python\n", "df.loc[row_indexer, column_indexer]\n", "```\n", "\n", "To select a specific combination of values in the `MultiIndex`, you would need to use a `tuple`." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 316.000000\n", "mean 1.356994\n", "std 1.078556\n", "min 0.100000\n", "25% 0.600000\n", "50% 1.120000\n", "75% 1.742500\n", "max 5.100000\n", "Name: (2016, 8, 7), dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes.loc[(2016, 8, 7), :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though there are three values in the `MultiIndex`, we don't need to select all of them! In the following slice, we have:\n", "\n", "* The `'year'` index set to `2016`\n", "* The `'month'` index set to `8`\n", "\n", "Notice that we didn't specify a `'day'` index, so we still see all the possible `'day'` values that have a `'year'` of `2016` and a `'month'` of `8`." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "day \n", "1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", "2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", "3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", "4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", "5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", "6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", "7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", "8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", "9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", "10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", "11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", "12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", "13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", "14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", "15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", "16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", "17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", "18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", "19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", "20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", "21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", "22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", "23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 6.20\n", "24 216.0 1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", "25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes.loc[(2016, 8), :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `slice(None)` to select all values at a particular level. 
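Here is a minimal sketch on a toy two-level frame (a hypothetical `toy` `DataFrame`, not the earthquake data) showing `slice(None)` acting as \"any value\" for the `'year'` level:

```python
import pandas as pd

# Toy frame with a two-level (year, month) index, for illustration only.
toy = pd.DataFrame({
    'year':  [2015, 2016, 2016],
    'month': [8, 7, 8],
    'magnitude': [2.0, 1.4, 0.3],
}).set_index(['year', 'month'])

# slice(None) in the year position means "any year"; 8 pins the month.
august = toy.loc[(slice(None), 8), :]
print(august['magnitude'].tolist())  # [2.0, 0.3]: one row per matching year
```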
So, if we wanted all the values in August (when `'month'` is `8`) and it didn't matter what day or year it was, we might do the following:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "year month day \n", "2016 8 1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", " 2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", " 3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", " 4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", " 5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", " 6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", " 7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", " 8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", " 9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", " 10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", " 11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", " 12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", " 13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", " 14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", " 15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", " 16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", " 17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", " 18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", " 19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", " 20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", " 21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", " 22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", " 23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 6.20\n", " 24 216.0 1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", " 25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes.loc[(slice(None), 8, slice(None)), :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resetting Indices\n", "\n", "The `.reset_index()` method converts index levels back into regular columns. 
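Sketching this on a toy grouped result (a hypothetical `toy` `DataFrame`, not the earthquake data):

```python
import pandas as pd

# Toy data, for illustration only.
toy = pd.DataFrame({'year': [2016, 2016], 'month': [7, 8], 'magnitude': [1.4, 0.3]})
sums = toy.groupby(['year', 'month'])['magnitude'].sum()

# reset_index turns the (year, month) index levels back into columns.
flat = sums.reset_index()
print(flat.columns.tolist())  # ['year', 'month', 'magnitude']
print(flat.index.tolist())    # [0, 1]: a fresh default integer index
```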
This is useful when you need to perform column-wise operations or when you want to \"flatten\" your data structure.\n", "\n", "By default, `.reset_index()` moves all index levels to their own columns and assigns integer indices starting from 0:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " year month day count mean std min 25% 50% 75% \\\n", "0 2016 7 27 272.0 1.617574 1.095349 0.06 0.9000 1.400 2.0000 \n", "1 2016 7 28 308.0 1.448149 0.896851 0.10 0.8775 1.240 1.8000 \n", "2 2016 7 29 309.0 1.640129 1.165952 0.01 0.8800 1.450 1.9000 \n", "3 2016 7 30 329.0 1.615076 1.262618 0.03 0.7000 1.240 2.0000 \n", "4 2016 7 31 278.0 1.750827 1.261577 0.10 0.9000 1.500 2.1475 \n", "5 2016 8 1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 \n", "6 2016 8 2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 \n", "7 2016 8 3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 \n", "8 2016 8 4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 \n", "9 2016 8 5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 \n", "10 2016 8 6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 \n", "11 2016 8 7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 \n", "12 2016 8 8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 \n", "13 2016 8 9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 \n", "14 2016 8 10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 \n", "15 2016 8 11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 \n", "16 2016 8 12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 \n", "17 2016 8 13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 \n", "18 2016 8 14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 \n", "19 2016 8 15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 \n", "20 2016 8 16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 \n", "21 2016 8 17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 \n", "22 2016 8 18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 \n", "23 2016 8 19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 \n", "24 2016 8 20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 \n", "25 2016 8 21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 \n", "26 2016 8 22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 \n", "27 2016 8 23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 \n", "28 2016 8 24 216.0 1.553194 1.144054 
0.03 0.8375 1.290 1.9000 \n", "29 2016 8 25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 \n", "\n", " max \n", "0 5.60 \n", "1 5.10 \n", "2 7.70 \n", "3 5.70 \n", "4 5.90 \n", "5 6.10 \n", "6 5.50 \n", "7 5.10 \n", "8 6.30 \n", "9 5.70 \n", "10 5.40 \n", "11 5.10 \n", "12 5.10 \n", "13 5.09 \n", "14 5.40 \n", "15 7.20 \n", "16 5.40 \n", "17 5.20 \n", "18 5.90 \n", "19 5.40 \n", "20 5.10 \n", "21 5.20 \n", "22 5.90 \n", "23 7.40 \n", "24 6.40 \n", "25 5.10 \n", "26 5.60 \n", "27 6.20 \n", "28 6.80 \n", "29 5.90 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes_reset = magnitudes.reset_index()\n", "magnitudes_reset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could optionally pass in the name of an index level to turn it into a column instead. Take a look at the following:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearcountmeanstdmin25%50%75%max
monthday
7272016272.01.6175741.0953490.060.90001.4002.00005.60
282016308.01.4481490.8968510.100.87751.2401.80005.10
292016309.01.6401291.1659520.010.88001.4501.90007.70
302016329.01.6150761.2626180.030.70001.2402.00005.70
312016278.01.7508271.2615770.100.90001.5002.14755.90
812016356.01.5200561.1573260.040.80001.2451.80256.10
22016292.01.5394181.0899460.050.80001.3001.90005.50
32016294.01.5563271.1473650.010.83001.2001.81505.10
42016420.01.2491901.0347380.050.60001.0001.58256.30
52016256.01.4287891.1442440.100.62001.1851.71505.70
62016316.01.3132281.0655870.090.56001.1001.62005.40
72016316.01.3569941.0785560.100.60001.1201.74255.10
82016335.01.4849251.1314950.020.63001.2001.99005.10
92016272.01.6147791.1641860.100.80751.3001.90005.09
102016329.01.4047421.0387010.020.77001.1701.74005.40
112016356.01.3905341.1591470.040.67751.1001.70007.20
122016326.01.5332821.1586960.040.74001.2001.90005.40
132016284.01.4219011.0803380.100.70001.1051.77755.20
142016231.01.6926841.3721910.050.82001.2001.90005.90
152016222.01.5839641.1575530.070.85501.3001.84005.40
162016223.01.6299101.2231310.100.73001.3002.00005.10
172016220.01.5836821.2036170.100.80001.2101.96755.20
182016219.01.4997721.1594970.040.76001.2001.80505.90
192016226.01.8194691.4162910.110.89251.3402.19007.40
202016237.01.5532071.2962620.020.70001.2002.00006.40
212016266.01.3323681.0322100.030.60501.1401.69255.10
222016215.01.4514881.1856570.010.70001.1801.80005.60
232016233.01.6433911.2456610.040.82001.3002.10006.20
242016216.01.5531941.1440540.030.83751.2901.90006.80
252016238.01.5193280.9260280.100.89001.4001.99505.90
\n", "
" ], "text/plain": [ " year count mean std min 25% 50% 75% max\n", "month day \n", "7 27 2016 272.0 1.617574 1.095349 0.06 0.9000 1.400 2.0000 5.60\n", " 28 2016 308.0 1.448149 0.896851 0.10 0.8775 1.240 1.8000 5.10\n", " 29 2016 309.0 1.640129 1.165952 0.01 0.8800 1.450 1.9000 7.70\n", " 30 2016 329.0 1.615076 1.262618 0.03 0.7000 1.240 2.0000 5.70\n", " 31 2016 278.0 1.750827 1.261577 0.10 0.9000 1.500 2.1475 5.90\n", "8 1 2016 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", " 2 2016 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", " 3 2016 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", " 4 2016 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", " 5 2016 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", " 6 2016 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", " 7 2016 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", " 8 2016 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", " 9 2016 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", " 10 2016 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", " 11 2016 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", " 12 2016 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", " 13 2016 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", " 14 2016 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", " 15 2016 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", " 16 2016 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", " 17 2016 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", " 18 2016 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", " 19 2016 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", " 20 2016 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", " 21 2016 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", " 22 2016 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", " 23 2016 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 6.20\n", " 24 2016 216.0 
1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", " 25 2016 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes_year_reset = magnitudes.reset_index(level=\"year\")\n", "magnitudes_year_reset" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yeardaycountmeanstdmin25%50%75%max
month
7201627272.01.6175741.0953490.060.90001.4002.00005.60
7201628308.01.4481490.8968510.100.87751.2401.80005.10
7201629309.01.6401291.1659520.010.88001.4501.90007.70
7201630329.01.6150761.2626180.030.70001.2402.00005.70
7201631278.01.7508271.2615770.100.90001.5002.14755.90
820161356.01.5200561.1573260.040.80001.2451.80256.10
820162292.01.5394181.0899460.050.80001.3001.90005.50
820163294.01.5563271.1473650.010.83001.2001.81505.10
820164420.01.2491901.0347380.050.60001.0001.58256.30
820165256.01.4287891.1442440.100.62001.1851.71505.70
820166316.01.3132281.0655870.090.56001.1001.62005.40
820167316.01.3569941.0785560.100.60001.1201.74255.10
820168335.01.4849251.1314950.020.63001.2001.99005.10
820169272.01.6147791.1641860.100.80751.3001.90005.09
8201610329.01.4047421.0387010.020.77001.1701.74005.40
8201611356.01.3905341.1591470.040.67751.1001.70007.20
8201612326.01.5332821.1586960.040.74001.2001.90005.40
8201613284.01.4219011.0803380.100.70001.1051.77755.20
8201614231.01.6926841.3721910.050.82001.2001.90005.90
8201615222.01.5839641.1575530.070.85501.3001.84005.40
8201616223.01.6299101.2231310.100.73001.3002.00005.10
8201617220.01.5836821.2036170.100.80001.2101.96755.20
8201618219.01.4997721.1594970.040.76001.2001.80505.90
8201619226.01.8194691.4162910.110.89251.3402.19007.40
8201620237.01.5532071.2962620.020.70001.2002.00006.40
8201621266.01.3323681.0322100.030.60501.1401.69255.10
8201622215.01.4514881.1856570.010.70001.1801.80005.60
8201623233.01.6433911.2456610.040.82001.3002.10006.20
8201624216.01.5531941.1440540.030.83751.2901.90006.80
8201625238.01.5193280.9260280.100.89001.4001.99505.90
\n", "
" ], "text/plain": [ " year day count mean std min 25% 50% 75% max\n", "month \n", "7 2016 27 272.0 1.617574 1.095349 0.06 0.9000 1.400 2.0000 5.60\n", "7 2016 28 308.0 1.448149 0.896851 0.10 0.8775 1.240 1.8000 5.10\n", "7 2016 29 309.0 1.640129 1.165952 0.01 0.8800 1.450 1.9000 7.70\n", "7 2016 30 329.0 1.615076 1.262618 0.03 0.7000 1.240 2.0000 5.70\n", "7 2016 31 278.0 1.750827 1.261577 0.10 0.9000 1.500 2.1475 5.90\n", "8 2016 1 356.0 1.520056 1.157326 0.04 0.8000 1.245 1.8025 6.10\n", "8 2016 2 292.0 1.539418 1.089946 0.05 0.8000 1.300 1.9000 5.50\n", "8 2016 3 294.0 1.556327 1.147365 0.01 0.8300 1.200 1.8150 5.10\n", "8 2016 4 420.0 1.249190 1.034738 0.05 0.6000 1.000 1.5825 6.30\n", "8 2016 5 256.0 1.428789 1.144244 0.10 0.6200 1.185 1.7150 5.70\n", "8 2016 6 316.0 1.313228 1.065587 0.09 0.5600 1.100 1.6200 5.40\n", "8 2016 7 316.0 1.356994 1.078556 0.10 0.6000 1.120 1.7425 5.10\n", "8 2016 8 335.0 1.484925 1.131495 0.02 0.6300 1.200 1.9900 5.10\n", "8 2016 9 272.0 1.614779 1.164186 0.10 0.8075 1.300 1.9000 5.09\n", "8 2016 10 329.0 1.404742 1.038701 0.02 0.7700 1.170 1.7400 5.40\n", "8 2016 11 356.0 1.390534 1.159147 0.04 0.6775 1.100 1.7000 7.20\n", "8 2016 12 326.0 1.533282 1.158696 0.04 0.7400 1.200 1.9000 5.40\n", "8 2016 13 284.0 1.421901 1.080338 0.10 0.7000 1.105 1.7775 5.20\n", "8 2016 14 231.0 1.692684 1.372191 0.05 0.8200 1.200 1.9000 5.90\n", "8 2016 15 222.0 1.583964 1.157553 0.07 0.8550 1.300 1.8400 5.40\n", "8 2016 16 223.0 1.629910 1.223131 0.10 0.7300 1.300 2.0000 5.10\n", "8 2016 17 220.0 1.583682 1.203617 0.10 0.8000 1.210 1.9675 5.20\n", "8 2016 18 219.0 1.499772 1.159497 0.04 0.7600 1.200 1.8050 5.90\n", "8 2016 19 226.0 1.819469 1.416291 0.11 0.8925 1.340 2.1900 7.40\n", "8 2016 20 237.0 1.553207 1.296262 0.02 0.7000 1.200 2.0000 6.40\n", "8 2016 21 266.0 1.332368 1.032210 0.03 0.6050 1.140 1.6925 5.10\n", "8 2016 22 215.0 1.451488 1.185657 0.01 0.7000 1.180 1.8000 5.60\n", "8 2016 23 233.0 1.643391 1.245661 0.04 0.8200 1.300 2.1000 
6.20\n", "8 2016 24 216.0 1.553194 1.144054 0.03 0.8375 1.290 1.9000 6.80\n", "8 2016 25 238.0 1.519328 0.926028 0.10 0.8900 1.400 1.9950 5.90" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "magnitudes_day_year_reset = magnitudes.reset_index(level=[\"year\", \"day\"])\n", "magnitudes_day_year_reset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Food for thought:** When might it be preferable to use a `MultiIndex`? When might it not be preferable?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ⏸️ Pause and 🧠 Think\n", "\n", "Take a moment to review the following concepts and reflect on your own understanding. A good temperature check for your understanding is asking yourself whether you might be able to explain these concepts to a friend outside of this class.\n", "\n", "Here's what we covered in this lesson:\n", "\n", "* `groupby`\n", "* `apply`\n", "* Hierarchical indexing and `MultiIndex`\n", "\n", "Here are some other guiding exercises and questions to help you reflect on what you've seen so far:\n", "\n", "1. In your own words, write a few sentences summarizing what you learned in this lesson.\n", "2. What did you find challenging in this lesson? Come up with some questions you might ask your peers or the course staff to help you better understand that concept.\n", "3. What was familiar about what you saw in this lesson? How might you relate it to things you have learned before?\n", "4. Throughout the lesson, there were a few **Food for thought** questions. Try exploring one or more of them and see what you find.\n", "\n", "## In-Class\n", "\n", "When you come to class, we will work together on the problems in `ufos.ipynb`. We will also need `ufos.csv` for these tasks. Make sure that you have a way of editing and running these files!\n", "\n", "## Canvas Quiz\n", "\n", "All done with the lesson? 
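Before taking the quiz, here's one final sketch of selecting values from a hierarchical index, assuming `magnitudes` still carries the (`year`, `month`, `day`) `MultiIndex` built by `groupby` earlier in the lesson. Tuples act as keys, and `slice(None)` stands in for 'all values' at a level:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# One row by its full (year, month, day) key: the summary statistics for July 27, 2016\n", "magnitudes.loc[(2016, 7, 27)]\n", "\n", "# All August days: slice(None) matches every value at the day level\n", "magnitudes.loc[(2016, 8, slice(None)), :]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "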
Complete the [Canvas Quiz linked here!](https://canvas.uw.edu/courses/1860345/quizzes/2334842)\n" ] } ], "metadata": { "kernelspec": { "display_name": "cse163", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.9" } }, "nbformat": 4, "nbformat_minor": 2 }