{ "cells": [ { "cell_type": "markdown", "id": "3fde23c2", "metadata": {}, "source": [ "# CSVs and Visualizations\n", "\n", "In this lesson:\n", "\n", "1. Reading and filtering CSV files with `csv.DictReader`.\n", "2. Visualizing the resulting data." ] }, { "cell_type": "markdown", "id": "f9af310b", "metadata": {}, "source": [ "## `csv.DictReader`\n", "\n", "To recap, `csv.DictReader` takes a file object and returns a list-like reader over the rows of the file, where the following is true:\n", "\n", "* The first line of the file supplies the keys of each dictionary.\n", "* Each subsequent line becomes one item (one dictionary) in the sequence.\n", "* Each line is split on `,` and the resulting fields are stored as the values in that dictionary. Every value is a string.\n", "\n", "Passing the reader to `list()` gives a list of dictionaries, all of which have the same keys." ] }, { "cell_type": "code", "execution_count": null, "id": "72701bda", "metadata": {}, "outputs": [], "source": [ "import csv\n", "with open('covid-history-by-state.csv') as file:\n", "    reader = csv.DictReader(file)\n", "    print(list(reader)[:10])" ] }, { "cell_type": "markdown", "id": "264bd81c", "metadata": {}, "source": [ "### Aside: \"pretty printing\"\n", "\n", "`pprint` is a tool that prints nested data in a much more readable way. First, include `from pprint import pprint`, then use `pprint` instead of `print` (notice the two `p`s instead of one!)." ] }, { "cell_type": "code", "execution_count": null, "id": "2a109a96", "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", "with open('covid-history-by-state.csv') as file:\n", "    reader = csv.DictReader(file)\n", "    pprint(list(reader)[:10])" ] }, { "cell_type": "markdown", "id": "7b9775b0", "metadata": {}, "source": [ "## Sorting and filtering the data\n", "\n", "Suppose we then just want to get the first 12 months for the state of Washington ('WA'). We'll need to loop through the rows and keep only those whose `state` value is `WA`, then order them by year and month." 
] }, { "cell_type": "code", "execution_count": null, "id": "41b1c44f", "metadata": {}, "outputs": [], "source": [ "data = []\n", "with open('covid-history-by-state.csv') as file:\n", "    reader = csv.DictReader(file)\n", "    for row in reader:\n", "        if row['state'] == 'WA':\n", "            data.append(row)\n", "\n", "pprint(data)" ] }, { "cell_type": "markdown", "id": "b75ea765", "metadata": {}, "source": [ "### Aside: Sorting\n", "\n", "In this case, the data happens to already be sorted, but if it weren't, we could sort it ourselves with the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "9799de3b", "metadata": {}, "outputs": [], "source": [ "# CSV values are strings; convert to int so that month '10' sorts after '2'\n", "data.sort(key=lambda x: (int(x['year']), int(x['month'])))\n", "data" ] }, { "cell_type": "markdown", "id": "1f8e5ff7", "metadata": {}, "source": [ "We can verify this actually sorts things by randomly shuffling the list and then sorting it again:" ] }, { "cell_type": "code", "execution_count": null, "id": "ff1f834c", "metadata": {}, "outputs": [], "source": [ "import random\n", "random.shuffle(data)\n", "data" ] }, { "cell_type": "code", "execution_count": null, "id": "8330e1a8", "metadata": {}, "outputs": [], "source": [ "# Same numeric sort key as before: (year, month) compared as integers\n", "data.sort(key=lambda x: (int(x['year']), int(x['month'])))\n", "pprint(data)" ] }, { "cell_type": "markdown", "id": "74155f0c", "metadata": {}, "source": [ "Sorting works by comparing items in a collection to determine which comes first. In the case of a simple list of numbers, it's fairly clear what the result is:" ] }, { "cell_type": "code", "execution_count": null, "id": "928dbe07", "metadata": {}, "outputs": [], "source": [ "numbers = [3, 6, 1, -1, 0, -8, 42]\n", "numbers.sort()\n", "numbers" ] }, { "cell_type": "markdown", "id": "b4a07bdd", "metadata": {}, "source": [ "With numbers, we compare using `<`, giving us a clear ordering: -1 < 1 < 3 and so on. 
That same logic holds even with strings, if we consider \"<\" (less than) to actually mean \"comes before\":\n", "\n", "```py\n", "words = [\"zebra\", \"alligator\", \"pig\", \"dog\", \"aardvark\"]\n", "words.sort()\n", "print(words)\n", "# ['aardvark', 'alligator', 'dog', 'pig', 'zebra']\n", "```\n", "\n", "It's much less clear when we're working with dictionaries. Let's reset the data, then try to sort it again." ] }, { "cell_type": "code", "execution_count": null, "id": "e5010970", "metadata": {}, "outputs": [], "source": [ "data = []\n", "with open('covid-history-by-state.csv') as file:\n", "    reader = csv.DictReader(file)\n", "    for row in reader:\n", "        if row['state'] == 'WA':\n", "            data.append(row)" ] }, { "cell_type": "markdown", "id": "ba22c352", "metadata": {}, "source": [ "What happens if we try to sort the data, comparing dictionaries to see which comes first?\n", "\n", "For example, given these two dictionaries, which one would you say comes first?\n", "\n", "```python\n", "{'year': '2020', 'month': '1', 'state': 'WA', 'positive': '0'}\n", "{'year': '2020', 'month': '1', 'state': 'WA', 'positive': '1'}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "41a4328b", "metadata": {}, "outputs": [], "source": [ "data.sort()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.3" } }, "nbformat": 4, "nbformat_minor": 5 }