{ "cells": [ { "cell_type": "markdown", "id": "039de857", "metadata": {}, "source": [ "# Sets\n", "\n", "Sets are a new (to us!) data structure. Like lists, they hold a collection of other values. Two things distinguish sets from lists:\n", "\n", "1. Sets are _unordered_.\n", "2. Sets only include _unique_ values.\n", "\n", "As with all our data structures, we'll review how to create, query, and modify sets." ] }, { "cell_type": "markdown", "id": "b186c80b", "metadata": {}, "source": [ "## Creating Sets\n", "\n", "New sets can be created either with the function `set()`, or with comma-separated items in between curly braces (`{}`). (This mirrors how we can create a new list either with `[1, 2, 3]` or using the `list()` function.)" ] }, { "cell_type": "code", "execution_count": null, "id": "2c263bd7", "metadata": {}, "outputs": [], "source": [ "s1 = {1, 2, 3, 3, 1, 2, 1, 1, 1}  # duplicates are dropped: {1, 2, 3}" ] }, { "cell_type": "code", "execution_count": null, "id": "a35be48d", "metadata": {}, "outputs": [], "source": [ "s2 = set([1, 2, 3])  # build a set from a list" ] }, { "cell_type": "code", "execution_count": null, "id": "f60ddaa1", "metadata": {}, "outputs": [], "source": [ "s1 == s2  # True: both sets contain exactly 1, 2, and 3" ] }, { "cell_type": "markdown", "id": "7ce14d42", "metadata": {}, "source": [ "**Note:** `{}` does _not_ create an empty set. Instead, it creates a new dictionary. To create an empty set, use `set()`. We'll be covering dictionaries in the near future." ] }, { "cell_type": "markdown", "id": "ecae3f46", "metadata": {}, "source": [ "## Querying Sets\n", "\n", "Getting data out of a set is where we start to see another key difference between sets and lists: ordering.\n", "\n", "Sets are unordered, so they don't support indexing: trying to get the \"first\" item out raises a `TypeError`. (If there's no order, what does \"first\" mean anyways?)" ] }, { "cell_type": "code", "execution_count": null, "id": "02ec33e6", "metadata": {}, "outputs": [], "source": [ "s1 = {1, 2, 3, 4, 5, 6}\n", "s1[0]  # TypeError: 'set' object is not subscriptable"
] }, { "cell_type": "markdown", "id": "193d350e", "metadata": {}, "source": [ "Instead, we usually work with whole sets at once using set operations. These operations come from mathematics, and each one evaluates to a new set:" ] }, { "cell_type": "code", "execution_count": null, "id": "f4d55d23", "metadata": {}, "outputs": [], "source": [ "z = {5, 6, 7, 8}\n", "y = {1, 2, 3, 1, 5}  # the duplicate 1 is dropped\n", "k = z & y  # {5}; named version: z.intersection(y)\n", "j = z | y  # {1, 2, 3, 5, 6, 7, 8}; named version: z.union(y)\n", "m = y - z  # {1, 2, 3}; named version: y.difference(z)\n", "n = z - y  # {6, 7, 8}; named version: z.difference(y)" ] }, { "cell_type": "code", "execution_count": null, "id": "13703051", "metadata": {}, "outputs": [], "source": [ "{1, 2, 3} & {2, 3, 4}  # intersection: {2, 3}" ] }, { "cell_type": "code", "execution_count": null, "id": "811dbff2", "metadata": {}, "outputs": [], "source": [ "{1, 2, 3} | {2, 3, 4}  # union: {1, 2, 3, 4}" ] }, { "cell_type": "code", "execution_count": null, "id": "357dc123", "metadata": {}, "outputs": [], "source": [ "{1, 2, 3} - {2, 3, 4}  # difference: {1}" ] }, { "cell_type": "markdown", "id": "b98cb512", "metadata": {}, "source": [ "## Modifying Sets\n", "\n", "When it comes to modifying sets, we typically just add or remove one element at a time."
] }, { "cell_type": "markdown", "id": "ead9c4fe", "metadata": {}, "source": [ "### Adding Items" ] }, { "cell_type": "code", "execution_count": null, "id": "7f83272f", "metadata": {}, "outputs": [], "source": [ "s1 = {1, 2, 3}" ] }, { "cell_type": "code", "execution_count": null, "id": "93183f44", "metadata": {}, "outputs": [], "source": [ "s1.add(4)  # modifies s1 in place: {1, 2, 3, 4}" ] }, { "cell_type": "code", "execution_count": null, "id": "a69104c1", "metadata": {}, "outputs": [], "source": [ "s1 = s1 | {5}  # union builds a new set: {1, 2, 3, 4, 5}" ] }, { "cell_type": "markdown", "id": "cdcfbb21", "metadata": {}, "source": [ "### Removing Items" ] }, { "cell_type": "code", "execution_count": null, "id": "c19c9877", "metadata": {}, "outputs": [], "source": [ "s1 = {1, 2, 3}" ] }, { "cell_type": "code", "execution_count": null, "id": "720971a1", "metadata": {}, "outputs": [], "source": [ "s1.remove(3)  # s1 is now {1, 2}; raises KeyError if the item is missing" ] }, { "cell_type": "code", "execution_count": null, "id": "0b820fa2", "metadata": {}, "outputs": [], "source": [ "s1.discard(4)  # no error, even though 4 isn't in s1" ] }, { "cell_type": "code", "execution_count": null, "id": "63f5dc11", "metadata": {}, "outputs": [], "source": [ "s1 = s1 - {2}  # difference builds a new set: {1}" ] }, { "cell_type": "markdown", "id": "fb17208a", "metadata": {}, "source": [ "## Practice: Sets and Files\n", "\n", "Write a function that returns the number of unique words in a file." ] }, { "cell_type": "code", "execution_count": null, "id": "bf0f1fa2", "metadata": {}, "outputs": [], "source": [ "def unique_words(filename):\n", "    ..." ] }, { "cell_type": "markdown", "id": "c58cc9b9", "metadata": {}, "source": [ "Then, modify that function so that it ignores these common words: a, an, and, the."
] }, { "cell_type": "markdown", "id": "9784680f", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.3" } }, "nbformat": 4, "nbformat_minor": 5 }