# Sets

Sets are a new (to us!) data structure. Like lists, they hold a collection of other values. Two things set them apart:

1. Sets are _unordered_.
2. Sets only include _unique_ values.

As with all our data structures, we'll review how to create, query, and modify sets.

## Creating Sets

New sets can be created either with the function `set()`, or with comma-separated items in between curly braces (`{}`). (This mirrors how we can create a new list either with `[1, 2, 3]` or using the `list()` function.)

In [None]:
s1 = {1, 2, 3, 3, 1, 2, 1, 1, 1}

In [None]:
s2 = set([1, 2, 3])

In [None]:
s1 == s2

**Note:** `{}` does _not_ create an empty set. Instead, it creates a new dictionary. To create an empty set, use `set()`. We'll be covering dictionaries in the near future.
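
The note above is easy to verify with `type()`. The sketch below also calls `set()` with no arguments, which does produce an empty set. The variable names are made up for illustration.

```python
# {} is an empty dictionary, not an empty set
empty_braces = {}
# set() with no arguments gives an empty set
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
print(len(empty_set))      # 0
```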

## Querying Sets

When it comes to getting data out of a set, we run into another key difference between sets and lists: ordering.

Sets are defined as being unordered, which means it's an error to try to get the "first" item out by index. (If there's no order, what does "first" mean anyway?)

In [None]:
s1 = {1, 2, 3, 4, 5, 6}
s1[0]  # TypeError: 'set' object is not subscriptable

Instead, we normally work with sets through collective operations. These set operations come from mathematics, and each one evaluates to a new set:

In [None]:
z = {5, 6, 7, 8}
y = {1, 2, 3, 1, 5}
k = z & y  # named version: z.intersection(y)
j = z | y  # named version: z.union(y)
m = y - z  # named version: y.difference(z)
n = z - y  # named version: z.difference(y)

In [None]:
{1, 2, 3} & {2, 3, 4} # intersection

In [None]:
{1, 2, 3} | {2, 3, 4} # union

In [None]:
{1, 2, 3} - {2, 3, 4} # difference
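
Even without indexing, individual items can still be checked with `in`, counted with `len()`, or visited in a `for` loop. Here is a minimal sketch; the set `colors` is made up for illustration, and the order in which the loop visits items is not guaranteed.

```python
colors = {"red", "green", "blue"}

# Membership tests don't depend on any ordering
has_red = "red" in colors       # True
has_pink = "pink" in colors     # False

# len() counts the unique items
count = len(colors)             # 3

# A for loop visits every item once, in no guaranteed order
for color in colors:
    print(color)

# sorted() returns a list when a predictable order is needed
ordered = sorted(colors)        # ['blue', 'green', 'red']
```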

## Modifying Sets

When it comes to modifying sets, we typically just add or remove one element at a time.

### Adding Items

In [None]:
s1 = {1, 2, 3}

In [None]:
s1.add(4)

In [None]:
s1 = s1 | {5}
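
The two approaches above are not quite the same: `add()` changes the existing set in place, while `|` builds a new set that then gets assigned back to `s1`. The difference only shows up when a second name refers to the same set. A small sketch, where `alias` is a name introduced just for this example:

```python
s1 = {1, 2, 3}
alias = s1                    # a second name for the same set object

s1.add(4)                     # changes the existing set in place
print(alias == {1, 2, 3, 4})  # True: the alias sees the change

s1 = s1 | {5}                 # builds a new set and rebinds the name s1
print(s1 == {1, 2, 3, 4, 5})  # True
print(alias == {1, 2, 3, 4})  # True: the original set is untouched
```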

### Removing Items

In [None]:
s1 = {1, 2, 3}

In [None]:
s1.remove(3)

In [None]:
s1.discard(4)

In [None]:
s1 = s1 - {2}
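
The cells above use both `remove()` and `discard()`. The two behave differently only when the item is missing: `remove()` raises a `KeyError`, while `discard()` quietly does nothing. A minimal sketch, using `99` as an arbitrary value that isn't in the set:

```python
s1 = {1, 2, 3}

s1.discard(99)       # 99 isn't in the set: nothing happens

try:
    s1.remove(99)    # 99 isn't in the set: raises KeyError
except KeyError:
    print("remove() raised KeyError")

s1.remove(3)         # 3 is present, so it is removed
print(s1 == {1, 2})  # True
```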

## Practice: Sets and Files

Write a function that returns the number of unique words in a file.

In [None]:
def unique_words(filename):
    ...

Then, modify that function to only count the unique words that _aren't_ these common words: `a`, `an`, `and`, `the`.