Lesson 1.3: Data Structures

Objectives¶

File processing opens up new opportunities to work with more complex data than the numbers and strings that we’ve been typing into py scripts. But how do we represent this complex data? Data structures such as lists can represent complex data. While lists are quite useful on their own, Python provides several other built-in data structures to make it easier to represent complex data. By the end of this lesson, students will be able to:

Apply list comprehensions to define basic list sequences.
Apply set operations to store and retrieve values in a set.
Apply dict operations to store and retrieve values in a dictionary.
Describe the difference between the various data structures’ properties (list, set, dict, tuple).
Define type annotations for parameters and returns.

List Comprehensions¶

Recall that we have previously used lists to represent indexed sequences of values, which can store values of any type:

# Create a list
l = [1, 2, 'hello']

# Print a list and get a value
print(l)
print(l[1])  # Lists, like str, are 0-indexed

# You can also use slicing on lists, just like str
print(l[1:2])  # [2]
print(l[:2])   # [1, 2]

# You can use assignment to update values in a list (can't with a str)
l[1] = 'dogs'
print(l)

# Two ways to loop over a list
for i in range(len(l)):
    print(l[i])

for val in l:
    print(val)

# Build up a list programmatically, fun_numbers from last time
start = 2
stop = 16
result = []  # Empty list
for i in range(start, stop):
    if i % 2 == 0 or i % 5 == 0:
        result.append(i)
print(result)

[1, 2, 'hello']
2
[2]
[1, 2]
[1, 'dogs', 'hello']
1
dogs
hello
1
dogs
hello
[2, 4, 5, 6, 8, 10, 12, 14, 15]

So far, we’ve shown how to specify a list by hand, using the following syntax:

nums = [1, 2, 3]
print(nums)

[1, 2, 3]

What if I wanted to make a list of the numbers from 1 to 10? We could do that, but it would take a little bit more typing.

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nums)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

But, what if we wanted a list of numbers from 1 to 100? That’s a ridiculous amount of typing! We could instead build up the list programmatically. Here is an example using the list methods we have learned.

nums = []
for i in range(1, 101): 
    nums.append(i)
print(nums)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]

This is handy because range defines a sequence of numbers we are interested in, and then we just append them to our list. But these two ways are not the only ways to define a list data structure!

List Comprehension Syntax¶

This is a very common pattern in Python, so it provides some nice syntax called a list comprehension to help you build up these lists. We first show the code and then explain the parts.

nums = [i for i in range(1, 101)]
print(nums)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]

This is very similar to our first approach, where we spelled out what was inside the list, like in [1, 2, 3]. We read list comprehensions from the inside to the outside.

nums = [                   # 3) Store the result in a list called nums
    i                      # 2) The value you will put in the list
    for i in range(1, 101) # 1) What you are looping over
]
print(nums)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]

This is just a compact syntax for writing the full loop we showed earlier! It’s a bit weird when you first see it that the for comes after the value, but you get used to it with practice. This syntax can be very handy for specifying things really quickly.

One of the nice things about list comprehensions is they let you pretty easily transform your values before putting them in the list. In the last example, we just put i in the list but that isn’t the only option! For example, what if we wanted to put in the squares of all the even numbers between 1 and 10? We could write a list comprehension for that too!

squares = [i ** 2 for i in range(1, 11)]
print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Again, you should read it in the order we showed above:

What are you looping over? range(1, 11) using loop variable i
What value are you storing in the result? i ** 2 (i.e. $i^2$ )
What are we storing the result as? A list named squares

One last thing list comprehensions allow you to do is filter values from the original sequence to just the ones you want! For example, what if we only wanted the squares of numbers divisible by 3 between 1 and 10? Now you could implement this specific task by changing the range call to include a step-size, but you could imagine having a more complex condition that you can’t simply solve it using that approach.

Take a second to think about how we would write this without a list comprehension.

It would be similar to our approach above that uses a call to append, but would be different in that it has an if-statement to only append some times. It would look something like the snippet below.

squares = []
for i in range(1, 11):
    if i % 3 == 0:  # This is the new addition!
        squares.append(i ** 2)
print(squares)

[9, 36, 81]

Python provides a syntax for conditionally including a value in a list comprehension using an if statement inside the comprehension. Again, the syntax looks a bit weird at first.

squares = [i ** 2 for i in range(1, 11) if i % 3 == 0]
print(squares)

[9, 36, 81]

Like before, we will write this out in another way to show what is going on:

squares = [               # 4) Store the result in a list called squares
    i ** 2                # 3) What value should be stored in the result?
    for i in range(1, 11) # 1) What you are looping over
    if i % 3 == 0         # 2) Should we include the value in the result?
]
print(squares)

[9, 36, 81]

Tuples¶

Let’s focus on two key properties of the list data structure.

Lists have integer indices that order the elements in the list.
Lists are mutable: unlike a string, elements can be added or removed at any index in a list.

A different data structure in our programming toolbox is the tuple (pronounced either like “two pull” or to rhyme with “supple”). A tuple is much like a list in that it has integer indices, but it is different in that it is immutable. A tuple will have a pre-defined number of values inside of it, and you can’t modify them!

Lists and tuples look very similar but have some key differences.

Tuples don’t have any meaningful methods like list does since you cannot modify them.
While lists are defined with square brackets ([]), tuples are defined with commas alone: 1, 2, 3. We can also add parentheses around the structure for clarity. In fact, when displaying a tuple, Python will show parentheses to indicate that the tuple is a single unit!

To access values in a tuple, you can index into them just like lists. To prove to you that you can’t modify tuples, compare the list output to the tuple output:

l = [1, 2, 3]
print('l[0] =',l[0])
print('l before', l)
l[1] = 14
print('l  after', l)

l[0] = 1
l before [1, 2, 3]
l  after [1, 14, 3]

t = (1, 2, 3)
print('t[0] =', t[0])
# Though they display with parentheses for clarity
print('t before', t)
# Tuples cannot be modified
t[1] = 14
print('t  after', t)

t[0] = 1
t before (1, 2, 3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[24], line 6
      4 print('t before', t)
      5 # Tuples cannot be modified
----> 6 t[1] = 14
      7 print('t  after', t)

TypeError: 'tuple' object does not support item assignment

Food for thought: Why would we want to use a tuple instead of a list? What does the use of a tuple communicate to other programmers who might be using your code or data?

Unpacking Tuples¶

Tuples normally appear as a way to return more than one value from a function (see the five_number_summary function in THA 1 for an example!). For example, we can write a function that returns both the first and second letter from a word. This is all done with Python returning and unpacking a tuple.

def first_two_letters(word):
    return word[0], word[1]

# Unpack the result into a and b
a, b = first_two_letters('goodbye')
print(a)
print(b)

g
o

One nice feature Python allows you to do is to unpack a tuple so that you can give a variable name to each component rather than having to specify the values by index (i.e. t[2]).

t = (4, 5, 6)
print(t[1] + t[2])

a, b, c = t  # "Unpacks" t so that each element gets a variable name
print(b + c)

11
11

The number of assignment targets on the left side must match with the length of the tuple. The following two snippets show what happens when you try to unpack too many or too few items.

t = (4, 5, 6)
a, b, c, d = t  # Try to unpack it into 4 variables
print(b + c)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 2
      1 t = (4, 5, 6)
----> 2 a, b, c, d = t  # Try to unpack it into 4 variables
      3 print(b + c)

ValueError: not enough values to unpack (expected 4, got 3)

t = (4, 5, 6)
a, b = t  # Try to only unpack the first 2 values
print(a)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[28], line 2
      1 t = (4, 5, 6)
----> 2 a, b = t  # Try to only unpack the first 2 values
      3 print(a)

ValueError: too many values to unpack (expected 2)

If you only want to use a few values, a common technique is to use a name like _ that indicates it won’t be used elsewhere.

t = (4, 5, 6)
a, b, _ = t  # Unpack all 3 values but assign the last one to a placeholder
print(a)

This isn’t any special syntax! This is just using the character _ as the variable name (just like we can use the name b for a variable name)! For example, you could theoretically print(_) and it would print 6 in the snippet above. It’s not conventional to do this though, since using _ as a variable name indicates you don’t care to use that value!

Sets¶

Let’s look at another data structure called a set that is designed with the idea of finding a set of unique values in mind. A set is like a list in the sense that it is a sequence of values, but differs in two major ways:

The set does not allow duplicate values.
The set does not have a notion of indices. There is no “index 0”, in a set. The mindset you should have a set is an unordered “bag” of values. The set does have some internal ordering in which it stores the values, but you can’t access any particular value inside of it.

set is a type defined in Python that implements this behavior. The following cell shows how to create one (using set() to make an empty set). The set class has the following methods (assume stored in a variable called s):

s.add(x) which adds x to s (ignores duplicates)
s.remove(x) removes x from s
s.clear() removes all values from s

nums = set()  # Creates an empty set
print(nums)

nums.add(1)
nums.add(2)
nums.add(3)
nums.add(4)
nums.add(2)  # Is a duplicate, so it's ignored
nums.add(-1)

print(nums)

print(set([1,1,1,2,3,4,4]))

set()
{1, 2, 3, 4, -1}
{1, 2, 3, 4}

Also helpful, sets also support the len function and allow you to use the in keyword to see if they contain a value.

print(len(nums))

if 5 in nums:
    print('Found it')
else:
    print('Did not find it')

5
Did not find it

However, like we said earlier the set does not have a notion of an index. That means if you try to run the following cell, you will get an error.

print(nums[0])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 1
----> 1 print(nums[0])

TypeError: 'set' object is not subscriptable

So what’s the point of using a set over a list? It turns out the way it is implemented internally, it does an incredibly fast job at determining whether a particular element is inside of it so as to avoid adding duplicates. By losing the ability to “index” into the set, we gain speed since it can store the data in its own special way to optimize these “membership queries” (i.e. “is this element in this set?”).

To show this, let’s revisit count_unique_words from Lesson 3 with basically all of the same code above, except:

Change the unique_words to a set instead of a list.
Remove the in check since the set just skips duplicates.

def count_unique_words_set(file_name):
    unique_words = set()
    with open(file_name) as f:
        for line in f.readlines():
            for word in line.split():
                unique_words.add(word)
    return len(unique_words)

Dictionaries¶

Suppose we wanted to perform a slightly more complex analysis. Instead of finding the number of unique words, what if we want to count the occurrences of each word in a file? It’s not clear how you could use a list or set to solve the problem, “For a given word, how many of them have we seen?” A list seems like it’s more on the right track, but unfortunately, the indices have to be numbers! There is no way of using a list to say that a word should be an index.

The last data structure we are going to learn in this lesson is called a dictionary, represented in Python as dict. A dict is a very powerful data structure since it acts, in some sense, as a more generalized list. Essentially a dict is much like a list, but allows you to store any data type as the index, while a list only allows integers from 0 to len - 1.

To create a dict in Python, you use the syntax in the following snippet. Note that dict supports the square-bracket notation for accessing a value, but now you can use any value for the index. In fact, dict uses a different term for the index to reduce confusion with lists: we call the “index” of an entry in a dict its key. We describe a dict as a collection of key/value pairs that are accessible via the key.

d = {'a': 1, 'b': 17, 47: 'scurvy'}
print(d)
# This makes a dictionary with the following keys/values:
#   The key 'a' is associated with the value 1
#   The key 'b' is associated with the value 17
#   The key  47 is associated with the value 'scurvy'

# You can get/set the value for a key using the square-bracket notation
print(d['b'])

d['dogs'] = 'cute'
print(d)

# If a key already exists in the dict, it will be overwritten if you set it
d['dogs'] = 'very adorable'
print(d)

{'a': 1, 'b': 17, 47: 'scurvy'}
17
{'a': 1, 'b': 17, 47: 'scurvy', 'dogs': 'cute'}
{'a': 1, 'b': 17, 47: 'scurvy', 'dogs': 'very adorable'}

The nice thing is you have a pretty solid understanding of how to use a dict already because you know how to use lists! The semantics of accessing/setting a value associated to a key are very similar to accessing/setting a value associated to an index in a list.

If you try to look up a key that is not in the dict, you will run into a KeyError. As a note, we also show how to make an empty dict with the syntax {} (just like an empty list is []).

d = {}
d['dogs'] = 'very cute'
print(d['cats'])

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[35], line 3
      1 d = {}
      2 d['dogs'] = 'very cute'
----> 3 print(d['cats'])

KeyError: 'cats'

To prevent this error, you can use the in keyword to see if a key is in a dict before trying to access it.

d = {}
d['dogs'] = 'very cute'
if 'cats' in d:
    print(d['cats'])
else:
    print('No cats!')

No cats!

Example: Counting Word Lengths¶

Imagine we had a list of strings, and we wanted to find sum of the word lengths that start with each letter. For example, with the list ['cats', 'dogs', 'deers'] we would report the sum of the lengths of strings that start with 'c' is 4 while the sum of the lengths of strings that start with 'd' is 9. We will write a function called count_lengths to solve this problem. The function should take a list of words (all str) and we can assume none of the str are the empty string.

This seems like the task of a dict where the keys are the first letters of the words, and the values are the sum of the lengths. Let’s try to write a function to use the things we have seen so far to do this!

def count_lengths(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        counts[first_letter] = counts[first_letter] + len(word)
    return counts
        
print(count_lengths(['cats', 'dogs', 'deers']))

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[37], line 8
      5         counts[first_letter] = counts[first_letter] + len(word)
      6     return counts
----> 8 print(count_lengths(['cats', 'dogs', 'deers']))

Cell In[37], line 5, in count_lengths(words)
      3 for word in words:
      4     first_letter = word[0]
----> 5     counts[first_letter] = counts[first_letter] + len(word)
      6 return counts

KeyError: 'c'

We ran into an error! What happened?

It turns out we crashed on the first word in the list, 'cats'. We get the first letter 'c' and we try to get the value in the dictionary associated to the key 'c' when we evaluate counts[first_letter] + len(word). Remember though, if a key is not present, we get a KeyError, which is exactly what happened in this snippet!

To fix this, we need to introduce a common pattern when working with dicts. If you are ever adding values to a dict, you commonly need to think about these cases:

This is the first time we have seen the key
We have seen the key before

Depending on which case you are in, you need to write different code to handle the fact that the key is not present in the dict in the first case. We can easily fix this by introducing a check that uses in. All of the added code is inside the loop, and is there to avoid getting this KeyError.

def count_lengths(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        if first_letter in counts:
            counts[first_letter] = counts[first_letter] + len(word)
        else:
            counts[first_letter] = len(word)
    return counts
        
print(count_lengths(['cats', 'dogs', 'deers']))

{'c': 4, 'd': 9}

Food for thought: How would you describe what the if check is doing in English?

Type Annotations¶

By now, we have learned a lot of different data types and data structures. Many students in this class come from programming backgrounds in a language where you have to declare the types of your variables (e.g., Java). Python was created with the idea of flexibility, so that any variable can store a value of any type, so you don’t need type declarations.

However, Python has more recently started adding notions of defining types for variables to help spot certain types of bugs earlier. They’ve added the ability to annotate variables with a type to help indicate what type of values should be stored there. They are still relatively early in development, so these annotations are purely informational. Still, future versions of Python will allow more complex checks to occur based on these type annotations.

In CSE 163, we will ask you to use these type annotations for parameters and returns of a function. We will not expect you to annotate local variables. This matches what many Python developers do, where they add type annotations to function headers to document their methods better but do not do so on their local variables.

Here is an example method with type annotations. The annotation is read param_name: type and for the return type, it is the type that follows the ->. The syntax says that the parameter s is type str, the parameter times is type int and the return type of the method is a str.

def repeat(s: str, times: int) -> str:
    """
    Returns a new string that has the contents of s repeated 
    times times
    """

    result = ""
    for i in range(times):
        result += s
    return result

Adding these annotations makes it clear to someone reading the method exactly what types should be passed in. Adding these does not mean we don’t have to document our parameters and returns in our comments. It’s just an added note indicating what values should be. Like comments and doc-strings, type annotations do not affect the behavior of your code; it just provides more information to the user what values should be given and what values will be returned.

Annotating Data Structures¶

With data structures, there is an extra step with type annotations: we need to indicate what are the types inside the structure.

For example, if you have a method that takes a list of strings and prints out the first character of each string, it would be inappropriate to type the parameter as a list because we then don’t know that the values inside the list must be strs. So we need to modify our type annotation to say list[str] (square brackets important) to indicate that it is a list containing str elements.

def print_first_characters(words: list[str]) -> None:
    """
    Takes in a list of words and prints the first
    character in each word
    """
    for word in words:
        print(word[0])

Here are some other example data structure type annotations:

list[float] - A list of float elements:
set[int] - A set of int elements
dict[str, int] - A dict with str keys and int values (Note we have to specify the type for the key and the value, separated by a comma)
dict[str, int | str] - A dict with str keys and values that are either int or str (Note that we indicate the possibility of multiple data types with the |)
tuple[int, str, float] - A tuple of length 3 with the first element an int, the second a str, and the third a float.

Sometimes, though, you can’t come up with a good type annotation for a parameter or a value inside of a data structure. For example, maybe you have a list with heterogeneous types (mixed types) where you can’t constrain them to a set of known types; maybe it can contain bools or ints or floats or str and a myriad of other types that would be too hard to list out.

In this case, Python has a fallback where you can say the type of a value (or the type inside of a data structure) is Any type. This is generally a bad practice unless necessary, as it defeats the purpose of specifying types in the first place. That being said, here’s how you can do it. Note the import statement at the top! This is necessary to gain access to Any, since it is not included by default. Don’t worry about the details for now, we’ll formally introduce import statements soon!

from typing import Any

def example(x: list[Any]) -> dict[str, Any]:
    ...

You are not responsible for using type annotations on Homework 1. However, starting from Homework 2, we will expect that your code has the correct type annotations! See more details in the Code Quality Guide.

Food for thought: How would you type-annotate the other functions that you’ve seen so far in this class?

Testing with assert¶

Throughout these lessons, we’ve been checking our work by calling a function and printing its result to see if it looks right. That works, but it has downsides: we have to re-read the output by eye every time, and it’s easy to miss when a small change quietly breaks something we fixed earlier.

A better approach is to write tests that check our code automatically. Python gives us a statement called assert for exactly this purpose. You write the keyword assert followed by a boolean expression (something that evaluates to True or False):

assert <boolean expression>

The behavior of an assert statement is simple:

If the expression is True, nothing happens and your program keeps going.
If the expression is False, your program immediately stops with an AssertionError.

In other words, an assert is a way of saying “I claim this statement should be True. If it isn’t, then something is wrong, so stop the program.”

def double(x):
    """
    Returns x multiplied by 2.
    """
    return x * 2


# This assertion passes, so nothing happens
assert double(4) == 8

# This assertion fails, so we get an AssertionError
assert double(4) == 9

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[42], line 12
      9 assert double(4) == 8
     11 # This assertion fails, so we get an AssertionError
---> 12 assert double(4) == 9

AssertionError:

Notice that the passing assertion produced no output at all. When you run a file full of tests and see nothing, that’s good news since every check passed. The only time an assert speaks up is when a check fails.

Testing functions that return a float¶

There’s one common situation where comparing with == will surprise you: float values. Because of how computers store decimal numbers, arithmetic on floats isn’t always exact. Take a second to predict the output of the cell below before running it.

print(0.1 + 0.2)
print(0.1 + 0.2 == 0.3)

0.30000000000000004
False

Surprised? Mathematically 0.1 + 0.2 is 0.3, but the computer stores a value that is just barely off, so the == comparison comes out False. This means we should not use == to compare float values in our tests.

Instead, Python’s math module provides a function called isclose that checks whether two numbers are close enough to be considered equal. We import math once at the top of our file and then call math.isclose(a, b), which returns True when a and b are approximately equal. Again, just like with the Any type annotation, don’t worry too much about how importing code works for now, we’ll explain this in more depth next week!

import math

print(math.isclose(0.1 + 0.2, 0.3))

True

So here’s our rule of thumb for tests:

Use == for values that are exact, like int, str, bool, and list.
Use math.isclose(...) whenever you’re comparing float values.

Food for thought: Why might math.isclose be a more reliable choice than rounding both numbers to a few decimal places and then comparing them?

Putting tests in a testing function¶

It’s good practice to gather all the assert statements for a function into their own testing function, and then call that testing function from main. By convention, the testing function for a function named foo is named test_foo.

import math


def mean(a, b):
    """
    Returns the average of a and b.
    """
    return (a + b) / 2


def test_mean():
    """
    Tests the mean function using assert statements.
    """
    assert math.isclose(mean(2, 4), 3)
    assert math.isclose(mean(10, 20), 15)
    assert math.isclose(mean(0.1, 0.2), 0.15)


def main():
    test_mean()
    print('All tests passed!')


if __name__ == '__main__':
    main()

All tests passed!

When we run this program, main calls test_mean, which runs all three assertions. If every assertion passes, control returns to main and we see “All tests passed!”. If any assertion fails, the program stops at that line with an AssertionError, and “All tests passed!” is never printed.

Since mean returns a float, we used math.isclose for every assertion rather than ==, following the rule of thumb from above.

You’ll use exactly this pattern on Homework 1: you’ll write a testing function with assert statements for each function you implement and call those testing functions from main. Keep in mind that when writing tests for code you write, you shouldn’t just arbitrarily write random assert statements. Each test should be meaningful, and should verify some aspect of the code’s expected behavior. Try to test for edge cases, or cases in which your code might break or return the wrong output!

To solidify your understanding of assert statements, here are some more functions and examples of their corresponding test cases:

def double_string(s):
    """
    Returns the given string doubled.
    """
    return s + s

def add_item(lst, item):
    """
    Adds the given item to the list and returns it.
    """
    lst.append(item)
    return lst

def is_non_negative(num):
    """
    Returns True if the given number is positive, False if not.
    """
    return num >= 0

def empty(lst):
    """
    Returns None if the given list is empty, otherwise returns the original list.
    """
    if len(lst) == 0:
        return None
    else:
        return lst

# This is an example of using asserts with strings
def test_double_string():
    """
    Tests the double_string function.
    """
    assert double_string("hello") == "hellohello"
    assert double_string(" ") == "  "
    assert double_string("") == ""

# This is an example of using asserts with lists
def test_add_item():
    """
    Tests the add_item function.
    """
    assert add_item([1, 2, 3], 4) == [1, 2, 3, 4]
    assert add_item(["hello"], "world") == ["hello", "world"]

# This is an example of using asserts with integers and the negation operator
def test_is_non_negative():
    """
    Tests the is_non_negative function.
    """
    assert is_non_negative(100)
    assert is_non_negative(0)
    assert not is_non_negative(-1)

# This is an example of using asserts with None
def test_empty():
    """
    Tests the empty function.
    """
    assert empty([1, 2, 3]) == [1, 2, 3]
    assert empty([1, 2, 3]) is not None
    assert empty([]) is None

def main():
    test_double_string()
    test_add_item()
    test_is_non_negative()
    test_empty()
    print("All tests passed!")

if __name__ == "__main__":
    main()

All tests passed!

⏸️ Pause and 🧠 Think¶

Take a moment to review the following concepts and reflect on your own understanding. A good temperature check for your understanding is asking yourself whether you might be able to explain these concepts to a friend outside of this class.

Here’s what we covered in this lesson:

List comprehensions
tuples and tuple unpacking
set and set methods
dict and dictionary methods
Type annotations
Assert statements and testing

In-Class¶

Let’s work on area_codes.py and count_words.py in small groups! Make sure to discuss your approach with a partner.

`area_codes`¶

For this practice problem, refer to area_codes.py.

Write a function area_codes that takes a list of str as input, where each str in the list is a phone number, and returns the number of unique area codes found in those phone numbers. Each phone number will be of the format '123-456-7890' and the area code is the first three characters in the str.

For example, if we were to call

area_codes([
    '123-456-7890',
    '206-123-4567',
    '123-000-0000',
    '425-999-9999'
])

This call would return 3 because there are 3 unique area-codes in these phone numbers (123, 206, 425).

Hint: Try developing a simpler version of this problem that prints all the area codes for the given phone numbers. Then, count each unique area code by selecting the appropriate data structure: list, tuple, set, or dict.

`count_words`¶

For this practice problem, refer to count_words.py.

Write a function count_words that takes a file name and returns a dict that stores the tokens as keys and the number of times a specific token appeared in the file as values. Remember that a token is a sequence of characters separated by spaces.

Consider a file popular_techno_song.txt with the contents:

123456
dun dun dun dun
dun dun dun dun
err
dun dun dun dun dun dun dun dun
dundundundundundundundundundun
er er er er er er ER ER ER ER ER ER der der der der derrr

count_words('popular_techno_song.txt') should return the dict

{
    'dun': 16,
    'err': 1,
    'dundundundundundundundundundun': 1,
    'er': 6,
    'ER': 6,
    'der': 4,
    'derrr': 1
}

No need to do anything fancy with capitalization or ignoring any punctuation! Just process each token. Here’s our recommended process:

Start by writing the function header.
Then, process the file word-by-word.
Finally, think about how to store data.