Data Structures¶
Data structures, such as lists, can represent complex data. While lists are quite useful on their own, Python provides several other built-in data structures to make it easier to represent complex data. By the end of this lesson, students will be able to:
- Apply list comprehensions to define basic list sequences.
- Apply set operations to store and retrieve values in a set.
- Apply dict operations to store and retrieve values in a dictionary.
- Describe the difference between the various data structures' properties (list, set, dict, tuple).
import doctest
List comprehensions¶
Another one of the best features of Python is the list comprehension. A list comprehension provides a concise expression for building up a list of values by looping over any type of sequence.
We already know how to create a list counting all the numbers between 0 and 10 (exclusive) by looping over a range.
nums = [0] * 10
for i in range(10):
    nums[i] = i
nums
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A list comprehension provides a shorter expression for achieving the same result.
[]
[]
[1, 2, 3]
[1, 2, 3]
[i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
nums[1]
1
# what items are being added?
# how are we getting those items?
# [(what item?) for ... in (where from?)]
What if we wanted to compute all these values squared? A list comprehension can help us with this as well.
[i ** 2 for i in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
["hello" * i for i in range(10)]
['', 'hello', 'hellohello', 'hellohellohello', 'hellohellohellohello', 'hellohellohellohellohello', 'hellohellohellohellohellohello', 'hellohellohellohellohellohellohello', 'hellohellohellohellohellohellohellohello', 'hellohellohellohellohellohellohellohellohello']
Or, what if we wanted to only include values of i that are even?
[i ** 2 for i in range(10) if i % 2 == 0]
[0, 4, 16, 36, 64]
nums = []
for i in range(10):
    if i % 2 == 0 and i % 10 > 3:
        nums.append(i ** 2)
nums
[16, 36, 64]
Before running the next block of code, what do you think it will output?
words = "I saw a dog today".split()
[word[0] for word in words if len(word) >= 2 and word[1] == 'a']
['s']
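To check a prediction like this one, it can help to unroll the comprehension into the loop it abbreviates. The following is only an illustrative sketch; the names first_letters and first_letters_loop are introduced here and are not part of the lesson.

```python
words = "I saw a dog today".split()

# Comprehension form: [(what item?) for ... in (where from?) if (which ones?)]
first_letters = [word[0] for word in words if len(word) >= 2 and word[1] == "a"]

# Equivalent explicit loop
first_letters_loop = []
for word in words:
    # len(word) >= 2 guards word[1] against one-letter words such as "I" and "a"
    if len(word) >= 2 and word[1] == "a":
        first_letters_loop.append(word[0])

print(first_letters)       # ['s']
print(first_letters_loop)  # ['s']
```

Only "saw" has 'a' as its second letter, so both forms produce a single 's'.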
Practice: Fun numbers¶
Fill in the blank with a list comprehension to complete the definition for fun_numbers.
"hello" + 123
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 "hello" + 123

TypeError: can only concatenate str (not "int") to str
def fun_numbers(start, stop):
    """
    Returns an increasing list of all fun numbers between start (inclusive)
    and stop (exclusive). A fun number is defined as a number that is either
    divisible by 2 or divisible by 5.
    >>> fun_numbers(2, 16)
    [2, 4, 5, 6, 8, 10, 12, 14, 15]
    >>> fun_numbers(1, 11)
    [2, 4, 5, 6, 8, 10]
    """
    return [i for i in range(start, stop) if i % 2 == 0 or i % 5 == 0]
doctest.run_docstring_examples(fun_numbers, globals())
Tuples¶
Whereas lists represent mutable sequences of any elements, tuples (pronounced "two pull" or like "supple") represent immutable sequences of any elements. Just like strings, tuples are immutable, so the sequence of elements in a tuple cannot be modified after the tuple has been created.
While lists are defined with square brackets, tuples are defined by commas alone, as in the expression 1, 2, 3. We often add parentheses around the structure for clarity. In fact, Python itself uses parentheses when displaying a tuple.
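As a small sketch of the comma rule, the variable names below are made up for illustration. Note that a one-element tuple needs a trailing comma, while parentheses without a comma only group an expression.

```python
a = 1, 2, 3        # commas alone create a tuple
b = (1, 2, 3)      # parentheses are optional but common
single = (1,)      # a one-element tuple needs a trailing comma
not_a_tuple = (1)  # parentheses without a comma just group an expression

print(a)                  # (1, 2, 3)
print(a == b)             # True
print(type(single))       # <class 'tuple'>
print(type(not_a_tuple))  # <class 'int'>
```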
t = (1, 2, 3)
nums
[16, 36, 64]
nums.index(16)
0
t[0]
1
t.index(1)
0
t.append(45)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[42], line 1
----> 1 t.append(45)

AttributeError: 'tuple' object has no attribute 'append'
t[0] = 45
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[43], line 1
----> 1 t[0] = 45

TypeError: 'tuple' object does not support item assignment
We learned that there are many list methods, most of which modify the original list. Since tuples are immutable, they have no equivalent methods for modifying their contents; only read-only methods such as index and count are available. So why use tuples when we could just use lists instead?
Your choice of data structure communicates information to other programmers. If you know exactly how many elements should go in your data structure and those elements don't need to change, a tuple is right for the job. By choosing to use a tuple in this situation, we communicate to other programmers that the sequence of elements in this data structure cannot change! No one working on your project has to worry about passing a tuple to a function and that function somehow destroying the data.
Tuples provide a helpful way to return more than one value from a function. For example, we can write a function that returns both the first letter and the second letter from a word.
def first_two_letters(word):
    return word[0], word[1]
a, b = first_two_letters("goodbye")
a
b
'o'
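The same pattern works for any function that needs to hand back a fixed number of related values. The sketch below uses a hypothetical helper named min_and_max (not part of the lesson) to show that the returned value is a tuple, and that unpacking assigns each element to its own name.

```python
def min_and_max(values):
    """Returns the smallest and largest value as a 2-element tuple."""
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])
print(result)        # (1, 5)

lo, hi = result      # tuple unpacking assigns each element to its own name
print(lo, hi)        # 1 5
```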
Sets¶
Whereas lists represent mutable sequences of any elements, sets represent mutable unordered collections of unique elements. Unlike lists, sets are not sequences so we cannot index into a set in the same way that we could for lists and tuples. Sets only represent unique elements, so attempts to add duplicate elements are ignored.
nums = set()
nums.add(1)
nums.add(2)
nums.add(3)
nums.add(2) # duplicate ignored
nums.add(-1)
nums
{-1, 1, 2, 3}
len({1, 2, 3})
3
{1, 2, 3}.pop()
1
{1, 2, 3} == {3, 2, 1}
True
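Beyond add and pop, a few other set operations cover most everyday storing and retrieving. The sketch below uses made-up example sets; results are passed through sorted before printing because a set has no guaranteed display order.

```python
evens = {0, 2, 4, 6, 8}
small = {0, 1, 2, 3}

print(2 in evens)     # True: membership test
print(5 in evens)     # False

evens.add(10)         # store a new value
evens.discard(8)      # remove a value
evens.discard(99)     # discard does nothing if the value is absent

print(sorted(evens))          # [0, 2, 4, 6, 10]
print(sorted(evens & small))  # intersection: [0, 2]
print(sorted(evens | small))  # union: [0, 1, 2, 3, 4, 6, 10]
print(sorted(small - evens))  # difference: [1, 3]
```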
So what's the point of using a set over a list? Sets are often much faster than lists at determining whether a particular element is contained in the set. We can see this in action by comparing the time it takes to count the number of unique words in a large document. Using a list results in much slower code.
def count_unique(path):
    unique = []
    with open(path) as f:
        for line in f.readlines():
            for token in line.split():
                if token not in unique:
                    unique.append(token)
    return len(unique)
%time count_unique("moby-dick.txt")
CPU times: user 7.14 s, sys: 37 ms, total: 7.18 s Wall time: 7.18 s
32553
By combining sets and list comprehensions, we can compose our programs in more "Pythonic" ways.
def count_unique(path):
    with open(path) as f:
        return len(set([token for token in f.read().split()]))
%time count_unique("moby-dick.txt")
CPU times: user 23.8 ms, sys: 10.1 ms, total: 33.9 ms Wall time: 32.2 ms
32553
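The comparison can be reproduced without the text file. The sketch below is an approximation under stated assumptions: the tokens are synthetic stand-ins for the words of a document, the function names are invented for this example, and timings from time.perf_counter will vary from machine to machine.

```python
import time

def count_unique_list(tokens):
    unique = []
    for token in tokens:
        if token not in unique:   # scans the list element by element
            unique.append(token)
    return len(unique)

def count_unique_set(tokens):
    return len(set(tokens))       # hash-based lookup, no scanning

# Synthetic data (an assumption): 10,000 tokens with 1,000 distinct values
tokens = [f"word{i % 1000}" for i in range(10000)]

start = time.perf_counter()
n_list = count_unique_list(tokens)
list_seconds = time.perf_counter() - start

start = time.perf_counter()
n_set = count_unique_set(tokens)
set_seconds = time.perf_counter() - start

print(n_list, n_set)  # 1000 1000
print(f"list: {list_seconds:.4f} s, set: {set_seconds:.4f} s")
```

Both functions return the same count; only the time taken differs, because each "not in" check on a list may compare against every element stored so far.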
nums = [1, 2, 3, 4]
nums[2] = 5
nums
[1, 2, 5, 4]
nums.append(345678)
nums += [23456]
Practice: Area codes¶
Fill in the blank to compose a "Pythonic" program that returns the number of unique area codes from the given list of phone numbers formatted as strings like "123-456-7890". The area code is defined as the first 3 digits in a phone number.
def area_codes(phone_numbers):
    """
    Returns the number of unique area codes in the given sequence.
    >>> area_codes([
    ...     '123-456-7890',
    ...     '206-123-4567',
    ...     '123-000-0000',
    ...     '425-999-9999'
    ... ])
    3
    """
    return len(set(...))
doctest.run_docstring_examples(area_codes, globals())
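One possible way to fill in the blank is sketched below under a different name, area_codes_sketch, so that it does not replace the exercise. It assumes every phone number starts with its 3-digit area code, as in the format given above; other solutions are equally valid.

```python
def area_codes_sketch(phone_numbers):
    """
    Returns the number of unique area codes in the given sequence.
    """
    # number[:3] slices the first three characters, e.g. '123' from '123-456-7890'
    return len(set([number[:3] for number in phone_numbers]))

print(area_codes_sketch([
    '123-456-7890',
    '206-123-4567',
    '123-000-0000',
    '425-999-9999'
]))  # 3
```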
Dictionaries¶
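A dictionary represents a mutable collection of key-value pairs, where the keys are unique and must be hashable (in practice, immutable values such as strings, numbers, or tuples). Since Python 3.7, dictionaries remember the order in which keys were inserted, but values are retrieved by key rather than by position. In this sense, dictionaries are more flexible than lists: a list could be considered a dictionary where the "keys" are the integers counting from 0 to the length minus 1.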
Dictionaries are often helpful for counting occurrences. Whereas the above example counted the total number of unique words in a text file, a dictionary can help us count the number of occurrences of each unique word in that file.
d = {0: 'a', 1: 'b', 2: 'c'}
l = ['a', 'b', 'c']
len(d)
3
len(l)
3
d[0]
'a'
l[0]
'a'
d[2] = 'c'
d[3] = 'd'
d
{0: 'a', 1: 'b', 2: 'c', 3: 'd'}
l[2] = 'c'
l[3] = 'd'
l
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 2
      1 l[2] = 'c'
----> 2 l[3] = 'd'
      3 l

IndexError: list assignment index out of range
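The IndexError arises because assigning to a list index can only replace an existing position, whereas assigning to a dictionary key creates the key if it is missing. The sketch below contrasts the two and, as an extra illustration not taken from the cells above, shows that dictionary keys need not be integers.

```python
l = ['a', 'b', 'c']
l.append('d')            # lists grow with append, not by assigning past the end
print(l)                 # ['a', 'b', 'c', 'd']

d = {0: 'a', 1: 'b', 2: 'c'}
d[3] = 'd'               # assigning to a new key adds a key-value pair
d['last'] = 'z'          # keys do not have to be integers
d[(1, 2)] = 'pair'       # tuples work as keys because they are immutable
print(d)
# {0: 'a', 1: 'b', 2: 'c', 3: 'd', 'last': 'z', (1, 2): 'pair'}

try:
    d[[1, 2]] = 'nope'   # lists are mutable, so they cannot be used as keys
except TypeError as error:
    print("TypeError:", error)
```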
letters = {
    'a': 1,
    'a': 2,
    'b': 2,
    'c': 3
}
letters
letters['c']
3
letters['d']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[28], line 1
----> 1 letters['d']

KeyError: 'd'
letters.get('d', 0)
0
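The common ways to retrieve values from a dictionary can be summarized in one short sketch. It redefines the letters dictionary so the block runs on its own; the duplicate 'a' key is kept on purpose to show that a later entry overwrites an earlier one.

```python
letters = {'a': 1, 'a': 2, 'b': 2, 'c': 3}  # the later 'a' entry overwrites the earlier one
print(letters)               # {'a': 2, 'b': 2, 'c': 3}

print('c' in letters)        # True: membership tests check keys
print('d' in letters)        # False
print(letters.get('c', 0))   # 3: key present, so the default is ignored
print(letters.get('d', 0))   # 0: key missing, so the default is returned

try:
    letters['d']             # square brackets raise KeyError for a missing key
except KeyError as error:
    print("missing key:", error)   # missing key: 'd'

for key, value in letters.items():  # iterate over key-value pairs
    print(key, value)
```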
def count_tokens(path):
    counts = {}
    with open(path) as f:
        for token in f.read().split():
            # this is the same:
            #counts[token] = counts.get(token, 0) + 1
            if token not in counts:
                counts[token] = 1
            else:
                counts[token] += 1
    return counts
%time count_tokens("moby-dick.txt")
CPU times: user 33.7 ms, sys: 10 ms, total: 43.7 ms Wall time: 42.4 ms
{'MOBY': 1, 'DICK;': 1, 'OR': 2, 'THE': 59, 'WHALE': 3, 'by': 1072, 'Herman': 1, 'Melville': 1, 'CHAPTER': 149, '1': 1, 'Loomings.': 1, 'Call': 2, 'me': 336, 'Ishmael.': 3, 'Some': 33, 'years': 59,
 ...}
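The counting pattern is easier to inspect on a short string than on a whole novel. The sketch below uses the one-line get form mentioned in the comment inside count_tokens; the function name count_tokens_in_text and the sample sentence are invented for this example.

```python
def count_tokens_in_text(text):
    counts = {}
    for token in text.split():
        # get returns 0 for a token not seen yet, so no if/else is needed
        counts[token] = counts.get(token, 0) + 1
    return counts

counts = count_tokens_in_text("the whale and the sea and the ship")
print(counts)  # {'the': 3, 'whale': 1, 'and': 2, 'sea': 1, 'ship': 1}
```

The keys appear in the order each token was first seen, which is why the large output above begins with the opening words of the book.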
As an aside, there's also a more Pythonic way to write this program using collections.Counter, which is a specialized dictionary for counting. When displayed, a Counter also lists its entries in order from greatest to least count.
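A small sketch of Counter on an in-memory sentence (the sentence is made up for illustration) shows the behaviors that matter most here: missing keys count as 0, and most_common returns the entries with the highest counts first.

```python
from collections import Counter

counts = Counter("the whale and the sea and the ship".split())

print(counts["the"])             # 3
print(counts["missing"])         # 0: a Counter returns 0 instead of raising KeyError
print(counts.most_common(2))     # [('the', 3), ('and', 2)]
print(isinstance(counts, dict))  # True: Counter is a dict subclass
```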
def count_tokens(path):
    from collections import Counter
    with open(path) as f:
        return Counter(f.read().split())
%time count_tokens("moby-dick.txt")
CPU times: user 31.9 ms, sys: 3.62 ms, total: 35.6 ms Wall time: 33.8 ms
Counter({'the': 13411, 'of': 6364, 'and': 5803, 'a': 4393, 'to': 4368, 'in': 3751, 'that': 2641, 'his': 2385, 'I': 1714, 'with': 1617, 'as': 1573, 'was': 1548, 'is': 1530, 'it': 1490, 'he': 1479, 'for': 1337, 'all': 1284, 'at': 1198, 'this': 1122, 'by': 1072, 'from': 1036, 'but': 1018, 'not': 1017, 'be': 960, 'on': 889, 'so': 776, 'one': 740, 'had': 740, 'have': 734, 'you': 725, 'or': 665, 'were': 638, 'But': 637, 'their': 596, 'an': 569, 'some': 560, 'my': 559, 'are': 555, 'they': 554, 'him': 548, 'like': 538, 'which': 532, 'upon': 526, 'The': 521, 'into': 511, 'when': 496, 'now': 455, 'no': 435, 'out': 427, 'more': 422, 'there': 411, 'old': 409, 'up': 403, 'would': 401, 'been': 394, 'if': 393, 'we': 376, 'what': 374, 'whale': 371, 'its': 364, 'over': 356, 'these': 351, 'only': 342, 'other': 339, 'me': 336, 'then': 334, 'will': 332, 'such': 322, 'And': 313, 'very': 305, 'any': 303, 'than': 300, 'down': 296, 'those': 291, 'has': 276, 'still': 275, 'her': 272, 'seemed': 272, 'about': 270, 'though': 270, 'them': 269, 'great': 269, 'most': 268, 'must': 267, 'ye': 256, 'It': 254, 'two': 252, 'who': 250, 'long': 249, 'before': 248, 'said': 244, 'yet': 237, 'man': 236, 'it,': 234, 'after': 234, 'your': 233, 'Ahab': 232, 'him,': 231, 'little': 230, 'do': 227, 'ship': 224, 'three': 223, 'did': 219, 'In': 213, 'every': 211, 'round': 208, 'last': 208, 'thou': 207, 'may': 205, 'could': 204, 'through': 202, 'being': 198, 'same': 197, 'much': 197, 'while': 194, 'first': 193, 'our': 193, 'time': 192, 'see': 186, 'never': 185, 'He': 183, 'almost': 178, 'own': 178, 'should': 176, 'might': 174, 'ever': 171, 'it.': 170, 'then,': 169, 'can': 167, 'how': 166, 'even': 166, 'where': 165, 'whale,': 165, 'For': 164, 'off': 162, 'Captain': 162, 'good': 159, 'among': 155, 'many': 153, 'made': 153, 'way': 150, 'far': 150, 'CHAPTER': 149, 'white': 149, 'us': 145, 'me,': 143, 'without': 142, 'go': 140, 'sort': 138, 'cried': 137, 'head': 136, 'here': 135, 'himself': 134, 'sea': 132, 'thing': 
130, 'against': 129, 'thought': 127, 'whales': 126, 'whole': 126, 'Sperm': 125, 'each': 124, 'well': 124, 'boat': 124, 'Whale': 121, 'once': 119, 'part': 118, 'come': 117, 'came': 117, 'A': 116, 'seen': 116, 'till': 115, 'away': 115, 'know': 114, 'As': 112, 'take': 109, 'under': 109, 'him.': 109, 'men': 108, 'between': 108, 'again': 108, 'both': 108, 'hand': 107, 'just': 107, 'something': 107, 'Queequeg': 107, 'towards': 106, 'small': 106, 'tell': 105, 'there,': 105, 'full': 105, 'Ahab,': 105, 'found': 103, 'thy': 103, 'soon': 102, 'make': 102, 'sea,': 101, 'that,': 99, 'thus': 98, 'let': 98, 'called': 98, 'eyes': 98, 'Stubb': 98, 'look': 97, 'is,': 96, 'What': 95, 'This': 94, 'think': 94, 'side': 93, 'she': 93, 'poor': 92, 'So': 92, 'man,': 92, 'get': 91, "don't": 91, 'say,': 91, 'saw': 90, 'too': 90, 'day': 90, 'now,': 90, 'it;': 90, 'few': 89, 'went': 89, 'White': 89, 'feet': 88, 'another': 88, 'along': 88, 'say': 87, 'back': 87, 'him;': 86, 'sperm': 86, 'heard': 85, 'things': 85, '"I': 85, 'Now,': 84, 'certain': 84, 'whose': 83, 'I,': 82, 'half': 82, 'boats': 82, 'Stubb,': 82, 'There': 81, 'At': 81, 'them,': 81, 'strange': 80, 'seem': 80, 'nothing': 79, 'high': 79, 'hands': 79, 'going': 78, 'whaling': 78, 'stood': 78, 'does': 77, 'shall': 77, 'ye,': 77, 'ship,': 77, "ship's": 77, 'which,': 76, 'Moby': 76, 'always': 75, 'nor': 75, 'seems': 75, 'am': 74, 'wild': 74, 'sometimes': 74, 'world': 73, "whale's": 73, 'times': 72, 'life': 72, 'Pequod': 72, 'young': 71, 'so,': 71, 'time,': 70, 'place': 69, 'body': 69, "he's": 69, 'Queequeg,': 69, 'present': 69, 'They': 68, 'hold': 68, 'put': 68, 'and,': 68, 'standing': 68, "Ahab's": 68, 'sight': 67, 'water': 67, 'because': 67, 'black': 67, 'boat,': 67, 'Starbuck': 67, 'dead': 66, 'this,': 66, 'head,': 65, 'whether': 65, 'beneath': 65, 'hard': 65, 'deck,': 65, 'crew': 65, 'give': 64, 'right': 64, 'ere': 64, 'moment': 64, 'living': 64, 'known': 64, 'cannot': 63, 'vast': 63, 'often': 63, 'looked': 62, 'keep': 62, 'me.': 61, 
'rather': 61, "it's": 61, 'four': 61, 'sea.': 61, 'again,': 61, 'also': 61, 'way,': 60, 'one,': 60, 'having': 60, 'men,': 60, 'too,': 60, 'large': 60, 'end': 60, 'THE': 59, 'years': 59, 'stand': 59, 'line': 59, 'face': 59, 'Nor': 59, 'set': 58, 'turned': 58, 'whale.': 58, 'Starbuck,': 58, 'night': 57, 'whales,': 57, 'hand,': 57, 'here,': 57, 'captain': 57, 'general': 56, 'reason': 56, 'matter': 56, "I'll": 56, 'left': 56, 'within': 56, 'lay': 55, 'iron': 55, 'That': 55, 'live': 55, 'point': 54, 'open': 54, 'peculiar': 54, 'With': 53, 'looking': 53, 'ships': 53, 'feel': 53, 'since': 53, 'His': 53, 'To': 53, 'near': 53, 'side,': 53, 'mind': 52, 'days': 52, 'them.': 52, 'Oh,': 52, 'find': 51, 'better': 51, 'however': 51, 'best': 51, 'further': 51, 'who,': 51, 'began': 51, 'call': 51, 'entire': 51, 'heart': 51, 'sail': 50, 'If': 50, 'all,': 50, 'goes': 50, 'second': 50, 'why': 50, 'sun': 50, 'least': 50, 'water,': 49, 'thousand': 49, 'took': 49, 'hundred': 49, 'true': 49, "that's": 49, 'arm': 49, 'God': 49, 'How': 48, "man's": 48, 'When': 48, 'comes': 48, 'case': 48, 'was,': 48, 'turn': 48, 'length': 48, 'instant': 48, 'Mr.': 48, 'taking': 47, 'New': 47, 'less': 47, 'perhaps': 47, 'By': 47, 'hear': 47, 'fine': 47, 'curious': 47, 'forth': 47, 'deck': 47, 'lower': 47, "Pequod's": 47, 'Right': 46, 'order': 46, 'business': 46, 'Jonah': 46, 'turning': 46, 'several': 46, 'taken': 46, 'you,': 45, 'broad': 45, 'name': 45, 'kept': 45, 'All': 45, 'eye': 44, 'night,': 44, '"The': 44, 'up,': 44, 'used': 44, 'harpooneer': 44, 'got': 44, 'sir,': 44, 'close': 44, 'leg': 44, 'above': 44, 'thee': 44, 'themselves': 43, 'enough': 43, 'Now': 43, 'dark': 43, 'touching': 43, 'Indian': 42, 'suddenly': 42, 'air': 42, 'on,': 42, 'ivory': 42, 'next': 42, 'coming': 42, 'word': 42, 'slowly': 42, 'means': 42, 'myself': 41, 'noble': 41, 'making': 41, 'yet,': 41, 'rolled': 41, 'common': 41, 'mighty': 41, 'running': 41, 'aye,': 41, "Whale's": 41, 'particular': 40, 'especially': 40, 'straight': 40, 
'ten': 40, 'quite': 40, 'run': 40, 'day,': 40, 'light': 40, 'broken': 40, 'whatever': 40, 'last,': 40, 'sea;': 40, 'indeed,': 40, 'knew': 39, 'grand': 39, 'Though': 39, 'Not': 39, 'harpoon': 39, 'sat': 39, 'mere': 39, 'oil': 39, 'whalemen': 39, 'English': 39, 'Nantucket': 38, 'wide': 38, 'gone': 38, 'when,': 38, 'caught': 38, 'deep': 38, 'fish': 38, 'concerning': 38, 'But,': 38, 'So,': 38, 'wondrous': 38, 'cut': 38, "boat's": 38, 'nigh': 37, 'voyage': 37, 'behind': 37, 'itself': 37, 'whom': 37, 'We': 37, 'No': 37, 'felt': 37, 'gave': 37, 'plainly': 37, 'makes': 37, 'air,': 37, 'mortal': 36, 'Yet': 36, 'soul': 36, 'told': 36, 'story': 36, 'eyes,': 36, 'anything': 36, 'single': 36, 'various': 36, 'brought': 36, 'struck': 36, 'out,': 36, 'You': 36, 'five': 36, 'lost': 36, 'Whale,': 36, 'thee,': 36, 'green': 35, 'On': 35, 'Then': 35, 'seas': 35, 'sharp': 35, 'though,': 35, 'long,': 35, 'human': 35, 'savage': 35, 'board': 35, 'placed': 35, 'leaving': 35, 'chance': 35, 'more,': 35, 'work': 35, 'them;': 35, 'us,': 35, 'given': 35, 'land': 34, 'time.': 34, 'voyage,': 34, 'chase': 34, 'indeed': 34, 'ready': 34, 'help': 34, 'back,': 34, 'sudden': 34, '"Aye,': 34, 'Flask,': 34, 'Some': 33, 'around': 33, 'passed': 33, 'people': 33, 'harpooneers': 33, 'art': 33, 'bottom': 33, 'turns': 33, "there's": 33, 'nearly': 32, 'world,': 32, 'sailors': 32, 'whale;': 32, 'famous': 32, 'wind': 32, 'held': 32, 'darted': 32, 'unknown': 32, 'heavy': 32, 'completely': 32, 'new': 32, 'sound': 32, 'therefore': 32, 'sailed': 32, 'waters': 32, 'said,': 32, 'times,': 32, 'Bildad,': 32, 'crew,': 32, 'Greenland': 32, 'Dick': 32, 'account': 31, 'ship.': 31, 'other,': 31, 'there.': 31, 'Cape': 31, 'voice': 31, 'blue': 31, 'holding': 31, 'himself,': 31, 'but,': 31, 'seamen': 31, 'want': 31, 'clear': 31, 'again.': 31, 'in,': 31, 'show': 31, 'fell': 31, 'shot': 31, 'SAILOR.': 31, 'ocean': 30, 'fixed': 30, 'American': 30, 'there!': 30, 'all.': 30, 'speak': 30, 'else': 30, 'really': 30, 'parts': 30, 'Such': 
30, 'perhaps,': 30, 'sign': 30, 'red': 30, 'says': 30, 'bed': 30, 'be,': 30, 'drawing': 30, 'morning': 30, 'hardly': 30, 'bones': 30, 'carried': 30, 'cast': 30, 'cabin': 30, 'fast': 30, 'down,': 30, 'remained': 30, 'boats,': 30, 'upper': 29, 'Let': 29, 'bear': 29, 'short': 29, 'monster': 29, 'sure': 29, 'final': 29, 'head.': 29, 'stranger': 29, "won't": 29, 'top': 29, 'somehow': 29, 'proper': 29, 'there;': 29, 'craft': 29, 'done,': 29, 'Pequod,': 29, 'sailing': 29, 'Flask': 29, 'spout': 29, 'de': 29, 'tail': 29, 'aloft': 28, 'possibly': 28, 'sleep': 28, 'idea': 28, 'already': 28, 'I.': 28, 'tossed': 28, 'itself,': 28, 'rest': 28, 'fifty': 28, 'devil': 28, 'heads': 28, 'become': 28, 'man.': 28, 'cry': 28, 'done': 28, 'moment,': 28, 'thing,': 28, 'mark': 28, '"And': 28, 'object': 28, '"What': 28, 'strangely': 28, 'Whale;': 28, 'hours': 27, 'to,': 27, 'considering': 27, 'her,': 27, 'stands': 27, 'fire': 27, 'sailor': 27, 'fair': 27, 'none': 27, "I've": 27, 'seeing': 27, 'eyeing': 27, 'pretty': 27, 'course': 27, 'me;': 27, 'entirely': 27, 'rolling': 27, 'While': 27, 'Aye,': 27, 'wholly': 27, 'except': 27, 'blood': 27, 'use': 27, 'Peleg': 27, 'natural': 27, 'captain,': 27, 'Ahab.': 27, 'fish,': 27, 'distance': 27, 'Look': 26, 'considerable': 26, 'act': 26, 'deck.': 26, 'Nantucket,': 26, 'flying': 26, 'wooden': 26, 'getting': 26, 'giving': 26, 'touch': 26, "Queequeg's": 26, 'creature': 26, 'Here': 26, 'From': 26, 'across': 26, 'waves': 26, 'ancient': 26, 'forward': 26, 'away,': 26, 'rising': 26, 'hoisted': 26, 'previous': 25, 'try': 25, 'unless': 25, 'Besides,': 25, 'huge': 25, 'things,': 25, 'either': 25, 'passage': 25, 'hung': 25, 'mass': 25, 'enormous': 25, 'somewhat': 25, 'bed,': 25, 'together': 25, "I'm": 25, 'saying': 25, 'light,': 25, 'one.': 25, 'takes': 25, 'form': 25, 'death': 25, 'sideways': 25, 'Oh!': 25, 'will,': 25, 'O': 25, 'not,': 25, 'pointed': 25, 'mate': 25, 'line,': 25, 'For,': 25, 'Tashtego': 25, 'gentlemen,': 25, 'Leviathan': 25, 'seated': 24, 
'Yes,': 24, 'royal': 24, 'air.': 24, 'middle': 24, 'suppose': 24, 'lie': 24, 'low': 24, 'received': 24, 'One': 24, 'generally': 24, 'certainly': 24, 'home': 24, 'fresh': 24, 'plain': 24, 'view': 24, 'carry': 24, 'fishery,': 24, 'bearing': 24, 'twenty': 24, 'bodily': 24, 'skeleton': 24, 'vessel': 24, 'lance': 24, 'sails': 24, 'rope': 24, 'watery': 23, 'leaning': 23, 'chief': 23, 'Well,': 23, 'started': 23, 'became': 23, 'foot': 23, 'followed': 23, 'hour': 23, 'look,': 23, 'queer': 23, 'altogether': 23, 'alone': 23, 'tail,': 23, 'Upon': 23, 'over,': 23, 'seldom': 23, 'feeling': 23, 'similar': 23, 'during': 23, 'cutting': 23, 'third': 23, 'Nevertheless,': 23, 'instead': 23, 'therefore,': 23, 'fact': 23, 'dropped': 23, 'Peleg,': 23, 'Bildad': 23, 'life,': 23, 'secret': 23, 'mate,': 23, 'brow': 23, 'Why': 22, 'kind': 22, 'believe': 22, 'easy': 22, 'soul,': 22, 'cold': 22, 'thick': 22, 'watch': 22, 'air;': 22, 'jet': 22, 'fiery': 22, 'lines': 22, 'floating': 22, 'bone': 22, 'mariners': 22, "can't": 22, 'thrown': 22, 'thinking': 22, 'Lord': 22, "It's": 22, 'real': 22, 'ran': 22, 'short,': 22, 'hope': 22, 'drew': 22, 'AND': 22, 'bows': 22, 'read': 22, 'manner': 22, 'strike': 22, 'purpose': 22, 'thirty': 22, 'born': 22, 'whales.': 22, 'glance': 22, 'different': 22, "d'ye": 22, 'man;': 22, 'visible': 22, 'Ahab;': 22, 'off,': 22, 'Dutch': 22, "Stubb's": 22, 'sharks': 22, 'carpenter': 22, 'strong': 21, 'silent': 21, 'drop': 21, 'land,': 21, 'Do': 21, 'exactly': 21, 'others': 21, 'place,': 21, 'filled': 21, 'swift': 21, 'fellow': 21, 'however,': 21, 'six': 21, 'flew': 21, 'thoughts': 21, 'precisely': 21, 'Of': 21, 'doubt': 21, 'hot': 21, 'hidden': 21, 'instant,': 21, 'for,': 21, 'commanded': 21, 'steel': 21, 'looks': 21, 'knows': 21, 'also,': 21, 'sweet': 21, 'killed': 21, 'deadly': 21, 'regular': 21, 'cabin,': 21, 'gold': 21, 'arms': 21, 'mast-head': 21, 'forehead': 21, 'smoke': 21, 'free': 21, 'suspended': 21, 'complete': 21, 'waters,': 21, 'hammer': 21, 'dashed': 21, 'Thou': 
21, 'remains': 21, 'gazing': 21, 'bulk': 21, 'species': 21, 'St.': 21, 'Pip': 21, 'mild': 20, 'part,': 20, 'creatures': 20, 'that.': 20, 'us.': 20, 'so.': 20, 'formed': 20, 'OF': 20, 'quick': 20, 'stop': 20, 'late': 20, 'darkness': 20, 'door': 20, 'book': 20, 'house': 20, 'were,': 20, "didn't": 20, 'teeth': 20, 'masts': 20, 'kill': 20, 'original': 20, 'fancy': 20, "'em": 20, 'on.': 20, 'spare': 20, 'soft': 20, 'higher': 20, 'passing': 20, 'now;': 20, 'degree': 20, 'slightest': 20, 'oil,': 20, 'lofty': 20, 'here.': 20, 'beat': 20, 'learned': 20, 'power': 20, 'pass': 20, 'harpoon,': 20, 'wonder': 20, 'water;': 20, 'according': 20, 'start': 20, 'leg,': 20, 'owing': 20, 'blow': 20, 'again;': 20, 'level': 20, 'end,': 20, 'coffin': 19, 'lies': 19, 'miles': 19, 'separate': 19, 'grow': 19, 'mean': 19, 'Captain,': 19, 'drawn': 19, 'way.': 19, 'sitting': 19, 'words': 19, 'mouth,': 19, 'number': 19, '"But': 19, 'fain': 19, 'weather': 19, 'interval': 19, 'morning,': 19, 'break': 19, '"He': 19, 'mouth': 19, 'fear': 19, 'outer': 19, 'hat': 19, 'quickly': 19, 'hearts': 19, 'Like': 19, 'God,': 19, 'former': 19, 'bows,': 19, 'finally': 19, 'speaking': 19, 'laid': 19, 'hands,': 19, 'nature': 19, 'hanging': 19, 'jaw': 19, ...})
Practice: Count lengths¶
Suppose we want to compute a histogram (counts) for the number of words that begin with each character in a given text file. Your coworker has written the following code and would like your help to finish the program. Explain your fix.
def count_lengths(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        counts[first_letter] += 1
    return counts

count_lengths(['cats', 'dogs', 'deers'])
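One possible fix is sketched here, assuming the goal is a dictionary that maps each first character to the number of words starting with it. The coworker's code raises a `KeyError` the first time it sees a new letter, because `counts[first_letter] += 1` tries to read a value that is not in the dictionary yet. Checking for the key and starting its count at 0 avoids that. The name `count_first_letters` is hypothetical, chosen only to keep this sketch separate from the original function, and the sketch assumes every word is a non-empty string.

```python
def count_first_letters(words):
    """
    Returns a dictionary mapping each first character to the number of
    words in the given list that begin with that character.

    >>> count_first_letters(['cats', 'dogs', 'deers'])
    {'c': 1, 'd': 2}
    >>> count_first_letters([])
    {}
    """
    counts = {}
    for word in words:
        first_letter = word[0]  # assumes word is not the empty string
        if first_letter not in counts:
            # First time we see this letter: start its count at 0
            counts[first_letter] = 0
        counts[first_letter] += 1
    return counts


print(count_first_letters(['cats', 'dogs', 'deers']))
```

The same idea can be written more compactly as `counts[first_letter] = counts.get(first_letter, 0) + 1`, since `dict.get` returns the supplied default when the key is missing.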