Data Structures and Files
In this lesson, we'll practice our Python programming skills using loops, strings, lists, and dictionaries.
import doctest
Group Activity: Debug words in common
This function takes two string filenames, file1 and file2, and returns the set of words that occur in both files. A word is a sequence of characters separated by whitespace, and each word is normalized to lowercase. If either file is empty, the function should return an empty set (set()). You may assume that both file names refer to files that exist.
There are 2 bugs in the starter code provided below. Identify and resolve them!
def words_in_common(file1, file2):
    """
    Returns a set of words that are in both of the inputted files.
    >>> words_in_common("twister.txt", "simple.txt")
    {'the'}
    >>> sorted(words_in_common("twister.txt", "pepper.txt"))
    ['peppers', 'peter', 'pickled', 'the']
    >>> words_in_common("pepper.txt", "empty.txt")
    set()
    """
    in_common = set()
    with open(file1) as f:
        # Buggy: words = f.read().split().lower()
        # split() returns a list, and lists have no lower() method.
        words = set(f.read().lower().split())
    with open(file2) as f:
        # Buggy: in_common = words & f.read().split().lower()
        # Lowercase the text first, then convert the word list to a set.
        in_common = words & set(f.read().lower().split())
    # Buggy: return set(in_common)
    return in_common
doctest.run_docstring_examples(words_in_common, globals())
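The commented-out lines in the solution above call lower() on the result of split() and intersect a set with a list. The sketch below shows why the order and the types matter, using in-memory strings so that it runs without the lesson's text files. The helper name normalized_words and the sample sentences are hypothetical and are not part of the lesson.

```python
def normalized_words(text):
    """Return the set of lowercase, whitespace-separated words in text."""
    # Lowercase the whole string first; split() then yields a list of words.
    return set(text.lower().split())


first = normalized_words("The quick brown fox")
second = normalized_words("the lazy dog")
common = first & second
print(common)  # {'the'}

# Calling lower() on the list returned by split() raises AttributeError.
try:
    "The quick".split().lower()
except AttributeError as error:
    print("AttributeError:", error)

# The & operator is not defined between a set and a list.
try:
    first & "the lazy dog".split()
except TypeError as error:
    print("TypeError:", error)
```

An empty string produces an empty list from split(), so normalized_words("") is set() and any intersection with it is empty as well.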
Group Activity: Longest word by letter
Write a function longest_word_by_letter that takes a string path and returns a dictionary that maps each first letter to the length of the longest word starting with that letter. Normalize the first letter of each word to lowercase. If the file is empty, return an empty dictionary.
def longest_word_by_letter(path):
    """
    Returns a dictionary containing pairs of the first letter of each word in the file and the length
    of the longest word in the file that starts with the letter. If the file is empty,
    an empty dictionary will be returned.
    >>> longest_word_by_letter("simple.txt")
    {'t': 5, 's': 8, 'i': 2}
    >>> longest_word_by_letter("twister.txt")
    {'p': 7, 'a': 1, 'o': 2, 'i': 2, 'w': 7, 't': 3}
    >>> longest_word_by_letter("empty.txt")
    {}
    """
    counts = {}
    with open(path) as f:
        for word in f.read().split():
            letter = word[0].lower()
            if letter not in counts:
                counts[letter] = 0
            counts[letter] = max(counts[letter], len(word))
    return counts
doctest.run_docstring_examples(longest_word_by_letter, globals())
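The membership check and the max call in the solution can be folded into one step with dict.get, which returns a default value when the key is missing. The sketch below is one possible variant that works on a string instead of a file, so it runs without the lesson's text files. The function name longest_word_by_letter_in_text and the sample sentence are hypothetical.

```python
def longest_word_by_letter_in_text(text):
    """Map each lowercase first letter to the length of the longest word starting with it."""
    longest = {}
    for word in text.split():
        letter = word[0].lower()
        # get() returns 0 for a letter not seen yet, so no separate check is needed.
        longest[letter] = max(longest.get(letter, 0), len(word))
    return longest


result = longest_word_by_letter_in_text("The quick tiny Tortoise")
print(result)  # {'t': 8, 'q': 5}
```

Because split() never yields an empty string, word[0] is always safe here, and an empty input simply skips the loop and returns {}.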
Group Activity: DNA match score
Write a function dna_match_score that takes two strings of the same length that represent DNA sequences and returns their alignment score. DNA sequences are strings with only the characters "A", "C", "G", "T", or "-" (to represent a gap). When aligning the two DNA sequences, there will never be a gap in both strings at the same index.
To compute the alignment score, compare the characters that appear at the same index in both strings:
- If both characters match and are one of "A", "C", "G", "T", the score is +2.
- If both characters are one of "A", "C", "G", "T" but they don't match, the score is -1.
- If one character is one of "A", "C", "G", "T" and the other is a gap "-", the score is -2.
For example, dna_match_score("-ATGC", "CATGT") returns 3, following the process in the table below for each index 0 through 4 in the DNA sequences.
i | seq1 | seq2 | score |
---|---|---|---|
0 | - | C | -2 |
1 | A | A | +2 |
2 | T | T | +2 |
3 | G | G | +2 |
4 | C | T | -1 |
def dna_match_score(seq1, seq2):
    """
    Returns the alignment score of two DNA sequences of equal length, where the score is the sum of
    the points for matching (+2), non-matching (-1), and gap (-2) positions.
    >>> dna_match_score("-ATGC", "CATGT")
    3
    >>> dna_match_score("ATGC", "ATGC")
    8
    >>> dna_match_score("-AT", "C-T")
    -2
    """
    score = 0
    for i in range(len(seq1)):
        if seq1[i] == seq2[i]:
            score += 2
        elif seq1[i] == "-" or seq2[i] == "-":
            score -= 2
        else:
            score -= 1
    return score
doctest.run_docstring_examples(dna_match_score, globals())
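To see how the total builds up, the sketch below pairs characters with zip and records the score at each index, reproducing the table for "-ATGC" and "CATGT". The helper names position_score and score_breakdown are hypothetical. The code assumes, as the problem states, that both strings have the same length and never have a gap at the same index.

```python
GAP = "-"


def position_score(a, b):
    """Score one aligned pair of characters."""
    if GAP in (a, b):
        return -2  # one side is a gap
    if a == b:
        return 2   # matching bases
    return -1      # mismatched bases


def score_breakdown(seq1, seq2):
    """Return a list of (index, char1, char2, score) tuples."""
    return [
        (i, a, b, position_score(a, b))
        for i, (a, b) in enumerate(zip(seq1, seq2))
    ]


rows = score_breakdown("-ATGC", "CATGT")
for row in rows:
    print(row)

total = sum(score for _, _, _, score in rows)
print(total)  # 3
```

Checking for a gap first keeps each rule in its own branch. It gives the same totals as the solution above because two gaps never share an index.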
Testing
Run all the tests and ensure your code is working by running the following code block.
doctest.testmod()
TestResults(failed=0, attempted=9)