Data Structures and Files
In this lesson, we'll practice our Python programming skills: loops, strings, lists, and dictionaries.
import doctest
Practice: DNA match score
Write a function dna_match_score that takes two strings of the same length that represent DNA sequences and returns their alignment score. DNA sequences are strings with only the characters "A", "C", "G", "T", or "-" (to represent a gap). When aligning the two DNA sequences, there will never be a gap in both strings at the same index.
To compute the alignment score, compare the characters that appear at the same index in both strings:
- If both characters match and are one of "A", "C", "G", "T", the score is +2.
- If both characters are one of "A", "C", "G", "T" but they don't match, the score is -1.
- If one character is one of "A", "C", "G", "T" and the other is a gap "-", the score is -2.
The alignment score is the sum of these per-index scores. For example, dna_match_score("-ATGC", "CATGT") returns 3: the table below shows the score at each index 1 through 5 of the DNA sequences, and -2 + 2 + 2 + 2 - 1 = 3.
i | seq1 | seq2 | score |
---|---|---|---|
1 | - | C | -2 |
2 | A | A | +2 |
3 | T | T | +2 |
4 | G | G | +2 |
5 | C | T | -1 |
def dna_match_score(seq1, seq2):
    """
    Returns the alignment score of two DNA sequences of equal length, where the score is the sum
    of the points for matching (+2 points), non-matching (-1 points), and gap (-2 points) positions.
    >>> dna_match_score("-ATGC", "CATGT")
    3
    >>> dna_match_score("ATGC", "ATGC")
    8
    >>> dna_match_score("-AT", "C-T")
    -2
    """
    result = 0
    for i in range(len(seq1)):
        if seq1[i] == seq2[i]:
            # Two gaps never share an index, so equal characters are matching bases
            result += 2
        elif seq1[i] == '-' or seq2[i] == '-':
            # Exactly one of the two characters is a gap
            result -= 2
        else:
            # Two different bases
            result -= 1
    return result
doctest.run_docstring_examples(dna_match_score, globals())
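The same scoring rules can also be written without indexing. The sketch below is one possible alternative, not the lesson's solution: it pairs up characters with zip and sums a per-pair score. The names pair_score, dna_match_score_zip, MATCH, MISMATCH, and GAP are made up for this illustration.

```python
# Points per aligned pair; these constant names are illustrative only.
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters."""
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def dna_match_score_zip(seq1, seq2):
    """Sum the pair scores; zip walks both sequences in lockstep."""
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(dna_match_score_zip("-ATGC", "CATGT"))  # 3
```

Checking for a gap first keeps the three rules independent of each other, at the cost of a small helper function.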
Practice: Words by letter
Write a function words_by_letter that takes a string filename and returns a dictionary associating each letter with the number of words that begin with that letter. Normalize the first letter of each word to be lowercase. If the file is empty, return an empty dictionary.
def words_by_letter(filename):
    """
    Returns a dictionary containing letter-count pairs, where each count represents the number
    of words starting with a given letter in the specified file.
    >>> words_by_letter("simple.txt")
    {'t': 3, 's': 2, 'i': 1}
    >>> words_by_letter("twister.txt")
    {'p': 24, 'a': 3, 'o': 4, 'i': 1, 'w': 1, 't': 1}
    """
    out = {}
    with open(filename) as f:
        # split() with no arguments breaks on any whitespace, including newlines
        for word in f.read().split():
            letter = word[0].lower()
            if letter not in out:
                out[letter] = 0
            out[letter] += 1
    return out
doctest.run_docstring_examples(words_by_letter, globals())
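To try the counting logic without the lesson's text files, the sketch below writes a small temporary file and counts its words. The sample text is invented for this example and is not the contents of simple.txt, and words_by_letter_get is a hypothetical variant that uses dict.get in place of the membership check.

```python
import os
import tempfile


def words_by_letter_get(filename):
    """Count words by lowercase first letter, using dict.get for the default."""
    counts = {}
    with open(filename) as f:
        for word in f.read().split():
            letter = word[0].lower()
            counts[letter] = counts.get(letter, 0) + 1
    return counts


# Made-up sample text, written to a temporary file for the demonstration
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The tiny salamander\nsaw Three insects\n")
    path = tmp.name

result = words_by_letter_get(path)
os.remove(path)  # clean up the temporary file
print(result)  # {'t': 3, 's': 2, 'i': 1}
```

Because an empty file yields no words, the loop body never runs and the function returns an empty dictionary, as the problem requires.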
Practice: Count divisible digits
Write a function count_divisible_digits that takes two integers n and m and returns the number of digits in n that are divisible by m. For this problem, any digit in n that is 0 is divisible by any number. Assume m is a non-negative single digit: 0 ≤ m < 10. If m is 0, return 0.
Do not use str to solve any part of this problem. Instead, solve it by manipulating the number itself using integer division and remainder. For non-negative n:
- n // 10 evaluates to all but the last digit of n.
- n % 10 evaluates to the last digit of n.
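The two operations above can be combined in a loop to visit every digit. The sketch below uses a hypothetical helper, digits_from_right, that is not part of the assignment; it only illustrates the technique. It takes the absolute value first because Python's % takes the sign of the divisor, so -204 % 10 evaluates to 6 instead of 4.

```python
def digits_from_right(n):
    """Return the digits of n, last digit first, using only integer arithmetic."""
    n = abs(n)  # without this, -204 % 10 would be 6 in Python
    if n == 0:
        return [0]  # zero has one digit, which the loop below would miss
    digits = []
    while n > 0:
        digits.append(n % 10)  # the last digit
        n //= 10               # everything except the last digit
    return digits


print(digits_from_right(650899))  # [9, 9, 8, 0, 5, 6]
print(digits_from_right(-204))    # [4, 0, 2]
```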
def count_divisible_digits(n, m):
    """
    Returns the number of digits in n that are divisible by m. If m is 0, then return 0. Any
    digit in n that is 0 counts as divisible by m.
    >>> count_divisible_digits(650899, 3)
    4
    >>> count_divisible_digits(-204, 5)
    1
    >>> count_divisible_digits(10, 0)
    0
    """
    if m == 0:
        return 0
    elif n == 0:
        # The single digit 0 is divisible by any nonzero m
        return 1
    else:
        n = abs(n)  # ignore the sign so that % yields the last digit
        count = 0
        while n > 0:
            digit = n % 10
            if digit % m == 0:
                count += 1
            n //= 10
        return count
doctest.run_docstring_examples(count_divisible_digits, globals())
Testing
Run all the tests and ensure your code is working by running the following code block.
doctest.testmod()