DNA Strand
Implementing a DNA data type for scientific computing.1
Computational biology is an interdisciplinary area of computer science focused on programming computers to understand biological data, such as DNA, the carrier of genetic information in living things. Each DNA strand is made up of smaller units called nucleotides. There are four different nucleotides present in DNA: A, C, G, and T. Every human cell has billions of nucleotides, some of which are the same (or at least very similar) across almost all humans while other portions vary across the population. Computer programs can be used to analyze DNA for criminal justice and to simulate biological processes for scientific research.
DNA Strand
- Oct 19
- LinkedIntList
- BJP 16.2
- Designate private visibility to encapsulate nested classes.
- Trace the execution of programs with nested object references.
- Apply the standard traversal pattern to loop over each item stored in linked nodes.
- Oct 20
- SectionLinkedIntList
- Oct 21
- Verification
- BJP 16.3
- Define methods that handle the empty, front, middle, and end linked node cases.
- Explain why
LinkedList
(unlikeArrayList
) implements theQueue
interface.
- Oct 22
- SectionVerification
- Oct 23
- Recursive Tracing
- BJP 12.1, 12.2
- Trace the execution of programs with method calls by transferring control.
- Trace the execution of programs using stack frames each with their own variables.
- Trace the execution of programs with a single recursive call.
- DNA profiling
- The process of determining an individual’s genetic fingerprint for use in criminal justice. One place where DNA tends to have high genetic diversity is in Short Tandem Repeats (STRs). An STR is a short sequence of nucleotides that tends to repeat consecutively numerous times at specific locations inside human DNA. The number of times any particular STR repeats varies a lot among individuals. In the DNA samples below, for example, Alice has the STR
AGAT
repeated four times in her DNA, while Bob has the same STR repeated five times.2- Alice
- CTAGATAGATAGATAGATGACTA
- Bob
- CTAGATAGATAGATAGATAGATT
Specification
Implement a DNAStrand
data type that represents DNA with linked Nucleotide
nodes.
DNAStrand(String dna)
- Constructs a new
DNAStrand
with the givendna
string representing nucleotide bases. int maxConsecutiveRepeats(DNAStrand substrand)
- Returns the maximum number of consecutive repetitions of the nucleotides in
substrand
found within this strand. Throws anIllegalArgumentException
if thesubstrand
is null or empty. Assume thesubstrand
is strictly shorter thanthis
. Make sure to consider every node as a potential starting position for the maximum consecutive repeats.
There are a few additional requirements.
- Do not construct any arrays, lists, strings, or other data structures.
- Only the
DNAStrand
constructor may createnew Nucleotide(...)
instances.
Web app
To launch the web app, Open Terminal
javac Server.java && java Server database.csv; rm *.class
Then, open the Network
When you’re done, return to the terminal and enter the key combination Ctrl
C
to close the server.
Grading
Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.
- Commenting errors
- Method header does not describe the purpose of each parameter or does not describe the meaning of the return value of a non-void method.
- Readability errors
- Nondescriptive names for local variables or methods.
- Control structure errors
- Bad use of if/else, e.g. empty branch, if/if instead of if/else, poor factoring, unnecessary tests.
- Bad use of loops. For example, including an unnecessary check before a loop that repeats the loop test.
- Method errors
- Not throwing exceptions at the top of a method when detecting the error does not require significant computation.
- Using if/else instead of if for exception code (non-exception code should not be in an else branch).
- Modifying a parameter to a method when such modification was not part of the intended behavior.
- Using if/else instead of if for exception code (non-exception code should not be in an else branch).
- Miscellaneous errors
- Failure to simplify code. For example, not factoring out common code, extra tests or loops that aren’t necessary. Create
private
“helper” method(s) to capture repeated code.
Brian Yu and David Malan. 2020. DNA. In Nifty Assignments 2020. http://nifty.stanford.edu/2020/dna/
Keith Schwarz. 2020. The Adventures of Links. In CS 106B Winter 2020. https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1204/handouts/260%20Assignment%207.pdf ↩
Using multiple STRs, rather than just one, can improve the accuracy of DNA profiling. If the probability that two people have the same number of repeats for a single STR is 5%, and the analyst looks at 10 different STRs, then the probability that two DNA samples match purely by chance is about 1 in 1 quadrillion (assuming all STRs are independent of each other). So if two DNA samples match in the number of repeats for each of the STRs, the analyst can be pretty confident they came from the same person. CODIS, The FBI’s DNA profiling database considers 20 different STRs. In practice, since analysts know on which chromosome and at which location in the DNA an STR will be found, they can localize their search to just a narrow section of DNA. But we’ll ignore that detail for this problem and check the entire strand each time. ↩