Link Search Menu Expand Document

DNA Strand

Implementing a DNA data type for scientific computing.1

Computational biology is an interdisciplinary area of computer science focused on programming computers to understand biological data, such as DNA, the carrier of genetic information in living things. Each DNA strand is made up of smaller units called nucleotides. There are four different nucleotides present in DNA: A, C, G, and T. Every human cell has billions of nucleotides, some of which are the same (or at least very similar) across almost all humans while other portions vary across the population. Computer programs can be used to analyze DNA for criminal justice and to simulate biological processes for scientific research.

DNA Strand

Oct 19
LinkedIntList
BJP 16.2
  1. Designate private visibility to encapsulate nested classes.
  2. Trace the execution of programs with nested object references.
  3. Apply the standard traversal pattern to loop over each item stored in linked nodes.
Oct 20
SectionLinkedIntList
Oct 21
Verification
BJP 16.3
  1. Define methods that handle the empty, front, middle, and end linked node cases.
  2. Explain why LinkedList (unlike ArrayList) implements the Queue interface.
Oct 22
SectionVerification
Oct 23
Recursive Tracing
BJP 12.1, 12.2
  1. Trace the execution of programs with method calls by transferring control.
  2. Trace the execution of programs using stack frames each with their own variables.
  3. Trace the execution of programs with a single recursive call.
DNA profiling
The process of determining an individual’s genetic fingerprint for use in criminal justice. One place where DNA tends to have high genetic diversity is in Short Tandem Repeats (STRs). An STR is a short sequence of nucleotides that tends to repeat consecutively numerous times at specific locations inside human DNA. The number of times any particular STR repeats varies a lot among individuals. In the DNA samples below, for example, Alice has the STR AGAT repeated four times in her DNA, while Bob has the same STR repeated five times.2
Alice
CTAGATAGATAGATAGATGACTA
Bob
CTAGATAGATAGATAGATAGATT

Specification

Implement a DNAStrand data type that represents DNA with linked Nucleotide nodes.

DNAStrand(String dna)
Constructs a new DNAStrand with the given dna string representing nucleotide bases.
int maxConsecutiveRepeats(DNAStrand substrand)
Returns the maximum number of consecutive repetitions of the nucleotides in substrand found within this strand. Throws an IllegalArgumentException if the substrand is null or empty. Assume the substrand is strictly shorter than this. Make sure to consider every node as a potential starting position for the maximum consecutive repeats.

There are a few additional requirements.

  • Do not construct any arrays, lists, strings, or other data structures.
  • Only the DNAStrand constructor may create new Nucleotide(...) instances.

Web app

To launch the web app, Open Terminal , paste the following command, and press Enter.

javac Server.java && java Server database.csv; rm *.class

Then, open the Network dropdown and select host 0.0.0.0:8000.

When you’re done, return to the terminal and enter the key combination Ctrl C to close the server.

Grading

Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.

Commenting errors
Method header does not describe the purpose of each parameter or does not describe the meaning of the return value of a non-void method.
Readability errors
Nondescriptive names for local variables or methods.
Control structure errors
Bad use of if/else, e.g. empty branch, if/if instead of if/else, poor factoring, unnecessary tests.
Bad use of loops. For example, including an unnecessary check before a loop that repeats the loop test.
Method errors
Not throwing exceptions at the top of a method when detecting the error does not require significant computation.
Using if/else instead of if for exception code (non-exception code should not be in an else branch).
Modifying a parameter to a method when such modification was not part of the intended behavior.
Miscellaneous errors
Failure to simplify code. For example, not factoring out common code, extra tests or loops that aren’t necessary. Create private “helper” method(s) to capture repeated code.
  1. Brian Yu and David Malan. 2020. DNA. In Nifty Assignments 2020. http://nifty.stanford.edu/2020/dna/

    Keith Schwarz. 2020. The Adventures of Links. In CS 106B Winter 2020. https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1204/handouts/260%20Assignment%207.pdf 

  2. Using multiple STRs, rather than just one, can improve the accuracy of DNA profiling. If the probability that two people have the same number of repeats for a single STR is 5%, and the analyst looks at 10 different STRs, then the probability that two DNA samples match purely by chance is about 1 in 1 quadrillion (assuming all STRs are independent of each other). So if two DNA samples match in the number of repeats for each of the STRs, the analyst can be pretty confident they came from the same person. CODIS, The FBI’s DNA profiling database considers 20 different STRs. In practice, since analysts know on which chromosome and at which location in the DNA an STR will be found, they can localize their search to just a narrow section of DNA. But we’ll ignore that detail for this problem and check the entire strand each time.