University of Washington Computer Science & Engineering
  CSE 428Wi '16:  Approximate Schedule

Schedule details will evolve as we go.

      Due   Lecture Topic             Reading
Week 1
Tu          Introduction & Overview   Papers Below
Week 2
Week 3
Tu          Work !
Week 4
Tu          Work !!
Week 5
Tu          Work !!!
Week 6
Tu          Work !!!!
Week 7
Tu          Work !!!!!
Week 8
Tu          Work !!!!!!
Week 9
Tu          Work !!!!!!!
Week 10
Tu          Demos?

References:  Here is a sampling of the literature on genome assembly using "Next Generation Sequencing." If you find other interesting articles, please share them.

Most links below take you to PubMed, the NIH bibliographic database. Somewhat counterintuitively, to reach the actual article from a PubMed abstract you usually click the publisher's icon (or sometimes the icon labeled "UW article online").

Journal access: Some of the journals and articles cited below are completely open access, or are freely available via PubMed Central (look for the "Free in PMC" icon). Electronic access to other cited articles is generally free from on-campus IP addresses. For off-campus access, follow the "[offcampus]" links or look at the UW library "proxy server" instructions. Let me know if none of these work for you.

References -- Introduction & Overview: Some background:

  1. WS Noble, "A quick guide to organizing computational biology projects." PLoS Comput. Biol., 5, #7 (2009) e1000424.
  2. Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: Accessed 30 Mar 2012.
  3. V Makinen, D Belazzougui, F Cunial, AI Tomescu, Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing, Cambridge, 2015. (Amazon)

References -- Reviews: Some Review articles on genome assembly:

  1. K Paszkiewicz, DJ Studholme, "De novo assembly of short sequence reads." Brief. Bioinformatics, 11, #5 (2010) 457-72. [offcampus]
  2. SD Jackman, I Birol, "Assembling genomes using short-read sequencing technology." Genome Biol., 11, #1 (2010) 202. [offcampus]
  3. J Henson, G Tischler, Z Ning, "Next-generation sequencing and large genome assemblies." Pharmacogenomics, 13, #8 (2012) 901-15. [offcampus]
  4. JR Miller, S Koren, G Sutton, "Assembly algorithms for next-generation sequencing data." Genomics, 95, #6 (2010) 315-27. [offcampus]
  5. JT Simpson, M Pop, "The Theory and Practice of Genome Sequence Assembly." Annu Rev Genomics Hum Genet, 16, (2015) 153-72. [offcampus]

References -- DeBruijn: DeBruijn Graphs and Euler Tours:

  1. PA Pevzner, H Tang, MS Waterman, "An Eulerian path approach to DNA fragment assembly." Proc. Natl. Acad. Sci. U.S.A., 98, #17 (2001) 9748-53. [offcampus]
  2. DR Zerbino, E Birney, "Velvet: algorithms for de novo short read assembly using de Bruijn graphs." Genome Res., 18, #5 (2008) 821-9. [offcampus]
  3. MJ Chaisson, D Brinza, PA Pevzner, "De novo fragment assembly with short mate-paired reads: Does the read length matter?" Genome Res., 19, #2 (2009) 336-46. [offcampus]
  4. PE Compeau, PA Pevzner, G Tesler, "How to apply de Bruijn graphs to genome assembly." Nat. Biotechnol., 29, #11 (2011) 987-91. [offcampus]
  5. JT Simpson, K Wong, SD Jackman, JE Schein, SJ Jones, I Birol, "ABySS: a parallel assembler for short read sequence data." Genome Res., 19, #6 (2009) 1117-23. [offcampus]
  6. R Luo, B Liu, Y Xie, Z Li, W Huang, J Yuan, G He, Y Chen, Q Pan, Y Liu, and 20 others, "SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler." Gigascience, 1, #1 (2012) 18. [offcampus] (Erratum: Gigascience. 2015 Jul 8;4:30. doi: 10.1186/s13742-015-0069-2. eCollection 2015, PMID: 26161257)
  7. Y Safonova, A Bankevich, PA Pevzner, "dipSPAdes: Assembler for Highly Polymorphic Diploid Genomes." J. Comput. Biol., 22, #6 (2015) 528-45. [offcampus]

References -- String Graphs: Fragment assembly via string graphs:

  1. EW Myers, "The fragment assembly string graph." Bioinformatics, 21 Suppl 2, (2005) ii79-85. [offcampus]
  2. JT Simpson, R Durbin, "Efficient construction of an assembly string graph using the FM-index." Bioinformatics, 26, #12 (2010) i367-73. [offcampus]
  3. JT Simpson, R Durbin, "Efficient de novo assembly of large genomes using compressed data structures." Genome Res., 22, #3 (2012) 549-56. [offcampus]
  4. I Ben-Bassat, B Chor, "String graph construction using incremental hashing." Bioinformatics, 30, #24 (2014) 3515-23. [offcampus]

References -- String Indices: Suffix Arrays, BWT, FM-index, etc.:

  1. U Manber, E Myers (1990). Suffix arrays: a new method for on-line string searches. First Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 319-327.
  2. U Manber, E Myers (1991), Suffix arrays: A new method for on-line string searches. [offcampus]
  3. [offcampus]
  5. Ben Langmead's BWT/FM-index tutorial:

References -- Evaluation: How good is your assembly?:

  1. D Earl, K Bradnam, J St John, A Darling, D Lin, J Fass, HO Yu, V Buffalo, DR Zerbino, M Diekhans, and 61 others, "Assemblathon 1: a competitive assessment of de novo short read assembly methods." Genome Res., 21, #12 (2011) 2224-41. [offcampus]
  2. KR Bradnam, JN Fass, A Alexandrov, P Baranay, M Bechner, I Birol, S Boisvert, JA Chapman, G Chapuis, R Chikhi, and 81 others, "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species." Gigascience, 2, #1 (2013) 10. [offcampus]
  3. G Narzisi, B Mishra, "Comparing de novo genome assembly: the long and short of it." PLoS ONE, 6, #4 (2011) e19175.
  4. W Zhang, J Chen, Y Yang, Y Tang, J Shang, B Shen, "A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies." PLoS ONE, 6, #3 (2011) e17915.
