University of Washington Computer Science & Engineering
 CSE 527, Au '04: Reading #1: What Students Found

I asked for brief reports on good primers on course-related topics. Here's what you found:
Wikipedia tutorial on the basic principles of Statistics,

It covers a wide array of topics such as linear regression, extreme value theory and the F-test. The contents are well organized and easy to follow. The Wiki tutorial is suitable for readers who want a quick overview of the fundamentals of Statistics, and works well as a reference guide for beginners in this field. The wiki page also provides a list of external resources on Statistics. I found one of them, the "Virtual Laboratories in Probability and Statistics", to be exceptionally educational. It's an interactive online course that consists of text, applets, and data sets. This tutorial covers topics ranging from distributions to Bernoulli trials and hypothesis testing.

Alvis Brazma, Helen Parkinson, Thomas Schlitt, Mohammadreza Shojatalab, A quick introduction to elements of biology - cells, molecules, genes, functional genomics, microarrays, EMBL-European Bioinformatics Institute. October 2001

This biology overview was written by members of the EMBL's European Bioinformatics Institute. The EBI and EMBL have developed and maintain many of the molecular biology databases and algorithms in common use throughout the world. With this perspective, the authors give a thorough introduction to the aspects of molecular biology that are needed to comprehend papers in many areas of computational biology. Furthermore, the authors connect the biology to current (as of 2001) research in computational biology, including gene identification, microarrays, motif prediction, protein folding, and gene and protein annotation, to name a few. The reader comes away with an appreciation for the diverse research areas in computational biology and enough biological understanding to begin to understand the source of biological complexity in current areas of active research. The document contains numerous links to more in-depth coverage and to the Online Biology Book, which gives a textbook-level presentation of cellular and molecular biology. This is also an excellent resource for the aspiring (computational) biologist, but can only be considered an introduction in the way CLR is an introduction to algorithms. For me, this document was a review of basic biology. Its main value to me is as a future source of links to instructive pictures and cartoons for presentations. This would be an excellent site, however, for the computer scientist or mathematician with little or no understanding of biology, as each concept is presented with the assumption of no prior exposure.

'A Science Primer' from the National Center for Biotechnology Information,

The primer contains nine short chapters on the different types of information available through the Entrez site and some background on what the information means and how it was obtained. The primer is aimed at people with at least some science background and assumes a basic knowledge of biology, genetics, and proteins. I recommend this primer to people who have basic biology and computer science knowledge and would like a basic introduction to how these topics are used in biotechnology and modern genetics. My favorite section was on molecular modeling. I found the concept of trying to write programs to solve protein structures based on homology interesting.

This primer, offered through the National Center for Biotechnology Information (NCBI), introduces the reader to the following topics: Bioinformatics, Genome Mapping, Molecular Modeling, SNPs, ESTs, Microarray Technology, Molecular Genetics, Pharmacogenomics, and Phylogenetics. The primer is useful for both biologists and computer scientists since it provides a wealth of information (though more broad than deep) about both the biological and computational concepts underlying NCBI resources. Although the amount of information offered through NCBI remains overwhelming, the primer provides practical information about which database is useful for what. For example, it explains the subtle differences between BLAST (the Basic Local Alignment Search Tool), PSI-BLAST (Position-Specific Iterated BLAST), and PHI-BLAST (Pattern Hit Initiated BLAST). Together, the modules provide information that will help the reader use NCBI's databases more effectively.

Lawrence Hunter, Molecular Biology for Computer Scientists,

The article was a lot to digest, but it does seem to present a broad overview of the subject without going overboard with the details.

Also reviewed last year.

Larry Gonick, The Cartoon Guide To Genetics,

I found it surprisingly informative. It was very funny as well.

Also reviewed last year.

Genomics and Its Impact on Science and Society, DOE's Human Genome Program

The article is 12 pages long and includes nice color graphics and references to other websites for additional information. Of particular interest is the "Gene Gateway", which contains tutorials, web databases, and tools anyone can use. The document is available in several formats (PDF and HTML) and there are also PowerPoint slides available for download. Much of the article goes over material already presented in class, but it is still quite useful to read again for someone who had not been exposed to this material before. It was interesting to learn about the origins of the Genome Project (which I've always associated with the NIH) within the DOE. Other interesting information from this article:
* the average gene consists of 3000 bases, but sizes vary greatly, with the largest one having 2.4 million bases
* the functions of more than 50% of discovered genes are unknown
* the human genome sequence is almost (99.9%) exactly the same in all people --> upon reading this, I wondered how they knew this
* the human genome has a much greater portion (50%) of _repeat_ _sequences_ than the mustard weed (11%), the worm (7%), and the fly (3%)
--> perhaps because we have changed more over the course of our evolution than these other life forms?
* I think I already knew this, but it was interesting to read that the Y chromosome has the fewest genes (231, compared with chromosome 1, which has the most at 2968)
* researchers are studying how DNA variants correlate with individual responses to medical treatments; enzymes (proteins that catalyze chemical reactions) encoded by a particular multigene family (cytochrome P450) are responsible for metabolizing most drugs used today
* "gene therapy" (aka gene transfer) is a largely experimental field in which most current protocols are aimed at establishing the safety of gene-delivery processes rather than effectiveness

The article also briefly discusses the myriad ethical and policy issues related to the availability of genetic information. One page is entitled "Building a 'Systems Level' View of Life", which brought to mind a recent IEEE Spectrum article that discussed the human body and the aging process from a reliability theory point of view ("Why We Fall Apart", IEEE Spectrum, Sep '04).

I searched the web and found the following page very useful. It contains a set of tutorials for beginners in computational biology. As a graduate student in EE studying speech recognition with little background in biology, I found it very helpful.
In Chapter 1, the author gives an introduction to sequence analysis in computational biology: what the problems are and how to solve them. This chapter is very useful for readers like me.
In Chapter 2, the author shows how to take advantage of the internet to do research in computational biology. This chapter can be applied to all research areas.
Chapter 3 talks about the multiple alignment problem in detail.
Chapter 4 provides the math for molecular phylogenetics. Even though I have a good math background, I still found these pages useful for some concepts and methods.
In Chapter 5, the author presents genetic algorithms applied to proteins. I am familiar with genetic algorithms, but it is very interesting to see how to apply them to protein folding simulations.

Hunter, Molecular Biology for Computer Scientists,

It's been years since the last time I've seen some of this stuff, and so it's been a really good review. I think he takes a more biology-centered route, as he starts by explaining how organisms are put together, and drills down to the molecular level, including how gene expression works. The second major topic of the paper briefly explains the major techniques that have been/are being used to gather biological data. Personally, I think this summary does a good job of conveying how 'messy' bio is - he mentions that DNA is not only found in chromosomes, but also in mitochondria (which may have been separate organisms at some point), and plasmids (in bacteria). He also mentions that mitochondria are inherited (in humans) only maternally, since they're found in the cytoplasm of the egg; this suggests that not all the information we're concerned about will be located in chromosomal DNA. I think it's a good summary, in that it has a lot of information (it even goes into detail about various model organisms), but doesn't overwhelm the reader with too much information (the section on catalysts had a lot of useful information, but didn't go into activation energy).

Also reviewed last year.

Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics,

The Krane/Raymer book seemed to come from more of a "chemist's" point of view - there wasn't nearly as much discussion about how cells are put together, nor about the incredible diversity of stuff (organelles, exceptions to rules, etc.) in bio. It did have a section that gave an overview of how chemical bonds work, which was pretty cool. I felt that it also did a better job of explaining the various experimental techniques biologists use to get their data, although it listed fewer methods ("Imaging" was left out; the "DNA Sequencing" part wasn't as detailed). What I really liked about this was that it seemed to approach things from a more algorithmic point of view - things seemed cleaner, and better suited to developing procedures that deal with them.

Lawrence Hunter, Molecular Biology for Computer Scientists,

The article by Hunter is well written and will benefit computer scientists and engineers who are looking for a quick introduction to the relevant issues in computational biology. Hunter does a good job of explaining the fundamentals of cell biology, DNA replication, amino acids, proteins, and gene expression. I particularly liked how Hunter organizes the information so that the reader who is not familiar with the concepts is not overwhelmed by it. Those of us who are used to encountering mathematical expressions every 10 lines in everything we read will not be overwhelmed by the qualitative nature of the subject.

Also reviewed last year.

Michael Jordan, An introduction to bioinformatics,

Jordan's article is taken from Chapter 24 ("Bioinformatics") of his upcoming book, "Introduction to Graphical Models". This introduction is very terse and focuses almost exclusively on those topics of molecular biology that are relevant to computational techniques in gene sequencing, alignment, and phylogenetic analysis. It is very well written (as is the rest of Jordan's book), but a biologist will find the exposition very superficial. Indeed, the audience for this article consists of computer scientists in machine learning who want a *very* quick introduction to the underlying biology, and are anxious to start applying graphical models (hidden Markov models, etc.) to current problems in computational biology.

Gene Myers, Computational Biology for Computer Scientists,

The reading begins with a brief review of the structure and function of DNA and RNA. Using language and notation familiar to mathematical computer scientists, the author briefly describes transcription, translation and the genetic code before quickly moving on to provide a summary of basic tools which are currently available for working with DNA (polymerases, gel electrophoresis, and cloning systems). The chapter concludes with a motivational overview of several fundamental problems in computational biology: sequence comparison, approximate pattern matching, multiple sequence comparison, and RNA secondary structure prediction. Myers notes two themes which he projects will consistently appear for computer scientists who are developing algorithms for the field: 1) the need for approximate matching (due to both evolution and the possibility of experimental error at the biological level); 2) the potential for traditionally NP-hard problems to be solvable in a specific context. It is his first point that I found most noteworthy - previously I had assumed that most of the challenges in this area surrounded being able to mine massive data sets in a tractable way. Yet now it seems that we also have to be very concerned with the quality and regularity of the data. The target audience for this reading would have had a previous course in biology, as the author does not dedicate too much time to introducing cellular materials and processes. As a CS student who took an undergraduate course in biology several years ago, I did find this material appropriate for myself. It was not difficult to see the strong parallels between his problem definitions and those which have been more generally covered in an algorithms textbook.
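
To make the first theme concrete, here is a minimal sketch of approximate matching between two short DNA strings, using the textbook edit (Levenshtein) distance computed by dynamic programming. This is not taken from Myers's chapter; the function name `edit_distance`, the unit costs, and the example sequences are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b.

    Unit costs are an assumption made for illustration; real sequence
    comparison tools typically use biologically motivated scores.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two hypothetical reads that differ by one substitution.
    print(edit_distance("GATTACA", "GATTTCA"))
```

Two sequences would then count as an approximate match when their distance falls below some threshold, which is one simple way to tolerate both mutations and experimental error.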

Also reviewed last year.

NCBI National Center for Biotechnology Information, A Basic Introduction to the Science Underlying NCBI Resources - What is a Genome,

My primer basically covered structural aspects of the genome, such as nuclear DNA, organelle DNA (mitochondria), RNA, proteins, the differences between introns and exons, and the genetic code. Furthermore, it gives a short introduction to transcription and translation resulting in proteins, also covering structural genes, junk DNA, and regulatory sequences. Mechanisms of genetic variation and heredity are covered too (Mendel's Laws are also discussed). What I like about this article is that it covers all the fundamental biology that leads to this enormous amount of sequence data (DNA, RNA, protein sequences and even whole genomes), and thus underlies classical bioinformatics, i.e. computational molecular biology. On the other hand, it might have been useful to introduce some more basic computational ideas, given that the website was created by NCBI, which announces:
"Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease."
In addition, the following link can be considered as a very good primer for computer scientists that are interested in Biology. (I read those ones too, but I think that they are not as relevant for the course as the other one because they cover the basics on cell

Hunter, Lawrence, Molecular Biology for Computer Scientists,

As someone who has absolutely no formal biology training in university, I found Hunter's primer on molecular biology to be well-suited for my needs. It begins with a general overview of basic molecular biology, from cellular structure, to DNA, to proteins and their pathways. One of the things that I can definitely take away from this reading is the nomenclature used in biology. As someone who works in a research lab that includes both bioinformaticists and bench scientists, it's edifying to get a lot of the words I hear thrown around in the laboratory clearly defined and placed into a context, such as "clone", "residue", and "primary structure". The reading ends with some biological techniques used to tease out relevant genetic information. Interestingly enough, microarray technology isn't included in this primer (maybe it's a bit dated?).
Overall, I think this reading is highly relevant for a technically-oriented person without substantial biology training. One aspect that I think could have been added that might've given more perspective would be some basic bioinformatics tools/techniques used widely in the field (brief discussion of HMMs, for instance, or BLAST alignment). A small section on that, I think, would have given the reader an idea of how this biological information is applied to computational research.

Also reviewed last year.

I thought they did a fairly good job considering the brevity of the site; they do link to a few other pages, though. The first few sections are an overview of the fundamental concepts needed for the last 2 sections. It's probably a bit too short for someone with no biology background. The last sections cover the idea behind functional genomics and an introduction to microarrays, research areas that seem to be receiving more attention at EBI. They make sure to mention a few databases and programs they have that, while not directly relevant, did help me better understand what people are/were working on and the type of data they are looking for.

Also reviewed last year.

Lawrence Hunter, Introduction to molecular biology for computer scientists,

I found this text to be very informative yet quite readable. Despite the detail present, Hunter introduces terminology and concepts at a reasonable pace and provides many analogies that computer scientists may find useful.

Also reviewed last year.

Computer Applications in Molecular Biology,

This is a *very* brief, almost "elevator summary", of molecular biology with some mention of computer applications.

Alvis Brazma, Helen Parkinson, Thomas Schlitt, Mohammadreza Shojatalab, A quick introduction to elements of biology - cells, molecules, genes, functional genomics, microarrays,

It's a very good online article to "refresh" my very limited knowledge of molecular biology. I took an introductory course on molecular biology four years ago, and since my later research has had nothing to do with it, I have forgotten almost everything I learned. This article helped me pick up some basic ideas quickly, and it also introduces some computational aspects, i.e., DNA and protein databases and sequencing algorithms, which I didn't learn from that course. It also talks about microarray technology and data analysis in a simple but clear manner. I particularly like its many links to more detailed explanations and other online resources for some of the terms and jargon in it. In summary, it's a really easy to read and "quick" introduction, and I think it meets my needs very well.

Kapetanovic IM et al., Overview of Commonly Used Bioinformatics Methods and Their Applications, Annals of the New York Academy of Sciences. 1020: 10-21 (2004).

This article provides an excellent brief overview of bioinformatics methods. It highlights important definitions and key references for further details. This was a good article for someone like me who has had some previous exposure to these terms but needs accurate definitions and good references. It would need additional background and diagrams or figures for a novice reader. I especially enjoyed the section on fuzzy logic and had not realized how well this method can be applied to some data.
For some basic database info and a nice picture of a dendrogram, I would recommend: Debes JD and Urrutia R. Bioinformatics tools to understand human diseases. Surgery. 135(6): 579-585 (2004) found at the following

Lawrence Hunter, Molecular Biology for Computer Scientists,

In the paper, technical terms such as genotype, phenotype, mitosis and meiosis were clearly defined. Some contrasting terms, like evolution and mutation, were also discussed. An overview of the cell structure and the chemical structure of an amino acid was given. That helps me understand more about the physical and chemical properties of proteins, which is quite important for understanding experiments such as a gel electrophoresis run. In this paper and in class, the problem of parsing a segment of DNA has been discussed. Maybe a Hidden Markov Model or Dynamic Time Warping can be applied here. These two algorithms are popular in speech recognition.
This paper gave me a detailed tutorial on biology. That is good for people who do not have much biology background. I learned biology in high school, and that was long enough ago for me to have forgotten most of it. Hence, this paper is helpful for me. However, since there is not much material related to computational biology, I think that it is not that appropriate for people who already have a strong biology background but want to know more about how pattern recognition techniques can be applied.

Also reviewed last year.
