Papers, etc. |
Note on Electronic Access to Journals
Links to full papers below are often to journals that require a
paid subscription. The UW Library is generally a paid
subscriber, and you can freely access these articles if you do
so from an on-campus computer. For off-campus access,
follow the "[offcampus]" links below or
look at the
library "proxy server" instructions.
You will be prompted for your UW net ID and password once per
session.
03/27: ---- Organizational Meeting ----
04/03: Estimating significance of whole genome multiple alignments -- Amol Prakash, CSE
| Abstract:
Whole genome alignments are being widely used by biologists for
multiple purposes related to comparative genomics. Before doing
any such analysis on a particular portion of the alignment, it
is critical to have confidence that that portion is correctly
aligned. Unfortunately, no method to estimate the significance
of these whole genome alignments has been suggested to date. In
this work we provide a methodology to do so and assess
significance of the 8-vertebrate MultiZ alignment present on the
UCSC Genome Browser. We report approximately 0.75% (1.57Mbp) of
the human chromosome 1 alignment as having a high p-value. This
number increases to 14% if we consider only the alignments
containing zebrafish. The results for chromosome 1 are
available as a UCSC browser track at
http://bio.cs.washington.edu/SigMA-w/. This is also the first
tool that can compute the significance of every portion of an
alignment, and not just the entire alignment.
|
04/17: Array CGH data analysis - theory and practice -- Amir Ben-Dor, Agilent
| Abstract:
Cancer typically arises as a result of an acquired genomic instability
and subsequent clonal evolution of neoplastic cells. Consequently,
cancer cells contain multiple regions of copy number gains and losses
throughout their genomes. The patterns of copy number aberrations
present in a cancer genome consist of selected and non-selected lesions
and vary within and across different tissues of origin. For example, loss
of CDKN2A (9p21) is frequent in melanomas and lung carcinomas, SMAD4
(18q21) deletions are present in colon cancers, and HER2/NEU (17q12)
amplification is often seen in breast carcinomas. Recent technology
developments introduced an oligonucleotide array platform for array
based comparative genomic hybridization (aCGH) analyses. This platform
provides increased resolution in determining the boundaries of measured
genome alterations.
In the talk I will review the biological background of cancer genome
instabilities, and the measurement technologies, in particular array
CGH, and discuss in detail the data analysis goals, tasks, and solutions. I
will briefly describe the data analysis workflow of a multi-sample array
CGH cancer study, and provide details on the more complicated steps of
the analysis: aberration calling, data centering, detecting common
aberrations and, if time permits, joint analysis of CGH and gene
expression data. To demonstrate the workflow, I'll share some examples
from an ongoing collaboration with John Weinstein's group (NCI), analyzing
array CGH data for the NCI-60 drug discovery panel of cell lines. The
NCI-60 panel includes multiple highly annotated, well characterized
samples from nine different tissue types and thus represents a valuable
resource for studying the patterns of genomic lesions that may be
present in human cancers.
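
As a rough illustration of two of the steps named above, data centering and aberration calling, here is a minimal Python sketch. It is not the speaker's method: it assumes per-probe log2 ratios, centers them on the sample median, and calls gains and losses with a fixed cutoff. The function names and the 0.3 threshold are hypothetical choices for illustration; real aCGH pipelines typically segment the genome and use statistical tests instead of a per-probe cutoff.

```python
from statistics import median


def center_log_ratios(log_ratios):
    """Shift log2 ratios so the sample median sits at zero (assumed centering rule)."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]


def call_aberrations(centered, threshold=0.3):
    """Label each probe as gain, loss, or neutral using a hypothetical fixed cutoff."""
    calls = []
    for x in centered:
        if x > threshold:
            calls.append("gain")
        elif x < -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy data: most probes near 0.1, one amplified probe, one deleted probe.
ratios = [0.1, 0.1, 0.9, 0.1, -0.7]
calls = call_aberrations(center_log_ratios(ratios))
print(calls)
```

Centering on the median assumes that most of the genome is copy-neutral in the sample, which may not hold for highly aberrant tumors.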
|
05/01: Discovery of higher-order functional features in the human genome -- Bob Thurman, GS
| Abstract:
It has long been hypothesized that the human and other large
genomes are organized into higher-order (i.e., greater than
gene-sized) functional domains. Recent technological advances
have enabled the rapid emergence of large-scale biological data
sets comprising specific functional variables (e.g.,
transcription, histone modifications, etc.) sampled in a nearly
continuous fashion across the genome. A major outstanding
question is to what degree such data reveal coherent
higher-order features that may in turn illuminate the underlying
functional architecture of the genome. To address this, we
developed novel approaches based on wavelet analysis for
discovery of "domain-level" behavior in fine scale functional
genomic data, and for correlating apparently disparate
functional data types collected at different resolutions and
scales. Wavelets represent a powerful mathematical framework
for decomposing a given genomic data type into increasingly
coarse scales, allowing broader and broader trends in the data
to reveal themselves. We apply this approach to a variety of
continuously sampled data types from the NHGRI ENCODE project to
visualize distinct higher order features of the human genomic
landscape. We then apply Hidden Markov Models (HMMs) to the
wavelet decomposition to provide segmentations of the ENCODE
regions into discrete functional states or domains. We also
correlate multiple continuous data types at multiple scales to
uncover important similarities and differences, a major feature
of which is that such relationships (e.g., the correspondence
between transcription, histone modification patterns, and
fine-scale evolutionary conservation) are often highly localized
in nature, disappearing and reappearing again from region to
region and locus to locus. The results highlight an analytical
framework which may be applied broadly to other complex genomes.
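
The idea of decomposing a signal into increasingly coarse scales can be sketched with the simplest wavelet, the Haar wavelet. The Python below is only an illustration under stated assumptions, not the analysis used in the talk: it uses an unnormalized (averaging) Haar step, assumes the signal length is a power of two, and the function names are hypothetical.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (coarse trend) and half-differences (detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multiscale(signal, levels):
    """Return the coarse approximation at each successive scale."""
    scales = []
    current = list(signal)
    for _ in range(levels):
        current, _detail = haar_step(current)
        scales.append(current)
    return scales


# Toy "genomic signal" sampled at 8 positions.
signal = [2, 4, 6, 8, 10, 12, 14, 16]
for level, approx in enumerate(haar_multiscale(signal, 3), start=1):
    print(level, approx)
# prints:
# 1 [3.0, 7.0, 11.0, 15.0]
# 2 [5.0, 13.0]
# 3 [9.0]
```

Each level halves the resolution, so fine-scale fluctuations are averaged away and only broader trends remain at the higher levels.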
|
05/08: Computational exploration of biological organization with the Bioverse -- Jason Mcdermott, Microbiology
05/15: Protein Structure Prediction: an alternative model -- Charles Mader, Microbiology
| Abstract:
I will present an introduction to the protein structure
prediction problem in computational biology.
The Poisson-Boltzmann equation is commonly used to
predict electrostatic interactions in protein structure
prediction. I present a critique of the Poisson-Boltzmann approach to
protein electrostatics. Based on this critique I
derive the RB equation. The RB equation
provides a way to parameterize an energy function such that
the native conformation is the minimum energy. I show how
to use the p-space ellipsoid to determine the resolution of
this model, and describe how the volume of the p-space
ellipsoid can be used to evaluate second order corrections
to the model.
Charles provides Additional Information:
|
05/22: CMfinder: A Covariance Model Based RNA Motif Finding Algorithm -- Zizhen Yao, CSE
| Abstract:
The recent discoveries of large numbers of non-coding RNAs
create a need for tools for automatic, high quality
identification and characterization of conserved RNA motifs that
can be readily used for database search. Previous tools fall
short of this goal. CMfinder is a new tool for RNA motif
prediction. It is an expectation maximization algorithm using
covariance models for motif description, carefully crafted
heuristics for effective motif search, and a novel Bayesian
framework for structure prediction combining folding energy and
sequence covariation. When tested on known ncRNA families,
including some difficult cases with poor sequence conservation
and large indels, our method demonstrates excellent average
per-base-pair accuracy: 79% compared with at most 60% for
alternative methods.
In this talk, I will discuss the algorithmic issues in CMfinder,
and a systematic framework for discovering ncRNAs at genomic
scale. In a continuing collaboration with biologists, we have
identified several dozen promising candidates in different
bacterial clades, with one experimentally validated novel
riboswitch, and a few others under close investigation.
|
|