Instructors: Magdalena Balazinska and Bill Howe
Meeting times: Fridays 1:30pm-4:20pm (ending just in time for TGIF!)
Location: CSE 405.
Class mailing list: https://mailman.cs.washington.edu/mailman/listinfo/cse599c
Scientists today face an avalanche of data. Oceanographers generate terabytes with daily forecasts of temperature, elevation, and velocity. Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions among all 20,000 to 80,000 protein-encoding genes, not to mention the interactions among the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Add non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today. What do these applications have in common, and why are traditional data management tools inadequate? In this course, we will investigate these questions from the perspective of modern database research. We will look at what scientific datasets in different domains have in common and what sets them apart. We will survey the literature in this area and explore tools used in practice.
Approximately two papers will be assigned for each class. Please read the papers and come prepared to discuss them.
The course grade will be based on participation.
The course calendar is still preliminary and subject to change.
Date | Topic and readings | Discussion
---|---|---
April 2 | Topic: Data deluge in science and its implications. Guest talks: Readings: All the papers below are very quick reads, except the last one, which is a bit longer. | Open
April 9 | Topic: Science in the cloud (part 1). Instead of a normal lecture, we will attend the Cloud Futures 2010 workshop! Readings: None assigned. | None
April 16 | Topic: Science in the cloud (part 2). Lecture notes: lecture3.pdf. Readings: | Positive: A
April 23 | Topic: Data intensive analytics. Lecture notes: lecture4.pdf. Readings: | Positive: B; Negative: C; Break: A
April 30 | Topic: New data types (arrays, meshes, and other). Lecture notes: Readings: | Positive: C; Negative: A; Break: B
May 7 | Topic: RDF and ontologies. Lecture notes: lecture6.pdf. Guest talk by David Jones, Department Head, Environmental & Information Systems (EIS), UW Applied Physics Lab, on data management challenges in the ocean sciences. Note that there is a third paper to read this week on the NANOOS Visualization System, which David will present. Readings: Other readings (not required): | Positive: D; Negative: E; Break: F
May 14 | Topic: Query composition and language bindings. Lecture notes: Readings: | Positive: E; Negative: F; Break: D
May 21 | Topic: Scientific workflows and mashups. Readings: | Positive: F; Negative: D; Break: E
May 28 | Topics: Lecture notes: Readings: | Positive: G; Negative: H
June 3 | Topic: Visualization. Lecture notes: Readings: off this week for SIGMOD | Positive: H; Negative: G