Overview

This quarter the Database group meeting will be used mostly for presenting current research and for hosting speakers from outside UW CSE.

Meetings will be held in the Database Lab (CSE 605) unless otherwise specified.

The group meeting is sponsored by Yahoo! as part of the Yahoo! Database Talk Series.

Mailing List

You can sign up for the mailing list here. To post to the list, send mail to uw-db at cs.

Schedule

Date	Time	Presenter	Title
Mon, Apr 6	3:00pm	Brian Cooper (Yahoo! Research)	PNUTS: Yahoo!'s Massive Scale Data Platform
Wed, Apr 15	2:30pm	Kristi Morton
Wed, Apr 22	2:30pm	Wolfgang Gatterbauer
Wed, Apr 29	2:30pm
Wed, May 6	2:30pm
Wed, May 13	2:30pm	Abhay Jha
Wed, May 20	2:30pm
Wed, May 27	2:30pm	Evan
Wed, Jun 3	2:30pm	Prasang
Wed, Jun 10	2:30pm	Marianne
Wed, Jun 17	2:30pm	Nodira

Details

PNUTS: Yahoo!'s Massive Scale Data Platform

Abstract:

I'll describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. When we set out to design PNUTS, our goal was to build a database system that could scale to thousands of servers but still provide useful DBMS features such as indexes, transactions, query optimization, and views. Of course, to reach that scale you have to give up some of the richness of those features, and I'll talk about the tradeoffs we have faced and the decisions we've made. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests (including updates and queries), and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and it uses automated load balancing and failover to reduce operational complexity. The first version of the system is currently in production. I'll describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results. I'll also discuss our experiences building a real production system out of research ideas, and how building a system that actually had to work in production changed our vision of it and our research approach.

Short Bio:

Brian Cooper is a research scientist at Yahoo! Research. Before that he was an assistant professor at Georgia Tech, and before that a PhD student at Stanford. His interests are in building distributed systems, in particular systems that perform database-style management and processing of data. At Yahoo! he works on building very large distributed data storage and processing systems. In previous lives he has worked on self-adaptive peer-to-peer systems, distributed streaming event processing, reliable distributed archival data storage, and XML indexing.

Speaker schedule:

http://reserve.cs.washington.edu/visitor/week.php?year=2009&month=04&day=06&area=5&room=1385