Database Group Meeting, Autumn '09


Overview

This quarter, the Database group meeting will be used mostly to present current research from the UW Database group, with a few presenters from outside UW.

Meetings will be held in the Database Lab (CSE 605) unless specified otherwise. We meet from 3pm to 4pm.

The group meetings sponsored by Yahoo! as part of the Yahoo! Database Talk Series are labeled with.

Mailing List

You can sign up for the mailing list here. Send mail to that list at uw-db at cs.

Schedule

October 14, 2009	Abhay
October 21, 2009	- Cancelled -
October 28, 2009	Emad
November 6, 2009	Nilesh Dalvi
November 11, 2009	Veterans Day
November 18, 2009	Wolfgang
November 25, 2009	Philip A. Bernstein, MSR
December 2, 2009	Prasang
December 9, 2009	544 Presentations (starts at 1:30pm)

Details

Week 1: Bridging the Gap Between Intensional and Extensional Query Evaluation in Probabilistic Databases

Presenter: Abhay Kumar Jha

Abstract: There are two broad approaches to query evaluation over probabilistic databases. 1) Intensional methods manipulate expressions over symbolic events associated with uncertain tuples. This approach is very general and applies to any query, but it requires an expensive post-processing phase that involves general-purpose probabilistic inference. 2) Extensional methods, on the other hand, evaluate the query by translating operations over symbolic events into a query plan; they scale well, but they are restricted to safe queries. In this paper, we bridge this gap by proposing an approach that translates the evaluation of any query into extensional operators, followed by post-processing that requires probabilistic inference. Our approach uses characteristics of the data to adapt smoothly between the two evaluation strategies. If the query is safe, or becomes safe because of the data instance, the evaluation is completely extensional and happens inside the database. If the query/data combination departs from the ideal setting of a safe query, some intensional processing is performed, whose complexity depends only on the distance from that ideal setting.
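To make the contrast between the two baseline approaches concrete, here is a minimal Python sketch. It is not the method proposed in the talk, only an illustration of the two strategies it bridges, on a hypothetical tuple-independent database with relations R(x) and S(x, y) and the safe query q = ∃x∃y R(x) ∧ S(x, y). The extensional function evaluates q with independent-join and independent-project operators, as a query plan would; the intensional function builds the lineage of q as a DNF over tuple variables and computes its probability by enumerating possible worlds, standing in for general-purpose inference. The relation contents and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical tuple-independent probabilistic database:
# each tuple is present independently with the given probability.
R = {"a1": 0.5, "a2": 0.8}                       # R(x)
S = {("a1", "b1"): 0.6, ("a1", "b2"): 0.5,       # S(x, y)
     ("a2", "b1"): 0.9}

# Query q = EXISTS x, y: R(x) AND S(x, y)   (a safe query)


def indep_project(probs):
    """P(at least one of several independent events) = 1 - prod(1 - p)."""
    none_true = 1.0
    for p in probs:
        none_true *= 1.0 - p
    return 1.0 - none_true


def extensional(R, S):
    """Evaluate q with a plan of independent joins and independent projections."""
    per_x = []
    for x, p_r in R.items():
        # Project S(x, y) onto x: does any S-tuple exist for this x?
        p_s = indep_project(p for (sx, _), p in S.items() if sx == x)
        # Independent join: R(x) and the S-tuples for x are independent events.
        per_x.append(p_r * p_s)
    # Project away x: the per-x events involve disjoint tuples, so they are independent.
    return indep_project(per_x)


def intensional(R, S):
    """Build the lineage of q as a DNF over tuple variables, then compute its
    probability by brute-force enumeration of possible worlds (general-purpose
    inference, exponential in the number of tuples)."""
    variables = [("R", x) for x in R] + [("S", k) for k in S]
    prob = {("R", x): p for x, p in R.items()}
    prob.update({("S", k): p for k, p in S.items()})
    # One conjunctive clause R(x) AND S(x, y) per joining pair of tuples.
    lineage = [(("R", x), ("S", (x, y))) for (x, y) in S if x in R]
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        present = dict(zip(variables, world))
        weight = 1.0
        for v, is_in in present.items():
            weight *= prob[v] if is_in else 1.0 - prob[v]
        if any(all(present[v] for v in clause) for clause in lineage):
            total += weight
    return total


print(round(extensional(R, S), 4), round(intensional(R, S), 4))
```

Both functions return 0.832 on this data. Because q is safe, the plan gives the exact answer in time linear in the data, while the enumeration grows exponentially with the number of tuples. For an unsafe query such as ∃x∃y R(x) ∧ S(x, y) ∧ T(y), the independence assumptions behind the plan no longer hold, and only the intensional route stays exact.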

Week 3: SciDB

Presenter: Emad Soroush

Abstract: Demo.

Week 5: Large-scale Information Extraction from the Web

Presenter: Nilesh Dalvi

Abstract: A significant portion of web pages embeds interesting and valuable semantic content suitable for structured representation. Traditional information extraction techniques, however, fall short of high-quality extraction at Web scale. In this talk, I will outline some of the work at Yahoo! Research on the challenges of information extraction at Web scale. I will focus on wrapper-based techniques, which exploit the HTML structure of websites to extract the information of interest, and I will address two problems: (i) making wrappers more robust to changes in websites, and (ii) learning wrappers from automatically obtained, noisy training data.
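For readers new to wrappers, the following minimal Python sketch shows what a wrapper is and why robustness to site changes matters. It is not one of the Yahoo! Research techniques from the talk. It assumes pages are well-formed XHTML so that the standard library's `xml.etree.ElementTree` can parse them, and it treats a wrapper simply as a path expression that locates the field of interest. The page contents, the two wrapper paths, and the `extract` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same product page (well-formed XHTML assumed).
PAGE_V1 = """<html><body>
  <div><h1>Widget</h1></div>
  <div><span class="price">$10</span></div>
</body></html>"""

# After a site redesign, a banner div is inserted before the content.
PAGE_V2 = """<html><body>
  <div>Sale today!</div>
  <div><h1>Widget</h1></div>
  <div><span class="price">$10</span></div>
</body></html>"""

# Wrapper A: an absolute positional path (the span inside the 2nd div of body).
POSITIONAL = "./body/div[2]/span"
# Wrapper B: anchored on an attribute of the target node instead of its position.
ATTRIBUTE = ".//span[@class='price']"


def extract(page, path):
    """Apply a wrapper (an ElementTree path expression) and return the texts found."""
    root = ET.fromstring(page)
    return [el.text for el in root.findall(path)]


for name, path in [("positional", POSITIONAL), ("attribute", ATTRIBUTE)]:
    print(name, extract(PAGE_V1, path), extract(PAGE_V2, path))
```

Both wrappers extract `$10` from the first version, but the positional wrapper silently returns nothing once the banner shifts the page structure, while the attribute-anchored one keeps working. Choosing which features of a page a wrapper should depend on is one way to frame problem (i) above; this toy comparison does not capture how such wrappers are learned from noisy data.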

Week 7: TBA

Presenter: Wolfgang Gatterbauer

Abstract: TBA

Week 8: Hyder: A Transactional Indexed Record Manager for Shared Flash Storage

Presenter: Philip A. Bernstein, Microsoft Research and Affiliate Professor at CSE, University of Washington

Abstract: The availability of large flash storage chips and cheap high-speed network switches makes possible an enormous increase in the I/O rate to shared storage. Hyder is a research project to develop a new transactional indexed-record manager based on these technologies. It is a data-sharing system in which all compute servers have direct access to shared flash storage and have no direct-attached disks. Its main feature is that it scales out without partitioning the database or the application. It is therefore well suited to data center environments, where scale-out is especially important and where specialized flash hardware and networking can be cost-effective. The software architecture that makes this possible is radically different from that of classical transactional record managers: it uses log-structured record storage, sliding-window RAID, binary search trees, and optimistic concurrency control. There is no locking, no ARIES-style logging, and there are no B-trees. After a brief discussion of motivation, I will spend most of the talk describing the architecture. This is joint work with Colin Reid, also at Microsoft.
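As a rough illustration of how binary search trees and optimistic concurrency control can replace locking, here is a minimal single-process Python sketch. It is not Hyder's design: there is no flash, no RAID, no shared log across servers, and the commit step is ordinary read-set validation rather than whatever algorithm Hyder actually uses. It assumes an immutable (copy-on-write) binary search tree, so each commit produces a new root while old snapshots stay readable, and a transaction aborts at commit time if any key it read was overwritten after its snapshot. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable BST node; an update copies only the root-to-leaf path."""
    key: str
    value: str
    version: int                 # commit number that last wrote this key
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None


def insert(node, key, value, version):
    """Return a new root; untouched subtrees are shared with the old tree."""
    if node is None:
        return Node(key, value, version)
    if key == node.key:
        return Node(key, value, version, node.left, node.right)
    if key < node.key:
        return Node(node.key, node.value, node.version,
                    insert(node.left, key, value, version), node.right)
    return Node(node.key, node.value, node.version,
                node.left, insert(node.right, key, value, version))


class Store:
    """Stand-in for an append-only log: each commit appends a new root."""
    def __init__(self):
        self.roots = [None]      # roots[i] = database state after commit i

    def begin(self):
        return Transaction(self, len(self.roots) - 1)


class Transaction:
    def __init__(self, store, snapshot):
        self.store, self.snapshot = store, snapshot
        self.reads, self.writes = set(), {}

    def read(self, key):
        self.reads.add(key)
        if key in self.writes:
            return self.writes[key]
        node = lookup(self.store.roots[self.snapshot], key)
        return node.value if node else None

    def write(self, key, value):
        self.writes[key] = value  # buffered locally; no locks are taken

    def commit(self):
        """Optimistic validation: abort if any key we read was written by a
        transaction that committed after our snapshot."""
        latest = self.store.roots[-1]
        for key in self.reads:
            node = lookup(latest, key)
            if node is not None and node.version > self.snapshot:
                return False
        version = len(self.store.roots)
        root = latest
        for key, value in self.writes.items():
            root = insert(root, key, value, version)
        self.store.roots.append(root)
        return True
```

If two transactions start from the same snapshot, both read `x`, and both write it, the first to commit succeeds and the second fails validation and must retry. Transactions that touch disjoint keys both commit without ever waiting on each other, and readers of an older root are never blocked by writers.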

Students yet to present this academic year

~~Abhay, Emad~~, Wolfgang, Prasang, Nodira, Kate, Julie, Vibhor, YongChul, Kristi, Marianne, Yingyi, Alexandra