CSE599T: Topics in Probabilistic and Statistical Databases

Fridays, 9:30-12:20, CSE 503

Instructor: Dan Suciu


We will discuss concepts, algorithms, and systems used to process probabilistic data and to apply statistical techniques to data management. Applications include management of uncertain data, data anonymization, approximate query processing, and query size estimation. Topics include the probabilistic data model, several approaches to query evaluation, data lineage/provenance, the random graph data model, data sketches, and sampling techniques. The course will consist mostly of lectures and in-class discussions, with very few reading assignments.


Grading will be based on participation in the class discussions.

Mailing List:

Please subscribe to the cse599t mailing list.


1. Introduction

2. Representation of Probabilistic Databases

3. Representation and Query Evaluation

4. Dichotomy Theorem

5. Query Evaluation

6. Aspects of Query Processing

7. Probabilistic logic, Conditional logic

8. Implicit Probabilistic Data

9. Histograms and Sampling

10. Sampling and Review