CSE 544 - Syllabus, Winter 2011

Goals and class format

Computing has become data-centered. Databases have been at the heart of commercial applications for decades, but today both commercial and scientific organizations need to store and process huge volumes of data, and this requires an extension of data management techniques far beyond traditional database systems. Virtually every area of Computer Science today uses data management concepts. The purpose of this course is to discuss the key concepts that underlie both traditional databases and modern data management. This is a graduate class: we discuss all the traditional topics and cover some of them in depth, and we will also discuss some foundational material and some novel research topics.

List of topics:

  • Traditional Concepts

    • SQL, Relational Calculus, Non-recursive datalog with negation, Relational Algebra.

    • Data models, data normalization, constraints, views.

    • Transactions: recovery (Aries), concurrency control.

  • Systems Aspects

    • Architecture of a DBMS, storage, indexing.

    • Query execution, query optimization.

    • Database statistics, size estimation.

    • Parallel databases.

  • Foundations of Data Management

    • Query languages and complexity classes.

    • Query containment, acyclic queries, semi-join reductions.

    • Datalog: least fixpoint semantics, magic set optimizations, extensions with negation, modern applications of datalog.

    • Data provenance: semiring provenance and applications.

    • Data privacy: differential privacy and adversarial privacy.

    • Probabilistic databases.

The class meets twice a week. Most classes will consist of lectures, but we will also have discussions. Some of the material covered in the lectures is compiled from multiple papers and books, so it is vital that you attend the lectures and read the notes carefully: it will be difficult to obtain the same information from other texts. Some lectures will have reading assignments: please read the papers before class and come prepared to discuss them. Class participation is part of your final grade.

There will be three homework assignments. The main goal of these assignments is to ensure that everyone in the class (a) knows how to use a database management system (DBMS), (b) gets a sense of how to build a DBMS, and (c) understands the theory that forms the foundation of data management.

Part of the class will be a programming project, which will be designed as a mini research project. You can find the instructions for the project on the project page of the class website. At the end of the project, you are expected to hand in a conference-style paper and give a presentation in class.

General advice

Read the papers before coming to lecture and start your project early. Plan to work on the project, readings, and assignments in parallel! There is not enough time between deadlines to work on them in sequence. We expect you to spend a significant amount of time each week working on this class.

Assumed Background

We assume that you have taken an undergraduate database class and that you at least vaguely remember it. If you have not taken a database class before, you should read the related undergraduate material in the textbook for each topic that we cover. You can also expect the workload in the class to be higher for you, but this should not discourage you from taking the class: in the past, students without prior database knowledge have completed it successfully.

Textbooks and lecture notes

We will post copies of the slides used in the lectures on the class website.

The following book is recommended for the class. Please make sure to get the third edition of this book.

Evaluation

The evaluation includes three assignments, a project, paper reviews, and class participation.

  • Paper reviews: 15%. To be handed in before each lecture. You may skip one review without any penalty. This is an individual assignment. Each student should submit an original review.

  • Assignments: 45%. There will be three assignments. The first one should be easy; it is an individual assignment. The second is more challenging, as it requires implementing parts of a simple DBMS; it is to be completed in groups of up to two students. The third assignment is theoretical: you will be asked to solve some theory problems related to query languages, views, etc.; it is an individual assignment. See below for the late-day policy.

  • Project: 30%. The project report will be worth 30% and the presentation will be worth 15%. To be completed in groups of up to three students.

  • Class participation: 10%

Late policy: Since this is a graduate class, we are more lenient about late days. We will accept valid excuses (conferences, paper deadlines, etc.) and will work with you to figure out the earliest day that you can hand in your work. Note, however, that the schedule for the entire course is very tight. Once you fall behind, it will be very hard to catch up. Also, we will not grade anything handed in after the deadline for the project reports.

Collaboration policy: You are encouraged to discuss the content of this course with anyone you like. The second assignment is to be completed in groups of up to two students; the first and third assignments are individual. Within the group of two, divide the work any way you see fit - of course some modes of collaboration lead to better learning than others. Please do NOT discuss your answers with other groups, although feel free to point each other to any relevant documentation. The project is to be done in a group of up to three students. Groups can talk to each other about their projects as much as they want. Of course, if two teams pick the same project, we expect each team to produce original work different from that of other teams. Paper reviews should be done individually without talking to others. Feel free to look up any information on the web that you may find useful in completing the assignments, projects, or paper reviews.

Topics and Schedule

Please see the class schedule posted on the class website.

Attendance

I hope you will attend every lecture. If you miss a lecture, talk to a friend who was present, and be sure to check the class website for messages.

Tools

The course website and mailing list will be used extensively to provide you with course information, such as the class schedule, lecture notes, homework assignments, class messages, and other things.

Computing Resources

Information about labs and computing resources, along with other useful pointers, is posted on the class website.