CSE 344, Spring 2018

Intro to Data Management

Course Info

Course Information and Policies

Instructor: Evan McCarty, CSE 214
Lecture: Monday, Wednesday, Friday 9:30-10:20, MLR 301

Contact Info

Contact Information and Office Hours

Course Email List: You should be automatically subscribed to this list and will receive important email through it. For the most part, expect announcements to come through the course Piazza page.

Course Staff and Office Hours:
  Instructor: Evan McCarty: ejmcc@cs.washington.edu   Monday and Wednesday, 12:30-1:30 or by appointment; Room CSE 214

TAs:
  Sravan Konda: sravan75@uw
    Tuesday 3:30-4:20, CSE 007
  Ariel Lin: arielin@uw
    Friday 2:30-3:20, 2nd Floor Breakout, and
    Monday 2:00-2:50, CSE 007
  Matthew Liu: liux44@uw
    Wednesday 1:30-2:20, 4th Floor Breakout
  Michelle Prawiro: mp19@uw
    Monday 11:30-12:20, CSE 007, and
    Friday 11:30-12:20, 2nd Floor Breakout
  Jason Tan: jct96@uw
    Wednesday 4:30-5:20, CSE 220

Lectures

Lecture Materials

Lecture slides will be posted here by the end of the day of each lecture. Topics for the following lecture will be posted along with that day's slides, together with the relevant chapters from the Database Systems textbook by Garcia-Molina, Ullman and Widom (GUW). Readings that are required before lecture will be indicated in bold.
  1. March 26th: Course Introduction and Motivation   [ pptx | pdf ]
         No reading from GUW
  2. March 28th: The Data Model and introduction to relational databases   [ pptx | pdf ]
         GUW 2.1-2.2
  3. March 30th: Joins   [ pptx | pdf ]
         GUW 6.1-6.2
  4. April 2nd: Grouping and Aggregation   [ pptx | pdf ]
         GUW 6.4
  5. April 4th: Subqueries   [ pptx | pdf ]
         GUW 6.3
  6. April 6th: Relational Algebra   [ pptx | pdf ]
         GUW 2.4
  7. April 9th: Datalog   [ pptx | pdf ]
         GUW 5.3
  8. April 11th: Datalog   [ pptx | pdf ]
         GUW 5.4; Souffle Guide
  9. April 13th: Intro to Semi-structured data   [ pptx | pdf ]
         GUW 11.1
         Comparing relational to semi-structured and distributed databases
  10. April 16th: Data Management with Semi-structured data   [ pptx | pdf ]
         GUW 11.2-11.4; note that the textbook uses XML, not JSON
  11. April 18th: SQL++   [ pptx | pdf ]
         SQL++ Manual
  12. April 20th: Physical Plans   [ pptx | pdf ]
         GUW 15.1-15.2
  13. April 23rd: Indexing   [ pptx | pdf ]
         GUW 14.1-14.3, 15.6
  14. April 25th: Disk Accesses   [ pptx | pdf ]
         GUW 15.2-15.3
  15. April 27th: Plan Cost Estimation   [ pptx | pdf ]
         GUW 15.2-15.3
  16. April 30th: Intro to Parallel Databases   [ pptx | pdf ]
         GUW 13.3, 20.1, 20.3
  17. May 2nd: Map/Reduce   [ pptx | pdf ]
         GUW 20.2
  18. May 4th: Map/Reduce II   [ pptx | pdf ]
         GUW 20.2
  19. May 7th: Exam Review   [ pptx | pdf ]
         Practice Midterm. Solutions.
  20. May 9th: Midterm Exam   [ No Slides ]
  21. May 11th: Entity Relations   [ pptx | pdf ]
         GUW 4.1-4.3
  22. May 14th: E/R constraints   [ pptx | pdf ]
         GUW 4.3-4.6
  23. May 16th: Normalization   [ pptx | pdf ]
         GUW 3.1-3.3
  24. May 18th: Lossless Decomposition and SQL Views   [ pptx | pdf ]
         GUW 3.4-3.5
  25. May 21st: Transactions   [ pptx | pdf ]
         GUW 18.1-18.3
  26. May 23rd: Scheduling   [ pptx | pdf ]
         GUW 18.3-18.5
  27. May 25th: Isolation   [ pptx | pdf ]
         GUW 18.3-18.5
  28. May 30th: Analysis and Ethics [Not on Final Exam]   [ pptx | pdf ]
         Bad Data Science: Debt and Growth; NYPD CompStat
  29. June 1st: Review   [ pptx | pdf ]
         Practice Final. Solutions.
Sections

Section material distributed to TAs will be made available here. Solutions to the problems posted here must be obtained in section from your TA.
Sections (All times on Thursdays):
AA: Matthew Liu - 8:30 MGH 238
AB: Matthew Liu - 9:30 MGH 242
AC: Sravan Konda - 12:30 MGH 228
AD: Jason Tan - 1:30 DEN 212

TA-led sections will be held weekly on Thursdays. You are expected to attend the section you registered for. Sections will be very helpful for review, hands-on practice with the material, and hints on your homework. Please bring your laptop to section so that you can follow along with the examples provided.

  1. Section 1: Setting up Git and SQLite   Help with setup
  2. Section 2: Basic SQL   Slides   Worksheet   Solution
  3. Section 3: Relational Algebra   Slides   Worksheet   Solution
  4. Section 4: Datalog   Slides   Worksheet   Solution
  5. Section 5: SQL++   Slides   Worksheet   Solution
  6. Section 6: Cost Estimation + Parallel   Slides   Worksheet   Solution   Cost Estimation Guide
  7. Section 7: Map/Reduce + Spark   Slides   Worksheet   Solution   Extra Questions   Solution
  8. Section 8: Design Theory   Slides   Worksheet   Solution
  9. Section 9: Transactions   Slides   Worksheet   Solution
Homeworks

Homework Assignments

Turn in your assignments through the Canvas course page. In general, homework will be posted on Wednesdays and due the following Wednesday, at 11:30 for coding assignments and 11:00 for the online quizzes. Use git pull upstream master to get the new assignments.

Coding Assignments: 30% of your grade

Written Quizzes: 10% of your grade

Exams

The midterm for this course will be Wednesday, May 9th, from 9:30-10:20 in MLR 301 and will be 25% of your grade.
Here is the practice midterm. Here are solutions. Also, here is a collection of previous 344 exams.

The final for this course will be Wednesday, June 6th, from 8:30-10:20 in MLR 301 and will be 35% of your grade.
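As a rough illustration of how the four stated weights (coding assignments 30%, written quizzes 10%, midterm 25%, final 35%) add up, here is a minimal Python sketch. It assumes the components are combined as a plain weighted average of percentage scores, which this page does not actually state; the function name and the example scores are hypothetical, and any curve or rounding policy is not modeled.

```python
# Weights in percent, as listed on this page.
WEIGHTS = {
    "coding_assignments": 30,
    "written_quizzes": 10,
    "midterm": 25,
    "final": 35,
}

# Sanity check: the four components cover the whole grade.
assert sum(WEIGHTS.values()) == 100


def weighted_grade(scores):
    """Combine per-component percentage scores (0-100) into one percentage.

    Assumption: a plain weighted average. The real grading scheme may differ.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "coding_assignments": 90,
    "written_quizzes": 80,
    "midterm": 70,
    "final": 85,
}

print(weighted_grade(example_scores))
```

Under this assumption, the final exam alone carries more weight than all coding assignments combined, so a change of 10 points on the final moves the overall percentage by 3.5 points.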

The textbook is Database Systems: The Complete Book by Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom, 2nd edition.


Acknowledgments: Many of the materials posted here and used in the course have been shared and refined by other instructors and TAs in previous offerings of CSE 344. This version of the course is based in particular on previous offerings by Profs. Cheung and Suciu.