CSED 516: Reading Assignments Schedule, Fall 2021

CSED 516: Reading Assignments Schedule, Fall 2022

Please do not repost or otherwise distribute the materials available from this website. Some material is freely available on the web, some is behind a paywall, and some is private: we only have permission to use it in class, not to distribute it.

All reviews are due before the beginning of the lecture. There are no late days for paper reviews.

October 11. Review 1

Submit your review on Canvas

What goes around
- Read sections 1-5 and 10. The other sections are not recommended and we will not discuss them in class.
A Case Against SQL

Some suggested topics for discussion in your review:

What is physical and logical data independence?
Briefly compare data independence in IMS, CODASYL, and the relational model.
Speculate on what led to the decline of IMS / CODASYL and the rise of the relational model.
Briefly explain three peculiar behaviors of SQL.

October 18. Review 2

Submit your review on Canvas

How good are they?

For those interested in additional information: this video describes the optimizer of SQL Server, which some consider to be the best in industry.

October 26. Review 3

Submit your review on Canvas

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04.
- Read only sections 1, 2, and 3
D. DeWitt and M. Stonebraker. MapReduce – a major step backward. In Database Column (Blog), 2008.
Ashish Thusoo et al. Hive - a petabyte scale data warehouse using Hadoop. ICDE 2010: 996-1005.
- Read sections 1, 2, and skim through section 4 (focus on the optimizations)

November 1. Review 4

Submit your review on Canvas

M. Zaharia et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
McSherry et al. Scalability! But at what COST?

November 8. Review 5

Submit your review on Canvas

Anurag Gupta et al. Amazon Redshift and the Case for Simpler Data Warehouses. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15).
- Skim over the paper (it's very high level)
- Suggested discussion points: What does a Redshift cluster consist of? What types of data partitioning does Redshift support? (We will discuss these in detail in a later lecture, but it's easy to imagine what they do.) What was the key metric for the Redshift design team? How long (seconds or minutes) did it typically take to launch a Redshift cluster?
Dageville et al, The Snowflake Elastic Data Warehouse. SIGMOD Conference 2016: 215-226.
- Read sections 1, 2, and 3, skim over section 4, and read section 6
- Suggested discussion topics:
  - What is elasticity, why is it important, and how is it supported in Snowflake?
  - How is data storage handled in Snowflake, and why? What would have been the alternatives?
  - How are worker failures handled in Snowflake? How does this compare to MapReduce?
  - How does Snowflake handle semi-structured data?

November 15. Review 6

Submit your review on Canvas

The Design and Implementation of Modern Column-Oriented Database Systems.
- Read sections 1 and 2; skim over section 3
- Read sections 4.1, 4.4, and 4.5

Suggested discussion points:

What are the differences between column-oriented and row-oriented data stores?
Discuss at least one technique from Section 4.
What are column stores good for?