CSED 516: Reading Assignments Schedule, Fall 2021

Please do not repost or otherwise distribute the materials available from this website. Some of the material is freely available on the web, some is behind a paywall, and some is private: we have permission to use it in class, but not to distribute it.

All reviews are due before the beginning of the lecture. There are no late days for paper reviews.

October 12. Review 1

Submit your review here.
  1. What goes around. Read sections 1-5 and 10. The other sections are not recommended and we will not discuss them in class.
  2. A Case Against SQL
Some suggested topics for discussion in your review:

October 19. Review 2

Submit your review here.
  1. How good are they?

This is a very good paper; I recommend reading it entirely. We will discuss several aspects of this paper in class.

For those interested in additional information: this video describes the optimizer of SQL Server, which some consider to be the best in industry.

October 26. Review 3

Submit your review here.
  1. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04.
    • Read only sections 1, 2, 3
  2. D. DeWitt and M. Stonebraker. MapReduce – a major step backwards. In Database Column (Blog), 2008.
  3. Ashish Thusoo et al: Hive - a petabyte scale data warehouse using Hadoop. ICDE 2010: 996-1005.
    • Read sections 1, 2, and skim through section 4 (focus on the optimizations)
Suggested discussion topics:

November 2. Review 4

Submit your review here.
  1. M. Zaharia et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012
  2. McSherry et al. Scalability! But at what COST?
Suggested discussion topics:

November 9. Review 5

Submit your review here.
  1. Anurag Gupta et al. Amazon Redshift and the Case for Simpler Data Warehouses. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15).
    • Skim over the paper (it's very high level)
    • Suggested discussion points: What does a Redshift cluster consist of? What types of data partitioning does Redshift support? (We will discuss these in detail in a later lecture, but it's easy to imagine what they do.) What was the key metric for the Redshift design team? How long (seconds or minutes) did it typically take to launch a Redshift cluster?
  2. Dageville et al, The Snowflake Elastic Data Warehouse. SIGMOD Conference 2016: 215-226.
    • Read sections 1, 2, 3, skim over section 4, and read section 6
    • Suggested discussion topics:
      • What is elasticity, why is it important, and how is it supported in Snowflake?
      • How is data storage handled in Snowflake, and why? What would have been the alternatives?
      • How are worker failures handled in Snowflake? How does this compare to MapReduce?
      • How does Snowflake handle semi-structured data?

November 16. Review 6

Submit your review here.
  1. BigQuery (dremel)
This paper has lots of important information; please read it carefully. Some useful things to know, and some suggested discussion topics:

November 23. Review 7

Submit your review here.
  1. The Design and Implementation of Modern Column-Oriented Database Systems.
    • Read sections 1 and 2, skim over section 3
    • Read sections 4.1, 4.4, 4.5
Suggested discussion points: