CSE590Q: Database Seminar (Fall 2014)

Monday, 3:30-4:20, CSE 405

In this seminar we will read papers about hybrid big data management systems. These systems combine two or more existing systems, either to get the best of both worlds or to offer the union of the underlying systems' capabilities.

  Date Paper(s) Link Slides Presenter

General motivation for using different types of systems

Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE '05). IEEE Computer Society, Washington, DC, USA, 2-11. DOI=10.1109/ICDE.2005.1 http://dx.doi.org/10.1109/ICDE.2005.1

1 10/6/2014

Workflow systems developed in industry to combine various engines together

The "Big Data" Ecosystem at LinkedIn
Roshan Sumbaly, LinkedIn; Jay Kreps, LinkedIn; Sam Shah, LinkedIn

Note: Magda, Bill, and DanH will all be out of town on that day.

2 10/13/2014

Hadoop+RDBMS approach 1

A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and
A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS
technologies for analytical workloads.
PVLDB, 2(1), 2009.

3 10/20/2014

Hadoop+RDBMS approach 2

D. J. DeWitt, A. Halverson, R. Nehme, S. Shankar, J. Aguilar-Saborit,
A. Avanes, M. Flasza, and J. Gramling. Split query processing in
Polybase. In SIGMOD, 2013

4 10/27/2014

Hadoop/HPC integration

Shantenu Jha, Judy Qiu, André Luckow, Pradeep Kumar Mantha, Geoffrey
Charles Fox: A Tale of Two Data-Intensive Paradigms: Applications,
Abstractions, and Architectures. CoRR abs/1403.1528 (2014)

5 11/3/2014

DBMS + LA: Approach 1

Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib analytics library: or MAD skills, the SQL. Proc. VLDB Endow. 5, 12 (August 2012), 1700-1711.

6 11/10/2014

DBMS + LA: Approach 2

Zhengping Qian, Xiuwei Chen, Nanxi Kang, Mingcheng Chen, Yuan Yu, Thomas Moscibroda, and Zheng Zhang. 2012. MadLINQ: large-scale distributed matrix computation for the cloud. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys '12). ACM, New York, NY, USA, 197-210.

7 11/17/2014

R and Hadoop

Ricardo: Integrating R and Hadoop (2010)
by Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rainer Gemulla, Peter J. Haas, John McPherson

8 11/24/2014

Hadoop and streaming engines

How to Fit when No One Size Fits
Harold Lim (Duke University); Yuzhang Han (Duke University); Shivnath Babu (Duke University)
CIDR 2013

9 12/1/2014

In situ data processing

Manos Karpathiotakis, Miguel Branco, Ioannis Alagiannis, Anastasia Ailamaki:
Adaptive Query Processing on RAW Data. 1119 - 1130.
VLDB 2014