UpSizeR: Synthetically Scaling Up a Given Database State

Abstract

E-commerce and social networking services must ensure that their systems are scalable. Engineering for rapid growth requires intensive testing with scaled-up datasets. Although a scaled-up dataset is synthetically generated, it is useful only if it resembles a real one.

This talk presents UpSizeR, a tool for scaling up relational databases. Given a database state D and a positive number s, UpSizeR generates a synthetic state D’ that is s times the size of D, yet similar to D in terms of query results. UpSizeR does this by extracting inter-column and inter-row information from D. UpSizeR can also be used by an enterprise to make a synthetic copy (s=1) of its proprietary dataset for a vendor, or to scale down a production dataset (s<1) for non-production testing. Experiments with Flickr data show good agreement between crawled data and UpSizeR output for various sizes.
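
To make the scaling idea concrete, here is a minimal sketch in Python. It is not UpSizeR's algorithm: the function scale_up, the two-table users/photos schema, and the sampling rule are all assumptions chosen for illustration. The sketch scales a parent table by s and then gives each synthetic user a photo count drawn from the counts observed in D, so that a query such as "photos per user" should return a similar distribution on D’.

```python
import random
from collections import Counter

def scale_up(users, photos, s, seed=0):
    """Hypothetical helper, not part of UpSizeR.

    users  : non-empty list of user ids in D
    photos : list of (photo_id, owner_id) rows in D
    s      : positive scale factor
    """
    rng = random.Random(seed)
    # Inter-row information: how many photos each user owns in D.
    owned = Counter(owner for _, owner in photos)
    degrees = [owned[u] for u in users]
    # Synthetic parent table with round(s * |users|) fresh ids.
    new_users = list(range(round(s * len(users))))
    new_photos = []
    for u in new_users:
        # Draw this user's photo count from the empirical distribution in D.
        for _ in range(rng.choice(degrees)):
            new_photos.append((len(new_photos), u))
    return new_users, new_photos

# Toy state D: four users owning 3, 1, 1 and 0 photos respectively.
users = ["a", "b", "c", "d"]
photos = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c")]
big_users, big_photos = scale_up(users, photos, s=3)
```

Every synthetic photo references a synthetic user, so foreign keys stay valid, and the same function covers s=1 and s<1. A real tool would also have to preserve correlations between columns and across more than two tables, which this sketch ignores.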

However, UpSizeR currently cannot scale the social network topology in Flickr. This leads to the Attribute Value Correlation Problem: If D records data from a social network, how do the social interactions affect correlation among attribute values in D?

Bio

Y.C. Tay received his BSc from the University of Singapore and PhD from Harvard University. He is a professor in the Departments of Mathematics and Computer Science at the National University of Singapore. His main research interest is performance modeling (database transactions, wireless protocols, traffic equilibrium, cache misses). Other interests include distributed protocols and their correctness proofs. He is currently on sabbatical at UCLA.