In any reasonably complex system, it's rare that there is a single so-called
"correct" answer; more often, the final solution is a balance of various
tradeoffs. Oftentimes, a choice must be made among system complexity,
reliability guarantees (where "reliability" may mean several things,
such as data consistency or system uptime), and runtime (where "runtime"
may mean several things, such as latency or bandwidth).
With that in mind, please answer the following questions when reading the
MapReduce paper. If you're not used to diving directly into academic papers,
this reader might provide a gentler introduction to MapReduce's underlying
concepts.
- Name two possible uses for MapReduce besides WordCount. Do not list
  problems that calculate a specific attribute of the words in a
  document corpus, e.g., DistributedGrep or WhitespaceCount.
- What kind of constraints does MapReduce place on its problem domain?
  Said another way, what applications would not work in MapReduce?
  E.g., could you use MapReduce for Amazon's shopping cart? How about
  SETI@Home? An MMORPG?
- Besides MapReduce, what is another possible strategy for parallelizing
computation over a large document corpus? What are the strengths and
weaknesses of that system relative to MapReduce?
- What are the scarce resources in a MapReduce system (e.g., disk space,
  personnel, CPU)? What is the absolute limit for adding those resources
  into a MapReduce system (e.g., if skilled personnel were the limiting
  resource, then the hard limit would be the number of skilled labourers)?
  Assuming that limit were reached, how could MapReduce be tweaked to
  reduce dependency on that resource?
- * Bonus question: In an earlier lecture, we gave an example
  of shuffling the sentences in a document set by assigning each sentence
  a randomly-generated key. While this solution works in the theoretical
  MapReduce model, it is not guaranteed to work in practice. Why?
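To make the bonus question concrete, here is a minimal single-process sketch (in Python) of the random-key shuffling scheme it describes. The function names and the in-memory shuffle are illustrative assumptions for this sketch, not part of the MapReduce paper; in a real deployment the shuffle happens across machines, which is exactly where you should look for the catch.

```python
import random
from collections import defaultdict

def map_phase(sentences):
    # Map: emit each sentence under a randomly generated key.
    return [(random.random(), s) for s in sentences]

def shuffle_phase(pairs):
    # Shuffle: group values by key, then visit keys in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(groups):
    # Reduce: flatten the grouped sentences back into a single list.
    out = []
    for _, values in groups:
        out.extend(values)
    return out

sentences = ["the cat sat", "on the mat", "and purred", "all day long"]
shuffled = reduce_phase(shuffle_phase(map_phase(sentences)))
# shuffled is a permutation of sentences, ordered by the random keys.
```

The output is always some permutation of the input, so the scheme is correct in the abstract model. When answering the question, consider what assumptions this sketch quietly makes that a distributed, fault-tolerant execution does not honor.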