Final Project
Assigned: Feb 12th.
Part I - Initial Proposal:
This part is an individual assignment.
DUE: Thursday Feb 14th
Your assignment is to pitch an idea that will form the basis of a final project. This project will be
developed during the remainder of the quarter (approximately 4 weeks) in small teams (2-3 persons).
Your first proposal should be a rough outline of the problem you are trying to solve, and the solution
you suggest.
Project Requirements:
The function performed by your project is entirely up to you. The only requirement for your project is
that it must utilize the Hadoop cluster. We encourage you to find a problem whose solution benefits
from such a distributed system. This could be porting an existing application to Hadoop, or something
more unique.
With this assignment, you have the opportunity to propose a project that you believe is interesting and
valuable, and if you can convince your peers, you can then design and build it in a team environment.
With multiple developers and the resources we've provided (namely, our bad-ass cluster), you will have
the power to build something phenomenal and resume-worthy, sure to give you the street cred you need to
impress future employers or faculty members.
If your product will depend on the availability of special software or datasets, it helps to get an early
start to determine both feasibility and accessibility, so be sure to mention this upfront.
Part II - Finalize Proposal:
DUE: Tuesday Feb 19th
Project proposals will be posted here after Thursday. You may then work together to form groups of 2
or 3, or you may work alone if you wish. You will then develop your proposal - this assignment has two
main parts:
- In an essay, describe your proposed project so that people understand what it is and why it is
valuable. Also, describe the project architecture so that it is clear that the system can be built
given the available resources and technology. For details, see the "Project Proposal Format" section
below.
- Prepare a short "elevator pitch" (3-5 minutes) to present your group's idea to the class.
Project Proposal Format:
You will submit an essay of no more than 2 pages of text (illustrations are free). Your essay should
follow the outline below.
- Overview - 1-2 paragraphs. Describe and analyze the problem or idea, giving background on
the problem and listing some of the properties of existing solutions (if the idea isn't new). Also,
briefly explain your proposed solution, describing your top-level objectives, differentiators, and
the scope of the work.
- Suggested Solution and System Architecture - 2-3 paragraphs. Describe your solution in more
detail, including essential system features and organization. Provide an analysis of the technical
feasibility at this level. If necessary, include a high-level sketch of the components and how they
will integrate (illustrations may help).
- Development Plan - 1 paragraph. Describe a high-level timeline for this project, consisting of
major milestones and their short descriptions (1-2 sentences). This should help you to scope the
work and determine the number of developers you might need to complete your project.
- Feasibility Rationale - 1 paragraph. Evaluate the conceptual integrity of your idea and identify
any risks. This is an appropriate space to list, upfront, concerns that might require additional
help from instructors or staff.
Ideas and Hints:
Here is a list of some potential projects, including some that were worked on in previous quarters:
- The Netflix Challenge - The Netflix Prize is an ongoing open competition for the best collaborative
filtering algorithm that predicts user ratings for films based on previous ratings. We already have the
Netflix data on the cluster, and this was given as a class project in the previous quarter, so we have
some pointers on where to start. For detailed information, see the prize page: http://www.netflixprize.com/
- N-Body Classical Mechanics Simulation - Last year, one of the projects turned the cluster into a simulator
for large sets of interacting objects. Ultimately, the cluster was used to simulate the collision of the Milky Way
and Andromeda galaxies. For a low-res video go here
- Geozette - One of the projects used the cluster to extract geographical information from news articles
on the web and plot it on an interactive map. (link soon)
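
If you are unsure whether an idea fits the cluster, it can help to sketch it as a map step and a reduce step before writing your proposal. The example below is only a rough illustration, not a required approach: it computes a per-movie average rating, a simple baseline one might try before attempting real collaborative filtering. The record layout (movie_id, user_id, rating), the function names, and the sample data are all assumptions made for illustration; they are not the format of the Netflix data on the cluster, and the "job" runs locally in plain Python rather than on Hadoop.

```python
from collections import defaultdict


def map_rating(record):
    """Map step: emit (movie_id, rating) for one record.

    The (movie_id, user_id, rating) layout is an assumed, hypothetical format.
    """
    movie_id, _user_id, rating = record
    yield movie_id, rating


def reduce_ratings(movie_id, ratings):
    """Reduce step: average all ratings collected for one movie."""
    ratings = list(ratings)
    return movie_id, sum(ratings) / len(ratings)


def run_job(records):
    """Simulate the shuffle locally: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_rating(record):
            groups[key].append(value)
    return dict(reduce_ratings(key, values) for key, values in groups.items())


# Made-up sample data, for illustration only.
sample = [("m1", "u1", 4), ("m1", "u2", 2), ("m2", "u1", 5)]
averages = run_job(sample)
print(averages)
```

If your idea can be written this way, with independent work per record in the map step and a per-key combination in the reduce step, that is a reasonable sign it could benefit from the cluster and is worth saying so in your feasibility paragraph.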