"The Music Bach Never Wrote" - Daniel Otero
    Using a large collection of MIDI files ranging from classical to pop, analyze the input and build a directed graph whose weighted edges represent the probability of one musical phrase following another. This graph will then be used to compose music algorithmically, for example with a Markov model.
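A minimal sketch of the graph-and-walk idea, assuming each piece has already been segmented into a sequence of hashable phrase labels. Segmenting real MIDI data into phrases is a separate problem not shown here, and the names `build_transition_graph` and `compose` are hypothetical. The graph stores transition probabilities, and composing is a random walk over it (a first-order Markov chain).

```python
import random
from collections import defaultdict


def build_transition_graph(pieces):
    """Count phrase-to-phrase transitions across many pieces.

    Returns {phrase: {next_phrase: probability}}, a weighted directed graph.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for piece in pieces:
        for current, nxt in zip(piece, piece[1:]):
            counts[current][nxt] += 1
    graph = {}
    for phrase, successors in counts.items():
        total = sum(successors.values())
        graph[phrase] = {nxt: n / total for nxt, n in successors.items()}
    return graph


def compose(graph, start, length, rng=None):
    """Random walk over the graph; stops early at a phrase with no successors."""
    rng = rng or random.Random()
    sequence = [start]
    while len(sequence) < length and sequence[-1] in graph:
        successors = graph[sequence[-1]]
        choices = list(successors)
        weights = [successors[c] for c in choices]
        sequence.append(rng.choices(choices, weights=weights)[0])
    return sequence
```

For example, training on `[["A", "B", "C"], ["A", "B", "D"]]` gives `"B"` two equally likely successors, so a walk from `"A"` always continues to `"B"` and then branches. A higher-order chain, keyed on the last two or three phrases, would capture longer musical context at the cost of sparser statistics.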

"Geozette" - Julia Schwarz
   
Many Americans today are geographically illiterate. This is particularly problematic when reading international news: if readers cannot place an event on a map, it feels less tangible. Geozette aims to analyze news feeds, cluster the stories by location, and present both news and trends on a map in a unique and visually appealing way.

"Analyze and Predict Price Fluctuation for Everyday Products" - David Balatero
   
Small businesses often struggle to keep up with fluctuating product prices. Using data from eBay, shopping.com, and other sources, this project will attempt to build a model that analyzes and predicts product prices, helping small businesses set their prices automatically.
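As a hedged baseline for the prediction step, the sketch below fits an ordinary least-squares trend line to a product's price history and extrapolates it. The proposal does not name a model; linear regression is simply an assumed starting point, and `fit_trend` and `forecast` are hypothetical names. Observations are assumed to be evenly spaced in time.

```python
def fit_trend(prices):
    """Least-squares line through (t, price) for t = 0, 1, ..., n-1.

    Returns (slope, intercept). Needs at least two observations.
    """
    n = len(prices)
    t_mean = (n - 1) / 2
    p_mean = sum(prices) / n
    cov = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, p_mean - slope * t_mean


def forecast(prices, steps_ahead=1):
    """Extrapolate the fitted line `steps_ahead` periods past the last price."""
    slope, intercept = fit_trend(prices)
    return intercept + slope * (len(prices) - 1 + steps_ahead)
```

A price series of `[10, 12, 14, 16]` yields a slope of 2 per period and a one-step forecast of 18. Real marketplace prices are noisy and seasonal, so this mainly serves as a yardstick for more elaborate models.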

"Social Networking Connectedness" - Jeff Shannon
   
Explore the connectedness of online communities by building complex graphs based on friendships and shared interests. This data can be used to analyze patterns of friendship clustering, such as the size of each cluster and the separation between clusters.
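A minimal sketch of two basic measurements on such a graph, assuming friendships arrive as `(user, user)` pairs. Connected components (found by breadth-first search) are the coarsest notion of a cluster. Shortest hop counts measure separation. Finer clustering, such as modularity-based community detection, would build on these. All function names are hypothetical.

```python
from collections import deque


def build_graph(friendships):
    """Undirected adjacency sets from (user, user) friendship pairs."""
    graph = {}
    for a, b in friendships:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components(graph):
    """Connected components found by breadth-first search, largest first."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(comp)
    return sorted(result, key=len, reverse=True)


def separation(graph, source, target):
    """Degrees of separation (shortest hop count), or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None
```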

"Photo-Mosaics in Motion" - James George
   
As an artistic effort, this tool will convert a video into a stream of photo-mosaic images; that is, each frame will be rendered as a photo-mosaic. This style of video also raises an interesting question: when each frame sub-block is matched on color and gradient properties, will movement on screen lead to the selection of similar palette photos, naturally creating a mosaic of videos within a video?
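A simplified sketch of the per-block matching step, assuming each frame sub-block is a list of rows of `(r, g, b)` tuples and each palette photo has a precomputed mean color. It matches on mean color only, using squared Euclidean distance in RGB. The gradient properties mentioned above are omitted, and `mean_color` and `nearest_tile` are hypothetical names.

```python
def mean_color(block):
    """Average (r, g, b) of a block given as a list of rows of RGB tuples."""
    pixels = [px for row in block for px in row]
    n = len(pixels)
    return tuple(sum(px[i] for px in pixels) / n for i in range(3))


def nearest_tile(block, palette):
    """Pick the palette photo whose mean color is closest (squared Euclidean).

    `palette` maps a photo id to its precomputed mean (r, g, b) color.
    """
    target = mean_color(block)
    return min(
        palette,
        key=lambda pid: sum((a - b) ** 2 for a, b in zip(target, palette[pid])),
    )
```

Running this per block and per frame yields the mosaic stream. Whether similar blocks in consecutive frames keep picking the same photos is exactly the question the proposal raises.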

"Music Recommendation System" - James Hughes
   
Using MusicBrainz data, create a music recommendation system based on item-based Collaborative Filtering (CF), which recommends items similar to ones a user already likes, where similarity is measured by the overlap in the users who like each item.
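A minimal item-based CF sketch, assuming a hypothetical input format that maps each item (artist or recording) to `{user: play count}`. Items are compared by cosine similarity over their user vectors. A user's unheard items are scored by their similarity to the items that user has already played, weighted by play count. The function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {user: value} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def recommend(plays, user, top_n=3):
    """Score items the user has not played by similarity to items they have."""
    heard = {item for item, users in plays.items() if user in users}
    scores = {}
    for candidate, cand_users in plays.items():
        if candidate in heard:
            continue
        score = sum(cosine(cand_users, plays[h]) * plays[h][user] for h in heard)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Because item-to-item similarities change slowly, a production system would typically precompute them offline rather than recomputing them per request as this sketch does.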

"N-body problem simulation" - Slava Chernyak/Robert Gay
   
Given a set of N point particles, each generating a potential field, investigate the dynamics of each particle in the composite potential. Since the problem has no general closed-form solution for N ≥ 3, this project will explore various optimizations for approximating the results numerically and then visualize the output. Both Slava and Robert wrote about the same problem, so it must be interesting!
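The unoptimized baseline that such optimizations improve on is direct summation. The sketch below computes all pairwise accelerations in O(N²) time and advances the system with a kick-drift-kick leapfrog integrator. It assumes a 2-D gravitational potential with unit G and a small softening length `eps`. Tree methods such as Barnes-Hut reduce the force step to roughly O(N log N). Function names are hypothetical.

```python
def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct O(N^2) sum of pairwise gravitational accelerations in 2-D.

    `eps` is a softening length that avoids division by zero at close range.
    """
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps
            f = G * mass[j] / r2 ** 1.5
            acc[i][0] += f * dx
            acc[i][1] += f * dy
    return acc


def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step; updates pos and vel in place."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(2):
            vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
            pos[i][k] += dt * vel[i][k]        # full drift
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(2):
            vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick
```

Leapfrog is a common choice here because it is time-reversible and keeps long-run energy drift small. Because pairwise forces are equal and opposite, total momentum is conserved up to rounding, which makes a handy sanity check for any optimized version.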

"Bayesian Wikipedia Bot" - Mark Perry
   
Bayesian analysis is a popular method for estimating the probability that a document belongs to a given class (for example, wanted or unwanted), and it is extremely common in spam filtering. Wikipedia contains more than 1.7 million articles with a combined 130 million edits, each recording what content was changed or reverted. The Bayesian Wikipedia Bot will train on a data set of these edits and then apply the trained model to the incoming feed of Wikipedia edits, marking each new edit as good or bad to help editors fight vandalism and false information.
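A minimal multinomial naive Bayes sketch of the classifier at the heart of such a bot. It assumes each edit has already been reduced to a list of tokens, for example the words added by the edit. That tokenization, the `"good"`/`"bad"` labels, and the class name are assumptions for illustration. Add-one (Laplace) smoothing keeps unseen tokens from zeroing out a class, and scores are summed in log space to avoid underflow.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, tokens, label):
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_posterior(self, tokens, label):
        """log P(label) + sum of log P(token | label), up to a shared constant."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in tokens:
            score += math.log((counts[t] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, tokens):
        """Return the label with the highest posterior score."""
        return max(self.doc_counts, key=lambda label: self.log_posterior(tokens, label))
```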

"Course Source Code/Lecture Notes Search" - Danny Suyanto
   
Help instructors search text across their desktops, network storage, and course websites by creating a custom search engine geared toward their needs and file types.

"Multiple Sequence Alignment" - Michael Hoak
   
Use MapReduce to generate a phylogenetic tree, or evolutionary tree, from unaligned DNA sequences. As an optimization, align only those sequences that are determined to be closely related.
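One common way to decide which sequences are "closely related" before running an expensive alignment is to compare their k-mer content. The sketch below uses k-mer Jaccard similarity with a threshold. This filter, the values `k=4` and `threshold=0.3`, and the function names are assumptions rather than part of the proposal. Each pairwise comparison is independent, so in a MapReduce setting the pairs could be distributed across map tasks.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(sequences, k=4, threshold=0.3):
    """Return id pairs whose k-mer Jaccard similarity meets the threshold.

    Only these pairs would be passed on to full (expensive) alignment.
    """
    profiles = {sid: kmers(seq, k) for sid, seq in sequences.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if jaccard(profiles[a], profiles[b]) >= threshold
    ]
```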

"Paraphrase Extraction on the Web" - Brian Ngo
   
In any language, there are many ways to express the same idea; restating an idea in different words is called paraphrasing. Using MapReduce, create a dictionary of paraphrases from a web corpus (e.g., a Nutch crawl). The results could be used in many ways, but would be particularly useful in search engines for improving ranking and query recognition.

"Cellarspot Data Import and Recommendation System" - Alex Loddengaard
   
Cellarspot.com is a social network for wine lovers, though it has not yet reached the critical mass of users needed to support its interface. To make it easier for users to enter wine bottles, the site can be seeded with data imported from crawls or scrapes of other wine sites. This data could be used for auto-completion or search, and also to build a wine recommendation system.
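For the auto-completion use, a minimal sketch: keep the imported wine names in a sorted list and binary-search for the first entry matching the typed prefix. The class name and sample wine names are illustrative only. A trie or a search engine's prefix queries would be alternatives at larger scale.

```python
import bisect


class Autocomplete:
    """Case-insensitive prefix lookup over a sorted list of names."""

    def __init__(self, names):
        self.entries = sorted((name.lower(), name) for name in names)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` names starting with `prefix`, in sorted order."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self.keys, prefix)  # first key >= prefix
        results = []
        for i in range(start, len(self.entries)):
            key, name = self.entries[i]
            if len(results) == limit or not key.startswith(prefix):
                break
            results.append(name)
        return results
```

Because all names sharing a prefix are contiguous in sorted order, each lookup costs one binary search plus the number of results returned.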

"Netflix Recommendation Challenge" - Andrew Hitchcock
   
Explore new methods to improve Netflix's recommendation system. Improving on the accuracy of Netflix's existing system by 10% wins a million-dollar prize!
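The Netflix Prize scored entries by root-mean-square error (RMSE) of predicted ratings on held-out data, measured relative to Netflix's own Cinematch system. The sketch below shows that arithmetic. The function names are hypothetical, and the example numbers are made up rather than actual contest figures.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse
```

For instance, a hypothetical baseline RMSE of 1.0 would have to drop to 0.9 to reach a 10% improvement. Because the error is squared before averaging, RMSE penalizes a few badly wrong predictions more heavily than many slightly wrong ones.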

"Distributed Neural Network for Image Classification" - Brian Steadman
   
Build a distributed neural network that classifies images with tags. The network could be trained on tagged Flickr images: each image would be preprocessed to detect edges and fed into the network, and the network's weights would then be corrected with back-propagation. This model could eventually be applied to many other types of classification problems.
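The proposal does not specify an edge detector. As one standard option for the preprocessing step, the sketch below computes the Sobel gradient magnitude of a grayscale image stored as a list of rows. Sobel is an assumed choice, and `sobel_magnitude` is a hypothetical name. Border pixels are left at zero to keep the sketch short.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to vertical change


def sobel_magnitude(image):
    """Gradient magnitude of a grayscale image (list of rows) via Sobel kernels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)
    return out
```

Flat regions map to zero and sharp intensity changes map to large values, so the output emphasizes shapes over absolute brightness before the image reaches the network.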

"Classifying Electrical Data from the Human Brain" - Tim Wong
   
Attempt to identify and classify brain signals for use in new interfaces for people with impairments. Data would be provided by Harborview Medical Center.

"Probabilistic DNF Formula Simulator" - Brian Harris
   
Brian has already developed a probabilistic database interface called MistiQ, which approximates probabilities using Monte Carlo simulation over DNF formulas. This project would parallelize MistiQ into a general-purpose distributed application for running such simulations over many DNF formulas.
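The proposal does not say which estimator MistiQ uses. As an illustration of Monte Carlo simulation over a DNF formula, the sketch below implements the Karp-Luby estimator, a standard technique for estimating the probability that a DNF over independent Boolean variables is true. Clauses are lists of `(variable, polarity)` literals, and each variable is assumed to appear at most once per clause. The function names are hypothetical.

```python
import random


def clause_prob(clause, p):
    """Probability that a conjunction of (var, polarity) literals is true."""
    prob = 1.0
    for var, positive in clause:
        prob *= p[var] if positive else 1 - p[var]
    return prob


def satisfies(assignment, clause):
    return all(assignment[var] == positive for var, positive in clause)


def karp_luby(clauses, p, samples=10000, rng=None):
    """Estimate P(C_1 or ... or C_m); p maps each variable to P(var is True)."""
    rng = rng or random.Random()
    weights = [clause_prob(c, p) for c in clauses]
    total = sum(weights)
    if total == 0:
        return 0.0
    hits = 0
    for _ in range(samples):
        # Pick clause i with probability P(C_i) / total.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample a world conditioned on C_i being true.
        assignment = {v: rng.random() < pv for v, pv in p.items()}
        for var, positive in clauses[i]:
            assignment[var] = positive
        # Count the sample only if C_i is the first satisfied clause,
        # so worlds satisfying several clauses are not counted twice.
        first = next(j for j, c in enumerate(clauses) if satisfies(assignment, c))
        if first == i:
            hits += 1
    return total * hits / samples
```

Samples are independent, so the loop splits naturally across workers whose hit counts are summed at the end. That property suggests one possible route to the parallelization the project proposes.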