"The Music Bach Never Wrote" - Daniel Otero
Using a large collection of MIDI files, ranging from
classical to pop, analyze the input and build a directed graph representing the
probability distribution of musical phrases in relation to one another. This
graph will then be used to compose music algorithmically, for example with a
Markov model.
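A minimal sketch of the phrase graph follows. It assumes MIDI parsing and phrase segmentation happen elsewhere, and that each phrase has been reduced to a hypothetical tuple of MIDI note numbers. Edge weights are first-order transition probabilities, and composition is a random walk over the graph:

```python
import random
from collections import Counter, defaultdict

# Assumption: a phrase is a tuple of MIDI note numbers produced by an
# upstream (hypothetical) segmentation step.
Phrase = tuple[int, ...]


def build_transition_graph(pieces: list[list[Phrase]]) -> dict[Phrase, dict[Phrase, float]]:
    """Directed graph where edge weight = P(next phrase | current phrase)."""
    counts: dict[Phrase, Counter] = defaultdict(Counter)
    for piece in pieces:
        # Count every consecutive (current, next) pair in the piece.
        for current, nxt in zip(piece, piece[1:]):
            counts[current][nxt] += 1
    graph = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        graph[current] = {nxt: n / total for nxt, n in successors.items()}
    return graph


def compose(graph, start: Phrase, length: int, rng: random.Random) -> list[Phrase]:
    """Random walk over the graph; stops early at a phrase with no successors."""
    piece = [start]
    while len(piece) < length and piece[-1] in graph:
        successors = graph[piece[-1]]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        piece.append(nxt)
    return piece
```

A first-order chain only looks one phrase back; conditioning on the previous two or three phrases would be a natural extension.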
"Geozette" - Julia Schwarz
Many Americans today are geographically illiterate. This is
particularly problematic when reading international news: when readers cannot
pin an event to a geographic location, the event feels less tangible. Geozette
aims to analyze news feeds and cluster the results to present both news and
trends on a map in a unique and visually appealing way.
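One simple way to cluster stories for a map is grid bucketing, sketched below under the assumption that each story has already been geocoded to a latitude and longitude (the Story type and the 5-degree cell size are illustrative). Stories falling in the same cell become one marker placed at their mean position:

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    headline: str
    lat: float  # degrees, assumed to come from an upstream geocoder
    lon: float


def grid_clusters(stories: list[Story], cell_deg: float = 5.0) -> list[dict]:
    """Group stories into square lat/lon cells; largest clusters first."""
    cells: dict[tuple[int, int], list[Story]] = defaultdict(list)
    for s in stories:
        key = (math.floor(s.lat / cell_deg), math.floor(s.lon / cell_deg))
        cells[key].append(s)
    clusters = [
        {
            # One map marker per cell, placed at the members' mean position.
            "lat": sum(s.lat for s in members) / len(members),
            "lon": sum(s.lon for s in members) / len(members),
            "count": len(members),
            "headlines": [s.headline for s in members],
        }
        for members in cells.values()
    ]
    return sorted(clusters, key=lambda c: c["count"], reverse=True)
```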
"Analyze and Predict Price Fluctuation for Everyday Products" - David Balatero
Small businesses often struggle to keep up with
fluctuating product prices. Using data from eBay, shopping.com and
elsewhere, attempt to build a model that analyzes and predicts product prices
to help small businesses set their prices automatically.
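As a baseline that a richer model could be compared against, the sketch below fits an ordinary least-squares trend line to a product's recent price history and extrapolates it. The price series in any usage is hypothetical, and the approach deliberately ignores seasonality and every other signal:

```python
def fit_trend(prices: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of price = intercept + slope * t, t = 0..n-1."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    p_mean = sum(prices) / n
    cov = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = p_mean - slope * t_mean
    return intercept, slope


def predict(prices: list[float], steps_ahead: int = 1) -> float:
    """Extrapolate the fitted trend steps_ahead periods past the last observation."""
    intercept, slope = fit_trend(prices)
    return intercept + slope * (len(prices) - 1 + steps_ahead)
```

For example, a history of 10, 12, 14, 16 fits a slope of 2 per period and predicts 18 for the next period.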
"Social Networking Connectedness" - Jeff Shannon
Explore the connectedness of online communities by
building complex graphs based on friendships and shared interests. These graphs
can be used to analyze patterns of friendship clustering, such as the size of
clusters and the separation between them.
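A sketch of the graph analysis in plain Python, treating each friendship as an undirected edge. Here a "cluster" is approximated by a connected component and "separation" by breadth-first-search hop distance; splitting a large component into finer friendship clusters would need a community-detection algorithm on top of this:

```python
from collections import deque

Graph = dict[str, set[str]]


def build_graph(friendships: list[tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from (person, friend) pairs."""
    graph: Graph = {}
    for a, b in friendships:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Hop count ("degrees of separation") from source to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for friend in graph.get(node, ()):
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist


def components(graph: Graph) -> list[set[str]]:
    """Connected components, largest first."""
    seen: set[str] = set()
    result = []
    for person in graph:
        if person not in seen:
            component = set(bfs_distances(graph, person))
            seen |= component
            result.append(component)
    return sorted(result, key=len, reverse=True)
```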
"Photo-Mosaics in Motion" - James George
As an art project, the proposed tool will convert
a video into a stream of photo-mosaic images; that is, each frame will be
rendered as a photo-mosaic. This style of video also raises an interesting
question: if each frame sub-block is analyzed for its color and gradient
properties, will movement on screen lead to the selection of similar palette
photos, naturally creating a mosaic of videos within a video?
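The tile-selection step might look like the following sketch, which assumes a frame is a 2-D list of (r, g, b) tuples and each palette photo has already been summarized by its mean color. It matches on mean color only (no gradient features), using squared Euclidean distance in RGB:

```python
RGB = tuple[float, float, float]


def mean_color(frame: list[list[RGB]], top: int, left: int, size: int) -> RGB:
    """Average color of the size x size block whose top-left pixel is (top, left)."""
    pixels = [frame[y][x] for y in range(top, top + size) for x in range(left, left + size)]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def nearest_tile(color: RGB, palette: list[RGB]) -> int:
    """Index of the palette photo whose mean color is closest (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))


def mosaic_indices(frame: list[list[RGB]], palette: list[RGB], size: int) -> list[list[int]]:
    """Palette index for every full block of the frame (partial edge blocks are skipped)."""
    height, width = len(frame), len(frame[0])
    return [[nearest_tile(mean_color(frame, top, left, size), palette)
             for left in range(0, width - size + 1, size)]
            for top in range(0, height - size + 1, size)]
```

Running this per frame keeps the per-block choice deterministic, so a block whose colors barely change between frames keeps the same tile.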
"Music Recommendation System" - James Hughes
Using MusicBrainz data, create a music recommendation
system based on item-based Collaborative Filtering (CF), which recommends items
similar to those a user already rates highly.
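A minimal item-based CF sketch, assuming ratings or play counts are available as a hypothetical nested dictionary of user to item to score. Item-item cosine similarity is computed with missing ratings treated as zero, and each item the user has not rated is scored by a similarity-weighted average of that user's own ratings:

```python
import math

Ratings = dict[str, dict[str, float]]  # user -> {item: rating or play count}


def item_similarity(ratings: Ratings, i: str, j: str) -> float:
    """Cosine similarity of items i and j, treating missing ratings as 0."""
    dot = sum(r[i] * r[j] for r in ratings.values() if i in r and j in r)
    if dot == 0:
        return 0.0
    norm_i = math.sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
    norm_j = math.sqrt(sum(r[j] ** 2 for r in ratings.values() if j in r))
    return dot / (norm_i * norm_j)


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> list[str]:
    """Rank the user's unseen items by a similarity-weighted average of their ratings."""
    rated = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(rated):
        sims = [(item_similarity(ratings, candidate, j), r) for j, r in rated.items()]
        weight = sum(abs(s) for s, _ in sims)
        if weight > 0:  # skip items with no overlap with the user's history
            scores[candidate] = sum(s * r for s, r in sims) / weight
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Item similarities depend only on the rating matrix, so in practice they can be precomputed offline and reused for every user.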
"N-Body Problem Simulation" - Slava Chernyak/Robert Gay
Consider a set of N point particles, each creating a potential field, and
investigate the dynamics of each particle in the composite potential. Because
the problem has no general closed-form solution for N ≥ 3, this project will
explore numerical approximations and optimizations to estimate the results and
then visualize the output. Both Slava and Robert wrote about the same problem,
so it must be interesting!
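The baseline that such optimizations improve on is direct summation, sketched below in 2-D with G = 1 and a small softening length (both assumptions) to avoid division by zero at close encounters. Each step uses the kick-drift-kick leapfrog integrator, which conserves total momentum up to round-off because pairwise forces cancel. Direct summation costs O(N²) per step; tree codes such as Barnes-Hut reduce this to O(N log N).

```python
from dataclasses import dataclass


@dataclass
class Body:
    mass: float
    pos: list[float]  # [x, y]
    vel: list[float]  # [vx, vy]


def accelerations(bodies: list[Body], G: float = 1.0, eps: float = 1e-3) -> list[list[float]]:
    """Direct O(N^2) summation of softened gravitational accelerations."""
    acc = [[0.0, 0.0] for _ in bodies]
    for i, bi in enumerate(bodies):
        for j, bj in enumerate(bodies):
            if i == j:
                continue
            dx = bj.pos[0] - bi.pos[0]
            dy = bj.pos[1] - bi.pos[1]
            inv_r3 = (dx * dx + dy * dy + eps * eps) ** -1.5
            acc[i][0] += G * bj.mass * dx * inv_r3
            acc[i][1] += G * bj.mass * dy * inv_r3
    return acc


def leapfrog_step(bodies: list[Body], dt: float, G: float = 1.0) -> None:
    """One kick-drift-kick leapfrog step, updating bodies in place."""
    for b, a in zip(bodies, accelerations(bodies, G)):  # half kick
        b.vel[0] += 0.5 * dt * a[0]
        b.vel[1] += 0.5 * dt * a[1]
    for b in bodies:  # full drift
        b.pos[0] += dt * b.vel[0]
        b.pos[1] += dt * b.vel[1]
    for b, a in zip(bodies, accelerations(bodies, G)):  # half kick at new positions
        b.vel[0] += 0.5 * dt * a[0]
        b.vel[1] += 0.5 * dt * a[1]
```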
"Bayesian Wikipedia Bot" - Mark Perry
Bayesian analysis is a popular method for estimating the probability that a
document is wanted, and it is extremely common in spam filtering. Wikipedia
contains more than 1.7 million articles with a combined 130 million edits, each
recording the content that was changed or reverted. The Bayesian Wikipedia Bot
will train on a data set of these edits and then apply what it learns to an
incoming feed of Wikipedia edits, marking new edits as good or bad to help
editors fight vandalism and false information.
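A minimal multinomial naive Bayes sketch of the good/bad classifier, assuming each edit has already been reduced to a plain-text diff (tokenization here is a simple whitespace split, and any training strings are made up). It uses add-one (Laplace) smoothing and compares log posteriors to avoid floating-point underflow on long edits:

```python
import math
from collections import Counter

LABELS = ("good", "bad")


class EditClassifier:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.edit_counts = Counter()

    def train(self, diff_text: str, label: str) -> None:
        self.edit_counts[label] += 1
        self.word_counts[label].update(diff_text.lower().split())

    def log_posterior(self, diff_text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label), up to a shared constant."""
        vocab_size = len(set(self.word_counts["good"]) | set(self.word_counts["bad"]))
        total_edits = sum(self.edit_counts.values())
        score = math.log((self.edit_counts[label] + 1) / (total_edits + len(LABELS)))
        label_words = sum(self.word_counts[label].values())
        for word in diff_text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (label_words + vocab_size))
        return score

    def classify(self, diff_text: str) -> str:
        return max(LABELS, key=lambda label: self.log_posterior(diff_text, label))
```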
"Course Source Code/Lecture Notes Search" - Danny Suyanto
Help instructors search text across their desktops, network storage and course
websites by creating a custom search engine geared toward their needs and file
types.
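At the core of such a search engine is an inverted index mapping each term to the files that contain it. A minimal sketch, assuming file contents have already been extracted to text (the file names used below are hypothetical); queries are a boolean AND over terms, with no ranking:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric/underscore runs, so identifiers like map_reduce stay whole."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Keeping underscores inside tokens is a choice aimed at source code, where splitting identifiers would return many false matches.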
"Multiple Sequence Alignment" - Michael Hoak
Use MapReduce to generate a phylogenetic tree, or evolutionary tree, from
unaligned DNA sequences. Optimize by aligning only sequences that are
determined to be closely related.
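One way to decide which sequences are "closely related" before running an expensive alignment is to count shared k-mers (length-k substrings). The sketch below simulates the map, shuffle and reduce phases in plain Python; the k-mer heuristic, the value of k and the sharing threshold are assumptions, not part of the proposal:

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterator


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def map_phase(seq_id: str, seq: str, k: int) -> Iterator[tuple[str, str]]:
    """Emit (k-mer, sequence id) once per distinct k-mer in the sequence."""
    for kmer in kmers(seq, k):
        yield kmer, seq_id


def reduce_phase(seq_ids: set[str]) -> Iterator[tuple[tuple[str, str], int]]:
    """Every pair of sequences sharing this k-mer gets one vote."""
    for a, b in combinations(sorted(seq_ids), 2):
        yield (a, b), 1


def related_pairs(sequences: dict[str, str], k: int = 4, min_shared: int = 3) -> set[tuple[str, str]]:
    grouped: dict[str, set[str]] = defaultdict(set)  # shuffle: group by k-mer
    for seq_id, seq in sequences.items():
        for kmer, sid in map_phase(seq_id, seq, k):
            grouped[kmer].add(sid)
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for ids in grouped.values():
        for pair, vote in reduce_phase(ids):
            shared[pair] += vote
    return {pair for pair, n in shared.items() if n >= min_shared}
```

Only the returned pairs would then be passed to a full pairwise aligner.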
"Paraphrase Extraction on the Web" - Brian Ngo
In any language, there are many ways to express the same idea; restating an
idea in different words is called paraphrasing. Using MapReduce, create a
dictionary of paraphrases from a web corpus (e.g., a Nutch crawl). The results
could be used in many ways, but would be particularly useful in search engines
for improving ranking and query recognition.
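One common heuristic for extracting paraphrases is distributional: phrases that repeatedly appear between the same left and right words are paraphrase candidates. The sketch below simulates that MapReduce job in plain Python, with one word of context on each side and phrases of up to two words (both assumptions); a real crawl would need wider contexts and frequency filtering:

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterator

Context = tuple[str, str]  # (word to the left, word to the right)


def map_sentence(sentence: str, max_len: int = 2) -> Iterator[tuple[Context, str]]:
    """Emit (context, phrase) for every phrase with a word on both sides."""
    words = sentence.lower().split()
    for n in range(1, max_len + 1):
        for i in range(1, len(words) - n):
            yield (words[i - 1], words[i + n]), " ".join(words[i:i + n])


def reduce_context(phrases: list[str]) -> Iterator[tuple[tuple[str, str], int]]:
    """Distinct phrases seen in the same context vote for each other."""
    for a, b in combinations(sorted(set(phrases)), 2):
        yield (a, b), 1


def paraphrase_candidates(sentences: list[str], min_contexts: int = 2) -> set[tuple[str, str]]:
    grouped: dict[Context, list[str]] = defaultdict(list)  # shuffle by context
    for s in sentences:
        for ctx, phrase in map_sentence(s):
            grouped[ctx].append(phrase)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for phrases in grouped.values():
        for pair, vote in reduce_context(phrases):
            counts[pair] += vote
    return {pair for pair, n in counts.items() if n >= min_contexts}
```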
"Cellarspot Data Import and Recommendation System" - Alex Loddengaard
Cellarspot.com is a social network for wine lovers, though it has not yet
reached the critical mass of users its interface needs. To make it easier for
users to enter wine bottles, the site can be seeded with data imported from a
crawl or scrape of wine data on other sites. This data could be used for
auto-completion or search, and it could also serve as the basis for a wine
recommendation system.
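For auto-completion, a sorted list of imported wine names plus binary search is enough to find every name sharing a prefix, because those names form one contiguous run in sorted order. A minimal sketch, assuming names are normalized to lowercase (the class name and sample names are illustrative):

```python
import bisect


class WineAutocomplete:
    """Prefix completion over a sorted, lowercased, de-duplicated name list."""

    def __init__(self, names: list[str]) -> None:
        self.names = sorted({name.lower() for name in names})

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        prefix = prefix.lower()
        # All names starting with prefix form a contiguous run beginning here.
        start = bisect.bisect_left(self.names, prefix)
        matches = []
        for i in range(start, len(self.names)):
            if not self.names[i].startswith(prefix) or len(matches) == limit:
                break
            matches.append(self.names[i])
        return matches
```

For a very large catalog, a trie or the search engine's own prefix queries could replace the sorted list.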
"Netflix Recommendation Challenge" - Andrew Hitchcock
Explore new methods to improve Netflix's recommendation system. Improving the
system's accuracy by 10% wins a million-dollar prize!
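The Netflix Prize measured accuracy as root-mean-square error (RMSE) on held-out ratings, and the 10% target was relative to the RMSE of Netflix's own Cinematch system. A small sketch of that arithmetic (the function names are illustrative):

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error between predicted and true ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def percent_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

For example, reducing a baseline RMSE of 1.0 to 0.9 is a 10% improvement.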
"Distributed Neural Network for Image Classification" - Brian Steadman
Build a distributed neural network that classifies images by tag. The network
could be trained on tagged Flickr images: each image would be preprocessed to
detect edges and fed into the network, whose weights would then be corrected
with back-propagation. This model could eventually be used to solve many other
types of classification problems.
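The edge-detection preprocessing step could use the Sobel operator, sketched below on a grayscale image stored as a 2-D list (an assumption; a real pipeline would use an image library). It returns the gradient magnitude at each interior pixel and leaves the one-pixel border at zero; the flattened result could then serve as the network's input vector:

```python
import math

# Sobel kernels for horizontal (GX) and vertical (GY) intensity changes.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: list[list[float]]) -> list[list[float]]:
    """Gradient magnitude per pixel; the one-pixel border is left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [[img[y + dy][x + dx] for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
            gx = sum(GX[r][c] * window[r][c] for r in range(3) for c in range(3))
            gy = sum(GY[r][c] * window[r][c] for r in range(3) for c in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```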
"Classifying Electrical Data from the Human Brain" - Tim Wong
Attempt to identify and classify brain signals for use in new interfaces for
people with impairments. Data would be provided by Harborview Medical Center.
"Probabilistic DNF Formula Simulator" - Brian Harris
Brian has already developed a probabilistic database interface called MistiQ,
which approximates probabilities using Monte Carlo simulation over DNF formulas.
This project would attempt to parallelize MistiQ to create a general-purpose
distributed application for simulating multiple DNF formulas.
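A minimal sketch of the naive Monte Carlo estimator for the probability of a DNF formula over independent Boolean variables. The clause representation and function names are hypothetical, and this is not MistiQ's implementation. Each batch of samples is independent, which is what makes the computation easy to distribute: workers can count hits separately and a final step sums them. For formulas whose clauses are rarely satisfied, the Karp-Luby estimator is a standard lower-variance alternative.

```python
import random

Lit = tuple[str, bool]  # (variable name, required truth value)
DNF = list[list[Lit]]   # OR of clauses, each clause an AND of literals


def satisfied(dnf: DNF, world: dict[str, bool]) -> bool:
    return any(all(world[var] == value for var, value in clause) for clause in dnf)


def count_hits(dnf: DNF, probs: dict[str, float], samples: int, seed: int) -> int:
    """One independent batch: sample worlds and count those satisfying the formula."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {var: rng.random() < p for var, p in probs.items()}
        hits += satisfied(dnf, world)
    return hits


def estimate(dnf: DNF, probs: dict[str, float], batches: int = 4,
             samples_per_batch: int = 2500) -> float:
    # Each batch could run on a separate worker; only the hit counts are combined.
    hits = sum(count_hits(dnf, probs, samples_per_batch, seed) for seed in range(batches))
    return hits / (batches * samples_per_batch)
```

For example, the formula a OR b with P(a) = P(b) = 0.5 has exact probability 0.75, which the estimator should approach as the sample count grows.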