CSEP 521 - Projects - Autumn 2009
The project specifications:
The project includes a proposal and a written final
report. The purpose of the project is to
evaluate several alternative approaches to algorithmically solve an
applied problem.
It must include readings from the literature.
It may include an implementation study. It may be done in small teams, with approval.
There are three components to the project.
- The one page project proposal is due in class on Monday, November 2, 2009.
- The final report is due Tuesday, December 8, 2009. Final reports must be uploaded to the Project Discussion Board by 11:30 pm, Tuesday, December 8, 2009.
- All students are required to read and comment on at least 3 other reports by 11:30 pm, Wednesday December 16, 2009. All comments should be respectful and add some value to the particular project report that is being commented on.
In your proposal you should describe in one page the applied problem
that you are interested in and list several papers (and other
resources you plan to use) that you will read in order to research
your applied problem. Discuss briefly what techniques you will use to
evaluate the algorithms you will find. If you are working in a team,
then one proposal from the team is adequate. In doing your proposal
you are welcome to use the web (e.g. ResearchIndex) or library
resources like IEEE Xplore, ACM Digital Library, and INSPEC to find the material you need. I will evaluate
each proposal, so it is a good idea to get the proposal to me early so
that I can approve it at the earliest possible date.
In your final 5 to 10 page
paper you should describe the alternative algorithms, ideally with
illustrative examples. You should describe your evaluation criteria.
You should then evaluate your algorithms accordingly. The audience for your
paper should be computer professionals with a bachelor's degree or equivalent.
Your paper should contain proper scholarly citations and an informative title and
section headings. Group projects must be 10 pages.
Comments on other students' project reports should be respectful and add value to the report. For example, a comment might include additional references that were not noted in the original report or it might relate the problem investigated to another similar problem. Comments need not be long, but they should add new information.
Listed below are a number of applied algorithms problems. I will describe each
problem briefly to give you some inspiration. However, it is your job to
find the resources that describe algorithms for solving the problem. Naturally,
the TA or instructor can give some guidance to help get you started, but in the end
you are responsible for finding the papers and other resources. You are not
required to choose your project from this list, but if you choose something on
your own it would be wise to give me the proposal early. The problems listed
below are quite general. Part of your job will be to focus on just a few
papers that are interesting, accessible given your background, and relevant to
the topic.
-
N-Body Simulation:
Astronomers are very interested in the evolution of the
universe. In the past few years they have developed techniques to simulate
large numbers of stars. They have investigated what happens when galaxies
collide. What makes this problem hard is that the naive approach to simulating
N bodies is to make N^2 force computations to calculate the new
velocities and positions of the bodies. This is just too much computation.
There are ways to do much less computation yet achieve acceptable, low error,
simulations.
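As a concrete starting point, here is a minimal Python sketch of the naive approach in two dimensions (the function name and the simple symplectic Euler update are my own choices): every step makes the N^2 pairwise force computations described above.

```python
import math

G = 6.674e-11  # gravitational constant

def step(positions, velocities, masses, dt):
    """One step of the naive simulation: N^2 pairwise force computations,
    then a simple (symplectic Euler) update of velocities and positions."""
    n = len(positions)
    forces = []
    for i in range(n):
        fx = fy = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            r = math.hypot(dx, dy)
            f = G * masses[i] * masses[j] / (r * r)
            fx += f * dx / r   # project the force onto each axis
            fy += f * dy / r
        forces.append((fx, fy))
    for i in range(n):
        vx = velocities[i][0] + forces[i][0] / masses[i] * dt
        vy = velocities[i][1] + forces[i][1] / masses[i] * dt
        velocities[i] = (vx, vy)
        positions[i] = (positions[i][0] + vx * dt, positions[i][1] + vy * dt)
    return positions, velocities
```

Techniques such as the Barnes-Hut algorithm and the fast multipole method cut this cost by approximating the combined force of distant clusters of bodies.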
-
Program Analysis using Graphs:
Compiler writers are always trying to find ways to generate better code.
Interestingly, they have found ways to represent code as graphs or graphs with
weights on the edges to indicate the frequency that one piece of code is
executed just after another one is. Using some graph algorithms the code
can be mapped to memory so that it has good instruction cache performance.
-
Tree Drawing:
There are many cases when we would like to automatically generate a good drawing
of a tree. For example, a big company does a reorganization, so the management
hierarchy must be redrawn. A company that prepares family trees would like
to generate them automatically. Drawing trees is just one example of graph
drawing.
-
Web Page Layout:
It is increasingly popular to have a personal web page that gathers the things
you are interested in. For example, I get my stock prices, local team sports
scores, and weather on my personal web page. It would be nice to have a good
layout for my web page. How can that be done automatically?
-
VLSI Layout:
Another layout problem that is even more complicated is laying out components
on chip automatically. In this case we have to consider not only where the
components will go but where the wires will go that connect them. This is
a huge problem area
but there are subareas that are in themselves quite interesting.
For example, suppose that the components are already
placed; then what is the
best way to route the wires?
-
Exact String Matching:
A common problem that comes up in text databases is, given a relatively short
query string, find all instances of it in a very long string. For example, the
classic unix utility "grep" uses exact string matching. If the length of
the query string is m and the length of the database string is n, then we would
like to answer the query in O(m+n) time. There are some techniques around that
do even better on real data. That is, the time is sublinear!
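One classic O(m+n) method is the Knuth-Morris-Pratt algorithm; a compact Python sketch (the function name is illustrative):

```python
def kmp_search(pattern, text):
    """Return all start positions of pattern in text in O(m+n) time (KMP)."""
    m = len(pattern)
    # fail[i] = length of the longest proper border of pattern[:i+1]
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:                 # full match ending at position i
            hits.append(i - m + 1)
            k = fail[k - 1]
    return hits
```

Boyer-Moore is the classic algorithm that can be sublinear in practice, since it may skip over portions of the text entirely.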
-
Approximate String Matching:
Another common problem that comes up in some text databases and in DNA sequence
databases is that of approximate string matching. Here for each query string
we want to find all instances of very similar strings in a very long string.
A key problem is to make concrete the concept of "very similar".
Udi Manber invented a companion to grep called "agrep" that does approximate
string matching. In this case the measure of similarity between strings is
edit distance: the minimum number of insert and delete operations it takes
to transform one string into the other.
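Edit distance is computed with a classic dynamic program; the sketch below is the usual Levenshtein variant, which also allows substitutions.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions to
    transform string a into string b, in O(|a|*|b|) time and O(|b|) space."""
    prev = list(range(len(b) + 1))     # distances from a[:0] to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # match or substitute
        prev = cur
    return prev[-1]
```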
-
Algebraic Simplification:
In systems like Mathematica, Maple, and MatLab, there are functions that
simplify algebraic expressions, to make them shorter or more elegant.
For example, the expression (x^2 + 3*x + 2)/(x+1) can be simplified
to (x+2). Simplification is certainly more complicated than factoring as
was done in this case. How do different simplification algorithms compare?
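The cancellation in the example can be carried out mechanically with polynomial GCD and long division. A minimal sketch over exact rationals, with polynomials as coefficient lists written highest degree first (all names are illustrative):

```python
from fractions import Fraction

def strip(p):
    """Drop leading zero coefficients (lists are highest degree first)."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_divmod(num, den):
    """Polynomial long division over the rationals: (quotient, remainder)."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in strip(den)]
    quot = []
    while len(num) >= len(den):
        coef = num[0] / den[0]
        quot.append(coef)
        num = [a - coef * b for a, b in zip(num, den)] + num[len(den):]
        num.pop(0)  # the leading term cancels exactly
    return quot or [Fraction(0)], strip(num or [Fraction(0)])

def poly_gcd(a, b):
    """Euclid's algorithm on polynomials; the result is made monic."""
    a, b = strip([Fraction(c) for c in a]), strip([Fraction(c) for c in b])
    while b != [0]:
        _, r = poly_divmod(a, b)
        a, b = b, r
    return [c / a[0] for c in a]

def simplify_rational(num, den):
    """Cancel the common polynomial factor from num/den."""
    g = poly_gcd(num, den)
    return poly_divmod(num, g)[0], poly_divmod(den, g)[0]
```

Full simplifiers in Mathematica or Maple go far beyond this, of course; canceling a common factor is just the piece needed for the example above.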
-
Intersection Detection:
A common problem in computer game design is, given a set of objects, to detect all
intersections quickly. Given a missile and a target, each defined as a polygon,
how do you detect an intersection between the two very quickly? This is
a classic problem in the field of computational geometry where there are many
other interesting problems to find. As a starter problem, can you come up
with a fast algorithm that will detect if a circle (given by its center
and radius) and a rectangle (given by its lower left and upper right corners)
intersect? Now do this for hundreds of such objects.
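The starter problem has a neat closed-form answer: clamp the circle's center to the rectangle to find the nearest rectangle point, then compare that distance with the radius. A sketch (the function name is my own):

```python
def circle_rect_intersect(cx, cy, r, x0, y0, x1, y1):
    """True iff the circle (center (cx, cy), radius r) meets the axis-aligned
    rectangle with lower-left (x0, y0) and upper-right (x1, y1)."""
    nx = min(max(cx, x0), x1)   # clamp the center into the rectangle
    ny = min(max(cy, y0), y1)   # (nx, ny) is the rectangle point nearest the center
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r
```

For hundreds of objects, a sweep-line or a spatial grid avoids testing every pair.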
-
Surface Reconstruction from Polygons:
An MRI machine produces a sequence of parallel 2-dimensional slices.
Slices can be analyzed to find the boundary of an object like a bone. The
boundary
can be represented as a sequence of little line segments
called a polygon. The problem
is to construct a true 3-dimensional surface of the bone. There are several
techniques that do this by taking neighboring slices and constructing a surface
between them. This surface can be just a set of triangular patches.
-
Nearest Neighbor Search:
This is a classic problem that is seen in many different contexts. There is
a fixed database of points.
A query point is given and the job is to find the database point that is
nearest to the query point. Take as an example the FBI database of
fingerprints. A query fingerprint arrives and we want to find the nearest
fingerprint in the database. We probably want the top 20, but this is just
an example. Naturally, we need to make concrete what we mean by the distance
between two fingerprints.
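The brute-force baseline is the natural point of comparison for any cleverer scheme. A sketch using Euclidean distance on points (a real fingerprint system would use a domain-specific distance):

```python
import heapq
import math

def nearest(points, q):
    """Brute-force nearest neighbor: one O(n) scan per query."""
    return min(points, key=lambda p: math.dist(p, q))

def k_nearest(points, q, k):
    """The top-k variant mentioned above, via heap-based selection."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))
```

Data structures such as k-d trees and metric trees answer the same queries much faster on average, which is exactly the kind of comparison a project could evaluate.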
-
Prime Testing:
For RSA public key encryption we need a way of finding large prime numbers.
There are several algorithms that do a good job of this. All of them use
randomness. In 2002 a deterministic polynomial time algorithm (AKS) was discovered. Its time complexity appears to be too high to compete with the randomized algorithms.
-
Public Key Encryption:
There are several competing algorithms to provide public key encryption.
The first and most famous is RSA. There is a new method that uses elliptic
curves that shows a lot of promise.
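For orientation, textbook RSA fits in a few lines. This toy sketch uses tiny primes and no padding, so it is an illustration only, not anything secure:

```python
def rsa_keygen():
    """Textbook RSA with tiny primes: hopelessly insecure, purely illustrative."""
    p, q = 61, 53                # two (secret) primes
    n = p * q                    # public modulus, 3233
    phi = (p - 1) * (q - 1)      # 3120
    e = 17                       # public exponent, coprime to phi
    d = pow(e, -1, phi)          # private exponent: modular inverse (Python 3.8+)
    return (n, e), d

(n, e), d = rsa_keygen()
cipher = pow(65, e, n)           # encrypt the message m = 65 with the public key
plain = pow(cipher, d, n)        # decrypt with the private key
```

Elliptic-curve systems achieve comparable security with much smaller keys, which is one reason they are considered promising.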
-
Network Routing Algorithms:
There are several competing network routing algorithms. Network people
call them shortest path routing and distance vector routing, and
there may be others. Both are
in common use today. Why are both still in use? Isn't one better than the
other?
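Shortest-path (link-state) routing has each router run Dijkstra's single-source shortest-path algorithm over the network map, while distance-vector routing is based on the Bellman-Ford algorithm. A Dijkstra sketch (the graph representation is my own choice):

```python
import heapq

def dijkstra(graph, src):
    """Single-source shortest path distances with a priority queue.
    graph: {node: [(neighbor, cost), ...]} with nonnegative costs."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip it
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```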
-
External Memory Sorting:
Most of us know about the classic "internal sorting algorithms" like quicksort,
mergesort and heapsort. What do you do if the data to sort does not fit in
memory? I was reminded of this problem when I visited AT&T Research Labs
several years ago. AT&T must sort a huge number of phone call records
every day to generate our phone bills.
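The standard technique is to sort memory-sized runs and then merge them. The sketch below keeps the sorted runs in memory for brevity; a real external sort would spill each run to disk and merge from the files:

```python
import heapq

def external_sort(items, chunk_size):
    """External-sort skeleton: sort fixed-size chunks ('runs'), then do a
    k-way merge of the runs with a heap, as the disk merge phase would."""
    runs, chunk = [], []
    for x in items:
        chunk.append(x)
        if len(chunk) == chunk_size:     # a run is as big as memory allows
            runs.append(sorted(chunk))
            chunk = []
    if chunk:
        runs.append(sorted(chunk))
    # heapq.merge lazily merges the sorted runs in a single pass
    return list(heapq.merge(*runs))
```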
-
Web Page Authority: Many current search engines such as Google and
Clever use information about the link structure of the web in order to
compute the authority of a web page in the internet community. Simply
setting the authority of a web page to be the number of other pages
that link to it does not work well since it's easy to create a large
number of web pages which do nothing other than link to a given page
in order to artificially boost the authority of that page. The
current methods used to define the authority of web pages treat the
web as a graph and assign authority scores to web pages based on the
eigenvectors of the adjacency matrix of that graph.
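The eigenvector computation is usually done by power iteration. A PageRank-style sketch (the damping factor, iteration count, and handling of dangling pages are illustrative choices):

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration on the link graph: the score vector converges to the
    principal eigenvector of the damped link matrix.
    links: {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            out = links[p]
            if not out:                       # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                             # share rank among outlinks
                for q in out:
                    new[q] += damping * rank[p] / len(out)
        rank = new
    return rank
```

The damping term is what defeats the link-farm trick described above: a page cannot pass on more authority than it has itself received.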
-
Splay Trees: Splay trees are perhaps the most frequently used
non-trivial data structure in computer science. They are used in the
GCC compiler, Windows NT and the Malloc Library. Splay trees are
easier to implement than red-black trees or other balanced binary tree
structures, yet the insertion operation still takes O(log n)
time in an amortized sense. Surprisingly there are still some
fundamental conjectures about splay trees that are believed to be true
but which have never been proven.
-
Error Correcting Codes: By blowing up the length of a message by a
certain amount, it is possible to reconstruct the message with high
probability even when some subset of the bits in the message are
corrupted. In general, we would like to blow up the length of the
message as little as possible and to be able to correct as many errors
as possible. Error correcting codes give a systematic way to achieve
this goal. These codes are used in a wide variety of real world
applications including networking, compact disks and disk drives.
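The classic small example is the Hamming(7,4) code, which corrects any single-bit error deterministically by adding three parity bits to four data bits. A sketch:

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits; any single flipped bit is correctable."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4            # parity over codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4            # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4            # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    """Recompute the parities; their pattern (the syndrome) is the 1-based
    position of the corrupted bit, or 0 if the codeword is clean."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1     # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]
```

Codes used in practice, such as Reed-Solomon codes on compact disks, trade longer blocks for the ability to correct many errors at once.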
-
SAT Solvers: In the past few years AI researchers have developed
a number of satisfiability solvers that can be used to solve planning or
general optimization problems. Given that SAT is an NP-complete problem,
how is it that SAT solvers seem to do so well on some problems?
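The core of most complete solvers is the DPLL procedure: unit propagation plus backtracking search. A tiny sketch using DIMACS-style integer literals (real solvers add clause learning, branching heuristics, and restarts):

```python
def assign(clauses, lit):
    """Make lit true: drop satisfied clauses and remove the opposite literal
    from the rest. Returns None if an empty (unsatisfiable) clause appears."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = [x for x in clause if x != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def dpll(clauses):
    """Tiny DPLL satisfiability test. Each clause is a list of nonzero
    ints, where -x means 'not x' (DIMACS convention)."""
    while True:                       # unit propagation: forced assignments
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        clauses = assign(clauses, units[0])
        if clauses is None:
            return False
    if not clauses:
        return True                   # every clause satisfied
    lit = clauses[0][0]               # branch on a literal, trying both values
    for choice in (lit, -lit):
        reduced = assign(clauses, choice)
        if reduced is not None and dpll(reduced):
            return True
    return False
```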
-
Primal-Dual Schema: Recently there has been
a lot of success in approximately solving some NP-hard problems
using the primal-dual schema of linear programming. These algorithms are quite
simple, but their analysis relies on understanding the primal-dual nature
of linear programming. It would be interesting to compare these approaches
with other less mathematical techniques such as local search.
Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to ladner]