Visualizations of large databases are especially useful for seeing the data in context. However, such visualizations rely on the human analyst to visually identify relationships and correlations in the data. Data mining techniques, on the other hand, are designed to automatically identify correlations in the data, but cannot present the data in context. There is great interest in combining data mining with visualization. A promising approach is to use data mining techniques to steer the user towards the appropriate visualizations. The goal of this project is to develop interfaces for database visualization that use data mining techniques to help users find the relationships in the data that are most relevant to their problems.
Cartograms are maps that scale the area of a region to reflect some other data (e.g. population). As noted by the cartogram central website, cartographers have developed many different types of cartograms. We are beginning to see algorithms capable of generating some types of cartograms. Many of the algorithms use optimization techniques to design a cartogram that maintains a particular set of constraints. One project in this area is to develop a new algorithm for creating cartograms. The project could focus on identifying a particular set of constraints that are important for creating a particular type of cartogram and then implementing the constraints using standard optimization techniques. For example, you might develop an algorithm for producing Dorling cartograms.
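As a starting point for the Dorling example, here is a minimal Python sketch. It assumes each region is replaced by a circle whose area equals its data value, and it enforces only one constraint: circles must not overlap. The function name dorling_layout and its parameters are hypothetical, and the pairwise push-apart loop is a simple stand-in for a proper optimization technique. A real algorithm would also need constraints that keep circles near their original locations and neighbors.

```python
import math

def dorling_layout(centers, values, max_iters=500, tol=1e-9):
    """Return (x, y, radius) per region after pushing overlapping circles apart.

    centers: list of (x, y) region centroids (assumed input format).
    values:  data value per region; each circle's area equals its value.
    """
    radii = [math.sqrt(v / math.pi) for v in values]
    pos = [list(c) for c in centers]
    for _ in range(max_iters):
        moved = False
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy)
                overlap = radii[i] + radii[j] - dist
                if overlap <= tol:
                    continue  # this pair already satisfies the constraint
                # Direction from circle i to circle j (arbitrary if coincident).
                if dist == 0:
                    ux, uy = 1.0, 0.0
                else:
                    ux, uy = dx / dist, dy / dist
                shift = overlap / 2  # each circle moves half the overlap
                pos[i][0] -= ux * shift
                pos[i][1] -= uy * shift
                pos[j][0] += ux * shift
                pos[j][1] += uy * shift
                moved = True
        if not moved:
            break  # no overlapping pair remains
    return [(x, y, r) for (x, y), r in zip(pos, radii)]
```

Calling dorling_layout([(0, 0), (1, 0)], [4 * math.pi, math.pi]) yields circles of radius 2 and 1 that just touch. With many regions the loop may stop at max_iters before all overlaps are resolved, which is one reason to consider a real optimizer.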
Companies such as A9 and Google as well as a number of research groups are starting to collect digitized images of city streets. Typically the imagery shows houses and storefronts along the streets. Michael Koller's seamless city is a collection of such imagery for some streets in San Francisco. Marc Levoy's group has also collected a nice set of such images. These images have the potential to significantly enhance the usefulness of maps, particularly route maps depicting driving directions. In fact, it is common practice in some hand-designed maps to show a plan view (top-down) of the roads with frontal views of the buildings along the roads. This project would be aimed at developing an algorithm for automatically laying out the city-scan imagery in the form of a map.
We have read and discussed a number of papers on perception of graphs and charts (Cleveland's The Elements of Graphing Data summarizes the most comprehensive studies on this topic). Two well-known principles for improving perceptual effectiveness are to use as much of the data space as possible to depict data and to clearly show scale breaks. The goal of this project is to develop a quantitative metric for the effectiveness of a given graph or chart. Given a graph or chart (either as an XML specification or, if you want a bigger challenge, as a bitmap), compute how well it conforms to the perceptual principles outlined by Cleveland and others.
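To make the idea of a metric concrete, here is a minimal Python sketch of one possible score for a single principle: how much of the data space a scatter plot actually uses. The definition (fraction of each axis range spanned by the data, multiplied across the two axes) and the names axis_fill and data_space_usage are assumptions made for illustration, not something taken from Cleveland. A full metric would combine several such scores.

```python
def axis_fill(values, axis_min, axis_max):
    """Fraction of the axis range [axis_min, axis_max] spanned by the data."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    # Clip the data extent to the visible axis range.
    lo = max(min(values), axis_min)
    hi = min(max(values), axis_max)
    return max(hi - lo, 0.0) / (axis_max - axis_min)

def data_space_usage(xs, ys, x_limits, y_limits):
    """Score in [0, 1]: 1.0 means the data spans the full plotting region."""
    return axis_fill(xs, *x_limits) * axis_fill(ys, *y_limits)
```

For example, data whose y values lie between 40 and 50 on an axis running from 0 to 100 uses only a tenth of the vertical range, so the plot would score 0.1 even if the x axis is fully used.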
Graphs and charts are commonly found in print publications today. Many of these diagrams are poorly designed and aesthetically unappealing. Develop a system for taking bitmap representations of graphs and charts and re-styling them to improve their aesthetic design. One approach might be to automatically convert a bitmap into a higher-level semantic representation and then develop techniques for rendering from this higher-level representation. Another approach might be to use ideas like image analogies to directly transfer the style from one kind of graph or chart to another.
Cutaway illustrations are commonly used to expose the internal structure of 3D objects. Thousands of such illustrations exist in print. One drawback of such illustrations is that they are static and are often cluttered with information as a result. We have recently seen a system for converting static exploded view diagrams into interactive diagrams. Develop a system for similarly converting static cutaway drawings into interactive cutaways in which the user can control where the cutaway is created. The key challenge in this project is to reconstruct the outer shell of the parts that have been cut away in the original drawing. One approach might be to use inpainting and texture synthesis techniques to reconstruct the outer shell. However, completely automated inpainting and texture synthesis are unlikely to adequately capture the structure of the outer surface. So part of this project is to develop semi-automated techniques for controlling inpainting and texture synthesis to produce the desired result.
When using colors to depict nominal (categorical) data, it is essential to pick colors that are perceptually discriminable. Moreover, it should be easy for users to name the colors so that they can talk about the data in terms of color categories. An effective set of colors will simultaneously optimize both of these constraints. Develop an algorithm for choosing a set of colors subject to these constraints. Color could be constrained in other ways as well. For example, there might be a constraint to enforce color harmony in the chosen set. Or perhaps the colors should conform to natural color palettes. Generalize your algorithm to include constraints like these.
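One way to get started on the discriminability constraint is sketched below in Python. It assumes that Euclidean distance in CIELAB is an acceptable rough proxy for perceptual difference, and it uses a greedy farthest-point strategy: starting from the first candidate, repeatedly add the candidate whose nearest already-chosen color is farthest away. The function names and the candidate-list input are hypothetical. The sketch ignores the nameability constraint entirely, so it covers only part of the project.

```python
import math

def srgb_to_lab(rgb):
    """Convert an sRGB triple with 0-255 channels to CIELAB (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear RGB to XYZ, normalized by the D65 reference white.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.0
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def pick_colors(candidates, k):
    """Greedily choose up to k candidates that are far apart in CIELAB."""
    if k <= 0 or not candidates:
        return []
    labs = [srgb_to_lab(c) for c in candidates]
    chosen = [0]  # assumption: seed the set with the first candidate
    while len(chosen) < min(k, len(candidates)):
        best, best_score = None, -1.0
        for i in range(len(candidates)):
            if i in chosen:
                continue
            # Score a candidate by its distance to the closest chosen color.
            score = min(math.dist(labs[i], labs[j]) for j in chosen)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [candidates[i] for i in chosen]
```

Additional constraints such as nameability or harmony could be folded into the score as penalty terms, at which point a global optimizer may work better than a greedy pass.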
Flickr and del.icio.us are popular sites for creating shared taxonomies. Flickr is designed for annotating digital photographs with metadata tags, while del.icio.us is designed for similarly annotating WWW bookmarks. This project is aimed at visualizing such taxonomies to understand the relationships between different tags. Flickr already has a nice visualization showing the 150 most popular tags, where they use size to indicate popularity. How would you extend this approach to show how popularity changes over time? Because users can choose any tag word as an annotation, different tag words with synonymous meanings are sometimes chosen by different users. For example, some people might use the word "car" while others might use "automobile" to annotate the same picture. How would you produce a visualization to identify such synonyms?
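Before designing the synonym visualization, you need some way to find candidate synonym pairs. The Python sketch below shows one possible heuristic, stated here as an assumption rather than a known-good method: two tags are candidates if they never appear on the same item but tend to appear alongside the same other tags. Similarity is measured with the Jaccard index of their co-occurrence sets. The function names, the input format (a list of tag lists) and the threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def neighbor_sets(tagged_items):
    """Map each tag to the set of tags it co-occurs with on some item."""
    neighbors = defaultdict(set)
    for tags in tagged_items:
        for a, b in combinations(set(tags), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def synonym_candidates(tagged_items, threshold=0.5):
    """Return (tag_a, tag_b, score) triples, highest score first."""
    neighbors = neighbor_sets(tagged_items)
    pairs = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # tags used together are treated as distinct concepts
        union = neighbors[a] | neighbors[b]
        score = len(neighbors[a] & neighbors[b]) / len(union)
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])
```

For items tagged ["car", "road", "red"] and ["automobile", "road", "red"], the pair ("automobile", "car") scores 1.0 because both tags share exactly the same neighbors. The resulting scores could drive a layout in which likely synonyms are drawn close together or linked.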
Alan Borning pointed out that one of the challenges for the UrbanSim project is to create visualizations that depict the uncertainty in the simulated results. Develop a visualization that makes it clear that the simulation results are uncertain and show only one possible outcome.
Bertin is famous for his idea of reorderable matrices. If data is a function of two nominal variables, then the rows and columns can be permuted. He advocates permuting the data until the data is clustered and patterns emerge. Develop a tool that does this automatically or semi-automatically.
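As one possible starting point for the automatic version, the Python sketch below uses a barycenter heuristic: sort the rows by the weighted average position of their entries, then sort the columns the same way, and repeat. This is a swapped-in technique chosen for simplicity, not Bertin's own procedure, and the function names are hypothetical. It assumes a rectangular matrix of non-negative numbers.

```python
def _barycenter(values):
    """Weighted mean index of the entries; all-zero vectors sort last."""
    total = sum(values)
    if total == 0:
        return float("inf")
    return sum(i * v for i, v in enumerate(values)) / total

def apply_order(matrix, row_order, col_order):
    """Return a copy of matrix with rows and columns permuted."""
    return [[matrix[r][c] for c in col_order] for r in row_order]

def reorder(matrix, iterations=10):
    """Return (row_order, col_order) that tends to cluster large entries."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order = list(range(n_rows))
    col_order = list(range(n_cols))
    for _ in range(iterations):
        # Sort rows by where their weight sits along the current columns.
        view = apply_order(matrix, row_order, col_order)
        keys = [_barycenter(row) for row in view]
        row_order = [row_order[i]
                     for i in sorted(range(n_rows), key=lambda i: keys[i])]
        # Sort columns by where their weight sits along the current rows.
        view = apply_order(matrix, row_order, col_order)
        keys = [_barycenter([row[c] for row in view]) for c in range(n_cols)]
        col_order = [col_order[j]
                     for j in sorted(range(n_cols), key=lambda j: keys[j])]
    return row_order, col_order
```

On a small matrix with two interleaved groups, such as [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], the reordered matrix becomes block diagonal. The heuristic can get stuck on harder inputs, which is where a semi-automatic tool that lets the user nudge rows and columns becomes interesting.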
Nomograms or nomographs are devices for graphically calculating functions of multiple variables. The devices are especially useful because they visually show how small perturbations in functional values will affect the calculations. The goal of this project would be to build interactive nomograms. Users would input a multi-parameter function and the system would generate the appropriate nomogram as well as an interface for performing computations with the nomogram.
As micro-arrays are used more and more extensively in biology for gene expression analysis (see NYT), the lack of good visualizations is impeding progress. One of the main difficulties is the identification of patterns in very large tables of results. Neither commercial offerings (see GeneMaths) nor research projects (see Jinwook Seo) have struck a good balance between functionality and scalability. Develop a tool that can help biologists during gene expression analysis.
Magnetic resonance imaging (MRI) is still a rapidly improving imaging modality. New acquisition protocols are continually being developed (such as acquiring diffusion tensors) and the accuracy and resolution of the data are being improved. Functional MRI, by adding the time dimension, is making analysis of the data even more difficult. Develop some new ways of looking at new types of MRI data sets.
One way to map "the Internet" is to consider the structure of the backbone router interconnections. Bill Cheswick has been keeping archives of the daily changes in the roughly 100,000 core reachable routers for over three years. Even the static dataset from a single day is a difficult challenge to show comprehensibly, and showing growth and changes over time is an even more interesting problem. The H3 browser for large graphs is a potential resource.