University of Washington Department of Computer Science & Engineering
 Sample projects

Project proposals are due Monday, 4/22. Each proposal must be at most one page, and include:

Below are some project ideas, but anything goes, and you are welcome to come up with your own problem. Some of the problems below refer to a Peer Data Management project called Piazza, while others refer to the XML Toolkit project. A brief description of these two projects comes first, followed by the project ideas.

A. Peer Data Management. Background: What can databases do for peer-to-peer? Grant proposal

A dynamic community of users wishes to exchange and share data. Think of Napster for sharing data in databases instead of sharing songs. Assume for the moment that all peers have relational data to exchange (the alternative is XML data). Some characteristics of such a system:

B. The XML Toolkit. Background: VLDB Submission XMLTK

The toolkit defines an API for highly scalable processing of XPath expressions on an XML data stream. Currently it evaluates up to 1,000,000 linear XPath expressions on XML data at 5.6 MB/s. The API can be used in any application that requires efficient access to streaming XML data. In the toolkit we have used it to build some simple but highly scalable Unix commands for processing XML data: xsort, xtail, xagg(regate).
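To make "linear XPath expressions on an XML data stream" concrete, here is a minimal Python sketch. It is not the toolkit's API or algorithm: it simply tests every expression against every element, so it will not scale the way the toolkit does. The function names compile_linear_xpath and match_stream are hypothetical, and the sketch assumes expressions built only from /, // and * steps over documents without namespaces.

```python
import io
import re
import xml.etree.ElementTree as ET


def compile_linear_xpath(expr):
    """Translate a linear XPath (/, //, names, *) into a regex over tag paths.

    A tag path is the root-to-element chain written as "/a/b/c".
    """
    parts = re.split(r"(//|/)", expr)  # e.g. "/a//c" -> ["", "/", "a", "//", "c"]
    pattern = ""
    for i in range(1, len(parts), 2):
        axis, name = parts[i], parts[i + 1]
        if axis == "//":
            pattern += "(?:/[^/]+)*"  # any number of intermediate elements
        pattern += "/" + ("[^/]+" if name == "*" else re.escape(name))
    return re.compile(pattern)


def match_stream(source, exprs):
    """Count, for each expression, the elements it selects in one pass."""
    compiled = {e: compile_linear_xpath(e) for e in exprs}
    counts = {e: 0 for e in exprs}
    stack = []  # tags of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            for e, rx in compiled.items():
                if rx.fullmatch(path):
                    counts[e] += 1
        else:
            stack.pop()
            elem.clear()  # drop the finished subtree's content to limit memory


    return counts


doc = (b"<lib><book><title>x</title></book>"
       b"<book><title>y</title><note><title>z</title></note></book></lib>")
result = match_stream(io.BytesIO(doc), ["/lib/book/title", "//title", "/lib/*"])
```

Because only the stack of open tags is consulted, the document is never materialized as a whole tree. The per-element loop over all expressions is the obvious bottleneck that a scalable design would have to avoid.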


  1. System catalog for Piazza. Define and build a system catalog for a peer data management system. What will be stored in the system catalog (users? relation names? attribute names?)? What functionality does the catalog support, i.e. what queries? Where do we store the catalog, given that there is no central trusted server? (Maybe we just need to assume one.)
  2. Query optimization in Piazza. Background: /projects/db/zives/mediation/vldb/vldb.pdf. Define a specific (and simple) schema integration formalism; design an algorithm for answering a query given a set of schema definitions; optimize it, test it.
  3. Update propagation in Piazza. Background: /projects/db/p2p/updategrams/436.pdf
  4. Encryption, security in Piazza and the XML Toolkit (this is at the junction between a peer data management project and a project on the XML toolkit). The problem here is to take XML data and encrypt it so as to enforce certain access control policies. Background: Cryptographically Enforced Conditional Access for XML. For example, suppose you have a relation Grades(cid, courseName, sid, studentName, grade). You want to make this available to the peer community, but don't want to allow everyone to access the students' grades: only users who know both the student ID (sid) and the student's name can access that student's grades (the assumption is that such a user can only be the student herself, hence it's OK for her to see the grades). Such access control policies can be enforced by encrypting data in a certain way. The problem here is to build tools that encrypt XML data according to certain encryption policies, and other tools that decrypt data that is accessed according to those policies.
  5. Define events in the XML toolkit. Currently the toolkit does complex XML computations at very high speed: it computes up to 1,000,000 linear XPath expressions over XML data at 5.6 MB/s. Background: VLDB Submission. But it doesn't handle branching XPath expressions or more complex events. The project here is to extend those algorithms to handle more complex events.
  6. XPath containment/equivalence algorithm. Implement an algorithm that checks efficiently whether two XPath expressions are equivalent. Background: Containment and Equivalence for an XPath Fragment.
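As an illustration of the Grades example in project 4, the following Python sketch encrypts a single grade under a key derived from the student ID and the student's name, so that only someone who knows both values can recover it. It is one possible reading of the policy, not the method of the background paper. The function names, the record layout, and the HMAC-based keystream are all assumptions made for this example, and the keystream construction is for illustration only, not vetted cryptography. Note also that sid and name are easy to guess, so a real design would need to consider brute-force attacks.

```python
import hashlib
import hmac
import os


def derive_keys(sid, name, salt):
    """Derive an encryption key and a MAC key from (sid, studentName)."""
    secret = f"{sid}\x00{name}".encode("utf-8")
    material = hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000, dklen=64)
    return material[:32], material[32:]


def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 over nonce plus a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_grade(sid, name, grade):
    """Return a record that can be published to the peer community."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(sid, name, salt)
    plain = grade.encode("utf-8")
    stream = _keystream(enc_key, nonce, len(plain))
    cipher = bytes(a ^ b for a, b in zip(plain, stream))
    tag = hmac.new(mac_key, nonce + cipher, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "cipher": cipher, "tag": tag}


def decrypt_grade(sid, name, record):
    """Return the grade, or None if sid and name do not match the record."""
    enc_key, mac_key = derive_keys(sid, name, record["salt"])
    expected = hmac.new(mac_key, record["nonce"] + record["cipher"],
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        return None  # wrong credentials or tampered record
    stream = _keystream(enc_key, record["nonce"], len(record["cipher"]))
    return bytes(a ^ b for a, b in zip(record["cipher"], stream)).decode("utf-8")


record = encrypt_grade("1234567", "Alice Smith", "3.7")
own_grade = decrypt_grade("1234567", "Alice Smith", record)
other = decrypt_grade("1234567", "Bob Jones", record)
```

A tool for the project would apply something like this to selected elements of an XML document according to a declared policy, rather than to one field at a time.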
