Project IV Option: Choose Your Own Data Structures Adventure

Project Option: Choose Your Own Data Structures Adventure

Overview

There are a variety of data structures which we did not write up in projects that are interesting alternatives to those we focussed on. These support slightly different operations; improve running time (asymptotically or practically); or emphasize simplicity of coding. Some of these are relatively obscure, difficult to code, or difficult to understand. Implementing and studying these gives you an opportunity to expand your repertoire of data structures and to acquire a deep understanding of their strengths (and weaknesses).

Assignment

For this assignment, select an ADT, four data structures (that implement that ADT), and a set of simple applications to test the data structures. The four data structures should include one that we have already implemented and three new ones from the list below. Implement the data structures, instrument them to measure their performance, and compare them on the applications.

Your implementations should be efficient, and you should be able to demonstrate that your data structures achieve their expected asymptotic bounds using your instrumentation for measurement (and hopefully by timing them as well; if not, explain why timing doesn't work). Moreover, you should be able to intelligently compare your implementations using your measurements (time is easy to compare; can you compare your instrumented measurements?).

For example, you might measure the number of nodes accessed in an AVL tree for various sizes and insertion sequences to check its performance. This is easily compared against the number of accesses for another type of search tree or the number of probes in a hash table (but how would the hash function fit in?).

You should have one "reasonable" application (something like word counting, for instance) and several dummy applications designed specifically to show off one data structure or another (e.g., counting frequencies of random numbers might show off hash tables). Gather results for running each data structure on each application and discuss the results.

You may be able to exceed our expectations on this project by implementing more data structures than we mention above; however, more likely to truly impress us would be solid, reusable, and clean implementations, and really interesting and profound test results and applications.

ADTs and Data Structures

The following are the ADTs and data structures that you may implement (we do not mention the ones already focussed on in a project, but one of those is your fourth data structure).

Please verify your choices with us before you start the project. We may recommend against some combinations (like Treap plus Red-Black plus top-down splay tree which does not really represent a wide variety). Most of these are covered in your book; we can supply you with further material and references on the rest.

Priority Queue ADT
- Leftist/skew heaps with increase/decreaseKey
- Binomial queue
- Fibonacci heap
- Pairing heap
- Any efficient dictionary supporting order (those listed below with a *)
Dictionary/Search ADT
- *B-Tree/2-3 Tree/2-3-4 Tree
- *Top-down splay tree
- *Red-black tree
- *Randomized skip list
- *Deterministic skip list
- *AA-Tree
- *Treap
- Quad tree/Oct tree (this is not in your book; ask us for a reference)
- k-D tree
- Trie/Digital search tree (this is not in your book; ask us for a reference)
- Hash table
- Extendible hashing

Application suggestions

Here are some possibly reasonable applications to use as your main app. Some of these are on the hard end of "reasonable", just so you know. Feel free to invent your own (simpler) application.

Many graph algorithms require priority queues (and some require decreaseKey and increaseKey or complex structures like Fibonacci Heaps). Also, there's an application called n-body simulation that uses priority queues. Finally, Huffman encoding, a compression technique, uses priority queues.

Up-trees as well as many graph algorithms assume a simple numbering of nodes which can be used to access values quickly, but in truth this requires a good dictionary. These might make good "reasonable" application for dictionaries!