CSE 326 Data Structures Project 2

CSE 326 Data Structures
Project 2
File Sharing

Due: Monday, 23 April 2001, 10pm

Objectives

Implement an AVL Tree
Compare the AVL Tree, unbalanced BST, and Sorted Vector implementations of the Dictionary ADT

Overview

Suppose we are a member of a group of music lovers. Each music lover has a machine on which they store their favorite mp3 files. Each machine is connected to the other music lover's machines over a network. Wouldn't it be great if there were some easy way to share our favorite songs with our fellow music lovers? And if we could download songs from other people's machines over the network? And if we didn't have to worry about copyright infringement? In this assignment we will simulate how this can be done (well, everything except the legal issues) over a peer-to-peer network (as opposed to a client-server style network).

Of course we will need some way to store our song databases, and for this we will be using the Dictionary ADT. A Dictionary is a collection of data that supports the operations Lookup, Insert and Delete. Furthermore, each data element stored in the Dictionary must have a unique key value that may be used as a parameter to the three Dictionary operations. Here are some examples: in an English Language Dictionary the data are the word definitions and the keys are the words themselves; in a phonebook the data are names, phone numbers and addresses (and occasional advertisements) and the keys are the people's names; any book may be thought of as a dictionary, with each page being the data element and the page number being the key. In this assignment our data will be songs and our keys will be unique song identification numbers.

More detail

Our network will be represented by a collection of nodes (each node representing some music lover's machine). Each node will keep a database of songs (a Dictionary), and will have a predetermined capacity (i.e. maximum number of songs that can be stored in the database). Each node will also have a preferred neighbor node. A node may only initiate network communication with its preferred neighbor, however it may respond to any node that contacts it. In our simulation, all network communication will be either requests or responses to requests. A request is initiated when a user at a given machine requests a song that doesn't already exist in that machine's database. That node will then request the song from its preferred neighbor. If the node that receives the request has the song in its database, it will respond by sending the requested song back over the network. Otherwise, it will in turn request the song from its preferred neighbor. If the song is not found within a specified number of network "hops" from the original machine then the request will time out, and NULL will be returned. If the song is eventually found, it will be copied into the database of every node along the request path (including the node that initiated the request). If any node becomes full then it deletes the oldest song (i.e. the song that's been in the database the longest) from its database before adding the new song. (I know what you're thinking--oldies are the best! But this was the easiest way to work the need for deletion into the assignment.)

At the beginning of the simulation the network configuration will be read from an input file (see the Input files section below) describing the number of nodes, each node's capacity, and each node's preferred neighbor. Then a separate input file will be read for each node, initializing that node with a list of songs for its database. Finally, a request file will be read, listing the requested songs and at which node each request originates. Once all the requests are processed, the program will display the total number of file transfers (see the Viewing the results section below) and terminate.

Teams

You may work in teams of two on this assignment. Please refer to the Teamwork section of the Programming Assignment Guidelines. We strongly encourage you to work in teams! Individuals will be required to complete the same amount of work.

What you must do

Your assignment has the following three parts:

Part 1: Implement an AVL Tree

The Dictionary ADT is by definition implementation independent. We will need to perform many Lookups and Inserts, as well as some Deletes, and we'd like to find an efficient implementation of the Dictionary ADT for this purpose. We will be examining and comparing three implementations: the SortedVectorDictionary, the BSTree, and the AVLTree. The first two are already implemented; your job is to implement the third.

Your AVLTree class must be derived from the provided BSTree class. You will probably also want to create your own AVLNode class derived from the BSTNode class, since the latter doesn't store height information required to keep the balance. You may not add any public methods to AVLTree beyond what BSTree has unless you clear it with me (Nic) beforehand. However, you may add as many private methods as you wish (within reason).

The text explains fairly well how AVL Trees work, and even provides code for some of the methods. I encourage you to refer to this code if you are stuck, but you will not be able to copy it directly because you must conform to the interface of the BSTree class. The text does not cover AVL deletion, however! It is up to you to come up with a deletion algorithm that runs in O(log N) time. Hint: AVL deletion may require O(log N) rotations (in contrast to AVL insert, which requires at most two single rotations).

IMPORTANT: You must name your AVL class "AVLTree" (not "AvlTree" or "AVL_Tree"). Your specification file must be named "AVLTree.h", and your implementation file must be named "AVLTree.cpp". If you do not follow these requirements then your code may not work properly with my test files.

Part 2: Modify the existing code

Currently, the Node class uses only a SortedVectorDictionary as its song database. Implement the command-line options -BST and -AVL to indicate which data structure the Node class should use. If neither of these flags are given then a SortedVectorDictionary should be used. If both of these flags are given the program should output an error message and exit.

Part 3: The README file

A large proportion of your grade will be based on the completeness and clarity of your README file. See the The README section below for a description of what your README file must include.

The project skeleton

As a starting point we've provided a few files:

Dictionary.h (do not modify): This file defines the Dictionary ADT. Dictionaries support Lookup, Insert and Delete.
SortedVectorDictionary.h (do not modify): These files implement the Dictionary ADT using SortedVector.
BSTree.h, BSTree.cpp (do not modify): These implement Binary Search Trees, and all the operations you would expect from them. In addition, pre- post- and in-order traversals are implemented. BSTree derives from Dictionary.
Node.h, Node.cpp: These files define the Node class. Nodes are somewhat modified from the last assignment. You will need to modify these files to complete part 2 of the assignment.
Song.h, Song.cpp (do not modify): These files define the Song class. Songs have unique SongID numbers and song names. For this assignment we won't actually be storing the mp3 data.
Vector.h, Vector.cpp, SortedVector.h, SortedVector.cpp (do not modify): Vectors are used to keep track of the Nodes in the network. SortedVectors are used in the implementation of the SortedVectorDictionary class. These files are improved slightly since project1, so be sure not to use your old copies!
fileshare.cpp: This is the main driver code for the program. You will need to modify the main function to complete part 2 of the assignment.
sort.cpp

All provided files can be found in the project directory:

/cse/courses/cse326/01sp/project2

Input files

There are three types of input files for this project:

network.txt

Some sample input files may be found in the same directory with the other provided files.

Viewing the results

When run in verbose mode, the fileshare program will display a message whenever any of the following happens:

a request is made
a song is located
a song is inserted into some node's database
a song is deleted from some node's database

In any case, once all of the requests have been handled, the total number of file transfers will be displayed (any time a song is transferred over the network as a response to a request it is counted as a file transfer).

Turnin

You are required to turn in your project2 directory, which must include at least the following files:

AVLTree.h, AVLTree.cpp
Your modified Node.h and Node.cpp
Your modified fileshare.cpp
Your Makefile
Your README file
Any additional files you used or changed (but you do not need to turn in sort.cpp)

Turn in your files electronically using turnin:

% cd ~/your326Directory/yourProject2Directory % make clean % cd ../ % turnin -c cse326 -p project2 yourProject2Directory

Note that you do not need to print out anything or hand in any paper copies! Only electronic submissions will be accepted.

The README

Your README file should document your submission (though it does not by any means replace comments in the code!). It should contain the following:

Your name(s) and the name of your team.

Approximately how long the project took you to complete.

A description of how you divided the work between you (omit this if you worked alone).

A brief statement of completeness: does your project meet the requirements? if not, in what ways does it fall short?

Acknowledgement of any assistance you received from anyone but the 326 staff or the Weiss book.

A list of the files used in your project with a brief (anywhere from one sentence to one paragraph) description of each file. You should list all the files, but you only need to describe the ones you created or changed.

A description of any extra credit you attempted (see the If you find this too easy... section below).

Also answer the following questions (justify your answers):

Experiment a little with sort. You may include the results of some of your tests to help you justify your answers to the following questions:
- What is the worst case time complexity of Insert for each of SortedVector, BSTree, and AVLTree?
- What is the worst case time complexity of Inserting N elements into an initially empty structure for the above three data structures?
- Based on your tests, which data structure was the fastest for insert? Does this depend on the number of items you inserted?
- What is the worst case time complexity of outputting all N elements in sorted order for each of the above three data structures?
- In practice, was this faster or slower than inserting N elements? Why? Does this depend on the number of items you inserted?
- Now modify sort.cpp to generate lists of sorted integers (first try increasing, then decreasing), and experiment some more. Do any of your answers to the above five questions change? If so, how are they different? Explain why these changes occurred based on the properties of each data structure.
What did you enjoy about this assignment? What did you hate? What's your favorite song?

Additional utilities

None for this assignment.

If you find this too easy...

For a small amount of extra credit, you may attempt one or more of the following. If you choose to do one of these, it is important that you first have your program working exactly according to the specifications, so that it will work with my test scripts. If you would like to try something not on this list, please email me about it first (bone@cs).

Currently each Node must have exactly one preferred neighbor. Modify both fileshare.cpp and the Node class to allow for an arbitrary number of neighbors. Modify the file format so that all currently valid files are still valid (except that you may require each node to be on a separate line). How does this change the "efficiency" of the system (e.g. the time required to service a request; the total number of file transfers)?
Modify the Song class to allow some actual data (such as mp3 data) and keep track of its size, so that not all songs will be the same size. Modify the input file format to allow for this somehow. Also modify the Node class so that a Node's capacity reflects the number of bytes it can store in its database rather than the number of songs. (In some cases a Node may need to delete several small songs from its database to make room for one big one.) Make note of anything interesting that you discover in your README.
Recursion is very useful, but it is often slower than iteration because it requires the extra overhead of pushing new function calls onto the call stack. Implement iterative (i.e. non-recursive) algorithms for your AVLTree in addition to the normal recursive algorithms. Modify sort.cpp to compare the efficiencies of your implementations (in this case, do turn in sort.cpp).
Other ideas...? (email Nic)

Approximate grading breakdown

30% - correct implementation of AVLTree class, not counting Delete
15% - correct implementation of AVLTree Delete
15% - correctly modified the code according to assignment specifications
40% - your README file

Acknowledgements

The BSTree class is modified from Weiss' BinarySearchTree class. The idea behind this assignment was developed by Alon Halevy, Maya Rodrig, and Nic Bone. Except as mentioned above, this document and all the supplied code were written by Nic Bone.