CSE 326 Data Structures
Project 2
File Sharing

Due: Monday, 23 April 2001, 10pm

Objectives

Overview

Suppose we are a member of a group of music lovers. Each music lover has a machine on which they store their favorite mp3 files. Each machine is connected to the other music lover's machines over a network. Wouldn't it be great if there were some easy way to share our favorite songs with our fellow music lovers? And if we could download songs from other people's machines over the network? And if we didn't have to worry about copyright infringement? In this assignment we will simulate how this can be done (well, everything except the legal issues) over a peer-to-peer network (as opposed to a client-server style network).

Of course we will need some way to store our song databases, and for this we will be using the Dictionary ADT. A Dictionary is a collection of data that supports the operations Lookup, Insert and Delete. Furthermore, each data element stored in the Dictionary must have a unique key value that may be used as a parameter to the three Dictionary operations. Here are some examples: in an English Language Dictionary the data are the word definitions and the keys are the words themselves; in a phonebook the data are names, phone numbers and addresses (and occasional advertisements) and the keys are the people's names; any book may be thought of as a dictionary, with each page being the data element and the page number being the key. In this assignment our data will be songs and our keys will be unique song identification numbers.

More detail

Our network will be represented by a collection of nodes (each node representing some music lover's machine). Each node will keep a database of songs (a Dictionary), and will have a predetermined capacity (i.e. maximum number of songs that can be stored in the database). Each node will also have a preferred neighbor node. A node may only initiate network communication with its preferred neighbor, however it may respond to any node that contacts it. In our simulation, all network communication will be either requests or responses to requests. A request is initiated when a user at a given machine requests a song that doesn't already exist in that machine's database. That node will then request the song from its preferred neighbor. If the node that receives the request has the song in its database, it will respond by sending the requested song back over the network. Otherwise, it will in turn request the song from its preferred neighbor. If the song is not found within a specified number of network "hops" from the original machine then the request will time out, and NULL will be returned. If the song is eventually found, it will be copied into the database of every node along the request path (including the node that initiated the request). If any node becomes full then it deletes the oldest song (i.e. the song that's been in the database the longest) from its database before adding the new song. (I know what you're thinking--oldies are the best! But this was the easiest way to work the need for deletion into the assignment.)

At the beginning of the simulation the network configuration will be read from an input file (see the Input files section below) describing the number of nodes, each node's capacity, and each node's preferred neighbor. Then a separate input file will be read for each node, initializing that node with a list of songs for its database. Finally, a request file will be read, listing the requested songs and at which node each request originates. Once all the requests are processed, the program will display the total number of file transfers (see the Viewing the results section below) and terminate.

Teams

You may work in teams of two on this assignment. Please refer to the Teamwork section of the Programming Assignment Guidelines. We strongly encourage you to work in teams! Individuals will be required to complete the same amount of work.

What you must do

Your assignment has the following three parts:

Part 1: Implement an AVL Tree

The Dictionary ADT is by definition implementation independent. We will need to perform many Lookups and Inserts, as well as some Deletes, and we'd like to find an efficient implementation of the Dictionary ADT for this purpose. We will be examining and comparing three implementations: the SortedVectorDictionary, the BSTree, and the AVLTree. The first two are already implemented; your job is to implement the third.

Your AVLTree class must be derived from the provided BSTree class. You will probably also want to create your own AVLNode class derived from the BSTNode class, since the latter doesn't store height information required to keep the balance. You may not add any public methods to AVLTree beyond what BSTree has unless you clear it with me (Nic) beforehand. However, you may add as many private methods as you wish (within reason).

The text explains fairly well how AVL Trees work, and even provides code for some of the methods. I encourage you to refer to this code if you are stuck, but you will not be able to copy it directly because you must conform to the interface of the BSTree class. The text does not cover AVL deletion, however! It is up to you to come up with a deletion algorithm that runs in O(log N) time. Hint: AVL deletion may require O(log N) rotations (in contrast to AVL insert, which requires at most two single rotations).

IMPORTANT: You must name your AVL class "AVLTree" (not "AvlTree" or "AVL_Tree"). Your specification file must be named "AVLTree.h", and your implementation file must be named "AVLTree.cpp". If you do not follow these requirements then your code may not work properly with my test files.

Part 2: Modify the existing code

Currently, the Node class uses only a SortedVectorDictionary as its song database. Implement the command-line options -BST and -AVL to indicate which data structure the Node class should use. If neither of these flags are given then a SortedVectorDictionary should be used. If both of these flags are given the program should output an error message and exit.

Part 3: The README file

A large proportion of your grade will be based on the completeness and clarity of your README file. See the The README section below for a description of what your README file must include.

The project skeleton

As a starting point we've provided a few files:

Dictionary.h (do not modify)
This file defines the Dictionary ADT. Dictionaries support Lookup, Insert and Delete.
SortedVectorDictionary.h (do not modify)
These files implement the Dictionary ADT using SortedVector.
BSTree.h, BSTree.cpp (do not modify)
These implement Binary Search Trees, and all the operations you would expect from them. In addition, pre- post- and in-order traversals are implemented. BSTree derives from Dictionary.
Node.h, Node.cpp
These files define the Node class. Nodes are somewhat modified from the last assignment. You will need to modify these files to complete part 2 of the assignment.
Song.h, Song.cpp (do not modify)
These files define the Song class. Songs have unique SongID numbers and song names. For this assignment we won't actually be storing the mp3 data.
Vector.h, Vector.cpp, SortedVector.h, SortedVector.cpp (do not modify)
Vectors are used to keep track of the Nodes in the network. SortedVectors are used in the implementation of the SortedVectorDictionary class. These files are improved slightly since project1, so be sure not to use your old copies!
fileshare.cpp
This is the main driver code for the program. You will need to modify the main function to complete part 2 of the assignment.
sort.cpp
This program is independent from fileshare.cpp. sort generates a list of random integers and inserts them into an initially empty SortedVector, BSTree, and AVLTree. Then it uses the nice properties of these data structures to output the integers in sorted order. Each of these six steps (inserting into three data structures and sorting using three data structures) is timed and the results are displayed. You will need to modify this file to complete part 3 of the assignment (yes, the README part).
Makefile
The make command will build your program, as specified in Makefile. make sort will build the sort program. You may need to modify this file to include whatever new files you create.

All provided files can be found in the project directory:

/cse/courses/cse326/01sp/project2

Input files

There are three types of input files for this project:

network.txt
This file describes the network configuration. It consists of triples of integers, the first one being a node number (each node must have a unique number), the second being that node's capacity, and the third being the number of that node's "preferred neighbor" (see the More detail section above). Because of a small bug in my implementation (which isn't worth fixing at this point) the node numbers must be consecutive starting with zero: 0, 1, 2, etc....
songlistN.txt (where N is the node number)
There must be one of these files for each node. This file is a list of pairs. Each pair represents a song that is in the database of node N. The first element of each pair is an integer representing a unique song number and the second element of each pair is the name of that song. Song names must be enclosed in double quotes ("), and may not contain double quotes as part of the name.
requests.txt
This file is a list of requests, which will be processed one at a time during the execution of the program. A request is a pair of integers, the first one being the number of the node that is making the request, and the second being the number of the requested song.

Some sample input files may be found in the same directory with the other provided files.

Viewing the results

When run in verbose mode, the fileshare program will display a message whenever any of the following happens:

In any case, once all of the requests have been handled, the total number of file transfers will be displayed (any time a song is transferred over the network as a response to a request it is counted as a file transfer).

Turnin

You are required to turn in your project2 directory, which must include at least the following files:

Turn in your files electronically using turnin:

% cd ~/your326Directory/yourProject2Directory
% make clean
% cd ../
% turnin -c cse326 -p project2 yourProject2Directory

Note that you do not need to print out anything or hand in any paper copies! Only electronic submissions will be accepted.

The README

Your README file should document your submission (though it does not by any means replace comments in the code!). It should contain the following:

Also answer the following questions (justify your answers):

Additional utilities

None for this assignment.

If you find this too easy...

For a small amount of extra credit, you may attempt one or more of the following. If you choose to do one of these, it is important that you first have your program working exactly according to the specifications, so that it will work with my test scripts. If you would like to try something not on this list, please email me about it first (bone@cs).

Approximate grading breakdown

Acknowledgements

The BSTree class is modified from Weiss' BinarySearchTree class. The idea behind this assignment was developed by Alon Halevy, Maya Rodrig, and Nic Bone. Except as mentioned above, this document and all the supplied code were written by Nic Bone.