Programming Assignment 2: Binary Search Trees

Due: midnight, Wednesday, February 25

The purpose of this assignment is to try out Binary Search Trees as index structures for database systems. To make it easy on all of us, we will use the articles from Project 1 as the records to be stored and retrieve them according to their keywords again. Only the index structure will change.

Instructions

Implement a standard Binary Search Tree (ie. not balanced) for the index structure to be used in storing and retrieving articles. The nodes of the tree should have fields for keyword and for record as in Project 1 and should also have the fields you need for the tree structure and anything else you want to keep track of in a node. You will read in the same file of articles that you did in Project 1 and this time store the keywords and lists of records in the tree structure.

After you have constructed the tree, you will do the following:

Traverse the tree in the appropriate order to output the keywords and their accompanying lists of articles in alphabetical order by keyword. This should be achieved with a recursive method that you write.
Output the following statistics for your tree:
- height
- maximum abs(balance factor) over all nodes
- average abs(balance factor) of all nonleaf nodes
Read the query files we provide (there will be several) and for each one compute and print the following retrieval statistics:
- minimum number of nodes accessed over all keywords in that query file
- maximum number of nodes accessed over all keywords in that query file
- average number of nodes accessed over all keywords in that query file
These query statistics can be accumulated as you do the retrievals. You do not need to print the keywords or articles in this part of the work.

Output

Your output will consist of:

the keywords with their articles in alphabetical order from traversing the tree

the statistics for the tree itself

the statistics for each of the query data sets we give you

Don'ts

Please read these carefully before starting with the assignment:

Doing more than what the assignment asks for is definitely good. However, you must not include these in the execution of the program by default, i.e., you can include some additional command line arguments to enable some additional features that your program has. This is particularly important because doing additional stuff by default may cause your program to fail tests when passed throught automatic testing.
Do not add additional features such as packages etc., to your java program.
You may have modified the program to read some standard input and query file to avoid trouble of typing the entire list again and again. However, you must remember to put off this feature before you turn in your final project.
Make sure that your program prints all output on to the standard output and not to some file that you have defined unless otherwise asked to do so. Having said that please turn off all your debugging output before the turnin.
Do not modify the keywords in the sample output file. For example, if the sample output file indicates "NO TITLES FOUND", your output must print it in the same way. Printing "keyword not present", "title absent", "no information", "title not found" etc would cause the automated tests to fail.
Having said all that, do not worry too much about these. But remember when you decide to do something that the assignment does not ask for you must think again!
Please turn in the source code of your program. Do not convert your program into a Microsoft Word document or like before turn in.

Data

Here are the query files to try query-1, query-2, query-3, query-4.
Here is a sample output file, sampleOutput.txt . Please note that this may not be a CORRECT file that your program generates. It only shows you how the output needs to be presented. Your output must be printed on SCREEN (using standard output) in the same format as above. Please do not modify the CAPITALIZED keywords in the sample file. Your output must be in EXACTLY the same format only the records and statistics values can be altered.
For your convenience, we have provided a script (Perl) which you can use to generate more query files (in case you need more). All it does it takes an input files and generates an output files by randomly selecting a few lines from the input file. You can run it under LINUX. Windows users may need perl installed on their system if they need to run the script. Please note that you are not required to use this script, it is being provided, to save you some effort in generating sample query files.

What and how to turn in

The turnin instructions are same as those for Project 1. Please refer to turnin instructions here.