Programming Assignment 2: Binary Search Trees
Due: midnight, Wednesday, February 25
The purpose of this assignment is to try out Binary Search Trees
as index structures for database systems. To make it easy on all of us, we will
use the articles from Project 1 as the records to be stored and
retrieve them according to their keywords again. Only the index structure
will change.
Instructions
Implement a standard Binary Search Tree (ie. not balanced) for the index structure
to be used in storing and retrieving articles. The nodes of the tree should have
fields for keyword and for record as in Project 1 and should also
have the fields you need for the tree structure and anything else you want to keep
track of in a node. You will read in the same file of articles that you did in Project
1 and this time store the keywords and lists of records in the tree structure.
After you have constructed the tree, you will do the following:
- Traverse the tree in the appropriate order to output the keywords and their
accompanying lists of articles in alphabetical order by keyword. This should be
achieved with a recursive method that you write.
- Output the following statistics for your tree:
- height
- maximum abs(balance factor) over all nodes
- average abs(balance factor) of all nonleaf nodes
- Read the query files we provide (there will be several) and for each one
compute and print the following retrieval statistics:
- minimum number of nodes accessed over all keywords in that query file
- maximum number of nodes accessed over all keywords in that query file
- average number of nodes accessed over all keywords in that query file
These query statistics can be accumulated as you do the retrievals.
You do not need to print the keywords or articles in this part of the work.
Output
Your output will consist of:
the keywords with their articles in alphabetical order from traversing the tree
the statistics for the tree itself
the statistics for each of the query data sets we give you
Don'ts
Please read these carefully before starting with the assignment:
- Doing more than what the assignment asks for is definitely good.
However, you must not include these in the execution of the program
by default, i.e., you can include some additional command line arguments
to enable some additional features that your program has. This is
particularly important because doing additional stuff by default
may cause your program to fail tests when passed throught automatic testing.
- Do not add additional features such as packages etc., to your java
program.
- You may have modified the program to read some standard input and
query file to avoid trouble of typing the entire list again and again.
However, you must remember to put off this feature before
you turn in your final project.
- Make sure that your program prints all output on to the standard
output and not to some file that you have defined unless otherwise asked
to do so. Having said that please turn off all your debugging output
before the turnin.
- Do not modify the keywords in the sample output file. For example,
if the sample output file indicates "NO TITLES FOUND", your output
must print it in the same way. Printing "keyword not present", "title absent",
"no information", "title not found" etc would cause the automated tests to
fail.
- Having said all that, do not worry too much about these. But remember
when you decide to do something that the assignment does not ask
for you must think again!
- Please turn in the source code of your program. Do not convert
your program into a Microsoft Word document or like before turn in.
Data
Here are the query files to try query-1,
query-2, query-3, query-4.
Here is a sample output file, sampleOutput.txt . Please note that this may not be a CORRECT file that
your program generates. It only shows you how the output
needs to be presented. Your output must be printed on SCREEN (using
standard output) in the same format as above. Please do not
modify the CAPITALIZED keywords in the sample file. Your
output must be in EXACTLY the same format only the records and
statistics values can be altered.
For your convenience, we have provided a script (Perl) which you can use to generate more query
files (in case you need more). All it does it takes an input files
and generates an output files by randomly selecting a few lines
from the input file. You can run it under LINUX. Windows users may
need perl installed on their system if they need to run the script.
Please note that you are not required to use this script, it is
being provided, to save you some effort in generating sample
query files.
What and how to turn in
The turnin instructions are same as those for Project 1.
Please refer to turnin instructions here.