Autumn 2000

CSE 373: Programming Project #2
due Wednesday, November 1 (in class)
Search Trees as Database Indexes

Preliminary information

A relational database system stores its data in relational tables which are just sets of records. Each record is a structure having several different fields called attributes. For example, the records we will use have the following attributes:

StudentNumber   LastName   FirstName   Age   Class

Attributes can be unique or nonunique. In the example, StudentNumber is unique, meaning that no two students may have the same number. The other four attributes are nonunique, since several different students may have the same LastName, many may have a common FirstName, and both Age and Class (i.e. number of years in college) will have a lot of students sharing the same values.

To provide rapid access to data, database systems use indexes. Each index allows the database to be searched (efficiently) on a selected attribute called the key of that index. In our example, StudentNumber is the primary key, because it is unique, but indexes can also be built with other keys, unique or not, when fast access on those keys is desired.

Index structures may be composed of search trees, hash tables, and other data structures.
In this assignment you will use binary search trees for the indexes.
In the next assignment, which will be small, but will build on this one, you will use hash tables for indexes.
In real database systems, the most common index structures are B-Trees and hash tables that can handle data on disk efficiently.

Your job is to design a system that reads a relation from disk into memory and creates indexes for the specified attributes, both unique and nonunique. Because we are allowing nonunique keys, the nodes of the search trees must point to lists of all the records with the key values stored in these nodes. (This will make parts of your code reusable - so bear that in mind since we will get to hash tables.) You will be extending the structures and functions the book gives for binary search trees, rather than just copying the author's code. While you're at it, think about how you can improve them and make them more readable (it will be a definite plus for you).

Requirements

Store your database in an array of either structs or objects.
Construct two indexes for your database: the index to the primary key StudentNumber will be index 1, while the index to the nonunique key Age we will call index 2. The keys in both indexes are simply going to be integers. If you want to create indexes on other attributes as well, that's fine, but it is not a requirement.

Design and implement an ADT for extended binary search trees that can handle the following operations:

1. Insert(Record R, Index I)
Here R is a reference (pointer or index) to a record in the database array. The semantics of this operation is that it inserts Record R into Index I which is a primary key index if I = 1 and a nonunique key index otherwise.
For example, if R points to the record
6413578 Buckingham      Christopher     29 4
and I = 1, then 6413578 would be inserted into the StudentNumber index with a pointer to R. For the same record with I = 2, the key 29 would be inserted into the Age index with a pointer to R. Note that for primary keys, trying to insert a duplicate record (with a key K that is already in the primary key index) should result in the following error message:
"INSERT ** DUPLICATE RECORD WITH KEY K".

2. Find(KeyType K, Index I)
This operation should find all data records in Index I with a keyvalue K and then print them (on the screen). For primary keys, at most one record will be found. For nonunique keys, many records (or none) may be found. If none are found, the following message should be printed:
"FIND ** NO RECORDS FOUND FOR KEY K".

3. FindRange(Keytype Low, Keytype High, Index I)
This operation should find all data records in index I with a key value in the range between Low and High, inclusively, and then print them (on the screen). If no records are found, the following message should be printed:
"FINDRANGE ** NO RECORDS FOUND BETWEEN Low AND High".

4. DeletePrimary(KeyType K)
This operation should delete the record with primary key K from Index 1 - the primary key index.
You should make your delete more efficient than the one shown in the book, and do that without using recursion.

5. DeleteSecondary(KeyType Primekey, KeyType K, Index I)
This operation should delete the reference to the record whose primary key is Primekey from Index I, which is not the primary key index. This record can be found in Index I by searching for the secondary key K. If it is the only record with secondary key K, then that key should be deleted from the secondary index, too. If there are other records with secondary key K, then only the list element pointing to the record with primary key Primekey should be deleted.

You will also need a function (or method) to delete a record from the whole database. It should use the routines DeletePrimary and DeleteSecondary, explained above.

6. Delete(KeyType K)
This operation should delete the record with specified primary key from the entire database.
This entails:
1) finding the record using Index 1;
2) removing any reference to this record from your nonprimary index(es);
3) removing primary key K from Index 1;
4) marking the record deleted in the database array.

For example, Delete(6413578) should invoke the following sequence of operations:

- Find 6413578 in Index 1 and get the pointer to its database record.
- Access the database record and find the value of the Age attribute.
- Use the value of the Age attribute to find and remove this record from Index 2, the Age index.
- Delete primary key 6413578 from Index 1, the StudentNumber index.
- Mark the record with 6413578 deleted in the database array.

If there is no record with primary key K in the database, the message
"DELETE ** RECORD NOT FOUND WITH KEY K"
should be printed out.

User Interface

Initially, you will read in all records of the database from a file, store the records in your database array, and create the two indexes for StudentNumber and Age. This process will test Insert.

Once the database and indexes are set up, the user will want to issue commands that either modify the database, or cause a set of records to be printed. You should be able to handle the following user commands:

FI keyname keyvalue (calls Find)

FR keyname lowvalue highvalue (calls FindRange)

DE keyvalue (calls Delete to delete the record whose primary key is specified from the database, including explicit deletions from all indexes and lazy deletions (ie. marking) from the database array)

For the purpose of these user commands, the keyname will be a single character, the first character of the full key name, ie. 'S' for StudentNumber, 'A' for Age. We will provide a set of test commands and the database created from records you have provided us with.

Sample Commands

FI S 1342573
FR S 3467981 3467999
FI A 23
FR A 18 25
DE 1342573

Comments

As was the case with Programming Project #1, your code should include the following block at the top of each source file:

TITLE:
AUTHOR:
DATE:
PURPOSE:
ARGUMENTS:
COMPILER/SYSTEM:

and for each function or class method:

TITLE:
PURPOSE:
ARGUMENTS:

As always, be sure to start early and come to us with questions before the last 24 hours leading to the due date.

The Database

Download the text file, containing the database records students in this class have submitted.

The Test Commands

Download the text file, containing sample queries that should be performed (in the given order) on your database, once you have implemented all necessary functionality, as discussed above.

Turn-in Procedure

Turn in a hardcopy of your program and its output in class on Wednesday, November 1.
Make sure to write your e-mail address on your hardcopy submission.
The output should show all tests that you ran and their results, including all the user commands we gave you and anything you added (such as putting a record with duplicate StudentNumber into the data, so your program can reject it).

There will be NO ELECTRONIC TURN-IN.
However, if TAs have questions about your programs, they may ask you to e-mail them your code, so have it ready just in case.

Good luck!