Spring 2000
CSE 326: Program #2
due Friday, May 5th
B+ Tree Implementation
Implement a B+Tree ADT with three operations: find, insert,
and range-query.
- Insert will insert a data record into the B+Tree while maintaining proper B+Tree structure.
- Find will return a data record given a key.
- Range-query will take two arguments low and high, in the domain of the keys, and return a list of all data records with keys between low and high.
You will test your B+Tree implementation using a dataset provided with the project. The dataset will be on disk, but your B+Tree and the records to which its leaves point will be in memory. The tree will be initially empty. After inserting a large number of records, you will use the find and range-query operations to perform some simple selections of data records. You do not need to implement an elaborate user-interface to your B+Tree. Instead you will write a simple function (probably main) that will insert all the records, run the required selection operations, and output the results, all of which will be very simple once the B+Tree is in place.
The sample data is provided as a file in comma-separated-value (csv) format, and looks like this:
TITLE,FIRST NAME,LAST NAME,ID,EMPLOYER,OCCUPATION,DATE,
AGGREGATE_YTD,AMOUNT,CITY,STATE
Mrs.,John,Broussard,1086894,Info. Requested,Info. Requested,8/17/99,100,100,Abbeville,LA
Mr.,Dennis,Batteen,1037504,Self,Attorney,8/11/99,25,25,Aberdeen,SD
Hon.,William,Arnot,1015866,Info Requested,Info Requested,8/4/99,400,100,Abilene,TX
Mr.,Jeoffrey,Bodenhorst,1065745,Lebanon Apparel,Management,8/25/99,1000,1000,Abingdon,VA
Mr.,Norman,Audette,1019609,,Retired,8/24/99,20,20,Abington,MA
Details
Many of the implementation details will be left up
to you. You are permitted (in fact, encouraged) to use STL classes
like vector, but no tree-related classes. Your primary resource should
be the notes handed out in class which contain pseudo-code for the
B+Tree operations you are asked to implement.
- The B+Tree will be indexed using the 'ID' field. In the sample
data, this key is unique. (See the extensions below.)
- The order of the B+tree should be five: each internal node will
contain 4 keys and 5 pointers. This parameter of your tree should be
a #define constant.
- The leaf nodes of your B+Tree should contain 4 keys and 4 pointers to data
records. The number of pointers should be a #define constant.
- You will design a struct or class to contain a single record of
data, consistent with the provided sample data. Also, you will
probably want to write a simple function to display a single record.
- Your B+Tree class does not have to be templatized: it can be
designed specifically for the record structure you devise above.
- You will be provided with a simple class, FileParser, to assist
in the file I/O. This class will enable you to extract the fields of
the data file one-by-one without commas. The fields will be returned
sequentially until the end-of-file without regard for line breaks. It
will therefore be your responsibility to collect a bunch of fields and
form a data record, but the number of fields is fixed so this is easy.
- We have included a more elaborate Makefile than previously. It
is more complicated, but should be easier to use. It is
well-commented so it should be clear how to add sources. It also has
some handy additional targets like "make clean" which removes all
object files and executables.
- There are too many implementation details and design choices to
list them all here. Feel free to raise general issues on the class
list. If critical additional guidelines emerge, they will be added to
this project description.
Final Output
Once you have your B+Tree working, you should run the following
operations in main or some other function:
- Find and display the data record with key value 1026156.
- Find and display the data record with key value 1070741.
- Find and display all data records with key values between 1062444 and
1062488, inclusive.
- Find and display all data records with key values between 1055027 and
1055500, inclusive.
- Find and display all data records with key values between 1016000 and
1017000, inclusive.
Turn-in procedures
You will be required to turn-in a paper copy of your code, as well as
submit your source code electronically. Your code should include the
following block at the top of each source file:
- TITLE:
- AUTHOR:
- DATE:
- PURPOSE:
- ARGUMENTS:
and for each function or class:
- TITLE:
- PURPOSE:
- ARGUMENTS:
In addition, you must include a detailed explanation of the structures
they have designed in the comments at the top of your source files.
The paper submission will take place in class on Friday May 5th, and
the electronic submission should be completed BEFORE this time.
Please remember that your code must compile on the Linux instructional
machines, and you should submit a makefile along with your source
code.
Turn-in Instructions
Files
All the files for this project are included in this tar file:
project2.tar
Use "tar xvf project2.tar" unpack it.
Here are the individual files:
Source Files:
FileParser.cpp
FileParser.h
Makefile
Sample Data:
sample.csv
Extensions (optional)
- Modify your B+Tree to accomodate indexing by redundant keys (such as AMOUNT or STATE). Test it.
- Devise a convenient way to display the structure of your B+Tree. Implement it, at least for datasets of reasonable size.
Additional Notes
Handout about the dynamic_cast operator
Sample program using vector
Output of the sample program