CSE 373 Data Structures 09sp, Homework 3

Electronic Turn-in due: 11:45pm Thursday, 5/07/09.

For this assignment, (if you wish) you may work with one partner for the programming portion of the assignment. Although it is possible to split the implementation into two pieces, you may be interested in trying pair programming. Pair programming is a technique that is a part of Extreme Programming approaches used in some software companies. Here are a few references on pair programming: [as part of Extreme programming (XP)] [pair programming in Computer Science courses]. Note: You are not required to work in pairs – working individually is fine.

If you plan on working pairs: one partner MUST send email containing the names and email addresses of both partners to Sean by 11:45pm Monday May 4^th at the latest.

Goals:

This project is meant to give you more Java programming experience. In particular, you will practice:

· Reading code & Modifying code

· Using Eclipse (or whatever environment you choose) to work with multiple Java files, command line parameters, reading input files.

· Using Java Interfaces, Exceptions, and Generics

You will get to review and explore the types of Dictionaries we have discussed in lecture:

· Binary Search Trees

· AVL Trees

· Separate Chaining Hash Tables

· Linear Probing Hash Tables

Context:

In lecture, we discussed various uses for the Dictionary ADT. One useful application is to allow users to quickly look up items by their UPC code. This application could be running on a device connected to bar code scanner or behind a web application allowing users to type UPC codes into a web page. Such applications could be used to help consumers identify recalled products, or give them information about nutritional content or allergens. UPC codes themselves are rather interesting from a data representation standpoint. Read more here. A large database of UPC codes for all kinds of things (food, CDs, coffee cups, iPods) can be found here.

Programming:

You are given a Dictionary Interface. You should provide the following implementations:

· Binary Search Trees (BinarySearchTree.java)

· AVL Trees (AvlTree.java)

· Separate Chaining Hash Tables (SeparateChainingHashTable.java)

· Linear Probing Hash Tables (LinearProbingHashTable.java)

In addition, you need to write a simple driver program that will read in the provided data file, and test out your implementations of the interface. For simplicity, you can just read the UPC code (the number that appears before the first comma) into one field (use the UPC code as the “key”) and then read the remaining info into a single string (the “value” associated with the key). You may assume that there will be no duplicate keys in data sets we test your code with. (We will be using our own driver program to test your code so you should be sure that you implement the provided interface and use the filenames listed above for your implementations.) Here is another sample csv file. Although the driver program you write should be able to read in files like this and insert the values into the four data structures, for the purposes of non-extra credit and for timing and statistics checking you really only need to be storing the key – the numeric values that appear as the first item on each line of the file before the first comma. You can generate some more random data files here.

Before you panic too much, note that our textbook has actually provided implementations for almost all of these: BinarySearchTree.java, AvlTree.java, SeparateChainingHashTable.java, and QuadraticProbingHashTable.java Note that you will need to modify the quadratic probing code to be linear probing code. (A link to all code from the textbook can be found here. The code from the book refers to an exception, here is a very basic class you can include to make the code from the book compile.) Your real work is to study the code from the textbook and make a few modifications.

Although many of the methods listed in the Dictionary interface are already implemented for you in the code from the textbook, they are not all there. None of the implementations from the book currently implement the returnStats(int statType) method which returns various statistics about the implementation in question. You will need to modify the code from the book to collect these statistics and to return them when returnStats is called. Details are provided in the Dictionary Interface file. Your implementations should all grow automatically as needed.

Write-up Questions:

This week your write-up involves a comparison of your four dictionary implementations. I would expect these reports to be a couple of pages long, potentially longer if you include many graphs or tables. Exceptional reports may be awarded a small amount of extra credit, so have fun with this! Please prepare a report that addresses the following:

Big-O Analysis: What is the worst case big-O running time of isEmpty, insert, and contains operations on all of your dictionary implementations?
Statistics & Timing comparison: Perform several experiments on various data sets and data set sizes to examine the performance of all of your Dictionary implementations. You should keep track of both statistics and timings for these experiments. The timing will be similar to what you did in homework #2, where you timed pieces of code. Be sure to do this *for several values of N* - feel free to generate (and share) other data files. It is up to you to write and to determine what this client code should be. Just be sure that it exercises your Dictionary “in a reasonable manner” – reasonable here means doing some sequence of operations – insertions, contains. You are not required to explicitly measure individual calls to insert or contains but feel free to do so if interested. Similar to homework #2, graphing your results is recommended, but a table of results will work also.
Comparison: Compare what you see in your experiments, to what you expected to see based on a big-O analysis. (This is similar to what you should have done in homework #2 as well.) In your discussion you should answer these questions:

How useful was big-O for predicting the measured run time or statistics for insert and contains for your implementations?
If your predictions based on Big-O differed substantially from your measured times and statistics, speculate as to why this occurred.
Which of your implementations would you recommend to someone who needs to use a Dictionary? Why is that one preferred? Are there any conditions under which you might suggest using your other implementations?

Testing: Please briefly discuss how you went about testing your dictionary implementations. Feel free to refer to your submitted test files.
Above & Beyond: Please briefly describe any ways that your assignment goes above and beyond the requirements. (see below for ideas)

What to turn in:

Electronic Turn-in should include the following files:

1. Any Java files needed for your Dictionary implementations, at least: BinarySearchTree.java, AvlTree.java, SeparateChainingHashTable.java, LinearProbingHashTable.java)

Java files you used to test your Dictionary implementations. (These could be main methods within the files listed above.)
Java client (Driver) code (code that uses your Dictionary implementations) that you used in your statistics & timing experiments.
Electronic version of your project report in pdf, MS word, or text format (please check with us first if you need to submit in another format).
If you do extra credit, create a separate code bundle that includes all files needed to run and test your code. If you have added functionality, please add it to your Dictionary interface file as well.

If you work with a partner, 1) only one partner needs to submit the code and writeup electronically, be sure that both partners’ names are listed in all files. 2) Include a text file (submit electronically and print out) that describes how you developed and tested your code. If you divided it into pieces, describe that. If you worked on everything together, describe the actual process used – how long you talked about what, how long and in what order you wrote and tested the code. Be sure to describe at least one good thing and one bad thing about the process of working together.

Above and Beyond:

There are many directions you can go for extra credit on this assignment. Here are a few ideas and here is an interface that may help, please add the prototypes for any methods you implement.

· (2 pts) Implement a find method to return actual values associated with a key (currently the interface only has a contains method that returns a Boolean result).

· (2 pts) Implement a findmin or findmax method to return actual values associated with a key (currently the interface only has a contains method that returns a Boolean result)

· (2 pts) Implement a remove method to remove a value from the Dictionary. (Lazy deletion will only result in one point, actual deletion for AVL will be two.)

· (1-2 pts each) Implement more dictionaries! This could include very simple ones (an array or linked list), others from the book, more hash tables, or putting wrappers around Java collections. Whatever you do, the new implementations should implement the Dictionary interface including the getStatistics method – be sure to document what each parameter returns, and you should run several tests and report the statistics you find.

· (2 pts) Implement two more hash functions – one you expect to behave poorly and one you expect to behave better than your current one. Measure them and compare statistics.

I’m open to other ideas as well, send me an email or stop by to get “approval”.