CSE 373 Data Structures 09sp, Homework 3
Electronic Turn-in due: 11:45pm
Thursday, 5/07/09.
For
this assignment, (if you wish) you may work with one partner for the
programming portion of the assignment.
Although it is possible to split the implementation into two pieces, you
may be interested in trying pair programming. Pair programming is a technique that is a
part of Extreme Programming approaches used in some software companies. Here are a few references on pair
programming: [as part of Extreme
programming (XP)] [pair programming in
Computer Science courses]. Note: You
are not required to work in pairs – working individually is fine.
If you plan on working pairs: one
partner MUST send email containing the names and email addresses of both
partners to Sean
by 11:45pm Monday May 4th at
the latest.
Goals:
This project is meant to
give you more Java programming experience.
In particular, you will practice:
·
Reading code & Modifying code
·
Using Eclipse (or whatever environment you choose) to work
with multiple Java files, command line parameters, reading input files.
·
Using Java Interfaces, Exceptions, and Generics
You will get to review
and explore the types of Dictionaries we have discussed in lecture:
·
Binary Search Trees
·
AVL Trees
·
Separate Chaining Hash Tables
·
Linear Probing Hash Tables
Context:
In lecture, we discussed
various uses for the Dictionary ADT. One
useful application is to allow users to
quickly look up items by their UPC code.
This application could be running on a device connected to bar code
scanner or behind a web application allowing users to type UPC codes into a web
page. Such applications could be used to
help
consumers identify recalled products, or give them information about nutritional
content or allergens. UPC codes
themselves are rather interesting from a data representation standpoint. Read more here. A large database of UPC codes for all kinds
of things (food, CDs, coffee cups,
iPods) can be found here.
Programming:
You are given a Dictionary
Interface. You should provide the
following implementations:
·
Binary Search Trees (BinarySearchTree.java)
·
AVL Trees (AvlTree.java)
·
Separate Chaining Hash Tables (SeparateChainingHashTable.java)
·
Linear Probing Hash Tables (LinearProbingHashTable.java)
In addition, you need to write a simple driver program that
will read in the provided data file, and test out
your implementations of the interface.
For simplicity, you can just read the UPC code (the number that appears
before the first comma) into one field (use the UPC code as the
“key”) and then read the remaining info into a single string (the
“value” associated with the key).
You may assume that there will be no duplicate keys in data sets we test
your code with. (We will be using our
own driver program to test your code so you should be sure that you implement
the provided interface and use the filenames listed above for your
implementations.) Here is another sample csv file.
Although the driver program you write should be able to read in files
like this and insert the values into the four data structures, for the purposes
of non-extra credit and for timing and statistics checking you really only need
to be storing the key – the numeric values that appear as the first item
on each line of the file before the first comma. You can generate some more random data files here.
Before you panic too much, note that our textbook has
actually provided implementations for almost all of these: BinarySearchTree.java,
AvlTree.java,
SeparateChainingHashTable.java,
and QuadraticProbingHashTable.java Note that you will need to modify the
quadratic probing code to be linear probing code. (A link to
all code from the textbook can be found here. The code from the book refers to an
exception, here is a very basic class you
can include to make the code from the book compile.) Your real work is to study the code from the
textbook and make a few modifications.
Although many of the methods listed in the Dictionary
interface are already implemented for you in the code from the textbook, they are
not all there. None of the
implementations from the book currently implement the returnStats(int statType) method which returns
various statistics about the implementation in question. You will need to modify the code from the
book to collect these statistics and to return them when returnStats is
called. Details are provided in the Dictionary Interface file. Your implementations should all grow
automatically as needed.
Write-up Questions:
This
week your write-up involves a comparison of your four dictionary
implementations. I would expect these
reports to be a couple of pages long, potentially longer if you include many
graphs or tables. Exceptional reports
may be awarded a small amount of extra credit, so have fun with this! Please prepare a report that addresses the
following:
- Big-O Analysis: What is the worst case big-O running time of
isEmpty, insert, and contains operations on all of your dictionary
implementations?
- Statistics & Timing comparison: Perform several
experiments on various data sets and data set sizes to examine the
performance of all of your Dictionary implementations. You should keep
track of both statistics and timings for these experiments. The timing will be similar to what you
did in homework #2, where you timed pieces of
code. Be sure to do this *for
several values of N* - feel free to generate (and share) other data
files. It is up to you to write and
to determine what this client code should be. Just be sure that it exercises your
Dictionary “in a reasonable manner” – reasonable here
means doing some sequence of operations – insertions, contains. You are not required to explicitly
measure individual calls to insert or contains but feel free to do so if
interested. Similar to homework #2,
graphing your results is recommended, but a table of results will work
also.
- Comparison: Compare what you see in your experiments, to what
you expected to see based on a big-O analysis. (This is similar to what you should have
done in homework #2 as well.) In
your discussion you should answer these questions:
- How useful was big-O
for predicting the measured run time or statistics for insert and
contains for your implementations?
- If your predictions
based on Big-O differed substantially from your measured times and
statistics, speculate as to why this occurred.
- Which of your
implementations would you recommend to someone who needs to use a
Dictionary? Why is that one
preferred? Are there any
conditions under which you might suggest using your other
implementations?
- Testing: Please briefly discuss how you went about testing
your dictionary implementations.
Feel free to refer to your submitted test files.
- Above & Beyond:
Please briefly describe any ways that your assignment goes above
and beyond the requirements. (see below for ideas)
What to turn in:
Electronic
Turn-in should include the following files:
1.
Any Java files needed for your Dictionary implementations,
at least: BinarySearchTree.java, AvlTree.java, SeparateChainingHashTable.java, LinearProbingHashTable.java)
- Java files you used to test
your Dictionary implementations.
(These could be main methods within the files listed above.)
- Java client (Driver) code
(code that uses your Dictionary implementations) that you used in your
statistics & timing experiments.
- Electronic version of your
project report in pdf, MS word, or text format (please check with us first
if you need to submit in another format).
- If
you do extra credit, create a separate code bundle that includes all files
needed to run and test your code.
If you have added functionality, please add it to your Dictionary
interface file as well.
If
you work with a partner, 1) only one partner needs to submit the code and writeup electronically, be sure that both
partners’ names are listed in all files.
2) Include a text file (submit electronically and print out) that
describes how you developed and tested your code. If you divided it into pieces, describe
that. If you worked on everything
together, describe the actual process used – how long you talked about
what, how long and in what order you wrote and tested the code. Be sure to describe at least one good thing
and one bad thing about the process of working together.
Above and
Beyond:
There are many directions you can go for
extra credit on this assignment. Here are a few ideas and here is an interface that may help, please add the prototypes
for any methods you implement.
·
(2 pts) Implement a find method to
return actual values associated with a key (currently the interface only has a
contains method that returns a Boolean result).
·
(2 pts) Implement a findmin or
findmax method to return actual values associated with a key (currently the
interface only has a contains method that returns a Boolean result)
·
(2 pts) Implement a remove method
to remove a value from the Dictionary. (Lazy deletion will only result in one
point, actual deletion for AVL will be two.)
·
(1-2 pts each) Implement more
dictionaries! This could include very
simple ones (an array or linked list), others from the book, more hash tables,
or putting wrappers around Java collections.
Whatever you do, the new implementations should implement the Dictionary
interface including the getStatistics method – be sure to document what
each parameter returns, and you should run several tests and report the
statistics you find.
·
(2 pts) Implement two more hash
functions – one you expect to behave poorly and one you expect to behave
better than your current one. Measure
them and compare statistics.
I’m open to other ideas as well, send
me an email or stop by to get “approval”.