CSE373 Data Structures Wi06, Homework 4, Parts 1 and 2
Programming problems Part 1 due online Thursday,
2/16/06, at 11 pm.
Written problems Part 1 due at the beginning of class, Friday, 2/17/06,
2:30 pm.
Programming problems Part 2 due online Thursday,
2/23/06, at 11 pm.
Written problems Part 2 due at the beginning of class, Friday, 2/24/06,
2:30 pm.
No late assignments will be accepted.
This assignment delves more into trees, covering balanced binary trees
and applications. In particular, it works with real-world data,
nutritional information for thousands of food items courtesy the USDA.
The assignment has many files provided:
- BasicOrderedSet.java - interface
for an ordered set. DO NOT CHANGE THIS FILE. Change your other files
accordingly.
- BasicOrderedSubTreeSet.java
- Code provided for you to implement a subset. DO NOT CHANGE THIS FILE.
- FieldsDefinition.java - Code
provided for you to create database records with many fields. DO NOT
CHANGE THIS FILE.
- FoodDataComparator.java - Code
to make Comparators on arbitrary fields. DO NOT CHANGE THIS FILE.
- FoodData.java - Code to read in the USDA
Nutrition Data. DO NOT CHANGE THIS FILE.
- Scanner.java - Parsing code. DO NOT CHANGE
THIS FILE.
- TreesAreFun.java - Demo code. You will
add on to the demos here.
- ABBREV.txt - The USDA Nutrition Data.
- fields.txt - The fields of the nutrition data.
- SR18_doc.pdf - The documentation on the
specification for the format is inside. Look in it for fun and to thank
us for taking care of the processing for you. :)
- ComparisonCounter.java - simple
class to count compareTo calls
As usual, your code should not only work properly, but it also should be
high-quality.
In particular:
- Write clean code - prefer simple and clear to complicated, unless there
is a compelling reason, which should be clearly explained in comments. Avoid
duplicate code or other situations where the same thing appears in more than
one place. Use sensible names, indent clearly, and format your code so it
blends cleanly with existing code.
- Use appropriate JavaDoc comments to specify all public classes, interfaces,
methods, and any other features of your code that need documenting.
What to do - Written Problems Part 1
Answer these questions on a separate sheet of paper. We suggest you do these
problems early as part of your preparation for the programming part of the
assignment.
- It is possible to insert a permutation of 1,2,3,4,5,6,7
into an initially empty splay tree such that, upon completion, the root is
1, its right child is 2, and so on -- a linear chain, until a leaf of 7.
State the order of insertion and draw the steps along the way.
(The inserts must be done in the style of the book and lecture, which may
be different from online applets.)
- Suppose the numbers 1 to n are inserted into a splay tree in the
fashion described above, resulting in a linear chain. In terms of big-Oh,
how much total work was done? How much work is it now to access and splay
the leaf n? (This is an example of how splay trees have good amortized
performance.)
- Draw an AVL tree that requires three sets of rotations to balance
following a delete. You should strive to find an example that is not
needlessly complex. (The slides show an example requiring two sets of
rotations, but it is not the simplest.) Mark the node to be deleted.
- Look at the code in BasicOrderedSubTreeSet.java. What is the
big-Oh of size()? How could it be made faster? What changes would you
have to make regarding access and state? (This was a case where I went
for the slow easy implementation rather than the fast hard implementation.)
- Include your output from demo3 and demo4 along with a description
of why your demo4 is cool.
What to do - Written Problems Part 2
Answer these questions on a separate sheet of paper. We suggest you do these
problems early as part of your preparation for the programming part of the
assignment.
- What does the code (node.parent.right == node) say about
node? (This is a big hint to making a more efficient Splay tree)
- Draw a red-black tree that does not satisfy the AVL height
property. List the properties that must hold for a tree to be a red-black
tree and explain why the tree satisfies these properties. Show where the
tree violates the AVL height property and explain why.
- Benchmarking
(This question is worth more than the other questions, so spend some
quality time here.)
- List your results from the benchmarking tests.
Discuss the differences between trees that are roughly balanced and are
unbalanced, both theoretically and according to your findings. Discuss
the performance of the splay tree as compared to the vanilla BST for the
insertions. Do your findings make sense?
- Discuss the performance of
the splay tree as compared to the BST tree for the uniformly distributed
comparisons. Why does the Splay tree perform better than the BST for the
tree created from ascending-order data?
- Discuss the performance of the
splay tree as compared to the BST tree for a smaller set of comparisons.
Why does the Splay tree perform dramatically better?
- Book problems 9.5, 9.6
What to do - Programming Part 1
In this assignment, we'll get to play with some real data! But first we
need to build some tools so we can do interesting things. In the process
of building these tools, it's best to test/experiment with simple examples
before using the full data. Then, once our tools are working, we'll try
doing some interesting queries.
First, we're going to update our code from last week. It might sound like
a lot of items, but they should be very little code once you understand
what's going on.
- Correct anything broken with your add, contains, and size from last
week. Change your BasicOrderedTreeSet to implement BasicOrderedSet
instead of BasicSet. Get help early if you're having trouble. You will
not do well on the assignment if this code is broken.
- Read about Comparators. Look up String's Comparator that ignores
case and think about why it's useful. Add a state that's a Comparator.
Add a constructor to BasicOrderedTreeSet that takes in as a parameter a
Comparator. Write a method that returns the comparator for the set. For
the no-parameter constructor, use the default comparator:
private static class DefaultComparator implements Comparator {
public int compare(Object a, Object b) {
return ((Comparable) a).compareTo(b);
}
}
public BasicOrderedTreeSet() {
this(new DefaultComparator());
}
- Replace calls to compareTo with calls to the comparator's compare.
- Add a parent link to your TreeNode (or whatever you call it). For
any TreeNode in the tree, it should point to its parent, except for the
root node, whose parent link is null.
- Modify add to assign the parent link correctly. (You're encouraged
update remove too, but we won't grade you on it.)
- Rewrite the iterator. Now that there are parent links, the only
state you'll need is a single TreeNode. No stack. Your iterator's
constructor should be O(height), .hasNext() should be O(1), and .next()
O(height).
- Add a method subSet to the implementation that returns a
BasicOrderedSet that is the subset within the specified range. Basically
create a new BasicOrderedSubTreeSet with the proper parameters and return
it. Read the Java SDK for more details on how subSet and subsetted sets work.
Now let's play!
Run the main in TreesAreFun.java. Look at the output and
what the demo1 and demo2 methods are doing. Fill in the demo3 method. It
should make a new BasicOrderedSet containing the subset of items from
McDonald's. Then it should iterate that subset into a set with a different
Comparator so that you can then extract the subset of those McDonald's items
with between 100mg and 1000mg of cholesterol. It should then iterate through
the final subset, printing out the name of the item, the number of
calories, the fat content, and the amount of cholesterol.
Fill in the demo4 method that does something cool and finds something
interesting in the data. Calories in alcohol? Lycopene content?
What to do - Programming Part 2
Now we're going to implement the Splay tree, which does self-balancing,
and benchmark how it compares to the unbalanced tree in different
situations.
Make a new file BasicOrderedSplayTreeSet.java where the class
BasicOrderedSplayTreeSet extends BasicOrderedTreeSet. (Please don't
misspell/change the name of the class, because it makes your TAs grumpy.)
- Write a method that takes in a TreeNode (or whatever you call it) and
splays the tree containing that node such that the node is now the root.
(You'll likely need several helper methods.)
- Override add and contains to perform the appropriate splay after a
contains and an add. You should not splay because of an iterator or
calling subSet().
- Write a benchmarking program using sets of ComparisonCounters.
Make an example of insertion of items in ascending order and random order
(for example use the numbers 1 to 100000, but tweak the size so that it's
not too fast nor too slow. you might need to go down to 10000 or less
if you're getting stack overflow). Time how long it takes to do each in
milliseconds and note for each how many times compareTo was called. Do
the example on both the BasicOrderedTreeSet and the BasicOrderedSplayTreeSet.
- For all four cases, BST ascending, BST random, Splay ascending, Splay
random, do a large number of random contains (try a million and tweak
from there). The contains calls should be uniformly distributed over the entire range of items in the sets.
Obtain timing results and compareTo counts.
- For all four cases, BST ascending, BST random, Splay ascending, Splay
random, do a large number of contains. The contains calls should be
chosen randomly from numbers that are multiples of 1000 (change this to
100 or 10000 as you feel appropriate so that between 5 and 20 items are
selected repeatedly). Obtain timing results and
compareTo counts. Turnin your code for all these tests.
What to turn in
Turn in your programming assignment online using the turnin form linked from
the course web site (when available). You should include the Java source code
for the interfaces, implementations, and testing code.
Turn in your Part 1 written questions at the
beginning of class, Friday, 2/17.
Turn in your Part 2 written questions at the
beginning of class, Wednesday, 2/22.