Experiments

Analyzing the asymptotic behavior of your code.

Table of contents

  1. Overview
  2. Experiments and Discussion Prompts
    1. Experiment 1: ArrayMap Remove Order
    2. Experiment 2: BinarySearchTreeMap Put Order
    3. Experiment 3: Chaining with Different hashCodes vs. AVL trees
    4. Experiment 4: Load Factor Thresholds for Resizing
    5. Experiment 5: Initial Internal Chain Capacities
  3. Submission

Overview

In this part of the assignment, you will run some code that uses the data structures you implemented in the previous part, and then analyze the resulting behavior. This will generally involve the following steps:

  1. Read through the experiment methods and make a prediction about their resulting behavior.
  2. Run the experiment code and save an image of the resulting plot. (A window displaying the plot should open automatically as the experiment runs.)
  3. Answer the discussion questions for each specific experiment in order to explain how or why the experiment produced those results.

The hypothesis portions of your write-up (your predictions based on reading the code) will be graded on completion, so just write down your honest thoughts before running each experiment. Please submit all generated plots along with your answers. The post-experiment analysis portions will be graded on the clarity of your explanations and whether they match up with your plots.

Your completed write-up must be in PDF format. If you have a partner, you will work on this write-up and submit together.

Warning
If your code from the previous part is buggy, your experiments may produce unexpected results. Make sure you pass all tests on Gradescope before you attempt to analyze any experimental results.

Experiments and Discussion Prompts

Task
For each of the experiments, make a prediction according to the bolded prompt in your write-up before running the experiment or opening its discussion prompts below.
Then, run the experiment, expand the discussion prompts, and answer the questions inside.
Also include the PNGs of the plots inside your write-up PDF.

Include 1–2 sentences of justification for your predictions, and try to keep answers to discussion questions to a maximum of 2–3 sentences.

Experiment 1: ArrayMap Remove Order

This experiment plots the runtime of removing all entries from your array map, but with two different orderings. Read through the code for each test method of the experiment and describe the order in which each one removes entries from the dictionary, in terms of the locations of the removed elements in the dictionary’s array. Then, predict the order of growth (constant, logarithmic, linear, n log(n), quadratic, exponential) of each method’s runtime as its size increases.

Discussion (after running the experiment)

Explain why the orderings differ so much in runtime. Be sure to mention what happens to the data structure’s internals during each remove call and how that affects the runtime of future remove calls.
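For reference when framing your explanation, here is a minimal sketch of the kind of array-backed remove this experiment exercises. All names here are hypothetical, and your ArrayMap may handle the vacated slot differently (for example, by moving the last entry into it rather than shifting); check your own implementation before answering.

```java
import java.util.Objects;

// Hypothetical sketch of an array-backed map's remove; illustrative only.
public class ArrayMapSketch<K, V> {
    private Object[] keys = new Object[8];
    private Object[] values = new Object[8];
    private int size = 0;

    @SuppressWarnings("unchecked")
    public V remove(K key) {
        // Linear scan: the cost depends on where the key sits in the array.
        for (int i = 0; i < size; i++) {
            if (Objects.equals(keys[i], key)) {
                V removed = (V) values[i];
                // Shift everything after index i left by one slot:
                // O(size - i) extra work for this removal.
                System.arraycopy(keys, i + 1, keys, i, size - 1 - i);
                System.arraycopy(values, i + 1, values, i, size - 1 - i);
                size--;
                keys[size] = null;
                values[size] = null;
                return removed;
            }
        }
        return null; // key not present
    }
}
```

Under this sketch, notice how both the scan cost and the shift cost change depending on whether you repeatedly remove the element at the front of the array or the one at the back.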

Experiment 2: BinarySearchTreeMap Put Order

This experiment plots the runtime of adding values to a BST map (sorted based on the keys) with three different key orderings.

Read through each test method, and describe the order in which each one adds keys to the dictionary. Then, predict the order of growth (constant, logarithmic, linear, n log(n), quadratic, exponential) of each method’s runtime as its size increases.

Discussion (after running the experiment)

You should notice that test1 and test2 take similarly long to run, and both take much longer than test3. Why are test1 and test2 similarly slow even though their key orderings differ? Describe the state/shape of the resulting BinarySearchTreeMap to explain your answer.
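To ground your description of the tree’s shape, here is a minimal sketch of naive (unbalanced) BST insertion; the names are hypothetical and your BinarySearchTreeMap’s details may differ.

```java
// Hypothetical sketch of unbalanced BST insertion; illustrative only.
class NaiveBstMap<K extends Comparable<K>, V> {
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> left;
        Node<K, V> right;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node<K, V> root;

    public void put(K key, V value) {
        root = put(root, key, value);
    }

    private Node<K, V> put(Node<K, V> node, K key, V value) {
        if (node == null) {
            return new Node<>(key, value);
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            node.left = put(node.left, key, value);   // smaller keys go left
        } else if (cmp > 0) {
            node.right = put(node.right, key, value); // larger keys go right
        } else {
            node.value = value;                       // existing key: replace value
        }
        return node;
    }
}
```

Note that without rebalancing, inserting keys in sorted order takes the same branch at every step, so the height of the tree (and therefore the cost of each put) grows linearly with the number of keys.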

Experiment 3: Chaining with Different hashCodes vs. AVL trees

This experiment explores how different hash code functions affect the runtime of a ChainedHashMap and compares it to the runtime of an AVLTreeMap.

First, we’ll look at the tests involving the ChainedHashMap: test1, test2, and test3. Each uses a different class (FakeString1, FakeString2, and FakeString3, respectively) as keys for a ChainedHashMap. Each fake string class represents a string (by storing an array of chars), but each has a different implementation of the hashCode method. Read over these implementations of hashCode, then examine the plots below, which illustrate the output distributions of these hash functions.
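The assignment’s actual FakeString hashCode implementations aren’t reproduced here, but as a rough, hypothetical illustration of the kinds of differences to look for, compare these three hash functions over a char array:

```java
// Hypothetical hash functions over a char[]; these are NOT the assignment's
// FakeString implementations -- read those for your actual prediction.
final class HashCodeSketches {
    // Degenerate: every string hashes to the same value, so every key
    // lands in the same bucket.
    static int constantHash(char[] chars) {
        return 42;
    }

    // Weak: order-insensitive, and bounded by (max char value) * length,
    // so outputs cluster in a narrow range and collide often.
    static int sumHash(char[] chars) {
        int h = 0;
        for (char c : chars) {
            h += c;
        }
        return h;
    }

    // Stronger: polynomial accumulation (the same idea as String.hashCode),
    // which spreads outputs across the full int range.
    static int polynomialHash(char[] chars) {
        int h = 0;
        for (char c : chars) {
            h = 31 * h + c;
        }
        return h;
    }
}
```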

Histograms

Each plot shows the distribution of outputs of one hashCode method across 80,000 randomly generated fake strings.

Histogram for FakeString1:

[Plot: histogram of FakeString1 hashCode values]

Histogram for FakeString2:

[Plot: histogram of FakeString2 hashCode values]

Histogram for FakeString3:

[Plot: histogram of FakeString3 hashCode values]

Now, predict which test method will have the fastest and slowest asymptotic runtime growths.

Discussion (after running the experiment)
  1. Note that the distributions of the hash codes used in test1 and test2 look very similar in shape, but in the graphs you produce, you should see that test2 tends to run much faster than test1. Why is that? Hint: look at the x-axis scale labels.
  2. You should also see that FakeString3 produces the fastest runtimes when used as keys for ChainedHashMap. Why is this? Explain using the histogram above, citing at least one observation about the histogram.

Now, we’ll consider test4, which uses AVLTreeMap. This test uses a fake string class that does not provide a working hashCode method, but that is comparable in a way that mimics regular String comparisons. You should see that the AVL-tree-based implementation performs much better than a chained-hash implementation used with bad key objects, but appears to be only about a constant factor worse than chained hashing with good keys.

  1. What functionality does ChainedHashMap require from its keys? What else must be true of its keys in order for the dictionary to perform well, if anything? (For a generic illustration of these key contracts, see the sketch after this list.)
  2. What functionality does AVLTreeMap require from its keys? What else must be true of its keys in order for the dictionary to perform well, if anything?
  3. Which of these two has a runtime with a better (more slowly growing) asymptotic growth: the AVLTreeMap, or the ChainedHashMap (with good keys)? (Base your answer on the properties of these implementations, not just the results of this graph.)
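As a generic illustration of the two key contracts in play here (hypothetical code, not this assignment’s fake string classes): a chained hash map needs keys with consistent hashCode and equals methods, while a balanced search tree needs keys with a total ordering.

```java
import java.util.Arrays;

// Hypothetical key class sketching both contracts; illustrative only.
final class Word implements Comparable<Word> {
    private final char[] chars;

    Word(char[] chars) {
        this.chars = chars.clone();
    }

    // Needed by a chained hash map: equal keys must report equal hash codes,
    // and a good spread of hash codes is what keeps the chains short.
    @Override
    public int hashCode() {
        return Arrays.hashCode(chars);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Word && Arrays.equals(chars, ((Word) o).chars);
    }

    // Needed by a tree map: a consistent total order over keys.
    // Arrays.compare performs a lexicographic comparison (Java 9+).
    @Override
    public int compareTo(Word other) {
        return Arrays.compare(this.chars, other.chars);
    }
}
```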

Experiment 4: Load Factor Thresholds for Resizing

This experiment tests the runtime for ChainedHashMap’s put method across different values of the load factor threshold for resizing.
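As a reminder of the mechanism this experiment varies, here is a minimal sketch of a load-factor check inside put; all names are hypothetical and your ChainedHashMap’s details may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the resize trigger inside a chained hash map's put;
// names are illustrative, not the assignment's exact code.
class LoadFactorSketch<K> {
    private final double resizingLoadFactorThreshold;
    private List<K>[] chains; // stand-in for the internal ArrayMap chains
    private int size = 0;

    @SuppressWarnings("unchecked")
    LoadFactorSketch(double resizingLoadFactorThreshold) {
        this.resizingLoadFactorThreshold = resizingLoadFactorThreshold;
        this.chains = (List<K>[]) new List[16];
    }

    // Sketch: stores bare keys; a real map stores (key, value) entries and
    // replaces the value when the key is already present.
    public void put(K key) {
        // Load factor = entries / buckets. The threshold controls how full
        // the table may get before it grows.
        if ((double) (size + 1) / chains.length > resizingLoadFactorThreshold) {
            resize(2 * chains.length);
        }
        int bucket = Math.floorMod(key.hashCode(), chains.length);
        if (chains[bucket] == null) {
            chains[bucket] = new ArrayList<>();
        }
        chains[bucket].add(key);
        size++;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newBucketCount) {
        // Allocate a larger table and rehash every entry into it.
        List<K>[] newChains = (List<K>[]) new List[newBucketCount];
        for (List<K> chain : chains) {
            if (chain == null) {
                continue;
            }
            for (K key : chain) {
                int bucket = Math.floorMod(key.hashCode(), newBucketCount);
                if (newChains[bucket] == null) {
                    newChains[bucket] = new ArrayList<>();
                }
                newChains[bucket].add(key);
            }
        }
        chains = newChains;
    }
}
```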

First, answer the following prompts:

  1. Briefly describe the difference between test1 and test2.
  2. Which test do you expect to run faster? Which, if either, do you expect to be asymptotically faster?

Discussion (after running the experiment)
  1. Why is using a load factor threshold of 300 slow? Explain at a high level how this threshold affects the behavior of ChainedHashMap.put.
  2. This was not part of this experiment, but explain why very small load factor thresholds (much less than 0.75; e.g., 0.05) might be wasteful.

Experiment 5: Initial Internal Chain Capacities

This experiment tests the runtime for inserting elements into ChainedHashMap with different initial capacities for the internal ArrayMap chains.

Briefly describe the differences between the three tests.

Discussion (after running the experiment)

Note that although the runtimes for the three initial capacities are similar, using an initial capacity of 2 results in the fewest spikes and is generally the fastest. Why would a lower initial ArrayMap capacity result in a more consistent and faster runtime?
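One way to reason about this, using hypothetical names: every new chain pays for its initial backing array up front, while growing later is amortized-cheap, so a small initial capacity wastes less work on the many chains that stay tiny. A minimal sketch:

```java
import java.util.Arrays;

// Hypothetical sketch of a chain's backing-array growth; illustrative only.
class ChainSketch {
    private Object[] entries;
    private int size = 0;

    ChainSketch(int initialCapacity) {
        // A map with thousands of buckets pays this allocation once per chain,
        // even for chains that only ever hold one or two entries.
        this.entries = new Object[initialCapacity];
    }

    void add(Object entry) {
        if (size == entries.length) {
            // Doubling keeps the amortized cost per add constant, so starting
            // small costs little even for the chains that do grow.
            entries = Arrays.copyOf(entries, 2 * entries.length);
        }
        entries[size++] = entry;
    }
}
```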

Submission

Task
Submit your write-up PDF to Gradescope as a group.

When you submit, mark which pages correspond to the questions we ask. After submitting, the partner who uploaded the PDF must add the other partner as a group member on Gradescope. Do not have each member submit individually. The process for adding group members is the same as for the programming portion; if you need a reminder, here’s the video demonstrating the process.

We may deduct points if you do not correctly submit as a group or do not properly assign your pages/problems.


After you’re done, remember to complete the individual feedback survey for extra credit, as described on the project main page.