CSE 373, Summer 2019: P1 Part 2

Table of Contents

  1. Summary

  2. Working with the experiment plot window

  3. Experiments

  4. Individual feedback survey

Summary

Due Thursday, July 18 at 11:59pm.

In this half of the assignment, you will run some code that uses the data structures you implemented in the previous part, and then analyze the resulting behavior. This will generally involve the following steps:

  1. Predict the resulting behavior of the experiment to the best of your judgement.
  2. Run the experiment code and save an image of the resulting plot. (A window should automatically open while/after running the experiment that displays a plot of the experiment results.)
  3. Answer the discussion questions for each specific experiment in order to explain how or why the experiment produced those results.

The hypothesis/predict-based-on-the-code portions of your write-up will be graded based on completion, so just write down your honest thoughts before running the experiment. Please submit all plots generated along with your answers. The post-experiment analysis portions will be graded based on the clarity of your explanations and whether they match up with your plot.

You and your partner will work together on this write-up. Your completed write-up MUST be in PDF format. Submit your write-up to Gradescope here as a group. Make sure to log in to Gradescope with your @uw.edu email address. When you submit, mark which pages correspond to the questions we ask, and afterwards, the partner who uploaded the PDF must add the other partner as a group member on Gradescope. Do not have each member submit individually. A video showing how to do this can be found here. We may deduct points if you do not correctly submit as a group or do not properly assign your pages/problems.

You won't be writing any new code in this part of the project; instead, you'll be running the experiment files in src/main/java/analysis/experiments/.

Additionally, here are a few more files that you might want to review while completing the assignment (note that this is just a starting point, not necessarily an exhaustive list):

  • src/main/java/datastructures/lists/DoubleLinkedList.java
  • src/main/java/datastructures/dictionaries/ArrayDictionary.java

Fix bugs from part 1

The results of these experiments depend on your previous code, so it's important to fix any remaining bugs before doing the final runs of the experiments. You will receive feedback on your part 1 code shortly after its due date (usually over the following weekend), and you should use that feedback to determine which fixes are needed. In the meantime, however, you may still run the experiments: if your data structures passed all the provided tests, you can assume that their behavior is reasonably close to the expected behavior.

Working with the experiment plot window

When you run most of the experiment files, a new window should open that plots the results of the experiment as it runs. (Only experiment 8 is different; it will finish running all of its code before producing the plot.) For many of the experiments, the x-axis represents the size of the data structure (see the experiment notes for more details), and the y-axis represents the time taken to run the code.

The window should look something like this:

[Image: experiment plot window]

Saving plots

To save a plot, right-click anywhere on it and an options menu should appear. Select 'Save as' > 'PNG...' to save the file.

[Image: experiment saving options window]

Note: we've set PNG files to be ignored in your git repos to reduce the download size when we clone all your repositories for grading. If you want to share these images, you will have to use other means.

Other settings

There are a couple more useful options for the plots, such as renaming and hiding results. To access them, right-click the plot and select 'Series Properties...' to open the menu shown below. Here you can hide results that grow too quickly, allowing you to view the other results more easily. You can also rename their legend entries here.

[Image: experiment series properties menu]

Feel free to also explore the other menu options. If you wish to rename the plot titles or axes labels, you may do so from the 'Chart Properties' window.

Note: to zoom in on a specific region of your plot, click and drag right and down to select the region. If you want to reset the zoom, drag in any direction that is not a combination of right and down. You can also zoom through the right-click menu.

Experiments

For each of the experiments, answer the bolded questions (and questions in the orange boxes) in your write-up. Also include the PNGs of the plots inside your write-up PDF.

Note: As a reminder, you may ignore Checkstyle errors in this code, since this isn't code that you'll be submitting. If the errors bother you though, we've updated our Checkstyle configuration to remove the Checkstyle errors in the experiment files for this assignment, but you may need to re-download the configuration file if you have it saved locally.

Experiment 1: Double linked list iterator

In this first experiment, we'll compare the actual runtime of accessing elements of a DoubleLinkedList by index against the runtime of using its iterator. First, open up the code for the experiment and read through the three test methods that the experiment will be calling. Based on the code, predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the runtime of each test method as listSize increases, and record it in your write-up.

Now, run the experiment file. The main method in each experiment will open a plot window and plot the output of each test method when called with each of the input values from the listSizes array.
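For intuition, the three traversal styles being timed plausibly look something like the sketches below, assuming your DoubleLinkedList exposes size(), get(), and iterator() as in part 1. (These are hypothetical sketches; the class name, method names, and running sums are illustrative, not the actual experiment code.)

import java.util.Iterator;

import datastructures.lists.DoubleLinkedList;

public class Experiment1Sketch {
    // Style 1: index-based access via get.
    public static long sumWithGet(DoubleLinkedList<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);     // access element i through get
        }
        return sum;
    }

    // Style 2: explicit iterator.
    public static long sumWithIterator(DoubleLinkedList<Integer> list) {
        long sum = 0;
        Iterator<Integer> iter = list.iterator();
        while (iter.hasNext()) {
            sum += iter.next();     // each call advances a single step
        }
        return sum;
    }

    // Style 3: for-each loop, which compiles down to the same iterator.
    public static long sumWithForEach(DoubleLinkedList<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }
}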

Questions (after running the experiment)

  1. You should see that the test that uses get has a much higher asymptotic growth than the others—why is that?
  2. Similarly, why are the runtimes for the iterator and the for-each loop so similar? (You may want to hide the series for the get test in order to see the other two more clearly in the plot window.)

Experiment 2: Double linked list get

This experiment will check the runtime of calling get with various indices on your doubly-linked list. Again, read through the code for each test method of the experiment and predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the runtime of each as listSize increases.

Questions (after running the experiment)

You should see that get runs very quickly for two regions of your list: the front and the back. It should get slower as the indices approach the center of the list. For each test method, the calls to get at every iteration end up hitting some particular "case" of the runtime for that listSize. Which case (best or worst) does each method hit?
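For reference, a common way to implement get on a doubly-linked list is to walk from whichever end is closer to the requested index, roughly as in the hypothetical sketch below; whether a call hits the best or worst case then depends entirely on which indices a test requests. (This sketch is illustrative only; the Node, front, back, and size names are assumptions, and the relevant details are in your own part 1 code.)

// Hypothetical helper inside DoubleLinkedList: walk from the nearer end.
private Node<T> getNode(int index) {
    if (index < size / 2) {
        Node<T> curr = front;       // index is closer to the front
        for (int i = 0; i < index; i++) {
            curr = curr.next;
        }
        return curr;
    } else {
        Node<T> curr = back;        // index is closer to the back
        for (int i = size - 1; i > index; i--) {
            curr = curr.prev;
        }
        return curr;
    }
}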

Experiment 3: Array dictionary remove order

This experiment plots the runtime of removing all key-value pairs from your array dictionary, but with two different orderings. Read through each test method, and describe the order that each one removes from the dictionary in terms of the locations of the removed elements in the dictionary's array. Then, predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the runtime of each as listSize increases.

Note that we deliberately avoid referring to "cases" in this section, since talking about cases is weird/confusing when the dictionary size is changing. The underlying intuition is very similar, however: for each list size, one of the two removal orderings ends up significantly slower than the other, to the point that we can see it has worse asymptotic growth.

Questions (after running the experiment)

Explain why the orderings differ so much in runtime. Be sure to mention what happens to the data structure's internals during each remove call and how that affects the runtime of future remove calls.
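As a concrete reference point for your explanation, many array-based dictionaries remove a pair by shifting every later pair one slot to the left, roughly as in the hypothetical sketch below. (Your part 1 implementation may do something different, such as swapping the last pair into the gap, so check your own code; the pairs and size names are assumptions.)

// Hypothetical removal from a packed internal array by shifting.
private void removeAt(int index) {
    for (int i = index; i < size - 1; i++) {
        pairs[i] = pairs[i + 1];    // every later pair moves one slot left
    }
    pairs[size - 1] = null;         // clear the now-duplicate last slot
    size--;
}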

Experiment 4: BST dictionary put order

This experiment plots the runtime of adding values to a BST dictionary (sorted based on the keys) with three different key orderings. Read through each test method, and describe the order that each one adds to the dictionary. Then, predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the runtime of each as dictionarySize increases.
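For instance, the key orderings could look something like the hypothetical sketches below; the real orderings are in the experiment's test methods. (The BinarySearchTreeDictionary usage here is illustrative, and assumes java.util.Random is imported.)

// Hypothetical key orderings for filling a BST dictionary.
BinarySearchTreeDictionary<Integer, Integer> ascending = new BinarySearchTreeDictionary<>();
for (int i = 0; i < dictionarySize; i++) {
    ascending.put(i, i);             // keys inserted in increasing order
}

BinarySearchTreeDictionary<Integer, Integer> descending = new BinarySearchTreeDictionary<>();
for (int i = dictionarySize - 1; i >= 0; i--) {
    descending.put(i, i);            // keys inserted in decreasing order
}

BinarySearchTreeDictionary<Integer, Integer> shuffled = new BinarySearchTreeDictionary<>();
Random rand = new Random();
for (int i = 0; i < dictionarySize; i++) {
    shuffled.put(rand.nextInt(), i); // keys inserted in random order
}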

Questions (after running the experiment)

You should notice that test1 and test2 take similarly long to run, and take much longer than test3. Why are test1 and test2 so similarly bad in runtime even though the ordering is different? Describe the state/shape of the BinarySearchTreeDictionary to explain your answer.

Experiment 5: Data structure memory usage

In this experiment, we'll plot the memory usage of DoubleLinkedList and ArrayDictionary objects. Comparing these two isn't very useful in practice, since we would almost never need to compare an implementation of a list to an implementation of a dictionary, but we may as well plot them both at the same time for convenience.

Both test methods simply construct a DoubleLinkedList or ArrayDictionary of the input size, respectively; predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the memory usage of each as the size increases.
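(For reference, memory measurements on the JVM are typically taken through the Runtime API, roughly as sketched below; the experiment's actual measurement code may differ.)

// Rough sketch of estimating current memory usage on the JVM.
Runtime runtime = Runtime.getRuntime();
long usedBytes = runtime.totalMemory() - runtime.freeMemory();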

Questions (after running the experiment)

You should notice that the memory usage for DoubleLinkedList increases by a constant amount as the size increases. This is in contrast to ArrayDictionary, whose memory usage usually also increases by a constant amount but occasionally spikes. What is the cause of the spikes in the data for ArrayDictionary?

Experiment 6: Master theorem code

Recall that the master theorem allows us to easily calculate the big-\(\Theta\) runtime of recursive code. Using the master theorem, predict the big-\(\Theta\) runtimes for both tests. Next, compare your predicted big-\(\Theta\) runtimes with the actual runtime plot of the experiment. Do these results reflect what you predicted, and why or why not?
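As a refresher, for recurrences of the form \(T(n) = a\,T(n/b) + n^c\), the master theorem states:

\[ T(n) \in \begin{cases} \Theta(n^c) & \text{if } \log_b(a) < c \\ \Theta(n^c \log n) & \text{if } \log_b(a) = c \\ \Theta\left(n^{\log_b(a)}\right) & \text{if } \log_b(a) > c \end{cases} \]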

Questions (after running the experiment)

Only two of the three master theorem cases were used. Give one example of actual constants a, b, and c that you could input to the mystery function to trigger the third, remaining case.

Experiment 7: Recursive reverse

Now that you've had some master theorem practice, let's look at how you would apply it to a real example: a method to reverse lists. The methods reverse1 and reverse2 both do the same thing, but work slightly differently.

Give a recurrence formula for the runtime of each of the reverse methods. Do not worry about finding the exact constants for the non-recursive term. (For example, if the running time is \(A(n) = 4A(n/3) + 25n\), you need to get the \(4\) and the \(3\) right but you don't have to worry about getting the \(25\) right.) Then, use the master theorem to predict the big-\(\Theta\) runtimes. Again, compare your predicted big-\(\Theta\) runtimes to the actual runtimes. Do the results match what you expected, and why or why not?
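For instance, in the sample recurrence above, \(a = 4\), \(b = 3\), and \(c = 1\); since \(\log_3(4) \approx 1.26 > 1\), the master theorem gives \(A(n) \in \Theta(n^{\log_3(4)})\). Your recurrences for reverse1 and reverse2 can be checked against the plotted runtimes the same way.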

There are no additional questions for this part.

Experiment 8: Memory usage for code in loops

The experiments in this section are a little different from the others—the test functions only run a single time each and populate an array rather than outputting a single value.

The goal of the experiment is to demonstrate that the memory usage of code that runs multiple times (in this case, code in a for loop) changes depending on how the memory is used or saved.

Note: some of these experiments will use a nontrivial amount of your computer's memory (~500MB). If the code takes more than a few minutes to run or your computer runs out of memory, you should be able to produce a sufficient plot by turning down the max size a bit.

Part a

In this half of the experiment, we'll first compare the memory usage of two different functions that create arrays in a loop. Read through the code for each and predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of the memory usage for each as maxSize increases.

(Note that the freq parameter is used solely for determining when to save values to plot, and has no relation to the memory usage.)

Results

The plot probably looks a bit different from what you'd expect. The first test has a nice quadratic curve, but the second should be a little bit... weird. It turns out that Java doesn't immediately clear out old memory that's no longer used (e.g., the arrays in the experiment whose only pointer got erased, and are thus no longer accessible by code). Instead, it waits a while until it feels the need to clean up, then clears out all of that memory at once—a process called garbage collection—which is why the memory usage for the second test seems to grow similarly to that of the first, before suddenly dropping off.
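In other words, the two tests plausibly differ in roughly the following way (a hypothetical sketch with an assumed maxSize variable, not the experiment's actual code):

// Style 1: every array stays reachable through 'saved', so none of them
// can ever be garbage collected, producing the quadratic curve.
Object[][] saved = new Object[maxSize][];
for (int i = 0; i < maxSize; i++) {
    saved[i] = new Object[i];
}

// Style 2: each iteration overwrites the only pointer to the previous
// array, so the old arrays become unreachable garbage for Java to collect.
Object[] current = null;
for (int i = 0; i < maxSize; i++) {
    current = new Object[i];
}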

Part b

So, what's the actual asymptotic growth of the memory usage of that second test, then? In this half of the experiment, we'll play around a bit more with Java and its garbage collection to find out. We won't ask you to do any prediction this time around, since Java's garbage collector is not a main learning objective of this course, nor is it something we expect you to know about.

It may still help to read over the test functions, though. The first test function for this half is actually just the second test function from the previous half, while tests 2 and 3 are similar functions with an extra System.gc call included at different points. You may have noticed this method in the code comments from the previous half as well: it makes a request to Java to perform garbage collection (although Java has no obligation to comply). The inclusion of this method call should bring the memory usage much closer to our expected behavior.

Explanations

After you run the experiment, you should see that the memory usage for the new tests (2 and 3) is much lower than for the old test 1; probably so much lower that you can't really see their results properly. Provide an image of all 3 plotted together, then disable test 1 from the "Series Properties" window (accessed by right-clicking on the plot) and provide an image of the plot showing results for only tests 2 and 3.

These two lines should both show approximately-linear growth in memory usage as maxSize increases, which is what we'd expect to happen when Java actually discards the unused memory. There's one last interesting thing to note about these results, though: test 2 runs garbage collection before getting the memory usage, while test 3 runs garbage collection right after getting the memory usage. This essentially means that for each x-value, the y-value for test 3 is the memory usage before garbage collection and the y-value for test 2 is the memory usage after garbage collection, giving us a visual representation of the amount of memory cleared up by garbage collection.
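In sketch form, the difference between tests 2 and 3 comes down to where the garbage-collection request sits relative to the measurement (hypothetical code; the variable names are illustrative):

Runtime runtime = Runtime.getRuntime();

// Test 2's order: request garbage collection, then measure,
// recording the memory usage after cleanup.
System.gc();
long usedAfterGc = runtime.totalMemory() - runtime.freeMemory();

// Test 3's order: measure, then request garbage collection,
// recording the memory usage before cleanup.
long usedBeforeGc = runtime.totalMemory() - runtime.freeMemory();
System.gc();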

Here are two important disclaimers now that you've seen how Java handles memory:

  1. Whenever we talk or ask about memory usage of algorithms, we usually imply that unused memory will get cleaned up immediately, as if we're always running Java's garbage collection. If we don't do this, it becomes very difficult to talk about memory usage with any consistency, so we'll instead assume that whatever language we're implementing the algorithm in will perform garbage collection regularly (or if we're using a language that requires manual memory management, that our code is implemented in a way that cleans up its own memory properly).
  2. You shouldn't actually call System.gc everywhere in your code. There's a very good reason that Java's garbage collector is implemented as it is—take a look at the console output while running the second half of the experiment: you should see that the runtime of the tests that manually request garbage collection is multiple times worse than the version that just lets Java do its thing. Garbage collection is a very slow process, so Java usually waits a while or until it thinks it needs more memory before running garbage collection.

Part c

Consider the following method, and predict the complexity class (constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential) of its memory usage as its input increases:

public static Object[] test3(int n) {
    Object[] output = new Object[10];
    Object[] arr = output;
    for (int i = 0; i < n; i++) {
        arr[0] = new Object[10];
        arr = (Object[]) arr[0];
    }
    return output;
}

Now repeat for this method:

public static Object[] test4(int n) {
    Object[] output = new Object[10];
    for (int i = 0; i < n; i++) {
        output[0] = new Object[10];
        output = (Object[]) output[0];
    }
    return output;
}

Individual feedback survey

After finishing the project, take a couple of minutes to complete this individual feedback survey on Canvas.

Deliverables

The following deliverables are due on Thursday, July 18 at 11:59pm.

Before thinking you're done, be sure to double-check that: