CSE 373 Data Structures 12au, Homework 6

Individual assignment – no partners.

Electronic Turn-in due: 11pm Thursday, December 6, 2012.

This assignment allows you to explore the performance of several of the sorting algorithms we have learned using a program called Sort Explorer.

The Sort Explorer allows you to specify: an unnamed sort, an input size, and the ordering of the input. The program returns to you: the number of comparisons, number of times elements get moved, and the running time of the sort. By changing the type of input you give to the Sort Explorer, you can make an informed guess about which sort is being used.

The algorithms used are the following:

• Insertion sort

• Selection sort

• Heap sort

• Merge sort

• Quick sort (Simple)

• Quick sort (Optimized)

You are provided with a testing program called the Sort Explorer. The Sort Explorer allows you to run any of the algorithms, but it doesn't tell you which algorithm is which; it only refers to them by Greek letters. Your goal is to run the program several times with different list properties, observe the behavior of the sorting algorithms, and discover which is which. You use the Sort Explorer as follows:

1. Enter YOUR Student Number: When you run the Sort Explorer, you will first be prompted to input your student number. It is very important that you input YOUR student number accurately in order for us to grade your assignment correctly. Please be sure to type all 7 digits (including leading zeroes).

2. Create a list to sort: After entering your student number you should create a list to sort. You do this by specifying the total number of elements (you can type in a number or use the slider). You can also select how the elements should be ordered:

a. InOrder: elements' values go from smallest to largest; so they are already in ascending sorted order.

b. ReverseOrder: elements' values go from largest to smallest; so they are in backwards or descending order.

c. AlmostOrder: elements are almost InOrder (ascending order) but not quite.

d. Random: elements are given initial values randomly.

Once you have selected a size and an ordering you must click on the “Create the List” button to create the list. You can verify that your list has been created by seeing the list properties you specified displayed as “N:” and “DataType:” in the “Experimental Results” area.

3. Select a sorting algorithm to run: Once you have created a list, you should pick a sort to run on that list.

All sorts you run will use the same initial list until you hit “Create the List” again. (Thus you can use the same random or almost sorted list multiple times with different algorithms.)

Repeat steps 2 and 3 to gain information and help you determine which sort is which.

The Details

The sorting algorithms: We discussed all of these algorithms in lecture and you can find sample code in the textbook and in this sample file from the textbook. The code the Sort Explorer is using is not identical to what is in this file, but looking at it should help you gain a more concrete idea of approximately what the code looks like for each sort. Here are some more details about a few of the sort implementations used by the Sort Explorer:

• Heap sort: uses buildheap to create a binary maxheap and writes the values into the original array from back to front. Similar to the code from the book (and the optimized version discussed in class), this code uses the original array for everything.

• Merge sort: for each merge of two sub-arrays, it allocates a new temp array and merges into that, then copies back to the original array

• Quick sort (Simple): uses the first element in the range as the pivot for partitioning (does not use median of 3 partitioning), does not use a cutoff

• Quick sort (Optimized): uses median of three as the pivot for partitioning, uses a cutoff value and switches to insertion sort

The experimental results: The lists are created as arrays of ints in Java.

• Comparisons: Every time an element in the array is compared to something else (a temp value, another location in the array), this counts as a comparison.

• Movements: Any instruction that copies or moves a value from the array either to a temporary location or to another location in the array itself counts as a movement. Note this means doing a “swap” of two values, both in the array, would require 3 movements: temp=a[i]; a[i]=a[j]; a[j]=temp.

• Total time: Time is measured in milliseconds. It is recommended that you do several runs of a sort on a given list and take the median value (throw out the lowest and highest time). In order to get the most accurate runtimes possible, you may want to close other programs that might be taking a lot of processing power and memory and affecting the Sort Explorer runtimes.

Running SortExplorer: The sort explorer jar file is here: SortExplorer.jar . Once downloaded, most likely you can just click on the .jar file in a directory and it will run. If not, more information on running a jar file can be found here. For large input sizes and/or particular input patterns, some algorithms may run out of stack space. You will notice this because you click on the name of a sort and it returns without the experimental results changing. This should return fairly quickly (as opposed to waiting awhile for something to finally return) allowing you to hit other buttons – you will not be able to hit other buttons while waiting for a long-running algorithm to finish. If you run the jar file from the command line you will see the stack overflow messages reported there. If you run it by just clicking on the .jar file you will not see the messages. If you are interested in changing the runtime stack size you can do so when running from the command line. To increase stack size to 2 MB type: java -jar -Xss2m SortExplorer.jar

Don’t forget your student ID!! Remember to type in your student ID before running any experiments! Your student ID must be visible in the upper window before you run any sorts. Different student numbers will have different mappings of sorts to buttons. Be sure you are collecting data for YOUR student number or else we will not be able to grade your homework!

What to turn in

Turn in a written report (pdf or word are the preferred file formats) explaining which sort is which, including supporting data to back up your claims. The following supporting data is required, for every sorting algorithm:

1. Full statistics for input size 16: The runtime, number of comparisons, and movements for Random, InOrder, and ReverseOrder for an array of size 16. A table will be sufficient as your answer for question 1, no explanation required..

2. Runtimes for at least four nontrivial input sizes: In addition you must record the runtime for several (at least four) nontrivial input sizes for Random, InOrder, and ReverseOrder. A nontrivial input size means one where the runtime is more than just a few milliseconds. Feel free to also record number of comparisons and data movements for all of your input sizes, but it is only required for size 16. Feel free to record data on AlmostOrder but that is not required. A table will be sufficient as your answer for question 2, no explanation required.

When you can, increase the input size until the runtime takes at least one second. If input sizes are chosen well, they will allow you to see enough variation in the statistics to give you strong evidence to support your answers to (3) and (4) below. You do not need to use the same input sizes for every algorithm, since this might not produce the most useful data; but if you intend to compare two algorithms directly by their statistics, you should use the same input size and list properties for that comparison.

Each algorithm should have its own table. You will probably find it useful to create your tables in a spreadsheet program and just cut and paste values from the SortExplorer directly into the spreadsheet. Later on you can transfer your tables into a document. Here is a sample table, input sizes shown are just an example, and you are not required to put your values for N=16 into the table but you may find it convenient to keep all your data in one place. Shaded boxes show data that is optional (but may be useful) to collect.

Sort Alpha:

Input Size	Random			InOrder			ReverseOrder
	ms	Comp	Mvmt	ms	Comp	Mvmt	ms	Comp	Mvmt
16	0	5	23	0	75	15	0	80	45
10000	71	34234	234	123	7567	6767	237	4545	4545
20000	463	3434234	2323	456	87567	5767	628	5454	45453
30000	1527	9342342	52245	1891	755675	6756	2500	53454	54545
40000	2007	23423234	523424	2543	756776	7567	4370	545455	54545

3. Worst Case Big-O: Your estimation of the worst case Big-O runtime for each algorithm. You must gather appropriate data that clearly shows the relevant patterns of the algorithm's runtime. This likely means measuring different algorithms at different input sizes, since some algorithms are much faster than others.

No explanation is required here, however, if the runtime data you collected in question 2 is in conflict or in no way suggesting the runtimes you give for question 3 then you should consider collecting more runtime data. We don't expect curves that show a perfect nlogn or anything like that though, just reasonable data. The recommendation of "measuring different algorithms at different input sizes" is a good one to be mindful of. Try to pick values of n that help you discern the running time of a particular algorithm.

4. Which algorithm is it? Your statement of which sorting algorithm it is, along with your reasoning. You should base your reasoning on the following:

• growth rate (Big-O) of the algorithm's runtime as input size changes

• speed or number of comparisons and data movements of the algorithm compared to the other algorithms

• changes in behavior of the algorithm, if any, for different input orderings

If errors caused by the algorithm caused you to deduce that algorithm, explain why you suspect that algorithm would cause the error.

If all of your claims are based only on relative speeds of the algorithms ("sort Beta was the fastest, so therefore it's X-Sort"), you will not receive full credit. You will also not receive full credit if you use "process of elimination" to state which algorithm is being used ("I have already figured out the other four algorithms, so the last algorithm must be X-Sort").

It is only question 4 that requires you to give explanations. A good answer to question 4 is a strong argument. I would try for two, if possible three strong points that support your claim of what sort this is. A reason can be based on any of the criteria that are listed above. But if you are using the growth rate as one of your arguments - "I think this is Insertion sort because the runtime data I collected looks like O(n²)", then you should be prepared to say a bit about WHY exactly it is that you think it looks like O(n²) say, as opposed to O(nlogn) or some other runtime. But there are also lots of other things you can use other than trying to match your runtime data to a particular curve. Look at how the number of comparisons and movements vary both for different types of inputs (InOrder, ReverseOrder, AlmostOrder, etc.) and across different algorithms. You can look at how speed varies between different types of inputs. One reason we asked that you collect data for N=16 is that this a small enough problem size that you could actually walk through the algorithm and come up with a fairly close idea of how many comparisons or data movements would be required in some cases - this is something you can use as an argument.