CSE 373, Autumn, 2014 -- University of Washington |
In this assignment, you will use a program called Sort Explorer to explore the performance of several of the sorting algorithms we have learned. You will observe the performance of various "secret" implementations and use the results to decide what sorting algorithm is used in each implementation. Unlike previous assignments, in this one you'll not be doing programming or proofs; rather, you'll be performing empirical experiments, making observations, analyzing the data, and writing explanations of your conclusions. Also, unlike Assignment 5, there is no partnership option; each student is to do Assignment 6 individually.
We are providing a Java program known as the Sort Explorer. You get the compiled version of the program, but not the source. The program contains implementations of six sorting algorithms. You'll run each algorithm by clicking on one of 6 buttons, which are named Alpha, Beta, Gamma, Delta, Epsilon, and Zeta. The mapping between buttons and algorithms is dependent on the student ID value that you enter near the top of the interface window.
How to run the program: Download SortExplorer.jar, which is a (compiled) Java program that you should be able to run on any computer that has Java installed: Most likely you can just (double-)click on the file and it will run. You can also run it on a command-line using java -jar SortExplorer.jar assuming the terminal where you type commands is in the directory/folder holding the file. For (much) more information on running .jar files, see here.
The Sort Explorer program can run six sorting algorithms, each described here. It does not tell you which algorithm is which, identifying them only by Greek letters. But when running one of the algorithms, it tracks the number of comparisons performed, the number of times data items are moved, and the actual time elapsed.
More details on the program and the experimental data it reports:
Details on the sorting algorithms:
The first step when using Sort Explorer is to enter your 7-digit UW student ID (including any leading zeros) in the box at the top. The assignment of sorting algorithms to Greek letters is different for different student ID numbers and only the course staff knows the mapping. If you do not enter your student ID correctly, you will likely not have the mapping we expect you to, which will lead you to the wrong conclusions and therefore hurt your grade. Do not forget to enter your Student ID correctly every time you start the program.
The second step is to create an array to sort following these substeps:
Specify the array size either using the slider bar or typing directly into the box holding the number.
Click on the Create The List button. This array is not created until you click the button. When a new array is created, the program fills in the N and DataType fields in the Experimental Results section.
The third step is to select a sorting algorithm to run. Clicking one of the buttons with a Greek letter will run one of the sorting algorithms and fill the rest of the experimental results.
For each array you create, you can then sort it many times, using one or more algorithms. Every time you select a sorting algorithm to run, it will run starting with the array in the same configuration it was in when the array was originally created. So while you will want to create many different arrays to sort, for each one you create, you can try all the sorting algorithms (multiple times to get better timing information).
Stack overflow: For large input sizes and/or particular input patterns, some algorithms may run out of call-stack space. You will notice this because you click on the name of a sort and it returns without the experimental results changing. This should return fairly quickly (as opposed to waiting a while for something to finally return), allowing you to hit other buttons (in contrast, you will not be able to hit other buttons while waiting for a long-running algorithm to finish). If you run the .jar file from the command line, then you will see stack overflow messages reported there. If you run it by just clicking on the .jar file, then you will not see the messages. If you are interested in changing the runtime stack size, then you can do so when running from the command line. For example, to increase the maximum call-stack size to 2MB, type: java -jar -Xss2m SortExplorer.jar.
Other output: If you run Sort Explorer from the command-line, then whenever you create an array with 32 elements or fewer (including when you start the program), the contents of the array will be printed below your command line. This is just debugging output that you can ignore.
You will turn in two files: one named answer.txt and one named report.pdf.
Create a file named answer.txt. Make the first line of that file contain your 7-digit student number. On the following six lines, write the actual names of the sorting algorithms in the order you have identified them (in order from Alpha to Zeta). Here is an example:
0123456 heap sort selection sort quick sort (simple) quick sort (optimized) merge sort insertion sort
This file is used for auto-grading so make sure the file has the correct name and the correct format.
Turn in a written report report.pdf explaining which sort is which, with strong supporting evidence, by answering all the questions below for all the implementations of sorting. We prefer PDF or Microsoft Word file formats for your report; ask the course staff before submitting a report in another format.
Questions (1) and (2) require you to produce tables for each sorting algorithm. You will probably find it useful to create your tables in a spreadsheet program and just cut and paste values from the Sort Explorer directly into the spreadsheet. You can create the tables first with blank entries for experiments you need to run; this is a good way to make sure you understand the questions before running experiments. Later, you can transfer your tables into your written report.
For question (4), if all your claims are based only on relative speeds of the algorithms (“sort Beta was the fastest, so therefore it is X-Sort”), you will not receive full credit. You will also not receive full credit if you use process-of-elimination to state which algorithm is being used for the last one or two sorts.
A good answer to question (4) is a strong argument. Try for two or, if possible, three strong points that support each algorithm claim. A reason can be based on any of the criteria that are listed above. But if you are using the growth rate as one of your arguments, then include an explanation of why you think the growth rate is what you claim it is. For example, why do you think a growth rate is quadratic instead of n log n? But there are other things you can use beyond trying to match your runtime data to a particular curve. Look at how the number of comparisons and movements vary both for different types of inputs (InOrder, ReverseOrder, etc.) and across different algorithms. You can also look at how speed varies between different types of inputs. One reason we asked that you collect data for N=16 is that this is a small enough problem size that you may be able to walk through the algorithm and come up with a fairly close idea of how many comparisons or data movements would be required in some cases, which you can then use as an argument.
Turn in your two files through our Catalyst CollectIt dropbox. This assignment is due Wednesday, December 3 at 5:00 PM. Late penalties apply (see the policies page for details).
Andy Borui Li is serving as lead TA for this assignment.
This is version 1.0. Last major update: Tuesday, Nov. 25, 2014.