Running SortExplorer: The sort explorer jar file is here: SortExplorer.jar . Once downloaded, most likely you can just click on the .jar file in a directory and it will run. If not, more information on running a jar file can be found here. For large input sizes and/or particular input patterns, some algorithms may run out of stack space. You will notice this because you click on the name of a sort and it returns without the experimental results changing. This should return fairly quickly (as opposed to waiting awhile for something to finally return) allowing you to hit other buttons – you will not be able to hit other buttons while waiting for a long-running algorithm to finish. If you run the jar file from the command line you will see the stack overflow messages reported there. If you run it by just clicking on the .jar file you will not see the messages. If you are interested in changing the runtime stack size you can do so when running from the command line. To increase stack size to 2 MB type: java -jar -Xss2m SortExplorer.jar
Don’t forget your student ID!! Remember to type in your student ID before running any experiments! Your student ID must be visible in the upper window before you run any sorts. Different student numbers will have different mappings of sorts to buttons. Be sure you are collecting data for YOUR student number or else we will not be able to grade your homework!
What to turn
in
Turn
in a written report (pdf or word are the preferred file formats) explaining
which sort is which, including supporting data to back up your claims. The following supporting data is required,
for every sorting algorithm:
1. Full statistics for input size 16: The runtime, number of comparisons, and movements for Random, InOrder, and ReverseOrder for an array of size 16. A table will be sufficient as your answer for question 1, no explanation required..
2. Runtimes for at least four nontrivial input sizes: In addition you must record the runtime for several (at least four) nontrivial input sizes for Random, InOrder, and ReverseOrder. A nontrivial input size means one where the runtime is more than just a few milliseconds. Feel free to also record number of comparisons and data movements for all of your input sizes, but it is only required for size 16. Feel free to record data on AlmostOrder but that is not required. A table will be sufficient as your answer for question 2, no explanation required.
When you can, increase the input size until the runtime takes at least one second. If input sizes are chosen well, they will allow you to see enough variation in the statistics to give you strong evidence to support your answers to (3) and (4) below. You do not need to use the same input sizes for every algorithm, since this might not produce the most useful data; but if you intend to compare two algorithms directly by their statistics, you should use the same input size and list properties for that comparison.
Each algorithm should have its own table. You will probably find it useful to create your tables in a spreadsheet program and just cut and paste values from the SortExplorer directly into the spreadsheet. Later on you can transfer your tables into a document. Here is a sample table, input sizes shown are just an example, and you are not required to put your values for N=16 into the table but you may find it convenient to keep all your data in one place. Shaded boxes show data that is optional (but may be useful) to collect.
Sort Alpha:
Input Size |
Random |
InOrder |
ReverseOrder |
||||||
|
ms |
Comp |
Mvmt |
ms |
Comp |
Mvmt |
ms |
Comp |
Mvmt |
16 |
0 |
5 |
23 |
0 |
75 |
15 |
0 |
80 |
45 |
10000 |
71 |
34234 |
234 |
123 |
7567 |
6767 |
237 |
4545 |
4545 |
20000 |
463 |
3434234 |
2323 |
456 |
87567 |
5767 |
628 |
5454 |
45453 |
30000 |
1527 |
9342342 |
52245 |
1891 |
755675 |
6756 |
2500 |
53454 |
54545 |
40000 |
2007 |
23423234 |
523424 |
2543 |
756776 |
7567 |
4370 |
545455 |
54545 |
3. Worst Case Big-O: Your estimation of the worst case Big-O runtime for each algorithm. You must gather appropriate data that clearly shows the relevant patterns of the algorithm's runtime. This likely means measuring different algorithms at different input sizes, since some algorithms are much faster than others.
No explanation is required here, however, if the runtime data you collected in question 2 is in conflict or in no way suggesting the runtimes you give for question 3 then you should consider collecting more runtime data. We don't expect curves that show a perfect nlogn or anything like that though, just reasonable data. The recommendation of "measuring different algorithms at different input sizes" is a good one to be mindful of. Try to pick values of n that help you discern the running time of a particular algorithm.
4. Which algorithm is it? Your statement of which sorting algorithm it is, along with your reasoning. You should base your reasoning on the following:
• growth rate (Big-O) of the algorithm's runtime as input size changes
• speed or number of comparisons and data movements of the algorithm compared to the other algorithms
• changes in behavior of the algorithm, if any, for different input orderings
If errors caused by the algorithm caused you to deduce that algorithm, explain why you suspect that algorithm would cause the error.
If all of your claims are based only on relative speeds of the algorithms ("sort Beta was the fastest, so therefore it's X-Sort"), you will not receive full credit. You will also not receive full credit if you use "process of elimination" to state which algorithm is being used ("I have already figured out the other four algorithms, so the last algorithm must be X-Sort").
It is only question 4 that requires you to give explanations. A good answer to question 4 is a strong argument. I would try for two, if possible three strong points that support your claim of what sort this is. A reason can be based on any of the criteria that are listed above. But if you are using the growth rate as one of your arguments - "I think this is Insertion sort because the runtime data I collected looks like O(n2)", then you should be prepared to say a bit about WHY exactly it is that you think it looks like O(n2) say, as opposed to O(nlogn) or some other runtime. But there are also lots of other things you can use other than trying to match your runtime data to a particular curve. Look at how the number of comparisons and movements vary both for different types of inputs (InOrder, ReverseOrder, AlmostOrder, etc.) and across different algorithms. You can look at how speed varies between different types of inputs. One reason we asked that you collect data for N=16 is that this a small enough problem size that you could actually walk through the algorithm and come up with a fairly close idea of how many comparisons or data movements would be required in some cases - this is something you can use as an argument.