Three of the most commonly used sorting algorithms are
insertion sort, mergesort, and quicksort.
Mergesort comes in two basic varieties: recursive
and iterative. Insertion sort and quicksort
have worst-case running time O(n²), while
mergesort has worst-case running time O(n log n). The average-case
running time of quicksort is O(n log n).
This is all very nice theory, but it's good to know which
algorithm should be used in a certain situation. Common wisdom states
that insertion sort only does well on small input sets and/or sets that are
almost sorted. Mergesort uses twice as much memory
as quicksort and does a lot of copying, so it is
slower than quicksort. In this project you
will either verify this common wisdom or refute it. There are a number of
variations of these algorithms. For example, in quicksort
there are several ways to choose the pivot.
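As one concrete example, here is a minimal sketch in C of median-of-three pivot selection, a common way to choose the pivot; the function name and parameters are illustrative, not taken from the skeleton code.

    /* Median-of-three pivot selection: order a[lo], a[mid], a[hi] so that
       a[mid] holds the median of the three, then use it as the pivot. */
    static int median_of_three(int a[], int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) { int t = a[lo]; a[lo] = a[mid]; a[mid] = t; }
        if (a[hi]  < a[lo]) { int t = a[lo]; a[lo] = a[hi];  a[hi]  = t; }
        if (a[hi] < a[mid]) { int t = a[mid]; a[mid] = a[hi]; a[hi] = t; }
        return mid;  /* a[lo] <= a[mid] <= a[hi] now holds */
    }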
1. Implement
insertion sort, recursive mergesort, iterative mergesort, and two varieties of quicksort.
2. Do a timing study to determine the best value of CUTOFF for both versions of quicksort. For a data set size of roughly N = 10⁶, plot the normalized running time for CUTOFF values ranging from 1 to 25, where the normalized time is the running time divided by N log₂ N. Use your optimal CUTOFF in part 3 below. (A sketch of how a CUTOFF might be used appears after this list.)
3. Do
a timing study to determine the average running time of both varieties of mergesort and both varieties of quicksort
for data sets ranging from roughly 10² to 10⁷ elements.
In separate plots show: (i) the running times on a
log-log plot, running time vs. data set size, and (ii) the normalized running times
of both varieties of mergesort and both varieties of quicksort.
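As a concrete illustration of the CUTOFF in part 2, here is a minimal sketch in C of quicksort switching to insertion sort on small subarrays. This is one common scheme, assumed here for illustration; insertion_sort() and partition() stand in for your own part 1 implementations, and the exact interfaces are hypothetical.

    /* Quicksort with a cutoff: subarrays of at most 'cutoff' elements are
       handed to insertion sort instead of being partitioned further.
       insertion_sort() and partition() are assumed from part 1. */
    void quicksort_cutoff(int a[], int lo, int hi, int cutoff)
    {
        if (hi - lo + 1 <= cutoff) {
            insertion_sort(a, lo, hi);  /* insertion sort wins on small inputs */
            return;
        }
        int p = partition(a, lo, hi);   /* pivot choice is one of your varieties */
        quicksort_cutoff(a, lo, p - 1, cutoff);
        quicksort_cutoff(a, p + 1, hi, cutoff);
    }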
Experimental Methodology
Care must be taken to make sure that the numbers you generate are valid and
reflect the true performance of your algorithms. For timing measurements you
should use the provided CurrentTime() function to measure the wall clock time. A key issue in
using any clock is ensuring that it provides millisecond
or better resolution. You will need to run your experiments for long enough that the
resolution of the clock does not distort your measurements.
You will discover the need to make repeated experiments for each of your data points. If you are running your experiments on a multi-tasking system, then your timing measurements can be affected by other tasks running on the same machine. How many trials should you make in order to get good data? What data should be thrown out? A common experimental approach is to make a small number (like 11) of trials and take the median as an indication of the correct value. This is better than taking the average of all 11 trials, which might include bad data.

To minimize interference from other tasks you might consider running your experiments on a single-user computer. Even on a single-user machine that is also multi-tasking, you should be careful to stop any background jobs. Care must also be taken to avoid reading from or writing to files while you are making measurements. Disk I/O times can easily dominate the computation times you are trying to measure. In general, you should use as little I/O (this includes printf() statements) as possible in your program, and certainly no I/O at all while you are timing the sorting algorithm itself. Even a simple printf() statement has to invoke a kernel system call, and is fairly expensive to perform. If you want to verify the correctness of your sorting algorithm, you may use the provided VerifySortOrder() function or write your own I/O-less way of ensuring that your sort works correctly.
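To make the median-of-trials idea concrete, here is a minimal sketch of a timing harness in C. It assumes CurrentTime() returns wall-clock time in seconds as a double; check the skeleton code for its actual signature. The helper name median_time() and the sort function's interface are illustrative, not part of the skeleton.

    #include <stdlib.h>

    #define TRIALS 11

    double CurrentTime(void);  /* provided by the skeleton code (assumed signature) */

    static int cmp_double(const void *x, const void *y)
    {
        double a = *(const double *)x, b = *(const double *)y;
        return (a > b) - (a < b);
    }

    /* Run the sort TRIALS times on fresh copies of the same input and
       return the median of the measured times. No I/O happens inside
       the timed region. */
    double median_time(void (*sort)(int *, int), const int *data, int *work, int n)
    {
        double t[TRIALS];
        for (int i = 0; i < TRIALS; i++) {
            for (int j = 0; j < n; j++)  /* re-copy so every trial sorts */
                work[j] = data[j];       /* the same unsorted data       */
            double start = CurrentTime();
            sort(work, n);
            t[i] = CurrentTime() - start;
        }
        qsort(t, TRIALS, sizeof t[0], cmp_double);
        return t[TRIALS / 2];            /* median of the 11 trials */
    }

The normalized time used in parts 2 and 3 is then this median divided by N log₂ N.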
Skeleton code
The skeleton code is available here. Use 'tar xf proj1.tar' to untar the code.
Handing in the code
You are required to hand in your project directory with your implementation
of the sorting algorithms. Your code should be well commented, explaining in English
what it does. A README file should explain how your code can be
compiled and run.
Handing in the report
The report should have a title and three sections: introduction, methodology, and results. The introduction explains the overall project. The methodology section explains the details of your experiments. No more than a page is needed for these two sections. The results section includes the graphs described above, with accompanying text. Each graph should have a title, with the x and y axes clearly labeled and with a legend for each line. Each graph should be accompanied by a paragraph explaining what was found by the experiments. You should explain, or speculate on the explanation of, any interesting features in the data. The reports are due at the beginning of lecture on Wednesday, January 30.
The programming portion of this assignment is small, but the overall workload is not as light as it may seem, so don't leave everything until the last minute. Report anything interesting that you come up with, especially if the numbers don't come out the way you expected. For plotting you can use Excel, MATLAB, or Mathematica. There is also a fairly nice tool called gnuplot in the Linux world.
Good luck!