Comparison Sorts Reading

Complete the Reading Quiz by 3:00pm before lecture.

When discussing algorithm analysis, we saw two examples of comparison-based sorting algorithms: selection sort and merge sort. We’ve also made sorting a prerequisite for autocompleting prefix queries (we need sorted results to implement range search). Selection sort works by repeatedly scanning for the smallest unsorted item in the array and swapping it to the end of the sorted section at the front; this algorithm takes Θ(N²) time. In contrast, merge sort works by splitting the input array into two halves, recursively merge-sorting each half, and then merging the two sorted halves; this algorithm takes Θ(N log N) time.
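As a refresher, the two algorithms described above might be sketched as follows. This is a minimal illustration on int arrays, not the exact code from earlier lectures; the class name SortRefresher and both method names are made up for this example.

```java
import java.util.Arrays;

public class SortRefresher {
    /** Selection sort: sorts a in place, Theta(N^2) comparisons. */
    public static void selectionSort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            // Scan the unsorted section a[i..] for the smallest item.
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Swap it to the end of the sorted section at the front.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    /** Merge sort: returns a sorted array, Theta(N log N) comparisons. */
    public static int[] mergeSort(int[] a) {
        if (a.length <= 1) {
            return a; // zero or one item is already sorted
        }
        // Split into two halves and recursively sort each half.
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        // Merge the two sorted halves into one sorted result.
        int[] out = new int[a.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            out[k++] = left[i++];
        }
        while (j < right.length) {
            out[k++] = right[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {5, 2, 9, 1, 5};
        System.out.println(Arrays.toString(mergeSort(keys))); // prints [1, 2, 5, 5, 9]
        selectionSort(keys);
        System.out.println(Arrays.toString(keys)); // prints [1, 2, 5, 5, 9]
    }
}
```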

In lecture, we will investigate how sorting algorithms work.

Sorting

Formally, sorting is well-defined only when the ordering relation is a total order. Given keys a, b, and c, a total order has the following properties.

  1. Law of Trichotomy: Exactly one of a < b, a = b, b < a is true.
  2. Law of Transitivity: If a < b and b < c, then a < c.

A sort is a permutation (rearrangement) of a sequence of keys that puts the keys into non-decreasing order relative to a given ordering relation.

For example, suppose we want to sort strings by their length from shortest to longest.

  1. Law of Trichotomy: Exactly one of the following is true:
    a.length() < b.length()
    a.length() = b.length()
    b.length() < a.length()
    
  2. Law of Transitivity: If a.length() < b.length() and b.length() < c.length(), then a.length() < c.length().

Using string length as the ordering relation, give two sorts for ["cows", "get", "going", "the"].

There are two valid sorts since “the” is considered equivalent to “get” when comparing by string length.

  • [“the”, “get”, “cows”, “going”]
  • [“get”, “the”, “cows”, “going”]
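One way to express this ordering relation in Java is with a Comparator. The sketch below is only an illustration: the class LengthOrder and the helper isSortedByLength are hypothetical names, while Comparator.comparingInt is part of the standard library. It checks that both answers are in non-decreasing order by length and that "get" and "the" compare as equivalent.

```java
import java.util.Comparator;

public class LengthOrder {
    /** Ordering relation: compare strings by length only. */
    public static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length);

    /** Returns true if keys are in non-decreasing order by length. */
    public static boolean isSortedByLength(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (BY_LENGTH.compare(keys[i - 1], keys[i]) > 0) {
                return false; // an earlier key is strictly longer than a later one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] first = {"the", "get", "cows", "going"};
        String[] second = {"get", "the", "cows", "going"};
        System.out.println(isSortedByLength(first));          // prints true
        System.out.println(isSortedByLength(second));         // prints true
        System.out.println(BY_LENGTH.compare("get", "the"));  // prints 0
    }
}
```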

Stability

A sort is considered stable if the relative order of equivalent keys is maintained after sorting.

As we saw above, there are two valid sorts for [“cows”, “get”, “going”, “the”]. However, a stable sorting algorithm is guaranteed to return [“get”, “the”, “cows”, “going”]; “get” and “the” are equivalent-length strings, and “get” appears before “the” in the original input.
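To see this in practice, the sketch below sorts the same array by length using Arrays.sort with a Comparator, which the Java documentation guarantees to be stable for object arrays. The class name StableSortDemo and the method sortByLength are invented for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    /** Returns a copy of keys, stably sorted by string length. */
    public static String[] sortByLength(String[] keys) {
        String[] copy = Arrays.copyOf(keys, keys.length);
        // Arrays.sort on an object array is documented to be stable.
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] keys = {"cows", "get", "going", "the"};
        // "get" stays ahead of "the" because it came first in the input.
        System.out.println(Arrays.toString(sortByLength(keys)));
        // prints [get, the, cows, going]
    }
}
```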

Maintaining the relative ordering of equivalent keys can be useful. For example, if a list of email messages is already sorted by date and is then stably sorted by sender, the result groups messages by sender name, and each sender’s messages remain in chronological order. Maintaining the relative ordering of equivalent keys may also matter when our data has multiple fields.
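The email scenario could be sketched as below. The Message class, its fields, and the use of an integer day in place of a real date are all simplifying assumptions made for illustration; Collections.sort is the standard-library method, and its documentation guarantees stability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class EmailSortSketch {
    /** Hypothetical message type with two fields. */
    public static final class Message {
        public final String sender;
        public final int day; // stand-in for a real date

        public Message(String sender, int day) {
            this.sender = sender;
            this.day = day;
        }

        @Override
        public String toString() {
            return sender + "@" + day;
        }
    }

    /** Sorts a copy by date, then stably by sender. */
    public static List<Message> byDateThenSender(List<Message> inbox) {
        List<Message> result = new ArrayList<>(inbox);
        // First pass: chronological order.
        Collections.sort(result, Comparator.comparingInt((Message m) -> m.day));
        // Second pass: the stable sort keeps each sender's messages chronological.
        Collections.sort(result, Comparator.comparing((Message m) -> m.sender));
        return result;
    }

    public static void main(String[] args) {
        List<Message> inbox = new ArrayList<>();
        inbox.add(new Message("alice", 3));
        inbox.add(new Message("bob", 1));
        inbox.add(new Message("alice", 1));
        inbox.add(new Message("bob", 2));
        System.out.println(byDateThenSender(inbox));
        // prints [alice@1, alice@3, bob@1, bob@2]
    }
}
```

If the second pass used an unstable sort instead, the messages would still be grouped by sender, but the chronological order within each group would no longer be guaranteed.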

Give an example of a Java data type that, when sorted by an unstable sorting algorithm instead of a stable sorting algorithm, won't affect any client programs.

Primitive data types such as int or double are not affected by stability since numbers do not have other fields. Mixing up the relative ordering of two equal numbers has no effect on any client programs.

In contrast, objects can have many fields, and not every field is necessarily used when computing equals. For objects that are considered equal, a client program might be surprised by an unstable sort. For this reason, Java’s sorting methods use a faster but unstable sort for primitive types and a slower but stable sort for reference types.


Reading Quiz