Link

Comparison Sorts Reading

Complete the Reading Quiz by noon before lecture.

When discussing algorithm analysis, we saw two examples of comparison-based sorting algorithms: selection sort and merge sort. We’ve used sorting as a procedure for range search to autocomplete prefix queries. But how do sorting algorithms work, and what algorithm design patterns can we learn from their ideas?

Sorting Problem

Formally, sorting is well-defined only when the ordering relation satisfies total order. Given keys a, b, and c, a total order has the following properties.

  1. Law of Trichotomy: Exactly one of a < b, a = b, b < a is true.
  2. Law of Transitivity: If a < b and b < c, then a < c.

A sort is a permutation (rearrangement) of a sequence of keys that puts the keys into non-decreasing order relative to a given ordering relation.

For example, suppose we want to sort strings by their length from shortest to longest.

  1. Law of Trichotomy: Exactly one of the following is true:
    a.length() < b.length()
    a.length() = b.length()
    b.length() < a.length()
    
  2. Law of Transitivity: If a.length() < b.length() and b.length() < c.length(), then a.length() < c.length().
Using string length as the ordering relation, give two sorts for ["cows", "get", "going", "the"].

There are two valid sorts since “the” is considered equivalent to “get” when comparing by string length.

  • [“the”, “get”, “cows”, “going”]
  • [“get”, “the”, “cows”, “going”]

A sort is considered stable if the relative order of equivalent keys is maintained after sorting.

There are two valid sorts for [“cows”, “get”, “going”, “the”] but a stable sorting algorithm is guaranteed to return [“get”, “the”, “cows”, “going”] since “get” and “the” are equivalent-length strings. A stable sorting algorithm guarantees that “get” comes before “the” in the resulting array. Maintaining the relative ordering of equivalent keys is useful for subsorting items. For example, sorting a list of messages by sender name and then stably-resorting by date will result in all messages in chronological order with the messages for each date alphabetized by sender name.

Maintaining the relative ordering of equivalent keys with a stable sorting algorithm also matters when our data has multiple fields.

Give an example of a Java data type that, when sorted by an unstable sorting algorithm instead of a stable sorting algorithm, won't affect any client programs.

Primitive data types such as int or double are not affected by stability since numbers do not have other fields. Mixing up the relative ordering of two equal numbers has no effect on any client programs.

In contrast, objects can have many fields. Even for objects that are considered equals, a client program that relies on reference equality == might still be surprised by an unstable sort. For this reason, Java’s sorting methods use a faster but unstable sort for primitive types and a slower but stable sort for reference types.

Selection sort works by repeatedly scanning for the smallest unsorted item in the array and swapping it to the sorted section at the front. This algorithm takes Θ(N2) time.

Merge sort works by splitting the input array into two halves, recursively merge-sorting each half, and then merging the sorted result. This algorithm takes Θ(N log N) time.


Reading Quiz