Data structures and algorithms are often considered foundational in computer science. Why do companies seek candidates with this specific knowledge?

One reason is that many real-world problems can be solved with the data structures and algorithms that we’ve learned. This process of turning a real-world problem into a computable representation is called problem decomposition. Applying the algorithm design process to solve classic problems in computer science helps us learn patterns and strategies for solving novel problems.

Reductions are a special type of problem decomposition for transforming one problem into another problem.

1. Transform the input so that it can be framed in terms of another problem.
2. Run a standard algorithm on the transformed input.
3. Transform the output of the algorithm to solve the original problem.

Many graph problems can be solved using reduction, including particle detection, maximum spanning trees, and seam carving. These problems are typically transformed by adding or modifying vertices or edges so that a complex problem can be solved with a graph traversal algorithm, minimum spanning tree algorithm, or shortest paths algorithm.

Reductions are not necessarily limited to graph problems either. In Algorithm Analysis, we discussed two methods for finding duplicates in a sorted array: one that considered every pair of items and the other that used the fact that the input is sorted to speed up running time. We know that merge sort has a linearithmic (N log N) runtime, so finding a duplicate in an unsorted array reduces to sorting.

Duplicates. Are there any duplicate keys in an array of `Comparable` objects? How many distinct keys are there in an array? Which value appears most frequently? With sorting, you can answer these questions in linearithmic time: first sort the array, then make a pass through the sorted array, taking note of duplicate values that appear consecutively in the ordered array.[1]

What are the three reduction steps for reducing duplicate finding to sorting?
1. No transformation to the input array.
2. Sort the input array.
3. Scan over the sorted array. If there are adjacent duplicate keys, return true. Otherwise, return false.
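The three steps above can be sketched directly in code. Here is a minimal Python version; the function name and the use of the built-in `sorted` are illustrative choices, not part of the original discussion:

```python
def has_duplicates(items):
    """Reduce duplicate finding to sorting: sort, then scan adjacent pairs."""
    ordered = sorted(items)                   # Step 2: run a standard sorting algorithm.
    for i in range(len(ordered) - 1):
        if ordered[i] == ordered[i + 1]:      # Step 3: adjacent equal keys mean a duplicate.
            return True
    return False
```

Step 1 is a no-op here, since the input array can be handed to the sorting algorithm unchanged.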

A consequence of this efficient reduction is that, from a theoretical perspective, finding duplicate keys in an unsorted array is no more difficult than sorting (followed by a linear scan). If we develop a linear time sorting algorithm, we also improve the runtime of duplicate finding. This insight will be important later, when we show that it’s impossible to design a linear time sorting algorithm for `Comparable` keys.

Given that a linear time comparison sorting algorithm can't exist, does that imply a linear time duplicate finding algorithm can't exist either?

No, this just means that we need to use a different approach. Put all of the items into a `HashSet` and check that the size of the `HashSet` equals the size of the unsorted input array. Assuming a good hash function and efficient resizing, this results in a linear time duplicate finding algorithm.
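As a sketch of this approach, here is a Python version using the built-in `set` (a hash table) as a stand-in for Java’s `HashSet`; the function name is an illustrative choice:

```python
def has_duplicates_hashing(items):
    """Linear time duplicate finding, assuming a good hash function."""
    # A set stores each distinct key once, so any size difference
    # between the set and the input reveals a duplicate.
    return len(set(items)) != len(items)
```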

## Negative edge-weighted graphs

Dijkstra’s algorithm is not guaranteed to return a correct shortest paths tree in graphs that contain negative edge weights. The runtime and correctness of Dijkstra’s algorithm depends on processing vertices in the order of shortest distance from the source. The problem arises when we attempt to relax a negative edge to a vertex that was previously processed and removed from the fringe.

One way to avoid this problem is to break the dependence on vertex ordering entirely. If we exhaustively relax every edge in the graph V - 1 times, then we can compute the correct shortest paths tree in O(V * E) time. This algorithm, whose runtime is quadratic in the number of vertices for sparse graphs, is known as the Bellman-Ford algorithm.
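A minimal Bellman-Ford sketch in Python, assuming the graph is given as an adjacency dict mapping each vertex to a list of `(neighbor, weight)` pairs (the representation and names are illustrative):

```python
def bellman_ford(graph, source):
    """Single-source shortest distances; handles negative edge weights."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    # Exhaustively relax every edge V - 1 rounds, so vertex order never matters.
    for _ in range(len(graph) - 1):
        for v in graph:
            for w, weight in graph[v]:
                if dist[v] + weight < dist[w]:   # Relax edge (v, w).
                    dist[w] = dist[v] + weight
    return dist
```

A longest shortest path uses at most V - 1 edges, and each round of relaxation extends correct distances by at least one more edge, which is why V - 1 rounds suffice.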

## Directed acyclic graphs

We’d like an algorithm that computes the shortest paths tree in graphs with negative edge weights. It turns out that it’s possible to solve this problem on weighted, directed acyclic graphs (DAGs) in linear O(V + E) time, which is better than standard Dijkstra’s algorithm.

This algorithm for finding the single-source shortest paths in a DAG relies on finding an ordering of the graph’s vertices such that all edges entering a vertex are considered before any of the edges exiting the vertex. Formally, we want to find a topological ordering of the vertices in the graph.

### Topological ordering

Given a digraph, put the vertices in order such that all its directed edges point from a vertex earlier in the order to a vertex later in the order (or report that doing so is not possible).[1]

Consider the following DAG. Give a topological ordering for the vertices in the graph.

A, B, D, E, H, C, F, G is one example. A, D, E, B, H, C, F, G is another. We can describe some of the constraints corresponding to each incoming edge.

• The first vertex needs to be A.
• E must come after D.
• H must come after B and E.
• C must come after B.
• F must come after C.
• The final vertex needs to be G.

A topological ordering can be computed with depth-first search. (It’s also possible to solve this with modifications to breadth-first search.)

Give the DFS preorder traversal of the graph assuming we explore neighbors in alphabetical order.

A, B, C, F, G, H, D, E. This doesn’t satisfy the ordering constraints, so preorder traversals don’t work.

Give the DFS postorder traversal of the graph assuming we explore neighbors in alphabetical order.

G, F, C, H, B, E, D, A. The result is backwards: all of the constraints are in reverse.

A valid topological ordering can be computed by reversing the DFS postorder traversal starting from a vertex with no incoming edges. In case not every vertex is reachable from the starting vertex, it’s necessary to continue the reverse DFS postorder traversal from another vertex with no incoming edges.
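The reverse DFS postorder approach can be sketched in Python. This version simply starts a DFS from every not-yet-visited vertex in turn, which covers the case where no single starting vertex reaches the whole graph; the adjacency-dict representation and names are illustrative, and the graph is assumed to be a DAG:

```python
def topological_order(graph):
    """graph: {vertex: [neighbor, ...]}. Returns one valid topological ordering."""
    visited = set()
    postorder = []

    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        postorder.append(v)   # v is appended only after all vertices it can reach.

    for v in graph:           # Restart DFS so every vertex is eventually visited.
        if v not in visited:
            dfs(v)
    return postorder[::-1]    # Reverse postorder is a topological order.
```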

### Shortest paths

Consider the following negative-weighted DAG. Dijkstra’s algorithm fails to compute the correct shortest path to H (A-D-E-H) and instead prefers A-B-H because A-B-H has a lower cost than A-D-E in the shortest paths tree. A valid topological ordering is A, D, E, B, H, C, F, G. Notice that this ordering ensures that both B and E will be considered before deciding the shortest path to H. We can compute the shortest paths tree in a DAG by considering each vertex in topological order.

```
order = topological(G)
for v in order:
    for (w, weight) in G.neighbors(v):
        # Relax edge (v, w).
        if distTo[w] > distTo[v] + weight:
            edgeTo[w] = v
            distTo[w] = distTo[v] + weight
```

A vertex is considered only when all of its possible incoming edges have been processed.
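Putting the two pieces together, the full algorithm can be sketched as runnable Python, again assuming an adjacency dict of `(neighbor, weight)` pairs; the names and representation are illustrative:

```python
from math import inf

def dag_shortest_paths(graph, source):
    """Single-source shortest paths in a weighted DAG in O(V + E) time."""
    # Compute a topological ordering via reverse DFS postorder.
    visited, postorder = set(), []
    def dfs(v):
        visited.add(v)
        for w, _ in graph[v]:
            if w not in visited:
                dfs(w)
        postorder.append(v)
    for v in graph:
        if v not in visited:
            dfs(v)
    order = postorder[::-1]

    # Relax each vertex's outgoing edges in topological order, so all of a
    # vertex's incoming edges are processed before the vertex is considered.
    dist = {v: inf for v in graph}
    edge_to = {}
    dist[source] = 0
    for v in order:
        for w, weight in graph[v]:
            if dist[v] + weight < dist[w]:
                dist[w] = dist[v] + weight
                edge_to[w] = v
    return dist, edge_to
```

On a small DAG shaped like the example, where A-B-H costs 4 but A-D-E-H costs 1 thanks to a negative edge, this correctly prefers the path through E.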

Give the runtime for this algorithm.

O(V + E) for finding a valid topological ordering via reversing the postorder DFS traversal.

O(V + E) to consider every vertex and relax every edge in the graph.

This is asymptotically faster than Dijkstra’s algorithm, which has a runtime in O(E log V) for E > V, and much faster than Bellman-Ford, which has a runtime in O(V * E) but also works on graphs that contain cycles.

1. Robert Sedgewick and Kevin Wayne. Algorithms, Fourth Edition. 2011. Directed Graphs. https://algs4.cs.princeton.edu/42digraph/