Graph Traversals for Maze Search

A search is an algorithm that traverses a graph in search of one or more goal nodes. As we will discover in a few weeks, a maze is a special instance of the mathematical object known as a "graph". In the meantime, however, we will use "maze" and "graph" interchangeably. We are going to look at three graph traversal strategies for quickly finding donuts in a maze. The basic idea behind all three is the same: they all visit the node that is “next” in a data structure they maintain (a stack, queue or priority queue (implemented as either a binary heap or a three heap)), mark it, and then add its unvisited neighbors to the data structure. The traversals differ in which nodes they explore first and which they leave for later.

The basic algorithm used by all three searches is as follows. The type Set is implemented by one of the three ADTs, and affects the order in which nodes are explored.

Search( Maze m )

    Mark m.StartNode "Visited"

    Set.Insert( m.StartNode )

    While Set.NotEmpty

        c <- Set.Remove

        If c is the goal

            Exit

        Else

            Foreach neighbor n of c

                If n "Unvisited"

                    Mark n "Visited"

                    Set.Insert( n )

            Mark c "Examined"

End procedure

This algorithm uses three states for each cell.

Unvisited. The cell has not yet been visited by the search.
Visited but Not Examined. The cell has been discovered, but the search has not evaluated whether its neighbors should be added to the set.
Examined. The cell has been visited, and all its neighbors are/have been added to the set (ie, they are already "Visited but Not Examined" or "Examined").

I. Depth First Search (DFS)

The defining characteristic of this search is that, whenever DFS visits a maze cell c, it next searches the sub-maze whose origin is c before searching any other part of the maze. This is accomplished by using a Stack to store the nodes. Set.Insert corresponds to a Push, and Set.Remove to a Pop.

The end result is that DFS will follow some path through the maze as far as it will go, until a dead end or previously visited location is found. When this occurs, the search backtracks to try another path, until it finds an exit.

See this page for more information on DFS and BFS, and this page for a graphical comparison of BFS and DFS.

II. Breadth First Search (BFS)

The defining characteristic of this search is that, whenever BFS examines a maze cell c, it adds the neighbors of c to a set of cells which it will to examine later. In contrast to DFS, these cells are removed from this set in the order in which they were visited; that is, BFS maintains a queue of cells which have been visited but not yet examined (an examination of a cell c consists of visiting all of its neighbors). BFS is implemented using a Queue for the set. Set.Insert corresponds to an enQueue, and Set.Remove to a deQueue.

The end result is that BFS will visit all the cells in order of their distance from the entrance. First, it visits all locations one step away, then it visits all locations that are two steps away, and so on, until an exit is found. Because of this, BFS has the nice property that it will naturally discover the shortest route through the maze.

See this page for more information on DFS and BFS, and this page for a graphical comparison of BFS and DFS.

III. Best First search

The defining characteristic of this search is that, unlike DFS or BFS (which blindly examines/expands a cell without knowing anything about it or its properties), best first search uses an evaluation function (sometimes called a "heuristic") to determine which object is the most promising, and then examines this object. This "best first" behavior is implemented with a PriorityQueue for the set. Set.Insert corresponds to an Insert, and Set.Remove to a DeleteMin.

For our maze runners, the objects which we will store in the PriorityQueue are maze cells, and our heuristic will be the cell's "Manhattan distance" from the exit. The Manhattan distance is a fast-to-compute and surprisingly accurate measurement of how likely a MazeCell will be on the path to the exit. Geometrically, the Manhattan distance is distance between two points if you were only allowed to walk on paths that were at 90-degree angles from each other (similar to walking the streets of Manhattan). Mathematically, the Manhattan distance is:

| cell_x - exit_x | + | cell_y - exit_y |

(Sometimes, the Manhattan distance is scaled up by a constant factor, to ensure unique values for each cell.)

The end result is that best first search will visit what it thinks are the most promising cells first, which gives best first some of the nice properties of both BFS and DFS. However, this leaves best first search vulnerable to bad heuristics, or certain types of mazes which exploit weaknesses of certain heuristics.

See this page for more information on various heuristics (they will use game theory terminology).