CSE 521, Winter 2006.
Notes on network flows sketching the key steps and proofs
**********************************************************
NETWORK FLOW
------------
Problem description: directed graph. Source node. Sink node. Each
(directed) edge has a capacity which we assume to be a positive
integer. The goal is to flow as much as possible from source to sink.
Formally, a flow is a map f: E -> non-negative reals obeying the two rules:
Capacity constraint: on any edge, f(e) <= c(e)
Flow conservation: for any vertex except s and t, flow in = flow out.
The goal is to maximize the total flow on edges leaving the source s.
How can one find a maximum flow and prove that it is correct? Here is
a very natural strategy: find a path from s to t and push as much flow
on it as possible. Then look at the leftover graph called the
residual graph and repeat until there is no longer any path with
capacity left to push any more flow on. This is called the
Ford-Fulkerson algorithm.
Let us now formally define the leftover/residual graph G_f with
respect to the current flow f:
* If e is an edge in the original network, and f(e) < c(e), then
G_f has the edge e with "residual capacity" c_f(e) = c(e) -
f(e). These edges are called "forward" edges.
* Also, if edge e=(u,v) carries flow f(e) > 0, we have a
"backedge" e' in opposite direction with c_f(e') = f(e) (the
idea being that we can reduce the flow on e up to f(e) --
routing flow on backedges amounts to reducing flow on the
corresponding original edge, and re-routing that elsewhere
through the network).
Note: residual capacity can be *larger* than original capacity if we
had flow going in opposite direction. For example if there is an edge
(a,b) with capacity 2 and edge (b,a) with capacity 1 and the current
flow sends one unit of flow on (b,a), then the capacity of the edge
(a,b) in the residual graph will be 3. If this bothers you, you can
equivalently allow multiple edges between (a,b) and keep the forward
and backward edges separate with their respective capacities.
We are now ready to describe the natural, iterative flow algorithm:
Basic Ford-Fulkerson algorithm
------------------------------
Begin with the zero flow f(e) = 0 for all edges e.
While there exists an s-->t path P in G_f:
(i) push maximum possible flow along P (saturating at least one edge
on it) [These are called "augmenting paths"]
(ii) update the flow f and the residual graph G_f
Output the flow f.
RUNNING TIME: If capacities are all integers, the above algorithm
always maintains an integer-valued flow. Therefore, after each
augmentation the value of the flow increases by at least 1. Hence the
running time is polynomial in the number of nodes and the value of the
flow (which is clearly upper-bounded by the total capacity on edges
leaving s). But if we allow large capacities written in binary, this
is not necessarily polynomial time in the description length of the
input (it is what is called a pseudopolynomial time algorithm). We saw
a simple graph with two disjoint paths of length two with capacity
10000 edges and a capacity 1 link between the middle nodes of these
paths. The algorithm can take 20000 iterations to compute the max flow
of value 20000 (with an increase of 1 per augmentation, using the two
zig-zag paths in turn).
In your problem set, you will prove that by picking augmenting paths
that are the shortest, one can get a polynomial time algorithm, and in
fact a strongly polynomial time algorithm where the number of
augmentations depends only on the number of edges and vertices and not
on the capacities.
CORRECTNESS:
Algorithm by design finds a legal flow. Why is it maximum?
For this we turn to how one can provide an upper bound on the maximum
flow (so we know when we have the best possible flow). This is done
via "cuts", which identify bottlenecks in the network that impede lot
of flow going from s to t.
Definition: an s-t cut (A,B) is a partition of the vertex set into
disjoint sets A and B such that s belongs to A and t belongs to B.
The edges of the cut are then all edges going from A to B, and the
capacity of the cut, denoted c(A,B), is the total capacity on all
edges of the cut.
Definition: For a flow f and a subset A containing s, let f^{out}(A)
be the total flow on edges leaving A, and let f^{in}(A) be the total
flow on edges entering A.
We have the following lemma.
Lemma: The value of any flow, v(f), satisfies v(f) = f^{out}(A) -
f^{in}(A).
Proof: v(f) = f^{out}(s) = f^{out}(s) - f^{in}(s) = sum_{v in A}
f^{out}(v) - f^{in(v). Now in the last sum, the f(e) terms for edges e
with both endpoints in A cancel out. The outgoing edges have f(e)
appearing, and incoming edges have -f(e) appearing. It follows that
this sum equals f^{out}(A) - f^{in}(A), as claimed.
Corollary: If f is any flow and (A,B) any s-t cut, then v(f) <=
c(A,B). (That is, the max s-t flow <= min s-t cut)
Therefore, to prove that the flow f output by the algorithm is
optimal, we only need to exhibit a cut (A,B) such that v(f) = c(A,B),
and this is exactly what we do below.
Lemma: The flow output by the Ford-Fulkerson algorithm is a maximum flow.
Proof: Let f be the flow output by the algorithm. Let us look at the
final residual graph G_f. This graph must have s and t disconnected
by definition of the algorithm. Let A be all nodes reachable from s
in G_f and B = rest. Now, we know we can't route flow greater than
than c(A,B). The claim is that we in fact *did* find a flow of value
c(A,B) (which therefore implies it is maximum). Here's why.
(i) each edge e=(u,v) from A to B carries a a flow f(e) =
c(e). Indeed, if not, e will be present in G_f and thus v
should have been in A.
(ii) each edge e=(x,y) from B to A carries zero flow. Indeed, if
not there will be a backedge from y to x in G_f and hence x
should have been in A.
It follows that f^{out}(A) = c(A,B) and f^{in}(A) = 0. By the above
lemma, v(f) = f^{out}(A) - f^{in}(A) = c(A,B). We're done.
Note, we've actually proven the non-obvious MAXFLOW-MINCUT theorem.
In any flow network, max flow from s to t = capacity of minimum (s,t)-cut.
We started with saying max flow <= min cut. But now we've argued that
this algorithm finds a flow of value equal to SOME cut, so it can't be
less than the MINIMUM.
We have also proven the INTEGRAL-FLOW THEOREM: if all capacities are
integers, then there is a max flow in which all flows are integers
(and the Ford-Fulkerson algorithm finds such an integral flow).
This integrality is often very useful; we will make good use of it right
away for the maximum matching problem.
Bipartite matching problem
--------------------------
One nice use of network flow is to solve the bipartite matching
problem. A *matching* is a set of edges with no endpoints in
common. What we want here is a *perfect matching* - a matching that
connects every point on the left hand side with some point on the
right hand side. More generally (say there is no perfect matching) we
want a *maximum matching* - a matching with maximum possible number of
edges.
Here is a simple flow based algorithm to solve Maximum matching on
bipartite graph H=(X,Y,E):
1. Set up fake source node s connected to all in X. Connect all in Y to
a fake sink node t. Orient all edges left-to-right and give each a
capacity of 1.
2. Find a max flow from s to t using Ford-Fulkerson.
3. Output matching corresponding to flow (edges in the matching are
precisely those edges from X to Y that carry a unit of flow)
Running time: O(mn)
*******************************************************************
Minimum Cost Maximum Flow:
~~~~~~~~~~~~~~~~~~~~~~~~~
We saw that the min cost Max s-t flow problem can be reduced to a a
min cost circulation problem by adding an infinite capacity edge from
t to s with a very large negative cost. The incentive to push as much
flow as possible on this negative cost edge, together with flow
conservation, implies we can find a min cost max s-t flow using the
min cost circulation in the new network.
Recall in class we gave an algorithm for min cost circulation based on
repeatedly finding negative-cost cycles in the residual graph, and
then pushing flow on it till some edge on it is saturated. This is
called the cycle canceling algorithm. Here is the argument for
optimality of cycle canceling algorithm for min-cost circulation:
We only need to prove:
Lemma: A circulation f is optimal if and only if G_f has no negative
cost cycles.
Proof: One direction is obvious: if G_f has a negative cost cycle, we
can circulate some positive flow on it. This will a new circulation f'
with strictly smaller cost than f. Therefore, f is not optimal.
For the other direction, we need to show that if f is not optimal,
then G_f has a negative cost cycle. Let f^* be a flow of lower cost
than f (it exists since f is not optimal).
What we want to say is that f^*-f is a circulation in G_f, but we need
to be a little more precise about this. Let c_f denote the capacity
function in G_f. We define a flow F in the residual graph G_f
as follows:
* if f^*(e)-f(e) >= 0, F(e) = f^*(e) - f(e).
[Note that 0 <= F(e) <= c(e) - f(e) = c_f(e)]
* if f^*(e) -f(e) < 0, then we have f(e) > 0, and so G_f has a
backedge e' of capacity f(e), and we set F(e') = f(e) - f^*(e).
[Note that 0 < F(e') <= f(e) = c_f(e)]
The notes in brackets above show that F obeys capacity constraints in
G_f. F also obeys flow conservation since both f^* and f do,
(everything is nice and linear).
Therefore, F is a legal circulation in G_f. Also, since f^* had lower cost
than f, f^*-f and therefore F has negative cost.
Now, we claim "any circulation with negative cost (such as our F
above) must route positive flow along some cycle of negative
cost". Note that this would imply that G_f has a negative cost cycle
(since F only routes flow on edges in G_f), and this is exactly what
we wanted to show.
So, why is the claim true? One way prove this is by induction on the
number of flow carrying edges. The base case is 2 edges -- in this
case the circulation must route flow on a cycle of length 2 (i.e., an
edge and the reverse edge) and this cycle must have negative cost.
Now, given a circulation F, we can start at an arbitrary flow carrying
edge, and continue walking on the network, at each step using a flow
carrying edge out of the current node (by flow conservation, if
positive flow entered a node, there must be at least one outgoing edge
with positive flow). This process must eventually close a cycle, say
C. If C has negative cost, we are done finding a negative cost
cycle. Otherwise, we can reduce the flow on each edge of C by the
minimum value of flow on edges in C. This gives a new circulation F'
of cost at most that of F (and hence F' also has negative
cost). Moreover, F' has at least one less flow carrying edge (namely
the edge(s) in C with minimum flow) compared to F. By induction, F'
must route positive flow along some negative cost cycle, and of course
F too routes positive flow along this same cycle.