---------------------------------------------------------------------
CS 577 (Intro to Algorithms)      Lec 18 (11/07/06)      Shuchi Chawla
---------------------------------------------------------------------

Today: Min-cost max-flow, min-cost circulation; Randomized algorithms

Min-cost max-flow
=================

Last time we discussed the min-cost bipartite maximum matching problem:
we are given a bipartite graph with costs on edges; of all the matchings
of maximum size in this graph, we want to find the one with the minimum
cost. This problem can be reduced to the min-cost max-flow problem using
the usual reduction from bipartite matching to flow.

In the min-cost max-flow problem we are given a graph G with source s
and sink t. Each edge e in the graph has a capacity c(e) as well as a
cost w(e) associated with it. We want to find a maximum flow in this
network, but if there is more than one max flow, we want to find one of
minimum cost.

How should we go about solving this? One approach is to modify the
Ford-Fulkerson algorithm to take costs into account. Let us recall the
FF algorithm:

1. Initialize flow to zero; construct residual graph G'
2. Repeat until done
   a. Find a path from s to t in G'
   b. Add flow along this path
   c. Update graph G'

We mainly need to modify step 2a, but first we need to incorporate costs
into the residual graph. This is simple: if an edge e=(u,v) in G has
cost w(e), we assign the forward edge (u,v) in G' a cost of w(e) and the
backward edge (v,u) in G' a cost of -w(e). Then, just modify step 2a to
always pick the minimum cost path from s to t. That's all!

How do we find the minimum cost path from s to t? Just treat the costs
as lengths and use Bellman-Ford to find the shortest path from s to t.
Note that the residual graph G' includes negative cost edges, so we
cannot use Dijkstra's algorithm to find the shortest path. Each
Bellman-Ford run takes O(mn) time, and for integer capacities there are
at most F augmentations, so the running time of this algorithm is
O(Fmn), where F is the size of the maximum flow.
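The modified Ford-Fulkerson loop above can be sketched as follows. This
is my own illustration, not code from the lecture: all names are
hypothetical, nodes are numbered 0..n-1, and integer capacities are
assumed so that the number of augmentations is bounded by F.

```python
def min_cost_max_flow(n, edges, s, t):
    """Sketch: edges are (u, v, capacity, cost) tuples. Returns
    (max flow value, cost of that flow). Repeatedly augments along
    a minimum-cost s-t path found by Bellman-Ford on the residual
    graph, which may contain negative cost (backward) edges."""
    # Residual graph: each entry is [to, residual cap, cost, rev index].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])  # cost -w(e)

    flow, total_cost = 0, 0
    while True:
        # Bellman-Ford: shortest path by cost from s (step 2a).
        INF = float('inf')
        dist = [INF] * n
        dist[s] = 0
        prev = [None] * n  # prev[v] = (u, i): edge graph[u][i] enters v
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if dist[t] == INF:
            return flow, total_cost  # no augmenting path left
        # Bottleneck capacity along the path, then push flow (steps 2b-c).
        bottleneck = INF
        v = t
        while v != s:
            u, i = prev[v]
            bottleneck = min(bottleneck, graph[u][i][1])
            v = u
        v = t
        while v != s:
            u, i = prev[v]
            graph[u][i][1] -= bottleneck
            graph[v][graph[u][i][3]][1] += bottleneck
            v = u
        flow += bottleneck
        total_cost += bottleneck * dist[t]
```

Since each augmenting path is a minimum-cost path in the residual graph,
the resulting flow is a minimum-cost flow among all flows of its value.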
This can be sped up by using smarter ways of picking s-t paths, but that
is beyond the scope of this course.

---------------------------------------------------------------------

Min-cost Circulation
====================

Let us now look at an extension of the min-cost max-flow problem.
Suppose that instead of a flow from some s to some t, we just wanted a
flow that was balanced everywhere, that is, there is no source or sink,
but the flow just circulates through the network. This is called the
min-cost circulation problem.

Exercise: How would you reduce the min-cost max-flow problem to the
min-cost circulation problem?

Note that we don't have a max-flow requirement any more. If the network
has no negative cost cycles, then any circulation has nonnegative cost
(a circulation decomposes into cycles, each of nonnegative cost), so the
optimal solution sends no flow at all. On the other hand, if the network
has negative cost cycles, then it helps to saturate those cycles
completely. This suggests the following algorithm for solving this
problem (due to Klein): just run Ford-Fulkerson and in step 2, pick any
negative cost cycle in the residual graph G' and saturate it with flow.

Why is this algorithm correct? Well, if there is any negative cost cycle
remaining in the residual graph at the end, then clearly we can further
reduce the cost of our circulation by sending more flow along this
cycle. On the other hand, suppose that our algorithm finds a suboptimal
circulation f, while the min-cost circulation is f*. Then f*-f is a
circulation in the residual graph with negative cost, and it decomposes
into cycles, at least one of which must have negative cost (think about
why this is the case). This contradicts the fact that our algorithm
terminated with flow f.

In order to complete the description of this algorithm, we must give an
algorithm for finding negative cost cycles. One way of doing this is
using the Bellman-Ford algorithm. Recall that Bellman-Ford finds
shortest paths between all pairs of nodes (by running its single-source
version from every node), assuming that there are no negative cost
cycles in the graph.
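Here is a sketch of Klein's cycle-canceling algorithm (my own
illustration; all names hypothetical, integer capacities assumed). The
negative-cycle subroutine is the Bellman-Ford variant whose workings are
discussed next: it initializes every node to distance 0, as if from a
virtual source, and a relaxation in the n-th pass witnesses a negative
cycle.

```python
def _find_negative_cycle(n, graph):
    """Returns a negative cost cycle in the residual graph as a list
    of (node, edge index) pairs, or None if no such cycle exists."""
    dist = [0] * n             # virtual source: every node at distance 0
    pred = [None] * n          # pred[v] = (u, i): edge graph[u][i] enters v
    x = None
    for _ in range(n):         # a relaxation in the n-th pass => cycle
        x = None
        for u in range(n):
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    pred[v] = (u, i)
                    x = v
    if x is None:
        return None
    for _ in range(n):         # walk back n steps to land on the cycle
        x = pred[x][0]
    cycle, v = [], x
    while True:
        u, i = pred[v]
        cycle.append((u, i))
        v = u
        if v == x:
            return cycle

def min_cost_circulation(n, edges):
    """Sketch of cycle canceling: edges are (u, v, capacity, cost).
    Returns the cost of a min-cost circulation."""
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total_cost = 0
    while True:
        cycle = _find_negative_cycle(n, graph)
        if cycle is None:
            return total_cost      # no negative cycle left: optimal
        # Saturate the cycle: push its minimum residual capacity.
        bottleneck = min(graph[u][i][1] for u, i in cycle)
        for u, i in cycle:
            to, rev = graph[u][i][0], graph[u][i][3]
            graph[u][i][1] -= bottleneck
            graph[to][rev][1] += bottleneck
            total_cost += bottleneck * graph[u][i][2]
```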
What happens to the algorithm when the graph has negative cost cycles?
Let us recall Bellman-Ford in more detail. In the i-th iteration of the
algorithm, we construct shortest paths with at most i hops from every
node to every other node, using the paths with fewer hops computed in
previous iterations. When the graph does not contain negative cost
cycles, this process converges after n-1 iterations and we obtain all
pairs shortest paths. If the graph does contain negative cost cycles,
the cost of some paths keeps decreasing in the n-th iteration and
beyond: a path containing such a negative cost cycle can become shorter
and shorter by going around the cycle again and again.

How do we use this to find the negative cost cycle? Recall that
Bellman-Ford also keeps track, for every pair (u,v), of the next hop on
the shortest path from u to v. We just consider a pair (u,v) whose
shortest path length decreases in the n-th iteration, and follow the
hops from u towards v until we find the negative cost cycle. This
modification of Bellman-Ford has the same time complexity O(mn) as the
original algorithm. Combining this with the F-F algorithm, we get an
O(mnF) time algorithm for the min-cost circulation problem.

---------------------------------------------------------------------

Randomized Algorithms
=====================

We will now look at a new technique for algorithm design -- the use of
randomness or coin tosses. Randomized algorithms are similar to
non-random or "deterministic" algorithms, except that sometimes they
toss coins and decide what to do next based on the outcome of those
coins. One randomized algorithm that perhaps most of you are familiar
with is Quicksort. We will study this algorithm in detail in the next
lecture.

How does randomness help?
- Sometimes it helps us save time or other resources.
- Sometimes it makes algorithm design simpler, while providing the same
  time/space guarantees as some deterministic algorithm.
- Sometimes it provides us additional properties such as privacy.

Let us look at some examples of these.

Example #1: Comparing numbers
=============================

Alice and Bob have an n-bit number each -- A for Alice and B for Bob.
They want to find out whether the two numbers are the same. However,
they are communicating through telegrams and each bit costs money to
send. So they want to minimize the number of bits they send to each
other. Note that they can fix a protocol for what bits to send before
playing this game; deciding upon a protocol doesn't incur any cost.
What is the best way for them to decide whether their numbers are the
same or not?

It turns out that if they use a deterministic algorithm and send fewer
than n bits through the telegram, there is always a pair of numbers A
and B on which they will get the wrong answer. This means that if they
don't use randomization, they must communicate at least n bits.

Can we use randomness to help here? Indeed we can! Here is the idea:
suppose that they pick some number x between (say) 1 and 10, and compare
A mod x to B mod x. If A and B are the same number, then A mod x and
B mod x will also be the same, and they will get the correct answer.
However, if A and B are different, what is the chance that
A mod x = B mod x? This can only happen if x divides A-B. Assuming that
A-B has few factors in the range 1 to 10, if we pick x randomly, there
is only a small chance that we will pick one of the factors of A-B. But
how does this help us? Well, now we are sending across only about log 10
bits instead of n bits (because both A mod x and B mod x are numbers
smaller than 10). Note that we are trading off correctness for time: by
sending fewer bits, we may sometimes get the wrong answer, but this will
only happen with a certain small probability.
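The idea above can be sketched as follows (my own illustration; all
names hypothetical). For the sketch we already pick x to be a prime
drawn uniformly from 1 to n^2, anticipating the refinement made precise
next; Alice would send the pair (x, A mod x), which is O(log n) bits.

```python
import random

def _primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def equality_protocol(A, B, n):
    """Sketch: Alice holds n-bit A, Bob holds n-bit B. Alice picks a
    uniformly random prime x <= n^2 and sends (x, A mod x); Bob
    compares. The answer is wrong only if x happens to divide A-B."""
    x = random.choice(_primes_up_to(n * n))
    return A % x == B % x
```

If A = B the protocol is always correct; if A != B it errs only when the
random prime divides A-B, which (as argued next) happens with
probability O((log n)/n).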
Let us make this protocol more precise. Since we are talking about
factors, let us pick x to be a prime number. Furthermore, in order to
make the failure probability small, let us pick a range larger than 1 to
10. In particular, we will pick x uniformly at random from all prime
numbers in the range 1 to n^2. (The phrase "uniformly at random" means
that we will pick each such prime number with the same probability.)

Now, A-B is an n-bit number, and so |A-B| < 2^n. This implies that A-B
can have at most n prime factors (do you see why? each prime factor is
at least 2). On the other hand, the prime number theorem says that there
are about n^2/(2 ln n) = Theta(n^2/log n) primes in the range 1 to n^2.
This means that at most an O((log n)/n) fraction of the prime numbers
between 1 and n^2 divide A-B. So the probability that we pick x to be
one of these factors is at most O((log n)/n). In other words, the
protocol fails with probability O((log n)/n), but with probability
1 - O((log n)/n) it returns the correct answer. (Note that (log n)/n is
extremely small for large n.) The number of bits we send across is
O(log (n^2)) = O(log n).

---------------------------------------------------------------------

The message to take away from this example is that often you can trade
off the correctness of an algorithm for running time. This is one of
the guiding principles behind the design of randomized algorithms. We
will see two kinds of randomized algorithms in this course:

1. Algorithms that always have a small running time but output the
   wrong answer with a small probability.
2. Algorithms that always output the correct answer but have a large
   running time with a small probability.

In the latter case, we will bound the "average" or expected running
time of the algorithm. We will now look at an example of the second
kind.

Example #2: Contention resolution
=================================

See section 13.1 of the book.

---------------------------------------------------------------------