Vertex Sparsifiers: New Results from Old Techniques

Given a capacitated graph $G = (V,E)$ and a set of terminals $K \subseteq V$, how should we produce a graph $H$ only on the terminals $K$ so that every (multicommodity) flow between the terminals in $G$ could be supported in $H$ with low congestion, and vice versa? (Such a graph $H$ is called a flow-sparsifier for $G$.) What if we want $H$ to be a "simple" graph? What if we allow $H$ to be a convex combination of simple graphs? Improving on results of Moitra [FOCS 2009] and Leighton and Moitra [STOC 2010], we give efficient algorithms for constructing: (a) a flow-sparsifier $H$ that maintains congestion up to a factor of $O(\log k/\log \log k)$, where $k = |K|$, (b) a convex combination of trees over the terminals $K$ that maintains congestion up to a factor of $O(\log k)$, and (c) for a planar graph $G$, a convex combination of planar graphs that maintains congestion up to a constant factor. This requires us to give a new algorithm for the 0-extension problem, the first one in which the preimages of each terminal are connected in $G$. Moreover, this result extends to minor-closed families of graphs. Our improved bounds immediately imply improved approximation guarantees for several terminal-based cut and ordering problems.


Introduction
Given an undirected capacitated graph G = (V, E) and a set of terminal nodes K ⊆ V, we consider the question of producing a graph H only on the terminals K so that the congestion incurred on G and H for any multicommodity flow routed between terminal nodes is similar. Often, we will want the graph H to be structurally "simpler" than G as well. Such a graph H will be called a flow-sparsifier for G; the loss (also known as quality) of the flow-sparsifier is the factor by which the congestions in the graphs G and H differ. For instance, when K = V, the results of [Räc08] give a convex combination of trees H with a loss of O(log n). We call this a tree-based flow-sparsifier, since it uses a convex combination of trees. Here and throughout, k = |K| denotes the number of terminals, and n = |V| the size of the graph.
For the case where K ⊊ V, it was shown by Moitra [Moi09] and by Leighton and Moitra [LM10] that for every G and K, there exists a flow-sparsifier H = (K, E_H) whose loss is O(log k/ log log k), and moreover, one can efficiently find an H′ = (K, E_{H′}) whose loss is O(log² k/ log log k). They used these to give approximation algorithms for several terminal-based problems, where the approximation factor depended poly-logarithmically on the number of terminals k, and not on n. We note that they construct an arbitrary graph on K, and do not attempt to directly obtain "simple" graphs; e.g., to get tree-based flow-sparsifiers on K, they apply [Räc08] to H′, and increase the loss by an O(log k) factor.
In this paper, we simplify and unify some of these results: we show that using the general framework of interchanging distance-preserving mappings and capacity-preserving mappings from [Räc08] (which was reinterpreted in an abstract setting by Andersen and Feige [AF09]), we obtain the following improvements over the results of [Moi09, LM10].

1. We show that using the 0-extension results [CKR04, FHRT03] in the framework of [Räc08, AF09] almost immediately gives us efficient constructions of flow-sparsifiers with loss O(log k/ log log k). While the existential result of [LM10] also used the connection between 0-extensions and flow-sparsifiers, the algorithmically-efficient version of the result was done ab initio, increasing the loss by another O(log k) factor. We use existing machinery, thereby simplifying the exposition somewhat, and avoiding the increased loss.
Combined with the embedding of [GNRS04], this gives an alternate proof of the fact that the metric induced on the vertices of a single face of a planar graph can be embedded into a distribution over trees [LS09].
4. The results for planar graphs are in fact much more general. Suppose G is a β_G-decomposable graph (see the definition in Section 1.1). Then we can efficiently output a distribution over graphs H_f = (K, E_f) such that these are all minors of G, and the expected stretch max_{u,v∈V} E[d_{H_f}(f(u), f(v))]/d_G(u, v) is bounded by O(β_G log β_G). Now applying the same ideas of interchanging distance and capacity preservation, given any G and K, we can find minor-based flow-sparsifiers with loss O(β_G log β_G).
5. Finally, we show some lower bounds on flow-sparsifiers: we show that flow-sparsifiers that are 0-extensions of the original graph must have loss at least Ω(√log k) in the worst case. For this class of possible flow-sparsifiers, this improves on the Ω(log log k) lower bound for sparsifiers proved in [LM10]. We also show that any flow-sparsifier that only uses edge capacities which are bounded from below by a constant must suffer a loss of Ω(√(log k/ log log k)) in the worst case.
We can use these results to improve the approximation ratios of several application problems. In many cases, constructions based on trees allow us to use better algorithms. Our results are summarized in Table 1. Note that apart from the two linear-arrangement problems, our results smoothly approach the best known results for the case k = n.  Many of these applications further improve when the graph comes from a minor-closed family (and hence has good β-decompositions): e.g., for the Steiner Minimum Linear Arrangement problem on planar graphs, we can get an O(log log k)-approximation by using our minor-based flow-sparsifiers to reduce the problem to planar instances on the k terminals. Finally, in Appendix C we show how to get better approximations for the Steiner linear arrangement problems above using direct LP/SDP approaches.

Notation
Our graphs will have edge lengths or capacities; all edge lengths will be denoted by ℓ : E → R_{≥0}, and edge costs/capacities will be denoted by c : E → R_{≥0}. When we refer to a graph (G, ℓ), we mean a graph G with edge lengths ℓ(·); similarly, (H, c) denotes one with capacities c(·). When there is potential for confusion, we will add subscripts (e.g., c_H(·) or ℓ_G(·)) for disambiguation. Given a graph (G, ℓ), the shortest-path distance under the edge lengths ℓ is denoted by d_G(·, ·). Given a graph G = (V, E) and a subset of vertices K ⊆ V designated as terminals, a retraction is a map f : V → K such that f(x) = x for all x ∈ K. For (G, c) and terminals K ⊆ V, a K-flow in G is a multicommodity flow whose sources and sinks lie in K.
Decomposition of Metrics. Let (X, d) be a metric space with terminals K ⊂ X. A partition (i.e., a set of disjoint "clusters") P of X is called ∆-bounded if every cluster S ∈ P satisfies max_{u,v∈S} d(u, v) ≤ ∆. The metric (X, d) with terminals K is called β-decomposable if for every ∆ > 0 there is a polynomial-time algorithm to sample from a probability distribution µ over partitions of X, with the following properties:

• Diameter bound: Every partition P ∈ supp(µ) is ∆-bounded.

• Separation event: For every u, v ∈ X, the probability that u and v lie in different clusters of a partition P drawn from µ is at most β · d(u, v)/∆.

β-decompositions of metrics have become standard tools with many applications; for more information see, e.g., [LN05].
We say that a graph G = (V, E) is β-decomposable if for all nonnegative edge lengths ℓ_G, the resulting shortest-path metric d_G is β-decomposable. Additionally, we assume that each cluster S in any partition P induces a connected subgraph of G; if not, break such a cluster into its connected components. The diameter bound and separation probabilities for edges remain unchanged by this operation; the separation probability for non-adjacent pairs (u, v) can be bounded by β · d(u, v)/∆ by noting that some edge on the u-v shortest path must be separated for (u, v) to be separated, and applying the union bound.
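To illustrate the refinement step just described, here is a small self-contained sketch (the function name and graph representation are our own choices, not from the paper):

```python
def connected_refine(adj, partition):
    """Split each cluster of `partition` into the connected components it
    induces in the graph `adj` (dict: vertex -> list of neighbors).  As
    argued above, diameter bounds and edge-separation probabilities are
    unaffected by this refinement."""
    refined = []
    for cluster in partition:
        members, seen = set(cluster), set()
        for s in cluster:
            if s in seen:
                continue
            comp, stack = {s}, [s]  # DFS restricted to this cluster
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if v in members and v not in comp:
                        comp.add(v)
                        stack.append(v)
            seen |= comp
            refined.append(sorted(comp))
    return refined
```

For example, on the path 0-1-2-3, the cluster {0, 2} is not connected and is split into the singletons {0} and {2}, while {0, 1} is left intact.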

0-Extensions
In this section we provide a definition of 0-extension which is somewhat different than the standard definition, and review some known results for 0-extensions. We also derive in Corollary 2.4 a variation of a known result on tree embeddings, which will be applied in Section 3.
A 0-extension of a graph (G = (V, E), ℓ_G) with terminals K ⊆ V is usually defined as a retraction f : V → K. We define a 0-extension to be a retraction f : V → K along with another graph (H = (K, E_H), ℓ_H); here, the length function ℓ_H : E_H → R_{≥0} is defined as ℓ_H(x, y) = d_G(x, y) for every edge (x, y) ∈ E_H. Note that this immediately implies d_H(x, y) ≥ d_G(x, y) for all x, y ∈ K. Note also that H_f defined in Section 1 is a special case of H in which E_H = {(f(u), f(v)) : (u, v) ∈ E}, whereas, in general, H is allowed more flexibility (e.g., H can be a tree). This flexibility is precisely the reason we are interested both in the retraction f and in the graph H: we will often want H to be structurally simpler than G (just like we want a flow-sparsifier to be simpler than the original graph).
For a (randomized) algorithm A that takes as input (G, ℓ_G) and outputs a (random) 0-extension (H, ℓ_H), the stretch factor of algorithm A is the minimum α ≥ 1 such that E[d_H(f(u), f(v))] ≤ α · d_G(u, v) for all u, v ∈ V. The following are well-known results for 0-extension.

Theorem 2.1 ([CKR04, FHRT03]) There is a randomized polynomial-time algorithm for 0-extension with stretch factor α = O(log k/ log log k).

Theorem 2.2 There is a randomized polynomial-time algorithm for 0-extension that, whenever the shortest-path metric d_G is β-decomposable, achieves stretch factor α = O(β). In particular, if the graph G belongs to a non-trivial family of graphs that is minor-closed, it follows from [KPR93, FT03] that α = O(1).

0-Extension With Trees
The following result is an extension of the tree-embedding theorem of Fakcharoenphol et al. [FRT04], where the difference is that the following result ensures the non-contracting property (a) only for terminal-terminal pairs, but replaces the O(log n) by O(log k) in the expected stretch between any pair of nodes.

Theorem 2.3 ([GNR10]) There is a randomized polynomial-time algorithm that takes as input a graph G = (V, E) with terminals K ⊆ V and outputs a (random) edge-weighted 2-HST T = (I ∪ L, E_T) with internal nodes I and leaves L, and a map f : V → L, such that (a) d_T(f(x), f(y)) ≥ d_G(x, y) for all x, y ∈ K, (b) E[d_T(f(x), f(y))] ≤ O(log k) · d_G(x, y) for all x, y ∈ V, and (c) for each non-terminal v ∈ V \ K, either there exists a terminal x_v sharing the leaf node with it (i.e., f(x_v) = f(v)), or some leaf descending from the parent of f(v) contains a terminal.

Corollary 2.4 (Tree 0-extension) There is a randomized polynomial-time algorithm A_GNR for 0-extension that has α_GNR = O(log k); furthermore, the graphs output by the algorithm are trees on the vertex set K.
Proof: To prove the corollary, we need to show an algorithm that takes as input a graph G = (V, E) with terminals K ⊆ V and outputs a (random) edge-weighted tree T = (K, E_T) and a retraction f : V → K such that (a') d_T(x, y) ≥ d_G(x, y) for all x, y ∈ K, and (b') E[d_T(f(u), f(v))] ≤ O(log k) · d_G(u, v) for all u, v ∈ V. We start by sampling a random tree T′ = (I ∪ L, E′) and associated map f from the distribution of Theorem 2.3. We can take any leaf l ∈ L whose pre-image set only contains non-terminals, remove the leaf, and remap all v ∈ f^{-1}(l) to some other leaf that is a descendant of l's parent node and also contains a terminal. (Such a leaf is guaranteed to exist by property (c) of Theorem 2.3.) While both the tree and the map change, we continue to call the modified tree T′ and the map f. We repeat this process until all leaves in the modified tree T′ contain at least one terminal. We can also assume that all terminals are at non-zero distance from each other (else we can remove some terminals, do the same proof, and add back the removed terminals at the end); now property (a) implies each leaf contains at most one terminal. Hence f|_K is a 1-1 correspondence between the terminal set K and the remaining leaves in the tree T′. Since the tree T′ is a 2-HST, the distances in the tree between a remapped non-terminal and any other node in T′ (apart from the one it was identified with) do not change.
We can now remove all internal nodes in the modified version of T′ (using, say, [Gup01]) to get a tree T′′ = (L, E′′) on just the (erstwhile) leaves such that none of the f(u)-f(v) distances are shrunk, and they are stretched by a factor of at most 8. The bijection between the set L and the terminals K allows us to view the tree T′′ as being on the node set K, and the map f as being a retraction from V to K. Finally, shrinking the edges of the tree T′′ only makes the expected stretch smaller, so we can reduce the length of any tree edge e = (x, y) in T′′ and set it equal to d_G(x, y). Call this final tree T; it is immediate from properties (a) and (b) that this random T and the associated retraction f : V → K satisfy properties (a') and (b') above, where the big-Oh term in property (b') hides an extra stretch of 8 due to this post-processing.
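The easy part of such internal-node removal, pruning non-terminal leaves and splicing out degree-2 non-terminals, preserves the remaining pairwise distances exactly; a sketch follows (a hypothetical helper, not the [Gup01] algorithm itself, which must also handle higher-degree Steiner nodes with constant stretch):

```python
def prune_steiner(adj, terminals):
    """adj: dict v -> dict neighbor -> edge length (an undirected tree).
    Repeatedly delete non-terminal leaves and splice out non-terminal
    degree-2 vertices (adding the two incident lengths); distances between
    surviving vertices are preserved exactly."""
    adj = {v: dict(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in terminals or v not in adj:
                continue
            if len(adj[v]) == 1:                 # non-terminal leaf: drop it
                (u, _), = adj[v].items()
                del adj[u][v]
                del adj[v]
                changed = True
            elif len(adj[v]) == 2:               # splice out degree-2 node
                (a, la), (b, lb) = adj[v].items()
                del adj[a][v], adj[b][v], adj[v]
                w = la + lb
                adj[a][b] = min(w, adj[a].get(b, w))
                adj[b][a] = adj[a][b]
                changed = True
    return adj
```

For instance, the path t1 - s1 - s2 - t2 with lengths 1, 2, 3 and a pendant Steiner leaf collapses to a single edge (t1, t2) of length 6.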

Flow-Sparsifiers
Recall that given an edge-capacitated graph (G, c) and a set K ⊆ V of terminals, a flow-sparsifier with quality ρ is another capacitated graph (H = (K, E H ), c H ) such that (a) any feasible K-flow in G can be feasibly routed in H, and (b) any feasible K-flow in H can be routed in G with congestion ρ.

Interchanging distance and capacity
We now use the framework of Räcke [Räc08], as interpreted by Andersen and Feige [AF09]. Given a graph G = (V, E), let P be a collection of multisets of E, which will henceforth be called paths. A mapping M : E → P maps each edge e to a path M(e) in P. Such a map can be represented as a matrix M ∈ Z^{|E|×|E|}, where M_{e,e′} is the number of times the edge e′ appears in the path (multiset) M(e). Given a collection M of mappings (which we call the admissible mappings), a probabilistic mapping is a probability distribution over (or, convex combination of) admissible mappings; i.e., values λ_M ≥ 0 for each M ∈ M such that Σ_{M∈M} λ_M = 1.
• The stretch of an edge e under a mapping M, with respect to edge lengths ℓ, is ℓ(M(e))/ℓ(e) = (Σ_{e′} M_{e,e′} ℓ(e′))/ℓ(e); the stretch of a probabilistic mapping is the maximum over all edges of their average stretch.

• The load of an edge e′ under a mapping M, with respect to edge capacities c, is (Σ_e c(e) M_{e,e′})/c(e′); the congestion of a probabilistic mapping is the maximum over all edges of their expected load.

Theorem 3.1 (Transfer Theorem [Räc08, AF09]) Fix a collection M of admissible mappings, and suppose that for every collection of edge lengths ℓ_e, there is a probabilistic mapping with stretch at most ρ. Then for every collection of edge capacities c_e, there is a probabilistic mapping with congestion at most ρ, and vice versa.
In our settings, the techniques of Räcke [Räc08] can be used to make the result algorithmic: if one can efficiently sample from the probabilistic mapping with stretch ρ (which is true for the settings in this paper), one can efficiently sample from a probabilistic mapping with congestion O(ρ) (and vice versa). In fact, one can obtain an explicit distribution on polynomially many admissible mappings. We defer further discussion of efficiency issues to the full version of the paper.
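For concreteness, here is a small sketch (our own representation, not from the paper) of how the stretch and congestion of a probabilistic mapping can be computed, storing each mapping row M_{e,·} as a dict of multiplicities:

```python
def avg_stretch(mappings, lam, length):
    """Stretch of the probabilistic mapping {(M, lam_M)}: the maximum over
    edges e of the average of len(M(e))/len(e).  Each mapping is a list
    whose e-th entry is a dict {e2: multiplicity}, i.e., the row M_{e,e2}."""
    m = len(length)
    return max(
        sum(l * sum(mult * length[e2] for e2, mult in M[e].items())
            for M, l in zip(mappings, lam)) / length[e]
        for e in range(m))

def congestion(mappings, lam, cap):
    """Congestion: the maximum over edges e2 of the expected load
    (sum over e of cap(e) * M_{e,e2}) divided by cap(e2)."""
    m = len(cap)
    return max(
        sum(l * sum(cap[e] * M[e].get(e2, 0) for e in range(m))
            for M, l in zip(mappings, lam)) / cap[e2]
        for e2 in range(m))
```

The identity mapping has stretch 1 and congestion 1; routing edge 0 over edges 1 and 2 (each also carrying itself) has stretch and congestion 2 under unit lengths and capacities.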

Tree-Based Flow Sparsifiers
The distance mappings we will consider will be similar to Räcke's application. Given a tree T = (K, E_T) and a retraction f : V → K produced by Corollary 2.4, let us first fix, for each tree edge (u, v) ∈ E_T, a canonical shortest path S_uv between u and v in G. Writing P_T(a, b) for the set of edges on the unique a-b path in T, define for each edge (w, x) ∈ E:

M_T((w, x)) = ⊎_{(u,v) ∈ P_T(f(w), f(x))} S_uv.    (3.1)

In other words, this maps each tree edge (w, x) to its canonical path; for each non-tree edge (w, x), it considers the edges on the tree-path between the images of w and x in the tree, and maps (w, x) to the disjoint union of the canonical paths of these edges. Recall that M_T((w, x)) is a multiset. In the corresponding matrix representation, M_{e,e′} is the multiplicity of e′ in the multiset ⊎_{(u,v)∈P_T(f(w), f(x))} S_uv. Corollary 2.4 now implies the following:

Theorem 3.2 Given a graph (G, ℓ) with terminals K ⊆ V(G), there is a polynomial-time procedure to sample from a probabilistic mapping (which is a distribution over tree 0-extensions) with stretch O(log k).

Now we can apply the Transfer Theorem. Recall that in a K-flow, all source-sink pairs belong to the set K.
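A sketch of computing the map M_T from (3.1) (helper names are hypothetical): given parent pointers for the tree T, the retraction f, and a canonical path S_uv for each tree edge, the image of a graph edge (w, x) is the multiset union of canonical paths along the f(w)-f(x) tree path:

```python
from collections import Counter

def tree_path_edges(parent, a, b):
    """Edges of the unique a-b path in a rooted tree given parent pointers."""
    def ancestors(v):
        chain = [v]
        while parent[v] is not None:
            v = parent[v]
            chain.append(v)
        return chain
    A, B = ancestors(a), ancestors(b)
    common = next(v for v in A if v in set(B))  # lowest common ancestor
    edges = []
    for chain in (A, B):
        for v in chain:
            if v == common:
                break
            edges.append((v, parent[v]))
    return edges

def mapping_3_1(parent, canonical, f, w, x):
    """M_T((w, x)): multiset union of the canonical paths S_uv over the
    tree edges (u, v) on the tree path between f(w) and f(x)."""
    M = Counter()
    for (u, v) in tree_path_edges(parent, f[w], f[x]):
        key = (u, v) if (u, v) in canonical else (v, u)
        M.update(canonical[key])
    return M
```

If f(w) = f(x), the tree path is empty and (w, x) is mapped to the empty multiset, so such edges contribute no load.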

Theorem 3.3 (Tree-Based Flow-Sparsifiers) Given an edge-capacitated graph (G, c) and a set of terminals K ⊆ V, there is a randomized polynomial-time algorithm to construct a tree-based flow-sparsifier (a convex combination of capacitated trees on the terminal set K) with loss O(log k).

Since c_{T,f}(e_T) is the sum of the capacities of all edges e = (w, x) such that e_T lies on the unique tree-path between f(w) and f(x), the load induced on any edge e′ of G when such a flow is routed along the canonical paths is exactly the expected load for e′ under the notion of admissible maps defined in (3.1); hence it is bounded by the congestion (the maximum expected load over all edges), which is at most ρ_dist by Theorem 3.1. This proves condition (b) above, that the congestion needed to route any K-flow of the convex combination H in the graph G is at most ρ_dist.

General Flow Sparsifiers

Theorem 3.4 (Flow-Sparsifiers) Given any graph G and terminals K, there is a randomized polynomial-time algorithm to output a flow-sparsifier H with loss O(log k/ log log k).
Proof: If we use Theorem 2.1 instead of the tree 0-extension result (Corollary 2.4), the constructive version of the Transfer Theorem gives a polynomial number of graphs H_1, H_2, . . . on the vertex set K such that a convex combination of these graphs is a flow-sparsifier for the original graph G with loss O(log k/ log log k). We can then construct a single graph H by setting the capacity of each edge to the appropriate weighted combination of the capacities of that edge in the graphs H_i; all feasible K-flows in G can be routed in H, and all feasible K-flows in H can be routed in G with congestion O(log k/ log log k).

The same idea using 0-extension results for β-decomposable graphs (Theorem 2.2) gives us the following:

Theorem 3.5 (Flow-Sparsifiers for Minor-Closed Families) For any graph G that is β-decomposable and any K, there is a randomized polynomial-time algorithm to construct a flow-sparsifier with loss O(β).
Note that the decomposability holds if G belongs to a non-trivial minor-closed-family G (e.g., if G is planar). However, Theorem 3.5 does not claim that the flow-sparsifier for G also belongs to the family G; this is the question we resolve in the next section.
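The step of collapsing a convex combination of sparsifiers into a single graph H (used in the proof of Theorem 3.4) is a simple capacity average; a minimal sketch with our own naming:

```python
def collapse_convex_combination(graphs, lam):
    """Given sampled sparsifiers H_1, H_2, ... on the same terminal set
    (each a dict {edge: capacity}) and convex weights lam_i summing to 1,
    build the single graph H with c_H(e) = sum_i lam_i * c_{H_i}(e)."""
    assert abs(sum(lam) - 1.0) < 1e-9
    H = {}
    for caps, l in zip(graphs, lam):
        for e, c in caps.items():
            H[e] = H.get(e, 0.0) + l * c
    return H
```

Any flow feasible in the combination is feasible in H by averaging, and conversely a flow in H can be split across the H_i in proportion to the weights λ_i.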

Connected 0-Extensions and Minor-Based Flow-Sparsifiers
The results in this section apply to β-decomposable graphs. A prominent example of such graphs is planar graphs, which (along with every family of graphs excluding a fixed minor) are O(1)-decomposable [KPR93, FT03]. Thus, Theorem 4.1, Corollary 4.2 and Theorem 4.3 below all apply to planar graphs (and more generally to excluded-minor graphs) with β = O(1). We now state our results for β-decomposable graphs in general. In Section 4.2 we define a related notion called terminal-decomposability, and show analogous results for β̂-terminal-decomposable graphs.
In what follows we use the definition of 0-extension from Section 2 with H = H_f, i.e., E_H = {(f(u), f(v)) : (u, v) ∈ E}; hence the 0-extension is completely defined by the retraction f. We say that a 0-extension f is connected if for every x, the preimage f^{-1}(x) induces a connected subgraph of G. Our main result shows that we get connected 0-extensions with stretch O(β log β) for β-decomposable metrics.
Theorem 4.1 (Connected 0-Extension) There is a randomized polynomial-time algorithm that, given (G = (V, E), ℓ_G) with terminals K such that d_G is β-decomposable, produces a connected 0-extension f : V → K such that for all u, v ∈ V, we have E[d_{H_f}(f(u), f(v))] ≤ O(β log β) · d_G(u, v).

Note that if f is a connected 0-extension, the graph H_f is a minor of G. Applying Theorem 3.1 to interchange the distance preservation with capacity preservation, we get the following analogue of Theorem 3.3.

Corollary 4.2 (Minor-Based Flow-Sparsifiers) Given an edge-capacitated graph (G, c) such that d_G is β-decomposable, and terminals K ⊆ V, there is a randomized polynomial-time algorithm to construct a flow-sparsifier with loss O(β log β) that is a convex combination of minors of G, each on the vertex set K.

Since planar graphs are O(1)-decomposable and since their minors are planar, by Corollary 4.2 they have an efficiently constructible planar-based flow-sparsifier with quality O(1). By Theorem 4.1, they always have a connected 0-extension with stretch at most O(1). An interesting consequence of the latter result is that given any planar graph (G, ℓ_G) and a set K of terminals, we can "remove" the non-terminals and get a related planar graph on K while preserving inter-terminal distances in expectation. This generalizes a result of Gupta [Gup01], who showed a similar result for trees. (Obviously, this extends to every family of graphs excluding a fixed minor.)

Theorem 4.3 (Steiner Points Removal)
There is a randomized polynomial-time algorithm that, given (G = (V, E), ℓ_G) and K such that d_G is β-decomposable, outputs random minors H = (K, E_H) of G (with lengths ℓ_H) such that d_H(x, y) ≥ d_G(x, y) and E[d_H(x, y)] ≤ O(β log β) · d_G(x, y) for all x, y ∈ K.

Note that these results only give us an O(log n log log n)-approximation for connected 0-extension on arbitrary graphs (or an O(log² k log log k)-approximation using the results of Section 4.2). We can improve that to O(log k); the details are in Section 4.3.

Theorem 4.4 (Connected CKR)
There is a randomized polynomial-time algorithm that, on input (G = (V, E), ℓ_G) and K, produces a connected 0-extension f with stretch factor O(log k). Using the semi-metric relaxation for 0-extension, we get a connected 0-extension whose cost is at most O(log k) times that of the optimal (possibly disconnected) 0-extension. To our knowledge, this is the first approximation algorithm for connected 0-extension, and in fact it shows that the gap between the optimum connected 0-extension and the optimum 0-extension is bounded by O(log k). The same is true with an O(1) bound for planar graphs. We remark that the connected 0-extension problem is a special case of the connected metric labeling problem, which has recently received attention in the vision community [VKR08, NL09].

The Algorithm for Decomposable Metrics
We now give the algorithm behind Theorem 4.1. Assume that edge lengths ℓ_G are integral and scaled such that the shortest edge is of length 1. Let the diameter of the metric be at most 2^δ. For each vertex v ∈ V, define A_v = min_{x∈K} d_G(v, x) to be the distance to the closest terminal. The algorithm maintains a partial mapping f at each point in time: some of the f(v)'s may be undefined (denoted by f(v) = ⊥) during the run, but f is a well-defined 0-extension when the algorithm terminates. We say a vertex v ∈ V is mapped if f(v) ≠ ⊥. The algorithm appears as Algorithm 1.
Algorithm 1 Algorithm for Connected 0-extension
1: input: (G = (V, E), ℓ_G) and terminals K ⊆ V
2: set f(x) ← x for all x ∈ K, and f(v) ← ⊥ for all v ∈ V \ K
3: for i = 1, 2, . . . , δ do
4:   let r_i ← 2^i
5:   sample a β-decomposition of d_G with diameter bound r_i to get a partition P
6:   for all clusters C_s in the partition P that contain both mapped and unmapped vertices do
7:     for all connected components C of unmapped vertices in C_s with a mapped neighbor in C_s do
8:       pick such a mapped neighbor w_C of C
9:       set f(v) ← f(w_C) for all v ∈ C

We can assume that in round δ = log diam(G), the partitioning algorithm returns a single cluster, in which case all vertices are mapped and the algorithm terminates. Let f_i be the mapping at the end of iteration i. For x ∈ K, let V^x_i denote f_i^{-1}(x), the set of nodes colored x. The following claim follows inductively:

Lemma 4.5 For every iteration i and x ∈ K, the set V^x_i induces a connected subgraph of G.

Proof:
We prove the claim inductively. For i = 0, there is nothing to prove since V^x_0 = {x}. Suppose that in iteration i, we map vertex u to x, so that u ∈ V^x_i. Then for some component C containing u, the mapped neighbor w_C chosen by the algorithm was in V^x_{i−1}. Since we map all of C to x, there is a path connecting u to w_C in V^x_i. Inductively, w_C is connected to x in V^x_{i−1} ⊆ V^x_i, and the claim follows. The following lemma will be useful in the analysis of the stretch; it says that any node mapped in iteration i is mapped to a terminal at distance O(2^i).

Lemma 4.6 For every iteration i and x ∈ K, and every u ∈ V^x_i, we have d_G(u, x) ≤ 2r_i.
Proof: The proof is inductive. For i = 0, the claim is immediate. Suppose that in iteration i, we map vertex u to x, so that u ∈ V^x_i. Then for some component C containing u, the mapped neighbor w_C chosen by the algorithm was in V^x_{i−1}. Moreover, u and w_C were in the same cluster in the decomposition, so that d(u, w_C) ≤ r_i. Inductively, d(w_C, x) ≤ 2r_{i−1} = r_i, and the claim follows by the triangle inequality.
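To make the iterative mapping procedure concrete, here is a self-contained Python sketch of Algorithm 1 for unit edge lengths; the partitioning subroutine is a simplified stand-in for a true β-decomposition (random-radius BFS regions), so this illustrates the mapping and connectivity logic, not the stretch guarantee:

```python
import random
from collections import deque

def low_diameter_partition(adj, delta, rng):
    """Partition into connected clusters of diameter <= delta (unit
    lengths): grow BFS regions of random radius <= delta/2 from randomly
    ordered centers, passing only through unassigned vertices."""
    order = list(adj)
    rng.shuffle(order)
    cluster = {}
    for c in order:
        if c in cluster:
            continue
        radius = rng.randint(delta // 4, delta // 2)
        cluster[c] = c
        q = deque([(c, 0)])
        while q:
            u, d = q.popleft()
            if d == radius:
                continue
            for v in adj[u]:
                if v not in cluster:
                    cluster[v] = c
                    q.append((v, d + 1))
    parts = {}
    for v, c in cluster.items():
        parts.setdefault(c, set()).add(v)
    return list(parts.values())

def connected_zero_extension(adj, terminals, seed=0):
    """Sketch of Algorithm 1: map every vertex to a terminal so that each
    preimage f^{-1}(x) induces a connected subgraph of G."""
    rng = random.Random(seed)
    f = {x: x for x in terminals}
    i = 1
    while len(f) < len(adj):
        for S in low_diameter_partition(adj, 2 ** i, rng):
            unmapped = {v for v in S if v not in f}
            while unmapped:
                # peel off one connected component C of unmapped vertices
                s = next(iter(unmapped))
                C, q = {s}, deque([s])
                while q:
                    u = q.popleft()
                    for v in adj[u]:
                        if v in unmapped and v not in C:
                            C.add(v)
                            q.append(v)
                # a mapped neighbor w_C inside the same cluster, if any
                w = next((v for u in C for v in adj[u]
                          if v in f and v in S), None)
                if w is not None:
                    for u in C:
                        f[u] = f[w]
                unmapped -= C
        i += 1
    return f
```

On a path with terminals at the two ends, the returned retraction maps each vertex to one of the two terminals, and each preimage is a contiguous (hence connected) interval.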
In the rest of the section, we bound the stretch of the 0-extension: for every edge e = (u, v) of G, we show that E[d_{H_f}(f(u), f(v))] ≤ O(β log β) · d_G(u, v). Since either f(u) = f(v), or (f(u), f(v)) is an edge of H_f, we have d_{H_f}(f(u), f(v)) ≤ d_G(f(u), f(v)), and so it's enough to prove the claim for d_G. The analogous claim for non-adjacent pairs will follow by the triangle inequality, but then with d_{H_f}. We say that the edge e = (u, v) is settled in round j if the later of its endpoints gets mapped in this round; e is untouched after round j if both u and v are unmapped at the end of round j. Assume d_G(u, K) ≤ d_G(v, K), and let A_e denote the distance d_G(u, K). Let j_e := ⌊log(A_e)⌋ − 1.

Lemma 4.7 For an edge e = (u, v): (a) edge e is untouched after round j_e − 1; (b) if edge e is settled in round j, then d_G(f(u), f(v)) ≤ d_G(u, v) + 2^{j+2}.

Proof: For (a), if one of the endpoints of e were mapped before round j_e, then Lemma 4.6 would give a terminal at distance at most 2 · 2^{j_e−1} = 2^{j_e} from it; but 2 · 2^{j_e} ≤ A_e = d_G(e, K), a contradiction. For (b), both d_G(u, f(u)) and d_G(v, f(v)) are at most 2 · 2^j = 2^{j+1} by Lemma 4.6; the triangle inequality completes the proof.
Let B_j denote the "bad" event that the edge is settled in round j and both endpoints are mapped to different terminals. Let z := max{A_e, d_G(u, v)}. We want to use the bound E[d_G(f(u), f(v))] ≤ Σ_j Pr[B_j] · (d_G(u, v) + 2^{j+2}).

Claim 4.8 Pr[B_j] ≤ min{4β z/2^j, 1} · O(β d_G(u, v)/2^j).
Proof: Recall that an edge is untouched after round j′ if neither of its endpoints is mapped at the end of this round. For this to happen, u must be separated from its closest terminal in the clustering in round j′, which happens with probability at most min{β A_e/2^{j′}, 1}. Also recall that the probability that an edge e = (u, v) is cut in a round j′ is at most β d_G(u, v)/2^{j′}. Let i denote the round in which the edge is first touched. We upper bound the probability of the event B_j separately depending on how i and j compare. Note that for j ≤ 2, the right-hand side is at least 1, so the claim holds trivially.

• i ≤ j − 2. For B_j to occur, the edge e must be cut in rounds j − 2 and j − 1, as otherwise it would already be settled in one of these rounds. The probability of this is at most min{β d_G(u,v)/2^{j−2}, 1} · β d_G(u,v)/2^{j−1} ≤ min{4β z/2^j, 1} · 2β d_G(u,v)/2^j.

• i = j − 1. For B_j to occur, the edge e must be cut in round j − 1 and must be untouched after round j − 2. The probability of this is at most min{4β A_e/2^j, 1} · 2β d_G(u,v)/2^j.

• i = j. For B_j to occur, e must be cut in round j and must be untouched after round j − 1. The probability of this is at most min{2β A_e/2^j, 1} · β d_G(u,v)/2^j.

Since A_e ≤ z and d_G(u, v) ≤ z, the claim follows in all three cases.
Lemma 4.7(b) implies that if the edge is settled before round j_d := ⌊log(d_G(u, v))⌋, the conditional expectation of d_G(f(u), f(v)) is O(d_G(u, v)). Moreover, the edge e cannot be settled before round j_e = ⌊log(A_e)⌋ − 1, by Lemma 4.7(a). Let j_m := max{j_d, j_e}. It therefore suffices to show that Σ_{j ≥ j_m} 2^j · Pr[B_j] = O(β log β) · d_G(u, v). Plugging the upper bound for Pr[B_j] into the left-hand side, we get Σ_{j ≥ j_m} min{4β z/2^j, 1} · O(β d_G(u, v)). Here z = max{A_e, d_G(u, v)} ≤ max{2^{j_e+2}, 2^{j_d+1}} ≤ 2^{j_m+2}, so the first O(log β) terms contribute O(β d_G(u, v)) each, while the remaining terms form a geometric series and sum to O(β d_G(u, v)). This completes the proof of Theorem 4.1.

Terminal Decompositions
The general theorem for connected 0-extensions gives a guarantee in terms of its decomposition parameter β, and in general this quantity may depend on n. This seems wasteful, since we decompose the entire metric while we mostly care about separating the terminals.
To this end, we define terminal decompositions (the reader might find it useful to contrast them with the definition of decompositions in Section 1.1). A partial partition of a set X is a collection of disjoint subsets (called "clusters") of X. A metric (X, d) with terminals K is called β̂-terminal-decomposable if for every ∆ > 0 there is a probability distribution µ over partial partitions of X, with the following properties:

• Diameter bound: Every partial partition P ∈ supp(µ) is connected and ∆-bounded.
• Terminal partition: For all x ∈ K, every partial partition P ∈ supp(µ) has a cluster containing x.
• Terminal-centered clusters: For every partial partition P ∈ supp(µ), every cluster S ∈ P contains a terminal.

• Separation event: For every u, v ∈ X, the probability that u and v do not lie in the same cluster of a partial partition P drawn from µ is at most β̂ · d(u, v)/∆.
A graph G = (V, E) with terminals K is β̂-terminal-decomposable if for all nonnegative lengths ℓ_G assigned to its edges, the resulting shortest-path metric d_G with terminals K is β̂-terminal-decomposable. Throughout, we assume that there is a polynomial-time algorithm that, given the metric, the terminals and ∆ as input, samples a partial partition P from µ. Note that if K = V, the above definitions coincide with the definitions of β-decomposable metrics and graphs.
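A minimal sketch of sampling a partial partition with terminal-centered, connected, diameter-bounded clusters for unit edge lengths (a simplified stand-in for a true terminal decomposition, with no claim about separation probabilities):

```python
import random
from collections import deque

def terminal_partial_partition(adj, terminals, delta, seed=0):
    """Grow BFS regions of random radius <= delta/2 around terminals only,
    passing through uncovered vertices.  Clusters are connected,
    terminal-centered and delta-bounded; non-terminals far from every
    terminal stay uncovered."""
    rng = random.Random(seed)
    order = list(terminals)
    rng.shuffle(order)
    cluster = {}
    for t in order:
        if t in cluster:  # t was already swallowed by an earlier cluster
            continue
        radius = rng.randint(0, delta // 2)
        cluster[t] = t
        q = deque([(t, 0)])
        while q:
            u, d = q.popleft()
            if d == radius:
                continue
            for v in adj[u]:
                if v not in cluster:
                    cluster[v] = t
                    q.append((v, d + 1))
    parts = {}
    for v, c in cluster.items():
        parts.setdefault(c, set()).add(v)
    return list(parts.values())
```

On a path with terminals at the two ends and small ∆, the middle vertices remain uncovered, which is exactly the behavior the partial-partition definition allows.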
Our main theorem for terminal-decomposable metrics is the following:

Theorem 4.9 Given (G = (V, E), ℓ_G), suppose d_G is β̂-terminal-decomposable with respect to terminals K. There is a randomized polynomial-time algorithm that produces a connected 0-extension f : V → K with stretch factor O(β̂² log β̂).

This theorem is interesting when β̂ is much less than β, the decomposability of the metric itself. E.g., one can alter the CKR decomposition scheme to get β̂(k, n) = O(log k), while β = O(log n).

The Modified Algorithm.
Algorithm 2 for the terminal-decomposable case is very similar to Algorithm 1: the main difference is that since in each iteration we only obtain a partial partition of the vertices, we color only the nodes that lie in clusters of this partial partition.
A few words about the algorithm: recall that a partial partition returns a set of connected, diameter-bounded clusters such that each cluster contains at least one terminal, and each terminal is in exactly one cluster; we use V_x to denote the cluster containing x ∈ K. (Hence either V_x = V_y or V_x ∩ V_y = ∅.) Now when we delete from some cluster V_x all the vertices that are already mapped, this includes the terminal x, and hence there is at least one candidate for w_C in Line 9. Eventually, there will be only one cluster, in which case all vertices are mapped and the algorithm terminates.
Algorithm 2 Algorithm for Connected 0-extension: the terminal-decomposable case
1: input: (G = (V, E), ℓ_G) and terminals K ⊆ V
2: set f(x) ← x for all x ∈ K, and f(v) ← ⊥ for all v ∈ V \ K
3: for i = 1, 2, . . . , δ do
4:   let r_i ← 2^i
5:   find a β̂-terminal-decomposition of d_G with diameter bound r_i; let V_x be the cluster containing terminal x
6:   for all clusters V_x that contain unmapped vertices do
7:     delete from V_x all vertices that are already mapped
8:     for all connected components C of the remaining vertices in V_x with a mapped neighbor do
9:       pick such a mapped neighbor w_C of C, and set f(v) ← f(w_C) for all v ∈ C

The analysis for Theorem 4.9 is almost the same as for Theorem 4.1; the only difference is that Claim 4.8 is replaced by the following weaker claim, which immediately gives the O(β̂² log β̂) bound.

Claim 4.10 Pr[B_j] ≤ min{8β̂ z/2^j, 1} · 16β̂² d_G(u, v)/2^j.
Proof: Recall that an edge is untouched after round j′ if neither of its endpoints is mapped at the end of this round. For this to happen, u must be separated from its closest terminal in the clustering in round j′, which happens with probability at most min{β̂ A_e/2^{j′}, 1}. Also recall that the probability that an edge e = (u, v) is cut in a round j′ is at most β̂ d_G(u, v)/2^{j′}. Let i denote the round in which the edge is first touched. We upper bound the probability of the event B_j separately depending on how i and j compare. Note that for j ≤ 3, the right-hand side is at least 1, so the claim holds trivially.
• i ≤ j − 3. For B_j to occur, the edge must be cut in round i, and it must be either untouched or cut in rounds j − 1 and j − 2, as otherwise it would already be settled in one of these rounds. The probability of this is at most min{β̂ d_G(u,v)/2^i, 1} times the probability of the events in rounds j − 1 and j − 2. If β̂ d_G(u,v)/2^i ≤ 1, this already yields the claimed bound. Otherwise, observe that i ≥ j_e, as the edge cannot be touched before round j_e; hence 2^i ≥ A_e/4, and plugging this in gives a bound of min{8β̂ z/2^j, 1} · 16β̂² d_G(u,v)/2^j as well.

• i = j − 2. For B_j to occur, the edge e must be cut in round j − 2 and it must be cut or untouched in round j − 1, as otherwise it would already be settled in one of these rounds. The probability of this is at most min{8β̂ z/2^j, 1} · 16β̂² d_G(u,v)/2^j by a similar calculation; the remaining cases i = j − 1 and i = j are analogous.

Future Directions
We gave a set of results on and around the idea of flow-sparsifiers and 0-extensions. Some of these results are not tight, and it would be interesting to obtain better bounds for these problems. Another interesting direction for future work is this: define an ℓ-sparse-extension of a graph G = (V, E) with terminals K to be a graph H = (Z, E_H) on a vertex set Z with K ⊆ Z ⊆ V and |Z| ≤ ℓ, together with a retraction f : V → Z, such that d_H(x, y) ≥ d_G(x, y) for all x, y ∈ Z. (Note that a |K|-sparse-extension is just a 0-extension; one possible |V|-sparse-extension is G itself.) What if we consider ℓ-sparse-extensions (H, f) with E[d_H(f(u), f(v))] ≤ α · d_G(u, v) for all u, v ∈ V, where ideally ℓ = poly(k) and α = O(1) (or just α ≪ log k/ log log k)? In other words, if we are willing to retain a small number of non-terminals, can we achieve better stretch bounds? Note that standard lower bounds for 0-extension have the property that |V| = poly(k); hence the entire graph G is a "good" solution (a poly(k)-sparse-extension with α = 1).

A Lower Bounds
In this section, we show two kinds of lower bounds. The first shows that any flow-sparsifier that is a convex combination of 0-extensions must suffer a loss of Ω(√log k); for this class of sparsifiers, this improves on the Ω(log log k) lower bound for (arbitrary) flow-sparsifiers [LM10]. The second shows that any flow-sparsifier that only uses edge capacities which are bounded from below by a constant must suffer a loss of Ω(√(log k/ log log k)).

A.1 Lower Bounds for 0-extension-based Sparsifiers
The following result can be viewed as a consequence of the duality between 0-extensions and 0-extension-based flow-sparsifiers (Theorem 3.1); by that theorem, not only do good 0-extension algorithms give good 0-extension-based flow-sparsifiers, the converse is also true, and hence one can use a lower bound of Calinescu et al. [CKR04] to infer lower bounds on 0-extension-based flow-sparsifiers. The following theorem gives the explicit construction obtained thus.

Theorem A.1 For infinitely many values of k, there is a graph G = (V, E) and a set K ⊂ V of size k for which any flow-sparsifier that is a convex combination of 0-extension graphs has quality at least Ω(√log k).
Proof: We use the lower bound of Ω(√(log k)) on the 0-extension integrality ratio due to Calinescu et al. [CKR04]. For completeness we describe their construction. Let G be an expander with n vertices, maximum degree ∆, and expansion at least α, where ∆ and α are fixed parameters. Define l = √(log n) and k = n/l. Choose any k distinct vertices h_1, . . . , h_k ∈ V(G) and add k new paths of length l starting at these vertices and ending at new vertices labeled 1, . . . , k. Denote the resulting graph by G′ (note that |V(G′)| = O(n) and |E(G′)| = O(n)), and let the terminals K be the new vertices {1, . . . , k}. Set the capacities and lengths of all edges to 1, and let dist_G′(s, t) denote the resulting shortest-path distance in G′ between terminals s and t.
For the described instance (G′, K) of the 0-extension problem, Calinescu et al. exhibit a semimetric of cost |E(G′)|, namely the shortest-path metric dist_G′ in G′ with respect to edge lengths of 1: indeed, since every edge has length 1, Σ_{(u,v)∈E(G′)} dist_G′(u, v) = |E(G′)|. On the other hand, they show that there exists a universal constant c > 0 such that for any 0-extension function f : V(G′) → K, Σ_{(u,v)∈E(G′)} dist_G′(f(u), f(v)) ≥ c · √(log k) · |E(G′)|.
We use the instance (G′, K) to construct a feasible solution to the dual LP whose objective value β is at least Ω(√(log k)). To construct the feasible solution, we need to specify the values of the dual variables ℓ(e) for e ∈ E(G′), and dist(s, t) for s, t ∈ K. We set ℓ(e) = 1/|E(G′)|; thus Σ_{e∈E(G′)} cap_G′(e) · ℓ(e) = 1. We set dist(s, t) to be the shortest-path distance between terminals s, t in G′ with respect to the edge lengths ℓ(e). Clearly the specified values of ℓ(e) and dist(s, t) form a feasible solution to the dual LP for some value of β. By the above bound of Calinescu et al., for any 0-extension function f we have Σ_{(u,v)∈E(G′)} dist(f(u), f(v)) ≥ c · √(log k), where distances are now measured with respect to the scaled lengths ℓ(e). Thus, we have shown that there exists a feasible solution to the dual LP with β ≥ Ω(√(log k)). This implies that for any convex combination of 0-extensions H = Σ_i λ_i H_i, the minimum congestion of routing H in G′ is at least Ω(√(log k)), completing the proof.

A.2 Lower Bounds for Sparsifiers having no Small Edges
Theorem A.2 For infinitely many values of k, there is a graph G = (V, E) and a terminal set K ⊂ V of size k for which any flow-sparsifier with edge capacities at least ε > 0 has quality at least Ω(ε · √(log k)/log log k).
Proof: Let n be a sufficiently large prime. Let G = (V, E) be a graph whose nodes correspond to the elements of Z_n and that contains an edge {u, v} if v = u + 1, v = u − 1, or v = u⁻¹ (all operations are over Z_n, and we define 0⁻¹ as 0). In other words, the graph consists of a Hamiltonian cycle plus some additional edges. This graph G is a 3-regular expander (see, e.g., [HLW06]).
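As a concrete illustration (not part of the proof), the following sketch builds this graph for a small prime and checks the cycle-plus-inverse-chords structure; the modulus n = 13 is an arbitrary choice.

```python
# Sketch: build the expander on Z_n with edges {u, u+1} and {u, u^-1},
# following the construction in the proof (n must be prime).
def build_expander(n):
    def inv(u):
        return 0 if u == 0 else pow(u, -1, n)  # define 0^-1 = 0

    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in ((u + 1) % n, (u - 1) % n, inv(u)):
            if v != u:              # skip self-loops (e.g. 1^-1 = 1)
                adj[u].add(v)
                adj[v].add(u)
    return adj

adj = build_expander(13)
# Node 2 sees its two cycle neighbours 1, 3 plus its inverse 7 (2*7 = 14 = 1 mod 13).
assert adj[2] == {1, 3, 7}
```

Since the inverse map is an involution, every node has degree at most 3, with a few nodes (such as 0, 1, and n − 1, whose inverses are themselves) of degree 2.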
Choose the set of terminals K as {i · ⌈√(log n)⌉ | 0 ≤ i ≤ k − 1}, with k = n/⌈√(log n)⌉. To simplify notation, we will omit floor and ceiling operations in the following. For i ∈ [0, k − 1], let B_i be the set of the √(log n) nodes on the Hamiltonian cycle between the i-th and the (i + 1)-st terminal, including the former but excluding the latter.
Let H = (K, E_H) be a flow-sparsifier for G with edge capacities at least ε > 0. Let d be the maximum weighted degree of H, where the weighted degree of a node is the sum of the capacities of its incident edges.

Claim A.3 The maximum weighted degree d of H is at least c′ · ε · √(log n)/log log n, for some constant c′.
Proof: Consider a demand of 1/k between all pairs of terminals.
Since the minimum edge capacity is at least ε, the unweighted degree of H is at most d/ε. Due to this bounded degree, for sufficiently large k, there are at least k²/4 terminal pairs that have distance at least log k/(2 log(d/ε)) from each other (see, e.g., [CKR04, Lemma 4.2]).
Each of these pairs induces a load of 1/k on at least log k/(2 log(d/ε)) edges. Therefore, the total load in the network is at least k log k/(8 log(d/ε)). Since H has at most k · d/(2ε) edges, the congestion in H is at least ε log k/(4d log(d/ε)).
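The counting in the last two paragraphs can be sanity-checked numerically. The sketch below (an illustration, not the [CKR04] lemma itself) bounds the number of node pairs within distance r in a graph of maximum degree D by k times the ball size, and confirms that for r = log k/(2 log D), most pairs remain far apart; the parameters k = 10⁶, D = 4 are arbitrary.

```python
import math

# In a graph of maximum degree D, a ball of radius r around any node
# contains at most 1 + D + D^2 + ... + D^r nodes.
def ball_size(D, r):
    return sum(D**i for i in range(r + 1))

def far_pairs_lower_bound(k, D, r):
    near = k * ball_size(D, r)   # crude bound on ordered pairs within distance r
    return k * k - near          # ordered pairs at distance > r

k, D = 10**6, 4
r = int(math.log(k) / (2 * math.log(D)))   # r = log k / (2 log D)
assert far_pairs_lower_bound(k, D, r) >= k * k // 4
```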
The same demand can be routed with congestion at most (c + 1) · √(log n) in G, for some constant c depending on the edge expansion of G. Say each terminal i sends a total flow of 1. We can distribute this flow evenly among the √(log n) nodes in B_i using only edges inside B_i, with congestion at most 1; this can easily be done by sending the flow along the Hamiltonian cycle to reach every node in B_i. Now, we route a uniform multicommodity flow on the whole expander, where the flow leaving each node is 1/√(log n), i.e., the demand between every pair of nodes is 1/(n · √(log n)). This requires congestion at most c log n · (1/√(log n)) = c · √(log n) [LR99]. Finally, the flow in each B_i is routed inside B_i to the respective terminal; again, this can easily be done with congestion 1. In total, we have sent a flow of 1/k between all pairs of terminals, and the congestion is bounded by c · √(log n) + 2 ≤ (c + 1) · √(log n).
Hence, we have identified a demand that requires congestion at least ε log k/(4d log(d/ε)) in H but can be routed with congestion at most (c + 1) · √(log n) in G. Since H is a flow-sparsifier, its congestion has to be bounded by the congestion in G, and thus ε log k/(4d log(d/ε)) ≤ (c + 1) · √(log n). It follows that d log(d/ε) ≥ ε log k/(4(c + 1) · √(log n)). Using the fact that k = n/√(log n), and hence log k = Θ(log n), this gives d log(d/ε) ≥ Ω(ε · √(log n)), and the claim follows.
Now pick a node u in H that has weighted degree at least c′ · ε · √(log n)/log log n (such a node exists by Claim A.3). Consider the situation in which the demand between u and every other node corresponds to the capacity of the edge connecting them in H, and all other demands are 0. Clearly, in H this can be routed with congestion 1. The terminal in G corresponding to node u, however, has degree only 3. Therefore, routing this demand in G results in congestion at least c′ · ε · √(log n)/(3 log log n) ≥ c′ · ε · √(log k)/(3 log log k), since that is the load on at least one of the edges incident to u in G.

B Applications
Most of these applications were considered by Moitra [Moi09], and Leighton and Moitra [LM10]; we show how our results above give improved approximations to the problems.

B.1 Steiner Oblivious Routing
Theorem 3.3 is an exact analogue of Räcke's theorem on general flows [Räc08] for the special case of K-flows, and hence immediately gives an O(log k)-oblivious routing scheme for K-flows.

B.2 Steiner Minimum Linear Arrangement
Given G = (V, E) and K ⊆ V with |K| = k, the goal in the Steiner Minimum Linear Arrangement (SMLA) problem is to find a mapping F : V → [k] such that F|_K : K → [k] is a bijection, minimizing Σ_{(u,v)∈E} c(u, v) · |F(u) − F(v)|. Note that for the non-Steiner MLA case where K = V, Rao and Richa [RR98] gave an O(log n)-approximation for general graphs and an O(log log n)-approximation for graphs that admit O(1)-padded decompositions (a family which includes all trees).
For our algorithm, we take a random tree/retraction pair (T, f) from the distribution above; this ensures that the cost of the optimal map F* (viewed as a solution to the MLA problem on T) increases by an expected O(log k) factor. Now solving the MLA problem on the tree to within an O(log log k) factor to get a map F_T : K → [k], and defining F(x) = F_T(f(x)), gives us an expected O(log k log log k)-approximation. We show in Appendix C that this can be improved slightly to O(log k) using a more direct approach.

B.3 Requirement Cut
For requirement cut, [GNR10] already present an O(log k log g)-approximation.

B.4 Steiner Graph Bisection
In this problem, we are given a value k′ and want to find a bipartition (A, V \ A) of the graph such that |A ∩ K| = k′, minimizing the cost of the edges cut by the bipartition. The approach of Räcke, which embeds the graph G into a random tree and finds the best (k′, k − k′) bipartition on that tree, gives us an O(log k)-approximation algorithm for this partitioning problem.
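The tree step, finding the best (k′, k − k′) bipartition of the terminals on a tree, can be solved exactly by a standard subtree dynamic program. A minimal sketch of one way to do this (the instance at the bottom is a made-up example):

```python
from collections import defaultdict

def best_terminal_bisection(n, edges, terminals, k_prime):
    """Min cost of cut edges in a tree bipartition with exactly
    k_prime terminals on side 1. edges: list of (u, v, weight);
    vertices are 0..n-1; the tree is rooted at vertex 0."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    term = set(terminals)

    def dfs(v, parent):
        # dp[s] maps j -> min cost of cut edges in v's subtree, with v
        # placed on side s and j subtree terminals on side 1.
        dp = [{0: 0}, {1: 0}] if v in term else [{0: 0}, {0: 0}]
        for c, w in adj[v]:
            if c == parent:
                continue
            cdp = dfs(c, v)
            ndp = [{}, {}]
            for s in (0, 1):
                for j, cost in dp[s].items():
                    for sc in (0, 1):
                        cut = w if s != sc else 0   # edge (v, c) crosses?
                        for jc, ccost in cdp[sc].items():
                            val = cost + ccost + cut
                            key = j + jc
                            if val < ndp[s].get(key, float('inf')):
                                ndp[s][key] = val
            dp = ndp
        return dp

    dp = dfs(0, -1)
    return min(dp[0].get(k_prime, float('inf')),
               dp[1].get(k_prime, float('inf')))

# Made-up example: path 0-1-2-3 with a heavy middle edge; all vertices
# are terminals and we want 2 terminals on each side.
cost = best_terminal_bisection(4, [(0, 1, 1), (1, 2, 5), (2, 3, 1)],
                               terminals=[0, 1, 2, 3], k_prime=2)
assert cost == 2   # cut edges (0,1) and (2,3): sides {0,3} and {1,2}
```

The knapsack-style merge over children keeps one table entry per achievable terminal count, so the running time is polynomial in the tree size.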

B.5 Steiner ℓ-Multicut
In this problem, we are given terminal pairs {s_i, t_i}_{i∈[k]} and a value k′ ≤ k, and we want to find a minimum-cost set of edges whose deletion separates at least k′ terminal pairs. Again, we can embed the graph into a random tree losing an O(log k) factor, and use the theorem of Golovin et al. [GNS06] to get an (8/3 + ε)-approximation on this tree; this gives us a randomized O(log k)-approximation.

B.6 Steiner Min-Cut Linear Arrangement
The Steiner Min-Cut Linear Arrangement (SMCLA) problem is defined as follows: Given G = (V, E) and K ⊆ V with |K| = k, we want to find a mapping F : V → [k] such that F|_K : K → [k] is a bijection. The goal is to minimize max_i Σ_{x∈F⁻¹([i]), y∉F⁻¹([i])} c_xy, where [i] denotes {1, . . . , i}. For the non-Steiner version of the problem, Leighton and Rao [LR99] show that given an α-approximation to the balanced partitioning (or to the bisection) problem, one can get an O(α log n)-approximation to the MCLA problem. Using [ARV09], this gives an O(log^1.5 n)-approximation to the MCLA problem.
We note that the reduction works immediately for the Steiner version of the problem: given an α-approximation to Steiner bisection, one gets an O(α log k)-approximation to SMCLA. Thus we get an O(log² k)-approximation to the SMCLA problem. We show in Appendix C that this can be improved to O(log^1.5 k) using a more direct approach.

C Better Algorithms Using a Direct Approach
The vertex sparsifiers give a modular approach to solving the Steiner versions of various problems. Not surprisingly, for some of these problems a direct attack leads to better algorithms. In this section, we show that applying known techniques for the Minimum Linear Arrangement (MLA) problem leads to better approximation ratios for Steiner MLA and for Steiner Min-Cut Linear Arrangement.

C.1 Steiner Minimum Linear Arrangement
Recall that the Steiner MLA problem is defined as follows. Given G = (V, E) and K ⊆ V with |K| = k, the goal is to find a mapping F : V → [k] such that F|_K : K → [k] is a bijection, minimizing Σ_{(u,v)∈E} c(u, v) · |F(u) − F(v)|. Specifically, we show the following result:
Theorem C.1 There is a polynomial time O(log k)-approximation algorithm for the SMLA problem.
Consider the linear program that minimizes Σ_{(u,v)∈E} c(u, v) · d(u, v) over all metrics d on V satisfying the spreading constraints Σ_{v∈S} d(u, v) ≥ (|S|² − 1)/4 for every u ∈ V and every S ⊆ K. It follows from [ENRS00] that the above is a valid linear programming relaxation of the SMLA problem, and that one can efficiently separate over the spreading constraints, so the LP can be solved in polynomial time using the Ellipsoid algorithm. Further, it is easy to check that the spreading constraints imply that for any u ∈ K and any r ≥ 1, |B_d(u, r) ∩ K| ≤ 5r. (Here, B_d(v, r) = {w | d(v, w) ≤ r} is the "ball" around v of radius r in the metric d.)
Let d be a solution to the above linear program. Since d is a metric on V, it follows from Theorem 2.3 that we can construct a (random) edge-weighted 2-HST T = (I ∪ K, E_T) with internal nodes I and leaves K, and a retraction f : V → K, such that (a) d_T(f(x), f(y)) ≥ d(x, y) for all x, y ∈ K (with probability 1), and (b) E[d_T(f(x), f(y))] ≤ O(log k) · d(x, y) for all x, y ∈ V. We argue that given this HST, we can construct a mapping F_T : V → [k] such that F_T|_K : K → [k] is a bijection. This mapping will have the property that |F_T(u) − F_T(v)| ≤ 5 d_T(f(u), f(v)). The approximation ratio of O(log k) then follows from property (b) above.
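As a quick numerical check of the ball bound (assuming spreading constraints of the form Σ_{v∈S} d(u, v) ≥ (|S|² − 1)/4): if s terminals lie in a ball of radius r around u, then s · r ≥ (s² − 1)/4, and the largest such s is at most 5r for every integer r ≥ 1.

```python
import math

# Largest integer s satisfying s*r >= (s^2 - 1)/4, i.e. s^2 - 4rs - 1 <= 0:
# by the quadratic formula, s <= 2r + sqrt(4r^2 + 1).
def max_ball_terminals(r):
    return int(2 * r + math.sqrt(4 * r * r + 1))

# The spreading constraints therefore force |B_d(u, r) ∩ K| <= 5r.
for r in range(1, 1000):
    assert max_ball_terminals(r) <= 5 * r
```

In fact 2r + √(4r² + 1) ≤ 4r + 1 ≤ 5r for all r ≥ 1, which is exactly the claim used in the text.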
The mapping F_T is defined by taking the natural left-to-right ordering on K defined by T, and assigning every other vertex v ∈ V to the position of f(v). Formally, let π be a pre-order traversal of T. For every terminal x ∈ K, set F_T(x) to the number of terminals in π that occur up to and including x, i.e., F_T(x) = |K ∩ {π_i : i ≤ π⁻¹(x)}|. For every other vertex u ∈ V, set F_T(u) = F_T(f(u)). It is easy to check that F_T|_K is a bijection.
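A minimal sketch of this construction (the tree, terminal set, and retraction below are made-up examples; the HST is represented simply as nested lists with terminals at the leaves):

```python
# Pre-order numbering of the terminals of an HST given as nested lists
# (internal nodes are lists, leaves are terminal names).
def preorder_leaves(tree, out=None):
    if out is None:
        out = []
    if isinstance(tree, list):
        for child in tree:
            preorder_leaves(child, out)
    else:
        out.append(tree)          # a leaf, i.e. a terminal
    return out

def linear_arrangement(tree, f):
    """f maps every vertex to a terminal, with f(x) = x for terminals."""
    order = preorder_leaves(tree)
    pos = {t: i + 1 for i, t in enumerate(order)}   # F_T on terminals
    return {v: pos[f[v]] for v in f}                # F_T(u) = F_T(f(u))

# Hypothetical instance: terminals a..d, two non-terminals u, v.
tree = [['a', 'b'], ['c', 'd']]
f = {'a': 'a', 'b': 'b', 'c': 'c', 'd': 'd', 'u': 'b', 'v': 'c'}
F = linear_arrangement(tree, f)
assert [F[t] for t in 'abcd'] == [1, 2, 3, 4]   # bijection on the terminals
assert F['u'] == 2 and F['v'] == 3              # non-terminals inherit f's position
```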
We next upper bound |F_T(u) − F_T(v)| for u, v ∈ V. Consider the terminals t_u = f(u) and t_v = f(v); if t_u = t_v, then F_T(u) = F_T(v) and there is nothing to prove. Else, let T_uv be the smallest subtree of T containing t_u and t_v. By the properties of the HST, we have d_T(t_u, t_v) ≥ d_T(t_u, z) for all z ∈ T_uv; moreover, d_T(f(u), f(v)) = d_T(t_u, t_v). Every terminal whose position lies between F_T(t_u) and F_T(t_v) is a leaf of T_uv, and for every such terminal z, property (a) gives d(t_u, z) ≤ d_T(t_u, z) ≤ d_T(t_u, t_v). Hence all these terminals lie in B_d(t_u, r) ∩ K for r = d_T(t_u, t_v), and so |F_T(u) − F_T(v)| ≤ |B_d(t_u, r) ∩ K| ≤ 5r = 5 d_T(t_u, t_v) (by the spreading property). This proves Theorem C.1.

C.2 Steiner Min Cut Linear Arrangement
Recall that the Steiner Min-Cut Linear Arrangement (SMCLA) problem is defined as follows. Given G = (V, E) and K ⊆ V with |K| = k, the goal is to find a mapping F : V → [k] such that F|_K : K → [k] is a bijection, minimizing max_i Σ_{x∈F⁻¹([i]), y∉F⁻¹([i])} c_xy. Specifically, we show the following result: Theorem C.2 There is a polynomial time O(log^1.5 k)-approximation algorithm for the SMCLA problem.