Abstract

Variational problems that involve Wasserstein distances and more generally optimal transport (OT) theory are playing an increasingly important role in data sciences. Such problems can be used to form an exemplar measure out of various probability measures, as in the Wasserstein barycenter problem, or to carry out parametric inference and density fitting, where the loss is measured in terms of an optimal transport cost to the measure of observations. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. Entropic regularization has recently emerged as an efficient tool to approximate the solution of such variational Wasserstein problems. In this paper, we give a thorough duality tour of these regularization techniques. In particular, we show how important concepts from classical OT such as $c$-transforms and semidiscrete approaches translate into similar ideas in a regularized setting. These dual formulations lead to smooth variational problems, which can be solved using smooth, differentiable, and convex optimization problems that are simpler to implement and numerically more stable than their unregularized counterparts. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spatial regularization functionals.

Keywords

  1. optimal transport
  2. Wasserstein barycenter
  3. Sinkhorn algorithm
  4. entropic regularization
  5. gradient flows
  6. convex optimization

MSC codes

  1. 49N99
  2. 49J20
  3. 65D99
  4. 68U10
  5. 90C08
  6. 90C25

Get full access to this article

View all available purchase options and get full access to this article.

References

1.
I. Abraham, R. Abraham, M. Bergounioux, and G. Carlier, Tomographic reconstruction from a few views: A multi-marginal optimal transport approach, Appl. Math. Optim., 75 (2017), pp. 55--73.
2.
M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal., 43 (2011), pp. 904--924, https://doi.org/10.1137/100805741.
3.
Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much Faster Algorithms for Matrix Scaling, arXiv preprint, https://arxiv.org/abs/1704.02315, 2017.
4.
L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures, Springer, New York, 2006.
5.
F. Aurenhammer, Power diagrams: Properties, algorithms and applications, SIAM J. Comput., 16 (1987), pp. 78--96, https://doi.org/10.1137/0216006.
6.
F. Aurenhammer, F. Hoffmann, and B. Aronov, Minkowski-type theorems and least-squares clustering, Algorithmica, 20 (1998), pp. 61--76.
7.
A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learn. Res., 6 (2005), pp. 1705--1749.
8.
H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer-Verlag, New York, 2011.
9.
A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183--202, https://doi.org/10.1137/080716542.
10.
J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM J. Sci. Comput., 37 (2015), pp. A1111--A1138, https://doi.org/10.1137/141000439.
11.
J. Bigot and T. Klein, Consistent Estimation of a Population Barycenter in the Wasserstein Space, preprint, https://arxiv.org/abs/1212.2562v1, 2012.
12.
N. Bonneel, G. Peyré, and M. Cuturi, Wasserstein barycentric coordinates: Histogram regression using optimal transport, ACM Trans. Graphics, 35 (2016), art. 71.
13.
N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, Sliced and radon Wasserstein barycenters of measures, J. Math. Imaging Vision, 51 (2015), pp. 22--45.
14.
N. Bonneel, M. van de Panne, S. Paris, and W. Heidrich, Displacement interpolation using Lagrangian mass transport, ACM Trans. Graphics (SIGGRAPH ASIA'11), 30 (2011), art. 158.
15.
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
16.
M. Burger, M. Franeka, and C.-B. Schonlieb, Regularised regression and density estimation based on optimal transport, Appl. Math. Res. Express, 2 (2012), pp. 209--253.
17.
G. Carlier, A. Oberman, and E. Oudet, Numerical methods for matching for teams and Wasserstein barycenters, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1621--1642.
18.
G. Carlier and C. Poon, On the total variation Wasserstein gradient flow and the TV-JKO scheme, ESAIM Control Optim. Calc. Var., to appear, https://doi.org/10.1051/cocv/2018042.
19.
V. Caselles, A. Chambolle, and M. Novaga, The discontinuity set of solutions of the TV denoising problem and some extensions, Multiscale Model. Simul., 6 (2007), pp. 879--894, https://doi.org/10.1137/070683003.
20.
E. Cazelles, J. Bigot, and N. Papadakis, Regularized barycenters in the Wasserstein space, in Geometric Science of Information, F. Nielsen and F. Barbaresco, eds., Springer International, Cham, 2017, pp. 83--90.
21.
T. M. Chan, Optimal output-sensitive convex hull algorithms in two and three dimensions, Discrete Comput. Geom., 16 (1996), pp. 361--368.
22.
S. Claici, E. Chien, and J. Solomon, Stochastic Wasserstein barycenters, in ICML'18, 2018.
23.
M. B. Cohen, A. Madry, D. Tsipras, and A. Vladu, Matrix scaling and balancing via box constrained Newton's method and interior point methods, in Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), IEEE, 2017, pp. 902--913.
24.
R. Cominetti and J. San Martín, Asymptotic analysis of the exponential penalty trajectory in linear programming, Math. Programming, 67 (1994), pp. 169--187.
25.
N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, Optimal transport for domain adaptation, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), pp. 1853--1865, https://doi.org/10.1109/TPAMI.2016.2615921.
26.
M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, in Advances in Neural Information Processing Systems 26, 2013, pp. 2292--2300.
27.
M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 685--693.
28.
M. Cuturi and G. Peyré, A smoothed dual approach for variational Wasserstein problems, SIAM J. Imaging Sci., 9 (2016), pp. 320--343, https://doi.org/10.1137/15M1032600.
29.
F. De Goes, K. Breeden, V. Ostromoukhov, and M. Desbrun, Blue noise through optimal transport, ACM Trans. Graphics, 31 (2012), art. 171.
30.
J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra Appl., 114 (1989), pp. 717--735.
31.
A. Genevay, M. Cuturi, G. Peyré, and F. Bach, Stochastic optimization for large-scale optimal transport, in Advances in Neural Information Processing Systems, 2016, pp. 3440--3448.
32.
U. Gianazza, G. Savaré, and G. Toscani, The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation, Arch. Rational Mech. Anal., 194 (2009), pp. 133--220, https://doi.org/10.1007/s00205-008-0186-5.
33.
R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker--Planck equation, SIAM J. Math. Anal., 29 (1998), pp. 1--17, https://doi.org/10.1137/S0036141096303359.
34.
P. A. Knight and D. Ruiz, A fast algorithm for matrix balancing, IMA J. Numer. Anal., 33 (2013), pp. 1029--1047.
35.
D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), pp. 788--791.
36.
J. Lellmann, D. A. Lorenz, C. Schönlieb, and T. Valkonen, Imaging with Kantorovich--Rubinstein discrepancy, SIAM J. Imaging Sci., 7 (2014), pp. 2833--2859, https://doi.org/10.1137/140975528.
37.
C. Léonard, A survey of the Schrödinger problem and some of its connections with optimal transport, Discrete Contin. Dyn. Syst., 34 (2014), pp. 1533--1574.
38.
B. Lévy, A numerical algorithm for $l^2$ semi-discrete optimal transport in 3D, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1693--1715.
39.
B. Lévy and E. Schwindt, Notions of optimal transport theory and how to implement them on a computer, Comput. & Graphics, 72 (2018), pp. 135--148.
40.
J. Maas, M. Rumpf, C. Schönlieb, and S. Simon, A generalized model for optimal transport of images including dissipation and density modulation, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1745--1769.
41.
Q. Mérigot, A multiscale approach to optimal transport, Comput. Graph. Forum, 30 (2011), pp. 1583--1592.
42.
S. Minsker, S. Srivastava, L. Lin, and D. B. Dunson, Robust and scalable Bayes via a median of subset posterior measures, J. Mach. Learn. Res., 18 (2017), pp. 1--40, http://jmlr.org/papers/v18/16-655.html.
43.
F. Nielsen, Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms, IEEE Signal Process. Lett., 20 (2013), pp. 657--660.
44.
V. Oliker and L. D. Prussner, On the numerical solution of the equation $\frac{\partial^2 z}{\partial x^2} \frac{\partial^2 z}{\partial y^2}-( \frac{\partial^2 z}{\partial x\partial y} )^2=f$ and its discretizations, I, Numer. Math., 54 (1989), pp. 271--293.
45.
F. Otto, The geometry of dissipative evolution equations: The porous medium equation, Comm. Partial Differential Equations, 26 (2001), pp. 101--174.
46.
O. Pele and M. Werman, Fast and robust earth mover's distances, in ICCV'09, 2009.
47.
G. Peyré, Entropic approximation of Wasserstein gradient flows, SIAM J. Imaging Sci., 8 (2015), pp. 2323--2351, https://doi.org/10.1137/15M1010087.
48.
G. Peyré and M. Cuturi, Computational Optimal Transport, preprint, https://arxiv.org/abs/1803.00567, 2018.
49.
J. Rabin and N. Papadakis, Convex color image segmentation with optimal transport distances, in Proc. SSVM'15, 2015.
50.
Y. Rubner, C. Tomasi, and L. Guibas, The earth mover's distance as a metric for image retrieval, Internat. J. Computer Vision, 40 (2000), pp. 99--121.
51.
F. Santambrogio, Optimal Transport for Applied Mathematicians, Birkhäuser, 2015.
52.
B. Schmitzer, Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, preprint, https://arxiv.org/abs/1610.06519, 2016.
53.
B. Schmitzer and C. Schnorr, Object segmentation by shape matching with Wasserstein modes, in Proc. EMMCVPR'13, Lecture Notes in Comput. Sci. 8081, Springer, Berlin, Heidelberg, 2013, pp. 123--136.
54.
R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann. Math. Statist., 35 (1964), pp. 876--879.
55.
J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas, Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains, ACM Trans. Graphics (Proc. SIGGRAPH 2015), 34 (2015), art. 66, http://gpeyre.github.io/papers/2015-SIGGRAPH-convolutional-ot.pdf.
56.
J. Solomon, R. Rustamov, L. Guibas, and A. Butscher, Wasserstein propagation for semi-supervised learning, in Proc. ICML 2014, 2014.
57.
M. Sugiyama, H. Nakahara, and K. Tsuda, Tensor balancing on statistical manifold, in Proceedings of ICML, 2017, pp. 3270--3279.
58.
P. Swoboda and C. Schnörr, Convex variational image restoration with histogram priors, SIAM J. Imaging Sci., 6 (2013), pp. 1719--1735, https://doi.org/10.1137/120897535.
59.
C. Villani, Optimal Transport: Old and New, Grundlehren Math. Wiss. 338, Springer-Verlag, 2009.
60.
G.-S. Xia, S. Ferradans, G. Peyré, and J.-F. Aujol, Synthesizing and mixing stationary Gaussian texture models, SIAM J. Imaging Sci., 7 (2014), pp. 476--508, https://doi.org/10.1137/130918010.
61.
G. Zen, E. Ricci, and N. Sebe, Simultaneous ground metric learning and matrix factorization with earth movers distance, in Proc. ICPR'14, 2014.

Information & Authors

Information

Published In

cover image SIAM Review
SIAM Review
Pages: 941 - 965
ISSN (online): 1095-7200

History

Published online: 8 November 2018

Keywords

  1. optimal transport
  2. Wasserstein barycenter
  3. Sinkhorn algorithm
  4. entropic regularization
  5. gradient flows
  6. convex optimization

MSC codes

  1. 49N99
  2. 49J20
  3. 65D99
  4. 68U10
  5. 90C08
  6. 90C25

Authors

Affiliations

Metrics & Citations

Metrics

Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

Cited By

View Options

View options

PDF

View PDF

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share with email

Email a colleague

Share on social media