Abstract

Conditional mean embeddings (CMEs) have proven to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces by providing a linear-algebraic relation between the kernel mean embeddings of the joint and conditional probability distributions. Both centered and uncentered covariance operators have been used to define CMEs in the existing literature. In this paper, we develop a mathematically rigorous theory for both variants, discuss the merits and problems of each, and significantly weaken the conditions for applicability of CMEs. In the course of this, we demonstrate a beautiful connection to Gaussian conditioning in Hilbert spaces.
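
To make the linear-algebraic relation concrete, the sketch below implements the standard regularized empirical CME estimator from the literature on synthetic data: the embedding of P(Y | X = x) is approximated by sum_i beta_i(x) l(., Y_i) with beta(x) = (K_X + n lambda I)^{-1} k_X(x). This is only an illustrative sketch; the Gaussian kernel, the toy data, and the regularization parameter lam are assumptions made here and are not taken from the paper.

```python
# A minimal sketch of the regularized empirical conditional mean embedding (CME):
#     mu_{Y|X=x} ~ sum_i beta_i(x) * l(., Y_i),  beta(x) = (K_X + n*lam*I)^{-1} k_X(x),
# with a Gaussian kernel on toy data. All numerical choices here are illustrative.
import numpy as np


def gaussian_gram(A, B, bandwidth=1.0):
    """Gram matrix k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)) for 1-d inputs."""
    diff = A[:, None] - B[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-3.0, 3.0, size=n)
Y = np.sin(X) + 0.1 * rng.standard_normal(n)   # Y | X = x is approximately N(sin(x), 0.01)

lam = 1e-3                                      # Tikhonov regularization parameter
K_X = gaussian_gram(X, X)                       # Gram matrix of the input samples
x_star = 1.0
k_x = gaussian_gram(X, np.array([x_star]))      # kernel evaluations k(X_i, x_star), shape (n, 1)

# Weights of the empirical CME at x_star.
beta = np.linalg.solve(K_X + n * lam * np.eye(n), k_x)

# Pairing the embedding with g(y) = y estimates E[Y | X = x_star] = sin(x_star).
print((Y @ beta).item(), np.sin(x_star))
```

Pairing the empirical embedding with g(y) = y in this way yields the usual kernel estimate of E[Y | X = x], which the final line compares against the true regression function sin(x).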

Keywords

  1. conditional mean embedding
  2. kernel mean embedding
  3. Gaussian measure
  4. reproducing kernel Hilbert space

MSC codes

  1. 46E22
  2. 62J02
  3. 28C20

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: A Rigorous Theory of Conditional Mean Embeddings

Authors: Ilja Klebanov, Ingmar Schuster, and T. J. Sullivan

File: M130506supplement.pdf

Type: PDF

Contents: The first appendix contains several technical results used in the proofs of the theorems given in the article. The second appendix discusses how the results of the article might be extended to the practically relevant setting of empirical conditional mean embeddings based on sample data, and outlines both mathematical difficulties and possible strategies for overcoming them.

Information & Authors

Published In

SIAM Journal on Mathematics of Data Science
Pages: 583 - 606
ISSN (online): 2577-0187

History

Submitted: 6 December 2019
Accepted: 1 April 2020
Published online: 13 July 2020

Authors

Ilja Klebanov, Ingmar Schuster, and T. J. Sullivan

Funding Information

Deutsche Forschungsgemeinschaft (https://doi.org/10.13039/501100001659), project 390685689
