Abstract

Probabilistic models of data sets often exhibit salient geometric structure. This phenomenon is summarized by the manifold distribution hypothesis and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to inference on density ridges and related statistics, and to data augmentation to reduce overfitting.
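
The construction described in the abstract (estimate a density ridge, record the projection vector from each ridge point back to its data point, then resample those vectors from nearby ridge points) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the ridge is estimated with a subspace-constrained mean shift on a Gaussian kernel density estimate, the function names `scms_project` and `normal_bundle_bootstrap` and the parameters `bandwidth`, `k`, and `m` are hypothetical, and projection vectors are reused as they are, with no alignment between the normal spaces of neighboring ridge points.

```python
import numpy as np


def scms_project(data, bandwidth, d=1, n_iter=200, tol=1e-6):
    """Move each data point onto an estimated d-dimensional density ridge.

    Assumed estimator: subspace-constrained mean shift on a Gaussian KDE,
    using the Hessian of the log-density to pick the normal directions.
    """
    data = np.asarray(data, dtype=float)
    n, D = data.shape
    h2 = bandwidth ** 2
    ridge = data.copy()
    for i in range(n):
        x = ridge[i].copy()
        for _ in range(n_iter):
            diff = data - x                                   # (n, D)
            w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h2)  # kernel weights
            p = w.sum()                                       # density, up to a constant
            grad = (w @ diff) / h2                            # gradient, same constant
            hess = (diff * w[:, None]).T @ diff / h2 ** 2 - p * np.eye(D) / h2
            hess_log = hess / p - np.outer(grad, grad) / p ** 2
            _, vecs = np.linalg.eigh(hess_log)                # ascending eigenvalues
            normal = vecs[:, : D - d]                         # estimated normal space
            shift = (w @ diff) / p                            # mean-shift vector
            step = normal @ (normal.T @ shift)                # constrain to normal space
            x = x + step
            if np.linalg.norm(step) < tol:
                break
        ridge[i] = x
    return ridge


def normal_bundle_bootstrap(data, bandwidth, k=10, m=5, d=1, rng=None):
    """Return n*m new points: ridge points plus resampled projection vectors."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    ridge = scms_project(data, bandwidth, d=d)
    vectors = data - ridge                                    # projection vectors
    # k nearest neighbors of each ridge point, measured along the ridge estimate
    dist = np.linalg.norm(ridge[:, None, :] - ridge[None, :, :], axis=2)
    nbrs = np.argsort(dist, axis=1)[:, :k]                    # (n, k), includes self
    # for each ridge point, draw m neighbors with replacement
    picks = rng.integers(0, k, size=(len(data), m))
    chosen = np.take_along_axis(nbrs, picks, axis=1)          # (n, m)
    new = ridge[:, None, :] + vectors[chosen]                 # (n, m, D)
    return new.reshape(-1, data.shape[1])
```

For example, calling `normal_bundle_bootstrap(data, bandwidth=0.3, k=5, m=5)` on an `(n, 2)` array of noisy samples around a circle returns `5 * n` new points scattered around the estimated ridge. The bandwidth and neighborhood size control how closely the new points follow the original noise structure and would need tuning for real data.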

Keywords

  1. probabilistic learning
  2. data manifold
  3. dynamical systems
  4. resampling
  5. data augmentation

MSC codes

  1. 37M22
  2. 53-08
  3. 53A07
  4. 62F40
  5. 62G09

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: Normal-Bundle Bootstrap

Authors: Ruda Zhang and Roger Ghanem

File: nbb-supplement.pdf

Type: PDF

Contents: Some algorithms and the system of symbols used in this article. Also a detailed comparison of assumptions with a relevant paper, which is not essential for the understanding of the main text.

Published In

SIAM Journal on Mathematics of Data Science
Pages: 573 - 592
ISSN (online): 2577-0187

History

Submitted: 28 July 2020
Accepted: 12 February 2021
Published online: 4 May 2021

Funding Information

National Science Foundation https://doi.org/10.13039/100000001 : DMS-1638521
