This paper investigates the use of methods from partial differential equations and the calculus of variations to study learning problems that are regularized using graph Laplacians. Graph Laplacians are a powerful, flexible method for capturing local and global geometry in many classes of learning problems, and the techniques developed in this paper help to broaden the methodology for studying such problems. In particular, we develop maximum principle arguments to establish asymptotic consistency guarantees in the context of noise-corrupted, nonparametric regression with samples living on an unknown manifold embedded in $\mathbb{R}^d$. The maximum principle arguments provide a new technical tool which informs parameter selection by giving concrete error estimates in terms of the various regularization parameters. We also review learning algorithms that utilize graph Laplacians, as well as previous developments in the use of differential equation and variational techniques to study those algorithms. In addition, new connections are drawn between Laplacian methods and other machine learning techniques, such as kernel regression and $k$-nearest neighbor methods.
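To make the setting concrete, the kind of graph Laplacian regularization discussed above can be sketched in a few lines. This is a minimal illustration, not the estimator analyzed in the paper: the Gaussian weight kernel, the bandwidth `eps`, and the penalty `lam` are assumptions chosen only for the sketch. The smoothed values satisfy a discrete maximum principle: since $(I + \lambda L)\mathbf{1} = \mathbf{1}$ and the system matrix is an M-matrix, the estimator is a convex combination of the observations and stays between $\min y$ and $\max y$.

```python
import numpy as np

def gaussian_graph_laplacian(X, eps):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights.

    W_ij = exp(-|x_i - x_j|^2 / eps^2), with zero diagonal.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / eps**2)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def laplacian_regression(X, y, lam, eps):
    """Minimize |f - y|^2 + lam * f^T L f, i.e., solve (I + lam L) f = y."""
    n = len(y)
    L = gaussian_graph_laplacian(X, eps)
    return np.linalg.solve(np.eye(n) + lam * L, y)

# Noisy samples of a smooth function on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.normal(size=200)
f = laplacian_regression(X, y, lam=1.0, eps=0.1)
```

The dense solve is only for exposition; at scale one would use a sparse neighborhood graph and an iterative solver.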


Keywords

  1. empirical risk minimization
  2. graph Laplacian
  3. discrete to continuum
  4. nonparametric regression

MSC codes

  1. 35J05
  2. 49J55
  3. 60D05
  4. 62G08
  5. 68R10




Information & Authors


Published In

SIAM Journal on Mathematics of Data Science
Pages: 705 - 739
ISSN (online): 2577-0187


Submitted: 20 February 2019
Accepted: 27 May 2020
Published online: 31 August 2020




Nicolas García Trillos
