Abstract.

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes a diffusion operator and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology, as well as the ambient persistent homology, of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insight into the convergence of diffusion condensation and shows that it provides a link between topological and geometric data analysis.
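The two-step iteration described above (compute a diffusion operator from the current data, then apply it to the data) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth `epsilon`, the stopping rule `merge_tol`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def diffusion_operator(X, epsilon):
    """Row-stochastic operator from a Gaussian kernel (assumed kernel choice)."""
    K = np.exp(-pairwise_sq_dists(X) / epsilon)
    return K / K.sum(axis=1, keepdims=True)


def diameter(X):
    """Largest pairwise distance in the current point cloud."""
    return float(np.sqrt(pairwise_sq_dists(X).max()))


def condense(X, epsilon=1.0, merge_tol=1e-4, max_steps=1000):
    """Iterate X_{t+1} = P_t X_t, rebuilding P_t from X_t at every step."""
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(max_steps):
        P = diffusion_operator(X, epsilon)  # step 1: operator from current data
        X = P @ X                           # step 2: apply it to the data
        history.append(X.copy())
        if diameter(X) < merge_tol:         # hypothetical stopping rule
            break
    return history


if __name__ == "__main__":
    # Two small, well-separated groups of points in the plane (toy data).
    X0 = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.1, 0.1]])
    history = condense(X0)
    print("steps:", len(history) - 1, "final diameter:", diameter(history[-1]))
```

Because the operator is rebuilt from the condensed data at each step, the process is time-inhomogeneous. Each new point is a convex combination of the previous points, so the diameter of the data cannot increase from one step to the next in this sketch.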

Keywords

  1. diffusion
  2. time-inhomogeneous process
  3. topological data analysis
  4. persistent homology
  5. hierarchical clustering

MSC codes

  1. 57M50
  2. 57R40
  3. 62R40
  4. 37B25
  5. 68

Supplementary Materials

Index of Supplementary Materials
PLEASE NOTE: These supplementary files have not been peer-reviewed.
Title of paper: Time-Inhomogeneous Diffusion Geometry and Topology
Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, and Smita Krishnaswamy
File: supplementary_material_dc.pdf
Type: PDF
Contents: additional proofs and brief review of relevant topological data analysis.

Information & Authors

Information

Published In

SIAM Journal on Mathematics of Data Science
Pages: 346–372
ISSN (online): 2577-0187

History

Submitted: 28 March 2022
Accepted: 6 January 2023
Published online: 22 May 2023

Authors

Affiliations

Guillaume Huguet
Department of Mathematics and Statistics, Université de Montréal, Montréal, QC H3T 1J4, Canada, and Mila - Quebec AI Institute, Montréal, QC H2S 3H1, Canada.
Alexander Tong
Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC H3T 1J4, Canada, and Mila - Quebec AI Institute, Montréal, QC H2S 3H1, Canada.
Bastian Rieck
Helmholtz Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany, and Technical University of Munich, Arcisstr. 21, 80333 Munich, Germany.
Jessie Huang
Departments of Computer Science and Genetics, Yale University, New Haven, CT 06520 USA.
Manik Kuchroo
Department of Neuroscience, Yale University, New Haven, CT 06520 USA.
Matthew Hirn
Departments of CMSE and Mathematics, Michigan State University, East Lansing, MI 48824 USA.
Guy Wolf
Corresponding coauthor. Department of Mathematics and Statistics, Université de Montréal, Montréal, QC H3T 1J4, Canada, and Mila - Quebec AI Institute, Montréal, QC H2S 3H1, Canada.
Smita Krishnaswamy
Corresponding coauthor. Departments of Computer Science and Genetics, Yale University, New Haven, CT 06520 USA, and Departments of CMSE and Mathematics, Michigan State University, East Lansing, MI 48824 USA.

Funding Information

Funding: The sixth author was partially supported by NSF grant DMS-1845856. The seventh author was partially funded by IVADO Professor funds, CIFAR AI Chair, and NSERC Discovery grant 03267. The sixth, seventh, and eighth authors were partially supported by NIH grant NIGMS-R01GM135929. The content provided here is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
