Abstract

The butterfly algorithm is a fast algorithm that approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y) dy$ at large numbers of target points, provided that the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm is instantiated in the open-source \texttt{DistButterfly} library for the special case $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1 node/16 cores up to 1024 nodes/16,384 cores with greater than 90% and 82% efficiency, respectively.
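To make the parallel bound concrete, the following is a minimal sketch that evaluates the cost expression from the abstract and the strong-scaling efficiency it implies. It is not part of DistButterfly. The function names, the time-per-flop parameter `gamma`, the choice of base-2 logarithms, and setting every constant hidden by the $O(\cdot)$ notation to 1 are all assumptions made for illustration, so the numbers it produces are indicative of trends only and do not predict measured run times.

```python
import math


def modeled_time(N, d, r, p, alpha, beta, gamma=1.0):
    """Evaluate r^2 (N^d / p) log N + (beta r N^d / p + alpha) log p.

    Assumptions (not from the paper): all hidden constants are 1,
    logarithms are base 2, and gamma is a hypothetical time per flop
    that puts the compute term in the same units as alpha and beta.
    """
    work_per_process = N ** d / p
    compute = gamma * r ** 2 * work_per_process * math.log2(N)
    # log2(1) == 0, so the communication term vanishes for one process.
    communication = (beta * r * work_per_process + alpha) * math.log2(p)
    return compute + communication


def strong_scaling_efficiency(N, d, r, p, alpha, beta, gamma=1.0, p0=1):
    """Modeled efficiency on p processes relative to a baseline of p0."""
    t0 = modeled_time(N, d, r, p0, alpha, beta, gamma)
    tp = modeled_time(N, d, r, p, alpha, beta, gamma)
    return (t0 * p0) / (tp * p)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for p in (1, 16, 256, 1024):
        eff = strong_scaling_efficiency(
            N=1024, d=2, r=16, p=p, alpha=1e3, beta=1.0
        )
        print(f"p = {p:5d}  modeled efficiency = {eff:.3f}")
```

In this model the compute term divides evenly across processes, so any loss of efficiency comes from the $(\beta r \frac{N^d}{p} + \alpha)\log p$ communication term; with $\alpha = \beta = 0$ the modeled efficiency is exactly 1 for every $p$.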

Keywords

  1. butterfly algorithm
  2. Egorov operator
  3. Radon transform
  4. parallel
  5. Blue Gene/Q

MSC codes

  1. 65R10
  2. 65Y05
  3. 65Y20
  4. 44A12


Information & Authors

Information

Published In

SIAM Journal on Scientific Computing
Pages: C49 - C65
ISSN (online): 1095-7197

History

Submitted: 20 May 2013
Accepted: 25 November 2013
Published online: 4 February 2014
