SIAM Digital Library
 
 
 

SIAM J. on Scientific Computing

All Journal content published prior to 1997 is part of LOCUS.

Search Issue | RSS Feeds RSS
Previous Issue

2010

Volume 32, Issue 6, pp. 3151-3649


A Multilevel Jacobi–Davidson Method for Polynomial PDE Eigenvalue Problems Arising in Plasma Physics

Marlis Hochbruck and Dominik Löchel

SIAM J. Sci. Comput. 32, pp. 3151-3169 (19 pages)

Online Publication Date: October 21, 2010

Full Text: | Download PDF

Show Abstract
The simulation of drift instabilities in the plasma edge leads to cubic polynomial PDE eigenvalue problems with parameter dependent coefficients. The aim is to determine the wave number which leads to the maximum growth rate of the amplitude of the wave. This requires the solution of a large number of PDE eigenvalue problems. Since we are only interested in a smooth eigenfunction corresponding to the eigenvalue with largest imaginary part, the Jacobi–Davidson method can be applied. Unfortunately, a naive implementation of this method is much too expensive for the large number of problems that have to be solved. In this paper we will present a multilevel approach for the construction of an appropriate initial search space. We will also discuss the efficient solution of the correction equation, and we will show how optimal scaling helps to accelerate the convergence.

An "$hp$" Certified Reduced Basis Method for Parametrized Elliptic Partial Differential Equations

Jens L. Eftang, Anthony T. Patera, and Einar M. Rønquist

SIAM J. Sci. Comput. 32, pp. 3170-3200 (31 pages)

Online Publication Date: October 21, 2010

Full Text: | Download PDF

Show Abstract
We present a new “$hp$” parameter multidomain certified reduced basis (RB) method for rapid and reliable online evaluation of functional outputs associated with parametrized elliptic partial differential equations. We propose, and provide theoretical justification for, a new procedure for adaptive partition (“$h$”-refinement) of the parameter domain into smaller parameter subdomains: we pursue a hierarchical splitting of the parameter (sub)domains based on proximity to judiciously chosen parameter anchor points within each subdomain. Subsequently, we construct individual standard RB approximation spaces (“$p$”-refinement) over each subdomain. Greedy parameter sampling procedures and a posteriori error estimation play important roles in both the “$h$”-type and “$p$”-type stages of the new algorithm. We present illustrative numerical results for a convection-diffusion problem: the new “$hp$”-approach is considerably faster (respectively, more costly) than the standard “$p$”-type reduced basis method in the online (respectively, offline) stage.

Controlling Unstructured Mesh Partitions for Massively Parallel Simulations

Min Zhou, Onkar Sahni, Karen D. Devine, Mark S. Shephard, and Kenneth E. Jansen

SIAM J. Sci. Comput. 32, pp. 3201-3227 (27 pages)

Online Publication Date: November 09, 2010

Full Text: | Download PDF

Show Abstract
Parallel simulations at extreme scale require that the mesh is distributed across a large number of processors with equal work load and minimum interpart communications. A number of algorithms have been developed to meet these goals, e.g., graph/hypergraph and coordinate-based methods. However, the global implementation of current approaches can fail on very large core counts, which is resolved by combining global and local partitioning using multiple parts per processor. The other limitation of graph/hypergraph-based partitioning is that it uses one type of mesh entity as graph nodes; thus, the balance of other mesh entities may not be optimal. In the case of three-dimensional (3-D) linear finite element analysis, it is common to select mesh regions (elements) as partition objects. In current examples, the regions are well balanced up to 163,840 parts for a 1.07 billion element mesh, while the vertices have an imbalance which is as high as 19.52%. Two methods are developed that work in conjunction with graph/hypergraph-based procedures to provide improved partitions. Example computations executed on an IBM Blue Gene/P system using up to 163,840 cores demonstrate the usefulness of the procedures, particularly for time-critical calculations where individual cores may be lightly loaded in terms of the number of mesh entities per core. The algorithms presented in this paper reduced the vertex imbalance from 17.8% to 4.97% for a partition with 131,072 parts and accelerated the equation solution phase of the finite element analysis by 10.4%.

Efficient Spectral Sparse Grid Methods and Applications to High-Dimensional Elliptic Problems

Jie Shen and Haijun Yu

SIAM J. Sci. Comput. 32, pp. 3228-3250 (23 pages) | Cited 1 time

Online Publication Date: November 09, 2010

Full Text: | Download PDF

Show Abstract
We develop in this paper some efficient algorithms which are essential to implementations of spectral methods on the sparse grid by Smolyak's construction based on a nested quadrature. More precisely, we develop a fast algorithm for the discrete transform between the values at the sparse grid and the coefficients of expansion in a hierarchical basis; and by using the aforementioned fast transform, we construct two very efficient sparse spectral-Galerkin methods for a model elliptic equation. In particular, the Chebyshev–Legendre–Galerkin method leads to a sparse matrix with a much lower number of nonzero elements than that of low-order sparse grid methods based on finite elements or wavelets, and can be efficiently solved by a suitable sparse solver. Ample numerical results are presented to demonstrate the efficiency and accuracy of our algorithms.

Adaptive ADER Methods Using Kernel-Based Polyharmonic Spline WENO Reconstruction

Terhemen Aboiyar, Emmanuil H. Georgoulis, and Armin Iske

SIAM J. Sci. Comput. 32, pp. 3251-3277 (27 pages)

Online Publication Date: November 09, 2010

Full Text: | Download PDF

Show Abstract
An adaptive ADER finite volume method on unstructured meshes is proposed. The method combines high order polyharmonic spline weighted essentially non-oscillatory (WENO) reconstruction with high order flux evaluation. Polyharmonic splines are utilized in the recovery step of the finite volume method yielding a WENO reconstruction that is stable, flexible, and optimal in the associated Sobolev (Beppo-Levi) space. The flux evaluation is accomplished by solving generalized Riemann problems across cell interfaces. The mesh adaptation is performed through an a posteriori error indicator, which relies on the polyharmonic spline reconstruction scheme. The performance of the proposed method is illustrated by a series of numerical experiments, including linear advection, Burgers's equation, Smolarkiewicz's deformational flow test, and the five-spot problem.

A Krylov Method for the Delay Eigenvalue Problem

Elias Jarlebring, Karl Meerbergen, and Wim Michiels

SIAM J. Sci. Comput. 32, pp. 3278-3300 (23 pages) | Cited 1 time

Online Publication Date: November 09, 2010

Full Text: | Download PDF

Show Abstract
The Arnoldi method is currently a very popular algorithm to solve large-scale eigenvalue problems. The main goal of this paper is to generalize the Arnoldi method to the characteristic equation of a delay-differential equation (DDE), here called a delay eigenvalue problem (DEP). The DDE can equivalently be expressed with a linear infinite-dimensional operator whose eigenvalues are the solutions to the DEP. We derive a new method by applying the Arnoldi method to the generalized eigenvalue problem associated with a spectral discretization of the operator and by exploiting the structure. The result is a scheme where we expand a subspace not only in the traditional way done in the Arnoldi method. The subspace vectors are also expanded with one block of rows in each iteration. More important, the structure is such that if the Arnoldi method is started in an appropriate way, it has the (somewhat remarkable) property that it is in a sense independent of the number of discretization points. It is mathematically equivalent to an Arnoldi method with an infinite matrix, corresponding to the limit where we have an infinite number of discretization points. We also show an equivalence with the Arnoldi method in an operator setting. It turns out that with an appropriately defined operator over a space equipped with scalar product with respect to which Chebyshev polynomials are orthonormal, the vectors in the Arnoldi iteration can be interpreted as the coefficients in a Chebyshev expansion of a function. The presented method yields the same Hessenberg matrix as the Arnoldi method applied to the operator.

A FETI–DP Formulation for the Three-Dimensional Stokes Problem without Primal Pressure Unknowns

Hyea Hyun Kim and Chang-Ock Lee

SIAM J. Sci. Comput. 32, pp. 3301-3322 (22 pages)

Online Publication Date: November 09, 2010

Full Text: | Download PDF

Show Abstract
A FETI–DP (dual-primal finite element tearing and interconnecting) algorithm for the three-dimensional Stokes problem is developed and analyzed. This is an extension of the previous work for the two-dimensional problem in [H. H. Kim, C.-O. Lee, and E.-H. Park, SIAM J. Numer. Anal., 47 (2010), pp. 4142–4162]. Advantages of this approach are the coarse problem without primal pressure unknowns and the use of a computationally cheap lumped preconditioner. Especially in three dimensions, these advantages provide a more practical FETI-DP algorithm. In three dimensions, the velocity unknowns at subdomain corners and the averages of velocity unknowns over common faces are selected as the primal unknowns in the FETI-DP formulation. Its condition number bound is analyzed to be $CH/h$, where $C$ is a positive constant which is independent of any mesh parameters and $H/h$ is the number of elements across each subdomain. Numerical results are included.

Unified Embedded Parallel Finite Element Computations via Software-Based Fréchet Differentiation

Kevin Long, Robert Kirby, and Bart van Bloemen Waanders

SIAM J. Sci. Comput. 32, pp. 3323-3351 (29 pages)

Online Publication Date: November 17, 2010

Full Text: | Download PDF

Show Abstract
Computational analysis of systems governed by partial differential equations (PDEs) requires not only the calculation of a solution but the extraction of additional information such as the sensitivity of that solution with respect to input parameters or the inversion of the system in an optimization or design loop. Moving beyond the automation of discretization of PDEs by finite element methods, we present a mathematical framework that unifies the discretization of PDEs with these additional analysis requirements. In particular, Fréchet differentiation on a class of functionals together with a high-performance finite element framework has led to a code, called Sundance, that provides high-level programming abstractions for the automatic, efficient evaluation of finite variational forms together with the derived operators required by engineering analysis.

Quasi-Newton Methods on Grassmannians and Multilinear Approximations of Tensors

Berkant Savas and Lek-Heng Lim

SIAM J. Sci. Comput. 32, pp. 3352-3393 (42 pages) | Cited 4 times

Online Publication Date: November 30, 2010

Full Text: | Download PDF

Show Abstract
In this paper we proposed quasi-Newton and limited memory quasi-Newton methods for objective functions defined on Grassmannians or a product of Grassmannians. Specifically we defined BFGS and limited memory BFGS updates in local and global coordinates on Grassmannians or a product of these. We proved that, when local coordinates are used, our BFGS updates on Grassmannians share the same optimality property as the usual BFGS updates on Euclidean spaces. When applied to the best multilinear rank approximation problem for general and symmetric tensors, our approach yields fast, robust, and accurate algorithms that exploit the special Grassmannian structure of the respective problems and which work on tensors of large dimensions and arbitrarily high order. Extensive numerical experiments are included to substantiate our claims.

Semi-Implicit Formulations of the Navier–Stokes Equations: Application to Nonhydrostatic Atmospheric Modeling

F. X. Giraldo, M. Restelli, and M. Läuter

SIAM J. Sci. Comput. 32, pp. 3394-3425 (32 pages)

Online Publication Date: November 30, 2010

Full Text: | Download PDF

Show Abstract
We present semi-implicit (implicit-explicit) formulations of the compressible Navier–Stokes equations (NSE) for applications in nonhydrostatic atmospheric modeling. The compressible NSE in nonhydrostatic atmospheric modeling include buoyancy terms that require special handling if one wishes to extract the Schur complement form of the linear implicit problem. We present results for five different forms of the compressible NSE and describe in detail how to formulate the semi-implicit time-integration method for these equations. Finally, we compare all five equations and compare the semi-implicit formulations of these equations both using the Schur and No Schur forms against an explicit Runge–Kutta method. Our simulations show that, if efficiency is the main criterion, it matters which form of the governing equations you choose. Furthermore, the semi-implicit formulations are faster than the explicit Runge–Kutta method for all the tests studied, especially if the Schur form is used. While we have used the spectral element method for discretizing the spatial operators, the semi-implicit formulations that we derive are directly applicable to all other numerical methods. We show results for our five semi-implicit models for a variety of problems of interest in nonhydrostatic atmospheric modeling, including inertia-gravity waves, density current (i.e., Kelvin–Helmholtz instabilities), and mountain test cases; the latter test case requires the implementation of nonreflecting boundary conditions. Therefore, we show results for all five semi-implicit models using the appropriate boundary conditions required in nonhydrostatic atmospheric modeling: no-flux (reflecting) and nonreflecting boundary conditions (NRBCs). It is shown that the NRBCs exert a strong impact on the accuracy and efficiency of the models.

Hypergraph-Based Unsymmetric Nested Dissection Ordering for Sparse LU Factorization

Laura Grigori, Erik G. Boman, Simplice Donfack, and Timothy A. Davis

SIAM J. Sci. Comput. 32, pp. 3426-3446 (21 pages) | Cited 1 time

Online Publication Date: November 30, 2010

Full Text: | Download PDF

Show Abstract
In this paper we discuss a hypergraph-based unsymmetric nested dissection (HUND) ordering for reducing the fill-in incurred during Gaussian elimination. It has several important properties. It takes a global perspective of the entire matrix, as opposed to local heuristics. It takes into account the asymmetry of the input matrix by using a hypergraph to represent its structure. It is suitable for performing Gaussian elimination in parallel, with partial pivoting. This is possible because the row permutations performed due to partial pivoting do not destroy the column separators identified by the nested dissection approach. The hypergraph nested dissection approach is essentially equivalent to graph nested dissection on the matrix $A^{T}A$, but we need only the original matrix $A$ and never form the usually denser matrix $A^{T}A$. The usage of hypergraphs in our approach is fairly standard, and HUND can be implemented by calling an existing hypergraph partitioner that uses recursive bisection. Our implementation uses local reordering constrained column approximate minimum degree (CCOLAMD) to further improve the ordering. We also explain how weighted matching (HSL routine MC64) can be used in this context. Experimental results on 27 medium and large size matrices with highly unsymmetric structures compare our approach to four other well-known reordering algorithms. The results show that it provides a robust reordering algorithm, in the sense that it is the best or close to the best (often within 10%) of all the other methods, in particular on matrices with highly unsymmetric structures.

An Interior-Point Algorithm for Large-Scale Nonlinear Optimization with Inexact Step Computations

Frank E. Curtis, Olaf Schenk, and Andreas Wächter

SIAM J. Sci. Comput. 32, pp. 3447-3475 (29 pages)

Online Publication Date: November 30, 2010

Full Text: | Download PDF

Show Abstract
We present a line-search algorithm for large-scale continuous optimization. The algorithm is matrix-free in that it does not require the factorization of derivative matrices. Instead, it uses iterative linear system solvers. Inexact step computations are supported in order to save computational expense during each iteration. The algorithm is an interior-point approach derived from an inexact Newton method for equality constrained optimization proposed by Curtis, Nocedal, and Wächter [SIAM J. Optim., 20 (2009), pp. 1224–1249], with additional functionality for handling inequality constraints. The algorithm is shown to be globally convergent under loose assumptions. Numerical results are presented for nonlinear optimization test set collections and a pair of PDE-constrained model problems.

Scalable Direct Vlasov Solver with Discontinuous Galerkin Method on Unstructured Mesh

J. Xu, P. N. Ostroumov, B. Mustapha, and J. Nolen

SIAM J. Sci. Comput. 32, pp. 3476-3494 (19 pages)

Online Publication Date: December 02, 2010

Full Text: | Download PDF

Show Abstract
This paper presents the development of parallel direct Vlasov solvers with discontinuous Galerkin (DG) method for beam and plasma simulations in four dimensions. Both physical and velocity spaces are in two dimesions (2P2V) with unstructured mesh. Contrary to the standard particle-in-cell (PIC) approach for kinetic space plasma simulations, i.e., solving Vlasov-Maxwell equations, direct method has been used in this paper. There are several benefits to solving a Vlasov equation directly, such as avoiding noise associated with a finite number of particles and the capability to capture fine structure in the plasma. The most challanging part of a direct Vlasov solver comes from higher dimensions, as the computational cost increases as $N^{2d}$, where $d$ is the dimension of the physical space. Recently, due to the fast development of supercomputers, the possibility has become more realistic. Many efforts have been made to solve Vlasov equations in low dimensions before; now more interest has focused on higher dimensions. Different numerical methods have been tried so far, such as the finite difference method, Fourier Spectral method, finite volume method, and spectral element method. This paper is based on our previous efforts to use the DG method. The DG method has been proven to be very successful in solving Maxwell equations, and this paper is our first effort in applying the DG method to Vlasov equations. DG has shown several advantages, such as local mass matrix, strong stability, and easy parallelization. These are particularly suitable for Vlasov equations. Domain decomposition in high dimensions has been used for parallelization; these include a highly scalable parallel two-dimensional Poisson solver. Benchmark results have been shown and simulation results will be reported.

Communication-optimal Parallel and Sequential Cholesky Decomposition

Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz

SIAM J. Sci. Comput. 32, pp. 3495-3523 (29 pages) | Cited 1 time

Online Publication Date: December 02, 2010

Full Text: | Download PDF

Show Abstract
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional ($O(n^3)$) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [J. Demmel et al., Communication-optimal Parallel and Sequential QR and LU Factorizations, Technical report EECS-2008-89, University of California, Berkeley, CA, 2008], [J. Demmel et al., Implementing Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM. J. Sci. Comp., submitted], and [J. Demmel, L. Grigori, and H. Xiang, Communication-avoiding Gaussian Elimination, Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, 2008] this gives a set of communication-optimal algorithms for $O(n^3)$ implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR, and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy.

Generalized Monge–Kantorovich Optimization for Grid Generation and Adaptation in $L_{p}$

G. L. Delzanno and J. M. Finn

SIAM J. Sci. Comput. 32, pp. 3524-3547 (24 pages)

Online Publication Date: December 02, 2010

Full Text: | Download PDF

Show Abstract
The Monge–Kantorovich grid generation and adaptation scheme of [Delzanno et al., J. Comput. Phys., 227 (2008), pp. 9841–9864] is generalized from a variational principle based on $L_{2}$ to a variational principle based on $L_{p}$. A generalized Monge–Ampère (MA) equation is derived and its properties are discussed. Results for $p>1$ are obtained and compared in terms of the quality of the resulting grid and a measure of computational performance. We conclude that for the grid generation application, the formulation based on $L_{p}$ for $p$ close to unity can lead to serious problems associated with the boundary. On the other hand, $p\gg2$ also leads to worse quality grids and performance. Thus, it is concluded that $p=2$ produces the best quality grids, particularly in terms of mean grid cell distortion. Furthermore, the Newton–Krylov methods used to solve the generalized MA equation perform best for $p=2$.

Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach

Dongmin Kim, Suvrit Sra, and Inderjit S. Dhillon

SIAM J. Sci. Comput. 32, pp. 3548-3563 (16 pages)

Online Publication Date: December 16, 2010

Full Text: | Download PDF

Show Abstract
Numerous scientific applications across a variety of fields depend on box-constrained convex optimization. Box-constrained problems therefore continue to attract research interest. We address box-constrained (strictly convex) problems by deriving two new quasi-Newton algorithms. Our algorithms are positioned between the projected-gradient [J. B. Rosen, J. SIAM, 8 (1960), pp. 181–217] and projected-Newton [D. P. Bertsekas, SIAM J. Control Optim., 20 (1982), pp. 221–246] methods. We also prove their convergence under a simple Armijo step-size rule. We provide experimental results for two particular box-constrained problems: nonnegative least squares (NNLS), and nonnegative Kullback–Leibler (NNKL) minimization. For both NNLS and NNKL our algorithms perform competitively as compared to well-established methods on medium-sized problems; for larger problems our approach frequently outperforms the competition.

Numerical Implementation of Streaming Down the Gradient: Application to Fluid Modeling of Cosmic Rays and Saturated Conduction

Prateek Sharma, Phillip Colella, and Daniel F. Martin

SIAM J. Sci. Comput. 32, pp. 3564-3583 (20 pages)

Online Publication Date: December 16, 2010

Full Text: | Download PDF

Show Abstract
The equation governing the streaming of a quantity down its gradient superficially looks similar to the simple constant velocity advection equation. In fact, it is the same as an advection equation if there are no local extrema in the computational domain or at the boundary. However, in general when there are local extrema in the computational domain, it is a nontrivial nonlinear equation. The standard upwind time evolution with a CFL-limited timestep results in spurious oscillations at the grid scale. These oscillations, which originate at the extrema, propagate throughout the computational domain and are undamped even at late times. These oscillations arise because of unphysically large fluxes leaving (entering) the maxima (minima) with the standard CFL-limited explicit methods. Regularization of the equation shows that it is diffusive at the extrema; because of this, an explicit method for the regularized equation with $\Delta t\propto\Delta x^2$ behaves fine. We show that the implicit methods show stable and converging results with $\Delta t\propto\Delta x$; however, surprisingly, even implicit methods are not stable with large enough timesteps. In addition to these subtleties in the numerical implementation, the solutions to the streaming equation are quite novel: nondifferentiable solutions emerge from initially smooth profiles; the solutions show transport over large length scales, e.g., in form of tails. The fluid model for cosmic rays interacting with a thermal plasma (valid at space scales much larger than the cosmic ray Larmor radius) and the equation of saturated conduction in a collisionless plasma are similar to the streaming equation, so our method will find applications in fluid modeling of important processes in plasma astrophysics.

Nonoverlapping Domain Decomposition with Second Order Transmission Condition for the Time-Harmonic Maxwell's Equations

Vineet Rawat and Jin-Fa Lee

SIAM J. Sci. Comput. 32, pp. 3584-3603 (20 pages) | Cited 1 time

Online Publication Date: December 16, 2010

Full Text: | Download PDF

Show Abstract
Nonoverlapping domain decomposition methods have been shown to provide efficient iterative algorithms for the solution of the time-harmonic Maxwell's equations. Convergence of the algorithms depends strongly upon the nature of the transmission conditions that communicate information between adjacent subdomains. In this work, we introduce a new second order transmission condition that improves convergence. In contrast to previous high order interface conditions, the new condition uses two second order transverse derivatives to improve the convergence of evanescent modes without deteriorating the convergence of propagating modes. An analysis using a splitting of the field into traverse electric and magnetic components demonstrates the improved convergence provided by the transmission condition. Numerical experiments demonstrate the effectiveness of the algorithm.

BDDC Preconditioners for Spectral Element Discretizations of Almost Incompressible Elasticity in Three Dimensions

Luca F. Pavarino, Olof B. Widlund, and Stefano Zampini

SIAM J. Sci. Comput. 32, pp. 3604-3626 (23 pages)

Online Publication Date: December 21, 2010

Full Text: | Download PDF

Show Abstract
Balancing domain decomposition by constraints (BDDC) algorithms are constructed and analyzed for the system of almost incompressible elasticity discretized with Gauss–Lobatto–Legendre spectral elements in three dimensions. Initially mixed spectral elements are employed to discretize the almost incompressible elasticity system, but a positive definite reformulation is obtained by eliminating all pressure degrees of freedom interior to each subdomain into which the spectral elements have been grouped. Appropriate sets of primal constraints can be associated with the subdomain vertices, edges, and faces so that the resulting BDDC methods have a fast convergence rate independent of the almost incompressibility of the material. In particular, the condition number of the BDDC preconditioned operator is shown to depend only weakly on the polynomial degree $n$, the ratio $H/h$ of subdomain and element diameters, and the inverse of the inf-sup constants of the subdomains and the underlying mixed formulation, while being scalable, i.e., independent of the number of subdomains and robust, i.e., independent of the Poisson ratio and Young's modulus of the material considered. These results also apply to the related dual-primal finite element tearing and interconnect (FETI-DP) algorithms defined by the same set of primal constraints. Numerical experiments, carried out on parallel computing systems, confirm these results.

Design of a Multicore Sparse Cholesky Factorization Using DAGs

J. D. Hogg, J. K. Reid, and J. A. Scott

SIAM J. Sci. Comput. 32, pp. 3627-3649 (23 pages)

Online Publication Date: December 21, 2010

Full Text: | Download PDF

Show Abstract
The rapid emergence of multicore machines has led to the need to design new algorithms that are efficient on these architectures. Here, we consider the solution of sparse symmetric positive-definite linear systems by Cholesky factorization. We were motivated by the successful division of the computation in the dense case into tasks on blocks and use of a task manager to exploit all the parallelism that is available between these tasks, whose dependencies may be represented by a directed acyclic graph (DAG). Our sparse algorithm is built on the assembly tree and subdivides the work at each node into tasks on blocks of the Cholesky factor. The dependencies between these tasks may again be represented by a DAG. To limit memory requirements, blocks are updated directly rather than through generated-element matrices. Our algorithm is implemented within a new efficient and portable solver HSL_MA87. It is written in Fortran 95 plus OpenMP and is available as part of the software library HSL. Using problems arising from a range of applications, we present experimental results that support our design choices and demonstrate that HSL_MA87 obtains good serial and parallel times on our 8-core test machines. Comparisons are made with existing modern solvers and show that HSL_MA87 performs well, particularly in the case of very large problems.
Close

close