Abstract

Several successful partitioning models and methods have been proposed and used for computational load balancing of irregularly sparse applications in a distributed-memory setting. However, the literature lacks partitioning models and methods that encode both computational and data load balancing. In this article, we address this gap by proposing two hypergraph partitioning (HP) models that simultaneously encode computational and data load balancing. Both models utilize a two-constraint formulation, where the first constraint encodes the computational loads and the second constraint encodes the data loads. In the first model, we introduce explicit data vertices for encoding data load, and we replicate those data vertices at each recursive bipartitioning (RB) step for encoding data replication. In the second model, we introduce a data weight distribution scheme for encoding data load, and we update those weights at each RB step. A useful property of both proposed models is that neither requires developing a new partitioner from scratch: both can be implemented by invoking any HP tool that supports multiconstraint partitioning as a two-way partitioner at each RB step. The validity of the proposed models is tested on two widely used irregularly sparse applications: parallel mesh simulations and parallel sparse matrix-sparse matrix multiplication. Both proposed models achieve significant improvement over a baseline model.
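
The two-constraint formulation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It only shows how a partition could be scored on both constraints at once, assuming each vertex carries a (computational weight, data weight) pair. The function name `two_constraint_imbalance`, the weight layout, and the imbalance definition (maximum part load over average part load, minus one) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def two_constraint_imbalance(
    weights: Sequence[Tuple[float, float]],
    parts: Sequence[int],
    k: int,
) -> Tuple[float, float]:
    """Return (computational, data) imbalance of a k-way partition.

    weights[v] is the (computational weight, data weight) pair of vertex v;
    this layout is a hypothetical choice for the sketch.
    parts[v] is the part id in range(k) assigned to vertex v.
    The imbalance of a constraint is max part load / average part load - 1.
    """
    # loads[p] holds [computational load, data load] of part p.
    loads: List[List[float]] = [[0.0, 0.0] for _ in range(k)]
    for (comp, data), p in zip(weights, parts):
        loads[p][0] += comp
        loads[p][1] += data

    result = []
    for c in range(2):
        avg = sum(load[c] for load in loads) / k
        worst = max(load[c] for load in loads)
        # A constraint with zero total weight is treated as perfectly balanced.
        result.append(worst / avg - 1.0 if avg > 0 else 0.0)
    return result[0], result[1]
```

For example, four unit-cost vertices split evenly across two parts are perfectly balanced on computation, yet if only one of them carries data, the data load falls entirely on one part. A single-constraint model would not detect this kind of imbalance.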

Keywords

  1. computational load balance
  2. data load balance
  3. distributed-memory systems
  4. hypergraph partitioning
  5. recursive bipartitioning
  6. multi-constraint partitioning
  7. general sparse matrix-matrix multiplication
  8. mesh partitioning

MSC codes

  1. 05C85
  2. 05C65
  3. 65F50
  4. 68R10

Information & Authors

Information

Published In

SIAM Journal on Scientific Computing
Pages: C399 - C424
ISSN (online): 1095-7197

History

Submitted: 22 March 2022
Accepted: 25 July 2022
Published online: 15 November 2022

Authors

Affiliations

Mestan Firat Çeliktuğ
