SIAM Journal on Scientific Computing


Reducing Floating Point Error in Dot Product Using the Superblock Family of Algorithms


Article Data

History

Submitted: 11 January 2007
Accepted: 25 July 2008
Published online: 17 December 2008

Publication Data

ISSN (print): 1064-8275
ISSN (online): 1095-7197
CODEN: sjoce3

Abstract

This paper analyzes both the theoretical error bounds and the statistically observed errors of several well-known dot product algorithms, from the canonical (left-to-right) to the pairwise algorithm, and introduces a new, more general framework that we have named superblock, which subsumes them and lets a practitioner trade off computational performance, memory usage, and error behavior. We show that algorithms with lower error bounds tend to behave noticeably better in practice. Unlike many error-reducing algorithms, superblock requires no additional floating point operations and should be implementable with little to no performance loss, making it suitable as a performance-critical building block of a linear algebra kernel.
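To make the contrast in the abstract concrete, the sketch below illustrates the canonical and pairwise accumulation schemes it names, plus a simple two-level blocked combination of the two. This is only a hedged illustration of the general blocking idea, not the authors' superblock algorithm; the function names and the block size of 64 are assumptions for the example.

```python
def canonical_dot(x, y):
    """Canonical left-to-right accumulation; worst-case error grows O(n)."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s

def pairwise_sum(parts):
    """Pairwise (recursive halving) summation; worst-case error grows O(log n)."""
    if len(parts) == 1:
        return parts[0]
    mid = len(parts) // 2
    return pairwise_sum(parts[:mid]) + pairwise_sum(parts[mid:])

def blocked_dot(x, y, block=64):
    """Illustrative two-level scheme: accumulate canonically within
    fixed-size blocks (cache- and register-friendly), then combine the
    per-block partial sums pairwise. An intermediate point on the
    performance/accuracy trade-off described in the abstract."""
    parts = [canonical_dot(x[i:i + block], y[i:i + block])
             for i in range(0, len(x), block)]
    return pairwise_sum(parts)
```

Keeping the inner loop canonical preserves the straight-line accumulation that vectorizing compilers and BLAS kernels rely on, while the pairwise outer combination shortens the longest chain of dependent additions, which is where the error-bound improvement comes from.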

Copyright © 2008 Society for Industrial and Applied Mathematics

Cited by

(2020) A Class of Fast and Accurate Summation Algorithms. SIAM Journal on Scientific Computing 42:3, A1541-A1557.
(2019) A New Approach to Probabilistic Rounding Error Analysis. SIAM Journal on Scientific Computing 41:5, A2815-A2835.
(2016) Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 653-662.
(2014) The Better Accuracy of Strassen-Winograd Algorithms (FastMMW). Advances in Linear Algebra & Matrix Theory 04:01, 9-39.
(2014) Minimizing synchronizations in sparse iterative solvers for distributed supercomputers. Computers & Mathematics with Applications 67:1, 199-209.
(2011) Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms. 2011 23rd International Symposium on Computer Architecture and High Performance Computing, 120-127.