Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures
2011 (English). In: Parallel Computing, ISSN 0167-8191, Vol. 37, no. 12, pp. 771-782. Article in journal (Refereed), Published.
We consider parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant still performs 20% of its computations as matrix-vector multiplications. We show that a two-stage approach, consisting of an intermediate reduction to block Hessenberg form, speeds up the reduction by avoiding matrix-vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2. (C) 2011 Elsevier B.V. All rights reserved.
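For context, the baseline against which the paper's two-stage method is compared is the classical one-stage reduction, which annihilates each column below the first subdiagonal with a Householder reflection and applies it as a similarity transformation. Below is a minimal unblocked sketch of that baseline in Python/NumPy; the function name `hessenberg_unblocked` is our own, and the paper's blocked and two-stage dynamically scheduled algorithms are not reproduced here.

```python
import numpy as np

def hessenberg_unblocked(A):
    """Reduce a real square matrix to upper Hessenberg form using
    Householder reflections applied column by column, left to right.
    Returns (H, Q) with A = Q @ H @ Q.T and H[i, j] = 0 for i > j + 1.
    Unblocked reference variant only (illustrative, not the paper's code)."""
    n = A.shape[0]
    H = np.array(A, dtype=float)
    Q = np.eye(n)
    for k in range(n - 2):
        # Build a Householder vector v that zeroes H[k+2:, k].
        x = H[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column already reduced
        alpha = -norm_x if x[0] >= 0 else norm_x  # stable sign choice
        v = x
        v[0] -= alpha
        v /= np.linalg.norm(v)
        # Apply P = I - 2 v v^T from the left and right (similarity),
        # and accumulate the orthogonal factor Q.
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return H, Q
```

The left-applications `v @ H[k+1:, k:]` are the matrix-vector products the abstract refers to: in the blocked one-stage algorithm they cannot be aggregated into matrix-matrix multiplies, which is what motivates the two-stage reformulation.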
Place, publisher, year, edition, pages
2011. Vol. 37, no. 12, pp. 771-782.
Keywords: Hessenberg reduction, Blocked algorithm, Parallel computing, Dynamic scheduling, High performance, Multi-core, Memory hierarchies
Subject categories: Computer Science, Computational Mathematics
Identifiers
URN: urn:nbn:se:umu:diva-41221
DOI: 10.1016/j.parco.2011.05.001
ISI: 000298522200005
OAI: oai:DiVA.org:umu-41221
DiVA: diva2:405041