High-performance library software for QR factorization
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2001 (English) In: Applied Parallel Computing: New Paradigms for HPC in Industry and Academia. 5th International Workshop, PARA 2000 Bergen, Norway, June 18–20, 2000 Proceedings / [ed] Tor Sørevik, Fredrik Manne, Assefaw Hadish Gebremedhin, Randi Moe, Heidelberg/Berlin, Germany: Springer, 2001, Vol. 1947, 53-63 p. Conference paper (Other academic)
Abstract [en]

In [5], [6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m×n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5], [6]. In particular, the new version is optimized in a number of ways to improve performance, e.g., for small matrices and matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10-65% for square matrices and 10-100% for tall and thin matrices on the IBM POWER2 and POWER3 nodes. The tests covered matrix sizes ranging from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
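
To make the idea of a purely recursive QR factorization concrete, the following is a minimal NumPy sketch of the textbook recursive scheme with an aggregated Householder representation Q = I - Y T Y^T. It is an illustration only and not the ESSL code: the names `householder` and `rgeqr3_sketch` are hypothetical, the sketch assumes a real m×n input with m >= n, and it omits the mini blocking, the hybrid RGEQRF driver, and the SMP parallelism described in the abstract.

```python
import numpy as np


def householder(x):
    """Return (v, tau, beta) with (I - tau*v*v^T) @ x = beta*e1 and v[0] = 1."""
    x = np.asarray(x, dtype=float)
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        # Nothing to annihilate: use the identity (tau = 0).
        v = np.zeros_like(x)
        v[0] = 1.0
        return v, 0.0, 0.0
    beta = -np.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
    v = x / (x[0] - beta)
    v[0] = 1.0
    tau = (beta - x[0]) / beta
    return v, tau, beta


def rgeqr3_sketch(A):
    """Recursive QR sketch (assumes m >= n).

    Returns (Y, T, R) such that Q = I - Y @ T @ Y.T is orthogonal and
    A = Q[:, :n] @ R with R upper triangular.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    if n == 1:
        # Base case: a single Householder reflector.
        v, tau, beta = householder(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])

    n1 = n // 2
    n2 = n - n1

    # Factor the left half of the columns recursively.
    Y1, T1, R11 = rgeqr3_sketch(A[:, :n1])

    # Apply Q1^T = I - Y1 T1^T Y1^T to the right half.
    right = A[:, n1:]
    B = right - Y1 @ (T1.T @ (Y1.T @ right))
    R12 = B[:n1, :]

    # Factor the trailing block recursively.
    Y2_sub, T2, R22 = rgeqr3_sketch(B[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n2)), Y2_sub])

    # Merge the two aggregated transforms: Q1 @ Q2 = I - Y T Y^T.
    T3 = -T1 @ (Y1.T @ Y2) @ T2
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [np.zeros((n2, n1)), T2]])
    R = np.block([[R11, R12], [np.zeros((n2, n1)), R22]])
    return Y, T, R
```

For an m×n matrix `A`, calling `Y, T, R = rgeqr3_sketch(A)` and forming `Q = np.eye(m) - Y @ T @ Y.T` should reproduce `A` as `Q[:, :n] @ R` up to rounding. Splitting the columns in half at each level is what lets the recursion pick the block size automatically, instead of relying on a fixed, tuned block width.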

Place, publisher, year, edition, pages
Heidelberg/Berlin, Germany: Springer, 2001. Vol. 1947, 53-63 p.
Lecture Notes in Computer Science, ISSN 0302-9743 ; 1947/2001
Keyword [en]
Serial and parallel library software, QR factorization, recursion, register blocking, unrolling, SMP systems, dynamic load balancing
URN: urn:nbn:se:umu:diva-40423
DOI: 10.1007/3-540-70734-4_9
OAI: diva2:399625
5th International Workshop, PARA 2000 Bergen, Norway, June 18–20, 2000 Proceedings
Available from: 2011-02-23 Created: 2011-02-23 Last updated: 2011-02-28 Bibliographically approved

By author/editor
Elmroth, Erik
By organisation
Department of Computing Science
