umu.sePublikasjoner
Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Applying recursion to serial and parallel QR factorization leads to better performance
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
2000 (engelsk)Inngår i: IBM Journal of Research and Development, ISSN 0018-8646, E-ISSN 2151-8556, Vol. 44, nr 4, s. 605-624Artikkel i tidsskrift (Fagfellevurdert) Published
Abstract [en]

We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit the efficient use of the recursion for large n. We present a quantitative analysis of these extra costs. This analysis leads us to introduce a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices. Uniprocessor performance results are presented for two IBM RS/6000(R) SP nodes-a 120-MHz IBM POWER2 node and one processor of a four-way 332-MHz IBM PowerPC(R) 604e SMP node. The hybrid recursive algorithm reaches more than 90% of the theoretical peak performance of the POWER2 node, Compared to standard block algorithms, the recursive approach also shows a significant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PPC604e SMP node based on dynamic load balancing is presented. For two, three, and four processors it shows speedups of up to 1.97, 2.99, and 3.97.

sted, utgiver, år, opplag, sider
IEEE Press, 2000. Vol. 44, nr 4, s. 605-624
HSV kategori
Identifikatorer
URN: urn:nbn:se:umu:diva-21923DOI: 10.1147/rd.444.0605ISI: 000088316600011ISBN: 0018-8646 (tryckt)OAI: oai:DiVA.org:umu-21923DiVA, id: diva2:212178
Tilgjengelig fra: 2009-04-21 Laget: 2009-04-21 Sist oppdatert: 2018-06-08bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekst

Personposter BETA

Elmroth, ErikGustavson, F. G.

Søk i DiVA

Av forfatter/redaktør
Elmroth, ErikGustavson, F. G.
Av organisasjonen
I samme tidsskrift
IBM Journal of Research and Development

Søk utenfor DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric

doi
isbn
urn-nbn
Totalt: 243 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf