Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
Umeå University, Faculty of Science and Technology, Department of Computing Science. (IBM TJ Watson Res Ctr, Yorktown Hts, NY USA)
2013 (English) In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 39, no 2, p. 9-
Article in journal (Refereed) Published
Abstract [en]

Four routines called DPOTF3i, i = a, b, c, d, are presented. The DPOTF3i routines are a novel type of level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and the LAPACK routine DPOTRF. The performance of the DPOTF3i routines is still increasing when the performance of the Level-2 LAPACK routine DPOTF2 starts decreasing. This is our main result, and it implies, due to the use of a larger block size nb, that DGEMM, DSYRK, and DTRSM performance also increases! The four DPOTF3i routines use simple register blocking. Different platforms have different numbers of registers; thus, our four routines have different register blocking sizes. BPF is introduced. LAPACK routines for POTRF and PPTRF using BPF instead of full and packed format are shown to be trivial modifications of the LAPACK POTRF source codes. We call these codes BPTRF. There are two variants of BPF: lower and upper. Upper BPF is "identical" to Square Block Packed Format (SBPF). "LAPACK" implementations on multicore processors use SBPF. Lower BPF is less efficient than upper BPF. Vector inplace transposition converts lower BPF to upper BPF very efficiently. Corroborating performance results for DPOTF3i versus DPOTF2 on a variety of common platforms are given for n ≈ nb, as well as results for large n comparing DBPTRF versus DPOTRF.
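
For orientation, the roles of the routines named in the abstract can be illustrated with a minimal sketch of a right-looking blocked Cholesky factorization on a full (not packed) matrix. This is not the DPOTF3i or DBPTRF code from the paper: the function names `factor_diagonal_block` and `blocked_cholesky` are hypothetical, the kernel is a plain unblocked loop with no register blocking, and the matrix is stored as a list of rows rather than in BPF. It only shows where the diagonal-block kernel (the role played by DPOTF2 or DPOTF3i) sits relative to the panel solve (DTRSM role) and the trailing update (DSYRK/DGEMM role), and how the block size nb enters.

```python
import math


def factor_diagonal_block(a, k0, k1):
    """Unblocked lower Cholesky of the diagonal block a[k0:k1][k0:k1], in place.

    Hypothetical stand-in for the role of DPOTF2 / DPOTF3i; no register blocking.
    """
    for j in range(k0, k1):
        d = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, k1):
            s = sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(a, nb):
    """Right-looking blocked Cholesky A = L L^T; overwrites a with L (lower)."""
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Step 1: factor the nb-by-nb diagonal block (kernel role).
        factor_diagonal_block(a, k0, k1)
        # Step 2: panel solve, A21 := A21 * L11^{-T} (DTRSM role).
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # Step 3: trailing update, A22 := A22 - L21 * L21^T, lower part only
        # (DSYRK/DGEMM role).
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))
    # Clear the strictly upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

In this sketch a larger nb shifts more of the work into step 1, which is why the speed of the diagonal-block kernel matters when choosing nb.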

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2013. Vol. 39, no 2, p. 9-
Keywords [en]
Algorithms, Performance, LAPACK, real symmetric matrices, complex Hermitian matrices, positive definite matrices, Cholesky factorization and solution, novel blocked packed matrix data structures, inplace transposition, Cache Blocking, BLAS
National Category
Computer Sciences; Mathematics
Identifiers
URN: urn:nbn:se:umu:diva-67808
DOI: 10.1145/2427023.2427026
ISI: 000315458000003
Scopus ID: 2-s2.0-84875206154
OAI: oai:DiVA.org:umu-67808
DiVA, id: diva2:614234
Available from: 2013-04-03 Created: 2013-04-03 Last updated: 2023-03-24 Bibliographically approved

Authority records

Gustavson, Fred G.