Umeå universitets logga

umu.sePublikationer
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. (UMIT)
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. (UMIT)
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. (UMIT)
2018 (Engelska)Ingår i: Parallel Processing and Applied Mathematics. PPAM 2017: Part 1 / [ed] Wyrzykowski R., Dongarra J., Deelman E., Karczewski K., Springer, 2018, s. 579-589Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

The reduction of a general dense square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is one of the bottlenecks in AED, a main part in state-of-the-art software for the distributed multishift QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the sensitivity of its algorithmic parameters. The proposed algorithm is faster than LAPACK for all problem sizes and faster than ScaLAPACK for the relatively small problem sizes typical for AED.

Ort, förlag, år, upplaga, sidor
Springer, 2018. s. 579-589
Serie
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Nyckelord [en]
Hessenberg reduction, Parallel cache assignment, NUMA-aware algorithm, Shared-memory, Tunable parameters, Off-line tuning
Nationell ämneskategori
Datavetenskap (datalogi) Beräkningsmatematik
Identifikatorer
URN: urn:nbn:se:umu:diva-145342DOI: 10.1007/978-3-319-78024-5_50ISI: 000458563300050Scopus ID: 2-s2.0-85044775461ISBN: 978-3-319-78023-8 (tryckt)ISBN: 978-3-319-78024-5 (digital)OAI: oai:DiVA.org:umu-145342DiVA, id: diva2:1186599
Konferens
12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Lublin, Poland, 10–13 September, 2017
Anmärkning

Tillgänglig från: 2018-02-28 Skapad: 2018-02-28 Senast uppdaterad: 2023-03-23Bibliografiskt granskad
Ingår i avhandling
1. Identification and tuning of algorithmic parameters in parallel matrix computations: Hessenberg reduction and tensor storage format conversion
Öppna denna publikation i ny flik eller fönster >>Identification and tuning of algorithmic parameters in parallel matrix computations: Hessenberg reduction and tensor storage format conversion
2018 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

This thesis considers two problems in numerical linear algebra and high performance computing (HPC): (i) the parallelization of a new blocked Hessenberg reduction algorithm using Parallel Cache Assignment (PCA) and the tunability of its algorithm parameters, and (ii) storing and manipulating dense tensors on shared memory HPC systems.

The Hessenberg reduction appears in the Aggressive Early Deflation (AED) process for identifying converged eigenvalues in the distributed multishift QR algorithm (state-of-the-art algorithm for computing all eigenvalues for dense square matrices). Since the AED process becomes a parallel bottleneck it motivates a further study of AED components. We present a new Hessenberg reduction algorithm based on PCA which is NUMA-aware and targeting relatively small problem sizes on shared memory systems. The tunability of the algorithm parameters are investigated. A simple off-line tuning is presented and the performance of the new Hessenberg reduction algorithm is compared to its counterparts from LAPACK and ScaLAPACK. The new algorithm outperforms LAPACK in all tested cases and outperforms ScaLAPACK in problems smaller than order 1500, which are common problem sizes for AED in the context of the distributed multishift QR algorithm.

We also investigate automatic tuning of the algorithm parameters. The parameters span a huge search space and it is impractical to tune them using standard auto-tuning and optimization techniques. We present a modular auto-tuning framework which applies: search space decomposition, binning, and multi-stage search to enable searching the huge search space efficiently. The framework using these techniques exposes the underlying subproblems which allows using standard auto-tuning methods to tune them. In addition, the framework defines an abstract interface, which combined with its modular design, allows testing various tuning algorithms.

In the last part of the thesis, the focus is on the problem of storing and manipulating dense tensors. Developing open source tensor algorithms and applications is hard due to the lack of open source software for fundamental tensor operations. We present a software library dten, which includes tools for storing dense tensors in shared memory and converting a tensor storage format from one canonical form to another. The library provides two different ways to perform the conversion in parallel, in-place and out-of-place. The conversion involves moving blocks of contiguous data and are done to maximize the size of the blocks to move. In addition, the library supports tensor matricization for one or two tensors at the same time. The latter case is important in preparing tensors for contraction operations. The library is general purpose and highly flexible.

Ort, förlag, år, upplaga, sidor
Umeå: Umeå universitet, 2018. s. 15
Serie
Report / UMINF, ISSN 0348-0542 ; UMINF 18.22
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
urn:nbn:se:umu:diva-145345 (URN)978-91-7601-843-9 (ISBN)
Handledare
Tillgänglig från: 2018-03-01 Skapad: 2018-02-28 Senast uppdaterad: 2018-06-09Bibliografiskt granskad

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltextScopus

Person

Eljammaly, MahmoudKarlsson, LarsKågström, Bo

Sök vidare i DiVA

Av författaren/redaktören
Eljammaly, MahmoudKarlsson, LarsKågström, Bo
Av organisationen
Institutionen för datavetenskap
Datavetenskap (datalogi)Beräkningsmatematik

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetricpoäng

doi
isbn
urn-nbn
Totalt: 901 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf