umu.sePublikationer
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).ORCID-id: 0000-0002-4675-7434
2016 (Engelska)Rapport (Övrigt vetenskapligt)
Abstract [en]

A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cache-blocked algorithm by Kågstrom, Kressner, E.S. Quintana-Ortí, and G. Quintana-Ortí (2008). A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new algorithm is an improvement of the state of the art but also identifies factors that currently limit its scalability.

Ort, förlag, år, upplaga, sidor
Umeå: Department of Computing Science, Umeå University , 2016. , s. 26
Serie
Report / UMINF, ISSN 0348-0542 ; 16.10
Nationell ämneskategori
Beräkningsmatematik
Identifikatorer
URN: urn:nbn:se:umu:diva-120002OAI: oai:DiVA.org:umu-120002DiVA, id: diva2:926156
Tillgänglig från: 2016-05-04 Skapad: 2016-05-04 Senast uppdaterad: 2018-06-07Bibliografiskt granskad
Ingår i avhandling
1. Parallel Algorithms and Library Software for the Generalized Eigenvalue Problem on Distributed Memory Computer Systems
Öppna denna publikation i ny flik eller fönster >>Parallel Algorithms and Library Software for the Generalized Eigenvalue Problem on Distributed Memory Computer Systems
2016 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Alternativ titel[sv]
Parallella algoritmer och biblioteksprogramvara för det generaliserade egenvärdesproblemet på datorsystem med distribuerat minne
Abstract [en]

We present and discuss algorithms and library software for solving the generalized non-symmetric eigenvalue problem (GNEP) on high performance computing (HPC) platforms with distributed memory. Such problems occur frequently in computational science and engineering, and our contributions make it possible to solve GNEPs fast and accurate in parallel using state-of-the-art HPC systems. A generalized eigenvalue problem corresponds to finding scalars y and vectors x such that Ax = yBx, where A and B are real square matrices. A nonzero x that satisfies the GNEP equation is called an eigenvector of the ordered pair (A,B), and the scalar y is the associated eigenvalue. Our contributions include parallel algorithms for transforming a matrix pair (A,B) to a generalized Schur form (S,T), where S is quasi upper triangular and T is upper triangular. The eigenvalues are revealed from the diagonals of S and T. Moreover, for a specified set of eigenvalues an associated pair of deflating subspaces can be computed, which typically is requested in various applications. In the first stage the matrix pair (A,B) is reduced to a Hessenberg-triangular form (H,T), where H is upper triangular with one nonzero subdiagonal and T is upper triangular, in a finite number of steps. The second stage reduces the matrix pair further to generalized Schur form (S,T) using an iterative QZ-based method. Outgoing from a one-stage method for the reduction from (A,B) to (H,T), a novel parallel algorithm is developed. In brief, a delayed update technique is applied to several partial steps, involving low level operations, before associated accumulated transformations are applied in a blocked fashion which together with a wave-front task scheduler makes the algorithm scale when running in a parallel setting. The potential presence of infinite eigenvalues makes a generalized eigenvalue problem ill-conditioned. Therefore the parallel algorithm for the second stage, reduction to (S,T) form, continuously scan for and robustly deflate infinite eigenvalues. This will reduce the impact so that they do not interfere with other real eigenvalues or are misinterpreted as real eigenvalues. In addition, our parallel iterative QZ-based algorithm makes use of multiple implicit shifts and an aggressive early deflation (AED) technique, which radically speeds up the convergence. The multi-shift strategy is based on independent chains of so called coupled bulges and computational windows which is an important source of making the algorithm scalable. The parallel algorithms have been implemented in state-of-the-art library software. The performance is demonstrated and evaluated using up to 1600 CPU cores for problems with matrices as large as 100000 x 100000. Our library software is described in a User Guide. The software is, optionally, tunable via a set of parameters for various thresholds and buffer sizes etc. These parameters are discussed, and recommended values are specified which should result in reasonable performance on HPC systems similar to the ones we have been running on.

Ort, förlag, år, upplaga, sidor
Umeå: Umeå universitet, 2016. s. 18
Serie
Report / UMINF, ISSN 0348-0542 ; 16.11
Nationell ämneskategori
Datavetenskap (datalogi)
Forskningsämne
data- och systemvetenskap
Identifikatorer
urn:nbn:se:umu:diva-119439 (URN)978-91-7601-491-2 (ISBN)
Presentation
2016-05-27, MC313, Umeå universitet, Umeå, 10:00 (Engelska)
Handledare
Tillgänglig från: 2016-04-19 Skapad: 2016-04-19 Senast uppdaterad: 2018-06-07Bibliografiskt granskad

Open Access i DiVA

fulltext(1337 kB)156 nedladdningar
Filinformation
Filnamn FULLTEXT01.pdfFilstorlek 1337 kBChecksumma SHA-512
44f94e97d9dc608e672218359bb4a57017bc7d238cb47e70bd5de5f7390892112ade665a603f48e85c525559ab42595cd6897643bb509adce29fa9a148044190
Typ fulltextMimetyp application/pdf

Övriga länkar

URL

Personposter BETA

Adlerborn, BjörnKågström, BoKarlsson, Lars

Sök vidare i DiVA

Av författaren/redaktören
Adlerborn, BjörnKågström, BoKarlsson, Lars
Av organisationen
Institutionen för datavetenskapHögpresterande beräkningscentrum norr (HPC2N)
Beräkningsmatematik

Sök vidare utanför DiVA

GoogleGoogle Scholar
Totalt: 156 nedladdningar
Antalet nedladdningar är summan av nedladdningar för alla fulltexter. Det kan inkludera t.ex tidigare versioner som nu inte längre är tillgängliga.

urn-nbn

Altmetricpoäng

urn-nbn
Totalt: 1491 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf