On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Mathematical Information Technology, University of Jyväskylä. ORCID iD: 0000-0002-3689-0899
Department of Mathematical Information Technology, University of Jyväskylä.
Department of Mathematical Information Technology, University of Jyväskylä; Department of Aeronautics & Astronautics, Stanford University.
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, pp. 56-66. Article in journal (Refereed). Published
Abstract [en]

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups compared to an equivalent CPU implementation that uses a single CPU core. The attained floating-point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is further improved using off-line autotuning techniques.
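
The roofline bound mentioned in the abstract can be illustrated with a minimal sketch. The function names and the hardware figures below are hypothetical and chosen only for illustration; they are not taken from the article. The model caps the attainable floating-point rate at the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity (FLOPs per byte moved).

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, arithmetic_intensity):
    """Roofline bound in GFLOP/s.

    peak_gflops          -- compute peak of the device (GFLOP/s)
    bandwidth_gb_s       -- off-chip memory bandwidth (GB/s)
    arithmetic_intensity -- FLOPs performed per byte moved (FLOP/byte)
    """
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gb_s


# Hypothetical device figures (assumptions, not measurements from the article).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_S = 200.0

# A low-intensity kernel is limited by memory bandwidth.
memory_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, 0.25)

# A high-intensity kernel is limited by the compute peak.
compute_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, 10.0)

print(memory_bound, compute_bound, ridge_point(PEAK_GFLOPS, BANDWIDTH_GB_S))
```

Under these assumed figures, a kernel whose arithmetic intensity lies below the ridge point is bandwidth-limited, which is the regime the abstract reports for most of the solver.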

Place, publisher, year, edition, pages
Elsevier, 2018. Vol. 115, pp. 56-66
Keywords [en]
Fast direct solver, GPU computing, Partial solution technique, PSCR method, Roofline model, Separable block tridiagonal linear system
National category
Computer Sciences; Software Engineering
Research subject
Administrative data processing
Identifiers
URN: urn:nbn:se:umu:diva-145462
DOI: 10.1016/j.jpdc.2018.01.004
ISI: 000427809200005
OAI: oai:DiVA.org:umu-145462
DiVA, id: diva2:1187714
Available from: 2018-03-05. Created: 2018-03-05. Last updated: 2018-06-09. Bibliographically approved.

Open Access in DiVA

The full text of the publication is available from 2020-06-01 15:36

Person records (BETA)

Myllykoski, Mirko
