On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Mathematical Information Technology, University of Jyväskylä. ORCID iD: 0000-0002-3689-0899
Department of Mathematical Information Technology, University of Jyväskylä.
Department of Mathematical Information Technology, University of Jyväskylä; Department of Aeronautics & Astronautics, Stanford University.
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, p. 56-66. Article in journal (Refereed). Published
Abstract [en]

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating-point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is further improved using off-line autotuning techniques.
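
The roofline model mentioned in the abstract bounds attainable floating-point performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch of that bound only. The function names and the device figures (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are hypothetical placeholders chosen for illustration; they are not taken from the paper and do not describe the hardware used in it.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    if intensity_flop_per_byte < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (flop/byte) at which the two bounds meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical device figures, for illustration only (not from the paper).
PEAK_GFLOPS = 1000.0    # assumed peak floating-point rate
BANDWIDTH_GBS = 200.0   # assumed off-chip memory bandwidth

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for intensity in (0.25, 1.0, 5.0, 20.0):
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"{intensity:6.2f} flop/byte -> {bound:7.1f} GFLOP/s ({regime})")
```

Kernels whose arithmetic intensity lies below the ridge point are limited by memory bandwidth in this model, which is the regime the abstract reports as dominant for the GPU implementation.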

Place, publisher, year, edition, pages
Elsevier, 2018. Vol. 115, p. 56-66
Keywords [en]
Fast direct solver, GPU computing, Partial solution technique, PSCR method, Roofline model, Separable block tridiagonal linear system
National Category
Computer Sciences; Software Engineering
Research subject
Computing Science
Identifiers
URN: urn:nbn:se:umu:diva-145462
DOI: 10.1016/j.jpdc.2018.01.004
ISI: 000427809200005
OAI: oai:DiVA.org:umu-145462
DiVA, id: diva2:1187714
Available from: 2018-03-05. Created: 2018-03-05. Last updated: 2018-06-09. Bibliographically approved

Open Access in DiVA

The full text will be freely available from 2020-06-01 15:36

Authority records

Myllykoski, Mirko
