Myllykoski, M. (2018). A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Eds.), Parallel Processing and Applied Mathematics: PPAM 2017. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017) (pp. 207-216). Springer
2018 (English). In: Parallel Processing and Applied Mathematics: PPAM 2017 / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, p. 207-216. Conference paper, Published paper (Refereed)
Abstract [en]

A task-based parallel algorithm for reordering the eigenvalues of a matrix in real Schur form is presented. The algorithm is realized on top of the StarPU runtime system. Only the aspects which are relevant for shared memory machines are discussed here, but the implementation can be configured to run on distributed memory machines as well. Various techniques to reduce the overhead and the core idle time are discussed. Computational experiments indicate that the new algorithm is between 1.5 and 6.6 times faster than a state-of-the-art MPI-based implementation found in ScaLAPACK. With medium to large matrices, strong scaling efficiencies above 60% up to 28 CPU cores are reported. The overhead and the core idle time are shown to be negligible with the exception of the smallest matrices and highest core counts.

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Keywords
Eigenvalue reordering problem, Task based programming, Shared memory machines
National Category
Computer Sciences Computational Mathematics
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-145987 (URN); 10.1007/978-3-319-78024-5_19 (DOI); 000458563300019 (); 978-3-319-78023-8 (ISBN); 978-3-319-78024-5 (ISBN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017)
Available from: 2018-03-24 Created: 2018-03-24 Last updated: 2019-04-16 Bibliographically approved
Myllykoski, M., Rossi, T. & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115, 56-66
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, p. 56-66. Article in journal (Refereed), Published
Abstract [en]

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques.

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Fast direct solver, GPU computing, Partial solution technique, PSCR method, Roofline model, Separable block tridiagonal linear system
National Category
Computer Sciences Software Engineering
Research subject
business data processing
Identifiers
urn:nbn:se:umu:diva-145462 (URN); 10.1016/j.jpdc.2018.01.004 (DOI); 000427809200005 ()
Available from: 2018-03-05 Created: 2018-03-05 Last updated: 2018-06-09 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-3689-0899