Publications (2 of 2)
Myllykoski, M. (2018). A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Eds.), Parallel Processing and Applied Mathematics: PPAM 2017. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017) (pp. 207-216). Springer
2018 (English). In: Parallel Processing and Applied Mathematics: PPAM 2017 / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, pp. 207-216. Conference paper, published paper (peer-reviewed)
Abstract [en]

A task-based parallel algorithm for reordering the eigenvalues of a matrix in real Schur form is presented. The algorithm is realized on top of the StarPU runtime system. Only the aspects which are relevant for shared memory machines are discussed here, but the implementation can be configured to run on distributed memory machines as well. Various techniques to reduce the overhead and the core idle time are discussed. Computational experiments indicate that the new algorithm is between 1.5 and 6.6 times faster than a state-of-the-art MPI-based implementation found in ScaLAPACK. With medium to large matrices, strong scaling efficiencies above 60% up to 28 CPU cores are reported. The overhead and the core idle time are shown to be negligible with the exception of the smallest matrices and highest core counts.
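
As a reading aid for the strong scaling figures quoted in the abstract, the following minimal sketch shows how strong scaling efficiency is conventionally computed: the speedup over a single core divided by the number of cores, for a fixed problem size. The function name and the timings are hypothetical and chosen only for illustration; they are not measurements from the paper, and the paper may define its baseline differently.

```python
def strong_scaling_efficiency(t_single_core, t_parallel, cores):
    """Conventional strong scaling efficiency for a fixed problem size.

    efficiency = (t_single_core / t_parallel) / cores
    A value of 1.0 means ideal (linear) speedup.
    """
    if t_single_core <= 0 or t_parallel <= 0 or cores < 1:
        raise ValueError("timings must be positive and cores must be >= 1")
    speedup = t_single_core / t_parallel
    return speedup / cores


# Hypothetical timings in seconds (NOT taken from the paper).
efficiency = strong_scaling_efficiency(t_single_core=100.0, t_parallel=5.0, cores=28)
print(f"strong scaling efficiency: {efficiency:.1%}")
```

Under this definition, an efficiency above 60% on 28 cores corresponds to a speedup of more than 16.8 over a single core.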

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Keywords
Eigenvalue reordering problem, Task based programming, Shared memory machines
HSV category
Research program
Computer Science
Identifiers
urn:nbn:se:umu:diva-145987 (URN); 10.1007/978-3-319-78024-5_19 (DOI); 000458563300019; 978-3-319-78023-8 (ISBN); 978-3-319-78024-5 (ISBN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017)
Available from: 2018-03-24. Created: 2018-03-24. Last updated: 2019-04-16. Bibliographically approved
Myllykoski, M., Rossi, T. & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115, 56-66
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, pp. 56-66. Journal article (peer-reviewed). Published
Abstract [en]

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that the attained floating point performance is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques.
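
To illustrate the kind of bound a roofline analysis produces, here is a minimal sketch of the basic roofline formula: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity of the kernel. The function name and the device numbers are hypothetical placeholders, not values from the paper, and the models in the paper are more detailed than this single formula.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_per_s, intensity_flop_per_byte):
    """Basic roofline bound on attainable performance in GFLOP/s.

    attainable = min(peak compute, memory bandwidth * arithmetic intensity)
    """
    if min(peak_gflops, bandwidth_gb_per_s, intensity_flop_per_byte) <= 0:
        raise ValueError("all arguments must be positive")
    return min(peak_gflops, bandwidth_gb_per_s * intensity_flop_per_byte)


# Hypothetical device and kernel (NOT taken from the paper).
peak = 1000.0      # peak floating point rate, GFLOP/s
bandwidth = 200.0  # off-chip memory bandwidth, GB/s
intensity = 0.5    # arithmetic intensity of the kernel, flop/byte

bound = roofline_gflops(peak, bandwidth, intensity)
ridge = peak / bandwidth  # intensity at which the kernel stops being memory-bound
regime = "memory-bound" if intensity < ridge else "compute-bound"
print(f"bound: {bound} GFLOP/s ({regime}, ridge point at {ridge} flop/byte)")
```

A kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth rather than by the peak compute rate, which is the regime the abstract describes.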

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Fast direct solver, GPU computing, Partial solution technique, PSCR method, Roofline model, Separable block tridiagonal linear system
HSV category
Research program
business data processing
Identifiers
urn:nbn:se:umu:diva-145462 (URN); 10.1016/j.jpdc.2018.01.004 (DOI); 000427809200005
Available from: 2018-03-05. Created: 2018-03-05. Last updated: 2018-06-09. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-3689-0899