Publications (2 of 2)
Myllykoski, M. (2018). A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Eds.), Parallel Processing and Applied Mathematics: PPAM 2017. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017) (pp. 207-216). Springer
A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form
2018 (English). In: Parallel Processing and Applied Mathematics: PPAM 2017 / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, pp. 207-216. Conference paper, Published paper (Refereed)
Abstract [en]

A task-based parallel algorithm for reordering the eigenvalues of a matrix in real Schur form is presented. The algorithm is realized on top of the StarPU runtime system. Only the aspects which are relevant for shared memory machines are discussed here, but the implementation can be configured to run on distributed memory machines as well. Various techniques to reduce the overhead and the core idle time are discussed. Computational experiments indicate that the new algorithm is between 1.5 and 6.6 times faster than a state-of-the-art MPI-based implementation found in ScaLAPACK. With medium to large matrices, strong scaling efficiencies above 60% up to 28 CPU cores are reported. The overhead and the core idle time are shown to be negligible with the exception of the smallest matrices and highest core counts.
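
To make the reordering problem in the abstract concrete, the following is a minimal sequential sketch using SciPy's `schur` routine with its `sort` option. It is not the task-based StarPU algorithm described in the paper; it only illustrates what a reordered real Schur form looks like. The test matrix, the helper name `reordered_schur`, and the choice to move left-half-plane eigenvalues to the top-left corner are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import block_diag, schur


def reordered_schur(A):
    """Return (T, Z, sdim) with A = Z T Z^T and T in real Schur form.

    Eigenvalues with negative real part are moved to the leading
    sdim-by-sdim block of T (SciPy's built-in 'lhp' selection).
    """
    return schur(A, output="real", sort="lhp")


# Hypothetical test matrix with known eigenvalues:
# 2, -1, 0.5 +/- 2i, -0.5 +/- 1i  (three of them lie in the left half-plane).
B = block_diag(
    2.0,
    -1.0,
    np.array([[0.5, 2.0], [-2.0, 0.5]]),
    np.array([[-0.5, 1.0], [-1.0, -0.5]]),
)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # random orthogonal basis
A = Q @ B @ Q.T  # same eigenvalues as B, but a dense matrix

T, Z, sdim = reordered_schur(A)
print("selected eigenvalues:", sdim)
print("leading block eigenvalues:", np.linalg.eigvals(T[:sdim, :sdim]))
```

The leading `sdim` columns of `Z` then span the invariant subspace associated with the selected eigenvalues, which is the usual reason for reordering a Schur form.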

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Keywords
Eigenvalue reordering problem, Task based programming, Shared memory machines
National subject category
Computer Sciences; Computational Mathematics
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-145987 (URN); 10.1007/978-3-319-78024-5_19 (DOI); 000458563300019 (); 978-3-319-78023-8 (ISBN); 978-3-319-78024-5 (ISBN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017)
Available from: 2018-03-24 Created: 2018-03-24 Last updated: 2019-04-16 Bibliographically approved
Myllykoski, M., Rossi, T. & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115, 56-66
On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, pp. 56-66. Article in journal (Refereed) Published
Abstract [en]

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. Attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that the attained floating point performance is mainly limited by the off-chip memory bandwidth and the effectiveness of a tridiagonal solver used to solve arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques.
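
The roofline model mentioned in the abstract bounds attainable floating point performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below shows that bound in its basic textbook form; the function names and all hardware numbers are hypothetical and are not taken from the paper.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s under the basic roofline model.

    peak_gflops   -- peak floating point rate of the device (GFLOP/s)
    bandwidth_gbs -- off-chip memory bandwidth (GB/s)
    intensity     -- arithmetic intensity of the kernel (FLOP per byte)
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def is_memory_bound(peak_gflops, bandwidth_gbs, intensity):
    """True when the bandwidth roof lies below the compute roof."""
    return bandwidth_gbs * intensity < peak_gflops


# Hypothetical device: 1000 GFLOP/s peak, 200 GB/s memory bandwidth.
# Its ridge point is 1000 / 200 = 5 FLOP per byte.
PEAK, BW = 1000.0, 200.0

for intensity in (0.5, 10.0):
    bound = roofline_bound(PEAK, BW, intensity)
    kind = "memory" if is_memory_bound(PEAK, BW, intensity) else "compute"
    print(f"intensity {intensity}: at most {bound} GFLOP/s ({kind}-bound)")
```

A kernel whose arithmetic intensity falls below the ridge point, as in the first case here, is limited by memory bandwidth, which is the regime the abstract reports for the PSCR implementation.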

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Fast direct solver, GPU computing, Partial solution technique, PSCR method, Roofline model, Separable block tridiagonal linear system
National subject category
Computer Sciences; Software Engineering
Research subject
business data processing
Identifiers
urn:nbn:se:umu:diva-145462 (URN); 10.1016/j.jpdc.2018.01.004 (DOI); 000427809200005 ()
Available from: 2018-03-05 Created: 2018-03-05 Last updated: 2018-06-09 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-3689-0899