1 - 13 of 13
  • 1.
    Bispo, João
    University of Porto, Portugal.
    Barbosa, Jorge G.
    University of Porto, Portugal.
    Silva, Pedro Filipe
    University of Porto, Portugal.
    Morales, Cristian
    BSC, Spain.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Ojeda-May, Pedro
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Bialczak, Milosz
    WCSS, Poland.
    Uchronski, Mariusz
    WCSS, Poland.
    Wlodarczyk, Adam
    WCSS, Poland.
    Wauligmann, Peter
    HLRS, Germany.
    Krishnasamy, Ezhilmathi
    University of Luxembourg, Luxembourg.
    Varrette, Sebastien
    University of Luxembourg, Luxembourg.
    Lührs, Sebastian
    JSC, Germany.
    Shoukourian, Hayk
    LRZ, Germany.
    Best Practice Guide: Modern Accelerators. 2021. Report (Other academic)
  • 2.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Eljammaly, Mahmoud
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    D6.5 Evaluation of auto-tuning techniques. 2019. Report (Other academic)
  • 3.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils. 2020. In: Parallel Processing and Applied Mathematics: Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Ewa Deelman, Jack Dongarra, Konrad Karczewski, Springer, 2020, p. 58-69. Conference paper (Refereed)
    Abstract [en]

    In this paper we consider the problem of computing generalized eigenvectors of a matrix pencil in real Schur form. In exact arithmetic, this problem can be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers xtgevc in LAPACK prevent overflow by dynamically scaling the eigenvectors. These subroutines are scalar and sequential codes which compute the eigenvectors one by one. In this paper, we discuss how to derive robust algorithms which are blocked and parallel. The new StarNEig library contains a robust task-parallel solver Zazamoukh which runs on top of StarPU. Our numerical experiments show that Zazamoukh achieves a super-linear speedup compared with dtgevc for sufficiently large matrices.

  • 4.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Adlerborn, Björn
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    D2.5 Eigenvalue problem solvers. 2017. Report (Other academic)
  • 5.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Cayrols, Sébastien
    Science and Technology Facilities Council.
    Duff, Iain
    Science and Technology Facilities Council.
    Lopez, Florent
    Science and Technology Facilities Council.
    Nakov, Stojce
    Science and Technology Facilities Council.
    Pranesh, Srikara
    The University of Manchester.
    Stevens, David
    The University of Manchester.
    Dongarra, Jack
    The University of Manchester.
    Donfack, Simplice
    National Institute for Research in Computer Science and Control.
    Grigori, Laura
    National Institute for Research in Computer Science and Control.
    Tissot, Olivier
    National Institute for Research in Computer Science and Control.
    D7.8 Release of the NLAFET library. 2019. Report (Other academic)
  • 6.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form. 2018. In: Parallel Processing and Applied Mathematics: PPAM 2017 / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, p. 207-216. Conference paper (Refereed)
    Abstract [en]

    A task-based parallel algorithm for reordering the eigenvalues of a matrix in real Schur form is presented. The algorithm is realized on top of the StarPU runtime system. Only the aspects which are relevant for shared memory machines are discussed here, but the implementation can be configured to run on distributed memory machines as well. Various techniques to reduce the overhead and the core idle time are discussed. Computational experiments indicate that the new algorithm is between 1.5 and 6.6 times faster than a state of the art MPI-based implementation found in ScaLAPACK. With medium to large matrices, strong scaling efficiencies above 60% up to 28 CPU cores are reported. The overhead and the core idle time are shown to be negligible with the exception of the smallest matrices and highest core counts.

  • 7.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation. 2022. In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 48, no 1, p. 1-36, article id 11. Article in journal (Refereed)
    Abstract [en]

    The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.

  • 8.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Eljammaly, Mahmoud
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Pranesh, Srikara
    The University of Manchester.
    Zounon, Mawussi
    The University of Manchester.
    D2.6 Prototype Software for Eigenvalue Problem Solvers. 2018. Report (Other academic)
  • 9.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Introduction to StarNEig: A Task-based Library for Solving Nonsymmetric Eigenvalue Problems. 2020. In: Parallel Processing and Applied Mathematics: Revised Selected Papers, Part I / [ed] Roman Wyrzykowski and Boleslaw Szymanski, Springer, 2020, p. 70-81. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present the StarNEig library for solving dense nonsymmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library support GPUs. The library is currently in an early beta state and only real arithmetic is supported. Support for complex data types is planned for a future release. This paper is aimed at potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer that should make it easy for new users to transition to StarNEig. We demonstrate the performance of the library with a small set of computational experiments.

  • 10.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Task‐based, GPU‐accelerated and robust library for solving dense nonsymmetric eigenvalue problems. 2021. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 33, no 11, article id e5915. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present the StarNEig library for solving dense nonsymmetric standard and generalized eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library have support for GPU acceleration. The library currently applies to real matrices with real and complex eigenvalues and all calculations are done using real arithmetic. Support for complex matrices is planned for a future release. This paper is aimed at potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as LAPACK and ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer which should assist new users in the transition to StarNEig. We demonstrate the performance of the library with a sample of computational experiments.

  • 11.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Task-Based Parallel Algorithms for Eigenvalue Reordering of Matrices in Real Schur Forms. 2017. Report (Other academic)
    Abstract [en]

    We develop a task-based parallel algorithm for reordering eigenvalues of matrices in real Schur form. We describe how we implemented the algorithm using the StarPU runtime system and report on experiments performed on a shared memory machine. Compared with ScaLAPACK, we achieve an average speedup of 3. We have strong and weak scaling efficiencies which are well above 50%. We are able to achieve more than 50% of the peak flop rate for all but the smallest matrices. The idle time and the overhead are negligible except for the smallest matrices. The next step is to reconfigure and further develop the code so that it can be applied to matrix pairs in generalized Schur forms and run efficiently on distributed memory machines.

  • 12.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Schwarz, Angelika Beatrix
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    D2.7 Eigenvalue solvers for nonsymmetric problems. 2019. Report (Other academic)
  • 13.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Mathematical Information Technology, University of Jyväskylä.
    Rossi, Tuomo
    Department of Mathematical Information Technology, University of Jyväskylä.
    Toivanen, Jari
    Department of Mathematical Information Technology, University of Jyväskylä; Department of Aeronautics & Astronautics, Stanford University.
    On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. 2018. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, p. 56-66. Article in journal (Refereed)
    Abstract [en]

    The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that the attained floating point performance is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques.
