##### Publications (10 of 28)
Schwarz, A. B. & Karlsson, L. (2019). Scalable eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 85, 131-140
2019 (English) In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 85, p. 131-140. Article in journal (Refereed). Published.
##### Abstract [en]

We present two task-centric algorithms for computing selected eigenvectors of a non-symmetric matrix reduced to real Schur form. Our approach eliminates the sequential phases present in the current LAPACK/ScaLAPACK implementation. We demonstrate the scalability of our implementation on multicore, manycore and distributed memory systems.

##### Place, publisher, year, edition, pages
Elsevier, 2019
##### Keywords
Eigenvectors, Real Schur form, Tiled algorithms, MPI + OpenMP parallel programming
Computer Systems
Computer Science
##### Identifiers
urn:nbn:se:umu:diva-159296 (URN) 10.1016/j.parco.2019.04.001 (DOI)
Available from: 2019-05-23 Created: 2019-05-23 Last updated: 2019-05-28. Bibliographically approved.
Bujanović, Z., Karlsson, L. & Kressner, D. (2018). A Householder-Based Algorithm for Hessenberg-Triangular Reduction. SIAM Journal on Matrix Analysis and Applications, 39(3), 1270-1294
2018 (English) In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 39, no 3, p. 1270-1294. Article in journal (Refereed). Published.
##### Abstract [en]

The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil $A - \lambda B$ requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A nonvanishing fraction of the total flop-count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern lead to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors, by using (compact) WY representations. Even though the new algorithm requires more floating point operations than the state-of-the-art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multithreaded BLAS. The design and evaluation of a parallel formulation is future work.
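As a rough illustration of the compact WY representation the abstract refers to (a NumPy sketch, not the paper's implementation; the function names and test setup are ours), a product of Householder reflectors H_j = I - tau_j v_j v_j^T can be accumulated into a single block reflector I - V T V^T, so that applying it reduces to matrix-matrix products:

```python
import numpy as np

def compact_wy_T(V, taus):
    """Triangular factor T such that H_1 H_2 ... H_k = I - V @ T @ V.T,
    where H_j = I - taus[j] * outer(V[:, j], V[:, j]).
    Forward columnwise recurrence (same idea as LAPACK's dlarft)."""
    k = V.shape[1]
    T = np.zeros((k, k))
    for j in range(k):
        T[j, j] = taus[j]
        if j > 0:
            # New column couples reflector j to the previous j reflectors.
            T[:j, j] = -taus[j] * (T[:j, :j] @ (V[:, :j].T @ V[:, j]))
    return T

def apply_block_reflector(V, T, C):
    """Compute (I - V T V^T) @ C using only matrix-matrix products."""
    return C - V @ (T @ (V.T @ C))
```

Because the reflectors are applied through three GEMM-like products instead of k rank-1 updates, the work is cast as level-3 BLAS, which is the efficiency argument behind block reflectors.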

##### Place, publisher, year, edition, pages
SIAM Publications, 2018
##### National Category
Computational Mathematics
##### Identifiers
urn:nbn:se:umu:diva-154648 (URN) 10.1137/17M1153637 (DOI) 000453716400010 ()
##### Funder
EU, Horizon 2020, 671633
Available from: 2018-12-21 Created: 2018-12-21 Last updated: 2019-01-08. Bibliographically approved.
Eljammaly, M., Karlsson, L. & Kågström, B. (2018). An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm. In: ICPE '18 Companion of the 2018 ACM/SPEC International Conference on Performance Engineering: . Paper presented at International Conference on Performance Engineering (ICPE 2018) (pp. 5-8). ACM Digital Library
2018 (English) In: ICPE '18 Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, ACM Digital Library, 2018, p. 5-8. Conference paper, Published paper (Refereed).
##### Place, publisher, year, edition, pages
ACM Digital Library, 2018. p. 4
##### Keywords
Auto-tuning, Tuning framework, Binning, Search space decomposition, Multistage search, Hessenberg reduction, NUMA-aware
##### National Category
Computer Sciences
##### Identifiers
urn:nbn:se:umu:diva-154392 (URN) 10.1145/3185768.3186304 (DOI) 978-1-4503-5629-9 (ISBN)
##### Conference
International Conference on Performance Engineering (ICPE 2018)
Available from: 2018-12-17 Created: 2018-12-17 Last updated: 2018-12-17
Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2018). Blocked Algorithms for Robust Solution of Triangular Linear Systems. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Ed.), Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I. Paper presented at 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017 (pp. 68-78). Springer, 1
2018 (English) In: Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, Vol. 1, p. 68-78. Conference paper, Published paper (Refereed).
##### Abstract [en]

We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix, b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.
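To make the scaling idea concrete, here is a heavily simplified Python sketch (ours, not the paper's algorithm or LAPACK's xLATRS; the threshold choice is arbitrary and the growth check for the update step is deliberately omitted): each division by a diagonal entry is protected by rescaling the entire right-hand side when the quotient would exceed the overflow threshold Ω, and the accumulated scaling α is returned alongside x.

```python
import numpy as np

OMEGA = np.finfo(np.float64).max / 4.0  # overflow threshold (illustrative choice)

def robust_backsolve(T, b):
    """Solve T x = alpha * b for non-singular upper triangular T, returning
    (alpha, x) such that no division result exceeds OMEGA.

    A heavily simplified sketch of the scaling idea behind xLATRS-style
    solvers; it protects only the divisions, not the growth in the linear
    updates, which a real robust solver must also bound."""
    n = b.shape[0]
    x = b.astype(np.float64).copy()
    alpha = 1.0
    for i in range(n - 1, -1, -1):
        t = T[i, i]
        # If x[i] / t would exceed OMEGA, scale the whole right-hand side
        # down first and record the scaling in alpha.
        if abs(t) < 1.0 and abs(x[i]) > abs(t) * OMEGA:
            s = (abs(t) * OMEGA) / abs(x[i])
            x *= s
            alpha *= s
        x[i] = x[i] / t
        x[:i] -= T[:i, i] * x[i]
    return alpha, x
```

With α returned explicitly, the caller can consistently rescale related quantities, as when eigenvectors are computed column by column from a Schur form.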

##### Place, publisher, year, edition, pages
Springer, 2018
##### Series
Theoretical Computer Science and General Issues, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
##### Keywords
Triangular linear systems, overflow, blocked algorithms, robust algorithms
##### National Category
Computer Sciences
##### Identifiers
urn:nbn:se:umu:diva-154296 (URN) 10.1007/978-3-319-78024-5_7 (DOI) 000458563300007 () 2-s2.0-85044716269 (Scopus ID) 978-3-319-78023-8 (ISBN) 978-3-319-78024-5 (ISBN)
##### Conference
12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017
Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-04-16. Bibliographically approved.
Adlerborn, B., Karlsson, L. & Kågström, B. (2018). Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling. SIAM Journal on Scientific Computing, 40(2), C157-C180
2018 (English) In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 40, no 2, p. C157-C180. Article in journal (Refereed). Published.
##### Abstract [en]

A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cache-blocked algorithm by Kågström et al. [BIT, 48 (2008), pp. 563-584]. A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new formulation is an improvement of the state of the art and also identify factors that limit its scalability.

##### Place, publisher, year, edition, pages
Society for Industrial and Applied Mathematics, 2018
##### Keywords
generalized eigenvalue problem, Hessenberg-triangular reduction, parallel algorithms, wavefront scheduling
##### National Category
Computational Mathematics
##### Identifiers
urn:nbn:se:umu:diva-147843 (URN) 10.1137/16M1103890 (DOI) 000431100400039 ()
Available from: 2018-05-18 Created: 2018-05-18 Last updated: 2018-06-09. Bibliographically approved.
Eljammaly, M., Karlsson, L. & Kågström, B. (2018). On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment. In: Wyrzykowski R., Dongarra J., Deelman E., Karczewski K. (Ed.), Parallel Processing and Applied Mathematics. PPAM 2017: Part 1. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Lublin, Poland, 10–13 September, 2017 (pp. 579-589). Springer
2018 (English) In: Parallel Processing and Applied Mathematics. PPAM 2017: Part 1 / [ed] Wyrzykowski R., Dongarra J., Deelman E., Karczewski K., Springer, 2018, p. 579-589. Conference paper, Published paper (Refereed).
##### Abstract [en]

The reduction of a general dense square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is one of the bottlenecks in aggressive early deflation (AED), a main part of state-of-the-art software for the distributed multishift QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the sensitivity of its algorithmic parameters. The proposed algorithm is faster than LAPACK for all problem sizes and faster than ScaLAPACK for the relatively small problem sizes typical for AED.

##### Place, publisher, year, edition, pages
Springer, 2018
##### Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
##### Keywords
Hessenberg reduction, Parallel cache assignment, NUMA-aware algorithm, Shared-memory, Tunable parameters, Off-line tuning
##### National Category
Computer Sciences
Computational Mathematics
##### Identifiers
urn:nbn:se:umu:diva-145342 (URN) 10.1007/978-3-319-78024-5_50 (DOI) 000458563300050 () 978-3-319-78023-8 (ISBN) 978-3-319-78024-5 (ISBN)
##### Conference
12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Lublin, Poland, 10–13 September, 2017
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2019-03-06. Bibliographically approved.
Kjelgaard Mikkelsen, C. C., Schwarz, A. B. & Karlsson, L. (2018). Parallel robust solution of triangular linear systems. Concurrency and Computation, Article ID e5064.
2018 (English) In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, article id e5064. Article in journal (Refereed). Epub ahead of print.
##### Abstract [en]

Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating-point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well-conditioned, then the normwise relative error is small. However, there are well-conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right-hand side to avoid overflow. These solvers are sequential and apply to systems with a single right-hand side. This paper presents algorithms which are blocked and parallel. A new task-based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non-robust solvers DTRSV and DTRSM. When there are many right-hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non-robust solver DTRSM.

##### Place, publisher, year, edition, pages
John Wiley & Sons, 2018
##### Keywords
Overflow protection, parallel algorithms, task-based parallelism, triangular linear systems
##### National Category
Computer Sciences
##### Identifiers
urn:nbn:se:umu:diva-154297 (URN) 10.1002/cpe.5064 (DOI) 2-s2.0-85056329436 (Scopus ID)
Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-05-28
Eljammaly, M., Karlsson, L. & Kågström, B. (2017). An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm. Umeå: Department of Computing Science, Umeå University
##### Abstract [en]

The performance of a recently developed Hessenberg reduction algorithm greatly depends on the values chosen for its tunable parameters. The search space is huge, and this, combined with other complications, makes the problem hard to solve effectively with generic methods and tools. We describe a modular auto-tuning framework in which the underlying optimization algorithm is easy to substitute. The framework exposes sub-problems of standard auto-tuning type for which existing generic methods can be reused. The outputs of concurrently executing sub-tuners are assembled by the framework into a solution to the original problem.
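As a toy illustration of the decomposition idea (entirely ours: the parameter names, the cost model, and the one-parameter-at-a-time split are invented for the example, and the real framework is far more involved), each sub-tuner searches one slice of the parameter space with the remaining parameters held fixed, and the framework assembles the per-slice winners into one configuration:

```python
# Hypothetical cost model standing in for a timed kernel run: lower is
# better, minimized at block_size=96, num_threads=8.
def measure(block_size, num_threads):
    return abs(block_size - 96) + 4 * abs(num_threads - 8)

def sub_tune(candidates, evaluate):
    """A sub-tuner of 'standard auto-tuning type': exhaustively search
    one sub-problem and report the best candidate found."""
    return min(candidates, key=evaluate)

# Two sub-tuners run (conceptually concurrently) on decomposed sub-problems;
# the framework assembles their outputs into one configuration.
best_block_size = sub_tune(range(32, 257, 32), lambda bs: measure(bs, 8))
best_num_threads = sub_tune(range(1, 17), lambda nt: measure(96, nt))
best_config = (best_block_size, best_num_threads)
```

Because each sub-tuner only sees a standard single-objective search, any off-the-shelf optimizer can be substituted for the exhaustive `min` without touching the framework.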

##### Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2017. p. 14
##### Series
Report / UMINF, ISSN 0348-0542 ; 17.19
##### Keywords
Auto-tuning, Tuning framework, Binning, Search space decomposition, Multistage search, Hessenberg reduction, NUMA-aware
##### National Category
Computer Sciences
##### Identifiers
urn:nbn:se:umu:diva-145297 (URN)
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09. Bibliographically approved.
Karlsson, L. & Kjelgaard Mikkelsen, C. C. (2017). Negative stride in the column-major format makes sense and has useful applications.
##### Abstract [en]

Two lower triangular or two upper triangular matrices of the same size can be stored with minimal memory footprint. If both positive and negative strides are used, then both matrices can be accessed as if they were stored in regular column-major format.
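A minimal NumPy sketch of the storage trick (ours, for illustration; variable names and the n=4 example are invented, and the report targets column-major arrays as used in Fortran/BLAS): one n x (n+1) buffer holds the lower triangles of two n x n matrices, one accessed through an ordinary positive-stride view and the other through a negative-stride view that reads the buffer back to front.

```python
import numpy as np

n = 4
# One n x (n+1) column-major buffer has exactly n*(n+1) elements:
# enough for the lower triangles of TWO n x n matrices.
B = np.zeros((n, n + 1), order="F")

# View 1: ordinary positive strides over the first n columns.
A = B[:, :n]
# View 2: negative strides, reading the buffer back to front. A lower
# triangle seen through this view occupies exactly the part of B that
# view 1's lower triangle does not touch.
C = B[::-1, ::-1][:, :n]

# Fill both lower triangles; the writes never collide in B.
for j in range(n):
    for i in range(j, n):
        A[i, j] = 1 + i + 10 * j       # data for matrix 1
        C[i, j] = -(1 + i + 10 * j)    # data for matrix 2
```

Both matrices can then be passed to column-major routines as if each were stored in a regular array, which is the point the abstract makes.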

p. 4
##### Series
Report / UMINF, ISSN 0348-0542 ; 17.17
##### National Category
Other Computer and Information Science
##### Identifiers
urn:nbn:se:umu:diva-154646 (URN)
Available from: 2018-12-21 Created: 2018-12-21 Last updated: 2018-12-21
Bosner, N. & Karlsson, L. (2017). Parallel and Heterogeneous $m$-Hessenberg-Triangular-Triangular Reduction. SIAM Journal on Scientific Computing, 39(1), C29-C47
2017 (English) In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 39, no 1, p. C29-C47. Article in journal (Refereed). Published.
##### Abstract [en]

The m-Hessenberg-triangular-triangular (mHTT) reduction is a simultaneous orthogonal reduction of three matrices to condensed form. It has applications, for example, in solving shifted linear systems arising in various control theory problems. A new heterogeneous CPU/GPU implementation of the mHTT reduction is presented and evaluated against an existing CPU implementation. The algorithm offloads the compute-intensive matrix-matrix multiplications to the GPU and keeps the inner loop, which is memory intensive and has a complicated control flow, on the CPU. Experiments demonstrate that the heterogeneous implementation can be superior to the existing CPU implementation on a system with 2 x 8 CPU cores and one GPU. Future development should focus on improving the scalability of the CPU computations.

##### Keywords
m-Hessenberg-triangular-triangular form, solving shifted linear systems, Givens rotations, heterogeneous CPU/GPU implementation
##### National Category
Control Engineering
##### Identifiers
urn:nbn:se:umu:diva-133267 (URN) 10.1137/15M1047349 (DOI) 000395747800025 ()
Available from: 2017-04-07 Created: 2017-04-07 Last updated: 2018-06-09. Bibliographically approved.
##### Identifiers
ORCID iD: orcid.org/0000-0002-4675-7434
