umu.se Publications
Publications (10 of 41)
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition. Frontiers in Applied Mathematics and Statistics, 8, Article ID 830270.
2022 (English). In: Frontiers in Applied Mathematics and Statistics, E-ISSN 2297-4687, Vol. 8, article id 830270. Article in journal (Refereed). Published.
Abstract [en]

The Canonical Polyadic (CP) tensor decomposition is frequently used as a model in applications in a variety of different fields. Using jackknife resampling to estimate parameter uncertainties is often desirable but results in an increase of the already high computational cost. Observing that the resampled tensors, though different, are nearly identical, we show that it is possible to extend the recently proposed Concurrent ALS (CALS) technique to a jackknife resampling scenario. This extension provides the computational efficiency advantage of CALS at the price of a modest increase (typically a few percent) in the number of floating point operations. Numerical experiments on both synthetic and real-world datasets demonstrate that the new workflow based on a CALS extension can be several times faster than a straightforward workflow where the jackknife submodels are processed individually.
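As a purely illustrative aside (not code from the paper, which fits CP submodels with ALS rather than computing means), the leave-one-out resampling scheme and the standard jackknife uncertainty estimate can be sketched as follows; the function names are hypothetical:

```python
# Hypothetical sketch of leave-one-out jackknife resampling. In the paper,
# each "statistic" is a full CP decomposition of the tensor with one sample
# slab removed; here a simple mean stands in for it.
def jackknife(data, statistic):
    """Evaluate `statistic` on every leave-one-out subsample."""
    n = len(data)
    return [statistic(data[:i] + data[i + 1:]) for i in range(n)]

def jackknife_std_error(data, statistic):
    """Standard jackknife estimate of the standard error of `statistic`."""
    n = len(data)
    estimates = jackknife(data, statistic)
    mean_est = sum(estimates) / n
    variance = (n - 1) / n * sum((e - mean_est) ** 2 for e in estimates)
    return variance ** 0.5

mean = lambda xs: sum(xs) / len(xs)
se = jackknife_std_error([2.0, 4.0, 6.0, 8.0], mean)  # matches the textbook standard error of the mean
```

The cost problem the paper addresses is visible in the structure: all n leave-one-out fits are nearly identical, and that redundancy is exactly what the CALS extension exploits.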

Keywords
ALS, Alternating Least Squares, Canonical Polyadic Decomposition, CP, decomposition, jackknife, Tensors
National Category
Computer Sciences; Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-194532 (URN), 10.3389/fams.2022.830270 (DOI), 000792602600001 (), 2-s2.0-85128906913 (Scopus ID)
Available from: 2022-05-10. Created: 2022-05-10. Last updated: 2023-11-10. Bibliographically approved.
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions. ACM Transactions on Mathematical Software, 48(3), Article ID 3519383.
2022 (English). In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 48, no 3, article id 3519383. Article in journal (Refereed). Published.
Abstract [en]

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this article, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time, the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
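The arithmetic-intensity argument can be made concrete with a back-of-the-envelope calculation (a sketch under simplifying assumptions, not part of the CALS library): treating the dominant per-iteration operation as a single GEMM whose width grows when models are fused, the FLOPs-per-byte ratio rises accordingly.

```python
# Rough arithmetic-intensity model for a GEMM C = A * B, with A m-by-k and
# B k-by-n in double precision; ignores caching and all other effects.
def gemm_intensity(m, k, n, bytes_per_elem=8):
    flops = 2 * m * k * n                                   # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # touch A, B, C once
    return flops / bytes_moved

# One model with 5 components versus 100 such models fused side by side:
single = gemm_intensity(10_000, 10_000, 5)
fused = gemm_intensity(10_000, 10_000, 500)  # roughly 90x higher intensity
```

Low intensity (the `single` case) is why small-component ALS is memory-bound and GPU offloading is ineffective; the fused case is the regime where GPUs pay off.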

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
CP, decomposition, high-performance, PARAFAC, Tensor
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-203243 (URN), 10.1145/3519383 (DOI), 000865883900010 (), 2-s2.0-85140090648 (Scopus ID)
Available from: 2023-01-17. Created: 2023-01-17. Last updated: 2023-11-10. Bibliographically approved.
López, F., Karlsson, L. & Bientinesi, P. (2022). FLOPs as a discriminant for dense linear algebra algorithms. In: ICPP '22: proceedings of the 51st international conference on parallel processing. Paper presented at ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022. ACM Digital Library, Article ID 11.
2022 (English). In: ICPP '22: proceedings of the 51st international conference on parallel processing, ACM Digital Library, 2022, article id 11. Conference paper, Published paper (Refereed).
Abstract [en]

Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing "anomalies": problem instances for which the fastest algorithm does not perform the fewest FLOPs. To do so, we focus on relatively simple expressions and analyse when and why anomalies occur. We found that anomalies exist and tend to cluster into large contiguous regions; for one expression anomalies were rare, whereas for the other they were abundant. Moreover, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conclude that the FLOP count is not a sufficiently dependable discriminant, even when algorithms are built from highly optimised kernels. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.
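The kind of FLOP-based selection under scrutiny can be shown on the smallest possible example (a hypothetical sketch, not code from the paper): the two parenthesisations of a triple matrix product are mathematically equivalent but have different FLOP counts, and tools such as Linnea pick the cheaper one.

```python
# FLOP counts for the two ways of evaluating A*B*C, with A m-by-k,
# B k-by-n, and C n-by-p (2*x*y*z FLOPs per x-by-y times y-by-z product).
def flops_left(m, k, n, p):   # (A*B)*C
    return 2 * m * k * n + 2 * m * n * p

def flops_right(m, k, n, p):  # A*(B*C)
    return 2 * k * n * p + 2 * m * k * p

def pick(m, k, n, p):
    """FLOP-minimising choice; the paper's 'anomalies' are instances
    where this choice is not actually the fastest on real hardware."""
    return "left" if flops_left(m, k, n, p) <= flops_right(m, k, n, p) else "right"

choice = pick(1000, 1000, 10, 1000)  # -> "left": (A*B)*C does far fewer FLOPs
```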

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
algorithm selection, linear algebra, scientific computing
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-214445 (URN), 10.1145/3545008.3545072 (DOI), 2-s2.0-85138311336 (Scopus ID), 9781450397339 (ISBN)
Conference
ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022
Available from: 2023-09-15. Created: 2023-09-15. Last updated: 2023-11-10. Bibliographically approved.
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem. Umeå universitet
2020 (English). Report (Other academic).
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK's scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
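The dynamic rescaling idea can be sketched in a few lines (a simplified illustration; the function name and the threshold `big` are hypothetical, and DTREVC3's actual logic, which also guards the update step, is considerably more careful). The solver returns a pair (x, scale) satisfying T x = scale * b with scale <= 1, so the solution of the original system is x / scale:

```python
# Simplified overflow-protected backward substitution for an upper
# triangular system: solves T x = scale * b, with scale chosen dynamically
# so that no entry of x exceeds `big`.
def robust_back_substitution(T, b, big=1e12):
    n = len(b)
    x = list(b)
    scale = 1.0
    for i in range(n - 1, -1, -1):
        xi = x[i] / T[i][i]
        if abs(xi) > big:          # entry would exceed the threshold:
            s = big / abs(xi)      # rescale the entire vector instead
            x = [v * s for v in x]
            scale *= s
            xi = x[i] / T[i][i]
        x[i] = xi
        for j in range(i):         # propagate to the remaining equations
            x[j] -= T[j][i] * xi
    return x, scale
```

Because each scaling decision touches the whole vector, processing eigenvectors one at a time in this fashion stays at level 2; making the scaling factors tile-local, as the report does, is what allows the computation to be cast as level-3 tile operations.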

Place, publisher, year, edition, pages
Umeå universitet, 2020. p. 25
Series
Report / UMINF, ISSN 0348-0542 ; 20.02
National Category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168433 (URN)
Projects
NLAFET
Available from: 2020-02-25. Created: 2020-02-25. Last updated: 2023-03-07. Bibliographically approved.
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust parallel eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 100, Article ID 102707.
2020 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, article id 102707. Article in journal (Refereed). Published.
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Overflow protection, Eigenvectors, Real Schur form, Tiled algorithms
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-177534 (URN), 10.1016/j.parco.2020.102707 (DOI), 000597159800001 (), 2-s2.0-85089018373 (Scopus ID)
Projects
NLAFET
Note

Preprint version: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1396196&dswid=-7045

Available from: 2020-12-11. Created: 2020-12-11. Last updated: 2023-03-24. Bibliographically approved.
Karlsson, L., Eljammaly, M. & Myllykoski, M. (2019). D6.5 Evaluation of auto-tuning techniques. NLAFET Consortium; Umeå University
2019 (English). Report (Other academic).
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
National Category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168425 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2018. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25. Created: 2020-02-25. Last updated: 2020-02-26. Bibliographically approved.
Kågström, B., Myllykoski, M., Karlsson, L., Kjelgaard Mikkelsen, C. C., Cayrols, S., Duff, I., . . . Tissot, O. (2019). D7.8 Release of the NLAFET library. NLAFET Consortium; Umeå University
2019 (English). Report (Other academic).
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
National Category
Computer Sciences
Research subject
Mathematics; Computer Science
Identifiers
urn:nbn:se:umu:diva-168426 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2019. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25. Created: 2020-02-25. Last updated: 2020-02-27. Bibliographically approved.
Karlsson, L. & Pettersson, J. (2019). Experiences of using mastery learning and the personalized system of instruction. In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin. Paper presented at Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019 (pp. 21-21). Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet
2019 (English). In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin, Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019, p. 21-21. Conference paper, Oral presentation with published abstract (Other academic).
Place, publisher, year, edition, pages
Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019
Series
Skriftserie från Universitetspedagogik och lärandestöd (UPL) ; 2019:1
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-194800 (URN)
Conference
Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019.
Available from: 2022-05-17. Created: 2022-05-17. Last updated: 2022-05-23. Bibliographically approved.
Kjelgaard Mikkelsen, C. C., Schwarz, A. B. & Karlsson, L. (2019). Parallel robust solution of triangular linear systems. Concurrency and Computation, 31(19), Article ID e5064.
2019 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 31, no 19, article id e5064. Article in journal (Refereed). Published.
Abstract [en]

Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.
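The outer parallel structure is easy to see (a toy sketch with hypothetical names, not Kiya itself, which is blocked and adds overflow protection): with many right-hand sides, each column solve is independent and maps naturally onto tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def back_substitute(T, b):
    """Plain (non-robust) backward substitution for upper triangular T."""
    n = len(b)
    x = list(b)
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x

def solve_many(T, columns):
    """Solve T x = b for each right-hand side as an independent task."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda b: back_substitute(T, b), columns))
```

In the robust setting each right-hand side can additionally carry its own scaling factor, which is what lets overflow protection coexist with this kind of parallelism instead of serialising across columns.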

Place, publisher, year, edition, pages
John Wiley & Sons, 2019
Keywords
Overflow protection, parallel algorithms, task-based parallelism, triangular linear systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-154297 (URN), 10.1002/cpe.5064 (DOI), 000486203400007 (), 2-s2.0-85056329436 (Scopus ID)
Note

Special Issue

Available from: 2018-12-14. Created: 2018-12-14. Last updated: 2023-03-07. Bibliographically approved.
Schwarz, A. B. & Karlsson, L. (2019). Scalable eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 85, 131-140
2019 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 85, p. 131-140. Article in journal (Refereed). Published.
Abstract [en]

We present two task-centric algorithms for computing selected eigenvectors of a non-symmetric matrix reduced to real Schur form. Our approach eliminates the sequential phases present in the current LAPACK/ScaLAPACK implementation. We demonstrate the scalability of our implementation on multicore, manycore and distributed memory systems.

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
Eigenvectors, Real Schur form, Tiled algorithms, MPI + OpenMP parallel programming
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-159296 (URN), 10.1016/j.parco.2019.04.001 (DOI), 000471087700012 (), 2-s2.0-85064437701 (Scopus ID)
Funder
EU, Horizon 2020, 671633; eSSENCE - An eScience Collaboration, UFV 2010/149
Available from: 2019-05-23. Created: 2019-05-23. Last updated: 2023-03-23. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-4675-7434
