Publications (10 of 43)
López Sánchez, F., Karlsson, L. & Bientinesi, P. (2025). On the parenthesisations of matrix chains: all are useful, few are essential. Journal of combinatorial optimization, 49(3), Article ID 52.
On the parenthesisations of matrix chains: all are useful, few are essential
2025 (English) In: Journal of combinatorial optimization, ISSN 1382-6905, E-ISSN 1573-2886, Vol. 49, no 3, article id 52. Article in journal (Refereed). Published.
Abstract [en]

The product of a matrix chain consisting of n matrices can be computed in C_{n-1} (Catalan's number) different ways, each identified by a distinct parenthesisation of the chain. The best algorithm to select a parenthesisation that minimises the cost runs in O(n log n) time. Approximate algorithms run in O(n) time and find solutions that are guaranteed to be within a certain factor from optimal; the best factor is currently 1.155. In this article, we first prove two results that characterise different parenthesisations, and then use those results to improve on the best known approximation algorithms. Specifically, we show that (a) each parenthesisation is optimal somewhere in the problem domain, and (b) exactly n+1 parenthesisations are essential in the sense that the removal of any one of them causes an unbounded penalty for an infinite number of problem instances. By focusing on essential parenthesisations, we improve on the best known approximation algorithm and show that the approximation factor is at most 1.143.
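
The abstract refers to the classic matrix chain ordering problem. For reference only, the sketch below shows the textbook O(n^3) dynamic-programming solution that selects a minimum-cost parenthesisation; it is not the paper's O(n log n) algorithm or its approximation algorithms, and the matrix dimensions in the example are hypothetical.

    # Textbook O(n^3) dynamic programming for the matrix chain problem,
    # shown only to illustrate what "selecting a parenthesisation" means.
    def matrix_chain_order(dims):
        """dims[i], dims[i+1] are the dimensions of matrix i; returns the
        minimal scalar-multiplication count and one optimal parenthesisation."""
        n = len(dims) - 1                    # number of matrices in the chain
        cost = [[0] * n for _ in range(n)]   # cost[i][j]: cheapest cost for A_i ... A_j
        split = [[0] * n for _ in range(n)]  # split[i][j]: optimal split point for A_i ... A_j
        for length in range(2, n + 1):       # chain lengths 2, 3, ..., n
            for i in range(n - length + 1):
                j = i + length - 1
                cost[i][j] = float("inf")
                for k in range(i, j):        # try every split point
                    c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                    if c < cost[i][j]:
                        cost[i][j], split[i][j] = c, k
        def parens(i, j):
            if i == j:
                return f"A{i}"
            k = split[i][j]
            return f"({parens(i, k)} {parens(k + 1, j)})"
        return cost[0][n - 1], parens(0, n - 1)

    # Hypothetical dimensions: A0 is 10x30, A1 is 30x5, A2 is 5x60.
    print(matrix_chain_order([10, 30, 5, 60]))  # -> (4500, '((A0 A1) A2)')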

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Approximation algorithm, Linear algebra compilers, Matrix chain, Matrix multiplication
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-238200 (URN), 10.1007/s10878-025-01290-7 (DOI), 001466493300002, 2-s2.0-105002887010 (Scopus ID)
Funder
eSSENCE - An eScience Collaboration
Available from: 2025-05-06 Created: 2025-05-06 Last updated: 2025-05-06. Bibliographically approved
Sankaran, A., Karlsson, L. & Bientinesi, P. (2025). Ranking with ties based on noisy performance data. International Journal of Data Science and Analytics
Ranking with ties based on noisy performance data
2025 (English) In: International Journal of Data Science and Analytics, ISSN 2364-415X. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

We consider the problem of ranking a set of objects based on their performance when the measurement of said performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object as ‘better’ than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is in general not transitive; as a consequence, in general the two requirements cannot be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges, and same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. Although the problem of ranking with ties has been widely studied, there remains a lack of clarity regarding what constitutes a set of reasonable rankings. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyze three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial ranking.
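
Below is a minimal sketch of the range-overlap relation described above, using made-up measurement ranges; it only illustrates why the incomparability relation is not transitive and is not the paper's partial-ranking methodology.

    # Two (low, high) ranges are incomparable iff they overlap.
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    # Hypothetical performance ranges (e.g. min/max of repeated timings).
    A = (1.0, 2.0)
    B = (1.8, 3.0)
    C = (2.5, 4.0)

    print(overlaps(A, B))  # True  -> A and B should share a rank
    print(overlaps(B, C))  # True  -> B and C should share a rank
    print(overlaps(A, C))  # False -> A should be ranked strictly above C
    # A~B and B~C but not A~C: distinct ranks for separated ranges and equal
    # ranks for overlapping ranges cannot both hold, so more than one
    # reasonable ranking exists.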

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Knowledge discovery, Noise, Partial orders, Performance, Ranking
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-236240 (URN), 10.1007/s41060-025-00722-1 (DOI), 001411719700001, 2-s2.0-85218821095 (Scopus ID)
Funder
German Research Foundation (DFG), IRTG 2379
Available from: 2025-04-01 Created: 2025-04-01 Last updated: 2025-04-01
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition. Frontiers in Applied Mathematics and Statistics, 8, Article ID 830270.
Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition
2022 (English) In: Frontiers in Applied Mathematics and Statistics, E-ISSN 2297-4687, Vol. 8, article id 830270. Article in journal (Refereed). Published.
Abstract [en]

The Canonical Polyadic (CP) tensor decomposition is frequently used as a model in applications in a variety of different fields. Using jackknife resampling to estimate parameter uncertainties is often desirable but results in an increase of the already high computational cost. Upon observation that the resampled tensors, though different, are nearly identical, we show that it is possible to extend the recently proposed Concurrent ALS (CALS) technique to a jackknife resampling scenario. This extension gives access to the computational efficiency advantage of CALS for the price of a modest increase (typically a few percent) in the number of floating point operations. Numerical experiments on both synthetic and real-world datasets demonstrate that the new workflow based on a CALS extension can be several times faster than a straightforward workflow where the jackknife submodels are processed individually.
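
For orientation, the sketch below spells out the straightforward leave-one-sample-out jackknife workflow that the abstract contrasts with, where each submodel is fitted independently. The tensor is random toy data and fit_cp is a hypothetical placeholder; the paper's contribution, fusing these nearly identical fits via the CALS technique, is not shown.

    import numpy as np

    def fit_cp(tensor, rank):
        # Placeholder: a CP/PARAFAC solver (e.g. ALS) would go here.
        return None

    X = np.random.rand(20, 6, 5)             # hypothetical 20 x 6 x 5 data tensor
    rank = 3

    submodels = []
    for i in range(X.shape[0]):              # one jackknife resample per sample
        X_i = np.delete(X, i, axis=0)        # leave sample i out along mode 0
        submodels.append(fit_cp(X_i, rank))  # fitted independently: the costly part
    # Each X_i differs from X in only one slice, which is what makes the fused
    # (CALS-based) workflow in the paper possible.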

Keywords
ALS, Alternating Least Squares, Canonical Polyadic Decomposition, CP, decomposition, jackknife, Tensors
National Category
Computer Sciences; Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-194532 (URN), 10.3389/fams.2022.830270 (DOI), 000792602600001, 2-s2.0-85128906913 (Scopus ID)
Available from: 2022-05-10 Created: 2022-05-10 Last updated: 2023-11-10. Bibliographically approved
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions. ACM Transactions on Mathematical Software, 48(3), Article ID 3519383.
Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions
2022 (English) In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 48, no 3, article id 3519383. Article in journal (Refereed). Published.
Abstract [en]

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this article, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time, the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
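
The following toy example (not the CALS library interface; all shapes are made up) illustrates the general fusion idea: many narrow multiplications with a shared operand, executed one at a time, versus a single wider multiplication on the concatenated factors, which performs the same FLOPs while reusing the shared operand and thus raising arithmetic intensity.

    import numpy as np

    X = np.random.rand(1000, 1000)                            # shared operand
    factors = [np.random.rand(1000, 10) for _ in range(8)]    # 8 "small" models

    # One kernel call per model: low arithmetic intensity per call.
    results_separate = [X @ F for F in factors]

    # Fused: one call on the concatenated factors, same FLOPs, better data reuse.
    fused = X @ np.hstack(factors)
    results_fused = np.split(fused, len(factors), axis=1)

    assert all(np.allclose(a, b) for a, b in zip(results_separate, results_fused))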

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
CP, decomposition, high-performance, PARAFAC, Tensor
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-203243 (URN), 10.1145/3519383 (DOI), 000865883900010, 2-s2.0-85140090648 (Scopus ID)
Available from: 2023-01-17 Created: 2023-01-17 Last updated: 2023-11-10. Bibliographically approved
López, F., Karlsson, L. & Bientinesi, P. (2022). FLOPs as a discriminant for dense linear algebra algorithms. In: ICPP '22: proceedings of the 51st international conference on parallel processing. Paper presented at ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August – 1 September 2022. ACM Digital Library, Article ID 11.
FLOPs as a discriminant for dense linear algebra algorithms
2022 (English) In: ICPP '22: proceedings of the 51st international conference on parallel processing, ACM Digital Library, 2022, article id 11. Conference paper, Published paper (Refereed).
Abstract [en]

Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing "anomalies": problem instances for which the fastest algorithm does not perform the least number of FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions. For one expression anomalies were rare, whereas for the other they were abundant. We conclude that FLOPs is not a sufficiently dependable discriminant even when building algorithms with highly optimised kernels. Plus, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.
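
As a worked example of the kind of FLOP-count comparison studied in the paper, consider the expression A B v with A and B of size n x n and v a vector of length n (dimensions hypothetical): the two mathematically equivalent evaluation orders differ enormously in FLOPs.

    n = 1000                                 # A, B are n x n; v has length n

    flops_AB_then_v = 2 * n**3 + 2 * n**2    # (A B) v : one GEMM, then one GEMV
    flops_Bv_then_A = 2 * n**2 + 2 * n**2    # A (B v) : two GEMVs

    print(flops_AB_then_v)   # 2002000000
    print(flops_Bv_then_A)   # 4000000
    # The FLOP count selects A (B v); the paper asks when the algorithm with
    # the fewest FLOPs is nevertheless not the fastest one (an "anomaly").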

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
algorithm selection, linear algebra, scientific computing
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-214445 (URN), 10.1145/3545008.3545072 (DOI), 001062754200011, 2-s2.0-85138311336 (Scopus ID), 9781450397339 (ISBN)
Conference
ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August – 1 September 2022
Available from: 2023-09-15 Created: 2023-09-15 Last updated: 2025-04-24. Bibliographically approved
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem. Umeå universitet
Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem
2020 (English). Report (Other academic).
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.

Place, publisher, year, edition, pages
Umeå universitet, 2020. p. 25
Series
Report / UMINF, ISSN 0348-0542 ; 20.02
National Category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168433 (URN)
Projects
NLAFET
Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2023-03-07. Bibliographically approved
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust parallel eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 100, Article ID 102707.
Robust parallel eigenvector computation for the non-symmetric eigenvalue problem
2020 (English) In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, article id 102707. Article in journal (Refereed). Published.
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
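
The sketch below shows plain backward substitution for an upper triangular system with a simplistic rescaling guard; it is only meant to illustrate why DTREVC3 carries a scaling factor per eigenvector, and it is neither LAPACK's actual scheme nor the tiled, tile-local-scaling solver of the paper. The test matrix is a toy example.

    import numpy as np

    def backsolve_scaled(U, b, big=1e150):
        """Solve U x = b for upper triangular U, rescaling the working vector
        whenever a division threatens to overflow; returns (x, scale) such that
        U @ x == scale * b, i.e. the true solution is x / scale."""
        n = len(b)
        x = b.astype(float)
        scale = 1.0
        for i in range(n - 1, -1, -1):
            # crude guard: if x[i] / U[i, i] could exceed `big`, rescale everything
            if abs(x[i]) > abs(U[i, i]) * big:
                s = 0.5 * abs(U[i, i]) * big / abs(x[i])
                x *= s
                scale *= s
            x[i] /= U[i, i]
            x[:i] -= U[:i, i] * x[i]
        return x, scale

    U = np.triu(np.random.rand(5, 5)) + np.eye(5)   # well-conditioned toy matrix
    b = np.random.rand(5)
    x, scale = backsolve_scaled(U, b)
    print(np.allclose(U @ x, scale * b))            # True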

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Overflow protection, Eigenvectors, Real Schur form, Tiled algorithms
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-177534 (URN), 10.1016/j.parco.2020.102707 (DOI), 000597159800001, 2-s2.0-85089018373 (Scopus ID)
Projects
NLAFET
Note

Preprint version: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1396196&dswid=-7045

Available from: 2020-12-11 Created: 2020-12-11 Last updated: 2023-03-24. Bibliographically approved
Karlsson, L., Eljammaly, M. & Myllykoski, M. (2019). D6.5 Evaluation of auto-tuning techniques. NLAFET Consortium; Umeå University
D6.5 Evaluation of auto-tuning techniques
2019 (English). Report (Other academic).
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
National Category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168425 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2018. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-26. Bibliographically approved
Kågström, B., Myllykoski, M., Karlsson, L., Kjelgaard Mikkelsen, C. C., Cayrols, S., Duff, I., . . . Tissot, O. (2019). D7.8 Release of the NLAFET library. NLAFET Consortium; Umeå University
D7.8 Release of the NLAFET library
2019 (English). Report (Other academic).
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
National Category
Computer Sciences
Research subject
Mathematics; Computer Science
Identifiers
urn:nbn:se:umu:diva-168426 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2019. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-27. Bibliographically approved
Karlsson, L. & Pettersson, J. (2019). Experiences of using mastery learning and the personalized system of instruction. In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin. Paper presented at Universitetspedagogiska konferensen 2019, Umeå, 10-11 oktober, 2019. (pp. 21-21). Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet
Experiences of using mastery learning and the personalized system of instruction
2019 (English) In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin, Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019, p. 21-21. Conference paper, Oral presentation with published abstract (Other academic).
Place, publisher, year, edition, pages
Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019
Series
Skriftserie från Universitetspedagogik och lärandestöd (UPL) ; 2019:1
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-194800 (URN)
Conference
Universitetspedagogiska konferensen 2019, Umeå, 10-11 oktober, 2019.
Available from: 2022-05-17 Created: 2022-05-17 Last updated: 2022-05-23. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-4675-7434
