Publications (10 of 43)
López Sánchez, F., Karlsson, L. & Bientinesi, P. (2025). On the parenthesisations of matrix chains: all are useful, few are essential. Journal of Combinatorial Optimization, 49(3), Article ID 52.
On the parenthesisations of matrix chains: all are useful, few are essential
2025 (English). In: Journal of Combinatorial Optimization, ISSN 1382-6905, E-ISSN 1573-2886, Vol. 49, no. 3, article id 52. Article in journal (Refereed). Published
Abstract [en]

The product of a matrix chain consisting of n matrices can be computed in C_{n-1} (the (n-1)-th Catalan number) different ways, each identified by a distinct parenthesisation of the chain. The best algorithm to select a parenthesisation that minimises the cost runs in O(n log n) time. Approximate algorithms run in O(n) time and find solutions that are guaranteed to be within a certain factor from optimal; the best factor is currently 1.155. In this article, we first prove two results that characterise different parenthesisations, and then use those results to improve on the best known approximation algorithms. Specifically, we show that (a) each parenthesisation is optimal somewhere in the problem domain, and (b) exactly n+1 parenthesisations are essential in the sense that the removal of any one of them causes an unbounded penalty for an infinite number of problem instances. By focusing on essential parenthesisations, we improve on the best known approximation algorithm and show that the approximation factor is at most 1.143.
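
For reference, the textbook way to pick a parenthesisation is the O(n^3) dynamic programme below, a minimal Python sketch written for this summary; it is neither the O(n log n) algorithm nor the approximation algorithms discussed in the abstract, and the function name and the encoding of matrix sizes as a dims list are choices made here for illustration.

def matrix_chain_cost(dims):
    # Classic O(n^3) dynamic programme: matrix i has size dims[i] x dims[i+1],
    # and cost[i][j] is the minimal scalar-multiplication count for the
    # sub-chain i..j over all of its parenthesisations.
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k
    return cost[0][n - 1], split

# Example: a chain of 4 matrices with sizes 10x30, 30x5, 5x60, 60x8.
best_cost, _ = matrix_chain_cost([10, 30, 5, 60, 8])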

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Approximation algorithm, Linear algebra compilers, Matrix chain, Matrix multiplication
HSV category
Identifiers
urn:nbn:se:umu:diva-238200 (URN), 10.1007/s10878-025-01290-7 (DOI), 001466493300002, 2-s2.0-105002887010 (Scopus ID)
Research funder
eSSENCE - An eScience Collaboration
Available from: 2025-05-06 Created: 2025-05-06 Last updated: 2025-05-06, bibliographically checked
Sankaran, A., Karlsson, L. & Bientinesi, P. (2025). Ranking with ties based on noisy performance data. International Journal of Data Science and Analytics
Ranking with ties based on noisy performance data
2025 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X. Article in journal (Refereed). Epub ahead of print
Abstract [en]

We consider the problem of ranking a set of objects based on their performance when the measurement of that performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object as ‘better’ than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is in general not transitive; as a consequence, the two requirements cannot in general be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges and the same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. Although the problem of ranking with ties has been widely studied, there remains a lack of clarity regarding what constitutes a set of reasonable rankings. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyze three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial rankings.
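
As a rough illustration of the ambiguity (and explicitly not one of the paper's three methodologies), the Python sketch below assigns ranks by sweeping the measurement ranges in order of their lower endpoints and starting a new rank only when a range lies strictly above everything seen so far; ranges is assumed to be a list of (low, high) pairs, and the function name is invented here.

def sweep_rank(ranges):
    # Illustrative tie handling: objects whose ranges overlap (even only
    # transitively) share a rank; a new rank starts when an object's range
    # lies strictly above every range processed so far.
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    ranks, rank, highest = {}, 0, None
    for i in order:
        lo, hi = ranges[i]
        if highest is not None and lo > highest:
            rank += 1                      # separated from all previous ranges
        ranks[i] = rank
        highest = hi if highest is None else max(highest, hi)
    return ranks

# Objects 0 and 1 overlap and share a rank; object 2 is separated.
print(sweep_rank([(1.0, 2.0), (1.5, 3.0), (4.0, 5.0)]))   # {0: 0, 1: 0, 2: 1}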

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Knowledge discovery, Noise, Partial orders, Performance, Ranking
HSV category
Identifiers
urn:nbn:se:umu:diva-236240 (URN), 10.1007/s41060-025-00722-1 (DOI), 001411719700001, 2-s2.0-85218821095 (Scopus ID)
Research funder
German Research Foundation (DFG), IRTG 2379
Available from: 2025-04-01 Created: 2025-04-01 Last updated: 2025-04-01
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition. Frontiers in Applied Mathematics and Statistics, 8, Article ID 830270.
Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition
2022 (English). In: Frontiers in Applied Mathematics and Statistics, E-ISSN 2297-4687, Vol. 8, article id 830270. Article in journal (Refereed). Published
Abstract [en]

The Canonical Polyadic (CP) tensor decomposition is frequently used as a model in applications in a variety of different fields. Using jackknife resampling to estimate parameter uncertainties is often desirable but increases the already high computational cost. Observing that the resampled tensors, though different, are nearly identical, we show that it is possible to extend the recently proposed Concurrent ALS (CALS) technique to a jackknife resampling scenario. This extension gives access to the computational efficiency advantage of CALS for the price of a modest increase (typically a few percent) in the number of floating-point operations. Numerical experiments on both synthetic and real-world datasets demonstrate that the new workflow based on a CALS extension can be several times faster than a straightforward workflow in which the jackknife submodels are processed individually.
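
The straightforward workflow that the paper accelerates can be sketched in Python/NumPy as follows; cp_als below is a bare-bones three-way alternating least squares written only to make the jackknife loop concrete, and none of the function names correspond to the CALS library.

import numpy as np

def khatri_rao(B, C):
    # Column-wise Khatri-Rao product.
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, B.shape[1])

def cp_als(X, rank, iters=100, seed=0):
    # Bare-bones alternating least squares for a three-way CP model.
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    X0 = X.reshape(I, -1)                       # mode-0 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, -1)    # mode-1 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, -1)    # mode-2 unfolding
    for _ in range(iters):
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def jackknife_cp(X, rank):
    # Naive jackknife: refit the whole model with each mode-0 slab left out.
    # The resampled tensors are nearly identical, which is what CALS exploits.
    return [cp_als(np.delete(X, i, axis=0), rank) for i in range(X.shape[0])]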

Keywords
ALS, Alternating Least Squares, Canonical Polyadic Decomposition, CP, decomposition, jackknife, Tensors
HSV category
Identifiers
urn:nbn:se:umu:diva-194532 (URN), 10.3389/fams.2022.830270 (DOI), 000792602600001, 2-s2.0-85128906913 (Scopus ID)
Available from: 2022-05-10 Created: 2022-05-10 Last updated: 2023-11-10, bibliographically checked
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions. ACM Transactions on Mathematical Software, 48(3), Article ID 3519383.
Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions
2022 (English). In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 48, no. 3, article id 3519383. Article in journal (Refereed). Published
Abstract [en]

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this article, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time, the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
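
A minimal sketch of the fusion idea, under the assumption that the factor matrices of all models are simply concatenated column by column: because the Khatri-Rao product acts column-wise, one large multiplication then yields the MTTKRP of every model at once, which is where the higher arithmetic intensity comes from. The function names are invented for this sketch and are not the interface of the CALS library; X0 denotes the mode-0 unfolding of the shared tensor.

import numpy as np

def khatri_rao(B, C):
    # Column-wise Khatri-Rao product.
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, B.shape[1])

def fused_mttkrp(X0, Bs, Cs):
    # Concatenate the factor matrices of all models column by column, so a
    # single large product replaces many small, low-intensity ones.
    B_all, C_all = np.hstack(Bs), np.hstack(Cs)
    M_all = X0 @ khatri_rao(B_all, C_all)       # one high-intensity multiplication
    ranks = [B.shape[1] for B in Bs]            # split the result per model
    return np.split(M_all, np.cumsum(ranks)[:-1], axis=1)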

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
CP, decomposition, high-performance, PARAFAC, Tensor
HSV category
Identifiers
urn:nbn:se:umu:diva-203243 (URN), 10.1145/3519383 (DOI), 000865883900010, 2-s2.0-85140090648 (Scopus ID)
Available from: 2023-01-17 Created: 2023-01-17 Last updated: 2023-11-10, bibliographically checked
López, F., Karlsson, L. & Bientinesi, P. (2022). FLOPs as a discriminant for dense linear algebra algorithms. In: ICPP '22: proceedings of the 51st international conference on parallel processing. Paper presented at ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022. ACM Digital Library, Article ID 11.
FLOPs as a discriminant for dense linear algebra algorithms
2022 (English). In: ICPP '22: proceedings of the 51st international conference on parallel processing, ACM Digital Library, 2022, article id 11. Conference paper, Published paper (Refereed)
Abstract [en]

Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms by analysing "anomalies": problem instances for which the fastest algorithm does not perform the fewest FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions; for one expression anomalies were rare, whereas for the other they were abundant. We conclude that the FLOP count is not a sufficiently dependable discriminant, even when algorithms are built from highly optimised kernels. Moreover, most of the anomalies remained even after filtering out inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.
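
A toy Python/NumPy illustration of FLOPs as a discriminant (not the experiments of the paper): the two evaluation orders below are mathematically equivalent, but one costs roughly 2n^3 floating-point operations and the other roughly 4n^2, so the FLOP-minimising choice is usually, although per the paper not always, also the faster one.

import time
import numpy as np

n = 2000
A, B = np.random.rand(n, n), np.random.rand(n, n)
v = np.random.rand(n)

# Two mathematically equivalent algorithms for evaluating A B v.
algorithms = {
    "(A B) v  ~ 2n^3 FLOPs": lambda: (A @ B) @ v,
    "A (B v)  ~ 4n^2 FLOPs": lambda: A @ (B @ v),
}
for name, run in algorithms.items():
    t0 = time.perf_counter()
    run()
    print(f"{name}: {time.perf_counter() - t0:.4f} s")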

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
algorithm selection, linear algebra, scientific computing
HSV category
Identifiers
urn:nbn:se:umu:diva-214445 (URN), 10.1145/3545008.3545072 (DOI), 001062754200011, 2-s2.0-85138311336 (Scopus ID), 9781450397339 (ISBN)
Conference
ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022
Available from: 2023-09-15 Created: 2023-09-15 Last updated: 2025-04-24, bibliographically checked
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem. Umeå universitet
Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem
2020 (English). Report (Other academic)
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.

Place, publisher, year, edition, pages
Umeå universitet, 2020. p. 25
Series
Report / UMINF, ISSN 0348-0542 ; 20.02
HSV category
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168433 (URN)
Projects
NLAFET
Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2023-03-07, bibliographically checked
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust parallel eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 100, Article ID 102707.
Robust parallel eigenvector computation for the non-symmetric eigenvalue problem
2020 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, article id 102707. Article in journal (Refereed). Published
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
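
The overflow-protection idea can be illustrated with a small Python sketch for the purely triangular case: a scaling factor travels with the solution vector, and the whole vector is downscaled whenever an entry is about to overflow. This only mirrors the principle behind DTREVC3-style scaling factors; it is not the tiled, level-3 BLAS formulation of the paper, and the function name and threshold are chosen here for illustration.

import numpy as np

def scaled_backsolve(T, b, big=1e150):
    # Backward substitution for upper-triangular T, returning (x, scale)
    # with T @ x = scale * b; the working vector is rescaled before any
    # entry can overflow, so scale <= 1 records the accumulated downscaling.
    n = len(b)
    x = np.array(b, dtype=float)
    scale = 1.0
    for i in range(n - 1, -1, -1):
        xi = x[i] / T[i, i]
        if abs(xi) > big:               # rescale everything computed so far
            s = big / abs(xi)
            x *= s
            scale *= s
            xi = x[i] / T[i, i]
        x[i] = xi
        x[:i] -= T[:i, i] * xi          # update the remaining right-hand side
    return x, scale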

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Overflow protection, Eigenvectors, Real Schur form, Tiled algorithms
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-177534 (URN), 10.1016/j.parco.2020.102707 (DOI), 000597159800001, 2-s2.0-85089018373 (Scopus ID)
Projects
NLAFET
Note

Preprint version: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1396196&dswid=-7045

Available from: 2020-12-11 Created: 2020-12-11 Last updated: 2023-03-24, bibliographically checked
Karlsson, L., Eljammaly, M. & Myllykoski, M. (2019). D6.5 Evaluation of auto-tuning techniques. NLAFET Consortium; Umeå University
D6.5 Evaluation of auto-tuning techniques
2019 (English). Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
HSV category
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168425 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2018. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-26, bibliographically checked
Kågström, B., Myllykoski, M., Karlsson, L., Kjelgaard Mikkelsen, C. C., Cayrols, S., Duff, I., . . . Tissot, O. (2019). D7.8 Release of the NLAFET library. NLAFET Consortium; Umeå University
D7.8 Release of the NLAFET library
2019 (English). Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
HSV category
Research subject
Mathematics; Computer Science
Identifiers
urn:nbn:se:umu:diva-168426 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2019. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-27, bibliographically checked
Karlsson, L. & Pettersson, J. (2019). Experiences of using mastery learning and the personalized system of instruction. In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin. Paper presented at Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019 (pp. 21-21). Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet
Experiences of using mastery learning and the personalized system of instruction
2019 (English). In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin, Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019, pp. 21-21. Conference paper, Oral presentation with published abstract (Other academic)
Place, publisher, year, edition, pages
Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019
Series
Skriftserie från Universitetspedagogik och lärandestöd (UPL) ; 2019:1
HSV category
Identifiers
urn:nbn:se:umu:diva-194800 (URN)
Conference
Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019
Available from: 2022-05-17 Created: 2022-05-17 Last updated: 2022-05-23, bibliographically checked
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-4675-7434