Publications (10 of 43)
López Sánchez, F., Karlsson, L. & Bientinesi, P. (2025). On the parenthesisations of matrix chains: all are useful, few are essential. Journal of combinatorial optimization, 49(3), Article ID 52.
On the parenthesisations of matrix chains: all are useful, few are essential
2025 (English) In: Journal of combinatorial optimization, ISSN 1382-6905, E-ISSN 1573-2886, Vol. 49, no. 3, article id 52. Article in journal (Refereed) Published
Abstract [en]

The product of a matrix chain consisting of n matrices can be computed in Cn-1 (Catalan’s number) different ways, each identified by a distinct parenthesisation of the chain. The best algorithm to select a parenthesisation that minimises the cost runs in O(nlogn) time. Approximate algorithms run in O(n) time and find solutions that are guaranteed to be within a certain factor from optimal; the best factor is currently 1.155. In this article, we first prove two results that characterise different parenthesisations, and then use those results to improve on the best known approximation algorithms. Specifically, we show that (a) each parenthesisation is optimal somewhere in the problem domain, and (b) exactly n+1 parenthesisations are essential in the sense that the removal of any one of them causes an unbounded penalty for an infinite number of problem instances. By focusing on essential parenthesisations, we improve on the best known approximation algorithm and show that the approximation factor is at most 1.143.
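For context, the setting can be compared with the textbook baseline: below is a minimal Python sketch of the classic O(n^3) dynamic program that selects a minimum-cost parenthesisation of a chain. It is not the paper's O(n log n) algorithm or its approximation scheme, and the function name matrix_chain_order is illustrative only.

# Classic dynamic program for the matrix chain problem (illustrative sketch,
# not the paper's algorithm). A chain of n matrices with dimensions
# dims[0] x dims[1], dims[1] x dims[2], ..., dims[n-1] x dims[n] has
# C_{n-1} (Catalan number) parenthesisations; the DP implicitly searches
# all of them in O(n^3) time.
def matrix_chain_order(dims):
    n = len(dims) - 1                       # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]      # cost[i][j]: min scalar multiplications for A_i..A_j
    split = [[0] * n for _ in range(n)]     # split[i][j]: optimal outermost split position
    for length in range(2, n + 1):          # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # try every outermost parenthesisation
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k
    return cost, split

# Example: sizes 10x30, 30x5, 5x60; ((A B) C) costs 4500 multiplications,
# (A (B C)) costs 27000, so the choice of parenthesisation matters.
cost, split = matrix_chain_order([10, 30, 5, 60])
print(cost[0][2])   # 4500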

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Approximation algorithm, Linear algebra compilers, Matrix chain, Matrix multiplication
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-238200 (URN), 10.1007/s10878-025-01290-7 (DOI), 001466493300002, 2-s2.0-105002887010 (Scopus ID)
Research funder
eSSENCE - An eScience Collaboration
Available from: 2025-05-06 Created: 2025-05-06 Last updated: 2025-05-06. Bibliographically reviewed
Sankaran, A., Karlsson, L. & Bientinesi, P. (2025). Ranking with ties based on noisy performance data. International Journal of Data Science and Analytics
Ranking with ties based on noisy performance data
2025 (English) In: International Journal of Data Science and Analytics, ISSN 2364-415X. Article in journal (Refereed) Epub ahead of print
Abstract [en]

We consider the problem of ranking a set of objects based on their performance when the measurement of said performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object as ‘better’ than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is in general not transitive; as a consequence, in general the two requirements cannot be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges, and same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. Although the problem of ranking with ties has been widely studied, there remains a lack of clarity regarding what constitutes a set of reasonable rankings. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyze three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial ranking.
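The incomparability relation described above, and the reason it is not transitive, can be illustrated with a small Python sketch. This is not one of the paper's three partial-ranking methodologies; the object names and measurements are made up.

# Reduce each object's repeated measurements to a range [min, max];
# two objects are incomparable (tied) exactly when their ranges overlap.
def ranges(measurements):
    return {name: (min(v), max(v)) for name, v in measurements.items()}

def overlaps(r1, r2):
    return r1[0] <= r2[1] and r2[0] <= r1[1]

runs = {
    "A": [1.00, 1.10, 1.20],   # range (1.00, 1.20)
    "B": [1.15, 1.40],         # range (1.15, 1.40): overlaps both A and C
    "C": [1.30, 1.50],         # range (1.30, 1.50): separated from A
}
r = ranges(runs)
print(overlaps(r["A"], r["B"]))  # True  -> A and B tie
print(overlaps(r["B"], r["C"]))  # True  -> B and C tie
print(overlaps(r["A"], r["C"]))  # False -> A is strictly better than C
# A ties with B and B ties with C, yet A and C are comparable: the relation
# is not transitive, so more than one reasonable ranking exists.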

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Knowledge discovery, Noise, Partial orders, Performance, Ranking
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-236240 (URN), 10.1007/s41060-025-00722-1 (DOI), 001411719700001, 2-s2.0-85218821095 (Scopus ID)
Research funder
Deutsche Forschungsgemeinschaft (DFG), IRTG 2379
Available from: 2025-04-01 Created: 2025-04-01 Last updated: 2025-04-01
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition. Frontiers in Applied Mathematics and Statistics, 8, Article ID 830270.
Accelerating Jackknife Resampling for the Canonical Polyadic Decomposition
2022 (English) In: Frontiers in Applied Mathematics and Statistics, E-ISSN 2297-4687, Vol. 8, article id 830270. Article in journal (Refereed) Published
Abstract [en]

The Canonical Polyadic (CP) tensor decomposition is frequently used as a model in applications in a variety of different fields. Using jackknife resampling to estimate parameter uncertainties is often desirable but results in an increase of the already high computational cost. Upon observation that the resampled tensors, though different, are nearly identical, we show that it is possible to extend the recently proposed Concurrent ALS (CALS) technique to a jackknife resampling scenario. This extension gives access to the computational efficiency advantage of CALS for the price of a modest increase (typically a few percent) in the number of floating point operations. Numerical experiments on both synthetic and real-world datasets demonstrate that the new workflow based on a CALS extension can be several times faster than a straightforward workflow where the jackknife submodels are processed individually.
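For context, the straightforward workflow that the abstract compares against can be sketched as a plain leave-one-out loop over the first mode of the tensor. The snippet assumes TensorLy's parafac as a stand-in CP solver; it is not the CALS-based extension proposed in the paper.

# Straightforward jackknife workflow: one independent CP fit per resampled
# tensor (the baseline the paper accelerates; illustrative only).
import numpy as np
from tensorly.decomposition import parafac

def jackknife_cp(tensor, rank):
    """Fit one CP model per leave-one-sample-out resampling along mode 0."""
    submodels = []
    for i in range(tensor.shape[0]):
        resampled = np.delete(tensor, i, axis=0)   # drop sample i
        submodels.append(parafac(resampled, rank=rank))
    return submodels

# Example: a small synthetic 3-way array (20 samples x 15 x 10), rank 3.
tensor = np.random.rand(20, 15, 10)
models = jackknife_cp(tensor, rank=3)
print(len(models))   # 20 nearly identical submodels, fitted one by one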

Keywords
ALS, Alternating Least Squares, Canonical Polyadic Decomposition, CP, decomposition, jackknife, Tensors
National subject category
Computer Sciences; Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-194532 (URN), 10.3389/fams.2022.830270 (DOI), 000792602600001, 2-s2.0-85128906913 (Scopus ID)
Available from: 2022-05-10 Created: 2022-05-10 Last updated: 2023-11-10. Bibliographically reviewed
Psarras, C., Karlsson, L., Bro, R. & Bientinesi, P. (2022). Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions. ACM Transactions on Mathematical Software, 48(3), Article ID 3519383.
Algorithm 1026: Concurrent Alternating Least Squares for Multiple Simultaneous Canonical Polyadic Decompositions
2022 (English) In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 48, no. 3, article id 3519383. Article in journal (Refereed) Published
Abstract [en]

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this article, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time, the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
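The fusion idea can be illustrated with the dominant ALS kernel, the matricised-tensor times Khatri-Rao product (MTTKRP): because every decomposition involves the same tensor, stacking the factor matrices of several small models turns many low-intensity products into one wider, more efficient one. The Python sketch below is conceptual only and does not reflect the CALS library's actual interface.

# Fusing the MTTKRP of several small CP models of the same tensor
# (conceptual sketch, not the CALS implementation).
import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product: (rows_A * rows_B) x R
    R = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, R)

I, J, K = 50, 40, 30
T0 = np.random.rand(I, J * K)          # mode-0 unfolding of the shared tensor
# eight independent rank-3 models, each with factor matrices B (J x 3), C (K x 3)
models = [(np.random.rand(J, 3), np.random.rand(K, 3)) for _ in range(8)]

# Unfused: one skinny product per model (low arithmetic intensity).
unfused = [T0 @ khatri_rao(C, B) for (B, C) in models]

# Fused: concatenate the Khatri-Rao factors and perform a single wide GEMM.
KR = np.hstack([khatri_rao(C, B) for (B, C) in models])   # (J*K) x 24
fused = T0 @ KR
assert np.allclose(fused, np.hstack(unfused))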

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
CP, decomposition, high-performance, PARAFAC, Tensor
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-203243 (URN), 10.1145/3519383 (DOI), 000865883900010, 2-s2.0-85140090648 (Scopus ID)
Available from: 2023-01-17 Created: 2023-01-17 Last updated: 2023-11-10. Bibliographically reviewed
López, F., Karlsson, L. & Bientinesi, P. (2022). FLOPs as a discriminant for dense linear algebra algorithms. In: ICPP '22: proceedings of the 51st international conference on parallel processing. Paper presented at ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022. ACM Digital Library, Article ID 11.
FLOPs as a discriminant for dense linear algebra algorithms
2022 (English) In: ICPP '22: proceedings of the 51st international conference on parallel processing, ACM Digital Library, 2022, article id 11. Conference paper, published paper (Refereed)
Abstract [en]

Expressions that involve matrices and vectors, known as linear algebra expressions, are commonly evaluated through a sequence of invocations to highly optimised kernels provided in libraries such as BLAS and LAPACK. A sequence of kernels represents an algorithm, and in general, because of associativity, algebraic identities, and multiple kernels, one expression can be evaluated via many different algorithms. These algorithms are all mathematically equivalent (i.e., in exact arithmetic, they all compute the same result), but often differ noticeably in terms of execution time. When faced with a decision, high-level languages, libraries, and tools such as Julia, Armadillo, and Linnea choose by selecting the algorithm that minimises the FLOP count. In this paper, we test the validity of the FLOP count as a discriminant for dense linear algebra algorithms, analysing "anomalies": problem instances for which the fastest algorithm does not perform the least number of FLOPs. To do so, we focused on relatively simple expressions and analysed when and why anomalies occurred. We found that anomalies exist and tend to cluster into large contiguous regions. For one expression anomalies were rare, whereas for the other they were abundant. We conclude that FLOPs is not a sufficiently dependable discriminant even when building algorithms with highly optimised kernels. Plus, most of the anomalies remained as such even after filtering out the inter-kernel cache effects. We conjecture that combining FLOP counts with kernel performance models will significantly improve our ability to choose optimal algorithms.
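A worked FLOP-count example makes the discriminant concrete (illustrative only; not one of the expressions studied in the paper). For square A and B of order n and a vector v, the two evaluation orders of A B v are mathematically equivalent but differ enormously in FLOPs, which is why FLOP-minimising tools pick the right-to-left order; the question examined here is when such choices fail to be the fastest in practice.

# FLOP counts for the two evaluation orders of A B v (A, B: n x n; v: n x 1),
# using the usual convention of 2 FLOPs per multiply-add.
n = 3000
flops_left = 2 * n**3 + 2 * n**2     # (A B) v: matrix-matrix, then matrix-vector
flops_right = 4 * n**2               # A (B v): two matrix-vector products
print(flops_left / flops_right)      # ~1500x more FLOPs for the left-to-right order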

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
algorithm selection, linear algebra, scientific computing
National subject category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-214445 (URN), 10.1145/3545008.3545072 (DOI), 001062754200011, 2-s2.0-85138311336 (Scopus ID), 9781450397339 (ISBN)
Conference
ICPP '22: 51st International Conference on Parallel Processing, Bordeaux, France, 29 August - 1 September 2022
Available from: 2023-09-15 Created: 2023-09-15 Last updated: 2025-04-24. Bibliographically reviewed
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem. Umeå universitet
Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem
2020 (English) Report (Other academic)
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK's scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
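The overflow-guarded backward substitution described above can be illustrated with a simplified Python sketch of a triangular solve that carries one dynamic scaling factor, in the spirit of the mechanism attributed to DTREVC3. This is an illustration, not LAPACK's implementation; for brevity, the update of the remaining right-hand side is left unguarded.

# Simplified scaled backward substitution: returns x and a scale s such that
# T x = s * b for an upper triangular T, rescaling whenever a division could
# overflow (illustrative sketch only).
import numpy as np

def scaled_backsolve(T, b, omega=1e300):
    n = len(b)
    x = b.astype(float).copy()
    s = 1.0
    for j in range(n - 1, -1, -1):
        # if x[j] / T[j, j] would exceed the overflow threshold omega,
        # downscale the whole working vector and record the factor in s
        if abs(x[j]) > abs(T[j, j]) * omega:
            alpha = (abs(T[j, j]) * omega) / abs(x[j])
            x *= alpha
            s *= alpha
        x[j] /= T[j, j]
        x[:j] -= T[:j, j] * x[j]     # update the remaining right-hand side
    return x, s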

Place, publisher, year, edition, pages
Umeå universitet, 2020. 25 p.
Series
Report / UMINF, ISSN 0348-0542 ; 20.02
National subject category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168433 (URN)
Project
NLAFET
Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2023-03-07. Bibliographically reviewed
Schwarz, A. B., Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2020). Robust parallel eigenvector computation for the non-symmetric eigenvalue problem. Parallel Computing, 100, Article ID 102707.
Robust parallel eigenvector computation for the non-symmetric eigenvalue problem
2020 (English) In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, article id 102707. Article in journal (Refereed) Published
Abstract [en]

A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
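Complementing the sketch attached to the report version above, the tile-local scaling idea can be illustrated as follows: each tile segment of an eigenvector carries its own scale factor, and the segments are brought to the smallest (most conservative) scale before being combined, so no entry is pushed towards overflow. This is a conceptual Python sketch, not the solver's implementation.

# Combine tile segments that carry tile-local scale factors: segment i with
# scale s_i represents the true values segment_i / s_i, so rescaling every
# segment to the common scale s = min(s_i) only shrinks entries.
import numpy as np

def combine_tiles(segments, scales):
    s = min(scales)
    vector = np.concatenate([seg * (s / sc) for seg, sc in zip(segments, scales)])
    return vector, s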

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Overflow protection, Eigenvectors, Real Schur form, Tiled algorithms
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-177534 (URN), 10.1016/j.parco.2020.102707 (DOI), 000597159800001, 2-s2.0-85089018373 (Scopus ID)
Project
NLAFET
Note
Preprint version: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1396196&dswid=-7045
Available from: 2020-12-11 Created: 2020-12-11 Last updated: 2023-03-24. Bibliographically reviewed
Karlsson, L., Eljammaly, M. & Myllykoski, M. (2019). D6.5 Evaluation of auto-tuning techniques. NLAFET Consortium; Umeå University
D6.5 Evaluation of auto-tuning techniques
2019 (English) Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. 27 p.
National subject category
Computer Sciences
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168425 (URN)
Project
NLAFET
Note
This work is © by the NLAFET Consortium, 2015–2018. Its duplication is allowed only for personal, educational, or research uses.
Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-26. Bibliographically reviewed
Kågström, B., Myllykoski, M., Karlsson, L., Kjelgaard Mikkelsen, C. C., Cayrols, S., Duff, I., . . . Tissot, O. (2019). D7.8 Release of the NLAFET library. NLAFET Consortium; Umeå University
D7.8 Release of the NLAFET library
2019 (English) Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. 27 p.
National subject category
Computer Sciences
Research subject
Mathematics; Computer Science
Identifiers
urn:nbn:se:umu:diva-168426 (URN)
Project
NLAFET
Note
This work is © by the NLAFET Consortium, 2015–2019. Its duplication is allowed only for personal, educational, or research uses.
Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-27. Bibliographically reviewed
Karlsson, L. & Pettersson, J. (2019). Experiences of using mastery learning and the personalized system of instruction. In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin. Paper presented at Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019 (pp. 21-21). Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet
Experiences of using mastery learning and the personalized system of instruction
2019 (English) In: Universitetspedagogiska konferensen 2019: Helhetssyn på undervisning - kropp, känsla och kognition i akademin, Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019, pp. 21-21. Conference contribution, oral presentation with published abstract (Other academic)
Place, publisher, year, edition, pages
Umeå: Universitetspedagogik och lärandestöd (UPL), Umeå universitet, 2019
Series
Skriftserie från Universitetspedagogik och lärandestöd (UPL) ; 2019:1
National subject category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-194800 (URN)
Conference
Universitetspedagogiska konferensen 2019, Umeå, 10-11 October 2019
Available from: 2022-05-17 Created: 2022-05-17 Last updated: 2022-05-23. Bibliographically reviewed
Identifiers
ORCID iD: orcid.org/0000-0002-4675-7434
