Publications (10 of 27)
Chen, Y., de Oliveira Castro, P., Bientinesi, P., Jansson, N. & Iakymchuk, R. (2026). Enabling mixed-precision in spectral element codes. Future Generation Computer Systems, 174, Article ID 107990.
2026 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 174, article id 107990. Article in journal (Refereed), Published
Abstract [en]

Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we propose a methodology for enabling mixed-precision with the help of computer arithmetic tools, the roofline model, and computer arithmetic techniques. As case studies, we consider Nekbone (Nek5000 developers), a mini-application for the Computational Fluid Dynamics (CFD) solver Nek5000 (Fischer et al.), and Neko (Jansson et al., 2024), a modern CFD application. With the help of the Verificarlo tool (Denis et al., 2016) and computer arithmetic techniques, we introduce a strategy to address stagnation issues in the preconditioned Conjugate Gradient method in Nekbone, and we apply these insights to implement a mixed-precision version of Neko. We evaluate the resulting mixed-precision versions of these codes by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, mixed-precision in Nekbone reduces time-to-solution by a factor of roughly 1.62 and energy-to-solution by a factor of 2.43 on MareNostrum 5, while in the real-world Neko application the gain is up to 1.3x in both time and energy, with accuracy that matches the double-precision results.
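Verificarlo builds on Monte Carlo Arithmetic (MCA): floating-point operations are randomly perturbed at a chosen virtual precision t, the program is run several times, and the number of significant decimal digits of a result is estimated from the spread of the samples as s = -log10(sigma / |mu|). The minimal Python sketch below emulates that idea on a plain summation. It is a toy model under stated assumptions, not Verificarlo's interface: the helper names (`mca_sum`, `significant_digits`, `estimate_digits`) and the uniform relative-noise perturbation are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def mca_sum(values, t, rng):
    """Sum `values`, perturbing every partial sum by a random relative error of
    magnitude at most 2**-t (hypothetical toy stand-in for MCA random rounding)."""
    acc = 0.0
    for v in values:
        acc += v
        # Uniform relative noise in [-2**-t, 2**-t] mimics a virtual precision of t bits.
        acc *= 1.0 + rng.uniform(-1.0, 1.0) * 2.0 ** -t
    return acc


def significant_digits(samples):
    """Estimate significant decimal digits as -log10(sigma / |mu|)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf  # every sample agrees exactly
    return -math.log10(sigma / abs(mu))


def estimate_digits(values, t, runs=30, seed=0):
    """Run the perturbed computation `runs` times and summarise the spread."""
    rng = random.Random(seed)
    samples = [mca_sum(values, t, rng) for _ in range(runs)]
    return significant_digits(samples)


if __name__ == "__main__":
    data = [0.1] * 1000
    for t in (24, 40):
        print(f"t = {t}: about {estimate_digits(data, t):.1f} significant digits")
```

A lower virtual precision (t = 24 corresponds to the binary32 significand width) leaves noticeably fewer significant digits than t = 40; information of this kind can guide where reduced precision may be acceptable.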

Place, publisher, year, edition, pages
Elsevier, 2026
Keywords
Computer arithmetic tool, Conjugate gradient, Energy-to-solution, Mixed-precision, Neko, Roofline model, Verificarlo
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-242183 (URN), 10.1016/j.future.2025.107990 (DOI), 2-s2.0-105009726439 (Scopus ID)
Available from: 2025-07-14. Created: 2025-07-14. Last updated: 2025-07-14. Bibliographically approved.
Sehlstedt, P., Brandejs, J., Bientinesi, P. & Karlsson, L. (2026). The software landscape for the density matrix renormalization group. Computer Physics Communications, 324, Article ID 110136.
2026 (English). In: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 324, article id 110136. Article in journal (Refereed), Published
Abstract [en]

The density matrix renormalization group (DMRG) algorithm is a cornerstone computational method for studying quantum many-body systems, renowned for its accuracy and adaptability. Because DMRG provides a general framework applicable across various fields such as materials science, quantum chemistry, and quantum computing, one might expect a shared, flexible library to serve most users. Nevertheless, numerous independent implementations continue to appear, resulting in significant duplication of effort. To identify collaboration opportunities that can promote a more unified approach, we map the rapidly expanding DMRG software landscape and provide a comprehensive comparison of features across 37 existing packages. When comparing key features, such as parallelism strategies for high-performance computing and symmetry-adapted formulations that enhance efficiency, we found significant overlap among the packages. This overlap suggests opportunities for collaboration to modularize common functionality (e.g., tensor operations, symmetry representations, and eigensolvers), as the packages are mostly independent and share few third-party library dependencies. More collaboration on modularization could reduce duplication of effort, improve interoperability, and enable prioritization and quicker spread of new advances. We believe the current lack of modularity is driven more by social than by technical factors; hence, we see raising awareness about the existing implementations as a first step in the right direction. Ultimately, this work emphasizes the value of greater cohesion through modularity, which would benefit DMRG software and related tensor-network-centered software, enabling the solution of more complex and ambitious problems.

Place, publisher, year, edition, pages
Elsevier, 2026
Keywords
DMRG, Survey, Modularity
National Category
Computer and Information Sciences; Condensed Matter Physics; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-251711 (URN), 10.1016/j.cpc.2026.110136 (DOI), 001729801000001 (ISI), 2-s2.0-105033662580 (Scopus ID)
Funder
EU, Horizon 2020; EU, European Research Council
Available from: 2026-04-02. Created: 2026-04-02. Last updated: 2026-04-15. Bibliographically approved.
Chen, Y., Castro, P. d., Bientinesi, P. & Iakymchuk, R. (2025). Enabling mixed-precision with the help of tools: a Nekbone case study. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Eds.), Parallel processing and applied mathematics: 15th International Conference, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024, Revised Selected Papers, Part I. Paper presented at 15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024 (pp. 34-50). Cham: Springer Nature
2025 (English). In: Parallel processing and applied mathematics: 15th International Conference, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Cham: Springer Nature, 2025, p. 34-50. Conference paper, Published paper (Refereed)
Abstract [en]

Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the Computational Fluid Dynamics (CFD) solver Nek5000, as a case study, and propose a methodology for enabling mixed-precision with the help of computer arithmetic tools and the roofline model. We evaluate the derived mixed-precision program by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, introducing mixed-precision in Nekbone reduces time-to-solution by 40.7% and energy-to-solution by 47% on 128 MPI ranks without sacrificing accuracy.
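The roofline model bounds a kernel's attainable performance by min(peak compute rate, arithmetic intensity × memory bandwidth), where arithmetic intensity is the number of FLOPs per byte of memory traffic. The minimal Python sketch below assumes an AXPY-like kernel (2 FLOPs and three array accesses per element) and hypothetical machine parameters rather than any figures from the paper; it shows why memory-bound kernels are natural candidates for reduced precision, since halving the word size doubles the arithmetic intensity and therefore the bandwidth-limited bound.

```python
def arithmetic_intensity(flops_per_elem, words_per_elem, bytes_per_word):
    """FLOPs per byte of memory traffic."""
    return flops_per_elem / (words_per_elem * bytes_per_word)


def roofline_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s under the roofline model."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical node parameters, for illustration only.
PEAK_GFLOPS = 3000.0
BANDWIDTH_GBS = 200.0

if __name__ == "__main__":
    # AXPY-like update y = a*x + y: 2 FLOPs; read x, read y, write y (3 words).
    for name, word_bytes in (("fp64", 8), ("fp32", 4)):
        ai = arithmetic_intensity(2, 3, word_bytes)
        bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        print(f"{name}: intensity {ai:.3f} FLOP/byte, bound {bound:.1f} GFLOP/s")
```

Both precisions sit far below the assumed compute peak, so the predicted gain in this toy case comes entirely from reduced data movement; for a compute-bound kernel the bandwidth term would not be the binding limit.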

Place, publisher, year, edition, pages
Cham: Springer Nature, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15579
Keywords
computer arithmetic tool, Conjugate Gradient, energy-to-solution, Mixed-precision, Nekbone, roofline model, Verificarlo
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-238100 (URN), 10.1007/978-3-031-85697-6_3 (DOI), 2-s2.0-105002711656 (Scopus ID), 9783031856969 (ISBN)
Conference
15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024
Available from: 2025-05-05. Created: 2025-05-05. Last updated: 2025-05-05. Bibliographically approved.
López Sánchez, F., Karlsson, L. & Bientinesi, P. (2025). On the parenthesisations of matrix chains: all are useful, few are essential. Journal of combinatorial optimization, 49(3), Article ID 52.
2025 (English). In: Journal of combinatorial optimization, ISSN 1382-6905, E-ISSN 1573-2886, Vol. 49, no 3, article id 52. Article in journal (Refereed), Published
Abstract [en]

The product of a matrix chain consisting of n matrices can be computed in C_{n-1} different ways, where C_{n-1} is the (n-1)th Catalan number, each identified by a distinct parenthesisation of the chain. The best algorithm to select a parenthesisation that minimises the cost runs in O(n log n) time. Approximate algorithms run in O(n) time and find solutions that are guaranteed to be within a certain factor of the optimum; the best factor is currently 1.155. In this article, we first prove two results that characterise different parenthesisations, and then use those results to improve on the best known approximation algorithms. Specifically, we show that (a) each parenthesisation is optimal somewhere in the problem domain, and (b) exactly n+1 parenthesisations are essential, in the sense that removing any one of them causes an unbounded penalty for an infinite number of problem instances. By focusing on essential parenthesisations, we improve on the best known approximation algorithm and show that the approximation factor is at most 1.143.
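As a concrete check of the counting argument, the Python sketch below enumerates the cost of every parenthesisation of a small chain, confirms that there are C_{n-1} of them, and compares the cheapest with the classic O(n^3) dynamic program. It assumes the textbook cost model (multiplying a p×q by a q×r matrix costs p·q·r scalar multiplications); the dynamic program is the standard one, not the O(n log n) algorithm or the approximation algorithms discussed in the article, and the chain dimensions are made up for illustration.

```python
from functools import lru_cache
from math import comb


def catalan(k):
    """k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


def all_costs(dims):
    """Costs of every parenthesisation; matrix A_i has shape dims[i] x dims[i+1]."""
    @lru_cache(maxsize=None)
    def costs(i, j):  # all costs for the sub-chain A_i ... A_j (inclusive)
        if i == j:
            return (0,)
        out = []
        for k in range(i, j):  # split into (A_i..A_k)(A_{k+1}..A_j)
            for left in costs(i, k):
                for right in costs(k + 1, j):
                    out.append(left + right + dims[i] * dims[k + 1] * dims[j + 1])
        return tuple(out)

    return list(costs(0, len(dims) - 2))


def optimal_cost(dims):
    """Classic O(n^3) dynamic program for the minimum multiplication cost."""
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                          for k in range(i, j))
    return m[0][n - 1]


if __name__ == "__main__":
    dims = (10, 30, 5, 60)  # three matrices: 10x30, 30x5, 5x60
    print(sorted(all_costs(dims)), optimal_cost(dims))
```

For the three-matrix chain above there are C_2 = 2 parenthesisations: A(BC) costs 27000 and (AB)C costs 4500, which is also what the dynamic program returns.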

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Approximation algorithm, Linear algebra compilers, Matrix chain, Matrix multiplication
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-238200 (URN), 10.1007/s10878-025-01290-7 (DOI), 001466493300002 (ISI), 2-s2.0-105002887010 (Scopus ID)
Funder
eSSENCE - An eScience Collaboration
Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-05-06. Bibliographically approved.
Sankaran, A., Karlsson, L. & Bientinesi, P. (2025). Ranking with ties based on noisy performance data. International Journal of Data Science and Analytics, 20, 4363-4384
2025 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X, Vol. 20, p. 4363-4384. Article in journal (Refereed), Published
Abstract [en]

We consider the problem of ranking a set of objects based on their performance when the measurement of said performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object ‘better’ than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is, in general, not transitive; as a consequence, the two requirements cannot always be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges and the same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. Although the problem of ranking with ties has been widely studied, there remains a lack of clarity regarding what constitutes a set of reasonable rankings. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyze three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial ranking.
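A three-object example makes the non-transitivity concrete: with ranges A = [0, 2], B = [1, 4] and C = [3, 5] (lower is better, e.g. execution times), A overlaps B and B overlaps C, yet A is strictly better than C. The Python sketch below resolves the conflict with one arbitrary greedy rule; the function names and the rule itself are hypothetical illustrations, not one of the three methodologies developed in the paper.

```python
def overlaps(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def greedy_ranking(ranges):
    """Rank objects (1 = best, lower measurements better): sort by range, and
    open a new rank whenever an object no longer overlaps the first object
    (the 'leader') of the current rank. Hypothetical illustration only."""
    order = sorted(ranges, key=lambda name: ranges[name])
    ranks, rank, leader = {}, 0, None
    for name in order:
        if leader is None or not overlaps(ranges[leader], ranges[name]):
            rank += 1
            leader = name
        ranks[name] = rank
    return ranks


if __name__ == "__main__":
    r = {"A": (0.0, 2.0), "B": (1.0, 4.0), "C": (3.0, 5.0)}
    print(greedy_ranking(r))
```

On the example this yields A and B in rank 1 and C in rank 2: the separated pair A, C gets distinct ranks, but the overlapping pair B, C does too, so one of the two requirements is necessarily given up.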

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Knowledge discovery, Noise, Partial orders, Performance, Ranking
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-236240 (URN), 10.1007/s41060-025-00722-1 (DOI), 001411719700001 (ISI), 2-s2.0-85218821095 (Scopus ID)
Funder
German Research Foundation (DFG), IRTG 2379
Available from: 2025-04-01. Created: 2025-04-01. Last updated: 2025-11-28. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2024). In-depth performance analysis of the ADTOF-based algorithm for automatic drum transcription. In: Proceedings of the 25th international society for music information retrieval conference. Paper presented at 25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10-14, 2024 (pp. 1060-1067). San Francisco: ISMIR
2024 (English). In: Proceedings of the 25th international society for music information retrieval conference, San Francisco: ISMIR, 2024, p. 1060-1067. Conference paper, Published paper (Refereed)
Abstract [en]

The importance of automatic drum transcription lies in its potential to extract useful information from a musical track; however, the low reliability of the models for this task is a limiting factor. Indeed, even though the quality of the generated transcriptions has improved in the recent literature thanks to the curation of large training datasets via crowdsourcing, there is still considerable room for improvement before this task can be considered solved. Aiming to steer the development of future models, we identify the most common errors from training and testing on these crowdsourced datasets. We perform this study in three steps: first, we detail the quality of the transcription for each class of interest; second, we employ a new metric and a pseudo confusion matrix to quantify different mistakes in the estimations; last, we compute the agreement between different annotators of the same track to estimate the accuracy of the ground truth. Our findings are twofold. On the one hand, we observe that the previously reported issue that less represented instruments (e.g., toms) are transcribed less reliably has now largely been solved. On the other hand, cymbal instruments show an unprecedentedly low relative performance. We provide intuitive explanations as to why cymbal instruments are difficult to transcribe, and we find that they are the main source of disagreement among annotators.
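Agreement between two annotations of the same track is commonly quantified with an onset-level F-measure: an onset in one annotation counts as matched if the other annotation has an onset of the same instrument class within a small time tolerance. The Python sketch below computes such a score for a single instrument class, assuming a greedy time-ordered matching and a 50 ms tolerance; both are illustrative choices (evaluation toolkits may use an optimal bipartite matching instead), so this is not necessarily the metric used in the paper.

```python
def onset_f_measure(reference, estimate, tol=0.05):
    """F-measure between two lists of onset times (seconds) for one instrument
    class; onsets are matched greedily in time order within +/- tol seconds."""
    ref, est = sorted(reference), sorted(estimate)
    i = j = matched = 0
    while i < len(ref) and j < len(est):
        delta = est[j] - ref[i]
        if abs(delta) <= tol:
            matched += 1  # pair these two onsets and move past both
            i += 1
            j += 1
        elif delta < 0:
            j += 1  # estimated onset too early to match ref[i]; skip it
        else:
            i += 1  # no estimated onset close enough to ref[i]; skip it
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotator_a = [0.00, 0.50, 1.00]  # hypothetical hi-hat onsets
    annotator_b = [0.02, 0.51, 1.20]
    print(round(onset_f_measure(annotator_a, annotator_b), 3))
```

In the usage example two of the three onsets agree within the tolerance, giving precision = recall = F = 2/3.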

Place, publisher, year, edition, pages
San Francisco: ISMIR, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228264 (URN), 2-s2.0-85219129262 (Scopus ID)
Conference
25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10-14, 2024
Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2025-04-02. Bibliographically approved.
Sankaran, A., Zhukov, I., Frings, W. & Bientinesi, P. (2024). Inspection of I/O operations from system call traces using Directly-Follows-Graph. In: SC24-W: workshops of the international conference for high performance computing, networking, storage and analysis. Paper presented at 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, USA, November 17-22, 2024 (pp. 1562-1575). IEEE
2024 (English). In: SC24-W: workshops of the international conference for high performance computing, networking, storage and analysis, IEEE, 2024, p. 1562-1575. Conference paper, Published paper (Refereed)
Abstract [en]

We aim to identify the differences in Input/Output (I/O) behavior between multiple user programs through the inspection of system calls (i.e., requests made to the operating system). A typical program issues a large number of I/O requests to the operating system, which makes inspection challenging. In this paper, we address this challenge by presenting a methodology to synthesize I/O system call traces into a specific type of directed graph, known as the Directly-Follows-Graph (DFG). Based on the DFG, we present a technique to compare the traces from multiple programs or from different configurations of the same program, making it possible to identify differences in their I/O behavior. We apply our methodology to the IOR benchmark and compare the contention for file accesses when the benchmark is run with different options for file output and software interface.
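A Directly-Follows-Graph has one node per event type and an edge a → b weighted by how often b immediately follows a within a trace. The Python sketch below builds such a graph from system call sequences and compares two graphs edge by edge. It assumes each trace is already a list of call names (as might be extracted from strace output; the parsing step is omitted and the traces are made up), and the edge-difference comparison is a simple hypothetical choice, not necessarily the comparison technique presented in the paper.

```python
from collections import Counter


def directly_follows(traces):
    """Count edges (a, b) where event b immediately follows event a in a trace."""
    edges = Counter()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges


def dfg_difference(dfg_a, dfg_b):
    """Per-edge frequency difference (a minus b) for edges whose counts differ."""
    return {edge: dfg_a[edge] - dfg_b[edge]
            for edge in dfg_a.keys() | dfg_b.keys()
            if dfg_a[edge] != dfg_b[edge]}


if __name__ == "__main__":
    run_read = [["openat", "read", "read", "close"]]  # hypothetical traces
    run_write = [["openat", "write", "close"]]
    diff = dfg_difference(directly_follows(run_read), directly_follows(run_write))
    for (src, dst), delta in sorted(diff.items()):
        print(f"{src} -> {dst}: {delta:+d}")
```

Edges with a positive difference occur more often in the first program, edges with a negative difference more often in the second; identical graphs produce an empty difference.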

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Directly-Follows Graph, High-Performance Computing, Input/Output, Performance Analysis, Process Mining, strace
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-235656 (URN), 10.1109/SCW63240.2024.00196 (DOI), 2-s2.0-85217181573 (Scopus ID), 9798350355543 (ISBN), 9798350355550 (ISBN)
Conference
2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, USA, November 17-22, 2024
Available from: 2025-02-26. Created: 2025-02-26. Last updated: 2025-02-26. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2024). Interpretability of methods for switch point detection in electronic dance music. Signals, 5(4), 642-658
2024 (English). In: Signals, E-ISSN 2624-6120, Vol. 5, no 4, p. 642-658. Article in journal (Refereed), Published
Abstract [en]

Switch points are a specific kind of cue point that DJs carefully look for when mixing music tracks. As the name suggests, a switch point is the point in time where the current track in a DJ mix is replaced by the upcoming track. Being able to identify these positions is a first step toward the interpretation and emulation of DJ mixes. With the aim of automatically detecting switch points, we evaluate one experience-driven method and several statistics-driven methods. By comparing the decision processes of the methods in light of their performance, we deduce the characteristics linked to switch points. Specifically, we identify the most impactful features for their detection, namely, the novelty in the signal energy, the timbre, the number of drum onsets, and the harmony. Furthermore, we reveal multiple interactions among these features.

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
cue point detection, DJ mixing, electronic dance music, switch points
National Category
Computer Sciences; Music; Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:umu:diva-234020 (URN), 10.3390/signals5040036 (DOI), 001386624600001 (ISI), 2-s2.0-85213493298 (Scopus ID)
Funder
eSSENCE - An eScience Collaboration
Available from: 2025-01-13. Created: 2025-01-13. Last updated: 2025-02-21. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2023). High-quality and reproducible automatic drum transcription from crowdsourced data. Signals, 4(4), 768-787
2023 (English). In: Signals, E-ISSN 2624-6120, Vol. 4, no 4, p. 768-787. Article in journal (Refereed), Published
Abstract [en]

Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
automatic drum transcription, crowdsourced dataset, self-attention mechanism, tatum
National Category
Signal Processing; Computer Sciences
Identifiers
urn:nbn:se:umu:diva-216394 (URN), 10.3390/signals4040042 (DOI), 001177003200001 (ISI), 2-s2.0-85180709684 (Scopus ID)
Funder
Swedish National Infrastructure for Computing (SNIC); Swedish Research Council, 2022-06725; Swedish Research Council, 2018-05973
Available from: 2023-11-10. Created: 2023-11-10. Last updated: 2025-04-24. Bibliographically approved.
Sankaran, A. & Bientinesi, P. (2022). A test for FLOPs as a discriminant for linear algebra algorithms. In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Paper presented at 34th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2022, November 2-5, 2022 (pp. 221-230). IEEE
2022 (English). In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, 2022, p. 221-230. Conference paper, Published paper (Refereed)
Abstract [en]

Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing strategy, i.e., an algorithm, and normally many alternative algorithms exist for one linear algebra expression. Although mathematically equivalent, those algorithms might exhibit significant differences in performance. Several high-level languages and tools for matrix computations, such as Julia, Armadillo, and Linnea, make algorithmic choices by minimizing the number of Floating Point Operations (FLOPs). However, several algorithms can share the same (or a nearly identical) number of FLOPs; in many cases, these algorithms exhibit statistically equivalent execution times, and one could arbitrarily select any of them as the best algorithm. Yet it is not uncommon to find cases where the execution times differ significantly from one another (despite the FLOP counts being almost the same). It is also possible that the algorithm that minimizes FLOPs is not the one that minimizes execution time. In this work, we develop a methodology to test the reliability of FLOPs as a discriminant for linear algebra algorithms. Given as input a set of algorithms for an instance of a linear algebra expression, the methodology ranks them into performance classes, i.e., multiple algorithms are allowed to share the same rank. To this end, we measure the algorithms iteratively until the changes in the ranks converge to a value close to zero. FLOPs are a valid discriminant for an instance if all the algorithms with minimum FLOPs are assigned the best rank; otherwise, the instance is regarded as an anomaly, which can then be used to investigate the root cause of performance differences.
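The final criterion of the methodology can be stated compactly: given FLOP counts and performance classes (ranks) for a set of algorithms, FLOPs are a valid discriminant for that instance only if every minimum-FLOP algorithm receives the best rank. The Python sketch below encodes that check together with the conventional 2mkn FLOP count for a matrix product. The ranks would come from the iterative measurement procedure, which is not reproduced here, so the rank values in the usage example are hypothetical, as are the function names.

```python
def gemm_flops(m, k, n):
    """Conventional FLOP count of an (m x k) by (k x n) matrix product (2mkn)."""
    return 2 * m * k * n


def flops_are_discriminant(flops, ranks):
    """True if every algorithm with the minimum FLOP count has the best rank;
    False marks the instance as an anomaly."""
    fewest = min(flops.values())
    best = min(ranks.values())
    return all(ranks[alg] == best for alg, f in flops.items() if f == fewest)


if __name__ == "__main__":
    # Two algorithms for y = A B x with n x n matrices A, B and an n-vector x.
    n = 100
    flops = {"(AB)x": gemm_flops(n, n, n) + gemm_flops(n, n, 1),
             "A(Bx)": 2 * gemm_flops(n, n, 1)}
    measured_ranks = {"(AB)x": 2, "A(Bx)": 1}  # hypothetical measurement outcome
    print(flops, flops_are_discriminant(flops, measured_ranks))
```

Here A(Bx) needs 4n^2 = 40000 FLOPs against 2n^3 + 2n^2 = 2020000 for (AB)x; if measurements also place A(Bx) in the best class, the instance is not an anomaly.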

Place, publisher, year, edition, pages
IEEE, 2022
Series
Proceedings (Symposium on Computer Architecture and High Performance Computing), ISSN 1550-6533
Keywords
Algorithm ranking, Linear algebra algorithms, Mathematical software performance, Performance Analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-203573 (URN), 10.1109/SBAC-PAD55451.2022.00033 (DOI), 000905612800023 (ISI), 2-s2.0-85145881711 (Scopus ID), 9781665451550 (ISBN)
Conference
34th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2022, November 2-5, 2022
Available from: 2023-01-19. Created: 2023-01-19. Last updated: 2023-11-10. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-4972-7097