Publications (10 of 27)
Chen, Y., de Oliveira Castro, P., Bientinesi, P., Jansson, N. & Iakymchuk, R. (2026). Enabling mixed-precision in spectral element codes. Future Generation Computer Systems, 174, Article ID 107990.
Enabling mixed-precision in spectral element codes
2026 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 174, article ID 107990. Article in journal (refereed), published.
Abstract [en]

Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we propose a methodology for enabling mixed precision with the help of computer arithmetic tools, the roofline model, and computer arithmetic techniques. As case studies, we consider Nekbone (Nek5000 developers), a mini-application for the Computational Fluid Dynamics (CFD) solver Nek5000 (Fischer et al.), and the modern CFD application Neko (Jansson et al., 2024). With the help of the Verificarlo (Denis et al., 2016) tool and computer arithmetic techniques, we introduce a strategy to address stagnation issues in the preconditioned Conjugate Gradient method in Nekbone and apply these insights to implement a mixed-precision version of Neko. We evaluate the derived mixed-precision versions of these codes by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, mixed precision in Nekbone reduces time-to-solution by roughly 1.62x and energy-to-solution by 2.43x on MareNostrum 5, while in the real-world Neko application the gain is up to 1.3x in both time and energy, with accuracy that matches double-precision results.
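The abstract names the roofline model as one ingredient for deciding where lower precision can pay off. As a minimal sketch (not the authors' tooling), assuming a purely memory-bound kernel such as a dot product and hypothetical machine numbers, halving the bytes per value doubles the arithmetic intensity and therefore the bandwidth-limited performance bound:

```python
# Minimal roofline sketch: attainable performance = min(peak compute, AI * bandwidth).
# PEAK_GFLOPS and BANDWIDTH_GBS are hypothetical placeholders, not measured values.

def arithmetic_intensity(flops_per_elem: float, bytes_per_elem: float) -> float:
    """FLOPs performed per byte moved from memory."""
    return flops_per_elem / bytes_per_elem

def attainable_gflops(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, ai * bandwidth_gbs)

# Dot product x.y: one multiply and one add per element, reading x[i] and y[i].
FLOPS_PER_ELEM = 2
ai_fp64 = arithmetic_intensity(FLOPS_PER_ELEM, 2 * 8)  # two 8-byte doubles per element
ai_fp32 = arithmetic_intensity(FLOPS_PER_ELEM, 2 * 4)  # two 4-byte floats per element

PEAK_GFLOPS = 3000.0   # hypothetical compute peak
BANDWIDTH_GBS = 300.0  # hypothetical memory bandwidth

perf64 = attainable_gflops(ai_fp64, PEAK_GFLOPS, BANDWIDTH_GBS)
perf32 = attainable_gflops(ai_fp32, PEAK_GFLOPS, BANDWIDTH_GBS)
print(f"fp64: AI={ai_fp64} flop/B -> bound {perf64} GFLOP/s")
print(f"fp32: AI={ai_fp32} flop/B -> bound {perf32} GFLOP/s")
```

The 2x ratio is an idealised bound for a kernel that stays memory-bound; it is not a prediction of the measured gains reported in the abstract.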

Place, publisher, year, edition, pages
Elsevier, 2026
Keywords
Computer arithmetic tool, Conjugate gradient, Energy-to-solution, Mixed-precision, Neko, Roofline model, Verificarlo
Identifiers
urn:nbn:se:umu:diva-242183 (URN); 10.1016/j.future.2025.107990 (DOI); 2-s2.0-105009726439 (Scopus ID)
Available from: 2025-07-14. Created: 2025-07-14. Last updated: 2025-07-14. Bibliographically approved.
Sehlstedt, P., Brandejs, J., Bientinesi, P. & Karlsson, L. (2026). The software landscape for the density matrix renormalization group. Computer Physics Communications, 324, Article ID 110136.
The software landscape for the density matrix renormalization group
2026 (English). In: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 324, article ID 110136. Article in journal (refereed), published.
Abstract [en]

The density matrix renormalization group (DMRG) algorithm is a cornerstone computational method for studying quantum many-body systems, renowned for its accuracy and adaptability. Because DMRG provides a general framework applicable across various fields such as materials science, quantum chemistry, and quantum computing, one might expect a shared, flexible library to serve most users. Nevertheless, numerous independent implementations continue to appear, resulting in significant duplication of effort. To identify collaboration opportunities that can promote a more unified approach, we map the rapidly expanding DMRG software landscape and provide a comprehensive comparison of features across 37 existing packages. When comparing key features, such as parallelism strategies for high-performance computing and symmetry-adapted formulations that enhance efficiency, we found significant overlap among the packages. This overlap suggests opportunities for collaboration to modularize common functionality (e.g., tensor operations, symmetry representations, and eigensolvers), as the packages are mostly independent and share few third-party library dependencies. More collaboration on modularization could reduce duplication of effort, improve interoperability, and enable prioritization and quicker spread of new advances. We believe the current lack of modularity is more a social issue than a technical one; hence, we see raising awareness about the existing implementations as a first step in the right direction. Ultimately, this work emphasizes the value of greater cohesion through modularity, which would benefit DMRG software and related tensor-network-centered software, enabling the solution of more complex and ambitious problems.

Place, publisher, year, edition, pages
Elsevier, 2026
Keywords
DMRG, Survey, Modularity
Identifiers
urn:nbn:se:umu:diva-251711 (URN); 10.1016/j.cpc.2026.110136 (DOI); 001729801000001 (ISI); 2-s2.0-105033662580 (Scopus ID)
Research funder
EU, Horizon 2020; EU, European Research Council
Available from: 2026-04-02. Created: 2026-04-02. Last updated: 2026-04-15. Bibliographically approved.
Chen, Y., Castro, P. d., Bientinesi, P. & Iakymchuk, R. (2025). Enabling mixed-precision with the help of tools: a nekbone case study. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Eds.), Parallel processing and applied mathematics: 15th International Conference, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024, Revised Selected Papers, Part I. Paper presented at the 15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024 (pp. 34-50). Cham: Springer Nature
Enabling mixed-precision with the help of tools: a nekbone case study
2025 (English). In: Parallel processing and applied mathematics: 15th International Conference, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Cham: Springer Nature, 2025, pp. 34-50. Conference paper, published paper (refereed).
Abstract [en]

Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the Computational Fluid Dynamics (CFD) solver Nek5000, as a case study, and propose a methodology for enabling mixed precision with the help of computer arithmetic tools and the roofline model. We evaluate the derived mixed-precision program by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, introducing mixed precision in Nekbone reduces time-to-solution by 40.7% and energy-to-solution by 47% on 128 MPI ranks without sacrificing accuracy.

Place, publisher, year, edition, pages
Cham: Springer Nature, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15579
Keywords
computer arithmetic tool, Conjugate Gradient, energy-to-solution, Mixed-precision, Nekbone, roofline model, Verificarlo
Identifiers
urn:nbn:se:umu:diva-238100 (URN); 10.1007/978-3-031-85697-6_3 (DOI); 2-s2.0-105002711656 (Scopus ID); 9783031856969 (ISBN)
Conference
15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024
Available from: 2025-05-05. Created: 2025-05-05. Last updated: 2025-05-05. Bibliographically approved.
López Sánchez, F., Karlsson, L. & Bientinesi, P. (2025). On the parenthesisations of matrix chains: all are useful, few are essential. Journal of combinatorial optimization, 49(3), Article ID 52.
On the parenthesisations of matrix chains: all are useful, few are essential
2025 (English). In: Journal of combinatorial optimization, ISSN 1382-6905, E-ISSN 1573-2886, Vol. 49, no. 3, article ID 52. Article in journal (refereed), published.
Abstract [en]

The product of a chain of n matrices can be computed in C_{n-1} different ways, where C_{n-1} is the (n-1)-th Catalan number, each identified by a distinct parenthesisation of the chain. The best algorithm to select a parenthesisation that minimises the cost runs in O(n log n) time. Approximate algorithms run in O(n) time and find solutions that are guaranteed to be within a certain factor of optimal; the best factor is currently 1.155. In this article, we first prove two results that characterise different parenthesisations, and then use those results to improve on the best known approximation algorithms. Specifically, we show that (a) each parenthesisation is optimal somewhere in the problem domain, and (b) exactly n+1 parenthesisations are essential in the sense that the removal of any one of them causes an unbounded penalty for an infinite number of problem instances. By focusing on essential parenthesisations, we improve on the best known approximation algorithm and show that the approximation factor is at most 1.143.
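To make the counting concrete, the sketch below enumerates all parenthesisations of a short chain and checks the count against the Catalan number C_{n-1}. For comparison it also includes the classic O(n^3) dynamic program for the optimal order; this textbook method is neither the O(n log n) algorithm nor the approximation algorithm discussed in the abstract, and the matrix dimensions are made up for illustration:

```python
from functools import lru_cache
from math import comb

def catalan(k: int) -> int:
    """k-th Catalan number, C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

def parenthesisations(names):
    """Yield every full parenthesisation of a chain of matrix names."""
    if len(names) == 1:
        yield names[0]
        return
    for split in range(1, len(names)):
        for left in parenthesisations(names[:split]):
            for right in parenthesisations(names[split:]):
                yield f"({left}{right})"

def min_cost(dims):
    """Textbook O(n^3) dynamic program: matrix i has shape dims[i] x dims[i+1];
    a (p x q)(q x r) product is charged p*q*r scalar multiplications."""
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == j:
            return 0
        return min(cost(i, k) + cost(k + 1, j) + dims[i] * dims[k + 1] * dims[j + 1]
                   for k in range(i, j))

    return cost(0, n - 1)

chain = ["A", "B", "C", "D"]
all_p = list(parenthesisations(chain))
print(len(all_p), catalan(len(chain) - 1))  # 5 5
print(min_cost((10, 30, 5, 60)))            # 4500, i.e. ((AB)C)
```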

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Approximation algorithm, Linear algebra compilers, Matrix chain, Matrix multiplication
Identifiers
urn:nbn:se:umu:diva-238200 (URN); 10.1007/s10878-025-01290-7 (DOI); 001466493300002 (ISI); 2-s2.0-105002887010 (Scopus ID)
Research funder
eSSENCE - An eScience Collaboration
Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-05-06. Bibliographically approved.
Sankaran, A., Karlsson, L. & Bientinesi, P. (2025). Ranking with ties based on noisy performance data. International Journal of Data Science and Analytics, 20, 4363-4384
Ranking with ties based on noisy performance data
2025 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X, Vol. 20, pp. 4363-4384. Article in journal (refereed), published.
Abstract [en]

We consider the problem of ranking a set of objects based on their performance when the measurement of said performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object as ‘better’ than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is in general not transitive; as a consequence, in general the two requirements cannot be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges, and same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. Although the problem of ranking with ties has been widely studied, there remains a lack of clarity regarding what constitutes a set of reasonable rankings. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyze three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial ranking.
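The conflict described above can be seen with just three ranges. In this minimal sketch the ranges are hypothetical min/max execution times, with lower meaning better (an assumption); x overlaps y and y overlaps z, yet x lies entirely below z:

```python
from typing import NamedTuple

class Range(NamedTuple):
    lo: float
    hi: float

def overlaps(a: Range, b: Range) -> bool:
    """Closed ranges that intersect are treated as incomparable."""
    return a.lo <= b.hi and b.lo <= a.hi

def better(a: Range, b: Range) -> bool:
    """a is strictly better than b if a lies entirely below b (lower is better)."""
    return a.hi < b.lo

# Hypothetical measurement ranges, e.g. min/max execution times in seconds.
x = Range(1.0, 3.0)
y = Range(2.5, 5.0)
z = Range(4.0, 6.0)

print(overlaps(x, y), overlaps(y, z), overlaps(x, z))  # True True False
print(better(x, z))                                    # True
```

Requiring equal ranks for overlapping ranges would force x, y and z into a single rank, while the separated ranges require x to be ranked above z, so both requirements cannot hold at once.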

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Knowledge discovery, Noise, Partial orders, Performance, Ranking
Identifiers
urn:nbn:se:umu:diva-236240 (URN); 10.1007/s41060-025-00722-1 (DOI); 001411719700001 (ISI); 2-s2.0-85218821095 (Scopus ID)
Research funder
German Research Foundation (DFG), IRTG 2379
Available from: 2025-04-01. Created: 2025-04-01. Last updated: 2025-11-28. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2024). In-depth performance analysis of the ADTOF-based algorithm for automatic drum transcription. In: Proceedings of the 25th International Society for Music Information Retrieval Conference. Paper presented at the 25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10–14, 2024 (pp. 1060-1067). San Francisco: ISMIR
In-depth performance analysis of the ADTOF-based algorithm for automatic drum transcription
2024 (English). In: Proceedings of the 25th International Society for Music Information Retrieval Conference, San Francisco: ISMIR, 2024, pp. 1060-1067. Conference paper, published paper (refereed).
Abstract [en]

The importance of automatic drum transcription lies in the potential to extract useful information from a musical track; however, the low reliability of the models for this task represents a limiting factor. Indeed, even though in the recent literature the quality of the generated transcription has improved thanks to the curation of large training datasets via crowdsourcing, there is still considerable room for improvement before this task can be considered solved. Aiming to steer the development of future models, we identify the most common errors from training and testing on the aforementioned crowdsourced datasets. We perform this study in three steps: first, we detail the quality of the transcription for each class of interest; second, we employ a new metric and a pseudo confusion matrix to quantify different mistakes in the estimations; last, we compute the agreement between different annotators of the same track to estimate the accuracy of the ground truth. Our findings are twofold: on the one hand, we observe that the previously reported issue that less represented instruments (e.g., toms) are less reliably transcribed is now mostly solved. On the other hand, cymbal instruments show unprecedentedly low relative performance. We provide intuitive explanations as to why cymbal instruments are difficult to transcribe, and we find that they are the main source of disagreement among annotators.

Place, publisher, year, edition, pages
San Francisco: ISMIR, 2024
Identifiers
urn:nbn:se:umu:diva-228264 (URN); 2-s2.0-85219129262 (Scopus ID)
Conference
25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10–14, 2024
Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2025-04-02. Bibliographically approved.
Sankaran, A., Zhukov, I., Frings, W. & Bientinesi, P. (2024). Inspection of I/O operations from system call traces using Directly-Follows-Graph. In: SC24-W: workshops of the international conference for high performance computing, networking, storage and analysis. Paper presented at 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, USA, November 17-22, 2024 (pp. 1562-1575). IEEE
Inspection of I/O operations from system call traces using Directly-Follows-Graph
2024 (English). In: SC24-W: workshops of the international conference for high performance computing, networking, storage and analysis, IEEE, 2024, pp. 1562-1575. Conference paper, published paper (refereed).
Abstract [en]

We aim to identify the differences in Input/Output (I/O) behavior between multiple user programs through the inspection of system calls (i.e., requests made to the operating system). A typical program issues a large number of I/O requests to the operating system, thereby making the process of inspection challenging. In this paper, we address this challenge by presenting a methodology to synthesize I/O system call traces into a specific type of directed graph, known as the Directly-Follows-Graph (DFG). Based on the DFG, we present a technique to compare the traces from multiple programs or different configurations of the same program, such that it is possible to identify the differences in the I/O behavior. We apply our methodology to the IOR benchmark, and compare the contentions for file accesses when the benchmark is run with different options for file output and software interface.
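A Directly-Follows-Graph records, for each pair of events (a, b), how often b immediately follows a in a trace. The sketch below builds one from hypothetical, simplified sequences of system call names and compares two runs by a plain set difference of their edges; that comparison is an illustrative assumption, not the technique described in the paper:

```python
from collections import Counter
from typing import Iterable, Sequence

def directly_follows_graph(traces: Iterable[Sequence[str]]) -> Counter:
    """Count edges (a, b) where event b immediately follows event a in a trace."""
    edges = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

# Hypothetical, simplified per-process sequences of I/O system call names.
run_a = [["openat", "write", "write", "fsync", "close"]]
run_b = [["openat", "write", "lseek", "write", "close"]]

dfg_a = directly_follows_graph(run_a)
dfg_b = directly_follows_graph(run_b)

# Edges that appear in only one run hint at differing I/O behaviour.
only_a = set(dfg_a) - set(dfg_b)
only_b = set(dfg_b) - set(dfg_a)
print(sorted(only_a))
print(sorted(only_b))
```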

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Directly-Follows Graph, High-Performance Computing, Input/Output, Performance Analysis, Process Mining, strace
Identifiers
urn:nbn:se:umu:diva-235656 (URN); 10.1109/SCW63240.2024.00196 (DOI); 2-s2.0-85217181573 (Scopus ID); 9798350355543 (ISBN); 9798350355550 (ISBN)
Conference
2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, USA, November 17-22, 2024
Available from: 2025-02-26. Created: 2025-02-26. Last updated: 2025-02-26. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2024). Interpretability of methods for switch point detection in electronic dance music. Signals, 5(4), 642-658
Interpretability of methods for switch point detection in electronic dance music
2024 (English). In: Signals, E-ISSN 2624-6120, Vol. 5, no. 4, pp. 642-658. Article in journal (refereed), published.
Abstract [en]

Switch points are a specific kind of cue point that DJs carefully look for when mixing music tracks. As the name suggests, a switch point is the point in time where the current track in a DJ mix is replaced by the upcoming track. Being able to identify these positions is a first step toward the interpretation and the emulation of DJ mixes. With the aim of automatically detecting switch points, we evaluate one experience-driven and several statistics-driven methods. By comparing the decision process of each method, contrasted by their performance, we deduce the characteristics linked to switch points. Specifically, we identify the most impactful features for their detection, namely, the novelty in the signal energy, the timbre, the number of drum onsets, and the harmony. Furthermore, we reveal multiple interactions among these features.
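As an illustration of the first feature named above, one common way to measure novelty in signal energy is to compare the average frame energy just after and just before each frame. The sketch below is a generic energy-novelty curve under that assumption, applied to a synthetic signal whose loudness jumps halfway through; it is not the authors' feature extraction:

```python
import math

def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def energy_novelty(energy, half_window):
    """|mean energy of the next half_window frames - mean of the previous ones|."""
    novelty = []
    for t in range(len(energy)):
        before = energy[max(0, t - half_window):t]
        after = energy[t:t + half_window]
        if not before or not after:
            novelty.append(0.0)  # not enough context at the edges
            continue
        novelty.append(abs(sum(after) / len(after) - sum(before) / len(before)))
    return novelty

# Synthetic signal: 10 quiet frames followed by 10 loud frames of a sine wave.
PERIOD, FRAME_LEN, N_FRAMES = 8, 64, 10
quiet = [0.1 * math.sin(2 * math.pi * k / PERIOD) for k in range(FRAME_LEN * N_FRAMES)]
loud = [0.8 * math.sin(2 * math.pi * k / PERIOD) for k in range(FRAME_LEN * N_FRAMES)]

energy = frame_energy(quiet + loud, FRAME_LEN)
novelty = energy_novelty(energy, half_window=4)
peak_frame = max(range(len(novelty)), key=novelty.__getitem__)
print(peak_frame, peak_frame * FRAME_LEN)  # candidate switch frame and its sample index
```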

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
cue point detection, DJ mixing, electronic dance music, switch points
Identifiers
urn:nbn:se:umu:diva-234020 (URN); 10.3390/signals5040036 (DOI); 001386624600001 (ISI); 2-s2.0-85213493298 (Scopus ID)
Research funder
eSSENCE - An eScience Collaboration
Available from: 2025-01-13. Created: 2025-01-13. Last updated: 2025-02-21. Bibliographically approved.
Zehren, M., Alunno, M. & Bientinesi, P. (2023). High-quality and reproducible automatic drum transcription from crowdsourced data. Signals, 4(4), 768-787
High-quality and reproducible automatic drum transcription from crowdsourced data
2023 (English). In: Signals, E-ISSN 2624-6120, Vol. 4, no. 4, pp. 768-787. Article in journal (refereed), published.
Abstract [en]

Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
automatic drum transcription, crowdsourced dataset, self-attention mechanism, tatum
Identifiers
urn:nbn:se:umu:diva-216394 (URN); 10.3390/signals4040042 (DOI); 001177003200001 (ISI); 2-s2.0-85180709684 (Scopus ID)
Research funder
Swedish National Infrastructure for Computing (SNIC); Swedish Research Council, 2022-06725; Swedish Research Council, 2018-05973
Available from: 2023-11-10. Created: 2023-11-10. Last updated: 2025-04-24. Bibliographically approved.
Sankaran, A. & Bientinesi, P. (2022). A test for FLOPs as a discriminant for linear algebra algorithms. In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Paper presented at the 34th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2022, November 2-5, 2022 (pp. 221-230). IEEE
A test for FLOPs as a discriminant for linear algebra algorithms
2022 (English). In: 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, 2022, pp. 221-230. Conference paper, published paper (refereed).
Abstract [en]

Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing strategy, i.e., an algorithm, and normally for one linear algebra expression many alternative algorithms exist. Although mathematically equivalent, those algorithms might exhibit significant differences in terms of performance. Several high-level languages and tools for matrix computations such as Julia, Armadillo, Linnea, etc., make algorithmic choices by minimizing the number of Floating Point Operations (FLOPs). However, there can be several algorithms that share the same (or have nearly identical) number of FLOPs; in many cases, these algorithms exhibit execution times which are statistically equivalent, and one could arbitrarily select one of them as the best algorithm. It is, however, not unlikely to find cases where the execution times are significantly different from one another (despite the FLOP count being almost the same). It is also possible that the algorithm that minimizes FLOPs is not the one that minimizes execution time. In this work, we develop a methodology to test the reliability of FLOPs as a discriminant for linear algebra algorithms. Given a set of algorithms (for an instance of a linear algebra expression) as input, the methodology ranks them into performance classes; i.e., multiple algorithms are allowed to share the same rank. To this end, we measure the algorithms iteratively until the changes in the ranks converge to a value close to zero. FLOPs are a valid discriminant for an instance if all the algorithms with minimum FLOPs are assigned the best rank; otherwise, the instance is regarded as an anomaly, which can then be used in the investigation of the root cause of performance differences.
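The validity criterion stated at the end of the abstract is simple to express in code. In this sketch each algorithm is mapped to a (FLOP count, rank) pair, with rank 1 as the best performance class; the algorithm names and numbers are hypothetical, and producing the ranks themselves (the iterative measurement procedure) is outside its scope:

```python
from typing import Dict, Tuple

def flops_is_valid_discriminant(algorithms: Dict[str, Tuple[int, int]]) -> bool:
    """True if every algorithm with the minimum FLOP count is in the best rank."""
    min_flops = min(flops for flops, _ in algorithms.values())
    best_rank = min(rank for _, rank in algorithms.values())
    return all(rank == best_rank
               for flops, rank in algorithms.values()
               if flops == min_flops)

# Hypothetical instances: name -> (FLOP count, performance-class rank).
consistent = {"alg_a": (2_000_000, 1), "alg_b": (2_000_000, 1), "alg_c": (3_500_000, 2)}
anomaly = {"alg_a": (2_000_000, 2), "alg_b": (2_000_000, 1), "alg_c": (2_100_000, 1)}

print(flops_is_valid_discriminant(consistent))  # True
print(flops_is_valid_discriminant(anomaly))     # False: alg_a has min FLOPs but rank 2
```

An instance for which the function returns False is what the abstract calls an anomaly.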

Place, publisher, year, edition, pages
IEEE, 2022
Series
Proceedings (Symposium on Computer Architecture and High Performance Computing), ISSN 1550-6533
Keywords
Algorithm ranking, Linear algebra algorithms, Mathematical software performance, Performance Analysis
Identifiers
urn:nbn:se:umu:diva-203573 (URN); 10.1109/SBAC-PAD55451.2022.00033 (DOI); 000905612800023 (ISI); 2-s2.0-85145881711 (Scopus ID); 9781665451550 (ISBN)
Conference
34th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2022, November 2-5, 2022
Available from: 2023-01-19. Created: 2023-01-19. Last updated: 2023-11-10. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-4972-7097