Eljammaly, Mahmoud
Publications (8 of 8)
Karlsson, L., Eljammaly, M. & Myllykoski, M. (2019). D6.5 Evaluation of auto-tuning techniques. NLAFET Consortium; Umeå University
2019 (English) Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2019. p. 27
Research subject
Computer Science; Mathematics
Identifiers
urn:nbn:se:umu:diva-168425 (URN)
Projects
NLAFET
Note

This work is © by the NLAFET Consortium, 2015–2018. Its duplication is allowed only for personal, educational, or research uses.

Available from: 2020-02-25 Created: 2020-02-25 Last updated: 2020-02-26 Bibliographically approved
Eljammaly, M., Karlsson, L. & Kågström, B. (2018). An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm. In: ICPE '18 Companion of the 2018 ACM/SPEC International Conference on Performance Engineering. Paper presented at International Conference on Performance Engineering (ICPE 2018), Berlin, Germany, April 9-13, 2018 (pp. 5-8). ACM Digital Library
2018 (English) In: ICPE '18 Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, ACM Digital Library, 2018, pp. 5-8. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ACM Digital Library, 2018. p. 4
Keywords
Auto-tuning, Tuning framework, Binning, Search space decomposition, Multistage search, Hessenberg reduction, NUMA-aware
Identifiers
urn:nbn:se:umu:diva-154392 (URN); 10.1145/3185768.3186304 (DOI); 000744421000002 (ISI); 2-s2.0-85052016714 (Scopus ID); 978-1-4503-5629-9 (ISBN)
Conference
International Conference on Performance Engineering (ICPE 2018), Berlin, Germany, April 9-13, 2018
Available from: 2018-12-17 Created: 2018-12-17 Last updated: 2023-09-05 Bibliographically approved
Myllykoski, M., Karlsson, L., Kågström, B., Eljammaly, M., Pranesh, S. & Zounon, M. (2018). D2.6 Prototype Software for Eigenvalue Problem Solvers. NLAFET Consortium; Umeå University
2018 (English) Report (Other academic)
Place, publisher, year, edition, pages
NLAFET Consortium; Umeå University, 2018. p. 32
Research subject
Mathematics; Computer Science
Identifiers
urn:nbn:se:umu:diva-170222 (URN)
Projects
NLAFET
Note

Part of: Public Deliverables: WP2 – Dense Linear Systems and Eigenvalue Problem Solvers

Available from: 2020-04-29 Created: 2020-04-29 Last updated: 2020-05-05 Bibliographically approved
Eljammaly, M. (2018). Identification and tuning of algorithmic parameters in parallel matrix computations: Hessenberg reduction and tensor storage format conversion. (Licentiate dissertation). Umeå: Umeå universitet
2018 (English) Licentiate thesis with papers (Other academic)
Abstract [en]

This thesis considers two problems in numerical linear algebra and high performance computing (HPC): (i) the parallelization of a new blocked Hessenberg reduction algorithm using Parallel Cache Assignment (PCA) and the tunability of its algorithm parameters, and (ii) storing and manipulating dense tensors on shared memory HPC systems.

The Hessenberg reduction appears in the Aggressive Early Deflation (AED) process for identifying converged eigenvalues in the distributed multishift QR algorithm (the state-of-the-art algorithm for computing all eigenvalues of dense square matrices). Because the AED process becomes a parallel bottleneck, its components merit further study. We present a new Hessenberg reduction algorithm based on PCA that is NUMA-aware and targets relatively small problem sizes on shared memory systems. The tunability of the algorithm parameters is investigated. A simple off-line tuning is presented, and the performance of the new Hessenberg reduction algorithm is compared to its counterparts from LAPACK and ScaLAPACK. The new algorithm outperforms LAPACK in all tested cases and outperforms ScaLAPACK for problems of order smaller than 1500, which are common problem sizes for AED in the context of the distributed multishift QR algorithm.

We also investigate automatic tuning of the algorithm parameters. The parameters span a huge search space, and it is impractical to tune them using standard auto-tuning and optimization techniques. We present a modular auto-tuning framework that applies search space decomposition, binning, and multi-stage search to make searching this space efficient. Using these techniques, the framework exposes the underlying subproblems, which can then be tuned with standard auto-tuning methods. In addition, the framework defines an abstract interface which, combined with its modular design, allows various tuning algorithms to be tested.

In the last part of the thesis, the focus is on the problem of storing and manipulating dense tensors. Developing open source tensor algorithms and applications is hard due to the lack of open source software for fundamental tensor operations. We present a software library, dten, which includes tools for storing dense tensors in shared memory and converting a tensor storage format from one canonical form to another. The library provides two different ways to perform the conversion in parallel: in-place and out-of-place. The conversion involves moving blocks of contiguous data and is organized to maximize the size of the blocks moved. In addition, the library supports tensor matricization for one or two tensors at the same time. The latter case is important in preparing tensors for contraction operations. The library is general purpose and highly flexible.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2018. p. 15
Series
Report / UMINF, ISSN 0348-0542 ; UMINF 18.22
Identifiers
urn:nbn:se:umu:diva-145345 (URN); 978-91-7601-843-9 (ISBN)
Available from: 2018-03-01 Created: 2018-02-28 Last updated: 2018-06-09 Bibliographically approved
Eljammaly, M., Karlsson, L. & Kågström, B. (2018). On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment. In: Wyrzykowski R., Dongarra J., Deelman E., Karczewski K. (Ed.), Parallel Processing and Applied Mathematics. PPAM 2017: Part 1. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Lublin, Poland, 10–13 September, 2017 (pp. 579-589). Springer
2018 (English) In: Parallel Processing and Applied Mathematics. PPAM 2017: Part 1 / [ed] Wyrzykowski R., Dongarra J., Deelman E., Karczewski K., Springer, 2018, pp. 579-589. Conference paper, Published paper (Refereed)
Abstract [en]

The reduction of a general dense square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is one of the bottlenecks in AED, a main part in state-of-the-art software for the distributed multishift QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the sensitivity of its algorithmic parameters. The proposed algorithm is faster than LAPACK for all problem sizes and faster than ScaLAPACK for the relatively small problem sizes typical for AED.
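
For readers unfamiliar with the operation itself, the following is a minimal sketch of an unblocked, sequential Householder reduction to upper Hessenberg form. It is a textbook formulation for illustration only, not the NUMA-aware PCA-based algorithm proposed in the paper; the function name `hessenberg` and the test matrix are assumptions made for this example.

```python
import math

def hessenberg(a):
    """Return an upper Hessenberg matrix orthogonally similar to the square matrix a.

    Textbook unblocked Householder reduction (illustrative assumption,
    not the algorithm from the paper).
    """
    n = len(a)
    h = [row[:] for row in a]
    for k in range(n - 2):
        # Entries of column k from the subdiagonal downwards.
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        # Householder vector v = x - alpha * e1, with the sign of alpha
        # chosen to avoid cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        m = len(v)
        # Apply the reflector from the left to rows k+1..n-1.
        for j in range(n):
            s = sum(v[i] * h[k + 1 + i][j] for i in range(m))
            f = 2.0 * s / vtv
            for i in range(m):
                h[k + 1 + i][j] -= f * v[i]
        # Apply the reflector from the right to columns k+1..n-1.
        for i in range(n):
            s = sum(h[i][k + 1 + j] * v[j] for j in range(m))
            f = 2.0 * s / vtv
            for j in range(m):
                h[i][k + 1 + j] -= f * v[j]
    return h

A = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 1.0],
     [3.0, 1.0, 1.0, 1.0]]
H = hessenberg(A)
```

Because the reflectors are applied from both sides, `H` is similar to `A` and therefore has the same eigenvalues; all entries below the first subdiagonal of `H` are zero up to rounding.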

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Keywords
Hessenberg reduction, Parallel cache assignment, NUMA-aware algorithm, Shared-memory, Tunable parameters, Off-line tuning
Identifiers
urn:nbn:se:umu:diva-145342 (URN); 10.1007/978-3-319-78024-5_50 (DOI); 000458563300050 (ISI); 2-s2.0-85044775461 (Scopus ID); 978-3-319-78023-8 (ISBN); 978-3-319-78024-5 (ISBN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Lublin, Poland, 10–13 September, 2017
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2023-03-23 Bibliographically approved
Eljammaly, M., Karlsson, L. & Kågström, B. (2017). An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm. Umeå: Department of computing science, Umeå university
2017 (English) Report (Other academic)
Abstract [en]

The performance of a recently developed Hessenberg reduction algorithm depends greatly on the values chosen for its tunable parameters. The search space is huge, which, combined with other complications, makes the problem hard to solve effectively with generic methods and tools. We describe a modular auto-tuning framework in which the underlying optimization algorithm is easy to substitute. The framework exposes sub-problems of standard auto-tuning type for which existing generic methods can be reused. The outputs of concurrently executing sub-tuners are assembled by the framework into a solution to the original problem.
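
The structure described above (a substitutable optimizer, independent sub-problems, concurrent sub-tuners, and an assembled result) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `grid_search`, `random_search`, and `tune`, the parameter names `block_size` and `threads`, and the synthetic cost functions that stand in for measured run times. It does not reproduce the framework's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def grid_search(cost, candidates):
    """Exhaustive optimizer: evaluate every candidate, keep the cheapest."""
    return min(candidates, key=cost)

def random_search(cost, candidates, samples=3, seed=0):
    """Alternative optimizer with the same interface: evaluate a random subset."""
    rng = random.Random(seed)
    picks = rng.sample(candidates, min(samples, len(candidates)))
    return min(picks, key=cost)

def tune(subproblems, optimizer=grid_search):
    """Run one sub-tuner per sub-problem concurrently and assemble the results.

    subproblems maps a parameter name to a (cost function, candidate list) pair.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(optimizer, cost, candidates)
                   for name, (cost, candidates) in subproblems.items()}
        return {name: future.result() for name, future in futures.items()}

# Synthetic stand-ins for measured run times (hypothetical, not real data).
subproblems = {
    "block_size": (lambda b: (b - 48) ** 2, [16, 32, 48, 64]),
    "threads": (lambda t: abs(t - 6), [1, 2, 4, 6, 8]),
}
best = tune(subproblems)
```

Swapping the optimizer is a one-argument change, for example `tune(subproblems, optimizer=random_search)`. The sketch assumes the sub-problems can be tuned independently of one another, which is what makes running them concurrently meaningful.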

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå university, 2017. p. 14
Series
Report / UMINF, ISSN 0348-0542 ; 17.19
Keywords
Auto-tuning, Tuning framework, Binning, Search space decomposition, Multistage search, Hessenberg reduction, NUMA-aware
Identifiers
urn:nbn:se:umu:diva-145297 (URN)
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09 Bibliographically approved
Eljammaly, M. & Karlsson, L. (2016). A library for storing and manipulating dense tensors. Umeå: Department of computing science, Umeå university
2016 (English) Report (Other academic)
Abstract [en]

Aiming to build a layered infrastructure for high-performance dense tensor applications, we present a library, called dten, for storing and manipulating dense tensors. The library focuses on storing dense tensors in canonical storage formats and converting between storage formats in parallel. In addition, it supports tensor matricization in different ways. The library is general-purpose and provides a high degree of flexibility.
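
To make the two operations concrete, here is a small sketch of an out-of-place storage format conversion and a matricization for a dense tensor held in a flat list. It assumes that a storage format is defined by an ordering of the tensor modes, with the first listed mode varying fastest in memory; that convention and the names `strides`, `convert`, and `matricize` are assumptions for this example and are not taken from the dten API. The sketch is sequential and copies element by element, whereas the library is described as parallel.

```python
from itertools import product

def strides(shape, order):
    """Stride of each mode when modes are laid out in `order`, first mode fastest."""
    result = [0] * len(shape)
    step = 1
    for mode in order:
        result[mode] = step
        step *= shape[mode]
    return result

def offset(index, mode_strides):
    """Position of a multi-index in the flat storage."""
    return sum(i * s for i, s in zip(index, mode_strides))

def convert(data, shape, src_order, dst_order):
    """Out-of-place conversion between two mode orderings."""
    src = strides(shape, src_order)
    dst = strides(shape, dst_order)
    out = [None] * len(data)
    for index in product(*(range(n) for n in shape)):
        out[offset(index, dst)] = data[offset(index, src)]
    return out

def matricize(data, shape, order, row_modes):
    """Unfold a tensor into a matrix; row_modes index rows, the other modes columns."""
    col_modes = [m for m in range(len(shape)) if m not in row_modes]
    mode_strides = strides(shape, order)
    row_strides = strides(shape, row_modes)  # zero for modes not in row_modes
    col_strides = strides(shape, col_modes)  # zero for modes not in col_modes
    rows = cols = 1
    for m in row_modes:
        rows *= shape[m]
    for m in col_modes:
        cols *= shape[m]
    matrix = [[None] * cols for _ in range(rows)]
    for index in product(*(range(n) for n in shape)):
        r = offset(index, row_strides)
        c = offset(index, col_strides)
        matrix[r][c] = data[offset(index, mode_strides)]
    return matrix

# A 2 x 3 tensor stored with mode 0 fastest, converted to mode 1 fastest.
converted = convert(list(range(6)), (2, 3), [0, 1], [1, 0])

# A 2 x 3 x 2 tensor unfolded along mode 1 (rows) against modes 0 and 2 (columns).
unfolded = matricize(list(range(12)), (2, 3, 2), [0, 1, 2], [1])
```

Converting back with the two orderings swapped recovers the original list, which is a convenient sanity check for any such conversion routine.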

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå university, 2016. p. 21
Series
Report / UMINF, ISSN 0348-0542 ; 17.22
Keywords
Dense tensors, canonical storage format, tensor matricization, tensor storage format conversion, out-of-place conversion, in-place conversion
Identifiers
urn:nbn:se:umu:diva-145341 (URN)
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09 Bibliographically approved
Eljammaly, M., Karlsson, L. & Kågström, B. (2016). Evaluation of the Tunability of a New NUMA-Aware Hessenberg Reduction Algorithm. Umeå University
2016 (English) Report (Other academic)
Abstract [en]

The reduction of a general dense and square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is still one of the bottlenecks in state-of-the-art software for the distributed QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the tunability of its algorithmic parameters. The proposed algorithm can be faster than LAPACK and ScaLAPACK for small problem sizes. In addition, evaluating the algorithmic parameters shows that there is potential for auto-tuning some of the parameters.

Place, publisher, year, edition, pages
Umeå University, 2016. p. 26
Series
Report / UMINF, ISSN 0348-0542 ; 16.21
Keywords
Hessenberg reduction, parallel cache assignment, NUMA-aware algorithm, shared-memory algorithm, tunable parameters, off-line tuning
Identifiers
urn:nbn:se:umu:diva-152576 (URN)
Available from: 2018-10-14 Created: 2018-10-14 Last updated: 2020-07-09 Bibliographically approved