umu.sePublications

Please wait ... |

Link to record
http://umu.diva-portal.org/smash/person.jsf?pid=authority-person:63919 $(function(){PrimeFaces.cw("InputTextarea","widget_formSmash_upper_j_idt122_recordDirectLink",{id:"formSmash:upper:j_idt122:recordDirectLink",widgetVar:"widget_formSmash_upper_j_idt122_recordDirectLink",autoResize:true});}); $(function(){PrimeFaces.cw("OverlayPanel","widget_formSmash_upper_j_idt122_j_idt124",{id:"formSmash:upper:j_idt122:j_idt124",widgetVar:"widget_formSmash_upper_j_idt122_j_idt124",target:"formSmash:upper:j_idt122:permLink",showEffect:"blind",hideEffect:"fade",my:"right top",at:"right bottom",showCloseIcon:true});});

Permanent link

Direct link

Adlerborn, Björn

Open this publication in new window or tab >>Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling### Adlerborn, Björn

### Karlsson, Lars

### Kågström, Bo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_some",{id:"formSmash:j_idt184:0:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_otherAuthors",{id:"formSmash:j_idt184:0:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_otherAuthors",multiple:true}); 2018 (English)In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 40, no 2, p. C157-C180Article in journal (Refereed) Published
##### Abstract [en]

##### Place, publisher, year, edition, pages

Society for Industrial and Applied Mathematics, 2018
##### Keywords

generalized eigenvalue problem, Hessenberg-triangular reduction, parallel algorithms, wavefront scheduling
##### National Category

Computational Mathematics
##### Identifiers

urn:nbn:se:umu:diva-147843 (URN)10.1137/16M1103890 (DOI)000431100400039 ()
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt359",{id:"formSmash:j_idt184:0:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt365",{id:"formSmash:j_idt184:0:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt371",{id:"formSmash:j_idt184:0:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt371",multiple:true});
#####

Available from: 2018-05-18 Created: 2018-05-18 Last updated: 2018-06-09Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cacheblocked algorithm by K degrees agstrom et al. [*BIT*, 48 (2008), pp. 563 584]. A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new formulation is an improvement of the state of the art and also identify factors that limit its scalability.

Open this publication in new window or tab >>Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling### Adlerborn, Björn

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kågström, Bo

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Karlsson, Lars

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_some",{id:"formSmash:j_idt184:1:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_otherAuthors",{id:"formSmash:j_idt184:1:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_otherAuthors",multiple:true}); 2016 (English)Report (Other academic)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Umeå: Department of Computing Science, Umeå University, 2016. p. 26
##### Series

Report / UMINF, ISSN 0348-0542 ; 16.10
##### National Category

Computational Mathematics
##### Identifiers

urn:nbn:se:umu:diva-120002 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt359",{id:"formSmash:j_idt184:1:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt365",{id:"formSmash:j_idt184:1:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt371",{id:"formSmash:j_idt184:1:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt371",multiple:true});
#####

Available from: 2016-05-04 Created: 2016-05-04 Last updated: 2018-06-07Bibliographically approved

A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cache-blocked algorithm by Kågstrom, Kressner, E.S. Quintana-Ortí, and G. Quintana-Ortí (2008). A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new algorithm is an improvement of the state of the art but also identifies factors that currently limit its scalability.

Open this publication in new window or tab >>Parallel Algorithms and Library Software for the Generalized Eigenvalue Problem on Distributed Memory Computer Systems### Adlerborn, Björn

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_some",{id:"formSmash:j_idt184:2:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_otherAuthors",{id:"formSmash:j_idt184:2:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_otherAuthors",multiple:true}); 2016 (English)Licentiate thesis, comprehensive summary (Other academic)
##### Alternative title[sv]

Parallella algoritmer och biblioteksprogramvara för det generaliserade egenvärdesproblemet på datorsystem med distribuerat minne
##### Abstract [en]

##### Place, publisher, year, edition, pages

Umeå: Umeå universitet, 2016. p. 18
##### Series

Report / UMINF, ISSN 0348-0542 ; 16.11
##### National Category

Computer Sciences
##### Research subject

Computer and Information Science
##### Identifiers

urn:nbn:se:umu:diva-119439 (URN)978-91-7601-491-2 (ISBN)
##### Presentation

2016-05-27, MC313, Umeå universitet, Umeå, 10:00 (English)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt359",{id:"formSmash:j_idt184:2:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt359",multiple:true});
##### Supervisors

### Kågström, Bo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt365",{id:"formSmash:j_idt184:2:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt371",{id:"formSmash:j_idt184:2:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt371",multiple:true});
#####

Available from: 2016-04-19 Created: 2016-04-19 Last updated: 2018-06-07Bibliographically approved

We present and discuss algorithms and library software for solving the generalized non-symmetric eigenvalue problem (GNEP) on high performance computing (HPC) platforms with distributed memory. Such problems occur frequently in computational science and engineering, and our contributions make it possible to solve GNEPs fast and accurate in parallel using state-of-the-art HPC systems. A generalized eigenvalue problem corresponds to finding scalars y and vectors x such that Ax = yBx, where A and B are real square matrices. A nonzero x that satisfies the GNEP equation is called an eigenvector of the ordered pair (A,B), and the scalar y is the associated eigenvalue. Our contributions include parallel algorithms for transforming a matrix pair (A,B) to a generalized Schur form (S,T), where S is quasi upper triangular and T is upper triangular. The eigenvalues are revealed from the diagonals of S and T. Moreover, for a specified set of eigenvalues an associated pair of deflating subspaces can be computed, which typically is requested in various applications. In the first stage the matrix pair (A,B) is reduced to a Hessenberg-triangular form (H,T), where H is upper triangular with one nonzero subdiagonal and T is upper triangular, in a finite number of steps. The second stage reduces the matrix pair further to generalized Schur form (S,T) using an iterative QZ-based method. Outgoing from a one-stage method for the reduction from (A,B) to (H,T), a novel parallel algorithm is developed. In brief, a delayed update technique is applied to several partial steps, involving low level operations, before associated accumulated transformations are applied in a blocked fashion which together with a wave-front task scheduler makes the algorithm scale when running in a parallel setting. The potential presence of infinite eigenvalues makes a generalized eigenvalue problem ill-conditioned. Therefore the parallel algorithm for the second stage, reduction to (S,T) form, continuously scan for and robustly deflate infinite eigenvalues. This will reduce the impact so that they do not interfere with other real eigenvalues or are misinterpreted as real eigenvalues. In addition, our parallel iterative QZ-based algorithm makes use of multiple implicit shifts and an aggressive early deflation (AED) technique, which radically speeds up the convergence. The multi-shift strategy is based on independent chains of so called coupled bulges and computational windows which is an important source of making the algorithm scalable. The parallel algorithms have been implemented in state-of-the-art library software. The performance is demonstrated and evaluated using up to 1600 CPU cores for problems with matrices as large as 100000 x 100000. Our library software is described in a User Guide. The software is, optionally, tunable via a set of parameters for various thresholds and buffer sizes etc. These parameters are discussed, and recommended values are specified which should result in reasonable performance on HPC systems similar to the ones we have been running on.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Open this publication in new window or tab >>PDHGEQZ user guide### Adlerborn, Björn

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kågström, Bo

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kressner, Daniel

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_some",{id:"formSmash:j_idt184:3:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_otherAuthors",{id:"formSmash:j_idt184:3:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_otherAuthors",multiple:true}); 2015 (English)Report (Other academic)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Umeå: Department of Computing Science, Umeå University, 2015. p. 16
##### Series

Report / UMINF, ISSN 0348-0542 ; 15.12
##### Keywords

software, userguide, generalized eigenvalue problem, nonsymmetric QZ algorithm, multishift, bulge chasing, infinite eigenvalues, parallel algorithms, level 3 performance, aggressive early deflation
##### National Category

Computational Mathematics
##### Identifiers

urn:nbn:se:umu:diva-120008 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt359",{id:"formSmash:j_idt184:3:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt365",{id:"formSmash:j_idt184:3:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt371",{id:"formSmash:j_idt184:3:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt371",multiple:true});
#####

Available from: 2016-05-04 Created: 2016-05-04 Last updated: 2018-06-07Bibliographically approved

SB–MATHICSE–ANCHP, EPF Lausanne.

Given a general matrix pair (A,B) with real entries, we provide software routines for computing a generalized Schur decomposition (S, T). The real and complex conjugate pairs of eigenvalues appear as 1×1 and 2×2 blocks, respectively, along the diagonals of (S, T) and can be reordered in any order. Typically, this functionality is used to compute orthogonal bases for a pair of deflating subspaces corresponding to a selected set of eigenvalues. The routines are written in Fortran 90 and targets distributed memory machines.

Open this publication in new window or tab >>A parallel QZ algorithm for distributed memory HPC systems### Adlerborn, Björn

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kågström, Bo

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kressner, Daniel

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_some",{id:"formSmash:j_idt184:4:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_otherAuthors",{id:"formSmash:j_idt184:4:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_otherAuthors",multiple:true}); 2014 (English)In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 36, no 5, p. C480-C503Article in journal (Refereed) Published
##### Abstract [en]

##### Place, publisher, year, edition, pages

SIAM publications, 2014
##### Keywords

generalized eigenvalue problem, nonsymmetric QZ algorithm, multishift, bulge chasing, infinite genvalues, parallel algorithms, level 3 performance, aggressive early deflation, MMEL J, 1993, ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, V19, P175
##### National Category

Mathematics Computer Sciences
##### Identifiers

urn:nbn:se:umu:diva-97903 (URN)10.1137/140954817 (DOI)000346123200025 ()
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt359",{id:"formSmash:j_idt184:4:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt365",{id:"formSmash:j_idt184:4:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt371",{id:"formSmash:j_idt184:4:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt371",multiple:true});
#####

Available from: 2015-01-12 Created: 2015-01-08 Last updated: 2018-06-07Bibliographically approved

Appearing frequently in applications, generalized eigenvalue problems represent one of the core problems in numerical linear algebra. The QZ algorithm of Moler and Stewart is the most widely used algorithm for addressing such problems. Despite its importance, little attention has been paid to the parallelization of the QZ algorithm. The purpose of this work is to fill this gap. We propose a parallelization of the QZ algorithm that incorporates all modern ingredients of dense eigensolvers, such as multishift and aggressive early deflation techniques. To deal with (possibly many) infinite eigenvalues, a new parallel deflation strategy is developed. Numerical experiments for several random and application examples demonstrate the effectiveness of our algorithm on two different distributed memory HPC systems.

Open this publication in new window or tab >>Parallel Variants of the Multishift QZ Algorithm with Advanced Deflation Techniques### Adlerborn, Björn

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kågström, Bo

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).### Kressner, Daniel

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_some",{id:"formSmash:j_idt184:5:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_otherAuthors",{id:"formSmash:j_idt184:5:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_otherAuthors",multiple:true}); 2007 (English)In: Applied Parallel Computing - State of the Art in Scientific Computing: 8th International Workshop, PARA 2006, Springer , 2007, p. 117-126Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer, 2007
##### Series

Lecture Notes in Computer Science ; LNCS 4699
##### Identifiers

urn:nbn:se:umu:diva-23166 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt359",{id:"formSmash:j_idt184:5:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt365",{id:"formSmash:j_idt184:5:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt371",{id:"formSmash:j_idt184:5:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt371",multiple:true});
#####

Available from: 2009-06-02 Created: 2009-06-02 Last updated: 2018-06-08

The QZ algorithm reduces a regular matrix pair to generalized Schur form, which can be used to address the generalized eigenvalue problem. This paper summarizes recent work on improving the performance of the QZ algorithm on serial machines and work in progress on a novel parallel implementation. In both cases, the QZ iterations are based on chasing chains of tiny bulges. This allows to formulate the majority of the computation in terms of matrix-matrix multiplications, resulting in natural parallelism and better performance on modern computing systems with memory hierarchies. In addition, advanced deflation strategies are used, specifically the so called aggressive early deflation, leading to a considerable convergence acceleration and consequently to a reduction of floating point operations and computing time.