umu.sePublications
Change search
Refine search result
12 1 - 50 of 90
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Adlerborn, Björn
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling2018In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 40, no 2, p. C157-C180Article in journal (Refereed)
    Abstract [en]

    A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cacheblocked algorithm by K degrees agstrom et al. [BIT, 48 (2008), pp. 563 584]. A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new formulation is an improvement of the state of the art and also identify factors that limit its scalability.

  • 2.
    Adlerborn, Björn
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Distributed one-stage Hessenberg-triangular reduction with wavefront scheduling2016Report (Other academic)
    Abstract [en]

    A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cache-blocked algorithm by Kågstrom, Kressner, E.S. Quintana-Ortí, and G. Quintana-Ortí (2008). A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new algorithm is an improvement of the state of the art but also identifies factors that currently limit its scalability.

  • 3.
    Adlerborn, Björn
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kressner, Daniel
    A parallel QZ algorithm for distributed memory HPC systems2014In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 36, no 5, p. C480-C503Article in journal (Refereed)
    Abstract [en]

    Appearing frequently in applications, generalized eigenvalue problems represent one of the core problems in numerical linear algebra. The QZ algorithm of Moler and Stewart is the most widely used algorithm for addressing such problems. Despite its importance, little attention has been paid to the parallelization of the QZ algorithm. The purpose of this work is to fill this gap. We propose a parallelization of the QZ algorithm that incorporates all modern ingredients of dense eigensolvers, such as multishift and aggressive early deflation techniques. To deal with (possibly many) infinite eigenvalues, a new parallel deflation strategy is developed. Numerical experiments for several random and application examples demonstrate the effectiveness of our algorithm on two different distributed memory HPC systems.

  • 4.
    Adlerborn, Björn
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kressner, Daniel
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Parallel Variants of the Multishift QZ Algorithm with Advanced Deflation Techniques2007In: Applied Parallel Computing - State of the Art in Scientific Computing: 8th International Workshop, PARA 2006, Springer , 2007, p. 117-126Conference paper (Refereed)
    Abstract [en]

    The QZ algorithm reduces a regular matrix pair to generalized Schur form, which can be used to address the generalized eigenvalue problem. This paper summarizes recent work on improving the performance of the QZ algorithm on serial machines and work in progress on a novel parallel implementation. In both cases, the QZ iterations are based on chasing chains of tiny bulges. This allows to formulate the majority of the computation in terms of matrix-matrix multiplications, resulting in natural parallelism and better performance on modern computing systems with memory hierarchies. In addition, advanced deflation strategies are used, specifically the so called aggressive early deflation, leading to a considerable convergence acceleration and consequently to a reduction of floating point operations and computing time.

  • 5.
    Adlerborn, Björn
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kressner, Daniel
    SB–MATHICSE–ANCHP, EPF Lausanne.
    PDHGEQZ user guide2015Report (Other academic)
    Abstract [en]

    Given a general matrix pair (A,B) with real entries, we provide software routines for computing a generalized Schur decomposition (S, T). The real and complex conjugate pairs of eigenvalues appear as 1×1 and 2×2 blocks, respectively, along the diagonals of (S, T) and can be reordered in any order. Typically, this functionality is used to compute orthogonal bases for a pair of deflating subspaces corresponding to a selected set of eigenvalues. The routines are written in Fortran 90 and targets distributed memory machines.

  • 6.
    Andersson, Per
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Granat, Robert
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Jonsson, Isak
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Parallel Algorithms for Triangular Periodic Sylvester-Type Matrix Equations2008In: Euro-Par 2008 - Parallel Processing: 14th International Euro-Par Conference Conference Location: Las Palmas de Gran Canaria, SPAIN, Springer , 2008, p. 780-789Conference paper (Refereed)
    Abstract [en]

    We present parallel algorithms for triangular periodic Sylvester-type matrix equations, conceptually being the third step of a periodic Bartels-Stewart-like solution method for general periodic Sylvester-type matrix equations based on variants of the periodic Schur decomposition. The presented algorithms are designed and implemented in the framework of the recently developed HPG library SCASY and are based on explicit blocking, 2-dimensional block cyclic data distribution and a wavefront-like traversal of the right hand side matrices. High performance is obtained by rich usage of level 3 BLAS operations. It is also demonstrated how several important key concepts of SCASY regarding communications and the treatment of quasi-triangular coefficient matrices are generalized to the periodic case. Some experimental results from a distributed memory Linux cluster demonstrate are also presented.

  • 7.
    Ciegis, Raimondas
    et al.
    Vilnius Gediminas Technical University, Lithuania.
    Henty, David
    University of Edingburg, UK.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Zilinskas, Julius
    Vilnius Gediminas Technical University, Lithuania.
    Parallel Scientific Computing and Optimization: Advances with Applications2009Book (Other academic)
  • 8.
    Dackland, Krister
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    A ring-oriented approach for block matrix factorizations on shared and distributed memory architectures1993In: Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing / [ed] R. F. Sincovec et al., Norfolk: SIAM Publications , 1993, p. 330-338Conference paper (Refereed)
    Abstract [en]

    A block (column) wrap-mapping approach for design of parallel block matrix factorization algorithms that are (trans)portable over and between shared memory multiprocessors (SMM) and distributed memory multicomputers (DMM) is presented. By reorganizing the matrix on the SMM architecture, the same ring-oriented algorithms can be used on both SMM and DMM systems with all machine dependencies comprised to a small set of communication routines. The algorithms are described on high level with focus on portability and scalability aspects. Implementation aspects of the LU , Cholesky, and QR factorizations and machine specific communication routines for some SMM and DMM systems are discussed. Timing results show that our portable algorithms have similar performance as machine specific implementations. 1 Introduction With the introduction of advanced parallel computer architectures a demand for efficient and portable algorithms has emerged. Several attempts to design algorithms and implementat.

  • 9.
    Dackland, Krister
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Van Loan, C.
    Parallel block matrix factorizations on the shared memory multiprocessor IBM 3090 VF/600J1992In: International Journal of Supercomputer Applications, ISSN 0890-2720, Vol. 6, no 1, p. 69-97Article in journal (Refereed)
  • 10.
    Dackland, Krister
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Van Loan, Charles
    Design and evaluation of parallel block algorithms:  LU factorization on an IBM 3090 VF/600J1992In: Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing / [ed] Jack Dongarra, Ken Kennedy, Paul Messina, Danny C. Sorensen, Robert G. Voigt, Houston: SIAM Publications , 1992, p. 3-10Conference paper (Refereed)
  • 11.
    Dackland, Krister
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Blocked Algorithms and Software for Reduction of a Regular Matrix Pair to Generalized Schur Form1999In: ACM Transactions on Mathematical Software, Vol. 25, no 4, p. 425-454Article in journal (Refereed)
    Abstract [en]

    A two-stage blocked algorithm for reduction of a regular matrix pair (A, B) to upper Hessenberg-triangular form is presented. In stage 1 (A, B) is reduced to block upper Hessenberg-triangular form using mainly level 3 (matrix-matrix) operations that permit data reuse in the higher levels of a memory hierarchy. In the second stage all but one of the r subdiagonals of the block Hessenberg A-part are set to zero using Givens rotations. The algorithm proceeds in a sequence of supersweeps, each reducing m columns. The updates with respect to row and column rotations are organized to reference consecutive columns of A and B. To further improve the data locality, all rotations produced in a supersweep are stored to enable a left-looking reference pattern, i.e., all updates are delayed until they are required for the continuation of the supersweep. Moreover, we present a blocked variant of the single diagonal double-shift QZ method for computing the generalized Schur form of(A, B) in upper Hessenberg-triangular form. The blocking for improved data locality is done similarly, now by restructuring the reference pattern of the updates associated with the bulge chasing in the QZ iteration. Timing results show that our new blocked variants outperform the current LAPACK routines, including drivers for the generalized eigenvalue problem, by a factor 2-5 for sufficiently large problems.

  • 12.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Futorny, Vyacheslav
    University of Sao Paulo, Brazil .
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Klimenko, Lena
    Kiev Polytechnic Institute, Ukraine.
    Sergeichuk, Vladimir
    Institute of Mathematics, Kiev, Ukraine.
    Change of the congruence canonical form of 2-by-2 and 3-by-3 matrices under perturbations and bundles of matrices under congruence2015In: Linear Algebra and its Applications, ISSN 0024-3795, E-ISSN 1873-1856, Vol. 469, p. 305-334Article in journal (Refereed)
    Abstract [en]

    We construct the Hasse diagrams G2 and G3 for the closure ordering on the sets of congruence classes of 2 × 2 and 3 × 3 complex matrices. In other words, we construct two directed graphs whose vertices are 2 × 2 or, respectively, 3 × 3 canonical matrices under congruence, and there is a directed path from A to B if and only if A can be transformed by an arbitrarily small perturbation to a matrix that is congruent to B. A bundle of matrices under congruence is defined as a set of square matrices A for which the pencils A + λAT belong to the same bundle under strict equivalence. In support of this definition, we show that all matrices in a congruence bundle of 2 × 2 or 3 × 3 matrices have the same properties with respect to perturbations. We construct the Hasse diagrams G2 B and G3 B for the closure ordering on the sets of congruence bundles of 2 × 2 and, respectively, 3 × 3 matrices. We find the isometry groups of 2 × 2 and 3 × 3 congruence canonical matrices.

  • 13.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Canonical structure transitions of system pencils2015Report (Other academic)
    Abstract [en]

    We investigate the changes under small perturbations of the canonical structure information for a system pencil (A B C D) − s (E 0 0 0), det(E) ≠ 0, associated with a (generalized) linear time-invariant state-space system. The equivalence class of the pencil is taken with respect to feedback-injection equivalence transformation. The results allow to track possible changes under small perturbations of important linear system characteristics.

  • 14.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Canonical structure transitions of system pencils2017In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 38, no 4, p. 1249-1267Article in journal (Refereed)
    Abstract [en]

    We investigate the changes of the canonical structure information under small perturbations for a system pencil associated with a (generalized) linear time-invariant state-space system. The equivalence class of the pencil is taken with respect to feedback-injection equivalence transformations. The results allow us to track possible changes of important linear system characteristics under small perturbations.

  • 15.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Codimension computations of congruence orbits of matrices, symmetric and skew-symmetric matrix pencils using Matlab2013Report (Other academic)
    Abstract [en]

    Matlab functions to work with the canonical structures for congru-ence and *congruence of matrices, and for congruence of symmetricand skew-symmetric matrix pencils are presented. A user can providethe canonical structure objects or create (random) matrix examplesetups with a desired canonical information, and compute the codi-mensions of the corresponding orbits: if the structural information(the canonical form) of a matrix or a matrix pencil is known it isused for the codimension computations, otherwise they are computednumerically. Some auxiliary functions are provided too. All thesefunctions extend the Matrix Canonical Structure Toolbox.

  • 16.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Van Dooren, Paul
    Universite catholique de Louvain, Belgium.
    Geometry of spaces for matrix polynomial Fiedler linearizations2015Report (Other academic)
    Abstract [en]

    We study how small perturbations of matrix polynomials may change their elementary divisors and minimal indices by constructing the closure hierarchy graphs (stratifications) of orbits and bundles of matrix polynomial Fiedler linearizations. We show that the stratifica-tion graphs do not depend on the choice of Fiedler linearization which means that all the spaces of the matrix polynomial Fiedler lineariza-tions have the same geometry (topology). The results are illustrated by examples using the software tool StratiGraph.

  • 17.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå Univ, HPC2N, SE-90187 Umeå, Sweden.
    Kågstrom, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå Univ, HPC2N, SE-90187 Umeå, Sweden.
    Coupled Sylvester-type Matrix Equations and Block Diagonalization2015In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 36, no 2, p. 580-593Article in journal (Refereed)
    Abstract [en]

    We prove Roth-type theorems for systems of matrix equations including an arbitrary mix of Sylvester and $\star$-Sylvester equations, in which the transpose or conjugate transpose of the unknown matrices also appear. In full generality, we derive consistency conditions by proving that such a system has a solution if and only if the associated set of $2 \times 2$ block matrix representations of the equations are block diagonalizable by (linked) equivalence transformations. Various applications leading to several particular cases have already been investigated in the literature, some recently and some long ago. Solvability of these cases follow immediately from our general consistency theory. We also show how to apply our main result to systems of Stein-type matrix equations.

  • 18.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågstrom, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Orbit closure hierarchies of skew-symmetric matrix pencils2014In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 35, no 4, p. 1429-1443Article in journal (Refereed)
    Abstract [en]

    We study how small perturbations of a skew-symmetric matrix pencil may change its canonical form under congruence. This problem is also known as the stratification problem of skew-symmetric matrix pencil orbits and bundles. In other words, we investigate when the closure of the congruence orbit (or bundle) of a skew-symmetric matrix pencil contains the congruence orbit (or bundle) of another skew-symmetric matrix pencil. The developed theory relies on our main theorem stating that a skew-symmetric matrix pencil A - lambda B can be approximated by pencils strictly equivalent to a skew-symmetric matrix pencil C - lambda D if and only if A - lambda B can be approximated by pencils congruent to C - lambda D.

  • 19.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Orbit closure hierarchies of skew-symmetric matrix pencils2014Report (Other academic)
    Abstract [en]

    We study how small perturbations of a skew-symmetric matrix pencil may change its canonical form under congruence. This problem is also known as the stratification problem of skew-symmetric matrix pencil orbits and bundles. In other words, we investigate when the closure of the congruence orbit (or bundle) of a skew-symmetric matrix pencil contains the congruence orbit (or bundle) of another skew-symmetric matrix pencil. This theory relies on our main theorem stating that a skew-symmetric matrix pencil A-λB can be approximated by pencils strictly equivalent to a skew-symmetric matrix pencil C-λD if and only if A-λB can be approximated by pencils congruent to C-λD.

  • 20.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Sergeichuk, Vladimir V.
    Skew-symmetric matrix pencils: codimension counts and the solution of a pair of matrix equations2013In: Linear Algebra and its Applications, ISSN 0024-3795, E-ISSN 1873-1856, Vol. 438, no 8, p. 3375-3396Article in journal (Refereed)
    Abstract [en]

    The homogeneous system of matrix equations (X(T)A + AX, (XB)-B-T + BX) = (0, 0), where (A, B) is a pair of skew-symmetric matrices of the same size is considered: we establish the general solution and calculate the codimension of the orbit of (A, B) under congruence. These results will be useful in the development of the stratification theory for orbits of skew-symmetric matrix pencils.

  • 21.
    Dmytryshyn, Andrii
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Sergeichuk, Vladimir V.
    Ukrainian Acad Sci, Kiev, Ukraine.
    Symmetric matrix pencils: codimension counts and the solution of a pair of matrix equations2014In: The Electronic Journal of Linear Algebra, ISSN 1537-9582, E-ISSN 1081-3810, Vol. 27, p. 1-18Article in journal (Refereed)
    Abstract [en]

    The set of all solutions to the homogeneous system of matrix equations (X-T A + AX, X-T B + BX) = (0, 0), where (A, B) is a pair of symmetric matrices of the same size, is characterized. In addition, the codimension of the orbit of (A, B) under congruence is calculated. This paper is a natural continuation of the article [A. Dmytryshyn, B. Kagstrom, and V. V. Sergeichuk. Skew-symmetric matrix pencils: Codimension counts and the solution of a pair of matrix equations. Linear Algebra Appl., 438:3375-3396, 2013.], where the corresponding problems for skew-symmetric matrix pencils are solved. The new results will be useful in the development of the stratification theory for orbits of symmetric matrix pencils.

  • 22.
    Edelman, Alan
    et al.
    MIT, USA.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    A Geometric Approach to Perturbation Theory of Matrices and Matrix Pencils. Part I: Versal Deformations1997In: SIAM Journal on Matrix Analysis and Applications, Vol. 18, no 3, p. 653-692Article in journal (Refereed)
    Abstract [en]

    We derive versal deformations of the Kronecker canonical form by deriving the tangent space and orthogonal bases for the normal space to the orbits of strictly equivalent matrix pencils. These deformations reveal the local perturbation theory of matrix pencils related to the Kronecker canonical form. We also obtain a new singular value bound for the distance to the orbits of less generic pencils. The concepts, results, and their derivations are mainly expressed in the language of numerical linear algebra. We conclude with experiments and applications.

  • 23.
    Edelman, Alan
    et al.
    MIT, USA.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    A Geometric Approach to Perturbation Theory of Matrices and Matrix Pencils. Part II: A Stratification-Enhanced Staircase Algorithm1999In: SIAM Journal on Matrix Analysis and Applications, Vol. 20, no 3, p. 667-699Article in journal (Refereed)
    Abstract [en]

    Computing the Jordan form of a matrix or the Kronecker structure of a pencil is a well-known ill-posed problem. We propose that knowledge of the closure relations, i.e., the stratification, of the orbits and bundles of the various forms may be applied in the staircase algorithm. Here we discuss and complete the mathematical theory of these relationships and show how they may be applied to the staircase algorithm. This paper is a continuation of our Part I paper on versal deformations, but it may also be read independently.

  • 24.
    Edmundsson, Niklas
    et al.
    Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Mårtensson, Markus
    Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Nylén, Mats
    Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Sandgren, Åke
    Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Wadenstein, Mattias
    Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Design and Evaluation of a TOP100 Linux Super Cluster System2004In: Concurrency and Computation: Practice & Experiences, ISSN 1532-0634, Vol. 16, no 8, p. 735-750Article in journal (Refereed)
    Abstract [en]

    The High Performance Computing Center North (HPC2N) Super Cluster is a truly self-made high-performance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a high-bandwidth, low-latency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software issues and an extensive performance analysis. The performance is evaluated using a number of state-of-the-art benchmarks and software, including STREAM, Pallas MPI, the Atlas DGEMM, High-Performance Linpack and NAS Parallel benchmarks. Using these benchmarks we first determine the raw memory bandwidth and network characteristics; the practical peak performance of a single CPU, a single dual-node and the complete 240-processor system; and investigate the parallel performance for non-optimized dusty-deck Fortran applications. In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. According to the TOP500 list of June 2002, this cluster was the 94th fastest computer in the world. It is now fully operational and stable as the main computing facility at HPC2N. The system’s utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week.

  • 25.
    Eljammaly, Mahmoud
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm2017Report (Other academic)
    Abstract [en]

    The performance of a recently developed Hessenberg reduction algorithm greatly depends on the values chosen for its tunable parameters. The search space is huge combined with other complications makes the problem hard to solve effectively with generic methods and tools. We describe a modular auto-tuning framework in which the underlying optimization algorithm is easy to substitute. The framework exposes sub-problems of standard auto-tuning type for which existing generic methods can be reused. The outputs of concurrently executing sub-tuners are assembled by the framework into a solution to the original problem.

  • 26.
    Eljammaly, Mahmoud
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm2018In: ICPE '18 Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, ACM Digital Library, 2018, , p. 4p. 5-8Conference paper (Refereed)
  • 27.
    Eljammaly, Mahmoud
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Evaluation of the Tunability of a New NUMA-Aware Hessenberg Reduction Algorithm2016Report (Other academic)
  • 28.
    Eljammaly, Mahmoud
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment2018In: Parallel Processing and Applied Mathematics. PPAM 2017: Part 1 / [ed] Wyrzykowski R., Dongarra J., Deelman E., Karczewski K., Springer, 2018, p. 579-589Conference paper (Refereed)
    Abstract [en]

    The reduction of a general dense square matrix to Hessenberg form is a well known first step in many standard eigenvalue solvers. Although parallel algorithms exist, the Hessenberg reduction is one of the bottlenecks in AED, a main part in state-of-the-art software for the distributed multishift QR algorithm. We propose a new NUMA-aware algorithm that fits the context of the QR algorithm and evaluate the sensitivity of its algorithmic parameters. The proposed algorithm is faster than LAPACK for all problem sizes and faster than ScaLAPACK for the relatively small problem sizes typical for AED.

  • 29.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gustavson, Fred
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Jonsson, Isak
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software2004In: SIAM Review, Vol. 46, no 1, p. 3-45Article in journal (Refereed)
    Abstract [en]

    Matrix computations are both fundamental and ubiquitous in computational science and its vast application areas. Along with the development of more advanced computer systems with complex memory hierarchies, there is a continuing demand for new algorithms and library software that efficiently utilize and adapt to new architecture features. This article reviews and details some of the recent advances made by applying the paradigm of recursion to dense matrix computations on today's memory-tiered computer systems. Recursion allows for efficient utilization of a memory hierarchy and generalizes existing fixed blocking by introducing automatic variable blocking that has the potential of matching every level of a deep memory hierarchy. Novel recursive blocked algorithms offer new ways to compute factorizations such as Cholesky and QR and to solve matrix equations. In fact, the whole gamut of existing dense linear algebra factorization is beginning to be reexamined in view of the recursive paradigm. Use of recursion has led to using new hybrid data structures and optimized superscalar kernels. The results we survey include new algorithms and library software implementations for level 3 kernels, matrix factorizations, and the solution of general systems of linear equations and several common matrix equations. The software implementations we survey are robust and show impressive performance on today's high performance computing systems.

  • 30.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Johansson, Pedher
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Orbit and Bundle Stratification for Controllability and Observability Matrix Pairs in StratiGraph2004In: Proceedings of the 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS), 2004, p. 1-9Conference paper (Refereed)
    Abstract [en]

    The canonical structures of controllability and observability pairs (A,B) and (A,C) associated with a state-space system are studied under small perturbations. We show how previous work for general matrix pencils can be applied to the stratification of orbits and bundles of matrix pairs. A stratification provides qualitative information about the closure relation between canonical structures.We also present how the new results are used in StratiGraph, which is a software tool for computing and visualizing closure hierarchies.

  • 31.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Pedher
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Orbit and bundle stratification of controllability and observability matrix pairs in StratiGraph2004In: Proceedings MTNS 2004Article in journal (Refereed)
  • 32.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Pedher
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kreßner, Daniel
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    A Web Computing Environment for the SLICOT Library2001In: The Third NICONET Workshop on Numerical Control Software, p. 53-61Article in journal (Refereed)
    Abstract [en]

    A prototype web computing environment for computations related to the design and analysis of control systems using the SLICOT software library is presented. The web interface can be accessed from a standard world wide web browser with no need for additional software installations on the local machine. The environment provides user-friendly access to SLICOT routines where run-time options are specified by mouse clicks on appropriate buttons. Input data can be entered directly into the web interface by the user or uploaded from a local computer in a standard text format or in Matlab binary format. Output data is presented in the web browser window and possible to download in a number of different formats, including Matlab binary. The environment is ideal for testing the SLICOT software before performing a software installation or for performing a limited number of computations. It is also highly recommended for education as it is easy to use, and basically self-explanatory, with the users' guide integrated in the user interface.

  • 33.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Pedher
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Bounds for the distance between nearby Jordan and Kronecker structures in a closure hierarchy2003In: Journal of Mathematical Science, ISSN 1072-3374, Vol. 114, no 6, p. 1765-1779Article in journal (Refereed)
    Abstract [en]

    Computing the fine-canonical-structure elements of matrices and matrix pencils are ill-posed problems. Therefore, besides knowing the canonical structure of a matrix or a matrix pencil, it is equally important to know what are the nearby canonical structures that explain the behavior under small perturbations. Qualitative strata information is provided by our StratiGraph tool. Here, we present lower and upper bounds for the distance between Jordan and Kronecker structures in a closure hierarchy of an orbit or bundle stratification. This quantitative information is of importance in applications, e.g., distance to more degenerate systems (uncontrollability). Our upper bounds are based on staircase regularizing perturbations. The lower bounds are of EckartYoung type and are derived from a matrix representation of the tangent space of the orbit of a matrix or a matrix pencil. Computational results illustrate the use of the bounds.

  • 34.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Pedher
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Computation and presentation of graphs displaying closure hierarchies of Jordan and Kronecker structures2001In: Numerical Linear Algebra with Applications, ISSN 1070-5325, E-ISSN 1099-1506, Vol. 8, no 6-7, p. 381-399Article in journal (Refereed)
    Abstract [en]

    StratiGraph, a Java-based tool for computation and presentation of closure hierarchies of Jordan and Kronecker structures is presented. The tool is based on recent theoretical results on stratifications of orbits and bundles of matrices and matrix pencils. A stratification reveals the complete hierarchy of nearby structures. information critical for explaining the qualitative behaviour of linear systems under perturbations. StratiGraph facilitates the application of these theories and visualizes the resulting hierarchy as a graph. Nodes in the graph represent orbits or bundles of matrices or matrix pencils. Edges represent covering relations in the closure hierarchy. Given a Jordan or Kronecker structure, a user can obtain the complete information of nearby structures simply by mouse clicks on nodes of interest. This contribution gives an overview of the StratiGraph tool, presents its main functionalities and other features, and illustrates its use by sample applications.

    Copyright (C) 2001 John Wiley & Sons, Ltd.

  • 35.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Johansson, Stefan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Stratification of controllability and observability pairs: theory and use in applications2009In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 31, no 2, p. 203-226Article in journal (Refereed)
    Abstract [en]

    Cover relations for orbits and bundles of controllability and observability pairs associated with linear time-invariant systems are derived. The cover relations are combinatorial rules acting on integer sequences, each representing a subset of the Jordan and singular Kronecker structures of the corresponding system pencil. By representing these integer sequences as coin piles, the derived stratification rules are expressed as minimal coin moves between and within these piles, which satisfy and preserve certain monotonicity properties. The stratification theory is illustrated with two examples from systems and control applications, a mechanical system consisting of a thin uniform platform supported at both ends by springs, and a linearized Boeing 747 model. For both examples, nearby uncontrollable systems are identified as subsets of the complete closure hierarchy for the associated system pencils.

  • 36.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    The Set of 2-by-3 Matrix Pencils - Kronecker Structures and Their Transitions Under Perturbations1996In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, Vol. 17, no 1, p. 1-34Article in journal (Refereed)
    Abstract [en]

    The set (or family) of 2-by-3 matrix pencils A-lambda B comprises 18 structurally different Kronecker structures (canonical forms). The algebraic and geometric characteristics of the generic and the 17 nongeneric cases are examined in full detail. The complete closure hierarchy of the orbits of all different Kronecker structures is derived and presented in a closure graph that shows how the structures relate to each other in the la-dimensional space spanned by the set of 2-by-3 pencils. Necessary conditions on perturbations for transiting from the orbit of one Kronecker structure to another in the closure hierarchy are presented in a labeled closure graph. The node and are labels shows geometric characteristics of an orbit's Kronecker structure and the change of geometric characteristics when transiting to an adjacent node, respectively. Computable normwise bounds for the smallest perturbations (delta A, delta B) of a generic 2-by-3 pencil A lambda B such that (A+delta A)-lambda(B+delta B) has a specific nongeneric Kronecker structure are presented. First, explicit expressions for the perturbations that transfer A-lambda B to a specified nongeneric form are derived. In this context tractable and intractable perturbations are defined. Second, a modified GUPTRI that computes a specified Kronecker structure of a generic pencil is used. Perturbations devised to impose a certain nongeneric structure are computed in a way that guarantees one will find a Kronecker canonical form (KCF) on the closure of the orbit of the intended KCF. Both approaches are illustrated by computational experiments. Moreover, a study of the behaviour of the nongeneric structures under random perturbations in finite precision arithmetic (using the GUPTRI software) show for which sizes of perturbations the structures are invariant and also that structure transitions occur in accordance with the closure hierarchy. Finally, some of the results are extended to the general m-by-(m+1) case.

  • 37.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Jonsson, Isak
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Combining Explicit and Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on Distributed Memory Platforms2004In: Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference, Springer , 2004, p. 742-750Conference paper (Refereed)
  • 38.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Jonsson, Isak
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Compting Center North (HPC2N).
    RECSY and SCASY library software: recursive blocked and parallel algorithms for Sylvester-type matrix equations with some applications2009In: Parallel scientific computing and optimization: advances and applications / [ed] Raimondas Čiegis, David Henty, Bo Kågström, Julius Žilinskas, Springer-Verlag New York, 2009, p. 3-24Chapter in book (Other academic)
  • 39.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Jonsson, Isak
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Recursive Blocked Algorithms for Solving Periodic Triangular Sylvester-Type Matrix Equations2007In: Applied Parallel Computing - State of the Art in Scientific Computing: 8th International Workshop, PARA 2006, Springer , 2007, p. 531-539Conference paper (Refereed)
    Abstract [en]

    Recently, recursive blocked algorithms for solving triangular one-sided and two-sided Sylvester-type equations were introduced by Jonsson and Kagstrom. This elegant yet simple technique enables an automatic variable blocking that has the potential of matching the memory hierarchies of today's HPC systems. The main parts of the computations are performed as level 3 general matrix multiply and add (GEMM) operations. We extend and apply the recursive blocking technique to solving periodic Sylvester-type matrix equations. Successive recursive splittings are performed on 3-dimensional arrays, where the third dimension represents the periodicity of a matrix equation.

  • 40.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kagstrom, Bo
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kressner, Daniel
    Shao, Meiyue
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Algorithm 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation2015In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 41, no 4, article id 29Article in journal (Refereed)
    Abstract [en]

    Library software implementing a parallel small-bulge multishift QR algorithm with Aggressive Early Deflation (AED) targeting distributed memory high-performance computing systems is presented. Starting from recent developments of the parallel multishift QR algorithm [Granat et al., SIAM J. Sci. Comput. 32(4), 2010], we describe a number of algorithmic and implementation improvements. These include communication avoiding algorithms via data redistribution and a refined strategy for balancing between multishift QR sweeps and AED. Guidelines concerning several important tunable algorithmic parameters are provided. As a result of these improvements, a computational bottleneck within AED has been removed in the parallel multishift QR algorithm. A performance model is established to explain the scalability behavior of the new parallel multishift QR algorithm. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm.

  • 41.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Algorithm 904: the SCASY library - parallel solvers for Sylvester-type matrix equations with applications in condition estimation, part II2010In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 37, no 3, article id 33Article in journal (Refereed)
    Abstract [en]

    We continue our presentation of parallel ScaLAPACK-style algorithms for solving Sylvester-type matrix equations. In Part II, we present SCASY (SCAlable SYlvester solvers), a state-of-the-art HPC software library for solving 44 sign and transpose variants of eight common standard and generalized Sylvester-type matrix equations.

  • 42.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Direct Eigenvalue Reordering in a Product of Matrices in Periodic Schur Form2006In: SIAM Journal on Matrix Analysis and Applications, ISSN 0895-4798, E-ISSN 1095-7162, Vol. 28, no 1, p. 285-300Article in journal (Refereed)
    Abstract [en]

    A direct method for eigenvalue reordering in a product of a K-periodic matrix sequence in periodic or extended periodic real Schur form is presented and analyzed. Each reordering of two adjacent sequences of diagonal blocks is performed tentatively to guarantee backward stability and involves solving a K-periodic Sylvester equation (PSE) and constructing a K-periodic sequence of orthogonal transformation matrices. An error analysis of the direct reordering method is presented, and results from computational experiments confirm the stability and accuracy of the method for well-conditioned as well as ill-conditioned problems. These include matrix sequences with fixed and time-varying dimensions, and sequences of small and large periodicity.

  • 43.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Evaluating Parallel Algorithms for Solving Sylvester-Type Matrix Equations: Direct Transformation-Based versus Iterative Matrix-Sign-Function-Based Methods.2006In: Applied Parallel Computing - State of the Art in Scientific Computing: 7th International Workshop, PARA 2004, Springer , 2006, p. 717-729Conference paper (Refereed)
  • 44.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Parallel Algorithms and Condition Estimators for Standard and Generalized Triangular Sylvester-Type Matrix Equations2006In: Applied Parallel Computing - State of the Art in Scientific Computing: 7th International Workshop, PARA 2004, Springer , 2006, p. 127-136Conference paper (Refereed)
    Abstract [en]

    We discuss parallel algorithms for solving eight common standard and generalized triangular Sylvester-type matrix equation. Our parallel algorithms are based on explicit blocking, 2D block-cyclic data distribution of the matrices and wavefront-like traversal of the right hand side matrices while solving small-sized matrix equations at different nodes and updating the rest of the right hand side using level 3 operations. We apply the triangular solvers in condition estimation, developing parallel sep(-1)-estimators. Some experimental results are presented.

  • 45.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parallel Solvers for Sylvester-type Matrix Equations with Applications in Condition Estimation, Part I: Theory and Algorithms2010In: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 37, no 3, p. 32:1-32:32Article in journal (Refereed)
    Abstract [en]

    Parallel ScaLAPACK-style algorithms for solving eight common standard and generalized Sylvester-type matrix equations and various sign and transposed variants are presented. All algorithms are blocked variants based on the Bartels--Stewart method and involve four major steps: reduction to triangular form, updating the right-hand side with respect to the reduction, computing the solution to the reduced triangular problem, and transforming the solution back to the original coordinate system. Novel parallel algorithms for solving reduced triangular matrix equations based on wavefront-like traversal of the right-hand side matrices are presented together with a generic scalability analysis. These algorithms are used in condition estimation and new robust parallel sep − 1-estimators are developed. Experimental results from three parallel platforms, including results from a mixed OpenMP/MPI platform, are presented and analyzed using several performance and accuracy metrics. The analysis includes results regarding general and triangular parallel solvers as well as parallel condition estimators.

  • 46. Granat, Robert
    et al.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    SCASY users' guide: release 1.02010Report (Refereed)
    Abstract [en]

    We present the functionality, the contents, and the proper usage of the latest release of the SCASY software library

  • 47. Granat, Robert
    et al.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kressner, Daniel
    A novel parallel QR algorithm for hybrid distributed memory HPC systems2010In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 32, no 4, p. 2345-2378Article in journal (Refereed)
    Abstract [en]

    A novel variant of the parallel QR algorithm for solving dense nonsymmetric eigenvalue problems on hybrid distributed high performance computing systems is presented. For this purpose, we introduce the concept of multiwindow bulge chain chasing and parallelize aggressive early deflation. The multiwindow approach ensures that most computations when chasing chains of bulges are performed in level 3 BLAS operations, while the aim of aggressive early deflation is to speed up the convergence of the QR algorithm. Mixed MPI-OpenMP coding techniques are utilized for porting the codes to distributed memory platforms with multithreaded nodes, such as multicore processors. Numerous numerical experiments confirm the superior performance of our parallel QR algorithm in comparison with the existing ScaLAPACK code, leading to an implementation that is one to two orders of magnitude faster for sufficiently large problems, including a number of examples from applications.

  • 48.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kressner, Daniel
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    A Parallel Schur Method for Solving Continuous-Time Algebraic Riccati Equations2008In: Conference Name: 2008 IEEE Conference on Computer-Aided Control System Design (CACSD) Conference Location: San Antonio, TX, 2008, p. 51-56Conference paper (Refereed)
    Abstract [en]

    Numerical algorithms for solving the continuous-time algebraic Riccati matrix equation on a distributed memory parallel computer are considered. In particular, it is shown that the Schur method, based on computing the stable invariant subspace of a Hamiltonian matrix, can be parallelized in an efficient and scalable way. Our implementation employs the state-of-the-art library ScaLAPACK as well as recently developed parallel methods for reordering the eigenvalues in a real Schur form. Some experimental results are presented, confirming the scalability of our implementation and comparing it with an existing implementation of the matrix sign iteration from the PLiCOC library.

  • 49.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kressner, Daniel
    ETH Zürich.
    Computing Periodic Deflating Subspaces Associated with a Specified Set of Eigenvalues2007In: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 47, no 4, p. 763-791Article in journal (Refereed)
    Abstract [en]

    We present a direct method for reordering eigenvalues in the generalized periodic real Schur form of a regular K-cyclic matrix pair sequence (A (k) ,E (k) ). Following and generalizing existing approaches, reordering consists of consecutively computing the solution to an associated Sylvester-like equation and constructing K pairs of orthogonal matrices. These pairs define an orthogonal K-cyclic equivalence transformation that swaps adjacent diagonal blocks in the Schur form. An error analysis of this swapping procedure is presented, which extends existing results for reordering eigenvalues in the generalized real Schur form of a regular pair (A,E). Our direct reordering method is used to compute periodic deflating subspace pairs corresponding to a specified set of eigenvalues. This computational task arises in various applications related to discrete-time periodic descriptor systems. Computational experiments confirm the stability and reliability of the presented eigenvalue reordering method.

  • 50.
    Granat, Robert
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North). Umeå University, Faculty of Science and Technology, HPC2N (High Performance Computing Centre North).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kressner, Daniel
    ETH, Zürich.
    MATLAB Tools for Solving Periodic Eigenvalue Problems2007In: Proceedings of the third IFAC Workshop on Periodic Control Systems (PSYCO’07), International Federation of Automatic Control , 2007, p. 6 pages-Conference paper (Refereed)
12 1 - 50 of 90
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf