umu.sePublikationer
Ändra sökning
Avgränsa sökresultatet
12 51 - 91 av 91
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Träffar per sida
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
Markera
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 51.
    Granat, Robert
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr). Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap.
    Kressner, Daniel
    ETH, Zürich.
    MATLAB Tools for Solving Periodic Eigenvalue Problems2007Ingår i: Proceedings of the third IFAC Workshop on Periodic Control Systems (PSYCO’07), International Federation of Automatic Control , 2007, s. 6 pages-Konferensbidrag (Refereegranskat)
  • 52.
    Granat, Robert
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kressner, Daniel
    ETH, Zürich.
    Parallel Eigenvalue Reordering in Real Schur Forms2009Ingår i: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 21, nr 9, s. 1225-1250Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A parallel algorithm for reordering the eigenvalues in the real Schur form of a matrix is presented and discussed. Our novel approach adopts computational windows and delays multiple outside-window updates until each window has been completely reordered locally. By using multiple concurrent windows the parallel algorithm has a high level of concurrency, and most work is level 3 BLAS operations. The presented algorithm is also extended to the generalized real Schur form. Experimental results for ScaLAPACK-style Fortran 77 implementations on a Linux cluster confirm the efficiency and scalability of our algorithms in terms of more than 16 times of parallel speedup using 64 processors for large-scale problems. Even on a single processor our implementation is demonstrated to perform significantly better compared with the state-of-the-art serial implementation.

  • 53.
    Granat, Robert
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kressner, Daniel
    ETH, Zürich.
    Reordering the Eigenvalues of a Periodic Matrix Pair with Applications in Control2006Ingår i: 2006 IEEE Conferences on Computer Aided Control System Design (CACSD), 2006, s. 25-30Konferensbidrag (Refereegranskat)
  • 54.
    Granat, Robert
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Poromaa, Peter
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Parallel ScaLAPACK-Style Algorithms for Solving Continuous-Time Sylvester Matrix Equations2003Ingår i: Euro-Par 2003 Parallel Processing: Conference Name: 9th International Euro-Par Conference Conference Location: Klagenfurt, Austria, Springer , 2003, s. 800-809Konferensbidrag (Refereegranskat)
    Abstract [en]

    An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X - Xop(B) = C, where op(A) denotes A or its transpose AT, is presented. The parallel algorithm is based on explicit blocking of the Bartels-Stewart method. An initial transformation of the coefficient matrices A and B to Schur form leads to a reduced triangular matrix equation. We use different matrix traversing strategies to handle the transposes in the problem to solve, leading to different new parallel wave-front algorithms. We also present a strategy to handle the problem when 2 x 2 diagonal blocks of the matrices in Schur form, corresponding to complex conjugate pairs of eigenvalues, are split between several blocks in the block partitioned matrices. Finally, the solution of the reduced matrix equation is transformed back to the originally coordinate system. The implementation acts in a ScaLAPACK environment using 2-dimensional block cyclic mapping of the matrices onto a rectangular grid of processes. Real performance results are presented which verify that our parallel algorithms are reliable and scalable.

  • 55.
    Gusev, Sergei
    et al.
    Department of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia.
    Johansson, Stefan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Shiriaev, Anton
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för tillämpad fysik och elektronik.
    Varga, Andras
    Institute of Robotics and Mechatronics, German Aerospace Center, DLR, Germany.
    A numerical evaluation of solvers for the periodic riccati differential equation2010Ingår i: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 50, nr 2, s. 301-329Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Efficient and accurate structure exploiting numerical methods for solvingthe periodic Riccati differential equation (PRDE) are addressed. Such methods areessential, for example, to design periodic feedback controllers for periodic controlsystems. Three recently proposed methods for solving the PRDE are presented andevaluated on challenging periodic linear artificial systems with known solutions and applied to the stabilization of periodic motions of mechanical systems. The first twomethods are of the type multiple shooting and rely on computing the stable invariantsubspace of an associated Hamiltonian system. The stable subspace is determinedusing either algorithms for computing an ordered periodic real Schur form of a cyclicmatrix sequence, or a recently proposed method which implicitly constructs a stabledeflating subspace from an associated lifted pencil. The third method reformulatesthe PRDE as a convex optimization problem where the stabilizing solution is approximatedby its truncated Fourier series. As known, this reformulation leads to a semidefiniteprogramming problem with linear matrix inequality constraints admitting aneffective numerical realization. The numerical evaluation of the PRDE methods, withfocus on the number of states (n) and the length of the period (T ) of the periodicsystems considered, includes both quantitative and qualitative results.

  • 56.
    Gusev, Sergei
    et al.
    Department of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia.
    Johansson, Stefan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Shiriaev, Anton
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för tillämpad fysik och elektronik.
    Varga, Andras
    Institute of Robotics and Mechatronics, German Aerospace Center, DLR, Oberpfaffenhofen, Germany.
    A Numerical Evaluation of Solvers for the Periodic Riccati Differential Equation2009Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Efficient and robust numerical methods for solving the periodic Riccati differential equation (PRDE) are addressed. Such methods are essential, for example, when deriving feedback controllers for orbital stabilization of underactuated mechanical systems. Two recently proposed methods for solving the PRDE are presented and evaluated on artificial systems and on two stabilization problems originating from mechanical systems with unstable dynamics. The first method is of the type multiple shooting and relies on computing the stable invariant subspace of an associated Hamiltonian system. The stable subspace is determined using algorithms for computing a reordered periodic real Schur form of a cyclic matrix sequence, and a recently proposed method which implicitly constructs a stable subspace from an associated lifted pencil. The second method reformulates the PRDE as a maximization problem where the stabilizing solution is approximated with finite dimensional trigonometric base functions. By doing this reformulation the problem turns into a semidefinite programming problem with linear matrix inequality constraints.

  • 57.
    Gustavson, Fred
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Distributed SBP Cholesky factorization algorithms with near-optimal scheduling2009Ingår i: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 36, nr 2, s. 11:1-11:25Artikel i tidskrift (Refereegranskat)
  • 58.
    Gustavson, Fred
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion2012Ingår i: ACM Transactions on Mathematical Software, ISSN 0098-3500, E-ISSN 1557-7295, Vol. 38, nr 3, s. 17:1-17:32Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Techniques and algorithms for efficient in-place conversion to and from standard and blocked matrix storage formats are described. Such functionality is required by numerical libraries that use different data layouts internally. Parallel algorithms and a software package for in-place matrix storage format conversion based on in-place matrix transposition are presented and evaluated. A new algorithm for in-place transposition which efficiently determines the structure of the transposition permutation a priori is one of the key ingredients. It enables effective load balancing in a parallel environment.

  • 59.
    Gustavson, Fred
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage2007Ingår i: Applied Parallel Computing - State of the Art in Scientific Computing: 8th International Workshop, PARA 2006, Springer , 2007, s. 550-559Konferensbidrag (Refereegranskat)
  • 60.
    Johansson, Stefan
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Shiriaev, Anton
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Tillämpad fysik och elektronik.
    Varga, Andras
    Institute of Robotics and Mechatronics, German Aerospace Center, DLR, Oberpfaffenhofen, Germany.
    Comparing One-shot and Multi-shot Methods for Solving Periodic Riccati Differential Equations2007Ingår i: Proceedings of the third IFAC Workshop on Periodic Control Systems (PSYCO’07), International Federation of Automatic Control , 2007, s. 1-6Konferensbidrag (Refereegranskat)
    Abstract [en]

    One-shot methods and recently proposed multi-shot methods for computing stabilizing solutions of continuous-time periodic Riccati differential equations are examined and evaluated on two test problems: (i) a stabilization problem for an artificially constructed time-varying linear system for which the exact solution is known; (ii) a nonlinear stabilization problem for a devil stick juggling model along a periodic trajectory. The numerical comparisons are performed using both general purpose and symplectic integration methods for solving the associated Hamiltonian differential systems. In the multi-shot method a stable subspace is determined using recent algorithms for computing a reordered periodic real Schur form. The results show the increased accuracy achievable by combining multi-shot methods with structure preserving (symplectic) integration techniques.

  • 61.
    Johansson, Stefan
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Van Dooren, Paul
    Department of Mathematical Engineering, Université catholique de Louvain.
    Stratification of full rank polynomial matrices2013Ingår i: Linear Algebra and its Applications, ISSN 0024-3795, E-ISSN 1873-1856, Vol. 439, nr 4, s. 1062-1090Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We show that perturbations of polynomial matrices of full normal-rank can be analyzed viathe study of perturbations of companion form linearizations of such polynomial matrices.It is proved that a full normal-rank polynomial matrix has the same structural elements asits right (or left) linearization. Furthermore, the linearized pencil has a special structurethat can be taken into account when studying its stratification. This yields constraintson the set of achievable eigenstructures. We explicitly show which these constraints are.These results allow us to derive necessary and sufficient conditions for cover relationsbetween two orbits or bundles of the linearization of full normal-rank polynomial matrices.The stratification rules are applied to and illustrated on two artificial polynomial matricesand a half-car passive suspension system with four degrees of freedom.

  • 62.
    Jonsson, Isak
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Recursive Blocked Algorithms for Solving Triangular Systems - Part I: One-Sided and Coupled Sylvester-Type Matrix Equations2002Ingår i: ACM Transactions on Mathematical Software, Vol. 28, nr 4, s. 392-415Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Triangular matrix equations appear naturally in estimating the condition numbers of matrix equations and different eigenspace computations, including block-diagonalization of matrices and matrix pairs and computation of functions of matrices. To solve a triangular matrix equation is also a major step in the classical Bartels-Stewart method for solving the standard continuous-time Sylvester equation (AX-XB=C). We present novel recursive blocked algorithms for solving one-sided triangular matrix equations, including the continuous-time Sylvester and Lyapunov equations, and a generalized coupled Sylvester equation. The main parts of the computations are performed as level-3 general matrix multiply and add (GEMM) operations. In contrast to explicit standard blocking techniques, our recursive approach leads to an automatic variable blocking that has the potential of matching the memory hierarchies of today's HPC systems. Different implementation issues are discussed, including when to terminate the recursion, the design of new optimized superscalar kernels for solving leaf-node triangular matrix equations efficiently, and how parallelism is utilized in our implementations. Uniprocessor and SMP parallel performance results of our recursive blocked algorithms and corresponding routines in the state-of-the-art libraries LAPACK and SLICOT are presented. The performance improvements of our recursive algorithms are remarkable, including 10-fold speedups compared to standard algorithms.

  • 63.
    Jonsson, Isak
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Recursive Blocked Algorithms for Solving Triangular Systems - Part II: Two-Sided and Generalized Sylvester and Lyapunov Matrix Equations2002Ingår i: ACM Transactions on Mathematical Software, Vol. 28, nr 4, s. 416-435Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We continue our study of high-performance algorithms for solving triangular matrix equations. They appear naturally in different condition estimation problems for matrix equations and various eigenspace computations, and as reduced systems in standard algorithms. Building on our successful recursive approach applied to one-sided matrix equations (Part I), we now present novel recursive blocked algorithms for two-sided matrix equations, which include matrix product terms such as AX B-T. Examples are the discrete-time standard and generalized Sylvester and Lyapunov equations. The means for achieving high performance is the recursive variable blocking, which has the potential of matching the memory hierarchies of today's high-performance computing systems, and level-3 computations which mainly are performed as GEMM operations. Different implementation issues are discussed, including the design of efficient new algorithms for two-sided matrix products. We present uniprocessor and SMP parallel performance results of recursive blocked algorithms and routines in the state-of-the-art SLICOT library. Although our recursive algorithms with optimized kernels for the two-sided matrix equations perform more operations, the performance improvements are remarkable, including 10-fold speedups or more, compared to standard algorithms.

  • 64.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kagstrom, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Efficient Reduction from Block Hessenberg Form to Hessenberg Form Using Shared Memory2012Ingår i: Applied parallel and scientific computing: Part II, 2012, s. 258-268Konferensbidrag (Refereegranskat)
    Abstract [en]

    A new cache-efficient algorithm for reduction from block Hessenberg form to Hessenberg form is presented and evaluated. The algorithm targets parallel computers with shared memory. One level of look-ahead in combination with a dynamic load-balancing scheme significantly reduces the idle time and allows the use of coarse-grained tasks. The coarse tasks lead to high-performance computations on each processor/core. Speedups close to 13 over the sequential unblocked algorithm have been observed on a dual quad-core machine using one thread per core.

  • 65.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Improving Perfect Parallelism2014Ingår i: Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski, Springer Berlin/Heidelberg, 2014, Vol. 8384, s. 76-85Konferensbidrag (Refereegranskat)
    Abstract [en]

    We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.

  • 66.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    A framework for dynamic node-scheduling of two-sided blocked matrix computations2009Ingår i: Applied Parallel Computing - State of the Art in Parallel and Scientific Computing: 9th International Workshop, PARA 2008, 2009Konferensbidrag (Refereegranskat)
  • 67.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Efficient reduction from block Hessenberg form to Hessenberg form using shared memory2010Ingår i: PARA 2010: State of the Art in Scientific and Parallel Computing, Reykjavik, June 6-9, 2010, 2010Konferensbidrag (Refereegranskat)
  • 68.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures2011Ingår i: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 37, nr 12, s. 771-782Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We consider parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant performs 20% of the computations in terms of matrix vector multiplications. We show that a two-stage approach consisting of an intermediate reduction to block Hessenberg form speeds up the reduction by avoiding matrix vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2. (C) 2011 Elsevier B.V. All rights reserved.

  • 69.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Wadbro, Eddie
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms2014Ingår i: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, nr 7, s. 271-288Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.

  • 70.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows2013Ingår i: Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers / [ed] Pekka Manninen and Per Öster, Springer Berlin/Heidelberg, 2013, s. 250-264Konferensbidrag (Refereegranskat)
    Abstract [en]

    Systems which are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating the very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK style algorithm is presented together with strong scalability results.

  • 71.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems2012Ingår i: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I / [ed] Wyrzykowski, R., Springer, 2012, s. 80-91Konferensbidrag (Refereegranskat)
    Abstract [en]

    The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller’s analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high order approximations of the Laplace operator.

  • 72.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Parallel solution of narrow banded diagonally dominant linear systems2012Ingår i: Applied Parallel and Scientific Computing, Pt II / [ed] Kristján Jónasson, Springer Berlin/Heidelberg, 2012, Vol. 7134, s. 280-290Konferensbidrag (Refereegranskat)
    Abstract [en]

    ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than complete block cyclic reduction. If the matrix is strictly dominant by rows, then the truncation error can be bounded directly in terms of the dominance factor and the size of the partitions. Our analysis includes new results applicable in our ongoing work of developing an efficient parallel solver.

  • 73.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    A Perturbation Analysis of the Generalized Sylvester Equation (AR-LB,DR-LE)=(C,F)1994Ingår i: SIAM Journal on Matrix Analysis and Applications, Vol. 15, nr 4, s. 1045-1060Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Perturbation and error bounds for the generalized Sylvester equation (AR-LB, DR-LE) = (C, F) are presented. An explicit expression for the normwise relative backward error associated with an approximate solution of the generalized Sylvester equation is derived and conditions when it can be much greater than the relative residual are given. This analysis is applicable to any method that solves the generalized Sylvester equation. A condition number that reflects the structure of the problem and a normwise forward error bound based on Dif-1[(A, D), (B, E)] and the residual are derived. The structure-preserving condition number can be arbitrarily smaller than a Dif-1-based condition number. The normwise error bound can be evaluated robustly and at moderate cost by using a reliable Dif-1 estimator. A componentwise LAPACK-style forward error bound that can be stronger than the normwise error bound is also presented. A componentwise approximate error bound that can be evaluated to a much lower cost is also proposed. Finally, some computational experiments that validate and evaluate the perturbation and error bounds are presented.

  • 74.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Management of Deep Memory Hierarchies - Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Computations2006Ingår i: Applied Parallel Computing - State of the Art in Scientific Computing: 7th International Workshop, PARA 2004, Springer , 2006, s. 21-32Konferensbidrag (Refereegranskat)
  • 75.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Dongarra, JackUniversity of Tennessee, Knoxville, TN, USA.Elmroth, ErikUmeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).Wasniewsky, JerzyTechnical University of Denmark, Lyngby.
    Applied Parallel Computing: large scale scientific and industrial problems: 4th International Workshop, PARA'98, Umeå, Sweden, June 14-17, 1998 : proceedings1998Proceedings (redaktörskap) (Refereegranskat)
    Abstract [en]

    This book constitutes the carefully refereed proceedings of the 4th International Workshop on Applied Parallel Computing, PARA'98, held in Umea, Sweden, in June 1998.The 75 revised papers presented were carefully reviewed and selected for inclusion in the book. The papers address a variety of topics in large scale scientific and industrial-strength computing, in particular high-performance computing and networking; tools, languages, and environments for parallel processing; scientific visualization and virtual reality; and future directions in high-performance computing and communication.

  • 76.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Elmroth, ErikUmeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).Dongarra, JackUniversity of Tennessee, Knoxville, TN, USA.Wasniewski, JerzyTechnical University of Denmark, Lyngby.
    Applied parallel computing. State of the art in scientific computing: 8th International Workshop, PARA 2006; Umeå, Sweden, June 2006, Revised Selected Papers2007Proceedings (redaktörskap) (Refereegranskat)
    Abstract [en]

    The Eighth International Workshop on Applied Parallel Computing (PARA 2006) was held in Umeå, Sweden, June 18–21, 2006. The workshop was organized by the High Performance Computing Center North (HPC2N) and the Department of Computing Science at Umeå University. The general theme for PARA 2006 was “State of the Art in Scientific and Parallel Computing.” Topics covered at PARA 2006 included basic algorithms and software for scientific, parallel and grid computing, tools and environments for developing high-performance computing applications, as well as a broad spectrum of applications from science and engineering. The workshop included 7 plenary keynote presentations, 15 invited minisymposia organized in 30 sessions, and 16 sessions of contributed talks. The minisymposia and the contributed talks were held in five to six parallel sessions. The main workshop program was preceded by two half-day tutorials. In total, 205 presentations were held at PARA 2006, by speakers representing 28 countries. Extended abstracts for all presentations were made available at the PARA 2006 Web site (www.hpc2n.umu.se/para06). The reviewing process was performed in two stages for evaluation of originality, appropriateness, and significance. In the first stage, extended abstracts were reviewed for selection of contributions to be presented at the workshop. In the second stage the full papers submitted after the workshop were reviewed. In total, 120 papers were selected for publication in this peer-reviewed post-conference proceedings. A number of people contributed in different regards to the organization and the accomplishment of PARA 2006. First of all the Local Organization Committee did a greatly appreciated and enthusiastic job. We also acknowledge the following people for the assistance and support during the workshop days: Yvonne Löwstedt and Anne-Lie Persson; Niklas Edmundsson, Roger Oscarsson, and Mattias Wadenstein. A special thanks goes to the PARA 2006 secretary, Lena Hellman, to Anders Backman and Björn Torkelsson for designing and managing the PARA 2006Web site including the electronic paper submission system, powered by Commence, and to Mats Nylén and Mikael Rännar for their professional assistance in compiling and editing the PARA 2006 program, the booklet of extended abstracts, and the final proceedings. PARA 2006 would not have been possible without the personal involvement of all these fine people. We also greatly acknowledge all minisymposia organizers, the review coordinators and all the referees for their evaluations in the second review stage, which included several rounds and resulted in these professionally peer-reviewed post-workshop proceedings. Finally, we would also like to thank the sponsoring institutions for their generous financial support. VI Preface Since 1996 the international PARA conferences have become biennial and are organized by one of the Nordic countries. The three first workshops including PARA 1996 and the last PARA 2004 were held in Lyngby, Denmark. The other three, besides this one, were held in Umeå, Sweden (PARA 1998), in Bergen, Norway (PARA 2000), and in Espoo, Finland (PARA 2002). The PARA 2008 workshop will take place in Trondheim, Norway, May 13–16, 2008.

  • 77.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Johansson, Stefan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Johansson, Pedher
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    StratiGraph Tool: Matrix Stratifications in Control Applications2012Ingår i: Control and Optimization with Differential-Algebraic Constraints / [ed] Lorenz T. Biegler, Stephen L. Champbell, Volker Mehrmann, Philadelphia: Society for Industrial and Applied Mathematics, 2012, s. 79-103Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    In this contribution, the software tool StratiGraph for computing and visualizing closurehierarchy graphs associated with different orbit and bundle stratifications is presented. Inaddition, we review the underlying theory and illustrate how StratiGraph can be used toanalyze descriptor system models via their associated system pencils. The stratificationtheory provides information for a deeper understanding of how the dynamics of a controlsystem and its system characteristics behave under perturbations.

  • 78.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Johansson, Stefan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Johansson, Pedher
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    StratiGraph Tool: Matrix Stratifications in Control Applications2011Rapport (Övrigt vetenskapligt)
    Abstract [en]

    In this contribution, the software tool StratiGraph for computing and visualizing closure hierarchy graphs associated with different orbit and bundle stratifications is presented. In addition, we review the underlying theory and illustrate how StratiGraph can be used to analyze descriptor system models via their associated system pencils. The stratification theory provides information for a deeper understanding of how the dynamics of a control system and its system characteristics behave under perturbations.

  • 79.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kressner, Daniel
    Computing codimensions and generic canonical forms for generalized matrix products2011Ingår i: The Electronic Journal of Linear Algebra, ISSN 1537-9582, E-ISSN 1081-3810, Vol. 22, s. 277-309Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A generalized matrix product can be formally written as Lambda(sp)(p) Lambda(sp-1)(p-1) ... Lambda(s2)(2) Lambda(s1)(1) where s(i) is an element of {- 1,+ 1} and ( A(1), ..., A(p)) is a tuple of ( possibly rectangular) matrices of suitable dimensions. The periodic eigenvalue problem related to such a product represents a nontrivial extension of generalized eigenvalue and singular value problems. While the classification of generalized matrix products under eigenvalue-preserving similarity transformations and the corresponding canonical forms have been known since the 1970's, finding generic canonical forms has remained an open problem. In this paper, we aim at such generic forms by computing the codimension of the orbit generated by all similarity transformations of a given generalized matrix product. This can be reduced to computing the so called cointeractions between two different blocks in the canonical form. A number of techniques are applied to keep the number of possibilities for different types of cointeractions limited. Nevertheless, the matter remains highly technical; we therefore also provide a computer program for finding the codimension of a canonical form, based on the formulas developed in this paper. A few examples illustrate how our results can be used to determine the generic canonical form of least codimension. Moreover, we describe an algorithm and provide software for extracting the generically regular part of a generalized matrix product.

  • 80.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kressner, Daniel
    ETH, Zürich.
    Multishift Variants of the QZ Algorithm with Aggressive Early Deflation2006Ingår i: SIAM Journal on Matrix Analysis and Applications, Vol. 29, nr 1, s. 199-227Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    New variants of the QZ algorithm for solving the generalized eigenvalue problem are proposed. An extension of the small-bulge multishift QR algorithm is developed, which chases chains of many small bulges instead of only one bulge in each QZ iteration. This allows the effective use of level 3 BLAS operations, which in turn can provide efficient utilization of high performance computing systems with deep memory hierarchies. Moreover, an extension of the aggressive early deflation strategy is proposed, which can identify and de. ate converged eigenvalues long before classic deflation strategies would. Consequently, the number of overall QZ iterations needed until convergence is considerably reduced. As a third ingredient, we reconsider the deflation of infinite eigenvalues and present a new deflation algorithm, which is particularly effective in the presence of a large number of infinite eigenvalues. Combining all these developments, our implementation significantly improves existing implementations of the QZ algorithm. This is demonstrated by numerical experiments with random matrix pairs as well as with matrix pairs arising from various applications.

  • 81.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Kressner, Daniel
    ETH, Zürich.
    Quintana-Orti, E. S.
    Universidad Jaume I, Castellón, Spain.
    Quintana-Orti, G.
    Universidad Jaume I, Castellón, Spain.
    Blocked Algorithms For the Reduction to Hessenberg-Triangular Form Revisited2008Ingår i: BIT Numerical Mathematics, Vol. 48, nr 3, s. 563-584Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We present two variants of Moler and Stewart's algorithm for reducing a matrix pair to Hessenberg-triangular (HT) form with increased data locality in the access to the matrices. In one of these variants, a careful reorganization and accumulation of Givens rotations enables the use of efficient level 3 BLAS. Experimental results on four different architectures, representative of current high performance processors, compare the performances of the new variants with those of the implementation of Moler and Stewart's algorithm in subroutine DGGHRD from LAPACK, Dackland and Kagstrom's two-stage algorithm for the HT form, and a modified version of the latter which requires considerably less flops.

  • 82.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kressner, Daniel
    Shao, Meiyue
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On aggressive early deflation in parallel variants of the QR algorithm2012Ingår i: Applied parallel and scientific computing, PT I, Berlin, Heidelberg: Springer, 2012, s. 1-10Konferensbidrag (Refereegranskat)
    Abstract [en]

    The QR algorithm computes the Schur form of a matrix and is by far the most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift and aggressive early deflation (AED) techniques have led to significantly more efficient sequential implementations of the QR algorithm during the last decade. More recently, these techniques have been incorporated in a novel parallel QR algorithm on hybrid distributed memory HPC systems. While leading to significant performance improvements, it has turned out that AED may become a computational bottleneck as the number of processors increases. In this paper, we discuss a two-level approach for performing AED in a parallel environment, where the lower level consists of a novel combination of AED with the pipelined QR algorithm implemented in the ScaLAPACK routine PDLAHQR. Numerical experiments demonstrate that this new implementation further improves the performance of the parallel QR algorithm.

  • 83.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kressner, Daniel
    Shao, Meiyue
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On aggressive early deflation in parallel variants of the QR algorithm2011Konferensbidrag (Refereegranskat)
    Abstract [en]

    The QR algorithm computes the Schur form of a matrix and is by far the most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift and aggressive early deflation (AED) techniques have led to significantly more efficient sequential implementations of the QR algorithm during the last decade. More recently, these techniques have been incorporated in a novel parallel QR algorithm on hybrid distributed memory HPC systems. While leading to significant performance improvements, it has turned out that AED may become a computational bottleneck as the number of processors increases. In this paper, we discuss a two-level approach for performing AED in a parallel environment, where the lower level consists of a novel combination of AED with the pipelined QR algorithm implemented in the ScaLAPACK routine PDLAHQR. Numerical experiments demonstrate that this new iplementation further improves the performance of the parallel QR algorithm.

  • 84.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Ling, Per
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Van Loan, Charles
    Cornell University, NY, USA.
    Algorithm 784: GEMM-based level 3 BLAS: Portability and Optimization Issues1998Ingår i: ACM Transactions on Mathematical Software, Vol. 24, nr 3, s. 303-316Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This companion article discusses portability and optimization issues of the GEMM-based level 3 BLAS model implementations and the performance evaluation benchmark. All software comes in all four data types (single- and double-precision, real and complex) and are designed to be easy to implement and use on different platforms. Each of the GEMM-based routines has a few machine-dependent parameters that specify internal block. sizes, cache characteristics, and branch points for alternative code sections. These parameters provide means for adjustment to the characteristics of a memory hierarchy.

  • 85.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Ling, Per
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Van Loan, Charles
    Cornell University, NY, USA.
    GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark1998Ingår i: ACM Transactions on Mathematical Software, Vol. 24, nr 3, s. 268-302Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures the development of optimal level 3 BLAS code is costly and time consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to reduce effectively data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark. is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.

  • 86.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Poromaa, Peter
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Computing Eigenspaces with Specified Eigenvalues of a Regular Matrix Pair (A, B) and Condition Estimation: Theory, Algorithms and Software1996Ingår i: Numerical Algorithms, Vol. 12, nr 3-4, s. 369-407Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Theory, algorithms and LAPACK-style software for computing a pair of deflating subspaces with specified eigenvalues of a regular matrix pair (A, B) and error bounds for computed quantities (eigenvalues and eigenspaces) are presented. The reordering of specified eigenvalues is performed with a direct orthogonal transformation method with guaranteed numerical stability. Each swap of two adjacent diagonal blocks in the real generalized Schur form, where at least one of them corresponds to a complex conjugate pair of eigenvalues, involves solving a generalized Sylvester equation and the construction of two orthogonal transformation matrices from certain eigenspaces associated with the diagonal blocks. The swapping of two 1 x 1 blocks is performed using orthogonal (unitary) Givens rotations. The error bound's are based on estimates of condition numbers for eigenvalues and eigenspaces. The software computes reciprocal values of a condition number for an individual eigenvalue (or a cluster of eigenvalues), a condition number for an eigenvector (or eigenspace), and spectral projectors onto a selected cluster. By computing reciprocal values we avoid overflow. Changes in eigenvectors and eigenspaces are measured by their change in angle. The condition numbers yield both asymptotic and global error bounds. The asymptotic bounds are only accurate for small perturbations (E, F) of (A, B), while the global bounds work for all \\(E, F)\\ up to a certain bound, whose size is determined by the conditioning of the problem. It is also shown how these upper bounds can be estimated. Fortran 77 software that implements our algorithms for reordering eigenvalues, computing (left and right) deflating subspaces with specified eigenvalues and condition number estimation are presented. Computational experiments that illustrate the accuracy, efficiency and reliability of our software are also described.

  • 87.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Poromaa, Peter
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    LAPACK-Style Algorithms and Software for Solving the Generalized Sylvester Equation and Estimating the Separation Between Regular Matrix Pairs1996Ingår i: ACM Transactions on Mathematical Software, Vol. 22, nr 1, s. 78-103Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Robust and fast software to solve the generalized Sylvester equation (AR - LB = C, DR - LE = F) for unknowns R and L is presented. This special linear system of equations, and its transpose, arises in computing error bounds for computed eigenvalues and eigenspaces of the generalized eigenvalue problem S - lambda T, in computing deflating subspaces of the same problem, and in computing certain decompositions of transfer matrices arising in control theory. Our contributions are twofold. First, we reorganize the standard algorithm for this problem to use Level 3 BLAS operations, like matrix multiplication, in its inner loop. This speeds up the algorithm by a factor of 9 on an IBM RS6000. Second, we develop and compare several condition estimation algorithms, which inexpensively but accurately estimate the sensitivity of the solution of this linear system.

  • 88.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskaplig fakultet, HPC2N (Högpresterande beräkningscentrum norr).
    Wiberg, Petter
    Umeå universitet, Teknisk-naturvetenskaplig fakultet, Institutionen för datavetenskap.
    Extracting Partial Canonical Structure for Large Scale Eigenvalue Problems2000Ingår i: Numerical Algorithms, Vol. 24, nr 3, s. 195-237Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We present methods for computing a nearby partial Jordan-Schur form of a given matrix and a nearby partial Weierstrass-Schur form of a matrix pencil. The focus is on the use and the interplay of the algorithmic building blocks - the implicitly restarted Arnoldi method with prescribed restarts for computing an invariant subspace associated with the dominant eigenvalue, the clustering method for grouping computed eigenvalues into numerically multiple eigenvalues and the staircase algorithm for computing the structure revealing form of the projected problem. For matrix pencils, we present generalizations of these methods. We introduce a new and more accurate clustering heuristic for both matrices and matrix pencils. Particular emphasis is placed on reliability of the partial Jordan-Schur and Weierstrass-Schur methods with respect to the choice of deflation parameters connecting the steps of the algorithm such that the errors are controlled. Finally, successful results from computational experiments conducted on problems with known canonical structure and varying ill-conditioning are presented.

  • 89.
    Tångrot, Jeanette
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Sauer, Uwe
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Teknisk-naturvetenskaplig fakultet). Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Accurate Domain Identification with Structure-Anchored Hidden Markov Models, saHMMs2009Ingår i: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 76, nr 2, s. 343-352Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The ever increasing speed of DNA sequencing widens the discrepancy between the number of known gene products, and the knowledge of their function and structure. Proper annotation of protein sequences is therefore crucial if the missing information is to be deduced from sequence-based similarity comparisons. These comparisons become exceedingly difficult as the pairwise identities drop to very low values. To improve the accuracy of domain identification, we exploit the fact that the three-dimensional structures of domains are much more conserved than their sequences. Based on structure-anchored multiple sequence alignments of low identity homologues we constructed 850 structure-anchored hidden Markov models (saHMMs), each representing one domain family. Since the saHMMs are highly family specific, they can be used to assign a domain to its correct family and clearly distinguish it from domains belonging to other families, even within the same superfamily. This task is not trivial and becomes particularly difficult if the unknown domain is distantly related to the rest of the domain sequences within the family. In a search with full length protein sequences, harbouring at least one domain as defined by the structural classification of proteins database (SCOP), version 1.71, versus the saHMM database based on SCOP version 1.69, we achieve an accuracy of 99.0%. All of the few hits outside the family fall within the correct superfamily. Compared to Pfam_ls HMMs, the saHMMs obtain about 11% higher coverage. A comparison with BLAST and PSI-BLAST demonstrates that the saHMMs have consistently fewer errors per query at a given coverage. Within our recommended E-value range, the same is true for a comparison with SUPERFAMILY. Furthermore, we are able to annotate 232 proteins with 530 nonoverlapping domains belonging to 102 different domain families among human proteins labelled unknown in the NCBI protein database. Our results demonstrate that the saHMM database represents a versatile and reliable tool for identification of domains in protein sequences. With the aid of saHMMs, homology on the family level can be assigned, even for distantly related sequences. Due to the construction of the saHMMs, the hits they provide are always associated with high quality crystal structures. The saHMM database can be accessed via the FISH server at http://babel.ucmp.umu.se/fish/.

  • 90.
    Tångrot, Jeanette
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Teknisk-naturvetenskaplig fakultet). Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Wang, Lixiao
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Teknisk-naturvetenskaplig fakultet).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Sauer, Uwe
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Teknisk-naturvetenskaplig fakultet).
    FISH-Family identification of sequence homologues using structure anchored hidden Markov models2006Ingår i: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 34, nr Web Server issue, s. W10-W14Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The FISH server is highly accurate in identifying the family membership of domains in a query protein sequence, even in the case of very low sequence identities to known homologues. A performance test using SCOP sequences and an E-value cut-off of 0.1 showed that 99.3% of the top hits are to the correct family saHMM. Matches to a query sequence provide the user not only with an annotation of the identified domains and hence a hint to their function, but also with probable 2D and 3D structures, as well as with pairwise and multiple sequence alignments to homologues with low sequence identity. In addition, the FISH server allows users to upload and search their own protein sequence collection or to quarry public protein sequence data bases with individual saHMMs. The FISH server can be accessed at http://babel.ucmp.umu.se/fish/.

  • 91.
    Tångrot, Jeanette
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Medicinska fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Medicinska fakulteten).
    Wang, Lixiao
    Umeå universitet, Medicinska fakulteten, Umeå centrum för molekylär patogenes (UCMP) (Medicinska fakulteten).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Sauer, Uwe H.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Umeå centrum för molekylär patogenes (UCMP).
    Design, construction and use of the FISH server2007Ingår i: Applied parallel computing: state of the art in scientific computing, Springer Link , 2007, s. 647-657Kapitel i bok, del av antologi (Övrigt vetenskapligt)
    Abstract [en]

    At the core of the FISH (Family Identification with Structure anchored Hidden Markov models, saHMMs) server lies the midnight ASTRAL set. It is a collection of protein domains with low mutual sequence identity within homologous families, according to the structural classification of proteins, SCOP. Here, we evaluate two algorithms for creating the midnight ASTRAL set. The algorithm that limits the number of structural comparisons is about an order of magnitude faster than the all-against-all algorithm. We therefore choose the faster algorithm, although it produces slightly fewer domains in the set. We use the midnight ASTRAL set to construct the structure-anchored Hidden Markov Model data base, saHMM-db, where each saHMM represents one family. Sequence searches using saHMMs provide information about protein function, domain organization, the probable 2D and 3D structure, and can lead to the discovery of homologous domains in remotely related sequences.

12 51 - 91 av 91
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf