Umeå universitets logga

umu.sePublikationer
Ändra sökning
Avgränsa sökresultatet
1 - 28 av 28
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Träffar per sida
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
Markera
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 1.
    Adlerborn, Björn
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Towards Highly Parallel and Compute-Bound Computation of Eigenvectors of Matrices in Schur Form2017Rapport (Övrigt vetenskapligt)
    Abstract [en]

    In this paper we discuss the problem of computing eigenvectors for matrices in Schur form using parallel computing. We develop a new parallel algorithm and report on the performance of our MPI based implementation. We have also implemented a new parallel algorithm for scaling during the backsubstitution phase. We have increased the arithmetic intensity by interleaving the compution of several eigenvectors and by merging the backward substitution and the back-transformation of the eigenvector computation.

    Ladda ner fulltext (pdf)
    fulltext
  • 2.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Negative stride in the column-major format makes sense and has useful applications2017Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Two lower triangular or two upper triangular matrices of the same size can be stored with minimal memory footprint. If both positive and negative strides are used, then both matrices can be accessed as if they were stored in regular column-major format.

    Ladda ner fulltext (pdf)
    fulltext
  • 3.
    Karlsson, Lars
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Improving Perfect Parallelism2014Ingår i: Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski, Springer Berlin/Heidelberg, 2014, Vol. 8384, s. 76-85Konferensbidrag (Refereegranskat)
    Abstract [en]

    We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.

  • 4.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Any positive residual history is possible for the Arnoldi method for Lyapunov matrix equations2010Rapport (Övrigt vetenskapligt)
    Abstract [en]

    In this paper we consider the Lyapunov equation AX+XA^T+bb^T = 0, where A is negative definite n by n matrix and b in R^n. The Arnoldi method is an iterative algorithm which can be used to compute an approximate solution. However, the convergence can be very slow and in this paper we show how to explicitly construct a Lyapunov equation with a given residual curve. The matrix A can be chosen as symmetric negative definite and it is possible to arbitrarily specify the elements on the diagonal of the Cholesky factor of -A. If the symmetry is dropped, then it is possible to arbitrarily specify A+A^T, while retaining the residual curve.

    Ladda ner fulltext (pdf)
    fulltext
  • 5.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Any positive residual history is possible for the EKSM for Lyapunov matrix equations2010Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Let A in be an n by n matrix and let B be an n by p matrix and consider the Lyapunov matrix equation AX+XA^T+BB^T=0. If A+A^T < 0, then the extended Krylov subspace method (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show that any positive residual history is possible for the EKSM for Lyapunov matrix equations. In addition, we show how to systematically construct linear time invariant systems for which it is impractical to approximate the action of the product of the system Gramians using the EKSM. This is a property of the underlying Lyapunov matrix equations, rather than a defect of the algorithm.

    Ladda ner fulltext (pdf)
    fulltext
  • 6.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Retracing the residual curve of a Lyapunov equation solver2011Ingår i: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 51, nr 4, s. 959-975Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Let A ∈ Rn×n and let B ∈ Rn×p and consider the Lyapunov matrix equation AX + XAT + BBT = 0. If A + AT < 0, then the extended Krylov subspacemethod (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show how to construct a symmetric negative definite matrix A and a column vector B, for which the EKSM generates a predetermined residual curve.

    Ladda ner fulltext (pdf)
    fulltext
  • 7.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    The explicit Spike algorithm: Iterative solution of the reduced system2012Ingår i: High-performance scientific computing: algorithms and applications / [ed] Berry, M.W.; Gallivan, K.A.; Gallopoulos, E.; Grama, A.; Philippe, B.; Saad, Y.; Saied, F., London: Springer, 2012, s. 147-156Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    The explicit Spike algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.

    Ladda ner fulltext (pdf)
    ExplicitSpikeAlg
  • 8.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Well-conditioned eigenvalue problems that overflowManuskript (preprint) (Övrigt vetenskapligt)
  • 9.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N). Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Alastruey-Benede, Jesus
    Ibanez-Marin, Pablo
    Garcia Risueno, Pablo
    Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints2016Ingår i: Parallel Processing and Applied Mathematics, PPAM 2015, Part I / [ed] Wyrzykowski, R Deelman, E Dongarra, J Karczewski, K Kitowski, J Wiatr, K, Cham: Springer International Publishing Switzerland , 2016, s. 160-171Konferensbidrag (Refereegranskat)
    Abstract [en]

    Molecular dynamics is used to study the time evolution of systems of atoms. It is common to constrain bond lengths in order to increase the time step of the simulation. Here we accelerate Newton's method for solving the constraint equations for a system consisting of many identical small molecules. Starting with a modular and generic base code using a sequential data layout, we apply three different optimization techniques. The compiled code approach is used to generate subroutines equivalent to a single step of Newton's method for a user specified molecule. Differing from the generic subroutines, these specific routines contain no loops and no indirect addressing. Interleaving the data describing different molecules generates vectorizable loops. Finally, we apply task fusion. The simultaneous application of all three techniques increases the speed of the base code by a factor of 15 for single precision calculations.

    Ladda ner fulltext (pdf)
    fulltext
  • 10.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Blocked Algorithms for Robust Solution of Triangular Linear Systems2018Ingår i: Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, Vol. 1, s. 68-78Konferensbidrag (Refereegranskat)
    Abstract [en]

    We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix and b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.

  • 11.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Robust Solution of Triangular Linear Systems2017Rapport (Övrigt vetenskapligt)
    Abstract [sv]

    We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx=\alpha b can be computed without exceeding the overflow threshold $\Omega$. Here T is a non-singular upper triangular matrix and b is a single vector. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK and our main contribution is to simplify and extend the analysis. We explain why it is impractical to parallelize these algorithms. We then derive a robust block algorithm for solving triangular linear systems. Any run-time system such as StarPU which can run a task based backward substitution algorithm can also execute our robust block algorithm in parallel. It is simply a matter of replacing standard kernels with our robust kernels.

    Ladda ner fulltext (pdf)
    fulltext
  • 12.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows2013Ingår i: Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers / [ed] Pekka Manninen and Per Öster, Springer Berlin/Heidelberg, 2013, s. 250-264Konferensbidrag (Refereegranskat)
    Abstract [en]

    Systems which are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating the very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK style algorithm is presented together with strong scalability results.

  • 13.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems2012Ingår i: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I / [ed] Wyrzykowski, R., Springer, 2012, s. 80-91Konferensbidrag (Refereegranskat)
    Abstract [en]

    The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller’s analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high order approximations of the Laplace operator.

    Ladda ner fulltext (pdf)
    fulltext
  • 14.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Parallel solution of narrow banded diagonally dominant linear systems2012Ingår i: Applied Parallel and Scientific Computing, Pt II / [ed] Kristján Jónasson, Springer Berlin/Heidelberg, 2012, Vol. 7134, s. 280-290Konferensbidrag (Refereegranskat)
    Abstract [en]

    ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than complete block cyclic reduction. If the matrix is strictly dominant by rows, then the truncation error can be bounded directly in terms of the dominance factor and the size of the partitions. Our analysis includes new results applicable in our ongoing work of developing an efficient parallel solver.

    Ladda ner fulltext (pdf)
    fulltext
  • 15.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    López-Villellas, Lorién
    Barcelona Supercomputing Center, Barcelona, Spain.
    García-Risueño, Pablo
    How accurate does Newton have to be?2023Ingår i: Parallel processing and applied mathematics: 14th International conference, PPAM 2022, Gdansk, Poland, September 11–14, 2022, revised selected papers, part I / [ed] Roman Wyrzykowski; Jack Dongarra; Ewa Deelman; Konrad Karczewski, Switzerland: Springer Nature, 2023, Vol. 1, s. 3-15Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. We briefly discuss implications for parallel solvers.

  • 16.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    López-Villellas, Lorién
    Barcelona Supercomputing Center, Barcelona, Spain.
    García-Risueño, Pablo
    Independent Scholar, Berlin, Germany.
    Newton's method revisited: how accurate do we have to be?2023Ingår i: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic using three different techniques. We derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. In particular, we apply both a symmetric variant and Forsgren's variant of the simplified Newton method. This work has implications for the implementation of quasi-Newton methods regardless of the scale of the calculation or the machine.

    Ladda ner fulltext (pdf)
    fulltext
  • 17.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Myllykoski, Mirko
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils2020Ingår i: Parallel Processing and Applied Mathematics: Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Ewa Deelman, Jack Dongarra, Konrad Karczewski, Springer, 2020, s. 58-69Konferensbidrag (Refereegranskat)
    Abstract [en]

    In this paper we consider the problem of computing generalized eigenvectors of a matrix pencil in real Schur form. In exact arithmetic, this problem can be solved using substitution. In practice, substitution is vulnerable to floating-point overflow. The robust solvers xtgevc in LAPACK prevent overflow by dynamically scaling the eigenvectors.These subroutines are scalar and sequential codes which compute theeigenvectors one by one. In this paper, we discuss how to derive robust algorithms which are blocked and parallel. The new StarNEig librarycontains a robust task-parallel solver Zazamoukh which runs on top of StarPU. Our numerical experiments show that Zazamoukh achieves a super-linear speedup compared with dtgevc for sufficiently large matrices.

    Ladda ner fulltext (pdf)
    fulltext
  • 18.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Myllykoski, Mirko
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Adlerborn, Björn
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    D2.5 Eigenvalue problem solvers2017Rapport (Övrigt vetenskapligt)
    Ladda ner fulltext (pdf)
    fulltext
  • 19.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Schwarz, Angelika Beatrix
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parallel robust solution of triangular linear systems2019Ingår i: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 31, nr 19, artikel-id e5064Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.

  • 20.
    Kågström, Bo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Myllykoski, Mirko
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Cayrols, Sébastien
    Science and Technology Facilities Council.
    Duff, Iain
    Science and Technology Facilities Council.
    Lopez, Florent
    Science and Technology Facilities Council.
    Nakov, Stojce
    Science and Technology Facilities Council.
    Pranesh, Srikara
    The University of Manchester.
    Stevens, David
    The University of Manchester.
    Dongarra, Jack
    The University of Manchester.
    Donfack, Simplice
    National Institute for Research in Computer Science and Control.
    Grigori, Laura
    National Institute for Research in Computer Science and Control.
    Tissot, Olivier
    National Institute for Research in Computer Science and Control.
    D7.8 Release of the NLAFET library2019Rapport (Övrigt vetenskapligt)
    Ladda ner fulltext (pdf)
    fulltext
  • 21.
    López-Villellas, Lorién
    et al.
    Barcelona Supercomputing Center, Barcelona, Spain.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N). Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Galano-Frutos, Juan José
    Department of Biochemistry, Molecular and Cellular Biology / Biocomputation and Complex Systems Physics Institute (BIFI), Universidad de Zaragoza, Zaragoza, Spain.
    Marco-Sola, Santiago
    Barcelona Supercomputing Center, Barcelona, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain.
    Alastruey-Benedé, Jesús
    Departamento de Informática e Ingeniería de Sistemas / Aragón Institute for Engineering Research (I3A), Universidad de Zaragoza, Zaragoza, Spain.
    Ibáñez, Pablo
    Departamento de Informática e Ingeniería de Sistemas / Aragón Institute for Engineering Research (I3A), Universidad de Zaragoza, Zaragoza, Spain.
    Moretó, Miquel
    Barcelona Supercomputing Center, Barcelona, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain.
    Sancho, Javier
    Department of Biochemistry, Molecular and Cellular Biology / Biocomputation and Complex Systems Physics Institute (BIFI), Universidad de Zaragoza, Zaragoza, Spain.
    García-Risueño, Pablo
    Accurate and efficient constrained molecular dynamics of polymers using Newton's method and special purpose code2023Ingår i: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 288, artikel-id 108742Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    In molecular dynamics simulations we can often increase the time step by imposing constraints on bond lengths and bond angles. This allows us to extend the length of the time interval and therefore the range of physical phenomena that we can afford to simulate. We examine the existing algorithms and software for solving nonlinear constraint equations in parallel and we explain why it is necessary to advance the state-of-the-art. We present ILVES-PC, a new algorithm for imposing bond constraints on proteins accurately and efficiently. It solves the same system of differential algebraic equations as the celebrated SHAKE algorithm, but ILVES-PC solves the nonlinear constraint equations using Newton’s method rather than the nonlinear Gauss-Seidel method. Moreover, ILVES-PC solves the necessary linear systems using a specialized linear solver that exploits the structure of the protein. ILVES-PC can rapidly solve constraint equations as accurately as the hardware will allow. The run-time of ILVES-PC is proportional to the number of constraints. We have integrated ILVES-PC into GROMACS and simulated proteins of different sizes. Compared with SHAKE, we have achieved speedups of up to 4.9× in single-threaded executions and up to 76× in shared-memory multi-threaded executions. Moreover, ILVES-PC is more accurate than P-LINCS algorithm. Our work is a proof-of-concept of the utility of software designed specifically for the simulation of polymers.

    Ladda ner fulltext (pdf)
    fulltext
  • 22.
    Myllykoski, Mirko
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Introduction to StarNEig: A Task-based Library for Solving Nonsymmetric Eigenvalue Problems2020Ingår i: Parallel Processing and Applied Mathematics: Revised Selected Papers, Part I / [ed] Roman Wyrzykowski and Boleslaw Szymanski, Springer, 2020, s. 70-81Konferensbidrag (Refereegranskat)
    Abstract [en]

    Abstract. In this paper, we present the StarNEig library for solvingdense nonsymmetric (generalized) eigenvalue problems. The library isbuilt on top of the StarPU runtime system and targets both shared anddistributed memory machines. Some components of the library supportGPUs. The library is currently in an early beta state and only real arith-metic is supported. Support for complex data types is planned for afuture release. This paper is aimed at potential users of the library. Wedescribe the design choices and capabilities of the library, and contrastthem to existing software such as ScaLAPACK. StarNEig implements aScaLAPACK compatibility layer that should make it easy for new usersto transition to StarNEig. We demonstrate the performance of the librarywith a small set of computational experiments.

    Ladda ner fulltext (pdf)
    fulltext
  • 23.
    Myllykoski, Mirko
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Task‐based, GPU‐accelerated and robust library for solving dense nonsymmetric eigenvalue problems2021Ingår i: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 33, nr 11, artikel-id e5915Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    In this paper, we present the StarNEig library for solving dense nonsymmetric standard and generalized eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library have support for GPU acceleration. The library currently applies to real matrices with real and complex eigenvalues and all calculations are done using real arithmetic. Support for complex matrices is planned for a future release. This paper is aimed at potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as LAPACK and ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer which should assist new users in the transition to StarNEig. We demonstrate the performance of the library with a sample of computational experiments.

    Ladda ner fulltext (pdf)
    fulltext
  • 24.
    Myllykoski, Mirko
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Task-Based Parallel Algorithms for Eigenvalue Reordering of Matrices in Real Schur Forms2017Rapport (Övrigt vetenskapligt)
    Abstract [en]

    We develop a task-based parallel algorithm for reordering eigenvalues of matrices in real Schur form. We describe how we implemented the algorithm using StarPU runtime system and report on experiments performed on a shared memory machine. Compared with ScaLAPACK we achieve average speedup of 3. We have strong and weak scaling efficiencies which are well above 50%. We are able to achieve more than 50% of the peak flop rate for all but the smallest matrices. The idle time and the overhead is negligible except for the smallest matrices. The next step is to reconfigure and further develop the code so that it can be applied to matrix pairs in generalized Schur forms and run efficiently on distributed memory machines.

    Ladda ner fulltext (pdf)
    fulltext
  • 25.
    Myllykoski, Mirko
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Schwarz, Angelika Beatrix
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    D2.7 Eigenvalue solvers for nonsymmetric problems2019Rapport (Övrigt vetenskapligt)
    Ladda ner fulltext (pdf)
    fulltext
  • 26.
    Schwarz, Angelika Beatrix
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Robust Task-Parallel Solution of the Triangular Sylvester Equation2020Ingår i: Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Ewa Deelman, Jack Dongarra, Konrad Karczewski, Springer, 2020, s. 82-92Konferensbidrag (Refereegranskat)
    Abstract [en]

    The Bartels-Stewart algorithm is a standard approach to solving the dense Sylvester equation. It reduces the problem to the solution of the triangular Sylvester equation. The triangular Sylvester equation is solved with a variant of backward substitution. Backward substitution is prone to overflow. Overflow can be avoided by dynamic scaling of the solution matrix. An algorithm which prevents overflow is said to be robust. The standard library LAPACK contains the robust scalar sequential solver dtrsyl. This paper derives a robust, level-3 BLAS-based task-parallel solver. By adding overflow protection, our robust solver closes the gap between problems solvable by LAPACK and problems solvable by existing non-robust task-parallel solvers. We demonstrate that our robust solver achieves a performance similar to non-robust solvers.

  • 27.
    Schwarz, Angelika Beatrix
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Robust Parallel Eigenvector Computation for the Non-Symmetric Eigenvalue Problem2020Rapport (Övrigt vetenskapligt)
    Abstract [en]

    A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schurform relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.

    Ladda ner fulltext (pdf)
    fulltext
  • 28.
    Schwarz, Angelika Beatrix
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Robust parallel eigenvector computation for the non-symmetric eigenvalue problem2020Ingår i: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, artikel-id 102707Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.

1 - 28 av 28
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf