umu.se Publications
1 - 11 of 11
  • 1.
    Karlsson, Lars
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Negative stride in the column-major format makes sense and has useful applications, 2017. Report (Other academic)
    Abstract [en]

    Two lower triangular or two upper triangular matrices of the same size can be stored with minimal memory footprint. If both positive and negative strides are used, then both matrices can be accessed as if they were stored in regular column-major format.
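
    As an illustration of the idea, the following C sketch packs two n-by-n lower triangular matrices into a single buffer of n(n+1) elements. The first matrix is addressed with an ordinary positive stride; the second is addressed with negated strides taken from the end of the buffer, and the two access patterns never collide. The particular layout and the helper names are assumptions made for this example; the report's exact scheme may differ.

```c
/* Sketch: packing two n-by-n lower triangular matrices into one buffer of
 * n*(n+1) doubles.  L1 is read with a positive stride, L2 with negated
 * strides, both "as if" stored in column-major format.  This is an
 * illustrative layout consistent with the abstract, not necessarily the
 * report's exact scheme. */
#include <stdio.h>
#include <stdlib.h>

#define N 4

/* L1(i,j): ordinary column-major storage, leading dimension N, base buf. */
static double *l1(double *buf, int i, int j) {
    return &buf[i + j * N];
}

/* L2(i,j): column-major with negated strides, base at the last element. */
static double *l2(double *buf, int i, int j) {
    return &buf[N * (N + 1) - 1 - i - j * N];
}

int main(void) {
    double *buf = calloc(N * (N + 1), sizeof *buf);
    if (!buf) return 1;
    /* Fill both lower triangles with recognizable values. */
    for (int j = 0; j < N; j++)
        for (int i = j; i < N; i++) {
            *l1(buf, i, j) = 100 + 10 * i + j;   /* entries of L1 */
            *l2(buf, i, j) = 200 + 10 * i + j;   /* entries of L2 */
        }
    /* Read back: if the two layouts overlapped, a value would be clobbered. */
    for (int j = 0; j < N; j++)
        for (int i = j; i < N; i++)
            if (*l1(buf, i, j) != 100 + 10 * i + j ||
                *l2(buf, i, j) != 200 + 10 * i + j) {
                printf("collision at (%d,%d)\n", i, j);
                return 1;
            }
    printf("both triangles packed in %d doubles, no collisions\n", N * (N + 1));
    free(buf);
    return 0;
}
```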

  • 2.
    Karlsson, Lars
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Improving Perfect Parallelism, 2014. In: Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski, Springer Berlin/Heidelberg, 2014, Vol. 8384, p. 76-85. Conference paper (Refereed)
    Abstract [en]

    We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.

  • 3.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Retracing the residual curve of a Lyapunov equation solver, 2011. In: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 51, no 4, p. 959-975. Article in journal (Refereed)
    Abstract [en]

    Let A ∈ R^{n×n} and B ∈ R^{n×p}, and consider the Lyapunov matrix equation AX + XA^T + BB^T = 0. If A + A^T < 0, then the extended Krylov subspace method (EKSM) can be used to compute a sequence of low-rank approximations of X. In this paper we show how to construct a symmetric negative definite matrix A and a column vector B for which the EKSM generates a predetermined residual curve.
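
    For reference, the residual traced by the curve mentioned above can be written as follows; this is the standard residual of an approximate Lyapunov solution, and the choice of norm is an assumption, since the abstract does not state which norm is monitored.

```latex
% X_k: the k-th low-rank approximation produced by the EKSM.
% The residual curve is the map k -> ||R_k|| for some norm
% (the specific norm is an assumption here).
\[
  R_k = A X_k + X_k A^{\mathsf T} + B B^{\mathsf T},
  \qquad
  k \mapsto \lVert R_k \rVert .
\]
```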

  • 4.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    The explicit Spike algorithm: Iterative solution of the reduced system, 2012. In: High-performance scientific computing: algorithms and applications / [ed] Berry, M.W.; Gallivan, K.A.; Gallopoulos, E.; Grama, A.; Philippe, B.; Saad, Y.; Saied, F., London: Springer, 2012, p. 147-156. Chapter in book (Refereed)
    Abstract [en]

    The explicit Spike algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.

  • 5.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Alastruey-Benede, Jesus
    Ibanez-Marin, Pablo
    Garcia Risueno, Pablo
    Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints, 2016. In: Parallel Processing and Applied Mathematics, PPAM 2015, Part I / [ed] Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K., Cham: Springer International Publishing Switzerland, 2016, p. 160-171. Conference paper (Refereed)
    Abstract [en]

    Molecular dynamics is used to study the time evolution of systems of atoms. It is common to constrain bond lengths in order to increase the time step of the simulation. Here we accelerate Newton's method for solving the constraint equations for a system consisting of many identical small molecules. Starting with a modular and generic base code using a sequential data layout, we apply three different optimization techniques. The compiled code approach is used to generate subroutines equivalent to a single step of Newton's method for a user-specified molecule. Unlike the generic subroutines, these specific routines contain no loops and no indirect addressing. Interleaving the data describing different molecules generates vectorizable loops. Finally, we apply task fusion. The simultaneous application of all three techniques increases the speed of the base code by a factor of 15 for single-precision calculations.
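
    The interleaving idea can be sketched in C as follows. The constraint count, the block size, and the trivial update inside the loops are placeholders rather than the paper's actual Newton kernels; the point is only that storing the same unknown of several molecules contiguously turns the innermost loop into a unit-stride loop with no cross-iteration dependences, which a compiler can vectorize.

```c
/* Sketch of the data-interleaving idea: instead of storing the unknowns of
 * each molecule consecutively (short scalar inner loops), the same unknown
 * of BATCH molecules is stored contiguously, so one pass advances the
 * iteration for BATCH molecules at once.  NCON, BATCH, and the update
 * "lambda -= residual" are placeholders for illustration only. */
#include <stddef.h>

#define NCON  2        /* constraints per molecule (placeholder) */
#define BATCH 8        /* molecules interleaved per block        */

/* Sequential layout: lambda[m*NCON + c] for molecule m, constraint c. */
void newton_step_sequential(double *lambda, const double *residual,
                            size_t nmol) {
    for (size_t m = 0; m < nmol; m++)
        for (int c = 0; c < NCON; c++)          /* short, scalar loop */
            lambda[m * NCON + c] -= residual[m * NCON + c];
}

/* Interleaved layout: lambda[(b*NCON + c)*BATCH + k], where k is the
 * molecule within block b.  The innermost loop has stride 1 over k and
 * no cross-iteration dependence, so it is vectorizable. */
void newton_step_interleaved(double *lambda, const double *residual,
                             size_t nblocks) {
    for (size_t b = 0; b < nblocks; b++)
        for (int c = 0; c < NCON; c++)
            for (int k = 0; k < BATCH; k++)     /* vectorizable */
                lambda[(b * NCON + c) * BATCH + k] -=
                    residual[(b * NCON + c) * BATCH + k];
}
```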

  • 6.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Blocked Algorithms for Robust Solution of Triangular Linear Systems, 2018. In: Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, Vol. 1, p. 68-78. Conference paper (Refereed)
    Abstract [en]

    We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix, b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.
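
    The flavour of protecting a single operation can be sketched as follows; the threshold Omega and the test used here are simplified assumptions for illustration, not the exact criteria used by the paper or by xLATRS.

```c
/* Sketch of overflow-protected division: before forming b / t we check
 * against a threshold Omega and, if the quotient could exceed it, return
 * a scaling 0 < s <= 1 to apply to b (and to the whole right-hand side)
 * first.  The bound is a simplified illustration. */
#include <math.h>
#include <float.h>
#include <stdio.h>

static const double Omega = DBL_MAX / 4.0;   /* overflow threshold */

/* Return a scaling s such that (s*b)/t cannot exceed Omega in magnitude. */
static double protect_division(double b, double t) {
    double s = 1.0;
    if (fabs(b) > fabs(t) * Omega)   /* |b/t| would exceed Omega */
        s = (fabs(t) * Omega) / fabs(b);
    return s;
}

int main(void) {
    double t = 1e-300, b = 1e10;     /* the naive quotient b/t overflows */
    double s = protect_division(b, t);
    printf("scale = %g, scaled quotient = %g\n", s, (s * b) / t);
    return 0;
}
```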

  • 7.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows, 2013. In: Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers / [ed] Pekka Manninen and Per Öster, Springer Berlin/Heidelberg, 2013, p. 250-264. Conference paper (Refereed)
    Abstract [en]

    Systems which are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating the very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK style algorithm is presented together with strong scalability results.

  • 8.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems, 2012. In: Parallel Processing and Applied Mathematics, Part I / [ed] Wyrzykowski, R., Springer, 2012, p. 80-91. Conference paper (Refereed)
    Abstract [en]

    The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller's analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off-diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high-order approximations of the Laplace operator.

  • 9.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Parallel solution of narrow banded diagonally dominant linear systems, 2012. In: Applied Parallel and Scientific Computing, Part II / [ed] Kristján Jónasson, Springer Berlin/Heidelberg, 2012, Vol. 7134, p. 280-290. Conference paper (Refereed)
    Abstract [en]

    ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than complete, block cyclic reduction. If the matrix is strictly diagonally dominant by rows, then the truncation error can be bounded directly in terms of the dominance factor and the size of the partitions. Our analysis includes new results applicable to our ongoing work of developing an efficient parallel solver.

  • 10.
    Kjelgaard Mikkelsen, Carl Christian
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schwarz, Angelika Beatrix
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parallel robust solution of triangular linear systems, 2018. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, article id e5064. Article in journal (Refereed)
    Abstract [en]

    Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.

  • 11.
    Schwarz, Angelika Beatrix
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Robust Task-Parallel Solution of the Triangular Sylvester Equation. Manuscript (preprint) (Other academic)
    Abstract [en]

    The Bartels-Stewart algorithm is a standard approach to solving the dense Sylvester equation. It reduces the problem to the solution of the triangular Sylvester equation. The triangular Sylvester equation is solved with a variant of backward substitution. Backward substitution is prone to overflow. Overflow can be avoided by dynamic scaling of the solution matrix. An algorithm which prevents overflow is said to be robust. The standard library LAPACK contains the robust scalar sequential solver dtrsyl. This paper derives a robust, level-3 BLAS-based task-parallel solver. By adding overflow protection, our robust solver closes the gap between problems solvable by LAPACK and problems solvable by existing non-robust task-parallel solvers. We demonstrate that our robust solver achieves performance similar to that of non-robust solvers.
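
    For reference, the triangular Sylvester equation referred to above can be written as follows, in the scaled form that a robust solver computes; the notation and the scaling convention are assumptions, since the abstract does not spell them out.

```latex
% After the Schur reductions of Bartels-Stewart, A (m x m) and B (n x n)
% are (quasi-)triangular.  A robust solver returns a scaling
% 0 < alpha <= 1 together with X such that X is representable without overflow.
\[
  A X + X B = \alpha C, \qquad 0 < \alpha \le 1 .
\]
```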
