umu.sePublications
Change search
Link to record
Permanent link

Direct link
BETA
Kjelgaard Mikkelsen, Carl Christian
Publications (10 of 11) Show all publications
Kjelgaard Mikkelsen, C. C. & Karlsson, L. (2018). Blocked Algorithms for Robust Solution of Triangular Linear Systems. In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (Ed.), Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I. Paper presented at 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017 (pp. 68-78). Springer, 1
Open this publication in new window or tab >>Blocked Algorithms for Robust Solution of Triangular Linear Systems
2018 (English)In: Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, Vol. 1, p. 68-78Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix and b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.

Place, publisher, year, edition, pages
Springer, 2018
Series
Theoretical Computer Science and General Issues, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
Keywords
Triangular linear systems, overflow, blocked algorithms, robust algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-154296 (URN)10.1007/978-3-319-78024-5_7 (DOI)000458563300007 ()2-s2.0-85044716269 (Scopus ID)978-3-319-78023-8 (ISBN)978-3-319-78024-5 (ISBN)
Conference
12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017
Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-04-16Bibliographically approved
Kjelgaard Mikkelsen, C. C., Schwarz, A. B. & Karlsson, L. (2018). Parallel robust solution of triangular linear systems. Concurrency and Computation, Article ID e5064.
Open this publication in new window or tab >>Parallel robust solution of triangular linear systems
2018 (English)In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, article id e5064Article in journal (Refereed) Epub ahead of print
Abstract [en]

Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.

Place, publisher, year, edition, pages
John Wiley & Sons, 2018
Keywords
Overflow protection, parallel algorithms, task-based parallelism, triangular linear systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-154297 (URN)10.1002/cpe.5064 (DOI)2-s2.0-85056329436 (Scopus ID)
Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-05-28
Karlsson, L. & Kjelgaard Mikkelsen, C. C. (2017). Negative stride in the column-major format makes sense and has useful applications.
Open this publication in new window or tab >>Negative stride in the column-major format makes sense and has useful applications
2017 (English)Report (Other academic)
Abstract [en]

Two lower triangular or two upper triangular matrices of the same size can be stored with minimal memory footprint. If both positive and negative strides are used, then both matrices can be accessed as if they were stored in regular column-major format.

Publisher
p. 4
Series
Report / UMINF, ISSN 0348-0542 ; 17.17
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:umu:diva-154646 (URN)
Available from: 2018-12-21 Created: 2018-12-21 Last updated: 2018-12-21
Kjelgaard Mikkelsen, C. C., Alastruey-Benede, J., Ibanez-Marin, P. & Garcia Risueno, P. (2016). Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints. In: Wyrzykowski, R Deelman, E Dongarra, J Karczewski, K Kitowski, J Wiatr, K (Ed.), Parallel Processing and Applied Mathematics, PPAM 2015, Part I: . Paper presented at 11th International Conference on Parallel Processing and Applied Mathematics (PPAM), SEP 06-09, 2015, Krakow, POLAND (pp. 160-171). Cham: Springer International Publishing Switzerland
Open this publication in new window or tab >>Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints
2016 (English)In: Parallel Processing and Applied Mathematics, PPAM 2015, Part I / [ed] Wyrzykowski, R Deelman, E Dongarra, J Karczewski, K Kitowski, J Wiatr, K, Cham: Springer International Publishing Switzerland , 2016, p. 160-171Conference paper, Published paper (Refereed)
Abstract [en]

Molecular dynamics is used to study the time evolution of systems of atoms. It is common to constrain bond lengths in order to increase the time step of the simulation. Here we accelerate Newton's method for solving the constraint equations for a system consisting of many identical small molecules. Starting with a modular and generic base code using a sequential data layout, we apply three different optimization techniques. The compiled code approach is used to generate subroutines equivalent to a single step of Newton's method for a user specified molecule. Differing from the generic subroutines, these specific routines contain no loops and no indirect addressing. Interleaving the data describing different molecules generates vectorizable loops. Finally, we apply task fusion. The simultaneous application of all three techniques increases the speed of the base code by a factor of 15 for single precision calculations.

Place, publisher, year, edition, pages
Cham: Springer International Publishing Switzerland, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9573
Keywords
Newton's method, Non-linear equations, Molecular dynamics, Constraints, SHAKE, RATTLE, LINCS, mpiled code approach, Vector level parallelism, Vectorizing compiler, SIMD
National Category
Computer Engineering
Identifiers
urn:nbn:se:umu:diva-135304 (URN)10.1007/978-3-319-32149-3_16 (DOI)000400134500016 ()978-3-319-32149-3 (ISBN)978-3-319-32148-6 (ISBN)
Conference
11th International Conference on Parallel Processing and Applied Mathematics (PPAM), SEP 06-09, 2015, Krakow, POLAND
Available from: 2017-05-24 Created: 2017-05-24 Last updated: 2018-06-09Bibliographically approved
Karlsson, L., Kjelgaard Mikkelsen, C. C. & Kågström, B. (2014). Improving Perfect Parallelism. In: Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski (Ed.), Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I. Paper presented at 10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Warsaw, POLAND, SEP 08-11, 2013 (pp. 76-85). Springer Berlin/Heidelberg, 8384
Open this publication in new window or tab >>Improving Perfect Parallelism
2014 (English)In: Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski, Springer Berlin/Heidelberg, 2014, Vol. 8384, p. 76-85Conference paper, Published paper (Refereed)
Abstract [en]

We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2014
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
Perfectly parallel problem, Resource contention, Forced parallelization
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-100794 (URN)10.1007/978-3-642-55224-3_8 (DOI)000349159200008 ()978-3-642-55224-3 (ISBN)978-3-642-55223-6 (ISBN)
Conference
10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Warsaw, POLAND, SEP 08-11, 2013
Available from: 2015-04-01 Created: 2015-03-09 Last updated: 2019-06-26Bibliographically approved
Kjelgaard Mikkelsen, C. C. & Kågström, B. (2013). Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows. In: Pekka Manninen and Per Öster (Ed.), Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers. Paper presented at 11th International Conference on Applied Parallel and Scientific Computing, PARA 2012, Helsinki, Finland, June 10-13, 2012, (pp. 250-264). Springer Berlin/Heidelberg
Open this publication in new window or tab >>Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows
2013 (English)In: Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers / [ed] Pekka Manninen and Per Öster, Springer Berlin/Heidelberg, 2013, p. 250-264Conference paper, Published paper (Refereed)
Abstract [en]

Systems which are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating the very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK style algorithm is presented together with strong scalability results.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 7782
Keywords
approximate incomplete cyclic reduction, Narrow banded, strictly and evenly diagonally dominant linear systems
National Category
Discrete Mathematics Computer and Information Sciences
Identifiers
urn:nbn:se:umu:diva-83245 (URN)10.1007/978-3-642-36803-5_18 (DOI)000343867800018 ()978-3-642-36802-8 (ISBN)
Conference
11th International Conference on Applied Parallel and Scientific Computing, PARA 2012, Helsinki, Finland, June 10-13, 2012,
Available from: 2013-11-21 Created: 2013-11-21 Last updated: 2019-05-09Bibliographically approved
Kjelgaard Mikkelsen, C. C. & Kågström, B. (2012). Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems. In: Wyrzykowski, R. (Ed.), PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I: . Paper presented at 9th International Conference on Parallel Processing and Applied Mathematics, SEP 11-14, 2011, Torun, POLAND (pp. 80-91). Springer
Open this publication in new window or tab >>Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems
2012 (English)In: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I / [ed] Wyrzykowski, R., Springer, 2012, p. 80-91Conference paper, Published paper (Refereed)
Abstract [en]

The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller’s analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high order approximations of the Laplace operator.

Place, publisher, year, edition, pages
Springer, 2012
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7203
Keywords
Banded or block tridiagonal linear systems, strict diagonal dominance, incomplete cyclic reduction, ScaLAPACK
National Category
Computer Sciences
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-50786 (URN)10.1007/978-3-642-31464-3_9 (DOI)000313192800009 ()978-3-642-31463-6 (ISBN)
Conference
9th International Conference on Parallel Processing and Applied Mathematics, SEP 11-14, 2011, Torun, POLAND
Available from: 2012-02-20 Created: 2011-12-21 Last updated: 2018-06-08Bibliographically approved
Kjelgaard Mikkelsen, C. C. & Kågström, B. (2012). Parallel solution of narrow banded diagonally dominant linear systems. In: Kristján Jónasson (Ed.), Applied Parallel and Scientific Computing, Pt II: . Paper presented at 10th Nordic International Conference on Applied Parallel Computing - State of the Art in Scientific and Parallel Computing (PARA), JUN 06-09, 2010, Univ Iceland, Sch Engn & Nat Sci, Reykjavik, ICELAND (pp. 280-290). Springer Berlin/Heidelberg, 7134
Open this publication in new window or tab >>Parallel solution of narrow banded diagonally dominant linear systems
2012 (English)In: Applied Parallel and Scientific Computing, Pt II / [ed] Kristján Jónasson, Springer Berlin/Heidelberg, 2012, Vol. 7134, p. 280-290Conference paper, Published paper (Refereed)
Abstract [en]

ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than complete block cyclic reduction. If the matrix is strictly dominant by rows, then the truncation error can be bounded directly in terms of the dominance factor and the size of the partitions. Our analysis includes new results applicable in our ongoing work of developing an efficient parallel solver.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2012
Series
Lecture notes in computer science, ISSN 0302-9743 ; 7134
Keywords
Narrow banded, diagonally dominant linear systems, block cyclic reduction, parallel algorithms, ScaLAPACK
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-51059 (URN)10.1007/978-3-642-28145-7_28 (DOI)000309716000028 ()978-3-642-28144-0 (ISBN)978-3-642-28145-7 (ISBN)
Conference
10th Nordic International Conference on Applied Parallel Computing - State of the Art in Scientific and Parallel Computing (PARA), JUN 06-09, 2010, Univ Iceland, Sch Engn & Nat Sci, Reykjavik, ICELAND
Available from: 2012-02-16 Created: 2012-01-09 Last updated: 2018-06-08Bibliographically approved
Kjelgaard Mikkelsen, C. C. (2012). The explicit Spike algorithm: Iterative solution of the reduced system. In: Berry, M.W.; Gallivan, K.A.; Gallopoulos, E.; Grama, A.; Philippe, B.; Saad, Y.; Saied, F. (Ed.), High-performance scientific computing: algorithms and applications (pp. 147-156). London: Springer
Open this publication in new window or tab >>The explicit Spike algorithm: Iterative solution of the reduced system
2012 (English)In: High-performance scientific computing: algorithms and applications / [ed] Berry, M.W.; Gallivan, K.A.; Gallopoulos, E.; Grama, A.; Philippe, B.; Saad, Y.; Saied, F., London: Springer, 2012, p. 147-156Chapter in book (Refereed)
Abstract [en]

The explicit Spike algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.

Place, publisher, year, edition, pages
London: Springer, 2012
Keywords
Narrow banded and diagonally dominant linear systems
National Category
Computer Sciences
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-50785 (URN)978-1-4471-2436-8 (ISBN)
Available from: 2012-02-16 Created: 2011-12-21 Last updated: 2018-06-08Bibliographically approved
Kjelgaard Mikkelsen, C. C. (2011). Retracing the residual curve of a Lyapunov equation solver. BIT Numerical Mathematics, 51(4), 959-975
Open this publication in new window or tab >>Retracing the residual curve of a Lyapunov equation solver
2011 (English)In: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 51, no 4, p. 959-975Article in journal (Refereed) Published
Abstract [en]

Let A ∈ Rn×n and let B ∈ Rn×p and consider the Lyapunov matrix equation AX + XAT + BBT = 0. If A + AT < 0, then the extended Krylov subspacemethod (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show how to construct a symmetric negative definite matrix A and a column vector B, for which the EKSM generates a predetermined residual curve.

Place, publisher, year, edition, pages
Springer, 2011
Keywords
Lyapunov matrix equations, the extended Krylov subspace method
National Category
Computational Mathematics
Research subject
Mathematics
Identifiers
urn:nbn:se:umu:diva-50738 (URN)10.1007/s10543-011-0323-7 (DOI)
Available from: 2012-02-15 Created: 2011-12-20 Last updated: 2018-06-08Bibliographically approved
Organisations

Search in DiVA

Show all publications