umu.sePublications

Please wait ... |

Link to record
http://umu.diva-portal.org/smash/person.jsf?pid=authority-person:65553 $(function(){PrimeFaces.cw("InputTextarea","widget_formSmash_upper_j_idt122_recordDirectLink",{id:"formSmash:upper:j_idt122:recordDirectLink",widgetVar:"widget_formSmash_upper_j_idt122_recordDirectLink",autoResize:true});}); $(function(){PrimeFaces.cw("OverlayPanel","widget_formSmash_upper_j_idt122_j_idt124",{id:"formSmash:upper:j_idt122:j_idt124",widgetVar:"widget_formSmash_upper_j_idt122_j_idt124",target:"formSmash:upper:j_idt122:permLink",showEffect:"blind",hideEffect:"fade",my:"right top",at:"right bottom",showCloseIcon:true});});

Permanent link

Direct link

Kjelgaard Mikkelsen, Carl Christian

Open this publication in new window or tab >>Blocked Algorithms for Robust Solution of Triangular Linear Systems### Kjelgaard Mikkelsen, Carl Christian

### Karlsson, Lars

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_some",{id:"formSmash:j_idt184:0:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_otherAuthors",{id:"formSmash:j_idt184:0:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_otherAuthors",multiple:true}); 2018 (English)In: Parallel Processing and Applied Mathematics: 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski, Springer, 2018, Vol. 1, p. 68-78Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer, 2018
##### Series

Theoretical Computer Science and General Issues, ISSN 0302-9743, E-ISSN 1611-3349 ; 10777
##### Keywords

Triangular linear systems, overflow, blocked algorithms, robust algorithms
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:umu:diva-154296 (URN)10.1007/978-3-319-78024-5_7 (DOI)000458563300007 ()2-s2.0-85044716269 (Scopus ID)978-3-319-78023-8 (ISBN)978-3-319-78024-5 (ISBN)
##### Conference

12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt359",{id:"formSmash:j_idt184:0:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt365",{id:"formSmash:j_idt184:0:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_0_j_idt188_j_idt371",{id:"formSmash:j_idt184:0:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_0_j_idt188_j_idt371",multiple:true});
#####

Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-04-16Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

We consider the problem of computing a scaling α such that the solution x of the scaled linear system Tx = αb can be computed without exceeding an overflow threshold Ω. Here T is a non-singular upper triangular matrix and b is a single vector, and Ω is less than the largest representable number. This problem is central to the computation of eigenvectors from Schur forms. We show how to protect individual arithmetic operations against overflow and we present a robust scalar algorithm for the complete problem. Our algorithm is very similar to xLATRS in LAPACK. We explain why it is impractical to parallelize these algorithms. We then derive a robust blocked algorithm which can be executed in parallel using a task-based run-time system such as StarPU. The parallel overhead is increased marginally compared with regular blocked backward substitution.

Open this publication in new window or tab >>Parallel robust solution of triangular linear systems### Kjelgaard Mikkelsen, Carl Christian

### Schwarz, Angelika Beatrix

### Karlsson, Lars

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_some",{id:"formSmash:j_idt184:1:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_otherAuthors",{id:"formSmash:j_idt184:1:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_otherAuthors",multiple:true}); 2018 (English)In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, article id e5064Article in journal (Refereed) Epub ahead of print
##### Abstract [en]

##### Place, publisher, year, edition, pages

John Wiley & Sons, 2018
##### Keywords

Overflow protection, parallel algorithms, task-based parallelism, triangular linear systems
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:umu:diva-154297 (URN)10.1002/cpe.5064 (DOI)2-s2.0-85056329436 (Scopus ID)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt359",{id:"formSmash:j_idt184:1:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt365",{id:"formSmash:j_idt184:1:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_1_j_idt188_j_idt371",{id:"formSmash:j_idt184:1:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_1_j_idt188_j_idt371",multiple:true});
#####

Available from: 2018-12-14 Created: 2018-12-14 Last updated: 2019-05-28

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating‐point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well‐conditioned, then the normwise relative error is small. However, there are well‐conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right‐hand side to avoid overflow. These solvers are sequential and apply to systems with a single right‐hand side. This paper presents algorithms which are blocked and parallel. A new task‐based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non‐robust solvers DTRSV and DTRSM. When there are many right‐hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non‐robust solver DTRSM.

Open this publication in new window or tab >>Negative stride in the column-major format makes sense and has useful applications### Karlsson, Lars

### Kjelgaard Mikkelsen, Carl Christian

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_some",{id:"formSmash:j_idt184:2:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_otherAuthors",{id:"formSmash:j_idt184:2:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_otherAuthors",multiple:true}); 2017 (English)Report (Other academic)
##### Abstract [en]

##### Publisher

p. 4
##### Series

Report / UMINF, ISSN 0348-0542 ; 17.17
##### National Category

Other Computer and Information Science
##### Identifiers

urn:nbn:se:umu:diva-154646 (URN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt359",{id:"formSmash:j_idt184:2:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt365",{id:"formSmash:j_idt184:2:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_2_j_idt188_j_idt371",{id:"formSmash:j_idt184:2:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_2_j_idt188_j_idt371",multiple:true});
#####

Available from: 2018-12-21 Created: 2018-12-21 Last updated: 2018-12-21

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Two lower triangular or two upper triangular matrices of the same size can be stored with minimal memory footprint. If both positive and negative strides are used, then both matrices can be accessed as if they were stored in regular column-major format.

Open this publication in new window or tab >>Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints### Kjelgaard Mikkelsen, Carl Christian

### Alastruey-Benede, Jesus

### Ibanez-Marin, Pablo

### Garcia Risueno, Pablo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_some",{id:"formSmash:j_idt184:3:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_otherAuthors",{id:"formSmash:j_idt184:3:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_otherAuthors",multiple:true}); 2016 (English)In: Parallel Processing and Applied Mathematics, PPAM 2015, Part I / [ed] Wyrzykowski, R Deelman, E Dongarra, J Karczewski, K Kitowski, J Wiatr, K, Cham: Springer International Publishing Switzerland , 2016, p. 160-171Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Cham: Springer International Publishing Switzerland, 2016
##### Series

Lecture Notes in Computer Science, ISSN 0302-9743 ; 9573
##### Keywords

Newton's method, Non-linear equations, Molecular dynamics, Constraints, SHAKE, RATTLE, LINCS, mpiled code approach, Vector level parallelism, Vectorizing compiler, SIMD
##### National Category

Computer Engineering
##### Identifiers

urn:nbn:se:umu:diva-135304 (URN)10.1007/978-3-319-32149-3_16 (DOI)000400134500016 ()978-3-319-32149-3 (ISBN)978-3-319-32148-6 (ISBN)
##### Conference

11th International Conference on Parallel Processing and Applied Mathematics (PPAM), SEP 06-09, 2015, Krakow, POLAND
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt359",{id:"formSmash:j_idt184:3:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt365",{id:"formSmash:j_idt184:3:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_3_j_idt188_j_idt371",{id:"formSmash:j_idt184:3:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_3_j_idt188_j_idt371",multiple:true});
#####

Available from: 2017-05-24 Created: 2017-05-24 Last updated: 2018-06-09Bibliographically approved

Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.

Molecular dynamics is used to study the time evolution of systems of atoms. It is common to constrain bond lengths in order to increase the time step of the simulation. Here we accelerate Newton's method for solving the constraint equations for a system consisting of many identical small molecules. Starting with a modular and generic base code using a sequential data layout, we apply three different optimization techniques. The compiled code approach is used to generate subroutines equivalent to a single step of Newton's method for a user specified molecule. Differing from the generic subroutines, these specific routines contain no loops and no indirect addressing. Interleaving the data describing different molecules generates vectorizable loops. Finally, we apply task fusion. The simultaneous application of all three techniques increases the speed of the base code by a factor of 15 for single precision calculations.

Open this publication in new window or tab >>Improving Perfect Parallelism### Karlsson, Lars

### Kjelgaard Mikkelsen, Carl Christian

### Kågström, Bo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_some",{id:"formSmash:j_idt184:4:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_otherAuthors",{id:"formSmash:j_idt184:4:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_otherAuthors",multiple:true}); 2014 (English)In: Parallel Processing and Applied Mathematics: 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013, Revised Selected Papers, Part I / [ed] Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Waśniewski, Springer Berlin/Heidelberg, 2014, Vol. 8384, p. 76-85Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer Berlin/Heidelberg, 2014
##### Series

Lecture Notes in Computer Science, ISSN 0302-9743
##### Keywords

Perfectly parallel problem, Resource contention, Forced parallelization
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:umu:diva-100794 (URN)10.1007/978-3-642-55224-3_8 (DOI)000349159200008 ()978-3-642-55224-3 (ISBN)978-3-642-55223-6 (ISBN)
##### Conference

10th International Conference on Parallel Processing and Applied Mathematics (PPAM), Warsaw, POLAND, SEP 08-11, 2013
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt359",{id:"formSmash:j_idt184:4:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt365",{id:"formSmash:j_idt184:4:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_4_j_idt188_j_idt371",{id:"formSmash:j_idt184:4:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_4_j_idt188_j_idt371",multiple:true});
#####

Available from: 2015-04-01 Created: 2015-03-09 Last updated: 2019-06-26Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

We reconsider the familiar problem of executing a perfectly parallel workload consisting of N independent tasks on a parallel computer with P << N processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix-vector multiplications, and sorting integer keys.

Open this publication in new window or tab >>Approximate incomplete cyclic reduction for systems which are tridiagonal and strictly diagonally dominant by rows### Kjelgaard Mikkelsen, Carl Christian

### Kågström, Bo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_some",{id:"formSmash:j_idt184:5:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_otherAuthors",{id:"formSmash:j_idt184:5:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_otherAuthors",multiple:true}); 2013 (English)In: Applied Parallel and Scientific Computing: 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers / [ed] Pekka Manninen and Per Öster, Springer Berlin/Heidelberg, 2013, p. 250-264Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer Berlin/Heidelberg, 2013
##### Series

Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 7782
##### Keywords

approximate incomplete cyclic reduction, Narrow banded, strictly and evenly diagonally dominant linear systems
##### National Category

Discrete Mathematics Computer and Information Sciences
##### Identifiers

urn:nbn:se:umu:diva-83245 (URN)10.1007/978-3-642-36803-5_18 (DOI)000343867800018 ()978-3-642-36802-8 (ISBN)
##### Conference

11th International Conference on Applied Parallel and Scientific Computing, PARA 2012, Helsinki, Finland, June 10-13, 2012,
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt359",{id:"formSmash:j_idt184:5:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt365",{id:"formSmash:j_idt184:5:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_5_j_idt188_j_idt371",{id:"formSmash:j_idt184:5:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_5_j_idt188_j_idt371",multiple:true});
#####

Available from: 2013-11-21 Created: 2013-11-21 Last updated: 2019-05-09Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

Systems which are narrow banded and strictly diagonally dominant by rows can be solved in parallel using a variety of methods including incomplete block cyclic reduction. We show how to accelerate the algorithm by approximating the very first step. We derive tight estimates for the forward error and explain why our procedure is suitable for linear systems obtained by discretizing some common parabolic PDEs. An improved ScaLAPACK style algorithm is presented together with strong scalability results.

Open this publication in new window or tab >>Incomplete cyclic reduction of banded and strictly diagonally dominant linear systems### Kjelgaard Mikkelsen, Carl Christian

### Kågström, Bo

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_some",{id:"formSmash:j_idt184:6:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_otherAuthors",{id:"formSmash:j_idt184:6:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_otherAuthors",multiple:true}); 2012 (English)In: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I / [ed] Wyrzykowski, R., Springer, 2012, p. 80-91Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer, 2012
##### Series

Lecture Notes in Computer Science, ISSN 0302-9743 ; 7203
##### Keywords

Banded or block tridiagonal linear systems, strict diagonal dominance, incomplete cyclic reduction, ScaLAPACK
##### National Category

Computer Sciences
##### Research subject

Computing Science
##### Identifiers

urn:nbn:se:umu:diva-50786 (URN)10.1007/978-3-642-31464-3_9 (DOI)000313192800009 ()978-3-642-31463-6 (ISBN)
##### Conference

9th International Conference on Parallel Processing and Applied Mathematics, SEP 11-14, 2011, Torun, POLAND
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt359",{id:"formSmash:j_idt184:6:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt365",{id:"formSmash:j_idt184:6:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_6_j_idt188_j_idt371",{id:"formSmash:j_idt184:6:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_6_j_idt188_j_idt371",multiple:true});
#####

Available from: 2012-02-20 Created: 2011-12-21 Last updated: 2018-06-08Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Umeå University, Faculty of Science and Technology, Department of Computing Science.

The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller’s analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high order approximations of the Laplace operator.

Open this publication in new window or tab >>Parallel solution of narrow banded diagonally dominant linear systems### Kjelgaard Mikkelsen, Carl Christian

### Kågström, Bo

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_7_j_idt188_some",{id:"formSmash:j_idt184:7:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_7_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_7_j_idt188_otherAuthors",{id:"formSmash:j_idt184:7:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_7_j_idt188_otherAuthors",multiple:true}); 2012 (English)In: Applied Parallel and Scientific Computing, Pt II / [ed] Kristján Jónasson, Springer Berlin/Heidelberg, 2012, Vol. 7134, p. 280-290Conference paper, Published paper (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer Berlin/Heidelberg, 2012
##### Series

Lecture notes in computer science, ISSN 0302-9743 ; 7134
##### Keywords

Narrow banded, diagonally dominant linear systems, block cyclic reduction, parallel algorithms, ScaLAPACK
##### National Category

Computer Sciences
##### Identifiers

urn:nbn:se:umu:diva-51059 (URN)10.1007/978-3-642-28145-7_28 (DOI)000309716000028 ()978-3-642-28144-0 (ISBN)978-3-642-28145-7 (ISBN)
##### Conference

10th Nordic International Conference on Applied Parallel Computing - State of the Art in Scientific and Parallel Computing (PARA), JUN 06-09, 2010, Univ Iceland, Sch Engn & Nat Sci, Reykjavik, ICELAND
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_7_j_idt188_j_idt359",{id:"formSmash:j_idt184:7:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_7_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_7_j_idt188_j_idt365",{id:"formSmash:j_idt184:7:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_7_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_7_j_idt188_j_idt371",{id:"formSmash:j_idt184:7:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_7_j_idt188_j_idt371",multiple:true});
#####

Available from: 2012-02-16 Created: 2012-01-09 Last updated: 2018-06-08Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).

ScaLAPACK contains a pair of routines for solving systems which are narrow banded and diagonally dominant by rows. Mathematically, the algorithm is block cyclic reduction. The ScaLAPACK implementation can be improved using incomplete, rather than complete block cyclic reduction. If the matrix is strictly dominant by rows, then the truncation error can be bounded directly in terms of the dominance factor and the size of the partitions. Our analysis includes new results applicable in our ongoing work of developing an efficient parallel solver.

Open this publication in new window or tab >>The explicit Spike algorithm: Iterative solution of the reduced system### Kjelgaard Mikkelsen, Carl Christian

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_8_j_idt188_some",{id:"formSmash:j_idt184:8:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_8_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_8_j_idt188_otherAuthors",{id:"formSmash:j_idt184:8:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_8_j_idt188_otherAuthors",multiple:true}); 2012 (English)In: High-performance scientific computing: algorithms and applications / [ed] Berry, M.W.; Gallivan, K.A.; Gallopoulos, E.; Grama, A.; Philippe, B.; Saad, Y.; Saied, F., London: Springer, 2012, p. 147-156Chapter in book (Refereed)
##### Abstract [en]

##### Place, publisher, year, edition, pages

London: Springer, 2012
##### Keywords

Narrow banded and diagonally dominant linear systems
##### National Category

Computer Sciences
##### Research subject

Computing Science
##### Identifiers

urn:nbn:se:umu:diva-50785 (URN)978-1-4471-2436-8 (ISBN)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_8_j_idt188_j_idt359",{id:"formSmash:j_idt184:8:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_8_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_8_j_idt188_j_idt365",{id:"formSmash:j_idt184:8:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_8_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_8_j_idt188_j_idt371",{id:"formSmash:j_idt184:8:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_8_j_idt188_j_idt371",multiple:true});
#####

Available from: 2012-02-16 Created: 2011-12-21 Last updated: 2018-06-08Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science.

The explicit Spike algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.

Open this publication in new window or tab >>Retracing the residual curve of a Lyapunov equation solver### Kjelgaard Mikkelsen, Carl Christian

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_9_j_idt188_some",{id:"formSmash:j_idt184:9:j_idt188:some",widgetVar:"widget_formSmash_j_idt184_9_j_idt188_some",multiple:true}); PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_9_j_idt188_otherAuthors",{id:"formSmash:j_idt184:9:j_idt188:otherAuthors",widgetVar:"widget_formSmash_j_idt184_9_j_idt188_otherAuthors",multiple:true}); 2011 (English)In: BIT Numerical Mathematics, ISSN 0006-3835, E-ISSN 1572-9125, Vol. 51, no 4, p. 959-975Article in journal (Refereed) Published
##### Abstract [en]

##### Place, publisher, year, edition, pages

Springer, 2011
##### Keywords

Lyapunov matrix equations, the extended Krylov subspace method
##### National Category

Computational Mathematics
##### Research subject

Mathematics
##### Identifiers

urn:nbn:se:umu:diva-50738 (URN)10.1007/s10543-011-0323-7 (DOI)
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_9_j_idt188_j_idt359",{id:"formSmash:j_idt184:9:j_idt188:j_idt359",widgetVar:"widget_formSmash_j_idt184_9_j_idt188_j_idt359",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_9_j_idt188_j_idt365",{id:"formSmash:j_idt184:9:j_idt188:j_idt365",widgetVar:"widget_formSmash_j_idt184_9_j_idt188_j_idt365",multiple:true});
#####

PrimeFaces.cw("AccordionPanel","widget_formSmash_j_idt184_9_j_idt188_j_idt371",{id:"formSmash:j_idt184:9:j_idt188:j_idt371",widgetVar:"widget_formSmash_j_idt184_9_j_idt188_j_idt371",multiple:true});
#####

Available from: 2012-02-15 Created: 2011-12-20 Last updated: 2018-06-08Bibliographically approved

Umeå University, Faculty of Science and Technology, Department of Computing Science.

Let A ∈ Rn×n and let B ∈ Rn×p and consider the Lyapunov matrix equation AX + XAT + BBT = 0. If A + AT < 0, then the extended Krylov subspacemethod (EKSM) can be used to compute a sequence of low rank approximations of X. In this paper we show how to construct a symmetric negative definite matrix A and a column vector B, for which the EKSM generates a predetermined residual curve.