Efficient Reduction from Block Hessenberg Form to Hessenberg Form Using Shared Memory
2012 (English). In: Applied Parallel and Scientific Computing: Part II, 2012, 258-268 p. Conference paper (Refereed)
A new cache-efficient algorithm for reduction from block Hessenberg form to Hessenberg form is presented and evaluated. The algorithm targets parallel computers with shared memory. One level of look-ahead, combined with a dynamic load-balancing scheme, significantly reduces idle time and allows the use of coarse-grained tasks, which in turn enable high-performance computations on each processor/core. Speedups close to 13 over the sequential unblocked algorithm have been observed on a dual quad-core machine using one thread per core.
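To make "reduction to Hessenberg form" concrete, here is a minimal unblocked sketch in plain Python (no NumPy, so it stays self-contained). It is not the paper's algorithm. It assumes that "block Hessenberg" means a matrix with zeros below its b-th subdiagonal (A[i][k] = 0 whenever i > k + b), and it reduces such a matrix with one Householder reflector per column, the textbook approach. The helper names `hessenberg_reduce`, `block_hessenberg`, and `is_upper_hessenberg` are hypothetical.

```python
import math
import random


def hessenberg_reduce(A):
    """Reduce a square matrix A (list of lists of floats) to upper Hessenberg
    form H = Q^T A Q with one Householder reflector per column (unblocked).
    A is overwritten and returned; Q is not formed."""
    n = len(A)
    for j in range(n - 2):
        # Entries of column j from the subdiagonal down.
        x = [A[i][j] for i in range(j + 1, n)]
        if all(t == 0.0 for t in x[1:]):
            continue  # nothing below the subdiagonal: column already reduced
        alpha = math.sqrt(sum(t * t for t in x))
        # Householder vector v = x + sign(x0)*||x||*e1; the sign choice avoids
        # cancellation. P = I - 2 v v^T / (v^T v) maps x to -sign(x0)*||x||*e1.
        v = x[:]
        v[0] += math.copysign(alpha, x[0])
        vtv = sum(t * t for t in v)
        m = len(v)
        # Store the image of column j exactly instead of round-off residue.
        A[j + 1][j] = -math.copysign(alpha, x[0])
        for i in range(j + 2, n):
            A[i][j] = 0.0
        # Left update P*A on rows j+1..n-1, columns j+1..n-1
        # (columns 0..j-1 are already zero in these rows).
        for k in range(j + 1, n):
            f = 2.0 * sum(v[r] * A[j + 1 + r][k] for r in range(m)) / vtv
            for r in range(m):
                A[j + 1 + r][k] -= f * v[r]
        # Right update A*P on columns j+1..n-1 of every row.
        for i in range(n):
            f = 2.0 * sum(A[i][j + 1 + r] * v[r] for r in range(m)) / vtv
            for r in range(m):
                A[i][j + 1 + r] -= f * v[r]
    return A


def block_hessenberg(n, b, seed=0):
    """Hypothetical test matrix with A[i][k] == 0 whenever i > k + b
    (assumes 'block Hessenberg' means lower bandwidth b)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) if i <= k + b else 0.0 for k in range(n)]
            for i in range(n)]


def is_upper_hessenberg(H):
    """True if every entry below the first subdiagonal is exactly zero."""
    n = len(H)
    return all(H[i][k] == 0.0 for i in range(n) for k in range(n) if i > k + 1)


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


if __name__ == "__main__":
    A = block_hessenberg(8, 3)
    H = hessenberg_reduce([row[:] for row in A])
    print(is_upper_hessenberg(H))  # True
    print(abs(trace(A) - trace(H)) < 1e-12)
```

Each step is an orthogonal similarity transformation, so the trace, the Frobenius norm, and the eigenvalues of A are preserved; the final check relies on this. The sketch ignores the zero structure of the block Hessenberg input and applies every reflector to the full trailing rows and columns, so it is closer in spirit to a sequential unblocked baseline than to the cache-efficient, look-ahead, dynamically load-balanced parallel algorithm the abstract describes.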
Place, publisher, year, edition, pages
2012. 258-268 p.
Series: Lecture Notes in Computer Science, 7134
Keywords: Hessenberg reduction, block Hessenberg form, parallel algorithm, dynamic load-balancing, blocked algorithm, high performance
Identifiers: URN: urn:nbn:se:umu:diva-61578; ISI: 000309716000026; ISBN: 978-3-642-28144-0; OAI: oai:DiVA.org:umu-61578; DiVA: diva2:572328
Conference: 10th Nordic International Conference on Applied Parallel Computing: State of the Art in Scientific and Parallel Computing (PARA), June 6-9, 2010, Reykjavik, Iceland