Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms
2014 (Engelska)Ingår i: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, nr 7, s. 271-288Artikel i tidskrift (Refereegranskat) Published
Abstract [en]
The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.
Ort, förlag, år, upplaga, sidor
Elsevier, 2014. nr 7, s. 271-288
Nyckelord [en]
Fine-grained parallelism, Scalability, Load-balancing, Load-balance optimization, Auto-tuning
Nationell ämneskategori
Datavetenskap (datalogi)
Forskningsämne
administrativ databehandling
Identifikatorer
URN: urn:nbn:se:umu:diva-79742DOI: 10.1016/j.parco.2014.04.003ISI: 000339598400010Scopus ID: 2-s2.0-84903385019OAI: oai:DiVA.org:umu-79742DiVA, id: diva2:644471
Konferens
7th International Workshop on Parallel Matrix Algorithms and Applications, London, June 28-30, 2012
Anmärkning
Volume: 40 Issue: 7 Pages: 271-288 Special Issue: SI
2013-08-302013-08-302023-03-24Bibliografiskt granskad