Blocked in-place transposition with application to storage format conversion
2009 (English) Report (Other academic)
We develop a prototype library for in-place (dense) matrix storage format conversion between the canonical row-major and column-major formats and the four canonical block data layouts. Many of the fastest linear algebra routines operate on matrices in a block data layout. In-place storage format conversion enables support for input/output of large matrices in the canonical row-major and column-major formats. The library uses algorithms associated with in-place transposition as building blocks. We survey previous work on (in-place) transposition, and implement and evaluate the most promising algorithms. Our results indicate that the Three-Stage Algorithm, which requires only a small constant amount of additional memory, performs well and is easy to tune. Murray Dow’s V5 algorithm, a two-stage semi-in-place algorithm that requires a small amount of additional memory, is sometimes a better choice. The write-allocate strategy of most cache-based computer architectures appears to be the cause of an observed performance problem for large matrices.
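To make the notion of in-place transposition concrete, here is a minimal Python sketch of the classical cycle-following approach for a row-major matrix held in a flat list. It is not the Three-Stage Algorithm or Dow's V5 from the report, and it is not the library's interface; the function name `transpose_in_place` and its signature are assumptions chosen for illustration. The sketch relies on the standard identity that the element at flat index k of an m-by-n row-major matrix moves to index (k * m) mod (m * n - 1) under transposition, with the last element staying in place.

```python
def transpose_in_place(a, m, n):
    """Transpose an m-by-n row-major matrix stored in the flat list `a`.

    Illustrative sketch only (hypothetical name and signature).
    Uses O(1) extra memory by following permutation cycles.
    """
    size = m * n
    if len(a) != size:
        raise ValueError("buffer length must equal m * n")
    if size < 2:
        return a
    last = size - 1

    def dest(k):
        # Index that the element currently at k must move to.
        return (k * m) % last

    # Indices 0 and `last` are fixed points, so they are skipped.
    for start in range(1, last):
        # Handle each cycle once: only when `start` is its smallest index.
        k = dest(start)
        while k > start:
            k = dest(k)
        if k < start:
            continue
        # Rotate the cycle, carrying one element at a time.
        carried = a[start]
        k = dest(start)
        while k != start:
            a[k], carried = carried, a[k]
            k = dest(k)
        a[start] = carried
    return a


# A 2-by-3 matrix [[1, 2, 3], [4, 5, 6]] becomes the 3-by-2 matrix
# [[1, 4], [2, 5], [3, 6]].
print(transpose_in_place([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```

The leader test re-walks each cycle instead of keeping a visited bitmap, which trades extra index arithmetic for constant additional memory. The memory accesses of such a simple scheme are scattered across the buffer, which is one reason blocked and multi-stage algorithms like those evaluated in the report are of interest.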
Place, publisher, year, edition, pages
Umeå: Institutionen för datavetenskap, Umeå universitet, 2009, 29 p.
Report / UMINF, ISSN 0348-0542 ; 09.01
Identifiers
URN: urn:nbn:se:umu:diva-41217
OAI: oai:DiVA.org:umu-41217
DiVA: diva2:405035