MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks
2022 (English). In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 133, article id 102778. Article in journal (Refereed), Published
Abstract [en]
The compression of deep learning models is of fundamental importance in deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and application. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers hardware performance and inference error as objectives for mixed-precision quantization. The proposed method makes the evaluation of candidate solutions in a large search space feasible by relying on two steps. First, post-training quantization is applied for fast solution evaluation (inference-only search). Second, we propose the "beacon-based search", which retrains only selected solutions and uses them as beacons to estimate the effect of retraining on other solutions. We evaluate the method on speech recognition models using the TIMIT dataset. Experimental evaluations show that Simple Recurrent Unit (SRU)-based models can be compressed up to 8x by post-training quantization without any significant error increase. On SiLago, we found solutions that achieve 97% and 86% of the maximum possible speedup and energy saving, respectively, with a minor increase in error on an SRU-based model. On Bitfusion, the beacon-based search reduced the error increase of the inference-only search on SRU-based models and a Light Gated Recurrent Unit (LiGRU)-based model by up to 4.9 and 3.9 percentage points, respectively.
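The inference-only step described in the abstract can be illustrated with a minimal sketch. It is not the MOHAQ implementation: the two-layer weights, the bit-width choices, the mean-squared-error objective, and the total-weight-bits cost proxy are all assumptions made for illustration, and the candidates are enumerated exhaustively rather than searched with a genetic algorithm. The sketch quantizes each layer after training, scores every mixed-precision assignment on two objectives, and keeps the non-dominated (Pareto-optimal) ones.

```python
from itertools import product

def quantize(weights, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) * scale for w in weights]

def quantization_error(weights, bits):
    """Mean squared error introduced by quantizing `weights`."""
    quantized = quantize(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)

def pareto_front(candidates):
    """Keep candidates not dominated in (error, cost); both are minimized."""
    front = []
    for c in candidates:
        dominated = any(
            o["error"] <= c["error"] and o["cost"] <= c["cost"]
            and (o["error"] < c["error"] or o["cost"] < c["cost"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical two-layer model and bit-width choices (illustrative only).
layers = [[0.9, -0.31, 0.07, 0.55], [-1.2, 0.44, 0.02, -0.68]]
bit_choices = (2, 4, 8)

candidates = []
for bits in product(bit_choices, repeat=len(layers)):
    # Objective 1: summed quantization error, a stand-in for inference error.
    error = sum(quantization_error(w, b) for w, b in zip(layers, bits))
    # Objective 2: total weight bits, a stand-in for a hardware cost model.
    cost = sum(len(w) * b for w, b in zip(layers, bits))
    candidates.append({"bits": bits, "error": error, "cost": cost})

front = pareto_front(candidates)
for c in sorted(front, key=lambda c: c["cost"]):
    print(c["bits"], round(c["error"], 5), c["cost"])
```

In a hardware-aware setting, the cost function would be replaced by a model of the target platform (for example speedup or energy on a specific accelerator), and the error would come from evaluating the quantized network on validation data.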
Place, publisher, year, edition, pages
Elsevier, 2022. Vol. 133, article id 102778
Keywords [en]
Genetic algorithms, Light gated recurrent unit, Multi-objective optimization, Quantization, Simple recurrent unit
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-201360; DOI: 10.1016/j.sysarc.2022.102778; ISI: 000892114100006; Scopus ID: 2-s2.0-85141919627; OAI: oai:DiVA.org:umu-201360; DiVA, id: diva2:1716556
Funder
Vinnova, 2018-05001
Available from: 2022-12-06. Created: 2022-12-06. Last updated: 2023-09-05. Bibliographically approved