Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models
School of Information Technology, Halmstad University, Halmstad, Sweden.
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics. ORCID iD: 0000-0002-0562-2082
School of Information Technology, Halmstad University, Halmstad, Sweden.
2022 (English). In: Information, E-ISSN 2078-2489, Vol. 13, no 4, article id 176. Article in journal (Refereed). Published
Abstract [en]

Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is growing interest in running RNNs on edge devices to support these applications. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization shrinks the size and computational needs of such models by reducing the precision of weights and activations. Further, the delta networks method increases the sparsity of activation vectors by exploiting the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in the error by skipping quantization of selected paths. In addition, we show that quantizing activation vectors in RNNs to integer precision leads to considerable sparsity if the delta networks method is applied. Then, we propose a method for increasing the sparsity in the activation vectors while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compressed the four models by more than 85%, with an error increase of 0.6, 0, 2.1, and 0.2 percentage points, respectively. By applying the delta networks method to the quantized models, more than 50% of the operations can be eliminated, in most cases with only a minor increase in the error. Comparing the four models under quantization and the delta networks method, we found that compressed LSTM-based models are the best solutions under low-error-rate constraints. The compressed SRU-based models are the smallest in size and are suitable when higher error rates are acceptable, and the compressed LiGRU-based models have the highest number of eliminated operations.
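
The two ideas the abstract combines, integer quantization of activations and delta-based elimination of repeated operations, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the symmetric linear quantizer, the threshold rule, and the synthetic input signal are all assumptions made for illustration. The sketch only shows why integer activations that change slowly over time produce many zero deltas, each of which would let the corresponding multiply-accumulate operations be skipped.

```python
import math


def scale_for(values, num_bits=8):
    """Pick a symmetric quantization scale so the largest magnitude maps to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Symmetric linear quantization of floats to signed integer codes."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def delta_step(current, reference, threshold=0):
    """One delta update on integer activations (simplified, hypothetical rule).

    An element is propagated only if it differs from the last propagated
    value by more than `threshold`; otherwise its delta is zero and the
    reference value is kept unchanged.
    """
    deltas, new_reference = [], []
    for x, ref in zip(current, reference):
        d = x - ref
        if abs(d) > threshold:
            deltas.append(d)
            new_reference.append(x)
        else:
            deltas.append(0)
            new_reference.append(ref)
    return deltas, new_reference


def delta_sparsity(sequence, threshold=0):
    """Fraction of zero deltas over a sequence of integer vectors."""
    reference = [0] * len(sequence[0])
    zeros = total = 0
    for vec in sequence:
        deltas, reference = delta_step(vec, reference, threshold)
        zeros += sum(1 for d in deltas if d == 0)
        total += len(deltas)
    return zeros / total


if __name__ == "__main__":
    # Synthetic, slowly varying activations (an assumption, not TIMIT data).
    steps, width = 50, 16
    floats = [[math.sin(0.02 * t + 0.3 * i) for i in range(width)]
              for t in range(steps)]
    # One fixed scale for the whole sequence so codes are comparable over time.
    scale = scale_for([v for vec in floats for v in vec], num_bits=4)
    ints = [quantize(vec, scale, num_bits=4) for vec in floats]
    for threshold in (0, 1):
        print(f"threshold={threshold}: "
              f"zero-delta fraction={delta_sparsity(ints, threshold):.2f}")
```

A single scale is used for the whole sequence so that integer codes from different time steps can be subtracted directly. Raising the threshold trades accuracy for more zero deltas, which mirrors the error-versus-eliminated-operations trade-off the abstract describes.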

Place, publisher, year, edition, pages
MDPI, 2022. Vol. 13, no 4, article id 176
Keywords [en]
delta networks, edge devices, quantization, recurrent neural network
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-194339
DOI: 10.3390/info13040176
ISI: 000786262400001
Scopus ID: 2-s2.0-85128393517
OAI: oai:DiVA.org:umu-194339
DiVA, id: diva2:1655922
Available from: 2022-05-04; Created: 2022-05-04; Last updated: 2022-05-04; Bibliographically approved

Open Access in DiVA

fulltext (3750 kB), 560 downloads
File information
File name: FULLTEXT01.pdf; File size: 3750 kB; Checksum (SHA-512):
bce9d0975e45533ab159094fc74571c8007c9ca20a337e64be699874c62809111bbdb17883bbd62824a9cc521729b52b0320865c7869c2d9c030a35551e81559
Type: fulltext; Mimetype: application/pdf

Authority records

Nordström, Tomas
