Effect of Feature Extraction when Classifying Emotions in Speech - An Applied Study
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2018 (English). Independent thesis, basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

The demand for machines that can interact with their users through speech is growing. For example, four of the world's largest IT companies, Amazon, Apple, Google and Microsoft, are developing intelligent personal assistants that are able to communicate through speech. In this thesis, we have investigated the effect of feature extraction when classifying emotions in speech, using a convolutional neural network (CNN). We used the software openSMILE to extract two sets of features, and one set of raw data, from recorded audio, and compared the CNN's classification accuracy on these sets with eight, five and three classes of emotions. To keep the comparison between feature sets fair, we used a single CNN architecture, implemented in Keras and developed through an experimental approach. The feature set that gave the highest accuracy reached 39 % when classifying eight emotions (random guessing would yield around 12.5 % accuracy on average), 53 % with five emotions (compared to around 20 % if just guessing), and 69 % with three emotions (compared to around 33 % if just guessing). This set also performed best at distinguishing individual emotions from each other. The results show that feature extraction improves accuracy, but that more features do not necessarily mean higher accuracy. While the classification accuracies in this study may seem low, it is important to remember that even humans can find it hard to distinguish feelings based on just the pitch of other people's voices.

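The abstract names the tools (openSMILE for feature extraction, Keras for the CNN) but not the network architecture itself. As a rough illustration of the kind of pipeline described, the sketch below trains a small 1-D convolutional classifier in Keras on fixed-length feature vectors of the sort openSMILE produces. The feature-vector length, every layer size, and the training data are illustrative assumptions, not values from the thesis.

# Minimal sketch, not the thesis's actual architecture: a 1-D CNN in
# Keras classifying utterance-level feature vectors into emotion classes.
# NUM_FEATURES and all layer sizes are assumed for illustration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 384  # hypothetical length of one openSMILE feature vector
NUM_CLASSES = 8     # eight emotion classes, the thesis's largest setup

model = keras.Sequential([
    layers.Input(shape=(NUM_FEATURES, 1)),            # one vector per utterance
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per emotion
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for openSMILE output: X holds one feature
# vector per recording, y holds integer emotion labels 0..7.
X = np.random.rand(200, NUM_FEATURES, 1).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=200)
model.fit(X, y, epochs=5, batch_size=16, validation_split=0.2)

Comparing feature sets, as the thesis does, would then amount to refitting this same fixed architecture on a different X (another feature set, or the raw audio data) and comparing the resulting validation accuracies.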
Place, publisher, year, edition, pages
2018, p. 30
Series
UMNAD ; 1174
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-156139
OAI: oai:DiVA.org:umu-156139
DiVA id: diva2:1286114
External cooperation
Acino
Educational program
Bachelor of Science Programme in Computing Science
Available from: 2019-02-06. Created: 2019-02-06. Last updated: 2019-02-06. Bibliographically approved.

Open Access in DiVA

fulltext (596 kB)
File information
File name: FULLTEXT01.pdf
File size: 596 kB
Checksum (SHA-512):
f955ff1974d682263174eaec0ac5a117d14f66923803a23052498a0abec5d3ce39055cff3c1db5e8a9867385e73c0b7ba29655d3bee134ef8087fdedc763d11e
Type: fulltext
Mimetype: application/pdf
