That voice sounds familiar: factors in speaker recognition
Umeå University, Faculty of Arts, Philosophy and Linguistics.
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Humans have the ability to recognize other humans by voice alone. This is important both socially and for the robustness of speech perception. This thesis contains a set of eight studies that investigate how different factors affect speaker recognition and how these factors can help explain how listeners perceive and evaluate speaker identity. The first study is a review of emotion decoding and encoding research. The second study compares the relative importance of the emotional tone in the voice and the emotional content of the message. A mismatch between these was shown to affect decoding speed. The third study investigates the factor dialect in speaker recognition and shows, using a bidialectal speaker as the target voice to control all other variables, that the dominance of dialect cannot be overcome. The fourth paper investigates whether imitated stage dialects are as perceptually dominant as natural dialects. It was found that a professional actor could disguise his voice successfully by imitating a dialect, yet that a listener's proficiency in a language or accent can reduce susceptibility to a dialect imitation. Papers five to seven focus on automatic techniques for speaker separation. Paper five shows that a method developed for Australian English diphthongs produced comparable results with a Swedish glide + vowel transition. The sixth and seventh papers investigate a speaker separation technique developed for American English. It was found that the technique could be used to separate Swedish speakers and that it is robust against professional imitations. Paper eight investigates how age and hearing affect earwitness reliability. This study shows that a senior citizen with corrected hearing can be as reliable an earwitness as a younger adult with no hearing problem, but suggests that a witness's general cognitive skill deterioration needs to be considered when assessing a senior citizen's earwitness evidence.
On the basis of the studies, a model of speaker recognition is presented, based on the face recognition model by Bruce and Young (1986; British Journal of Psychology, 77, pp. 305-327) and the voice recognition model by Belin, Fecteau and Bédard (2004; Trends in Cognitive Sciences, 8, pp. 129-134). The merged and modified model handles both familiar and unfamiliar voices. The findings presented in this thesis, in particular the findings of the individual papers in Part II, have implications for criminal cases in which speaker recognition forms a part. The findings feed directly into the growing body of forensic phonetic and forensic linguistic research.

Place, publisher, year, edition, pages
Umeå: Filosofi och lingvistik, 2007. 160 p.
Keyword [en]
speaker recognition, accent, emotions, hearing, spectral moments, formant transitions, dialect
National Category
Human Computer Interaction
Identifiers
URN: urn:nbn:se:umu:diva-1106
ISBN: 978-91-7264-311-6 (print)
OAI: oai:DiVA.org:umu-1106
DiVA: diva2:140217
Public defence
2007-05-24, Hörsal F, Humanisthuset, Umeå, 10:00
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2013-04-09. Bibliographically approved
List of papers
1. Emotions in speech: juristic implications
2007 (English). In: Speaker Classification: Volume I, Berlin: Springer Verlag, 2007. Chapter in book (Other academic)
Abstract [en]

This chapter focuses on the detection of emotion in speech and the impact that using technology to automate emotion detection would have within the legal system. The current states of the art for studies of perception and acoustics are described, and a number of implications for legal contexts are provided. We discuss, inter alia, assessment of emotion in others, witness credibility, forensic investigation, and training of law enforcement officers.

Place, publisher, year, edition, pages
Berlin: Springer Verlag, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 4343
Keyword
acoustic parameters, affect, emotion, emotional categories, forensic, juristic, speech
National Category
Specific Languages; General Language Studies and Linguistics; Language Technology (Computational Linguistics)
Research subject
Computing Science; Linguistics; Computational Linguistics; Psychology
Identifiers
urn:nbn:se:umu:diva-2277 (URN)
10.1007/978-3-540-74200-5_8 (DOI)
978-3-540-74186-2 (ISBN)
Projects
UDID - Umeå disguise and imitation database
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2011-03-02. Bibliographically approved
2. Acoustic impact on decoding of semantic emotions
2007 (English). In: Speaker classification II: selected projects / [ed] Christian Müller, Berlin: Springer, 2007, 57-69 p. Chapter in book (Other academic)
Abstract [en]

This paper examines the interaction between the emotion indicated by the content of an utterance and the emotion indicated by the acoustics of an utterance, and considers whether a speaker can hide their emotional state by acting an emotion while being semantically honest. Three female and two male speakers of Swedish were recorded saying the sentences “Jag har vunnit en miljon på lotto” (I have won a million on the lottery), “Det finns böcker i bokhyllan” (There are books on the bookshelf) and “Min mamma har just dött” (My mother just died) as if they were happy, neutral (indifferent), angry or sad. Thirty-nine experimental participants (19 female and 20 male) heard 60 randomly selected stimuli randomly coupled with the question “Do you consider this speaker to be emotionally X?”, where X could be angry, happy, neutral or sad. They were asked to respond yes or no; the listeners’ responses and reaction times were collected. The results show that semantic cues to emotion play little role in the decoding process. Only when there are few specific acoustic cues to an emotion do semantic cues come into play. However, longer reaction times for the stimuli containing mismatched acoustic and semantic cues indicate that the semantic cues to emotion are processed even if they have little effect on the perceived emotion.

Place, publisher, year, edition, pages
Berlin: Springer, 2007
Series
Lecture notes in computer science, ISSN 0302-9743 ; 4441
Keyword
Emotion identification, acoustic emotion, semantic emotion, perception, Swedish
Identifiers
urn:nbn:se:umu:diva-2278 (URN)
10.1007/978-3-540-74122-0 (DOI)
978-3-540-74121-3 (ISBN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2013-04-09
3. On the perceptual dominance of dialect
Manuscript (Other academic)
Identifiers
urn:nbn:se:umu:diva-2279 (URN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2010-01-13. Bibliographically approved
4. Dialect imitations in speaker recognition
2007 (English). In: Proceedings of the 2nd European IAFL conference on Forensic Linguistics / Language and the Law / [ed] Turell, M. Teresa; Spassova, Maria; Cicres, Jordi, Barcelona: Institut Universitari de Lingüística Aplicada. Universitat Pompeu Fabra; Documenta Universitaria, 2007. Conference paper, Poster (with or without abstract) (Other academic)
Place, publisher, year, edition, pages
Barcelona: Institut Universitari de Lingüística Aplicada. Universitat Pompeu Fabra; Documenta Universitaria, 2007
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:umu:diva-2280 (URN)
978-84-96742-28-4 (ISBN)
Conference
2nd European IAFL conference on Forensic Linguistics / Language and the Law, Barcelona, 14-16 September 2006
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2013-04-09
5. An investigation of the effectiveness of a Swedish glide + vowel segment for speaker discrimination
Manuscript (Other academic)
Identifiers
urn:nbn:se:umu:diva-2281 (URN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2010-01-13. Bibliographically approved
6. Cross-language speaker identification using spectral moments
2004 (English). In: Proceedings of the XVIIth Swedish Phonetics Conference FONETIK 2004, Stockholm University, 2004, 76-79 p. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Stockholm University, 2004
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:umu:diva-2282 (URN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2013-04-16. Bibliographically approved
7. Robustness of spectral moments: a study using voice imitations
2004. In: Proceedings of the 10th Australian International Conference on Speech Science and Technology, 2004, 259-264 p. Chapter in book (Other academic). Published
Identifiers
urn:nbn:se:umu:diva-2283 (URN)
Available from: 2007-05-03 Created: 2007-05-03. Bibliographically approved
8. Effects of age and age-related hearing loss on speaker recognition, or can senior citizens be reliable earwitnesses
Manuscript (Other academic)
Identifiers
urn:nbn:se:umu:diva-2284 (URN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2013-04-09
9. Dialect recognition in a noisy environment
Manuscript (Other academic)
Identifiers
urn:nbn:se:umu:diva-2285 (URN)
Available from: 2007-05-03 Created: 2007-05-03 Last updated: 2010-01-13. Bibliographically approved

By author/editor
Eriksson, Erik J.