Vocal activity detection and speaker diarization in speech databases: a feasibility study
Umeå University, Faculty of Medicine, Department of Clinical Sciences, Speech and Language Therapy. Umeå University, Faculty of Arts, Humlab. ORCID iD: 0000-0003-3373-0934
2022 (English). Conference paper, oral presentation only (Other academic)
Abstract [en]

Creating speech corpora for phonetic research is time-consuming, and the work could be eased by automatic algorithms that provide a draft indexing of speech acts. The present investigation assessed the feasibility of applying speech segmentation and speaker diarization models across a collection of recordings to produce a draft indexing that speech management systems could use to help the researcher navigate a corpus. The results show that a readily available model for speech segmentation is very likely to contribute to the effectiveness of speech annotation workflows in phonetic research. Speaker diarization models may require specific training to achieve consistent speaker separation across a speech corpus, and the evaluated model currently offers no clear advantage for the effectiveness of a speech corpus creation process.
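
To make the idea of a "draft indexing" concrete, here is a minimal sketch in Python. It does not reproduce the models evaluated in the study: it swaps in a plain frame-energy threshold as a stand-in for a trained speech segmentation model, and it does no speaker diarization at all. The function name `draft_speech_index`, its parameters, and the default values are hypothetical and chosen only for illustration.

```python
import math


def draft_speech_index(samples, sample_rate, frame_ms=20, threshold=0.1):
    """Return (start_s, end_s) spans whose frame RMS exceeds `threshold`.

    Assumption: a simple energy threshold stands in for a trained speech
    segmentation model. `samples` is a sequence of floats in [-1.0, 1.0].
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")

    segments = []
    start = None  # start time of the span currently open, if any
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = i * frame_len / sample_rate  # start time of this frame in seconds
        if rms > threshold and start is None:
            start = t                     # activity begins: open a span
        elif rms <= threshold and start is not None:
            segments.append((start, t))   # activity ends: close the span
            start = None

    if start is not None:
        # Recording ended while a span was still open.
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


# Usage: 0.1 s of silence, 0.2 s of a loud alternating signal, 0.1 s of silence.
signal = [0.0] * 100 + [0.5 if i % 2 == 0 else -0.5 for i in range(200)] + [0.0] * 100
print(draft_speech_index(signal, sample_rate=1000))  # [(0.1, 0.3)]
```

The resulting list of time spans is the kind of draft index a researcher could load into an annotation tool and then correct by hand. Assigning each span to a consistent speaker across recordings is the harder step, which the abstract reports as not yet reliable.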

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022.
National Category
General Language Studies and Linguistics
Research subject
Linguistics; data science; computational linguistics
Identifiers
URN: urn:nbn:se:umu:diva-198239
OAI: oai:DiVA.org:umu-198239
DiVA, id: diva2:1684242
Conference
Fonetik 2022, the 33rd Swedish Phonetics Conference, Stockholm, Sweden, June 13-15, 2022
Available from: 2022-07-22. Created: 2022-07-22. Last updated: 2024-04-15. Bibliographically approved.

Open Access in DiVA

fulltext (369 kB), 31 downloads
File information
File name: FULLTEXT02.pdf
File size: 369 kB
Checksum (SHA-512): 213e14847d04999220323b72b196ed49697730a9332fc5efe4065bbf72e9988c99e417900e6bd7784fe114c9d2314d11b19e5a45ce3ace44f7f69fbeaafb038c
Type: fulltext
Mimetype: application/pdf

Authority records

Karlsson, Fredrik
