Live captioning and translation application for Android
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
2023 (English)
Independent thesis, Basic level (professional degree), 10 credits / 15 HE credits
Student thesis
Alternative title
Realtidsundertextning och översättning, en applikation för Android (Swedish)
Abstract [en]

Captioning has long been used in media to help D/deaf and hard-of-hearing persons. Captioning, however, is difficult and time-consuming manual work. With the rapid evolution of automated speech recognition (ASR) systems, live captioning of everyday speech will soon be a practical reality. A proof-of-concept Android application for use with a specific headset has been created using the built-in Android SpeechRecognizer, a free, open-source API (application programming interface) available for Android phones. Unlike many existing solutions, this application focuses on two major features: communication with in-situ microphones and hardware via Bluetooth, and long-duration speech recognition. Long-duration speech recognition was made possible by the segmented session mode of the SpeechRecognizer, which was recently added in API version 33 (March 2023). The results, while not complete, show promise for future development. Initial testing shows a word error rate (WER) of 8%, but further testing is required. Tests with noise also show that the system is surprisingly resistant to static noise. Development of the application will continue in the coming weeks. This project was financed by Hörselforskningsfonden in project FA21-0017 and was performed under the supervision of Amin Saremi.
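
The abstract reports a word error rate without saying how it was computed. As a rough illustration only, WER is commonly defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of words in the reference. The sketch below assumes that definition, lowercases both texts and splits on whitespace; the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion in 9 words.
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over lazy dog"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Under this definition, a WER of 8% corresponds to roughly eight word-level errors per hundred reference words. The value can exceed 100% when the hypothesis contains many inserted words.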

Abstract [sv] (English translation)

Captions have long been used in media as an aid for hard-of-hearing and deaf persons. Writing these captions unfortunately requires a great deal of difficult manual work. The rapid development of automatic speech recognition (ASR) systems, however, points to a future in which captions can be generated in real time for everyday situations. A prototype Android application has been created for a specific headset. The application uses Android's built-in SpeechRecognizer system, a free, open-source API (application programming interface) available for Android phones. Unlike many existing solutions, this application focuses on two specific main areas: in-situ microphones and hardware via Bluetooth, and long-duration speech recognition. Long-duration speech recognition was made possible by segmented sessions in the SpeechRecognizer system, which were added in API version 33 (March 2023). The results are not complete but are promising for further development. Simple preliminary testing shows a word error rate (WER) of approximately 8%, but further testing is needed. Additional tests with background noise also show that the system is surprisingly resistant to static noise. The application is promising and will continue to be developed in the coming weeks. This project was financed by Hörselforskningsfonden in project FA21-0017 and was carried out under the supervision of Amin Saremi.

Place, publisher, year, edition, pages
2023. p. 24
National Category
Computer Engineering; Communication Systems; Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-213997
OAI: oai:DiVA.org:umu-213997
DiVA, id: diva2:1793675
Subject / course
Electronics
Educational program
Bachelor of Science Programme in Electronic and Computer Engineering / Medical Engineering
Presentation
2023-06-01, TA304, Umeå, 14:45 (Swedish)
Available from: 2023-09-01
Created: 2023-09-01
Last updated: 2023-09-01
Bibliographically approved

Open Access in DiVA

Full text (41033 kB), 650 downloads
File information
File name: FULLTEXT01.pdf
File size: 41033 kB
Checksum (SHA-512): 0b374234b18f8d5be008c333466b245e20371c50ae789be3a25babc26f9cd30e8e4941b8955f53b4036e7a379b59411c9124a9a57136e22f3e4fe9c024f8a599
Type: fulltext
Mimetype: application/pdf

By author/editor
Hansson, Joel
By organisation
Department of Applied Physics and Electronics
Total: 651 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 312 hits