Designing a Speech Recognition-Named Entity Recognition Pipeline for Amharic within a Cross-Media Analysis Framework
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
(English) Manuscript (preprint) (Other academic)
Abstract [en]

One of the major challenges inherent in cross-media analysis frameworks is handling multilingual content effectively. As a result, many languages remain under-resourced and cannot take advantage of available media analysis solutions. Although Amharic is spoken by over 22 million people and the amount of Amharic digital content of various types on the web is ever increasing, querying that content, especially audio and video, with a simple keyword search is very hard because it exists in raw format. We introduce a textual and spoken content processing workflow for Amharic into a cross-media analysis framework. We design an automatic speech recognition (ASR)-named entity recognition (NER) pipeline with three main components: ASR, a transliterator, and NER. We explored and applied three modeling techniques used for speech signal analysis, namely Gaussian Mixture Models (GMM), Deep Neural Networks (DNN), and Subspace Gaussian Mixture Models (SGMM). The models were evaluated on the same 6,203-word test set using the Word Error Rate (WER) metric, scoring 50.88%, 38.72%, and 46.25% for GMM, DNN, and SGMM, respectively. We also developed an OpenNLP-based NER model, though it was trained on very limited data. Whereas the NER model was trained on the transliterated form of Amharic text, the ASR was trained on the actual Amharic script. Thus, to interface between ASR and NER, we implemented a simple rule-based transliteration program that converts Amharic script to its corresponding English transliteration.
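For readers unfamiliar with the metric named in the abstract, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name is illustrative, and it is not the authors' evaluation tooling, which this record does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, and a lower value indicates a better recognizer.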

Keywords [en]
Cross-media analysis, named entity recognition, Amharic, speech recognition
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-147639; OAI: oai:DiVA.org:umu-147639; DiVA, id: diva2:1205118
Available from: 2018-05-10 Created: 2018-05-10 Last updated: 2018-06-09
In thesis
1. Natural language processing in cross-media analysis
2018 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

A cross-media analysis framework is an integrated multi-modal platform in which a media resource containing different types of data, such as text, images, audio, and video, is analyzed by metadata extractors that work jointly to contextualize the resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. For online content providers, such services make it possible to semantically enhance a media resource with extracted metadata that represents its hidden meanings, and to make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures make up a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis, and automatic speech recognition.

Because NLP tools and techniques were originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that could provide important clues for text analysis components. Thus, there is a need to develop techniques that meet the requirements and design principles of the frameworks.

In this thesis, we explore the development of methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline for Amharic that includes speech recognition, transliteration, and named entity recognition, which also makes Amharic content on the web more efficiently accessible. The method can potentially be extended to support other under-resourced languages.
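The three-stage pipeline mentioned above (speech recognition, then transliteration, then named entity recognition) can be pictured as a simple function composition. The sketch below is only an illustration under stated assumptions: the ASR and NER stages are passed in as plain callables, the names `transliterate`, `run_pipeline`, and `TOY_TABLE` are hypothetical, and the three-character mapping is a toy stand-in rather than the rule set used in the thesis.

```python
from typing import Callable, Dict, List

# Hypothetical, deliberately tiny mapping for illustration only; a real
# rule-based transliterator would cover the full Ethiopic syllabary.
TOY_TABLE: Dict[str, str] = {"ሰ": "se", "ላ": "la", "ም": "m"}


def transliterate(text: str, table: Dict[str, str] = TOY_TABLE) -> str:
    """Replace each mapped character; pass unmapped characters through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def run_pipeline(
    audio: bytes,
    asr: Callable[[bytes], str],
    ner: Callable[[str], List[str]],
    table: Dict[str, str] = TOY_TABLE,
) -> List[str]:
    """Chain the stages: audio -> Amharic-script transcript -> Latin text -> entities."""
    transcript = asr(audio)                   # ASR emits Amharic script
    latin = transliterate(transcript, table)  # bridge between the two models
    return ner(latin)                         # NER consumes transliterated text
```

Keeping the transliterator as a separate stage means either model could be retrained or replaced without touching the other, which is one plausible reason for interfacing them this way.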

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå University, 2018. p. 22
Series
Report / UMINF, ISSN 0348-0542 ; 18.06
Keywords
NLP, cross-media analysis, sentiment analysis, competence analysis, speech recognition, Amharic, named entity recognition, machine learning, computational linguistics
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-147640 (URN); 978-91-7601-885-9 (ISBN)
Presentation
2018-06-07, N450, Umeå University, Umeå, 15:00 (English)
Available from: 2018-05-30 Created: 2018-05-10 Last updated: 2018-06-09 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Woldemariam, Yonas; Dahlgren, Adam
