Predicting User Competence from Text
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Formal and Natural Language)
2017 (English). In: The 21st world multi-conference on systemics, cybernetics and informatics: proceedings: volume 1 / [ed] Nagib Callaos, Belkis Sánches, Michael Savoie, Andrés Tremante, International Institute of Informatics and Systemics, 2017, p. 147-152. Conference paper, Published paper (Refereed)
Abstract [en]

We explore the possibility of learning user competence from text by using natural language processing and machine learning (ML) methods. In our context, competence is defined as the ability to identify the wildlife appearing in images and to classify it into species correctly. We evaluate and compare the performance (in terms of accuracy and F-measure) of three ML methods, Naive Bayes (NB), Decision Trees (DT) and K-nearest neighbors (KNN), applied to a text corpus obtained from Snapshot Serengeti discussion forum posts. The baseline results show that, regarding accuracy, DT outperforms NB and KNN by 16.00% and 15.00%, respectively. Regarding F-measure, KNN outperforms NB and DT by 12.08% and 1.17%, respectively. We also propose a hybrid model that combines the three models (DT, NB and KNN). We improve the baseline results with a calibration technique and additional features. Adding a bi-gram feature yields a dramatic increase in accuracy for the NB model (from 48.38% to 64.40%). We push the accuracy limit of the baseline models from 93.39% to 94.09%.
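
As an illustration of the quantities named in the abstract, the following minimal Python sketch shows how unigram and bi-gram counts can be extracted from a forum post and how accuracy and F-measure can be computed from gold and predicted labels. It is not the authors' implementation: the function names, the whitespace tokenization, the example post and the "expert"/"novice" labels are all assumptions made for this example.

```python
from collections import Counter


def ngram_features(text, n_max=2):
    """Count word n-grams of length 1..n_max (n_max=2 adds bi-grams).

    Tokenization is a plain lowercase whitespace split, an assumption
    made for this sketch only.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def f_measure(gold, pred, positive):
    """Balanced F-measure (F1) for one class treated as positive."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, invented for illustration (not data from the paper).
gold = ["expert", "novice", "expert", "novice"]
pred = ["expert", "expert", "expert", "novice"]

print(sorted(ngram_features("Looks like a zebra")))
print(accuracy(gold, pred))
print(round(f_measure(gold, pred, "expert"), 2))
```

On these toy labels three of four predictions are correct, so accuracy is 0.75, while the F-measure for the "expert" class is about 0.80 because precision (2/3) and recall (1.0) differ. This shows why the two measures can rank classifiers differently, as they do in the reported baseline results.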

Place, publisher, year, edition, pages
International Institute of Informatics and Systemics, 2017. p. 147-152
Keywords [en]
text analysis, NLP, machine-learning, naive bayes, decision trees, K-nearest neighbors
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-138291
ISBN: 978-1-941763-59-9 (print)
OAI: oai:DiVA.org:umu-138291
DiVA, id: diva2:1133833
Conference
21st World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2017), Orlando, Florida, USA, July 8-11, 2017
Available from: 2017-08-17 Created: 2017-08-17 Last updated: 2018-06-09 Bibliographically approved
In thesis
1. Natural language processing in cross-media analysis
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data, such as text, images, audio and video, is analyzed with metadata extractors that work jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. For on-line content providers, such services make it possible to semantically enhance a media resource with the extracted metadata representing its hidden meanings and to make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures make up a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition.

Since NLP tools and techniques were originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that potentially provide important clues for text analysis components. Thus, there is a need to develop various techniques to meet the requirements and design principles of the frameworks.

In this thesis, we explore methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more efficiently accessible. The method can potentially be extended to support other under-resourced languages.

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå University, 2018. p. 22
Series
Report / UMINF, ISSN 0348-0542 ; 18.06
Keywords
NLP, cross-media analysis, sentiment analysis, competence analysis, speech recognition, Amharic, named entity recognition, machine learning, computational linguistics
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-147640
ISBN: 978-91-7601-885-9
Presentation
2018-06-07, N450, Umeå University, Umeå, 15:00 (English)
Available from: 2018-05-30 Created: 2018-05-10 Last updated: 2018-06-09 Bibliographically approved

Authority records

Woldemariam, Yonas
