Natural language processing in cross-media analysis
Woldemariam, Yonas Demeke
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

A cross-media analysis framework is an integrated multi-modal platform in which a media resource containing different types of data, such as text, images, audio and video, is analyzed by metadata extractors working jointly to contextualize it. Such a framework generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. These services allow online content providers to semantically enhance a media resource with extracted metadata that represents its hidden meanings, making it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures make up a substantial part, comprising text analysis components such as a parser, named entity extraction and linking, sentiment analysis, and automatic speech recognition.
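
A minimal Python sketch of the kind of component chaining described above: analyzers are composed into a pipeline over a shared metadata record. The Analyzer signature and the toy components are illustrative assumptions, not the API of any particular framework.

    from typing import Callable

    # A metadata record flows through a chain of analyzers, each adding
    # its own annotations (illustrative sketch, not a real framework API).
    Analyzer = Callable[[dict], dict]

    def tokenize(record: dict) -> dict:
        record["tokens"] = record["text"].split()
        return record

    def tag_entities(record: dict) -> dict:
        # Placeholder NER: treat capitalized tokens as candidate entities.
        record["entities"] = [t for t in record.get("tokens", []) if t[:1].isupper()]
        return record

    def run_pipeline(record: dict, analyzers: list[Analyzer]) -> dict:
        for analyze in analyzers:
            record = analyze(record)
        return record

    annotated = run_pipeline({"text": "Barack Obama visited Berlin"},
                             [tokenize, tag_entities])
    print(annotated["entities"])  # ['Barack', 'Obama', 'Berlin']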

Since NLP tools and techniques were originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that could provide important clues for text analysis components. Thus, various techniques need to be developed to meet the requirements and design principles of the frameworks.

In this thesis, we develop various methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also enhance the multilingualism of the frameworks by designing an analysis pipeline for Amharic that includes speech recognition, transliteration and named entity recognition, making Amharic content on the web more efficiently accessible. The method can potentially be extended to support other under-resourced languages.

Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2018, p. 22
Series
Report / UMINF, ISSN 0348-0542 ; 18.06
Keywords [en]
NLP, cross-media analysis, sentiment analysis, competence analysis, speech recognition, Amharic, named entity recognition, machine learning, computational linguistics
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-147640
ISBN: 978-91-7601-885-9 (print)
OAI: oai:DiVA.org:umu-147640
DiVA, id: diva2:1205120
Presentation
2018-06-07, N450, Umeå University, Umeå, 15:00 (English)
Available from: 2018-05-30 Created: 2018-05-10 Last updated: 2018-06-09. Bibliographically approved
List of papers
1. Sentiment Analysis in A Cross-Media Analysis Framework
2016 (English). In: Proceedings of 2016 IEEE International Conference on Big Data Analysis (ICBDA), 2016, p. 27-31. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces the implementation and integration of a sentiment analysis pipeline into an ongoing open-source cross-media analysis framework. The pipeline includes the following components: a chat room cleaner, NLP, and a sentiment analyzer. Before the integration, we also compare two broad categories of sentiment analysis methods, namely lexicon-based and machine learning approaches, focusing on which method is more appropriate for detecting sentiment in forum discussion posts. For our experiments, we use the Apache Hadoop framework with its lexicon-based sentiment prediction algorithm and the Stanford CoreNLP library with the Recursive Neural Tensor Network (RNTN) model. The lexicon-based method uses a sentiment dictionary containing words annotated with sentiment labels and other basic lexical features, while the latter is trained on the Sentiment Treebank, with 215,154 phrases labeled using Amazon Mechanical Turk. Our overall performance evaluation shows that RNTN outperforms the lexicon-based method by 9.88% in accuracy on variable-length positive, negative, and neutral comments. However, the lexicon-based method performs better at classifying positive comments. We also found that the F1-score of the lexicon-based method is 0.16 higher than that of the RNTN.
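
The lexicon-based approach compared above can be pictured with a minimal sketch: sum per-word sentiment labels from a dictionary and threshold the total. The tiny lexicon below is a made-up stand-in, not the dictionary used in the paper.

    # Score a comment by summing sentiment labels of its words; the lexicon
    # below is an invented toy, standing in for a real sentiment dictionary.
    LEXICON = {"good": 1, "great": 2, "nice": 1, "bad": -1, "awful": -2}

    def lexicon_sentiment(comment: str) -> str:
        score = sum(LEXICON.get(word, 0) for word in comment.lower().split())
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    print(lexicon_sentiment("great discussion, really nice post"))  # positive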

Keywords
sentiment analysis, cross-media, machine learning algorithm, lexicon-based, neural network
National Category
Information Systems
Identifiers
urn:nbn:se:umu:diva-130264 (URN)
10.1109/ICBDA.2016.7509790 (DOI)
000390299100006 ()
978-1-4673-9591-5 (ISBN)
Conference
IEEE International Conference on Big Data Analysis (ICBDA), March 12-14, 2016, Hangzhou, China
Available from: 2017-01-14 Created: 2017-01-14 Last updated: 2018-06-09. Bibliographically approved
2. Predicting User Competence from Text
2017 (English). In: The 21st World Multi-Conference on Systemics, Cybernetics and Informatics: Proceedings, Volume 1 / [ed] Nagib Callaos, Belkis Sánches, Michael Savoie, Andrés Tremante. International Institute of Informatics and Systemics, 2017, p. 147-152. Conference paper, Published paper (Refereed)
Abstract [en]

We explore the possibility of learning user competence from text using natural language processing and machine learning (ML) methods. In our context, competence is defined as the ability to identify the wildlife appearing in images and to classify it correctly into species. We evaluate and compare the performance (in terms of accuracy and F-measure) of three ML methods, naive Bayes (NB), decision trees (DT) and k-nearest neighbors (KNN), applied to a text corpus obtained from Snapshot Serengeti discussion forum posts. The baseline results show that, in terms of accuracy, DT outperforms NB and KNN by 16.00% and 15.00%, respectively. In terms of F-measure, KNN outperforms NB and DT by 12.08% and 1.17%, respectively. We also propose a hybrid model that combines the three models (DT, NB and KNN). We improve the baseline results with a calibration technique and additional features: adding a bi-gram feature yields a dramatic increase in accuracy (from 48.38% to 64.40%) for the NB model. Overall, we push the accuracy limit of the baseline models from 93.39% to 94.09%.
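
A hedged sketch of the bi-gram effect reported above: a naive Bayes classifier over unigram and bigram counts, written with scikit-learn. The toy forum posts and competence labels are invented for illustration; they are not the Snapshot Serengeti data.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Invented toy posts and labels, standing in for the forum corpus.
    posts = [
        "definitely a wildebeest, note the curved horns",
        "no idea what this animal is",
        "clear zebra stripes, adult female",
        "maybe some kind of bird?",
    ]
    labels = ["high", "low", "high", "low"]

    # ngram_range=(1, 2) adds bi-gram counts to the unigram features,
    # mirroring the feature addition described in the abstract.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(posts, labels)
    print(model.predict(["the stripes suggest a zebra"]))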

Place, publisher, year, edition, pages
International Institute of Informatics and Systemics, 2017
Keywords
text analysis, NLP, machine-learning, naive bayes, decision trees, K-nearest neighbors
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-138291 (URN)
978-1-941763-59-9 (ISBN)
Conference
21st World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2017), Orlando, Florida, USA, July 8-11, 2017
Available from: 2017-08-17 Created: 2017-08-17 Last updated: 2018-06-09. Bibliographically approved
3. Predicting User Competence from Linguistic Data
2017 (English). In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017) / [ed] Sivaji Bandyopadhyay. Jadavpur University, 2017, p. 476-484. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the problem of predicting the competence of users of the crowd-sourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users for classification. Our research focuses on the Zooniverse Galaxy Zoo project, where users classify images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features, including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (in terms of accuracy and F-measure) on two different but related domain datasets. The performance of the classifiers varies across the feature-set configurations designed during the training phase. A challenging part of this research is computing the competence of the users without available ground truth data. We implemented a tool that estimates the proficiency of users and annotates their text with the computed competence. Our evaluation results show that the trained classifier models give results significantly better than chance and can be deployed for other crowd-sourcing projects as well.
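
One way to combine bag-of-words with a punctuation-mark feature of the kind listed above is scikit-learn's FeatureUnion; the custom transformer and toy data below are illustrative assumptions, not the authors' actual feature extractor.

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import FeatureUnion, make_pipeline

    class PunctuationCounts(BaseEstimator, TransformerMixin):
        """Counts of a few punctuation marks per document (toy feature)."""
        MARKS = "!?.,"

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return np.array([[doc.count(m) for m in self.MARKS] for doc in X])

    # Bag-of-words and punctuation counts side by side in one feature matrix.
    features = FeatureUnion([
        ("bow", CountVectorizer()),
        ("punct", PunctuationCounts()),
    ])
    model = make_pipeline(features, KNeighborsClassifier(n_neighbors=3))

    docs = ["A spiral galaxy!", "Not sure... elliptical?",
            "Spiral, clearly.", "Could be a star?"]
    labels = ["high", "low", "high", "low"]
    model.fit(docs, labels)
    print(model.predict(["Clearly a spiral galaxy."]))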

Place, publisher, year, edition, pages
Jadavpur University, 2017
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-146185 (URN)
Conference
14th International Conference on Natural Language Processing (ICON-2017)
Available from: 2018-04-03 Created: 2018-04-03 Last updated: 2018-06-09. Bibliographically approved
4. Designing a Speech Recognition-Named Entity Recognition Pipeline for Amharic within a Cross-Media Analysis Framework
(English). Manuscript (preprint) (Other academic)
Abstract [en]

One of the major challenges inherently associated with cross-media analysis frameworks is effectively addressing multilingual issues. As a result, many languages remain under-resourced and fail to benefit from available media analysis solutions. Although Amharic is spoken by over 22 million people and there is an ever-increasing amount of Amharic digital content of various types on the web, querying it, especially audio and video content, with a simple keyword search is very hard, as it exists in raw format. We introduce a textual and spoken content processing workflow for Amharic into a cross-media analysis framework. We design an automatic speech recognition (ASR)-named entity recognition pipeline that includes three main components: ASR, a transliterator, and NER. We explored and applied three different modeling techniques used for speech signal analysis, namely Gaussian Mixture Models (GMM), Deep Neural Networks (DNN) and Subspace Gaussian Mixture Models (SGMM). The models were evaluated on the same test set of 6,203 words using the Word Error Rate (WER) metric, obtaining accuracies of 50.88%, 38.72%, and 46.25% for GMM, DNN, and SGMM, respectively. An OpenNLP-based NER model has also been developed, though trained on very limited data. While the NER model was trained on the transliterated form of the Amharic text, the ASR was trained on the actual Amharic script. Thus, to interface between ASR and NER, we implemented a simple rule-based transliteration program that converts Amharic script to its corresponding English transliteration.
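
The interfacing step can be pictured with a minimal rule-based transliteration sketch. Only a handful of Ethiopic (Ge'ez script) characters are mapped below, with rough Latin values chosen for illustration; the actual program would cover the full Amharic syllabary.

    # Toy character table: a few Ethiopic syllables with rough Latin values.
    # A real transliterator covers the full Amharic syllabary and its orders.
    TRANSLIT = {
        "ሀ": "ha", "ለ": "le", "መ": "me", "ረ": "re",
        "ሰ": "se", "በ": "be", "አ": "a", "ካ": "ka",
    }

    def transliterate(text: str) -> str:
        # Unmapped characters (spaces, punctuation) pass through unchanged.
        return "".join(TRANSLIT.get(ch, ch) for ch in text)

    print(transliterate("አመረ"))  # -> amere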

Keywords
Cross-media analysis, named entity recognition, Amharic, speech recognition
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-147639 (URN)
Available from: 2018-05-10 Created: 2018-05-10 Last updated: 2018-06-09

Open Access in DiVA

fulltext (817 kB, PDF)
