Sentiment Analysis in A Cross-Media Analysis Framework
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2016 (English). In: Proceedings of 2016 IEEE International Conference on Big Data Analysis (ICBDA), 2016, p. 27-31. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces the implementation and integration of a sentiment analysis pipeline into an ongoing open source cross-media analysis framework. The pipeline includes the following components: a chat room cleaner, NLP, and a sentiment analyzer. Before the integration, we also compare two broad categories of sentiment analysis methods, namely lexicon-based and machine learning approaches. We mainly focus on finding out which method is appropriate for detecting sentiment in forum discussion posts. To conduct our experiments, we use the Apache Hadoop framework with its lexicon-based sentiment prediction algorithm and the Stanford CoreNLP library with the Recursive Neural Tensor Network (RNTN) model. The lexicon-based method uses a sentiment dictionary containing words annotated with sentiment labels and other basic lexical features, and the latter is trained on the Sentiment Treebank with 215,154 phrases, labeled using Amazon Mechanical Turk. Our overall performance evaluation shows that the RNTN outperforms the lexicon-based method by 9.88% in accuracy on variable-length positive, negative, and neutral comments. However, the lexicon-based method performs better at classifying positive comments. We also found that the F1-score of the lexicon-based method is 0.16 higher than that of the RNTN.
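
The three pipeline stages named in the abstract (chat room cleaner, NLP, sentiment analyzer) can be illustrated with a minimal lexicon-based sketch. This is not the paper's implementation: the toy lexicon, the cleaning rules, the regular-expression tokenizer, and the function names `clean_post`, `tokenize`, and `classify` are all assumptions made for illustration, and the Hadoop-based algorithm and sentiment dictionary used in the experiments are not reproduced here.

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity.
# The sentiment dictionary used in the paper is not reproduced here.
TOY_LEXICON = {
    "good": 1, "great": 1, "helpful": 1, "love": 1,
    "bad": -1, "terrible": -1, "useless": -1, "hate": -1,
}


def clean_post(text):
    """Stand-in for the chat room cleaner: drop URLs and @mentions."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Stand-in for the NLP step: lowercase word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text, lexicon=TOY_LEXICON):
    """Sum the polarities of known words and map the sign to a label."""
    tokens = tokenize(clean_post(text))
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in [
        "@bob this guide is great and helpful http://x.y/z",
        "What a terrible, useless answer",
        "The meeting is at noon",
    ]:
        print(classify(post), "<-", post)
```

A word-counting scheme like this ignores negation and word order, which is one reason a compositional model such as the RNTN can behave differently on longer comments.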

Place, publisher, year, edition, pages
2016. p. 27-31
Keywords [en]
sentiment analysis, cross-media, machine learning algorithm, lexicon-based, neural network
National Category
Information Systems
Identifiers
URN: urn:nbn:se:umu:diva-130264
DOI: 10.1109/ICBDA.2016.7509790
ISI: 000390299100006
ISBN: 978-1-4673-9591-5 (print)
OAI: oai:DiVA.org:umu-130264
DiVA, id: diva2:1065296
Conference
IEEE International Conference on Big Data Analysis (ICBDA), March 12-14, 2016, Hangzhou, People's Republic of China
Available from: 2017-01-14 Created: 2017-01-14 Last updated: 2018-06-09 Bibliographically approved
In thesis
1. Natural language processing in cross-media analysis
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data, such as text, images, audio and video, is analyzed with metadata extractors that work jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. For online content providers, such services make it possible to semantically enhance a media resource with extracted metadata representing its hidden meanings and to make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures cover a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition.

Since NLP tools and techniques are originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that potentially provide important clues for text analysis components. Thus, there is a need to develop various techniques to meet the requirements and design principles of the frameworks.

In our thesis, we explore the development of various methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more accessible. The method can potentially be extended to support other under-resourced languages.

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå University, 2018. p. 22
Series
Report / UMINF, ISSN 0348-0542 ; 18.06
Keywords
NLP, cross-media analysis, sentiment analysis, competence analysis, speech recognition, Amharic, named entity recognition, machine learning, computational linguistics
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-147640
ISBN: 978-91-7601-885-9
Presentation
2018-06-07, N450, Umeå University, Umeå, 15:00 (English)
Available from: 2018-05-30 Created: 2018-05-10 Last updated: 2018-06-09 Bibliographically approved

Author
Woldemariam, Yonas
