Predicting User Competence from Linguistic Data
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-4696-9787
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2017 (English). In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), NLP Association of India, 2017, p. 476-484. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the problem of predicting the competence of users of the crowdsourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform that presents objects of different types to volunteer users for classification. Our research focuses on the Zooniverse Galaxy Zoo project, where users classify images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features, including syntactic categories, bag-of-words, and punctuation marks. We train three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting), and naive Bayes. The classifiers are evaluated, in terms of accuracy and F-measure, on two datasets from different but related domains. Their performance varies across the feature set configurations designed during the training phase. A challenging part of this research is computing the competence of the users when no ground truth data is available. We implemented a tool that estimates the proficiency of users and annotates their text with the computed competence. Our evaluation results show that the trained classifier models perform significantly better than chance and can be deployed for other crowdsourcing projects as well.
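
The pipeline outlined in the abstract (linguistic features, a supervised classifier, evaluation by accuracy and F-measure) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the cosine-similarity k-nearest-neighbors vote, and the toy comments and labels are all assumptions made for illustration, and the syntactic-category features are omitted because they would require a part-of-speech tagger.

```python
import math
import re
import string
from collections import Counter


def extract_features(text):
    """Bag-of-words counts plus one count per punctuation mark."""
    features = Counter(re.findall(r"[a-z]+", text.lower()))
    for ch in text:
        if ch in string.punctuation:
            features["punct:" + ch] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training items most similar to text."""
    query = extract_features(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold, pred, positive):
    """Balanced F-measure (F1) for the class named by positive."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy comments and competence labels, for illustration only.
toy = [
    ("This looks like a barred spiral with a clear bulge.", "high"),
    ("Edge-on disk, possibly a merger remnant?", "high"),
    ("wow so pretty!!!", "low"),
    ("cool pic!!", "low"),
]
train = [(extract_features(t), y) for t, y in toy]
```

Calling `knn_predict(train, "pretty cool!!!")` classifies a new comment by its nearest toy neighbors. A real experiment would need labeled Galaxy Zoo comments and a held-out test set, neither of which is part of this record.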

Place, publisher, year, edition, pages
NLP Association of India, 2017. p. 476-484
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:umu:diva-144304
OAI: oai:DiVA.org:umu-144304
DiVA, id: diva2:1178864
Conference
14th International Conference on Natural Language Processing (ICON-2017), Kolkata, India, December 18-21, 2017
Available from: 2018-01-30. Created: 2018-01-30. Last updated: 2019-06-26. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors

Woldemariam, Yonas Demeke; Björklund, Henrik; Bensch, Suna
