Predicting User Competence from Linguistic Data
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2017 (English). In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), 2017, p. 476-484. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the problem of predicting the competence of users of the crowdsourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users for classification. Our research focuses on the Zooniverse Galaxy Zoo project, where users classify images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features, including syntactic categories, bag-of-words, and punctuation marks. We train three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting), and naive Bayes. The classifiers are evaluated, in terms of accuracy and F-measure, on two different but related domain datasets. Their performance varies across the feature set configurations designed during the training phase. A challenging part of this research is computing the competence of the users when no ground truth data is available. We implemented a tool that estimates the proficiency of users and annotates their text with the computed competence. Our evaluation results show that the trained classifier models perform significantly better than chance and can be deployed for other crowdsourcing projects as well.
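
The pipeline outlined in the abstract (feature extraction, supervised classification, evaluation by accuracy and F-measure) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation, and several things in it are assumptions: only k-nearest neighbors is shown, out of the three classifiers named; cosine similarity is an assumed distance measure; syntactic categories are left out because they would require a part-of-speech tagger; and the messages and the "high"/"low" competence labels are invented toy data, not Galaxy Zoo data. All function names are hypothetical.

```python
import re
from collections import Counter

PUNCTUATION = "!?.,;:"  # assumed set of punctuation marks to count


def extract_features(text):
    """Map a chat message to sparse counts: bag-of-words plus punctuation marks."""
    features = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        features["w:" + word] += 1
    for ch in text:
        if ch in PUNCTUATION:
            features["p:" + ch] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training messages most similar to `text`.

    `train` is a list of (features, label) pairs.
    """
    query = extract_features(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold, pred, positive):
    """F1 score (harmonic mean of precision and recall) for the `positive` label."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy messages with hypothetical competence labels (illustration only).
TOY_MESSAGES = [
    ("The bar and spiral arms suggest a barred spiral galaxy.", "high"),
    ("Clear bulge, tight spiral arms, probably an edge-on disk.", "high"),
    ("wow so pretty!!!", "low"),
    ("what is this??? no idea", "low"),
]
TRAIN = [(extract_features(text), label) for text, label in TOY_MESSAGES]

if __name__ == "__main__":
    test_texts = ["Tight spiral arms with a bar, so a barred spiral.", "wow!!! what is it???"]
    gold = ["high", "low"]
    pred = [knn_predict(TRAIN, t) for t in test_texts]
    print(pred, accuracy(gold, pred), f_measure(gold, pred, "high"))
```

Prefixing feature keys with `w:` and `p:` keeps word counts and punctuation counts in one vector without collisions, which makes it straightforward to switch feature groups on or off when comparing feature set configurations.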

Place, publisher, year, edition, pages
2017. p. 476-484
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:umu:diva-144304
OAI: oai:DiVA.org:umu-144304
DiVA, id: diva2:1178864
Conference
14th International Conference on Natural Language Processing (ICON-2017)
Available from: 2018-01-30 Created: 2018-01-30 Last updated: 2018-06-09

Open Access in DiVA

No full text in DiVA

Authority records BETA

Bensch, Suna
