Identifying Hateful Text on Social Media with Machine Learning Classifiers and Normalization Methods - Using Support Vector Machines and Naive Bayes Algorithm
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Hateful content on social media is a growing problem. In this thesis, machine learning algorithms and pre-processing methods are combined to train classifiers that identify hateful text on social media. The combinations are compared in terms of performance, measured by F-score and classification accuracy. Training is performed using the Naive Bayes algorithm (NB) and Support Vector Machines (SVM). The pre-processing techniques used are tokenization and normalization. For tokenization, an open-source unigram tokenizer has been used, while a normalization model that normalizes each tweet before classification has been developed in Java. Normalization includes basic clean-up methods such as removing stop words, URLs, and punctuation, as well as altering methods such as emoticon conversion and spell checking. Both binary and multi-class versions of the classifiers have been used on balanced and unbalanced data.
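
The normalization steps named above can be illustrated with a minimal Python sketch. It is not the thesis's Java model: the stop-word list, the emoticon table, the URL pattern, and the function name `normalize` are all hypothetical stand-ins chosen for illustration, and spell checking is left out entirely.

```python
import re
import string

# Hypothetical lookup tables; the thesis does not list the ones it used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of"}
EMOTICONS = {":)": "happy", ":(": "sad", ":D": "laugh"}

# Assumed URL pattern: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(tweet: str) -> list[str]:
    """Return unigram tokens after URL, punctuation, and stop-word removal."""
    text = URL_PATTERN.sub(" ", tweet)
    tokens = []
    for raw in text.split():
        # Convert emoticons before punctuation removal would destroy them.
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])
            continue
        word = raw.translate(STRIP_PUNCTUATION).lower()
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


print(normalize("The weather is GREAT :) see https://example.com/x now!!!"))
```

The order of the steps matters in this sketch: emoticons are converted before punctuation is stripped, since otherwise `:)` would be reduced to an empty string and lost.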

Both machine learning algorithms perform at a reasonable level, with accuracy between 76.70% and 93.55% and an F-score between 0.766 and 0.935. The results indicate that the main purpose of normalization is to reduce noise, that balancing the data is necessary, and that SVM seems to slightly outperform NB.
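
For readers unfamiliar with the two performance criteria, the sketch below computes accuracy and an F-score from lists of true and predicted labels. It assumes the F1 variant (the harmonic mean of precision and recall) for a single positive class in the binary case; the abstract does not state which F-score variant or averaging scheme the thesis uses, and the labels in the example are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = hateful, 0 = not hateful.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred))
```

On unbalanced data the two measures can diverge: a classifier that always predicts the majority class scores high accuracy but an F1 of zero for the minority class, which is one reason to report both.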

Place, publisher, year, edition, pages
2018. p. 39
Series
UMNAD ; 1166
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-155353
OAI: oai:DiVA.org:umu-155353
DiVA, id: diva2:1278340
Educational program
Bachelor of Science Programme in Computing Science
Available from: 2019-01-14. Created: 2019-01-14. Last updated: 2019-01-14. Bibliographically approved.

Open Access in DiVA

Full text (222 kB), 214 downloads
File information
File name: FULLTEXT01.pdf
File size: 222 kB
Checksum (SHA-512):
5848ecfa26d985cab04ad712c33d01f1ff3f1c253bd45982a926cd265dda680505832df30144faecabf1a7a09285c7e2d63e1b20c6fd7b363d7936a268ba9f5c
Type: fulltext
Mimetype: application/pdf
