EVALUATION OF MACHINE LEARNING ALGORITHMS FOR SMS SPAM FILTERING
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2019 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

The purpose of this thesis is to evaluate different machine learning algorithms and text representation methods in order to determine which are best suited to distinguishing spam SMS from legitimate SMS. A data set containing 5573 real SMS messages has been used to train the algorithms K-Nearest Neighbor, Support Vector Machine, Naive Bayes and Logistic Regression. The methods used to represent text are Bag of Words, Bigram and Word2Vec. In particular, it has been investigated whether semantic text representations can improve classification performance. A total of 12 combinations have been evaluated using the metrics accuracy and F1-score. The results show that Logistic Regression together with Bag of Words reaches the highest accuracy and F1-score. Bigram as a text representation seems to work worse than the other methods. Word2Vec can increase the performance for K-Nearest Neighbor but not for the other algorithms.
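As an illustration of the terms used in the abstract, the following is a minimal sketch in plain Python, not the code used in the thesis. It shows how a Bag of Words and a Bigram representation can be derived from a message, how accuracy and F1-score are computed from predicted labels, and how four algorithms and three representations give 12 combinations. The function names, the whitespace tokenization, the label strings "spam" and "ham", and the example messages are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import product

# Names taken from the abstract; the lists themselves are only labels here.
ALGORITHMS = ["K-Nearest Neighbor", "Support Vector Machine",
              "Naive Bayes", "Logistic Regression"]
REPRESENTATIONS = ["Bag of Words", "Bigram", "Word2Vec"]

# Every algorithm paired with every representation: 4 x 3 combinations.
COMBINATIONS = list(product(ALGORITHMS, REPRESENTATIONS))


def bag_of_words(text):
    """Count each lowercased, whitespace-separated token (assumed tokenizer)."""
    return Counter(text.lower().split())


def bigrams(text):
    """Count each pair of adjacent tokens."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive="spam"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(len(COMBINATIONS), "combinations")
    print(bag_of_words("Free free prize"))
    print(bigrams("win a prize now"))
```

F1-score is reported alongside accuracy because spam data sets are typically imbalanced, so a classifier that labels every message as legitimate can still reach a high accuracy while its F1-score for the spam class is zero.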

Place, publisher, year, edition, pages
2019. p. 34
Series
UMNAD ; 1184
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-163188
OAI: oai:DiVA.org:umu-163188
DiVA, id: diva2:1349907
External cooperation
Omegapoint
Educational program
Bachelor of Science Programme in Computing Science
Supervisors
Examiners
Available from: 2019-09-10
Created: 2019-09-10
Last updated: 2019-09-10
Bibliographically approved

Open Access in DiVA

fulltext (551 kB)
File information
File name: FULLTEXT01.pdf
File size: 551 kB
Checksum SHA-512:
90080c1e62ab3c321c7b3fa821ef3f8597a3fbb157c903e20c81b0a623dd673b9b761f6be1b35bac4e6736a4fcf71b6edac85552623bfe0bb2067bfb9e38a4d4
Type: fulltext
Mimetype: application/pdf
