Effectivisation of keywords extraction process: A supervised binary classification approach of scraped words from company websites
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2023 (English). Independent thesis, advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In today’s digital era, establishing an online presence and maintaining a well-structured website is vital for companies to remain competitive in their respective markets. A crucial aspect of online success lies in strategically selecting the right words to optimize customer engagement and search engine visibility. However, this process is often time-consuming, involving extensive analysis of a company’s website as well as its competitors’. This thesis focuses on developing an efficient binary classification approach to identify key words and phrases extracted from multiple company websites. The data set used for this solution consists of approximately 92,000 scraped samples, primarily comprising non-key samples. Various features were extracted, and a word embedding model was employed to assess each sample’s relevance to its specific industry and topic. The logistic regression, decision tree and random forest algorithms were all explored and implemented as different solutions to the classification problem. The results indicated that the logistic regression model excelled in retaining keywords but was less effective in eliminating non-keywords. Conversely, the tree-based methods demonstrated superior classification of keywords, albeit at the cost of misclassifying a few keywords. Overall, the random forest approach outperformed the others, achieving a result of 76 percent in recall and 20 percent in precision when predicting key samples. In summary, this thesis presents a solution for classifying words and phrases from company websites into key and non-key categories, and the developed methodology could offer valuable insights for companies seeking to enhance their website optimization strategies.
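
To make the reported figures concrete, the following minimal sketch computes precision and recall from confusion-matrix counts. The function name and the counts are hypothetical: the counts are chosen only so that the ratios match the 76 percent recall and 20 percent precision stated in the abstract, and they are not taken from the thesis.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for the positive ("key") class.

    tp: key samples correctly predicted as key
    fp: non-key samples wrongly predicted as key
    fn: key samples wrongly predicted as non-key
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (an assumption, not thesis data): out of 100 true
# key samples, 76 are retained, and 304 non-key samples are also flagged.
tp, fp, fn = 76, 304, 24
precision, recall = precision_recall(tp, fp, fn)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Under these assumed counts, roughly one in five samples flagged as key would be a true keyword, while about three in four true keywords would be retained. This is the kind of trade-off that commonly appears when the positive class is rare.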

Place, publisher, year, edition, pages
2023. 56 pages
Keywords [en]
Machine learning, keyword classification, unbalanced data, word embedding
National category
Identifiers
URN: urn:nbn:se:umu:diva-209805; OAI: oai:DiVA.org:umu-209805; DiVA, id: diva2:1767551
External cooperation
Annalect
Educational program
Master of Science in Engineering and Management
Supervisor
Examiner
Available from: 2023-06-14. Created: 2023-06-14. Last updated: 2023-06-14. Bibliographically approved.

Open Access in DiVA

fulltext (686 kB), 355 downloads
File information
File name: FULLTEXT01.pdf; File size: 686 kB; Checksum: SHA-512
92fd7c1af71ecb4199e7ecb3de0b7b46d30f66ce3c477d1f72c64a9bed9e703ade8aad931a75b7f4de77e3428a76267524f80aea2a2a2cf335a143334bda3acb
Type: fulltext; Mimetype: application/pdf

Authors: Andersson, Josef; Fremling, Max