Concept Drift Detection in Document Classification: An Evaluation of ADWIN, KSWIN, and Page Hinkley Using Different Observation Variables
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Machine learning models can be used effectively in the public sector to classify user-uploaded documents for more efficient administration. However, user-uploaded data is non-stationary: the input data stream may be affected by external influences, ranging from redesigns of official documents to geopolitical events that change the demographics of the users interacting with the system. When the incoming data deviates from the data used at training, it may introduce concept drift, causing the model's performance to degrade over time. Effectively detecting concept drift can be the first step in a model adaptation strategy, as it indicates that a model update is needed.

This thesis investigates practical methods for concept drift detection in a document classification domain, focusing on the feasibility of using observation variables other than the traditionally used prediction accuracy, such as predicted labels and confidence levels. A pilot experiment was conducted using the Fashion MNIST dataset to validate the experimental setup and the drift detectors ADWIN, KSWIN and Page Hinkley, before performing analogous experiments using data from a real document classification application.

The findings suggest that while confidence levels and predicted labels can be used to detect concept drift, ADWIN may not be the best detector for these observation variables with the parameter values explored. KSWIN and Page Hinkley showed potential but also produced high false positive rates. Differences between the results of the Fashion MNIST experiment and the document classification experiment were observed, underscoring the importance of tuning and validating a drift detector for its intended environment. The findings highlight the potential of confidence levels and predicted labels as observation variables, and emphasize the need for robust parameter tuning and domain-specific knowledge when developing effective drift detection strategies for real-world applications.
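
As an illustration of using confidence levels as the observation variable, the following is a minimal sketch of a one-sided Page Hinkley test. It is written from the textbook formulation of the test and is not taken from the thesis. The class name, the parameter values (`delta`, `threshold`, `min_samples`) and the synthetic confidence stream are all assumptions chosen for illustration.

```python
import random


class PageHinkley:
    """One-sided Page Hinkley test that flags a sustained drop in the stream mean.

    Hypothetical illustration: parameter defaults are assumptions, not tuned values.
    """

    def __init__(self, delta=0.005, threshold=5.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm threshold (often written lambda)
        self.min_samples = min_samples  # warm-up period before alarms are allowed
        self.n = 0
        self.mean = 0.0                 # running mean of all observations so far
        self.cum = 0.0                  # cumulative deviation from the running mean
        self.cum_max = 0.0              # largest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # A large fall from the running maximum indicates a decrease in the mean.
        drop = self.cum_max - self.cum
        return self.n >= self.min_samples and drop > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first drift signal, or None if there is none."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


def synthetic_confidences(n_before=500, n_after=500, seed=0):
    """Assumed toy stream: confidence near 0.9, then near 0.6 after the drift."""
    rng = random.Random(seed)

    def clip(v):
        return min(1.0, max(0.0, v))

    before = [clip(rng.gauss(0.9, 0.05)) for _ in range(n_before)]
    after = [clip(rng.gauss(0.6, 0.05)) for _ in range(n_after)]
    return before + after


if __name__ == "__main__":
    alarm = first_alarm(synthetic_confidences(), PageHinkley())
    print("first drift signal at index:", alarm)
```

In this sketch the detector needs no ground-truth labels, only the classifier's confidence for each prediction. Lowering `threshold` shortens the detection delay at the cost of more false positives, which is the kind of trade-off that has to be tuned for the intended environment.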

Place, publisher, year, edition, pages
2024.
Series
UMNAD ; 1466
Identifiers
URN: urn:nbn:se:umu:diva-225881
OAI: oai:DiVA.org:umu-225881
DiVA, id: diva2:1867280
External cooperation
ITS
Educational program
Master of Science Programme in Interaction Technology and Design - Engineering
Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2024-06-10 Bibliographically approved

Open Access in DiVA

fulltext (842 kB), 333 downloads
File information
File name: FULLTEXT01.pdf; File size: 842 kB; Checksum: SHA-512
113e629cd1b754118207905da07ee86c617cc09f8daf1b1d3c8e0cacdde9b79b83d2c443cebe016d6d5bd05839b46c518453297de6e0ff715ca90a240fd7b5e9
Type: fulltext; Mimetype: application/pdf
