CONSTRUCTING AND VARYING DATA MODELS FOR UNSUPERVISED ANOMALY DETECTION ON LOG DATA
Data modelling and domain knowledge’s impact on anomaly detection and explainability
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2019 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

As the complexity of today’s systems increases, manual system monitoring and log file analysis are no longer feasible, which increases the need for automated anomaly detection systems. However, most current research in the domain tends to focus only on the technical details of the frameworks and the evaluation of the algorithms, and on how these affect anomaly detection results. In contrast, this study emphasizes how one can approach understanding and modelling the data, and how this affects anomaly detection performance. Given log data from an education platform application, the data is analysed to form a concept of what is normal with regard to the behaviour of educational course sections. The data is then modelled to capture the dimensions of a course section, and a detection model is created that runs a statically tuned K-Nearest Neighbours algorithm as classifier, in order to emphasize the impact of the modelling rather than the algorithm. The results showed that single point anomalies could be detected successfully. However, the results were hard to interpret because the output lacked reasons and explainability. This study therefore presents a method of modifying a multidimensional data model to form a detection model with increased explainability. The original model is decomposed into smaller modules by utilizing explicit categorical domain knowledge of the available features. Each module represents a more specific aspect of the whole model, and the results show a more explicit coverage of detected point anomalies and a higher degree of explainability of the detection output, in terms of both increased interpretability and increased comprehensibility.
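The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a simple distance-based K-Nearest Neighbours score (mean distance to the k closest reference rows), and the feature columns, module names, and data values are all hypothetical. It scores one candidate against the full feature set and then against each feature module separately, so the output indicates which aspect of the record looks unusual.

```python
import math

def knn_score(point, reference, k=3):
    """Mean Euclidean distance from point to its k nearest reference rows."""
    distances = sorted(math.dist(point, row) for row in reference)
    return sum(distances[:k]) / k

def project(row, indices):
    """Keep only the features that belong to one module."""
    return [row[i] for i in indices]

def module_scores(point, reference, modules, k=3):
    """Score the point separately within each feature module."""
    scores = {}
    for name, indices in modules.items():
        sub_reference = [project(row, indices) for row in reference]
        scores[name] = knn_score(project(point, indices), sub_reference, k)
    return scores

# Hypothetical columns: [logins, submissions, session_minutes, error_count]
reference = [
    [10.0, 5.0, 30.0, 1.0],
    [11.0, 6.0, 32.0, 0.0],
    [9.0, 5.0, 29.0, 1.0],
    [10.0, 4.0, 31.0, 2.0],
    [12.0, 6.0, 30.0, 1.0],
]

# Hypothetical grouping of features by domain knowledge.
modules = {"activity": [0, 1], "session": [2], "errors": [3]}

# A candidate that is ordinary except for its error count.
candidate = [10.0, 5.0, 30.0, 9.0]

full_score = knn_score(candidate, reference, k=3)
per_module = module_scores(candidate, reference, modules, k=3)
most_anomalous = max(per_module, key=per_module.get)

print(f"full model score: {full_score:.3f}")
for name, score in per_module.items():
    print(f"{name}: {score:.3f}")
print(f"module with the highest score: {most_anomalous}")
```

The full-model score only says that the candidate is far from its neighbours, whereas the per-module scores point to the group of features responsible. The value of k is fixed here, loosely mirroring the statically tuned setup the abstract describes.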

Place, publisher, year, edition, pages
2019.
Series
UMNAD ; 1192
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-163544
OAI: oai:DiVA.org:umu-163544
DiVA, id: diva2:1354368
External cooperation
ITS
Educational program
Bachelor of Science Programme in Computing Science
Supervisors
Examiners
Available from: 2019-09-25 Created: 2019-09-25 Last updated: 2019-09-25
Bibliographically approved

Open Access in DiVA

fulltext (1025 kB), 10 downloads
File information
File name: FULLTEXT01.pdf
File size: 1025 kB
Checksum SHA-512:
4876579d93288a69452757e5650be7c4c060b5702305e4fcacb93068440361039b939d23b8ada4777d3083ba0aae776cb3178bcdf40d2af2c7515e58a5c5bf67
Type: fulltext
Mimetype: application/pdf
