umu.se Publications
Detection of Infrastructure Anomalies in Build Logs Using Machine Learning: Text classification on Continuous Integration log files
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Continuous integration (CI) is a practice in which software developers integrate their code into a larger codebase multiple times per day. Before integration, the code is built and tested by, e.g., open-source build tools such as Jenkins, and the output produced during this process is stored in a log file. Sometimes these builds fail, and the cause can be either user-related or infrastructure-related. A user-related error may be code that fails to compile because of a syntax error, whereas an infrastructure error could be a DNS problem. This thesis evaluated how well machine learning can label the cause of a failed build, based on its build log, as either user or infrastructure. It compared the performance of three machine learning algorithms: support-vector machine, random forest, and gradient boosting classifier. Two datasets are used in this study: a balanced dataset for training and validation, and a separate dataset for testing. The preprocessing step, including feature selection, uses term frequency-inverse document frequency (TF-IDF), which converts the text of a build log into a numerical format suitable for machine learning. The study also evaluated three different n-gram sizes for each algorithm and dataset. The performance of the three algorithms is evaluated by comparing the precision, recall, and F1-score of each model. The thesis explains the three algorithms as well as the methodology for preprocessing and evaluation. The results show that machine learning can be used as a tool to help CI owners, but it cannot fully replace the manual classification done today. The best-performing algorithm was the gradient boosting classifier with a bag of 1- and 2-grams, which achieved a precision, recall, and F1-score of 0.87, 0.73, and 0.79, respectively.
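The preprocessing described in the abstract turns each build log into a TF-IDF-weighted bag of n-grams. The following is a minimal standard-library sketch under stated assumptions, not the thesis's implementation: tokens are lowercased words of two or more characters, the idf is smoothed as idf(t) = ln((1 + n) / (1 + df(t))) + 1 (the form scikit-learn's `TfidfVectorizer` uses by default), each vector is L2-normalized, and the two log lines are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercased word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def ngrams(tokens, n_min, n_max):
    """Return all n-grams of length n_min..n_max (inclusive), joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, ngram_range=(1, 2)):
    """Return (sorted vocabulary, list of L2-normalized {term: weight} dicts)."""
    n_min, n_max = ngram_range
    counts = [Counter(ngrams(tokenize(d), n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    # Smoothed idf; a term present in every document gets the minimum value 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for c in counts:
        weights = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


logs = [
    "error: cannot resolve host artifactory",  # hypothetical infrastructure failure
    "error: syntax error in Main.java",        # hypothetical user failure
]
vocab, vecs = tfidf(logs, ngram_range=(1, 2))
print(len(vocab))  # -> 18 (9 unigrams + 9 bigrams)
```

With `ngram_range=(1, 1)` only unigrams are kept; widening it to `(1, 2)` adds bigrams such as "resolve host", which is the kind of n-gram-size variation the study evaluated. A term like "error" that occurs in every log receives the minimum idf, so it carries less weight than rarer, more distinctive terms.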

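The models are compared on precision, recall, and F1-score. A minimal sketch of these metrics for the two-class user/infrastructure labelling follows. The labels `"infra"` and `"user"` and the choice of infrastructure as the positive class are assumptions for illustration, since the abstract does not state which class is treated as positive. The last line checks that the reported F1-score follows from the reported precision and recall as their harmonic mean.

```python
def precision_recall_f1(y_true, y_pred, positive="infra"):
    """Binary precision, recall and F1 for the (assumed) positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five failed builds.
y_true = ["infra", "user", "infra", "infra", "user"]
y_pred = ["infra", "infra", "user", "infra", "user"]
print(precision_recall_f1(y_true, y_pred))  # tp=2, fp=1, fn=1 -> each metric is 2/3

# Consistency check on the figures reported in the abstract.
print(round(f1_from(0.87, 0.73), 2))  # -> 0.79
```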
Place, publisher, year, edition, pages
2019, 38 p.
Series
UMNAD ; 1205
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-164730
OAI: oai:DiVA.org:umu-164730
DiVA, id: diva2:1366505
External cooperation
Spotify
Educational program
Master of Science Programme in Computing Science and Engineering
Supervisors
Examiners
Available from: 2019-10-29. Created: 2019-10-29. Last updated: 2019-10-29. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 491 kB)
Checksum (SHA-512): 9bcf6108b72485ebeb37789c7b43a782bfbd92b9c5b15cb011c236016084e2843f8c4913e3ba329da692df7e212adb336bbfbfb7cf50fdd6c003f79d614151ff
