Machine Learning Algorithms for Proactive Ransomware Threat Hunting
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
This thesis investigated the performance of various classifier and ensemble models in the context of ransomware detection. The objective was to make recommendations for future proactive threat hunting on Windows OS using machine learning models and dynamic file analysis. Tests were conducted with machine learning models trained on file behaviours observed during dynamic analysis in a sandbox. The performance of the models was then evaluated through a statistical analysis of their classification outcomes. Future threat hunting recommendations were made based on the results of this evaluation.
All ensemble models evaluated in this study applied clustering as a step before classification, with the aim of investigating how these ensemble models compared to pure classifiers. A review of the existing literature found that previous studies focused on either clustering or classification. As such, investigating the combination of clustering and classification was deemed to hold scientific value. This was done by implementing and evaluating two pure classification models as well as four ensemble models that combined the same classification algorithms with two clustering algorithms. The classifiers, gradient boosting and decision trees, were chosen because of their high performance in previous research on machine learning for ransomware detection. The clustering algorithms chosen for the ensemble models were agglomerative clustering and k-means clustering.
Of all models tested, the gradient boosting classifier achieved the highest average scores during evaluation, with an average accuracy of 0.932, average recall of 0.913, average precision of 0.926 and average F1-score of 0.918. However, this model also achieved the lowest per-class recall for ransomware of all models, and both ensemble models that used gradient boosting as their classification algorithm showed a slight improvement in ransomware classification performance. The model with the highest per-class recall for ransomware was the pure decision trees model, whose performance decreased slightly when clustering was added as a step before classification. Overall, the results of the statistical evaluation suggest that more research is needed before any of the evaluated models are ready for real-life applications. Nevertheless, a takeaway from this study is that clustering as a step before classification shows potential for improving classification outcomes for ransomware, so further research into how ensembles are best applied in ransomware detection is warranted.
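The abstract distinguishes average scores from per-class recall for ransomware, and the two can diverge sharply when ransomware samples are a minority. The following is a minimal illustrative sketch of how such metrics can be computed, not the evaluation code used in the thesis. The labels "benign" and "ransomware", the toy predictions, and the use of an unweighted (macro) average are all assumptions made for illustration only.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed a sample of the true class
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics, index):
    """Unweighted mean over classes; index 0=precision, 1=recall, 2=F1."""
    return sum(values[index] for values in metrics.values()) / len(metrics)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 8 benign files, 2 ransomware files, one of which
# is misclassified as benign.
y_true = ["benign"] * 8 + ["ransomware"] * 2
y_pred = ["benign"] * 8 + ["ransomware", "benign"]

metrics = per_class_metrics(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)
ransomware_recall = metrics["ransomware"][1]
macro_recall = macro_average(metrics, 1)
```

In this toy case the overall accuracy is high while half of the ransomware samples are missed, which is the kind of gap between average and per-class scores that the abstract reports for the gradient boosting model.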
Place, publisher, year, edition, pages
2024. p. 40
Keywords [en]
Ransomware, Cyber security, Machine learning, Dynamic analysis, Sandbox
National Category
Other Engineering and Technologies, Computer Sciences, Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-225794
OAI: oai:DiVA.org:umu-225794
DiVA, id: diva2:1866811
External cooperation
Omegapoint
Subject / course
Degree Project in Interaction Technology and Design
Educational program
Master of Science Programme in Interaction Technology and Design - Engineering
Supervisors
Examiners
2024-09-12 2024-06-08 2025-02-18 Bibliographically approved