Comparing Transformer-Based and Bag-of-Words Approaches for Phishing Email Detection Under Text Obfuscation
2025 (English) Independent thesis, basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis (Degree project)
Abstract [en]
Phishing emails are still a problem in cybersecurity, and attackers keep finding new ways to bypass filters. One common trick is text obfuscation, where words are changed using techniques like leetspeak, Unicode characters, or invisible symbols. In this thesis, we compare classical Bag-of-Words models (Naïve Bayes, Logistic Regression, Random Forest), fastText, and transformer-based models (DeBERTa-v3, CANINE-S, ByT5) on clean and obfuscated phishing emails. The results show that transformer models perform best overall and are more stable when the text is manipulated, while traditional models and fastText lose more accuracy under obfuscation. Among transformers, DeBERTa-v3 and ByT5 showed strong robustness, but CANINE-S was more sensitive to obfuscation. These findings suggest that models which rely less on exact tokens and more on context are better suited for detecting phishing emails in real-world conditions.
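The obfuscation techniques named in the abstract can be illustrated with a minimal sketch. The character mappings, the helper names, and the three-word vocabulary below are assumptions chosen for illustration, not the rules or data used in the thesis. The sketch only shows why a model that depends on exact whitespace-separated tokens may stop recognising known words once the text is manipulated.

```python
# Illustrative sketch only: the mappings, helper names and vocabulary are
# assumptions for demonstration, not the obfuscation rules used in the thesis.

# Leetspeak: replace some Latin letters with similar-looking digits.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

# Unicode homoglyphs: replace Latin letters with Cyrillic look-alikes.
HOMOGLYPH_MAP = str.maketrans({"a": "\u0430", "e": "\u0435", "o": "\u043e"})

# Invisible symbol: the zero-width space renders as nothing in most viewers.
ZERO_WIDTH_SPACE = "\u200b"


def to_leetspeak(text: str) -> str:
    return text.translate(LEET_MAP)


def to_homoglyphs(text: str) -> str:
    return text.translate(HOMOGLYPH_MAP)


def insert_invisible(text: str) -> str:
    # Put a zero-width space between every pair of adjacent characters.
    return ZERO_WIDTH_SPACE.join(text)


def bag_of_words(text: str) -> set[str]:
    # Simplest possible tokenizer: lowercase, then split on whitespace.
    return set(text.lower().split())


# Hypothetical vocabulary a Bag-of-Words model might have learned.
VOCABULARY = {"verify", "your", "password"}


def known_token_count(text: str) -> int:
    # Number of distinct tokens in the text that exactly match the vocabulary.
    return len(bag_of_words(text) & VOCABULARY)


if __name__ == "__main__":
    clean = "verify your password"
    variants = {
        "clean": clean,
        "leetspeak": to_leetspeak(clean),
        "homoglyphs": to_homoglyphs(clean),
        "invisible": insert_invisible(clean),
    }
    for name, text in variants.items():
        print(f"{name:10s} {text!r} known tokens: {known_token_count(text)}")
```

In this toy setup the clean text matches all three vocabulary entries, while each obfuscated variant matches none, even though a human reader sees nearly the same message. Byte-level and character-level models such as ByT5 and CANINE-S do not depend on a fixed word vocabulary in this way, although the abstract reports that this alone did not make CANINE-S robust.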
Place, publisher, year, edition, pages
2025, p. 27
Series
UMNAD ; 1581
Keywords [en]
LLM, Artificial Intelligence, Transformers, Phishing Email
National subject category
Artificial Intelligence; Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-243824
OAI: oai:DiVA.org:umu-243824
DiVA, id: diva2:1994486
Educational program
Bachelor's Programme in Computer Science
Presentation
Umeå University, Umeå (English)
Supervisors
Examiners
2025-09-09, 2025-09-02, 2025-09-09. Bibliographically approved