Bottleneck identification and failure prevention with procedural learning in 5G RAN
Umeå University, Faculty of Science and Technology, Department of Computing Science (ADS LAB). ORCID iD: 0000-0001-9013-6603
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-2633-6798
2023 (English). In: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid) / [ed] Simmhan Y., Altintas I., Varbanescu A.-L., Balaji P., Prasad A.S., Carnevale L., IEEE, 2023, p. 426-436. Conference paper, Published paper (Refereed)
Abstract [en]

To meet the low latency requirements of 5G Radio Access Networks (RAN), it is essential to learn where performance bottlenecks occur. As parts are distributed and virtualized, it becomes troublesome to identify where unwanted delays occur. Today, vendors spend huge manual effort analyzing key performance indicators (KPIs) and system logs to detect these bottlenecks. The 5G architecture allows a flexible scaling of microservices to handle the variation in traffic. But knowing how, when, and where to scale is difficult without a detailed latency analysis. In this article, we propose a novel method that combines procedural learning with latency analysis of system log events. The method, which we call LogGenie, learns the latency pattern of the system at different load scenarios and automatically identifies the parts with the most significant increase in latency. Our evaluation in an advanced 5G testbed shows that LogGenie can provide a more detailed analysis than previous research has achieved and help troubleshooters locate bottlenecks faster. Finally, through experiments, we show how a latency prediction model can dynamically fine-tune the behavior where bottlenecks occur. This lowers resource utilization, makes the architecture more flexible, and allows the system to fulfill its latency requirements.

Place, publisher, year, edition, pages
IEEE, 2023. p. 426-436
Keywords [en]
bottleneck detection, latency, RAN, failure prevention, 5G
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-205960
DOI: 10.1109/CCGrid57682.2023.00047
Scopus ID: 2-s2.0-85166323115
ISBN: 979-8-3503-0119-9 (electronic)
ISBN: 979-8-3503-0120-5 (print)
OAI: oai:DiVA.org:umu-205960
DiVA id: diva2:1745844
Conference
23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Bangalore, India, May 1-4, 2023
Funder
Knut and Alice Wallenberg Foundation
Available from: 2023-03-24. Created: 2023-03-24. Last updated: 2023-08-15. Bibliographically approved
In thesis
1. Machine learning-based diagnostics and observability in mobile networks
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Maskininlärningsbaserad diagnostik och observerbarhet i mobila nätverk
Abstract [en]

To meet the high-performance and reliability demands of 5G, the Radio Access Network (RAN) is moving to a cloud-native architecture. The new microservice architecture promises increased operational efficiency and a shorter time-to-market, but it also comes at a price. The new distributed and virtualized architecture is far more complex than its predecessors, and with the increasing number of features it brings, troubleshooting becomes more difficult. So far, RAN troubleshooters have relied on their expertise to analyze systems manually, but the ever-growing data and increased complexity make it challenging to grasp system behavior.

This thesis makes three contributions: the proposed machine learning and statistical methods help RAN troubleshooters find deviations in system logs, identify the root cause of these deviations, and improve the system's observability. These methods learn the application's behavior from system log events and can identify behavior deviations from many different aspects. The thesis also demonstrates how observability can be improved by using a new software instrumentation guideline. The guideline enables the tracking of systemized procedures and enhances system understanding. The purpose of the guideline is to make RAN developers aware that machine learning can utilize debug information and help their troubleshooting process. To familiarize the reader with the research area, the thesis first discusses the challenges and the methods that can be used to detect anomalies, perform root cause analysis, and observe RAN system behavior. The proposed research methods are integrated and tested in an advanced 5G test bed to evaluate the methods' accuracy, speed, system impact, and implementation cost.

The results demonstrate the advantage of using machine learning and statistical methods when troubleshooting the behavior of RAN. Machine learning methods similar to those presented in this thesis may help those who troubleshoot RAN and accelerate the development of 5G. The thesis ends by presenting potential research areas where this research could be further developed and applied, both in RAN and in other systems.

Abstract [sv]

To meet the high demands on performance and reliability in the new 5G mobile network, the Radio Access Network (RAN) is now transitioning to a cloud-based architecture. The new microservice architecture is intended to increase scalability and performance and to shorten lead times for product deliveries. The distributed and virtualized architecture is, however, more complicated than before, which makes troubleshooting harder. So far, those who troubleshoot RAN have relied on their expertise to analyze the system manually. But the ever-growing volume of data and the increased complexity make it difficult to understand the system's behavior.

This thesis contributes knowledge in three related areas, where the proposed machine learning and statistical methods help those who troubleshoot RAN find deviations in system logs, help identify the root cause of these deviations, and improve the system's observability. These methods learn the behavior of RAN from events in system logs and can identify a number of behavioral deviations. The thesis also shows how observability can be improved by using a new guideline for software instrumentation. The guideline makes it possible to follow how RAN applications affect one another, which in turn improves system understanding. The purpose of the guideline is to make those who work with RAN aware of how machine learning can help in their troubleshooting process. To familiarize the reader with the research area, the thesis first discusses the challenges and the methods that can be used to detect deviations in RAN data, find the cause of the deviations, and improve the observability of the system. To evaluate the accuracy, speed, system impact, and implementation cost of the proposed methods, the methods are integrated and tested in an advanced 5G testbed.

The results show the great advantages of using machine learning and statistical methods when troubleshooting the behavior of RAN. Machine learning methods similar to those presented in this thesis may come to help those who troubleshoot RAN and accelerate the development of 5G. The thesis concludes with a presentation of potential research areas where the research in this thesis could be further developed and applied, both in RAN and in other systems.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2023. p. 45
Series
Report / UMINF, ISSN 0348-0542 ; 23.02
Keywords
Anomaly detection, Root cause analysis, Observability, Machine learning, Radio Access Network, 5G
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-206055
ISBN: 978-91-8070-053-5
ISBN: 978-91-8070-054-2
Public defence
2023-04-21, Aula Biologica BIO.E.203, Umeå, 09:15 (English)
Available from: 2023-03-31. Created: 2023-03-27. Last updated: 2023-03-28. Bibliographically approved

Authority records

Sundqvist, Tobias; Bhuyan, Monowar H.; Elmroth, Erik