A Comparative Study of MATCH_RECOGNIZE and REGEXP-Based SQL Approaches for Process Querying
2025 (English) Independent thesis, basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis (Degree project)
Abstract [en]
Businesses and organizations rely on process querying to analyse data, using tools such as process querying languages (PQLs) or SQL. As PQLs introduce certain drawbacks and SQL remains a standard for handling data, assessing SQL's usefulness for process querying is of practical relevance. The most common ways to express such queries in SQL are MATCH_RECOGNIZE and regular expression (REGEXP) based SQL approaches. However, the usefulness of MATCH_RECOGNIZE is still unknown, given that SQL already has well-established REGEXP support and many SQL engines do not yet support MATCH_RECOGNIZE.
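As a rough illustration of the idea behind a REGEXP-based process query, the following Python sketch flattens each case's events into a string and matches a regular expression against it. This is an assumption about the general technique, not the queries used in the thesis; the event log, the activity names, and the helper names `traces` and `matching_cases` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical event log: (case_id, timestamp, activity).
EVENTS = [
    ("c1", 1, "A"), ("c1", 2, "B"), ("c1", 3, "C"),
    ("c2", 1, "A"), ("c2", 2, "C"),
    ("c3", 2, "B"), ("c3", 1, "A"), ("c3", 3, "B"), ("c3", 4, "C"),
]


def traces(events):
    """Return {case_id: comma-separated activities ordered by timestamp}."""
    by_case = defaultdict(list)
    for case_id, ts, activity in events:
        by_case[case_id].append((ts, activity))
    return {
        case_id: ",".join(activity for _, activity in sorted(rows))
        for case_id, rows in by_case.items()
    }


def matching_cases(events, pattern):
    """Return the sorted case ids whose whole trace matches the regex."""
    rx = re.compile(pattern)
    return sorted(c for c, t in traces(events).items() if rx.fullmatch(t))


# Pattern: activity A, then one or more B, then C.
PATTERN = r"A(,B)+,C"

print(matching_cases(EVENTS, PATTERN))  # prints ['c1', 'c3']
```

In SQL, the same shape would typically be a string aggregation per case followed by a regular expression predicate, whereas MATCH_RECOGNIZE expresses the pattern directly over ordered rows.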
In this thesis we empirically evaluate the performance and scalability of MATCH_RECOGNIZE in comparison to REGEXP-based SQL approaches for process querying, using a simple dataset derived from SIGNAL (a PQL by SAP Signavio) and translating the SIGNAL patterns to both MATCH_RECOGNIZE and REGEXP-based SQL queries. The execution time, CPU usage, and peak memory usage are measured for each query, and to evaluate scalability the dataset size is varied for each query using logarithmic scaling (e.g., 10%, 25%, 50%, 75%, 100%).
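The measurement procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's benchmark harness: `measure`, `scale_run`, and the stand-in workload `count_even` are hypothetical names, and `tracemalloc` only tracks memory allocated by Python itself, so it is not equivalent to measuring a SQL engine's peak memory.

```python
import time
import tracemalloc

# Dataset fractions taken from the abstract's example.
FRACTIONS = (0.10, 0.25, 0.50, 0.75, 1.00)


def measure(query, rows):
    """Run query(rows) once and record wall time, CPU time and peak memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = query(rows)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return {
        "rows": len(rows),
        "wall_s": wall,
        "cpu_s": cpu,
        "peak_bytes": peak,
        "result": result,
    }


def scale_run(query, dataset, fractions=FRACTIONS):
    """Measure the query on a growing prefix of the dataset."""
    return [
        measure(query, dataset[: max(1, int(len(dataset) * f))])
        for f in fractions
    ]


def count_even(rows):
    """Hypothetical stand-in for a process query."""
    return sum(1 for r in rows if r % 2 == 0)


report = scale_run(count_even, list(range(1000)))
for entry in report:
    print(entry["rows"], entry["wall_s"], entry["cpu_s"], entry["peak_bytes"])
```

Plotting each metric against `rows` for both query styles is one way to check whether the growth is roughly linear.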
The experiment showed that REGEXP-based SQL approaches outperform MATCH_RECOGNIZE in all metrics, often by a factor of 2. The results also showed that both approaches scale linearly with increasing data size.
These findings indicate that MATCH_RECOGNIZE might not be the best tool for all process querying tasks in SQL, especially when using a simple dataset. However, we strongly suspect that MATCH_RECOGNIZE outperforms REGEXP-based SQL approaches as the complexity of the dataset increases.
Place, publisher, year, edition, pages
2025, p. 25
Series
UMNAD ; 1565
Keywords [en]
sql, process querying, MATCH_RECOGNIZE, temporal pattern matching, process querying languages, regular expressions
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-240784 OAI: oai:DiVA.org:umu-240784 DiVA, id: diva2:1973636
Educational program
Bachelor of Science Programme in Computing Science
Supervisors
Examiners
2025-06-23 2025-06-19 2025-06-23 Bibliographically approved