Efficient incremental evaluation of succinct regular expressions
2015 (English)In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2015, 1541-1550 p.Conference paper (Refereed)
Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming technologies such as schema- and transformation languages for tabular data on the Web. In this paper we study the usage and effectiveness of the counting operator (or: limited repetition) in regular expressions. The counting operator is a popular extension which is part of the POSIX standard and therefore also present in regular expressions in grep, Java, Python, Perl, and Ruby. In a database context, expressions with counting appear in XML Schema and languages for querying graphs such as SPARQL 1.1 and Cypher.
We first present a practical study that suggests that counters are extensively used in practice. We then investigate evaluation methods for such expressions and develop a new algorithm for efficient incremental evaluation. Finally, we conduct an extensive benchmark study that shows that exploiting counting operators can lead to speed-ups of several orders of magnitude in a wide range of settings: normal and incremental evaluation on synthetic and real expressions.
Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2015. 1541-1550 p.
XML, Schema, Regular Expressions
IdentifiersURN: urn:nbn:se:umu:diva-110695DOI: 10.1145/2806416.2806434ISBN: 978-1-4503-3794-6OAI: oai:DiVA.org:umu-110695DiVA: diva2:864248
ACM International on Conference on Information and Knowledge Management (CIKM)2015, Melbourne, VIC, Australia