Umeå University's logo

umu.sePublications
Change search
Refine search result
1 - 29 of 29
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable2011In: Proceedings of the Prague Stringology Conference 2011 / [ed] Jan Holub and Jan Žďárek, Prague: Prague Stringology Club, Czech Technical University , 2011, p. 59-73Conference paper (Refereed)
    Abstract [en]

    The string correction problem looks at minimal ways to modify one stringinto another using fixed operations, such as for example inserting a symbol, deleting asymbol and interchanging the positions of two symbols (a “swap”). This has been generalizedto trees in various ways, but unfortunately having operations to insert/deletenodes in the tree and operations that move subtrees, such as a “swap” of adjacent subtrees,makes the correction problem for trees intractable. In this paper we investigatewhat happens when we have a tree edit distance problem with only swaps. We callthis problem tree swap distance, and go on to prove that this correction problem isNP-complete. This suggests that the swap operation is fundamentally problematic inthe tree case, and other subtree movement models should be studied.

    Download full text (pdf)
    fulltext
  • 2.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank (Contributor)
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Characterizing Non-Regularity2014Report (Other academic)
    Abstract [en]

    This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

  • 3.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Complexities of Order-Related Formal Language Extensions2014Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this thesis discusses various formal language formalisms that extend classical formalisms like regular expressions and context-free grammars with additional abilities, most relating to order. This is done while focusing on the impact these extensions have on the efficiency of parsing the languages generated. That is, rather than taking a step up on the Chomsky hierarchy to the context-sensitive languages, which makes parsing very difficult, a smaller step is taken, adding some mechanisms which permit interesting spatial (in)dependencies to be modeled.

    The most immediate example is shuffle formalisms, where existing language formalisms are extended by introducing operators which generate arbitrary interleavings of argument languages. For example, introducing a shuffle operator to the regular expressions does not make it possible to recognize context-free languages like anbn, but it does capture some non-context-free languages like the language of all strings containing the same number of as, bs and cs. The impact these additions have on parsing has many facets. Other than shuffle operators we also consider formalisms enforcing repeating substrings, formalisms moving substrings around, and formalisms that restrict which substrings may be concatenated. The formalisms studied here all have a number of properties in common.

    1. They are closely related to existing regular and context-free formalisms. They operate in a step-wise fashion, deriving strings by sequences of rule applications of individually limited power.
    2. Each step generates a constant number of symbols and does not modify parts that have already been generated. That is, strings are built in an additive fashion that does not explode in size (in contrast to e.g. Lindenmayer systems). All languages here will have a semi-linear Parikh image.
    3. They feature some interesting characteristic involving order or other spatial constraints. In the example of the shuffle multiple derivations are in a sense interspersed in a way that each is unaware of.
    4. All of the formalisms are intended to be limited enough to make an efficient parsing algorithm at least for some cases a reasonable goal.

    This thesis will give intuitive explanations of a number of formalisms fulfilling these requirements, and will sketch some results relating to the parsing problem for them. This should all be viewed as preparation for the more complete results and explanations featured in the papers given in the appendices.

    Download full text (pdf)
    Complexities of Order-Related Formal Language Extensions
  • 4.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Complexities of Parsing in the Presence of Reordering2012Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this thesis discusses various formalisms for representing the addition of order-controlling and order-relaxing mechanisms to existing formal language models. An immediate example is shuffle expressions, which can represent not only all regular languages (a regular expression is a shuffle expression), but also features additional operations that generate arbitrary interleavings of its argument strings. This defines a language class which, on the one hand, does not contain all context-free languages, but, on the other hand contains an infinite number of languages that are not context-free. Shuffle expressions are, however, not themselves the main interest of this thesis. Instead we consider several formalisms that share many of their properties, where some are direct generalisations of shuffle expressions, while others feature very different methods of controlling order. Notably all formalisms that are studied here

    • have a semi-linear Parikh image,
    • are structured so that each derivation step generates at most a constant number of symbols (as opposed to the parallel derivations in for example Lindenmayer systems),
    • feature interesting ordering characteristics, created either by derivation steps that may generate symbols in multiple places at once, or by multiple generating processes that produce output independently in an interleaved fashion, and
    • are all limited enough to make the question of efficient parsing an interesting and reasonable goal.

    This vague description already hints towards the formalisms considered; the different classes of mildly context-sensitive devices and concurrent finite-state automata.

    This thesis will first explain and discuss these formalisms, and will then primarily focus on the associated membership problem (or parsing problem). Several parsing results are discussed here, and the papers in the appendix give a more complete picture of these problems and some related ones.

    Download full text (pdf)
    fulltext
  • 5.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    The Membership Problem for the Shuffle of Two Deterministic Linear Context-Free Languages is NP-complete2012Report (Other academic)
    Abstract [en]

    Formal language models which employ shuffling, or interleaving, of strings are of interest in many areas of computer science. Notable examples include system verification, plan recognition, and natural language processing. Membership problems for the shuffle of languages are especially interesting. It is known that deciding membership for shuffles of regular languages can be done in polynomial time, and that deciding (non-uniform) membership in the shuffle of two deterministic context-free languages is NP-complete. In this paper we narrow the gap by showing that the non-uniform membership problem for the shuffle of two deterministic *linear* context-free languages is NP-complete.

    Download full text (pdf)
    fulltext
  • 6.
    Berglund, Martin
    et al.
    Department of Information Science and Centre for AI Research, University of Stellenbosch, Matieland, South Africa.
    Bester, Willem
    Division of Computer Science, University of Stellenbosch, Matieland, South Africa.
    van der Merwe, Brink
    Division of Computer Science, University of Stellenbosch, Matieland, South Africa.
    Formalising and implementing Boost POSIX regular expression matching2021In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 857, p. 147-165Article in journal (Refereed)
    Abstract [en]

    Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the posix standard are prescribed leftmost-longest semantics. However, the posix standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various posix matchers. The Boost library has an interesting take on the posix standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, analyze the complexity of regular expression matching when using Boost semantics, and provide efficient algorithms for both online and multipass matching.

  • 7.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing unranked tree languages, folded once2023In: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, p. 60-73Conference paper (Refereed)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation is enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper we address the remaining case where only one fold operation is applied, but the order among edges is discarded. We show that under these conditions, the problem is solvable in non-uniform polynomial time.

  • 8.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing unranked tree languages, folded once2024In: Algorithms, E-ISSN 1999-4893, Vol. 17, no 6, article id 268Article in journal (Refereed)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

    Download full text (pdf)
    fulltext
  • 9.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Shuffled languages: representation and recognition2013In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, p. 1-20Article in journal (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 10.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Boiret, Adrien
    Laboratoire d'Informatique d'Orléans & INSA CVL, France.
    Transduction from trees to graphs through folding2023In: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, article id 105111Article in journal (Refereed)
    Abstract [en]

    We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graphs languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the time complexity for the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

    Download full text (pdf)
    fulltext
  • 11.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems2013In: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, p. 21-29Conference paper (Other academic)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

    Download full text (pdf)
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems
  • 12.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    Stellenbosch University, South Africa.
    Watson, Bruce
    Stellenbosch University, South Africa.
    Cuts in Regular Expressions2013In: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, p. 70-81Conference paper (Refereed)
    Abstract [en]

    Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

  • 13.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing Shuffled Languages2011Report (Other academic)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 14.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing shuffled languages2011In: Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings / [ed] Adrian-Horia Dediu, Shunsuke Inenaga and Carlos Martín-Vide, Springer Berlin/Heidelberg, 2011, p. 142-154Conference paper (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 15.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the complexity of variants of the k Best strings problem2010In: Proceedings of the Prague stringology conference 2010, dblp , 2010, p. 76-88Conference paper (Refereed)
    Abstract [en]

    We investigate the problem of extracting the k best strings from a nondeterministic weighted automaton over a semiring S. This problem, which has been considered earlier in the literature, is more difficult than extracting the k best runs, since distinct runs may not correspond to distinct strings. Unsurprisingly, the computational complexity of the problem depends on the semiring S used. We study three different cases, namely the tropical and complex tropical semirings, and the semiring of positive real numbers. For the first case, we establish a polynomial algorithm. For the second and third cases, NP-completeness and undecidability results are shown.

  • 16.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the complexity of variants of the k best strings problem2010In: Proc. Prague Stringology Conference 2010 / [ed] M. Balík, J. Holub, 2010Conference paper (Refereed)
  • 17.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    University of Stellenbosch, South Africa.
    Analyzing catastrophic backtracking behavior in practical regular expression matching2014In: Proceedings 14th international conference on automata and formal languages (AFL 2014) / [ed] Zoltán Ésik; Zoltán Fülöp, Open Publishing Association , 2014, p. 109-123Conference paper (Refereed)
    Abstract [en]

    We consider in some detail how regular expression matching happens in Java, as a popular representative of the category of regex-directed matching engines. We extract a slightly idealized algorithm for this scenario. Next we define an automata model which captures all the aspects needed to perform matching, of the Java style, in a formal way. Finally, two types of static analysis, which take a regular expression and tells whether there exists a family of strings which make Java-style matching run in exponential time, are done.

  • 18.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Willeke
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Constructing a BPE Tokenization DFA2024In: Implementation and Application of Automata: 28th International Conference, CIAA 2024, Akita, Japan, September 3–6, 2024, Proceedings / [ed] Szilárd Zsolt Fazekas, Springer, 2024, Vol. 15015, p. 66-78Conference paper (Refereed)
    Abstract [en]

    Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata (DFA) designed to operate directly on tokenizations produced by the popular byte pair encoding (BPE) technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization dictionaries, and composing tokenized languages in various ways.

  • 19.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Formalizing BPE Tokenization2023In: 13th International Workshop onNon-Classical Models of Automata and Applications (NCMA 2023) / [ed] Nagy B., Freund R., Open Publishing Association , 2023, p. 16-27Conference paper (Refereed)
    Abstract [en]

    In this paper, we formalize practical byte pair encoding tokenization as it is used in large language models and other NLP systems, in particular we formally define and investigate the semantics of the SentencePiece and HuggingFace tokenizers, in particular how they relate to each other, depending on how the tokenization rules are constructed. Beyond this we consider how tokenization can be performed in an incremental fashion, as well as doing it left-to-right using an amount of memory constant in the length of the string, enabling e.g. using a finite state string-to-string transducer.

    Download full text (pdf)
    fulltext
  • 20.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    On the semantics of regular expression parsing in the wild2017In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 679, p. 69-82Article in journal (Refereed)
    Abstract [en]

    We introduce prioritized transducers to formalize capturing groups in regular expression matching in a way that permits straightforward modeling of capturing in Java's 1 regular expression library. The broader questions of parsing semantics and performance are also considered. In addition, the complexity of deciding equivalence of regular expressions with capturing groups is investigated.

  • 21.
    Berglund, Martin
    et al.
    Centre for AI Research, CSIR, Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, South Africa.
    Re-examining regular expressions with backreferences2023In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 940, p. 66-80Article in journal (Refereed)
    Abstract [en]

    Most modern regular expression matching libraries (one of the rare exceptions being Google's RE2) allow backreferences, operations which bind a substring to a variable, allowing it to be matched again verbatim. However, both real-world implementations and definitions in the literature use different syntactic restrictions and have differences in the semantics of the matching of backreferences. Our aim is to compare these various flavors by considering the classes of formal languages that each can describe, establishing, as a result, a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, old classes are extended to new ones, and several incidental results on the nature of these language classes are given.

  • 22.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, South Africa.
    Regular Expressions with Backreferences Re-examined2017In: Proceedings of the Prague Stringology Conference 2017 / [ed] Jan Holub; Jan Zdárek, Praque Stringology Club , 2017, p. 30-41Conference paper (Refereed)
    Abstract [en]

    Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to avariable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.

  • 23.
    Berglund, Martin
    et al.
    Department of Information Science, Stellenbosch University, Stellenbosch, South Africa; Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Bruce, Watson
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa; Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    On the Semantics of Atomic Subgroups in Practical Regular Expressions2017In: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol; Cyril Nicaud, Springer, 2017, p. 14-26Conference paper (Refereed)
    Abstract [en]

    Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in for example Perl, Java, Ruby and .NET, also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent needless backtracking, but the operators will often also change the language accepted. As such it is essential to develop a theoretical sound basis for the matching semantics of regular expressions with atomic operators. We here establish that atomic operators preserve regularity, but are exponentially more succinct for some languages. Further we investigate the state complexity of deterministic and non-deterministic finite automata accepting the language corresponding to a regular expression with atomic operators, and show that emptiness testing is PSPACE-complete.

  • 24.
    Berglund, Martin
    et al.
    Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Computer Science Division, Stellenbosch University, South Africa.
    van Litsenborgh, Steyn
    Computer Science Division, Stellenbosch University, South Africa.
    Regular Expressions with Lookahead2021In: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 27, no 4, p. 324-340Article in journal (Refereed)
    Abstract [en]

    This paper investigates regular expressions which in addition to the standard operators of union, concatenation, and Kleene star, have lookaheads. We show how to translate regular expressions with lookaheads (REwLA) to equivalent Boolean automata having at most 3 states more than the length of the REwLA. We also investigate the state complexity when translating REwLA to equivalent deterministic finite automata (DFA).

    Download full text (pdf)
    fulltext
  • 25.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Petter, Ericson
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Uniform vs. Nonuniform Membership for Mildly Context-Sensitive Languages: A Brief Survey2016In: Algorithms, E-ISSN 1999-4893, Vol. 9, no 2, article id 32Article in journal (Refereed)
    Abstract [en]

    Parsing for mildly context-sensitive language formalisms is an important area within natural language processing. While the complexity of the parsing problem for some such formalisms is known to be polynomial, this is not the case for all of them. This article presents a series of results regarding the complexity of parsing for linear context-free rewriting systems and deterministic tree-walking transducers. We discuss the difference between uniform and nonuniform complexity measures and how parameterized complexity theory can be used to investigate how different aspects of the formalisms influence how hard the parsing problem is. The main results we survey are all hardness results and indicate that parsing is hard even for relatively small values of parameters such as rank and fan-out in a rewriting system.

    Download full text (pdf)
    fulltext
  • 26.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa; National Institute for Theoretical and Computational Sciences, Stellenbosch, South Africa.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Information Science, Stellenbosch University, Stellenbosch, South Africa.
    Ordered Context-Free Grammars2022In: Implementation and Application of Automata: 26th International Conference, CIAA 2022, Rouen, France, June 28 – July 1, 2022, Proceedings / [ed] Pascal Caron; Ludovic Mignot, Springer Science+Business Media B.V., 2022, p. 53-66Conference paper (Refereed)
    Abstract [en]

    We propose a new unambiguous grammar formalism, referred to as ordered context-free grammars, which is identical to context-free grammars, apart from the property that it also places an order on parse trees. Since only a minor modification to ordered context-free grammars is required to obtain parsing expression grammars, the relationship between context-free grammars and parsing expression grammars becomes more evident. By preserving how ordered context-free grammars support left-recursion, parsing expression grammars is modified to support left recursion in ways much more natural than current approaches.

  • 27.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Mouton, Jacobie
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    van Litsenborgh, Steyn
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Berglund, Martin
    Department of Information Science, Stellenbosch University, Stellenbosch, South Africa.
    Memoized Regular Expressions2021In: Implementation and Application of Automata: 25th International Conference, CIAA 2021, Virtual Event, July 19-22, 2021, Proceedings / [ed] Sebastian Maneth, Springer, 2021, p. 39-52Conference paper (Refereed)
    Abstract [en]

    We extend non-deterministic finite automata (NFAs) and regular expressions (regexes) by adding memoization to these formalisms. These extensions are aimed at improving the matching time of backtracking regex matchers. Additionally, we discuss how to extend the concept of ambiguity in order to be applicable to memoized extensions of regexes and NFAs. These more general notions of ambiguity can be used to analyze the matching time of backtracking regex matchers enhanced with memoization.

  • 28.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Weideman, Nicolaas
    Department of Computer Science, Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    Berglund, Martin
    Department of Information Science, Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    Turning evil regexes harmless2017In: SAICSIT '17: Proceedings of the South African Institute of Computer Scientists and Information Technologists, ACM Digital Library, 2017, article id 38Conference paper (Refereed)
    Abstract [en]

    We explore the relationship between ambiguity in automata and regular expressions on the one hand, and the matching time of backtracking regular expression matchers on the other. We focus in particular on the extreme cases where we have either an exponential amount of ambiguity or no ambiguity at all. We also investigate techniques to reduce or remove ambiguity from regular expressions, which can then be used to transform regular expressions which might be exploited by using algorithmic complexity, into harmless equivalent expressions.

  • 29. Weideman, Nicolaas
    et al.
    van der Merwe, Brink
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Watson, Bruce
    Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA2016In: Implementation and Application of Automata, Springer, 2016, p. 322-334Conference paper (Refereed)
    Abstract [en]

    We apply results from ambiguity of non-deterministic finite automata to the problem of determining the asymptotic worst-case matching time, as a function of the length of the input strings, when attempting to match input strings with a given regular expression, where the matcher being used is a backtracking regular expression matcher.

1 - 29 of 29
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf