Umeå universitets logga

umu.sePublikationer
Ändra sökning
Avgränsa sökresultatet
1 - 29 av 29
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Träffar per sida
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
Markera
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 1.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable2011Ingår i: Proceedings of the Prague Stringology Conference 2011 / [ed] Jan Holub and Jan Žďárek, Prague: Prague Stringology Club, Czech Technical University , 2011, s. 59-73Konferensbidrag (Refereegranskat)
    Abstract [en]

    The string correction problem looks at minimal ways to modify one stringinto another using fixed operations, such as for example inserting a symbol, deleting asymbol and interchanging the positions of two symbols (a “swap”). This has been generalizedto trees in various ways, but unfortunately having operations to insert/deletenodes in the tree and operations that move subtrees, such as a “swap” of adjacent subtrees,makes the correction problem for trees intractable. In this paper we investigatewhat happens when we have a tree edit distance problem with only swaps. We callthis problem tree swap distance, and go on to prove that this correction problem isNP-complete. This suggests that the swap operation is fundamentally problematic inthe tree case, and other subtree movement models should be studied.

    Ladda ner fulltext (pdf)
    fulltext
  • 2.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank (Bidragsgivare)
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Characterizing Non-Regularity2014Rapport (Övrigt vetenskapligt)
    Abstract [en]

    This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

  • 3.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Complexities of Order-Related Formal Language Extensions2014Doktorsavhandling, sammanläggning (Övrigt vetenskapligt)
    Abstract [sv]

    Denna avhandling diskuterar utökningar av klassiska formalismer inom formella språk, till exempel reguljära uttryck och kontextfria grammatiker. Utökningarna handlar på ett eller annat sätt omordning, och ett särskilt fokus ligger på att göra utökningarna på ett sätt som dels har intressanta spatiala/ordningsrelaterade effekter och som dels bevarar den effektiva parsningen som är möjlig för de ursprungliga klassiska formalismerna. Detta står i kontrast till att ta det större steget upp i Chomsky-hierarkin till de kontextkänsliga språken, vilket medför ett svårt parsningsproblem.

    Ett omedelbart exempel på en sådan utökning är s.k. shuffle-formalismer. Dessa utökar existerande formalismer genom att introducera operatorer som godtyckligt sammanflätar strängar från argumentspråk. Om shuffle-operator introduceras till de reguljära uttrycken ger det inte förmågan att känna igen t.ex. det kontextfria språket anbn, men det fångar istället vissa språk som inte är kontextfria, till exempel språket som består av alla strängar som innehåller lika många a:n, b:n och c:n. Sättet på vilket dessa utökningar påverkar parsningsproblemet är mångfacetterat. Utöver dessa shuffle-operatorer tas också formalismer där delsträngar kan upprepas, formalismer där delsträngar flyttas runt, och formalismer som begränsar hur delsträngar får konkateneras upp. Formalismerna som tas upp här har dock vissa egenskaper gemensamma.

    1. De är nära besläktade med de klassiska reguljära och kontextfria formalismerna. De arbetar stegvis, och konstruerar strängar genom successiva applikationer av individuellt enkla regler.
    2. Varje steg genererar ett konstant antal symboler och modifierar inte det som redan genererats. Det vill säga, strängar byggs additivt och längden på dem kan inte explodera (i kontrast till t.ex. Lindenmayer-system). Alla språk som tas upp kommer att ha en semi-linjär Parikh-avbildning.
    3. De har någon instressant spatial/ordningsrelaterad egenskap. Exempelvis sättet på vilket shuffle-operatorer sammanflätar annars oberoende deriveringar.
    4. Alla formalismerna är tänkta att vara begränsade nog att det är resonabelt att ha effektiv parsning som mål.

    Denna avhandling kommer att ge intuitiva förklaring av ett antal formalismer som uppfyller ovanstående krav, och kommer att skissa en blandning av resultat relaterade till parsningsproblemet för dem. Detta bör ses som förberedande inför läsning av de mer djupgående och komplexa resultaten och förklaringarna i de artiklar som finns inkluderade som appendix.

    Ladda ner fulltext (pdf)
    Complexities of Order-Related Formal Language Extensions
  • 4.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Complexities of Parsing in the Presence of Reordering2012Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
    Abstract [en]

    The work presented in this thesis discusses various formalisms for representing the addition of order-controlling and order-relaxing mechanisms to existing formal language models. An immediate example is shuffle expressions, which can represent not only all regular languages (a regular expression is a shuffle expression), but also features additional operations that generate arbitrary interleavings of its argument strings. This defines a language class which, on the one hand, does not contain all context-free languages, but, on the other hand contains an infinite number of languages that are not context-free. Shuffle expressions are, however, not themselves the main interest of this thesis. Instead we consider several formalisms that share many of their properties, where some are direct generalisations of shuffle expressions, while others feature very different methods of controlling order. Notably all formalisms that are studied here

    • have a semi-linear Parikh image,
    • are structured so that each derivation step generates at most a constant number of symbols (as opposed to the parallel derivations in for example Lindenmayer systems),
    • feature interesting ordering characteristics, created either by derivation steps that may generate symbols in multiple places at once, or by multiple generating processes that produce output independently in an interleaved fashion, and
    • are all limited enough to make the question of efficient parsing an interesting and reasonable goal.

    This vague description already hints towards the formalisms considered; the different classes of mildly context-sensitive devices and concurrent finite-state automata.

    This thesis will first explain and discuss these formalisms, and will then primarily focus on the associated membership problem (or parsing problem). Several parsing results are discussed here, and the papers in the appendix give a more complete picture of these problems and some related ones.

    Ladda ner fulltext (pdf)
    fulltext
  • 5.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    The Membership Problem for the Shuffle of Two Deterministic Linear Context-Free Languages is NP-complete2012Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Formal language models which employ shuffling, or interleaving, of strings are of interest in many areas of computer science. Notable examples include system verification, plan recognition, and natural language processing. Membership problems for the shuffle of languages are especially interesting. It is known that deciding membership for shuffles of regular languages can be done in polynomial time, and that deciding (non-uniform) membership in the shuffle of two deterministic context-free languages is NP-complete. In this paper we narrow the gap by showing that the non-uniform membership problem for the shuffle of two deterministic *linear* context-free languages is NP-complete.

    Ladda ner fulltext (pdf)
    fulltext
  • 6.
    Berglund, Martin
    et al.
    Department of Information Science and Centre for AI Research, University of Stellenbosch, Matieland, South Africa.
    Bester, Willem
    Division of Computer Science, University of Stellenbosch, Matieland, South Africa.
    van der Merwe, Brink
    Division of Computer Science, University of Stellenbosch, Matieland, South Africa.
    Formalising and implementing Boost POSIX regular expression matching2021Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 857, s. 147-165Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the posix standard are prescribed leftmost-longest semantics. However, the posix standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various posix matchers. The Boost library has an interesting take on the posix standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, analyze the complexity of regular expression matching when using Boost semantics, and provide efficient algorithms for both online and multipass matching.

  • 7.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing unranked tree languages, folded once2023Ingår i: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, s. 60-73Konferensbidrag (Refereegranskat)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation is enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper we address the remaining case where only one fold operation is applied, but the order among edges is discarded. We show that under these conditions, the problem is solvable in non-uniform polynomial time.

  • 8.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing unranked tree languages, folded once2024Ingår i: Algorithms, E-ISSN 1999-4893, Vol. 17, nr 6, artikel-id 268Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

    Ladda ner fulltext (pdf)
    fulltext
  • 9.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Shuffled languages: representation and recognition2013Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, s. 1-20Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 10.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Boiret, Adrien
    Laboratoire d'Informatique d'Orléans & INSA CVL, France.
    Transduction from trees to graphs through folding2023Ingår i: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, artikel-id 105111Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graphs languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the time complexity for the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

    Ladda ner fulltext (pdf)
    fulltext
  • 11.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems2013Ingår i: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, s. 21-29Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

    Ladda ner fulltext (pdf)
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems
  • 12.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    Stellenbosch University, South Africa.
    Watson, Bruce
    Stellenbosch University, South Africa.
    Cuts in Regular Expressions2013Ingår i: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, s. 70-81Konferensbidrag (Refereegranskat)
    Abstract [en]

    Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

  • 13.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Högberg, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Recognizing Shuffled Languages2011Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 14.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Högberg, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Recognizing shuffled languages2011Ingår i: Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings / [ed] Adrian-Horia Dediu, Shunsuke Inenaga and Carlos Martín-Vide, Springer Berlin/Heidelberg, 2011, s. 142-154Konferensbidrag (Refereegranskat)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 15.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the complexity of variants of the k Best strings problem2010Ingår i: Proceedings of the Prague stringology conference 2010, dblp , 2010, s. 76-88Konferensbidrag (Refereegranskat)
    Abstract [en]

    We investigate the problem of extracting the k best strings from a nondeterministic weighted automaton over a semiring S. This problem, which has been considered earlier in the literature, is more difficult than extracting the k best runs, since distinct runs may not correspond to distinct strings. Unsurprisingly, the computational complexity of the problem depends on the semiring S used. We study three different cases, namely the tropical and complex tropical semirings, and the semiring of positive real numbers. For the first case, we establish a polynomial algorithm. For the second and third cases, NP-completeness and undecidability results are shown.

  • 16.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the complexity of variants of the k best strings problem2010Ingår i: Proc. Prague Stringology Conference 2010 / [ed] M. Balík, J. Holub, 2010Konferensbidrag (Refereegranskat)
  • 17.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    University of Stellenbosch, South Africa.
    Analyzing catastrophic backtracking behavior in practical regular expression matching2014Ingår i: Proceedings 14th international conference on automata and formal languages (AFL 2014) / [ed] Zoltán Ésik; Zoltán Fülöp, Open Publishing Association , 2014, s. 109-123Konferensbidrag (Refereegranskat)
    Abstract [en]

    We consider in some detail how regular expression matching happens in Java, as a popular representative of the category of regex-directed matching engines. We extract a slightly idealized algorithm for this scenario. Next we define an automata model which captures all the aspects needed to perform matching, of the Java style, in a formal way. Finally, two types of static analysis, which take a regular expression and tells whether there exists a family of strings which make Java-style matching run in exponential time, are done.

  • 18.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Willeke
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Constructing a BPE Tokenization DFA2024Ingår i: Implementation and Application of Automata: 28th International Conference, CIAA 2024, Akita, Japan, September 3–6, 2024, Proceedings / [ed] Szilárd Zsolt Fazekas, Springer, 2024, Vol. 15015, s. 66-78Konferensbidrag (Refereegranskat)
    Abstract [en]

    Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata (DFA) designed to operate directly on tokenizations produced by the popular byte pair encoding (BPE) technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization dictionaries, and composing tokenized languages in various ways.

  • 19.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Formalizing BPE Tokenization2023Ingår i: 13th International Workshop onNon-Classical Models of Automata and Applications (NCMA 2023) / [ed] Nagy B., Freund R., Open Publishing Association , 2023, s. 16-27Konferensbidrag (Refereegranskat)
    Abstract [en]

    In this paper, we formalize practical byte pair encoding tokenization as it is used in large language models and other NLP systems, in particular we formally define and investigate the semantics of the SentencePiece and HuggingFace tokenizers, in particular how they relate to each other, depending on how the tokenization rules are constructed. Beyond this we consider how tokenization can be performed in an incremental fashion, as well as doing it left-to-right using an amount of memory constant in the length of the string, enabling e.g. using a finite state string-to-string transducer.

    Ladda ner fulltext (pdf)
    fulltext
  • 20.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    On the semantics of regular expression parsing in the wild2017Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 679, s. 69-82Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We introduce prioritized transducers to formalize capturing groups in regular expression matching in a way that permits straightforward modeling of capturing in Java's 1 regular expression library. The broader questions of parsing semantics and performance are also considered. In addition, the complexity of deciding equivalence of regular expressions with capturing groups is investigated.

  • 21.
    Berglund, Martin
    et al.
    Centre for AI Research, CSIR, Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, South Africa.
    Re-examining regular expressions with backreferences2023Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 940, s. 66-80Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Most modern regular expression matching libraries (one of the rare exceptions being Google's RE2) allow backreferences, operations which bind a substring to a variable, allowing it to be matched again verbatim. However, both real-world implementations and definitions in the literature use different syntactic restrictions and have differences in the semantics of the matching of backreferences. Our aim is to compare these various flavors by considering the classes of formal languages that each can describe, establishing, as a result, a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, old classes are extended to new ones, and several incidental results on the nature of these language classes are given.

  • 22.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, South Africa.
    Regular Expressions with Backreferences Re-examined2017Ingår i: Proceedings of the Prague Stringology Conference 2017 / [ed] Jan Holub; Jan Zdárek, Praque Stringology Club , 2017, s. 30-41Konferensbidrag (Refereegranskat)
    Abstract [en]

    Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to avariable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.

  • 23.
    Berglund, Martin
    et al.
    Department of Information Science, Stellenbosch University, Stellenbosch, South Africa; Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    van der Merwe, Brink
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Bruce, Watson
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa; Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    On the Semantics of Atomic Subgroups in Practical Regular Expressions2017Ingår i: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol; Cyril Nicaud, Springer, 2017, s. 14-26Konferensbidrag (Refereegranskat)
    Abstract [en]

    Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in for example Perl, Java, Ruby and .NET, also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent needless backtracking, but the operators will often also change the language accepted. As such it is essential to develop a theoretical sound basis for the matching semantics of regular expressions with atomic operators. We here establish that atomic operators preserve regularity, but are exponentially more succinct for some languages. Further we investigate the state complexity of deterministic and non-deterministic finite automata accepting the language corresponding to a regular expression with atomic operators, and show that emptiness testing is PSPACE-complete.

  • 24.
    Berglund, Martin
    et al.
    Department of Information Science, Stellenbosch University, South Africa.
    van der Merwe, Brink
    Computer Science Division, Stellenbosch University, South Africa.
    van Litsenborgh, Steyn
    Computer Science Division, Stellenbosch University, South Africa.
    Regular Expressions with Lookahead2021Ingår i: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 27, nr 4, s. 324-340Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This paper investigates regular expressions which in addition to the standard operators of union, concatenation, and Kleene star, have lookaheads. We show how to translate regular expressions with lookaheads (REwLA) to equivalent Boolean automata having at most 3 states more than the length of the REwLA. We also investigate the state complexity when translating REwLA to equivalent deterministic finite automata (DFA).

    Ladda ner fulltext (pdf)
    fulltext
  • 25.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Petter, Ericson
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Uniform vs. Nonuniform Membership for Mildly Context-Sensitive Languages: A Brief Survey2016Ingår i: Algorithms, E-ISSN 1999-4893, Vol. 9, nr 2, artikel-id 32Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Parsing for mildly context-sensitive language formalisms is an important area within natural language processing. While the complexity of the parsing problem for some such formalisms is known to be polynomial, this is not the case for all of them. This article presents a series of results regarding the complexity of parsing for linear context-free rewriting systems and deterministic tree-walking transducers. We discuss the difference between uniform and nonuniform complexity measures and how parameterized complexity theory can be used to investigate how different aspects of the formalisms influence how hard the parsing problem is. The main results we survey are all hardness results and indicate that parsing is hard even for relatively small values of parameters such as rank and fan-out in a rewriting system.

    Ladda ner fulltext (pdf)
    fulltext
  • 26.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa; National Institute for Theoretical and Computational Sciences, Stellenbosch, South Africa.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Department of Information Science, Stellenbosch University, Stellenbosch, South Africa.
    Ordered Context-Free Grammars2022Ingår i: Implementation and Application of Automata: 26th International Conference, CIAA 2022, Rouen, France, June 28 – July 1, 2022, Proceedings / [ed] Pascal Caron; Ludovic Mignot, Springer Science+Business Media B.V., 2022, s. 53-66Konferensbidrag (Refereegranskat)
    Abstract [en]

    We propose a new unambiguous grammar formalism, referred to as ordered context-free grammars, which is identical to context-free grammars, apart from the property that it also places an order on parse trees. Since only a minor modification to ordered context-free grammars is required to obtain parsing expression grammars, the relationship between context-free grammars and parsing expression grammars becomes more evident. By preserving how ordered context-free grammars support left-recursion, parsing expression grammars is modified to support left recursion in ways much more natural than current approaches.

  • 27.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Mouton, Jacobie
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    van Litsenborgh, Steyn
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Berglund, Martin
    Department of Information Science, Stellenbosch University, Stellenbosch, South Africa.
    Memoized Regular Expressions2021Ingår i: Implementation and Application of Automata: 25th International Conference, CIAA 2021, Virtual Event, July 19-22, 2021, Proceedings / [ed] Sebastian Maneth, Springer, 2021, s. 39-52Konferensbidrag (Refereegranskat)
    Abstract [en]

    We extend non-deterministic finite automata (NFAs) and regular expressions (regexes) by adding memoization to these formalisms. These extensions are aimed at improving the matching time of backtracking regex matchers. Additionally, we discuss how to extend the concept of ambiguity in order to be applicable to memoized extensions of regexes and NFAs. These more general notions of ambiguity can be used to analyze the matching time of backtracking regex matchers enhanced with memoization.

  • 28.
    van der Merwe, Brink
    et al.
    Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa.
    Weideman, Nicolaas
    Department of Computer Science, Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    Berglund, Martin
    Department of Information Science, Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa.
    Turning evil regexes harmless2017Ingår i: SAICSIT '17: Proceedings of the South African Institute of Computer Scientists and Information Technologists, ACM Digital Library, 2017, artikel-id 38Konferensbidrag (Refereegranskat)
    Abstract [en]

    We explore the relationship between ambiguity in automata and regular expressions on the one hand, and the matching time of backtracking regular expression matchers on the other. We focus in particular on the extreme cases where we have either an exponential amount of ambiguity or no ambiguity at all. We also investigate techniques to reduce or remove ambiguity from regular expressions, which can then be used to transform regular expressions which might be exploited by using algorithmic complexity, into harmless equivalent expressions.

  • 29. Weideman, Nicolaas
    et al.
    van der Merwe, Brink
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Watson, Bruce
    Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA2016Ingår i: Implementation and Application of Automata, Springer, 2016, s. 322-334Konferensbidrag (Refereegranskat)
    Abstract [en]

    We apply results from ambiguity of non-deterministic finite automata to the problem of determining the asymptotic worst-case matching time, as a function of the length of the input strings, when attempting to match input strings with a given regular expression, where the matcher being used is a backtracking regular expression matcher.

1 - 29 av 29
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf