umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Complexities of Order-Related Formal Language Extensions
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Naturliga och Formella Språk)
2014 (English)Doctoral thesis, comprehensive summary (Other academic)Alternative title
Komplexiteter hos ordnings-relaterade utökningar av formella språk (Swedish)
Abstract [en]

The work presented in this thesis discusses various formal language formalisms that extend classical formalisms like regular expressions and context-free grammars with additional abilities, most relating to order. This is done while focusing on the impact these extensions have on the efficiency of parsing the languages generated. That is, rather than taking a step up on the Chomsky hierarchy to the context-sensitive languages, which makes parsing very difficult, a smaller step is taken, adding some mechanisms which permit interesting spatial (in)dependencies to be modeled.

The most immediate example is shuffle formalisms, where existing language formalisms are extended by introducing operators which generate arbitrary interleavings of argument languages. For example, introducing a shuffle operator to the regular expressions does not make it possible to recognize context-free languages like anbn, but it does capture some non-context-free languages like the language of all strings containing the same number of as, bs and cs. The impact these additions have on parsing has many facets. Other than shuffle operators we also consider formalisms enforcing repeating substrings, formalisms moving substrings around, and formalisms that restrict which substrings may be concatenated. The formalisms studied here all have a number of properties in common.

  1. They are closely related to existing regular and context-free formalisms. They operate in a step-wise fashion, deriving strings by sequences of rule applications of individually limited power.
  2. Each step generates a constant number of symbols and does not modify parts that have already been generated. That is, strings are built in an additive fashion that does not explode in size (in contrast to e.g. Lindenmayer systems). All languages here will have a semi-linear Parikh image.
  3. They feature some interesting characteristic involving order or other spatial constraints. In the example of the shuffle multiple derivations are in a sense interspersed in a way that each is unaware of.
  4. All of the formalisms are intended to be limited enough to make an efficient parsing algorithm at least for some cases a reasonable goal.

This thesis will give intuitive explanations of a number of formalisms fulfilling these requirements, and will sketch some results relating to the parsing problem for them. This should all be viewed as preparation for the more complete results and explanations featured in the papers given in the appendices.

Abstract [sv]

Denna avhandling diskuterar utökningar av klassiska formalismer inom formella språk, till exempel reguljära uttryck och kontextfria grammatiker. Utökningarna handlar på ett eller annat sätt omordning, och ett särskilt fokus ligger på att göra utökningarna på ett sätt som dels har intressanta spatiala/ordningsrelaterade effekter och som dels bevarar den effektiva parsningen som är möjlig för de ursprungliga klassiska formalismerna. Detta står i kontrast till att ta det större steget upp i Chomsky-hierarkin till de kontextkänsliga språken, vilket medför ett svårt parsningsproblem.

Ett omedelbart exempel på en sådan utökning är s.k. shuffle-formalismer. Dessa utökar existerande formalismer genom att introducera operatorer som godtyckligt sammanflätar strängar från argumentspråk. Om shuffle-operator introduceras till de reguljära uttrycken ger det inte förmågan att känna igen t.ex. det kontextfria språket anbn, men det fångar istället vissa språk som inte är kontextfria, till exempel språket som består av alla strängar som innehåller lika många a:n, b:n och c:n. Sättet på vilket dessa utökningar påverkar parsningsproblemet är mångfacetterat. Utöver dessa shuffle-operatorer tas också formalismer där delsträngar kan upprepas, formalismer där delsträngar flyttas runt, och formalismer som begränsar hur delsträngar får konkateneras upp. Formalismerna som tas upp här har dock vissa egenskaper gemensamma.

  1. De är nära besläktade med de klassiska reguljära och kontextfria formalismerna. De arbetar stegvis, och konstruerar strängar genom successiva applikationer av individuellt enkla regler.
  2. Varje steg genererar ett konstant antal symboler och modifierar inte det som redan genererats. Det vill säga, strängar byggs additivt och längden på dem kan inte explodera (i kontrast till t.ex. Lindenmayer-system). Alla språk som tas upp kommer att ha en semi-linjär Parikh-avbildning.
  3. De har någon instressant spatial/ordningsrelaterad egenskap. Exempelvis sättet på vilket shuffle-operatorer sammanflätar annars oberoende deriveringar.
  4. Alla formalismerna är tänkta att vara begränsade nog att det är resonabelt att ha effektiv parsning som mål.

Denna avhandling kommer att ge intuitiva förklaring av ett antal formalismer som uppfyller ovanstående krav, och kommer att skissa en blandning av resultat relaterade till parsningsproblemet för dem. Detta bör ses som förberedande inför läsning av de mer djupgående och komplexa resultaten och förklaringarna i de artiklar som finns inkluderade som appendix.

Place, publisher, year, edition, pages
Umeå: Umeå Universitet , 2014. , 60 p.
Series
Report / UMINF, ISSN 0348-0542 ; 2014:13
Keyword [en]
complexity, automata, languages, order, shuffle, regular expressions, context-free, mildly context-sensitive, parsing, membership, algorithm analysis
National Category
Computer Science
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-88235ISBN: 978-91-7601-047-1 (print)OAI: oai:DiVA.org:umu-88235DiVA: diva2:714563
Public defence
2014-05-21, Naturvetarhuset, N320, Umeå Universitet, Umeå, 13:15 (English)
Opponent
Supervisors
Available from: 2014-04-30 Created: 2014-04-28 Last updated: 2014-04-30Bibliographically approved
List of papers
1. Shuffled languages: representation and recognition
Open this publication in new window or tab >>Shuffled languages: representation and recognition
2013 (English)In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, 1-20 p.Article in journal (Refereed) Published
Abstract [en]

Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

Place, publisher, year, edition, pages
Elsevier, 2013
Keyword
Interleaving, Shuffle languages, Membership problems
National Category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-79714 (URN)10.1016/j.tcs.2013.04.022 (DOI)000320973900001 ()
Projects
Parameterized Natural Language Parsing
Funder
Swedish Research Council, 621-2011-6080
Available from: 2013-08-29 Created: 2013-08-29 Last updated: 2017-12-06Bibliographically approved
2. On the Parameterized Complexity of Linear Context-Free Rewriting Systems
Open this publication in new window or tab >>On the Parameterized Complexity of Linear Context-Free Rewriting Systems
2013 (English)In: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, 21-29 p.Conference paper, Published paper (Other academic)
Abstract [en]

We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2013
Keyword
parameterized complexity, linear context-free rewriting system
National Category
Language Technology (Computational Linguistics) Computer Science
Research subject
datorlingvistik; Computer Science
Identifiers
urn:nbn:se:umu:diva-71671 (URN)978-1-937284-65-7 (ISBN)
Conference
13th Meeting on the Mathematics of Language (MoL 13), Sofia, Bulgaria, August 9, 2013
Projects
Parameterized Natural Language Parsing
Funder
Swedish Research Council, 2011-6080
Available from: 2013-06-04 Created: 2013-06-04 Last updated: 2014-04-29Bibliographically approved
3. Cuts in Regular Expressions
Open this publication in new window or tab >>Cuts in Regular Expressions
Show others...
2013 (English)In: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, 70-81 p.Conference paper, Published paper (Refereed)
Abstract [en]

Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7907
Keyword
regular expression, finite automaton, cut operator
National Category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-67738 (URN)10.1007/978-3-642-38771-5_8 (DOI)978-3-642-38770-8 (ISBN)978-3-642-38771-5 (ISBN)
Conference
17th International Conference on Developments in Language Theory, DLT 2013, 18 June 2013 through 21 June 2013, Marne-la-Vallee
Available from: 2013-03-28 Created: 2013-03-28 Last updated: 2014-04-29Bibliographically approved
4. Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching
Open this publication in new window or tab >>Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching
2014 (English)Conference paper, Published paper (Refereed)
Abstract [en]

We consider in some detail how regular expression matching happens in Java, as a popular representative of the category of regex-directed matching engines. We extract a slightly idealized algorithm for this scenario. Next we define an automata model which captures all the aspects needed to perform matching, of the Java style, in a formal way. Finally, two types of static analysis, which take a regular expression and tells whether there exists a family of strings which make Java-style matching run in exponential time, are done.

Keyword
regular expressions, catastrophic backtracking, static analysis
National Category
Computer Science
Identifiers
urn:nbn:se:umu:diva-88232 (URN)
Conference
14th International Conference Automata and Formal Languages
Available from: 2014-04-28 Created: 2014-04-28 Last updated: 2014-04-29Bibliographically approved
5. Characterizing Non-Regularity
Open this publication in new window or tab >>Characterizing Non-Regularity
2014 (English)Report (Other academic)
Abstract [en]

This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2014. 9 p.
Series
Report / UMINF, ISSN 0348-0542 ; 14.12
Keyword
context-free languages, characterization, regular languages, pumping lemma, shuffle
National Category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-88231 (URN)
Available from: 2014-04-28 Created: 2014-04-28 Last updated: 2014-04-29Bibliographically approved
6. Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable
Open this publication in new window or tab >>Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable
2011 (English)In: Proceedings of the Prague Stringology Conference 2011 / [ed] Jan Holub and Jan Žďárek, Prague: Prague Stringology Club, Czech Technical University , 2011, 59-73 p.Conference paper, Published paper (Refereed)
Abstract [en]

The string correction problem looks at minimal ways to modify one stringinto another using fixed operations, such as for example inserting a symbol, deleting asymbol and interchanging the positions of two symbols (a “swap”). This has been generalizedto trees in various ways, but unfortunately having operations to insert/deletenodes in the tree and operations that move subtrees, such as a “swap” of adjacent subtrees,makes the correction problem for trees intractable. In this paper we investigatewhat happens when we have a tree edit distance problem with only swaps. We callthis problem tree swap distance, and go on to prove that this correction problem isNP-complete. This suggests that the swap operation is fundamentally problematic inthe tree case, and other subtree movement models should be studied.

Place, publisher, year, edition, pages
Prague: Prague Stringology Club, Czech Technical University, 2011
Keyword
formella språk, träd, omordning, komplexitet, NP.komplett
National Category
Computer Science
Research subject
Computer and Information Science
Identifiers
urn:nbn:se:umu:diva-54158 (URN)000325068600006 ()978-80-01-04870-2 (ISBN)
Conference
16th Prague Stringology Conference (PSC), August 29-31 2011, Prague
Available from: 2012-04-18 Created: 2012-04-17 Last updated: 2017-01-17Bibliographically approved

Open Access in DiVA

Complexities of Order-Related Formal Language Extensions(1694 kB)296 downloads
File information
File name FULLTEXT02.pdfFile size 1694 kBChecksum SHA-512
235eb8a197fa2e9acfae6b2d44c60b41fe3f1641942190a6aa91ab039c7e0de71c0a34462da4cd43c3d1b193b4f3a4198fb30dac4c0bb09ed3010e3bbefc6108
Type fulltextMimetype application/pdf

Authority records BETA

Berglund, Martin

Search in DiVA

By author/editor
Berglund, Martin
By organisation
Department of Computing Science
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 296 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 244 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf