umu.sePublikationer
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Complexity and expressiveness for formal structures in Natural Language Processing
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. (Foundations of Language Processing)ORCID-id: 0000-0002-8722-5661
2017 (Engelska)Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

The formalized and algorithmic study of human language within the field of Natural Language Processing (NLP) has motivated much theoretical work in the related field of formal languages, in particular the subfields of grammar and automata theory. Motivated and informed by NLP, the papers in this thesis explore the connections between expressibility – that is, the ability for a formal system to define complex sets of objects – and algorithmic complexity – that is, the varying amount of effort required to analyse and utilise such systems. Our research studies formal systems working not just on strings, but on more complex structures such as trees and graphs, in particular syntax trees and semantic graphs. The field of mildly context-sensitive languages concerns attempts to find a useful class of formal languages between the context-free and context-sensitive. We study formalisms defining two candidates for this class; tree-adjoining languages and the languages defined by linear context-free rewriting systems. For the former, we specifically investigate the tree languages, and define a subclass and tree automaton with linear parsing complexity. For the latter, we use the framework of parameterized complexity theory to investigate more deeply the related parsing problems, as well as the connections between various formalisms defining the class. The field of semantic modelling aims towards formally and accurately modelling not only the syntax of natural language statements, but also the meaning. In particular, recent work in semantic graphs motivates our study of graph grammars and graph parsing. To the best of our knowledge, the formalism presented in Paper III of this thesis is the first graph grammar where the uniform parsing problem has polynomial parsing complexity, even for input graphs of unbounded node degree.

Ort, förlag, år, upplaga, sidor
Umeå: Umeå Universitet , 2017. , s. 18
Serie
Report / UMINF, ISSN 0348-0542 ; 17.13
Nyckelord [en]
graph grammars, formal languages, natural language processing, parameterized complexity, abstract meaning representation, tree automata, deterministic tree-walking transducers, mildly context-sensitive languages, hyperedge replacement, tree adjoining languages, minimally adequate teacher
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
URN: urn:nbn:se:umu:diva-135014ISBN: 9789176017227 (tryckt)OAI: oai:DiVA.org:umu-135014DiVA, id: diva2:1095810
Presentation
2017-05-19, MA121, MIT-huset 901 87 Umeå, Umeå, 10:15 (Engelska)
Handledare
Tillgänglig från: 2017-05-17 Skapad: 2017-05-16 Senast uppdaterad: 2018-06-09Bibliografiskt granskad
Delarbeten
1. A bottom-up automaton for tree adjoining languages
Öppna denna publikation i ny flik eller fönster >>A bottom-up automaton for tree adjoining languages
2015 (Engelska)Rapport (Övrigt vetenskapligt)
Abstract [en]

Current tree parsing algorithms for nonregular tree languages all have superlinear running times, possibly limiting their practical applicability. We present a bottom-up tree automaton that captures exactly the tree-adjoining languages in the non-deterministic case. The determinstic case captures a strict superset of the regular tree languages, while preserving running times linear in the size of the tree.

Ort, förlag, år, upplaga, sidor
Umeå: Umeå universitet, 2015. s. 7
Serie
Report / UMINF, ISSN 0348-0542 ; 15.14
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
urn:nbn:se:umu:diva-134999 (URN)
Tillgänglig från: 2017-05-15 Skapad: 2017-05-15 Senast uppdaterad: 2018-06-09Bibliografiskt granskad
2. A note on the complexity of deterministic tree-walking transducers
Öppna denna publikation i ny flik eller fönster >>A note on the complexity of deterministic tree-walking transducers
2013 (Engelska)Ingår i: Non-Classical Models of Automata and Applications (NCMA 2013), Österreichische Computer Gesellschaft , 2013Konferensbidrag, Publicerat paper (Refereegranskat)
Ort, förlag, år, upplaga, sidor
Österreichische Computer Gesellschaft, 2013
Nationell ämneskategori
Datavetenskap (datalogi)
Forskningsämne
datalogi
Identifikatorer
urn:nbn:se:umu:diva-79716 (URN)
Konferens
Non-Classical Models of Automata and Applications (NCMA 2013), August 13 - 14, 2013, Umeå, Sweden
Forskningsfinansiär
Vetenskapsrådet, 621-2011-6080
Tillgänglig från: 2013-08-29 Skapad: 2013-08-29 Senast uppdaterad: 2019-07-05Bibliografiskt granskad
3. Between a Rock and a Hard Place: Parsing for Hyperege Replacement DAG Grammars
Öppna denna publikation i ny flik eller fönster >>Between a Rock and a Hard Place: Parsing for Hyperege Replacement DAG Grammars
2016 (Engelska)Ingår i: Proc. 10th International Conference on Language and Automata Theory and Applications (LATA 2016) / [ed] A.H. Dediu, J. Janoušek, C. Martín-Vide, and B. Truthe, Springer Publishing Company, 2016, Vol. 9618, s. 521-532Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

Motivated by applications in natural language processing, we study the uniform membership problem for hyperedge-replacement grammars that generate directed acyclic graphs. Our major result is a low-degree polynomial-time algorithm that solves the uniform membership problem for a restricted type of such grammars. We motivate the necessity of the restrictions by two different NP-completeness results.

Ort, förlag, år, upplaga, sidor
Springer Publishing Company, 2016
Serie
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9618
Nyckelord
Graph grammar, Hyperedge replacement, Abstract meaning representation, DAG grammar, Uniform membership problem, Parsing
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
urn:nbn:se:umu:diva-111984 (URN)10.1007/978-3-319-30000-9_40 (DOI)000378745100045 ()978-3-319-30000-9 (ISBN)978-3-319-29999-0 (ISBN)
Konferens
10th International Conference on Language and Automata Theory and Applications (LATA 2016), Prague, Czech Republic, March 14-18, 2016
Tillgänglig från: 2015-11-29 Skapad: 2015-11-29 Senast uppdaterad: 2019-01-07Bibliografiskt granskad
4. On the Regularity and Learnability of Ordered DAG Languages
Öppna denna publikation i ny flik eller fönster >>On the Regularity and Learnability of Ordered DAG Languages
2017 (Engelska)Ingår i: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol and Cyril Nicaud, Cham, 2017, s. 27-39Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

Order-Preserving DAG Grammars (OPDGs) is a subclass of Hyper-Edge Replacement Grammars that can be parsed in polynomial time. Their associated class of languages is known as Ordered DAG Lan- guages, and the graphs they generate are characterised by being acyclic, rooted, and having a natural order on their nodes. OPDGs are useful in natural-language processing to model abstract meaning representa- tions. We state and prove a Myhill-Nerode theorem for ordered DAG languages, and translate it into a MAT-learning algorithm for the same class. The algorithm infers a minimal OPDG G for the target language in time polynomial in G and the samples provided by the MAT oracle. 

Ort, förlag, år, upplaga, sidor
Cham: , 2017
Serie
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743, E-ISSN 1611-3349 ; 10329
Nationell ämneskategori
Datavetenskap (datalogi)
Forskningsämne
datalogi
Identifikatorer
urn:nbn:se:umu:diva-135010 (URN)10.1007/978-3-319-60134-2_3 (DOI)2-s2.0-85022042128 (Scopus ID)978-3-319-60133-5 (ISBN)978-3-319-60134-2 (ISBN)
Konferens
22nd International Conference Implementation and Application of Automata, CIAA 2017, Université Paris-Est Marne-la-Vallée, France, 27-30 June 2017
Tillgänglig från: 2017-05-16 Skapad: 2017-05-16 Senast uppdaterad: 2019-07-05Bibliografiskt granskad
5. Investigating different graph representations of semantics
Öppna denna publikation i ny flik eller fönster >>Investigating different graph representations of semantics
2016 (Engelska)Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

Combinatory Categorial Grammar is a generic approach to the mechanical understanding of language, where movement is minimised in favour of using combinators such as B (composition) and T (type lifting) to clearly define in which ways various constituents can refer to each other. Taking the tree languages induced by the syntactic derivations and connecting the various leaves linked through the semantics, one ends up with a class of graph languages. The present work aims to point out promising avenues of research in order to investigate this class, specifically in terms of similarities with other graph-based semantic representations, such as Abstract Meaning Representations (AMR), and furthermore what graph generating or recognising formalism would be most suitable to define the class characteristics.

Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
urn:nbn:se:umu:diva-134998 (URN)
Konferens
Sixth Swedish Language Technology Conference (SLTC), Umeå University, 17-18 November, 2016
Tillgänglig från: 2017-05-15 Skapad: 2017-05-15 Senast uppdaterad: 2018-06-09Bibliografiskt granskad

Open Access i DiVA

fulltext(332 kB)166 nedladdningar
Filinformation
Filnamn FULLTEXT01.pdfFilstorlek 332 kBChecksumma SHA-512
9668d54b1b7d91cb29b8b97739cd1c1c0ade7d61b32d4944684fa94a44571f9ec8242f95697c2e17adf45bdf8e77b08a8b9e5437905e789172913bd534c3c3e7
Typ fulltextMimetyp application/pdf

Övriga länkar

URL

Personposter BETA

Ericson, Petter

Sök vidare i DiVA

Av författaren/redaktören
Ericson, Petter
Av organisationen
Institutionen för datavetenskap
Datavetenskap (datalogi)

Sök vidare utanför DiVA

GoogleGoogle Scholar
Totalt: 166 nedladdningar
Antalet nedladdningar är summan av nedladdningar för alla fulltexter. Det kan inkludera t.ex tidigare versioner som nu inte längre är tillgängliga.

isbn
urn-nbn

Altmetricpoäng

isbn
urn-nbn
Totalt: 602 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf