Polynomial Graph Parsing with Non-Structural Reentrancies
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0001-8503-0118
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0001-7349-7693
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-9873-4170
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Graph-based semantic representations are valuable in natural language processing, where it is often simple and effective to represent linguistic concepts as nodes, and relations as edges between them. Several attempts have been made to find a generative device that is sufficiently powerful to represent languages of semantic graphs, while at the same time allowing efficient parsing. We add to this line of work by introducing graph extension grammar, which consists of an algebra over graphs together with a regular tree grammar that generates expressions over the operations of the algebra. Due to the design of the operations, these grammars can generate graphs with non-structural reentrancies, a type of node sharing that is exceedingly common in formalisms such as abstract meaning representation, but for which existing devices offer little support. We provide a parsing algorithm for graph extension grammars, which we prove to be correct and to run in polynomial time.
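The general architecture described in the abstract, a regular tree grammar whose terms are evaluated in an algebra over graphs, can be illustrated with a toy sketch. Everything here is a hypothetical choice made for illustration: the graph encoding, the operations `const`, `attach` and `control`, and the grammar `RTG` are not the graph extension operations defined in the paper. The `control` operation only demonstrates how an operation that adds edges to an already existing node can produce node sharing.

```python
from collections import Counter
from itertools import product

# Hypothetical graph encoding: (labels, edges, root), where labels[i] is the
# label of node i, edges is a frozenset of (source, role, target) triples,
# and root is a node index.

def const(label):
    """Nullary operation: a one-node graph whose only node is the root."""
    return lambda: ((label,), frozenset(), 0)

def attach(label, role):
    """Unary operation: add a new root with a `role` edge to the old root."""
    def op(g):
        labels, edges, root = g
        new = len(labels)
        return labels + (label,), edges | {(new, role, root)}, new
    return op

def control(label):
    """Unary operation: add a new root with an ARG1 edge to the old root and an
    ARG0 edge to the node that the old root's ARG0 edge points to, so that
    node gains a second incoming edge (node sharing)."""
    def op(g):
        labels, edges, root = g
        new = len(labels)
        shared = next((t for s, r, t in edges if s == root and r == "ARG0"), None)
        if shared is None:
            raise ValueError("old root has no ARG0 edge")
        new_edges = {(new, "ARG1", root), (new, "ARG0", shared)}
        return labels + (label,), edges | new_edges, new
    return op

# Hypothetical algebra: operation symbol -> (arity, function on graphs).
ALGEBRA = {
    "boy": (0, const("boy")),
    "go": (1, attach("go", "ARG0")),
    "want": (1, control("want")),
}

# Hypothetical regular tree grammar: nonterminal -> [(symbol, child nonterminals)].
RTG = {
    "S": [("want", ["V"]), ("go", ["N"])],
    "V": [("go", ["N"])],
    "N": [("boy", [])],
}

def terms(nonterminal, depth):
    """Yield every term derivable from `nonterminal` with nesting depth <= `depth`."""
    if depth == 0:
        return
    for symbol, children in RTG[nonterminal]:
        for subterms in product(*(list(terms(c, depth - 1)) for c in children)):
            yield (symbol, *subterms)

def evaluate(term):
    """Evaluate a term bottom-up in the algebra, producing a graph."""
    symbol, *args = term
    arity, op = ALGEBRA[symbol]
    assert arity == len(args), f"{symbol} expects {arity} arguments"
    return op(*(evaluate(a) for a in args))

def reentrant(g):
    """Labels of nodes that have more than one incoming edge."""
    labels, edges, _ = g
    indegree = Counter(t for _, _, t in edges)
    return [labels[n] for n, d in indegree.items() if d > 1]
```

For example, evaluating the term `("want", ("go", ("boy",)))` yields a graph in which the `boy` node receives ARG0 edges from both `go` and `want`, while `("go", ("boy",))` yields a graph with no shared node. Keeping the grammar and the algebra separate means the grammar only decides which terms exist; all graph construction lives in the operations.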

National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-182986
OAI: oai:DiVA.org:umu-182986
DiVA, id: diva2:1553941
Available from: 2021-05-11 Created: 2021-05-11 Last updated: 2021-05-11
In thesis
1. Best Trees Extraction and Contextual Grammars for Language Processing
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Extrahering av optimala träd samt kontextuella grafgrammatiker för språkbearbetning (Extraction of optimal trees and contextual graph grammars for language processing)
Abstract [en]

In natural language processing, the syntax of a sentence refers to the words used in the sentence, their grammatical role, and their order. Semantics concerns the concepts represented by the words in the sentence and their relations, i.e., the meaning of the sentence. While a human can easily analyse a sentence in a language they understand to figure out its grammatical construction and meaning, this is a difficult task for a computer. To analyse natural language, the computer needs a language model. First and foremost, the computer must have data structures that can represent syntax and semantics. Then, the computer requires some information about what is considered correct syntax and semantics – this can be provided in the form of human-annotated corpora of natural language. Computers use formal languages such as programming languages, and our goal is thus to model natural languages using formal languages. There are several ways to capture the correctness aspect of a natural language corpus in a formal language model. One strategy is to specify a formal language using a set of rules that are, in a sense, very similar to the grammatical rules of natural language. In this thesis, we only consider such rule-based formalisms.

Trees are commonly used to represent syntactic analyses of sentences, and graphs can represent the semantics of sentences. Examples of rule-based formalisms that define languages of trees and graphs are tree automata and graph grammars, respectively. When used in language processing, the rules of a formalism are normally given weights, which are then combined as specified by the formalism to assign weights to the trees or graphs in its language. The weights enable us to rank the trees and graphs by their similarity to the linguistic data in the human-annotated corpora. 
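How rule weights combine into a tree weight can be sketched with a toy bottom-up weighted tree automaton over the real numbers. Here a tree's weight is the sum, over all accepting runs, of the product of the transition weights and the final-state weight. The automaton, its states and its weights are invented for illustration and are not taken from the thesis; the thesis's formalisms may use other semirings.

```python
from collections import defaultdict
from itertools import product

# Hypothetical bottom-up weighted tree automaton over (+, *) on the reals.
# (symbol, tuple of child states) -> [(target state, transition weight)]
TRANSITIONS = {
    ("John", ()): [("NP", 0.6)],
    ("Mary", ()): [("NP", 0.4)],
    ("saw", ()): [("V", 1.0)],
    ("VP", ("V", "NP")): [("VP", 1.0)],
    ("S", ("NP", "VP")): [("S", 0.9)],
}
FINAL = {"S": 1.0}  # final-state weights

def state_weights(tree):
    """Map each state q to the summed weight of all runs on `tree` that end in q."""
    symbol, *children = tree
    tables = [state_weights(c) for c in children]
    result = defaultdict(float)
    # Try every combination of states reachable at the children.
    for combo in product(*(t.items() for t in tables)):
        states = tuple(q for q, _ in combo)
        inner = 1.0
        for _, w in combo:
            inner *= w
        for target, w in TRANSITIONS.get((symbol, states), []):
            result[target] += w * inner
    return result

def tree_weight(tree):
    """Weight of `tree`: sum over accepting runs of the product of weights."""
    return sum(w * FINAL.get(q, 0.0) for q, w in state_weights(tree).items())
```

With these toy values, `("S", ("John",), ("VP", ("saw",), ("Mary",)))` gets weight 0.9 · 0.6 · 1.0 · 1.0 · 0.4, while a tree the automaton cannot accept gets weight 0. Ranking trees then amounts to comparing these numbers.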

Since natural language is very complicated to model, many small gaps remain to be addressed in natural language processing research. This thesis considers two separate but related problems. The first is the N-best problem, which is to find the N top-ranked hypotheses in a ranked hypothesis space. In our case, the hypothesis space is represented by a weighted rule-based formalism, which makes it a weighted formal language. The hypotheses themselves can, for example, take the form of weighted syntax trees. The second problem is that of semantic modelling, whose aim is to find a formalism expressive enough to define languages of semantic representations. However, this formalism cannot be too complex, since we still want to be able to compute solutions to language processing tasks efficiently.
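The N-best problem itself can be stated as a brute-force procedure on a toy weighted regular tree grammar: enumerate every tree together with its weight and keep the N heaviest. The grammar, symbols and weights below are invented. The sketch only terminates because the toy grammar is non-recursive, and it is exponential in general. It also relies on the toy grammar being unambiguous (each tree has exactly one derivation, so derivation weight equals tree weight). It is not the efficient algorithm developed in the thesis.

```python
import heapq
from itertools import product

# Hypothetical weighted regular tree grammar:
# nonterminal -> [(rule weight, output symbol, child nonterminals)]
WEIGHTED_RTG = {
    "S": [(0.9, "S", ["NP", "VP"]), (0.1, "S", ["NP", "V"])],
    "VP": [(1.0, "VP", ["V", "NP"])],
    "NP": [(0.5, "John", []), (0.3, "Mary", []), (0.2, "dogs", [])],
    "V": [(0.7, "saw", []), (0.3, "slept", [])],
}

def derivations(nonterminal):
    """Yield (tree, weight) for every tree derivable from `nonterminal`.
    The weight is the product of the weights of the rules used.
    Assumes the grammar is non-recursive, so the language is finite."""
    for weight, symbol, children in WEIGHTED_RTG[nonterminal]:
        for subs in product(*(list(derivations(c)) for c in children)):
            w = weight
            for _, sub_weight in subs:
                w *= sub_weight
            yield (symbol, *(t for t, _ in subs)), w

def n_best(nonterminal, n):
    """Return the n highest-weighted (tree, weight) pairs, best first."""
    return heapq.nlargest(n, derivations(nonterminal), key=lambda tw: tw[1])
```

For instance, `n_best("S", 1)` returns the tree `("S", ("John",), ("VP", ("saw",), ("John",)))`, which combines the highest-weighted choice at every rule. An efficient N-best algorithm avoids materialising the whole language, which is infinite as soon as the grammar is recursive.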

This thesis is divided into two parts according to the two problems introduced above. The first part covers the N-best problem for weighted tree automata. In this line of research, we develop and evaluate multiple versions of an efficient algorithm that solves this problem. Since our algorithm is the first to do so, we evaluate it theoretically and experimentally against the state-of-the-art algorithm for an easier version of the problem. In the second part, we study how rule-based formalisms can be used to model graphs that represent meaning, i.e., semantic graphs. We investigate an existing formalism and, through this work, learn which properties of that formalism are necessary for semantic modelling. Finally, we use this newfound knowledge to develop a more specialised formalism, and argue that it is better suited for the task of semantic modelling than existing formalisms.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2021. p. 60
Series
Report / UMINF, ISSN 0348-0542 ; 21.04
Keywords
Weighted tree automata, the N-best problem, efficient algorithms, semantic graph, abstract meaning representation, contextual graph grammars, hyperedge replacement, graph extensions
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-182989 (URN)
978-91-7855-521-5 (ISBN)
978-91-7855-522-2 (ISBN)
Public defence
2021-06-11, MA316, MIT-huset, plan 3, Umeå, 10:00 (English)
Available from: 2021-05-21 Created: 2021-05-11 Last updated: 2021-05-17 Bibliographically approved


Authority records

Björklund, Johanna; Drewes, Frank; Jonsson, Anna
