Towards semantic language processing
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
2018 (English). Licentiate thesis, comprehensive summary (Other academic).
Alternative title
Mot semantisk språkbearbetning (Swedish)
Abstract [en]

The overall goal of the field of natural language processing is to facilitate communication between humans and computers, and to help humans with natural language problems such as translation. In this thesis, we focus on semantic language processing. Modelling semantics – the meaning of natural language – requires both a structure to hold the semantic information and a device that can enforce rules on the structure to ensure well-formed semantics without being too computationally heavy. The devices used in natural language processing are preferably weighted to allow the alternative semantic interpretations output by a device to be compared.

The structure employed here is the abstract meaning representation (AMR). We show that AMRs representing well-formed semantics can be generated while leaving out AMRs that are not semantically well-formed. For this purpose, we use a type of graph grammar called contextual hyperedge replacement grammar (CHRG). Moreover, we argue that a better-known subclass of CHRG – the hyperedge replacement grammar (HRG) – is not powerful enough for AMR generation. This is due to HRG's limited ability to handle co-references, which in turn stems from the fact that HRGs only generate graphs of bounded treewidth.
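
As an illustration of what an AMR with a co-reference looks like (a standard example from the AMR literature, not taken from the thesis), the sentence "The boy wants to go" is commonly written in PENMAN notation as follows, with the variable b re-used to express that the wanter and the goer are the same entity:

  (w / want-01
     :ARG0 (b / boy)
     :ARG1 (g / go-01
              :ARG0 b))

Viewed as a graph, the re-used variable b becomes a node with two incoming edges (a re-entrancy); it is precisely this kind of co-reference structure that the argument above concerns.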

Furthermore, we address the N best problem, which is as follows: Given a weighted device, return the N best (here: smallest-weighted, or more intuitively, smallest-errored) structures. Our goal is to solve the N best problem for devices capable of expressing sophisticated forms of semantic representations such as CHRGs. Here, however, we merely take a first step, developing methods for solving the N best problem for weighted tree automata and some types of weighted acyclic hypergraphs.
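
As a rough illustration of the N best problem in its simplest form (a hypothetical sketch, not one of the algorithms developed in the thesis, and over a plain weighted acyclic graph rather than a tree automaton or hypergraph), the following Python snippet enumerates the N smallest-weighted paths from a source to a target vertex; all names and data are invented.

import heapq

# Toy acyclic graph with non-negative edge weights (invented example data).
edges = {
    "s": [("a", 1.0), ("b", 2.0)],
    "a": [("t", 3.0), ("b", 0.5)],
    "b": [("t", 1.0)],
    "t": [],
}

def n_best_paths(graph, source, target, n):
    """Return the n smallest-weighted source-to-target paths as (weight, path).

    Best-first search: partial paths wait in a priority queue ordered by
    accumulated weight.  Since weights are non-negative, extending a path
    never makes it cheaper, so the first n complete paths popped are the
    n best overall.  (Worst-case exponential, but simple and correct.)
    """
    queue = [(0.0, [source])]
    best = []
    while queue and len(best) < n:
        weight, path = heapq.heappop(queue)
        if path[-1] == target:
            best.append((weight, path))
            continue
        for successor, w in graph[path[-1]]:
            heapq.heappush(queue, (weight + w, path + [successor]))
    return best

print(n_best_paths(edges, "s", "t", 3))
# [(2.5, ['s', 'a', 'b', 't']), (3.0, ['s', 'b', 't']), (4.0, ['s', 'a', 't'])]

Because extending a path can never decrease its weight, the first N complete paths popped from the priority queue are the N best overall; the Best Trees algorithm compared in Paper 3 below likewise uses a priority queue to structure its search space, but over trees accepted by a weighted tree automaton.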

Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2018, p. 16
Series
Report / UMINF, ISSN 0348-0542 ; 18.12
Keywords [en]
Weighted tree automata, abstract meaning representation, contextual hyperedge replacement grammar, hyperedge replacement grammar, semantic modelling, the N best problem
National Category
Computer Sciences
Research subject
Computer Science; Computational Linguistics
Identifiers
URN: urn:nbn:se:umu:diva-153738
ISBN: 978-91-7601-964-1 (print)
OAI: oai:DiVA.org:umu-153738
DiVA, id: diva2:1266417
Presentation
2018-12-07, MC413, Umeå, 10:00 (English)
Available from: 2018-11-29. Created: 2018-11-28. Last updated: 2018-11-29. Bibliographically approved.
List of papers
1. Contextual Hyperedge Replacement Grammars for Abstract Meaning Representations
2017 (English). In: Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms / [ed] M. Kuhlmann, T. Scheffler, Association for Computational Linguistics, 2017, p. 102-111. Conference paper, Published paper (Refereed).
Abstract [en]

We show how contextual hyperedge replacement grammars can be used to generate abstract meaning representations (AMRs), and argue that they are more suitable for this purpose than hyperedge replacement grammars. Contextual hyperedge replacement turns out to have two advantages over plain hyperedge replacement: it can completely cover the language of all AMRs over a given domain of concepts, and at the same time its grammars become both smaller and simpler.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2017
Keywords
Abstract Meaning Representation, DAG Language, Contextual Hyperedge-Replacement
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-137921 (URN)
Conference
13th International Workshop on Tree-Adjoining Grammar and Related Formalisms (TAG+13), Umeå, Sweden, September 4-6, 2017
Available from: 2017-07-31. Created: 2017-07-31. Last updated: 2018-11-29. Bibliographically approved.
2. Finding the N Best Vertices in an Infinite Weighted Hypergraph
2017 (English). In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 682, p. 30-41. Article in journal (Refereed), Published.
Abstract [en]

We propose an algorithm for computing the N best vertices in a weighted acyclic hypergraph over a nice semiring. A semiring is nice if it is finitely-generated, idempotent, and has 1 as its minimal element. We then apply the algorithm to the problem of computing the N best trees with respect to a weighted tree automaton, and complement theoretical correctness and complexity arguments with experimental data. The algorithm has several practical applications in natural language processing, for example, to derive the N most likely parse trees with respect to a probabilistic context-free grammar. 
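
To make the setting concrete: one natural candidate for a nice semiring is the min-plus (tropical) semiring over the non-negative integers, whose addition is min (hence idempotent), whose multiplication is ordinary +, whose multiplicative identity 0 is the minimal element, and which is generated by the single element 1. The hypothetical Python sketch below computes, in that semiring, the best (smallest) derivation weight of each vertex of a small acyclic hypergraph; it only illustrates the objects involved, not the N-best algorithm of the paper, and all data and names are invented.

# Toy acyclic hypergraph; each hyperedge is (head, tail_vertices, weight).
# All data below is invented.  Nullary hyperedges (empty tail) give base
# weights to "leaf" vertices.
hyperedges = [
    ("x", [], 2),
    ("y", [], 1),
    ("z", ["x", "y"], 3),   # z derivable from x and y at extra cost 3
    ("z", ["y"], 6),        # an alternative unary derivation of z
]

def best_vertex_weights(hyperedges):
    """Smallest derivation weight of every vertex in the min-plus semiring.

    Repeatedly relaxes hyperedges until nothing improves; the iteration
    reaches a fixpoint because the hypergraph is acyclic and the weights
    are non-negative.
    """
    inf = float("inf")
    best = {}
    changed = True
    while changed:
        changed = False
        for head, tail, weight in hyperedges:
            candidate = weight + sum(best.get(v, inf) for v in tail)
            if candidate < best.get(head, inf):
                best[head] = candidate
                changed = True
    return best

print(best_vertex_weights(hyperedges))
# {'x': 2, 'y': 1, 'z': 6}   (z via the binary hyperedge: 3 + 2 + 1 = 6)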

Place, publisher, year, edition, pages
Elsevier, 2017. p. 78
Keywords
Hypergraph, N-best problem, Idempotent semiring
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-132501 (URN)
10.1016/j.tcs.2017.03.010 (DOI)
000405062100005
Note

Special Issue: SI

Available from: 2017-03-15. Created: 2017-03-15. Last updated: 2018-11-29. Bibliographically approved.
3. A Comparison of Two N-Best Extraction Methods for Weighted Tree Automata
2018 (English). In: Implementation and Application of Automata: 23rd International Conference, CIAA 2018, Charlottetown, PE, Canada, July 30 – August 2, 2018, Proceedings, Springer, 2018, p. 197-208. Conference paper, Published paper (Refereed).
Abstract [en]

We conduct a comparative study of two state-of-the-art algorithms for extracting the N best trees from a weighted tree automaton (wta). The algorithms are Best Trees, which uses a priority queue to structure the search space, and Filtered Runs, which is based on an algorithm by Huang and Chiang that extracts N best runs, implemented as part of the Tiburon wta toolkit. The experiments are run on four data sets, each consisting of a sequence of wtas of increasing sizes. Our conclusion is that Best Trees can be recommended when the input wtas exhibit a high or unpredictable degree of nondeterminism, whereas Filtered Runs is the better option when the input wtas are large but essentially deterministic.
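
To make the compared objects concrete, the hypothetical Python sketch below encodes a tiny weighted tree automaton over the min-plus semiring (all states, symbols and weights are invented) and computes the smallest weight it assigns to a given tree by tabulating, bottom-up, the cheapest run reaching each state. Roughly speaking, extracting the N best trees, as Best Trees and Filtered Runs both aim to do, means enumerating the N trees for which this quantity is smallest.

from itertools import product

# A tiny weighted tree automaton over the min-plus semiring; the states,
# symbols and weights below are invented for illustration.
# Transitions: (symbol, tuple_of_child_states) -> list of (state, weight).
transitions = {
    ("a", ()): [("q0", 1.0), ("q1", 2.0)],
    ("f", ("q0", "q0")): [("q1", 0.5)],
    ("f", ("q0", "q1")): [("q1", 1.5)],
}
final_weights = {"q1": 0.0}   # only q1 is accepting

def run_table(tree):
    """Map each state to the weight of the cheapest run of `tree` ending in it.

    A tree is a pair (symbol, list_of_subtrees); leaves have an empty list.
    """
    symbol, children = tree
    child_tables = [run_table(child) for child in children]
    table = {}
    for combo in product(*(t.items() for t in child_tables)):
        child_states = tuple(state for state, _ in combo)
        below = sum(weight for _, weight in combo)
        for state, weight in transitions.get((symbol, child_states), []):
            table[state] = min(table.get(state, float("inf")), below + weight)
    return table

def tree_weight(tree):
    """Smallest weight the automaton assigns to `tree` (inf if rejected)."""
    table = run_table(tree)
    return min(
        (weight + final_weights[state]
         for state, weight in table.items() if state in final_weights),
        default=float("inf"),
    )

print(tree_weight(("f", [("a", []), ("a", [])])))   # 2.5: two leaves in q0 + 0.5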

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science
Keywords
N-best list, tree automaton
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-149994 (URN)
978-3-319-94812-6 (ISBN)
978-3-319-94811-9 (ISBN)
Conference
23rd International Conference on Implementation and Applications of Automata (CIAA 2018), Charlottetown, Canada, July 30-August 2, 2018
Available from: 2018-06-30. Created: 2018-06-30. Last updated: 2018-11-29. Bibliographically approved.

Open Access in DiVA

fulltext (651 kB), application/pdf

Authority records

Jonsson, Anna
