Finding the N Best Vertices in an Infinite Weighted Hypergraph
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing). ORCID iD: 0000-0001-7349-7693
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing)
2017 (English). In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 682, p. 30-41. Article in journal (Refereed). Published
Abstract [en]

We propose an algorithm for computing the N best vertices in a weighted acyclic hypergraph over a nice semiring. A semiring is nice if it is finitely-generated, idempotent, and has 1 as its minimal element. We then apply the algorithm to the problem of computing the N best trees with respect to a weighted tree automaton, and complement theoretical correctness and complexity arguments with experimental data. The algorithm has several practical applications in natural language processing, for example, to derive the N most likely parse trees with respect to a probabilistic context-free grammar. 
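
To make the setting concrete, the following Python fragment is a hedged sketch, not the algorithm proposed in the paper. It assumes weights are drawn from the tropical (min, +) semiring over nonnegative reals, which is idempotent and has the semiring unit 0.0 as its minimal element, and it computes the N smallest derivation weights per vertex of an acyclic hypergraph bottom-up. The edge representation and the function name n_best_weights are hypothetical choices made only for the example.

import heapq
from itertools import product

# Hedged sketch (not the paper's algorithm): the N smallest derivation weights
# for every vertex of a weighted acyclic hypergraph over the tropical (min, +)
# semiring. Each hyperedge is a triple (head, tail_tuple, weight); this
# representation is a hypothetical choice made only for this example.

def n_best_weights(vertices, edges, n):
    """`vertices` must be topologically ordered: tail vertices before their heads."""
    in_edges = {v: [] for v in vertices}
    for head, tail, w in edges:
        in_edges[head].append((tail, w))

    best = {}  # vertex -> sorted list of at most n derivation weights
    for v in vertices:
        candidates = set()
        for tail, w in in_edges[v]:
            # pick one derivation weight per tail vertex; for a nullary edge the
            # product yields the empty combination, i.e. a leaf derivation
            for combo in product(*(best[u] for u in tail)):
                candidates.add(w + sum(combo))      # semiring multiplication: +
        best[v] = heapq.nsmallest(n, candidates)    # semiring addition: min (idempotent)
    return best

# Hypothetical toy hypergraph: the target vertex t has two derivations.
edges = [("x", (), 1.0), ("y", (), 2.0), ("t", ("x",), 0.5), ("t", ("y",), 0.1)]
print(n_best_weights(["x", "y", "t"], edges, 2)["t"])   # [1.5, 2.1]

Keeping only the N smallest weights per vertex suffices here essentially because (min, +) is monotone: a derivation of a head vertex can always be made no heavier by replacing a tail derivation with a better one, so the N best weights at the head are reachable from the N best weights at each tail.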

Place, publisher, year, edition, pages
Elsevier, 2017. Vol. 682, p. 30-41
Keywords [en]
Hypergraph, N-best problem, Idempotent semiring
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-132501
DOI: 10.1016/j.tcs.2017.03.010
ISI: 000405062100005
Scopus ID: 2-s2.0-85016174936
OAI: oai:DiVA.org:umu-132501
DiVA id: diva2:1081961
Note

Special Issue: SI

Available from: 2017-03-15. Created: 2017-03-15. Last updated: 2023-03-24. Bibliographically approved
In thesis
1. Towards semantic language processing
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Alternative title [sv]
Mot semantisk språkbearbetning
Abstract [en]

The overall goal of the field of natural language processing is to facilitate the communication between humans and computers, and to help humans with natural language problems such as translation. In this thesis, we focus on semantic language processing. Modelling semantics – the meaning of natural language – requires both a structure to hold the semantic information and a device that can enforce rules on the structure to ensure well-formed semantics while not being too computationally heavy. The devices used in natural language processing are preferably weighted to allow for comparison of the alternative semantic interpretations output by a device.

The structure employed here is the abstract meaning representation (AMR). We show that AMRs representing well-formed semantics can be generated while leaving out AMRs that are not semantically well-formed. For this purpose, we use a type of graph grammar called contextual hyperedge replacement grammar (CHRG). Moreover, we argue that a better-known subclass of CHRG – the hyperedge replacement grammar (HRG) – is not powerful enough for AMR generation. This is due to the limitation of HRGs in handling co-references, which in turn stems from the fact that HRGs only generate graphs of bounded treewidth.

Furthermore, we address the N best problem, which is as follows: given a weighted device, return the N best (here: lowest-weighted, or more intuitively, least erroneous) structures. Our goal is to solve the N best problem for devices capable of expressing sophisticated forms of semantic representations, such as CHRGs. Here, however, we merely take a first step by developing methods for solving the N best problem for weighted tree automata and some types of weighted acyclic hypergraphs.

Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2018. p. 16
Series
Report / UMINF, ISSN 0348-0542 ; 18.12
Keywords
Weighted tree automata, abstract meaning representation, contextual hyperedge replacement grammar, hyperedge replacement grammar, semantic modelling, the N best problem
National Category
Computer Sciences
Research subject
Computer Science; computational linguistics
Identifiers
urn:nbn:se:umu:diva-153738 (URN), 978-91-7601-964-1 (ISBN)
Presentation
2018-12-07, MC413, Umeå, 10:00 (English)
Available from: 2018-11-29. Created: 2018-11-28. Last updated: 2018-11-29. Bibliographically approved
2. Best Trees Extraction and Contextual Grammars for Language Processing
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Extrahering av optimala träd samt kontextuella grafgrammatiker för språkbearbetning
Abstract [en]

In natural language processing, the syntax of a sentence refers to the words used in the sentence, their grammatical role, and their order. Semantics concerns the concepts represented by the words in the sentence and their relations, i.e., the meaning of the sentence. While a human can easily analyse a sentence in a language they understand to figure out its grammatical construction and meaning, this is a difficult task for a computer. To analyse natural language, the computer needs a language model. First and foremost, the computer must have data structures that can represent syntax and semantics. Then, the computer requires some information about what is considered correct syntax and semantics – this can be provided in the form of human-annotated corpora of natural language. Computers use formal languages such as programming languages, and our goal is thus to model natural languages using formal languages. There are several ways to capture the correctness aspect of a natural language corpus in a formal language model. One strategy is to specify a formal language using a set of rules that are, in a sense, very similar to the grammatical rules of natural language. In this thesis, we only consider such rule-based formalisms.

Trees are commonly used to represent syntactic analyses of sentences, and graphs can represent the semantics of sentences. Examples of rule-based formalisms that define languages of trees and graphs are tree automata and graph grammars, respectively. When used in language processing, the rules of a formalism are normally given weights, which are then combined as specified by the formalism to assign weights to the trees or graphs in its language. The weights enable us to rank the trees and graphs by their similarity to the linguistic data in the human-annotated corpora. 
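
As a toy illustration of how rule weights are combined to weight a tree (a hypothetical example, not taken from the thesis), the sketch below defines a small weighted tree automaton over the tropical (min, +) semiring: each rule carries a weight, a bottom-up run adds up the weights of the rules it applies, and the weight of a tree is the minimum over all runs ending in a final state.

from itertools import product

# Hypothetical toy example (not from the thesis): a weighted tree automaton over
# the tropical (min, +) semiring. A rule maps a node symbol together with the
# states assigned to the node's children to a result state and a weight; a run
# adds up the weights of the rules it applies, and the weight of a tree is the
# minimum over all runs that end in a final state.

RULES = {
    ("a", ()):         [("q", 0.5)],
    ("b", ()):         [("q", 1.0)],
    ("f", ("q", "q")): [("q", 0.2)],
}
FINAL = {"q"}

def runs(tree):
    """Map each reachable state to the minimal weight of a run on `tree` ending there."""
    sym, children = tree
    child_runs = [runs(c) for c in children]
    result = {}
    for combo in product(*(list(r.items()) for r in child_runs)):
        states = tuple(state for state, _ in combo)
        below = sum(weight for _, weight in combo)
        for state, weight in RULES.get((sym, states), []):
            candidate = below + weight                 # semiring multiplication: +
            if state not in result or candidate < result[state]:
                result[state] = candidate              # semiring addition: min
    return result

def tree_weight(tree):
    """Minimal weight of an accepting run, or None if the tree is rejected."""
    return min((w for s, w in runs(tree).items() if s in FINAL), default=None)

# The tree f(a, b) is accepted with weight 0.5 + 1.0 + 0.2 = 1.7.
print(tree_weight(("f", (("a", ()), ("b", ())))))

Ranking trees then amounts to searching such a weighted tree language for the trees of smallest weight, which is the N-best problem addressed in the first part of the thesis.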

Since natural language is very complicated to model, there are many small gaps in natural language processing research to address. The research in this thesis considers two separate but related problems. First, we have the N-best problem, which is about finding a number N of top-ranked hypotheses given a ranked hypothesis space. In our case, the hypothesis space is represented by a weighted rule-based formalism, making the hypothesis space a weighted formal language. The hypotheses themselves can, for example, take the form of weighted syntax trees. The second problem is that of semantic modelling, whose aim is to find a formalism complex enough to define languages of semantic representations. This model can, however, not be too complex, since we still want to be able to compute solutions to language processing tasks efficiently.

This thesis is divided into two parts according to the two problems introduced above. The first part covers the N-best problem for weighted tree automata. In this line of research, we develop and evaluate multiple versions of an efficient algorithm that solves the problem in question. Since our algorithm is the first to do so, we theoretically and experimentally evaluate it in comparison to the state-of-the-art algorithm for solving an easier version of the problem. In the second part, we study how rule-based formalisms can be used to model graphs that represent meaning, i.e., semantic graphs. We investigate an existing formalism and through this work learn what properties of that formalism are necessary for semantic modelling. Finally, we use our new-found knowledge to develop a more specialised formalism, and argue that it is better suited for the task of semantic modelling than existing formalisms.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2021. p. 60
Series
Report / UMINF, ISSN 0348-0542 ; 21.04
Keywords
Weighted tree automata, the N-best problem, efficient algorithms, semantic graph, abstract meaning representation, contextual graph grammars, hyperedge replacement, graph extensions
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-182989 (URN), 978-91-7855-521-5 (ISBN), 978-91-7855-522-2 (ISBN)
Public defence
2021-06-11, MA316, MIT-huset, floor 3, Umeå, 10:00 (English)
Available from: 2021-05-21. Created: 2021-05-11. Last updated: 2021-05-17. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Björklund, Johanna; Drewes, Frank; Jonsson, Anna
