umu.sePublications
Change search
Link to record
Permanent link

Direct link
BETA
Björklund, Johanna
Alternative names
Publications (10 of 40) Show all publications
Björklund, J., Drewes, F. & Zechner, N. (2019). Efficient enumeration of weighted tree languages over the tropical semiring. Journal of computer and system sciences (Print), 104, 119-130
Open this publication in new window or tab >>Efficient enumeration of weighted tree languages over the tropical semiring
2019 (English)In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 104, p. 78p. 119-130Article in journal (Refereed) Published
Abstract [en]

We generalise a search algorithm by Mohri and Riley from strings to trees. The original algorithm takes as input a nondeterministic weighted automaton M over the tropical semiring and an integer N, and outputs N strings of minimal weight with respect to M. In our setting, M is a weighted tree automaton, again over the tropical semiring, and the output is a set of N trees with minimal weight in this language. We prove that the algorithm is correct, and that its time complexity is a low polynomial in N and the relevant size parameters of M. 

Publisher
p. 78
Keywords
weighted tree automaton, N-best analysis, tropical semiring
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-132963 (URN)10.1016/j.jcss.2017.03.006 (DOI)000472246300008 ()
Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2019-07-12Bibliographically approved
Björklund, J., Drewes, F. & Satta, G. (2019). Z-Automata for Compact and Direct Representation of Unranked Tree Languages. In: Michal Hospodár, Galina Jirásková (Ed.), Implementation and Application of Automata: 24th International Conference, CIAA 2019 Košice, Slovakia, July 22–25, 2019 Proceedings. Paper presented at Implementation and Application of Automata - 24th International Conference (CIAA 2019), Slovakia, July 22-25, 2019 (pp. 83-94). Springer
Open this publication in new window or tab >>Z-Automata for Compact and Direct Representation of Unranked Tree Languages
2019 (English)In: Implementation and Application of Automata: 24th International Conference, CIAA 2019 Košice, Slovakia, July 22–25, 2019 Proceedings / [ed] Michal Hospodár, Galina Jirásková, Springer, 2019, p. 83-94Conference paper, Published paper (Refereed)
Abstract [en]

Unranked tree languages are valuable in natural language processing for modelling dependency trees. We introduce a new type of automaton for unranked tree languages, called Z-automaton, that is tailored for this particular application. The Z-automaton offers a compact form of representation, and unlike the closely related notion of stepwise automata, does not require a binary encoding of its input. We establish an arc-factored normal form, and prove the membership problem of Z-automata in normal form to be in O(mn), where m is the size of the transition table of the Z-automaton and n is the size of the input tree.

Place, publisher, year, edition, pages
Springer, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 11601
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-158624 (URN)10.1007/978-3-030-23679-3_7 (DOI)978-3-030-23678-6 (ISBN)978-3-030-23679-3 (ISBN)
Conference
Implementation and Application of Automata - 24th International Conference (CIAA 2019), Slovakia, July 22-25, 2019
Available from: 2019-05-03 Created: 2019-05-03 Last updated: 2019-07-16
Björklund, J., Drewes, F. & Jonsson, A. (2018). A Comparison of Two N-Best Extraction Methods for Weighted Tree Automata. In: Implementation and Application of Automata: 23rd International Conference, CIAA 2018, Charlottetown, PE, Canada, July 30 – August 2, 2018, Proceedings. Paper presented at 23rd International Conference on Implementation and Applications of Automata (CIAA 2018), Charlottetown, Canada, July 30-August 2, 2018 (pp. 197-108). Springer
Open this publication in new window or tab >>A Comparison of Two N-Best Extraction Methods for Weighted Tree Automata
2018 (English)In: Implementation and Application of Automata: 23rd International Conference, CIAA 2018, Charlottetown, PE, Canada, July 30 – August 2, 2018, Proceedings, Springer, 2018, p. 197-108Conference paper, Published paper (Refereed)
Abstract [en]

We conduct a comparative study of two state-of-the-art al- gorithms for extracting the N best trees from a weighted tree automaton (wta). The algorithms are Best Trees, which uses a priority queue to structure the search space, and Filtered Runs, which is based on an algorithm by Huang and Chiang that extracts N best runs, implemented as part of the Tiburon wta toolkit. The experiments are run on four data sets, each consisting of a sequence of wtas of increasing sizes. Our conclusion is that Best Trees can be recommended when the input wtas exhibit a high or unpredictable degree of nondeterminism, whereas Filtered Runs is the better option when the input wtas are large but essentially deterministic.

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science
Keywords
N-best list, tree automaton
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-149994 (URN)10.1007/978-3-319-94812-6_9 (DOI)000469285600009 ()978-3-319-94812-6 (ISBN)978-3-319-94811-9 (ISBN)
Conference
23rd International Conference on Implementation and Applications of Automata (CIAA 2018), Charlottetown, Canada, July 30-August 2, 2018
Available from: 2018-06-30 Created: 2018-06-30 Last updated: 2019-06-20Bibliographically approved
Björklund, H., Björklund, J. & Ericson, P. (2018). Minimisation and Characterisation of Order-Preserving DAG Grammars. Umeå: Umeå University
Open this publication in new window or tab >>Minimisation and Characterisation of Order-Preserving DAG Grammars
2018 (English)Report (Other academic)
Abstract [en]

Order-preserving DAG grammars (OPDGs) is a formalism for processing semantic infor- mation in natural languages [5, 4]. OPDGs are sufficiently expressive to model abstract meaning representations, a graph-based form of semantic representation in which nodes en- code objects and edges relations. At the same time, they allow for efficient parsing in the uniform setting, where both the grammar and subject graph are taken as part of the input.

In this article, we introduce an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars. This makes it possible to transfer a number of results from that domain to OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable in the so-called MAT setting. To conclude, we show that the languages generated by OPDGs are MSO-definable.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2018. p. 19
Series
Report / UMINF, ISSN 0348-0542 ; 18.15
Keywords
graph grammars, minimally adequate teacher, order preservation, weighted graph grammars, weighted learning
National Category
Computer Sciences
Research subject
business data processing
Identifiers
urn:nbn:se:umu:diva-153394 (URN)
Available from: 2018-11-19 Created: 2018-11-19 Last updated: 2019-07-05Bibliographically approved
Björklund, J., Cleophas, L. & Karlsson, M. (2017). An evaluation of structured language modeling for automatic speech recognition. Journal of universal computer science (Online), 23(11), 1019-1034
Open this publication in new window or tab >>An evaluation of structured language modeling for automatic speech recognition
2017 (English)In: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 23, no 11, p. 1019-1034Article in journal (Refereed) Published
Abstract [en]

We evaluated probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant for automatic speech recognition. The baseline is a family of n-gram models tuned with Witten-Bell smoothing. The language models are trained on unannotated corpora, consisting of 10,000 to 50,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,200 variations were generated. Each variant sentence was obtained by replacing an arbitrary word by a similar word, chosen to be at most 2 character edits from the original. The evaluation task consisted of identifying the original sentence among the automatically constructed (and typically inferior) alternatives. In the experiments, the n-gram models outperformed the PLTIG model on the smaller data set, but as the size of data grew, the PLTIG model gave comparable results. While PLTIGs are more demanding to train, they have the advantage that they assign a parse structure to their input sentences. This is valuable for continued algorithmic processing, for example, for summarization or sentiment analysis.

Place, publisher, year, edition, pages
Graz: Graz university of technology, Institute for information systems computer media IICM, 2017
Keywords
language modeling, automatic speech recognition, probabilistic lexicalized tree-insertion grammars
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-147230 (URN)10.3217/jucs-023-11-1019 (DOI)000429070900002 ()
Available from: 2018-05-02 Created: 2018-05-02 Last updated: 2019-04-29Bibliographically approved
Bensch, S., Björklund, J. & Kutrib, M. (2017). Deterministic Stack Transducers. Paper presented at 21st International Conference on Implementation and Application of Automata (CIAA), JUL 19-22, 2016, Yonsei Univ, Seoul, SOUTH KOREA. International Journal of Foundations of Computer Science, 28(5), 583-601
Open this publication in new window or tab >>Deterministic Stack Transducers
2017 (English)In: International Journal of Foundations of Computer Science, ISSN 0129-0541, Vol. 28, no 5, p. 583-601Article in journal (Refereed) Published
Abstract [en]

We introduce and investigate stack transducers, which are one-way stack automata with an output tape. A one-way stack automaton is a classical pushdown automaton with the additional ability to move the stack head inside the stack without altering the contents. For stack transducers, we distinguish between a digging and a non-digging mode. In digging mode, the stack transducer can write on the output tape when its stack head is inside the stack, whereas in non-digging mode, the stack transducer is only allowed to emit symbols when its stack head is at the top of the stack. These stack transducers have a motivation from natural-language interface applications, as they capture long-distance dependencies in syntactic, semantic, and discourse structures. We study the computational capacity for deterministic digging and non-digging stack transducers, as well as for their non-erasing and checking versions. We finally show that even for the strongest variant of stack transducers the stack languages are regular.

Place, publisher, year, edition, pages
World Scientific Publishing Co. Pte. Ltd., 2017
Keywords
Stack transducers, automata theory, computational capacity
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-143669 (URN)10.1142/S0129054117400081 (DOI)000418091500010 ()
Conference
21st International Conference on Implementation and Application of Automata (CIAA), JUL 19-22, 2016, Yonsei Univ, Seoul, SOUTH KOREA
Available from: 2018-01-05 Created: 2018-01-05 Last updated: 2018-06-09Bibliographically approved
Björklund, J., Drewes, F. & Jonsson, A. (2017). Finding the N Best Vertices in an Infinite Weighted Hypergraph. Theoretical Computer Science, 682, 30-41
Open this publication in new window or tab >>Finding the N Best Vertices in an Infinite Weighted Hypergraph
2017 (English)In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 682, p. 78p. 30-41Article in journal (Refereed) Published
Abstract [en]

We propose an algorithm for computing the N best vertices in a weighted acyclic hypergraph over a nice semiring. A semiring is nice if it is finitely-generated, idempotent, and has 1 as its minimal element. We then apply the algorithm to the problem of computing the N best trees with respect to a weighted tree automaton, and complement theoretical correctness and complexity arguments with experimental data. The algorithm has several practical applications in natural language processing, for example, to derive the N most likely parse trees with respect to a probabilistic context-free grammar. 

Place, publisher, year, edition, pages
Elsevier, 2017. p. 78
Keywords
Hypergraph, N-best problem, Idempotent semiring
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-132501 (URN)10.1016/j.tcs.2017.03.010 (DOI)000405062100005 ()
Note

Special Issue: SI

Available from: 2017-03-15 Created: 2017-03-15 Last updated: 2018-11-29Bibliographically approved
Björklund, J. & Cleophas, L. (2017). Minimization of Finite State Automata Through Partition Aggregation. In: Drewes, F Martin-Vide, C Truthe, B (Ed.), Language and Automata Theory and Applications: 11th International Conference, LATA 2017, Umeå, Sweden, March 6-9, 2017, Proceedings. Paper presented at 11th International Conference on Language and Automata Theory and Applications (LATA), MAR 06-09, 2017, Umeå, SWEDEN (pp. 223-235). Springer International Publishing AG
Open this publication in new window or tab >>Minimization of Finite State Automata Through Partition Aggregation
2017 (English)In: Language and Automata Theory and Applications: 11th International Conference, LATA 2017, Umeå, Sweden, March 6-9, 2017, Proceedings / [ed] Drewes, F Martin-Vide, C Truthe, B, Springer International Publishing AG , 2017, p. 223-235Conference paper, Published paper (Refereed)
Abstract [en]

We present a minimization algorithm for finite state automata that finds and merges bisimulation-equivalent states, identified through partition aggregation. We show the algorithm to be correct and run in time O(n(2)d(2) vertical bar Sigma vertical bar), where n is the number of states of the input automaton M, d is the maximal outdegree in the transition graph for any combination of state and input symbol, and vertical bar Sigma vertical bar is the size of the input alphabet. The algorithm is slower than those based on partition refinement, but has the advantage that intermediate solutions are also language equivalent to M. As a result, the algorithm can be interrupted or put on hold as needed, and the derived automaton is still useful. Furthermore, the algorithm essentially searches for the maximal model of a characteristic formula for M, so many of the optimisation techniques used to gain efficiency in SAT solvers are likely to apply.

Place, publisher, year, edition, pages
Springer International Publishing AG, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10168
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-144128 (URN)10.1007/978-3-319-53733-7_16 (DOI)000418579700016 ()978-3-319-53732-0 (ISBN)978-3-319-53733-7 (ISBN)
Conference
11th International Conference on Language and Automata Theory and Applications (LATA), MAR 06-09, 2017, Umeå, SWEDEN
Available from: 2018-01-22 Created: 2018-01-22 Last updated: 2018-06-09Bibliographically approved
Björklund, H., Björklund, J. & Ericson, P. (2017). On the Regularity and Learnability of Ordered DAG Languages. In: Arnaud Carayol and Cyril Nicaud (Ed.), Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings. Paper presented at 22nd International Conference Implementation and Application of Automata, CIAA 2017, Université Paris-Est Marne-la-Vallée, France, 27-30 June 2017 (pp. 27-39). Cham
Open this publication in new window or tab >>On the Regularity and Learnability of Ordered DAG Languages
2017 (English)In: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol and Cyril Nicaud, Cham, 2017, p. 27-39Conference paper, Published paper (Refereed)
Abstract [en]

Order-Preserving DAG Grammars (OPDGs) is a subclass of Hyper-Edge Replacement Grammars that can be parsed in polynomial time. Their associated class of languages is known as Ordered DAG Lan- guages, and the graphs they generate are characterised by being acyclic, rooted, and having a natural order on their nodes. OPDGs are useful in natural-language processing to model abstract meaning representa- tions. We state and prove a Myhill-Nerode theorem for ordered DAG languages, and translate it into a MAT-learning algorithm for the same class. The algorithm infers a minimal OPDG G for the target language in time polynomial in G and the samples provided by the MAT oracle. 

Place, publisher, year, edition, pages
Cham: , 2017
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743, E-ISSN 1611-3349 ; 10329
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-135010 (URN)10.1007/978-3-319-60134-2_3 (DOI)2-s2.0-85022042128 (Scopus ID)978-3-319-60133-5 (ISBN)978-3-319-60134-2 (ISBN)
Conference
22nd International Conference Implementation and Application of Automata, CIAA 2017, Université Paris-Est Marne-la-Vallée, France, 27-30 June 2017
Available from: 2017-05-16 Created: 2017-05-16 Last updated: 2019-07-05Bibliographically approved
Björklund, J. & Zechner, N. (2017). Syntactic methods for topic-independent authorship attribution. Natural Language Engineering, 23(5), 789-806
Open this publication in new window or tab >>Syntactic methods for topic-independent authorship attribution
2017 (English)In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 23, no 5, p. 789-806Article in journal (Refereed) Published
Abstract [en]

The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are 'deep' in the sense that they are derived by parsing the subject texts, in contrast to 'shallow' syntactic features for which a part-of-speech analysis is enough. The experiments are made on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-world authorship attribution, identification of separate texts among the works of one author, and cross-topic authorship attribution. In the first tasks, the feature sets were fairly evenly matched, but for the last task, the syntax-based feature set outperformed the baseline feature set. These results suggest that, compared to lexical features, syntactic features are more robust to changes in topic.

Place, publisher, year, edition, pages
CAMBRIDGE UNIV PRESS, 2017
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-139621 (URN)10.1017/S1351324917000249 (DOI)000407573100006 ()
Available from: 2017-10-04 Created: 2017-10-04 Last updated: 2018-06-09Bibliographically approved
Organisations

Search in DiVA

Show all publications