Umeå University's logo

umu.sePublications
Change search
Refine search result
1 - 41 of 41
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Bensch, Suna
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Algorithmic properties of Millstream systems2010In: Developments in Language Theory: 14th International Conference, DLT 2010 / [ed] Sheng Yu, Springer Berlin/Heidelberg, 2010, p. 54-65Conference paper (Refereed)
    Abstract [en]

    Millstream systems have recently been proposed as a formalization of the linguistic idea that natural language should be described as a combination of different modules related by interfaces. In this paper we investigate algorithmic properties of Millstream systems having regular tree grammars as modules and MSO logic as interface logic. We focus on the so-called completion problem: Given trees generated by a subset of the modules, can they be completed into a valid configuration of the Millstream system?

  • 2.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank (Contributor)
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Characterizing Non-Regularity2014Report (Other academic)
    Abstract [en]

    This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

  • 3.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing unranked tree languages, folded once2023In: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, p. 60-73Conference paper (Refereed)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation is enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper we address the remaining case where only one fold operation is applied, but the order among edges is discarded. We show that under these conditions, the problem is solvable in non-uniform polynomial time.

  • 4.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing unranked tree languages, folded once2024In: Algorithms, E-ISSN 1999-4893, Vol. 17, no 6, article id 268Article in journal (Refereed)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

    Download full text (pdf)
    fulltext
  • 5.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Shuffled languages: representation and recognition2013In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, p. 1-20Article in journal (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 6.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Boiret, Adrien
    Laboratoire d'Informatique d'Orléans & INSA CVL, France.
    Transduction from trees to graphs through folding2023In: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, article id 105111Article in journal (Refereed)
    Abstract [en]

    We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graphs languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the time complexity for the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

    Download full text (pdf)
    fulltext
  • 7.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems2013In: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, p. 21-29Conference paper (Other academic)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

    Download full text (pdf)
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems
  • 8.
    Berglund, Martin
    et al.
    Stellenbosch University.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Single-Rooted DAGs in Regular DAG Languages: Parikh Image and Path Languages2017In: Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+13) / [ed] M. Kuhlmann, T. Scheffler, Association for Computational Linguistics , 2017, p. 94-101Conference paper (Refereed)
    Abstract [en]

    In a recent survey (Drewes, 2017) of results on DAG automata  some open problems are formulated for the case where the  DAG language accepted by a DAG automaton A is restricted to DAGs with a single root, denoted by L(A)u. Here we consider each of  those problems, demonstrating that: (i) the finiteness  of L(A)u is decidable, (ii) the path languages of L(A)u can be characterized in  terms of the string languages accepted by partially blind  multicounter automata, and  (iii) the Parikh image of L(A)u is semilinear.

  • 9.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    Stellenbosch University, South Africa.
    Watson, Bruce
    Stellenbosch University, South Africa.
    Cuts in Regular Expressions2013In: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, p. 70-81Conference paper (Refereed)
    Abstract [en]

    Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

  • 10.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing Shuffled Languages2011Report (Other academic)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 11.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing shuffled languages2011In: Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings / [ed] Adrian-Horia Dediu, Shunsuke Inenaga and Carlos Martín-Vide, Springer Berlin/Heidelberg, 2011, p. 142-154Conference paper (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 12.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Petter, Ericson
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Uniform vs. Nonuniform Membership for Mildly Context-Sensitive Languages: A Brief Survey2016In: Algorithms, E-ISSN 1999-4893, Vol. 9, no 2, article id 32Article in journal (Refereed)
    Abstract [en]

    Parsing for mildly context-sensitive language formalisms is an important area within natural language processing. While the complexity of the parsing problem for some such formalisms is known to be polynomial, this is not the case for all of them. This article presents a series of results regarding the complexity of parsing for linear context-free rewriting systems and deterministic tree-walking transducers. We discuss the difference between uniform and nonuniform complexity measures and how parameterized complexity theory can be used to investigate how different aspects of the formalisms influence how hard the parsing problem is. The main results we survey are all hardness results and indicate that parsing is hard even for relatively small values of parameters such as rank and fan-out in a rewriting system.

    Download full text (pdf)
    fulltext
  • 13.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Dahlgren, Adam
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Demeke, Yonas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Implementing a speech-to-text pipeline on the MICO platform2016Report (Other academic)
    Abstract [en]

    MICO is an open-source platform for cross-media analysis, querying, and recommendation. It is the major outcome of the European research project Media in Context, and has been contributed to by academic and industrial partners from Germany, Austria, Sweden, Italy, and the UK. A central idea is to group sets of related media objects into multimodal content items, and to process and store these as logical units. The platform is designed to be easy to extend and adapt, and this makes it a useful building block for a diverse set of multimedia applications. To promote the platform and demonstrate its potential, we describe our work on a Kaldi-based speech-recognition pipeline.

    Download full text (pdf)
    fulltext
  • 14.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Minimisation and Characterisation of Order-Preserving DAG Grammars2018Report (Other academic)
    Abstract [en]

    Order-preserving DAG grammars (OPDGs) is a formalism for processing semantic infor- mation in natural languages [5, 4]. OPDGs are sufficiently expressive to model abstract meaning representations, a graph-based form of semantic representation in which nodes en- code objects and edges relations. At the same time, they allow for efficient parsing in the uniform setting, where both the grammar and subject graph are taken as part of the input.

    In this article, we introduce an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars. This makes it possible to transfer a number of results from that domain to OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable in the so-called MAT setting. To conclude, we show that the languages generated by OPDGs are MSO-definable.

    Download full text (pdf)
    fulltext
  • 15.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Regularity and Learnability of Ordered DAG Languages2017In: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol and Cyril Nicaud, Cham, 2017, p. 27-39Conference paper (Refereed)
    Abstract [en]

    Order-Preserving DAG Grammars (OPDGs) is a subclass of Hyper-Edge Replacement Grammars that can be parsed in polynomial time. Their associated class of languages is known as Ordered DAG Lan- guages, and the graphs they generate are characterised by being acyclic, rooted, and having a natural order on their nodes. OPDGs are useful in natural-language processing to model abstract meaning representa- tions. We state and prove a Myhill-Nerode theorem for ordered DAG languages, and translate it into a MAT-learning algorithm for the same class. The algorithm infers a minimal OPDG G for the target language in time polynomial in G and the samples provided by the MAT oracle. 

  • 16.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Tree-based generation of restricted graph languages2024In: International Journal of Foundations of Computer Science, ISSN 0129-0541, Vol. 35, no 1 & 2, p. 215-243Article in journal (Refereed)
    Abstract [en]

    Order-preserving DAG grammars (OPDGs) is a formalism for representing languages of structurally restricted graphs. As demonstrated in [17], they are sufficiently expressive to model abstract meaning representations in natural language processing, a graph-based form of semantic representation in which nodes encode objects and edges relations. At the same time, they can be parsed in O (n2 + nm) , where m and n are the sizes of the grammar and the input graph, respectively. In this work, we provide an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars under an equivalence theory. This makes it possible to transfer results from the field of formal tree languages to the domain of OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable under the \minimal adequeate teacher" paradigm, that is, by querying an oracle for equivalence between languages, and membership of individual graphs. To conclude, we demonstrate that the languages generated by OPDGs are definable in monadic second-order logic.

    Download full text (pdf)
    fulltext
  • 17.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Zechner, Niklas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Compression of finite-state automata through failure transitions2014In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 557, p. 87-100Article in journal (Refereed)
    Abstract [en]

    Several linear-time algorithms for automata-based pattern matching rely on failure transitions for efficient back-tracking. Like epsilon transitions, failure transition do not consume input symbols, but unlike them, they may only be taken when no other transition is applicable. At a semantic level, this conveniently models catch-all clauses and allows for compact language representation.

    This work investigates the transition-reduction problem for deterministic finite-state automata (DFA). The input is a DFA A and an integer k. The question is whether k or more transitions can be saved by replacing regular transitions with failure transitions. We show that while the problem is NP-complete, there are approximation techniques and heuristics that mitigate the computational complexity. We conclude by demonstrating the computational difficulty of two related minimisation problems, thereby cancelling the ongoing search for efficient algorithms.

    Download full text (pdf)
    fulltext
  • 18.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Devinney, Hannah
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
    Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish2023In: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics , 2023, p. 54-61Conference paper (Refereed)
    Abstract [en]

    Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models for both tag accuracy and on a downstream task (dependency parsing) in a classicial NLP pipeline.

    Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.

  • 19.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Devinney, Hannah
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
    Improving Swedish part-of-speech tagging for hen2022Conference paper (Refereed)
    Abstract [en]

    Despite the fact that the gender-neutral pro-noun hen was officially added to the Swedish language in 2014, state of the art part of speech taggers still routinely fail to identify it as a pronoun. We retrain both efselab and spaCy models with augmented (semi-synthetic) data, where instances of gendered pronouns are replaced by hen to correct for the lack of representation in the original training data. Our results show that adding such data works to correct for the disparity in performance

    Download full text (pdf)
    fulltext
  • 20.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Between a Rock and a Hard Place: Parsing for Hyperege Replacement DAG Grammars2016In: Proc. 10th International Conference on Language and Automata Theory and Applications (LATA 2016) / [ed] A.H. Dediu, J. Janoušek, C. Martín-Vide, and B. Truthe, Springer Publishing Company, 2016, Vol. 9618, p. 521-532Conference paper (Refereed)
    Abstract [en]

    Motivated by applications in natural language processing, we study the uniform membership problem for hyperedge-replacement grammars that generate directed acyclic graphs. Our major result is a low-degree polynomial-time algorithm that solves the uniform membership problem for a restricted type of such grammars. We motivate the necessity of the restrictions by two different NP-completeness results.

  • 21.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2019In: Proceedings of the 16th Meeting on the Mathematics of Language / [ed] F. Drewes, P. de Groote, G. Penn, Association for Computational Linguistics, 2019, p. 1-11Conference paper (Refereed)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion oforder-preserving hyperedge-replacement grammars and prove that the weightof a graph according to such a weighted graph grammar can be computeduniformly in quadratic time (under assumptions made precise in the paper).

    Download full text (pdf)
    fulltext
  • 22.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2018Report (Other academic)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion of order-preserving hyperedge-replacement grammars and prove that the weight of a graph according to such a weighted graph grammar can be computed uniformly in quadratic time (under assumptions made precise in the paper).

    Download full text (pdf)
    fulltext
  • 23.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Starke, Florian
    Faculty of Computer Science, TU Dresden.
    Uniform Parsing for Hyperedge Replacement Grammars2018Report (Other academic)
    Abstract [en]

    It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

    Download full text (pdf)
    fulltext
  • 24.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Digital and Cognitive Musicology Lab, Ecole Polytechnique F´ed´erale de Lausanne, Switzerland.
    Starke, Florian
    Faculty of Computer Science, TU Dresden, Germany.
    Uniform Parsing for Hyperedge Replacement Grammars2021In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 118, p. 1-27Article in journal (Refereed)
    Abstract [en]

    It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

    Download full text (pdf)
    fulltext
  • 25.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A note on the complexity of deterministic tree-walking transducers2013In: Non-Classical Models of Automata and Applications (NCMA 2013), Österreichische Computer Gesellschaft , 2013Conference paper (Refereed)
  • 26.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gelade, Wouter
    Marquardt, Marcel
    Martens, Wim
    Incremental XPath Evaluation2009In: Proceedings of the 12th international conference on database theory: March 23-25, 2009, Saint Petersbourgh, Russia / [ed] Ronald Fagin, 2009, p. 162-173Conference paper (Refereed)
    Abstract [en]

    We study the problem of incrementally maintaining an XPath query on an XML database under updates. The updates we consider are node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D) · poly(Q)) and conjunctive forward XPath queries in time O(depth(D)· log(width(D))·poly(Q)), where D is the size of the database, Q the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in the database, respectively. The auxiliary data structures for maintenance are linear in D and polynomial in Q in all these cases.

  • 27.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gelade, Wouter
    Martens, Wim
    Incremental XPath evaluation2010In: ACM Transactions on Database Systems, ISSN 0362-5915, E-ISSN 1557-4644, Vol. 35, no 4:29, p. 42-Article in journal (Refereed)
    Abstract [en]

    Incremental view maintenance for XPath queries asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of updates U to D, and the problem is to compute the result of Q(U(D)): the result of evaluating query Q on database D after having applied updates U. This article initiates a systematic study of the Boolean version of this problem. In the Boolean version, one only wants to know whether Q(U(D)) is empty or not.

    In order to quickly answer this question, we are allowed to maintain an auxiliary data structure. The complexity of the maintenance algorithms is measured in, (1) the size of the auxiliary data structure, (2) the worst-case time per update needed to compute Q(U(D)), and (3) the worst-case time per update needed to bring the auxiliary data structure up to date. We allow three kinds of updates: node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D)·poly(|Q|)) per update and conjunctive forward XPath queries in time O(depth(D) · log(width(D))·poly(|Q|)) per update, where |Q| is the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in database D, respectively. The auxiliary data structures for maintenance are linear in |D| and polynomial in |Q| in all these cases.

  • 28.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    The tractability frontier for NFA minimization2012In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 78, no 1, p. 198-210Article in journal (Refereed)
    Abstract [en]

    We prove that minimizing finite automata is NP-hard for almost all classes of automata that extend the class of deterministic finite automata. More specifically, we show that minimization is NP-hard for all finite automata classes that subsume the class of δNFAs which accept strings of length at most three. Here, δNFAs are the finite automata that are unambiguous, allow at most one state q with a non-deterministic transition for at most one alphabet symbol a, and are allowed to visit state q at most once in a run. As a corollary, we also obtain that the same result holds for all finite automata classes that subsume that class of finite automata that are unambiguous, have at most two initial states, and accept strings of length at most two.

  • 29.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Schweikardt, Nicole
    Schwentick, Thomas
    Logik und Automaten : Ein echtes Dreamteam: [Logic and automatons : A real dream team]2010In: Informatik-Spektrum, ISSN 0170-6012, E-ISSN 1432-122X, Vol. 33, no 5, p. 452-461Article, review/survey (Other academic)
  • 30.
    Björklund, Henrik
    et al.
    Technical University of Dortmund, Germany.
    Martens, Wim
    Technical University of Dortmund, Germany.
    Schwentick, Thomas
    Technical University of Dortmund, Germany.
    Conjunctive query containment over trees2011In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 77, no 3, p. 450-472Article in journal (Refereed)
    Abstract [en]

    The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes ChildNextSibling, their transitive and reflexive closures, andFollowing. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNP-complete, or -complete. For the satisfiability problem most problems are classified as either in PTIME or NP-complete.

  • 31.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Schwentick, Thomas
    Conjunctive query containment over trees using schema information2018In: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 55, no 1, p. 17-56Article in journal (Refereed)
    Abstract [en]

    We study the containment, satisfiability, and validity problems for conjunctive queries over trees with respect to a schema. We show that conjunctive query containment and validity are 2EXPTIME -complete with respect to a schema, in both cases where the schema is given as a DTD or as a tree automaton. Furthermore, we show that satisfiability for conjunctive queries with respect to a schema can be decided in NP . The problem is NP -hard already for queries using only one kind of axis. Finally, we consider conjunctive queries that can test for equalities and inequalities of data values. Here, satisfiability and validity are decidable, but containment is undecidable, even without schema information. On the other hand, containment with respect to a schema becomes decidable again if the "larger" query is not allowed to use both equalities and inequalities.

  • 32.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    University of Bayreuth.
    Schwentick, Thomas
    Technical University of Dortmund.
    Validity of Tree Pattern Queries with Respect to Schema Information2013In: Mathematical Foundations of Computer Science 2013: 38th International Symposium, MFCS 2013, Klosterneuburg, Austria, August 26-30, 2013. Proceedings / [ed] Krishnendu Chatterjee, Jiri Sgall, Springer: Springer Berlin/Heidelberg, 2013, p. 171-182Conference paper (Refereed)
    Abstract [en]

    We prove that various containment and validity problems for tree pattern queries with respect to a schema are EXPTIME-complete. When one does not require the root of a tree pattern query to match the root of atree, validity of a non-branching tree pattern query with respect to a Relax NG schema or W3C XML Schema is already EXPTIME-hard when the query does not branch and uses only child axes. These hardness results already hold when the alphabet size is fixed. Validity with respect to a DTD is proved to be EXPTIME-hard already when the query only uses child axes and is allowed to branch only once.

  • 33.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Universität Bayreuth.
    Timm, Thomas
    Universität Bayreuth.
    Efficient incremental evaluation of succinct regular expressions2015In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2015, p. 1541-1550Conference paper (Refereed)
    Abstract [en]

    Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming technologies such as schema- and transformation languages for tabular data on the Web. In this paper we study the usage and effectiveness of the counting operator (or: limited repetition) in regular expressions. The counting operator is a popular extension which is part of the POSIX standard and therefore also present in regular expressions in grep, Java, Python, Perl, and Ruby. In a database context, expressions with counting appear in XML Schema and languages for querying graphs such as SPARQL 1.1 and Cypher.

    We first present a practical study that suggests that counters are extensively used in practice. We then investigate evaluation methods for such expressions and develop a new algorithm for efficient incremental evaluation. Finally, we conduct an extensive benchmark study that shows that exploiting counting operators can lead to speed-ups of several orders of magnitude in a wide range of settings: normal and incremental evaluation on synthetic and real expressions.

  • 34.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schuster, Martin
    Technische Universität Dortmund.
    Schwentick, Thomas
    Technische Universität Dortmund.
    Kulbatzki, Joscha
    SCISYS Deutschland GmbH.
    On optimum left-to-right strategies for active context-free games2013In: ICDT '13 Proceedings of the 16th International Conference on Database Theory, ACM Press, 2013, p. 105-116Conference paper (Refereed)
    Abstract [en]

    Active context-free games are two-player games on strings over finite alphabets with one player trying to rewrite the input string to match a target specification. These games have been investigated in the context of exchanging Active XML (AXML) data. While it was known that the rewriting problem is undecidable in general, it is shown here that it is EXPSPACE-complete to decide for a given contextfree game, whether all safely rewritable strings can be safely rewritten in a left-to-right manner, a problem that was previously considered by Abiteboul et al. Furthermore, it is shown that the corresponding problem for games with finite replacement languages is EXPTIME-complete.

  • 35.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schwentick, Thomas
    Theoretical Computer Science, Technical University of Dortmund, Dortmund, Germany.
    Class-memory automata revisited2016In: Claudio Moraga: a passion for multi-valued logic and soft computing / [ed] Rudolf Seising; Héctor Allende-Cid, Springer, 2016, , p. 15p. 201-215Chapter in book (Refereed)
    Abstract [en]

    Data words are an extension of traditional strings that have at each position, besides a symbol from some finite alphabet, a data value from some infinite domain. Class-memory automata constitute an automata model for data words with a decidable emptiness problem. The paper improves the previous result that class-memory automata are strictly more expressive than register automata, another automata model for data words. More specifically, it shows that weak class-memory automata, a restriction of class-memory automata introduced by Cotton-Barratt, Murawski, and Ong are strictly more powerful than the extension of register automata by a data-guessing facility. While weak deterministic class-memory automata yield a restriction of class-memory automata which is closed under Boolean operations, the paper also proposes an extension of deterministic class-memory automata with this property.

  • 36.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schwentick, Thomas
    On notions of regularity for data languages2010In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 411, no 4-5, p. 702-715Article in journal (Refereed)
    Abstract [en]

    With motivation from considerations in XML database theory and model checking, data strings have been introduced as an extension of finite alphabet strings which carry, at each position, a symbol and a data value from an infinite domain. Previous work has shown that it is difficult to come up with an expressive yet decidable automaton model for data languages. Recently, such a model, data automata, was introduced. This paper introduces a simpler but equivalent model and investigates its expressive power, algorithmic and closure properties, and some extensions.

  • 37.
    Devinney, Hannah
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
    Björklund, Jenny
    Uppsala University.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Crime and Relationship: Exploring Gender Bias in NLP Corpora2020Conference paper (Refereed)
    Abstract [en]

    Gender bias in natural language processing (NLP) tools, deriving from implicit human bias embedded in language data, is an important and complicated problem on the road to fair algorithms. We leverage topic modeling to retrieve documents associated with particular gendered categories, and discuss how exploring these documents can inform our understanding of the corpora we may use to train NLP tools. This is a starting point for challenging the systemic power structures and producing a justice-focused approach to NLP.

    Download full text (pdf)
    fulltext
  • 38.
    Devinney, Hannah
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
    Björklund, Jenny
    Centre for Gender Research, Uppsala University.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish2020In: Proceedings of the Second Workshop on Gender Bias in Natural Language Processing / [ed] Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster, Association for Computational Linguistics, 2020, p. 79-92Conference paper (Refereed)
    Abstract [en]

    Gender bias has been identified in many models for Natural Language Processing, stemming from implicit biases in the text corpora used to train the models. Such corpora are too large to closely analyze for biased or stereotypical content. Thus, we argue for a combination of quantitative and qualitative methods, where the quantitative part produces a view of the data of a size suitable for qualitative analysis. We investigate the usefulness of semi-supervised topic modeling for the detection and analysis of gender bias in three corpora (mainstream news articles in English and Swedish, and LGBTQ+ web content in English). We compare differences in topic models for three gender categories (masculine, feminine, and nonbinary or neutral) in each corpus. We find that in all corpora, genders are treated differently and that these differences tend to correspond to hegemonic ideas of gender.

    Download full text (pdf)
    fulltext
  • 39.
    Devinney, Hannah
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
    Björklund, Jenny
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Uppsala University, Sweden.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Theories of gender in natural language processing2022In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22), Association for Computing Machinery (ACM), 2022, p. 2083-2102Conference paper (Refereed)
    Abstract [en]

    The rise of concern around Natural Language Processing (NLP) technologies containing and perpetuating social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how “gender” is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g. through definitions of terms) and implicitly (e.g. through how gender is operationalized in practice). In order to get a better idea of emerging trajectories of thought, we split these articles into two sections by time.

    We find that the majority of the articles do not make their theo- rization of gender explicit, even if they clearly define “bias.” Almost none use a model of gender that is intersectional or inclusive of non- binary genders; and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and expe- rience of trans, nonbinary, and intersex people. There is an increase between the two time-sections in statements acknowledging that gender is a complicated reality, however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work, and to incorporate theory and methodol- ogy from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.

    Download full text (pdf)
    fulltext
  • 40.
    Devinney, Hannah
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS). Linköping University.
    Björklund, Jenny
    Centre for Gender Research, Uppsala University.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    We don’t talk about that: case studies on intersectional analysis of social bias in large language models2024In: Proceedings of the 5th workshop on gender bias in natural language processing (GeBNLP) / [ed] Agnieszka Faleńska; Christine Basta; Marta Costa-jussà; Seraphina Goldfarb-Tarrant; Debora Nozza, Association for Computational Linguistics, 2024, p. 33-44Conference paper (Refereed)
    Abstract [en]

    Despite concerns that Large Language Models (LLMs) are vectors for reproducing and ampli- fying social biases such as sexism, transpho- bia, islamophobia, and racism, there is a lack of work qualitatively analyzing how such pat- terns of bias are generated by LLMs. We use mixed-methods approaches and apply a femi- nist, intersectional lens to the problem across two language domains, Swedish and English, by generating narrative texts using LLMs. We find that hegemonic norms are consistently re- produced; dominant identities are often treated as ‘default’; and discussion of identity itself may be considered ‘inappropriate’ by the safety features applied to some LLMs. Due to the dif- fering behaviors of models, depending both on their design and the language they are trained on, we observe that strategies of identifying “bias” must be adapted to individual models and their socio-cultural contexts.

    Download full text (pdf)
    fulltext
  • 41.
    Woldemariam, Yonas Demeke
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Bensch, Suna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Predicting User Competence from Linguistic Data2017In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017) / [ed] Sivaji Bandyopadhyay, Jadavpur University , 2017, p. 476-484Conference paper (Refereed)
    Abstract [en]

    We investigate the problem of predicting the competence of users of the crowd-sourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users to classify. Our research focuses on the Zoonivers Galaxy Zoo project, where users classify the images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (regarding accuracy and F-measure) with two different but related domain datasets. The performance of the classifiers varies across the feature set configurations designed during the training phase. A challenging part of this research is to compute the competence of the users without ground truth data available. We implemented a tool that estimates the proficiency of users and annotates their text with computed competence. Our evaluation results show that the trained classifier models give results that are significantly better than chance and can be deployed for other crowd-sourcing projects as well. 

    Download full text (pdf)
    fulltext
1 - 41 of 41