Umeå universitets logga

umu.sePublikationer
Ändra sökning
Avgränsa sökresultatet
1 - 41 av 41
RefereraExporteraLänk till träfflistan
Permanent länk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Träffar per sida
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sortering
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
  • Standard (Relevans)
  • Författare A-Ö
  • Författare Ö-A
  • Titel A-Ö
  • Titel Ö-A
  • Publikationstyp A-Ö
  • Publikationstyp Ö-A
  • Äldst först
  • Nyast först
  • Skapad (Äldst först)
  • Skapad (Nyast först)
  • Senast uppdaterad (Äldst först)
  • Senast uppdaterad (Nyast först)
  • Disputationsdatum (tidigaste först)
  • Disputationsdatum (senaste först)
Markera
Maxantalet träffar du kan exportera från sökgränssnittet är 250. Vid större uttag använd dig av utsökningar.
  • 1.
    Bensch, Suna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Algorithmic properties of Millstream systems2010Ingår i: Developments in Language Theory: 14th International Conference, DLT 2010 / [ed] Sheng Yu, Springer Berlin/Heidelberg, 2010, s. 54-65Konferensbidrag (Refereegranskat)
    Abstract [en]

    Millstream systems have recently been proposed as a formalization of the linguistic idea that natural language should be described as a combination of different modules related by interfaces. In this paper we investigate algorithmic properties of Millstream systems having regular tree grammars as modules and MSO logic as interface logic. We focus on the so-called completion problem: Given trees generated by a subset of the modules, can they be completed into a valid configuration of the Millstream system?

  • 2.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank (Bidragsgivare)
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Characterizing Non-Regularity2014Rapport (Övrigt vetenskapligt)
    Abstract [en]

    This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

  • 3.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing unranked tree languages, folded once2023Ingår i: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, s. 60-73Konferensbidrag (Refereegranskat)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation is enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper we address the remaining case where only one fold operation is applied, but the order among edges is discarded. We show that under these conditions, the problem is solvable in non-uniform polynomial time.

  • 4.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing unranked tree languages, folded once2024Ingår i: Algorithms, E-ISSN 1999-4893, Vol. 17, nr 6, artikel-id 268Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

    Ladda ner fulltext (pdf)
    fulltext
  • 5.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Shuffled languages: representation and recognition2013Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, s. 1-20Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 6.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Boiret, Adrien
    Laboratoire d'Informatique d'Orléans & INSA CVL, France.
    Transduction from trees to graphs through folding2023Ingår i: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, artikel-id 105111Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graphs languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the time complexity for the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

    Ladda ner fulltext (pdf)
    fulltext
  • 7.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems2013Ingår i: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, s. 21-29Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

    Ladda ner fulltext (pdf)
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems
  • 8.
    Berglund, Martin
    et al.
    Stellenbosch University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Single-Rooted DAGs in Regular DAG Languages: Parikh Image and Path Languages2017Ingår i: Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+13) / [ed] M. Kuhlmann, T. Scheffler, Association for Computational Linguistics , 2017, s. 94-101Konferensbidrag (Refereegranskat)
    Abstract [en]

    In a recent survey (Drewes, 2017) of results on DAG automata  some open problems are formulated for the case where the  DAG language accepted by a DAG automaton A is restricted to DAGs with a single root, denoted by L(A)u. Here we consider each of  those problems, demonstrating that: (i) the finiteness  of L(A)u is decidable, (ii) the path languages of L(A)u can be characterized in  terms of the string languages accepted by partially blind  multicounter automata, and  (iii) the Parikh image of L(A)u is semilinear.

  • 9.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    van der Merwe, Brink
    Stellenbosch University, South Africa.
    Watson, Bruce
    Stellenbosch University, South Africa.
    Cuts in Regular Expressions2013Ingår i: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, s. 70-81Konferensbidrag (Refereegranskat)
    Abstract [en]

    Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

  • 10.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Högberg, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Recognizing Shuffled Languages2011Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 11.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Högberg, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Recognizing shuffled languages2011Ingår i: Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings / [ed] Adrian-Horia Dediu, Shunsuke Inenaga and Carlos Martín-Vide, Springer Berlin/Heidelberg, 2011, s. 142-154Konferensbidrag (Refereegranskat)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 12.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Berglund, Martin
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Petter, Ericson
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Uniform vs. Nonuniform Membership for Mildly Context-Sensitive Languages: A Brief Survey2016Ingår i: Algorithms, E-ISSN 1999-4893, Vol. 9, nr 2, artikel-id 32Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Parsing for mildly context-sensitive language formalisms is an important area within natural language processing. While the complexity of the parsing problem for some such formalisms is known to be polynomial, this is not the case for all of them. This article presents a series of results regarding the complexity of parsing for linear context-free rewriting systems and deterministic tree-walking transducers. We discuss the difference between uniform and nonuniform complexity measures and how parameterized complexity theory can be used to investigate how different aspects of the formalisms influence how hard the parsing problem is. The main results we survey are all hardness results and indicate that parsing is hard even for relatively small values of parameters such as rank and fan-out in a rewriting system.

    Ladda ner fulltext (pdf)
    fulltext
  • 13.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Dahlgren, Adam
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Demeke, Yonas
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Implementing a speech-to-text pipeline on the MICO platform2016Rapport (Övrigt vetenskapligt)
    Abstract [en]

    MICO is an open-source platform for cross-media analysis, querying, and recommendation. It is the major outcome of the European research project Media in Context, and has been contributed to by academic and industrial partners from Germany, Austria, Sweden, Italy, and the UK. A central idea is to group sets of related media objects into multimodal content items, and to process and store these as logical units. The platform is designed to be easy to extend and adapt, and this makes it a useful building block for a diverse set of multimedia applications. To promote the platform and demonstrate its potential, we describe our work on a Kaldi-based speech-recognition pipeline.

    Ladda ner fulltext (pdf)
    fulltext
  • 14.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Minimisation and Characterisation of Order-Preserving DAG Grammars2018Rapport (Övrigt vetenskapligt)
    Abstract [en]

    Order-preserving DAG grammars (OPDGs) is a formalism for processing semantic infor- mation in natural languages [5, 4]. OPDGs are sufficiently expressive to model abstract meaning representations, a graph-based form of semantic representation in which nodes en- code objects and edges relations. At the same time, they allow for efficient parsing in the uniform setting, where both the grammar and subject graph are taken as part of the input.

    In this article, we introduce an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars. This makes it possible to transfer a number of results from that domain to OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable in the so-called MAT setting. To conclude, we show that the languages generated by OPDGs are MSO-definable.

    Ladda ner fulltext (pdf)
    fulltext
  • 15.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the Regularity and Learnability of Ordered DAG Languages2017Ingår i: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol and Cyril Nicaud, Cham, 2017, s. 27-39Konferensbidrag (Refereegranskat)
    Abstract [en]

    Order-Preserving DAG Grammars (OPDGs) is a subclass of Hyper-Edge Replacement Grammars that can be parsed in polynomial time. Their associated class of languages is known as Ordered DAG Lan- guages, and the graphs they generate are characterised by being acyclic, rooted, and having a natural order on their nodes. OPDGs are useful in natural-language processing to model abstract meaning representa- tions. We state and prove a Myhill-Nerode theorem for ordered DAG languages, and translate it into a MAT-learning algorithm for the same class. The algorithm infers a minimal OPDG G for the target language in time polynomial in G and the samples provided by the MAT oracle. 

  • 16.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Tree-based generation of restricted graph languages2024Ingår i: International Journal of Foundations of Computer Science, ISSN 0129-0541, Vol. 35, nr 1 & 2, s. 215-243Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Order-preserving DAG grammars (OPDGs) is a formalism for representing languages of structurally restricted graphs. As demonstrated in [17], they are sufficiently expressive to model abstract meaning representations in natural language processing, a graph-based form of semantic representation in which nodes encode objects and edges relations. At the same time, they can be parsed in O (n2 + nm) , where m and n are the sizes of the grammar and the input graph, respectively. In this work, we provide an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars under an equivalence theory. This makes it possible to transfer results from the field of formal tree languages to the domain of OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable under the \minimal adequeate teacher" paradigm, that is, by querying an oracle for equivalence between languages, and membership of individual graphs. To conclude, we demonstrate that the languages generated by OPDGs are definable in monadic second-order logic.

    Ladda ner fulltext (pdf)
    fulltext
  • 17.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Zechner, Niklas
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Compression of finite-state automata through failure transitions2014Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 557, s. 87-100Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Several linear-time algorithms for automata-based pattern matching rely on failure transitions for efficient back-tracking. Like epsilon transitions, failure transition do not consume input symbols, but unlike them, they may only be taken when no other transition is applicable. At a semantic level, this conveniently models catch-all clauses and allows for compact language representation.

    This work investigates the transition-reduction problem for deterministic finite-state automata (DFA). The input is a DFA A and an integer k. The question is whether k or more transitions can be saved by replacing regular transitions with failure transitions. We show that while the problem is NP-complete, there are approximation techniques and heuristics that mitigate the computational complexity. We conclude by demonstrating the computational difficulty of two related minimisation problems, thereby cancelling the ongoing search for efficient algorithms.

    Ladda ner fulltext (pdf)
    fulltext
  • 18.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Devinney, Hannah
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish2023Ingår i: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics , 2023, s. 54-61Konferensbidrag (Refereegranskat)
    Abstract [en]

    Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models for both tag accuracy and on a downstream task (dependency parsing) in a classicial NLP pipeline.

    Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.

  • 19.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Devinney, Hannah
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Improving Swedish part-of-speech tagging for hen2022Konferensbidrag (Refereegranskat)
    Abstract [en]

    Despite the fact that the gender-neutral pro-noun hen was officially added to the Swedish language in 2014, state of the art part of speech taggers still routinely fail to identify it as a pronoun. We retrain both efselab and spaCy models with augmented (semi-synthetic) data, where instances of gendered pronouns are replaced by hen to correct for the lack of representation in the original training data. Our results show that adding such data works to correct for the disparity in performance

    Ladda ner fulltext (pdf)
    fulltext
  • 20.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Between a Rock and a Hard Place: Parsing for Hyperege Replacement DAG Grammars2016Ingår i: Proc. 10th International Conference on Language and Automata Theory and Applications (LATA 2016) / [ed] A.H. Dediu, J. Janoušek, C. Martín-Vide, and B. Truthe, Springer Publishing Company, 2016, Vol. 9618, s. 521-532Konferensbidrag (Refereegranskat)
    Abstract [en]

    Motivated by applications in natural language processing, we study the uniform membership problem for hyperedge-replacement grammars that generate directed acyclic graphs. Our major result is a low-degree polynomial-time algorithm that solves the uniform membership problem for a restricted type of such grammars. We motivate the necessity of the restrictions by two different NP-completeness results.

  • 21.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2019Ingår i: Proceedings of the 16th Meeting on the Mathematics of Language / [ed] F. Drewes, P. de Groote, G. Penn, Association for Computational Linguistics, 2019, s. 1-11Konferensbidrag (Refereegranskat)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion oforder-preserving hyperedge-replacement grammars and prove that the weightof a graph according to such a weighted graph grammar can be computeduniformly in quadratic time (under assumptions made precise in the paper).

    Ladda ner fulltext (pdf)
    fulltext
  • 22.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2018Rapport (Övrigt vetenskapligt)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion of order-preserving hyperedge-replacement grammars and prove that the weight of a graph according to such a weighted graph grammar can be computed uniformly in quadratic time (under assumptions made precise in the paper).

    Ladda ner fulltext (pdf)
    fulltext
  • 23.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Starke, Florian
    Faculty of Computer Science, TU Dresden.
    Uniform Parsing for Hyperedge Replacement Grammars2018Rapport (Övrigt vetenskapligt)
    Abstract [en]

    It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

    Ladda ner fulltext (pdf)
    fulltext
  • 24.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Digital and Cognitive Musicology Lab, Ecole Polytechnique F´ed´erale de Lausanne, Switzerland.
    Starke, Florian
    Faculty of Computer Science, TU Dresden, Germany.
    Uniform Parsing for Hyperedge Replacement Grammars2021Ingår i: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 118, s. 1-27Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

    Ladda ner fulltext (pdf)
    fulltext
  • 25.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ericson, Petter
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    A note on the complexity of deterministic tree-walking transducers2013Ingår i: Non-Classical Models of Automata and Applications (NCMA 2013), Österreichische Computer Gesellschaft , 2013Konferensbidrag (Refereegranskat)
  • 26.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Gelade, Wouter
    Marquardt, Marcel
    Martens, Wim
    Incremental XPath Evaluation2009Ingår i: Proceedings of the 12th international conference on database theory: March 23-25, 2009, Saint Petersbourgh, Russia / [ed] Ronald Fagin, 2009, s. 162-173Konferensbidrag (Refereegranskat)
    Abstract [en]

    We study the problem of incrementally maintaining an XPath query on an XML database under updates. The updates we consider are node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D) · poly(Q)) and conjunctive forward XPath queries in time O(depth(D)· log(width(D))·poly(Q)), where D is the size of the database, Q the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in the database, respectively. The auxiliary data structures for maintenance are linear in D and polynomial in Q in all these cases.

  • 27.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Gelade, Wouter
    Martens, Wim
    Incremental XPath evaluation2010Ingår i: ACM Transactions on Database Systems, ISSN 0362-5915, E-ISSN 1557-4644, Vol. 35, nr 4:29, s. 42-Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Incremental view maintenance for XPath queries asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of updates U to D, and the problem is to compute the result of Q(U(D)): the result of evaluating query Q on database D after having applied updates U. This article initiates a systematic study of the Boolean version of this problem. In the Boolean version, one only wants to know whether Q(U(D)) is empty or not.

    In order to quickly answer this question, we are allowed to maintain an auxiliary data structure. The complexity of the maintenance algorithms is measured in, (1) the size of the auxiliary data structure, (2) the worst-case time per update needed to compute Q(U(D)), and (3) the worst-case time per update needed to bring the auxiliary data structure up to date. We allow three kinds of updates: node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D)·poly(|Q|)) per update and conjunctive forward XPath queries in time O(depth(D) · log(width(D))·poly(|Q|)) per update, where |Q| is the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in database D, respectively. The auxiliary data structures for maintenance are linear in |D| and polynomial in |Q| in all these cases.

  • 28.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Wim
    The tractability frontier for NFA minimization2012Ingår i: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 78, nr 1, s. 198-210Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We prove that minimizing finite automata is NP-hard for almost all classes of automata that extend the class of deterministic finite automata. More specifically, we show that minimization is NP-hard for all finite automata classes that subsume the class of δNFAs which accept strings of length at most three. Here, δNFAs are the finite automata that are unambiguous, allow at most one state q with a non-deterministic transition for at most one alphabet symbol a, and are allowed to visit state q at most once in a run. As a corollary, we also obtain that the same result holds for all finite automata classes that subsume that class of finite automata that are unambiguous, have at most two initial states, and accept strings of length at most two.

  • 29.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Wim
    Schweikardt, Nicole
    Schwentick, Thomas
    Logik und Automaten : Ein echtes Dreamteam: [Logic and automatons : A real dream team]2010Ingår i: Informatik-Spektrum, ISSN 0170-6012, E-ISSN 1432-122X, Vol. 33, nr 5, s. 452-461Artikel, forskningsöversikt (Övrigt vetenskapligt)
  • 30.
    Björklund, Henrik
    et al.
    Technical University of Dortmund, Germany.
    Martens, Wim
    Technical University of Dortmund, Germany.
    Schwentick, Thomas
    Technical University of Dortmund, Germany.
    Conjunctive query containment over trees2011Ingår i: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 77, nr 3, s. 450-472Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes ChildNextSibling, their transitive and reflexive closures, andFollowing. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNP-complete, or -complete. For the satisfiability problem most problems are classified as either in PTIME or NP-complete.

  • 31.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Wim
    Schwentick, Thomas
    Conjunctive query containment over trees using schema information2018Ingår i: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 55, nr 1, s. 17-56Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    We study the containment, satisfiability, and validity problems for conjunctive queries over trees with respect to a schema. We show that conjunctive query containment and validity are 2EXPTIME -complete with respect to a schema, in both cases where the schema is given as a DTD or as a tree automaton. Furthermore, we show that satisfiability for conjunctive queries with respect to a schema can be decided in NP . The problem is NP -hard already for queries using only one kind of axis. Finally, we consider conjunctive queries that can test for equalities and inequalities of data values. Here, satisfiability and validity are decidable, but containment is undecidable, even without schema information. On the other hand, containment with respect to a schema becomes decidable again if the "larger" query is not allowed to use both equalities and inequalities.

  • 32.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Wim
    University of Bayreuth.
    Schwentick, Thomas
    Technical University of Dortmund.
    Validity of Tree Pattern Queries with Respect to Schema Information2013Ingår i: Mathematical Foundations of Computer Science 2013: 38th International Symposium, MFCS 2013, Klosterneuburg, Austria, August 26-30, 2013. Proceedings / [ed] Krishnendu Chatterjee, Jiri Sgall, Springer: Springer Berlin/Heidelberg, 2013, s. 171-182Konferensbidrag (Refereegranskat)
    Abstract [en]

    We prove that various containment and validity problems for tree pattern queries with respect to a schema are EXPTIME-complete. When one does not require the root of a tree pattern query to match the root of atree, validity of a non-branching tree pattern query with respect to a Relax NG schema or W3C XML Schema is already EXPTIME-hard when the query does not branch and uses only child axes. These hardness results already hold when the alphabet size is fixed. Validity with respect to a DTD is proved to be EXPTIME-hard already when the query only uses child axes and is allowed to branch only once.

  • 33.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Martens, Wim
    Universität Bayreuth.
    Timm, Thomas
    Universität Bayreuth.
    Efficient incremental evaluation of succinct regular expressions2015Ingår i: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2015, s. 1541-1550Konferensbidrag (Refereegranskat)
    Abstract [en]

    Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming technologies such as schema- and transformation languages for tabular data on the Web. In this paper we study the usage and effectiveness of the counting operator (or: limited repetition) in regular expressions. The counting operator is a popular extension which is part of the POSIX standard and therefore also present in regular expressions in grep, Java, Python, Perl, and Ruby. In a database context, expressions with counting appear in XML Schema and languages for querying graphs such as SPARQL 1.1 and Cypher.

    We first present a practical study that suggests that counters are extensively used in practice. We then investigate evaluation methods for such expressions and develop a new algorithm for efficient incremental evaluation. Finally, we conduct an extensive benchmark study that shows that exploiting counting operators can lead to speed-ups of several orders of magnitude in a wide range of settings: normal and incremental evaluation on synthetic and real expressions.

  • 34.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Schuster, Martin
    Technische Universität Dortmund.
    Schwentick, Thomas
    Technische Universität Dortmund.
    Kulbatzki, Joscha
    SCISYS Deutschland GmbH.
    On optimum left-to-right strategies for active context-free games2013Ingår i: ICDT '13 Proceedings of the 16th International Conference on Database Theory, ACM Press, 2013, s. 105-116Konferensbidrag (Refereegranskat)
    Abstract [en]

    Active context-free games are two-player games on strings over finite alphabets with one player trying to rewrite the input string to match a target specification. These games have been investigated in the context of exchanging Active XML (AXML) data. While it was known that the rewriting problem is undecidable in general, it is shown here that it is EXPSPACE-complete to decide for a given contextfree game, whether all safely rewritable strings can be safely rewritten in a left-to-right manner, a problem that was previously considered by Abiteboul et al. Furthermore, it is shown that the corresponding problem for games with finite replacement languages is EXPTIME-complete.

  • 35.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Schwentick, Thomas
    Theoretical Computer Science, Technical University of Dortmund, Dortmund, Germany.
    Class-memory automata revisited2016Ingår i: Claudio Moraga: a passion for multi-valued logic and soft computing / [ed] Rudolf Seising; Héctor Allende-Cid, Springer, 2016, , s. 15s. 201-215Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    Data words are an extension of traditional strings that have at each position, besides a symbol from some finite alphabet, a data value from some infinite domain. Class-memory automata constitute an automata model for data words with a decidable emptiness problem. The paper improves the previous result that class-memory automata are strictly more expressive than register automata, another automata model for data words. More specifically, it shows that weak class-memory automata, a restriction of class-memory automata introduced by Cotton-Barratt, Murawski, and Ong are strictly more powerful than the extension of register automata by a data-guessing facility. While weak deterministic class-memory automata yield a restriction of class-memory automata which is closed under Boolean operations, the paper also proposes an extension of deterministic class-memory automata with this property.

  • 36.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Schwentick, Thomas
    On notions of regularity for data languages2010Ingår i: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 411, nr 4-5, s. 702-715Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    With motivation from considerations in XML database theory and model checking, data strings have been introduced as an extension of finite alphabet strings which carry, at each position, a symbol and a data value from an infinite domain. Previous work has shown that it is difficult to come up with an expressive yet decidable automaton model for data languages. Recently, such a model, data automata, was introduced. This paper introduces a simpler but equivalent model and investigates its expressive power, algorithmic and closure properties, and some extensions.

  • 37.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Uppsala University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Crime and Relationship: Exploring Gender Bias in NLP Corpora2020Konferensbidrag (Refereegranskat)
    Abstract [en]

    Gender bias in natural language processing (NLP) tools, deriving from implicit human bias embedded in language data, is an important and complicated problem on the road to fair algorithms. We leverage topic modeling to retrieve documents associated with particular gendered categories, and discuss how exploring these documents can inform our understanding of the corpora we may use to train NLP tools. This is a starting point for challenging the systemic power structures and producing a justice-focused approach to NLP.

    Ladda ner fulltext (pdf)
    fulltext
  • 38.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Centre for Gender Research, Uppsala University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish2020Ingår i: Proceedings of the Second Workshop on Gender Bias in Natural Language Processing / [ed] Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster, Association for Computational Linguistics, 2020, s. 79-92Konferensbidrag (Refereegranskat)
    Abstract [en]

    Gender bias has been identified in many models for Natural Language Processing, stemming from implicit biases in the text corpora used to train the models. Such corpora are too large to closely analyze for biased or stereotypical content. Thus, we argue for a combination of quantitative and qualitative methods, where the quantitative part produces a view of the data of a size suitable for qualitative analysis. We investigate the usefulness of semi-supervised topic modeling for the detection and analysis of gender bias in three corpora (mainstream news articles in English and Swedish, and LGBTQ+ web content in English). We compare differences in topic models for three gender categories (masculine, feminine, and nonbinary or neutral) in each corpus. We find that in all corpora, genders are treated differently and that these differences tend to correspond to hegemonic ideas of gender.

    Ladda ner fulltext (pdf)
    fulltext
  • 39.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Uppsala University, Sweden.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Theories of gender in natural language processing2022Ingår i: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22), Association for Computing Machinery (ACM), 2022, s. 2083-2102Konferensbidrag (Refereegranskat)
    Abstract [en]

    The rise of concern around Natural Language Processing (NLP) technologies containing and perpetuating social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how “gender” is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g. through definitions of terms) and implicitly (e.g. through how gender is operationalized in practice). In order to get a better idea of emerging trajectories of thought, we split these articles into two sections by time.

    We find that the majority of the articles do not make their theo- rization of gender explicit, even if they clearly define “bias.” Almost none use a model of gender that is intersectional or inclusive of non- binary genders; and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and expe- rience of trans, nonbinary, and intersex people. There is an increase between the two time-sections in statements acknowledging that gender is a complicated reality, however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work, and to incorporate theory and methodol- ogy from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.

    Ladda ner fulltext (pdf)
    fulltext
  • 40.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS). Linköping University.
    Björklund, Jenny
    Centre for Gender Research, Uppsala University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    We don’t talk about that: case studies on intersectional analysis of social bias in large language models2024Ingår i: Proceedings of the 5th workshop on gender bias in natural language processing (GeBNLP) / [ed] Agnieszka Faleńska; Christine Basta; Marta Costa-jussà; Seraphina Goldfarb-Tarrant; Debora Nozza, Association for Computational Linguistics, 2024, s. 33-44Konferensbidrag (Refereegranskat)
    Abstract [en]

    Despite concerns that Large Language Models (LLMs) are vectors for reproducing and ampli- fying social biases such as sexism, transpho- bia, islamophobia, and racism, there is a lack of work qualitatively analyzing how such pat- terns of bias are generated by LLMs. We use mixed-methods approaches and apply a femi- nist, intersectional lens to the problem across two language domains, Swedish and English, by generating narrative texts using LLMs. We find that hegemonic norms are consistently re- produced; dominant identities are often treated as ‘default’; and discussion of identity itself may be considered ‘inappropriate’ by the safety features applied to some LLMs. Due to the dif- fering behaviors of models, depending both on their design and the language they are trained on, we observe that strategies of identifying “bias” must be adapted to individual models and their socio-cultural contexts.

    Ladda ner fulltext (pdf)
    fulltext
  • 41.
    Woldemariam, Yonas Demeke
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Bensch, Suna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Predicting User Competence from Linguistic Data2017Ingår i: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017) / [ed] Sivaji Bandyopadhyay, Jadavpur University , 2017, s. 476-484Konferensbidrag (Refereegranskat)
    Abstract [en]

    We investigate the problem of predicting the competence of users of the crowd-sourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users to classify. Our research focuses on the Zoonivers Galaxy Zoo project, where users classify the images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (regarding accuracy and F-measure) with two different but related domain datasets. The performance of the classifiers varies across the feature set configurations designed during the training phase. A challenging part of this research is to compute the competence of the users without ground truth data available. We implemented a tool that estimates the proficiency of users and annotates their text with computed competence. Our evaluation results show that the trained classifier models give results that are significantly better than chance and can be deployed for other crowd-sourcing projects as well. 

    Ladda ner fulltext (pdf)
    fulltext
1 - 41 av 41
RefereraExporteraLänk till träfflistan
Permanent länk