umu.sePublications
Change search
Refine search result
1 - 29 of 29
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Bensch, Suna
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Algorithmic properties of Millstream systems2010In: Developments in Language Theory: 14th International Conference, DLT 2010 / [ed] Sheng Yu, Springer Berlin/Heidelberg, 2010, p. 54-65Conference paper (Refereed)
    Abstract [en]

    Millstream systems have recently been proposed as a formalization of the linguistic idea that natural language should be described as a combination of different modules related by interfaces. In this paper we investigate algorithmic properties of Millstream systems having regular tree grammars as modules and MSO logic as interface logic. We focus on the so-called completion problem: Given trees generated by a subset of the modules, can they be completed into a valid configuration of the Millstream system?

  • 2.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank (Contributor)
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Characterizing Non-Regularity2014Report (Other academic)
    Abstract [en]

    This paper considers a characterization of the context-free non-regular languages, conjecturing that there for all such languages exists a fixed string thatcan be pumped to exhibit infinitely many equivalence classes. A proof is given only for a special case, but the general statement is conjectured to hold. The conjecture is then shown to imply that the shuffle of two context-free languagesis not context-free.

  • 3.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Shuffled languages: representation and recognition2013In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 489-490, p. 1-20Article in journal (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, in other words, how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 4.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems2013In: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, p. 21-29Conference paper (Other academic)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free RewritingSystems, i.e., the problem where we aregiven a string w and a grammar G and areasked whether w ∈ L(G). In particular,we use parameterized complexity theoryto investigate how the complexity dependson various parameters. While we focusprimarily on rank and fan-out, derivationlength is also considered.

  • 5.
    Berglund, Martin
    et al.
    Stellenbosch University.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Single-Rooted DAGs in Regular DAG Languages: Parikh Image and Path Languages2017In: Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+13) / [ed] M. Kuhlmann, T. Scheffler, Association for Computational Linguistics , 2017, p. 94-101Conference paper (Refereed)
    Abstract [en]

    In a recent survey (Drewes, 2017) of results on DAG automata  some open problems are formulated for the case where the  DAG language accepted by a DAG automaton A is restricted to DAGs with a single root, denoted by L(A)u. Here we consider each of  those problems, demonstrating that: (i) the finiteness  of L(A)u is decidable, (ii) the path languages of L(A)u can be characterized in  terms of the string languages accepted by partially blind  multicounter automata, and  (iii) the Parikh image of L(A)u is semilinear.

  • 6.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    van der Merwe, Brink
    Stellenbosch University, South Africa.
    Watson, Bruce
    Stellenbosch University, South Africa.
    Cuts in Regular Expressions2013In: Developments in Language Theory: 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013. Proceedings / [ed] Marie-Pierre Béal, Olivier Carton, Springer Berlin/Heidelberg, 2013, p. 70-81Conference paper (Refereed)
    Abstract [en]

    Most software packages with regular expression matching engines offer operators that extend the classical regular expressions, such as counting, intersection, complementation, and interleaving. Some of the most popular engines, for example those of Java and Perl, also provide operators that are intended to control the nondeterminism inherent in regular expressions. We formalize this notion in the form of the cut and iterated cut operators. They do not extend the class of languages that can be defined beyond the regular, but they allow for exponentially more succinct representation of some languages. Membership testing remains polynomial, but emptiness testing becomes PSPACE-hard. 

  • 7.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing shuffled languages2011In: Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings / [ed] Adrian-Horia Dediu, Shunsuke Inenaga and Carlos Martín-Vide, Springer Berlin/Heidelberg, 2011, p. 142-154Conference paper (Refereed)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 8.
    Berglund, Martin
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Högberg, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Recognizing Shuffled Languages2011Report (Other academic)
    Abstract [en]

    Language models that use interleaving, or shuffle, operators have applications in various areas of computer science, including system verification, plan recognition, and natural language processing. We study the complexity of the membership problem for such models, i.e., how difficult it is to determine if a string belongs to a language or not. In particular, we investigate how interleaving can be introduced into models that capture the context-free languages.

  • 9.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Berglund, Martin
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Petter, Ericson
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Uniform vs. Nonuniform Membership for Mildly Context-Sensitive Languages: A Brief Survey2016In: Algorithms, ISSN 1999-4893, E-ISSN 1999-4893, Vol. 9, no 2, article id 32Article in journal (Refereed)
    Abstract [en]

    Parsing for mildly context-sensitive language formalisms is an important area within natural language processing. While the complexity of the parsing problem for some such formalisms is known to be polynomial, this is not the case for all of them. This article presents a series of results regarding the complexity of parsing for linear context-free rewriting systems and deterministic tree-walking transducers. We discuss the difference between uniform and nonuniform complexity measures and how parameterized complexity theory can be used to investigate how different aspects of the formalisms influence how hard the parsing problem is. The main results we survey are all hardness results and indicate that parsing is hard even for relatively small values of parameters such as rank and fan-out in a rewriting system.

  • 10.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Minimisation and Characterisation of Order-Preserving DAG Grammars2018Report (Other academic)
    Abstract [en]

    Order-preserving DAG grammars (OPDGs) is a formalism for processing semantic infor- mation in natural languages [5, 4]. OPDGs are sufficiently expressive to model abstract meaning representations, a graph-based form of semantic representation in which nodes en- code objects and edges relations. At the same time, they allow for efficient parsing in the uniform setting, where both the grammar and subject graph are taken as part of the input.

    In this article, we introduce an initial algebra semantic for OPDGs, which allows us to view them as regular tree grammars. This makes it possible to transfer a number of results from that domain to OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable in the so-called MAT setting. To conclude, we show that the languages generated by OPDGs are MSO-definable.

  • 11.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Regularity and Learnability of Ordered DAG Languages2017In: Implementation and Application of Automata: 22nd International Conference, CIAA 2017, Marne-la-Vallée, France, June 27-30, 2017, Proceedings / [ed] Arnaud Carayol and Cyril Nicaud, Cham, 2017, p. 27-39Conference paper (Refereed)
    Abstract [en]

    Order-Preserving DAG Grammars (OPDGs) is a subclass of Hyper-Edge Replacement Grammars that can be parsed in polynomial time. Their associated class of languages is known as Ordered DAG Lan- guages, and the graphs they generate are characterised by being acyclic, rooted, and having a natural order on their nodes. OPDGs are useful in natural-language processing to model abstract meaning representa- tions. We state and prove a Myhill-Nerode theorem for ordered DAG languages, and translate it into a MAT-learning algorithm for the same class. The algorithm infers a minimal OPDG G for the target language in time polynomial in G and the samples provided by the MAT oracle. 

  • 12.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Johanna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Zechner, Niklas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Compression of finite-state automata through failure transitions2014In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 557, p. 87-100Article in journal (Refereed)
    Abstract [en]

    Several linear-time algorithms for automata-based pattern matching rely on failure transitions for efficient back-tracking. Like epsilon transitions, failure transition do not consume input symbols, but unlike them, they may only be taken when no other transition is applicable. At a semantic level, this conveniently models catch-all clauses and allows for compact language representation.

    This work investigates the transition-reduction problem for deterministic finite-state automata (DFA). The input is a DFA A and an integer k. The question is whether k or more transitions can be saved by replacing regular transitions with failure transitions. We show that while the problem is NP-complete, there are approximation techniques and heuristics that mitigate the computational complexity. We conclude by demonstrating the computational difficulty of two related minimisation problems, thereby cancelling the ongoing search for efficient algorithms.

  • 13.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Between a Rock and a Hard Place: Parsing for Hyperege Replacement DAG Grammars2016In: Proc. 10th International Conference on Language and Automata Theory and Applications (LATA 2016) / [ed] A.H. Dediu, J. Janoušek, C. Martín-Vide, and B. Truthe, Springer Publishing Company, 2016, Vol. 9618, p. 521-532Conference paper (Refereed)
    Abstract [en]

    Motivated by applications in natural language processing, we study the uniform membership problem for hyperedge-replacement grammars that generate directed acyclic graphs. Our major result is a low-degree polynomial-time algorithm that solves the uniform membership problem for a restricted type of such grammars. We motivate the necessity of the restrictions by two different NP-completeness results.

  • 14.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2019In: Proc. 16th Meeting on the Mathematics of Language (MOL 2019) / [ed] F. Drewes, P. de Groote, G. Penn, Association for Computational Linguistics, 2019Conference paper (Refereed)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion oforder-preserving hyperedge-replacement grammars and prove that the weightof a graph according to such a weighted graph grammar can be computeduniformly in quadratic time (under assumptions made precise in the paper).

  • 15.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Parsing Weighted Order-Preserving Hyperedge Replacement Grammars2018Report (Other academic)
    Abstract [en]

    We introduce a weighted extension of the recently proposed notion of order-preserving hyperedge-replacement grammars and prove that the weight of a graph according to such a weighted graph grammar can be computed uniformly in quadratic time (under assumptions made precise in the paper).

  • 16.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Drewes, Frank
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Starke, Florian
    Faculty of Computer Science, TU Dresden.
    Uniform Parsing for Hyperedge Replacement Grammars2018Report (Other academic)
    Abstract [en]

    It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

  • 17.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ericson, Petter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A note on the complexity of deterministic tree-walking transducers2013In: Non-Classical Models of Automata and Applications (NCMA 2013), Österreichische Computer Gesellschaft , 2013Conference paper (Refereed)
  • 18.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gelade, Wouter
    Marquardt, Marcel
    Martens, Wim
    Incremental XPath Evaluation2009In: Proceedings of the 12th international conference on database theory: March 23-25, 2009, Saint Petersbourgh, Russia / [ed] Ronald Fagin, 2009, p. 162-173Conference paper (Refereed)
    Abstract [en]

    We study the problem of incrementally maintaining an XPath query on an XML database under updates. The updates we consider are node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D) · poly(Q)) and conjunctive forward XPath queries in time O(depth(D)· log(width(D))·poly(Q)), where D is the size of the database, Q the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in the database, respectively. The auxiliary data structures for maintenance are linear in D and polynomial in Q in all these cases.

  • 19.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gelade, Wouter
    Martens, Wim
    Incremental XPath evaluation2010In: ACM Transactions on Database Systems, ISSN 0362-5915, E-ISSN 1557-4644, Vol. 35, no 4:29, p. 42-Article in journal (Refereed)
    Abstract [en]

    Incremental view maintenance for XPath queries asks to maintain a materialized XPath view over an XML database. It assumes an underlying XML database D and a query Q. One is given a sequence of updates U to D, and the problem is to compute the result of Q(U(D)): the result of evaluating query Q on database D after having applied updates U. This article initiates a systematic study of the Boolean version of this problem. In the Boolean version, one only wants to know whether Q(U(D)) is empty or not.

    In order to quickly answer this question, we are allowed to maintain an auxiliary data structure. The complexity of the maintenance algorithms is measured in, (1) the size of the auxiliary data structure, (2) the worst-case time per update needed to compute Q(U(D)), and (3) the worst-case time per update needed to bring the auxiliary data structure up to date. We allow three kinds of updates: node insertion, node deletion, and node relabeling. Our main results are that downward XPath queries can be incrementally maintained in time O(depth(D)·poly(|Q|)) per update and conjunctive forward XPath queries in time O(depth(D) · log(width(D))·poly(|Q|)) per update, where |Q| is the size of the query, and depth(D) and width(D) are the nesting depth and maximum number of siblings in database D, respectively. The auxiliary data structures for maintenance are linear in |D| and polynomial in |Q| in all these cases.

  • 20.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    The tractability frontier for NFA minimization2012In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 78, no 1, p. 198-210Article in journal (Refereed)
    Abstract [en]

    We prove that minimizing finite automata is NP-hard for almost all classes of automata that extend the class of deterministic finite automata. More specifically, we show that minimization is NP-hard for all finite automata classes that subsume the class of δNFAs which accept strings of length at most three. Here, δNFAs are the finite automata that are unambiguous, allow at most one state q with a non-deterministic transition for at most one alphabet symbol a, and are allowed to visit state q at most once in a run. As a corollary, we also obtain that the same result holds for all finite automata classes that subsume that class of finite automata that are unambiguous, have at most two initial states, and accept strings of length at most two.

  • 21.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Schweikardt, Nicole
    Schwentick, Thomas
    Logik und Automaten: Ein echtes Dreamteam2010In: Informatik-Spektrum, ISSN 0170-6012, E-ISSN 1432-122X, Vol. 33, no 5, p. 452-461Article, review/survey (Other (popular science, discussion, etc.))
  • 22.
    Björklund, Henrik
    et al.
    Technical University of Dortmund, Germany.
    Martens, Wim
    Technical University of Dortmund, Germany.
    Schwentick, Thomas
    Technical University of Dortmund, Germany.
    Conjunctive query containment over trees2011In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 77, no 3, p. 450-472Article in journal (Refereed)
    Abstract [en]

    The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes ChildNextSibling, their transitive and reflexive closures, andFollowing. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNP-complete, or -complete. For the satisfiability problem most problems are classified as either in PTIME or NP-complete.

  • 23.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Schwentick, Thomas
    Conjunctive query containment over trees using schema information2018In: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 55, no 1, p. 17-56Article in journal (Refereed)
    Abstract [en]

    We study the containment, satisfiability, and validity problems for conjunctive queries over trees with respect to a schema. We show that conjunctive query containment and validity are 2EXPTIME -complete with respect to a schema, in both cases where the schema is given as a DTD or as a tree automaton. Furthermore, we show that satisfiability for conjunctive queries with respect to a schema can be decided in NP . The problem is NP -hard already for queries using only one kind of axis. Finally, we consider conjunctive queries that can test for equalities and inequalities of data values. Here, satisfiability and validity are decidable, but containment is undecidable, even without schema information. On the other hand, containment with respect to a schema becomes decidable again if the "larger" query is not allowed to use both equalities and inequalities.

  • 24.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    University of Bayreuth.
    Schwentick, Thomas
    Technical University of Dortmund.
    Validity of Tree Pattern Queries with Respect to Schema Information2013In: Mathematical Foundations of Computer Science 2013: 38th International Symposium, MFCS 2013, Klosterneuburg, Austria, August 26-30, 2013. Proceedings / [ed] Krishnendu Chatterjee, Jiri Sgall, Springer: Springer Berlin/Heidelberg, 2013, p. 171-182Conference paper (Refereed)
    Abstract [en]

    We prove that various containment and validity problems for tree pattern queries with respect to a schema are EXPTIME-complete. When one does not require the root of a tree pattern query to match the root of atree, validity of a non-branching tree pattern query with respect to a Relax NG schema or W3C XML Schema is already EXPTIME-hard when the query does not branch and uses only child axes. These hardness results already hold when the alphabet size is fixed. Validity with respect to a DTD is proved to be EXPTIME-hard already when the query only uses child axes and is allowed to branch only once.

  • 25.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Martens, Wim
    Universität Bayreuth.
    Timm, Thomas
    Universität Bayreuth.
    Efficient incremental evaluation of succinct regular expressions2015In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2015, p. 1541-1550Conference paper (Refereed)
    Abstract [en]

    Regular expressions are omnipresent in database applications. They form the structural core of schema languages for XML, they are a fundamental ingredient for navigational queries in graph databases, and are being considered in languages for upcoming technologies such as schema- and transformation languages for tabular data on the Web. In this paper we study the usage and effectiveness of the counting operator (or: limited repetition) in regular expressions. The counting operator is a popular extension which is part of the POSIX standard and therefore also present in regular expressions in grep, Java, Python, Perl, and Ruby. In a database context, expressions with counting appear in XML Schema and languages for querying graphs such as SPARQL 1.1 and Cypher.

    We first present a practical study that suggests that counters are extensively used in practice. We then investigate evaluation methods for such expressions and develop a new algorithm for efficient incremental evaluation. Finally, we conduct an extensive benchmark study that shows that exploiting counting operators can lead to speed-ups of several orders of magnitude in a wide range of settings: normal and incremental evaluation on synthetic and real expressions.

  • 26.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schuster, Martin
    Technische Universität Dortmund.
    Schwentick, Thomas
    Technische Universität Dortmund.
    Kulbatzki, Joscha
    SCISYS Deutschland GmbH.
    On optimum left-to-right strategies for active context-free games2013In: ICDT '13 Proceedings of the 16th International Conference on Database Theory, ACM Press, 2013, p. 105-116Conference paper (Refereed)
    Abstract [en]

    Active context-free games are two-player games on strings over finite alphabets with one player trying to rewrite the input string to match a target specification. These games have been investigated in the context of exchanging Active XML (AXML) data. While it was known that the rewriting problem is undecidable in general, it is shown here that it is EXPSPACE-complete to decide for a given contextfree game, whether all safely rewritable strings can be safely rewritten in a left-to-right manner, a problem that was previously considered by Abiteboul et al. Furthermore, it is shown that the corresponding problem for games with finite replacement languages is EXPTIME-complete.

  • 27.
    Björklund, Henrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Schwentick, Thomas
    On notions of regularity for data languages2010In: Theoretical Computer Science, ISSN 0304-3975, E-ISSN 1879-2294, Vol. 411, no 4-5, p. 702-715Article in journal (Refereed)
    Abstract [en]

    With motivation from considerations in XML database theory and model checking, data strings have been introduced as an extension of finite alphabet strings which carry, at each position, a symbol and a data value from an infinite domain. Previous work has shown that it is difficult to come up with an expressive yet decidable automaton model for data languages. Recently, such a model, data automata, was introduced. This paper introduces a simpler but equivalent model and investigates its expressive power, algorithmic and closure properties, and some extensions.

  • 28.
    Woldemariam, Yonas Demeke
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Bensch, Suna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Predicting User Competence from Linguistic Data2017In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017) / [ed] Sivaji Bandyopadhyay, Jadavpur University , 2017, p. 476-484Conference paper (Refereed)
    Abstract [en]

    We investigate the problem of predicting the competence of users of the crowd-sourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users to classify. Our research focuses on the Zoonivers Galaxy Zoo project, where users classify the images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (regarding accuracy and F-measure) with two different but related domain datasets. The performance of the classifiers varies across the feature set configurations designed during the training phase. A challenging part of this research is to compute the competence of the users without ground truth data available. We implemented a tool that estimates the proficiency of users and annotates their text with computed competence. Our evaluation results show that the trained classifier models give results that are significantly better than chance and can be deployed for other crowd-sourcing projects as well. 

  • 29.
    Woldemariam, Yonas Demeke
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Björklund, Henrik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Bensch, Suna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Predicting User Competence from Linguistic Data2017In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), NLP Association of India , 2017, p. 476-484Conference paper (Refereed)
    Abstract [en]

    We investigate the problem of predicting the competence of users of the crowdsourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users to classify. Our research focuses on the Zoonivers Galaxy Zoo project, where users classify the images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (regarding accuracy and F-measure) with two different but related domain datasets. The performance of the classifiers varies across the feature set configurations designed during the training phase. A challenging part of this research is to compute the competence of the users without ground truth data available. We implemented a tool that estimates the proficiency of users and annotates their text with computed competence. Our evaluation results show that the trained classifier models give results that are significantly better than chance and can be deployed for other crowd-sourcing projects as well.

1 - 29 of 29
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf