Publications (10 of 40)
Berglund, M., Björklund, H. & Björklund, J. (2024). Parsing unranked tree languages, folded once. Algorithms, 17(6), Article ID 268.
2024 (English). In: Algorithms, E-ISSN 1999-4893, Vol. 17, no 6, article id 268. Article in journal (Refereed), Published
Abstract [en]

A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
graphs, transducers, trees, vector addition systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-227569 (URN); 10.3390/a17060268 (DOI); 2-s2.0-85196886791 (Scopus ID)
Funder
Swedish Research Council, 2020-03852; Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation
Note

This paper is an extended version of a paper published in the International Symposium on Fundamentals of Computation Theory, Trier, Germany, 18–21 September 2023.

Available from: 2024-07-02. Created: 2024-07-02. Last updated: 2024-07-02. Bibliographically approved
Björklund, H., Björklund, J. & Ericson, P. (2024). Tree-based generation of restricted graph languages. International Journal of Foundations of Computer Science, 35(1 & 2), 215-243
2024 (English). In: International Journal of Foundations of Computer Science, ISSN 0129-0541, Vol. 35, no 1 & 2, p. 215-243. Article in journal (Refereed), Published
Abstract [en]

Order-preserving DAG grammars (OPDGs) are a formalism for representing languages of structurally restricted graphs. As demonstrated in [17], they are sufficiently expressive to model abstract meaning representations in natural language processing, a graph-based form of semantic representation in which nodes encode objects and edges relations. At the same time, they can be parsed in time O(n² + nm), where m and n are the sizes of the grammar and the input graph, respectively. In this work, we provide an initial algebra semantics for OPDGs, which allows us to view them as regular tree grammars under an equivalence theory. This makes it possible to transfer results from the field of formal tree languages to the domain of OPDGs, both in the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable under the "minimal adequate teacher" paradigm, that is, by querying an oracle for equivalence between languages, and membership of individual graphs. To conclude, we demonstrate that the languages generated by OPDGs are definable in monadic second-order logic.

Place, publisher, year, edition, pages
World Scientific, 2024
Keywords
Graph languages, logic characterisation, MAT learning, minimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-217981 (URN); 10.1142/S0129054123480106 (DOI); 001109806500001 (); 2-s2.0-85178101785 (Scopus ID)
Funder
Swedish Research Council, 2020-03852; Wallenberg AI, Autonomous Systems and Software Program (WASP), Nest project Sting
Available from: 2023-12-15. Created: 2023-12-15. Last updated: 2024-05-14. Bibliographically approved
Björklund, H. & Devinney, H. (2023). Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish. In: Proceedings of the third workshop on language technology for equality, diversity, inclusion. Paper presented at Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023, Varna, Bulgaria, September 7, 2023 (pp. 54-61). The Association for Computational Linguistics
2023 (English). In: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics, 2023, p. 54-61. Conference paper, Published paper (Refereed)
Abstract [en]

Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models both for tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline.

Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.

Place, publisher, year, edition, pages
The Association for Computational Linguistics, 2023
Keywords
Part-of-Speech, gendered pronouns, neopronouns
National Category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-213782 (URN); 10.26615/978-954-452-084-7_008 (DOI); 2-s2.0-85184990283 (Scopus ID); 978-954-452-084-7 (ISBN)
Conference
Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023, Varna, Bulgaria, September 7, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-09-26. Created: 2023-09-26. Last updated: 2024-02-27. Bibliographically approved
Berglund, M., Björklund, H. & Björklund, J. (2023). Parsing unranked tree languages, folded once. In: Henning Fernau; Klaus Jansen (Ed.), Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings. Paper presented at 24th International Symposium on Fundamentals of Computation Theory, FCT 2023, Trier, Germany, September 18–21, 2023 (pp. 60-73). Springer Nature
2023 (English). In: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, p. 60-73. Conference paper, Published paper (Refereed)
Abstract [en]

A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper we address the remaining case where only one fold operation is applied, but the order among edges is discarded. We show that under these conditions, the problem is solvable in non-uniform polynomial time.

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14292
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-215936 (URN); 10.1007/978-3-031-43587-4_5 (DOI); 2-s2.0-85174590997 (Scopus ID); 9783031435867 (ISBN)
Conference
24th International Symposium on Fundamentals of Computation Theory, FCT 2023, Trier, Germany, September 18–21, 2023
Available from: 2023-11-02. Created: 2023-11-02. Last updated: 2023-11-02. Bibliographically approved
Berglund, M., Björklund, H., Björklund, J. & Boiret, A. (2023). Transduction from trees to graphs through folding. Information and Computation, 295, Article ID 105111.
2023 (English). In: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, article id 105111. Article in journal (Refereed), Published
Abstract [en]

We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graph languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

Place, publisher, year, edition, pages
Elsevier, 2023
Keywords
Graphs, Semantic representations, Transducers, Trees
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-216195 (URN); 10.1016/j.ic.2023.105111 (DOI); 2-s2.0-85175145940 (Scopus ID)
Funder
Swedish Research Council, 2020-03852
Available from: 2023-11-08. Created: 2023-11-08. Last updated: 2023-11-08. Bibliographically approved
Björklund, H. & Devinney, H. (2022). Improving Swedish part-of-speech tagging for hen. Paper presented at Swedish Language Technology Conference 2022, Stockholm, Sweden, November 23-25, 2022.
2022 (English). Conference paper, Oral presentation only (Refereed)
Abstract [en]

Despite the fact that the gender-neutral pronoun hen was officially added to the Swedish language in 2014, state-of-the-art part-of-speech taggers still routinely fail to identify it as a pronoun. We retrain both efselab and spaCy models with augmented (semi-synthetic) data, where instances of gendered pronouns are replaced by hen to correct for the lack of representation in the original training data. Our results show that adding such data works to correct for the disparity in performance.

Keywords
Part-of-Speech, gendered pronouns, neopronouns
National Category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-201268 (URN)
Conference
Swedish Language Technology Conference 2022, Stockholm, Sweden, November 23-25, 2022
Available from: 2022-11-24. Created: 2022-11-24. Last updated: 2022-11-28. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2022). Theories of Gender in Natural Language Processing. In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22). Paper presented at ACM FAccT Conference 2022, Conference on Fairness, Accountability, and Transparency, Hybrid via Seoul, South Korea, June 21-24, 2022.
2022 (English). In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

The rise of concern around Natural Language Processing (NLP) technologies containing and perpetuating social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how “gender” is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g. through definitions of terms) and implicitly (e.g. through how gender is operationalized in practice). In order to get a better idea of emerging trajectories of thought, we split these articles into two sections by time.

We find that the majority of the articles do not make their theorization of gender explicit, even if they clearly define “bias.” Almost none use a model of gender that is intersectional or inclusive of nonbinary genders; and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and experience of trans, nonbinary, and intersex people. There is an increase between the two time-sections in statements acknowledging that gender is a complicated reality; however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work, and to incorporate theory and methodology from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.

Keywords
natural language processing, gender bias, gender studies
National Category
Language Technology (Computational Linguistics); Gender Studies
Research subject
Computer Science; gender studies
Identifiers
urn:nbn:se:umu:diva-194742 (URN); 10.1145/3531146.3534627 (DOI); 2-s2.0-85133018925 (Scopus ID)
Conference
ACM FAccT Conference 2022, Conference on Fairness, Accountability, and Transparency, Hybrid via Seoul, South Korea, June 21-24, 2022
Note

Alternative title: "Theories of 'Gender' in NLP Bias Research"

Available from: 2022-05-16 Created: 2022-05-16 Last updated: 2023-03-24
Björklund, H., Drewes, F., Ericson, P. & Starke, F. (2021). Uniform Parsing for Hyperedge Replacement Grammars. Journal of computer and system sciences (Print), 118, 1-27
2021 (English). In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 118, p. 1-27. Article in journal (Refereed), Published
Abstract [en]

It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

Place, publisher, year, edition, pages
Elsevier, 2021
Keywords
parsing, graph language, graph grammar, abstract meaning representation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-177125 (URN); 10.1016/j.jcss.2020.10.002 (DOI); 000615930900001 (); 2-s2.0-85097717738 (Scopus ID)
Available from: 2020-11-29. Created: 2020-11-29. Last updated: 2023-09-05. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2020). Crime and Relationship: Exploring Gender Bias in NLP Corpora. Paper presented at SLTC 2020 – The Eighth Swedish Language Technology Conference, 25–27 November 2020, Online.
2020 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Gender bias in natural language processing (NLP) tools, deriving from implicit human bias embedded in language data, is an important and complicated problem on the road to fair algorithms. We leverage topic modeling to retrieve documents associated with particular gendered categories, and discuss how exploring these documents can inform our understanding of the corpora we may use to train NLP tools. This is a starting point for challenging the systemic power structures and producing a justice-focused approach to NLP.

Keywords
gender bias, topic modeling
National Category
Language Technology (Computational Linguistics); Gender Studies
Research subject
Computer Science; gender studies
Identifiers
urn:nbn:se:umu:diva-177583 (URN)
Conference
SLTC 2020 – The Eighth Swedish Language Technology Conference, 25–27 November 2020, Online
Projects
EQUITBL
Available from: 2020-12-14. Created: 2020-12-14. Last updated: 2021-01-14. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2020). Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish. In: Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster (Ed.), Proceedings of the Second Workshop on Gender Bias in Natural Language Processing. Paper presented at GeBNLP2020, COLING'2020 – The 28th International Conference on Computational Linguistics, December 8-13, 2020, Online (pp. 79-92). Association for Computational Linguistics
2020 (English). In: Proceedings of the Second Workshop on Gender Bias in Natural Language Processing / [ed] Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster, Association for Computational Linguistics, 2020, p. 79-92. Conference paper, Published paper (Refereed)
Abstract [en]

Gender bias has been identified in many models for Natural Language Processing, stemming from implicit biases in the text corpora used to train the models. Such corpora are too large to closely analyze for biased or stereotypical content. Thus, we argue for a combination of quantitative and qualitative methods, where the quantitative part produces a view of the data of a size suitable for qualitative analysis. We investigate the usefulness of semi-supervised topic modeling for the detection and analysis of gender bias in three corpora (mainstream news articles in English and Swedish, and LGBTQ+ web content in English). We compare differences in topic models for three gender categories (masculine, feminine, and nonbinary or neutral) in each corpus. We find that in all corpora, genders are treated differently and that these differences tend to correspond to hegemonic ideas of gender.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020
Keywords
gender bias, topic modelling
National Category
Language Technology (Computational Linguistics); Gender Studies
Research subject
Computer Science; gender studies
Identifiers
urn:nbn:se:umu:diva-177576 (URN)
Conference
GeBNLP2020, COLING'2020 – The 28th International Conference on Computational Linguistics, December 8-13, 2020, Online
Projects
EQUITBL
Available from: 2020-12-14. Created: 2020-12-14. Last updated: 2021-01-14. Bibliographically approved
Projects
Parameterized Natural Language Parsing [2011-06080_VR]; Umeå University
Identifiers
ORCID iD: orcid.org/0000-0002-4696-9787
