Publications (10 of 41)
Berglund, M., Björklund, H. & Björklund, J. (2024). Parsing unranked tree languages, folded once. Algorithms, 17(6), Article ID 268.
2024 (English). In: Algorithms, E-ISSN 1999-4893, Vol. 17, no. 6, article ID 268. Journal article (peer-reviewed). Published
Abstract [en]

A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges (i.e., folds) selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied, but the order among the edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
graphs, transducers, trees, vector addition systems
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-227569 (URN); 10.3390/a17060268 (DOI); 2-s2.0-85196886791 (Scopus ID)
Funder
Vetenskapsrådet, 2020-03852; Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut och Alice Wallenbergs Stiftelse
Note

This paper is an extended version of a paper published at the International Symposium on Fundamentals of Computation Theory (FCT 2023), Trier, Germany, 18–21 September 2023.

Available from: 2024-07-02 Created: 2024-07-02 Last updated: 2024-07-02. Bibliographically approved
Björklund, H., Björklund, J. & Ericson, P. (2024). Tree-based generation of restricted graph languages. International Journal of Foundations of Computer Science, 35(1 & 2), 215-243
2024 (English). In: International Journal of Foundations of Computer Science, ISSN 0129-0541, Vol. 35, no. 1 & 2, pp. 215-243. Journal article (peer-reviewed). Published
Abstract [en]

Order-preserving DAG grammars (OPDGs) are a formalism for representing languages of structurally restricted graphs. As demonstrated in [17], they are sufficiently expressive to model abstract meaning representations in natural language processing, a graph-based form of semantic representation in which nodes encode objects and edges encode relations. At the same time, they can be parsed in time O(n^2 + nm), where m and n are the sizes of the grammar and the input graph, respectively. In this work, we provide an initial algebra semantics for OPDGs, which allows us to view them as regular tree grammars under an equivalence theory. This makes it possible to transfer results from the field of formal tree languages to the domain of OPDGs, in both the unweighted and the weighted case. In particular, we show that deterministic OPDGs can be minimised efficiently, and that they are learnable under the “minimal adequate teacher” paradigm, that is, by querying an oracle for equivalence between languages and membership of individual graphs. To conclude, we demonstrate that the languages generated by OPDGs are definable in monadic second-order logic.

Place, publisher, year, edition, pages
World Scientific, 2024
Keywords
Graph languages, logic characterisation, MAT learning, minimization
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-217981 (URN); 10.1142/S0129054123480106 (DOI); 001109806500001 (); 2-s2.0-85178101785 (Scopus ID)
Funder
Vetenskapsrådet, 2020-03852; Wallenberg AI, Autonomous Systems and Software Program (WASP), Nest project Sting
Available from: 2023-12-15 Created: 2023-12-15 Last updated: 2024-05-14. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2024). We don’t talk about that: case studies on intersectional analysis of social bias in large language models. In: Agnieszka Faleńska; Christine Basta; Marta Costa-jussà; Seraphina Goldfarb-Tarrant; Debora Nozza (Eds.), Proceedings of the 5th workshop on gender bias in natural language processing (GeBNLP). Paper presented at Workshop on Gender Bias in Natural Language Processing (GeBNLP), Bangkok, Thailand, 16th August, 2024 (pp. 33-44). Association for Computational Linguistics
2024 (English). In: Proceedings of the 5th workshop on gender bias in natural language processing (GeBNLP) / [ed] Agnieszka Faleńska; Christine Basta; Marta Costa-jussà; Seraphina Goldfarb-Tarrant; Debora Nozza, Association for Computational Linguistics, 2024, pp. 33-44. Conference paper, published paper (peer-reviewed)
Abstract [en]

Despite concerns that Large Language Models (LLMs) are vectors for reproducing and amplifying social biases such as sexism, transphobia, islamophobia, and racism, there is a lack of work qualitatively analyzing how such patterns of bias are generated by LLMs. We use mixed-methods approaches and apply a feminist, intersectional lens to the problem across two language domains, Swedish and English, by generating narrative texts using LLMs. We find that hegemonic norms are consistently reproduced; dominant identities are often treated as ‘default’; and discussion of identity itself may be considered ‘inappropriate’ by the safety features applied to some LLMs. Due to the differing behaviors of models, depending both on their design and the language they are trained on, we observe that strategies of identifying “bias” must be adapted to individual models and their socio-cultural contexts.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2024
National subject category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-228891 (URN); 979-8-89176-137-7 (ISBN)
Conference
Workshop on Gender Bias in Natural Language Processing (GeBNLP), Bangkok, Thailand, 16th August, 2024.
Available from: 2024-08-29 Created: 2024-08-29 Last updated: 2024-08-29. Bibliographically approved
Björklund, H. & Devinney, H. (2023). Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish. In: Proceedings of the third workshop on language technology for equality, diversity, inclusion. Paper presented at Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023, Varna, Bulgaria, September 7, 2023 (pp. 54-61). The Association for Computational Linguistics
2023 (English). In: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics, 2023, pp. 54-61. Conference paper, published paper (peer-reviewed)
Abstract [en]

Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models both for tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline.

Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.

Place, publisher, year, edition, pages
The Association for Computational Linguistics, 2023
Keywords
Part-of-Speech, gendered pronouns, neopronouns
National subject category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-213782 (URN); 10.26615/978-954-452-084-7_008 (DOI); 2-s2.0-85184990283 (Scopus ID); 978-954-452-084-7 (ISBN)
Conference
Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023, Varna, Bulgaria, September 7, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-09-26 Created: 2023-09-26 Last updated: 2024-02-27. Bibliographically approved
Berglund, M., Björklund, H. & Björklund, J. (2023). Parsing unranked tree languages, folded once. In: Henning Fernau; Klaus Jansen (Eds.), Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings. Paper presented at 24th International Symposium on Fundamentals of Computation Theory, FCT 2023, Trier, Germany, September 18–21, 2023 (pp. 60-73). Springer Nature
2023 (English). In: Fundamentals of computation theory: 24th International Symposium, FCT 2023, Trier, Germany, September 18–21, 2023, Proceedings / [ed] Henning Fernau; Klaus Jansen, Springer Nature, 2023, pp. 60-73. Conference paper, published paper (peer-reviewed)
Abstract [en]

A regular unranked tree folding consists of a regular unranked tree language and a folding operation that merges, i.e., folds, selected nodes of a tree to form a graph; the combination is a formal device for representing graph languages. If, in the process of folding, the order among edges is discarded so that the result is an unordered graph, then two applications of a fold operation are enough to make the associated parsing problem NP-complete. However, if the order is kept, then the problem is solvable in non-uniform polynomial time. In this paper, we address the remaining case, where only one fold operation is applied but the order among edges is discarded. We show that, under these conditions, the problem is solvable in non-uniform polynomial time.

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14292
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-215936 (URN); 10.1007/978-3-031-43587-4_5 (DOI); 2-s2.0-85174590997 (Scopus ID); 9783031435867 (ISBN)
Conference
24th International Symposium on Fundamentals of Computation Theory, FCT 2023, Trier, Germany, September 18–21, 2023
Available from: 2023-11-02 Created: 2023-11-02 Last updated: 2023-11-02. Bibliographically approved
Berglund, M., Björklund, H., Björklund, J. & Boiret, A. (2023). Transduction from trees to graphs through folding. Information and Computation, 295, Article ID 105111.
2023 (English). In: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 295, article ID 105111. Journal article (peer-reviewed). Published
Abstract [en]

We introduce a fold operation that realises a tree-to-graph transduction by merging selected nodes in the input tree to form a possibly cyclic output graph. The work is motivated by the increasing use of graph-based representations in semantic parsing. We show that a suitable class of graph languages can be generated by applying the fold operation to regular unranked tree languages. We investigate two versions of the fold operation, one that preserves a depth-first ordering between the edges, and one that does not. Finally, we demonstrate that the associated non-uniform membership problem is solvable in polynomial time for the order-preserving version, and NP-complete for the order-cancelling one.

Place, publisher, year, edition, pages
Elsevier, 2023
Keywords
Graphs, Semantic representations, Transducers, Trees
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-216195 (URN); 10.1016/j.ic.2023.105111 (DOI); 2-s2.0-85175145940 (Scopus ID)
Funder
Vetenskapsrådet, 2020-03852
Available from: 2023-11-08 Created: 2023-11-08 Last updated: 2023-11-08. Bibliographically approved
Björklund, H. & Devinney, H. (2022). Improving Swedish part-of-speech tagging for hen. Paper presented at Swedish Language Technology Conference 2022, Stockholm, Sweden, November 23-25, 2022.
2022 (English). Conference paper, oral presentation only (peer-reviewed)
Abstract [en]

Despite the fact that the gender-neutral pronoun hen was officially added to the Swedish language in 2014, state-of-the-art part-of-speech taggers still routinely fail to identify it as a pronoun. We retrain both efselab and spaCy models with augmented (semi-synthetic) data, where instances of gendered pronouns are replaced by hen to correct for the lack of representation in the original training data. Our results show that adding such data works to correct for the disparity in performance.

Keywords
Part-of-Speech, gendered pronouns, neopronouns
National subject category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-201268 (URN)
Conference
Swedish Language Technology Conference 2022, Stockholm, Sweden, November 23-25, 2022
Available from: 2022-11-24 Created: 2022-11-24 Last updated: 2022-11-28. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2022). Theories of gender in natural language processing. In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22). Paper presented at ACM FAccT Conference 2022, Conference on Fairness, Accountability, and Transparency, Hybrid via Seoul, South Korea, June 21-24, 2022 (pp. 2083-2102). Association for Computing Machinery (ACM)
2022 (English). In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22), Association for Computing Machinery (ACM), 2022, pp. 2083-2102. Conference paper, published paper (peer-reviewed)
Abstract [en]

The rise of concern around Natural Language Processing (NLP) technologies containing and perpetuating social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how “gender” is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g. through definitions of terms) and implicitly (e.g. through how gender is operationalized in practice). In order to get a better idea of emerging trajectories of thought, we split these articles into two sections by time.

We find that the majority of the articles do not make their theorization of gender explicit, even if they clearly define “bias.” Almost none use a model of gender that is intersectional or inclusive of nonbinary genders; and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and experience of trans, nonbinary, and intersex people. There is an increase between the two time-sections in statements acknowledging that gender is a complicated reality; however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work, and to incorporate theory and methodology from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
natural language processing, gender bias, gender studies
National subject category
Language Technology (Computational Linguistics); Gender Studies
Research subject
computer science; gender studies
Identifiers
urn:nbn:se:umu:diva-194742 (URN); 10.1145/3531146.3534627 (DOI); 2-s2.0-85133018925 (Scopus ID); 9781450393522 (ISBN)
Conference
ACM FAccT Conference 2022, Conference on Fairness, Accountability, and Transparency, Hybrid via Seoul, South Korea, June 21-24, 2022
Note

Alternative title: "Theories of 'Gender' in NLP Bias Research"

Available from: 2022-05-16 Created: 2022-05-16 Last updated: 2024-08-27. Bibliographically approved
Björklund, H., Drewes, F., Ericson, P. & Starke, F. (2021). Uniform Parsing for Hyperedge Replacement Grammars. Journal of computer and system sciences (Print), 118, 1-27
2021 (English). In: Journal of computer and system sciences (Print), ISSN 0022-0000, E-ISSN 1090-2724, Vol. 118, pp. 1-27. Journal article (peer-reviewed). Published
Abstract [en]

It is well known that hyperedge-replacement grammars can generate NP-complete graph languages even under seemingly harsh restrictions. This means that the parsing problem is difficult even in the non-uniform setting, in which the grammar is considered to be fixed rather than being part of the input. Little is known about restrictions under which truly uniform polynomial parsing is possible. In this paper we propose a low-degree polynomial-time algorithm that solves the uniform parsing problem for a restricted type of hyperedge-replacement grammars which we expect to be of interest for practical applications.

Place, publisher, year, edition, pages
Elsevier, 2021
Keywords
parsing, graph language, graph grammar, abstract meaning representation
National subject category
Computer Science
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-177125 (URN); 10.1016/j.jcss.2020.10.002 (DOI); 000615930900001 (); 2-s2.0-85097717738 (Scopus ID)
Available from: 2020-11-29 Created: 2020-11-29 Last updated: 2023-09-05. Bibliographically approved
Devinney, H., Björklund, J. & Björklund, H. (2020). Crime and Relationship: Exploring Gender Bias in NLP Corpora. Paper presented at SLTC 2020 – The Eighth Swedish Language Technology Conference, 25–27 November 2020, Online.
2020 (English). Conference paper, published paper (peer-reviewed)
Abstract [en]

Gender bias in natural language processing (NLP) tools, deriving from implicit human bias embedded in language data, is an important and complicated problem on the road to fair algorithms. We leverage topic modeling to retrieve documents associated with particular gendered categories, and discuss how exploring these documents can inform our understanding of the corpora we may use to train NLP tools. This is a starting point for challenging the systemic power structures and producing a justice-focused approach to NLP.

Keywords
gender bias, topic modeling
National subject category
Language Technology (Computational Linguistics); Gender Studies
Research subject
computer science; gender studies
Identifiers
urn:nbn:se:umu:diva-177583 (URN)
Conference
SLTC 2020 – The Eighth Swedish Language Technology Conference, 25–27 November 2020, Online
Project
EQUITBL
Available from: 2020-12-14 Created: 2020-12-14 Last updated: 2021-01-14. Bibliographically approved
Projects
Parameterized syntactic analysis for natural languages (Parametriserad syntaktisk analys för naturliga språk) [2011-06080_VR]; Umeå universitet
Identifiers
ORCID iD: orcid.org/0000-0002-4696-9787
