Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-4696-9787
Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS).
2023 (English). In: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics, 2023, p. 54-61. Conference paper, Published paper (Refereed)
Abstract [en]

Part-of-speech (POS) taggers for Swedish routinely fail on the third-person gender-neutral pronoun hen, even though it has been a well-established part of the Swedish language since at least 2014. Besides being a form of gender bias, this failure can harm other tasks that rely on POS information. In a case study, we demonstrate the usefulness of semi-synthetic augmented datasets by retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate the retrained models on both tag accuracy and a downstream task (dependency parsing) in a classical NLP pipeline.

Our results show that adding such data helps correct the disparity in performance. The accuracy for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity with gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also improves dependency parsing of sentences containing hen.
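One way to picture a semi-synthetic augmented dataset is pronoun substitution in already-tagged sentences. The sketch below is an illustration only, not the procedure used in the paper: the function name `augment_sentence`, the mapping `GENDERED_TO_HEN`, and the `PRON` tag label are all assumptions made for the example.

```python
# Hypothetical mapping from gendered subject/object pronouns to "hen".
# This is an assumption for illustration, not taken from the paper.
GENDERED_TO_HEN = {
    "han": "hen",
    "hon": "hen",
    "honom": "hen",
    "henne": "hen",
}


def augment_sentence(tagged, pronoun_tag="PRON"):
    """Swap gendered pronouns for "hen" in a list of (token, tag) pairs.

    Returns a new list, or None if the sentence contains no gendered
    pronoun carrying `pronoun_tag` (so nothing would be gained by adding it).
    """
    out = []
    changed = False
    for token, tag in tagged:
        replacement = GENDERED_TO_HEN.get(token.lower())
        if replacement is not None and tag == pronoun_tag:
            # Preserve sentence-initial capitalisation.
            if token[0].isupper():
                replacement = replacement.capitalize()
            out.append((replacement, tag))
            changed = True
        else:
            out.append((token, tag))
    return out if changed else None


if __name__ == "__main__":
    sentence = [("Hon", "PRON"), ("läser", "VERB"), ("boken", "NOUN")]
    print(augment_sentence(sentence))
```

Because the gold tags are copied over unchanged, each generated sentence gives the tagger a correctly labelled occurrence of hen in a natural context.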

Place, publisher, year, edition, pages
The Association for Computational Linguistics, 2023. p. 54-61
Keywords [en]
Part-of-Speech, gendered pronouns, neopronouns
National Category
Language Technology (Computational Linguistics)
Research subject
computational linguistics
Identifiers
URN: urn:nbn:se:umu:diva-213782
DOI: 10.26615/978-954-452-084-7_008
Scopus ID: 2-s2.0-85184990283
ISBN: 978-954-452-084-7 (print)
OAI: oai:DiVA.org:umu-213782
DiVA, id: diva2:1800286
Conference
Third Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2023) at RANLP 2023, Varna, Bulgaria, September 7, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-09-26 Created: 2023-09-26 Last updated: 2024-02-27 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Björklund, Henrik; Devinney, Hannah
