Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish
Hannah Devinney: Umeå University, Faculty of Science and Technology, Department of Computing Science; Umeå University, Faculty of Social Sciences, Umeå Centre for Gender Studies (UCGS). (Foundations of Language Processing)
Jenny Björklund: Centre for Gender Research, Uppsala University. ORCID iD: 0000-0002-4954-4397
Henrik Björklund: Umeå University, Faculty of Science and Technology, Department of Computing Science. (Foundations of Language Processing) ORCID iD: 0000-0002-4696-9787
2020 (English). In: Proceedings of the Second Workshop on Gender Bias in Natural Language Processing / [ed] Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster, Association for Computational Linguistics, 2020, p. 79-92. Conference paper, Published paper (Refereed)
Abstract [en]

Gender bias has been identified in many models for Natural Language Processing, stemming from implicit biases in the text corpora used to train the models. Such corpora are too large to closely analyze for biased or stereotypical content. Thus, we argue for a combination of quantitative and qualitative methods, where the quantitative part produces a view of the data of a size suitable for qualitative analysis. We investigate the usefulness of semi-supervised topic modeling for the detection and analysis of gender bias in three corpora (mainstream news articles in English and Swedish, and LGBTQ+ web content in English). We compare differences in topic models for three gender categories (masculine, feminine, and nonbinary or neutral) in each corpus. We find that in all corpora, genders are treated differently and that these differences tend to correspond to hegemonic ideas of gender.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020. p. 79-92
Keywords [en]
gender bias, topic modelling
National Category
Language Technology (Computational Linguistics); Gender Studies
Research subject
Computer Science; gender studies
Identifiers
URN: urn:nbn:se:umu:diva-177576
OAI: oai:DiVA.org:umu-177576
DiVA, id: diva2:1509697
Conference
GeBNLP2020, COLING'2020 – The 28th International Conference on Computational Linguistics, December 8-13, 2020, Online
Projects
EQUITBL
Available from: 2020-12-14. Created: 2020-12-14. Last updated: 2021-01-14. Bibliographically approved

Open Access in DiVA

fulltext (435 kB), 319 downloads
File information
File name: FULLTEXT01.pdf
File size: 435 kB
Checksum (SHA-512): 75aa228350746b0e86210363c57b798ece033c2575b4d4847f67f75765a324b4e103ac97e9cae0ed67a4c4d527f81a31fcce08ad047894a644720db21a1da45c
Type: fulltext
Mimetype: application/pdf

Authority records

Devinney, Hannah; Björklund, Jenny; Björklund, Henrik
