Dynamic topic modeling by clustering embeddings from pretrained language models: a research proposal
Eklund, Anton. Umeå University, Faculty of Science and Technology, Department of Computing Science; Adlede AB, Umeå, Sweden (Foundations of Language Processing). ORCID iD: 0000-0002-4366-7863
Forsman, Mona. Adlede AB, Umeå, Sweden. ORCID iD: 0000-0001-6601-5190
Drewes, Frank. Umeå University, Faculty of Science and Technology, Department of Computing Science (Foundations of Language Processing). ORCID iD: 0000-0001-7349-7693
2022 (English). In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop / [ed] Yan Hanqi, Yang Zonghan, Sebastian Ruder, Wan Xiaojun. Association for Computational Linguistics (ACL), 2022, p. 84-91. Conference paper, Published paper (Refereed).
Abstract [en]

A new trend in topic modeling research is to do Neural Topic Modeling by Clustering document Embeddings (NTM-CE) created with a pretrained language model. Studies have evaluated static NTM-CE models and found that they perform comparably to, or even better than, other topic models. An important extension of static topic modeling is making the models dynamic, which allows the study of topic evolution over time as well as the detection of emerging and disappearing topics. In this research proposal, we present two research questions aimed at understanding dynamic topic modeling with NTM-CE both theoretically and practically. To answer them, we propose four phases with the aim of establishing evaluation methods for dynamic topic modeling, finding NTM-CE-specific properties, and creating a framework for dynamic NTM-CE. For evaluation, we propose to use both quantitative measurements of coherence and human evaluation supported by our recently developed tool.
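The core NTM-CE idea described in the abstract — represent each document as an embedding vector and treat clusters of embeddings as topics — can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: a real system would obtain embeddings from a pretrained language model, whereas here hand-made 2-D toy vectors stand in for them so the sketch is self-contained, and the `kmeans` helper and all variable names are hypothetical.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization:
    repeatedly assign each vector to its nearest centroid, then move
    each centroid to the mean of its assigned vectors."""
    # Farthest-point init: start from the first vector, then greedily
    # add the vector farthest from all chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    assignments = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, v in enumerate(vectors):
            assignments[i] = min(range(k), key=lambda c: math.dist(v, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assignments[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Toy "document embeddings": two clearly separated groups, standing in
# for high-dimensional pretrained-LM embeddings of real documents.
docs = ["sports A", "sports B", "sports C", "finance A", "finance B"]
embeddings = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0], [0.1, 0.9], [0.0, 1.0]]

# Each cluster id is interpreted as one topic; documents sharing a
# cluster id are assigned to the same topic.
topics = kmeans(embeddings, k=2)
print(topics)  # → [0, 0, 0, 1, 1]
```

The dynamic extension the proposal targets would then track how such clusters drift, split, appear, or vanish as the document collection evolves over time.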

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2022, p. 84-91.
Keywords [en]
topic modeling, dynamic topic modeling, topic modeling evaluation, research proposal, pretrained language model
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:umu:diva-202486
DOI: 10.18653/v1/2022.aacl-srw.12
ISBN: 9781955917568 (electronic)
OAI: oai:DiVA.org:umu-202486
DiVA id: diva2:1725510
Conference
The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Online, November 21-24, 2022
Funder
Swedish Foundation for Strategic Research, ID19-0055
Available from: 2023-01-11. Created: 2023-01-11. Last updated: 2025-03-11. Bibliographically approved.

Open Access in DiVA

Fulltext (776 kB), 3 downloads
File information
File name: FULLTEXT01.pdf. File size: 776 kB. Checksum: SHA-512
48be2991ace331e4ff1679920b00181a74c1a67d84802eb57fd974f28ecfbb3141ca0a648559cc68b08c44cedc15d9290b71099c8df0409f91f9f01d536cf07b
Type: fulltext. Mimetype: application/pdf

Other links

Publisher's full text

Authority records

Eklund, Anton; Drewes, Frank

Search in DiVA

By author/editor
Eklund, Anton; Forsman, Mona; Drewes, Frank
By organisation
Department of Computing Science
Natural Language Processing

Search outside of DiVA

Google; Google Scholar
Total: 3 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 295 hits