Topic modelling of Ukraine war-related news using latent Dirichlet allocation with collapsed Gibbs sampling
Umeå University, Faculty of Science and Technology, Department of Computing Science. National Technical University “Kharkiv Polytechnic Institute”, Kyrpychova str. 2, Kharkiv, Ukraine. ORCID iD: 0000-0002-9826-0286
Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, Ukraine.
Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, Ukraine.
Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, Ukraine.
2024 (English). In: ISW-CoLInS 2024. Intelligent Systems Workshop at CoLInS 2024: Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Systems. Volume III: Intelligent Systems Workshop, CEUR-WS, 2024, p. 1-15. Conference paper, Published paper (Refereed)
Abstract [en]

This research applies topic modeling to news about the war in Ukraine. The objective is to use Latent Dirichlet Allocation (LDA) with collapsed Gibbs sampling to identify distinct content groups in war-related news. The method involves scraping data from a Ukrainian news website, preprocessing it, and applying LDA with collapsed Gibbs sampling to infer the latent topics within the corpus. The results include twelve distinct topics and the keywords that characterize each of them. Analysis of the results provides insight into the context of each topic, such as discussions of safety measures during wartime, consequences of military actions, and reports on military casualties. The research concludes that LDA with collapsed Gibbs sampling is a valuable tool for identifying and understanding the context of war-related news. However, the model's results may diverge from human interpretation, possibly because of limitations in the results, the choice of model parameters, and the presence of noisy data. Future research should focus on optimizing model parameters, filtering noisy data, and improving the analysis of topic context to enhance the reliability and interpretability of the results.
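
The inference step named in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for LDA. This is not the authors' implementation. The function names (`lda_collapsed_gibbs`, `top_words`), the values of the Dirichlet hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for illustration. Each token's topic is resampled from a conditional proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the counts exclude the token being resampled.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Sketch of LDA inference by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (n_dk, n_kw, vocab), where
    n_dk[d][k] counts tokens of document d assigned to topic k and
    n_kw[k][v] counts assignments of vocabulary word v to topic k.
    alpha, beta, and iters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                       # tokens per topic
    z = []                                       # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in n_kw
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus; it is not drawn from the paper's data.
    toy_docs = [
        ["shelter", "alarm", "shelter", "siren"],
        ["alarm", "siren", "shelter"],
        ["report", "losses", "report", "front"],
        ["front", "losses", "report"],
    ]
    _, topic_word, vocab = lda_collapsed_gibbs(toy_docs, num_topics=2)
    for k, words in enumerate(top_words(topic_word, vocab)):
        print(k, words)
```

In practice the keywords per topic come from the topic-word counts, as `top_words` does here. Per the abstract, the study reports twelve topics, which would correspond to `num_topics=12` in this sketch.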

Place, publisher, year, edition, pages
CEUR-WS, 2024. p. 1-15
Series
CEUR workshop proceedings, ISSN 1613-0073 ; 3688
Keywords [en]
Latent Dirichlet Allocation, Topic modeling, Ukraine war
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-225937
Scopus ID: 2-s2.0-85195141693
OAI: oai:DiVA.org:umu-225937
DiVA, id: diva2:1868976
Conference
8th International Conference on Computational Linguistics and Intelligent Systems, Lviv, Ukraine, April 12-13, 2024
Available from: 2024-06-12. Created: 2024-06-12. Last updated: 2024-07-22. Bibliographically approved.

Open Access in DiVA

fulltext (1321 kB), 133 downloads
File information
File name: FULLTEXT01.pdf
File size: 1321 kB
Checksum (SHA-512):
fccaedf833892a057a75d942b83196b0086766f661f2a82c0517385cf9fabd272b70af90ad646f1ddeaa1fa3bba0cb3c2fc8c0777f5d2966b00aa4cae4c2eb99
Type: fulltext
Mimetype: application/pdf

Other links

Scopus
CEUR workshop proceedings 3688

Authority records

Khairova, Nina
