Umeå University's logo

umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Topic Modeling for Customer Insights: A Comparative Analysis of LDA and BERTopic in Categorizing Customer Calls
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2023 (English)Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Customer calls serve as a valuable source of feedback for financial service providers, potentially containing a wealth of unexplored insights into customer questions and concerns. However, these call data are typically unstructured and challenging to analyze effectively. This thesis project focuses on leveraging Topic Modeling techniques, a sub-field of Natural Language Processing, to extract meaningful customer insights from recorded customer calls to a European financial service provider. The objective of the study is to compare two widely used Topic Modeling algorithms, Latent Dirichlet Allocation (LDA) and BERTopic, in order to categorize and analyze the content of the calls. By leveraging the power of these algorithms, the thesis aims to provide the company with a comprehensive understanding of customer needs, preferences, and concerns, ultimately facilitating more effective decision-making processes. 

Through a literature review and dataset analysis, i.e., pre-processing to ensure data quality and consistency, the two algorithms, LDA and BERTopic, are applied to extract latent topics. The performance is then evaluated using quantitative and qualitative measures, i.e., perplexity and coherence scores as well as in- terpretability and usefulness of topic quality. The findings contribute to knowledge on Topic Modeling for customer insights and enable the company to improve customer engagement, satisfaction and tailor their customer strategies. 

The results show that LDA outperforms BERTopic in terms of topic quality and business value. Although BERTopic demonstrates a slightly better quantitative performance, LDA aligns much better with human interpretation, indicating a stronger ability to capture meaningful and coherent topics within company’s customer call data.

Place, publisher, year, edition, pages
2023. , p. s. 47
Keywords [en]
Customer Insights, Natural Language Processing, Topic Modeling, Latent Dirichlet Allocation, BERTopic
National Category
Mathematics
Identifiers
URN: urn:nbn:se:umu:diva-209225OAI: oai:DiVA.org:umu-209225DiVA, id: diva2:1763637
External cooperation
European Financial Service Provider
Educational program
Master of Science in Engineering and Management
Supervisors
Examiners
Available from: 2023-06-14 Created: 2023-06-07 Last updated: 2023-06-14Bibliographically approved

Open Access in DiVA

Axelborn_Berggren(6247 kB)6444 downloads
File information
File name FULLTEXT01.pdfFile size 6247 kBChecksum SHA-512
902faa4da981477a8704687762040f00f0927c569ca2acb52c0baa8957f5073ddaf104f5de876b34dfd458ea7ef2309972f26ce80ba1f9df164ff3be79a8f482
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Axelborn, HenrikBerggren, John
By organisation
Department of Mathematics and Mathematical Statistics
Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 6449 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 5310 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf