Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. ORCID iD: 0000-0002-9040-6674
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. ORCID iD: 0000-0003-1591-5716
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics. ORCID iD: 0000-0003-1098-0076
2023 (English). In: Computational Statistics & Data Analysis, ISSN 0167-9473, E-ISSN 1872-7352, Vol. 177, article id 107583. Article in journal (Refereed). Published.
Abstract [en]

Nonparametric bagging clustering methods are studied and compared for identifying latent structures in a sequence of dependent categorical data observed along a one-dimensional (discrete) time domain. The frequency of the observed categories is assumed to be generated by a (slowly varying) latent signal, according to latent state-specific probability distributions. Within a bagging framework, the methods use random tessellations (partitions) of the time domain and cluster the category frequencies of the observed data in the tessellation cells to recover the latent signal. New and existing ways of generating the tessellations and of clustering are discussed and combined into different bagging clustering methods. Edge tessellations and adaptive tessellations are the newly proposed ways of forming partitions. Composite methods are also introduced, which use (automated) decision rules based on entropy measures to choose among the proposed bagging clustering methods. The performance of all the methods is compared in a simulation study. The simulation study shows that local and global entropy measures are powerful tools for improving the recovery of the latent signal, both via the adaptive tessellation strategies (local entropy) and in designing composite methods (global entropy). The composite methods are robust and improve overall performance, in particular the composite method using adaptive (edge) tessellations.
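
The general idea of bagging over random tessellations can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`random_tessellation`, `cell_frequencies`, `kmeans`, `bagging_cluster`, `local_entropy`) is hypothetical. The sketch assumes uniformly random cut points (not the edge or adaptive tessellations proposed in the article), plain k-means on the cell frequencies, label alignment against the first replicate, and a Shannon entropy of the per-time-point vote proportions as a stand-in for a local entropy measure; the article's own definitions may differ.

```python
import itertools
import math
import random


def random_tessellation(n, n_cells, rng):
    """Cut time points 0..n-1 into n_cells contiguous cells at uniform random cut points."""
    cuts = sorted(rng.sample(range(1, n), n_cells - 1))
    bounds = [0] + cuts + [n]
    return list(zip(bounds[:-1], bounds[1:]))


def cell_frequencies(seq, cells, n_cat):
    """Relative frequency of each category inside every cell."""
    freqs = []
    for start, stop in cells:
        counts = [0] * n_cat
        for x in seq[start:stop]:
            counts[x] += 1
        freqs.append([c / (stop - start) for c in counts])
    return freqs


def kmeans(points, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first initialisation; one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bagging_cluster(seq, n_cat, k, n_bags=50, n_cells=20, seed=0):
    """Vote over many random tessellations (an assumed, simplified scheme)."""
    rng = random.Random(seed)
    n = len(seq)
    votes = [[0] * k for _ in range(n)]
    reference = None
    for _ in range(n_bags):
        cells = random_tessellation(n, n_cells, rng)
        cell_labels = kmeans(cell_frequencies(seq, cells, n_cat), k, rng)
        # Every time point inherits the cluster label of its cell.
        labels = [0] * n
        for (start, stop), lab in zip(cells, cell_labels):
            labels[start:stop] = [lab] * (stop - start)
        if reference is None:
            reference = labels
        else:
            # Resolve label switching: relabel to agree best with the first replicate.
            best = max(itertools.permutations(range(k)),
                       key=lambda perm: sum(perm[a] == b for a, b in zip(labels, reference)))
            labels = [best[a] for a in labels]
        for t, lab in enumerate(labels):
            votes[t][lab] += 1
    props = [[v / n_bags for v in row] for row in votes]
    estimate = [max(range(k), key=row.__getitem__) for row in props]
    return estimate, props


def local_entropy(props):
    """Shannon entropy of the vote proportions at each time point (illustrative)."""
    return [-sum(p * math.log(p) for p in row if p > 0) for row in props]
```

Calling `bagging_cluster(seq, n_cat, k)` on a list of integer category codes returns a majority-vote state estimate and the vote proportions per time point. Under the assumptions of this sketch, high values of `local_entropy(props)` mark time points where the replicates disagree, which is typically near changes in the latent signal. The brute-force permutation matching is only practical for a small number of clusters `k`.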

Place, publisher, year, edition, pages
Elsevier, 2023. Vol. 177, article id 107583
Keywords [en]
Bagging methods, Categorical dependent data, Clustering, Entropy
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
URN: urn:nbn:se:umu:diva-198931
DOI: 10.1016/j.csda.2022.107583
ISI: 000930488900007
Scopus ID: 2-s2.0-85135796679
OAI: oai:DiVA.org:umu-198931
DiVA, id: diva2:1696677
Funder
Swedish Research Council, 340-2013-5203
Available from: 2022-09-19. Created: 2022-09-19. Last updated: 2024-08-15. Bibliographically approved.

Authority records

Abramowicz, Konrad; Sjöstedt de Luna, Sara; Strandberg, Johan