Comparison of Methods for Feature Selection in Clustering of High-Dimensional RNA-Sequencing Data to Identify Cancer Subtypes
Umeå University, Faculty of Social Sciences, Umeå School of Business, Economics and Statistics. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. ORCID iD: 0000-0003-2386-930x
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. Umeå University, Faculty of Medicine, Department of Radiation Sciences, Oncology.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2021 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 12, article id 632620. Article in journal (Refereed). Published.
Abstract [en]

Cancer subtype identification is important to facilitate cancer diagnosis and select effective treatments. Clustering of cancer patients based on high-dimensional RNA-sequencing data can be used to detect novel subtypes, but only a subset of the features (e.g., genes) contains information related to the cancer subtype. Therefore, it is reasonable to assume that the clustering should be based on a set of carefully selected features rather than all features. Several feature selection methods have been proposed, but how and when to use these methods is still poorly understood. Thirteen feature selection methods were evaluated on four human cancer data sets, all with known subtypes (gold standards), which were only used for evaluation. The methods were characterized by considering the mean expression and standard deviation (SD) of the selected genes, the overlap with other methods, and their clustering performance, obtained by comparing the clustering result with the gold standard using the adjusted Rand index (ARI). The results were compared to a supervised approach as a positive control and two negative controls in which either a random selection of genes or all genes were included. For all data sets, the best feature selection approach outperformed the negative control, and for two data sets the gain was substantial, with ARI increasing from (−0.01, 0.39) to (0.66, 0.72), respectively. No feature selection method completely outperformed the others, but using the dip-test statistic to select 1000 genes was overall a good choice. The commonly used approach, where genes with the highest SDs are selected, did not perform well in our study.
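
As a rough illustration of two ideas named in the abstract, the sketch below computes the adjusted Rand index between a clustering and a gold standard, and selects the genes with the highest SDs (the common baseline that the abstract reports as performing poorly). It is not the authors' code: the function names `adjusted_rand_index` and `top_sd_genes`, the dictionary layout of the expression data, and the toy values are assumptions made for this example only.

```python
from collections import Counter
from math import comb
from statistics import stdev


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two partitions of the same samples (1.0 = identical)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total = comb(len(labels_true), 2)  # number of sample pairs
    if total == 0:
        return 1.0  # fewer than two samples: treated as perfect agreement
    # Pairs of samples placed together in both partitions.
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (index - expected) / (max_index - expected)


def top_sd_genes(expression, k):
    """Return the k gene names with the largest sample standard deviation.

    `expression` maps a gene name to its expression values across samples
    (a hypothetical layout chosen for this sketch).
    """
    sds = {gene: stdev(values) for gene, values in expression.items()}
    return sorted(sds, key=sds.get, reverse=True)[:k]


# Toy data, invented for illustration only.
toy_expression = {
    "g1": [1.0, 1.0, 1.0, 1.0],
    "g2": [0.0, 10.0, 0.0, 10.0],
    "g3": [4.0, 5.0, 4.0, 5.0],
}
selected = top_sd_genes(toy_expression, 2)
ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, relabeled
```

The ARI is invariant to how clusters are labeled, which is why it suits comparing an unsupervised clustering with known subtypes. A value near 0 indicates agreement no better than chance, and negative values are possible.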

Place, publisher, year, edition, pages
Frontiers Media S.A., 2021. Vol. 12, article id 632620
Keywords [en]
cancer subtypes, feature selection, gene selection, high-dimensional, RNA-seq
National subject category
Probability Theory and Statistics; Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:umu:diva-181727
DOI: 10.3389/fgene.2021.632620
ISI: 000626903100001
Scopus ID: 2-s2.0-85102373666
OAI: oai:DiVA.org:umu-181727
DiVA, id: diva2:1539593
Available from: 2021-03-24. Created: 2021-03-24. Last updated: 2023-09-05. Bibliographically approved.

Authors

Källberg, David; Vidman, Linda; Rydén, Patrik
