Comparison of methods for variable selection in clustering of high-dimensional RNA-sequencing data to identify cancer subtypes
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
(English) Manuscript (preprint) (Other academic)
Keywords [en]
feature selection, clustering, RNA-seq, cancer
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-167264 OAI: oai:DiVA.org:umu-167264 DiVA, id: diva2:1385477
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-16 Bibliographically approved
In thesis
1. Cancer subtype identification using cluster analysis on high-dimensional omics data
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Identification and prediction of cancer subtypes are important parts of the development towards personalized medicine. By tailoring treatments, it is possible to decrease unnecessary suffering and reduce costs. Since the introduction of next-generation sequencing techniques, the amount of data available for medical research has increased rapidly. The high-dimensional omics data produced by various techniques require statistical methods to transform data into information and knowledge.

All papers in this thesis concern distinguishing disease subtypes in patients with cancer using omics data. The high dimension and complexity of sequencing data from tumor samples make it necessary to preprocess the data. We carry out comparisons of feature selection methods and clustering methods used for identification of cancer subtypes. In addition, we evaluate the effect that certain characteristics of the data have on the ability to identify cancer subtypes. The results show that no method outperforms the others in all cases and that the relative ranking of methods depends strongly on the data. We also show that the benefit of obtaining more homogeneous data by analyzing genders separately can outweigh the possible drawbacks caused by smaller sample sizes. One of the major challenges when dealing with omics data from tumor samples is that the patients are generally a very heterogeneous group. Factors that lead to heterogeneity include age, gender, ethnicity and stage of disease. The effect size of each of these factors might affect the ability to identify the subgroups of interest.

In omics data, the feature space is often large, and the number of features that are informative for the factors of interest also affects the complexity of the problem. We present a novel clustering approach that can identify different clusters in different subsets of the feature space, which is applied to methylation data to create new potential biomarkers. It is shown that by combining clinical data with methylation data for patients with clear cell renal carcinoma, it is possible to improve the currently used prediction model for disease progression.

Using unsupervised clustering techniques, we identify three molecular subtypes of prostate cancer bone metastases based on gene expression profiles. The robustness of the identified subtypes is confirmed by applying several clustering algorithms with very similar results.

 

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2020. p. 22
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 70/20
Keywords
cluster analysis, cancer, classification
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-167275 (URN) 978-91-7855-172-9 (ISBN) 978-91-7855-173-6 (ISBN)
Public defence
2020-02-07, N460, Naturvetarhuset, Umeå, 09:15 (English)
Opponent
Supervisors
Available from: 2020-01-17 Created: 2020-01-14 Last updated: 2021-10-19 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Person

Källberg, David; Vidman, Linda; Rydén, Patrik
