Cancer subtype identification using cluster analysis on high-dimensional omics data
Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics.
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Identification and prediction of cancer subtypes are important parts of the development towards personalized medicine. By tailoring treatments, it is possible to decrease unnecessary suffering and reduce costs. Since the introduction of next-generation sequencing techniques, the amount of data available for medical research has increased rapidly. The high-dimensional omics data produced by various techniques require statistical methods that can transform data into information and knowledge.

All papers in this thesis concern distinguishing disease subtypes in patients with cancer using omics data. The high dimension and the complexity of sequencing data from tumor samples make it necessary to preprocess the data. We compare feature selection methods and clustering methods used for identification of cancer subtypes. In addition, we evaluate how certain characteristics of the data affect the ability to identify cancer subtypes. The results show that no method outperforms the others in all cases, and the relative ranking of the methods depends strongly on the data. We also show that the benefit of obtaining more homogeneous data by analyzing the sexes separately can outweigh the possible drawbacks caused by smaller sample sizes. One of the major challenges when dealing with omics data from tumor samples is that the patients are generally a very heterogeneous group. Factors that lead to heterogeneity include age, sex, ethnicity, and stage of disease. The effect size of each of these factors may affect the ability to identify the subgroups of interest.
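The paragraph above describes comparing feature selection and clustering methods by how well they recover known subtypes. The following minimal Python sketch shows one pipeline of that kind. It keeps the most variable features, clusters the samples with k-means, and scores the agreement with known subtype labels using the adjusted Rand index. Every specific choice here is an assumption made for illustration and is not the set of methods evaluated in the thesis: the variance filter, k-means with farthest-first initialization, the adjusted Rand index, the helper names, and the synthetic two-subtype data.

```python
import random
from collections import Counter
from math import comb


def top_variance_features(rows, k):
    """Indices of the k columns (features) with the highest variance."""
    n = len(rows)
    variances = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    return sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)[:k]


def kmeans(rows, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [rows[0]]
    while len(centers) < k:
        # next center: the point farthest from its nearest existing center
        centers.append(max(rows, key=lambda r: min(dist2(r, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # assignment step: nearest center for every sample
        new = [min(range(k), key=lambda c: dist2(r, centers[c])) for r in rows]
        if new == labels:
            break  # converged: assignments did not change
        labels = new
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings (1.0 = identical partitions)."""
    def pairs(counts):
        return sum(comb(c, 2) for c in counts)

    index = pairs(Counter(zip(truth, pred)).values())
    a = pairs(Counter(truth).values())
    b = pairs(Counter(pred).values())
    expected = a * b / comb(len(truth), 2)
    max_index = (a + b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (index - expected) / (max_index - expected)


# Hypothetical data: 20 samples, 2 subtypes that differ in 2 informative
# features, plus 8 low-variance noise features.
rng = random.Random(0)
truth = [0] * 10 + [1] * 10
data = [[rng.gauss(10.0 * t, 1.0) for _ in range(2)] + [rng.gauss(0.0, 0.5) for _ in range(8)]
        for t in truth]

selected = top_variance_features(data, 2)
reduced = [[row[j] for j in selected] for row in data]
labels = kmeans(reduced, 2)
print(sorted(selected), adjusted_rand_index(truth, labels))
```

To set up a comparison of this kind, swap either `top_variance_features` or `kmeans` for another selection or clustering method and keep the scoring step fixed. On real tumor data with known subtypes, the scores would be expected to vary between data sets.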

In omics data, the feature space is often large, and the number of features that are informative for the factors of interest also affects the complexity of the problem. We present a novel clustering approach that can identify different clusters in different subsets of the feature space, and we apply it to methylation data to create new potential biomarkers. We show that by combining clinical data with methylation data for patients with clear cell renal cell carcinoma, it is possible to improve the currently used prediction model for disease progression.

Using unsupervised clustering techniques, we identify three molecular subtypes of prostate cancer bone metastases based on gene expression profiles. The robustness of the identified subtypes is confirmed by applying several clustering algorithms, which give very similar results.
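A robustness check across algorithms, as described above, can be approached by running several clustering methods on the same data and measuring how well their partitions agree. The sketch below is a hypothetical illustration and not the analysis performed in the thesis. It runs naive agglomerative clustering with single, complete, and average linkage on synthetic data and compares the resulting partitions with the (unadjusted) Rand index. The function names, the data, and the choice of algorithms are assumptions.

```python
import random
from itertools import combinations


def agglomerative(rows, k, linkage="average"):
    """Naive agglomerative clustering: merge the closest pair of clusters until k remain."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # how the distance between two clusters is summarized from member distances
    combine = {"single": min, "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[linkage]

    def cluster_distance(p, q):
        return combine([dist(rows[a], rows[b]) for a in p for b in q])

    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # combinations yields i < j, so index i stays valid
    labels = [0] * len(rows)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def rand_index(a, b):
    """Fraction of sample pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical data: three well-separated groups of 6 samples in 3 dimensions.
rng = random.Random(1)
centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
data = [[rng.gauss(mu, 0.5) for mu in center] for center in centers for _ in range(6)]

partitions = {m: agglomerative(data, 3, m) for m in ("single", "complete", "average")}
for m1, m2 in combinations(partitions, 2):
    print(m1, m2, rand_index(partitions[m1], partitions[m2]))
```

The naive merge loop recomputes every cluster distance at each step, so it only suits small toy examples. For real expression data, an optimized library implementation would be the practical choice.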

 

Place, publisher, year, edition, pages
Umeå: Umeå University, 2020, 22 p.
Series
Research report in mathematical statistics, ISSN 1653-0829 ; 70/20
Keywords [en]
cluster analysis, cancer, classification
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-167275
ISBN: 978-91-7855-172-9 (print)
ISBN: 978-91-7855-173-6 (digital)
OAI: oai:DiVA.org:umu-167275
DiVA, id: diva2:1385586
Public defence
2020-02-07, N460, Naturvetarhuset, Umeå, 09:15 (English)
Available from: 2020-01-17 Created: 2020-01-14 Last updated: 2020-01-15 Bibliographically approved
List of papers
1. Cluster analysis on high dimensional RNA-seq data with applications to cancer research: An evaluation study
2019 (English) In: PLoS ONE, E-ISSN 1932-6203, Vol. 14, no 12, article id e0219102. Article in journal (Refereed). Published
Abstract [en]

Background: Clustering of gene expression data is widely used to identify novel subtypes of cancer. Many clustering approaches have been proposed, but there is a lack of knowledge regarding their relative merits and how data characteristics influence their performance. We evaluate how cluster analysis choices affect the performance by studying four publicly available human cancer data sets: breast, brain, kidney, and stomach cancer. In particular, we focus on how the sample size, the distribution of subtypes, and sample heterogeneity affect the performance.

Results: In general, increasing the sample size had a limited effect on the clustering performance; e.g., for the breast cancer data, similar performance was obtained for n = 40 and for n = 330. The relative distribution of the subtypes had a noticeable effect on the ability to identify the disease subtypes, and data with disproportionate cluster sizes turned out to be difficult to cluster. Both the choice of clustering method and the choice of selection method affected the ability to identify the subtypes, but the relative performance varied between data sets, making it difficult to rank the approaches. For some data sets, the performance was substantially higher when the clustering was based on data from only one sex than when it was based on data from a mixed population. This suggests that homogeneous data are easier to cluster than heterogeneous data and that clustering males and females separately may be beneficial and increase the chance of detecting novel subtypes. It was also observed that the performance often differed substantially between females and males.

Conclusions: The number of samples seems to have a limited effect on the performance, while the heterogeneity, at least with respect to sex, is important for the performance. Hence, by analyzing the sexes separately, the possible loss caused by having fewer samples could be outweighed by the benefit of more homogeneous data.

Place, publisher, year, edition, pages
San Francisco: Public Library of Science, 2019
Keywords
Cancer, cluster analysis
National Category
Probability Theory and Statistics; Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:umu:diva-167274 (URN)
10.1371/journal.pone.0219102 (DOI)
31805048 (PubMedID)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-15 Bibliographically approved
2. Comparison of methods for variable selection in clustering of high-dimensional RNA-sequencing data to identify cancer subtypes
(English) Manuscript (preprint) (Other academic)
Keywords
feature selection, clustering, RNA-seq, cancer
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-167264 (URN)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-16 Bibliographically approved
3. Gene expression profiles define molecular subtypes of prostate cancer bone metastases with different outcomes and morphology traceable back to the primary tumor
2019 (English) In: Molecular Oncology, ISSN 1574-7891, E-ISSN 1878-0261, Vol. 13, no 8, p. 1763-1777. Article in journal (Refereed). Published
Abstract [en]

Bone metastasis is the lethal end-stage of prostate cancer (PC), but the biology of bone metastases is poorly understood. The overall aim of this study was therefore to explore molecular variability in PC bone metastases of potential importance for therapy. Specifically, genome-wide expression profiles of bone metastases from untreated patients (n = 12) and patients treated with androgen-deprivation therapy (ADT, n = 60) were analyzed in relation to patient outcome and to morphological characteristics in metastases and paired primary tumors. Principal component analysis and unsupervised classification were used to identify sample clusters based on mRNA profiles. Clusters were characterized by gene set enrichment analysis and related to histological and clinical parameters using univariate and multivariate statistics. Selected proteins were analyzed by immunohistochemistry in metastases and matched primary tumors (n = 52) and in transurethral resected prostate (TUR-P) tissue of a separate cohort (n = 59). Three molecular subtypes of bone metastases (MetA-C), characterized by differences in gene expression pattern, morphology, and clinical behavior, were identified. MetA (71% of the cases) showed increased expression of androgen receptor-regulated genes, including prostate-specific antigen (PSA), and glandular structures indicating a luminal cell phenotype. MetB (17%) showed expression profiles related to cell cycle activity and DNA damage, and a pronounced cellular atypia. MetC (12%) exhibited enriched stroma-epithelial cell interactions. MetB patients had the lowest serum PSA levels and the poorest prognosis after ADT. Combined analysis of PSA and Ki67 immunoreactivity (proliferation) in bone metastases, paired primary tumors, and TUR-P samples was able to differentiate MetA-like (high PSA, low Ki67) from MetB-like (low PSA, high Ki67) tumors and to demonstrate their different prognoses.
In conclusion, bone metastases from PC patients are separated based on gene expression profiles into molecular subtypes with different morphology, biology, and clinical outcome. These findings deserve further exploration with the purpose of improving treatment of metastatic PC.
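The abstract notes that the study used principal component analysis together with unsupervised classification. The PCA step can be illustrated with a minimal sketch that finds the first principal component by power iteration on the sample covariance matrix and then projects each sample onto it. The following are all assumptions for illustration and are not the procedure used in the study: the toy expression matrix, the power-iteration approach, the function name, and the crude split of samples by the sign of their first-component score.

```python
def first_principal_component(rows, iters=200):
    """Leading principal axis (unit vector) and per-sample scores via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # sample covariance matrix of the features (genes)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # project each centered sample onto the principal axis
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical expression matrix: 6 samples x 3 genes, two groups of 3 samples.
expr = [
    [1.0, 1.0, 0.1], [1.2, 0.9, 0.0], [0.9, 1.1, -0.1],
    [5.0, 5.0, 0.0], [5.1, 4.9, 0.1], [4.8, 5.2, -0.1],
]
axis, scores = first_principal_component(expr)
groups = [int(s > 0) for s in scores]  # crude two-group split on the first-component score
print([round(x, 3) for x in axis], groups)
```

A real analysis would more likely keep several components and pass them to a clustering algorithm rather than thresholding a single score. Power iteration is used here only to keep the example free of dependencies.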

Place, publisher, year, edition, pages
John Wiley & Sons, 2019
Keywords
bone metastasis, gene expression, gene set enrichment analysis, morphology, survival, unsupervised cluster analysis
National Category
Cancer and Oncology
Identifiers
urn:nbn:se:umu:diva-162668 (URN)
10.1002/1878-0261.12526 (DOI)
000478600200009
31162796 (PubMedID)
Available from: 2019-09-05 Created: 2019-09-05 Last updated: 2020-01-14 Bibliographically approved
4. Combining epigenetic and clinicopathological variables improves prognostic prediction in clear cell Renal Cell Carcinoma
(English) Manuscript (preprint) (Other academic)
Keywords
DNA methylation, cancer, cluster analysis, classification, clear cell renal cell carcinoma
National Category
Cancer and Oncology; Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-167269 (URN)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-31 Bibliographically approved

Open Access in DiVA

fulltext (724 kB) 25 downloads
File information
File name: FULLTEXT01.pdf File size: 724 kB Checksum: SHA-512
4ea940d91cd259f4cb3b4bb06d1c71b4bc9a62f24ed2ad66d384efcb4d0a98fa1c88150f6d4d9b28a924a5818a6a38cdaf38a3bdd6c0fbc8a1749de53b5e90d2
Type: fulltext Mimetype: application/pdf
spikblad (310 kB) 7 downloads
File information
File name: SPIKBLAD01.pdf File size: 310 kB Checksum: SHA-512
c519416c9c00b81af9d66dd3e84ba4b932148bb3d467e43e30c6e912eb663aa9dca94b55d25e19269ae7d8b4341f0506d3a22dd8b852468e2d0e85878eb6119f
Type: spikblad Mimetype: application/pdf

Author

Vidman, Linda
