Cancer subtype identification using cluster analysis on high-dimensional omics data
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för matematik och matematisk statistik.
2020 (English) Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Identification and prediction of cancer subtypes are important parts of the development towards personalized medicine. By tailoring treatments, it is possible to decrease unnecessary suffering and reduce costs. Since the introduction of next-generation sequencing techniques, the amount of data available for medical research has increased rapidly. The high-dimensional omics data produced by various techniques require statistical methods to transform data into information and knowledge.

All papers in this thesis concern distinguishing disease subtypes in patients with cancer using omics data. The high dimension and complexity of sequencing data from tumor samples make it necessary to preprocess the data. We compare feature selection methods and clustering methods used for identification of cancer subtypes. In addition, we evaluate the effect that certain characteristics of the data have on the ability to identify cancer subtypes. The results show that no method outperforms the others in all cases and that the relative ranking of methods depends strongly on the data. We also show that the benefit of obtaining more homogeneous data by analyzing genders separately can outweigh the possible drawbacks caused by smaller sample sizes. One of the major challenges when dealing with omics data from tumor samples is that the patients are generally a very heterogeneous group. Factors that lead to heterogeneity include age, gender, ethnicity and stage of disease. The effect size of each of these factors might affect the ability to identify the subgroups of interest.

In omics data, the feature space is often large, and the number of features that are informative for the factors of interest also affects the complexity of the problem. We present a novel clustering approach that can identify different clusters in different subsets of the feature space, and apply it to methylation data to create new potential biomarkers. We show that combining clinical data with methylation data for patients with clear cell renal carcinoma makes it possible to improve the currently used prediction model for disease progression.

Using unsupervised clustering techniques, we identify three molecular subtypes of prostate cancer bone metastases based on gene expression profiles. The robustness of the identified subtypes is confirmed by applying several clustering algorithms with very similar results.
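The robustness check described above, running several clustering algorithms and comparing the resulting partitions, can be quantified with an agreement measure. The abstract does not say which measure was used; the sketch below assumes the adjusted Rand index, a common choice, and the algorithm names and labels in `partitions` are hypothetical illustration data, not results from the thesis.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")
    # Pairs of samples placed in the same cluster by both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed in the same cluster by each partition on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Degenerate case: both partitions are one big cluster or all singletons.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def pairwise_agreement(partitions):
    """Adjusted Rand index for every pair of named partitions."""
    return {
        (a, b): adjusted_rand_index(partitions[a], partitions[b])
        for a, b in combinations(partitions, 2)
    }


# Hypothetical cluster labels for the same six samples from three algorithms.
partitions = {
    "hierarchical": [0, 0, 0, 1, 1, 2],
    "k-means": [1, 1, 1, 0, 0, 2],
    "model-based": [0, 0, 1, 1, 1, 2],
}
agreement = pairwise_agreement(partitions)
```

Because the index ignores how clusters are numbered, two algorithms that find the same grouping under different label names still score 1, while values near 0 indicate agreement no better than chance.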

 

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2020, p. 22
Series
Research report in mathematical statistics, ISSN 1653-0829; 70/20
Keywords [en]
cluster analysis, cancer, classification
Identifiers
URN: urn:nbn:se:umu:diva-167275
ISBN: 978-91-7855-172-9 (print)
ISBN: 978-91-7855-173-6 (digital)
OAI: oai:DiVA.org:umu-167275
DiVA, id: diva2:1385586
Public defence
2020-02-07, N460, Naturvetarhuset, Umeå, 09:15 (English)
Available from: 2020-01-17 Created: 2020-01-14 Last updated: 2021-10-19 Bibliographically approved
List of papers
1. Cluster analysis on high dimensional RNA-seq data with applications to cancer research: An evaluation study
2019 (English) In: PLOS ONE, E-ISSN 1932-6203, Vol. 14, no. 12, article id e0219102. Journal article (Refereed). Published
Abstract [en]

Background: Clustering of gene expression data is widely used to identify novel subtypes of cancer. Plenty of clustering approaches have been proposed, but there is a lack of knowledge regarding their relative merits and how data characteristics influence the performance. We evaluate how cluster analysis choices affect the performance by studying four publicly available human cancer data sets: breast, brain, kidney and stomach cancer. In particular, we focus on how the sample size, distribution of subtypes and sample heterogeneity affect the performance.

Results: In general, increasing the sample size had limited effect on the clustering performance, e.g. for the breast cancer data similar performance was obtained for n = 40 as for n = 330. The relative distribution of the subtypes had a noticeable effect on the ability to identify the disease subtypes and data with disproportionate cluster sizes turned out to be difficult to cluster. Both the choice of clustering method and selection method affected the ability to identify the subtypes, but the relative performance varied between data sets, making it difficult to rank the approaches. For some data sets, the performance was substantially higher when the clustering was based on data from only one sex compared to data from a mixed population. This suggests that homogeneous data are easier to cluster than heterogeneous data and that clustering males and females individually may be beneficial and increase the chance to detect novel subtypes. It was also observed that the performance often differed substantially between females and males.

Conclusions: The number of samples seems to have a limited effect on the performance, while the heterogeneity, at least with respect to sex, is important for the performance. Hence, when the sexes are analyzed separately, the possible loss caused by having fewer samples could be outweighed by the benefit of more homogeneous data.
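The Results above note that the selection method affected how well subtypes could be identified. The abstract does not name the selection methods that were evaluated, so the sketch below is only an illustration of the general idea: it assumes a simple variance filter that keeps the most variable features before clustering. The function names and the small `expression` matrix are hypothetical.

```python
from statistics import pvariance


def select_most_variable(expression, n_keep):
    """Indices of the n_keep features with the largest variance across samples.

    expression is a list of samples, each a list of feature values
    (rows are samples, columns are features such as genes).
    """
    if not expression:
        raise ValueError("expression matrix is empty")
    n_features = len(expression[0])
    if any(len(row) != n_features for row in expression):
        raise ValueError("all samples must have the same number of features")
    # Population variance of each feature (column) across samples.
    variances = [pvariance(column) for column in zip(*expression)]
    # Rank feature indices by decreasing variance; ties keep their original order.
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])


def reduce_matrix(expression, keep):
    """Keep only the selected feature columns in every sample."""
    return [[row[j] for j in keep] for row in expression]


# Hypothetical expression values: 4 samples x 4 features.
expression = [
    [5.0, 1.0, 0.0, 2.0],
    [5.1, 9.0, 0.0, 2.1],
    [4.9, 1.2, 8.0, 1.9],
    [5.0, 8.8, 8.0, 2.0],
]
keep = select_most_variable(expression, n_keep=2)
reduced = reduce_matrix(expression, keep)
```

The reduced matrix would then be passed to whichever clustering method is being compared. A variance filter is unsupervised, so it uses no subtype labels, but it may discard low-variance features that still separate a small subtype.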

Place, publisher, year, edition, pages
San Francisco: Public Library of Science, 2019
Keywords
Cancer, cluster analysis
Identifiers
urn:nbn:se:umu:diva-167274 (URN)
10.1371/journal.pone.0219102 (DOI)
000534009700002
31805048 (PubMedID)
2-s2.0-85076157692 (Scopus ID)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2025-02-05 Bibliographically approved
2. Comparison of methods for variable selection in clustering of high-dimensional RNA-sequencing data to identify cancer subtypes
(English) Manuscript (preprint) (Other academic)
Keywords
feature selection, clustering, RNA-seq, cancer
Identifiers
urn:nbn:se:umu:diva-167264 (URN)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-16 Bibliographically approved
3. Gene expression profiles define molecular subtypes of prostate cancer bone metastases with different outcomes and morphology traceable back to the primary tumor
2019 (English) In: Molecular Oncology, ISSN 1574-7891, E-ISSN 1878-0261, Vol. 13, no. 8, p. 1763-1777. Journal article (Refereed). Published
Abstract [en]

Bone metastasis is the lethal end-stage of prostate cancer (PC), but the biology of bone metastases is poorly understood. The overall aim of this study was therefore to explore molecular variability in PC bone metastases of potential importance for therapy. Specifically, genome-wide expression profiles of bone metastases from untreated patients (n = 12) and patients treated with androgen-deprivation therapy (ADT, n = 60) were analyzed in relation to patient outcome and to morphological characteristics in metastases and paired primary tumors. Principal component analysis and unsupervised classification were used to identify sample clusters based on mRNA profiles. Clusters were characterized by gene set enrichment analysis and related to histological and clinical parameters using univariate and multivariate statistics. Selected proteins were analyzed by immunohistochemistry in metastases and matched primary tumors (n = 52) and in transurethral resected prostate (TUR-P) tissue of a separate cohort (n = 59). Three molecular subtypes of bone metastases (MetA-C) characterized by differences in gene expression pattern, morphology, and clinical behavior were identified. MetA (71% of the cases) showed increased expression of androgen receptor-regulated genes, including prostate-specific antigen (PSA), and glandular structures indicating a luminal cell phenotype. MetB (17%) showed expression profiles related to cell cycle activity and DNA damage, and a pronounced cellular atypia. MetC (12%) exhibited enriched stroma-epithelial cell interactions. MetB patients had the lowest serum PSA levels and the poorest prognosis after ADT. Combined analysis of PSA and Ki67 immunoreactivity (proliferation) in bone metastases, paired primary tumors, and TUR-P samples was able to differentiate MetA-like (high PSA, low Ki67) from MetB-like (low PSA, high Ki67) tumors and demonstrate their different prognosis. 
In conclusion, bone metastases from PC patients are separated based on gene expression profiles into molecular subtypes with different morphology, biology, and clinical outcome. These findings deserve further exploration with the purpose of improving treatment of metastatic PC.

Place, publisher, year, edition, pages
John Wiley & Sons, 2019
Keywords
bone metastasis, gene expression, gene set enrichment analysis, morphology, survival, unsupervised cluster analysis
Identifiers
urn:nbn:se:umu:diva-162668 (URN)
10.1002/1878-0261.12526 (DOI)
000478600200009
31162796 (PubMedID)
2-s2.0-85068158741 (Scopus ID)
Available from: 2019-09-05 Created: 2019-09-05 Last updated: 2023-03-24 Bibliographically approved
4. Combining epigenetic and clinicopathological variables improves prognostic prediction in clear cell Renal Cell Carcinoma
(English) Manuscript (preprint) (Other academic)
Keywords
DNA methylation, cancer, cluster analysis, classification, clear cell renal cell carcinoma
Identifiers
urn:nbn:se:umu:diva-167269 (URN)
Available from: 2020-01-14 Created: 2020-01-14 Last updated: 2020-01-31 Bibliographically approved


Author

Vidman, Linda
