OPLS-based multiclass classification and data-driven interclass relationship discovery
Umeå University, Faculty of Science and Technology, Department of Chemistry. ORCID iD: 0000-0002-1898-4453
Umeå University, Faculty of Science and Technology, Department of Chemistry. ORCID iD: 0000-0001-9347-5790
Umeå University, Faculty of Science and Technology, Department of Chemistry. ORCID iD: 0000-0003-3799-6094
Umeå University, Faculty of Science and Technology, Department of Chemistry. ORCID iD: 0000-0001-8357-5018
2025 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 65, no. 4, p. 1762-1770. Article in journal (Refereed). Published
Abstract [en]

Multiclass data sets and large-scale studies are increasingly common in omics sciences, drug discovery, and clinical research due to advancements in analytical platforms. Efficiently handling these data sets and discerning subtle differences across multiple classes remains a significant challenge. In metabolomics, two-class orthogonal projection to latent structures discriminant analysis (OPLS-DA) models are widely used due to their strong discrimination capabilities and ability to provide interpretable information on class differences. However, these models face challenges in multiclass settings. A common solution is to transform the multiclass comparison into multiple two-class comparisons, which, while more effective than a global multiclass OPLS-DA model, unfortunately results in a manual, time-consuming model-building process with complicated interpretation. Here, we introduce an extension of OPLS-DA for data-driven multiclass classification: orthogonal partial least squares-hierarchical discriminant analysis (OPLS-HDA). OPLS-HDA integrates hierarchical cluster analysis (HCA) with the OPLS-DA framework to create a decision tree, addressing multiclass classification challenges and providing intuitive visualization of interclass relationships. To avoid overfitting and ensure reliable predictions, we use cross-validation during model building. Benchmark results show that OPLS-HDA performs competitively across diverse data sets compared to eight established methods. This method represents a significant advancement, offering a powerful tool to dissect complex multiclass data sets. With its versatility, interpretability, and ease of use, OPLS-HDA is an efficient approach to multiclass data analysis applicable across various fields.
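
The tree-building idea in the abstract (cluster the classes, then fit one two-class model per split) can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not state: classes are merged bottom-up by Euclidean distance between class centroids as a simple stand-in for HCA; each split uses a one-component PLS regression on a -1/+1 class code in place of a two-class OPLS-DA model (no orthogonal components are separated); cross-validation is omitted; the data are synthetic; and all function names are hypothetical.

```python
import numpy as np


def fit_two_class(X, y):
    """Fit a one-component PLS regression of y (coded -1/+1) on X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                 # direction of maximum covariance with y
    w /= np.linalg.norm(w)
    t = Xc @ w                    # sample scores on the single component
    b = (t @ yc) / (t @ t)        # regress the class code on the scores
    return x_mean, y_mean, w, b


def score_two_class(model, x):
    """Predicted class code for one sample; its sign selects the branch."""
    x_mean, y_mean, w, b = model
    return float((x - x_mean) @ w * b + y_mean)


def build_tree(X, labels):
    """Merge the closest class groups bottom-up, fitting one model per merge."""
    classes = sorted(set(labels.tolist()))
    centroid = {c: X[labels == c].mean(axis=0) for c in classes}
    # Each entry is (member classes, subtree); a leaf subtree is a class name.
    groups = [((c,), c) for c in classes]
    while len(groups) > 1:
        # Assumption: a group sits at the mean of its members' class centroids.
        pos = [np.mean([centroid[c] for c in members], axis=0)
               for members, _ in groups]
        pairs = [(np.linalg.norm(pos[i] - pos[j]), i, j)
                 for i in range(len(groups))
                 for j in range(i + 1, len(groups))]
        _, i, j = min(pairs)
        (left_cls, left), (right_cls, right) = groups[i], groups[j]
        in_node = np.isin(labels, list(left_cls + right_cls))
        y = np.where(np.isin(labels[in_node], list(left_cls)), -1.0, 1.0)
        node = {"model": fit_two_class(X[in_node], y),
                "left": left, "right": right}
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append((left_cls + right_cls, node))
    return groups[0][1]


def predict(tree, x):
    """Walk from the root to a leaf, choosing a branch at every node."""
    node = tree
    while isinstance(node, dict):
        go_left = score_two_class(node["model"], x) < 0
        node = node["left"] if go_left else node["right"]
    return node


# Synthetic three-class example: "a" and "b" lie close together, "c" far away.
rng = np.random.default_rng(0)
centers = {"a": [0, 0, 0, 0], "b": [6, 0, 0, 0], "c": [20, 20, 0, 0]}
X = np.vstack([rng.normal(m, 0.5, size=(30, 4)) for m in centers.values()])
labels = np.repeat(list(centers), 30)

tree = build_tree(X, labels)
predicted = np.array([predict(tree, x) for x in X])
print("training accuracy:", np.mean(predicted == labels))
```

Each internal node holds a single two-class model, so the merge order also gives a view of which classes resemble each other. A faithful implementation would additionally choose model complexity and estimate predictive performance by cross-validation, as the abstract describes.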

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025. Vol. 65, no. 4, p. 1762-1770
Keywords [en]
Cluster Analysis, Discriminant Analysis, Humans, Least-Squares Analysis, Metabolomics
National subject category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:umu:diva-236203
DOI: 10.1021/acs.jcim.4c01799
ISI: 001412188800001
PubMedID: 39899705
Scopus ID: 2-s2.0-85216849215
OAI: oai:DiVA.org:umu-236203
DiVA, id: diva2:1944242
Available from: 2025-03-13 Created: 2025-03-13 Last updated: 2025-03-19 Bibliographically approved
Part of thesis
1. Chemometric strategies for supervised multi-model analysis
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Kemometriska strategier för guidad multi-modellanalys
Abstract [en]

Understanding biological processes is inherently complex. The cellular machinery and biochemical pathways present significant challenges in scientific research. Advances in data collection, such as high-content imaging and omics technologies, have enabled deeper insights, but extracting meaningful conclusions from these complicated datasets remains a challenge. In this thesis, the focus has been on developing chemometric strategies and supervised modelling approaches to improve data interpretation, aiming to aid scientists in drawing conclusions from their data.

In Paper I, we show that cell imaging data, combined with chemometric tools, can effectively characterize treatment effects, leading to the development of a metric called Equivalence (Eq.) scores. This work raised two main questions: Are fluorescent labels necessary for meaningful characterization? Can living cells, imaged over time, provide deeper insights? In Paper III, we address these questions by investigating an approach based on label-free live-cell imaging data where we extended the Eq. scores to time series data. We demonstrate that time-dependent analysis reveals both early and late cellular responses and improves the prediction of drug mechanisms.

In Paper II, we address challenges arising when Orthogonal Projections to Latent Structures-Discriminant Analysis (OPLS-DA) models are used to analyse several classes, such as subtypes of diseases or different treatments. We introduce OPLS-Hierarchical Discriminant Analysis (OPLS-HDA), a method that integrates hierarchical clustering analysis (HCA) with two-class OPLS-DA models to create an OPLS-based decision tree. We demonstrated that OPLS-HDA is a strong classifier compared to eight other established methods while maintaining interpretability. Additionally, we provide Python scripts that are integrated with SIMCA®, offering a user-friendly interface for broader accessibility.

Extracting reliable insights from complex data requires intentional and structured approaches. This work highlights the benefits of modular and interpretable modelling solutions, ensuring that results are both understandable and trustworthy. By breaking down complex analytical challenges and building tools that enhance interpretability, this work contributes to the broader goal of accelerating data-driven discoveries in life sciences.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 58
Keywords
Label-free live-cell imaging, Morphological profiling, Multi-class classification
National subject category
Pharmacology and Toxicology; Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:umu:diva-236640 (URN)
978-91-8070-642-1 (ISBN)
978-91-8070-643-8 (ISBN)
Public defence
2025-04-16, Stora Hörsalen (KBE303), KBC-huset, Linnaeus väg 6, Umeå, 09:00 (English)
Opponent
Supervisors
Funder
eSSENCE - An eScience Collaboration
Available from: 2025-03-26 Created: 2025-03-19 Last updated: 2025-03-21 Bibliographically approved

Open Access in DiVA

fulltext (3563 kB), 194 downloads
File information
File name: FULLTEXT01.pdf. File size: 3563 kB. Checksum: SHA-512
3b3d98625a1866c0c970870009b7d6490782210bcc8c3e6bd7bbda0e8761b9bc54f527b326dd32530c844ec59c99e80eaca5baa79c4c5a7d0c6c638340c46bbb
Type: fulltext. Mimetype: application/pdf


Authors

Forsgren, Edvin; Björkblom, Benny; Trygg, Johan; Jonsson, Pär
