Multivariate integration and visualization of multiblock data in chemical and biological applications
Umeå University, Faculty of Science and Technology, Department of Chemistry. ORCID iD: 0000-0001-8445-0559
2019 (English). Doctoral thesis, comprising articles (Other academic). Alternative title:
Multivariat integration och visualisering av multiblockdata i kemiska och biologiska applikationer (Swedish)
Abstract [en]

Thanks to improvements in technology, more data than ever before are generated in almost all fields of science and industry.

The data are analyzed in the hope of gaining valuable information and knowledge about a product or process, such as how to improve the quality of a manufactured product.

Analysis of collected data is often performed on a single dataset or data source at a time. In this thesis, I have focused on multiblock analysis, a concept that includes multiple sources or data blocks. Analogous to how the human senses combine to let us experience the world around us, multiblock analysis integrates multiple data sources, providing a fuller examination of the product or process under study.

My thesis introduces Joint and Unique Multiblock Analysis (JUMBA), a complete analysis workflow for data integration. I describe each step of JUMBA, including data pre-treatment, model building and validation, and model interpretation. Special focus is put on several newly developed visualizations for model validation and interpretation, which make it as easy as possible to draw conclusions from the analysis.

By reading my thesis, the reader will gain a working understanding of how to perform multiblock analysis, including solutions to commonly encountered problems.

Abstract [sv] (translated from Swedish)

Thanks to technical advances, large amounts of data are generated today in research and industry. Analyzing such data can ultimately yield valuable knowledge about a product or process, so that the quality of the studied products can be improved.

Data analysis is often performed on a single data source, which is then represented by a matrix, also called a data block. In this thesis, I have instead focused on concepts that can analyze several data sources simultaneously and integrate them. Much as the human senses let us experience the world around us, integrating several data sources makes the examination of a product or process more comprehensive.

My thesis introduces the JUMBA workflow (Joint and Unique Multiblock Analysis), which is intended to perform a complete integration of data. I describe every individual step of JUMBA, from data pre-treatment to building and validating models and interpreting them. I have placed particular emphasis on describing several newly created types of visualizations that make it easier to draw correct conclusions from the analysis.

I hope that readers of my thesis will gain an understanding of how to analyze several data blocks, and that they will also find solutions to problems one can normally face along the way.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2019. 62 p.
Keywords [en]
Multivariate analysis, PCA, PLS, OnPLS, JUMBA, Multiblock, calibration transfer
Identifiers
URN: urn:nbn:se:umu:diva-158330
ISBN: 978-91-7855-069-2 (print)
OAI: oai:DiVA.org:umu-158330
DiVA, id: diva2:1306825
Public defence
2019-05-17, KB.E3.03, KBC - building, Linnaeus väg 6, 90736 Umeå, Umeå, 10:00 (English)
Funder
eSSENCE - An eScience Collaboration
Available from: 2019-04-26 Created: 2019-04-25 Last updated: 2019-04-30 Bibliographically approved
List of papers
1. Visualization of descriptive multiblock analysis
2020 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 34, no. 1, article id e3071. Journal article (Peer-reviewed). Published
Abstract [en]

Understanding and making the most of complex data collected from multiple sources is a challenging task. Data integration is the procedure of describing the main features in multiple data blocks, and several methods for multiblock analysis have been previously developed, including OnPLS and JIVE. One of the main challenges is how to visualize and interpret the results of multiblock analyses, because of the increased model complexity and sheer size of the data. In this paper, we present novel visualization tools that simplify the interpretation and overview of multiblock analysis. We introduce a correlation matrix plot that provides an overview of the relationships between blocks found by multiblock models. We also present a multiblock scatter plot, a metadata correlation plot, and a variation distribution plot that simplify the interpretation of multiblock models. We demonstrate our visualizations on an industrial case study in vibrational spectroscopy (NIR, UV, and Raman datasets) as well as a multiomics integration study (transcript, metabolite, and protein datasets). We conclude that our visualizations provide useful tools to harness the complexity of multiblock analysis and enable better understanding of the investigated system.
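The idea of a block-to-block overview can be illustrated with a small Python sketch. This is not the correlation matrix plot from the paper, and it uses neither OnPLS nor JIVE: as a simplifying assumption, each block is summarised by its first principal-component score, and the correlations between those scores are tabulated. The names `first_score`, `block_score_correlations` and `make_toy_blocks`, as well as the toy data, are hypothetical.

```python
import numpy as np


def first_score(block):
    """First principal-component score vector of one data block."""
    centered = block - block.mean(axis=0)  # column-centre the block
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def block_score_correlations(blocks):
    """Correlation between the first PC scores of every pair of blocks.

    Assumption: all blocks hold the same samples (rows) in the same order.
    """
    scores = np.column_stack([first_score(b) for b in blocks])
    return np.corrcoef(scores, rowvar=False)


def make_toy_blocks(n_samples=200, seed=0):
    """Toy data: blocks 0 and 1 share one latent variable, block 2 is noise."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(n_samples)
    loadings = np.ones(5)
    block0 = np.outer(shared, loadings) + 0.1 * rng.standard_normal((n_samples, 5))
    block1 = np.outer(shared, -loadings) + 0.1 * rng.standard_normal((n_samples, 5))
    block2 = rng.standard_normal((n_samples, 5))
    return [block0, block1, block2]


corr = block_score_correlations(make_toy_blocks())
# The sign of a principal component is arbitrary, so absolute values are shown.
print(np.round(np.abs(corr), 2))
```

In this toy setting, a large absolute correlation between two blocks suggests shared variation, which is the kind of relationship a block-level overview is meant to surface before the individual components are inspected.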

Place, publisher, year, edition, pages
John Wiley & Sons, 2020
Keywords
data fusion, descriptive analytics, multiblock analysis, OnPLS, visualization
Identifiers
urn:nbn:se:umu:diva-152512 (URN)
10.1002/cem.3071 (DOI)
000509318600006 ()
2-s2.0-85051048496 (Scopus ID)
Funder
eSSENCE - An eScience Collaboration
Swedish Research Council, 2016-04376
Available from: 2018-10-09 Created: 2018-10-09 Last updated: 2020-03-12 Bibliographically approved
2. Joint and unique multiblock analysis for integration and calibration transfer of NIR instruments
2019 (English). In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 91, no. 5, p. 3516-3524. Journal article (Peer-reviewed). Published
Abstract [en]

In the present paper, we introduce an end-to-end workflow called joint and unique multiblock analysis (JUMBA), which allows multiple sources of data to be analyzed simultaneously to better understand how they complement each other. In near-infrared (NIR) spectroscopy, calibration models between NIR spectra and responses are used to replace wet-chemistry methods, and the models tend to be instrument-specific. Calibration-transfer techniques are used for standardization of NIR-instrumentation, enabling the use of one model on several instruments. The current paper investigates both the similarities and differences among a variety of NIR instruments using JUMBA. We demonstrate JUMBA on both a previously unpublished data set in which five NIR instruments measured mushroom substrate and a publicly available data set measured on corn samples. We found that NIR spectra from different instrumentation largely shared the same underlying structures, an insight we took advantage of to perform calibration transfer. The proposed JUMBA transfer displayed excellent calibration-transfer performance across the two analyzed data sets and outperformed existing methods in terms of both prediction accuracy and stability. When applied to a multi-instrument environment, JUMBA transfer can integrate all instruments in the same model and will ensure higher consistency among them compared with existing calibration-transfer methods.
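To make the notion of calibration transfer concrete, the sketch below implements direct standardization, one of the classical existing approaches. It is not the JUMBA transfer proposed in the paper. It assumes that a set of transfer samples has been measured on both a primary and a secondary instrument; all function names and the synthetic, purely linear instrument difference are hypothetical, and offset correction is omitted for brevity.

```python
import numpy as np


def fit_direct_standardization(secondary_transfer, primary_transfer):
    """Least-squares map F with secondary_transfer @ F ~ primary_transfer."""
    return np.linalg.pinv(secondary_transfer) @ primary_transfer


def transfer(secondary_spectra, mapping):
    """Express spectra from the secondary instrument in the primary's space."""
    return secondary_spectra @ mapping


def make_toy_instruments(n_transfer=30, n_new=10, n_channels=5, seed=0):
    """Synthetic spectra; the secondary instrument is a linear distortion."""
    rng = np.random.default_rng(seed)
    # Hypothetical instrument difference: a fixed, invertible linear map.
    distortion = np.eye(n_channels) + 0.1 * np.triu(
        np.ones((n_channels, n_channels)), 1
    )
    primary_transfer = rng.standard_normal((n_transfer, n_channels))
    primary_new = rng.standard_normal((n_new, n_channels))
    return {
        "primary_transfer": primary_transfer,
        "secondary_transfer": primary_transfer @ distortion,
        "primary_new": primary_new,
        "secondary_new": primary_new @ distortion,
    }


data = make_toy_instruments()
mapping = fit_direct_standardization(
    data["secondary_transfer"], data["primary_transfer"]
)

# A calibration model (here just a coefficient vector) built for the primary
# instrument can then be applied to transferred secondary spectra.
coefficients = np.arange(1.0, 6.0)
predicted_primary = data["primary_new"] @ coefficients
predicted_transferred = transfer(data["secondary_new"], mapping) @ coefficients
print(np.max(np.abs(predicted_primary - predicted_transferred)))
```

Real instruments differ in ways that are neither noise-free nor strictly linear, so the near-perfect agreement in this toy case should not be read as representative of practical performance.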

Place, publisher, year, edition, pages
Washington: American Chemical Society (ACS), 2019
Keywords
near-infrared spectroscopy, spent mushroom compost, multivariate calibration, water-content, standardization, regression, vegetation, models, ONPLS
Identifiers
urn:nbn:se:umu:diva-156707 (URN)
10.1021/acs.analchem.8b05188 (DOI)
000460709200047 ()
30758178 (PubMedID)
2-s2.0-85062418105 (Scopus ID)
Projects
Bio4Energy
Funder
Bio4Energy
Available from: 2019-02-25 Created: 2019-02-25 Last updated: 2020-07-01 Bibliographically approved
3. Multi-Tissue Metabolomics Integration Utilising Hierarchical Modelling and Data Integration Methods
(English). Manuscript (preprint) (Other academic)
Identifiers
urn:nbn:se:umu:diva-158329 (URN)
Available from: 2019-04-24 Created: 2019-04-24 Last updated: 2019-04-25
4. Joint and unique multiblock analysis of biological data: multiomics malaria study
2019 (English). In: Faraday discussions, ISSN 1359-6640, E-ISSN 1364-5498, Vol. 218, p. 268-283. Journal article (Peer-reviewed). Published
Abstract [en]

Modern profiling technologies yield large amounts of data that can later be used to gain a comprehensive understanding of the studied system. Proper evaluation of such data is challenging and cannot be achieved by analyzing separate datasets in isolation. Integrated approaches are necessary, because only data integration allows finding correlation trends common to all studied data sets and revealing hidden structures not known a priori. This improves understanding and interpretation of complex systems. Joint and Unique MultiBlock Analysis (JUMBA) is an analysis method based on the OnPLS algorithm that decomposes a set of matrices into joint parts, containing variation shared with other connected matrices, and unique parts, containing variation specific to each single matrix. Mapping unique variation is important from a data integration perspective, since it cannot be expected that all variation co-varies. In this work we used JUMBA for integrated analysis of lipidomic, metabolomic and oxylipin datasets obtained from profiling of plasma samples from children infected with P. falciparum malaria. P. falciparum is one of the primary contributors to childhood mortality and obstetric complications in the developing world, which makes the development of new diagnostic and prognostic tools, as well as a better understanding of the disease, of utmost importance. In the presented work, JUMBA made it possible to detect already known trends related to disease progression, but also to discover new structures in the data connected to food intake and personal differences in metabolism. By separating the variation in each data set into joint and unique parts, JUMBA reduced the complexity of the analysis, facilitated detection of samples and variables corresponding to specific structures across multiple datasets and thereby enabled fast interpretation of the studied system. All this makes JUMBA a perfect choice for multiblock analysis of systems biology data.
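The split into joint and unique variation can be sketched for two blocks in a few lines of Python. This is not OnPLS and not the JUMBA implementation: as a stand-in, the sketch compares the dominant score subspaces of the two blocks through their principal angles, treats directions with a cosine above a chosen threshold as joint, and assigns the remainder of each block to its unique part. The function names, the `rank` and `threshold` parameters and the toy data are all assumptions made for illustration.

```python
import numpy as np


def joint_unique_split(x1, x2, rank=2, threshold=0.9):
    """Split two column-centred blocks into joint and unique parts.

    Assumption: both blocks hold the same samples (rows) in the same order.
    """
    x1c = x1 - x1.mean(axis=0)
    x2c = x2 - x2.mean(axis=0)
    # Orthonormal bases for the dominant score subspace of each block.
    u1 = np.linalg.svd(x1c, full_matrices=False)[0][:, :rank]
    u2 = np.linalg.svd(x2c, full_matrices=False)[0][:, :rank]
    # Singular values of u1.T @ u2 are the cosines of the principal angles.
    a, cosines, bt = np.linalg.svd(u1.T @ u2)
    n_joint = int(np.sum(cosines > threshold))
    if n_joint == 0:
        joint_scores = np.zeros((x1c.shape[0], 0))
    else:
        # Average each pair of closely aligned directions, then orthonormalise.
        paired = u1 @ a[:, :n_joint] + u2 @ bt.T[:, :n_joint]
        joint_scores, _ = np.linalg.qr(paired)
    parts = []
    for xc in (x1c, x2c):
        joint = joint_scores @ (joint_scores.T @ xc)  # projection onto joint scores
        parts.append((joint, xc - joint))  # remainder is unique (plus noise)
    return joint_scores, parts


def make_toy_pair(n_samples=200, seed=0):
    """Two toy blocks with one shared and one block-specific latent variable."""
    rng = np.random.default_rng(seed)
    shared, own1, own2 = rng.standard_normal((3, n_samples))
    x1 = (np.outer(shared, np.ones(6))
          + np.outer(own1, [1, -1, 1, -1, 1, -1])
          + 0.05 * rng.standard_normal((n_samples, 6)))
    x2 = (np.outer(shared, np.ones(4))
          + np.outer(own2, [1, -1, 1, -1])
          + 0.05 * rng.standard_normal((n_samples, 4)))
    return x1, x2, shared


x1, x2, shared = make_toy_pair()
joint_scores, parts = joint_unique_split(x1, x2)
print("joint components found:", joint_scores.shape[1])
```

By construction, the joint and unique parts of each block sum to the centred block, and the unique parts are orthogonal to the joint scores. In this simplified version the unique part also absorbs the residual noise, which a full method would model separately.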

Place, publisher, year, edition, pages
Cambridge: Royal Society of Chemistry, 2019
Identifiers
urn:nbn:se:umu:diva-156705 (URN)
10.1039/C8FD00243F (DOI)
000481497900014 ()
2-s2.0-85071086614 (Scopus ID)
Conference
Conference on Challenges in Analysis of Complex Natural Mixtures, Univ Edinburgh, Edinburgh, MAY 13-15, 2019
Available from: 2019-02-25 Created: 2019-02-25 Last updated: 2023-03-24 Bibliographically approved

Open Access in DiVA

spikblad (124 kB), 121 downloads
File information
File: SPIKBLAD01.pdf. File size: 124 kB. Checksum (SHA-512):
496f30d0b7fb3d542de4093ff0ee8b08638302ce0a5a3bd212de5e3467d2498fd47a62f2c2fd1cc5ce8e3a653ed880ea807ffb40cc13194d4836c250ee239e9d
Type: spikblad. Mimetype: application/pdf
fulltext (1228 kB), 857 downloads
File information
File: FULLTEXT02.pdf. File size: 1228 kB. Checksum (SHA-512):
ec77056e1aa15eca314f08837df4f1b63db3b89db375ef97d571e093aee17dbf9174acebec11db1dc7a54da2740936e2c8ef20a33c497ea368554c225ad91bee
Type: fulltext. Mimetype: application/pdf

Person

Skotare, Tomas
