Multivariate processing and modelling of hyphenated metabolite data
Umeå University, Faculty of Science and Technology, Chemistry.
2005 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

One trend in the ‘omics’ sciences is the generation of increasing amounts of data, describing complex biological samples. To cope with this and facilitate progress towards reliable diagnostic tools, it is crucial to develop methods for extracting representative and predictive information. In global metabolite analysis (metabolomics and metabonomics) NMR, GC/MS and LC/MS are the main platforms for data generation. Multivariate projection methods (e.g. PCA, PLS and O-PLS) have been recognized as efficient tools for data analysis within subjects such as biology and chemistry due to their ability to provide interpretable models based on many, correlated variables. In global metabolite analysis, these methods have been successfully applied in areas such as toxicology, disease diagnosis and plant functional genomics.

This thesis describes the development of processing methods for the unbiased extraction of representative and predictive information from metabolic GC/MS and LC/MS data characterizing biofluids, e.g. plant extracts, urine and blood plasma. To allow the multivariate projections to detect and highlight differences between samples, the processing methods must extract a common set of descriptors from all samples while still retaining the metabolically relevant information in the data. In Papers I and II this was done by applying a hierarchical multivariate compression approach to both GC/MS and LC/MS data. In the study described in Paper III a hierarchical multivariate curve resolution strategy (H-MCR) was developed for simultaneously resolving multiple GC/MS samples into pure profiles. In Paper IV the H-MCR method was applied to a drug toxicity study in rats, where the method’s potential for biomarker detection and identification was exemplified. Finally, the H-MCR method was extended, as described in Paper V, allowing independent samples to be processed and predicted using a model based on an existing set of representative samples. The fact that these processing methods proved to be valid for predicting the properties of new independent samples indicates that it is now possible for global metabolite analysis to be extended beyond isolated studies. In addition, the results facilitate high-throughput analysis, because predicting the nature of samples is rapid compared to the actual processing. In summary, this research highlights the possibilities for using global metabolite analysis in diagnosis.

Place, publisher, year, edition, pages
Umeå: Kemi, 2005. 66 p.
Keyword [en]
Chemometrics, Curve Resolution, GC/MS, LC/MS, Metabolomics, Metabonomics, Multivariate Analysis and Multivariate Curve Resolution.
National Category
Organic Chemistry
Identifiers
URN: urn:nbn:se:umu:diva-663; ISBN: 91-7305-922-7; OAI: oai:DiVA.org:umu-663; DiVA: diva2:144165
Public defence
2006-01-27, KB3B1, KBC, Umeå Universitet, Umeå, 10:00 (English)
Available from: 2005-12-22 Created: 2005-12-22 Last updated: 2009-12-03 Bibliographically approved
List of papers
1. A strategy for identifying differences in large series of metabolomic samples analyzed by GC/MS
2004 (English) In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 76, no 6, 1738-1745 p. Article in journal (Refereed) Published
Abstract [en]

In metabolomics, the purpose is to identify and quantify all the metabolites in a biological system. Combined gas chromatography and mass spectrometry (GC/MS) is one of the most commonly used techniques in metabolomics together with 1H NMR, and it has been shown that more than 300 compounds can be distinguished with GC/MS after deconvolution of overlapping peaks. To avoid having to deconvolute all analyzed samples prior to multivariate analysis of the data, we have developed a strategy for rapid comparison of nonprocessed MS data files. The method includes baseline correction, alignment, time window determinations, alternating regression, PLS-DA, and identification of retention time windows in the chromatograms that explain the differences between the samples. Use of alternating regression also gives interpretable loadings, which retain the information provided by m/z values that vary between the samples in each retention time window. The method has been applied to plant extracts derived from leaves of different developmental stages and plants subjected to small changes in day length. The data show that the new method can detect differences between the samples and that it gives results comparable to those obtained when deconvolution is applied prior to the multivariate analysis. We suggest that this method can be used for rapid comparison of large sets of GC/MS data, thereby applying time-consuming deconvolution only to parts of the chromatograms that contribute to explain the differences between the samples.
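The compression step named in the abstract above can be illustrated with a minimal sketch of rank-one alternating regression on a single retention time window. This is not the authors' implementation: the function names, the matrix layout (rows are samples, columns are m/z channels), the flat starting vector and the fixed iteration count are all assumptions made for illustration.

```python
import math


def project(window, loading):
    """Score of each sample: its m/z row projected onto a unit-length loading."""
    return [sum(x * v for x, v in zip(row, loading)) for row in window]


def alternating_regression(window, n_iter=50):
    """Compress one retention time window into one score per sample.

    window: list of rows, one per sample, each a list of m/z intensities
    (assumed layout). Returns (scores, loading) with window[i][j] roughly
    equal to scores[i] * loading[j].
    """
    n_samples, n_channels = len(window), len(window[0])
    # Assumed starting point: a flat loading of unit length.
    loading = [1.0 / math.sqrt(n_channels)] * n_channels
    for _ in range(n_iter):
        # Regress every sample row onto the current loading.
        scores = project(window, loading)
        denom = sum(s * s for s in scores)
        if denom == 0.0:
            break  # empty window, nothing to model
        # Regress every m/z column onto the current scores.
        loading = [
            sum(window[i][j] * scores[i] for i in range(n_samples)) / denom
            for j in range(n_channels)
        ]
        # Remove the scale ambiguity by keeping the loading at unit length.
        norm = math.sqrt(sum(v * v for v in loading))
        loading = [v / norm for v in loading]
    return project(window, loading), loading
```

In this sketch the scores give one variable per window and sample for a later multivariate model such as PLS-DA, while the loading shows which m/z values carry the variation in that window.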

Place, publisher, year, edition, pages
Columbus, OH: American Chemical Society, 2004
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-4888 (URN) 10.1021/ac0352427 (DOI)
Available from: 2005-12-22 Created: 2005-12-22 Last updated: 2017-12-14
2. Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets
2005 (English) In: The Analyst, ISSN 0003-2654, E-ISSN 1364-5528, Vol. 130, no 5, 701-707 p. Article in journal (Other (popular science, discussion, etc.)) Published
Abstract [en]

LC/MS is an analytical technique that, owing to its high sensitivity, has become increasingly popular for generating metabolic signatures in biological samples and for building metabolic databases. However, to create robust and interpretable (transparent) multivariate models for comparing many samples, the data must fulfil three criteria: (i) each sample is characterized by the same number of variables, (ii) each of these variables is represented across all observations, and (iii) a variable in one sample has the same biological meaning, or represents the same metabolite, in all other samples. In addition, the resulting models must be able to make predictions for, e.g., related and independent samples characterized in the same way as the model samples. The method presented here involves the construction of a representative data set, including automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension and data compression by means of alternating regression, where the relevant metabolic variation is retained for further modelling using multivariate analysis. This approach allows the comparison of large numbers of samples based on their LC/MS metabolic profiles and also provides a means of interpreting the investigated biological system. This includes finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for predictions. The strategy was applied to a population study using urine samples from two cohorts, Shanxi (People's Republic of China) and Honolulu (USA). The results showed that evaluation of the extracted information using partial least squares discriminant analysis (PLS-DA) provided a robust, predictive and transparent model for the metabolic differences between the two populations. The findings suggest that this is a general approach for data handling, analysis, and evaluation of large metabolic LC/MS data sets.
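Criteria (i) and (ii) can be made concrete with a small sketch of how shared retention time windows give every sample the same variables. It is a deliberate simplification and an assumption, not the published procedure: each window is collapsed to a single summed intensity, whereas the abstract describes summing in the chromatographic dimension followed by compression with alternating regression. The function name and the peak representation are hypothetical.

```python
from bisect import bisect_right


def sum_into_windows(peaks, edges):
    """Sum peak intensities into shared retention time windows.

    peaks: list of (retention_time, intensity) pairs for one sample.
    edges: sorted window boundaries shared by all samples; window k
           covers the half-open interval [edges[k], edges[k + 1]).
    Returns one total per window, so every sample gets the same
    number of variables regardless of how many peaks it contains.
    """
    totals = [0.0] * (len(edges) - 1)
    for retention_time, intensity in peaks:
        k = bisect_right(edges, retention_time) - 1
        if 0 <= k < len(totals):  # peaks outside all windows are ignored
            totals[k] += intensity
    return totals


edges = [0.0, 1.0, 2.0, 3.0]
sample_a = [(0.2, 10.0), (0.8, 5.0), (2.5, 7.0)]
sample_b = [(1.1, 4.0)]
table = [sum_into_windows(s, edges) for s in (sample_a, sample_b)]
```

A window in which a sample has no peak is filled with zero rather than left missing, which is what keeps each variable represented across all observations. Criterion (iii) depends on the alignment step, which this sketch takes as already done.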

Keyword
MULTIVARIATE STATISTICAL-ANALYSIS, MASS-SPECTROMETRY, NMR-SPECTROSCOPY, SYSTEMS BIOLOGY, RAT URINE, METABONOMICS, PLS, COMPONENTS, REGRESSION, DIAGNOSIS
National Category
Biological Sciences
Identifiers
urn:nbn:se:umu:diva-13599 (URN) 10.1039/b501890k (DOI) 15852140 (PubMedID)
Available from: 2007-05-11 Created: 2007-05-11 Last updated: 2017-12-14
3. High-throughput data analysis for detecting and identifying differences between samples in GC/MS-based metabolomic analyses
2005 (English) In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 77, no 17, 5635-5642 p. Article in journal (Refereed) Published
Abstract [en]

In metabolomics, the objective is to identify differences in metabolite profiles between samples. A widely used tool in metabolomics investigations is gas chromatography-mass spectrometry (GC/MS). More than 400 compounds can be detected in a single analysis, if overlapping GC/MS peaks are deconvoluted. However, the deconvolution process is time-consuming and difficult to automate, and additional processing is needed in order to compare samples. Therefore, there is a need to improve and automate the data processing strategy for data generated in GC/MS-based metabolomics; if not, the processing step will be a major bottleneck for high-throughput analyses. Here we describe a new semiautomated strategy using a hierarchical multivariate curve resolution approach that processes all samples simultaneously. The presented strategy generates (after appropriate treatment, e.g., multivariate analysis) tables of all the detected metabolites that differ in relative concentrations between samples. The processing of 70 samples took a similar time to the GC/TOFMS analyses of the samples. The strategy has been validated using two different sets of samples: a complex mixture of standard compounds and Arabidopsis samples.

KeyWords Plus: CHROMATOGRAPHY MASS-SPECTROMETRY; PRINCIPAL COMPONENT ANALYSIS; SYSTEMS BIOLOGY; ARABIDOPSIS-THALIANA; CHEMOMETRIC ANALYSIS; 2-WAY DATA; MS; REGRESSION; RESOLUTION; ALIGNMENT

National Category
Biological Sciences
Identifiers
urn:nbn:se:umu:diva-13582 (URN) 10.1021/ac050601e (DOI) 16131076 (PubMedID)
Available from: 2007-09-14 Created: 2007-09-14 Last updated: 2017-12-14
4. Modeling of time dependent toxicological responses in urinary GC/MS data
(English) Article in journal (Refereed) Submitted
National Category
Biological Sciences
Identifiers
urn:nbn:se:umu:diva-4891 (URN)
Available from: 2005-12-22 Created: 2005-12-22 Last updated: 2013-03-19
5. Predictive metabolite profiling applying hierarchical multivariate curve resolution to GC-MS data: a potential tool for multi-parametric diagnosis
2006 (English) In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 5, no 6, 1407-1414 p. Article in journal (Refereed) Published
Abstract [en]

A method for predictive metabolite profiling based on resolution of GC-MS data followed by multivariate data analysis is presented and applied to three different biofluid data sets (rat urine, aspen leaf extracts, and human blood plasma). Hierarchical multivariate curve resolution (H-MCR) was used to simultaneously resolve the GC-MS data into pure profiles, describing the relative metabolite concentrations between samples, for multivariate analysis. Here, we present an extension of the H-MCR method allowing treatment of independent samples according to processing parameters estimated from a set of training samples. Predictions or inclusion of the new samples, based on their metabolite profiles, into an existing model could then be carried out, which is a requirement for a working application within, e.g., clinical diagnosis. Apart from allowing treatment and prediction of independent samples the proposed method also reduces the time for the curve resolution process since only a subset of representative samples have to be processed while the remaining samples can be treated according to the obtained processing parameters. The time required for resolving the 30 training samples in the rat urine example was approximately 13 h, while the treatment of the 30 test samples according to the training parameters required only approximately 30 s per sample (approximately 15 min in total). In addition, the presented results show that the suggested approach works for describing metabolic changes in different biofluids, indicating that this is a general approach for high-throughput predictive metabolite profiling, which could have important applications in areas such as plant functional genomics, drug toxicity, treatment efficacy and early disease diagnosis.
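The reason prediction is so much faster than resolution can be sketched as follows. The sketch assumes that the processing parameters kept from the training samples include fixed spectral profiles for each retention time window, and that a new sample is then described by an ordinary least-squares fit to those profiles. The abstract does not state this, so both the assumption and the function names are illustrative only.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def predict_amounts(spectrum, profiles):
    """Least-squares amount of each fixed profile in a new sample's spectrum.

    profiles: spectral profiles taken from the training samples (assumed
    linearly independent); spectrum: summed m/z intensities of the new
    sample in the same retention time window.
    """
    gram = [[sum(p * q for p, q in zip(pi, pj)) for pj in profiles] for pi in profiles]
    rhs = [sum(p * x for p, x in zip(pi, spectrum)) for pi in profiles]
    return solve_linear(gram, rhs)
```

Under this assumption each new sample costs one small linear solve per window and no iterative resolution, which fits the reported contrast between about 13 h for resolving 30 training samples and about 30 s per test sample (30 × 30 s = 15 min).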

Place, publisher, year, edition, pages
American Chemical Society, 2006
Keyword
Animals, Blood Proteins/*analysis, Data Interpretation; Statistical, Gas Chromatography-Mass Spectrometry, Humans, Laboratory Techniques and Procedures, Male, Multivariate Analysis, Plant Leaves/*chemistry, Proteome/*analysis, Rats, Urine/chemistry
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-11772 (URN) 10.1021/pr0600071 (DOI) 16739992 (PubMedID)
Available from: 2007-12-06 Created: 2007-12-06 Last updated: 2017-12-14
