Wold, Svante
Publications (10 of 48)
Wold, S. (2015). Chemometrics and Bruce: Some Fond Memories. In: 40 Years of Chemometrics – From Bruce Kowalski to the Future. Paper presented at Symposium on the Birth of Chemometrics - In Honor and Memory of Bruce Kowalski, OCT 01, 2013, Milwaukee, WI (pp. 1-13). Vol. 1199.
2015 (English). In: 40 Years of Chemometrics – From Bruce Kowalski to the Future, 2015, Vol. 1199, 1-13 p. Conference paper, Published paper (Refereed)
Abstract [en]

This chapter describes the transformation of a young physical organic chemist (SW, 1964), from a believer in first principles models to a middle-aged chemometrician (SW, 1974) promoting empirical and semiempirical "data driven, soft, analogy" models for the design of experiments and the analysis of the resulting data. This transformation was marked by a number of influential events, each tipping the balance towards the data driven, soft, analogy models until the point of no return in 1974. On June 10, 1974, Bruce and I, together with our research groups, joined forces and formed the Chemometrics Society (later renamed the International Chemometrics Society), and we took off into multidimensional space. This review of my personal scientific history, inspired and encouraged by Bruce, is illustrated by examples of method development driven by the necessity to solve specific problems and leading to data driven soft models, which, at least in my own eyes, were superior to the classical first principles approaches to the same problems. Bruce and I met at numerous conferences between 1975 and 1990, but after that, Bruce and I gradually slid out of the academic world, and now Bruce has taken his final step.

Series
ACS Symposium Series, ISSN 0097-6156 ; 1199
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:umu:diva-114927 (URN); 10.1021/bk-2015-1199 (DOI); 000367760300001 (); 978-0-8412-3097-2, 978-0-8412-3098-9 (ISBN)
Conference
Symposium on the Birth of Chemometrics - In Honor and Memory of Bruce Kowalski, OCT 01, 2013, Milwaukee, WI
Available from: 2016-05-03 Created: 2016-01-29 Last updated: 2016-05-03 Bibliographically approved
Eriksson, L., Trygg, J. & Wold, S. (2014). A chemometrics toolbox based on projections and latent variables. Journal of Chemometrics, 28(5), 332-346.
2014 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 5, 332-346 p. Article in journal (Refereed) Published
Abstract [en]

A personal view is given about the gradual development of projection methods (also called bilinear, latent variable, and more) and their use in chemometrics. We start with principal components analysis (PCA), which is the basis for more elaborate methods for more complex problems such as soft independent modeling of class analogy, partial least squares (PLS), hierarchical PCA and PLS, PLS-discriminant analysis, orthogonal projection to latent structures (OPLS), OPLS-discriminant analysis and more. From its start around 1970, this development was strongly influenced by Bruce Kowalski and his group in Seattle, and his realization that the multidimensional data profiles emerging from spectrometers, chromatographs, and other electronic instruments contained interesting information that was not recognized by the then-current one-variable-at-a-time approaches to chemical data analysis. This led to the adoption of what in statistics is called the data analytical approach, often also called the data driven approach, soft modeling, and more. This approach, combined with PCA and later PLS, turned out to work very well in the analysis of chemical data. This is because of the close correspondence between, on the one hand, the matrix decomposition at the heart of PCA and PLS and, on the other hand, the analogy concept on which so much of chemical theory and experimentation is based. This extends to numerical and conceptual stability and good approximation properties of these models. The development is informally summarized and illustrated by a few examples and anecdotes.

Place, publisher, year, edition, pages
John Wiley & Sons, 2014
Keyword
Chemometrics, Latent variables, OPLS, PLS, Projection methods
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-86943 (URN); 10.1002/cem.2581 (DOI); 000335520900003 ()
Available from: 2014-03-13 Created: 2014-03-13 Last updated: 2017-12-05 Bibliographically approved
Eriksson, L. & Wold, S. (2010). A graphical index of separation (GIOS) in multivariate modeling. Journal of Chemometrics, 24(11-12), 779-789.
2010 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 24, no 11-12, 779-789 p. Article in journal (Refereed) Published
Abstract [en]

We introduce a new measure for the importance of predictor variables, X, for the separation of two groups (classes) of observations. The measure is a Graphical Index of Separation (GIOS), and is, for each predictor, determined from the distribution of all possible pairs of observations with one from each group. GIOS is quantitative, intuitively simple and easy to interpret. The GIOS is straightforward to visualize in bivariate plots, and in line or bar plots for larger numbers of variables. The approach applies both to discriminant analyses such as LDA, SIMCA, PLS-DA, OPLS-DA and to quantitative modeling such as MLR, PLS and OPLS. In the latter case, the observations are first divided into two groups based on their response values, Y. The GIOS approach is illustrated by PLS-DA/OPLS-DA and SIMCA-classification of a number of multivariate data sets with few and many variables relative to the number of observations.

Place, publisher, year, edition, pages
John Wiley & Sons, 2010
Keyword
variable ranking, PLS-DA, LDA, PLS, visualization
National Category
Environmental Sciences; Ecology
Identifiers
urn:nbn:se:umu:diva-109571 (URN); 10.1002/cem.1372 (DOI); 000286291500017 ()
Available from: 2015-09-30 Created: 2015-09-30 Last updated: 2017-12-01 Bibliographically approved
Wold, S., Kettaneh-Wold, N., MacGregor, J. F. & Dunn, K. G. (2009). Batch Process Modeling and MSPC. In: Steven Brown, Romà Tauler, Beata Walczak (Ed.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, VOLS 1-4 (pp. A163-A197). AMSTERDAM: Elsevier.
2009 (English). In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, VOLS 1-4 / [ed] Steven Brown, Romà Tauler, Beata Walczak, AMSTERDAM: Elsevier, 2009, A163-A197 p. Chapter in book (Other academic)
Place, publisher, year, edition, pages
AMSTERDAM: Elsevier, 2009
National Category
Analytical Chemistry
Identifiers
urn:nbn:se:umu:diva-76141 (URN); 000311292900032 (); 978-0444-52702-8 (ISBN)
Available from: 2013-07-05 Created: 2013-07-04 Last updated: 2013-07-05 Bibliographically approved
Trygg, J., Wold, S. & Eriksson, L. (2009). Hierarchically Organizing Data Using a Partial Least Squares Analysis (PLS-Trees). 20090164171.
2009 (English). Patent (Other (popular science, discussion, etc.))
Abstract [en]

A method and system for partitioning (clustering) large amounts of data in a relatively short processing time. The method involves providing a first data matrix and a second data matrix where each of the first and second data matrices includes one or more variables, and a plurality of data points. The method also involves determining a first score from the first data matrix using a partial least squares (PLS) analysis or orthogonal PLS (OPLS) analysis and partitioning the first and second data matrices (e.g., row-wise) into a first group and a second group based on the sorted first score, the variance of the first data matrix, and a variance of the first and second groups relative to the variances of the first and second data matrices.

National Category
Algebra and Logic; Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-88204 (URN)
Patent
20090164171
Available from: 2014-04-25 Created: 2014-04-25 Last updated: 2016-02-29
Eriksson, L., Trygg, J. & Wold, S. (2009). PLS-trees®, a top-down clustering approach. Journal of Chemometrics, 23(11), 569-580.
2009 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 23, no 11, 569-580 p. Article in journal (Refereed) Published
Abstract [en]

A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees®, this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models as the basis for splitting the clusters, instead of the individual X-variables. The split of one cluster into two is made along the sorted first X-score (t(1)) of a PLS model of the cluster, but may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree, and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach to OPLS-Trees and trees based on hierarchical PLS or OPLS models with the variables divided in blocks depending on their type, are also mentioned. The possibility to greatly reduce the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by means of three examples. The first two examples are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyperspectral images of liver tissue for identifying different sources of variability in the liver samples.

Place, publisher, year, edition, pages
Chichester: John Wiley & Sons, 2009
Keyword
PLS-Trees, PLS, dendrogram, data mining, clustering, variable selection, outlier detection
National Category
Analytical Chemistry; Computer Engineering; Mathematics; Robotics
Identifiers
urn:nbn:se:umu:diva-115961 (URN); 10.1002/cem.1254 (DOI); 000273586400003 ()
Available from: 2016-02-29 Created: 2016-02-08 Last updated: 2018-01-10 Bibliographically approved
Eriksson, L., Trygg, J. & Wold, S. (2008). CV-ANOVA for significance testing of PLS and OPLS® models. Journal of Chemometrics, 22(11-12), 594-600.
2008 (English). In: Journal of Chemometrics, Vol. 22, no 11-12, 594-600 p. Article in journal (Refereed) Published
Abstract [en]

This report describes significance testing for PLS and OPLS® (orthogonal PLS) models. The testing is applicable to single-Y cases and is based on ANOVA of the cross-validated residuals (CV-ANOVA). Two variants of the CV-ANOVA are introduced. The first is based on the cross-validated predictive residuals of the PLS or OPLS model while the second works with the cross-validated predictive score values of the OPLS model. The two CV-ANOVA diagnostics are shown to work well in those cases where PLS and OPLS work well, that is, for data with many and correlated variables, missing data, etc. The utility of the CV-ANOVA diagnostic is demonstrated using three datasets related to (i) the monitoring of an industrial de-inking process; (ii) a pharmaceutical QSAR problem and (iii) a multivariate calibration application from a sugar refinery. Copyright © 2008 John Wiley & Sons, Ltd.

Keyword
significance testing, OPLS, cross-validation, ANOVA, predictive score
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-11492 (URN); 10.1002/cem.1187 (DOI)
Note

Special Issue: Proceedings of the 10th Scandinavian Symposium on Chemometrics, SSC10

Available from: 2009-01-12 Created: 2009-01-12 Last updated: 2013-02-28
Wold, S., Høy, M., Martens, H., Trygg, J., Westad, F., MacGregor, J. & Wise, B. M. (2008). The PLS model space revisited. Journal of Chemometrics, 23(2), 67-68.
2008 (English). In: Journal of Chemometrics, Vol. 23, no 2, 67-68 p. Article in journal (Other (popular science, discussion, etc.)) Published
Abstract [en]

Pell, Ramos and Manne (PRM) in a recent article in this journal claim that the conventional PLS algorithm with orthogonal scores has an inherent inconsistency in that it uses different model spaces for calculating the prediction model coefficients and for calculating the X-space model and its residuals [1]. We disagree with PRM. All PLS model scores, residuals, coefficients, etc., obtained by the conventional PLS algorithm do come from the same underlying latent variable (LV) model, and not from different models or model spaces as PRM suggest. PRM have simply posed a different model with different assumptions and obtained slightly different results, as should have been expected.

Place, publisher, year, edition, pages
John Wiley & Sons, Ltd, 2008
Keyword
partial least squares, latent variable regression
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:umu:diva-11493 (URN); 10.1002/cem.1171 (DOI)
Available from: 2009-01-13 Created: 2009-01-13 Last updated: 2018-01-13
Wiklund, S., Nilsson, D., Eriksson, L., Sjöström, M., Wold, S. & Faber, K. (2007). A randomization test for PLS component selection. Journal of Chemometrics, 21(10-11), 427-439.
2007 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 21, no 10-11, 427-439 p. Article in journal (Refereed) Published
Abstract [en]

During the last two decades, a number of methods have been developed and evaluated for selecting the optimal number of components in a PLS model. In this paper, a new method is introduced that is based on a randomization test. The advantage of using a randomization test is that in contrast to cross validation (CV), it requires no exclusion of data, thus avoiding problems related to data exclusion, for example in designed experiments. The method is tested using simulated data sets for which the true dimensionality is clearly defined and also compared to regularly used methods for 10 real data sets. The randomization test works as a good statistical selection tool in combination with other selection rules. It also works as an indicator when the data require a pre-treatment.

Place, publisher, year, edition, pages
Chichester: Wiley & Sons, 2007
Keyword
randomization test, permutation test, component selection, factor selection, latent variable selection
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-2653 (URN); 10.1002/cem.1086 (DOI)
Available from: 2007-10-19 Created: 2007-10-19 Last updated: 2017-12-14 Bibliographically approved
Eriksson, L., Kettaneh-Wold, N., Trygg, J., Wikström, C. & Wold, S. (2006). Multi- and Megavariate Data Analysis: Part I: Basic Principles and Applications. Umetrics Inc.
2006 (English). Book (Refereed)
Place, publisher, year, edition, pages
Umetrics Inc, 2006. 425 p.
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:umu:diva-12895 (URN); 9197373028 (ISBN)
Note

Excerpt from the book: “The flexibility of projection methods have made them useful also for the analysis and modelling of messy and complicated data; these methods are increasingly used in a wide range of industrial applications in PARD (processes, administration, research, and development). These include process monitoring and early fault detection and classification, process analytical technology (PAT), data mining and integration, quality control, and relationships between chemical composition/structure and chemical/biological properties. Multivariate calibration, which models the relationship between a multitude of signals measured on various samples and the concentrations of constituents of these samples, is another show-case of MVDA and projections.”

Available from: 2008-04-01 Created: 2008-04-01 Last updated: 2018-01-13