umu.se Publications
1 - 48 of 48
  • 1.
    Andersson, Per M
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Lundstedt, Torbjörn
    Comparison between physicochemical and calculated molecular descriptors. 2000. In: Journal of Chemometrics: Special Issue: Proceedings of the SSC6, August 1999, HiT/TF, Norway. Issue Edited by Kim Esbensen, Vol. 14, no 5-6, 629-42 p. Article in journal (Refereed)
    Abstract [en]

    It has earlier been proven that measured physicochemical properties are useful in the selection of building blocks for combinatorial chemistry as well as for investigation of the scope and limitations of organic reactions. However, measured physicochemical properties are only available for small subsets of reagents, starting materials or building blocks; therefore it is necessary to use calculated descriptors, and it is essential that the descriptors are relevant. The objective was to investigate whether three different descriptor data sets contained similar information about the chemical structure, with the major aim of investigating whether calculated descriptors contain information similar to that in experimental data. A total of 205 heterogeneous primary amines were characterized using three different data sets of molecular descriptor variables. The first set consisted of four physicochemical variables compiled from the literature and commercially available chemicals in chemical catalogues. From these four descriptors together with molecular weight, three additional descriptors could be calculated, resulting in a total of eight descriptor variables in the first data set. The second data set consisted of 81 calculated molecular descriptor variables relating to size, connectivity, atom count, topology and electrotopology indices. The third data set consisted of 10 semi-empirical variables (AM1). All the calculated variables were generated using the software Tsar 3.11. The descriptor variable sets were compared using principal component analysis (PCA) and partial least squares projections to latent structures (PLS). The results show that the different descriptor sets do contain similar latent information and that the different types of calculated variables correlate well with the experimental data, making them suitable for use in e.g. combinatorial library design.

  • 2.
    Andersson, Per M
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Lundstedt, Torbjörn
    Strategies for subset selection of parts of an in-house chemical library. 2001. In: Journal of Chemometrics, Vol. 15, no 4, 353-69 p. Article in journal (Refereed)
    Abstract [en]

    When a company decides to perform biological testing of its in-house library, i.e. compounds which have been synthesized or purchased over the years, it is usually not feasible or desirable to test all of them using e.g. high-throughput screening (HTS). The limitation is the usually high number of compounds to test (10^4-10^6), leading to practical limitations and high costs in terms of both material costs and disposal considerations. Therefore it is often desirable to make a selection of which compounds to include in the biological testing. A challenge is how to make this selection in order to cover the structural space of the in-house library as well as possible. Here we present and discuss different selection strategies based mainly on statistical molecular design (SMD). These methods require different prior information about the compounds under investigation, e.g. characterization of the chemical structure, affinity/biological activity data or neither of these. Which method to use is largely problem-dependent, i.e. the composition and origin of the library, and hence the structural space, are of great importance. Chemical and biological knowledge about the system under investigation should as far as possible be considered when making the final decision on which method to apply.

  • 3. Artursson, Tom
    et al.
    Hagman, Anders
    Björk, Seth
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Jacobsson, Sven P
    Study of Preprocessing Methods for the Determination of Crystalline Phases in Binary Mixtures of Drug Substances by X-ray Powder Diffraction and Multivariate Calibration. 2000. In: Applied Spectroscopy, ISSN 0003-7028, E-ISSN 1943-3530, Vol. 54, no 8, 272A-301A p. Article in journal (Refereed)
    Abstract [en]

    In this paper, various preprocessing methods were tested on data generated by X-ray powder diffraction (XRPD) in order to enhance the partial least-squares (PLS) regression modeling performance. The preprocessing methods examined were 22 different discrete wavelet transforms, Fourier transform, Savitzky-Golay, orthogonal signal correction (OSC), and combinations of wavelet transform and OSC, and Fourier transform and OSC. Root mean square error of prediction (RMSEP) of an independent test set was used to measure the performance of the various preprocessing methods. The best PLS model was obtained with a wavelet transform (Symmlet 8), which at the same time compressed the data set by a factor of 9.5. With the use of wavelet and X-ray powder diffraction, concentrations of less than 10% of one crystal form could be detected in a binary mixture. The linear range was found to be 10-70% of the crystalline form of phenacetin, although semiquantitative work could be carried out down to a level of approximately 2%. Furthermore, the wavelet-pretreated models were able to handle admixtures and deliberately added noise.
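
    The performance measure named in this abstract, RMSEP, is the square root of the mean squared difference between reference and predicted values on the test set. A minimal sketch follows; the function name and the example numbers are illustrative assumptions, not data from the paper.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an independent test set."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical reference concentrations (%) and model predictions
y_test = [10.0, 30.0, 50.0, 70.0]
y_hat = [12.0, 28.0, 53.0, 69.0]
score = rmsep(y_test, y_hat)
```

    A lower RMSEP on the same test set indicates a better-performing preprocessing method.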

  • 4.
    Berglund, Anders
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh, Nouna
    Umetrics Inc., Kinnelon, NJ, USA.
    Uppgård, Lise-Lott
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Bendwell, Nancy
    Tembec Inc., Temiscaming, Quebec, Canada.
    Cameron, Dave R
    Tembec Inc., Temiscaming, Quebec, Canada.
    The GIFI approach to non-linear PLS modeling. 2001. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 15, no 4, 321-36 p. Article in journal (Refereed)
    Abstract [en]

    The GIFI approach to non-linear modeling involves the transformation of quantitative variables to a set of 1/0 dummies in a similar manner to the way qualitative variables are coded. This is followed by analyzing the sets of 1/0 dummies by principal component analysis, multiple regression or, as discussed here, PLS. The patterns of the resulting coefficients indicate the nature of the non-linearities in the data. Here the potential uses and limitations of PLS regression, in combination with four variants of GIFI coding, are investigated using both simulated and empirical data sets.
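
    The coding step described above can be illustrated with a short sketch. It assumes equal-width bins and one dummy column per bin; the abstract mentions four coding variants without detailing them, so this shows only the simplest possibility, and `gifi_dummies` is a hypothetical helper name.

```python
def gifi_dummies(values, n_bins=3):
    """Code one quantitative variable as 1/0 dummies, one column per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant variable: everything falls in the first bin
    rows = []
    for v in values:
        # Equal-width binning (an assumption); the maximum is clamped
        # into the last bin instead of opening a new one.
        b = min(int((v - lo) / width), n_bins - 1)
        rows.append([1 if j == b else 0 for j in range(n_bins)])
    return rows

x = [0.0, 1.0, 2.5, 4.0, 6.0]
D = gifi_dummies(x, n_bins=3)
```

    Each row of `D` contains exactly one 1, so a regression coefficient per dummy column traces out a step-wise, possibly non-linear, response profile for the original variable.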

  • 5. Champagne, M
    et al.
    Meglen, B
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh-Wold, Nouna
    The use of orthogonal signal correction to improve NIR readings of pulp fibre properties. 2001. In: Pulp & Paper-Canada, ISSN 0316-4004, Vol. 102, no 4, 41-3 p. Article in journal (Refereed)
    Abstract [en]

    In 1999 Tembec Industries and the National Renewable Energy Laboratory worked together to develop a methodology for using near-infrared (NIR) technology to read the in-house pulp fibre quality properties Q99 and Q97. The initial results with dry samples of pulp were encouraging. The results for the wet samples were initially disappointing using standard chemometric techniques. Svante Wold developed a new chemometric method called orthogonal signal correction (OSC), which was used to obtain a good correction of Q99 in the wet pulp samples.

  • 6.
    Dåbakk, Eigil
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Nilsson, Mats
    Geladi, Paul
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Renberg, Ingemar
    Umeå University, Faculty of Science and Technology, Department of Ecology and Environmental Sciences.
    Inferring lake water chemistry from filtered seston using NIR spectrometry. 2000. In: Water Research, Vol. 34, no 5, 1666-72 p. Article in journal (Refereed)
    Abstract [en]

    Near-infrared spectrometry (NIR) is a rapid, inexpensive and reagent-free technique, widely used in industry in areas such as quality control and process management. The technique has great potential for environmental monitoring of aqueous systems. This study assesses relationships, using PLS regression, between NIR spectra of seston collected on glass fibre filters and the following measured lake water parameters: total organic carbon (TOC), total phosphorus (TP), Abs420 and pH. Water samples were collected from 271 oligotrophic lakes during autumn 1995. The predictive model for TOC explained 68% of the variance (SEP=2.1 mg L⁻¹, range 14.9 mg L⁻¹), and that for colour 71% (SEP=0.04 A, range 0.36 A), while the explained variances for pH and TP were 72% (SEP=0.36 pH units, range 3.13 pH units) and 45% (SEP=4 μg L⁻¹, range 41 μg L⁻¹), respectively. A model correlating NIR spectra and the actual amount of phosphorus in the seston captured on filters explained 86% of the variance (SEP=0.044 μg/filter, range 0.47). Several pretreatments and regression techniques were used in an attempt to enhance modeling performance. However, straightforward PLS on raw data performed best in all cases.

  • 7. Eriksson, L.
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    A chemometrics toolbox based on projections and latent variables. 2014. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 5, 332-346 p. Article in journal (Refereed)
    Abstract [en]

    A personal view is given about the gradual development of projection methods (also called bilinear, latent variable, and more) and their use in chemometrics. We start with principal components analysis (PCA), which is the basis for more elaborate methods for more complex problems such as soft independent modeling of class analogy, partial least squares (PLS), hierarchical PCA and PLS, PLS-discriminant analysis, orthogonal projection to latent structures (OPLS), OPLS-discriminant analysis and more. From its start around 1970, this development was strongly influenced by Bruce Kowalski and his group in Seattle, and his realization that the multidimensional data profiles emerging from spectrometers, chromatographs, and other electronic instruments contained interesting information that was not recognized by the then-current one-variable-at-a-time approaches to chemical data analysis. This led to the adoption of what in statistics is called the data analytical approach, often also called the data-driven approach, soft modeling, and more. This approach, combined with PCA and later PLS, turned out to work very well in the analysis of chemical data. This is because of the close correspondence between, on the one hand, the matrix decomposition at the heart of PCA and PLS and, on the other hand, the analogy concept on which so much of chemical theory and experimentation are based. This extends to numerical and conceptual stability and good approximation properties of these models. The development is informally summarized and described and illustrated by a few examples and anecdotes.

  • 8. Eriksson, Lennart
    et al.
    Andersson, Patrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Johansson, Erik
    Tysklind, Mats
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sandberg, Maria
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    The Constrained Principal Property (CPP) Space in QSAR: Directional and Non-Directional Modelling Approaches. 2000. In: Molecular Modeling and Prediction of Bioactivity, Kluwer Academic/Plenum Publishers, 2000, 65-70 p. Chapter in book (Other academic)
  • 9. Eriksson, Lennart
    et al.
    Antti, Henrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Gottfries, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Holmes, Elaine
    Johansson, Erik
    Lindgren, Fredrik
    Long, Ingrid
    Lundstedt, Torbjörn
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Using chemometrics for navigating in the large data sets of genomics, proteomics, and metabonomics (gpm). 2004. In: Analytical and Bioanalytical Chemistry, ISSN 1618-2642 (Print) 1618-2650 (Online), Vol. 380, no 3, 419-29 p. Article in journal (Refereed)
    Abstract [en]

    This article describes the applicability of multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares (PLS) projections to latent structures, to the large-volume high-density data structures obtained within genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. Three examples are used as illustrations: the first example is a genomics data set and involves modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second example contains NMR-metabonomics data, measured on urine samples of male rats treated with either of the drugs chloroquine or amiodarone. The third and last data set describes sequence-function classification studies in a set of G-protein-coupled receptors using hierarchical PCA.

  • 10. Eriksson, Lennart
    et al.
    Damborsky, J
    Earll, M
    Johansson, Erik
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Three-block bi-focal PLS (3BIF-PLS) and its application in QSAR. 2004. In: SAR and QSAR in Environmental Research, ISSN 1062-936X, Vol. 15, no 5 & 6, 481-99 p. Article in journal (Refereed)
    Abstract [en]

    When X and Y are multivariate, the two-block partial least squares (PLS) method is often used. In this paper, we outline an extension addressing a special case of the three-block (X/Y/Z) problem, where Z sits "under" Y. We have called this approach three-block bi-focal PLS (3BIF-PLS). It views the X/Y relationship as the dominant problem, and seeks to use the additional information in Z in order to improve the interpretation of the Y-part of the X/Y association. Two data sets are used to illustrate 3BIF-PLS. Example I relates to single point mutants of haloalkane dehalogenase from Sphingomonas paucimobilis UT26 and their ability to transform halogenated hydrocarbons, some of which are found as organic pollutants in soil. Example II deals with soil remediation capability of bacteria. Whole bacterial communities are monitored over time using "DNA-fingerprinting" technology to see how pollution affects population composition. Since the data sets are large, hierarchical multivariate modelling is invoked to compress data prior to 3BIF-PLS analysis. It is concluded that the 3BIF-PLS approach works well. The paper contains a discussion of pros and cons of the method, and hints at further developmental opportunities.

  • 11. Eriksson, Lennart
    et al.
    Gottfries, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Johansson, Erik
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Time-resolved QSAR: an approach to PLS modelling of three-way biological data. 2004. In: Chemometrics and Intelligent Laboratory Systems, Vol. 73, no 1, 73-84 p. Article in journal (Refereed)
    Abstract [en]

    This paper outlines a novel approach to the analysis of three-way Y-data in quantitative structure–activity relationship (QSAR) modelling. The new method represents a modification of an existing approach for multivariate modelling of batch process data. It is based on unfolding the three-way Y-matrix into a two-way matrix according to a sequential order of an external variable. In QSAR, time, pH, or temperature at which the biological data were gathered, are conceivably such external variables. Thus, unfolding can be done differently depending on the objective of the investigation, thereby shifting the focus of the QSAR analysis. The ensuing multivariate data analysis uses two levels of modelling. (1) On the lower (observation) level a projections to latent structures (PLS) model is developed between the unfolded biological data and the external variable. This model will identify compounds with biological data being sensitive to changes in the external variable (like time, pH, or temperature). (2) The scores of the lower level model are then re-arranged to enable the upper (QSAR) level model. In this model, a battery of structure descriptors (X) is related to the Y-matrix of scores of the lower level model. As an example, a series of 35 compounds and their anti-microbial activity towards the bacterial strain Escherichia coli CCM2260 is used. This biological activity has been determined at different times (2 to 10 h) and pH-values (pH 5.6 to 8.0).
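
    The unfolding step can be sketched as follows. This is one plausible reading of the abstract, assuming the three-way array is indexed as compound x external-variable level x response; the function name, the layout, and the numbers are illustrative assumptions rather than the authors' implementation.

```python
def unfold_by_external(Y3, external):
    """Unfold Y3[compound][level][response] into a two-way matrix.

    Each row of the result is one (compound, level) combination, kept in
    the sequential order of the external variable within each compound.
    """
    rows, ext, compound_idx = [], [], []
    for i, compound in enumerate(Y3):
        for level, responses in zip(external, compound):
            rows.append(list(responses))
            ext.append(level)        # external variable value for this row
            compound_idx.append(i)   # keeps track of which compound the row belongs to
    return rows, ext, compound_idx

# 2 hypothetical compounds x 3 time points (h) x 2 pH values
Y3 = [
    [[0.1, 0.2], [0.3, 0.5], [0.6, 0.9]],
    [[0.0, 0.1], [0.1, 0.1], [0.2, 0.2]],
]
times = [2, 6, 10]
Y2, t, compound_idx = unfold_by_external(Y3, times)
```

    Here time is the external variable, so `Y2` and `t` would feed the lower-level model; unfolding along pH instead only requires reordering the axes of `Y3`, which is how the focus of the analysis can be shifted.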

  • 12. Eriksson, Lennart
    et al.
    Johansson, Erik
    Kettaneh-Wold, Nouna
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wikström, Conny
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multi- and Megavariate Data Analysis: Part II: Advanced Applications and Method Extensions. 2006. Book (Refereed)
    Abstract [en]

    This second volume has two parts, the first with specialized applications of multi- and mega-variate analysis, namely:

    QSAR (quantitative structure-activity relationships) describes how series of molecular structures can be translated to quantitative data and how these data are then used to model and predict biological activity measurements made on the corresponding molecules. Chapters on how the QSAR concept applies in peptide QSAR, lead finding and optimization, combinatorial chemistry, and chem- and bio-informatics are included.

    The multi- and megavariate analysis of “omics” data, i.e., data from metabonomics, proteomics, genomics and other areas, has a special chapter.

    Then follow six chapters on extensions of the basic projection methods (PCA and PLS):

    Orthogonal PLS (OPLS) showing how a PLS model can be “rotated” so that all y-related information appears in the first component, which facilitates the model interpretation.

    Hierarchical modeling, both PC and PLS, allowing variables of different types to be handled in separate blocks, which greatly simplifies the handling of datasets with very many variables.

    Non-linear PLS describes various approaches to the modeling of non-linear relationships between predictors X and responses Y.

    The Image Analysis chapter shows how multivariate analysis applies to the analysis of series of digital images.

    Data Mining and Integration has a discussion of how to get useful information out of large and complicated data sets, and how to manage and organize data in complex investigations.

    The second volume ends with a chapter on preference and sensory data, followed by an appendix summarizing the multivariate approach, statistical notes, and references.

  • 13. Eriksson, Lennart
    et al.
    Johansson, Erik
    Lindgren, Fredrik
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Megavariate analysis of hierarchical QSAR data. 2002. In: Journal of Computer-Aided Molecular Design, Vol. 16, no 10, 711-26 p. Article in journal (Refereed)
    Abstract [en]

    Multivariate PCA- and PLS-models involving many variables are often difficult to interpret, because plots and lists of loadings, coefficients, VIPs, etc, rapidly become messy and hard to overview. There may then be a strong temptation to eliminate variables to obtain a smaller data set. Such a reduction of variables, however, often removes information and makes the modelling efforts less reliable. Model interpretation may be misleading and predictive power may deteriorate.

    A better alternative is usually to partition the variables into blocks of logically related variables and apply hierarchical data analysis. Such blocked data may be analyzed by PCA and PLS. This modelling forms the base-level of the hierarchical modelling set-up. On the base-level in-depth information is extracted for the different blocks. The score vectors formed on the base-level, here called 'super variables', may be linked together in new matrices on the top-level. On the top-level superficial relationships between the X- and the Y-data are investigated.

    In this paper the basic principles of hierarchical modelling by means of PCA and PLS are reviewed. One objective of the paper is to disseminate this concept to a broader QSAR audience. The hierarchical methods are used to analyze a set of 10 haloalkanes for which K = 30 chemical descriptors and M = 255 biological responses have been gathered. Due to the complexity of the biological data, they are sub-divided in four blocks. All the modelling steps on the base-level and the top-level are reported and the final QSAR model is interpreted thoroughly.
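
    The two-level set-up can be sketched in a few lines. The sketch assumes one principal component per block, computed by a simple NIPALS-style power iteration on mean-centred data; the helper names and the toy blocks are hypothetical, and a real analysis would typically also scale the variables and choose the number of components by cross-validation.

```python
import math

def center(X):
    """Mean-centre each column of a row-major matrix."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    return [[row[j] - means[j] for j in range(k)] for row in X]

def first_pc_scores(X, n_iter=200):
    """Scores of the first principal component (base-level block model)."""
    Xc = center(X)
    k = len(Xc[0])
    p = [1.0 / math.sqrt(k)] * k  # starting loading vector (an assumption)
    for _ in range(n_iter):
        t = [sum(x * pj for x, pj in zip(row, p)) for row in Xc]
        p = [sum(t[i] * Xc[i][j] for i in range(len(Xc))) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    return [sum(x * pj for x, pj in zip(row, p)) for row in Xc]

# Two hypothetical variable blocks measured on the same four observations
block_a = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
block_b = [[0.5, 1.0, 0.2], [0.1, 0.9, 0.4], [0.9, 1.2, 0.1], [0.4, 1.1, 0.3]]

# Top-level matrix: one 'super variable' (score column) per block
super_vars = [list(r) for r in zip(first_pc_scores(block_a),
                                   first_pc_scores(block_b))]
```

    The matrix `super_vars` has one row per observation and one column per block, and it is this compact matrix that would be modelled on the top level.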

  • 14. Eriksson, Lennart
    et al.
    Johansson, Erik
    Lindgren, Fredrik
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    GIFI-PLS: Modeling of Non-Linearities and Discontinuities in QSAR. 2000. In: QSAR, Vol. 19, no 4, 345-55 p. Article in journal (Refereed)
    Abstract [en]

    This paper introduces to the QSAR community a novel method for modeling and understanding non-linear relationships between biological potency and chemical structure properties of molecules. The approach, GIFI-PLS, is based on "binning" of quantitative X-variables into categorical variables. Each categorical variable is then expanded into a set of linked 1/0 dummy variables, which enable modeling of non-linearity. By way of four QSAR data sets, it is demonstrated that GIFI-PLS is useful for modeling of non-linearity and discontinuity in QSAR, and that the predictive power of a QSAR model may improve.

  • 15. Eriksson, Lennart
    et al.
    Johansson, Erik
    Müller, Martin
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    On the selection of the training set in environmental QSAR analysis when compounds are clustered. 2000. In: Journal of Chemometrics, Vol. 14, no 5-6, 599-616 p. Article in journal (Refereed)
    Abstract [en]

    In QSAR analysis in environmental sciences, adverse effects of chemicals released to the environment are modelled and predicted as a function of the chemical properties of the pollutants. Usually the set of compounds under study contains several classes of substances, i.e. a more or less strongly clustered set. It is then necessary to ensure that the selected training set comprises compounds representing all those chemical classes. Multivariate design in the principal properties of the compound classes is usually appropriate for selecting a meaningful training set. However, with clustered data, often seen in environmental chemistry and toxicology, a single multivariate design may be suboptimal because of the risk of ignoring small classes with few members and only selecting training set compounds from the largest classes. We recently proposed a procedure for training set selection that recognizes clustering. In this approach, when non-selective biological or environmental responses are modelled, local multivariate designs are constructed within each cluster (class). The chosen compounds arising from the local designs are finally united in the overall training set, which thus will contain members from all clusters. The proposed strategy is here further tested and elaborated by applying it to a series of 351 chemical substances for which the soil sorption coefficient is available. These compounds are divided into 14 classes containing between 10 and 52 members. The training set selection is discussed, followed by multivariate QSAR modelling, model interpretation and predictions for the test set. Various types of statistical experimental designs are tested during the training set selection phase.

  • 16. Eriksson, Lennart
    et al.
    Kettaneh-Wold, Nouna
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wikström, Conny
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multi- and Megavariate Data Analysis: Part I: Basic Principles and Applications. 2006. Book (Refereed)
  • 17. Eriksson, Lennart
    et al.
    Toft, Marianne
    Johansson, Erik
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Separating Y-predictive and Y-orthogonal variation in multi-block spectral data. 2006. In: Journal of Chemometrics, Vol. 20, 352-61 p. Article in journal (Refereed)
    Abstract [en]

    Spectral data (X) may contain (a) variation that is correlated to concentrations or properties (Y) of samples and (b) variation that is unrelated to the same Y. This paper outlines an approach by which both such sources of variation may be resolved. The approach is based on a combination of hierarchical modelling and orthogonal partial least squares (OPLS). OPLS is first used at the base hierarchical level. The output is a labelling of the resulting score vectors as representing Y-predictive or Y-orthogonal variation. OPLS is then also used at the top hierarchical level together with principal components analysis (PCA). With PCA the Y-orthogonal X-variation is analysed and interpreted. With OPLS the Y-predictive X-variation is examined. The applicability of the proposed strategy is illustrated using one multi-block spectral data set.

  • 18. Eriksson, Lennart
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Johansson, Erik
    Bro, Rasmus
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Orthogonal signal correction, wavelet analysis, and multivariate calibration of complicated process fluorescence data. 2000. In: Analytica Chimica Acta, Vol. 420, no 2, 181-95 p. Article in journal (Refereed)
    Abstract [en]

    In this paper, multivariate calibration of complicated process fluorescence data is presented. Two data sets related to the production of white sugar are investigated. The first data set comprises 106 observations and 571 spectral variables, and the second data set 268 observations and 3997 spectral variables. In both applications, a single response, ash content, is modelled and predicted as a function of the spectral variables. Both data sets contain certain features making multivariate calibration efforts non-trivial. The objective is to show how principal component analysis (PCA) and partial least squares (PLS) regression can be used to overview the data sets and to establish predictively sound regression models. It is shown how a recently developed technique for signal filtering, orthogonal signal correction (OSC), can be applied in multivariate calibration to enhance predictive power. In addition, signal compression is tested on the larger data set using wavelet analysis. It is demonstrated that a compression down to 4% of the original matrix size - in the variable direction - is possible without loss of predictive power. It is concluded that the combination of OSC for pre-processing and wavelet analysis for compression of spectral data is promising for future use.

  • 19. Eriksson, Lennart
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    CV-ANOVA for significance testing of PLS and OPLS® models. 2008. In: Journal of Chemometrics, Vol. 22, no 11-12, 594-600 p. Article in journal (Refereed)
    Abstract [en]

    This report describes significance testing for PLS and OPLS® (orthogonal PLS) models. The testing is applicable to single-Y cases and is based on ANOVA of the cross-validated residuals (CV-ANOVA). Two variants of the CV-ANOVA are introduced. The first is based on the cross-validated predictive residuals of the PLS or OPLS model while the second works with the cross-validated predictive score values of the OPLS model. The two CV-ANOVA diagnostics are shown to work well in those cases where PLS and OPLS work well, that is, for data with many and correlated variables, missing data, etc. The utility of the CV-ANOVA diagnostic is demonstrated using three datasets related to (i) the monitoring of an industrial de-inking process; (ii) a pharmaceutical QSAR problem and (iii) a multivariate calibration application from a sugar refinery.

  • 20. Eriksson, Lennart
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Umetrics Inc., 42 Pine Hill Rd, Hollis, NH 03049, USA.
    PLS-Trees®, a top-down clustering approach. 2009. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 23, no 11, 569-580 p. Article in journal (Refereed)
    Abstract [en]

    A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees®, this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models as the basis for splitting the clusters, instead of the individual X-variables. The split of one cluster into two is made along the sorted first X-score (t(1)) of a PLS model of the cluster, but may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree, and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach to OPLS-Trees and trees based on hierarchical PLS or OPLS models with the variables divided in blocks depending on their type, are also mentioned. The possibility to greatly reduce the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by means of three examples. The first two examples are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyperspectral images of liver tissue for identifying different sources of variability in the liver samples.

  • 21. Eriksson, Lennart
    et al.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    A graphical index of separation (GIOS) in multivariate modeling. 2010. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 24, no 11-12, 779-789 p. Article in journal (Refereed)
    Abstract [en]

    We introduce a new measure for the importance of predictor variables, X, for the separation of two groups (classes) of observations. The measure is a Graphical Index of Separation (GIOS), and is, for each predictor, determined from the distribution of all possible pairs of observations with one from each group. GIOS is quantitative, intuitively simple and easy to interpret. The GIOS is straightforward to visualize in bivariate plots, and in line or bar plots for larger numbers of variables. The approach applies both to discriminant analyses such as LDA, SIMCA, PLS-DA, OPLS-DA and to quantitative modeling such as MLR, PLS and OPLS. In the latter case, the observations are first divided into two groups based on their response values, Y. The GIOS approach is illustrated by PLS-DA/OPLS-DA and SIMCA-classification of a number of multivariate data sets with few and many variables relative to the number of observations.

  • 22. Eriksson, Lennart
    et al.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multivariate analysis of congruent images (MACI). 2005. In: Journal of Chemometrics, Vol. 19, no 5-7, 393-403 p. Article in journal (Refereed)
    Abstract [en]

    The multivariate analysis of congruent images (MACI) is discussed. Here, each image represents one observation and the data set contains a set of congruent images. With congruent images we mean a set of images, properly pre-processed, oriented and aligned, so that each data element (feature, pixel) corresponds to the same element across all images. An example may be a set of frames from a fixed video camera looking at a stable process. The purpose of a MACI is to find and express patterns over a set of images for the purpose of classification or quantitative regression-like relationships. This is in contrast to standard image analysis, which is usually concerned with a single image and the identification of parts of the image, for example tumour tissue versus normal. We also extend MACI to the case with a set of images that initially are not fully congruent, but are made so by the use of wavelet analysis and the distributions of the wavelet coefficients. Thus, the resulting description forms a set of congruent vectors amenable to multivariate data analysis. The MACI approach will be illustrated by four data sets, three easy-to-understand tutorial image data sets and one industrial image data set relating to quality control of steel rolls.

  • 23. Johansson, Erik
    et al.
    Eriksson, Lennart
    Sandberg, Maria
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    QSAR Model Validation. 2000. In: Molecular Modeling and Prediction of Bioactivity, Kluwer Academic/Plenum Publishers, New York, 2000, 271-272 p. Chapter in book (Other (popular science, discussion, etc.))
    Abstract [en]

    The book covers the challenging process from lead finding to drug candidates. Topics include new developments in chemometrics and rational molecular design as well as different aspects of structure representation, knowledge-based approaches to structure identification, and information handling.

  • 24. Kettaneh, Nouna
    et al.
    Berglund, Anders
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    PCA and PLS with very large data sets. 2005. In: Computational Statistics & Data Analysis, Vol. 48, no 1, 69-85 p. Article in journal (Refereed)
    Abstract [en]

    Chemometrics was started around 30 years ago to cope with the rapidly increasing volumes of data produced in chemical laboratories. A multivariate approach based on projections—PCA and PLS—was developed that adequately solved many of the problems at hand. However, with the further increase in the size of our data sets seen today in all fields of science and technology, we start to see inadequacies in our multivariate methods, both in their efficiency and interpretability.

    Starting from a few examples of complicated problems seen in RD&P (research, development, and production), possible extensions and generalizations of the existing multivariate projection methods—PCA and PLS—will be discussed. Criteria such as scalability of methods to increasing size of problems and data, increasing sophistication in the handling of noise and non-linearities, interpretability of results, and relative simplicity of use, will be held as important. The discussion will be made from a perspective of the evolution of scientific methodology as (a) driven by new technology, e.g., computers and graphical displays, and the need to answer some always reoccurring and basic questions, and (b) constrained by the limitations of the human brain, i.e., our ability to understand and interpret scientific and data analytic results.

  • 25. Lindon, John C
    et al.
    Nicholson, Jeremy K
    Holmes, Elaine
    Keun, Hector C
    Craig, Andrew
    Pearce, Jake T M
    Bruce, Stephen J
    Hardy, Nigel
    Sansone, Susanna-Assunta
    Antti, Henrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Jonsson, Pär
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Daykin, Clare
    Navarange, Mahendra
    Beger, Richard D
    Verheij, Elwin R
    Amberg, Alexander
    Baunsgaard, Dorrit
    Cantor, Glenn H
    Lehman-McKeeman, Lois
    Earll, Mark
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Johansson, Erik
    Haselden, John N
    Kramer, Kerstin
    Thomas, Craig
    Lindberg, Johann
    Schuppe-Koistinen, Ina
    Wilson, Ian D
    Reily, Michael D
    Robertson, Donald G
    Senn, Hans
    Krotzky, Arno
    Kochhar, Sunil
    Powell, Jonathan
    Ouderaa, Frans van der
    Plumb, Robert
    Schaefer, Hartmut
    Spraul, Manfred
    Summary recommendations for standardization and reporting of metabolic analyses: The Standard Metabolic Reporting Structures (SMRS) working group outlines its vision for an open, community-driven specification for the standardization and reporting of metabolic studies. 2005. In: Nature Biotechnology, Vol. 23, 833-8 p. Article in journal (Refereed)
  • 26. Linusson, Anna
    et al.
    Gottfries, Johan
    Lindgren, Fredrik
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Statistical Molecular Design of Building Blocks for Combinatorial Chemistry. 2000. In: Journal of Medicinal Chemistry, Vol. 43, no 7, 1320-8 p. Article in journal (Refereed)
    Abstract [en]

    The size of a combinatorial library can be reduced in two ways: by basing the selection either on the building blocks (BB's) or on the full set of virtually constructed products. In this paper we have investigated the effects of applying statistical designs to BB sets compared to selections based on the final products. The two sets of BB's and the virtually constructed library were described by structural parameters, and the correlation between the two characterizations was investigated. Three different selection approaches were used both for the BB sets and for the products. In the first two the selection algorithms were applied directly to the data sets (D-optimal design and space-filling design), while for the third a cluster analysis preceded the selection (cluster-based design). The selections were compared using visual inspection, the Tanimoto coefficient, the Euclidean distance, the condition number, and the determinant of the resulting data matrix. No difference in efficiency was found between selections made in the BB space and in the product space. However, it is of critical importance to investigate the BB space carefully and to select an appropriate number of BB's to result in an adequate diversity. An example from the pharmaceutical industry is then presented, where selection via BB's was made using a cluster-based design.

  • 27. Linusson, Anna
    et al.
    Gottfries, Johan
    Olsson, Thomas
    Örnskov, Eivor
    Folestad, Staffan
    Nordén, Bo
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Statistical Molecular Design, Parallel Synthesis, and Biological Evaluation of a Library of Thrombin Inhibitors. 2001. In: Journal of Medicinal Chemistry, Vol. 44, no 21, 3424-39 p. Article in journal (Refereed)
    Abstract [en]

    A library of thrombin inhibitors has been designed using statistical molecular design. An aromatic scaffold was used, with three varied positions corresponding to three pockets at the active site of thrombin (the S-, P-, and D-pockets). The selection was performed in the building block space, and previously acquired data were included in the design procedure. The design resulted in six, four, and six building blocks for the first (S), second (P), and third (D) pockets, respectively. A second round of selection applied to the combined selected building blocks resulted in a subset of 18 compounds. The selected library was synthesized in parallel and biologically evaluated. The compounds were analyzed with respect to their inhibition (pIC50) of thrombin; membrane permeability, estimated by migration behavior in micellar media (CE log k') and pKa; and specificity with respect to inhibition (Ki) of trypsin. Multivariate QSAR studies of the responses yielded valuable results and information that could only be found using statistical molecular design in combination with multivariate analysis.

  • 28.
    Olsson, Ing-Marie
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Gottfries, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Controlling coverage of D-optimal onion designs and selections. 2004. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 18, no 12, 548-557 p. Article in journal (Refereed)
    Abstract [en]

    Statistical molecular design (SMD) is a powerful approach for selection of compound sets in medicinal chemistry and quantitative structure-activity relationships (QSARs) as well as other areas. Two techniques often used in SMD are space-filling and D-optimal designs. Both on occasions lead to unwanted redundancy and replication. To remedy such shortcomings, a generalization of D-optimal selection was recently developed. This new method divides the compound candidate set into a number of subsets (layers or shells), and a D-optimal selection is made from each layer. This improves the possibility to select representative molecular structures throughout any property space independently of requested sample size. This is important in complex situations where any given model is unlikely to be valid over the whole investigated domain of experimental conditions. The number of selected molecules can be controlled by varying the number of subsets or by altering the complexity of the model equation in each layer and/or the dependency of previous layers. The new method, called D-optimal onion design (DOOD), will allow the user to choose the model equation complexity independently of sample size while still avoiding unwarranted redundancy. The focus of the present work is algorithmic improvements of DOOD in comparison with classical D-optimal design. As illustrations, extended DOODs have been generated for two applications by in-house programming, including some modifications of the D-optimal algorithm. The performances of the investigated approaches are expected to differ depending on the number of principal properties of the compounds in the design, sample sizes and the investigated model, i.e. the aim of the design. QSAR models have been generated from the selected compound sets, and root mean squared error of prediction (RMSEP) values have been used as measures of performance of the different designs.

  • 29.
    Olsson, Ing-Marie
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Gottfries, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    D-optimal onion designs in statistical molecular design. 2004. In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, Vol. 73, no 1, 37-46 p. Article in journal (Refereed)
    Abstract [en]

    Statistical molecular design (SMD) is a technique for selecting a representative (diverse) set of substances in combinatorial chemistry and QSAR, as well as other areas depending on optimising chemical structure. Two approaches often used in SMD are space filling (SF) and D-optimal (DO) designs.

    Space-filling designs provide good coverage of the physicochemical space but are not explicitly based on a model. For small design sizes, they perform similarly to D-optimal designs, which maximize the determinant of the variance–covariance matrix. This leads to selection of the most extreme points of the candidate set and gives a minimal set of selected compounds with maximal diversity. However, the inner regions of the experimental domain are not well sampled by DO or small SF designs.

    We have developed and evaluated an approach to remedy the shortcomings of SF and DO designs in SMD. This new approach divides the candidate set into a number of subsets (“shells” or “layers”), and a D-optimal selection is made from each layer. This makes it possible to select representative sets of molecular structures throughout any property space, e.g., the physicochemical space, with reasonable design sizes. The number of selected molecules is easily controlled by varying (a) the number of layers and (b) the model on which the design is based.

    We outline here this new approach, the D-optimal onion design (DOOD). It is tested on two molecular data sets with varying size and compared with SF designs and ordinary DO designs. The designs have been evaluated with parameters, such as condition number, determinant, Tanimoto coefficients and Euclidean distances, as well as external evaluation of the resulting projection to latent structures (PLS) model.

  • 30.
    Olsson, Ing-Marie
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Johansson, Erik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Berntsson, Martin
    Eriksson, Lennart
    Gottfries, Johan
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Rational DOE-protocols for 96-well plates. 2006. In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, E-ISSN 1873-3239, Vol. 83, no 1, 66-74 p. Article in journal (Refereed)
    Abstract [en]

    The use of 96-well plates for chemical and biological applications has rapidly increased as new applicable domains have been discovered and new laboratory instruments developed. There are 96, 384, 1536, etc. plates customized for diverse applications such as biological assays, sample preparation, solid-phase extraction and crystallization. Multi-pipettes as well as automated pipette systems accelerate the preparation of plates resulting in even faster evaluation systems. A bottleneck in the use of multi-unit plates is method development and optimization. By applying rational experimental design, the optimization could be made more efficient and less time-consuming. Unfortunately, the workload related to manual preparation of multi-unit plates according to an experimental design is often considered overwhelming. The present study introduces a new approach for experimental design in 96-well plates that minimizes the manual workload without compromising the quality of the experimental design. This approach is scalable to larger rectangular formats such as 384- and 1536-well plates. The optimal combinations will be delineated and applied experimentally to a reporter-gene assay.

  • 31.
    Trygg, Johan
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Orthogonal projections to latent structures (O-PLS). 2002. In: Journal of Chemometrics, Vol. 16, no 3, 119-28 p. Article in journal (Refereed)
    Abstract [en]

    A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O-PLS), is described. O-PLS removes variation from X (descriptor variables) that is not correlated to Y (property variables, e.g. yield, cost or toxicity). In mathematical terms this is equivalent to removing systematic variation in X that is orthogonal to Y. In an earlier paper, Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175-185) described orthogonal signal correction (OSC). In this paper a method with the same objective but with different means is described. The proposed O-PLS method analyzes the variation explained in each PLS component. The non-correlated systematic variation in X is removed, making interpretation of the resulting PLS model easier and with the additional benefit that the non-correlated variation itself can be analyzed further. As an example, near-infrared (NIR) reflectance spectra of wood chips were analyzed. Applying O-PLS resulted in reduced model complexity with preserved prediction ability, effective removal of non-correlated variation in X and, not least, improved interpretational ability of both correlated and non-correlated variation in the NIR spectra.
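
    The filtering idea in this abstract can be sketched for a single response y as follows. This is a minimal illustration of one common single-y formulation, not necessarily the authors' exact implementation; the helper names (`dot`, `xt_times`, `x_times`, `opls_remove_one`) and the toy data are assumptions made for the example, and only one Y-orthogonal component is removed.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def xt_times(X, v):
    """Return X^T v for a matrix X stored as a list of rows."""
    return [dot(col, v) for col in zip(*X)]

def x_times(X, w):
    """Return X w for a matrix X stored as a list of rows."""
    return [dot(row, w) for row in X]

def normalise(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def centre_columns(X):
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def opls_remove_one(X, y):
    """Remove one y-orthogonal component from column-centred X (single y)."""
    w = normalise(xt_times(X, y))                   # predictive weight, proportional to X^T y
    t = x_times(X, w)                               # predictive score
    p = [v / dot(t, t) for v in xt_times(X, t)]     # predictive loading
    # The part of the loading that is orthogonal to w defines the orthogonal weight.
    w_o = normalise([pi - dot(w, p) * wi for pi, wi in zip(p, w)])
    t_o = x_times(X, w_o)                           # orthogonal score
    p_o = [v / dot(t_o, t_o) for v in xt_times(X, t_o)]  # orthogonal loading
    # Subtract the orthogonal component t_o p_o^T from X.
    X_f = [[x - ti * pj for x, pj in zip(row, p_o)] for row, ti in zip(X, t_o)]
    return X_f, t_o, p_o

# Hypothetical toy data: 5 observations, 3 descriptor variables, one response.
X_raw = [[1.0, 2.0, 0.5],
         [2.0, 1.0, 1.5],
         [3.0, 4.0, 1.0],
         [4.0, 3.0, 2.5],
         [5.0, 5.0, 2.0]]
y_raw = [1.0, 2.0, 3.0, 4.0, 5.0]

Xc = centre_columns(X_raw)
y_mean = sum(y_raw) / len(y_raw)
yc = [v - y_mean for v in y_raw]

X_f, t_o, p_o = opls_remove_one(Xc, yc)
```

    In this sketch the removed score `t_o` has zero cross-product with y (up to rounding), so the filtered matrix `X_f` keeps the same X^T y as the centred input. Further orthogonal components could be removed by calling the function again on `X_f`.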

  • 32.
    Trygg, Johan
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Umetrics AB.
    Orthogonal signal projection. 2003. Patent (Other (popular science, discussion, etc.))
    Abstract [en]

    The invention regards a method and an arrangement for filtering or pre-processing almost any type of multivariate data, exemplified by NIR or NMR spectra measured on samples, in order to remove systematic noise such as base line variation and multiplicative scatter effects. This is accomplished by differentiating the spectra to first or second derivatives, by Multiplicative Signal Correction (MSC), or by similar filtering methods. The pre-processing may, however, also remove information from the spectra, as well as other multiple measurement arrays, regarding (Y) (the response variables). Provided is a variant of PLS that can be used to achieve a signal correction that is as close to orthogonal as possible to a given (y) vector or (Y) matrix, hence ensuring that the signal correction removes as little information as possible regarding (Y). A filter according to the present invention is named Orthogonal Partial Least Squares (OPLS).

  • 33.
    Trygg, Johan
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Umetrics AB.
    Eriksson, Lennart
    Umetrics AB.
    Hierarchically Organizing Data Using a Partial Least Squares Analysis (PLS-Trees). 2009. Patent (Other (popular science, discussion, etc.))
    Abstract [en]

    A method and system for partitioning (clustering) large amounts of data in a relatively short processing time. The method involves providing a first data matrix and a second data matrix where each of the first and second data matrices includes one or more variables, and a plurality of data points. The method also involves determining a first score from the first data matrix using a partial least squares (PLS) analysis or orthogonal PLS (OPLS) analysis and partitioning the first and second data matrices (e.g., row-wise) into a first group and a second group based on the sorted first score, the variance of the first data matrix, and a variance of the first and second groups relative to the variances of the first and second data matrices.

  • 34.
    Uppgård, Lise-Lott
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Lindgren, Åsa
    Akzo Nobel Surface Chemistry AB, Stenungsund, Sweden.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multivariate quantitative structure-activity relationships for the aquatic toxicity of technical nonionic surfactants. 2000. In: Journal of Surfactants and Detergents, ISSN 1097-3958 (Print) 1558-9293 (Online), Vol. 3, no 1, 33-41 p. Article in journal (Refereed)
    Abstract [en]

    The aquatic toxicity of 36 technical nonionic surfactants (ethoxylated fatty alcohols) was examined toward two freshwater animal species, the fairy shrimp Thamnocephalus platyurus and the rotifer Brachionus calyciflorus. Responses of the two species to the surfactants were generally similar. A multivariate-quantitative structure-activity relationship (M-QSAR) model was developed from the data. The M-QSAR model consisted of a partial least squares model with three components and explained 92.4% of the response variance and had a predictive capability of 89.1%. The most important physicochemical variables for the M-QSAR model were the number of carbon atoms in the longest chain of the surfactant hydrophobe (redC), the molecular hydrophobicity (log P), the number of carbon atoms in the hydrophobe (C), the hydrophilic-lipophilic balance according to Davis (Davis), the critical packing parameter with respect to whether the hydrophobe was branched or not (redCPP), and the critical micelle concentration. Surfactant toxicity tended to increase with increasing alkyl chain lengths.

  • 35.
    Uppgård, Lise-Lott
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multivariate quantitative structure-activity relationships for the aquatic toxicity of alkyl polyglucosides. 2000. In: Tenside Surfactants Detergents, Vol. 37, no 2, 131-8 p. Article in journal (Refereed)
    Abstract [en]

    The aquatic toxicity of 34 alkyl polyglucosides (APGs) towards two fresh-water species, Thamnocephalus platyurus and Brachionus calyciflorus, was studied. The toxicity tests were performed using so-called toxkits, and for each surfactant the results are presented as log10(mean LC50) values. The toxicity data were combined with physico-chemical data for the APGs, and a Multivariate Quantitative Structure-Activity Relationship (M-QSAR) model was calculated. Partial Least Squares (PLS) regression was used to develop the M-QSAR model. The resulting linear M-QSAR model explained 93.6% of the variance in the biological response and had a predictability of 86.6% according to cross-validation. The physico-chemical properties with the strongest influences on the toxicity of the surfactants were the critical micelle concentration (c.m.c.), wetting, contact angle, and number of carbon atoms in their hydrophobic parts (C and redC).
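
    The two percentages quoted in this abstract are usually written as R²Y (explained response variance) and Q² (cross-validated predictability). The abstract does not spell out the formulas, so the definitions below are the conventional ones and are given here as an assumption:

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

    Here \hat{y}_i is the fitted value from the model built on all observations, and \hat{y}_{(i)} is the prediction for observation i from a model fitted with its cross-validation group left out. Under these definitions the reported values would correspond to R²Y = 0.936 and Q² = 0.866.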

  • 36.
    Wiklund, Susanne
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Nilsson, David
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    Umetrics, Umeå, Sweden.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Faber, Klaas
    Chemometry Consultancy, Rubensstraat 7, 6717 VD Ede, The Netherlands.
    A randomization test for PLS component selection. 2007. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 21, no 10-11, 427-439 p. Article in journal (Refereed)
    Abstract [en]

    During the last two decades, a number of methods have been developed and evaluated for selecting the optimal number of components in a PLS model. In this paper, a new method is introduced that is based on a randomization test. The advantage of using a randomization test is that in contrast to cross validation (CV), it requires no exclusion of data, thus avoiding problems related to data exclusion, for example in designed experiments. The method is tested using simulated data sets for which the true dimensionality is clearly defined and also compared to regularly used methods for 10 real data sets. The randomization test works as a good statistical selection tool in combination with other selection rules. It also works as an indicator when the data require a pre-treatment.
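
    The general idea of a randomization (permutation) test for a PLS component can be sketched as below. This is a hedged illustration, not the procedure from the paper: the test statistic (the score-response cross-product of the first component), the number of permutations, the function names and the simulated data are all assumptions chosen for the example. Note that every observation is used in each evaluation, so no data are excluded.

```python
import math
import random

def centre(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def first_component_stat(cols, y):
    """Score-response cross-product t'y for the first PLS component.

    With w = X'y / ||X'y|| and t = Xw, the product t'y equals ||X'y||.
    `cols` holds the centred columns of X.
    """
    xty = [sum(a * b for a, b in zip(col, y)) for col in cols]
    return math.sqrt(sum(v * v for v in xty))

def randomization_p_value(cols, y, n_perm=200, seed=0):
    """Share of permuted responses giving a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = first_component_stat(cols, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the X-y relationship
        if first_component_stat(cols, y_perm) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)

# Simulated data (assumption): 30 observations, 3 variables, y driven by the first one.
data_rng = random.Random(1)
cols = [centre([data_rng.gauss(0, 1) for _ in range(30)]) for _ in range(3)]
y = centre([2 * a + 0.1 * data_rng.gauss(0, 1) for a in cols[0]])

p_value = randomization_p_value(cols, y)
```

    A small `p_value` suggests that the first component captures more X-y covariation than expected by chance. Testing later components would require deflating X and y first, which this sketch leaves out.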

  • 37.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Chemometrics and Bruce: Some Fond Memories. 2015. In: 40 Years of Chemometrics – From Bruce Kowalski to the Future, 2015, Vol. 1199, 1-13 p. Conference paper (Refereed)
    Abstract [en]

    This chapter describes the transformation of a young physical organic chemist (SW, 1964), from a believer in first principles models to a middle-aged chemometrician (SW, 1974) promoting empirical and semiempirical "data driven, soft, analogy" models for the design of experiments and the analysis of the resulting data. This transformation was marked by a number of influential events, each tipping the balance towards the data driven, soft, analogy models until the point of no return in 1974. On June 10, 1974, Bruce and I together with our research groups joined forces and formed the Chemometrics Society (later renamed to the International Chemometrics Society), and we took off into multidimensional space. This review of my personal scientific history, inspired and encouraged by Bruce, is illustrated by examples of method development driven by necessity to solve specific problems and leading to data driven soft models, which, at least in my own eyes, were superior to the classical first principles approaches to the same problems. Bruce and I met at numerous conferences between 1975 and 1990, but after that, Bruce and I gradually slid out of the academic world, and now Bruce has taken his final step.

  • 38.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Personal memories of the early PLS development. 2001. In: Chemometrics and Intelligent Laboratory Systems, Vol. 58, no 2, 83-4 p. Article in journal (Other academic)
  • 39.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Berglund, Anders
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh, Nouna
    New and old trends in chemometrics. How to deal with the increasing data volumes in R&D&P (research, development and production) - with examples from pharmaceutical research and process modeling. 2002. In: Journal of Chemometrics: Special Issue: Proceedings of the 7th Scandinavian Symposium on Chemometrics. Issue Edited by Lars Nørgaard, Vol. 16, no 8-10, 377-86 p. Article in journal (Refereed)
    Abstract [en]

    Chemometrics was started around 30 years ago to cope with and utilize the rapidly increasing volumes of data produced in chemical laboratories. The methods of early chemometrics were mainly focused on the analysis of data, but slowly we came to realize that it is equally important to make the data contain reliable information, and methods for design of experiments (DOE) were added to the chemometrics toolbox. This toolbox is now fairly adequate for solving most R&D problems of today in both academia and industry, as will be illustrated with a few examples. However, with the further increase in the size of our data sets, we start to see inadequacies in our multivariate methods, both in their efficiency and interpretability. Drift and non-linearities occur with time or in other directions in data space, and models with masses of coefficients become increasingly difficult to interpret and use. Starting from a few examples of some very complicated problems confronting chemical researchers today, possible extensions and generalizations of the existing chemometrics methods, as well as more appropriate preprocessing of the data before the analysis, will be discussed. Criteria such as scalability of methods to increasing size of problems and data, increasing sophistication in the handling of noise and non-linearities, interpretability of results, and relative simplicity of use will be held as important. The discussion will be made from a perspective of the evolution of the scientific methodology as driven by new technology, e.g. computers, and constrained by the limitations of the human brain, i.e. our ability to understand and interpret scientific and data analytical results. Quilt-PCA and Quilt-PLS presented here address and offer a possible solution to these problems.

  • 40.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Cheney, James
    Kettaneh, Nouna
    McCready, Chris
    The chemometric analysis of point and dynamic data in pharmaceutical and biotech production (PAT) — some objectives and approaches. 2006. In: Chemometrics and Intelligent Laboratory Systems, Vol. 84, no 1-2, 159-63 p. Article in journal (Refereed)
    Abstract [en]

    Checking that a process is doing what it is supposed to is of critical importance in manufacturing, economics, environmental monitoring, patient monitoring, and more. Given sufficient and adequate analytical and process measurements made during the history of the well functioning process, a multivariate model of the process variation around a multivariate dynamic trajectory will, in principle, form a good basis for this checking. Such systems are often labeled process monitoring, real-time quality control (RTQC), PAT level 4, and advanced process control/fault detection and classification (APC/FDC). Here PAT stands for process analytical technology indicating the reliance on adequate and multiple data for this checking.

    In practice, there are many difficulties in making an RTQC/PAT-4 system work well. Starting from an industrial example, the problems of constructing and implementing a well working checking system are discussed in relation to its different parts — analytical and process data, chemometrical and other methods for their modeling and analysis, and various forms of data management to handle the data flow and synchronization, as well as storage and retrieval. The display and interpretability of diagnostics and results are emphasized.

  • 41.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh, Nouna
    The PLS method -- partial least squares projections to latent structures -- and its applications in industrial RDP (research, development, and production). 2004. Conference paper (Other (popular science, discussion, etc.))
  • 42.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Høy, Martin
    Martens, Harald
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Westad, Frank
    MacGregor, John
    Wise, Barry M
    The PLS model space revisited. 2008. In: Journal of Chemometrics, Vol. 23, no 2, 67-68 p. Article in journal (Other (popular science, discussion, etc.))
    Abstract [en]

    Pell, Ramos and Manne (PRM) in a recent article in this journal claim that the conventional PLS algorithm with orthogonal scores has an inherent inconsistency in that it uses different model spaces for calculating the prediction model coefficients and for calculating the X-space model and its residuals [1]. We disagree with PRM. All PLS model scores, residuals, coefficients, etc., obtained by the conventional PLS algorithm do come from the same underlying latent variable (LV) model, and not from different models or model spaces as PRM suggest. PRM have simply posed a different model with different assumptions and obtained slightly different results, as should have been expected.

  • 43.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Josefson, Mats
    Gottfries, Johan
    Linusson, Anna
    The utility of multivariate design in PLS modeling. 2004. In: Journal of Chemometrics, Vol. 18, no 3-4, 156-165 p. Article in journal (Refereed)
    Abstract [en]

    We discuss the use of multivariate design to ensure representativity and balance of the training set data for PLS multivariate modeling. Three application areas are used to illustrate the discussion, namely multivariate calibration in process analytical chemistry, quantitative structure activity relationships (QSAR) in medicinal and pharmaceutical chemistry, and data mining. In both QSAR and data mining, the multivariate design is also useful for the balanced sampling of data from a large, complex, and unbalanced data repository.

  • 44.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Josefsson, Mats
    Multivariate Calibration of Analytical Chemistry. 2000. In: Encyclopedia of Analytical Chemistry: Applications, Theory, and Instrumentation, 15 Volume Set, Wiley, 2000, 9710-36 p. Chapter in book (Refereed)
  • 45.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh-Wold, N.
    MacGregor, J. F.
    Dunn, K. G.
    Batch Process Modeling and MSPC. 2009. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, VOLS 1-4 / [ed] Steven Brown, Romà Tauler, Beata Walczak, Amsterdam: Elsevier, 2009, A163-A197 p. Chapter in book (Other academic)
  • 46.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Andersson, Per M
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Linusson, Anna
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Edman, Maria
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Lundstedt, Torbjörn
    Nordén, Bo
    Sandberg, Maria
    Uppgård, Lise-Lott
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multivariate design and modelling in QSAR, combinatorial chemistry and bioinformatics. 2000. In: Molecular modeling and prediction of bioactivity / [ed] Gundertofte, Klaus; Jorgensen, Flemming Steen, Springer-Verlag, 2000, 27-45 p. Chapter in book (Other academic)
  • 47.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    PLS-regression: a basic tool of chemometrics. 2001. In: Chemometrics and Intelligent Laboratory Systems, Vol. 58, no 2, 109-30 p. Article in journal (Refereed)
    Abstract [en]

    PLS-regression (PLSR) is the PLS approach in its simplest, and in chemistry and technology, most used form (two-block predictive PLS). PLSR is a method for relating two data matrices, X and Y, by a linear multivariate model, but goes beyond traditional regression in that it also models the structure of X and Y. PLSR derives its usefulness from its ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. PLSR has the desirable property that the precision of the model parameters improves with the increasing number of relevant variables and observations. This article reviews PLSR as it has developed to become a standard tool in chemometrics and used in chemistry and engineering. The underlying model and its assumptions are discussed, and commonly used diagnostics are reviewed together with the interpretation of resulting parameters. Two examples are used as illustrations: First, a Quantitative Structure-Activity Relationship (QSAR)/Quantitative Structure-Property Relationship (QSPR) data set of peptides is used to outline how to develop, interpret and refine a PLSR model. Second, a data set from the manufacturing of recycled paper is analyzed to illustrate time series modelling of process data by means of PLSR and time-lagged X-variables.
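
    As a rough illustration of the two-block idea summarized in this abstract, the following is a minimal sketch of single-response PLS (PLS1) with orthogonal scores, written in plain Python. It is not taken from the article: the function names `pls1_fit` and `pls1_predict`, the mean-centering without scaling, and the toy data in the usage note are all assumptions made for the example.

```python
# Minimal PLS1 sketch (one y-variable, orthogonal scores), standard library only.
# pls1_fit / pls1_predict are hypothetical names chosen for this illustration.

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit PLS1 on X (list of n rows, p columns) and y (list of n values)."""
    n, p_dim = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p_dim)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered X residuals
    f = [v - y_mean for v in y]                               # centered y residuals
    W, P, C = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [_dot([row[j] for row in E], f) for j in range(p_dim)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # no remaining X variation covaries with y
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                       # X scores
        tt = _dot(t, t)
        p = [_dot([row[j] for row in E], t) / tt for j in range(p_dim)]  # X loadings
        c = _dot(f, t) / tt                                   # y loading
        # Deflate both blocks by the part explained by this component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - ti * c for fi, ti in zip(f, t)]
        W.append(w)
        P.append(p)
        C.append(c)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "C": C}

def pls1_predict(model, X):
    """Predict y for new rows by computing scores component by component."""
    preds = []
    for row in X:
        e = [v - m for v, m in zip(row, model["x_mean"])]
        y_hat = model["y_mean"]
        for w, p, c in zip(model["W"], model["P"], model["C"]):
            t = _dot(e, w)
            y_hat += t * c
            e = [ei - t * pj for ei, pj in zip(e, p)]
        preds.append(y_hat)
    return preds
```

    A possible usage, on made-up data where the second column is exactly twice the first (a collinearity that ordinary least squares cannot handle directly): `model = pls1_fit([[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0], [5, 10, 1], [6, 12, 0]], [6, 7, 12, 13, 18, 19], 2)` followed by `pls1_predict(model, [[7, 14, 1]])`. Scaling to unit variance, cross-validation to choose the number of components, and multi-response Y are left out of this sketch.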

  • 48.
    Wold, Svante
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Berglund, Anders
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Antti, Henrik
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Some recent developments in PLS modeling. 2001. In: Chemometrics and Intelligent Laboratory Systems, Vol. 58, no 2, 131-50 p. Article in journal (Refereed)
    Abstract [en]

    The original chemometrics partial least squares (PLS) model with two blocks of variables (X and Y), linearly related to each other, has had several enhancements/extensions since the beginning of 1980. We here discuss multi-block and hierarchical PLS modeling for installing a priori knowledge of the data structure and simplifying the model interpretation, variable selection schemes for PLS with often similar objectives, nonlinear PLS, and prefiltered PLS, orthogonal signal correction (OSC). A very recent development, orthogonalized-PLS (O-PLS), is included as a way to accomplish both OSC and a simpler interpretation of the PLS model. In this context, we also briefly mention time series, batch, and wavelet variants of PLS.

    These PLS extensions are illustrated by examples from peptide quantitative structure–activity relationships (QSAR) and multivariate characterization of pulp using NIR.
