umu.se Publications
  • 1.
    Andersson, C David
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Chen, Brian Y
    Linusson, Anna
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Multivariate assessment of virtual screening experiments. 2010. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 24, no 11-12, p. 757-767. Article in journal (Refereed)
    Abstract [en]

    Discovering molecules with a desired biological function is one of the great challenges in drug research. To discover new lead molecules, virtual screens (VS) are often conducted, in which databases of molecules are screened for potential binders to a specific protein, using molecular docking. The choice of docking software and parameter settings within the software can significantly influence the outcome of a VS. In this study, we have applied chemometric methods such as design of experiments, principal component analysis and partial least-square projections to latent structure (PLS) to simulated VS experiments to find and compare suitable conditions for performing VS against six protein targets selected from the DUD databases. The docking parameters in FRED, and scoring functions in both FRED and GOLD docking software, were varied according to a statistical experimental design and a PLS model was calculated to correlate the experimental setup to the VS outcome. The study revealed that the choice of scoring function has the greatest influence on VS outcome, and that other parameters have varying influence, depending on the protein target. We also found that substantial bias can be introduced by the lack of variation of molecular properties in the databases used in the screening. Our results provide indications that docking experiments could be tailored to the protein target in order to obtain satisfactory VS results.

  • 2.
    Andersson, Patrik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Haglund, Peter
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Rappe, Christoffer
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Tysklind, Mats
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Ultraviolet absorption characteristics and calculated semi-empirical parameters as chemical descriptors in multivariate modelling of polychlorinated biphenyls. 1996. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, no 2, p. 171-185. Article in journal (Refereed)
    Abstract [en]

    The structural variation within the polychlorinated biphenyls (PCBs) was characterized by using principal component analysis (PCA). A multivariate model was evolved from 52 physicochemical descriptors including measured ultraviolet (UV) absorption spectra, calculated semiempirical parameters (AM1) and properties captured from the literature. Parameters calculated by using the AM1-Hamiltonian were e.g. heat of formation, dipole moments, ionization potential and the barrier of internal rotation. The UV spectra were measured and digitized in the range 200-300 nm. The multivariate model revealed that most of the information within the set of physicochemical parameters was related to molecular size. Descriptors depending on size were e.g. GC retention times, partition coefficients and a subset of semiempirically derived energy terms. Important also were parameters reflecting differences in substitution patterns and related to electronic and steric properties, such as UV absorption in the wavelength region 245-300 nm, the barrier of internal rotation and the ionization potential. The developed model describes the large variation in physicochemical characteristics within the PCBs. The importance of a broad chemical characterization is illustrated by a quantitative structure-activity relationship (QSAR) for the potency of inhibition of intercellular communication for 27 structurally diverse tetra- to heptachlorinated PCBs.

  • 3.
    Berglund, Anders
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kettaneh, Nouna
    Umetrics Inc., Kinnelon, NJ, USA.
    Uppgård, Lise-Lott
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Bendwell, Nancy
    Tembec Inc., Temiscaming, Quebec, Canada.
    Cameron, Dave R
    Tembec Inc., Temiscaming, Quebec, Canada.
    The GIFI approach to non-linear PLS modeling. 2001. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 15, no 4, p. 321-36. Article in journal (Refereed)
    Abstract [en]

    The GIFI approach to non-linear modeling involves the transformation of quantitative variables to a set of 1/0 dummies in a similar manner to the way qualitative variables are coded. This is followed by analyzing the sets of 1/0 dummies by principal component analysis, multiple regression or, as discussed here, PLS. The patterns of the resulting coefficients indicate the nature of the non-linearities in the data. Here the potential uses and limitations of PLS regression, in combination with four variants of GIFI coding, are investigated using both simulated and empirical data sets.
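
The coding step described in this abstract can be illustrated with a minimal sketch. It assumes simple equal-width binning; the paper compares four coding variants that are not reproduced here, and the function name `gifi_code` and the sample values are hypothetical.

```python
def gifi_code(values, n_bins=3):
    """Code a quantitative variable as 1/0 dummy columns, one per bin.

    Assumption: equal-width bins over the observed range. This is only
    one possible coding, not necessarily any variant from the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    rows = []
    for v in values:
        # Bin index, clamped so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        rows.append([1 if j == idx else 0 for j in range(n_bins)])
    return rows


# Hypothetical data: six observations of one quantitative variable.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0]
dummies = gifi_code(x, n_bins=3)
for value, row in zip(x, dummies):
    print(value, row)
```

Each observation gets exactly one 1 per coded variable, so a model fitted to the dummy columns (PCA, multiple regression or PLS) estimates one coefficient per bin, and the pattern of those coefficients across bins can reveal a non-linear relationship.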

  • 4.
    Bylesjö, Max
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Rantalainen, Mattias
    Cloarec, Olivier
    Nicholson, Jeremy K.
    Holmes, Elaine
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. 2006. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 20, no 8-10, p. 341-351. Article in journal (Refereed)
    Abstract [en]

    The characteristics of the OPLS method have been investigated for the purpose of discriminant analysis (OPLS-DA). We demonstrate how class-orthogonal variation can be exploited to augment classification performance in cases where the individual classes exhibit divergence in within-class variation, in analogy with soft independent modelling of class analogy (SIMCA) classification. The prediction results will be largely equivalent to traditional supervised classification using PLS-DA if no such variation is present in the classes. A discriminatory strategy is thus outlined, combining the strengths of PLS-DA and SIMCA classification within the framework of the OPLS-DA method. Furthermore, resampling methods have been employed to generate distributions of predicted classification results and subsequently assess classification belief. This enables utilisation of the class-orthogonal variation in a proper statistical context. The proposed decision rule is compared to common decision rules and is shown to produce comparable or less class-biased classification results.

  • 5.
    Dumarey, Melanie
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Galindo-Prieto, Beatriz
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Fransson, Magnus
    Pharmaceutical Development, AstraZeneca R&D, Mölndal, Sweden.
    Josefson, Mats
    Pharmaceutical Development, AstraZeneca R&D, Mölndal, Sweden.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    OPLS methods for the analysis of hyperspectral images—comparison with MCR-ALS. 2014. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 8, p. 687-696. Article in journal (Refereed)
    Abstract [en]

    Two new orthogonal projections to latent structures (OPLS) based methods were proposed to analyze hyperspectral images, enabling the visualization of multiple chemical compounds in one matrix without the need of extensive preprocessing. Both proposed methods delivered images representing the chemical distribution in the ribbon similar to the more traditional multivariate curve resolution–alternating least squares (MCR-ALS) method, but their image background was less dynamic resulting in a stronger chemical contrast. This indicated that the methods successfully removed structured variation orthogonal to the chemical information (pure spectra of individual compounds), which was confirmed by the fact that physical scattering effects caused by grooves and edges were captured in the images visualizing the orthogonal components of the model. Hereby, the OPLS-based method employing the pure spectra as weights in the OPLS algorithm was more successful in distinguishing compounds with a similar spectral signal than the transposed OPLS algorithm (pure spectra of individual compounds were used as response in OPLS model). It should be noted that for the main compounds, the MCR-ALS method enabled easier visual interpretation compared to the OPLS-based methods by setting all values below zero to zero, resulting in a higher contrast between pixels containing the studied compound and pixels not containing that compound.

  • 6. Eriksson, L.
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    A chemometrics toolbox based on projections and latent variables. 2014. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 5, p. 332-346. Article in journal (Refereed)
    Abstract [en]

    A personal view is given of the gradual development of projection methods (also called bilinear, latent variable, and more) and their use in chemometrics. We start with principal components analysis (PCA), which is the basis for more elaborate methods for more complex problems, such as soft independent modeling of class analogy, partial least squares (PLS), hierarchical PCA and PLS, PLS-discriminant analysis, orthogonal projection to latent structures (OPLS), OPLS-discriminant analysis and more. From its start around 1970, this development was strongly influenced by Bruce Kowalski and his group in Seattle, and his realization that the multidimensional data profiles emerging from spectrometers, chromatographs, and other electronic instruments contained interesting information that was not recognized by the then-current one-variable-at-a-time approaches to chemical data analysis. This led to the adoption of what in statistics is called the data analytical approach, often also called the data-driven approach, soft modeling, and more. This approach, combined with PCA and later PLS, turned out to work very well in the analysis of chemical data. This is because of the close correspondence between, on the one hand, the matrix decomposition at the heart of PCA and PLS and, on the other hand, the analogy concept on which so much of chemical theory and experimentation are based. This extends to numerical and conceptual stability and good approximation properties of these models. The development is informally summarized, described and illustrated by a few examples and anecdotes.

  • 7. Eriksson, Lennart
    et al.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Umetrics Inc., 42 Pine Hill Rd, Hollis, NH 03049, USA.
    PLS-trees (R), a top-down clustering approach. 2009. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 23, no 11, p. 569-580. Article in journal (Refereed)
    Abstract [en]

    A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees (R), this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models as the basis for splitting the clusters, instead of the individual X-variables. The split of one cluster into two is made along the sorted first X-score (t(1)) of a PLS model of the cluster, but may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree, and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach to OPLS-Trees and trees based on hierarchical PLS or OPLS models with the variables divided in blocks depending on their type, are also mentioned. The possibility to greatly reduce the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by means of three examples. The first two examples are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyperspectral images of liver tissue for identifying different sources of variability in the liver samples.

  • 8. Eriksson, Lennart
    et al.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    A graphical index of separation (GIOS) in multivariate modeling. 2010. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 24, no 11-12, p. 779-789. Article in journal (Refereed)
    Abstract [en]

    We introduce a new measure for the importance of predictor variables, X, for the separation of two groups (classes) of observations. The measure is a Graphical Index of Separation (GIOS), and is, for each predictor, determined from the distribution of all possible pairs of observations with one from each group. GIOS is quantitative, intuitively simple and easy to interpret. The GIOS is straightforward to visualize in bivariate plots, and line or bar plots for larger number of variables. The approach applies both to discriminant analyses such as LDA, SIMCA, PLS-DA, OPLS-DA and to quantitative modeling such as MLR, PLS and OPLS. In the latter case, the observations are first divided into two groups based on their response values, Y. The GIOS approach is illustrated by PLS-DA/OPLS-DA and SIMCA-classification of a number of multivariate data sets with few and many variables relative to the number of observations.

  • 9.
    Galindo-Prieto, Beatriz
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    MKS Umetrics, Umeå, Sweden.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS). 2014. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 8, p. 623-632. Article in journal (Refereed)
    Abstract [en]

    A new approach for variable influence on projection (VIP) is described, which takes full advantage of the orthogonal projections to latent structures (OPLS) model formalism for enhanced model interpretability. This means that it will include not only the predictive components in OPLS but also the orthogonal components. Four variants of variable influence on projection (VIP) adapted to OPLS have been developed, tested and compared using three different data sets, one synthetic with known properties and two real-world cases.

  • 10.
    Ghorbanzadeh, Mehdi
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Zhang, Jin
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Andersson, Patrik L.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Binary classification model to predict developmental toxicity of industrial chemicals in zebrafish. 2016. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 30, no 6, p. 298-307. Article in journal (Refereed)
    Abstract [en]

    The identification of industrial chemicals that may cause developmental effects is of great importance for the early detection of hazardous chemicals. Accordingly, categorical quantitative structure-activity relationship (QSAR) models were developed, based on developmental toxicity profile data for zebrafish from the ToxCast Phase I testing, to predict the toxicity of a large set of high and low production volume chemicals (H/LPVCs). QSARs were created using linear (LDA), quadratic, and partial least squares-discriminant analysis with different chemical descriptors. The predictions of the best model (LDA) were compared with those obtained by the freely available QSAR model VEGA, created based on a dataset with a different chemical domain. The results showed that despite similar accuracy (AC) of both models, the LDA model is more specific than VEGA and shows a better agreement between sensitivity (SE) and specificity (SP). Applying a 90% confidence level on the LDA model led to even better predictions showing SE of 0.92, AC of 0.95, and geometric mean of SE and SP (G) of 0.96 for the prediction set. The LDA model predicted 608 H/LPVCs as toxicants among which 123 chemicals fall inside the applicability domain (AD) of the VEGA model, which predicted 112 of those as toxicants. Among the 112 chemicals predicted as toxic H/LPVCs, 23 have been previously reported as developmental toxicants. The LDA model presented here could be used to identify and prioritize H/LPVCs for subsequent developmental toxicity assessment, as a screening tool of potential developmental effects of new chemicals, and to guide synthesis of safer alternative chemicals.
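
The performance measures named in this abstract (SE, SP, AC and G) follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and their geometric mean.

    tp/fn: toxic chemicals predicted toxic / non-toxic.
    tn/fp: non-toxic chemicals predicted non-toxic / toxic.
    """
    se = tp / (tp + fn)                    # sensitivity (true positive rate)
    sp = tn / (tn + fp)                    # specificity (true negative rate)
    ac = (tp + tn) / (tp + fn + tn + fp)   # overall accuracy
    g = math.sqrt(se * sp)                 # geometric mean of SE and SP
    return {"SE": se, "SP": sp, "AC": ac, "G": g}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

Because G is a geometric mean, it drops sharply when either SE or SP is low, which makes it a useful check on the agreement between sensitivity and specificity that accuracy alone can hide.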

  • 11.
    Lindberg, Nils-Olof
    et al.
    Pharmaceutical R&D, Pharmacia Consumer Healthcare, Helsingborg, Sweden.
    Gabrielsson, Jon
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Use of software to facilitate pharmaceutical formulation: experiences from a tablet formulation. 2004. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 18, no 3-4, p. 133-138. Article in journal (Refereed)
    Abstract [en]

    This paper exemplifies the benefits of using experimental design together with software to facilitate the formulation of a tablet for specific purposes, from screening to robustness testing. By applying a multivariate design for the screening experiments, many excipients were evaluated in comparatively few experiments. The formulation work was generally based on designed experiments. Most of the experiments were fractional or full factorial designs, generated and evaluated in Modde with the centre point replicated. The robustness of the formulation was evaluated with experimental designs on two different occasions. Tested flavours were found to have limited influence on the important responses, which was key information in order to proceed with that particular composition. The formulation was also robust towards normal batch-to-batch variation of the excipients and the active pharmaceutical ingredient. A process step was investigated and, by applying experimental design and keeping in mind previous findings, important information could be gained from the study. The different studies yielded good and very useful models. Established relationships between design factors and responses provided information that was vital for the project. In cases of poor models, essential information regarding robustness was obtained.

  • 12.
    Löfstedt, Tommy
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    Wormbs, Gunilla
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Bi-modal OnPLS. 2012. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 26, no 6, p. 236-245. Article in journal (Refereed)
    Abstract [en]

    This paper presents an extension to the recently published OnPLS data analysis method. Bi-modal OnPLS allows for arbitrary block relationships in both columns and rows and is able to extract orthogonal variation in both columns and rows without bias towards any particular direction or matrix: the method is fully symmetric with regard to both rows and columns.

    Bi-modal OnPLS extracts a minimal number of globally predictive score vectors that exhibit maximal covariance and correlation in the column space and a corresponding set of predictive loading vectors that exhibit maximal correlation in the row space. The method also extracts orthogonal variation (i.e. variation that is not related to all other matrices) in both columns and rows. The method was applied to two synthetic datasets and one real data set regarding sensory information and consumer likings of dairy products. It was shown that Bi-modal OnPLS greatly improves the intercorrelations between both loadings and scores while still finding the correct variation. This facilitates interpretation of the predictive components and makes it possible to study the orthogonal variation in the data.

  • 13.
    Löfstedt, Tommy
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    OnPLS—a novel multiblock method for the modelling of predictive and orthogonal variation. 2011. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 25, no 8, p. 441-455. Article in journal (Refereed)
    Abstract [en]

    This paper presents a new multiblock analysis method called OnPLS, a general extension of O2PLS to the multiblock case. The proposed method is equivalent to O2PLS in cases involving only two matrices, but generalises to cases involving more than two matrices without giving preference to any particular matrix: the method is fully symmetric. OnPLS extracts a minimal number of globally predictive components that exhibit maximal covariance and correlation. Furthermore, the method can be used to study orthogonal variation, i.e. local phenomena captured in the data that are specific to individual combinations of matrices or to individual matrices. The method's utility was demonstrated by its application to three synthetic data sets. It was shown that OnPLS affords a reduced number of globally predictive components and increased intercorrelations of scores, and that it greatly facilitates interpretation of the predictive model.

  • 14.
    Olsson, Ing-Marie
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Gottfries, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Controlling coverage of D-optimal onion designs and selections. 2004. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 18, no 12, p. 548-557. Article in journal (Refereed)
    Abstract [en]

    Statistical molecular design (SMD) is a powerful approach for selection of compound sets in medicinal chemistry and quantitative structure-activity relationships (QSARs) as well as other areas. Two techniques often used in SMD are space-filling and D-optimal designs. Both on occasions lead to unwanted redundancy and replication. To remedy such shortcomings, a generalization of D-optimal selection was recently developed. This new method divides the compound candidate set into a number of subsets (layers or shells), and a D-optimal selection is made from each layer. This improves the possibility to select representative molecular structures throughout any property space independently of requested sample size. This is important in complex situations where any given model is unlikely to be valid over the whole investigated domain of experimental conditions. The number of selected molecules can be controlled by varying the number of subsets or by altering the complexity of the model equation in each layer and/or the dependency of previous layers. The new method, called D-optimal onion design (DOOD), will allow the user to choose the model equation complexity independently of sample size while still avoiding unwarranted redundancy. The focus of the present work is algorithmic improvements of DOOD in comparison with classical D-optimal design. As illustrations, extended DOODs have been generated for two applications by in-house programming, including some modifications of the D-optimal algorithm. The performances of the investigated approaches are expected to differ depending on the number of principal properties of the compounds in the design, sample sizes and the investigated model, i.e. the aim of the design. QSAR models have been generated from the selected compound sets, and root mean squared error of prediction (RMSEP) values have been used as measures of performance of the different designs.

  • 15.
    Pinto, Rui Climaco
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Gottfries, Johan
    Department of Chemistry and Molecular Biology, Gothenburg University.
    Advantages of orthogonal inspection in chemometrics. 2012. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 26, no 6, p. 231-235. Article in journal (Refereed)
    Abstract [en]

    The demand for chemometrics tools and concepts to study complex problems in modern biology and medicine has prompted chemometricians to shift their focus away from a traditional emphasis on model predictive capacity toward optimizing information exchange via model interpretation for biological validation. The interpretation of projection-based latent variable models is not straightforward because of its confounding of different systematic variations in the model components. Over the last 15 years, this has spurred the development of orthogonal-based methods that are capable of separating the correlated variation (to Y) from the noncorrelated (orthogonal to Y) variations in a single model. Here, we aim to provide a conceptual explanation of the advantages of orthogonal variation inspection in the context of Partial Least Squares (PLS) in multivariate classification and calibration. We propose that by inspecting the orthogonal variation, both model interpretation and information quality are improved by enhancement of the resulting level of knowledge. Although the predictive capacity of PLS using orthogonal methods may be identical to that of PLS alone, the combined result can be superior when it comes to the model interpretation. By discussing theory and examples, several new advantages revealed by inspection of orthogonal variation are highlighted.

  • 16. Rantalainen, Mattias
    et al.
    Bylesjö, Max
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Cloarec, Olivier
    Nicholson, Jeremy K
    Holmes, Elaine
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Kernel-based orthogonal projections to latent structures (K-OPLS). 2007. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 21, no 7-9, p. 379-385. Article in journal (Refereed)
    Abstract [en]

    The orthogonal projections to latent structures (OPLS) method has been successfully applied in various chemical and biological systems for modeling and interpretation of linear relationships between a descriptor matrix and response matrix. A kernel-based reformulation of the original OPLS algorithm is presented where the kernel Gram matrix is utilized as a replacement for the descriptor matrix. This enables usage of the kernel trick to efficiently transform the data into a higher-dimensional feature space where predictive and response-orthogonal components are calculated. This strategy has the capacity to improve predictive performance considerably in situations where strong non-linear relationships exist between descriptor and response variables while retaining the OPLS model framework. We put particular focus on describing properties of the rearranged algorithm in relation to the original OPLS algorithm. Four separate problems, two simulated and two real spectroscopic data sets, are employed to illustrate how the algorithm enables separate modeling of predictive and response-orthogonal variation in the feature space. This separation can be highly beneficial for model interpretation purposes while providing a flexible framework for supervised regression.
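
To make the role of the kernel Gram matrix concrete, the sketch below builds one from a small descriptor matrix. The Gaussian kernel and the `sigma` parameter are assumptions chosen for illustration, since the abstract does not name a kernel, and the sketch covers only the Gram matrix, not the K-OPLS algorithm itself.

```python
import math


def gaussian_gram(X, sigma=1.0):
    """Return the n-by-n Gram matrix K with K[i][j] = k(x_i, x_j).

    Assumption: Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between observations i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K


# Hypothetical descriptor matrix: three observations, two variables.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_gram(X, sigma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

The Gram matrix has one row and one column per observation regardless of the number of descriptor variables, which is why it can stand in for the descriptor matrix without the feature space ever being formed explicitly.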

  • 17.
    Sandberg Hiltunen, Maria
    Umeå University, Faculty of Social Sciences, Centre for Demographic and Ageing Research (CEDAR).
    A multivariate characterization of tRNA nucleosides. 1996. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, no 5-6, p. 493-508. Article in journal (Refereed)
    Abstract [en]

    Twenty nucleosides occurring in transfer ribonucleic acid (tRNA) have been characterized using 21 experimentally determined (HPLC, TLC, NMR, etc.) and calculated (log P, van der Waals surface area, ionization potential, etc.) variables. Principal component analysis (PCA) was performed on the data set and four statistically significant components or principal properties (PPs) were extracted. The PPs described 68.4% of the variance in the data. The PP values are discussed in terms of similarity and dissimilarity among the nucleosides. The loading vectors from the PCA are used for an interpretation of the nature of the PP vectors. Application of the PPs in sequence-activity modelling is demonstrated with 25 DNA-promoter sequences originating from E. coli.

  • 18.
    Sandberg Hiltunen, Maria
    Umeå University, Faculty of Social Sciences, Centre for Demographic and Ageing Research (CEDAR).
    A Multivariate Characterization of tRNA nucleosides. 1996. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, p. 493-508. Article in journal (Refereed)
  • 19.
    Sandberg Hiltunen, Maria
    Umeå University, Faculty of Social Sciences, Centre for Demographic and Ageing Research (CEDAR).
    The evolutionary transition from uracil to thymine balances the genetic code. 1996. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 10, p. 163-170. Article in journal (Refereed)
    Abstract [en]

    A multivariate quantitative physicochemical characterization of the five bases adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), followed by principal component analysis, shows that the relative dissimilarities between the bases of DNA (A, C, G and T) are almost the same (i.e. balanced). In contrast, mRNA (containing U instead of T) has a considerably larger relative physicochemical similarity between C and U than between all other pairs of bases and is therefore inherently more unbalanced. These results provide a physicochemical explanation of the presence of thymine instead of uracil as an element of DNA. The principal component scores enable a quantitative description of nucleic acid sequence data to be made for structure-activity modelling purposes.

  • 20.
    Sjögren, Rickard
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Stridh, Kjell
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Skotare, Tomas
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Sartorius Stedim Data Analytics, Umeå, Sweden.
    Multivariate patent analysis: using chemometrics to analyze collections of chemical and pharmaceutical patents
    2018. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, article id e3041
    Article in journal (Refereed)
    Abstract [en]

    Patents are an important source of technological knowledge, but the number of existing patents is vast and quickly growing. This makes the development of tools and methodologies for quickly revealing patterns in patent collections important. In this paper, we describe how structured chemometric principles of multivariate data analysis can be applied in the context of text analysis in a novel combination with common machine learning preprocessing methodologies. We demonstrate our methodology in two case studies. Using principal component analysis (PCA) on a collection of 12338 patent abstracts from 25 companies in big pharma revealed the sub-fields in which the companies are active. Using PCA on a smaller collection of patents retrieved by searching for a specific term proved useful to quickly understand how patent classifications relate to the search term. By using orthogonal projections to latent structures (O-PLS) on patent classification schemes, we were able to separate patents on a more detailed level than with PCA. Lastly, we performed multi-block modeling using OnPLS on bag-of-words representations of abstracts, claims, and detailed descriptions, respectively, showing that semantic variation relating to patent classification is consistent across multiple text blocks, represented as globally joint variation. We conclude that using machine learning to transform unstructured data into structured data provides a good preprocessing tool for subsequent chemometric multivariate data analysis and provides an easily interpretable and novel workflow to understand large collections of patents. We demonstrate this on collections of chemical and pharmaceutical patents.

  • 21.
    Skotare, Tomas
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Sjögren, Rickard
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Surowiec, Izabella
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Nilsson, David
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Trygg, Johan
    Umeå University, Faculty of Science and Technology, Department of Chemistry. Sartorius Stedim Data Analytics, 907 36 Umeå, Sweden.
    Visualization of descriptive multiblock analysis
    2018. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X
    Article in journal (Refereed)
    Abstract [en]

    Understanding and making the most of complex data collected from multiple sources is a challenging task. Data integration is the procedure of describing the main features in multiple data blocks, and several methods for multiblock analysis have been previously developed, including OnPLS and JIVE. One of the main challenges is how to visualize and interpret the results of multiblock analyses because of the increased model complexity and sheer size of data. In this paper, we present novel visualization tools that simplify interpretation and overview of multiblock analysis. We introduce a correlation matrix plot that provides an overview of the relationships between blocks found by multiblock models. We also present a multiblock scatter plot, a metadata correlation plot, and a variation distribution plot that simplify the interpretation of multiblock models. We demonstrate our visualizations on an industrial case study in vibration spectroscopy (NIR, UV, and Raman datasets) as well as a multiomics integration study (transcript, metabolite, and protein datasets). We conclude that our visualizations provide useful tools to harness the complexity of multiblock analysis and enable better understanding of the investigated system.

  • 22.
    Trygg, Johan
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Aberg, K. Magnus
    13th Scandinavian Symposium on Chemometrics
    2014. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 28, no 8, p. 604-605
    Article in journal (Other academic)
  • 23.
    Wiklund, Susanne
    et al.
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Nilsson, David
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Eriksson, Lennart
    Umetrics, Umeå, Sweden.
    Sjöström, Michael
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Wold, Svante
    Umeå University, Faculty of Science and Technology, Department of Chemistry.
    Faber, Klaas
    Chemometry Consultancy, Rubensstraat 7, 6717 VD Ede, The Netherlands.
    A randomization test for PLS component selection
    2007. In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 21, no 10-11, p. 427-439
    Article in journal (Refereed)
    Abstract [en]

    During the last two decades, a number of methods have been developed and evaluated for selecting the optimal number of components in a PLS model. In this paper, a new method is introduced that is based on a randomization test. The advantage of a randomization test is that, in contrast to cross validation (CV), it requires no exclusion of data, thus avoiding problems related to data exclusion, for example in designed experiments. The method is tested using simulated data sets for which the true dimensionality is clearly defined, and it is also compared to regularly used methods on 10 real data sets. The randomization test works as a good statistical selection tool in combination with other selection rules. It also works as an indicator of when the data require pre-treatment.
