PLS-Trees®, a top-down clustering approach
Umeå University, Faculty of Science and Technology, Department of Chemistry.
Umeå University, Faculty of Science and Technology, Department of Chemistry. Umetrics Inc., 42 Pine Hill Rd, Hollis, NH 03049, USA.
2009 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 23, no 11, 569-580 p. Article in journal (Refereed). Published.
Abstract [en]

A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees®, this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models, instead of the individual X-variables, as the basis for splitting the clusters. One cluster is split into two along the sorted first X-score (t(1)) of a PLS model of that cluster, although the split may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach are also mentioned: OPLS-Trees, and trees based on hierarchical PLS or OPLS models with the variables divided into blocks according to their type. The possibility of greatly reducing the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by three examples. The first two are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyperspectral images of liver tissue and is used to identify different sources of variability in the liver samples.
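
The splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact objective, so the cost function below (pooled within-group variance of the score and of Y, each relative to the parent cluster, plus a squared size-imbalance penalty) is an assumption, as are the function names `first_pls_score` and `best_split`, the weights `a` and `b`, and the `min_size` parameter. Only a single response and the first PLS component are handled, and cross-validation is omitted.

```python
from statistics import pvariance


def first_pls_score(X, y):
    """First X-score of a one-component, single-response PLS model."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    # PLS weight vector w is proportional to X'y (centered data).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        raise ValueError("X'y is zero; no PLS direction can be computed")
    # Score t = X w / ||w||
    return [sum(Xc[i][j] * w[j] / norm for j in range(k)) for i in range(n)]


def best_split(t, y, a=0.3, b=0.3, min_size=2):
    """Search cut points along the sorted score t.

    Assumed cost (lower is better):
        a * within-group variance of t / parent variance of t
      + b * within-group variance of y / parent variance of y
      + (1 - a - b) * ((n1 - n2) / n) ** 2
    Returns (left_indices, right_indices, cost), or None if the
    cluster is too small to split.
    """
    order = sorted(range(len(t)), key=lambda i: t[i])
    ts = [t[i] for i in order]
    ys = [y[i] for i in order]
    n = len(ts)
    var_t = pvariance(ts) or 1.0  # avoid division by zero
    var_y = pvariance(ys) or 1.0
    best = None
    for cut in range(min_size, n - min_size + 1):
        n1, n2 = cut, n - cut
        vt = (n1 * pvariance(ts[:cut]) + n2 * pvariance(ts[cut:])) / (n * var_t)
        vy = (n1 * pvariance(ys[:cut]) + n2 * pvariance(ys[cut:])) / (n * var_y)
        penalty = ((n1 - n2) / n) ** 2
        cost = a * vt + b * vy + (1 - a - b) * penalty
        if best is None or cost < best[0]:
            best = (cost, cut)
    if best is None:
        return None
    cost, cut = best
    return order[:cut], order[cut:], cost


# Toy data: two well-separated groups in both X and y.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [1.0, 1.1, 0.9, 4.0, 4.2, 3.9]
left, right, cost = best_split(first_pls_score(X, y), y)
print(left, right, round(cost, 4))
```

Applying `best_split` recursively to each resulting cluster, with a fresh PLS model fitted per cluster, would give the tree structure; a stopping rule (cross-validation in the paper) would still have to be supplied.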

Place, publisher, year, edition, pages
Chichester: John Wiley & Sons, 2009. Vol. 23, no 11, 569-580 p.
Keyword [en]
PLS-Trees, PLS, dendrogram, data mining, clustering, variable selection, outlier detection
National Category
Analytical Chemistry; Computer Engineering; Mathematics; Robotics
Identifiers
URN: urn:nbn:se:umu:diva-115961
DOI: 10.1002/cem.1254
ISI: 000273586400003
OAI: oai:DiVA.org:umu-115961
DiVA: diva2:907559
Available from: 2016-02-29. Created: 2016-02-08. Last updated: 2016-02-29. Bibliographically approved.

By author/editor
Trygg, Johan; Wold, Svante