PLS-Trees®, a top-down clustering approach
2009 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 23, no. 11, p. 569-580. Article in journal (Refereed), Published, Text
A hierarchical clustering approach based on a set of PLS models is presented. Called PLS-Trees®, this approach is analogous to classification and regression trees (CART), but uses the scores of PLS regression models, instead of the individual X-variables, as the basis for splitting the clusters. The split of one cluster into two is made along the sorted first X-score (t₁) of a PLS model of the cluster, but may potentially be made along a direction corresponding to a combination of scores. The position of the split is selected according to the improvement of a weighted combination of (a) the variance of the X-score, (b) the variance of Y and (c) a penalty function discouraging an unbalanced split with very different numbers of observations. Cross-validation is used to terminate the branches of the tree and to determine the number of components of each cluster PLS model. Some obvious extensions of the approach are also mentioned: OPLS-Trees, and trees based on hierarchical PLS or OPLS models with the variables divided into blocks depending on their type. The possibility of greatly reducing the number of variables in each PLS model on the basis of their PLS w-coefficients is also pointed out. The approach is illustrated by means of three examples. The first two examples are quantitative structure-activity relationship (QSAR) data sets, while the third is based on hyperspectral images of liver tissue and aims to identify different sources of variability in the liver samples.
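The splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formula, so the cost function below, the weights `a` and `b`, the `min_size` limit and the squared-imbalance penalty are all assumptions chosen for illustration, and the function names are hypothetical. The sketch handles a single response y, computes only the first X-score with one NIPALS-style step, and omits the cross-validation used for stopping and for choosing the number of components.

```python
import numpy as np


def first_pls_score(X, y):
    """First X-score t1 of a one-response PLS model (single NIPALS-style step)."""
    Xc = X - X.mean(axis=0)          # column-centre X
    yc = y - y.mean()                # centre y
    w = Xc.T @ yc                    # weight vector proportional to X'y
    w = w / np.linalg.norm(w)        # normalise to unit length
    return Xc @ w                    # scores t1 = X w


def sum_sq(v):
    """Sum of squared deviations from the mean."""
    return float(np.sum((v - v.mean()) ** 2))


def best_split(X, y, a=0.5, b=0.3, min_size=2):
    """Choose a split position along the sorted first X-score.

    Assumed cost (illustrative only, lower is better):
        (1 - b) * [a * within-group SS of t1 / total SS of t1
                   + (1 - a) * within-group SS of y / total SS of y]
        + b * ((n_left - n_right) / n) ** 2
    Assumes t1 and y are not constant, so the totals are non-zero.
    """
    t = first_pls_score(X, y)
    order = np.argsort(t)            # observations sorted along t1
    ts, ys = t[order], y[order]
    n = len(ts)
    total_t, total_y = sum_sq(ts), sum_sq(ys)

    best_cost, best_k = np.inf, None
    for k in range(min_size, n - min_size + 1):
        within_t = sum_sq(ts[:k]) + sum_sq(ts[k:])
        within_y = sum_sq(ys[:k]) + sum_sq(ys[k:])
        imbalance = ((2 * k - n) / n) ** 2   # 0 when both halves are equal
        cost = (1 - b) * (a * within_t / total_t
                          + (1 - a) * within_y / total_y) + b * imbalance
        if cost < best_cost:
            best_cost, best_k = cost, k

    # Indices (into the original rows) of the two child clusters.
    return order[:best_k], order[best_k:], best_cost
```

Applying `best_split` recursively to each child cluster, and stopping a branch when a split no longer improves the cross-validated model, would give a tree of the kind the abstract describes. Here `a` trades off the X-score term against the Y term, and `b` sets how strongly unbalanced splits are discouraged.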
Place, publisher, year, edition, pages
Chichester: John Wiley & Sons, 2009. Vol. 23, no 11, 569-580 p.
Keywords: PLS-Trees, PLS, dendrogram, data mining, clustering, variable selection, outlier detection
Analytical Chemistry, Computer Engineering, Mathematics, Robotics
Identifiers
URN: urn:nbn:se:umu:diva-115961; DOI: 10.1002/cem.1254; ISI: 000273586400003; OAI: oai:DiVA.org:umu-115961; DiVA: diva2:907559