On the selection of the training set in environmental QSAR analysis when compounds are clustered
Umeå University, Faculty of Science and Technology, Department of Chemistry. (Research Group for Chemometrics)
2000 (English) In: Journal of Chemometrics, Vol. 14, no 5-6, 599-616 p. Article in journal (Refereed) Published
Abstract [en]

In QSAR analysis in environmental sciences, adverse effects of chemicals released to the environment are modelled and predicted as a function of the chemical properties of the pollutants. Usually the set of compounds under study contains several classes of substances, i.e. a more or less strongly clustered set. It is then necessary to ensure that the selected training set comprises compounds representing all those chemical classes. Multivariate design in the principal properties of the compound classes is usually appropriate for selecting a meaningful training set. However, with clustered data, often seen in environmental chemistry and toxicology, a single multivariate design may be suboptimal because of the risk of ignoring small classes with few members and only selecting training set compounds from the largest classes. We recently proposed a procedure for training set selection that recognizes clustering. In this approach, when non-selective biological or environmental responses are modelled, local multivariate designs are constructed within each cluster (class). The chosen compounds arising from the local designs are finally united in the overall training set, which thus will contain members from all clusters. The proposed strategy is here further tested and elaborated by applying it to a series of 351 chemical substances for which the soil sorption coefficient is available. These compounds are divided into 14 classes containing between 10 and 52 members. The training set selection is discussed, followed by multivariate QSAR modelling, model interpretation and predictions for the test set. Various types of statistical experimental designs are tested during the training set selection phase.
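
The cluster-wise selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `local_design_selection`, the autoscaling within each class, the use of two principal components, and the choice of a two-level full factorial design with a centre point are all assumptions made here for illustration. For each class, the sketch computes PCA scores, picks the compound closest to each design point, and unites the picks into one training set.

```python
import itertools
import numpy as np

def local_design_selection(X, labels, n_components=2):
    """Hypothetical helper: select training compounds class by class.

    X      : (n_compounds, n_descriptors) array of chemical descriptors
    labels : (n_compounds,) array of class (cluster) labels
    Returns the sorted row indices of the selected training compounds.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    selected = set()
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        Xc = X[idx]
        # Autoscale descriptors within the class (assumption: local scaling).
        std = Xc.std(axis=0)
        std[std == 0] = 1.0
        Z = (Xc - Xc.mean(axis=0)) / std
        # Local PCA scores ("principal properties") via SVD.
        U, s, _ = np.linalg.svd(Z, full_matrices=False)
        k = min(n_components, len(s))
        T = U[:, :k] * s[:k]
        # Rescale each score vector so the design levels -1/+1 span the data.
        span = np.abs(T).max(axis=0)
        span[span == 0] = 1.0
        Tn = T / span
        # Design points: corners of a two-level factorial plus a centre point.
        targets = [np.array(c) for c in itertools.product((-1.0, 1.0), repeat=k)]
        targets.append(np.zeros(k))
        available = np.ones(len(idx), dtype=bool)
        for t in targets:
            if not available.any():
                break  # class has fewer members than design points
            d = np.linalg.norm(Tn - t, axis=1)
            d[~available] = np.inf  # never pick the same compound twice
            j = int(np.argmin(d))
            available[j] = False
            selected.add(int(idx[j]))
    return sorted(selected)
```

Compounds not returned by the function would form the test set. Because every class contributes its own design points, small classes are represented even when a single global design would have drawn only from the largest ones.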

Place, publisher, year, edition, pages
2000. Vol. 14, no 5-6, 599-616 p.
Keyword [en]
multivariate design, multivariate QSAR, PCA, PLS, soil sorption
URN: urn:nbn:se:umu:diva-8422
DOI: 10.1002/1099-128X(200009/12)14:5/6<599::AID-CEM619>3.0.CO;2-8
OAI: diva2:148093
Available from: 2008-01-22 Created: 2008-01-22 Last updated: 2013-02-28 Bibliographically approved

By author/editor
Wold, Svante
By organisation
Department of Chemistry