Controlling coverage of D-optimal onion designs and selections
2004 (English). In: Journal of Chemometrics, ISSN 0886-9383, E-ISSN 1099-128X, Vol. 18, no 12, 548-557 p. Article in journal (Refereed). Published.
Statistical molecular design (SMD) is a powerful approach for selection of compound sets in medicinal chemistry and quantitative structure-activity relationships (QSARs) as well as other areas. Two techniques often used in SMD are space-filling and D-optimal designs. Both occasionally lead to unwanted redundancy and replication. To remedy such shortcomings, a generalization of D-optimal selection was recently developed. This new method divides the compound candidate set into a number of subsets (layers or shells), and a D-optimal selection is made from each layer. This makes it easier to select representative molecular structures throughout any property space, independently of the requested sample size. This is important in complex situations where any given model is unlikely to be valid over the whole investigated domain of experimental conditions. The number of selected molecules can be controlled by varying the number of subsets or by altering the complexity of the model equation in each layer and/or the dependency on previous layers. The new method, called D-optimal onion design (DOOD), allows the user to choose the model equation complexity independently of sample size while still avoiding unwarranted redundancy. The focus of the present work is algorithmic improvements of DOOD in comparison with classical D-optimal design. As illustrations, extended DOODs have been generated for two applications by in-house programming, including some modifications of the D-optimal algorithm. The performance of the investigated approaches is expected to differ depending on the number of principal properties of the compounds in the design, the sample sizes and the investigated model, i.e. the aim of the design. QSAR models have been generated from the selected compound sets, and root mean squared error of prediction (RMSEP) values have been used as measures of performance of the different designs.
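The layer-wise selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' in-house program: it assumes equal-count shells ordered by Euclidean distance from the centroid of the candidate set, a linear model with intercept in every layer, no dependency between layers, and a simple greedy (sequential) D-optimal search in place of an exchange algorithm. All function names (`onion_layers`, `greedy_d_optimal`, `onion_design`, `rmsep`) are hypothetical.

```python
import numpy as np


def model_matrix(X):
    """Assumed model: linear terms plus intercept, columns [1, x1, ..., xk]."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def onion_layers(X, n_layers):
    """Split candidate indices into shells, ordered from innermost to outermost.

    Assumption: shells hold (nearly) equal numbers of candidates and are
    defined by Euclidean distance from the centroid.
    """
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return np.array_split(np.argsort(dist), n_layers)


def greedy_d_optimal(F, n_select, ridge=1e-6):
    """Greedily pick rows of F that maximise det(F_s^T F_s + ridge * I)."""
    n, p = F.shape
    available = np.ones(n, dtype=bool)
    M = ridge * np.eye(p)  # small ridge keeps M invertible at the start
    chosen = []
    for _ in range(min(n_select, n)):
        M_inv = np.linalg.inv(M)
        # det(M + f f^T) = det(M) * (1 + f^T M^-1 f), so the best row is
        # the one with the largest quadratic form f^T M^-1 f.
        gain = np.einsum("ij,jk,ik->i", F, M_inv, F)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        M = M + np.outer(F[best], F[best])
    return chosen


def onion_design(X, n_layers, n_per_layer):
    """Run an independent greedy D-optimal selection inside each shell."""
    selected = []
    for layer in onion_layers(X, n_layers):
        picks = greedy_d_optimal(model_matrix(X[layer]), n_per_layer)
        selected.extend(int(layer[i]) for i in picks)
    return selected


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over a test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative use with synthetic candidates in two principal properties.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 2))
design = onion_design(candidates, n_layers=3, n_per_layer=4)
```

In this sketch the total sample size is the number of layers times the number of picks per layer, which mirrors the abstract's point that the number of selected molecules can be steered through the number of subsets or the model complexity per layer.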
Place, publisher, year, edition, pages
Chichester: Wiley & Sons, 2004. Vol. 18, no 12, 548-557 p.
Keywords
statistical molecular design, space-filling design, D-optimal design, D-optimal onion designs, principal properties, PLS
Identifiers
URN: urn:nbn:se:umu:diva-14083; DOI: 10.1002/cem.901; OAI: oai:DiVA.org:umu-14083; DiVA: diva2:153754