D-optimal onion designs in statistical molecular design
2004 (English). In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, Vol. 73, no 1, p. 37-46. Article in journal (Refereed), Published
Statistical molecular design (SMD) is a technique for selecting a representative (diverse) set of substances in combinatorial chemistry and QSAR, and in other areas that depend on optimising chemical structure. Two approaches often used in SMD are space-filling (SF) and D-optimal (DO) designs.
Space-filling designs provide good coverage of the physicochemical space but are not explicitly based on a model. For small design sizes, they perform similarly to D-optimal designs, which maximize the determinant of the variance–covariance matrix X′X of the model matrix X (equivalently, they minimize the generalized variance of the model coefficients). The D-optimal criterion leads to selection of the most extreme points of the candidate set and gives a minimal set of selected compounds with maximal diversity. However, the inner regions of the experimental domain are not well sampled by DO or small SF designs.
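The tendency of the D-optimal criterion to pick extreme points can be illustrated with a small sketch. It is not the procedure used in the article: it assumes a first-order model with an intercept, a hypothetical 5 x 5 grid of candidates, and an exhaustive search that is only feasible for very small candidate sets. The function names are illustrative.

```python
from itertools import combinations

import numpy as np


def model_matrix(points):
    """First-order model with intercept: columns [1, x1, x2, ...]."""
    points = np.asarray(points, dtype=float)
    return np.hstack([np.ones((len(points), 1)), points])


def d_optimal_exhaustive(candidates, n_select):
    """Return the indices of the n_select candidates that maximize det(X'X).

    Every subset is tried, so this only works for tiny candidate sets.
    """
    X = model_matrix(candidates)
    best_det, best_idx = -np.inf, None
    for idx in combinations(range(len(X)), n_select):
        Xs = X[list(idx)]
        det = np.linalg.det(Xs.T @ Xs)
        if det > best_det:
            best_det, best_idx = det, idx
    return list(best_idx), best_det


# Hypothetical candidate set: a 5 x 5 grid on [-1, 1] x [-1, 1].
levels = np.linspace(-1, 1, 5)
grid = np.array([(a, b) for a in levels for b in levels])

chosen, det = d_optimal_exhaustive(grid, n_select=4)
print(grid[chosen])
```

With four runs, the selected points are the four corners of the grid, and none of the interior candidates is chosen.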
We have developed and evaluated an approach to remedy the shortcomings of SF and DO designs in SMD. This new approach divides the candidate set into a number of subsets (“shells” or “layers”), and a D-optimal selection is made from each layer. This makes it possible to select representative sets of molecular structures throughout any property space, e.g., the physicochemical space, with reasonable design sizes. The number of selected molecules is easily controlled by varying (a) the number of layers and (b) the model on which the design is based.
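A minimal sketch of the layered selection follows. The abstract does not specify how the layers are constructed or which D-optimal algorithm is applied, so several choices here are assumptions: layers are defined by quantiles of the Euclidean distance from the centroid of the autoscaled descriptors, the model is first-order with an intercept, and a simple greedy forward selection stands in for the exchange algorithms normally used for D-optimal designs. All function and parameter names are hypothetical.

```python
import numpy as np


def greedy_d_optimal(X, n_select, ridge=1e-6):
    """Greedy forward selection of rows of X that increase det(X'X).

    Uses det(M + x x') = det(M) * (1 + x' M^-1 x), so each step picks the
    row with the largest x' M^-1 x. A small ridge keeps M invertible.
    This is a stand-in, not a full exchange algorithm.
    """
    n, p = X.shape
    M = ridge * np.eye(p)
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(min(n_select, n)):
        M_inv = np.linalg.inv(M)
        gain = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        M += np.outer(X[best], X[best])
    return chosen


def onion_design(descriptors, n_layers=3, n_per_layer=4):
    """Split candidates into distance-based layers and select from each.

    Assumes no descriptor column is constant (autoscaling divides by std).
    """
    Z = np.asarray(descriptors, dtype=float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    radius = np.linalg.norm(Z, axis=1)
    # Layer boundaries at equally spaced quantiles of the radius (assumption).
    edges = np.quantile(radius, np.linspace(0, 1, n_layers + 1))
    layer = np.searchsorted(edges, radius, side="right") - 1
    layer = np.clip(layer, 0, n_layers - 1)
    X = np.hstack([np.ones((len(Z), 1)), Z])
    selected = []
    for k in range(n_layers):
        members = np.flatnonzero(layer == k)
        picks = greedy_d_optimal(X[members], n_per_layer)
        selected.extend(members[picks].tolist())
    return selected, layer


# Hypothetical data: 200 candidates described by 3 descriptors.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
selected, layer = onion_design(data, n_layers=3, n_per_layer=4)
print(len(selected), np.bincount(layer[selected]))
```

In this sketch the design size is the number of layers times the number of points per layer, which mirrors the two controls named above: more layers add interior coverage, and a richer model would require more points per layer.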
We outline here this new approach, the D-optimal onion design (DOOD). It is tested on two molecular data sets of different sizes and compared with SF designs and ordinary DO designs. The designs have been evaluated using parameters such as the condition number, the determinant, Tanimoto coefficients and Euclidean distances, as well as by external evaluation of the resulting projection to latent structures (PLS) model.
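Some of the evaluation parameters listed above can be computed as in the following sketch. The exact definitions used in the article are not given in the abstract, so these are assumptions: the condition number and determinant are taken from the model matrix X of the selected compounds, distances are mean pairwise Euclidean distances between its rows, and the Tanimoto coefficient is computed on binary fingerprints. The function names are illustrative.

```python
import numpy as np


def design_diagnostics(X):
    """Condition number, log det(X'X) and mean pairwise Euclidean distance."""
    X = np.asarray(X, dtype=float)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)  # each pair counted once
    return {
        "condition_number": float(np.linalg.cond(X)),
        "log_det": float(logdet) if sign > 0 else float("-inf"),
        "mean_distance": float(dist[upper].mean()),
    }


def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints: |a AND b| / |a OR b|.

    Two all-zero fingerprints are given a similarity of 1.0 by convention.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


# Hypothetical design: the four corners of a square, with an intercept column.
corners = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]])
stats = design_diagnostics(corners)
similarity = tanimoto([1, 1, 0, 0], [1, 0, 1, 0])
print(stats, similarity)
```

A low condition number and a large determinant indicate a well-conditioned design for the chosen model, while low Tanimoto coefficients and large distances indicate structural diversity among the selected compounds.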
Keywords: Statistical molecular design, D-optimal design, Space-filling design
Identifiers: URN: urn:nbn:se:umu:diva-14076; DOI: 10.1016/j.chemolab.2004.04.001; OAI: oai:DiVA.org:umu-14076; DiVA: diva2:153747