Hierarchical PLS modeling for predicting the binding of a comprehensive set of structurally diverse protein-ligand complexes.
Umeå University, Faculty of Science and Technology, Chemistry. (Kemometri)
Umeå University, Faculty of Science and Technology, Chemistry.
Umeå University, Faculty of Science and Technology, Chemistry. (Kemometri)
2006 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 46, no 3, 1154-1167 p. Article in journal (Refereed), Published
Abstract [en]

A new approach is presented for predicting ligand binding to proteins using hierarchical partial-least-squares regression to latent structures (Hi-PLS). Models were based on information from the 2002 release of the PDBbind database containing (after in-house refinement) high-resolution X-ray crystallography and binding affinity (Kd or Ki) data for 612 protein-ligand complexes. The complexes were characterized by four different descriptor blocks: three-dimensional (3D) structural descriptors of the proteins, protein-ligand interactions according to the Validate scoring function, binding site surface areas, and ligand 2D and 3D descriptors. These descriptor blocks were used in Hi-PLS models, generated using both linear and nonlinear terms, to relate the characterizations to pKd/i. The results show that each of the four descriptor blocks contributed to the model, and the predictions of pKd/i of the internal test set gave a root-mean-square error of prediction (RMSEP) of 1.65. The data were further divided according to the structural classification of the proteins, and Hi-PLS models were constructed for the resulting subclasses. The models for the four subclasses differed considerably in terms of both their ability to predict pKd/i (with RMSEPs ranging from 0.8 to 1.56) and the descriptor block that had the strongest influence. The models were validated with an external test set of 174 complexes from the 2003 release of the PDBbind database. The overall results show that the presented Hi-PLS methodology could facilitate the difficult task of predicting binding affinity.
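
As a rough illustration of the two quantities reported in the abstract, the sketch below converts a dissociation or inhibition constant to pKd/i and computes a root-mean-square error of prediction. It assumes the constants are given in mol/L and uses the standard definitions pK = -log10(K) and RMSEP = sqrt(mean((predicted - observed)^2)). The function names and example values are hypothetical and are not taken from the paper.

```python
import math

def pk_from_constant(k_molar):
    """Convert a Kd or Ki value in mol/L to pKd or pKi (negative base-10 logarithm)."""
    if k_molar <= 0:
        raise ValueError("constant must be positive")
    return -math.log10(k_molar)

def rmsep(observed, predicted):
    """Root-mean-square error of prediction over paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical affinities in mol/L; these are NOT data from the study.
observed_pk = [pk_from_constant(k) for k in (1e-9, 1e-6, 1e-3)]
predicted_pk = [8.0, 7.0, 3.0]  # made-up model output for illustration
error = rmsep(observed_pk, predicted_pk)
```

Because pKd/i is a logarithmic scale, an RMSEP expressed in these units can be read as a typical deviation in orders of magnitude of the underlying Kd or Ki.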

Place, publisher, year, edition, pages
2006. Vol. 46, no 3, 1154-1167 p.
Keyword [en]
Crystallography, X-Ray, Ligands, Models, Molecular, Multivariate Analysis, Protein Binding, Proteins/*metabolism
URN: urn:nbn:se:umu:diva-11779; DOI: doi:10.1021/ci050323k; PubMedID: 16711735; OAI: diva2:151450
Available from: 2007-05-25 Created: 2007-05-25 Bibliographically approved
In thesis
1. A multivariate approach to characterization of drug-like molecules, proteins and the interactions between them
2008 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [sv] (English translation)

A disease can often be traced to a cascade reaction between proteins, cofactors and substrates. This cascade reaction often becomes the target when the disease is treated with drugs. Computer-based tools are commonly used to design new drug molecules. This design benefits greatly when the target protein is known, and above all its three-dimensional (3D) structure. If the 3D structure is known, so-called structure- and computer-based molecular design can be carried out, in which the 3D geometry (above all that of the binding site) guides the design of a new molecule. Many factors determine the interaction between a molecule and the binding site, for example the physicochemical properties of the molecule and the binding site, the flexibility of the molecule and the target protein, and the surrounding solvent.

For structure-based molecular design to work well, two important steps must be carried out: i) 3D fitting of molecules to the binding site of a target protein (so-called docking) and ii) prediction of the molecules' affinity for the binding site.

The main aims of the work in this thesis were as follows: i) to create models for predicting the affinity between a molecule and the binding site of a target protein; ii) to refine the molecule-protein geometry created by 3D fitting of a molecule to the binding site of a target protein (so-called docking); iii) to characterize proteins and above all their secondary structure; iv) to assess the effect of different mathematical descriptions of the solvent on the refinement of the 3D molecule-protein geometry created by docking and on the prediction of molecules' affinity for protein binding pockets. An overarching aim was to use chemometric methods for modeling and data analysis on the points above. In summary, this thesis presents methods and results that are useful for structure-based molecular design.

The reported results show that it is possible to create chemometric models for predicting the affinity of molecules for the binding site of a protein, and that these performed as well as other commonly used methods. In addition, chemometric models could be created to describe how the settings of various parameters in docking programs affected the 3D molecule-protein geometry that the docking programs produced. Furthermore, chemometric models could be used to increase the understanding of descriptors that described the secondary structure of proteins.

Refinement of the molecule-protein geometry created by docking gave similar and non-significant results regardless of which mathematical model of the solvent was used, except in a few cases (six of 30). However, using a refined geometry proved valuable for predicting the affinity of molecules for the binding site of a protein. The improvement in affinity prediction was marked when a Poisson-Boltzmann description of the solvent was used; compared with the predictions made with a docking program, the correlation between calculated and measured affinity improved by 0.7 (R2).

Abstract [en]

A disease is often associated with a cascade reaction pathway involving proteins, co-factors and substrates. Hence, to treat the disease, elements of this pathway are often targeted using a therapeutic agent, a drug. Designing new drug molecules for use as therapeutic agents involves the application of methods collectively known as computer-aided molecular design (CAMD). When the three-dimensional (3D) geometry of a macromolecular target (usually a protein) is known, structure-based CAMD is undertaken, and structural information about the target guides the design of new molecules and their interactions with the binding sites in targeted proteins. Many factors influence the interactions between the designed molecules and the binding sites of the target proteins, such as the physico-chemical properties of the molecule and the binding site, the flexibility of the protein and the ligand, and the surrounding solvent.

For structure-based CAMD to be successful, two important aspects that take the abovementioned factors into account must be considered: i) 3D fitting of molecules to the binding site of the target protein (like fitting pieces of a jigsaw puzzle), and ii) predicting the affinity of molecules for the protein binding site.

The main objectives of the work underlying this thesis were: to create models for predicting the affinity between a molecule and a protein binding site; to refine the geometry of the molecule-protein complex derived by 3D fitting (also known as docking); to characterize the proteins and their secondary structure; and to evaluate the effects of different generalized-Born (GB) and Poisson-Boltzmann (PB) implicit solvent models on the refinement of the molecule-protein complex geometry created by docking and on the prediction of the molecule-to-protein binding site affinity. A further objective was to apply chemometric methodologies for modeling and data analysis to all of the above. To summarize, this thesis presents methodologies and results applicable to structure-based CAMD.

Results show that predictive chemometric models for molecule-to-protein binding site affinity could be created that yield comparable results to similar, commonly used methods. In addition, chemometric models could be created to model the effects of software settings on the molecule-protein complex geometry using software for molecule-to-binding site docking. Furthermore, the use of chemometric models provided a more profound understanding of protein secondary structure descriptors.

Refining the geometry of molecule-protein complexes created through molecule-to-binding site docking gave similar results for all investigated implicit solvent models, but the geometry was significantly improved in only a few of the examined cases (six of 30). However, using the geometry-refined molecule-protein complexes was highly valuable for the prediction of molecule-to-binding site affinity. Indeed, with the PB solvent model, the refined geometries yielded improvements of 0.7 in correlation coefficients (R2) for binding affinity parameters of a set of Factor Xa protein drug molecules, relative to those obtained using the fitting software.

Place, publisher, year, edition, pages
Umeå: Kemi, 2008. 85 p.
Keyword [en]
binding affinity, prediction, CAMD, principal component analysis (PCA), partial least squares projections to latent structures (PLS), MM-GB-SA, MM-PB-SA, docking, geometry optimization, protein secondary structure characterization, implicit solvent, generalized-Born, Poisson-Boltzmann, molecular mechanics (MM), drug discovery, drug design
National Category
Other Chemistry Topics
urn:nbn:se:umu:diva-1924 (URN), 978-91-7264-690-2 (ISBN)
Public defence
2008-12-12, KB3B1, KBC, Umeå Universitet, Umeå, 13:00 (English)
Available from: 2008-11-19 Created: 2008-11-19 Last updated: 2009-06-25 Bibliographically approved

By author/editor
Lindström, Anton; Almqvist, Fredrik; Kihlberg, Jan; Linusson, Anna