A multivariate approach to characterization of drug-like molecules, proteins and the interactions between them
Umeå University, Faculty of Science and Technology, Chemistry.
2008 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [sv]

A disease can often be traced to a cascade reaction between proteins, co-factors and substrates. This cascade reaction often becomes the target when treating the disease with drugs. Computer-based tools are commonly used to design new drug molecules. This design of drug molecules benefits greatly when the target protein is known, and above all when its three-dimensional (3D) structure is known. If the 3D structure is known, so-called structure- and computer-based molecular design can be performed, in which the 3D geometry (primarily that of the binding site) guides the design of a new molecule. Many factors determine the interaction between a molecule and the binding site, for example the physicochemical properties of the molecule and the binding site, the flexibility of the molecule and the target protein, and the surrounding solvent.

For structure-based molecular design to work well, two important steps must be carried out: i) 3D fitting of molecules to the binding site of a target protein (so-called docking) and ii) prediction of the molecules' affinity for the binding site.

The main aims of the work in this thesis were as follows: i) to create models for predicting the affinity between a molecule and the binding site of a target protein; ii) to refine the molecule-protein geometry produced by 3D fitting of a molecule to the binding site of a target protein (so-called docking); iii) to characterize proteins and, above all, their secondary structure; iv) to assess the effect of different mathematical descriptions of the solvent on the refinement of the 3D molecule-protein geometry created by docking and on the prediction of molecules' affinity for protein binding pockets. An overarching aim was to use chemometric methods for modeling and data analysis in the areas listed above. In summary, this thesis presents methods and results that are useful for structure-based molecular design.

The reported results show that it is possible to create chemometric models for predicting the affinity of molecules for the binding site of a protein, and that these performed as well as other commonly used methods. In addition, chemometric models could be created to describe how the settings of various parameters in docking programs affected the 3D molecule-protein geometry that the docking programs produced. Furthermore, chemometric models could be used to increase the understanding of descriptors that described the secondary structure of proteins.

Refinement of the molecule-protein geometry created by docking gave similar and non-significant results regardless of which mathematical model of the solvent was used, except in a few cases (six of 30). However, using a refined geometry proved valuable for predicting the affinity of molecules for the binding site of a protein. The improvement in affinity prediction was marked when a Poisson-Boltzmann description of the solvent was used; compared with the predictions made with a docking program, the correlation between calculated and measured affinity improved by 0.7 (R²).

Abstract [en]

A disease is often associated with a cascade reaction pathway involving proteins, co-factors and substrates. Hence, to treat the disease, elements of this pathway are often targeted using a therapeutic agent, a drug. Designing new drug molecules for use as therapeutic agents involves the application of methods collectively known as computer-aided molecular design (CAMD). When the three-dimensional (3D) geometry of a macromolecular target (usually a protein) is known, structure-based CAMD is undertaken, and structural information about the target guides the design of new molecules and their interactions with the binding sites in targeted proteins. Many factors influence the interactions between the designed molecules and the binding sites of the target proteins, such as the physico-chemical properties of the molecule and the binding site, the flexibility of the protein and the ligand, and the surrounding solvent.

For structure-based CAMD to be successful, two important aspects that take the abovementioned factors into account must be considered: i) 3D fitting of molecules to the binding site of the target protein (like fitting pieces of a jigsaw puzzle), and ii) predicting the affinity of molecules for the protein binding site.

The main objectives of the work underlying this thesis were: to create models for predicting the affinity between a molecule and a protein binding site; to refine the geometry of the molecule-protein complex obtained from 3D fitting (also known as docking); to characterize the proteins and their secondary structure; and to evaluate the effects of different generalized-Born (GB) and Poisson-Boltzmann (PB) implicit solvent models on the refinement of the molecule-protein complex geometry created by docking and on the prediction of the molecule-to-protein binding site affinity. A further objective was to apply chemometric methodologies for modeling and data analysis to all of the above. To summarize, this thesis presents methodologies and results applicable to structure-based CAMD.

Results show that predictive chemometric models for molecule-to-protein binding site affinity could be created that yield comparable results to similar, commonly used methods. In addition, chemometric models could be created to model the effects of software settings on the molecule-protein complex geometry using software for molecule-to-binding site docking. Furthermore, the use of chemometric models provided a more profound understanding of protein secondary structure descriptors.

Refining the geometry of molecule-protein complexes created through molecule-to-binding site docking gave similar results for all investigated implicit solvent models, but the geometry was significantly improved in only a few of the examined cases (six of 30). However, using the geometry-refined molecule-protein complexes was highly valuable for the prediction of molecule-to-binding site affinity. Indeed, with the PB solvent model, this yielded improvements of 0.7 in correlation coefficients (R²) for binding affinity parameters of a set of Factor Xa protein drug molecules, relative to those obtained using the fitting software.
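As a rough illustration of the correlation measure quoted above, the following sketch computes a squared Pearson correlation between calculated and measured affinities. The function name and the toy values are assumptions made for illustration; the thesis may define R² differently (for example, as a coefficient of determination from a regression model).

```python
def squared_pearson(calculated, measured):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(calculated)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_c = sum(calculated) / n
    mean_m = sum(measured) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((c - mean_c) * (m - mean_m) for c, m in zip(calculated, measured))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_c == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov * cov / (var_c * var_m)


# Toy values only, not data from the thesis
print(squared_pearson([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```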

Place, publisher, year, edition, pages
Umeå: Kemi, 2008. 85 p.
Keyword [en]
binding affinity, prediction, CAMD, principal component analysis (PCA), partial least squares projections to latent structures (PLS), MM-GB-SA, MM-PB-SA, docking, geometry optimization, protein secondary structure characterization, implicit solvent, generalized-Born, Poisson-Boltzmann, molecular mechanics (MM), drug discovery
Keyword [sv]
binding affinity, prediction, docking, geometry optimization, secondary structure, mathematical water model, generalized-Born, Poisson-Boltzmann, molecular mechanics (MM), drug design, principal component analysis (PCA), partial least squares projections to latent structures (PLS), MM-GB-SA, MM-PB-SA
National Category
Other Chemistry Topics
Identifiers
URN: urn:nbn:se:umu:diva-1924; ISBN: 978-91-7264-690-2 (print); OAI: oai:DiVA.org:umu-1924; DiVA: diva2:142451
Public defence
2008-12-12, KB3B1, KBC, Umeå Universitet, Umeå, 13:00 (English)
Available from: 2008-11-19 Created: 2008-11-19 Last updated: 2009-06-25. Bibliographically approved
List of papers
1. Hierarchical PLS modeling for predicting the binding of a comprehensive set of structurally diverse protein-ligand complexes.
2006 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 46, no. 3, p. 1154-1167. Article in journal (Refereed), Published
Abstract [en]

A new approach is presented for predicting ligand binding to proteins using hierarchical partial-least-squares regression to latent structures (Hi-PLS). Models were based on information from the 2002 release of the PDBbind database containing (after in-house refinement) high-resolution X-ray crystallography and binding affinity (Kd or Ki) data for 612 protein-ligand complexes. The complexes were characterized by four different descriptor blocks: three-dimensional (3D) structural descriptors of the proteins, protein-ligand interactions according to the Validate scoring function, binding site surface areas, and ligand 2D and 3D descriptors. These descriptor blocks were used in Hi-PLS models, generated using both linear and nonlinear terms, to relate the characterizations to pKd/i. The results show that each of the four descriptor blocks contributed to the model, and the predictions of pKd/i of the internal test set gave a root-mean-square error of prediction (RMSEP) of 1.65. The data were further divided according to the structural classification of the proteins, and Hi-PLS models were constructed for the resulting subclasses. The models for the four subclasses differed considerably in terms of both their ability to predict pKd/i (with RMSEPs ranging from 0.8 to 1.56) and the descriptor block that had the strongest influence. The models were validated with an external test set of 174 complexes from the 2003 release of the PDBbind database. The overall results show that the presented Hi-PLS methodology could facilitate the difficult task of predicting binding affinity.
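The two quantities reported in this abstract, pKd/i and RMSEP, can be sketched as follows. This is a minimal illustration that assumes the usual definitions (pKd as the negative base-10 logarithm of Kd in mol/L, and RMSEP as the root of the mean squared prediction error over a test set). The function names and toy values are hypothetical and do not come from the paper.

```python
import math


def pkd(kd_molar):
    """Convert a dissociation or inhibition constant (mol/L) to pKd/pKi."""
    if kd_molar <= 0:
        raise ValueError("the constant must be positive")
    return -math.log10(kd_molar)


def rmsep(observed, predicted):
    """Root-mean-square error of prediction over a test set."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values only: a 1 nM binder and three made-up predictions
print(pkd(1e-9))
print(rmsep([6.0, 7.5, 9.0], [6.5, 7.0, 8.0]))
```

Because pKd/i is on a logarithmic scale, an RMSEP of 1.65 corresponds to a typical error of somewhat more than one order of magnitude in Kd or Ki.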

Keyword
Crystallography, X-Ray, Ligands, Models, Molecular, Multivariate Analysis, Protein Binding, Proteins/*metabolism
Identifiers
urn:nbn:se:umu:diva-11779 (URN); doi:10.1021/ci050323k (DOI); 16711735 (PubMedID)
Available from: 2007-05-25 Created: 2007-05-25. Bibliographically approved
2. A multivariate approach to investigate docking parameters' effects on docking performance
2007 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 47, no. 4, p. 1673-1687. Article in journal (Refereed), Published
Abstract [en]

Increasingly powerful docking programs for analyzing and estimating the strength of protein-ligand interactions have been developed in recent decades, and they are now valuable tools in drug discovery. Software used to perform dockings relies on a number of parameters that affect various steps in the docking procedure. However, identifying the best choices of the settings for these parameters is often challenging. Therefore, the settings of the parameters are quite often left at their default values, even though scientists with long experience with a specific docking tool know that modifying certain parameters can improve the results. In the study presented here, we have used statistical experimental design and subsequent regression based on root-mean-square deviation values using partial least-square projections to latent structures (PLS) to scrutinize the effects of different parameters on the docking performance of two software packages: FRED and GOLD. Protein-ligand complexes with a high level of ligand diversity were selected from the PDBbind database for the study, using principal component analysis based on 1D and 2D descriptors, and space-filling design. The PLS models showed quantitative relationships between the docking parameters and the ability of the programs to reproduce the ligand crystallographic conformation. The PLS models also revealed which of the parameters and what parameter settings were important for the docking performance of the two programs. Furthermore, the variation in docking results obtained with specific parameter settings for different protein-ligand complexes in the diverse set examined indicates that there is great potential for optimizing the parameter settings for selected sets of proteins.
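The response variable described above, the root-mean-square deviation between a docked pose and the crystallographic ligand conformation, might be computed along these lines. The sketch assumes both poses are given as lists of (x, y, z) coordinates in the same reference frame with identical atom ordering, and it ignores symmetry-equivalent atoms. The function name is hypothetical, and the exact procedure used in the paper is not stated in the abstract.

```python
import math


def pose_rmsd(docked, crystal):
    """RMSD between two poses given as equally ordered (x, y, z) tuples."""
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(docked, crystal):
        # Squared Euclidean distance between matched atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(docked))


# Toy coordinates only: every atom is displaced by 1 unit along z
docked_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(pose_rmsd(docked_pose, crystal_pose))
```

No superposition is applied here, since a docked pose is normally compared with the crystal ligand inside the fixed frame of the protein.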

Place, publisher, year, edition, pages
American Chemical Society Publications, 2007
Identifiers
urn:nbn:se:umu:diva-16146 (URN); 10.1021/ci6005596 (DOI)
Available from: 2007-08-20 Created: 2007-08-20 Last updated: 2010-09-09. Bibliographically approved
3. Quantitative protein descriptors for secondary structure characterization and protein classification
2009 (English). In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, E-ISSN 1873-3239, Vol. 95, no. 1, p. 74-85. Article in journal (Refereed), Published
Abstract [en]

In this study, protein chains were characterized based on alignment-independent protein descriptors using three types of structural and sequence data: (i) C-α atom Euclidean distances, (ii) protein backbone ψ and φ angles and (iii) amino acid physicochemical properties (zz-scales). The descriptors were analyzed using principal component analysis (PCA) and further elucidated using the multivariate methods partial least-squares projections to latent structures discriminant analysis (PLS-DA) and hierarchical PLS-DA. The descriptors were applied to three protein chain datasets: (i) 82 chains classified, according to the structural classification of proteins (SCOP) scheme, as either all-α or all-β; (ii) 96 chains classified as either α + β or α/β; and (iii) 6590 chains of all aforementioned classes selected from the PDB-select database. Results showed that the descriptors were related to the secondary structure of the chains. The C-α Euclidean distances and, as expected, the protein backbone angles were found to be most important for the characterization and classification of chains. Assignment of SCOP classes using PLS-DA based on all descriptor types was satisfactory for all-α and all-β chains, with more than 93% correct classifications of a large external test set, while protein chains of types α/β and α + β were harder to discriminate between, resulting in 74% and 54% correct classifications, respectively.
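The keywords for this paper mention auto covariance, a common way to turn a per-residue property of variable length into a fixed-length, alignment-independent descriptor. The sketch below shows one conventional form (mean-centred products at a given lag, averaged over the available residue pairs). The normalization, the function names and the toy values are assumptions, and the paper may use a different variant.

```python
def auto_covariance(values, lag):
    """Mean-centred auto covariance of a per-residue property at one lag."""
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    mean = sum(values) / n
    # Pair every residue i with residue i + lag along the chain
    products = [
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ]
    return sum(products) / (n - lag)


def ac_descriptor(values, max_lag):
    """Fixed-length descriptor: auto covariances for lags 1..max_lag."""
    return [auto_covariance(values, lag) for lag in range(1, max_lag + 1)]


# Toy property values only (e.g. one physicochemical scale along a chain)
print(ac_descriptor([1.0, 2.0, 3.0, 4.0], 2))
```

Chains of different lengths then yield vectors of the same length (one value per lag), which is what allows them to be compared with PCA or PLS-DA without a sequence alignment.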

Keyword
Multivariate analysis, Protein descriptor, SCOP, Auto covariance, Auto cross-covariance
National Category
Chemical Sciences
Identifiers
urn:nbn:se:umu:diva-3651 (URN); 10.1016/j.chemolab.2008.08.006 (DOI)
Available from: 2008-11-19 Created: 2008-11-19 Last updated: 2017-12-14. Bibliographically approved
4. Geometry Optimization of Docking Poses Using Implicit Solvation Models
(English). Manuscript (preprint) (Other (popular science, discussion, etc.))
Identifiers
urn:nbn:se:umu:diva-3652 (URN)
Available from: 2008-11-19 Created: 2008-11-19 Last updated: 2010-01-14. Bibliographically approved
5. Investigation of MM-GB/PB-SA for Rescoring of Docking Poses and Accurate Prediction of Relative Potencies of Binding Affinity
(English). Manuscript (preprint) (Other (popular science, discussion, etc.))
Identifiers
urn:nbn:se:umu:diva-3653 (URN)
Available from: 2008-11-19 Created: 2008-11-19 Last updated: 2010-07-09. Bibliographically approved

Open Access in DiVA

fulltext (5703 kB), 984 downloads
File information
File name: FULLTEXT01.pdf; File size: 5703 kB; Checksum: SHA-1
67179fc83cf20ee4c8ddb07c25fa4a7fca32c7837a580808355118ad8f478ae926d8d4bb
Type: fulltext; Mimetype: application/pdf
