Prediction of protein stability changes due to single amino acid mutations
Umeå University, Faculty of Science and Technology, Department of Chemistry.
Manuscript (preprint) (Other academic), English
Abstract [en]

Accurate prediction of the change in protein stability caused by single amino acid mutations is important for guiding site-directed mutagenesis and other protein-engineering techniques. Several two-state predictors have recently become available that aim to predict whether a point mutation stabilizes or destabilizes a protein. Given experimental errors and the tolerance of proteins to mutations, we concluded that neutral mutations, which only slightly affect protein stability, must be considered as well. Here, we present a new classification scheme for a three-state predictor (destabilizing, neutral and stabilizing mutations) based on multi-class support vector machines (SVMs). We created a refined training dataset of single amino acid mutations and evaluated the predictive ability of models trained on homology-clustered and non-clustered training data using two different cross-validation procedures. The results reveal that prediction accuracy differs significantly between the two evaluation procedures. Furthermore, we demonstrate that for the non-clustered model, prediction accuracy based on protein sequence information alone is comparable to that based on protein structure information. For the clustered model, on the other hand, prediction ability improves significantly when protein tertiary structure information is included. Comparing the prediction accuracies of the two models shows that predicting the stability effects of mutations in homology-clustered proteins remains a challenging task. Moreover, benchmarking on previously published datasets demonstrates that our method improves prediction performance over many established methods.
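The three-state scheme starts by turning a measured stability change into one of three class labels before a multi-class classifier is trained. The sketch below illustrates only that labeling step, under assumptions not stated in the abstract: the stability change `ddg` is in kcal/mol, positive values mean the mutant is more stable, and the width of the neutral band (`neutral_band`, default 0.5) is a hypothetical choice rather than the manuscript's threshold.

```python
from enum import Enum


class Effect(Enum):
    DESTABILIZING = -1
    NEUTRAL = 0
    STABILIZING = 1


def three_state_label(ddg: float, neutral_band: float = 0.5) -> Effect:
    """Map a stability change to a three-state class label.

    Assumptions (hypothetical, not from the manuscript): ddg is in kcal/mol,
    positive means the mutant is more stable than the wild type, and any
    change with |ddg| <= neutral_band is treated as neutral.
    """
    if ddg > neutral_band:
        return Effect.STABILIZING
    if ddg < -neutral_band:
        return Effect.DESTABILIZING
    # Small changes fall inside the band, e.g. within experimental error.
    return Effect.NEUTRAL


labels = [three_state_label(x) for x in (-2.1, 0.3, 1.2, -0.5)]
```

With the default band, -0.5 lies exactly on the boundary and is labeled neutral. The resulting labels would then serve as targets for a multi-class SVM (for example one-vs-one or one-vs-rest); the abstract does not say which decomposition was used.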

National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:umu:diva-33769, OAI: diva2:317972
Available from: 2010-05-06; Created: 2010-05-05; Last updated: 2010-05-10; Bibliographically approved
In thesis
1. From protein sequence to structural instability and disease
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A great challenge in bioinformatics is to accurately predict protein structure and function from the amino acid sequence, including annotating protein domains, identifying disordered regions in proteins, and detecting protein stability changes resulting from amino acid mutations. The combination of bioinformatics, genomics and proteomics has become essential for investigating the biological, cellular and molecular aspects of disease, and can therefore contribute greatly to understanding protein structures and to facilitating drug discovery.

In this thesis, a PREDICTOR consisting of three machine learning methods applied to three different but related structural bioinformatics tasks is presented: using profile Hidden Markov Models (HMMs) to identify remote sequence homologues on the basis of protein domains; using Conditional Random Fields (CRFs) to predict order and disorder in proteins; and applying Support Vector Machines (SVMs) to detect protein stability changes due to single mutations.

To facilitate structural instability and disease studies, these methods are implemented in three web servers: FISH, OnD-CRF and ProSMS, respectively.

For FISH, most of the work presented in the thesis focuses on the design and construction of the web server. The server is based on a collection of structure-anchored hidden Markov models (saHMMs), which are used to identify structural similarity at the protein domain level.

For the order and disorder prediction server, OnD-CRF, I implemented two schemes to alleviate the imbalance between ordered and disordered amino acids in the training dataset. One prunes the protein sequences to obtain a balanced training dataset. The other searches for the optimal p-value cut-off for discriminating between ordered and disordered amino acids. Both schemes enhance the sensitivity of detecting disordered amino acids in proteins. In addition, the output of the OnD-CRF web server can be used to identify flexible regions and to predict the effect of mutations on protein stability.
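The cut-off scheme can be pictured as a scan over candidate thresholds on per-residue disorder scores, keeping the one that best separates the two classes. The sketch below is a minimal illustration under stated assumptions: the `scores` input (a per-residue disorder score where higher means more likely disordered) and the selection criterion, the Matthews correlation coefficient (MCC), are choices made here for illustration, not the thesis's documented procedure. MCC is used because, unlike raw accuracy, it is not dominated by the majority (ordered) class.

```python
import math
from typing import Sequence, Tuple


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_cutoff(scores: Sequence[float], disordered: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, MCC) for the threshold that maximizes MCC.

    Hypothetical helper: a residue is predicted disordered when its
    score is >= cutoff; every observed score is tried as a candidate.
    """
    best = (0.5, -1.0)
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, disordered) if s >= cut and d)
        fp = sum(1 for s, d in zip(scores, disordered) if s >= cut and not d)
        fn = sum(1 for s, d in zip(scores, disordered) if s < cut and d)
        tn = len(scores) - tp - fp - fn
        m = mcc(tp, tn, fp, fn)
        if m > best[1]:
            best = (cut, m)
    return best
```

For example, `best_cutoff([0.1, 0.2, 0.3, 0.6, 0.7, 0.9], [False, False, False, True, True, True])` returns `(0.6, 1.0)`, because that threshold separates the toy classes perfectly. In practice the cut-off would be chosen on training or validation data, not on the test set.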

For ProSMS, after careful evaluation with different methods, we propose two models for three-state classification of protein stability changes due to single amino acid mutations: one clustered by homology and one non-clustered. Results for the non-clustered model reveal that sequence-only prediction accuracy is comparable to the accuracy obtained with protein 3D structure information. For the clustered model, however, prediction accuracy is significantly improved when protein tertiary structure information, in the form of local environmental conditions, is included. Comparing the prediction accuracies of the two models indicates that predicting the stability effects of mutations in non-homologous proteins is still a challenging task.
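The gap between the clustered and non-clustered results comes down to how mutations are split into cross-validation folds. In a random split, mutations from homologous proteins can land in both the training and test folds. In a homology-clustered split, every mutation from one cluster stays in the same fold, so the model is always tested on proteins unrelated to its training data. The sketch below contrasts the two splits. The `cluster_ids` input (one homology-cluster label per mutation, assumed to be computed beforehand) and the greedy fold-balancing rule are illustrative assumptions, not the thesis's exact procedure.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def random_folds(n_items: int, k: int, seed: int = 0) -> List[List[int]]:
    """Non-clustered split: shuffle item indices and deal them into k folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def clustered_folds(cluster_ids: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Clustered split: all items sharing a cluster id go to the same fold.

    cluster_ids is a hypothetical per-mutation homology-cluster label.
    """
    by_cluster = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        by_cluster[c].append(i)
    clusters = list(by_cluster)
    random.Random(seed).shuffle(clusters)
    # Greedy balancing: place the largest remaining cluster in the smallest fold.
    # The sort is stable, so equal-sized clusters keep their shuffled order.
    clusters.sort(key=lambda c: len(by_cluster[c]), reverse=True)
    folds: List[List[int]] = [[] for _ in range(k)]
    for c in clusters:
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_cluster[c])
    return folds


ids = ["A", "A", "A", "B", "B", "C", "D", "D"]
clustered = clustered_folds(ids, 3)
unclustered = random_folds(len(ids), 3)
```

Both functions cover every index exactly once. Only `clustered_folds` guarantees that no cluster is split between folds, which is why accuracy estimated this way is usually lower and closer to performance on genuinely new protein families.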

Benchmarking results show that, as stand-alone programs, these predictors perform comparably to or better than previously established predictors. Combined into a program package, these mutually complementary predictors will facilitate the understanding of structural instability and disease from protein sequence.

Place, publisher, year, edition, pages
Umeå: Kemiska institutionen, 2010. 67 p.
Keywords: protein domain, remote homologue, intrinsically disordered/unstructured proteins, protein function, point mutation, protein family, protein stability, HMMs, CRFs, SVMs
urn:nbn:se:umu:diva-33845 (URN), 978-91-7459-016-6 (ISBN)
Public defence
2010-05-31, KB3B1, KBC-huset, Umeå University, 10:00 (English)
Available from: 2010-05-10; Created: 2010-05-07; Last updated: 2010-05-18; Bibliographically approved

Authors: Wang, Lixiao; Sauer, Uwe