Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions.
Umeå University, Faculty of Science and Technology, Department of Plant Physiology. Umeå University, Faculty of Science and Technology, Umeå Plant Science Centre (UPSC). (Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden). ORCID iD: 0000-0001-6097-2539
Genome Center, UC Davis, Davis, California.
Genome Center, UC Davis, Davis, California.
2009 (English). In: Proteins: Structure, Function, and Genetics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 75, no. 4, pp. 870-884. Article in journal (Refereed), Published.
Abstract [en]

Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction, where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids, including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among the domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to a sequence is a major bottleneck compared with single-fragment methods, the advantage of the current approach is that correct alignments imply correct long-range distance constraints.
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
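The abstract defines a local descriptor only informally, as the neighborhood of an amino acid including its short- and long-range interactions. The following is a minimal sketch of that idea, not the authors' published procedure. It works from C-alpha coordinates: it collects every residue within an assumed distance cutoff of a central residue, widens each contact into a short backbone segment, and merges overlapping or adjacent segments into fragments. The 6.5 Å cutoff, the ±2-residue window, and the function names `contacts` and `local_descriptor` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Segment = Tuple[int, int]  # inclusive (start, end) residue indices


def contacts(coords: Sequence[Coord], center: int, cutoff: float) -> List[int]:
    """Indices of residues whose C-alpha lies within `cutoff` angstroms of the
    central residue (the center itself is included, at distance 0)."""
    cx = coords[center]
    return [i for i, c in enumerate(coords) if math.dist(cx, c) <= cutoff]


def local_descriptor(
    coords: Sequence[Coord],
    center: int,
    cutoff: float = 6.5,    # hypothetical contact cutoff, not from the paper
    half_window: int = 2,   # hypothetical segment half-width, not from the paper
) -> List[Segment]:
    """Sketch of a local descriptor as a set of backbone fragments.

    Each contact residue is widened to +/- half_window residues (clipped to the
    chain ends); overlapping or adjacent segments are then merged, so sequence-
    distant contacts end up as separate fragments."""
    n = len(coords)
    segments = sorted(
        (max(0, i - half_window), min(n - 1, i + half_window))
        for i in contacts(coords, center, cutoff)
    )
    merged: List[Segment] = []
    for start, end in segments:
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous fragment instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy beta-hairpin: two antiparallel strands 5 A apart, 3.8 A C-alpha spacing.
    strand1 = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    strand2 = [(3.8 * (19 - i), 5.0, 0.0) for i in range(10, 20)]
    hairpin = strand1 + strand2
    print(local_descriptor(hairpin, center=2))
```

On this toy hairpin, the descriptor around residue 2 contains two fragments, one from each strand, so a correct alignment of both fragments to a sequence would also fix the long-range distance between them. That is the kind of constraint the abstract contrasts with single-fragment assembly.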

National Category
Biological Sciences
URN: urn:nbn:se:umu:diva-23612
DOI: 10.1002/prot.22296
PubMedID: 19025980
OAI: diva2:225485
Available from: 2009-06-29. Created: 2009-06-29. Last updated: 2015-04-29. Bibliographically approved.


Author/editor: Hvidsten, Torgeir
