Bioinformatics – Protein structure prediction approaches

In the past 50 years, there has been tremendous
progress in experimental determination of protein three-dimensional structures, but
this has not kept pace with the explosive growth of sequence information that results
from massively parallel sequencing technology. We therefore know many more protein sequences
than protein three-dimensional structures, and the gap is widening rather than diminishing. Yet, many proteins contain enough information
in their amino acid sequence to determine their three-dimensional structure, thus opening
the possibility of predicting three-dimensional structure from sequence. Computational prediction of protein structures,
which has been a long-standing challenge in molecular biology for more than 40 years,
may be able to fill this gap. Many useful and accurate three-dimensional
models have been computed from amino acid sequences by exploiting the similarity of the
sequence of interest to that of another protein whose three-dimensional structure is known, an
approach often called template-based or homology modelling. However, correct de novo predictions from
sequence, when not a single structure in a protein family is known, have been hard to
achieve.

How can the native state of a protein be predicted? There are three major approaches to this problem:
‘comparative modelling’, ‘threading’, and ‘ab initio prediction’.

Comparative modelling, also called homology
modelling, exploits the fact that evolutionarily related proteins with similar sequences, as
measured by the percentage of identical residues at each position based on an optimal structural
superposition, often have similar structures. In this approach, the protein structure is constructed
by matching the sequence of the target protein to an evolutionarily related protein
with a known structure, or template, in the Protein Data Bank (PDB). Thus, a prerequisite for the comparative modelling
technique is the presence of a homologous protein in the PDB library. For protein targets that have templates with
a sequence identity of more than 50%, the homologous templates can be easily identified
and the target-template alignments constructed precisely. The backbone models generated using comparative
modelling techniques can be highly accurate. However, when the target-template sequence
identity drops below 30%, the accuracy of comparative modelling decreases sharply
because of substantial alignment errors and the lack of significant template hits. Because comparative modelling builds models
by copying the aligned structures of the templates, an essential limitation of the approach is that
the resulting models are usually strongly biased towards the template and lie closer to the template
structure than to the native structure of the target protein. Accordingly, one of the important challenges
in comparative modelling is how to refine the models so that they are closer to the native structure
than the initial templates are.

Threading, or fold recognition, refers to a
bioinformatics procedure that identifies protein templates in the PDB library that have a
fold or structural motif similar to that of the target protein. It is similar to comparative modelling in
the sense that both approaches try to build a structural model by using experimentally
solved structures as templates. However, since many proteins with low sequence
identity can have similar folds, threading aims to detect target-template alignments
regardless of the evolutionary relationship.

Finally, ab initio, or de novo, modelling refers
to methods that are based on the first principles of physics and chemistry. The guiding principle is that the native state
of the protein lies at the global free-energy minimum. Therefore, ab initio methods try to fold a
given protein from the query sequence using various force fields and extensive conformational
search algorithms.
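
As a rough illustration of the ab initio idea, the sketch below searches conformations for the lowest-energy state. It is not one of the force-field methods mentioned above. It assumes the two-dimensional HP lattice model, a common teaching simplification in which each residue is either hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that are lattice neighbours but not chain neighbours. The function names (walk_to_coords, hp_energy, fold_exhaustive) are hypothetical, and exhaustive enumeration stands in for the stochastic search algorithms that real methods need.

```python
from itertools import product

# Unit moves on a 2D square lattice: a toy stand-in for real conformational
# degrees of freedom (assumption: HP lattice model, not a physical force field).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk_to_coords(walk):
    """Convert a sequence of moves to lattice coordinates.

    Returns None if the chain would visit the same site twice.
    """
    coords = [(0, 0)]
    for step in walk:
        dx, dy = MOVES[step]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None  # not self-avoiding, so not a valid conformation
        coords.append(nxt)
    return coords


def hp_energy(sequence, coords):
    """Toy energy: -1 per H-H pair adjacent on the lattice but not along the chain."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


def fold_exhaustive(sequence):
    """Enumerate every self-avoiding walk and return (lowest energy, one such conformation)."""
    best_energy, best_coords = None, None
    for walk in product(MOVES, repeat=len(sequence) - 1):
        coords = walk_to_coords(walk)
        if coords is None:
            continue
        energy = hp_energy(sequence, coords)
        if best_energy is None or energy < best_energy:
            best_energy, best_coords = energy, coords
    return best_energy, best_coords
```

Under these assumptions, fold_exhaustive("HPPHPPH") finds a minimum energy of -2, with the chain bent so that the middle H touches both terminal H residues. The number of candidate walks grows as 4^(n-1) for a chain of n residues, so exhaustive search is feasible only for very short chains, which is why practical ab initio methods rely on extensive but non-exhaustive conformational search.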