The focus of these experiments is on homology (similarity) modeling. The utility of homology modeling and the general approach to it are well established, although there is considerable controversy over the details. Other aspects of modeling, such as threading or ab initio folding, are still experimental and so will not be treated in these lectures.
Evaluating the difference between two structures, or the accuracy of a model, is not trivial. For example, at the Critical Assessment of Structure Prediction meetings (CASP I, II, III), most of the discussion centered not on the differences between modeling methods but on how they were evaluated. This is still not settled. For the purpose of these experiments, a simple RMS deviation and graphical examination of the differences will suffice.
Two factors enter into accuracy. First is the obvious question of how close our result is to the experimental structure. However, it is equally important to understand the accuracy of the experiment. If the experiment has an expected error of 0.5 Å and our deviation from the experiment is 0.5 Å, then we are essentially correct. On the other hand, if the experiment has an expected error of 0.1 Å and our error is 0.5 Å, then we have some real errors.
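The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of AMMP: the function names and the coordinates are hypothetical, and the two coordinate sets are assumed to be already superimposed and listed in the same atom order.

```python
import math

def rms_deviation(model, reference):
    """RMS deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and the atoms
    appear in the same order in both lists.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))

def within_experimental_error(rmsd, expected_error):
    """True when the deviation is no larger than the expected experimental error."""
    return rmsd <= expected_error

# Hypothetical coordinates in Angstroms, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]

rmsd = rms_deviation(model, reference)
print(f"RMSD = {rmsd:.3f} A")
print("within a 0.5 A experimental error:", within_experimental_error(rmsd, 0.5))
print("within a 0.1 A experimental error:", within_experimental_error(rmsd, 0.1))
```

The same deviation is judged differently depending on the expected error of the experiment, which is the point of the paragraph above.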
Real errors in protein structure depend on several factors. First is the method of structure determination. NMR structures tend to be less accurate than crystal structures simply because the NMR data are sparser than X-ray data (NMR data only relate distances between hydrogen atoms, rather than all atoms). Among crystal structures, high-resolution structures have less inherent error than low-resolution structures because the atomic positions are better defined. However, except at the very highest resolutions (about 1 Å), the atomic positions are not determined solely by the diffraction data. The crystal lattice can also affect the accuracy of the structure. The same protein crystallized in different crystal forms will have small but real differences in structure (note: the overall structure of the protein will remain the same). Examination of differences between high-resolution crystal structures of proteins has shown expected errors of 0.4-0.5 Å. Differences between NMR structures and crystal structures have shown expected errors of 0.75-1.5 Å.
Papers about structural accuracy
Figure: BPTI
Figure: Dendrotoxin
The sequence alignment between the two proteins, shown below, shows moderate
homology. Notice, however, that all of the cysteine residues
are conserved, which suggests (correctly) that the disulfide bonds are conserved
between the two proteins. Conservation of sequence patterns like this one
can suggest homology even where the overall similarity is weak.
The conserved patterns are not always cysteines; they can also be patterns
of hydrophobic residues or active-site residues.
Dendrotoxin pRrklCilhrnpGrCydkIpafyYNqKkkqCerFdwsGCggnsNrFKtiEeCrRyCig*
BPTI        *RpdfCleppytGpCkarIiryfYNaKaglCqtFvygGCrakrNnFKsaEdCmRtCgga
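The observation about conserved cysteines can be checked directly from the two aligned strings. The sketch below is a hypothetical helper, not part of AMMP. It assumes that "*" marks a position with no aligned residue and that letter case carries no meaning for the comparison.

```python
# Aligned sequences copied from the alignment above.
dendrotoxin = "pRrklCilhrnpGrCydkIpafyYNqKkkqCerFdwsGCggnsNrFKtiEeCrRyCig*"
bpti = "*RpdfCleppytGpCkarIiryfYNaKaglCqtFvygGCrakrNnFKsaEdCmRtCgga"

def compare_alignment(seq_a, seq_b, gap="*"):
    """Return (identical, aligned) column counts for two aligned strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    aligned = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: '*' means nothing is aligned in this column
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned

def residue_columns(seq, residue):
    """Zero-based alignment columns at which `residue` occurs (case-insensitive)."""
    return [i for i, c in enumerate(seq.upper()) if c == residue]

identical, aligned = compare_alignment(dendrotoxin, bpti)
print(f"identity: {identical}/{aligned} = {100.0 * identical / aligned:.0f}%")

cys_dtx = residue_columns(dendrotoxin, "C")
cys_bpti = residue_columns(bpti, "C")
print("cysteine columns match:", cys_dtx == cys_bpti)
```

If the cysteine columns coincide while the overall identity is only moderate, that is the kind of conserved pattern the text describes.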
Two approaches to building these new atoms should be used for this experiment, and you should monitor the differences in the solutions. Torsion search algorithms are quite popular and are implemented in the script tsearch.ammp. An attractive alternative is distance geometry, which is implemented in dgeom.ammp. Because the goal is to understand the limits of modeling, as opposed to the limits of a particular algorithm, you should run both procedures and compare the results.
Using tsearch.ammp
tsearch.ammp builds a structure
with highly ideal covalent (bond and angle) geometry and then searches the
side chain torsion angles to find the energy minima.
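The idea of a torsion search can be illustrated with a toy grid search. This is not the algorithm in tsearch.ammp. The energy function below is invented for the illustration: a three-fold torsion term plus two made-up "clash" penalties. All names and parameters are hypothetical.

```python
import itertools
import math

def torsion_search(energy, n_torsions, step_degrees=10):
    """Exhaustively scan a grid of torsion angles and keep the lowest energy."""
    grid = range(0, 360, step_degrees)
    best_angles = None
    best_energy = float("inf")
    for angles in itertools.product(grid, repeat=n_torsions):
        e = energy(angles)
        if e < best_energy:
            best_angles, best_energy = angles, e
    return best_angles, best_energy

def toy_energy(angles):
    """Invented energy: staggered minima at 60, 180 and 300 degrees,
    with the 60 and 300 degree wells penalized by hypothetical clashes."""
    total = 0.0
    for chi in angles:
        total += 1.0 + math.cos(math.radians(3 * chi))        # three-fold torsion term
        total += 2.0 * math.exp(-((chi - 60) / 30.0) ** 2)    # clash near 60 degrees
        total += 1.0 * math.exp(-((chi - 300) / 30.0) ** 2)   # clash near 300 degrees
    return total

best_angles, best_energy = torsion_search(toy_energy, n_torsions=2)
print("best torsions (degrees):", best_angles)
print("energy:", round(best_energy, 6))
```

The cost of an exhaustive grid grows as the number of grid points raised to the number of torsions, which is why practical torsion searches restrict the angles searched at any one time.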
Read in the file tsearch.ammp and watch the GRAMMP window for the progress
of the torsion search. The molecule can be moved about in the draw window
with either the mouse or the scrollbars. The size of the molecule will appear to hop about
because of the "autoscaling" feature. You probably want to turn this
off by clicking the Autoscale checkbox on the
Controls menu. While you're at it,
you should play with the other controls to see what they do. You can
select an atom by clicking on it. The atom name and
serial number are displayed. You can calculate geometric properties using the
Geometric menu. Select the Distance option
and then click on two atoms. The distance (in Angstroms) will appear. Watch the distance
change as the molecule is minimized. You can also select Angles (then select three atoms) or
dihedral (then select four atoms). The Controls menu contains
options for rotating, translating, and scaling. Note:
The View menu contains options for changing the molecular representation.
You can view down each axis or change the color of the molecule to Monochrome (white) or
CPK (color based on atom type) or Force (color based on the force on each atom, white-small
force, red-large force).
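For reference, the three geometric quantities reported by the program (distance, angle, and dihedral) can be computed from atomic coordinates as in the sketch below. It illustrates the standard formulas and is not AMMP's own code. The function names are hypothetical.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]

def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def _scale(p, s):
    return (p[0] * s, p[1] * s, p[2] * s)

def distance(a, b):
    """Distance between two atoms, in the units of the coordinates."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def angle(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees, about the b-c bond."""
    b0 = _sub(a, b)
    b1 = _sub(c, b)
    b2 = _sub(d, c)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the two outer bonds onto the plane perpendicular to b-c.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

print(distance((0, 0, 0), (3, 4, 0)))
print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```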
Using dgeom.ammp
dgeom.ammp performs a distance geometry calculation to build the missing atoms. This is the complement, or dual, of the torsion search. Instead of building a highly ideal structure and then finding the best packing, dgeom.ammp builds a well-packed structure and tries to find the closest ideal structure. Exit GRAMMP (using the Exit option on the File menu). Restart GRAMMP and read the bptidtx.ammp file again. To begin the minimization, read the dgeom.ammp file.
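The flavor of a distance geometry calculation can be shown with a toy problem: place one missing atom so that its distances to four fixed atoms match target values. The sketch uses plain gradient descent on the squared distance errors. It is a generic least-squares fit under stated assumptions, not the procedure implemented in dgeom.ammp, and every name and number in it is hypothetical.

```python
import math

def fit_atom_to_distances(anchors, target_distances, start, step=0.1, iterations=500):
    """Move one atom to minimize sum((|x - anchor_i| - target_i)^2) by gradient descent."""
    x = list(start)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for anchor, target in zip(anchors, target_distances):
            delta = [x[k] - anchor[k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in delta))
            if r == 0.0:
                continue  # gradient is undefined on top of an anchor; skip this term
            factor = 2.0 * (r - target) / r
            for k in range(3):
                grad[k] += factor * delta[k]
        for k in range(3):
            x[k] -= step * grad[k]
    return tuple(x)

# Four fixed atoms (hypothetical) and the distances a fifth atom should have to them.
anchors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
targets = [math.sqrt(3.0)] * 4   # all satisfied exactly at (1, 1, 1)

position = fit_atom_to_distances(anchors, targets, start=(0.5, 0.4, 0.6))
print("fitted position:", tuple(round(c, 4) for c in position))
```

Here the distance targets drive the placement and ideal covalent geometry is only recovered to the extent that the targets encode it, which is the reverse of the torsion search.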
Cleaning up the structure
The script polish.ammp performs a short run of energy minimization with the whole potential. The use of energy minimization on a homology model is somewhat controversial: one school of thought holds that the best model is obtained simply by replacing the side chains on residues in the conserved regions. However, homology modeling provides a valuable test case for the development of potential energy force fields, and therefore declining to perform energy minimization because some molecular mechanics force fields are poor is an unscientific choice.
Saving the results
You may want to save the results for later study. Use the Output AMMP file option on the File menu to output a new ammp file. It is a good idea to choose a new, unique name for the output.
Analyzing the results
The script rms.ammp superimposes the structure on the experimental coordinates of the target. The coordinates are stored in the tether data structure of AMMP. This script uses a genetic algorithm to superimpose the structures. For highly similar structures, the genetic algorithm is somewhat wasteful, but for dissimilar structures it is an appropriate algorithm. After the superposition, you should examine the detailed differences in structure. Click on the Show Tethers checkbox on the Geometry menu. A purple dashed line is drawn between each atom and the position of the atom in the dendrotoxin target.
Note where the largest errors are (hint: look at the side chains on the surface and at the N- and C-termini). Are they significant? Why or why not? Now color the atoms by force (View|Colors menu). Do the errors correlate with the force (which is proportional to the residual error in the structure)? Do you see the same errors in both the distance geometry and torsion search models?
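If the model and target coordinates are available outside the program, the largest errors can also be ranked numerically. The sketch below is a hypothetical helper. It assumes the structures are already superimposed and that atoms are matched by label, and the labels and coordinates are invented.

```python
import math

def largest_deviations(model, target, top=3):
    """Return the `top` (deviation, label) pairs, largest deviation first.

    `model` and `target` map an atom label to an (x, y, z) tuple.
    Atoms missing from `target` are ignored.
    """
    deviations = []
    for label, position in model.items():
        if label in target:
            deviations.append((math.dist(position, target[label]), label))
    deviations.sort(reverse=True)
    return deviations[:top]

# Invented, already-superimposed coordinates in Angstroms.
model = {
    "ARG1.N": (0.0, 0.0, 0.0),
    "CYS5.SG": (4.0, 1.0, 0.0),
    "LYS46.NZ": (9.0, 3.0, 2.0),
}
target = {
    "ARG1.N": (1.2, 0.0, 0.0),
    "CYS5.SG": (4.0, 1.1, 0.0),
    "LYS46.NZ": (9.0, 3.0, 4.5),
}

worst = largest_deviations(model, target, top=2)
for deviation, label in worst:
    print(f"{label}: {deviation:.2f} A")
```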
Just as in the model building of Dendrotoxin, two approaches will be employed to build the model. The script tsearch.ammp uses a torsion search algorithm for building new atoms, while the script dgeom.ammp uses a distance geometry algorithm.
Cleaning up the structure
The script polish.ammp performs a short run of energy minimization with the whole potential.
Saving the results
You may want to save the results for later study. Use the Output AMMP file option on the File menu to output a new ammp file. It is a good idea to choose a new, unique name for the output.
Analyzing the results
The script rms.ammp superimposes the structure on the experimental coordinates of the target. The coordinates are stored in the tether data structure of AMMP. After the superposition, you should examine the detailed differences in structure. Click on the Show Tethers checkbox on the Geometry menu. A purple dashed line is drawn between each atom and the position of the atom in the dendrotoxin target. Note where the largest errors are.
Questions
Figure: BPTI
Figure: Protein G
Protein G ***mtyklilnGktlkgettteavdaAtaekvFkqyandngvdgewtydDatkTftvte
BPTI      rpdfcleppytGpckariiryfynakAglcqtFvyggcrakrnnfksaeDcmrTcgga*
Questions
Use your judgement on the results. Are the alignments like the BPTI-Dendrotoxin alignment or more like the BPTI-Protein G alignment?
If you can't find a match, or have no idea for an interesting protein, run the
sequence search in reverse. For example, take the sequence of hen egg-white lysozyme
and find a sequence from another species. A few more exciting examples are human
growth hormone (pdb1hgu.ent), epidermal growth factor (pdb1egf.ent), GCSF (pdb1bgc.ent),
beta nerve growth factor (pdb1bet.ent), CD2 (pdb1cdb.ent), CD4 (pdb3cd4.ent),
and Protein G (pdb1pga.ent).
The alignment file for BPTI-Dendrotoxin is:
pRrklCilhrnpGrCydkIpafyYNqKkkqCerFdwsGCggnsNrFKtiEeCrRyCig*
*RpdfCleppytGpCkarIiryfYNaKaglCqtFvygGCrakrNnFKsaEdCmRtCgga
Edit the pdb entry file to extract the particular chain in a multi-chain
entry, or the average coordinate set in an NMR structure entry.
While we usually include ligands and water molecules when generating homology models,
these should be removed in order to make the generation of the molecular geometry
file straightforward. While multiple chains can easily be handled
in AMMP, the writeup below describes how to handle a single chain.
Edit the output file and replace CYS with CSS for any cysteine in a disulfide
bond. The preammp dictionary differentiates between CYS and CSS.
Note how many residues there are in the chain, because you will need this later.
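The file-editing steps above can be done by hand in a text editor, or scripted. The sketch below is one hypothetical way to script them and is not part of the AMMP distribution. It relies on the fixed-column layout of PDB ATOM/HETATM records (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26). It assumes that ligands and waters appear as HETATM records, and it does not handle selecting a coordinate set from an NMR entry. The `demo_record` helper exists only to build short demonstration lines without coordinates.

```python
def demo_record(record, serial, atom, res, chain, resseq):
    """Hypothetical helper: lay out the leading fixed-width PDB columns (1-26)."""
    return f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"

def extract_chain(lines, chain_id):
    """Keep ATOM records of one chain; HETATM records (ligands, waters) are dropped."""
    return [l for l in lines if l.startswith("ATOM") and l[21:22] == chain_id]

def mark_disulfide_cysteines(lines, disulfide_residues):
    """Rename CYS to CSS for the residue numbers listed in `disulfide_residues`."""
    out = []
    for line in lines:
        if (line.startswith("ATOM") and line[17:20] == "CYS"
                and int(line[22:26]) in disulfide_residues):
            line = line[:17] + "CSS" + line[20:]
        out.append(line)
    return out

def count_residues(lines):
    """Number of distinct residues (residue number plus insertion code) in ATOM records."""
    return len({l[22:27] for l in lines if l.startswith("ATOM")})

# Demonstration lines only; a real entry has coordinates after column 26.
lines = [
    demo_record("ATOM", 1, " N", "ARG", "A", 1),
    demo_record("ATOM", 2, " SG", "CYS", "A", 5),
    demo_record("ATOM", 3, " SG", "CYS", "A", 55),
    demo_record("ATOM", 4, " N", "ARG", "B", 1),
    demo_record("HETATM", 5, " O", "HOH", "A", 201),
]

chain_a = extract_chain(lines, "A")
marked = mark_disulfide_cysteines(chain_a, {5, 55})
for line in marked:
    print(line)
print("residues in chain:", count_residues(marked))
```

A cysteine that is not in a disulfide bond is simply left out of `disulfide_residues` and keeps the name CYS.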
Preammp first requests a parameter file. Select the file named "atoms.sp4". It then asks you to find a template. The templates define the atoms and bonds in each type of amino acid. They should be in a directory named "protein", but could be somewhere else depending on how they were unzipped. There will be many files in this directory, but you should see files with names like "ALA", "ASP", ..., and "TRP". Open any file, because all that is being extracted is the directory name. It then asks for a pdb file. This is the file you just generated with NEWHO. Open it. Finally, it asks for an output file. Supply a new file name ending in ".ammp" or ".amp".
The program should run silently, without warnings or error messages. You may receive a message about "something.OXT" not being in the geometry. This means the C-terminal residue has both oxygen atoms of the terminal acid. If that is the case, you will have to add the geometry terms to link that atom into the structure (which is very easy to do).
Linking the chain. The individual amino acid residues are not joined at this stage, so it is necessary to link them. The script linkme.ammp will link all the residues. However, you must first set the variable iresm. If there are N amino acids, set the value to N+1 with the command "seti iresm value;" (e.g. "seti iresm 101;" for a 100-residue structure). Alternatively, explicitly setting the variables ires and jres and then reading the script "peplink.ammp" (in the protein directory with the templates) will link residues ires and jres. linkme.ammp will also attempt to link an oxt group at the C-terminus.
Disulfides must be specified by hand. Set the variable ires to one cysteine and the variable jres to the other and read "sslink.amp" (also in the protein directory). "seti ires 100; seti jres 110; read protein/sslink.amp;" will generate the disulfide between cysteines 100 and 110.
The terminal acid group can be linked, if necessary, by using the script "oxtlink.amp" (again in the protein directory with the templates). Set the variable ires to the last residue and read oxtlink.amp. "seti ires 100; read protein/oxtlink.amp;" links the C-terminal residue 100. If you used the linkme.ammp script and the variable iresm was properly set, this routine will not be needed.
All of the linking scripts check for valid atoms. You cannot create a disulfide bond between two non-sulfur atoms, nor improperly link in a nonexistent oxt atom.
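For a protein with several disulfides, the linking commands quoted above can be generated rather than typed. The Python sketch below only assembles the command strings given in the text and does not run AMMP. The function names are hypothetical. The example assumes that residue numbers in the AMMP file match the usual BPTI numbering (58 residues, disulfides 5-55, 14-38 and 30-51), which you should confirm for your own file.

```python
def iresm_command(n_residues):
    """Command that sets iresm to N+1, as required before reading linkme.ammp."""
    return f"seti iresm {n_residues + 1};"

def disulfide_commands(pairs, script="protein/sslink.amp"):
    """One command line per disulfide, linking residues i and j."""
    return [f"seti ires {i}; seti jres {j}; read {script};" for i, j in pairs]

def oxt_command(last_residue, script="protein/oxtlink.amp"):
    """Command that links the terminal oxt atom on the last residue."""
    return f"seti ires {last_residue}; read {script};"

# BPTI, assuming standard residue numbering in the AMMP file.
print(iresm_command(58))
for command in disulfide_commands([(5, 55), (14, 38), (30, 51)]):
    print(command)
```

The printed lines can be pasted into the program or saved to a small script file, which also leaves a record of the linkages that were made.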
It is a good idea to save the file at this point, because generating all these linkages is a bit of a pain. You don't really want to do it again. The main window "file|output AMMP file" menu is the easiest way to do this.