visualCMAT output



NB: This page contains about 75 MB of animated illustrations. If you have a slow internet connection, please allow some time for all images to load.


Residue numbering in the visualCMAT output

Please note that the residue numbering in the visualCMAT output may differ from the numbering in the PDB file that you originally submitted. For example, amino acid residues with identical IDs within the same chain are identified and renumbered automatically. The complete list of changes to your query PDB file is printed to the log file, with the most important changes colored in red. Press the "Log" button to view the on-line log page. For example:

Warning: Amino acid residues with identical residue IDs have been identifies and will be renumbered as described below
Warning: Residues A-TYR-169 and A-PRO-169 have the same residue ID
Warning: Offseting all residue IDs starting from 169 in chain A by 1
Warning: Residues A-HIS-413 and A-GLN-413 have the same residue ID
Warning: Offseting all residue IDs starting from 413 in chain A by 2
Warning: Residues A-PRO-414 and A-HIS-414 have the same residue ID
Warning: Offseting all residue IDs starting from 414 in chain A by 3

Info: Correcting the PDB file...

The corrected PDB file of your query can be downloaded at the "Results" page, see section "The Input data after preprocessing".
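
The renumbering reported in the log above can be pictured with a short Python sketch. It is only an illustration, assuming that residues are read in file order and that every residue which repeats the ID of the previous residue in the same chain raises a cumulative per-chain offset by 1. The function name `renumber_duplicates`, the input format, and the GLY-170 entry in the example are hypothetical; the actual visualCMAT preprocessing may differ.

```python
def renumber_duplicates(residues):
    """Renumber residues so that IDs within each chain become unique.

    residues: list of (chain, residue_name, residue_id) in file order.
    Assumption: a residue that repeats the ID of the previous residue in
    the same chain increases the cumulative offset of that chain by 1.
    """
    offsets = {}      # chain -> cumulative offset applied so far
    previous_id = {}  # chain -> original ID of the previous residue
    renumbered = []
    for chain, name, res_id in residues:
        if previous_id.get(chain) == res_id:
            offsets[chain] = offsets.get(chain, 0) + 1
        previous_id[chain] = res_id
        renumbered.append((chain, name, res_id + offsets.get(chain, 0)))
    return renumbered


# Duplicated IDs taken from the log example; GLY-170 is a made-up neighbour.
example = [
    ("A", "TYR", 169), ("A", "PRO", 169), ("A", "GLY", 170),
    ("A", "HIS", 413), ("A", "GLN", 413),
    ("A", "PRO", 414), ("A", "HIS", 414),
]
for chain, name, new_id in renumber_duplicates(example):
    print(f"{chain}-{name}-{new_id}")
```

Under this assumption the offsets applied at the three duplicated IDs are 1, 2 and 3, which agrees with the offsets printed in the log.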

 

Description of the visualCMAT output

The visualCMAT server will provide the following set of output files upon successful completion of the task processing:

  • The visualCMAT annotation file is a PyMol 'PSE' session file which contains the representative protein structure annotated according to the bioinformatic, statistical and structural analyses of the predicted correlated mutations/co-evolving residues;
     
  • The list of correlated pairs and the amino acids co-occurrence statistics is a plain text file listing all predicted correlated positions that are numbered according to both the PDB and the amino acid sequence. You may want to see this file to learn the occurrence of amino acids at each pair of correlated/co-evolving positions;
     
  • The sum-of-Z-scores table is a plain text file which provides the sum-of-Z-score ranking for each position predicted as correlating with other positions in the alignment;
     
  • The visualCMAT PyMol script pack contains all information required to recompile the visualCMAT annotation 'PSE' file. The pack is provided for the unlikely event that the 'PSE' file compiled by the visualCMAT web-server is incompatible with the PyMol version installed on the user's local computer. The files are packed in a 'tar.gz' archive. To extract the files, use the command tar xzf visualcmat_TaskID.tar.gz in Linux, or the free 7-zip tool in Windows. To recompile the 'PSE' session, run the command pymol -c visualcmat/visualcmat_TaskID_pymol.py in Linux, or use the "File" → "Run Script" interface menu in Windows. When running the script it is important to maintain the relative paths: the representative protein structure file is extracted to the current folder ./ and the other files to the visualcmat folder, and all these files must be accessible by these paths when the script is launched.
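
On a computer without the tar command, the script pack can also be unpacked with Python's standard tarfile module. The sketch below is an illustration rather than part of visualCMAT: the function name `extract_script_pack` is hypothetical, and it assumes only the layout described above (the structure file in the current folder, the other files in the visualcmat folder, and a script whose name ends with _pymol.py).

```python
import tarfile
from pathlib import Path


def extract_script_pack(archive, dest="."):
    """Extract a tar.gz script pack into dest, keeping the relative paths.

    Returns the PyMol scripts found in the extracted visualcmat folder,
    as paths relative to dest.
    """
    dest = Path(dest)
    # Use the safer "data" extraction filter when this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    scripts = (dest / "visualcmat").glob("*_pymol.py")
    return sorted(p.relative_to(dest).as_posix() for p in scripts)
```

The returned relative path is the one to pass to pymol -c, launched from the folder given as dest, so that the relative paths expected by the script stay valid.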

 

On-line tools to operate the visualCMAT annotation

The visualCMAT annotation can be downloaded to a local computer as a content-rich all-in-one binary file in the PyMol PSE format or operated on-line using the built-in on-line analysis tools. Interactivity is implemented in HTML5, so neither plugins nor Java are required. See the screenshot of the on-line analysis page below. Although the off-line and on-line versions of the visualCMAT annotation are identical, the PyMol toolbox and command line available for the off-line content are more versatile than those of the JSMol program used at the on-line Analysis page.

 

The visualCMAT annotation layers

The primary visualCMAT output is the annotation of the representative protein structure according to the results of bioinformatic, statistical and structural analyses of the predicted correlated mutations. The visualCMAT annotation can be downloaded to a local computer as a content-rich all-in-one binary file in the PyMol PSE format or operated on-line using the built-in on-line analysis tools. The visualCMAT annotation is presented in multiple layers. The user can benefit from the multi-layered annotation by selecting and combining different types of information for a particular purpose. Layers with different content can be turned on and off in the PyMol viewer and in the on-line analysis tool. This feature can be used to study the annotation layers independently, and also provides an opportunity to combine the selected layers to create new information content for expert analysis.

The PSE file with the visualCMAT annotation can be opened by the PyMol molecular graphics engine. The key advantage of the PSE format is that the complex structural annotation created by the visualCMAT can be easily saved, stored, edited, and transferred. You can add your own data or annotation to the file produced by the visualCMAT, save it to the hard drive, and restore it later at any time. The PSE file can be sent to a colleague to share and discuss your results. The only significant disadvantage is that the PSE format has compatibility issues between different versions of PyMol: a file created by a newer version may not open in an older one (e.g., a PSE file created by PyMol v.1.7 may not work with PyMol v.1.6). If you have an old version of PyMol (i.e., below v.1.7.3.0) you may need to compile the PSE file with the visualCMAT annotation by manually executing the PyMol script, which is provided at the visualCMAT results page as "The visualCMAT PyMol script pack".

The PyMol PSE file contains a multi-layered annotation, i.e., several layers with different information can be turned on and off by the user (see an example below). Each layer can be studied independently or in a combination with other layers to help the user in interpreting the structural, functional, and regulatory significance of the predicted correlations in a protein family.

The annotation layers provided by the visualCMAT are discussed in more detail below.

Layer 1: Gradient paint of amino acids according to the best correlation Z-score

The backbone at each position in the representative protein structure is gradient-painted according to the predicted correlation Z-score with other positions in the structure. If a position participates in more than one correlation, the largest Z-score is used. Z-scores measure the statistical significance of the predicted correlations, with larger values indicating stronger correlations (painted in intensive red). The CMAT statistics to be used (MIc/Zc or MIp/Zp) is selected by the user when submitting a new task to the visualCMAT. The visualCMAT annotations created with either of the two statistics are equivalent (the default is MIc/Zc).

Please note that the gradient paint is relative to the particular set of Z-scores. In the example above the grey-to-red gradient paint corresponds to Z-scores within the range [3.5; 5.868], i.e., grey corresponds to positions with Z-scores around 3.5. The P-value P(Z>3.5) is 0.0002, meaning that the respective correlations are still highly significant (i.e., unlikely to be observed by chance) from the statistical point of view. Therefore, in this example the grey paint does not indicate "weak" correlations, but rather indicates that positions painted in red participate in more significant correlations and thus could be more functionally/structurally important than positions painted in grey. The range of Z-scores used for the gradient paint is set between the user-defined Z-statistics cutoff ("Minimum Zc" or "Minimum Zp" in the CMAT options section at the visualCMAT submission page) and the largest Z-score of a correlated pair which has passed the structural filtration. If the user-defined Z-statistics cutoff was set below 1.644854, i.e., P(Z>1.644854)=0.05, then the lower bound of the range will be set to 1.644854 and all positions with lower Z-scores will be painted in grey. In that case the grey paint will indicate statistically "weak" correlations. The range of Z-scores which was used for the gradient paint in the visualCMAT output is printed in the visualCMAT log file, e.g.:

Info: The grey-to-red gradient paint in the output file will correspond to Z-scores within a range [3.5; 5.868]
Info: where 3.5 is the user-defined zc-statistics cutoff
Info: and 5.868 is the Z-score of the most significant predicted correlation after structural filtration

To sum up, the Layer 1 outlines the individual positions which form the most significant correlations with other positions in the structure.
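
The relation between Z-scores, P-values and the gradient range described for Layer 1 can be checked with a few lines of Python. This is a sketch under stated assumptions: the one-sided P-value is taken from the standard normal distribution, and the colour is interpolated linearly between a mid-grey and pure red. The exact colour mapping used by the visualCMAT is not specified on this page, and all function names are hypothetical.

```python
import math

MIN_LOWER_BOUND = 1.644854  # P(Z > 1.644854) = 0.05, as stated in the text


def upper_tail_p(z):
    """One-sided P-value P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def gradient_range(user_cutoff, max_z):
    """Range of Z-scores used for the grey-to-red gradient paint."""
    return max(user_cutoff, MIN_LOWER_BOUND), max_z


def gradient_color(z, lower, upper):
    """Map a Z-score to an RGB colour between grey and red.

    Assumption: linear interpolation from grey (0.5, 0.5, 0.5) to red
    (1, 0, 0); Z-scores at or below the lower bound stay grey.
    """
    if z <= lower:
        t = 0.0
    elif z >= upper:
        t = 1.0
    else:
        t = (z - lower) / (upper - lower)
    grey, red = (0.5, 0.5, 0.5), (1.0, 0.0, 0.0)
    return tuple(g + (r - g) * t for g, r in zip(grey, red))


lower, upper = gradient_range(3.5, 5.868)
print(lower, upper, upper_tail_p(lower))
print(gradient_color(4.5, lower, upper))
```

With a user-defined cutoff of 3.5 the range is left unchanged, while a cutoff below 1.644854 would be raised to that value by `gradient_range`.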

Layer 2 (sublayers 2.1 and 2.2): Pairs of correlated amino acid residues

The visualCMAT web-server predicts correlated substitutions and classifies them into two classes:

  • spatially close co-evolving residues which either form direct physical contacts or interact with the same ligand (e.g., a substrate or a crystallographic water molecule);
  • the long-range correlations.

 

These two classes of pairs are processed and displayed separately in the visualCMAT output because their structural, functional, and regulatory significance may be interpreted differently. The predicted correlations are classified into interacting or long-range pairs by the structural filtration (see Parameters). A pair of correlated residues i and j will be classified as interacting if the distance between them is within the user-defined cut-off (a direct physical interaction), or if the distance from each of them to the same ligand (e.g., a substrate or a structural water molecule) is within the cut-off (a mediated interaction).
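
The classification rule of the structural filtration can be sketched in Python as follows. The sketch assumes that the distance between two residues (or between a residue and a ligand) is the smallest atom-to-atom distance; which atoms the visualCMAT actually measures is not stated here, and the function names and coordinates are hypothetical.

```python
import math
from itertools import product


def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def classify_pair(res_i, res_j, ligands, cutoff):
    """Classify a correlated pair as 'direct', 'mediated' or 'long-range'.

    res_i, res_j: lists of (x, y, z) atom coordinates of the two residues.
    ligands: list of ligands, each a list of (x, y, z) atom coordinates.
    cutoff: the user-defined distance cut-off.
    """
    if min_distance(res_i, res_j) <= cutoff:
        return "direct"
    for ligand in ligands:
        if (min_distance(res_i, ligand) <= cutoff
                and min_distance(res_j, ligand) <= cutoff):
            return "mediated"
    return "long-range"


# Two residues 8 units apart with a water molecule halfway between them.
res_i = [(0.0, 0.0, 0.0)]
res_j = [(8.0, 0.0, 0.0)]
water = [(4.0, 0.0, 0.0)]
print(classify_pair(res_i, res_j, [water], cutoff=4.5))
```

Pairs classified as direct or mediated correspond to the interacting class, and the remaining pairs to the long-range class.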

The interacting co-evolving residues are presented in Layer 2.1. Each pair of predicted correlated positions is connected by a dashed line in the structure of the representative protein. Each dashed line is painted in a single color on a scale from grey to red according to the predicted correlation Z-score for this pair of positions (with intensive red indicating stronger correlations). Please note that the gradient paint is relative to the particular set of Z-scores (see the respective description for Layer 1). In the example above grey does not indicate "weak" correlations, but rather indicates that correlations painted in red are statistically more significant and thus could be more functionally/structurally important than correlations painted in grey. The CMAT statistics to be used (MIc/Zc or MIp/Zp) is selected by the user when submitting a new task to the visualCMAT. The visualCMAT annotations created with either of the two statistics are equivalent (the default is MIc/Zc).

The long-range correlations are presented in Layer 2.2. They are disabled by default in the visualCMAT output 'PSE' file, but can be enabled by the user (see the Guidelines on working with the visualCMAT output). The dashed lines connecting the long-range correlations are painted in blue. An example of graphical output with the long-range correlations enabled by the user is shown below.

To sum up, the Layer 2 outlines pairs, clusters, and networks of correlated/co-evolving positions.

Layer 3: Annotation of positions according to the cumulative degree of correlation

The CA-atom of each position in the representative protein structure is shown as a sphere whose radius is proportional to the sum of Z-scores of all predicted correlations of this position with other positions in the structure. That is, the visualCMAT calculates the sum of Z-scores for each position i as Zi = ∑j Zi,j, where j runs over all other positions which correlate with position i. The CMAT statistics to be used (MIc/Zc or MIp/Zp) is selected by the user when submitting a new task to the visualCMAT. The visualCMAT annotations created with either of the two statistics are equivalent (the default is MIc/Zc). Larger spheres indicate positions which tend to participate in a larger number of correlations with other positions, or in stronger correlations with other positions, or both. To discriminate between the three possibilities the user can combine Layer 3 with the annotations provided by Layer 1 and Layer 2.

To sum up, the Layer 3 can be useful to visually select the key individual residues involved in multiple statistically significant correlations.
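
The sum of Z-scores used in Layer 3 is straightforward to reproduce from a list of correlated pairs. The following sketch assumes a simple in-memory list of (i, j, Z) tuples; the function names and the example numbers are made up for illustration and do not reflect the format of the visualCMAT output files.

```python
from collections import defaultdict


def sum_of_z_scores(pairs):
    """Return {position: sum of Z-scores over all its correlations}.

    pairs: list of (i, j, z), one entry per correlated pair of positions.
    Each pair contributes its Z-score to both of its positions.
    """
    totals = defaultdict(float)
    for i, j, z in pairs:
        totals[i] += z
        totals[j] += z
    return dict(totals)


def rank_positions(pairs):
    """Positions sorted by decreasing sum of Z-scores."""
    totals = sum_of_z_scores(pairs)
    return sorted(totals, key=lambda pos: totals[pos], reverse=True)


pairs = [(10, 20, 4.0), (10, 30, 5.0), (20, 30, 3.5)]
print(sum_of_z_scores(pairs))
print(rank_positions(pairs))
```

A position can reach a large sum either through many moderate correlations or through a few strong ones, which is why the text recommends inspecting Layer 3 together with Layers 1 and 2.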

Layer 4 (sublayers 4.10, 4.20, 4.25, 4.27, 4.30, 4.40, and 4.50): Annotation of potential binding sites in the representative protein structure

 

Figure: Layer 4.10 (small pockets are preferred) and Layer 4.50 (large pockets are preferred). Only pockets which contain correlated positions involved in direct/mediated interactions in their structures are shown.

 

The Layer 4 contains annotation of the binding pockets on the protein surface. Binding sites are classified into three classes depending on whether they contain:

  • correlated positions involved in direct/mediated interactions,
  • or only the long-range correlations,
  • or no statistically significant co-evolving residues at all in their structure.

 

The binding sites which include interacting co-evolving residues (not necessarily within the same site) are ranked by the sum of Zi,j scores calculated over all statistically significant correlations which have passed the structural filtration for each residue i in the site (i.e., the long-range correlations are excluded). Similarly, the binding sites which include only the long-range co-evolving pairs are ranked by the sum of respective Z-scores. Finally, the binding sites which do not include any correlated positions are ranked by the FPOCKET scoring function. Each predicted pocket is shown as a separate object named according to its rank and the sum of Z-scores.
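
The three-class ranking described above can be sketched in Python. Several details are assumptions made for illustration only: the class labels reuse the "P", "P_LR" and "P_NULL" prefixes of the pocket object names, a pair whose two residues both lie in the same pocket is counted once for each residue, and the input structures and the function name `rank_pockets` are hypothetical.

```python
def rank_pockets(pockets, pairs, fpocket_scores):
    """Classify and rank pockets by the correlated positions they contain.

    pockets: {pocket_id: set of residue positions lining the pocket}
    pairs: list of (i, j, z, kind), kind is "interacting" or "long-range"
    fpocket_scores: {pocket_id: FPOCKET score}, used for empty pockets
    Returns a list of (label, pocket_id, score), best pockets first.
    """
    class_order = {"P": 0, "P_LR": 1, "P_NULL": 2}
    rows = []
    for pocket_id, residues in pockets.items():
        sums = {"interacting": 0.0, "long-range": 0.0}
        hits = {"interacting": 0, "long-range": 0}
        for i, j, z, kind in pairs:
            for position in (i, j):
                if position in residues:
                    sums[kind] += z   # the partner may lie outside the pocket
                    hits[kind] += 1
        if hits["interacting"]:
            rows.append(("P", pocket_id, sums["interacting"]))
        elif hits["long-range"]:
            rows.append(("P_LR", pocket_id, sums["long-range"]))
        else:
            rows.append(("P_NULL", pocket_id, fpocket_scores[pocket_id]))
    # Group by class, then sort by decreasing score within each class.
    return sorted(rows, key=lambda row: (class_order[row[0]], -row[2]))
```

Note that a pocket containing interacting correlated residues is scored only by those correlations, so its long-range pairs do not contribute to the rank.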

The Layer 4 contains sublayers 4.10, 4.20, 4.25, 4.27, 4.30, 4.40, and 4.50 which correspond to automatic annotation (i.e., prediction) of binding sites on the surface of the submitted representative protein with different algorithm parameters. The structural information submitted to the visualCMAT is used to predict binding sites by the FPOCKET algorithm which is based on the concept of alpha-spheres and Voronoi tessellation. An alpha-sphere is a sphere which contacts four atoms on its boundary and contains no atoms inside. The shape and distribution of alpha-spheres differ between clefts and cavities, the inside of the protein globule, and the flat protein surface. Thus, it is possible to apply geometry filters to detect pockets in protein structures by clustering the alpha-spheres. The visualCMAT executes the FPOCKET algorithm seven times to predict potential binding sites with the "Minimum number of alpha-sphere per pocket" parameter taking values in a range from 10 (small pockets are preferred) to 50 (large pockets are preferred). Running the FPOCKET algorithm with min[alpha-spheres]=30 usually provides a binding sites annotation which is very close to the prior knowledge about a particular protein. However, sometimes it misses some important smaller sites, or merges multiple sites, which are close in the structure but are known to bind different ligands, into one larger pocket. Usually these problems can be solved with slight fine-tuning of the algorithm parameters, e.g., setting the min[alpha-spheres] to 25 or 27. Consideration of pockets with 10-20 alpha-spheres can be useful in particular cases when small sites are of strong interest, and setting the min[alpha-spheres] to 40 or 50 can be used to quickly visualize only the largest pockets and cavities in the protein structure.
The pockets predicted with different algorithm parameters are independently ranked by the presence of statistically significant co-evolving/correlated pairs as discussed above and provided as separate sublayers of annotation in the visualCMAT output.
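
The correspondence between the seven sublayers and the FPOCKET parameter can be written down as a small Python sketch. It only builds command lines and does not run anything. It assumes a stand-alone fpocket executable whose -f option takes the input PDB file and whose -i option sets the minimum number of alpha-spheres per pocket; please verify these options with fpocket -h for your installed version. How the visualCMAT server invokes FPOCKET internally is not described on this page, and the file name query.pdb is hypothetical.

```python
# Values of the "Minimum number of alpha-sphere per pocket" parameter,
# one per sublayer 4.10 ... 4.50.
MIN_ALPHA_SPHERES = (10, 20, 25, 27, 30, 40, 50)


def fpocket_commands(pdb_path):
    """Return {sublayer name: command line as a list of arguments}.

    Assumption: "-f" and "-i" are the fpocket options for the input file
    and the minimum number of alpha-spheres per pocket.
    """
    return {
        f"Layer4.{n}": ["fpocket", "-f", pdb_path, "-i", str(n)]
        for n in MIN_ALPHA_SPHERES
    }


for layer, command in fpocket_commands("query.pdb").items():
    print(layer, " ".join(command))
```

Each command line could be passed to subprocess.run on a computer where fpocket is installed; repeated runs on the same input file may overwrite each other's output folder, so copies of the input file may be needed.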

To sum up, the Layer 4 contains information about potential binding sites which are automatically predicted, mapped onto the representative protein structure, and ranked by the presence of co-evolving positions. By default, pockets which contain correlated positions involved in direct/mediated interactions are enabled, and pockets which contain only long-range correlations or no co-evolving residues at all in their structures are disabled in the visualCMAT output (the user can enable/disable the selected pockets by clicking on the object title in the PyMol viewer). It has been previously suggested that the presence of co-evolving residues in a binding site can be an indicator of its functional or regulatory significance. It is especially interesting to apply this assumption to annotate novel allosteric sites and to study the mechanisms of allosteric communication, which remain poorly understood. Therefore the purpose of the visualCMAT ranking is to discriminate pockets by the presence of co-evolving positions and to show pockets enriched by the most statistically significant correlated residues first, to facilitate their further analysis in particular proteins. See Implementation of visualCMAT in the laboratory practice for more information.

 

Guidelines on working with the visualCMAT output

The primary visualCMAT output is the annotation of the representative protein structure according to the results of bioinformatic, statistical and structural analyses of the predicted correlated mutations. The visualCMAT annotation can be downloaded to a local computer as a content-rich all-in-one binary file in the PyMol PSE format or operated on-line using the built-in on-line analysis tools. The on-line interactivity is implemented in HTML5, so neither plugins nor Java are required. Although the off-line and on-line versions of the visualCMAT annotation are identical, the PyMol toolbox and command line available for the off-line content are more versatile than those of the JSMol program used at the on-line Analysis page.

The visualCMAT annotation is presented in multiple layers. The user can benefit from the multi-layered annotation by selecting and combining different types of information for a particular purpose. Layers with different content can be turned on and off in the PyMol viewer and in the on-line analysis tool. This feature can be used to study the annotation layers independently, and also provides an opportunity to combine the selected layers to create new information content for expert analysis.

Our recommendations on working with the visualCMAT output are provided below.

  • Further, should you wish to add the long-range correlations involving Thr-216, you could additionally run the following instruction in the command line:

    enable LR_PAIR*THR`216*

    In both cases you may want to manually disable the Layer3 with the sum-of-Z-scores for a better view on the selected positions;
     
  • If you are interested in particular amino acid residues (e.g., residues which were shown by experimental site-directed mutagenesis to have an impact on protein function, activity or stability, or positions which were preliminarily chosen as hotspots for future experiments), then visualize only the correlated pairs which include these residues (execute the disable/enable commands as shown above). In addition to Layer1 and Layer2 you should enable the Layer3 with the sum-of-Z-scores. You can now see how the selected residues correlate with other positions in the structure and how strong these correlations are (i.e., by the intensity of the red gradient paint of the backbone and the dashed lines, as well as the size of the spheres). In addition, you can download the visualCMAT supplementary output - "The sum-of-Z-scores table". This is a plain text file which provides the sum-of-Z-score ranking for each position predicted as correlating with other positions in the alignment;
     
  • If you do not have any particular residues in the focus of your study, you could start by evaluating the possible roles of the first few most statistically significant correlations (the top-ranking pairs with the largest Z-scores), as well as residues with the largest sum-of-Z-scores - make sure that the Layer3 is enabled and then pay attention to the residues marked by the largest spheres. In addition, you can download the visualCMAT supplementary output - "The sum-of-Z-scores table". This is a plain text file which provides the sum-of-Z-score ranking for each position predicted as correlating with other positions in the alignment;
     
  • The structural filtration which is automatically performed by the visualCMAT outlines networks of interacting correlated residues. Some interactions can be mediated by a ligand - a substrate or a crystallographic water molecule (see an example below). You should study these networks and their possible implications for protein function and regulation;

  • To learn the co-occurrence of particular amino acid types at the selected positions you will need the visualCMAT supplementary output - "The list of correlated pairs and the amino acids co-occurrence statistics". This is a plain text file listing all predicted correlated positions numbered according to both the PDB and the amino acid sequence, with detailed information about their amino acid content;
     
  • The Layers 4.10, 4.20, 4.25, 4.27, 4.30, 4.40, and 4.50 correspond to automatic annotation (i.e., prediction) of binding sites on the surface of the submitted representative protein according to their shape and size. Selection of a particular set of pockets depends on the task. The Layer4.30 usually provides a binding sites annotation which is very close to the prior knowledge about a particular protein. However, sometimes it misses some important smaller sites, or merges multiple sites, which are close in the structure but are known to bind different ligands, into one larger pocket. Usually these problems can be solved with slight fine-tuning of the algorithm parameters, e.g., see Layer4.25 and Layer4.27. Consideration of Layer4.10 and Layer4.20 can be useful in particular cases when small sites are of strong interest, and Layer4.40 / Layer4.50 can be used to quickly visualize only the largest pockets and cavities in the protein structure:

  • First, you should enable only one sublayer (e.g., the "Layer4.40"). Then you can expand this sublayer to choose only particular pockets, which are ranked by the presence of statistically significant co-evolving/correlated pairs as discussed above.

  • Each predicted pocket is shown as a separate object named according to its rank (i.e., rank) and the sum-of-Z-scores of correlated positions in its structure (i.e., SoZ). The titles of individual pockets have different prefixes according to the classification. The "P_" prefix means that the respective site contains at least one residue that has a statistically significant correlation with any other residue in the protein structure accepted by the structural filtration (i.e., a direct physical interaction or a mediated interaction). These pockets are enabled by default in the visualCMAT output. All other types of pockets are disabled by default but can be re-enabled by the user (simply click on the selected object title). The "P_LR" prefix (LR stands for long-range) means that the respective site contains only residues whose correlations were dismissed by the structural filtration (i.e., long-range correlations). The "P_NULL" prefix means that the respective site does not contain any statistically significant correlated residues. To operate this content you can use the same disable/enable commands with the selected prefix (i.e., "P_", "P_LR_" or "P_NULL_") as discussed earlier;
     
  • It has been previously suggested that the presence of co-evolving residues in a binding site can be an indicator of its functional or regulatory significance. Therefore the purpose of the visualCMAT ranking is to discriminate pockets by the presence of co-evolving positions and to show pockets enriched by the most statistically significant correlated residues first, to facilitate their further analysis in particular proteins. For example, the human p38a MAP kinase (PDB 1p38) below shows long-range pairs of residues which correlate across a predicted binding site (left). In fact, this site is known to be allosteric in p38a MAPKs; in particular, it can bind the allosteric inhibitor doramapimod (PDB 1kv2, right):
