Guide to Zebra input

Submitting a task to Zebra should be simple. Please read our guidelines and the troubleshooting guide below. Keep in mind that the input data (i.e., the multiple alignment) can be prepared automatically for your particular protein families by the sister web-service Mustguseal, which uses all available information about protein structures and sequences in public databases.

Input description and guidelines


The input to Zebra is (1) a multiple sequence alignment and, optionally but highly recommended, (2) a representative protein structure in the PDB format, which should correspond to one of the proteins in the multiple alignment. Structural information can increase the accuracy of the bioinformatic predictions (the 3D-mode). The user can also benefit from the structure-based annotation of the subfamily-specific positions (SSPs), a convenient tool for studying these positions and their implications for protein function and regulation. The Zebra algorithm does not require pre-defined subfamilies and can propose multiple classifications automatically by graph-based clustering at different fragmentation levels. Random shuffling and Bernoulli statistics are applied to rank hits in order of decreasing significance and to select the most significant SSPs for further evaluation. The Zebra results are provided in two forms: a single all-in-one parsable text file and PyMol sessions with a structural representation of the SSPs.

The multiple sequence alignment

The input multiple alignment:

  • should contain protein amino acid sequences;

  • the "B", "Z", "X" residues will be automatically replaced with "D", "E", "G", respectively;

  • should contain at least six proteins;

  • should be an alignment (i.e., not just sequences, but aligned sequences, i.e., "sequences with gaps");

  • the "-" character should be used for a gap;

  • special characters in the protein names are not allowed and will be automatically replaced with "_";

  • very long protein names will be automatically truncated to the first 100 characters;

  • special characters in the protein sequences are not allowed and will be automatically replaced with gaps;

  • should be in the FASTA format (not ClustalW, not Phylip, etc.). If you do not know the format of your alignment, submit it to Zebra and you'll find out. If your alignment is in the wrong format, use a sequence format converter, e.g., sequenceconversion.bugaco.com;

  • should contain at least some columns with low content of gaps. By default all columns with more than 5% of gaps are removed from the alignment prior to running the bioinformatic analysis. This threshold can be changed in the Manual Mode (see details).
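
The checks in the list above can be tried locally before submission. The following Python sketch is illustrative only and is not the Zebra preprocessing code: the function names are hypothetical, and the guide does not define which characters count as "special", so the sketch assumes that anything other than letters, digits and "_" is special in a name, and anything other than letters and "-" is special in a sequence.

```python
import re

def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]

def sanitize(name, seq):
    """Apply the documented substitutions to one record."""
    # Assumption: "special" in a name means anything except letters, digits, "_".
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)[:100]
    # B, Z, X are replaced with D, E, G, respectively.
    seq = seq.translate(str.maketrans("BZX", "DEG"))
    # Assumption: anything that is not a letter or "-" becomes a gap.
    seq = re.sub(r"[^A-Za-z-]", "-", seq)
    return name, seq

def check_alignment(records):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if len(records) < 6:
        problems.append("fewer than six sequences")
    if len({len(seq) for _, seq in records}) > 1:
        problems.append("sequences differ in length, so this is not an alignment")
    return problems
```

A typical use would be `check_alignment([sanitize(n, s) for n, s in read_fasta(open("alignment.fasta").read())])`, where the file name is a placeholder.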

The representative structure

The input structure:

  • should contain the atomic coordinates of the amino acids of one protein;

  • should be in the PDB format;

  • should correspond to (i.e., ideally should be 100% identical to) one protein sequence in the multiple alignment;

  • does not have to be 100% identical to any protein sequence in the multiple alignment. The preprocessing script will automatically select the representative sequence from the multiple alignment by the best pairwise match between your PDB structure and any sequence in the alignment. All inconsistencies between the representative protein structure and sequence will be removed. You should always aim at submitting a representative PDB structure that corresponds to a protein in the alignment with at least 95% pairwise sequence identity. You will be allowed to proceed with as little as 50% sequence similarity between the two; however, this may cause errors during the bioinformatic analysis. See our Troubleshooting guide for details.

  • may contain multiple chains (e.g., A, B, C) even if the multiple alignment contains the sequences of only one chain (e.g., A) - in this case set the one chain to be used for Zebra in the respective field at the submission page;

  • may contain heteroatoms. All non-protein atoms (e.g., of a substrate) should have the HETATM prefix in the PDB file. Non-canonical amino acids will be automatically changed to their canonical equivalents (e.g., SME/MSE to MET). Ligands, cofactors, solvent and other such entities will not be used for the bioinformatic analysis but will be used to prepare the graphical output and can help with the interpretation of the functional and regulatory significance of the SSPs;

  • may contain a ligand whose atoms have the ATOM prefix in the PDB file. There should be only one instance of the ligand in the PDB file (i.e., only one molecule with this name and atoms with the ATOM prefix). The name of this ligand should be specified in the "Active site" field of the "Prediction of subfamily-specific positions: optional input" section in the Manual Mode (see details) to mark all residues within a certain distance from this ligand with an '*' sign in the Zebra output text file. This feature is somewhat outdated: it made sense when Zebra did not have the PyMol-based graphical output. However, it can still be used for a particular purpose.

The general guidelines

The multiple alignment is the primary source of information for the bioinformatic analysis. It is important that this alignment is accurate and representative. While there is no universal metric to check the overall alignment "accuracy", it is well known that sequence-based superimposition is meaningful when aligning close homologs with a high sequence identity, and that structure-based alignments should be used to compare distant evolutionary relatives. Depending on the task, the proteins in the alignment should represent the desired functional diversity among the families of interest. The alignment should not contain redundant information, i.e., there is no point in building a large alignment of thousands of proteins which are highly similar to each other. Instead, the sequences should be filtered for similarity and only the representative proteins should be further aligned and studied. The alignment should have at least some columns with a low content of gaps (no more than 5% of gaps). Otherwise, your proteins are too different from each other and there is no core part which is shared by all homologs. This could indicate alignment errors and/or an incorrect choice of the alignment protocol/method.

Choose the representative protein based on your particular task and primary interest. It can be the target protein selected for further experimental design, the most studied member of the superfamily, or the protein which you are most familiar with. You should always aim at submitting a representative PDB structure that corresponds to a protein in the alignment with at least 95% pairwise sequence identity. If structural information is not available for all proteins in your alignment then you could use the structure of a very close homolog from the PDB database or build a 3D model of the representative protein based on the available structural data using homology modeling (e.g., with the help of the highly capable Modeller software).

Generally, you do not have to explicitly indicate which protein in the sequence alignment the submitted PDB corresponds to. Our preprocessing system will try to automatically match your PDB against all proteins in your multiple sequence alignment and select the one with the best pairwise sequence similarity. Alternatively, you can set the "Reference" id in the Manual Mode (see details).
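
To get a rough idea of which alignment row a structure will be matched to, and whether the match is close to the recommended 95% identity, a sketch like the one below can be used. It is not the Zebra preprocessing procedure: it swaps in Python's standard `difflib.SequenceMatcher` as a crude stand-in for a real pairwise alignment, and the function names are hypothetical. The structure sequence is assumed to be already available as a one-letter string.

```python
from difflib import SequenceMatcher

def identity_proxy(structure_seq, aligned_seq):
    """Fraction of structure residues matched, in order, to the ungapped row."""
    target = aligned_seq.replace("-", "")
    matcher = SequenceMatcher(None, structure_seq, target, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(structure_seq) if structure_seq else 0.0

def best_match(structure_seq, aligned_seqs):
    """Return (1-based row number, score) of the row closest to the structure."""
    scores = [identity_proxy(structure_seq, s) for s in aligned_seqs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]
```

A score well below 0.95 suggests looking for a better representative structure; because the score is only a proxy, borderline values should be checked with a proper pairwise alignment tool.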

Keep in mind that the input data can be prepared automatically for your particular protein families by the sister web-service Mustguseal using all available information about protein structures and sequences in public databases. Mustguseal stands for Multiple Structure-Guided Sequence Alignment. The simplest way of using Mustguseal is to submit a PDB ID of the query protein. The Mustguseal protocol implements a structure similarity search to collect remote evolutionary relatives of the query, which are expected to represent different protein families. Then, for each collected remote evolutionary relative, Mustguseal runs a sequence similarity search to collect close evolutionary relatives, i.e., members of the corresponding families. In this way Mustguseal takes into account the variability of sequences and structures within a large superfamily to obtain a set of functionally diverse homologous proteins. A combination of structure and sequence alignment procedures is then used to build the final multiple alignment. This final alignment and the structure of the query protein can be submitted directly from the Mustguseal web-service to the Zebra web-service in one click.




The subfamily classification

To predict the subfamily-specific positions, the input set of sequences has to be classified (i.e., divided) into subfamilies. The advantage of Zebra is that the algorithm can propose multiple classifications automatically by graph-based clustering at different fragmentation levels. Each classification is then used to predict SSPs and estimate their statistical significance. Finally, the automatically proposed subfamily classifications are ranked by the statistical significance of the SSPs they produce. Therefore, Zebra can be used both to predict the subfamily-specific positions and to predict a classification (or several classifications) of a large set of proteins into functional subfamilies.

The Zebra algorithm parameters for the prediction of functional subfamilies are usually robust, with one important exception. By default, a subfamily cannot be smaller than 5% of the number of proteins in the alignment. If you have a large alignment (>500 proteins) it is likely that the true functional groups in your alignment vary substantially in size (for example, one subfamily contains more than 90% of all proteins and another two subfamilies do not exceed 5% of all proteins) or are too small compared to the overall size of the sample (for example, more than 20 subfamilies in the set, with some groups containing less than 5% of the sequences). In this case it is important to run Zebra twice with different settings, i.e., run two independent tasks with the same input data but different parameters. Run the first task with the default setup (i.e., the "Subfamily size limit" set to 5% of the alignment size). To run the second task, switch to the Manual mode and set the "Subfamily size limit" to a smaller value, e.g., 3 sequences, so that small functional subfamilies can be considered in the bioinformatic analysis. To change this parameter choose the "Manual" mode, scroll down to the section "Functional subfamily classification", select the checkbox "Specify clustering parameters manually" and type a number corresponding to the expected minimum number of sequences in a subfamily in the "Subfamily size limit" text field. See the Manual Mode for details.
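
The effect of the default limit can be worked out by hand, or with a few lines of Python. The sketch below follows the documented default (5% of the number of sequences, but not less than 3); the function names are hypothetical, and rounding the 5% value up is an assumption, since the guide does not state how fractional values are handled.

```python
def default_subfamily_size_limit(n_sequences, percent=5, floor=3):
    """5% of the alignment size, but never fewer than 3 sequences."""
    # Assumption: fractional values are rounded up (integer ceiling division).
    return max(floor, (n_sequences * percent + 99) // 100)

def groups_below_limit(expected_sizes):
    """Return the expected subfamily sizes that the default limit would exclude."""
    limit = default_subfamily_size_limit(sum(expected_sizes))
    return [size for size in expected_sizes if size < limit]
```

For an alignment of 500 proteins expected to split into groups of 460, 20 and 20, the default limit is 25, so both small groups fall below it; this is the situation in which a second run with a limit of 3 is worthwhile.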

If the automatic subfamily classification fails it could be for two main reasons:

  • The proteins in your alignment are too distant;
  • The proteins in your alignment are too close;

    See our Troubleshooting guide for details.

    You can submit a custom subfamily classification for the Zebra bioinformatic analysis. Choose the "Manual" mode in order to edit the algorithm parameters that control the automatic classification. On the submission page scroll down to the section "Functional subfamily classification". Here you can submit your own subfamily classification or specify clustering parameters for the automatic classification. See the Manual Mode for details.




    The three Zebra input modes


    Zebra provides three input modes that differ in complexity and in the type of input data required to start the analysis. The “QuickZebra” mode is the most straightforward way to run the bioinformatic analysis and requires only a multiple sequence alignment as input. The “QuickZebra + 3D” mode performs sequence and structural bioinformatic analysis and, in addition to the sequence input, requires a PDB structure file that should correspond to one of the sequences in the alignment. Finally, the “Manual” mode provides the ability to edit the algorithm parameters that control the automatic classification and the identification of SSPs. See the Manual Mode for details.




    Description of parameters in the Manual Mode


    If for any reason you don't like the default setup provided by the "QuickZebra" and "QuickZebra+3D" modes, you can set the parameters manually. The description of the Zebra parameters is provided below. You should consult our paper for benchmarking with different setups and also check our example of a parameter setup:

    Management of input data

    • Multiple sequence alignment of a protein family. At least 6 sequences are required for the bioinformatic analysis (at least 3 sequences per subfamily and at least two subfamilies).
    • Gap threshold - maximal gap occurrence in a column. Columns dominated by gaps usually do not contain any important information.
      Example: set to "30" to remove columns with more than 30% of gaps
      Default: 5% of gaps
    • Reference and offset. Select a sequence to be used as the reference and an offset value to be added to the amino acid positions of that sequence in the output file. These two parameters do not affect the calculations.
      Example: setting reference to "5" will select the 5th sequence (e.g., ADSST) from the top of the alignment file as the reference. Positions will be shown in the output file as 1A, 2D, 3S, 4S, 5T. Setting offset to "3" would change it to 4A, 5D, 6S, 7S, 8T and could be useful in case the alignment sequence is incomplete and misses the first three residues. The offset can be set to a negative value.
      Default: 1st sequence is taken as reference with zero offset (positions are numbered according to the order they appear in the sequence)
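
The numbering in the reference/offset example above can be reproduced with a short Python sketch. The function name is hypothetical, and skipping gap characters in the reference row is an assumption based on the statement that positions are numbered in the order they appear in the sequence.

```python
def label_positions(reference_seq, offset=0):
    """Label reference residues as '<number><residue>', shifted by the offset."""
    labels = []
    number = offset
    for residue in reference_seq:
        if residue == "-":
            continue  # assumption: gaps in the reference row carry no number
        number += 1
        labels.append(f"{number}{residue}")
    return labels

print(label_positions("ADSST"))            # ['1A', '2D', '3S', '4S', '5T']
print(label_positions("ADSST", offset=3))  # ['4A', '5D', '6S', '7S', '8T']
```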

    Prediction of subfamily-specific positions

    • Specificity scoring function. RESP function that considers residue conservation and physicochemical conservation will be used.
    • Random permutations. The reliability of the statistical calculations is controlled by the number of random permutations.
      Example: Setting to "1000" will perform 1000 random permutations in every column of the multiple sequence alignment
      Default: 1000 shuffles
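
To illustrate why the number of random permutations controls the reliability of the statistics, here is a generic permutation test for a single alignment column. It is a toy sketch under stated assumptions, not the Zebra algorithm: `toy_specificity` is a made-up score that stands in for the RESP function, the p-value estimate is the standard "(hits + 1) / (permutations + 1)" formula rather than Zebra's Bernoulli statistics, and all names are hypothetical.

```python
import random
from collections import Counter

def toy_specificity(column, groups):
    """Fraction of sequences that carry their own subfamily's most common residue."""
    hits = 0
    for members in groups:
        counts = Counter(column[i] for i in members)
        hits += counts.most_common(1)[0][1]
    return hits / len(column)

def permutation_p_value(column, groups, n_permutations=1000, seed=0):
    """Estimate how often shuffled residues score at least as high as observed."""
    rng = random.Random(seed)
    observed = toy_specificity(column, groups)
    shuffled = list(column)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between residues and subfamilies
        if toy_specificity(shuffled, groups) >= observed:
            at_least_as_high += 1
    return (at_least_as_high + 1) / (n_permutations + 1)
```

Here `groups` is a list of subfamilies, each given as a list of 0-based row indices. With 1000 permutations the smallest p-value that can be reported is about 0.001, which is why more permutations give finer resolution at the cost of computing time. A fully conserved column always yields a p-value of 1 with this toy score, because every shuffle scores the same as the original.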

    Prediction of subfamily-specific positions: optional input

    • Upload a PDB coordinate structure file that corresponds to one of the sequences in the alignment.

      Only if the PDB coordinate structure file had been uploaded:

    • Define "Active site" area. Residues from the active site will be indicated as '*' in the output file.
      Example: "ATP 10" will select all residues within 10 angstroms from any atom of ATP molecule from the PDB file.
      Default: active site definition is off
      NB: There should be only one instance of the ligand in the PDB file (i.e., only one molecule with this name and atoms with the ATOM prefix). The atoms of this ligand should have the ATOM prefix in the PDB file.
    • Use 3D-mode: set radius to calculate neighborhood for every residue and number of random permutations to calculate conserved positions.
      Example: set radius to "4" to consider specificity and conservation of neighboring residues within 4 angstroms when calculating specificity of a residue. Set random permutation to "1000" for 1000 shuffles in every column.
      Default: 4 angstroms radius is used to calculate neighbors with 1000 random permutations to calculate conservation rate in every column
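
The "Active site" definition above (for example "ATP 10") amounts to a distance query on the PDB coordinates. The sketch below shows one way to run such a query locally; it is not the Zebra implementation, the function names are hypothetical, and it assumes standard fixed-column PDB ATOM records and Python 3.8 or later (for `math.dist`). Following the note above, the ligand is looked up among records with the ATOM prefix.

```python
import math

def parse_atoms(pdb_text):
    """Yield (residue name, chain, residue number, (x, y, z)) per ATOM record."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[17:20].strip(), line[21], int(line[22:26]), xyz

def residues_near_ligand(pdb_text, ligand_name, radius):
    """Return residues with any atom within `radius` angstroms of any ligand atom."""
    atoms = list(parse_atoms(pdb_text))
    ligand = [xyz for name, _, _, xyz in atoms if name == ligand_name]
    near = set()
    for name, chain, number, xyz in atoms:
        if name != ligand_name and any(math.dist(xyz, l) <= radius for l in ligand):
            near.add((chain, number, name))
    return sorted(near)
```

Calling `residues_near_ligand(text, "ATP", 10)` on the contents of a PDB file returns a sorted list of `(chain, residue number, residue name)` tuples. The same distance test with a radius of 4 angstroms gives an impression of the neighbourhoods used in the 3D-mode, although how Zebra itself defines a neighbour is not specified here.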

    Functional subfamily classification

    • Manual subfamily definition. If a pre-defined subfamily classification is available, the user is welcome to provide it to the program. The classification is submitted as a text file in which each line lists the space-separated sequence IDs that belong to one subfamily, and different subfamilies are given on different lines (the first sequence has the ID of 1). If you have a problem preparing the user-defined subfamily classification file please consult our Troubleshooting guide.
      Example: Alignment, Groupfile. A Perl script can be used to learn the sequence IDs from a FASTA alignment and to assist in the preparation of the groupfile.
    • Alternatively, the user can process the request in the absence of external functional annotation. Zebra provides a built-in procedure that can be used to predict functional subfamilies.
      • Subfamily size limit. Zebra can create subfamilies with at least 3 sequences (the minimal reasonable value). The program was benchmarked with the value of '3' and showed competitive results. However, if you are analyzing a superfamily of thousands of sequences you are probably not interested in looking at subfamilies of size 3. Thus, to save computational time you can adjust this parameter to the expected size of the smallest subfamily. If you are not happy with the automatically proposed classifications, it could mean that the functional groups in your alignment vary substantially in size or are too small compared to the overall size of the sample. Try setting the "Subfamily size limit" to a smaller value (for example, 3 sequences) and repeat the calculation.
        Example: value of "3" will allow subfamilies with at least 3 sequences
        Default: 5% of the number of sequences in the alignment but not less than 3 sequences
      • Outliers. The user has an option to select a threshold for outliers (sequences not assigned to a subfamily). Classifications exceeding this threshold will be removed. This parameter showed comparable performance in the range 0-30%. Thus, the value of 20% is set by default.
        Example: value of '0.2' will make the program accept classifications with not more than 20% outliers compared to the sample size
        Default: 0.2 (not more than 20% of the sample size)
      • Search by expected number of subfamilies. The user has an option to select the expected number of subfamilies. This can be set either as a range or as a particular value. Classifications outside this range will not be considered. Zebra showed significantly better performance when given the expected number of subfamilies as an input.
        Example: setting "mingroups" to "2" and "maxgroups" to "2" will make the program create only two-group classifications
        Default: number of subfamilies is not limited





    Troubleshooting guide


    Troubleshooting guide for common errors related to the "Step 4: The Zebra bioinformatic analysis"
    We recommend submitting a representative protein structure together with the multiple sequence alignment. Structural information can increase the accuracy of the bioinformatic predictions (the 3D-mode). The user can also benefit from the structure-based annotation of the subfamily-specific positions, which is a convenient tool for studying the server output.

    It is expected that the submitted multiple alignment
    (1) contains alignment of sequences of homologous (evolutionary related) proteins,
    (2) is in the FASTA format, and
    (3) does not contain a large number of columns with a high content of gaps. By default, all columns with more than 5% gap frequency are dismissed.

    It is expected that the submitted protein structure
    (1) corresponds (i.e., is 100% identical) to one of the protein sequences in the alignment, and
    (2) does not contain errors (i.e., ambiguous formatting, incomplete records, etc.).

    The server implements an automatic preprocessing step to help the user and make the submitted data compatible with this server. The task will be accepted even if the highest pairwise sequence identity between the submitted protein structure and any protein in the alignment is far below 100%. In rare cases the automatic preprocessing cannot be applied due to ambiguity of the user input. In these cases the "Step 4: The Zebra bioinformatic analysis" would fail and the user would have to manually correct the input prior to a new submission.

    The common errors and recommended solutions are described below. If this troubleshooting guide does not solve your problem please send us an error report and we will be happy to provide assistance in your particular case.

    ERROR: The file submitted is not a valid FASTA Alignment

    The input multiple alignment should be in the FASTA format (not ClustalW, not Phylip, etc.). If you do not know the format of your alignment, submit it to Zebra and you'll find out. If your alignment is in the wrong format, use a sequence format converter, e.g., sequenceconversion.bugaco.com.

    An example of the FASTA ALIGNMENT format:

    >protein_name_1
    ISPQHIQYFMYHILLGLHVLHE--AG--VVHRS---------DLHPGNI
    >protein_name_2
    LEESHMQYFVYQILRGLKYLHS--AN--VAHRS------KNCDLKPANL
    >protein_name_3
    LSNDHICYFLYQILRGLKYIHS--AN--VLHRT------KNCDLKPSNL
    >protein_name_4
    LTDDHVQFLIYQILRGLKYIHSDFANRGIIHRSIYPGAAKNCDLKPSNL
    >protein_name_5
    LEKQFIQYFLYQILRGLKYVHSDFAGRAVVHRSLFPAGGKQCDLKPSNI
    >protein_name_6
    IDKQFIQYFLYQILKGLKYVHTDFAGRAVVHKTLFPAGGKQCDLKPPSI

    ERROR: Reference pdb file "someproteinname.pdb" not found

    This error is raised if the 3D-mode was requested but no structure was uploaded by the user. Zebra then tries to access the PDB file which corresponds to the first protein in the multiple alignment file (in this example, the protein someproteinname) and fails because there is no such file. The solution is simple. If you are running in the "QuickZebra+3D" mode, don't forget to upload the representative PDB file together with the multiple protein alignment. If you are running in the Manual mode, you should either leave the 3D-mode disabled (it is disabled by default in the Manual mode) or enable the 3D-mode and upload the representative PDB file together with the multiple protein alignment.

    ERROR: Subfamilies were not found

    If the automatic subfamily classification fails it could be for two main reasons:

  • The proteins in your alignment are too distant;
  • The proteins in your alignment are too close;

    If you have requested the automatic prediction of functional subfamilies (enabled by default) and this automatic classification has failed you will see the following message in the log:

    WARNING: Subfamily classification into [2; 1000000000] subfamilies was not found
    WARNING: Re-running search with new limits [2; 1000000001]

    In this case check the log for a line like this:

    INFO: Columns Valid:0 Gapped:2215 Invariant:3

    Only the "Valid" columns can be used for the bioinformatic analysis and subfamily classification. A column is marked "Valid" if the content of gaps in it is below the threshold and if it is not 100% conserved. The columns with a high content of gaps (above the selected threshold) are marked as "Gapped" and the 100% conserved columns are marked as "Invariant". If the amount of "Valid" columns is zero or is very low, then the bioinformatic analysis is likely to fail due to poor information content derived from your alignment.
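
The "Valid / Gapped / Invariant" counts can be estimated for your own alignment before submission. The sketch below follows the definitions given above; it is not the Zebra code and the function name is hypothetical. It assumes that the gap threshold is tested first and that gaps are ignored when deciding whether a column is 100% conserved.

```python
def classify_columns(sequences, gap_threshold_percent=5):
    """Count alignment columns as Valid, Gapped or Invariant."""
    counts = {"Valid": 0, "Gapped": 0, "Invariant": 0}
    for column in zip(*sequences):
        gaps = column.count("-")
        residues = {c for c in column if c != "-"}
        if gaps * 100 > gap_threshold_percent * len(column):
            counts["Gapped"] += 1      # more gaps than the threshold allows
        elif len(residues) <= 1:
            counts["Invariant"] += 1   # assumption: gaps ignored for conservation
        else:
            counts["Valid"] += 1
    return counts
```

Here `sequences` is a list of equal-length aligned strings. Re-running the function with a higher `gap_threshold_percent` shows how many additional columns would become available to the analysis.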

    The situation described above (i.e., Columns Valid:0, Gapped:2215, Invariant:3) means that your proteins are too distant and thus their structures are too different (i.e., most columns have a lot of gaps), and the common core which is shared by all of the homologs is very small and totally conserved (e.g., the catalytic triad).

    There are two things you could do:

    • Set the gap threshold to a higher value. You may, in principle, set this threshold to as high as 30-50%. This will apply a less strict filter on the gap content and as a result more columns will be available for the bioinformatic analysis;
    • You may also try to construct a new alignment with a better coverage. Try our Mustguseal web-service to automatically construct large structure-guided sequence alignments of your protein families;

    ERROR: Definition of functional subfamilies is not consisten with the alignment

    This error occurs in the Manual mode when the user requests a custom subfamily classification to be implemented by the Zebra bioinformatic analysis but fails to provide a valid subfamily classification file.

    A valid user-defined subfamily classification:

    • should assign each protein in the multiple alignment to a subfamily (i.e., the manual classification can not be used to select a subset of proteins from the alignment, it should cover all proteins);
    • should assign a protein to only one subfamily;
    • each protein in the classification file should be addressed by its ID - rank in the multiple alignment file (starting from 1);
    • each subfamily should be represented by one line of protein IDs in the classification file;
    • the order of IDs within a subfamily line, as well as the order of the lines in the classification file can be arbitrary.

    How can you obtain the protein IDs in the multiple alignment file? In Linux it is easy. Run the following shell pipeline:


    grep '^>' alignment.fasta | nl

    This command lists the header lines of the alignment with their numbers and will produce output like the following:

    1 >sp|P46429|GST2_MANSE
    2 >tr|D3YEX2|D3YEX2_9NEOP
    3 >tr|Q5CCJ4|Q5CCJ4_BOMMO
    4 >tr|Q4ACU7|Q4ACU7_HYPCU
    5 >tr|D3JYQ6|D3JYQ6_ANTPE
    ...
    144 >tr|A0A075X269|A0A075X269_SPOLT
    145 >tr|A0A075X2X3|A0A075X2X3_SPOLT
    146 >tr|A0A075X8Z8|A0A075X8Z8_SPOLT
    147 >tr|A0A075X3T3|A0A075X3T3_SPOLT
    148 >tr|S4NVD9|S4NVD9_9NEOP

    Thus, all IDs in a range 1-148 should be used in the subfamily classification file.

    Example. If your multiple alignment file contains six proteins (i.e., the IDs are #1, #2, #3, #4, #5, #6), below are examples of valid classification files for Zebra. Classify the sequences into two groups (the first three proteins and the last three proteins):

    1 2 3
    4 5 6

    OR

    4 5 6
    1 2 3

    OR

    6 5 4
    3 2 1

    And these classification files are invalid.
    One protein (#6) is assigned to both groups:

    6 5 4
    6 3 2 1

    One protein (#6) is not assigned to any group:

    5 4
    3 2 1

    More proteins (#7-10) are assigned to groups than are present in the alignment:

    6 5 4
    3 2 1
    7 8 9 10
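
The rules for a valid classification file can be checked locally before submission. The following Python sketch applies the three checks illustrated by the examples above (every ID assigned, no ID assigned twice, no ID beyond the alignment size); the function name and the message wording are hypothetical and do not reproduce Zebra's own error messages.

```python
def check_classification(text, n_sequences):
    """Return a list of problems found in a subfamily classification file."""
    problems = []
    seen = set()
    for line in text.splitlines():      # one subfamily per line
        for token in line.split():      # space-separated sequence IDs
            seq_id = int(token)
            if not 1 <= seq_id <= n_sequences:
                problems.append(f"ID {seq_id} is not in the alignment")
            elif seq_id in seen:
                problems.append(f"ID {seq_id} is assigned more than once")
            seen.add(seq_id)
    for seq_id in range(1, n_sequences + 1):
        if seq_id not in seen:
            problems.append(f"ID {seq_id} is not assigned to any subfamily")
    return problems
```

An empty list means the file passed these checks; `n_sequences` is the number of header lines in the alignment (148 in the listing above).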

    ERROR: Internal error while calculating structure-based neighbour list (0 neighbours found)

    This error usually happens when the representative protein, which was submitted as a PDB file to the server, has poor similarity (local or global) with the reference protein in the multiple sequence alignment.

    The PDB file is expected to represent one protein from the sequence alignment, i.e., the two should be identical in amino acid sequence. You should always aim at submitting a representative PDB structure that corresponds to a protein which is highly similar to the reference protein sequence in the alignment (i.e., >95%). For your convenience Zebra implements a preprocessing step to automatically select the reference protein in the multiple alignment that has the highest pairwise sequence similarity with the representative protein in the PDB. All mismatching regions between the representative protein structure in the PDB and the reference protein sequence in the alignment are automatically removed. Significant changes to the representative protein structure and the reference protein sequence could be introduced at this step if the representative protein sequence has low similarity to the reference protein sequence. The low sequence similarity can be global, i.e., the two proteins are remote homologs, or local, e.g., the sequence and structure belong to the same protein but some flexible loop is only partially resolved in the PDB. As a result of these changes the automatically updated PDB structure can contain gaps in the backbone. If a residue in the structure has no neighbours within the 3D-mode cut-off radius (4 Å by default) then the bioinformatic analysis fails.
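
Residues without neighbours can be located in the preprocessed PDB file with a short script. The sketch below is an illustration, not the Zebra neighbour-list code: it assumes that two residues are neighbours when any pair of their atoms lies within the cut-off radius, which may differ from the definition Zebra uses. The function name is hypothetical, and Python 3.8 or later is assumed (for `math.dist`).

```python
import math
from collections import defaultdict

def lonely_residues(pdb_text, radius=4.0):
    """Return residues that have no atom of another residue within `radius`."""
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            residues[key].append(xyz)
    lonely = []
    for key, atoms in residues.items():
        # Brute-force search; fine for a single protein, quadratic in atom count.
        has_neighbour = any(
            math.dist(a, b) <= radius
            for other, other_atoms in residues.items() if other != key
            for a in atoms for b in other_atoms
        )
        if not has_neighbour:
            lonely.append(key)
    return sorted(lonely)
```

Each returned tuple is `(chain, residue number, residue name)`. Residues reported here are candidates for the kind of isolated fragment that remains after mismatching regions have been removed during preprocessing.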

    There are three ways to resolve this problem:

    (1) Quick and easy workaround - can make it work but is likely to introduce bias and reduce the scientific value of the results, should be used for information purposes only. Open the log file and go to the "Step 3: Preprocessing of the Input Data" section. Download the PDB structure of the representative protein after preprocessing (you can find the link at the end of the section). Have a look at the annotated pairwise alignment between the "pdb" and the "best_match". Find the "lonely" residues in the "pdb" and delete them from the downloaded PDB file. E.g., >pdb ANKGGPSEGA means that only the P was preserved in the PDB file after preprocessing and all other residues were dismissed. This Proline is likely to cause the error when using the 3D-mode because it seems to be too far from other residues in the 3D space. Check that the Proline is also "lonely" in the structure by looking at the corresponding PDB file and then remove it (delete the coordinates of the corresponding atoms) from the PDB file. Submit a new task to the server;

    (2) Find a better representative protein in the PDB database. BLAST your multiple alignment versus the sequences of proteins in the PDB database and select a better match than your current representative protein;

    (3) Build a 3D model of the representative protein based on the available structural data. You can reconstruct the missing loops in the globule or predict the entire structure using homology modeling. The highly capable Modeller software can do both. If you do not know how to use Modeller to build a model of the representative protein for Zebra/pocketZebra, you can contact us and we will e-mail you the template scripts.

    ERROR: Neighbour list for acid with alignment position XXX contains pdb position YYY not presented in the alignment

    This error means that you have an incomplete amino acid record in your PDB file, i.e. the coordinates of the 'CA' atom of a certain residue are missing. This problem with your PDB file has to be corrected manually.

    Open the PDB file in a 3D structure viewer (e.g., PyMol) or in a text editor and find the YYY'th amino acid, i.e., the YYY'th amino acid record in the file (the count starts from 1). Please note that the residue ID of the YYY'th amino acid would be different from "YYY" if the residue numbering in the PDB does not start from 1 and/or some residues are missing in the backbone. Check if the record for the YYY'th amino acid is incomplete (if not, you might have lost your count), e.g.:


    ATOM    675  N   ALA A  84      30.430   1.654  28.087  1.00 11.02           N  
    ATOM    676  CA  ALA A  84      31.779   1.682  27.569  1.00 13.03           C  
    ATOM    677  C   ALA A  84      32.111   2.986  26.814  1.00 14.26           C  
    ATOM    678  O   ALA A  84      33.266   3.161  26.456  1.00 13.99           O  
    ATOM    679  CB  ALA A  84      32.036   0.525  26.603  1.00 16.45           C  
    ATOM    680  N   LEU A  85      31.167   3.874  26.582  1.00 14.45           N  
    ATOM    688  N   SER A  86      33.067   6.758  25.759  1.00 15.08           N  
    ATOM    689  CA  SER A  86      33.859   7.861  26.275  1.00 16.99           C  
    ATOM    690  C   SER A  86      32.871   8.932  26.675  1.00 18.70           C  
    ATOM    691  O   SER A  86      31.712   9.057  26.205  1.00 16.46           O  
    ATOM    692  CB  SER A  86      34.744   8.419  25.138  1.00 17.40           C  
    ATOM    693  OG  SER A  86      33.883   9.078  24.215  1.00 17.15           O  

    In the example above the record of the LEU85 is incomplete as the coordinates for most atoms (including the 'CA' atom) are missing.
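
Counting residue records by hand is error-prone, so the search can be automated. The Python sketch below scans the ATOM records and reports every residue that has no 'CA' atom, together with its ordinal position in the file (counting from 1, as in the error message). The function name is hypothetical, standard fixed-column PDB records are assumed, and the ordinal is counted over all chains in the file, which may not match Zebra's count if only one chain is used.

```python
def residues_missing_ca(pdb_text):
    """Return (ordinal, (chain, residue id, residue name)) for residues lacking CA."""
    atoms_by_residue = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Residue id includes the insertion code column, if present.
            key = (line[21], line[22:27].strip(), line[17:20].strip())
            atoms_by_residue.setdefault(key, set()).add(line[12:16].strip())
    report = []
    # Dictionaries keep insertion order, so the ordinal follows the file order.
    for ordinal, (key, atom_names) in enumerate(atoms_by_residue.items(), start=1):
        if "CA" not in atom_names:
            report.append((ordinal, key))
    return report
```

Applied to the fragment above, the function reports LEU 85 of chain A as the second residue record.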

    Once you find the residue in question you have the following possible actions:
    (1) remove (delete) the entire residue from the PDB file and submit a new task;
    (2) add the coordinates of the missing atoms to the PDB file (e.g., with the help of the Modeller molecular modeling package) and submit a new task. In this case you may need to re-build the entire multiple alignment to include this residue.