Bioinformatic analysis of the glutathione S-transferase superfamily
Citation: This example has been discussed in detail in [Suplatov D., Kirilin E., Takhaveev V., Švedas V. (2014). Zebra: web-server for bioinformatic analysis of diverse protein families, J.Biomol.Struct.Dyn., 32(11), 1752-1758]

Bioinformatic analysis with Zebra was used to suggest different functional classifications within the GST superfamily and to identify a set of subfamily-specific positions (SSPs) for further evaluation.

Glutathione transferases (GSTs) are a wide group of dimeric enzymes whose common feature is catalysing the nucleophilic attack of glutathione (GSH) on the electrophilic groups of a wide range of hydrophobic toxic compounds. Each subunit of the GST dimer has two structurally distinct domains and contains an independent catalytic center, which consists of a G-site for binding the physiological substrate GSH and an H-site for binding structurally diverse endogenous and xenobiotic substances. The GST superfamily has been sub-grouped into an ever-increasing number of classes according to a variety of criteria.

Input data

  • QuickZebra+3D mode was used
  • Multiple sequence alignment file
  • PDB structure of the rat Mu class GST. It was selected as the reference structure for Zebra-3D mode because it contains a bound product of the addition of GSH to phenanthrene 9,10-oxide, so specific interactions with both the G-site and the H-site can be visualized

In QuickZebra+3D mode, the algorithm parameters are automatically set to the following values (see Explanation of the input data for more info):

  • Gap threshold: 5 (%)
  • Number of random permutations: 1000
  • 3D-mode: 4Å, 1000 random permutations to calculate conserved positions
  • Outliers: 0.2 (not more than 20% of the sample size)
  • Subfamily size limit: 5% of the number of sequences in the alignment, but not less than 3 sequences
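
The percentage-based defaults above translate into concrete counts once the alignment size is known. The following is a minimal sketch of that arithmetic, not the server's actual implementation: the function names are hypothetical, rounding the 5% limit up and the 20% limit down is an assumption, and treating the gap threshold as "skip columns with more than 5% gaps" is likewise an assumed reading of the parameter.

```python
# Illustrative only: names, rounding rules and the gap-filter semantics
# are assumptions, not taken from the Zebra server itself.

GAP_THRESHOLD_PCT = 5    # assumed: columns with more than 5% gaps are skipped
OUTLIER_PCT = 20         # not more than 20% of the sample size
SUBFAMILY_PCT = 5        # subfamily size limit: 5% of the sequences ...
SUBFAMILY_FLOOR = 3      # ... but not less than 3 sequences


def min_subfamily_size(n_sequences: int) -> int:
    """Smallest allowed subfamily for an alignment of n_sequences."""
    # Integer ceiling of 5% (assumed rounding), floored at 3 sequences.
    five_percent = -(-n_sequences * SUBFAMILY_PCT // 100)
    return max(SUBFAMILY_FLOOR, five_percent)


def max_outliers(n_sequences: int) -> int:
    """Largest number of sequences that may be treated as outliers."""
    # Rounded down (assumed) so the 20% cap is never exceeded.
    return n_sequences * OUTLIER_PCT // 100


def columns_within_gap_threshold(alignment: list[str]) -> list[int]:
    """Indices of alignment columns whose gap share is at most 5%."""
    n = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[col] == "-")
        # Compare gaps/n with the threshold using integers only.
        if gaps * 100 <= GAP_THRESHOLD_PCT * n:
            kept.append(col)
    return kept
```

For example, under these assumptions an alignment of 200 sequences would allow subfamilies of at least 10 sequences and at most 40 outliers, while an alignment of 20 sequences would fall back to the floor of 3 sequences per subfamily.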

Output data:

  • Text file output: download
  • PyMol session for Subfamily definition 1: download
  • PyMol session for Subfamily definition 2: download

Brief explanation of results:

  • Two subfamily classifications were automatically predicted by the server; they correspond to previously discussed functional classifications of GST enzymes (Dixon, 2002). Structural representations of the results (PyMol "pse" sessions) were automatically created by the server (see the animated GIFs below) and reveal SSPs in the H-site of the active center as well as at domain-domain, subunit-subunit and, possibly, dimer-dimer interfaces.
  • The automatic cut-off procedure suggested 11 positions as the most statistically significant SSPs for the 6-group classification and 13 positions for the 9-group classification (see the text output file). The two sets differ to some extent in the positions selected and in their ranking, so both should be considered for further study and evaluation.

Subfamily definition 1:
Alpha/Mu/Pi/Sigma, Phi, Tau, Theta, Zeta, and Omega (6 groups)
P-value 9.5∙10⁻²¹
Subfamily definition 2:
Alpha, Mu, Pi, Sigma, Phi, Tau, Theta, Zeta, and Omega (9 groups)
P-value 2.1∙10⁻¹⁰

Animations. Structural representation of subfamily-specific positions in the GST superfamily (automatically produced by the server). The color gradient corresponds to estimated specificity: red stands for highly significant SSPs, cyan for non-specific positions. Heteroatoms are colored yellow and shown as ball-and-stick models. Download animations: subfamily definition 1, subfamily definition 2