Research :: Zebra :: Algorithm outline

Zebra is designed to systematically study diverse protein superfamilies and identify subfamily-specific positions (SSPs): positions that are conserved within functional subfamilies but differ between them, and that appear to be responsible for differences in substrate specificity, catalytic activity, stability, etc. To identify functionally important SSPs, Zebra uses a novel scoring function that incorporates structural information together with physicochemical and residue conservation within protein subfamilies. The input is a multiple sequence alignment and, optionally, structural information about the protein superfamily. The algorithm automatically proposes classifications into functional subfamilies and predicts the SSPs responsible for functional diversity in protein families.
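To make the SSP definition concrete, here is a minimal Python sketch that flags alignment columns that are conserved within each subfamily but differ between subfamilies. It is only an illustration of the definition, not Zebra's scoring function: it ignores structural and physicochemical information, and the names `consensus`, `is_subfamily_specific`, the `min_fraction` threshold, and the toy alignment are all assumptions made for this example.

```python
from collections import Counter

def consensus(residues, min_fraction=0.8):
    """Return the dominant residue if it reaches min_fraction, else None."""
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= min_fraction else None

def is_subfamily_specific(column, labels, min_fraction=0.8):
    """Toy test: every subfamily is conserved, and they do not all share one residue."""
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(label, []).append(residue)
    dominant = [consensus(g, min_fraction) for g in groups.values()]
    if any(d is None for d in dominant):
        return False  # at least one subfamily is not conserved at this position
    return len(set(dominant)) > 1  # conserved everywhere, but not identically

# Hypothetical alignment: rows are sequences, columns are aligned positions.
alignment = ["GSDKA", "GSDRA", "GTEKA", "GTELA"]
labels = ["A", "A", "B", "B"]  # hypothetical subfamily assignment per sequence

columns = list(zip(*alignment))
ssp_positions = [i for i, col in enumerate(columns)
                 if is_subfamily_specific(col, labels)]
print(ssp_positions)  # [1, 2]
```

In this toy data, columns 0 and 4 are conserved across the whole family (so they are not subfamily-specific), column 3 is variable inside the subfamilies, and only columns 1 and 2 match the SSP pattern.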

The algorithm has three main steps:

  • Prediction of functional subfamily classifications (alternatively, an experimentally derived functional annotation can be provided manually by the user)
  • Identification of the subfamily-specific positions
  • Statistical analysis to automatically select the most significant hotspots for further consideration
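
The three steps can be sketched as a small Python pipeline. This is a simplified stand-in under stated assumptions, not the published method: step 1 is replaced by user-provided subfamily labels (the alternative mentioned in the first step), step 2 uses a toy conservation score instead of Zebra's scoring function, and step 3 uses a plain z-score cutoff as a placeholder for the statistical analysis. The names `column_score`, `select_hotspots`, and `z_cutoff` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def column_score(column, labels):
    """Toy score: mean within-subfamily conservation, or 0.0 if the
    subfamilies all share the same dominant residue."""
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(label, []).append(residue)
    top = [Counter(g).most_common(1)[0] for g in groups.values()]
    if len({residue for residue, _ in top}) < 2:
        return 0.0  # no difference between subfamilies at this position
    return mean(count / len(g) for (_, count), g in zip(top, groups.values()))

def select_hotspots(alignment, labels, z_cutoff=1.0):
    """Score every column, then keep those whose z-score reaches z_cutoff."""
    scores = [column_score(col, labels) for col in zip(*alignment)]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # all columns score the same; nothing stands out
    return [i for i, s in enumerate(scores) if (s - mu) / sigma >= z_cutoff]

# Step 1 (assumed): subfamily labels supplied by the user.
alignment = ["GSDKAV", "GSDRAV", "GTDKAV", "GTDLAV"]  # hypothetical data
labels = ["A", "A", "B", "B"]

# Steps 2 and 3: score positions and select the statistically outstanding ones.
hotspots = select_hotspots(alignment, labels)
print(hotspots)  # [1]
```

Separating scoring from selection mirrors the outline above: the scoring function can be swapped for a richer one without touching the selection step.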

Want to understand the algorithm in detail? Please refer to our paper:
Suplatov D., Shalaeva D., Kirilin E., Arzhanik V., Švedas V. (2014). Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J. Biomol. Struct. Dyn. 32(1), 75-87. DOI: 10.1080/07391102.2012.750249 PMID: 23384165