Mustguseal Performance





Mustguseal limits the size of the core structural alignment to a maximum of 16 to 150 proteins, depending on the input mode and the database selected for the sequence similarity search, in order to increase the overall performance of this service and provide it to as many people as possible. Each protein in the core structural alignment can represent a separate protein family, so a core of 16-150 proteins can cover 16-150 structurally and functionally diverse families. As a result, multiple alignments of thousands of proteins representing large superfamilies can be constructed using this public web-server in Modes 1, 2, or 3.




The following settings are currently implemented to increase the overall performance of this service and provide it to as many people as possible:


General notes on the Mustguseal Performance

  • The running time of a Mustguseal task and the size of a final alignment will depend on the particular input, parameter setup, and availability of data in the PDB, Swiss-Prot, and TrEMBL databases;

  • Mode 1 is the default, fully automated, and easiest way to obtain an alignment: you submit the PDB and chain IDs of a query protein. It also takes more time to complete than the other modes because all steps of the Mustguseal protocol are executed to collect and align the related sequences and structures from the selected databases;

  • The structure similarity search in Mode 1 runs more efficiently when the requested structural similarity threshold is higher, i.e., within the range 70-100%; Step 1 takes longer when the percentage of secondary structure equivalences is set to 30-40%;

  • All pairwise comparisons created during the structure similarity search in Mode 1 are hashed into a PostgreSQL-controlled database to be re-used in subsequent searches. When a new task is submitted in Mode 1 with the same query structure, Step 1 (i.e., the structure similarity search) takes only a few seconds to complete because the results are not re-computed but restored from the database, which is hosted on a very fast solid-state drive. This lets the user refine the alignment by submitting a new task with the same query but different parameters and get the results significantly faster;

  • The user can choose to run sequence similarity searches either in the Swiss-Prot database alone or in the much larger combined Swiss-Prot+TrEMBL dataset;

  • A GPU-compatible version of BLAST is used to accelerate sequence similarity searches;

  • The redundancy filter threshold has a direct impact on the speed of a sequence similarity search: a value below 80% is the fastest option and 100% is the slowest. This is because the server actually searches pre-calculated non-redundant sets of the Swiss-Prot and TrEMBL databases, and the nr80 set is smaller than the nr100 set (see The Parameters above);

  • A task submitted in Mode 2 runs on average at least two times faster than in Mode 1 because the time-consuming Steps 1 and 2 (i.e., the structure similarity search and construction of the structural alignment) are skipped;

  • A task submitted in Mode 3 takes from several seconds to several minutes.
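
The re-use of stored pairwise comparisons described in the notes above can be illustrated with a minimal sketch. It is not the server's implementation: an in-memory dictionary stands in for the PostgreSQL-controlled database, and the names PairwiseCache, similarity, and search, as well as the toy scores, are hypothetical. The point it shows is that results are keyed by the pair of structures rather than by the search parameters, so a second search with the same query and a different threshold triggers no new comparisons.

```python
from typing import Callable, Dict, List, Tuple

PairKey = Tuple[str, str]


class PairwiseCache:
    """In-memory stand-in (assumption) for the server-side store of comparisons."""

    def __init__(self, compare: Callable[[str, str], float]) -> None:
        self._compare = compare            # the expensive structural comparison
        self._store: Dict[PairKey, float] = {}
        self.computed = 0                  # number of comparisons actually run

    def similarity(self, query: str, target: str) -> float:
        key = (query, target)
        if key not in self._store:         # compute only on the first request
            self._store[key] = self._compare(query, target)
            self.computed += 1
        return self._store[key]


def search(cache: PairwiseCache, query: str, targets: List[str],
           threshold: float) -> List[str]:
    """Return the targets whose similarity to the query meets the threshold."""
    return [t for t in targets if cache.similarity(query, t) >= threshold]


if __name__ == "__main__":
    # Toy scores for illustration only; real scores come from superimposition.
    toy = {("1abc_A", "2xyz_A"): 0.9, ("1abc_A", "3def_B"): 0.5}
    cache = PairwiseCache(lambda q, t: toy[(q, t)])
    targets = ["2xyz_A", "3def_B"]
    print(search(cache, "1abc_A", targets, 0.7), cache.computed)
    print(search(cache, "1abc_A", targets, 0.4), cache.computed)
```

The second call applies a different threshold to the same query, yet the counter of computed comparisons does not grow, which mirrors why a refined Mode 1 task with the same query completes Step 1 much faster.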


Performance in Mode 1

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 32 proteins | The first 32 proteins most similar to the query will be automatically selected
Swiss-Prot+TrEMBL | 16 proteins | The first 16 proteins most similar to the query will be automatically selected



In Mode 1 the proteins for the core structural alignment are selected automatically based on the results of the structure similarity search. At this step the Mustguseal protocol selects a representative set of at most 32 protein structures (if Swiss-Prot is selected for the subsequent sequence similarity searches) or 16 protein structures (if Swiss-Prot+TrEMBL is selected) by clustering their corresponding sequences at different pairwise similarity thresholds in the range from 95% to 40%. The selected representative structures are then aligned by structural superimposition to create the core structural alignment. If even the smallest set of representative structures (i.e., the one produced by clustering at the 40% threshold) exceeds the limit, the first 32 or 16 proteins most similar to the query are selected automatically, and a warning message appears in the log. Each representative protein is then used as a query to execute a sequence similarity search. A sequence similarity search in the much larger Swiss-Prot+TrEMBL database is significantly slower, which is why the limit on the number of proteins in the core structural alignment depends on the database selected for the sequence similarity search.
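
The selection logic in this paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the actual Mustguseal code: it assumes that the protocol prefers the highest clustering threshold whose representative set fits the limit, that the fallback truncates the 40% set, and that each list of representatives is already ordered from most to least similar to the query. The function name select_core and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def select_core(reps_by_threshold: Dict[int, List[str]],
                limit: int) -> Tuple[List[str], bool]:
    """Pick a representative set of at most `limit` structures.

    reps_by_threshold maps a clustering threshold in percent (e.g. 95 ... 40)
    to the representatives obtained at that threshold, ordered by decreasing
    similarity to the query (assumption). Returns the chosen set and a flag
    telling whether the set had to be truncated (i.e., a warning is due).
    """
    # Assumption: try the strictest threshold first and relax it step by step.
    for threshold in sorted(reps_by_threshold, reverse=True):
        reps = reps_by_threshold[threshold]
        if len(reps) <= limit:
            return list(reps), False
    # Even the lowest threshold yields too many representatives:
    # keep the first `limit` proteins most similar to the query.
    lowest = min(reps_by_threshold)
    return list(reps_by_threshold[lowest][:limit]), True


if __name__ == "__main__":
    clusters = {95: list("abcdefgh"), 70: list("abcde"), 40: list("abc")}
    print(select_core(clusters, 5))   # the set from the 70% threshold fits
    print(select_core(clusters, 2))   # the 40% set is truncated, warning flag set
```

With limit set to 32 or 16, the flag returned by this sketch corresponds to the situation in which the server writes a warning message to the log.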

Performance in Mode 2

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 64 proteins | The task will be rejected
Swiss-Prot+TrEMBL | 32 proteins | The task will be rejected



In Mode 2 the core structural alignment is submitted by the user and each protein in that alignment is used as a query to execute a sequence similarity search. A sequence similarity search in the much larger Swiss-Prot+TrEMBL database is significantly slower. Thus, to increase the overall performance of this service, the number of Swiss-Prot scans is limited to at most 64 per task, and the number of Swiss-Prot+TrEMBL scans is limited to at most 32 per task. If the user-submitted core structural alignment contains more proteins, the task is rejected. In this case you can split a large core structural alignment into several text files (e.g., copy-paste the names, sequences, and gaps of the first 32 proteins into the first text file, then copy-paste the names, sequences, and gaps of the next 32 proteins into the second text file, etc.), submit each file as a separate task in Mode 2 to collect the sequence alignment blocks for each representative protein, and then submit the entire core structural alignment and all sequence alignment blocks in Mode 3.
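
Instead of copy-pasting by hand, the split can be scripted. The sketch below assumes that the core structural alignment is stored as FASTA-formatted text (a ">" header line followed by the gapped sequence); the function name split_fasta and the output file names are hypothetical. Gaps are left untouched, so every chunk remains a consistent slice of the original alignment.

```python
from typing import List


def split_fasta(text: str, max_proteins: int) -> List[str]:
    """Split FASTA-formatted alignment text into chunks of at most max_proteins."""
    if max_proteins < 1:
        raise ValueError("max_proteins must be at least 1")
    records: List[List[str]] = []
    for line in text.splitlines():
        if line.startswith(">"):           # a header line starts a new record
            records.append([line])
        elif line.strip() and records:     # sequence line (gaps are kept as-is)
            records[-1].append(line)
    chunks: List[str] = []
    for start in range(0, len(records), max_proteins):
        group = records[start:start + max_proteins]
        chunks.append("\n".join("\n".join(rec) for rec in group) + "\n")
    return chunks


if __name__ == "__main__":
    # Toy alignment with 70 proteins; real input would be read from a file.
    alignment = "".join(f">protein_{i}\nMK-LV--AG\n" for i in range(70))
    for n, chunk in enumerate(split_fasta(alignment, 32), start=1):
        with open(f"core_part_{n}.fasta", "w") as handle:   # hypothetical names
            handle.write(chunk)
        print(n, chunk.count(">"))
```

Each resulting file can then be submitted as a separate Mode 2 task, using 32 or 64 as max_proteins depending on the selected database.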

Performance in Mode 3

Maximum size of the core structural alignment | Maximum size of each sequence alignment block | Maximum total size of the final alignment
150 proteins | 500 proteins | 15000 proteins
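
The three Mode 3 limits listed above can be checked locally before a submission. The following sketch is illustrative only: the function name check_mode3 and the message wording are hypothetical, and the total size of the final alignment is passed in explicitly because how the server counts it is not specified here.

```python
from typing import List

MAX_CORE = 150      # proteins in the core structural alignment
MAX_BLOCK = 500     # proteins in each sequence alignment block
MAX_TOTAL = 15000   # proteins in the final alignment


def check_mode3(core_size: int, block_sizes: List[int],
                total_size: int) -> List[str]:
    """Return a list of limit violations; an empty list means the input fits."""
    problems: List[str] = []
    if core_size > MAX_CORE:
        problems.append(f"core alignment has {core_size} proteins (max {MAX_CORE})")
    for index, size in enumerate(block_sizes, start=1):
        if size > MAX_BLOCK:
            problems.append(f"block {index} has {size} proteins (max {MAX_BLOCK})")
    if total_size > MAX_TOTAL:
        problems.append(f"final alignment has {total_size} proteins (max {MAX_TOTAL})")
    return problems


if __name__ == "__main__":
    for message in check_mode3(160, [120, 501], 621):
        print(message)
```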



Submissions exceeding these thresholds will be rejected. These limitations in Mode 3 were introduced to prevent intentional abuse of the service. It seems impractical to build alignments larger than 15000 proteins, as they would almost certainly contain redundant information and would be computationally hard to analyze. However, if you would like to build larger alignments for a particular purpose, you can contact the Support to receive a special access code. Please briefly state your name, position, institution, and the purpose of your study.