Mustguseal Performance

Mustguseal limits the core structural alignment to at most 16 to 150 proteins, depending on the input mode and on the database selected for the sequence similarity search. These limits keep the service responsive and let it serve as many users as possible. The 16 to 150 proteins in the core structural alignment can represent as many structurally and functionally diverse protein families. As a result, this public web server can still construct multiple alignments of thousands of proteins representing large superfamilies in Modes 1, 2, and 3.

The limits currently implemented for each mode, and for the number of tasks per IP address, are listed below.


Performance in Mode 1

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 32 proteins | The first 32 proteins most similar to the query are selected automatically
Swiss-Prot+TrEMBL | 16 proteins | The first 16 proteins most similar to the query are selected automatically

In Mode 1 the proteins for the core structural alignment are selected automatically from the results of the structure similarity search. At this step the Mustguseal protocol selects a representative set of at most 32 (Swiss-Prot) or 16 (Swiss-Prot+TrEMBL) protein structures, depending on the database selected for the subsequent sequence similarity searches, by clustering their sequences at different pairwise similarity thresholds in a range from 95% to 40%. The selected representative structures are then aligned by structural superimposition to create the core structural alignment. If even the smallest set of representative structures (i.e., the one produced by clustering at the 40% threshold) exceeds the limit, the first 32 or 16 proteins most similar to the query are selected automatically, and a warning message appears in the log. Each representative protein is then used as a query for a sequence similarity search. Because searching the much larger Swiss-Prot+TrEMBL database is significantly slower, the limit on the number of proteins in the core structural alignment depends on the database selected for the sequence similarity search.
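The Mode 1 selection rule can be sketched as follows. This is an illustration under stated assumptions, not Mustguseal's actual code: the threshold step (95%, 90%, ..., 40%), the greedy clustering method, and the names `greedy_cluster`, `select_core`, and the caller-supplied `identity` function are all hypothetical. The input proteins are assumed to be ordered by decreasing similarity to the query, so "the first N" representatives are the N most similar ones.

```python
from typing import Callable, List, Sequence, Tuple

# Mode 1 core-size limits, copied from the table above.
MODE1_LIMITS = {"Swiss-Prot": 32, "Swiss-Prot+TrEMBL": 16}

# Hypothetical threshold schedule 95, 90, ..., 40 (the 5% step is an assumption).
THRESHOLDS = range(95, 35, -5)


def greedy_cluster(ids: Sequence[str],
                   identity: Callable[[str, str], float],
                   threshold: float) -> List[str]:
    """Assumed greedy clustering: a protein joins the first representative it
    matches at >= threshold % identity, otherwise it becomes a new
    representative. Representatives keep the input order of `ids`."""
    reps: List[str] = []
    for pid in ids:
        if not any(identity(pid, rep) >= threshold for rep in reps):
            reps.append(pid)
    return reps


def select_core(ids: Sequence[str],
                identity: Callable[[str, str], float],
                database: str) -> Tuple[List[str], bool]:
    """Return (representatives, warning_flag) for the core structural alignment."""
    limit = MODE1_LIMITS[database]
    reps: List[str] = list(ids)
    for threshold in THRESHOLDS:
        reps = greedy_cluster(ids, identity, threshold)
        if len(reps) <= limit:
            # The first (highest) threshold that fits keeps the largest allowed set.
            return reps, False
    # Even the 40% set is too large: keep the `limit` proteins most similar
    # to the query and flag that a warning should be written to the log.
    return reps[:limit], True
```

Stopping at the highest threshold that satisfies the limit keeps as many representatives as the limit allows; falling back to truncation only happens when clustering at 40% cannot reduce the set enough.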

Performance in Mode 2

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 64 proteins | The task is rejected
Swiss-Prot+TrEMBL | 32 proteins | The task is rejected

In Mode 2 the core structural alignment is submitted by the user, and each protein in that alignment is used as a query for a sequence similarity search. Because searching the much larger Swiss-Prot+TrEMBL database is significantly slower, the number of Swiss-Prot scans is limited to at most 64 per task and the number of Swiss-Prot+TrEMBL scans to at most 32 per task. If the user-submitted core structural alignment contains more proteins, the task is rejected. In this case you can split a large core structural alignment into several text files (i.e., copy-paste the names, sequences, and gaps of the first 32 proteins into the first file, those of the next 32 proteins into the second file, and so on), submit each file as a separate Mode 2 task to collect the sequence alignment blocks for each representative protein, and then submit the entire core structural alignment together with all sequence alignment blocks in Mode 3.
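The manual splitting described above can be automated with a short script. This sketch assumes the core structural alignment is stored in aligned FASTA format (an assumption; check the input format the server expects), and the helper names `read_fasta` and `split_alignment` are hypothetical. Gap characters are kept unchanged, so each part is still a valid slice of the original alignment.

```python
from typing import List, Optional, Tuple

# Mode 2 limits on the core structural alignment, copied from the table above.
MODE2_LIMITS = {"Swiss-Prot": 64, "Swiss-Prot+TrEMBL": 32}


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse aligned FASTA text into (name, gapped sequence) pairs."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)  # gap characters such as '-' are kept as-is
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_alignment(text: str, database: str) -> List[str]:
    """Split a core structural alignment into FASTA parts no larger than the
    Mode 2 limit for the selected database; each part is one Mode 2 task."""
    size = MODE2_LIMITS[database]
    records = read_fasta(text)
    parts: List[str] = []
    for start in range(0, len(records), size):
        block = records[start:start + size]
        parts.append("".join(f">{name}\n{seq}\n" for name, seq in block))
    return parts
```

Each returned string can be saved to its own file (for example with `pathlib.Path.write_text`) and submitted as a separate Mode 2 task; the original, unsplit alignment is what you submit later in Mode 3.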

Performance in Mode 3

Maximum size of the core structural alignment | Maximum size of each sequence alignment block | Maximum total size of the final alignment
150 proteins | 500 proteins | 10000 proteins

Submissions exceeding these thresholds are rejected. These limits in Mode 3 were introduced to prevent intentional abuse of the service. Alignments larger than 10000 proteins seem impractical: they would almost certainly contain redundant information and would be computationally hard to analyze. However, if you need a larger alignment for a particular purpose, you can contact Support to receive a special access code. Please briefly state your name, position, institution, and the purpose of your study.
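Before submitting in Mode 3, you may want to check your input against the three limits locally. The sketch below is hypothetical (`check_mode3` is not part of Mustguseal), and it assumes the final alignment size can be estimated as the sum of the sequence alignment block sizes; the server may count differently, for example after merging duplicate sequences, so treat the total check as an approximation.

```python
from typing import List

# Mode 3 limits, copied from the table above.
MAX_CORE = 150
MAX_BLOCK = 500
MAX_TOTAL = 10000


def check_mode3(core_size: int, block_sizes: List[int]) -> List[str]:
    """Return a list of limit violations (an empty list means the input fits)."""
    problems: List[str] = []
    if core_size > MAX_CORE:
        problems.append(f"core structural alignment: {core_size} proteins (limit {MAX_CORE})")
    for i, n in enumerate(block_sizes, start=1):
        if n > MAX_BLOCK:
            problems.append(f"sequence alignment block {i}: {n} proteins (limit {MAX_BLOCK})")
    # Assumption: final alignment size estimated as the sum of block sizes.
    total = sum(block_sizes)
    if total > MAX_TOTAL:
        problems.append(f"estimated final alignment: {total} proteins (limit {MAX_TOTAL})")
    return problems
```

For example, a core of 20 proteins with 21 blocks of 500 proteins each passes the per-block check but exceeds the estimated total of 10000.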

The 24-hour quota for concurrent submissions per IP address

Currently at most 50 tasks submitted from one IP address can be queued, in progress, or stored after completion on the server within a 24-hour period. This limit was introduced to prevent a single user from monopolizing Mustguseal resources and to encourage users to delete their tasks upon completion. Deleting a task restores the quota, so if you delete your tasks after downloading the results, the number of tasks you can process from one IP address per day is unlimited.
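The quota rule can be modeled as a sliding 24-hour window over tasks that have not been deleted. This is a hypothetical model meant only to show why deleting tasks restores the quota; the class `IpQuota` and its methods are illustrative, and in particular the way stored tasks older than 24 hours are counted is an assumption, not a description of the server's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict

WINDOW_SECONDS = 24 * 3600  # 24-hour period
QUOTA = 50                  # tasks per IP address within the window


@dataclass
class IpQuota:
    """Hypothetical per-IP bookkeeping: task id -> submission time (seconds)."""
    tasks: Dict[str, float] = field(default_factory=dict)

    def active(self, now: float) -> int:
        # Assumption: only non-deleted tasks submitted within the last 24 hours count.
        return sum(1 for t in self.tasks.values() if now - t < WINDOW_SECONDS)

    def submit(self, task_id: str, now: float) -> bool:
        """Accept the task if the IP address is still under its quota."""
        if self.active(now) >= QUOTA:
            return False
        self.tasks[task_id] = now
        return True

    def delete(self, task_id: str) -> None:
        # Deleting a task frees its slot immediately.
        self.tasks.pop(task_id, None)
```

Under this model, a user who deletes each task after downloading its results never reaches 50 active tasks, which matches the statement that the daily number of processed tasks is then unlimited.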