Mustguseal Limitations Page





Mustguseal runs on a hybrid computer (CPU+GPU+SSD) and is freely available on the Internet as a public web server. It can construct multiple alignments of thousands of protein sequences and structures in Modes 1, 2, and 3. This page describes the measures we have implemented to increase the overall performance of the service and make Mustguseal available to as many people as possible.




The aim of this service is to automatically construct large multiple alignments of protein families by collecting all available information about their structures and sequences from public databases. Comparative analysis of homologous proteins within a large superfamily is a computationally complex and resource-demanding procedure based on the sequential, interconnected execution of various bioinformatic methods. In particular, the following three procedures put the heaviest load on the resources:
  • The structure similarity search. The PDB database currently contains more than 130 000 protein structures (more than 350 000 structures of individual chains), all of which have to be matched with the query in Mode 1. The Mustguseal implementation of the structure similarity search was designed to accelerate this step. The server formats and analyzes the PDB database to select in advance the most promising candidates for comparison with a particular query, based on its size and the number of amino acids forming the secondary structure elements. The results of each pairwise structure comparison are stored in a database managed by the PostgreSQL database management system and re-used in subsequent searches. Both databases (the PDB files and the pairwise structural alignments) are stored on an SSD to speed up read-write operations. When a new task is submitted with the same query (e.g., by the same user to refine the alignment with different parameters, or by a different user), the structure similarity search can take only a few seconds because the results are not re-computed but restored from the database. Consequently, no limitations are applied to this step.

  • Multiple structure alignment. Multiple alignment of protein 3D coordinates is a time- and memory-consuming task. Mustguseal currently implements a classical CPU version of the algorithm; a GPU implementation is under development. For now, the number of structures to be aligned in Mode 1 has to be limited. See the details below. To overcome these limitations you can submit your own core structural alignment in Mode 2 or Mode 3. Please note that Mode 2 and Mode 3 expect the sequence representation of the core structural alignment (i.e., the fasta sequence file, not the 3D coordinates). We recommend parMATT for building your own core structural alignment from a large collection of protein structures on your local computer or a supercomputer. parMATT is a parallel implementation of the popular MATT algorithm (Multiple Alignment with Translations and Twists) intended for distributed-memory systems (i.e., computing clusters and supercomputers hosting memory-independent computing nodes). parMATT can significantly accelerate the time-consuming process of building a large structural alignment.

  • Sequence similarity search. The Swiss-Prot and TrEMBL databases together contain more than 85 million protein sequences, all of which have to be matched with each representative protein from the core structural alignment in Mode 1 and Mode 2. To accelerate this computationally complex task we use a GPU implementation of the popular sequence similarity search algorithm. Even so, the number of searches in the largest database, TrEMBL, has to be limited per task. See the details below. To overcome these limitations you can submit your own sequence alignment blocks (one for each representative protein in the core structural alignment) in Mode 3.
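The re-use of pairwise comparison results described for the structure similarity search can be illustrated with a small cache. This is only a sketch under stated assumptions, not the Mustguseal implementation: Python's built-in sqlite3 stands in for PostgreSQL, the table layout and the names `open_cache` and `cached_compare` are hypothetical, and a single floating-point score stands in for a full pairwise structural alignment.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache. sqlite3 is a stand-in for PostgreSQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pairwise ("
        " query_id TEXT, target_id TEXT, score REAL,"
        " PRIMARY KEY (query_id, target_id))"
    )
    return conn


def cached_compare(conn, query_id, target_id, compare):
    """Return the stored result for a pair, computing it only on a cache miss.

    `compare` is any caller-supplied function (hypothetical) that performs
    the expensive pairwise structure comparison and returns a score.
    """
    row = conn.execute(
        "SELECT score FROM pairwise WHERE query_id = ? AND target_id = ?",
        (query_id, target_id),
    ).fetchone()
    if row is not None:
        return row[0]  # restored from the database, nothing is re-computed
    score = compare(query_id, target_id)
    conn.execute(
        "INSERT INTO pairwise VALUES (?, ?, ?)", (query_id, target_id, score)
    )
    conn.commit()
    return score
```

With such a cache, a second task that uses the same query only pays for database lookups, which is consistent with the repeated search taking a few seconds.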


The following limitations are currently in place to increase the overall performance of this service and make it available to as many people as possible:


Limitations in Mode 1

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 32 proteins | The first 32 proteins most similar to the query will be automatically selected
Swiss-Prot+TrEMBL | 16 proteins | The first 16 proteins most similar to the query will be automatically selected



In Mode 1 the proteins for the core structural alignment are selected automatically based on the results of the structure similarity search. At this step the Mustguseal protocol attempts to select a representative set of no more than 32 (Swiss-Prot) or 16 (Swiss-Prot+TrEMBL) protein structures by clustering their corresponding sequences at different pairwise similarity thresholds in the range from 40% to 95%. The selected representative structures are then aligned by structural superimposition. If even the smallest set of representative structures exceeds the limit, the first 32 or 16 proteins most similar to the query (depending on the selected sequence similarity search database) are selected and processed automatically. In this case a warning message appears in the log.
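The selection logic above can be sketched roughly as follows. This is an illustration under assumptions, not the actual Mustguseal protocol: the greedy clustering, the 5% step between thresholds, the function names, and the caller-supplied `identity` function are all hypothetical. The hits are assumed to be sorted by decreasing structural similarity to the query.

```python
def cluster_representatives(hits, identity, threshold):
    """Greedy clustering (an assumed stand-in for the real clustering step).

    A hit becomes a representative only if its sequence identity to every
    representative kept so far is below `threshold` (in percent).
    """
    reps = []
    for hit in hits:
        if all(identity(hit, rep) < threshold for rep in reps):
            reps.append(hit)
    return reps


def select_core(hits, identity, limit, thresholds=range(95, 35, -5)):
    """Pick at most `limit` structures for the core structural alignment.

    Thresholds are tried from 95% down to 40%; the first representative
    set that fits within the limit is returned. If no set fits, fall back
    to the first `limit` hits and return a warning for the log.
    """
    for threshold in thresholds:
        reps = cluster_representatives(hits, identity, threshold)
        if len(reps) <= limit:
            return reps, None
    warning = (
        f"No representative set fits within {limit} structures; "
        f"using the first {limit} hits most similar to the query."
    )
    return hits[:limit], warning
```

For Mode 1, `limit` would be 32 when Swiss-Prot is selected and 16 when Swiss-Prot+TrEMBL is selected.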

Limitations in Mode 2

Selected sequence similarity search database | Maximum size of the core structural alignment | What happens if you exceed the limit?
Swiss-Prot | 64 proteins | The task will be rejected
Swiss-Prot+TrEMBL | 32 proteins | The task will be rejected



In Mode 2 the core structural alignment is submitted by the user, and each protein in that alignment is used as a query for a sequence similarity search. Swiss-Prot scans are fast, so up to 64 can be executed per task; Swiss-Prot+TrEMBL scans are significantly slower, so their number is limited to 32 per task. If the user-submitted core structural alignment contains more proteins than the limit, the task is rejected.
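Before submitting in Mode 2, the size of a core structural alignment can be checked locally. The sketch below is a hypothetical helper, not part of Mustguseal; it assumes the alignment is a FASTA file in which every protein has exactly one header line starting with ">".

```python
# Mode 2 limits taken from the table above.
MODE2_LIMITS = {"Swiss-Prot": 64, "Swiss-Prot+TrEMBL": 32}


def count_fasta_records(fasta_text):
    """Count sequences by counting FASTA header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def check_mode2(fasta_text, database):
    """Return (fits, number_of_proteins, limit) for the chosen database."""
    n = count_fasta_records(fasta_text)
    limit = MODE2_LIMITS[database]
    return n <= limit, n, limit
```

An alignment of 33 proteins, for example, fits the Swiss-Prot limit but would be rejected with Swiss-Prot+TrEMBL.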

Limitations in Mode 3

Maximum size of the core structural alignment | Maximum size of each sequence alignment block | Maximum total size of the final alignment
150 proteins | 500 proteins | 10000 proteins



Submissions exceeding these thresholds will be rejected. These limitations in Mode 3 were introduced to prevent intentional abuse of the service. It seems impractical to build alignments larger than 10000 proteins, as they would almost certainly contain redundant information and would be computationally hard to analyze. However, if you need to build larger alignments for a particular purpose, you can contact the Support to receive a special access code. Please briefly state your name, position, institution and the purpose of your study.
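The Mode 3 thresholds can be collected into one pre-submission check. This is a hypothetical sketch: the function name and messages are illustrative, and `final_size` is passed in by the caller because this page does not specify how the server counts the proteins in the final alignment. The check that there is one block per core protein follows the Mode 3 description earlier on this page.

```python
MAX_CORE = 150      # proteins in the core structural alignment
MAX_BLOCK = 500     # proteins in each sequence alignment block
MAX_FINAL = 10000   # proteins in the final alignment


def check_mode3(core_size, block_sizes, final_size):
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if core_size > MAX_CORE:
        problems.append(f"core alignment has {core_size} proteins (max {MAX_CORE})")
    if len(block_sizes) != core_size:
        problems.append(
            f"expected {core_size} alignment blocks, got {len(block_sizes)}"
        )
    for number, size in enumerate(block_sizes, start=1):
        if size > MAX_BLOCK:
            problems.append(f"block {number} has {size} proteins (max {MAX_BLOCK})")
    if final_size > MAX_FINAL:
        problems.append(f"final alignment has {final_size} proteins (max {MAX_FINAL})")
    return problems
```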

The 24-hour quota per IP address

Currently, at most 50 tasks submitted from one IP address can be queued, in progress, or stored on the server after completion within a 24-hour period. This limitation was introduced to prevent a single user from monopolizing the Mustguseal resources and to encourage users to delete their tasks upon completion. Deleting a task restores the quota. Therefore, if you delete your tasks after downloading the results, the number of tasks you can process from one IP address in a day is not limited.
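One way to picture this quota is a per-IP counter of tasks that have not been deleted. The sketch below is an assumption about the behavior, not the server's code: the class `IpQuota` and its methods are hypothetical, and it assumes that a task stops counting either when it is deleted or 24 hours after submission.

```python
import time


class IpQuota:
    """Per-IP task quota: at most `max_tasks` undeleted tasks per window."""

    def __init__(self, max_tasks=50, window_seconds=24 * 3600):
        self.max_tasks = max_tasks
        self.window = window_seconds
        self._tasks = {}  # ip -> {task_id: submission time}

    def _active(self, ip, now):
        """Drop tasks older than the window and return the remaining ones."""
        tasks = self._tasks.setdefault(ip, {})
        expired = [tid for tid, ts in tasks.items() if now - ts >= self.window]
        for tid in expired:
            del tasks[tid]
        return tasks

    def submit(self, ip, task_id, now=None):
        """Register a task; return False if the quota is exhausted."""
        now = time.time() if now is None else now
        tasks = self._active(ip, now)
        if len(tasks) >= self.max_tasks:
            return False
        tasks[task_id] = now
        return True

    def delete(self, ip, task_id):
        """Deleting a task frees its slot in the quota."""
        self._tasks.get(ip, {}).pop(task_id, None)
```

Under this model a user who deletes each task after downloading the results never reaches the limit, which matches the behavior described above.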