parMATT

Parallel multiple alignment with translations and twists for distributed-memory systems




Protein structure is more conserved during evolution than sequence. The Protein Data Bank (PDB) currently holds more than 135,000 structures, and the number is growing. The availability of this information opens new opportunities for comparative bioinformatic analysis of remote evolutionary relatives that have lost sequence similarity during natural selection and specialization from a common ancestor but have preserved a common structural core. parMATT is a parallel implementation of the popular MATT algorithm (Multiple Alignment with Translations and Twists) intended for distributed-memory systems, i.e., computing clusters and supercomputers with memory-independent computing nodes. parMATT can significantly accelerate the time-consuming process of building a structural alignment from a large collection of 3D-models of homologous proteins.




Download parMATT


parMATT v. 1.0 [2017-00-00] download




Citing parMATT

If you find parMATT or its results useful, please cite our work:

Shegai M., et al. (2017) parMATT: Parallel multiple alignment with translations and twists for distributed-memory systems, in preparation




The parMATT's User Manual


The User Manual provides a detailed description of the parMATT program and its features, including installation and execution syntax.
The following chapters are included in the User Manual:

  • Prerequisites
  • Compilation
  • parMATT's options and variables
  • The parMATT input
  • Running parMATT
  • The parMATT output
  • The parMATT example
  • Collecting a set of 3D-models of homologous proteins
  • Implementation of parMATT in the laboratory practice
  • The parMATT license
  • Citing parMATT

Download PDF file (623 KB)




The parMATT: In a nutshell


A brief overview of parMATT is provided below. See the User Manual for a full description of parMATT's features and options.


Prerequisites


The advantage of parMATT over MATT is its ability to run on multiple nodes (multiple CPUs) of multiprocessor computer systems. Therefore, to achieve the best performance you should run parMATT on a computing cluster or a supercomputer, i.e., a powerful machine with multiple CPUs. If you intend to build a multiple structural alignment on a single desktop CPU with multiple computing cores, you can use parMATT or compile the original MATT sources with OpenMP support (see the installation instructions for the MATT program).

To compile the parMATT binary from the source code you need a Linux computer system with an MPI compiler (e.g., Intel MPI or Open MPI) and a C compiler (e.g., Intel icc or GNU gcc).
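
Before compiling, it can help to confirm that the required tools are on your PATH. The sketch below is a minimal check, assuming the MPI compiler wrapper is called mpicc and the MPI launcher is called mpirun (common names, but your installation may use others). The helper name have_tool is hypothetical and is not part of parMATT.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Tool names are assumptions; adjust them to your MPI and compiler installation
# (for example, replace gcc with icc if you use the Intel compiler).
for tool in mpicc mpirun gcc; do
    have_tool "$tool"
done
```

A "missing" line only means the command was not found under that name; on many clusters the MPI environment has to be loaded first (see your cluster's documentation).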




Running parMATT


Running parMATT on a computing cluster/supercomputer differs from running the original MATT on a local computer in two ways:

    (1) parMATT has to be launched as an MPI program by the appropriate MPI utility;
    (2) the '-t' parameter should be set equal to the number of physical cores in each CPU of your computing cluster/supercomputer.

Once you know the MPI launch command and the number of physical cores in your CPU model, running parMATT is as easy as running any other program on your local computer.
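
If you are unsure how many physical cores a machine has, the following sketch shows one way to count them on Linux. It assumes the lscpu utility (from util-linux) is installed; the helper name count_physical_cores is hypothetical. It counts unique (core, socket) pairs, so logical CPUs created by hyper-threading are not counted twice.

```shell
#!/bin/sh
# Hypothetical helper: count unique (core, socket) pairs read from stdin.
# Input is expected in the format printed by `lscpu -p=Core,Socket`:
# comment lines start with '#', data lines look like "0,0".
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Only query the hardware if lscpu is available on this machine.
if command -v lscpu >/dev/null 2>&1; then
    echo "physical cores: $(lscpu -p=Core,Socket | count_physical_cores)"
fi
```

On a cluster, run this on a compute node rather than on the login node, since the two may have different CPUs.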

Launch parMATT locally on 4 physical cores of a single desktop CPU:

/path/to/parMatt -t 4 -L input.list -o output

Please note that the advantage of parMATT over MATT is its ability to run on multiple nodes (multiple CPUs). Local execution on a single CPU should therefore be considered for evaluation purposes only.
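
The input.list file passed with '-L' in the commands in this section names the structures to align. The sketch below builds such a file from a directory of PDB files. It assumes the list contains one file path per line; check "The parMATT input" chapter of the User Manual for the exact format. The helper name make_input_list and the directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of every *.pdb file in a directory,
# one per line. The one-path-per-line layout is an assumption.
make_input_list() {
    for f in "$1"/*.pdb; do
        # Skip the unexpanded pattern when the directory has no .pdb files.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Example (the directory name is hypothetical):
# make_input_list ./structures > input.list
```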

Launch parMATT on 8 nodes (i.e., 8 CPUs), 14 physical cores on each node, using mpirun:

mpirun -np 8 /path/to/parMatt -t 14 -L input.list -o output
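
To adapt the command above to a different cluster, only the number of MPI processes and the '-t' value need to change. The sketch below prints the resulting command line instead of running it, together with the total number of cores the job would use. The helper name parmatt_cmd and the variables NODES and THREADS are hypothetical; the flags are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an mpirun command line for parMATT.
# Arguments: MPI processes, threads per process, list file, output name.
parmatt_cmd() {
    echo "mpirun -np $1 /path/to/parMatt -t $2 -L $3 -o $4"
}

NODES=8     # number of MPI processes, one per node as in the example above
THREADS=14  # physical cores per node
parmatt_cmd "$NODES" "$THREADS" input.list output
echo "total cores: $((NODES * THREADS))"
```

For the values above this prints the same command as the example and a total of 112 cores. How the 8 processes are placed on 8 separate nodes depends on your MPI installation and job scheduler, so consult your cluster's documentation.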




The parMATT output


The following files are produced by parMATT on successful completion:

  • the coordinate representation of a multiple structural alignment, i.e., a PDB file with aligned coordinates of all 3D-models from the input;
  • the sequence representation of a multiple structural alignment, i.e., a sequence alignment file in FASTA format;
  • a text file with a summary of the input PDBs (the pairwise comparison tree) and the output superimposition (number of residues in the core alignment, RMSD of the core alignment, the MATT raw score and the sequence representation of the alignment in the PHYLIP format);
  • a Rasmol script to highlight aligned residues.
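
A simple sanity check after a run is to count the sequences in the FASTA alignment and compare the count with the number of structures you supplied. The sketch below relies only on the FASTA convention that every record starts with a '>' header line. The helper name count_fasta_records is hypothetical, and the file name output.fasta in the example is an assumption about how the output is named.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example (the file name is an assumption):
# count_fasta_records output.fasta
```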




The parMATT example

The example is described in detail in the parMATT User Manual. This section provides download links to the input set and output data.

Download file 3.20.20.80_input.tar.gz with the input set.

Download file 3.20.20.80_output.tar.gz with the parMATT output for this example.




The parMATT license


parMATT is licensed under the GNU General Public License, version 2.0.




Contacts and support

Max Shegai





