easyAmber

A comprehensive toolbox to automate the molecular dynamics simulation of proteins




easyAmber is a comprehensive set of programs that automates the molecular dynamics routines implemented in the Amber package. The toolbox addresses a wide range of tasks in computational biology that require accounting for protein flexibility, and supports full-atom model building, optimization/equilibration of the molecular system, and classical/conventional as well as accelerated molecular dynamics simulations. easyAmber improves the usability of molecular dynamics for the routine processing of large volumes of data: it implements advanced MD protocols, yet is highly automated and easy to operate, making it accessible to a broad audience. The toolbox can be used on a personal desktop station equipped with a gaming GPU accelerator, and can also help to manage large workloads on a powerful supercomputer.

Molecular dynamics is a complex approach, and the general-purpose protocols implemented in our toolbox may require further customization to fit a particular research objective. We believe that any potential obstacles are outweighed by the overall simplicity of the new software, which promotes the value of molecular modeling among general biologists. We expect that easyAmber will contribute to the widespread adoption of molecular dynamics in daily laboratory routine, supporting the recent trend away from inefficient "static" approaches in protein science towards a deeper understanding of the dynamics of protein structures.

The software is freely available for download with no login requirement.








Publication and tutorial

The definitive guide to easyAmber is provided in the following publication:

Suplatov D., Sharapova Y., Švedas V. (2020) easyAmber: a comprehensive toolbox to automate the molecular dynamics simulation of proteins, submitted

Please refer to this publication for a step-by-step practical guide to easyAmber.










General information


easyAmber is a set of three master-scripts that automate the configuration and execution of the MD routines provided by the Amber package, plus several auxiliary scripts for data management. All programs are continuously updated to implement new functions and improve usability. The prerequisites and download links are provided below.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Prerequisites

The easyAmber toolbox is a supplement, not a replacement, for the Amber molecular modeling suite. The original Amber and AmberTools software packages have to be obtained and installed by the user prior to running easyAmber. The $AMBERHOME environment variable should point to the Amber+AmberTools installation folder; easyAmber probes it to validate the dependencies and binary files. The toolbox is compatible with Amber v.16+ and AmberTools v.17+ (older versions of AmberTools do not feature the FF15IPQ force-field). It is recommended to install the most recent Amber18 and AmberTools19 with GPU support.
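
For reference, a minimal shell check of the Amber installation could look as follows (the installation path below is only an example and should be adjusted to your system; pmemd.cuda is present only in GPU-enabled builds of Amber):

# Point $AMBERHOME at the Amber+AmberTools installation folder (example path)
export AMBERHOME=/usr/local/amber18
export PATH=$AMBERHOME/bin:$PATH
# Confirm that the key binaries are in place
ls $AMBERHOME/bin/tleap $AMBERHOME/bin/sander $AMBERHOME/bin/pmemd.cuda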

In addition, the following open-source or freeware software should be installed for full compatibility:





Download section

Title Role Description Version Date Link
autoligand.pl Master Assists the preparation of the Amber parameter files for a custom low-molecular-weight ligand 2.0 2019-12-15 [link]
automodel.pl Master Assists the preparation of the Amber parameter files for a full-atom molecular system of a protein or a protein-ligand complex in a cubic water box 2.0 2019-12-15 [link]
autorun.pl Master Accommodates the seven-step MD simulation pipeline: the initial optimization of the molecular system (i.e., three-step energy minimization with water relaxation), heating (in the NVT ensemble), equilibration (in the NPT ensemble), followed by the classical/conventional MD (in the NVT ensemble), and optionally concluded by the accelerated MD simulation (in the NVT ensemble) 2.0 2019-12-15 [link]
pdbwat2amber.sh Aux Renames crystallographic solvent molecules according to the Amber naming scheme 2.0 2019-12-15 [link]
catnc.sh Aux Concatenates the NetCDF trajectories of a crashed run into a single file 2.0 2019-12-15 [link]
stripWater.sh Aux Removes the solvent from the trajectories to reduce the file size and facilitate further expert analysis 2.0 2019-12-15 [link]
snapshot.sh Aux Converts the NetCDF binary trajectory into PDB snapshots 2.0 2019-12-15 [link]





Supplementary technical documentation

The definitive guide to easyAmber was published elsewhere (see "Publication and tutorial"). This online documentation is a supplement to the step-by-step practical guide provided in that paper.




Fidelity of the initial protein 3D-structure
Please note that easyAmber is a set of tools to assist the molecular dynamics simulation of a protein or protein-ligand complex that has to be provided by the user; easyAmber does not cover the construction of this initial molecular model. In most cases, the protein 3D-structures available in the PDB can be submitted to the MD directly. However, PDB entries sometimes miss flexible loop regions, contain ambiguous formatting (e.g., duplicate numbering of amino acid residues), etc. These minor errors have to be corrected using appropriate molecular modeling software (e.g., Modeller) prior to running easyAmber.
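
As a quick way to spot such problems, the chain identifiers and residue numbers can be listed directly from the PDB file before modeling (a rough shell sketch; input.pdb is a placeholder name, and the column positions assume the standard fixed-column PDB format):

# Print chain ID + residue number + insertion code (columns 22-27) for all ATOM records;
# gaps or duplicates in the numbering become easy to spot by eye
grep '^ATOM' input.pdb | cut -c22-27 | uniq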


Define new covalent bonds
Definitions of covalent bonds beyond the protein backbone can be loaded into the automodel.pl master-script using the -bonds= command-line flag.
In particular, disulfide bridges will not be created automatically even if the amino acid residues have already been renamed to CYX in the PDB. E.g., to create an S-S bond between cysteines 318 and 335, set the respective residue names to CYX in the PDB file, and create a plain text file entitled SS_BONDS.dat with the following content:


bond PROT.318.SG PROT.335.SG

Run the automodel.pl master-script with an additional command-line flag to load the new bond into the Leap utility:

automodel.pl input.pdb ... -bonds=SS_BONDS.dat

Please note that the residue numbering in Leap may differ from the original numbering in the PDB. To learn the new numbering, load the PDB into the Leap utility (i.e., manually start tleap in the command line, then run PROT = loadpdb file.pdb) and use the desc command (i.e., desc PROT).
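
The same check can also be scripted from the shell (a minimal sketch; the force-field leaprc and the file name are examples and should match your system):

# Feed the commands to tleap via standard input and inspect the renumbered residues in its output
tleap <<'EOF'
source leaprc.protein.ff14SB
PROT = loadpdb file.pdb
desc PROT
quit
EOF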


Define new atom types
Definitions of new atom types can be loaded into the automodel.pl master-script using the -atoms= command-line flag. E.g., to accommodate the AMBER parameters for phosphates introduced by Steinbrecher et al., create a plain text file entitled PO4_ATOMS.dat with the following content:

addAtomTypes {
{ "OP" "O" "sp3" }
{ "OQ" "O" "sp3" }
{ "OR" "O" "sp3" }
{ "OT" "O" "sp3" }
{ "OV" "O" "sp3" }
{ "OX" "O" "sp3" }
{ "OZ" "O" "sp3" }
}


Run the automodel.pl master-script with an additional command-line flag to load the new atom types into the Leap utility:

automodel.pl input.pdb ... -atoms=PO4_ATOMS.dat



Load auxiliary force-fields
Auxiliary force-fields can be loaded into the automodel.pl master-script using the -ffmisc= command-line flag. E.g., to simulate a protein in complex with a linear oligosaccharide, you would need to load a force-field for the protein (e.g., FF14SB) and the GLYCAM force-field for the ligand. First, edit the PDB file of the corresponding complex to ensure that the atom names and residue names of your low-molecular-weight compound comply with the GLYCAM nomenclature. Then, create a plain text file entitled load_GLYCAM.txt with the following content:

source leaprc.GLYCAM_06j-1

Finally, run the automodel.pl master-script with an additional command-line flag to load the GLYCAM force-field into the Leap utility:

automodel.pl input.pdb ... -ffmisc=load_GLYCAM.txt



The autorun.pl protocol fails at step #2
In some cases, step #2 can fail with the following error:

ERROR: Calculation halted.
Periodic box dimensions have changed too much from their initial values.


In general, this does not imply any specific problem with your protein system, but rather reflects the inability of the GPU code (which is used by default) to automatically reorganize grid cells. A simple workaround is to restart just this step #2 on a CPU build of the software. Use either of the two commands below to run pmemd.MPI or sander.MPI (the results will be qualitatively the same, but pmemd.MPI is faster):

autorun.pl complex_amber pmemdmpi=true uselocalmpi=true
autorun.pl complex_amber sandermpi=true uselocalmpi=true cpu=4


Wait for step #2 to complete, then press Ctrl-C to terminate, and finally restart on the GPU using the regular command:

autorun.pl complex_amber nofree=true

This error should occur only once, i.e., at step #2. If it emerges again at a later stage, it could indicate a more significant problem with your protein model.


Running autorun.pl on a supercomputer
The autorun.pl utility can be used to execute long-term MD simulations via the Slurm Workload Manager on a computing cluster or a supercomputer. This is particularly useful for the "production runs" of the conventional MD (step #6) and the accelerated MD (step #7).

Each supercomputer system has its own unique hardware configuration. Below we provide several scenarios that accommodate the most common designs. If your configuration is not covered below, don't hesitate to send us a message with the details of the desired system configuration, and we will implement a corresponding solution in the autorun.pl master-script for everyone's benefit. You could also edit the autorun.pl master-script yourself to change the command used to submit jobs to the Slurm manager.

Create a new task to run step #6 on a single GPU card of a single node in the queue compute:

autorun.pl dhfr_amber freeonly=true slurm=true gpunodes=1 queue=compute maxtime=3000 script=run

The following command will be used to submit the job to the Slurm workload manager:

sbatch -N 1 -p compute -t 3000 run /usr/local/amber18/bin/pmemd.cuda -i dhfr_amber.step6_free.conf -p dhfr_amber.prmtop -c dhfr_amber.step6_free.rst_bkp6 -o dhfr_amber.step6_free.out -r dhfr_amber.step6_free.rst -inf dhfr_amber.step6_free.mdinfo -x dhfr_amber.step6_free.nc

To add a queue name that is supported by your computing cluster, open the script in a text editor, search for the "%cpu_per_node" hashtable at the top of the code, and add your queue by specifying the number of physical CPU cores per node, e.g.:

#Setup allowed cluster queues and number of CPUs per node
my %cpu_per_node = ("yourqueue" => 12, "compute" => 14);



Create a new task to run step #6 on four nodes with a single GPU per node in the queue compute:

autorun.pl dhfr_amber freeonly=true slurm=true gpunodes=4 queue=compute maxtime=3000 script=ompi

The following command will be used to submit the job to the Slurm workload manager:

sbatch -N 4 -p compute -t 3000 ompi /usr/local/amber18/bin/pmemd.cuda.MPI -i dhfr_amber.step6_free.conf -p dhfr_amber.prmtop -c dhfr_amber.step6_free.rst_bkp6 -o dhfr_amber.step6_free.out -r dhfr_amber.step6_free.rst -inf dhfr_amber.step6_free.mdinfo -x dhfr_amber.step6_free.nc

To add a queue name that is supported by your computing cluster, open the script in a text editor, search for the "%cpu_per_node" hashtable at the top of the code, and add your queue by specifying the number of physical CPU cores per node, e.g.:

#Setup allowed cluster queues and number of CPUs per node
my %cpu_per_node = ("yourqueue" => 12, "compute" => 14);



Create a new task to run step #6 on two GPU cards of a single node in the queue pascal:

autorun.pl dhfr_amber freeonly=true slurm=true gpunodes=1 queue=pascal maxtime=3000 script=ompi

The following command will be used to submit the job to the Slurm workload manager:

export CUDA_VISIBLE_DEVICES=0,1; sbatch -N 1 -n 2 -p pascal -t 3000 ompi /usr/local/amber18/bin/pmemd.cuda.MPI -i dhfr_amber.step6_free.conf -p dhfr_amber.prmtop -c dhfr_amber.step6_free.rst_bkp6 -o dhfr_amber.step6_free.out -r dhfr_amber.step6_free.rst -inf dhfr_amber.step6_free.mdinfo -x dhfr_amber.step6_free.nc

To add a queue name that is supported by your computing cluster with two GPU cards per node, open the script in a text editor, search for the "%cpu_per_node" hashtable at the top of the code, and add your queue by specifying the number of physical CPU cores per node, e.g.:

#Setup allowed cluster queues and number of CPUs per node
my %cpu_per_node = ("yourqueue" => 12, "pascal" => 12);


Then search for the "%gpu_per_node" hashtable right below the previous one, and add your queue by specifying the number of GPUs per node, e.g.:

#Setup cluster queues with MULTIPLE Graphical processing units
# DO NOT APPEND QUEUE WITH SINGLE-GPU NODES !!!
my %gpu_per_node = ("pascal" => 2, "yourqueue" => 2);



Create a new task to run step #6 on two nodes with four GPU cards per node in the custom queue yourqueue:
First, add a new queue name that is supported by your cluster. Open the script in a text editor, search for the "%cpu_per_node" hashtable at the top of the code, and add your queue by specifying the number of physical CPU cores per node, e.g.:

#Setup allowed cluster queues and number of CPUs per node
my %cpu_per_node = ("yourqueue" => 12, "pascal" => 12);


Second, if your queue contains more than one GPU per node, search for the "%gpu_per_node" hashtable right below the previous one, and add your queue by specifying the number of GPUs per node, e.g.:

#Setup cluster queues with MULTIPLE Graphical processing units
# DO NOT APPEND QUEUE WITH SINGLE-GPU NODES !!!
my %gpu_per_node = ("pascal" => 2, "yourqueue" => 4);


Finally, execute the command:

autorun.pl dhfr_amber freeonly=true slurm=true gpunodes=2 queue=yourqueue maxtime=3000 script=ompi

The following command will be used to submit the job to the Slurm workload manager:

export CUDA_VISIBLE_DEVICES=0,1,2,3; sbatch -N 2 -n 8 -p yourqueue -t 3000 ompi /usr/local/amber18/bin/pmemd.cuda.MPI -i dhfr_amber.step6_free.conf -p dhfr_amber.prmtop -c dhfr_amber.step6_free.rst_bkp6 -o dhfr_amber.step6_free.out -r dhfr_amber.step6_free.rst -inf dhfr_amber.step6_free.mdinfo -x dhfr_amber.step6_free.nc


Create a new task to run step #6 on two nodes, each equipped with a single 14-core CPU, in the queue compute:

autorun.pl dhfr_amber freeonly=true slurm=true clustercpu=28 queue=compute maxtime=3000 script=ompi

The following command will be used to submit the job to the Slurm workload manager:

sbatch -N 2 --ntasks-per-node=14 -p compute -t 3000 ompi /usr/local/amber18/bin/pmemd.MPI -i dhfr_amber.step6_free.conf -p dhfr_amber.prmtop -c dhfr_amber.step6_free.rst_bkp6 -o dhfr_amber.step6_free.out -r dhfr_amber.step6_free.rst -inf dhfr_amber.step6_free.mdinfo -x dhfr_amber.step6_free.nc

To change the name of the queue or the number of CPU cores per node, open the script in a text editor, search for the "%cpu_per_node" hashtable at the top of the code, and add your queue by specifying the number of physical CPU cores per node, e.g.:

#Setup allowed cluster queues and number of CPUs per node
my %cpu_per_node = ("yourqueue" => 12, "pascal" => 12);



The following Slurm launch scripts are supported:

autorun.pl dhfr_amber script=ompi ...
autorun.pl dhfr_amber script=impi ...
autorun.pl dhfr_amber script=run ...


To set the maximum run time for your task to 10000 minutes, use the dedicated flag:

autorun.pl dhfr_amber maxtime=10000 ...


To print the launch command to the run.sh script without actually executing it (i.e., useful for manual launch), use the dedicated flag:

autorun.pl dhfr_amber dummy=true ...
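
Assuming run.sh then contains the complete launch command shown in the examples above, it can be inspected and, if desired, executed by hand:

# Review the generated launch command, then run it manually
# (a sketch; assumes run.sh is a plain shell command file)
cat run.sh
sh run.sh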

To run step #7 (accelerated MD) instead of step #6 (conventional MD):

autorun.pl dhfr_amber amdonly=true ...





Acknowledgments

This work was supported by the Russian Foundation for Basic Research
grant #19-04-01297.



