UNRES web server
Laboratory of Molecular Modeling
Faculty of Chemistry
University of Gdansk
Wita Stwosza 63
80-308 Gdansk, Poland
Scheraga Group
Baker Laboratory of Chemistry
and Chemical Biology
Cornell University
Ithaca, NY 14853-1301, USA
December 4, 2014
The program incorporates the hierarchical-clustering subroutine hc.f, written by G. Murtagh (refs 1 and 2). The subroutine provides seven methods of hierarchical clustering.
The program runs cluster analysis of UNRES simulation results. There are two versions of the program, depending on the origin of the input conformations:
The source code of this version is deposited in clust-unres/src
The version developed for oligomeric proteins treats the whole system as a single chain with dummy residues inserted. It also works for single chains but has not been fully checked, so the single-chain version is recommended for single-chain proteins.
It is recommended to use CMake to install the whole package; please see the Installation Guide.
Customize the Makefile to your system. See section 7 of the description of UNRES for the compiler flags that are used to create executables for a particular force field. Several Makefiles are already prepared for various systems and force fields.
Run make in the source directory of the appropriate version. CLUST-UNRES runs only in single-processor mode and CLUST-WHAM runs in both serial and parallel mode [only the conformation-distance (rmsd) calculations are parallelized]. The parallel version uses MPI.
The program requires a parallel system to run. Depending on the system, either the wham.csh C-shell script (in WHAM/bin directory) can be started using mpirun or the binary in the C-shell script must be executed through mpirun. See the wham.csh C-shell script and section 6 for the files processed by the program.
The C-shell script wham.csh is used to run the program (see the bin/WHAM directory). The data files that the script needs are mostly the same as for UNRES (see section 6 of UNRES description). In addition, the environment variable CONTFUN specifies the method used to assess whether two side chains are in contact; if CONTFUN=GB, the criterion defined by eq 8 of ref 4 is used. Also, the parameter files set in the C-shell script are overridden if data from Hamiltonian MREMD are processed; in that case, the parameter files are defined in the main input file.
The main input file must have the inp extension. If it is INPUT.inp, the output files are as follows:
Coordinate input file COORD.ext, where ext denotes file extension in one of the following formats:
This file has the same structure as the UNRES input file; most of the data are input in a keyword-based form (see section 7.1 of UNRES description). The data are grouped into records, referred to as lines. Each record, except for the records that are input in non-keyword based form, can be continued by placing an ampersand (&) in column 80. Such a format is referred to as the data list format.
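The continuation rule of the data list format can be illustrated with a minimal Python sketch. It assumes that a record is continued exactly when column 80 holds an ampersand and that the continued text is joined with a single blank; the function name and the keywords in the sample are hypothetical and are not taken from the program.

```python
def join_records(lines, width=80):
    """Merge physical lines into logical records (illustrative sketch).

    A line counts as continued when the character in column `width`
    (1-based) is an ampersand; the ampersand itself is dropped.
    """
    records, current = [], ""
    for raw in lines:
        line = raw.rstrip("\n")
        if len(line) >= width and line[width - 1] == "&":
            # Keep everything before column 80 and wait for the next line.
            current += line[: width - 1].rstrip() + " "
        else:
            records.append((current + line.strip()).strip())
            current = ""
    if current:  # input ended on a continued line
        records.append(current.strip())
    return records


# Hypothetical keywords; the ampersand is placed in column 80.
first = "KEYWORD_A=1 KEYWORD_B".ljust(79) + "&"
records = join_records([first, "KEYWORD_C=2", "KEYWORD_D"])
print(records)
```

Records that are input in non-keyword form (for example the sequence) would have to be excluded from this treatment before calling such a helper.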
In the following description, the default values are given in parentheses.
An 80-character string from the first line is input.
(Data list format.)
MINTREE can be specified instead of IOPT=1, and MINVAR instead of IOPT=2.
Amino-acid sequence
3-letter code: Sequence is input in format 20(1X,A3)
1-letter code: Sequence is input in format 80A1
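The two sequence layouts can be produced with a short Python sketch. It only mimics the Fortran formats 20(1X,A3) and 80A1 quoted above; the function names are hypothetical and the helper is not part of the package.

```python
def format_three_letter(residues, per_line=20):
    """Lay out 3-letter residue codes as in Fortran format 20(1X,A3)."""
    lines = []
    for start in range(0, len(residues), per_line):
        chunk = residues[start:start + per_line]
        # Each residue takes one blank followed by a 3-character name.
        lines.append("".join(" " + name.ljust(3)[:3] for name in chunk))
    return lines


def format_one_letter(sequence, per_line=80):
    """Lay out a 1-letter sequence as in Fortran format 80A1."""
    return [sequence[i:i + per_line] for i in range(0, len(sequence), per_line)]


three = format_three_letter(["GLY", "HIS", "VAL"])
one = format_one_letter("G" * 100)
print(three)
print([len(line) for line in one])
```

With 20 residues per line, a full 3-letter line is exactly 80 characters wide, which matches the record length used elsewhere in the input.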
This is the information about dihedral-angle restraints, if any are present. It is specified only when WITH_DIHED_CONSTR is present in the first record.
1st line: ndih_constr - number of restraints (free format)
2nd line: ftors - force constant (free format)
Each of the following ndih_constr lines:
idih_constr(i),phi0(i),drange(i) (free format)
1st line: NS, (ISS(I),I=1,NS) (free format)
2nd line: NSS, (IHPB(I),JHPB(I),I=1,NSS) (free format)
Because the input is in free format, each line can be split.
If PDBREF is specified, filename with reference (experimental) structure, otherwise UNRES internal coordinates as the theta, gamma, alpha, and beta angles.
The main output file (named INPUT_clust.out, or INPUT_clust.out_000 for parallel runs) contains the results of clustering: numbers of families at different cut-off values, probabilities of clusters, composition of families, and rmsd values corresponding to families (0 if rmsd was not computed or read from the WHAM-generated cx file).
The output files corresponding to non-master processors (INPUT_clust.out_xxx, where xxx > 0) contain only the information up to the clustering protocol. These files can be deleted right after the run.
Excerpts from a sample output file are given below:
CLUST-UNRES:
THERE ARE 20 FAMILIES OF CONFORMATIONS
FAMILY 1 CONTAINS 2 CONFORMATION(S):
42 -2.9384E+03 50 -2.9134E+03
Max. distance in the family: 14.0; average distance in the family: 14.0
FAMILY 2 CONTAINS 3 CONFORMATION(S):
13 -2.9342E+03 7 -2.8827E+03 10 -2.8682E+03
CLUST-WHAM:
AT CUTOFF: 200.00000
Maximum distance found: 137.82
Free energies and probabilities of clusters at 325.0 K
clust efree prob sumprob
1 -76.5 0.25035 0.25035
2 -76.5 0.24449 0.49484
3 -76.4 0.21645 0.71129
4 -76.4 0.20045 0.91174
5 -75.8 0.08826 1.00000
THERE ARE 5 FAMILIES OF CONFORMATIONS
FAMILY 1 WITH TOTAL FREE ENERGY -7.65228E+01 CONTAINS 548 CONFORMATION(S):
8363 -7.332E+013939 -7.332E+012583 -7.332E+017395 -7.332E+019932 -7.332E+01
5816 -7.332E+013096 -7.332E+012663 -7.332E+014099 -7.332E+016822 -7.332E+01
3176 -7.332E+017542 -7.332E+018933 -7.332E+017315 -7.332E+01 200 -7.332E+01
. .
5637 -7.062E+018060 -7.061E+013797 -7.060E+018800 -7.057E+016295 -7.057E+01
6298 -7.057E+012332 -7.057E+012709 -7.057E+01
Max. distance in the family: 16.5; average distance in the family: 8.8
Average RMSD 8.22 A
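The cluster table in the CLUST-WHAM excerpt can be read back for further analysis. The Python sketch below is a hedged illustration: it assumes the table starts after a header line with the words clust, efree, prob and sumprob, and that its columns are separated by whitespace; the function name is hypothetical. The sample text repeats the values of the excerpt above.

```python
import re

SAMPLE = """\
Free energies and probabilities of clusters at 325.0 K
clust efree prob sumprob
1 -76.5 0.25035 0.25035
2 -76.5 0.24449 0.49484
3 -76.4 0.21645 0.71129
4 -76.4 0.20045 0.91174
5 -75.8 0.08826 1.00000
THERE ARE 5 FAMILIES OF CONFORMATIONS
"""

# cluster number, free energy, probability, cumulative probability
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d\.\d+)\s+(\d\.\d+)\s*$")


def parse_cluster_table(text):
    """Return (clust, efree, prob, sumprob) tuples from the first table found."""
    rows, in_table = [], False
    for line in text.splitlines():
        if line.split() == ["clust", "efree", "prob", "sumprob"]:
            in_table = True
            continue
        if in_table:
            match = ROW.match(line)
            if match is None:  # first non-table line ends the table
                break
            rows.append((int(match[1]), float(match[2]),
                         float(match[3]), float(match[4])))
    return rows


rows = parse_cluster_table(SAMPLE)
running = 0.0
for clust, efree, prob, sumprob in rows:
    running += prob
    # The sumprob column should be the running sum of prob.
    assert abs(running - sumprob) < 1e-4
print(len(rows), "clusters parsed")
```

The consistency check inside the loop is a cheap way to detect a misread column before the probabilities are used elsewhere.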
The file with name COORD_clust.int contains the angles theta, gamma, alpha, and beta of all residues of the leaders (lowest UNRES energy conformations from consecutive families for CLUST-UNRES runs and lowest free energy conformations for CLUST-WHAM runs). The format is the same as that of the file output by UNRES; see section 9.1.1 of UNRES description.
For CLUST-WHAM runs, the first line contains more items:
Number of the family | (format i5)
UNRES free energy of the conformation | (format f12.3)
Free energy of the entire family | (format f12.3)
Number of disulfide bonds | (format i2)
List of disulfide-bonded pairs | (format 2i3)
Conformation class number (0 if not provided) | (format i10)
The file with name COORD_clust.x contains the Cartesian coordinates of the alpha-carbon atoms and side-chain centers. The coordinate format is as in section 9.1.2 of UNRES description and the first line contains the following items:
Number of the family | (format i5)
UNRES free energy of the conformation | (format f12.3)
Free energy of the entire family | (format f12.3)
Number of disulfide bonds | (format i2)
List of disulfide-bonded pairs | (format 2i3)
Conformation class number (0 if not provided) | (format i10)
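A first line with the items listed above can be decoded by slicing fixed-width fields. The Python sketch below assumes that the items follow one another on a single line in the listed order with no separators and that the 2i3 field is repeated once per disulfide bond; this layout is inferred from the listed formats and has not been checked against the program source. The function name and the numbers in the sample line are made up for illustration.

```python
def parse_first_line(line):
    """Decode the first line of a conformation entry (assumed layout)."""
    pos = 0

    def take(width):
        # Return the next `width` characters and advance the cursor.
        nonlocal pos
        field = line[pos:pos + width]
        pos += width
        return field

    family = int(take(5))                                 # i5
    energy = float(take(12))                              # f12.3
    family_free_energy = float(take(12))                  # f12.3
    nss = int(take(2))                                    # i2
    pairs = [(int(take(3)), int(take(3))) for _ in range(nss)]  # 2i3 each
    tail = take(10).strip()                               # i10
    return {
        "family": family,
        "energy": energy,
        "family_free_energy": family_free_energy,
        "disulfides": pairs,
        "class": int(tail) if tail else 0,
    }


# Made-up sample line written with the same field widths.
sample = f"{1:5d}{-73.324:12.3f}{-76.523:12.3f}{1:2d}{5:3d}{20:3d}{0:10d}"
header = parse_first_line(sample)
print(header)
```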
The PDB files are in standard format (see ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf). The ATOM records contain Cα coordinates (CA) or UNRES side-chain-center coordinates (CB). For oligomeric proteins, chain identifiers are present (A, B, ..., etc.) and each chain ends with a TER record. A file contains the coordinates of a single conformation or of multiple conformations. The header (REMARK) records and the contents depend on the cluster run type. The next subsections are devoted to the different run types.
The files contain the members of those families whose lowest-energy conformation is no more than ECUT kcal/mol higher in energy than the lowest-energy conformation overall. Within a family, only those conformations are output whose energy is within ECUT kcal/mol above that of the lowest-energy member of the family. Families, and the members within each family, are ranked by increasing energy. The file names are:
COORD_xxxx.pdb where xxxx is the number of the family, if the family contains only one member or if only one member is output.
COORD_xxxx_yyy.pdb where xxxx is the number of the family and yyy is the number of the member of this family.
An example is the following:
REMARK R0001 ENERGY -2.93843E+03
ATOM 1 CA GLY 1 0.000 0.000 0.000
ATOM 2 CA HIS 2 3.800 0.000 0.000
ATOM 3 CB HIS 2 5.113 1.656 0.015
ATOM 4 CA VAL 3 5.927 -3.149 0.000
. . .
ATOM 346 CB GLU 183 -43.669 -32.853 -7.320
TER
CONECT 1 2
CONECT 2 4 3
. . .
CONECT 341 343 342
CONECT 343 344
CONECT 345 346
where ENERGY is the UNRES energy. The CONECT records define the Cα-Cα and Cα-SC connections.
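The CA/CB convention of these files can be handled with a small reader. The Python sketch below assumes that the ATOM records follow the standard PDB column positions (atom name in columns 13-16, residue name in 18-20, chain identifier in 22, residue number in 23-26, coordinates in 31-54); the excerpt in this manual is shown with collapsed spacing, so the sample lines are regenerated in that layout. The function names are hypothetical.

```python
import math


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build an ATOM record in standard PDB columns (for the sample only)."""
    return (f"ATOM  {serial:5d} {name:^4s} {resname:3s} {chain:1s}{resseq:4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_unres_pdb(lines):
    """Return a list of chains; each chain is a list of atom dictionaries."""
    chains, current = [], []
    for line in lines:
        record = line[:6].strip()
        if record == "ATOM":
            current.append({
                "name": line[12:16].strip(),      # CA or CB (side-chain center)
                "resname": line[17:20].strip(),
                "chain": line[21].strip(),
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]),
                        float(line[46:54])),
            })
        elif record == "TER" and current:
            # TER closes a chain; a TER with no preceding atoms is ignored.
            chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains


sample = [
    "REMARK R0001 ENERGY -2.93843E+03",
    atom_line(1, "CA", "GLY", " ", 1, 0.000, 0.000, 0.000),
    atom_line(2, "CA", "HIS", " ", 2, 3.800, 0.000, 0.000),
    atom_line(3, "CB", "HIS", " ", 2, 5.113, 1.656, 0.015),
    "TER",
    "CONECT    1    2",
    "CONECT    2    3",
]
chains = parse_unres_pdb(sample)
ca = [atom["xyz"] for atom in chains[0] if atom["name"] == "CA"]
ca_distance = math.dist(ca[0], ca[1])
print(len(chains), "chain(s); first CA-CA distance:", ca_distance)
```

Keeping CA and CB entries in one list per chain preserves the file order, so the side-chain center of a residue directly follows its Cα entry.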
The program generates a file for each family with its members and a summary file with ensemble-averaged conformations for all families. These are described in the two next sections.
For each family, the file name is COORD_TxxxK_yyyy.pdb, where yyyy is the number of the family and xxx is the integer part of the temperature (K). The first REMARK line in the file gives the free energy and average rmsd of the entire cluster; for each conformation, the initial REMARK line gives the energy and rmsd of that conformation. The same applies to oligomeric proteins, for which the TER records separate the chains and the ENDMDL record separates conformations. An example is given below.
REMARK CLUSTER 1 FREE ENERGY -7.65228E+01 AVE RMSD 8.22
REMARK 1BDD L18G full clust ENERGY -7.33241E+01 RMS 10.40
ATOM 1 CA VAL 1 18.059 -33.585 4.616 1.00 5.00
ATOM 2 CB VAL 1 18.720 -32.797 3.592 1.00 5.00
. . .
ATOM 115 CA LYS 58 29.641 -44.596 -8.159 1.00 5.00
ATOM 116 CB LYS 58 27.593 -45.927 -8.930 1.00 5.00
TER
CONECT 1 3 2
CONECT 3 5 4
. .
CONECT 113 114
CONECT 115 116
TER
REMARK 1BDD L18G full clust ENERGY -7.33240E+01 RMS 10.04
ATOM 1 CA VAL 1 3.174 2.833 -34.386 1.00 5.00
ATOM 2 CB VAL 1 3.887 2.811 -33.168 1.00 5.00
. .
ATOM 115 CA LYS 58 16.682 6.695 -20.438 1.00 5.00
ATOM 116 CB LYS 58 18.925 5.540 -20.776 1.00 5.00
TER
CONECT 1 3 2
CONECT 3 5 4
CONECT 113 114
CONECT 115 116
TER
The file name is COORD_T_xxxK_ave.pdb. The entries are in pairs; the first is the cluster-averaged conformation and the second is the family member that has the lowest rmsd from this average conformation. The computation of average conformations is explained in section 2.5 of ref 3. Example excerpts from an entry corresponding to a given family are shown below.
REMAR AVERAGE CONFORMATIONS AT TEMPERATURE 300.00
REMARK CLUSTER 1
REMARK 2HEP clustering 300K ENERGY -8.22572E+01 RMS 3.29
ATOM 1 CA MET 1 -17.748 48.148 -19.284 1.00 5.96
ATOM 2 CB MET 1 -17.373 47.911 -19.294 1.00 6.34
ATOM 3 CA ILE 2 -18.770 49.138 -18.133 1.00 3.98
. . .
ATOM 80 CB PHE 41 -14.353 44.680 -15.642 1.00 2.62
ATOM 81 CA ARG 42 -11.619 41.645 -13.117 1.00 4.06
ATOM 82 CB ARG 42 -11.330 40.378 -13.313 1.00 5.19
TER
CONECT 1 3 2
CONECT 3 5 4
. . .
CONECT 76 78 77
CONECT 78 79
CONECT 79 80
CONECT 81 82
TER
REMARK 2HEP clustering 300K ENERGY -8.22572E+01 RMS 3.29
ATOM 1 CA MET 1 -37.698 40.489 -32.408 1.00 5.96
ATOM 2 CB MET 1 -38.477 39.426 -34.159 1.00 6.34
. . .
ATOM 80 CB PHE 41 -35.345 50.342 -31.371 1.00 2.62
ATOM 81 CA ARG 42 -33.603 54.332 -27.130 1.00 4.06
ATOM 82 CB ARG 42 -33.832 53.074 -24.415 1.00 5.19
TER
CONECT 1 3 2
CONECT 3 5 4
. . .
CONECT 76 78 77
CONECT 78 79
CONECT 79 80
CONECT 81 82
TER
The file name is INPUT_clust.rms. It contains the upper-diagonal part of the matrix of rmsds between conformations and differences between their energies:
i,j,rmsd,energy(j)-energy(i) (format 2i5,2f10.5)
where i and j (j > i) are the numbers of the conformations, rmsd is the rmsd between conformations i and j, and energy(i) and energy(j) are the UNRES energies of conformations i and j, respectively.
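Records in the format 2i5,2f10.5 can be read by fixed-width slicing and expanded into a full symmetric rmsd matrix. The Python sketch below assumes that conformations are numbered consecutively from 1; the function names are hypothetical and the three sample records contain made-up values written with the same field widths.

```python
def read_rms(lines):
    """Parse records of the form i, j, rmsd, energy(j)-energy(i)."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        i = int(line[0:5])            # i5
        j = int(line[5:10])           # i5
        rmsd = float(line[10:20])     # f10.5
        de = float(line[20:30])       # f10.5
        entries.append((i, j, rmsd, de))
    return entries


def to_symmetric_matrix(entries):
    """Expand the upper-triangular records into a full n x n rmsd matrix."""
    n = max(j for _, j, _, _ in entries)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, rmsd, _ in entries:
        matrix[i - 1][j - 1] = rmsd
        matrix[j - 1][i - 1] = rmsd   # rmsd is symmetric
    return matrix


# Made-up values for three conformations.
sample = [
    f"{1:5d}{2:5d}{3.25:10.5f}{-1.5:10.5f}",
    f"{1:5d}{3:5d}{7.5:10.5f}{2.0:10.5f}",
    f"{2:5d}{3:5d}{5.125:10.5f}{3.5:10.5f}",
]
entries = read_rms(sample)
matrix = to_symmetric_matrix(entries)
print(matrix)
```

The energy differences are antisymmetric rather than symmetric, so they are kept in the entry list and not mirrored into the matrix.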
This file contains the PicTeX code of the clustering tree. The file name is INPUT_clust.tex. It should be supplemented with a LaTeX preamble and final commands, or incorporated into a LaTeX source, and compiled with LaTeX. The picture is produced by running LaTeX followed by dvips, dvipdf or another command that converts LaTeX-generated dvi files into human-readable files.
Dr. Adam Liwo
Faculty of Chemistry, University of Gdansk
ul. Wita Stwosza 63, 80-308 Gdansk, Poland
phone: +48 58 523 5124
fax: +48 58 523 5012
e-mail: adam(at)sun1.chem.univ.gda.pl
Dr. Cezary Czaplewski
Faculty of Chemistry, University of Gdansk
ul. Wita Stwosza 63, 80-308 Gdansk, Poland
phone: +48 58 523 5126
fax: +48 58 523 5012
e-mail: cezary.czaplewski(at)ug.edu.pl
Prepared by Adam Liwo, 02/19/12
LaTeX version, 09/28/12
Revised by Adam Liwo, 12/04/14
This manual is also available in PDF format.