User Guide to PIPSA 3.0
(Protein Interaction Property Similarity Analysis)

The PIPSA similarity analysis procedure consists of several steps:

(0) preparation step - making a directory for similarity calculations and arranging pdb files there
(1) calculating protein interaction field grid
(2) calculating similarity matrix from pdb files and protein interaction field grids
(2a) adding additional protein(s) to an already processed set (without repeating previously done pair-wise similarity calculations)
(3) phylogenetic tree analysis or other visualisation
(4) correlate kinetic parameters with average interaction field differences

The PIPSA programs can be run using scripts provided in the scr/ subdirectory of the PIPSA distribution.
A combination of these scripts allows similarity analysis of proteins starting from a set of their pdb files.
The scripts can be used in meta-scripts to run all calculations at once; they also can be modified to meet user requirements.
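
As a rough illustration of such a meta-script, the sketch below lists, in order, the commands for steps 1 and 2 along the UHBD route, using the script names and parameters documented in the Scripts section. Python is used only for illustration. The function name plan_uhbd_run and the example paths are hypothetical, and calling the scripts by their path inside scr/ is an assumption. Nothing is executed: each entry is an argument list that could be handed to subprocess.run.

```python
from pathlib import PurePosixPath


def plan_uhbd_run(pipsa_distr_dir, pipsa_wrk_dir, uhbd_executable):
    """Return the ordered commands for steps 1 and 2 via UHBD (not executed).

    Assumes UHBD readable pdb files and the list "pdbnames" are already in
    pipsa_wrk_dir/pdbs/, as required by do_pipsa_UHBD_prep.
    """
    distr = PurePosixPath(pipsa_distr_dir)
    wrk = PurePosixPath(pipsa_wrk_dir)
    scr = distr / "scr"       # scripts are provided in scr/ of the distribution
    uhbd_dir = wrk / "uhbd"   # directory created by do_pipsa_UHBD_prep
    return [
        # step 1: prepare, check parameter assignment, compute grids
        [str(scr / "do_pipsa_UHBD_prep"), str(distr), str(wrk)],
        [str(scr / "do_pipsa_UHBD_chk"), str(distr), str(uhbd_dir), uhbd_executable],
        [str(scr / "do_pipsa_UHBD_grids"), str(distr), str(uhbd_dir), uhbd_executable],
        # step 2: similarity matrix on the grids in uhbd/, pdb files in uhbd/../pdbs/
        [str(scr / "do_pipsa_sim"), str(distr), str(uhbd_dir)],
    ]


if __name__ == "__main__":
    for cmd in plan_uhbd_run("/opt/pipsa", "/data/run1", "/usr/local/bin/uhbd"):
        print(" ".join(cmd))
```

Keeping the plan separate from its execution makes it easy to inspect the command order before any long calculation is started.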

The scripts are described in the next section, with numbers (e.g. 1+2) referring to the steps of the PIPSA procedure given above.

Scripts

Data files

Programs

Auxiliary programs and scripts

Calculation parameters

Scripts

Below is a list of low-level scripts which may be used in combination to perform the similarity analysis starting from pdb files.

(1+2) To calculate similarity matrix analytically

preparations needed Make a directory for pipsa analysis (e.g. call it "pdbs/") and put pdb files there;
make a list of pdb files (one line per pdb filename) to be analysed in the file "pdbnames" (see an example in the exa/ directory of the PIPSA distribution).

do_pipsa_analyt_sim Computes analytically monopole+dipole electrostatic potential similarity matrix from pdb files and post-processes it
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_pdbs_dir = the directory where pdb files are located
- assumes pdb files in pipsa_pdbs_dir/ and the list of pdb files to analyse in "pdbnames"
- will go to pipsa_pdbs_dir/, compute similarity matrix, distance matrix and kinemage representations there
N.B. The sphere size for which the potential similarity is best computed here is 9.815, which is suitable for PH domains (it must be proportionally larger if the proteins in the set are larger than PH domains)
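
The preparation step above (one pdb filename per line in "pdbnames") can be sketched as follows. This is a minimal illustration, not part of the PIPSA distribution: the helper names pdbnames_text and write_pdbnames are hypothetical, and it is assumed that every file ending in ".pdb" in the directory should be listed with its extension. Compare the result with the example in exa/ before relying on it.

```python
from pathlib import Path


def pdbnames_text(filenames):
    """Return the content of a "pdbnames" file: one pdb filename per line."""
    names = sorted(n for n in filenames if n.lower().endswith(".pdb"))
    return "".join(name + "\n" for name in names)


def write_pdbnames(pdbs_dir, list_name="pdbnames"):
    """Write the list of pdb files found in pdbs_dir to pdbs_dir/pdbnames."""
    pdbs = Path(pdbs_dir)
    text = pdbnames_text(p.name for p in pdbs.iterdir() if p.is_file())
    (pdbs / list_name).write_text(text)
    return text
```

Editing the generated list by hand afterwards is the simplest way to restrict the analysis to a subset of the files.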


(1) To calculate the GRID interaction fields

preparations needed Make a directory for pipsa analysis (e.g. call it "grid/") and put GRID readable pdb files in its gpdbs/ subdirectory;
make a list of pdb files (one line per pdb filename) to be analysed in the file gpdbs/pdbnames

do_pipsa_GRID_prep Prepares GRID calculations
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_wrk_dir = the directory for pipsa calculations, where grid readable pdb files are placed in gpdbs/ subdirectory and pdb file list is in gpdbs/pdbnames
- assumes that the pdb files are in pipsa_wrk_dir/gpdbs/
- will create directory pipsa_wrk_dir/grid and generate 2 files there: a list of protein names "names" and the file "grid.in" for GRID calculations and copy data files needed for GRID calculations

do_pipsa_GRID_grids Computes GRID probe interaction field grids
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_grid_dir = the directory where GRID calculations have already been prepared by the script do_pipsa_GRID_prep
$3 = grid_bin_dir = directory with GRID executables, the programs grin, grid and k2a will be used
- assumes grid readable pdb files in pipsa_grid_dir/../gpdbs/
- assumes that do_pipsa_GRID_prep has been executed and necessary files (names, grid.in) generated in pipsa_grid_dir/
- will go to directory pipsa_grid_dir/ and compute and write GRID interaction field grids there


(1) To calculate the electrostatic potentials with UHBD

preparations needed Make a directory for pipsa analysis and put WHATIF readable pdb files without hydrogens in its opdbs/ subdirectory;
make a list of pdb files (one line per pdb filename) to be analysed in the file opdbs/pdbnames

do_pipsa_WHATIF Adds polar hydrogens to pdb files and converts them to UHBD readable format using WHATIF
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_wrk_dir   = the directory for pipsa calculations, where original pdb files are placed in opdbs/ subdirectory and pdb file list is in opdbs/pdbnames
$3 = whatif_exe      = WHATIF executable with path
- assumes original (without hydrogens) pdb files in pipsa_wrk_dir/opdbs/ and the list of pdb filenames in pipsa_wrk_dir/opdbs/pdbnames
- will make directory pipsa_wrk_dir/pdbs and convert original pdbs to pdbs with polar hydrogens
N.B. Current version of whatif2uhbd can handle only single chain proteins correctly

alternative to above 2 steps Make a directory for pipsa analysis and put UHBD readable pdb files in its pdbs/ subdirectory;
make a list of pdb files (one line per pdb filename) to be analysed in the file pdbs/pdbnames

do_pipsa_UHBD_prep Prepares UHBD calculations
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_wrk_dir   = the directory for pipsa calculations, where uhbd readable pdb files are placed in pdbs/ subdirectory and pdb file list is in pdbs/pdbnames
- assumes that pdb files are in pipsa_wrk_dir/pdbs/
- will create directory pipsa_wrk_dir/uhbd and generate 2 files there: list of protein names "names" and the file "uhbd.in" for UHBD calculations

do_pipsa_UHBD_chk Checks if parameters to all atoms in the pdb files can be assigned in UHBD calculations
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_uhbd_dir = the directory where UHBD calculations have already been prepared by the script do_pipsa_UHBD_prep
$3 = uhbd_executable = location of uhbd executable
- assumes uhbd readable pdb files in pipsa_uhbd_dir/../pdbs/
- assumes that the file with protein names "names" has been generated by the script do_pipsa_UHBD_prep or manually
- will go to directory pipsa_uhbd_dir/ and check if parameters (charges and radii) for electrostatic calculations can be assigned

do_pipsa_UHBD_grids Computes electrostatic potential grids with UHBD
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_uhbd_dir = the directory where the UHBD calculation
                       has already been prepared by the script do_pipsa_UHBD_prep
$3 = uhbd_executable = location of uhbd executable
- assumes uhbd readable pdb files in pipsa_uhbd_dir/../pdbs/
- assumes that do_pipsa_UHBD_prep has been executed and necessary files (names, uhbd.in) generated in pipsa_uhbd_dir/
- will go to directory pipsa_uhbd_dir/ and compute and write electrostatic potential grids there


(1) To calculate the electrostatic potentials with APBS

preparations needed Make a directory for pipsa analysis and put APBS readable pqr files in its pqrs/ subdirectory;
make a list of pqr files (one line per pqr filename) to be analysed in the file pqrs/pqrnames

do_pipsa_APBS_prep Prepares APBS calculations
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_wrk_dir = the directory for pipsa calculations, where APBS readable pqr files are placed in pqrs/ subdirectory and pqr file list is in pqrs/pqrnames
- assumes that pqr files are in pipsa_wrk_dir/pqrs/
- will create directory pipsa_wrk_dir/apbs and generate 2 files there: list of protein names "names" and the file "apbs.in" for APBS calculations

do_pipsa_APBS_grids Computes electrostatic potential grids
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_apbs_dir = the directory where APBS calculation has already been prepared by the script do_pipsa_APBS_prep
$3 = apbs_executable = location of apbs executable
- assumes apbs readable pqr files in pipsa_apbs_dir/../pqrs/
- assumes that do_pipsa_APBS_prep has been executed and necessary files (names, apbs.in) generated in pipsa_apbs_dir/
- will go to directory pipsa_apbs_dir/ and compute electrostatic potential grids there


(2) To calculate similarity matrix from GRID, UHBD, or APBS grids

preparations needed Either GRID, UHBD or APBS grids should have been computed in a directory (pipsa_sim_dir) and the pdb files should be available in the directory pipsa_sim_dir/../pdbs/
The file "pdbnames" in pipsa_sim_dir/../pdbs/ should have the list of pdb file names (with extension) to be processed further
For the *parts or *spheres versions, a file defining the conical (filename "parts") or spherical (filename "spheres") part of the comparison region is required
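
Before starting step 2 it can help to confirm that every listed protein has both a pdb file and a grid file. The sketch below is only an illustration: the function name missing_inputs is hypothetical, and the assumption that the grid of "name.pdb" is called "name.grd" is inferred from the do_pipsa_sim_add1 description (pipsa_sim_dir/$3.grd), not stated for this step.

```python
from pathlib import PurePosixPath


def missing_inputs(pdbnames_lines, pdb_files, grid_files, grid_ext=".grd"):
    """Return (missing_pdbs, missing_grids) for the names listed in "pdbnames".

    pdbnames_lines: lines of the file pipsa_sim_dir/../pdbs/pdbnames
    pdb_files:      filenames present in pipsa_sim_dir/../pdbs/
    grid_files:     filenames present in pipsa_sim_dir/
    """
    missing_pdbs, missing_grids = [], []
    for line in pdbnames_lines:
        name = line.strip()
        if not name:
            continue  # ignore empty lines
        if name not in pdb_files:
            missing_pdbs.append(name)
        # assumed naming: same base name as the pdb file, extension .grd
        grid = PurePosixPath(name).stem + grid_ext
        if grid not in grid_files:
            missing_grids.append(grid)
    return missing_pdbs, missing_grids
```

The file sets can be collected with os.listdir on the two directories; the check itself stays free of file-system access so it is easy to test.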

do_pipsa_sim Computes similarity matrix and post-processes it
Uses 2potsim_skin program - similarity on complete "skins" of proteins
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
- assumes grid files to be in pipsa_sim_dir and pdb files in pipsa_sim_dir/../pdbs/
- will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives

do_pipsa_sim_parts Computes similarity matrix and post-processes it
Similarity of 2 potential grids is calculated on the molecular skin and over a conical part of the space
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
- assumes grid files to be in pipsa_sim_dir and pdb files in pipsa_sim_dir/../pdbs/
- assumes the conical part of the comparison region to be defined in the file "parts"; the format of this file is described under Data files
- will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives

do_pipsa_sim_spheres Computes similarity matrix and post-processes it
Similarity of 2 potential grids is calculated on the molecular skin and over a spherical part of the space
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
- assumes grid files to be in pipsa_sim_dir and pdb files in pipsa_sim_dir/../pdbs/
- assumes the spherical part of the comparison region to be defined in the file "spheres"; the format of this file is described under Programs (2potsim_skin_spheres.f)
- will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives


(2a) To add one additional protein to already processed set of proteins

preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for the original set of proteins in a directory (pipsa_sim_dir/) and pdb files should be available in the directory pipsa_sim_dir/../pdbs/.
Grid files for the additional proteins should have been computed and should also be located in pipsa_sim_dir/, and their pdb files should be in pipsa_sim_dir/../pdbs/.
The content of the file "names" in pipsa_sim_dir/ should be as from original computations.

do_pipsa_sim_add1 Adds extra protein to already processed set of proteins
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir = directory, where similarity matrix was computed (by do_pipsa_sim) for the original set
$3 = the name of the protein to be added
- assumes grid and pdb file of additional protein are located in pipsa_sim_dir/$3.grd and pipsa_sim_dir/../pdbs/$3.pdb.
- assumes that original protein name list, similarity matrix, distance matrix and kinemage are in pipsa_sim_dir/names,sims.log,sims.mat and sims.kin, resp.
- will generate new protein name list, similarity matrix, distance matrix and kinemage under old names and move original ones to *-old
N.B. This script can be repeated to add more than 1 protein. In this case you may need to save the list of the original set of proteins for later reference, because only one previous list remains after execution of this script.


(3) To generate Phylip presentations from a "similarity distance" matrix

preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for a set of proteins in a directory "pipsa_sim_dir/"

do_pipsa_phylip Post-processes similarity matrix and draws phylip diagrams and trees
- needs 3 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir = directory, where similarity matrix was computed (by do_pipsa_sim)
$3 = phylip_bin_dir = Phylip binaries directory, where neighbour, drawtree and drawgram programs can be found
- assumes protein names are in the file pipsa_sim_dir/names and similarity based distance matrix is in pipsa_sim_dir/sims.mat
- will go to pipsa_sim_dir/ and generate plots of trees and graphs
N.B. Only 1 fontfile from phylip is used here, phylip_fontfile from pipsa_distr_dir/data. If needed, replace it by another one from the Phylip package fonts.

(4) To correlate kinetic parameters with average interaction field differences

preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for a set of proteins in a directory "pipsa_sim_dir/"
Known experimental kinetic parameters should be entered in the file "exp", with lines of the format:
protein_name kinetic_parameter_value_if_known
Leave the value blank for kinetic parameters that are not known or are to be predicted.
There should be at least 2 known parameters (or at least one if the regression coefficient is given)

do_pipsa_qpipsa Post-processes the similarity matrix and correlates it with kinetic parameters
- needs 2 parameters:
$1 = pipsa_distr_dir = pipsa distribution directory
$2 = pipsa_sim_dir = directory, where similarity matrix was computed (by do_pipsa_sim)
- assumes known kinetic parameters to be entered in the file pipsa_sim_dir/exp and similarity matrix is in pipsa_sim_dir/sims.log
- will go to pipsa_sim_dir/ and correlate data from "exp" and "sims.log" and predict missing kinetic parameters
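
The layout of the file "exp" described above can be checked before running do_pipsa_qpipsa with a small reader such as the one below. It is a sketch under assumptions: the fields are taken to be separated by whitespace (the actual program may expect fixed columns), and the function names parse_exp and has_enough_known are hypothetical.

```python
def parse_exp(lines):
    """Split the lines of "exp" into known values and names to be predicted.

    Each line is expected to hold a protein name, optionally followed by its
    kinetic parameter value; a missing value marks an unknown parameter.
    """
    known, unknown = {}, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        name = fields[0]
        if len(fields) > 1:
            known[name] = float(fields[1])
        else:
            unknown.append(name)
    return known, unknown


def has_enough_known(known, regression_coefficient_given=False):
    """At least 2 known parameters, or 1 if the regression coefficient is given."""
    required = 1 if regression_coefficient_given else 2
    return len(known) >= required
```

For example, a file with the lines "protA 1.5", "protB" and "protC 2.0e3" yields two known values and one protein (protB) whose parameter is to be predicted.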

Data files


grin.in, grub.dat	Standard input file for executable "grin" of GRID and the parameter file used by GRID; from the GRID distribution
parts	Example of the file "parts" for 2potsim_skin_parts. The format of this file is: xr1,xr2,angle, where xr1 and xr2 are the coordinates (3 values each, in Å) of the beginning and the end of the vector defining the direction of the cone, and angle is the angular extent of the cone (in degrees, with 180.0 defining the whole space)
phylip_neighbour.in phylip_drawgram.in phylip_drawtree.in phylip_fontfile	Input files for Phylip programs neighbour, drawgram and drawtree and font file
qtable.dat	Parameter file for UHBD, used to assign OPLS charge+radius parameters to atoms
qtable_f.dat	Parameter file for UHBD, modified to assign partial charges only to charged residue side-chains. Can be used with pdb files without hydrogens (the accuracy of the electrostatic potentials is not guaranteed).
uhbd.in_tmpl / apbs.in_tmpl	template input command script for UHBD / APBS. This is rewritten by mkuhbdin / mkapbsin to adjust ionic strength conditions and the center of the electrostatic potential grid
uhbd_chk.in	Input file for UHBD to check if parameters can be assigned to all atoms from pdb files before doing electrostatic calculations
whatif_addH.in	Input command script for WHATIF to add hydrogens to the pdb file
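
To make the "parts" format listed above concrete, the sketch below reads one entry (xr1, xr2, angle) and tests whether a point lies inside the cone. It rests on assumptions that are not stated in this guide: the apex of the cone is taken to be at xr1, the angle is measured from the cone axis (which is consistent with 180.0 covering the whole space), and the seven numbers may be separated by commas or blanks. The function names are hypothetical; this is not the code of 2potsim_skin_parts.f.

```python
import math


def parse_parts_line(line):
    """Return (xr1, xr2, angle) from one line holding 7 numbers."""
    values = [float(v) for v in line.replace(",", " ").split()]
    if len(values) != 7:
        raise ValueError("expected 7 numbers: xr1 (3), xr2 (3), angle")
    return tuple(values[0:3]), tuple(values[3:6]), values[6]


def in_cone(point, xr1, xr2, angle_deg):
    """True if point lies within angle_deg of the axis xr1 -> xr2 (apex at xr1)."""
    axis = [b - a for a, b in zip(xr1, xr2)]
    vec = [p - a for a, p in zip(xr1, point)]
    axis_len = math.sqrt(sum(c * c for c in axis))
    vec_len = math.sqrt(sum(c * c for c in vec))
    if axis_len == 0.0:
        raise ValueError("xr1 and xr2 must differ to define a direction")
    if vec_len == 0.0:
        return True  # the apex itself is counted as inside
    cos_theta = sum(a * v for a, v in zip(axis, vec)) / (axis_len * vec_len)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta)) <= angle_deg
```

With an axis along +z and an angle of 30 degrees, a point on the +z axis is inside the cone and a point on the x axis is outside; with 180.0 every point is accepted.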

Programs


2potsim_noskin.f	Computes the similarity index of 2 proteins, the interaction properties of which are given in two grid files (in UHBD format). In order to use only the points outside the protein, the interaction property grid should be assigned zero values in the protein interior before using this program.
Input:
- command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
- command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
The program will use the values of electrostatic potentials at each point of the grids and derive the similarity index.
The following will be computed:
aa = square of the norm of the grid 1
bb = square of the norm of the grid 2
ab = scalar product of 2 potentials
Output:
- fort.66 - some information about constructed skins
- standard output (fort.6) has the following data on one line, in this order:
si_hodgkin = 2*ab/(aa+bb)
si_carbo = ab/sqrt(aa*bb)
aa bb ab

2potsim_skin.f	Computes similarity of 2 potential grids on the molecular skin
Input:
- command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
- command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
- command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb
- command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb
- command line arg -pr - probe radius, default is 3 Å
- command line arg -sk - skin thickness, default is 4 Å
The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from the van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids. The potential values outside this skin will not be used.
The following will be computed:
np1 = number of points of the skin of protein 1
np2 = number of points of the skin of protein 2
npoi = number of points of the intersection of the 2 skins
aa0 = square of the norm of grid 1 on its skin
aa = square of the norm of grid 1 on the intersection of the skins
bb0 = square of the norm of grid 2 on its skin
bb = square of the norm of grid 2 on the intersection of the skins
ab = scalar product of the 2 potentials (on the intersection of the skins)
Output:
- fort.66 - some info about constructed skins
- standard output has the following data on one line, in this order:
si_hodgkin = 2*ab/(aa+bb)
si_carbo = ab/sqrt(aa*bb)
aa bb ab aa0 bb0
si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
si_carbo_shape = float(npoi)/sqrt(float(np1*np2))
np1 np2 npoi

2potsim_skin_parts.f	Computes similarity of 2 potential grids on the molecular skin and over a conical part of the space.
Input:
- command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
- command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
- command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb
- command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb
- command line arg -pa - the name of the file with the list of directions, default name "parts" - a list of (up to 99) directions and angles to define conical parts; the format of the file "parts" is: xr1,xr2,angle, where xr1 and xr2 are the coordinates (3 values each, in Å) of the beginning and the end of the vector defining the direction of the cone, and angle is the angular extent of the cone (in degrees, with 180.0 defining the whole space) (see an example in the data/ directory of the pipsa distribution).
- command line arg -pr - probe radius, default is 3 Å
- command line arg -sk - skin thickness, default is 4 Å
+ default values are used if no input is given
The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from the van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids. The potential values outside this skin and outside the comparison region (conical region or regions here) will not be used.
The following will be computed:
np1 = number of points of the skin of protein 1
np2 = number of points of the skin of protein 2
npoi = number of points of the intersection of the 2 skins
aa0 = square of the norm of grid 1 on its skin
aa = square of the norm of grid 1 on the intersection of the skins
bb0 = square of the norm of grid 2 on its skin
bb = square of the norm of grid 2 on the intersection of the skins
ab = scalar product of the 2 potentials (on the intersection of the skins)
Output:
- fort.66 - some info about constructed skins
- standard output has the following data on one line, in this order:
si_hodgkin = 2*ab/(aa+bb)
si_carbo = ab/sqrt(aa*bb)
aa bb ab aa0 bb0
si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
si_carbo_shape = float(npoi)/sqrt(float(np1*np2))
np1 np2 npoi
(all this information is printed for every part defined in the file "parts")

2potsim_skin_spheres.f	Computes similarity of 2 potential grids on the molecular skin and over a spherical part of the space. Input: - command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd - command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd - command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb - command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb - command line arg -pa - the name of the file with the list of comparison region centers and extents, default name "spheres" - a list of (up to 999) centers and radii to define spherical regions; the format of the file "spheres" is: xc,radius, where xc is 3 coordinates (in Å units) of the sphere center and radius is the radius of this spherical comparison region (in Å) (see an example in data/ diirectoy of the pipsa distribution). - command line arg -pr - probe radius, default is 3 Å - command line arg -sk - skin thickness, default is 4 Å + default values are used if no input given The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids. The potential values outside this skin and outside the comparison region (sphere or a set of spheres here) will not be used. 
The following quantities are computed:
np1 = number of points in the skin of protein 1
np2 = number of points in the skin of protein 2
npoi = number of points in the intersection of the 2 skins
aa0 = square of the norm of grid 1 on its skin
aa = square of the norm of grid 1 on the intersection of the skins
bb0 = square of the norm of grid 2 on its skin
bb = square of the norm of grid 2 on the intersection of the skins
ab = scalar product of the 2 potentials (on the intersection of the skins)
amb = average difference of the potentials a and b, sum(a-b)/npoi
ambl = log(sum[exp(a)]/sum[exp(b)])
ambm = log(sum[exp(-a)]/sum[exp(-b)])
Output:
- fort.66 - some information about the constructed skins
- standard output has the following data in one line:
si_hodgkin = 2*ab/(aa+bb)
si_carbo = ab/sqrt(aa*bb)
aa bb ab aa0 bb0
si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
si_carbo_shape = float(npoi)/sqrt(float(np1*np2))
np1 np2 npoi amb ambl ambm
(all this information is printed for every sphere defined in the file "spheres")
2potsim_skin_spheresNN.f	Slightly modified 2potsim_skin_spheres.f printing also average potential values for proteins 1 and 2
2potsim_skin_spheresU.f	Slightly modified 2potsim_skin_spheres.f doing analysis on entire skin, when the file spheres is empty
ccenter.f	Computes the geometric center of all atoms in a pdb file
grid_asc2bin.f	Converts grid from GRID ASCII format to UHBD binary format
mkapbsin.f mkuhbdin.f mkgridin.f	Programs to compute the average center of the proteins from the output of ccenter.f, which is then used as the center for all interaction potential grids. They also print the dispersion of the centers and the size of the proteins, which can be used to check the quality of the superposition. Note: if you use APBS version 0.3.2 or newer, replace mkapbsin.f with mkapbs-0.3.2-in.f from the src/ directory and recompile, because the way the grid origin is written was changed (corrected) in later versions of APBS.
mkdismx.f	Converts similarity matrix to distance matrix
mkkin.f	Reads the similarity matrix, computes distances between proteins as defined by their similarity index, and represents proteins as points in 3D space, such that pairwise distances between each pair of proteins are represented by distances between corresponding points
modeller2grin.f	The program to rename some atom names from MODELLER output to the names readable by the program grin of GRID
npotsim.f	Drives similarity index calculations with 2potsim*.
Input:
- command line arg -pg - similarity calculation program name: 2potsim_noskin, 2potsim_skin or 2potsim_skin_parts, default is ../bin/2potsim_noskin
- command line arg -fp - the directory where pdb files are located, default is ../pdbs
- command line arg -fn - the name of the file with the names of proteins, each of which should have a corresponding PDB file in the directory ../pdbs/ and a corresponding potential file in ./ , default is "names"
- command line arg -lg - the name of the similarity matrix file, default is "sims.log"
- command line arg -pr - the value of the probe radius, default is 3 Å
- command line arg -sk - the value of the skin thickness, default is 4 Å
- command line arg -pa - the name of the file with the list of directions, default name "parts" - a list of directions and angles to define conical parts
Note that:
+ the program assumes that it is executed in the directory where the grid files are located and expects pdb files to be located in the directory defined by command line arg -fp
+ all grid files corresponding to protein names in the file "names" must have the extension .grd and all pdb files must have the extension .pdb
n1potsim.f	Drives similarity index calculations with 2potsim* when 1 extra protein is to be added to the set of originally processed proteins.
Input:
- command line arg -pg - similarity calculation program name: 2potsim_noskin, 2potsim_skin or 2potsim_skin_parts, default is ../bin/2potsim_noskin
- command line arg -fp - the directory where pdb files are located, default is ../pdbs
- command line arg -fn - the name of the file with the names of the original set of proteins, each of which should have a corresponding PDB file in the directory ../pdbs/ and a corresponding potential file in ./ , default is "names"
- command line arg -p1 - the name of the protein to be added; it should have a PDB file in ../pdbs/ and a potential file in ./
- command line arg -lg - the name of the similarity matrix file, default is "sims.log"
- command line arg -pr - the value of the probe radius, default is 3 Å
- command line arg -sk - the value of the skin thickness, default is 4 Å
- command line arg -pa - the name of the file with the list of directions, default name "parts" - a list of directions and angles to define conical parts
Note that:
+ the program should be executed in the directory where the grid files are located and expects pdb files to be located in the directory defined by command line arg -fp
+ all grid files corresponding to protein names in the file "names" must have the extension .grd and all pdb files must have the extension .pdb
+ after execution of this program the file "names" will be renamed to "names-old" and a new file "names" will be created, which includes the newly added protein name
+ the old similarity matrix file will be renamed to "sims.log-old" and a new file "sims.log" (or any other name given after -lg) will be created, which has indices for the added protein
nm1potsim.f	Simple operation: removes a given protein from the pipsa analysis. This is done by reading the protein's name and removing it from the list (file "names") and from the related similarity matrix ("sims.log").
Input:
- command line arg -fn - the name of the file with the names of the original set of proteins, each of which should have a corresponding PDB file in the directory ../pdbs/ and a corresponding potential file in ./ , default is "names"
- command line arg -p1 - the name of the protein to be removed; its potential in ./ and pdb file in ../pdbs are neither removed nor used
- command line arg -p - the same as above
- command line arg -lg - the name of the similarity matrix file, default is "sims.log" - will be modified
Note that:
+ the program assumes that it is executed in the directory where the grid files are located and expects pdb files to be located in the ../pdbs/ subdirectory
+ after execution of this program the file "names" will be renamed to "names-old" and a new file "names" will be created, which does not have the removed protein name
+ the old similarity matrix file will be renamed to "sims.log-old" and a new file "sims.log" (or any other name given after -lg) will be created, which no longer has any entries related to the removed protein
+ the grid and pdb files will not be removed. They need to be either removed separately or replaced by new versions if a new version of the protein is subsequently to be added
qdipsim.f	Computes the pairwise electrostatic similarity of a list of proteins, based on their monopole and dipole moments.
Input:
- command line arg -fn - the name of the file with the names of proteins, each of which should have a corresponding PDB file in the current directory, default is "pdbnames"
- command line arg -fd - the name of the file where dipole moment information will be written, default name "dipoles"
- command line arg -r - the size of the proteins, approximately the average gyration radius, default value 9.815 Å, valid for PH domains
The program assigns formal charges to all charged residues (+0.5 e for NHX of Arg, +1 for NZ of Lys, -0.5 for OEX of Glu and ODX of Asp) and computes the monopole and dipole moments of the protein. The similarity index is then computed following the analytical formula (6) from the Proteins paper, i.e. comparing monopole+dipole potentials on a sphere of some radius R. The default value of R is 9.815 Å, i.e. parameter alpha (coded as scf) is 17 Å^-2.
The following quantities are computed:
aa = square of the norm of the dipole+monopole potential of protein 1
bb = square of the norm of the dipole+monopole potential of protein 2
ab = scalar product of the 2 dipole+monopole potentials
Output:
- file "dipoles", where the following information about the proteins is printed: the name of the pdb file, total charge, the norm of the dipole moment, 3 (x,y,z) components of the dipole moment, number of charge sites
- standard output has the following data in one line:
si_hodgkin = 2*ab/(aa+bb)
si_carbo = ab/sqrt(aa*bb)
aa bb ab aa bb npoi
smNextopred.f	Reads sims.log and experimental data, derives a regression from known kinetic parameters (rate ratio vs ep difference) and predicts unknown kinetic parameters.
Input:
- command line arg -fl - the name of the pipsa similarity log file, default is sims.log
- command line arg -fe - the name of the file with experimental data, where each line has the protein name followed by the experimental value when available, and by nothing or 0.0 when the value needs to be predicted
- command line arg -fo - the name of the output file with the correlation between log(k1/k2) and (ep1-ep2), default is smNex2cor.out
- command line arg -fp - the name of the output file with all predictions for each case, default is smNextopred.pre
- command line arg -sn - the ep difference measure to be used, default is 1:
1 - difference of average ep : av(ep1)-av(ep2)
2 - log of ratio of exp(ep) : log (sum(exp(ep1))/sum(exp(ep2)))
3 - log of ratio of exp(-ep) : log (sum(exp(-ep1))/sum(exp(-ep2)))
- command line arg -rc - user-defined correlation coefficient; by default it is derived by correlating the known cases. The correlation is Drate = rc*Dpotential, i.e. how much the rate changes per 1 kcal/mole change in potential
Output:
- standard output - prediction results with errors
uhbd_asc2bin.f	Converts grid from UHBD ASCII format to UHBD binary format
whatif2uhbd.f	Converts the WHATIF output file of protein coordinates to the format readable by UHBD. Note that the current version treats only one-chain proteins correctly, i.e. the program needs to be modified if proteins with more than one chain are to be analysed.
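
The quantities listed for the 2potsim_skin* programs can be illustrated with a short sketch. This is not the PIPSA Fortran code: the function name `similarity_indices`, the flat-list representation of the grids and the boolean skin masks are assumptions made for the example. It only mirrors the formulas given above (si_hodgkin, si_carbo, their shape analogues, and amb, ambl, ambm).

```python
import math

def similarity_indices(pot1, pot2, skin1, skin2):
    """Illustrative only. pot1/pot2 are potential values on the same grid
    points (flat lists); skin1/skin2 are booleans marking the points that
    belong to the skin of protein 1 and protein 2 (hypothetical inputs)."""
    both = [s1 and s2 for s1, s2 in zip(skin1, skin2)]  # skin intersection
    np1, np2, npoi = sum(skin1), sum(skin2), sum(both)
    if npoi == 0:
        raise ValueError("the two skins do not intersect")

    # Potential values restricted to the intersection of the skins
    a = [p for p, m in zip(pot1, both) if m]
    b = [p for p, m in zip(pot2, both) if m]

    aa0 = sum(p * p for p, s in zip(pot1, skin1) if s)  # grid 1, own skin
    bb0 = sum(p * p for p, s in zip(pot2, skin2) if s)  # grid 2, own skin
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))

    return {
        "si_hodgkin": 2.0 * ab / (aa + bb),
        "si_carbo": ab / math.sqrt(aa * bb),
        "si_hodgkin_shape": 2.0 * npoi / (np1 + np2),
        "si_carbo_shape": npoi / math.sqrt(np1 * np2),
        "amb": sum(x - y for x, y in zip(a, b)) / npoi,
        "ambl": math.log(sum(math.exp(x) for x in a)
                         / sum(math.exp(y) for y in b)),
        "ambm": math.log(sum(math.exp(-x) for x in a)
                         / sum(math.exp(-y) for y in b)),
        "aa": aa, "bb": bb, "ab": ab, "aa0": aa0, "bb0": bb0,
        "np1": np1, "np2": np2, "npoi": npoi,
    }

# Toy data: 4 grid points, each skin covering 3 of them, 2 in common
result = similarity_indices(
    pot1=[1.0, 2.0, -1.0, 0.5],
    pot2=[1.0, 1.0, -2.0, 3.0],
    skin1=[True, True, True, False],
    skin2=[True, True, False, True],
)
print(result["si_hodgkin"], result["si_carbo"])
```

Two identical grids with identical skins give si_hodgkin = si_carbo = 1, and the shape indices depend only on the point counts, not on the potential values.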

Auxiliary programs and scripts


addlinks.pl	Adds a link before every line that has '(.*) show' in a .ps file. Basically, this program adds kewl code to the .ps files, so that when they are converted to .pdf files, one gets kewl links.
do_pipsa_UHBD2APBS	pipsa 2.0 script to compare UHBD grids and APBS grids (supposes that grids have the same spacing and origin and size)
do_pipsa_uho2pqr	Script to convert UHBD output to pqr
getnames.sh	Downloads names, long names, gene definitions from the SWISS-PROT database for the proteins
delphi2uhbd.f	Converts a DELPHI output grid to a UHBD ASCII format grid, which can then be handled in the same way as an APBS output grid (i.e. converted to binary and used in similarity analysis). See the program header for compilation instructions.
gridinfo gridinfo.f	Gets information from the UHBD grid
grid2insight grid2insight.f	Converts GRID/UHBD format grid file to InsightII readable file
highlight_kin_groups highlight_kin_groups.f	Highlights specified groups of proteins in different colours
highlight_kin_points highlight_kin_points.f	Highlights specified proteins in red
insightII.HydrSurface.in	Input file for InsightII used to produce Hydrophobic Surfaces in .wrl, virtual reality modeling language format.
insightII.electSurface.in	Input file for InsightII used to produce Electrostatic Surfaces in .wrl, virtual reality modeling language format.
malign3d.2.sh	Aligns the sequences to the TEMPLATE.pdb sequence, using modeller and modeller.2.in script
malign3d.pl	Runs modeller 6 to create 3D multiple alignment of all the proteins in pdbs directory
malign3d.sh	Aligns the sequences to the TEMPLATE.pdb sequence, using whatif
malign3d.whatif.in	Script used by malign3d.sh
mkElecSurface.sh	Creates .wrl files that represent electrostatic surfaces. UHBD calculations should be done before the script is executed.
mkHydrSurface.sh	Creates .wrl files that represent hydrophobic surfaces. GRID calculations should be done before the script is executed.
mktree.sh	Creates the tree from the similarity matrix - prototype to do_pipsa_phylip
modeller.2.in	Script used by the modeller 4 in malign3d.2.sh program
orient.sh	Aligning the proteins using ORIENT command from the modeller. _very_ inaccurate
pqr2qcd pqr2qcd.f	Converts PQR format to UHBD readable QCD format
uho2pqr uho2pqr.f	Converts UHBD output to pqr

Some calculation parameters:

Some parameters defining interaction potentials can be changed to adjust the calculations to the case studied. For example, interaction potential grid dimensions might need to be increased for a set of larger proteins; different probes can be used in GRID calculations.

Parameter	Parameter name (default)	Where can be changed	Where used
Grid maximal dimension	im_max (110)	src/maxdim.inc	2potsim*, grid_asc2bin, uhbd_asc2bin
Grid dimensions Grid spacing Probe for dielectric surface	dime (65 65 65) glen/(dime-1) (1.5) srad (0.0)	pipsa_wrk_dir/apbs/apbs.in after executing do_pipsa_APBS_prep script	APBS
Grid dimensions Grid spacing Probe for dielectric surface	dim (65) spa (1.5) nmap probe_radius_value (-)	pipsa_wrk_dir/uhbd/uhbd.in after executing do_pipsa_UHBD_prep script	UHBD
ionic strength	- (50)	scr/do_pipsa_APBS_prep scr/do_pipsa_UHBD_prep	APBS UHBD
PROBE name	- (PO4)	scr/do_pipsa_GRID_prep	GRID
GRID calculation parameters for example grid dimension	imax (65)	pipsa_wrk_dir/grid/grid.in/grin.in after executing do_pipsa_GRID_prep script	GRID
Maximal number of proteins	nprmx (999)	src/maxdim.inc	mkapbsin, mkdismx, mkkin, mkuhbdin, n1potsim, npotsim, qdipsim
Maximal number of atoms per protein	namx (999)	src/maxdim.inc	ccenter
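
As a rough aid for deciding whether the default grid is large enough, the relation between grid dimension and spacing in the table (spacing = glen/(dime-1)) can be turned into a small check. The sketch below is an illustration, not part of PIPSA: the function names are hypothetical, and the requirement that the grid span the protein extent plus probe radius and skin thickness on both sides is an assumption chosen for the example, using the default probe (3 Å) and skin (4 Å) values.

```python
import math

def grid_length(dime, spacing):
    """Edge length of the grid in Å, from spacing = glen/(dime-1)."""
    return spacing * (dime - 1)

def grid_covers(dime, spacing, protein_extent, probe=3.0, skin=4.0):
    """True if the grid spans the protein plus probe+skin on both sides.
    The probe+skin margin is an assumption made for this illustration."""
    needed = protein_extent + 2.0 * (probe + skin)
    return grid_length(dime, spacing) >= needed

def min_dime(protein_extent, spacing, probe=3.0, skin=4.0, im_max=110):
    """Smallest grid dimension satisfying grid_covers; im_max mirrors the
    default compile-time limit from src/maxdim.inc."""
    needed = protein_extent + 2.0 * (probe + skin)
    dime = math.ceil(needed / spacing) + 1
    if dime > im_max:
        raise ValueError("required dimension exceeds im_max; "
                         "increase im_max in src/maxdim.inc and recompile")
    return dime

print(grid_length(65, 1.5))        # default grid: 65 points, 1.5 Å spacing
print(grid_covers(65, 1.5, 60.0))  # a protein about 60 Å across
print(min_dime(90.0, 1.5))         # a larger protein, about 90 Å across
```

The electrostatics program may impose its own constraints on the allowed grid dimensions, so a value obtained this way should be checked against the documentation of the program used.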
