User Guide to PIPSA 3.0
 (Protein Interaction Property Similarity Analysis)


The PIPSA similarity analysis procedure consists of several steps:

 

The PIPSA programs can be run using scripts provided in the scr/ subdirectory of the PIPSA distribution.
A combination of these scripts allows similarity analysis of proteins starting from a set of their pdb files.
The scripts can be used in meta-scripts to run all calculations at once; they also can be modified to meet user requirements.

The scripts are described in the next section with numbers (e.g. 1+2) referring to the steps of the PIPSA procedure given above.

  •  Scripts
  • Data files
  • Programs
  • Auxiliary programs and scripts
  • Calculation parameters



  • Scripts

    Below is a list of low-level scripts which may be used in combination to perform the similarity analysis starting from pdb files.


    (1+2)  To calculate similarity matrix analytically
     
    preparations needed Make a directory for pipsa analysis (e.g. call it "pdbs/") and put pdb files there;
    make a list of pdb files (one line per pdb filename) to be analysed in the file "pdbnames" (see an example in the exa/ directory of the PIPSA distribution) .
    do_pipsa_analyt_sim Computes analytically monopole+dipole electrostatic potential similarity matrix from pdb files and post-processes it
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_pdbs_dir  = the directory where pdb files are located
    - assumes pdb files in  pipsa_pdbs_dir/ and the list of pdb files to analyse in "pdbnames"
    - will go to pipsa_pdbs_dir/, compute similarity matrix, distance matrix and kinemage representations there
    N.B.  The radius of the sphere on which the potential similarity is computed is 9.815 Å here, which is suitable for PH domains (it must be proportionally larger if the proteins in the set are larger than PH domains)
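The preparation step above (one pdb filename per line in "pdbnames") can be sketched as follows. This is only an illustration: the function name write_pdbnames, the alphabetical ordering and the restriction to files ending in .pdb are assumptions, not part of the PIPSA distribution.

```python
from pathlib import Path


def write_pdbnames(pdbs_dir, list_name="pdbnames"):
    """Write one pdb filename per line into <pdbs_dir>/<list_name>.

    Hypothetical helper: only files ending in .pdb are listed, sorted by name.
    """
    pdbs_dir = Path(pdbs_dir)
    names = sorted(p.name for p in pdbs_dir.glob("*.pdb"))
    (pdbs_dir / list_name).write_text("".join(name + "\n" for name in names))
    return names
```

The resulting file can be compared with the example in the exa/ directory of the distribution before running do_pipsa_analyt_sim.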


    (1) To calculate the GRID interaction fields
     
    preparations needed Make a directory for pipsa analysis (e.g. call it "grid/") and put GRID readable pdb files in its gpdbs/ subdirectory;
    make a list of pdb files (one line per pdb filename) to be analysed in the file gpdbs/pdbnames
    do_pipsa_GRID_prep Prepares GRID calculations
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_wrk_dir   = the directory for pipsa calculations, where grid readable pdb files are placed in gpdbs/ subdirectory and pdb file list is in gpdbs/pdbnames
    - assumes that the pdb files are in pipsa_wrk_dir/gpdbs/
    - will create directory pipsa_wrk_dir/grid and generate 2 files there: a list of protein names "names" and the file "grid.in" for GRID calculations and copy data files needed for GRID calculations
    do_pipsa_GRID_grids Computes GRID probe interaction field grids
    - needs 3 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_grid_dir  = the directory where the GRID calculation has already been prepared by the script do_pipsa_GRID_prep
    $3 = grid_bin_dir    = directory with GRID executables, the programs grin, grid and k2a will be used
    - assumes grid readable pdb files in  pipsa_grid_dir/../gpdbs/
    - assumes that do_pipsa_GRID_prep has been executed and necessary files  (names, grid.in) generated in pipsa_grid_dir/
    - will go to directory pipsa_grid_dir/ and compute and write GRID interaction field grids there


    (1) To calculate the electrostatic potentials with UHBD
     
    preparations needed Make a directory for pipsa analysis and put WHATIF readable pdb files without hydrogens in its opdbs/ subdirectory;
    make a list of pdb files (one line per pdb filename) to be analysed in the file opdbs/pdbnames
    do_pipsa_WHATIF Adds polar hydrogens to pdb files and converts them to UHBD readable format using WHATIF
    - needs 3 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_wrk_dir   = the directory for pipsa calculations, where original pdb files are placed in opdbs/ subdirectory and pdb file list is in opdbs/pdbnames
    $3 = whatif_exe      = WHATIF executable with path 
    - assumes original (without hydrogens) pdb files in  pipsa_wrk_dir/opdbs/ and the list of pdb filenames in pipsa_wrk_dir/opdbs/pdbnames
    - will make directory pipsa_wrk_dir/pdbs and convert original pdbs to pdbs with polar hydrogens
    N.B. Current version of whatif2uhbd can handle only single chain proteins correctly
    alternative to above 2 steps Make a directory for pipsa analysis and put UHBD readable pdb files in its pdbs/ subdirectory;
    make a list of pdb files (one line per pdb filename) to be analysed in the file pdbs/pdbnames
    do_pipsa_UHBD_prep Prepares UHBD calculations
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_wrk_dir   = the directory for pipsa calculations, where uhbd readable pdb files are placed in pdbs/ subdirectory and pdb file list is in pdbs/pdbnames
    - assumes that pdb files are in pipsa_wrk_dir/pdbs/
    - will create directory pipsa_wrk_dir/uhbd and generate 2 files there: list of protein names "names" and the file "uhbd.in" for UHBD calculations
    do_pipsa_UHBD_chk Checks if parameters can be assigned to all atoms in the pdb files in UHBD calculations
    - needs 3 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory 
    $2 = pipsa_uhbd_dir  = the directory where the UHBD calculation has already been prepared by the script do_pipsa_UHBD_prep
    $3 = uhbd_executable = location of uhbd executable
    - assumes uhbd readable pdb files in  pipsa_uhbd_dir/../pdbs/
    - assumes that the file with protein names "names" has been generated by the script do_pipsa_UHBD_prep or manually
    - will go to directory pipsa_uhbd_dir/ and check if parameters (charges and radii) for electrostatic calculations can be assigned
    do_pipsa_UHBD_grids Computes electrostatic potential grids with UHBD
    - needs 3 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_uhbd_dir  = the directory where the UHBD calculation 
                           has already been prepared by the script do_pipsa_UHBD_prep
    $3 = uhbd_executable = location of uhbd executable
    - assumes uhbd readable pdb files in  pipsa_uhbd_dir/../pdbs/
    - assumes that do_pipsa_UHBD_prep has been executed and necessary files (names, uhbd.in) generated in pipsa_uhbd_dir/
    - will go to directory pipsa_uhbd_dir/ and compute and write electrostatic potential grids there


    (1) To calculate the electrostatic potentials with APBS
     
    preparations needed Make a directory for pipsa analysis and put APBS readable pqr files in its pqrs/ subdirectory;
    make a list of pqr files (one line per pqr filename) to be analysed in the file pqrs/pqrnames
    do_pipsa_APBS_prep Prepares APBS calculations
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_wrk_dir   = the directory for pipsa calculations, where APBS readable pqr files are placed in pqrs/ subdirectory and pqr file list is in pqrs/pqrnames
    - assumes that pqr files are in pipsa_wrk_dir/pqrs/
    - will create directory pipsa_wrk_dir/apbs and generate 2 files there: list of protein names "names" and the file "apbs.in" for APBS calculations
    do_pipsa_APBS_grids Computes electrostatic potential grids
    - needs 3 parameters:
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_apbs_dir  = the directory where APBS calculation has already been prepared by the script do_pipsa_APBS_prep
    $3 = apbs_executable = location of apbs executable
    - assumes apbs readable pqr files in  pipsa_apbs_dir/../pqrs/
    - assumes that do_pipsa_APBS_prep has been executed and necessary files (names, apbs.in) generated in pipsa_apbs_dir/
    - will go to directory pipsa_apbs_dir/ and compute electrostatic potential grids there


    (2) To calculate similarity matrix from GRID, UHBD, or APBS grids
     
    preparations needed Either GRID, UHBD or APBS grids should have been computed in a directory (pipsa_sim_dir) and the pdb files should be available in a subdirectory (pipsa_sim_dir/../pdbs/)
    The file "pdbnames" in pipsa_sim_dir/../pdbs/ should have the list of pdb file names (with extension) to be processed further
    For versions *parts or *spheres the file with the definition of the conical (filename "parts") or spherical (filename "spheres") part of the comparison region is required
    do_pipsa_sim Computes similarity matrix and post-processes it
    Uses 2potsim_skin program - similarity on complete "skins" of proteins
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
    - assumes grid files to be in pipsa_sim_dir and pdb files in  pipsa_sim_dir/../pdbs/
    - will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives
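As an illustration of how the low-level scripts can be chained in a meta-script, the sketch below builds the command lines for the UHBD route followed by do_pipsa_sim, using the parameters documented above. The function name and the example paths are hypothetical, and it is an assumption that the scripts are invoked directly from the scr/ subdirectory; nothing is executed here.

```python
from pathlib import PurePosixPath


def uhbd_pipeline_commands(pipsa_distr_dir, pipsa_wrk_dir, uhbd_executable):
    """Return the script invocations for steps 1 (UHBD) and 2 as argument lists."""
    distr = PurePosixPath(pipsa_distr_dir)
    scr = distr / "scr"
    wrk = PurePosixPath(pipsa_wrk_dir)
    uhbd_dir = wrk / "uhbd"  # directory created by do_pipsa_UHBD_prep
    return [
        [str(scr / "do_pipsa_UHBD_prep"), str(distr), str(wrk)],
        [str(scr / "do_pipsa_UHBD_chk"), str(distr), str(uhbd_dir), str(uhbd_executable)],
        [str(scr / "do_pipsa_UHBD_grids"), str(distr), str(uhbd_dir), str(uhbd_executable)],
        # The grids are written to uhbd/, so this directory also serves as pipsa_sim_dir.
        [str(scr / "do_pipsa_sim"), str(distr), str(uhbd_dir)],
    ]
```

Each argument list could then be passed to subprocess.run(cmd, check=True) in order, stopping at the first failure.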




    do_pipsa_sim_parts
    Computes similarity matrix and post-processes it
    Similarity of 2 potential grids is calculated on the molecular skin and over a conical part of the space
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
    - assumes grid files to be in pipsa_sim_dir and pdb files in  pipsa_sim_dir/../pdbs/
    - assumes the conical part of the comparison region to be defined in the file "parts"; the format of this file is described in the Data files section
    - will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives




    do_pipsa_sim_spheres
    Computes similarity matrix and post-processes it
    Similarity of 2 potential grids is calculated on the molecular skin and over a spherical part of the space
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix should be computed; the grid files should also be there
    - assumes grid files to be in pipsa_sim_dir and pdb files in  pipsa_sim_dir/../pdbs/
    - assumes the spherical part of the comparison region to be defined in the file "spheres"; the format of this file is described under 2potsim_skin_spheres.f in the Programs section
    - will go to pipsa_sim_dir/ and calculate the similarity matrix and its derivatives


    (2a) To add one additional protein to already processed set of proteins
      
    preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for the original set of proteins in a directory (pipsa_sim_dir/) and pdb files should be available in a subdirectory (pipsa_sim_dir/../pdbs/)
    Grid files for the additional proteins should have been computed and should also be located in pipsa_sim_dir and their pdb files should be in pipsa_sim_dir/../pdbs/.
    The content of the file "names" in pipsa_sim_dir/ should be as from original computations.
    do_pipsa_sim_add1 Adds extra protein to already processed set of proteins
    - needs 3 parameters: 
    $1 = pipsa_distr_dir =  pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix was computed (by do_pipsa_sim) for the original set
    $3 = the name of the protein to be added 
    - assumes grid and pdb file of additional protein are located in pipsa_sim_dir/$3.grd and pipsa_sim_dir/../pdbs/$3.pdb.
    - assumes that original protein name list, similarity matrix, distance matrix and kinemage are in pipsa_sim_dir/names, sims.log, sims.mat and sims.kin, respectively
    - will generate new protein name list, similarity matrix, distance matrix and kinemage under old names and move original ones to *-old
    N.B.  This script can be repeated to add more than 1 protein, in this case you may need to save the list of original set of proteins for later reference, because only one previous list remains after execution of this script.


    (3) To generate Phylip presentations from a "similarity distance" matrix
       
    preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for a set of proteins in a directory "pipsa_sim_dir/"
    do_pipsa_phylip Post-processes similarity matrix and draws phylip diagrams and trees
    - needs 3 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix was computed (by do_pipsa_sim)
    $3 = phylip_bin_dir  = Phylip binaries directory, where neighbour, drawtree and drawgram programs can be found
    - assumes protein names are in the file  pipsa_sim_dir/names and similarity based distance matrix is in pipsa_sim_dir/sims.mat
    - will go to  pipsa_sim_dir/ and generate plots of trees and graphs
    N.B.  Only 1 fontfile from phylip is used here, phylip_fontfile from pipsa_distr_dir/data.  If needed, replace it by another one from the Phylip package fonts.

    (4) To correlate kinetic parameters with average interaction field differences

       
    preparations needed Steps 1 and 2 of the pipsa calculation should have been completed for a set of proteins in a directory "pipsa_sim_dir/"
    Known experimental kinetic parameters entered in the file "exp" having the format:
    protein_name  kinetic_parameter_value_if_known
    Blank space should be left for kinetic parameters to be predicted or not known.
    There should be at least 2 known parameters (or at least one if the regression coefficient is given)
    do_pipsa_qpipsa Post-processes the similarity matrix, correlating it with known kinetic parameters
    - needs 2 parameters: 
    $1 = pipsa_distr_dir = pipsa distribution directory
    $2 = pipsa_sim_dir   = directory, where similarity matrix was computed (by do_pipsa_sim)
    - assumes known kinetic parameters to be entered in the file  pipsa_sim_dir/exp and similarity matrix is in pipsa_sim_dir/sims.log
    - will go to  pipsa_sim_dir/ and correlate data from "exp" and "sims.log" and predict missing kinetic parameters





    Data files
     

    grin.in, grub.dat Standard input file for executable "grin" of GRID and the parameter file used by GRID; from the GRID distribution
    parts Example of the file "parts" for 2potsim_skin_parts.  The format of this file is: xr1,xr2,angle, where xr1 and xr2 are the 3 coordinates (in Å) of the beginning and the end of the vector defining the direction of the cone, and angle is the angular extent of the cone (in degrees, with 180.0 covering the whole space)
    phylip_neighbour.in
    phylip_drawgram.in
    phylip_drawtree.in
    phylip_fontfile
    Input files for Phylip programs neighbour, drawgram and drawtree and font file
    qtable.dat Parameter file for UHBD, used to assign OPLS charge+radius parameters to atoms
    qtable_f.dat Parameter file for UHBD, modified to assign partial charges only to charged residue side-chains.  Can be used with the pdb files without hydrogens (accuracy of electrostatic potentials is not guaranteed).
    uhbd.in_tmpl /
    apbs.in_tmpl
    template input command script for UHBD / APBS. This is rewritten by mkuhbdin / mkapbsin to adjust ionic strength conditions and the center of the electrostatic potential grid
    uhbd_chk.in Input file for UHBD to check if parameters can be assigned to all atoms from pdb files before doing electrostatic calculations
    whatif_addH.in Input command script for WHATIF to add hydrogens to the pdb file




    Programs
     

    2potsim_noskin.f Computes the similarity index of 2 proteins, the interaction properties of which are given in two grid files (in UHBD format).  In order to use only the points outside the protein, the interaction property grid should be assigned zero values in the protein interior before using this program. 

    Input:
     - command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
     - command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd

     The program will use the values of electrostatic potentials at each point of the grids and derive the similarity index.
     The following will be computed:
     aa   = square of the norm of the grid 1
     bb   = square of the norm of the grid 2
     ab   = scalar product of 2 potentials

      Output:
     - fort.66 - some information about constructed skins
     - standard output (fort.6) has the following data on one line:
       si_hodgkin = 2*ab/(aa+bb)
       si_carbo   = ab/sqrt(aa*bb)
       aa
       bb
       ab

    2potsim_skin.f Computes similarity of 2 potential grids on the molecular skin

    Input: 
    - command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
    - command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
    - command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb
    - command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb
    - command line arg -pr - probe radius, default is 3 Å
    - command line arg -sk - skin thickness, default is 4 Å

    The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from the van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids.  The potential values outside this skin will not be used.
    The following will be computed:
    np1  = no of points of the skin of the protein 1 
    np2  = no of points of the skin of the protein 2
    npoi = no of points of the intersection of 2 skins
    aa0  = square of the norm of the grid 1 on its skin
    aa   = square of the norm of the grid 1 on intersection of the skins
    bb0  = square of the norm of the grid 2 on its skin
    bb   = square of the norm of the grid 2 on intersection of the skins
    ab   = scalar product of 2 potentials (on intersection of skins)

     Output:
    - fort.66 - some info about constructed skins
    - standard output has following data in one line:
      si_hodgkin = 2*ab/(aa+bb)
      si_carbo   = ab/sqrt(aa*bb)
      aa
      bb
      ab
      aa0
      bb0
      si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
      si_carbo_shape   = float(npoi)/sqrt(float(np1*np2))
      np1
      np2
      npoi
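The skin construction and the shape indices can be sketched as follows. This is a simplified illustration, not the Fortran implementation: it assumes that the distance from a grid point to the van der Waals surface is the smallest value of (distance to atom center minus atom radius), that both skin boundaries are inclusive, and that atoms are given as (x, y, z, radius) tuples. All function names are hypothetical.

```python
import math


def in_skin(point, atoms, probe=3.0, skin=4.0):
    """True if the point lies between probe and probe+skin from the vdW surface."""
    # Distance to the nearest van der Waals surface (assumed definition).
    d = min(math.dist(point, (x, y, z)) - r for x, y, z, r in atoms)
    return probe <= d <= probe + skin


def shape_indices(points, atoms1, atoms2, probe=3.0, skin=4.0):
    """Return (si_hodgkin_shape, si_carbo_shape, np1, np2, npoi)."""
    skin1 = {i for i, p in enumerate(points) if in_skin(p, atoms1, probe, skin)}
    skin2 = {i for i, p in enumerate(points) if in_skin(p, atoms2, probe, skin)}
    np1, np2 = len(skin1), len(skin2)
    npoi = len(skin1 & skin2)  # points in the intersection of the 2 skins
    si_hodgkin_shape = 2.0 * npoi / (np1 + np2)
    si_carbo_shape = npoi / math.sqrt(np1 * np2)
    return si_hodgkin_shape, si_carbo_shape, np1, np2, npoi
```

The potential-based indices would then be evaluated only over the grid points in the intersection of the two skins.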

    2potsim_skin_parts.f Computes similarity of 2 potential grids on the molecular skin and over a  conical part of the space.

    Input:
    - command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
    - command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
    - command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb
    - command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb
    - command line arg -pa - the name of the file with the list of directions, default name "parts" - a list of (up to 99) directions and angles to define conical parts; the format of the file "parts" is: xr1,xr2,angle, where xr1 and xr2 are the 3 coordinates (in Å) of the beginning and the end of the vector defining the direction of the cone and angle is the angular extent of the cone (in degrees, with 180.0 covering the whole space) (see an example in the data/ directory of the pipsa distribution).
    - command line arg -pr - probe radius, default is 3 Å
    - command line arg -sk - skin thickness, default is 4 Å
    + default values are used if no input given

    The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids.  The potential values outside this skin and outside the comparison region (conical region or regions here) will not be used.
    The following will be computed:
    np1  = no of points of the skin of the protein 1
    np2  = no of points of the skin of the protein 2
    npoi = no of points of the intersection of 2 skins
    aa0  = square of the norm of the grid 1 on its skin
    aa   = square of the norm of the grid 1 on intersection of the skins
    bb0  = square of the norm of the grid 2 on its skin
    bb   = square of the norm of the grid 2 on intersection of the skins
    ab   = scalar product of 2 potentials (on intersection of skins)

     Output:
    - fort.66 - some info about constructed skins
    - standard output has following data in one line:
      si_hodgkin = 2*ab/(aa+bb)
      si_carbo   = ab/sqrt(aa*bb)
      aa
      bb
      ab
      aa0
      bb0
      si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
      si_carbo_shape   = float(npoi)/sqrt(float(np1*np2))
      np1
      np2
      npoi
    (all this information is printed for every part defined in the file "parts")
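How a conical comparison region restricts the grid points can be sketched as below. Several details are assumptions here: that the apex of the cone is at xr1, that the angle is measured between the axis xr1->xr2 and the vector from xr1 to the point (so that 180.0 covers the whole space), and that the seven numbers of an entry may be separated by blanks or commas. The function names are hypothetical.

```python
import math


def parse_part(line):
    """Parse one 'parts' entry into (xr1, xr2, angle)."""
    values = [float(v) for v in line.replace(",", " ").split()]
    if len(values) != 7:
        raise ValueError("expected 3 + 3 coordinates and 1 angle")
    return tuple(values[0:3]), tuple(values[3:6]), values[6]


def in_cone(point, xr1, xr2, angle_deg):
    """True if the point lies within angle_deg of the axis xr1->xr2 (apex at xr1)."""
    axis = [b - a for a, b in zip(xr1, xr2)]
    vec = [p - a for a, p in zip(xr1, point)]
    n_axis = math.hypot(*axis)
    n_vec = math.hypot(*vec)
    if n_axis == 0.0:
        raise ValueError("xr1 and xr2 must differ")
    if n_vec == 0.0:
        return True  # the apex itself is counted as inside
    cos_angle = sum(a * v for a, v in zip(axis, vec)) / (n_axis * n_vec)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= angle_deg
```

Only skin points for which in_cone is true would contribute to the sums for that part.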


    2potsim_skin_spheres.f

    Computes similarity of 2 potential grids on the molecular skin and over a  spherical part of the space.

    Input:
    - command line arg -g1 - the name of the file with potential grid for protein 1 in UHBD format, default grd1.grd
    - command line arg -g2 - the name of the file with potential grid for protein 2 in UHBD format, default grd2.grd
    - command line arg -p1 - the name of the file with atom coordinates of protein 1 in PDB format, default pdb1.pdb
    - command line arg -p2 - the name of the file with atom coordinates of protein 2 in PDB format, default pdb2.pdb
    - command line arg -pa - the name of the file with the list of comparison region centers and extents, default name "spheres" - a list of (up to 999) centers and radii to define spherical regions; the format of the file "spheres" is: xc,radius, where xc is the 3 coordinates (in Å) of the sphere center and radius is the radius of this spherical comparison region (in Å) (see an example in the data/ directory of the pipsa distribution).
    - command line arg -pr - probe radius, default is 3 Å
    - command line arg -sk - skin thickness, default is 4 Å
    + default values are used if no input given

    The program will construct 2 skins (for protein 1 and 2) having thickness "skin" and at distance "probes" from van der Waals surface of the proteins (i.e. from "probes" to "probes+skin" distance), using the points of the potential grids.  The potential values outside this skin and outside the comparison region (sphere or a set of spheres here) will not be used.
    The following will be computed:
    np1  = no of points of the skin of the protein 1
    np2  = no of points of the skin of the protein 2
    npoi = no of points of the intersection of 2 skins
    aa0  = square of the norm of the grid 1 on its skin
    aa   = square of the norm of the grid 1 on intersection of the skins
    bb0  = square of the norm of the grid 2 on its skin
    bb   = square of the norm of the grid 2 on intersection of the skins
    ab   = scalar product of 2 potentials (on intersection of skins)
    amb   = average difference of the potentials a and b: sum(a-b)/npoi
    ambl   = log(sum[exp(a)]/sum[exp(b)])
    ambm  = log(sum[exp(-a)]/sum[exp(-b)])

     Output:
    - fort.66 - some info about constructed skins
    - standard output has following data in one line:
      si_hodgkin = 2*ab/(aa+bb)
      si_carbo   = ab/sqrt(aa*bb)
      aa
      bb
      ab
      aa0
      bb0
      si_hodgkin_shape = 2.*float(npoi)/float(np1+np2)
      si_carbo_shape   = float(npoi)/sqrt(float(np1*np2))
      np1
      np2
      npoi
      amb
      ambl
      ambm
    (all this information is printed for every sphere defined in the file "spheres")
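The three potential difference measures amb, ambl and ambm can be illustrated with a short sketch. It assumes that a and b are the potential values of proteins 1 and 2 at the same npoi points of the comparison region, and it uses the potentials in the exponent exactly as written in the formulas above, without any additional scaling. The function name is hypothetical.

```python
import math


def potential_difference_measures(a, b):
    """Return (amb, ambl, ambm) for potentials a and b on the same points."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    npoi = len(a)
    amb = sum(x - y for x, y in zip(a, b)) / npoi
    ambl = math.log(sum(math.exp(x) for x in a) / sum(math.exp(y) for y in b))
    ambm = math.log(sum(math.exp(-x) for x in a) / sum(math.exp(-y) for y in b))
    return amb, ambl, ambm
```

If potential a is potential b shifted by a constant c at every point, then amb = c, ambl = c and ambm = -c, which is a convenient sanity check.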
    2potsim_skin_spheresNN.f
    Slightly modified 2potsim_skin_spheres.f printing also average potential values for proteins 1 and 2
    2potsim_skin_spheresU.f Slightly modified 2potsim_skin_spheres.f doing analysis on entire skin, when the file spheres is empty
    ccenter.f Computes the geometric center of all atoms in a pdb file
    grid_asc2bin.f Converts grid from GRID ASCII format to UHBD binary format
    mkapbsin.f
    mkuhbdin.f
    mkgridin.f
    Programs to compute the average center of proteins from the output of ccenter.f, which is then used as the center for all interaction potential grids.  Prints dispersion of the centers and size of proteins, this can be used to check the quality of superposition.

    Note that:
    If you use the recent apbs version 0.3.2 or newer, the program mkapbsin.f should be replaced with mkapbs-0.3.2-in.f from the src/ directory and recompiled, because the grid origin writing is changed (corrected) in the later versions of apbs.
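The idea behind ccenter.f and the mk*in.f programs (geometric center per protein, then an average center used for all grids) can be sketched as follows. The sketch assumes standard fixed-column PDB records, counts both ATOM and HETATM records as atoms, and reports the dispersion as the root-mean-square distance of the individual centers from their mean; the programs themselves may define these details differently. The function names are hypothetical.

```python
import math


def geometric_center(pdb_text):
    """Geometric center of all ATOM/HETATM records in PDB-formatted text."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # x, y, z occupy columns 31-38, 39-46 and 47-54 of a PDB record.
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no atom records found")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def average_center(centers):
    """Return (mean center, RMS distance of the centers from the mean)."""
    n = len(centers)
    mean = tuple(sum(c[i] for c in centers) / n for i in range(3))
    rms = math.sqrt(sum(math.dist(c, mean) ** 2 for c in centers) / n)
    return mean, rms
```

A large dispersion relative to the protein size suggests that the structures are not well superimposed.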
    mkdismx.f Converts similarity matrix to distance matrix
    mkkin.f Reads the similarity matrix, computes distances between proteins as defined by their similarity index, and represents proteins as points in 3D space, such that pairwise distances  between each pair of proteins are represented by distances between corresponding points
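A conversion from a similarity matrix to a distance matrix can be sketched as below. The formula D = sqrt(2 - 2*SI) is a conversion commonly used with such similarity indices; that mkdismx.f and mkkin.f use exactly this form is an assumption here and should be checked against the program sources. The function name is hypothetical.

```python
import math


def similarity_to_distance(sim):
    """Convert a square similarity matrix (values in [-1, 1]) to distances.

    Assumed conversion: D = sqrt(2 - 2*SI), so SI = 1 gives 0 and SI = -1 gives 2.
    """
    n = len(sim)
    return [
        [math.sqrt(max(0.0, 2.0 - 2.0 * sim[i][j])) for j in range(n)]
        for i in range(n)
    ]
```

With this form, identical proteins are at distance 0, uncorrelated ones at sqrt(2), and fully anticorrelated ones at 2.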
    modeller2grin.f The program to rename some atom names from MODELLER output to the names readable by the program grin of GRID
    npotsim.f Drives similarity index calculations with 2potsim*
    Input:
    - command line arg -pg - similarity calculation program name: 2potsim_noskin, 2potsim_skin or 2potsim_skin_parts, default is ../bin/2potsim_noskin
    - command line arg -fp - the directory where pdb files are located, default is ../pdbs
    - command line arg -fn - the name of the file with the names of proteins, each of which should have corresponding PDB file in the directory ../pdbs/, and corresponding potential file in ./ , default is "names"
    - command line arg -lg - the name of the similarity matrix file, default is "sims.log"
    - command line arg -pr - the value of probe radius, default is 3 Å
    - command line arg -sk - the value of skin thickness, default is 4 Å
    - command line arg -pa - the name of the file with the list of directions, default name "parts" - a list of directions and angles to define conical parts

    Note that:
    + the program implies that it is executed in the directory, where grid files are located and expects pdb files to be located in the directory defined by command line arg -fp
    + all grid files corresponding to protein names in the file "names" must have extension .grd and all pdb files must have extension .pdb
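What such a driver has to do can be sketched by building one 2potsim_skin command line per pair of proteins, using the command line arguments documented above. Whether npotsim.f visits each unordered pair once, as done here, or also processes self-pairs is an assumption, and the function name is hypothetical. Nothing is executed in this sketch.

```python
from itertools import combinations


def pair_commands(names, program="../bin/2potsim_skin", pdb_dir="../pdbs",
                  probe=3.0, skin=4.0):
    """Return one argument list per unordered pair of protein names."""
    commands = []
    for n1, n2 in combinations(names, 2):
        commands.append([
            program,
            "-g1", f"{n1}.grd",  # grid files are expected in the current directory
            "-g2", f"{n2}.grd",
            "-p1", f"{pdb_dir}/{n1}.pdb",
            "-p2", f"{pdb_dir}/{n2}.pdb",
            "-pr", str(probe),
            "-sk", str(skin),
        ])
    return commands
```

For N proteins this gives N*(N-1)/2 invocations, each printing one line of indices that would be collected into the similarity matrix file.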

    n1potsim.f Drives similarity index calculations with 2potsim* when 1 extra protein is to be added to the set of originally processed proteins

    Input:
    - command line arg -pg - similarity calculation program name; 2potsim_noskin, 2potsim_skin or 2potsim_skin_parts, default is ../bin/2potsim_noskin
    - command line arg -fp - the directory where pdb files are located, default is ../pdbs
    - command line arg -fn - the name of the file with the names of original set of proteins, each of which should have corresponding  PDB file in the directory ../pdbs/, and corresponding potential file in ./ , default is "names"
    - command line arg -p1 - the name of the protein to be added, should have PDB file in ../pdbs/, and the potential file in ./
    - command line arg -lg - the name of the similarity matrix file, default is "sims.log"
    - command line arg -pr - the value of probe radius, default is 3 Å
    - command line arg -sk - the value of skin thickness, default is 4 Å
    - command line arg -pa - the name of the file with the list of directions,  default name "parts" - a list of directions and angles to define conical parts

    Note that:
    + the program should be executed in the directory where grid files are located and expects pdb files to be located in the directory defined by command line arg -fp
    + all grid files corresponding to protein names in the file "names" must have extension .grd and all pdb files must have extension .pdb
    + the file "names" after execution of this program will be renamed to "names-old" and a new file "names" will be created, which includes newly added protein name
    + the old similarity matrix file will be renamed to "sims.log-old" and new file  "sims.log" (or any other name given after -lg) will be created which has indices for the added protein


    nm1potsim.f
    Simple operation: removes a given protein from pipsa analysis.
    This is done by reading the protein's name, removing it from the list (file "names") and removing the entries related to it from the similarity matrix ("sims.log")

    Input:
    - command line arg -fn - the name of the file with the names of original set of proteins, each of which should have corresponding PDB file in the directory ../pdbs/, and corresponding potential file in ./, default is "names"
    - command line arg -p1 - the name of the protein to be removed, its potential in ./ and pdb file in ../pdbs are not removed and not used.
    - command line arg -p  - the same as above
    - command line arg -lg - the name of the similarity matrix file, default is "sims.log" - will be modified

    Note that
    + the program implies that it is executed in the directory, where grid files are located and expects pdb files to be located in ../pdbs/ subdirectory
    + the file "names" after execution of this program will be renamed to "names-old" and a new file "names" will be created, which does not have the removed protein name
    + the old similarity matrix file will be renamed to "sims.log-old" and a new file "sims.log" (or any other name given after -lg) will be created, which no longer has any entries related to the removed protein
    + note that the grid and pdb files will not be removed.  These need to be either removed separately, or replaced by new versions, if subsequently a new version of a protein is supposed to be added
    qdipsim.f Computes pairwise electrostatic similarity of a list of proteins, based on their monopole and dipole moments 

    Input: 
    - command line arg -fn - the name of the file with the names of proteins, each of which should have corresponding PDB file in the current directory, default is "pdbnames"
    - command line arg -fd - the name of the file where dipole moment information will be written, default name "dipoles" 
    - command line arg -r - the size of the proteins, approx average gyration radius, default value 9.815 Å, valid for PH domains

    The program will assign formal charges to all charged residues (+0.5 e for NHX of Arg, +1 for NZ of Lys, -0.5 for OEX of Glu and ODX of Asp) and compute the monopole and dipole moments of the protein.
    The similarity index is then computed following the analytical formula (6) from the Proteins paper, i.e. comparing monopole+dipole potentials at the sphere of some radius R.  Default value of R is 9.815, i.e. parameter alpha (coded as scf) is 17 Å**-2. 

    The following quantities are computed:
    aa   = square of the norm of the dipole+monopole potential of protein 1
    bb   = square of the norm of the dipole+monopole potential of protein 2
    ab   = scalar product of 2 dipole+monopole potentials

     Output:
    - file "dipoles" where the following information about proteins are printed:
      the name of pdb-file
      total charge, the norm of the dipole moment, 3 (x,y,z) components of the dipole moment, number of charge sites 
    - standard output has following data in one line:
      si_hodgkin = 2*ab/(aa+bb)
      si_carbo   = ab/sqrt(aa*bb)
      aa
      bb
      ab
      aa
      bb
      npoi
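The charge assignment and the moments written to the file "dipoles" can be illustrated with the sketch below. It reads "NHX", "OEX" and "ODX" as the atoms NH1/NH2, OE1/OE2 and OD1/OD2, and it takes the dipole moment relative to a caller-supplied origin; both points are assumptions, as are the function and variable names. The analytical similarity formula itself is not reproduced here.

```python
import math

# Formal charges (in e) quoted in the text; the expansion of "X" to 1 and 2
# is an assumption.
FORMAL_CHARGES = {
    ("ARG", "NH1"): 0.5, ("ARG", "NH2"): 0.5,
    ("LYS", "NZ"): 1.0,
    ("GLU", "OE1"): -0.5, ("GLU", "OE2"): -0.5,
    ("ASP", "OD1"): -0.5, ("ASP", "OD2"): -0.5,
}


def monopole_dipole(atoms, origin=(0.0, 0.0, 0.0)):
    """Return (total charge, |dipole|, dipole (x, y, z), number of charge sites).

    atoms: iterable of (residue_name, atom_name, (x, y, z)) with coordinates in Å.
    """
    total = 0.0
    dipole = [0.0, 0.0, 0.0]
    n_sites = 0
    for res_name, atom_name, xyz in atoms:
        q = FORMAL_CHARGES.get((res_name, atom_name))
        if q is None:
            continue  # uncharged atom, not a charge site
        total += q
        n_sites += 1
        for i in range(3):
            dipole[i] += q * (xyz[i] - origin[i])
    return total, math.hypot(*dipole), tuple(dipole), n_sites
```

For a protein with non-zero total charge the dipole moment depends on the chosen origin, so the same origin convention has to be used for all proteins being compared.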

    smNextopred.f
    Reads sims.log and exp data, derives a regression from known kinetic parameters (rate ratio vs ep difference) and predicts unknown kinetic parameters.
    Input
    - command line arg -fl - the name of pipsa similarity log file, default is sims.log
    - command line arg -fe - the name of the file with experimental data, where in one line the protein name is followed by the experimental value when available, and by nothing or 0.0 when it needs to be predicted
    - command line arg -fo - the name of the output file with correlation between log(k1/k2) and (ep1-ep2), default is smNex2cor.out
    - command line arg -fp - the name of the output file with all predictions for each case, default is smNextopred.pre
    - command line arg -sn - the ep difference measure to be used:
     1 - difference of average ep : av(ep1)-av(ep2)
     2 - log of ratio of exp(ep)  : log (sum(exp(ep1))/sum(exp(ep2)))
     3 - log of ratio of exp(-ep) : log (sum(exp(-ep1))/sum(exp(-ep2)))
    default is 1
    - command line arg -rc - user-defined correlation coefficient; by default it is derived by correlating the known cases. The correlation is Drate = rc*Dpotential, i.e. rc describes how much the rate changes per 1 kcal/mole change in potential
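
    The three ep difference measures and the relation Drate = rc*Dpotential can be sketched as follows. This is an illustration under stated assumptions and not the smNextopred source: "ep" is taken to be a list of potential values per protein, and rc is estimated by a least-squares fit through the origin, which is one plausible reading of "correlating known cases". All function names are hypothetical.

```python
import math


def ep_difference(ep1, ep2, sn=1):
    """Difference measure between two sets of ep values, selected by -sn."""
    if sn == 1:  # difference of averages
        return sum(ep1) / len(ep1) - sum(ep2) / len(ep2)
    if sn == 2:  # log of ratio of sums of exp(ep)
        return math.log(
            sum(math.exp(v) for v in ep1) / sum(math.exp(v) for v in ep2)
        )
    if sn == 3:  # log of ratio of sums of exp(-ep)
        return math.log(
            sum(math.exp(-v) for v in ep1) / sum(math.exp(-v) for v in ep2)
        )
    raise ValueError("sn must be 1, 2 or 3")


def fit_rc(d_potential, d_rate):
    """Least-squares slope through the origin for Drate = rc * Dpotential.

    Assumption: no intercept term, because the relation in the text has none.
    """
    sxx = sum(x * x for x in d_potential)
    sxy = sum(x * y for x, y in zip(d_potential, d_rate))
    return sxy / sxx


def predict_d_rate(rc, d_potential):
    """Predicted log(k1/k2) for a pair with a known ep difference."""
    return rc * d_potential


# Hypothetical values, for illustration only.
rc = fit_rc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(rc, predict_d_rate(rc, ep_difference([1.0, 3.0], [0.0, 2.0], sn=1)))
```

    Passing -rc on the command line corresponds to skipping the fit_rc step and using a fixed slope.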

    Output
    - Standard out - prediction results with errors

    uhbd_asc2bin.f  Converts grid from UHBD ASCII format to UHBD binary format
    whatif2uhbd.f Converts the WHATIF output file of protein coordinates to the format readable by UHBD.  Note that the current version correctly treats only single-chain proteins, i.e. the program needs to be modified if proteins with more than one chain are to be analysed. 




    Auxiliary programs and scripts
     
    addlinks.pl  Adds a link before every line that has '(.*) show' in a .ps file, so that when the .ps files are converted to .pdf files, the resulting documents contain clickable links.
    do_pipsa_UHBD2APBS  pipsa 2.0 script to compare UHBD grids and APBS grids (assumes that the grids have the same spacing, origin and size)
    do_pipsa_uho2pqr  Script to convert UHBD output to pqr
    getnames.sh  Downloads names, long names, gene definitions from the SWISS-PROT database for the proteins
    delphi2uhbd.f
    Converts a DELPHI output grid to a UHBD ASCII format grid, which can then be handled in the same way as an APBS output grid (i.e. converted to binary and used in similarity analysis).  See the program header for compilation instructions.
    gridinfo
    gridinfo.f 
    Gets information from the UHBD grid
    grid2insight
    grid2insight.f
    Converts GRID/UHBD format grid file to InsightII readable file
    highlight_kin_groups
    highlight_kin_groups.f 
    Highlights specified groups of proteins in different colours
    highlight_kin_points
    highlight_kin_points.f 
    Highlights specified proteins in red
    insightII.HydrSurface.in  Input file for InsightII used to produce Hydrophobic Surfaces in .wrl, virtual reality modeling language format.
    insightII.electSurface.in  Input file for InsightII used to produce Electrostatic Surfaces in .wrl, virtual reality modeling language format.
    malign3d.2.sh  Aligns the sequences to the TEMPLATE.pdb sequence, using modeller and modeller.2.in script
    malign3d.pl  Runs modeller 6 to create 3D multiple alignment of all the proteins in pdbs directory
    malign3d.sh  Aligns the sequences to the TEMPLATE.pdb sequence, using whatif
    malign3d.whatif.in  Script used by malign3d.sh
    mkElecSurface.sh  Creates .wrl files that represent electrostatic surfaces. UHBD calculations should be done before the script is executed.
    mkHydrSurface.sh  Creates .wrl files that represent hydrophobic surfaces. GRID calculations should be done before the script is executed.
    mktree.sh Creates the tree from the similarity matrix - prototype to do_pipsa_phylip
    modeller.2.in  Script used by modeller 4 in the malign3d.2.sh program
    orient.sh  Aligns the proteins using the ORIENT command of modeller. _Very_ inaccurate.
    pqr2qcd
    pqr2qcd.f 
    Converts PQR format to UHBD readable QCD format
    uho2pqr
    uho2pqr.f 
    Converts UHBD output to pqr




    Some calculation parameters:

    Some parameters defining interaction potentials can be changed to adjust the calculations to the case studied.  For example, interaction potential grid dimensions might need to be increased for a set of larger proteins; different probes can be used in GRID calculations.
     

    Parameter | Parameter name (default) | Where it can be changed | Where used
    Grid maximal dimension | im_max (110) | src/maxdim.inc | 2potsim*, grid_asc2bin, uhbd_asc2bin
    Grid dimensions | dime (65 65 65) | pipsa_wrk_dir/apbs/apbs.in after executing do_pipsa_APBS_prep script | APBS
    Grid spacing | glen/(dime-1) (1.5) | pipsa_wrk_dir/apbs/apbs.in after executing do_pipsa_APBS_prep script | APBS
    Probe for dielectric surface | srad (0.0) | pipsa_wrk_dir/apbs/apbs.in after executing do_pipsa_APBS_prep script | APBS
    Grid dimensions | dim (65) | pipsa_wrk_dir/uhbd/uhbd.in after executing do_pipsa_UHBD_prep script | UHBD
    Grid spacing | spa (1.5) | pipsa_wrk_dir/uhbd/uhbd.in after executing do_pipsa_UHBD_prep script | UHBD
    Probe for dielectric surface | nmap probe_radius_value (-) | pipsa_wrk_dir/uhbd/uhbd.in after executing do_pipsa_UHBD_prep script | UHBD
    Ionic strength | - (50) | scr/do_pipsa_APBS_prep, scr/do_pipsa_UHBD_prep | APBS, UHBD
    PROBE name | - (PO4) | scr/do_pipsa_GRID_prep | GRID
    GRID calculation parameters, for example grid dimension | imax (65) | pipsa_wrk_dir/grid/grid.in/grin.in after executing do_pipsa_GRID_prep script | GRID
    Maximal number of proteins | nprmx (999) | src/maxdim.inc | kapbsin, mkdismx, mkkin, mkuhbdin, n1potsim, npotsim, qdipsim
    Maximal number of atoms per protein | namx (999) | src/maxdim.inc | ccenter
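
    When the grid is enlarged for a set of larger proteins, the APBS grid length has to stay consistent with the dimension and spacing, since the spacing is listed above as glen/(dime-1). The sketch below works through this arithmetic for the default values. It also checks a chosen dimension against im_max, assuming that im_max is the per-axis limit compiled into the programs via src/maxdim.inc. The function names are hypothetical.

```python
def grid_length(dime, spacing):
    """Grid length along one axis, from spacing = glen / (dime - 1)."""
    return spacing * (dime - 1)


def fits_compiled_limit(dime, im_max=110):
    """Assumption: im_max is the largest per-axis dimension the programs accept."""
    return dime <= im_max


# Default values from the table: 65 points per axis, spacing 1.5.
print(grid_length(65, 1.5))      # 96.0
print(fits_compiled_limit(65))   # True
print(fits_compiled_limit(129))  # False: im_max would need to be raised
```

    With the defaults, the grid spans 96 Å along each axis. A dimension above the compiled limit would require changing im_max in src/maxdim.inc and recompiling the programs that use it.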

