Protein Interaction Property Similarity Analysis

Example of the usage of the program

Running the usage example
Computation options
The directory tree of PIPSA distribution
Timing
References

Nine protein models of PH domains are compared on the basis of their electrostatic potentials. This is a subset of the 104 PH domains analyzed in an earlier paper¹. All model structures are available elsewhere². PIPSA performs an automated comparison, which shows that there are 2 clusters: one formed by proteins having a mostly positive potential, and the other formed by proteins having negative potentials. In this case, the classification could be done by simply looking at electrostatic potential contours. However, PIPSA does the classification in an automated, objective and quantitative fashion, and allows more subtle classification, as found in the full 104 domain dataset¹. Indeed, PIPSA shows that for this 9 protein model example, the cluster of proteins having positive potentials can be further divided into two subclusters.

Running the usage example:

1.    Unzip and untar the PIPSA distribution (the resulting directory tree is described in the section "The directory tree of PIPSA distribution").
2.    Put your pdb files in the directory pdbs/. For the example run, copy the 9 pdb files from the directory pdbs_example/ to the directory pdbs/.
3.    Go to the directory source/, run the script do_pipsa and choose the calculation mode (1, 2 or 3).
4.    Collect the results (3 files: sims.log, sims.kin and sims.mat) in the corresponding directory (pdbs/, uhbd/ or grid/ for options 1, 2 and 3, respectively).
5.    Display the proteins clustered according to the similarity of their interaction properties with the kinemage program, using the command "mage sims.kin".
6.    Cluster the proteins according to the similarity matrix sims.mat by using the command nmrclust.
Note.    Executables in the distribution were compiled under Unix (SGI IRIX 6.5) (see the file Makefile in the source/ directory).
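
Step 2 of the list above can also be scripted. The following is a minimal sketch, not part of the PIPSA distribution: the function name copy_example_pdbs is hypothetical, and it assumes that the example structures carry a ".pdb" extension, which this document does not state. Other files in pdbs_example/ (such as the qdipsim results) are left where they are.

```python
import shutil
from pathlib import Path


def copy_example_pdbs(pipsa_root, pattern="*.pdb"):
    """Copy the example structures from pdbs_example/ into pdbs/.

    pipsa_root is the top directory of the unpacked distribution.
    The "*.pdb" pattern is an assumption about how the files are named.
    Returns the sorted list of file names that were copied.
    """
    root = Path(pipsa_root)
    source_dir = root / "pdbs_example"
    target_dir = root / "pdbs"
    target_dir.mkdir(exist_ok=True)  # pdbs/ normally exists already

    copied = []
    for pdb_file in sorted(source_dir.glob(pattern)):
        shutil.copy2(pdb_file, target_dir / pdb_file.name)
        copied.append(pdb_file.name)
    return copied
```

For the example run, calling copy_example_pdbs(".") from the top directory of the distribution should report 9 file names if the naming assumption holds.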

Computation options:

    Option "1. Analytical estimate" does not require any additional programs to be installed. It computes the electrostatic similarity matrix based on the monopole+dipole representation of proteins. The results of the similarity analysis are in the files:
pdbs/sims.log - log-file, containing the similarity matrix, created by running qdipsim.
pdbs/sims.kin - kinemage file for presenting proteins as points in 3D;
pdbs/sims.mat - matrix of pairwise distances between proteins, based on their similarity.
    Option "2. Similarity of electrostatic potentials, computed using UHBD" may be chosen only if you have the UHBD executable located at source/uhbd and the WHATIF program installed, so that it can be executed by typing whatif on the command line. It computes electrostatic potential grids using the UHBD program to solve the finite-difference Poisson-Boltzmann equation (FDPBE), writes the electrostatic potentials to files and computes the similarity matrix based on these electrostatic potentials. The results of the similarity analysis are in the files:
uhbd/sims.log - log-file, containing the similarity matrix, created by 2potsim_skin.
uhbd/sims.kin - kinemage file for presenting proteins as points in 3D;
uhbd/sims.mat - matrix of pairwise distances between proteins, based on their similarity.
    For this example case, you can skip the WHATIF calculations, in which case the necessary pdb files will simply be copied from the directory whatif_example/ of the PIPSA distribution. The script will prompt you about this possibility.
    For the example case you can also skip the UHBD calculations if you download a zip file of the electrostatic potential grid files (8220 KB) and unzip it in the pipsa directory, so that the grid files needed for the similarity calculations are in the directory uhbd_example/. The script will prompt you about this possibility.
    Option "3. Similarity of probe interaction fields, computed using the program GRID" may be chosen only if you have the GRID executables grin and grid located at source/grin and source/grid. It computes molecular interaction field grids for small chemical probes (the PO₄²⁻ ion by default) using the GRID program and computes the similarity matrix based on these interaction fields. The results of the similarity analysis are in the files:
grid/sims.log - log-file, containing the similarity matrix, created by 2potsim_skin.
grid/sims.kin - kinemage file for presenting proteins as points in 3D;
grid/sims.mat - matrix of pairwise distances between proteins, based on their similarity.

    The files sims.kin are kinemage files to be visualized by MAGE³ ("mage sims.kin" if your MAGE executable is mage). The files sims.mat may be used as a distance matrix for the program NMRCLUST⁴. For that, after executing nmrclust, answer "no" to the question "Use a PDB file for input?", and enter sims.mat as a Matrix filename.
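
The prerequisites listed for options 1 to 3 above can be checked before starting do_pipsa. The sketch below is an illustration only and is not shipped with PIPSA; the function name available_options is hypothetical. It looks for the executables at the locations named in this section (source/uhbd, source/grin, source/grid) and for whatif on the command search path.

```python
import os
import shutil
from pathlib import Path


def available_options(pipsa_root):
    """Return the do_pipsa option numbers whose prerequisites appear to be met."""
    source = Path(pipsa_root) / "source"

    def is_executable(path):
        # True only for an existing regular file with execute permission.
        return path.is_file() and os.access(path, os.X_OK)

    options = [1]  # option 1 needs no additional programs

    # Option 2: UHBD executable at source/uhbd and whatif on the command line.
    if is_executable(source / "uhbd") and shutil.which("whatif") is not None:
        options.append(2)

    # Option 3: GRID executables at source/grin and source/grid.
    if is_executable(source / "grin") and is_executable(source / "grid"):
        options.append(3)

    return options
```

Note that this check is stricter than necessary for the example case, where the WHATIF and UHBD calculations can be skipped as described above.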

The directory tree of PIPSA distribution:

doc/     - documentation;
pdbs/    - the directory to keep original data - pdb files of (superimposed) proteins;
source/ - the directory with all scripts and programs to use;
pdbs_example/   - has the pdb files of the usage example and the results of running qdipsim;
grid_example/   - copy of the directory grid/ obtained after running the PIPSA demo, with grid files removed;
whatif_example/ - copy of the directory whatif/ with pdb files for electrostatic computations, obtained after running the PIPSA demo;
uhbd_example/   - copy of the directory uhbd/ obtained after running the PIPSA demo, with grid files removed.

The following 3 directories will be created by the main script "do_pipsa":
grid/ - the directory to store interaction field grids computed by the GRID program;
whatif/ - the directory to store the pdb files prepared for electrostatic computations;
uhbd/ - the directory to store electrostatic potential grids and perform similarity computations.
These 3 directories and the directory pdbs/ were renamed to the corresponding *_example/ directories after a test run of the script do_pipsa, and all grid files were deleted.

Timing:

Timing for the example case:

1. 42 min to compute n=9 GRID (m=65)^3 grids
2. 2 min to compute n^2 similarity indices
3. 3 min to compute n=9 (m=65)^3 electrostatic potential grids
4. 2 min to compute n^2 similarity indices

Steps 1 and 3 are proportional to ~ n*m^3,
steps 2 and 4 are proportional to ~ n^2,
i.e. expect step 1 to take 420 min and step 2 to take 200 min
when doing the same for n=90 pdb files.
References:

¹ N. Blomberg, R. R. Gabdoulline, M. Nilges and R. C. Wade. Classification of protein sequences by homology modeling and quantitative analysis of electrostatic similarity. Proteins: Structure, Function, and Genetics, 37:379-387 (1999).
² http://www.EMBL-Heidelberg.DE/~blomberg/PHdomains/pdbfiles/
³ MAGE: copyright © 1998 by David C. Richardson, Little River Institute, 5820 Old Stony Way, Durham NC 27705; dcr@kinemage.biochem.duke.edu; Biochemistry Dept., Duke University, North Carolina 27710, USA
⁴ Lawrence A. Kelley, Stephen P. Gardner and Michael J. Sutcliffe. An Automated Approach For Clustering An Ensemble Of NMR-Derived Protein Structures Into Conformationally-Related Subfamilies. Protein Engineering 9, 1063-1065 (1996).

Razif Gabdoulline, January 2000
