s d a

simulation of diffusionalassociation of proteins
(version 4.23)
EMBL, Heidelberg
February 2000

Introduction
The distribution
Input
Output
Acknowledgements
sda usage references

Introduction

Association:

The most ubiquitous event in protein function.
Not only the event itself but its speed is relevant for function.
Association rates can currently be measured by a variety of methods: from stopped flow fluorescence to BIAcore commercial devices. The determination procedure involves a variety of assumptions and the results from different methods are not always the same.

Diffusion:

The diffusion places an upper limit on the speed of association - the reaction cannot occur faster than the time it takes to bring the "reactive patches" of 2 proteins together by diffusion.
The diffusion controlled limit is 10¹⁰ M^-1 s^-1 - if uniformly reactive spheres associate under normal conditions. It is regulated, however, by the size of the reactive patches and the interactions between proteins present and is often on the order of 10⁵M^-1 s^-1.

Simulations:

provide checks of the model of proteins used to represent them;
permit tuning of the force field parameters to describe interprotein interactions;
allow the design of proteins with desired association properties, which could enable control of many cell functions.

What sda does:

sda computes the rates of diffusional association of 2 proteins given the atomic structure of the bound complex of the 2 proteins. It uses UHBD computed electrostatic potentials and ecm generated charges of proteins. While ecm is included in the sda distribution, the UHBD executable should be obtained separately.
compiles and runs under SGI IRIX 6.5 (O2, Octane, Origin2000), Digital UNIX 4.0 (DEC Alpha) and Linux 2.0 (PC).

What sda does not do:

Errors due to inaccuracies in modeling of the bound complex, either in terms of location of the binding site or in terms of protein conformation are not estimated
No estimates of association rate can be made when the association is NOT diffusion limited.

The distribution

source files - fortran source code;
ecm program suite to compute effective charges of proteins;
auxiliary programs;
Examples of Brownian dynamics simulation of barnase and barstar association;
documentation (this file).

Input

-------- SDA input file --- dseed
253.d0
--------------------------- nrun, nprint, iwrite
1000, 500, 0
--------------------------- probes, thres, hexcl, dsfactor
2.0, 1. 0.5, 1.62
--------------------------- pdb1f, pdb2f
a.pdb
s.pdb
--------------------------- po1f, po2f
ep1.grd
ep2.grd
--------------------------- qef1f, qef2f
a.echa
s.echa
--------------------------- rxna1f, rxna2f
a_rxnaP
s_rxnaP
--------------------------- ic1fix,xc1, ic2fix,xc2
1, 20.310, 43.892, 12.113
1, 38.835, 34.293, 1.266
--------------------------- icomm, iforce, dind, aiostr
1, 1, 6.0, 50.
--------------------------- dm, dr, drI, irot2
0.030, 4.5e-5, 4.0e-5, 1
--------------------------- b, c
100., 500.
--------------------------- rboost, novers
1.0, 150, 5.
--------------------------- dt1, swd1, dt2, swd2, rswd
2.0, 40., 20., 80., 80.
--------------------------- win0, dwin, nwin, iwrec, nwrec
3.00, 0.25, 32, 0, 2
--------------------------- dspo1f, dspo2f
ds1.grd
ds2.grd

The description of each input variable follows:

dseed - random number generator seed. Defines the sequence of random numbers used to model the Brownian dynamics. Different numbers result in different sampling and trajectories.

nrun, nprint, iwrite - the number of Brownian dynamics runs, frequency of printout of reaction events, and the number of the trajectory to write.

probes, thres, hexcl, dsfactor - the probe radius used to represent the atoms of the second protein and the threshold value for atomic accessibility. The atoms of protein 2 having accessibilities less than thres are not used for determinimg molecular exclusion in the simulations. hexcl is the spacing for the exclusion grid of the protein 1. dsfactor is a factor to multiply the desolvation penalty grid (see the description of the desolvation term).

pdb1f, pdb2f - the names of files with atomic coordinates of proteins 1 and 2, in pdb format ('ATOM',a24,3f8.3).

po1f, po2f - the names of files with electrostatic potentials of proteins 1 and 2. UHBD binary format.

qef1f, qef2f - the names of files with charges for proteins 1 and 2. ('ATOM',24x,3f8.3,f15.8).

rxna1f, rxna2f - the names of files with reaction atom coordinates for proteins 1 and 2 in pdb format. ('ATOM',24x,3f8.3).

ic1fix, xc1 - 2 lines, the integer on the first line is 1 or 0, the second line defines the coordinates of the center of the first protein supplied by the user. These coordinates are used by the program when ic1fix=1, and recomputed when ic1fix=0.

ic2fix, xc2 - the same as above for the second protein center.

icomm - switches the encounter complex modeling options. icomm=1-3 deal with 3 different ways to count reaction events: icomm=1 counts RMSD reaction conditions (Fig 1),

icomm=2 interresidue contact pairs (Fig 2)

icomm=3 counts independent contact pairs among all possible donor-acceptor pairs formed between the 2 proteins. The pairs are defined by a maximal allowed (in order to interact) distance between the atoms belonging to different proteins. Independence is defined by a minimal allowed (in order to be independent) distance between the atoms of the same protein. The figure below (Fig. 3) shows schematically the selection of 5 independent pairs (black lines) among 22 interprotein donor (red) - acceptor (blue) pairs. In this case, (only) the value of dind (== d_min) will be used to define the independence of pairs.

Note that more reaction criterion atoms and more sophisticated reaction criteria imply longer computation times, e.g. sda computations may take 3 times longer when option icomm=3 with 88 reaction criterion atoms is chosen compared to using option icomm=2 with 7 reaction criterion atoms.

iforce=0 switches off all the interaction forces except the exclusion forces. iforce =1 is the standard option when the forces are computed using the potential grid of protein 1 (2) and the charges of protein 2 (1). iforce =2 does not use the potential grids at all but computes the forces using the charges alone - this is efficient only when the number of charges is less than 10. Charge-charge treatment is useful when a large executable is not affordable as a separately compiled small executable can be used. For iforce =2, it is necessary to give a correct value of aiostr == ionic strength of the solvent in mM.

dm, dr, drI , irot2f - 3 diffusion coefficients: relative translational, rotational for protein 2, rotational for protein 1. irot2f=0 switches off the modeling of rotation of the first protein. dm is in Å²/ps, dr and drI are in radian²/ps.

b, c - the distances at which the trajectories are started and truncated, respectively (Fig. 4)

rboost, novers - after novers rejections of Brownian dynamics moves, the proteins are taken apart from one another by a distance rboost. The number of boosts is reported in the output log. Choice novers ~100 usually does not need any boosts, unless the interactions are too strong due to use of an unrealistically small probe size. Many boosts may indicate erroneous input (uncorrelated pdb and grid etc).

dt1, swd1, dt2, swd2, rswd - time step parameters. dt1 is the smallest time step to be used, it is automatically used when the proteins are close to overlap. The next 3 parameters define the time step profile as in Figure 4:

The decreasing of time step to 0 near the c-surface is done automatically to reduce artifacts due to trajectory truncations. Although the profile of the time step influences the computed rates, the most important parameter is dt1. Take care not to choose this parameter too small (since the computing time scales almost linearly by it). The physical parameter sqrt(dt*dm) measures an average Brownian dynamics step length. This must be less than 1 Å, which is a characteristic distance of every protein surface roughness since the atomic bond lengths and van der Waals radii define this scale. Moreover, it should be less than the characteristic distance over which the interaction forces vary substantially, which means that the optimal value is influenced by interaction force strength: one needs smaller time steps for stronger interacting proteins. The rotations are accumulated when the time step is small to make the simulations faster. The rotation is carried out only when the accumulated random component of rotation is ~ 3^o at small interprotein separations and ~90^o at separations larger than rswd (see figure).

win0, dwin, nwin - the reaction complex will be recorded for each of nwin contacts (or RMSD distances) d_i, i=1,nwin. d_i=win0+(i-1)*dwin. If iwrec.ne.0 then the positions of the second protein will be written to file fort.55, when 2 contacts at distance d_nwrec are satisfied.

dspo1f, dspo2f - the names of files with charge desolvation penalties of proteins 1 and 2. UHBD binary format. Charge desolvation free energies are computed by a separate program in sda distribution, mk_ds_grd.f. The charge desolvation penalty for protein 1 is the sum of desolvation penalties of each charge of the protein 1, i=1,...; the desolvation penalty of each charge is the sum of desolvation penalties due to the low dielectric cavity of each atom of protein 2, i.e. the final expression giving the desolvation penalty for protein 2 due to the presence of protein 1 is:

Here

and

are the dielectric constants of the solvent and the protein, respectively; k - is the Debye-Hueckel parameter, q_i is the value of the i-th charge on the protein 2, a_j - radius of the j-th atom of the protein 1, a is the sum of the average radius of the atoms of the protein 1 incremented by the ionic radius of salt (~1.7+1.5=3.2 Å). This description is approximate and depends on the treatment of the interior dielectric of the protein and the method to define the dielectric boundary between the protein and solvent. Meaningful values are 1.62 when the protein interior dielectric is 4 and the dielectric boundary between the protein interior and high dielectric solvent is defined as van der Waals surface of the protein. The value of ca 4.0 would approximate the desolvation penalties when the molecular surface of the protein is used as a dielectric boundary, although in this case the description of charge desolvation via precalculated grids is much less accurate.

Output

The file fort.55 contains close 2-contact complexes defined by iwrec and nwrec (close complexes are written if nwrec=1), or/and a complete trajectory whose number is given by iwrite (the trajectory is written only if 0<iwrite<nrun+1).

The file fort.66 contains the length of each trajectory.

The file fort.77 contains the values of electrostatic potentials of protein 1 and protein 2 along 5 selected lines in 3-dimensional space; the values of the desolvation penalty grid along the same 5 lines; a section through the exclusion grid in ascii format in the plane passing through the center of the protein 1; reaction atom information.

The file fort.99 lists occurrence of reaction atom contacts. The first record is for the number of contacts observed at each specified contact distance for each reaction atom pair. The second record gives the number of contacts counted only once per Brownian dynamics run.

Since the above files are overwritten each program run, one needs to rename them after the execution if they are needed.

The file runs.done has the number of runs performed so far. The information is written every 50 runs and is useful when the printout frequency is low and the machine writes to files with buffering.

Standart output (fort.6) has the following.

The input parameters are echoed.

Each nprint runs, the number of runs done and the ratio of reaction events at every reaction distance are printed. This would allow restoration of the main results (association rate) in the case that the program is interrupted before it finishes.

At the end of the simulations, the statistics over all runs are computed and printed out, these are:

number of Brownian dynamics steps during all simulations;
the duration of simulations in ps;
above value scaled per run;
number of overlaps (when the step is repeated) that occurred;
number of boosts;
number of large moves - these last 2 numbers are for checking if the dynamics was modeled reasonably.

Mean residence times are computed and printed at every 1 Å slab of interprotein center-to-center separation. These unsophisticated values measure the strength and extent of interaction as a function of the separation distance.

The main result of simulations - association rates - are printed for every distance window and every number of contact pairs. Typically, the expected association rates are defined by 2 interresidue contacts defined at 6.0-6.5 Å (curves labelled 2-R at the figure below), or by 2 independent contacts at 4.0-4.5 Å. Below are the computed association rates for barnase and barstar. In this case, there were 8-9 interresidue contacts revealed by graphical analysis based on residues making H-bonds in the x-ray structure of the complex. Among 85 donor-acceptor pairs, the number of independent contacts in the x-ray complex appeared to be 7, when both d_min and d_max (see the definitions when icomm=3) were set to 6 Å.

It is necessary to assume too small 1-contact distance in order to predict experimental association rates. In this case, they are 3.5-3.75 Å, i.e. close to the minimal atom-atom distance allowed in the model. This means that a 1-contact description of the encounter complex is impossible. Moreover, the results are sensitive to definition of contact distance. 2-3 contacts in either interresidue or independent contacts model are meaningful. 2 interresidue contacts happen at 6.0-6.7 Å and 2 independent atom-atom contacts - at 4.0-4.5 Å. The left and right interval ends define the contact distances giving the rates equal to experimental (exp) and twice experimental (2*exp) rates, respectively.

The execution time of the program is printed for initialization and simulation parts.

Acknowledgements

The program gained much from reading and experience with the Brownian dynamics programs UHBD and Macrodox, which simulate enzyme-substrate and electron transfer proteins association (respectively).

References

R.R. Gabdoulline, R.C. Wade. (1997) Biophys. J., 72, 1917-1929. Simulation of diffusional association of barnase and barstar.
R.R. Gabdoulline, R.C. Wade. (1998) Methods, 14, 329-341. Brownian dynamics simulation of protein-protein diffusional encounter.
J.D. Madura, J.M. Briggs, R.C. Wade, R. Gabdoulline. (1998) in The Encyclopedia of Computational Chemistry, Schleyer, P.v. R.; Allinger, N. L.; Clark, T.; Gasteiger, J.; Kollman, P. A.; Schaefer III, H. F.; Schreiner, P. R., Eds.; John Wiley & Sons, Chichester, 1998 Brownian dynamics.
R.R. Gabdoulline and R.C. Wade (1999) J. Mol. Recogn., 12, 226-234. On the Protein-Protein Diffusional Encounter Complex.
A.H. Elcock, Gabdoulline,R.R., Wade,R.C. and McCammon,J.A (1999) J. Mol. Biol., 291, 149-162. Computer Simulation of Protein-Protein Association Kinetics: Acetylcholinesterase-Fasciculin.
D. Sept, A.H. Elcock and J.A. McCammon (1999) J.Mol.Biol. 294, 1181-1189. Computer Simulation of Actin Polymerization Can Explain the Barbed-Pointed Asymmetry.

Viel Spass.

Razif Gabdoulline

Rebecca Wade

February, 2000

Privacy Imprint