Figures and Tables

Figure 1

Histograms of molecular weight (a) and experimental binding free energy ∆G (b) for the training sets used to build the final COMBINE models of trypsin, thrombin, and urokinase. The molecular weights of the ligands were distributed between 100 and 650 Da, and the experimental ∆G values (in kcal/mol) spanned a total range of 11 log units, with a maximum of 9 log units for a single training set. The training sets of thrombin and trypsin showed a broader distribution of molecular weight and binding affinity than that of urokinase.
(some more histograms)
Tanimoto Similarity
For the training set used to build the COMBINE model of thrombin, a matrix of Tanimoto similarity values was generated. The Tanimoto similarity is a value between 0 and 1 based on the 2D structure information of two ligands.
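
As an illustration of how such a matrix can be computed, the following is a minimal sketch. It assumes that each ligand is described by a 2D fingerprint given as a set of "on" bit indices; the fingerprint type actually used is not stated here, and the function names `tanimoto` and `similarity_matrix` are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits of both ligands."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """Symmetric matrix of pairwise Tanimoto values for a list of fingerprints."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Made-up fingerprints for three hypothetical ligands.
example_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
example_matrix = similarity_matrix(example_fps)
```

The diagonal of the matrix is always 1 (each ligand compared with itself), and ligands without any shared bits give 0.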

Table 1

The best coefficients of determination (R2) and predictive correlation coefficients (Q2) of the COMBINE models of thrombin, trypsin, and urokinase were tabulated with respect to the number of latent variables (LV). The best Q2-LOO (leave-one-out) and Q2-LTO (leave-two-out) values for thrombin, trypsin, and urokinase were 0.89 (LV5), 0.83 (LV3), and 0.68 (LV4), respectively. For trypsin, variable selection did not improve the model, but for thrombin and urokinase the models could be improved, according to internal cross-validation, by using D-optimal pre-selection (D-opt) and fractional factorial design (FFD) variable selection at LV4. For thrombin, LV4 and LV5 gave nearly the same values; because of the risk of overfitting, the lower number of latent variables was chosen. The COMBINE model of thrombin could be slightly improved by including four highly conserved water molecules in the active site as additional ‘residues’ (X-variables).
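
The difference between R2 and Q2-LOO can be sketched as follows. This is only an illustration: a least-squares line through a single descriptor stands in for the PLS model, Q2 is assumed to be defined as 1 - PRESS/SS with SS taken around the mean of all experimental values, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_ss(ys):
    """Sum of squared deviations of y from its mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r2(xs, ys):
    """R2: every compound is predicted by a model fitted on all compounds."""
    slope, intercept = fit_line(xs, ys)
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - rss / total_ss(ys)


def q2_loo(xs, ys):
    """Q2-LOO: every compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(ys)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_ss(ys)
```

Because each left-out compound is predicted by a model that has not seen it, Q2-LOO cannot exceed R2 for this least-squares stand-in; a large gap between the two is a typical sign of overfitting.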

Figure 2

a) The R2 and Q2 values from internal cross-validation of the different COMBINE models were plotted as a function of the number of latent variables (LV). (For more details, see the legend of Table 1.)
b) Predicted versus experimental binding free energy ∆G in kcal/mol. For the COMBINE models of urokinase and thrombin, four latent variables were chosen, shown before (blue dots) and after (red dots) variable selection. For trypsin, the best model was reached with three latent variables without any variable selection. The R2 values based on the plots are given in the figures and in Table 1.

Figure 3

In the Regression Error Characteristic (REC) curves, the cumulative proportion of ligands was plotted against the error tolerance, i.e. the absolute difference between the experimental and predicted ∆G values. Ligands of the ‘pseudo’ test set were docked ten times into the corresponding receptor models of thrombin, trypsin, and urokinase. For each docking solution a ∆G value was predicted, and the solutions were ranked according to RMSD, GoldS, ∆∆Gexper‑pred (best abs(exper-pred)), ∆∆Gdesolv (best dG bind elec), and ∆Gpred (for more details see Results).

a) The curves show the cumulative distribution of the ∆G values ranked by best RMSD (dotted blue line), GoldS (green line), ∆∆Gexper‑pred (dotted dark red), and ∆∆Gdesolv (brown), and of the 5th ranked ∆G values (red), against the error of prediction in kcal/mol. In addition, curves based on predicted ∆G values of ligand conformations taken from X-ray structures before (blue) and after (purple) variable selection are given.

b) The same ranking was used to plot the cumulative proportion versus the RMSD (in Å) of the docking solutions.
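
The construction of an REC curve as described above can be sketched in a few lines. This is a minimal illustration, assuming that one predicted ∆G per ligand has already been selected by one of the ranking criteria; the function name `rec_curve` and the numbers are made up.

```python
def rec_curve(experimental, predicted, tolerances):
    """For each tolerance, the fraction of ligands whose absolute
    prediction error |experimental - predicted| does not exceed it."""
    errors = [abs(e - p) for e, p in zip(experimental, predicted)]
    n = len(errors)
    return [sum(1 for err in errors if err <= tol) / n for tol in tolerances]


# Made-up dG values in kcal/mol for four hypothetical ligands.
exp_dg = [-10.0, -8.0, -6.0, -9.0]
pred_dg = [-9.5, -8.0, -8.5, -7.0]
curve = rec_curve(exp_dg, pred_dg, tolerances=[0.0, 1.0, 2.0, 3.0])
```

A curve that rises quickly towards 1 means that most ligands are predicted within a small error tolerance. The same function gives the RMSD version of the plot if the list of errors is replaced by the RMSD values of the docking solutions.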


Selectivity for thrombin and trypsin

The 5th ranked predicted ∆G value of the docking solutions for trypsin was divided by the 5th ranked predicted ∆G value of the docking solutions for thrombin to give the predicted selectivity. The predicted selectivity was plotted against the experimental selectivity (experimental ∆G of trypsin / experimental ∆G of thrombin). The inhibitors of the Klebe data set were not used. The five points in the lower right part were not used for calculating R2, because the absolute difference between the experimental and predicted ∆G values for thrombin was greater than 3 log units. Although the prediction for the Klebe data set was quite good, the experimental ∆G values were within the noise of the prediction, so no selectivity prediction could be given.
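
The selectivity ratio and the exclusion rule described above can be sketched as follows. This is a minimal illustration: the dictionary keys, the function name `selectivity_points`, and the numbers are hypothetical, and the predicted values are assumed to be the 5th ranked predicted ∆G values already.

```python
def selectivity_points(ligands, max_error=3.0):
    """Return (experimental, predicted) selectivity pairs (trypsin / thrombin).

    Ligands whose thrombin prediction deviates from experiment by more
    than max_error are skipped, mirroring the exclusion described above.
    """
    points = []
    for lig in ligands:
        if abs(lig["exp_thrombin"] - lig["pred_thrombin"]) > max_error:
            continue
        exp_sel = lig["exp_trypsin"] / lig["exp_thrombin"]
        pred_sel = lig["pred_trypsin"] / lig["pred_thrombin"]
        points.append((exp_sel, pred_sel))
    return points


# Made-up values for two hypothetical ligands.
example_ligands = [
    {"exp_trypsin": -8.0, "exp_thrombin": -10.0,
     "pred_trypsin": -9.0, "pred_thrombin": -9.0},
    {"exp_trypsin": -7.0, "exp_thrombin": -12.0,
     "pred_trypsin": -7.5, "pred_thrombin": -8.0},  # thrombin error of 4: skipped
]
example_points = selectivity_points(example_ligands)
```

With this definition a ratio of 1 means equal affinity for both enzymes, and a ratio below 1 means that thrombin binds the ligand more strongly than trypsin (more negative ∆G).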


PLS real coefficient

The real coefficients for the van der Waals and electrostatic interactions of the COMBINE models of thrombin and trypsin were plotted for the different residues.

PLS real coefficient surface

The surfaces of the active sites of thrombin and trypsin were coloured according to the real coefficients for the van der Waals and electrostatic interactions. The labelling of the residues is based on chymotrypsin numbering.


05.10.2006