Comparative Binding Energy Analysis

R. C. Wade, A. R. Ortiz and F. Gago


Classical regression techniques have long been used to correlate the properties of a series of molecules with their biological activities in order to derive quantitative structure-activity relationships (QSAR) to assist the design of more active compounds (1). This approach has been successfully extended to three dimensions by using molecular coordinates of the ligands to derive 3D-QSARs (2). However, the availability of the three-dimensional structures of many macromolecular drug targets has opened an alternative approach to drug design, namely structure-based drug design (SBDD), in which the physicochemical interactions between the receptor and a series of ligands are used to rationalise the binding affinities (3,4). SBDD makes use of techniques ranging from those employing simple scoring functions through molecular mechanics calculations to detailed free energy perturbation calculations employing molecular dynamics simulation (5). Now, particularly as a result of recent developments in the design of targeted combinatorial libraries of compounds (6), it is becoming increasingly common to have data on the activities of a family of compounds and knowledge of the three-dimensional structure of the target macromolecule to which they bind. While the activities of these compounds could be improved using the techniques of classical QSAR, 3D-QSAR or SBDD, none of these alone makes full, simultaneous and systematic use of all the available information. This is the purpose of Comparative Binding Energy (COMBINE) Analysis.
The "COMBINE" acronym refers to combinations in terms of both data and techniques:
- data on ligand-receptor structures and the measured activities of a series of ligands are combined;
- molecular mechanics and chemometrics are combined for the analysis.
In outline, COMBINE involves generating molecular mechanics models of a series of ligands in complex with their receptor and of the ligands and the receptor in unbound forms, and then subjecting the computed ligand-receptor interaction energies to regression analysis in order to derive a QSAR relating ligand binding constants or activities to weighted selected components of the ligand-receptor interaction energy. While the chemometric analysis performed is similar to that in a Comparative Molecular Field Analysis (CoMFA), the data analysed in COMBINE analysis differ by explicitly including information about the receptor-ligand interaction energies rather than only about the interaction properties of the ligands.
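
The form of the model outlined above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in this summary: $u_i$ denotes the $i$-th selected component of the computed ligand-receptor interaction energy, $w_i$ the weight assigned to it by the regression, $n$ the number of selected components, and $C$ a constant offset (including an offset is itself an assumption).

```latex
\Delta G \;\approx\; \sum_{i=1}^{n} w_i \, u_i \;+\; C
```

In this reading, the regression analysis determines the $w_i$ and $C$ from the measured activities of the ligand series, and components with little predictive value end up with weights close to zero.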

In contrast to free energy perturbation methods, a full sampling of phase space is not performed in COMBINE analysis; it is instead assumed that one or a few representative structures of the molecules are sufficient when experimental information about binding free energies is used for model derivation. Although any error in the modelling would introduce "noise" into the dataset, this can be filtered out by means of the subsequent chemometric analysis.
Although occasionally there is a linear relationship between binding free energy and computed binding energy derived from molecular mechanics calculations for single conformations of the bound and unbound states of a series of ligand-receptor pairs, this is not the case in general. This is because the entropic contribution to binding is rarely constant for a series of ligands and because sufficiently accurate modelling of a full series of compounds can be difficult to achieve. A number of authors have correlated binding free energies with a few terms, defined according to physical interaction type, of the computed binding energies by linear regression. A physical basis for such an analysis is provided by linear response theory, which relates the electrostatic binding energy to the electrostatic binding free energy. The COMBINE method differs from these approaches in that a more extensive partitioning of the binding energy is considered and multivariate regression analysis is used to derive a model. This is important for two reasons. First, from a modelling perspective, it is not assumed that the computed components of the binding free energy can be calculated with high accuracy. Rather, one of the foundations of COMBINE analysis is the realization that such calculations are usually noisy, which is why only those contributions to the binding energy that show the best predictive ability are selected and weighted in the resultant model. Second, it is recognized that binding free energies are rarely a linear function of binding energy. The extensive decomposition allows those components that are predictive of binding free energy to be detected, and these may implicitly represent other physically important interactions or even entropic terms.
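
The idea of weighting many noisy energy components by multivariate regression can be illustrated with a minimal sketch. Several assumptions are made here that are not stated in this summary: partial least squares (PLS1, in its NIPALS form) is used as the regression technique, the variables are centred but not scaled, and the energy matrix `U`, the activities and the "true" weights are synthetic numbers invented for the example. The function name `pls1_fit` is hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1, NIPALS); return (coef, intercept)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xk = X - x_mean          # centred descriptor matrix, deflated in each step
    yk = y - y_mean          # centred response, deflated in each step
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                        # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        c = (yk @ t) / tt                 # y loading
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # weights in the original variables
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic stand-in for a matrix of interaction-energy components
# (rows: ligands, columns: energy terms). All numbers are invented.
rng = np.random.default_rng(0)
n_ligands, n_terms = 30, 40
scale = np.full(n_terms, 0.3)   # most terms vary little across the series
scale[:3] = 3.0                 # three terms vary strongly
U = rng.normal(size=(n_ligands, n_terms)) * scale

true_w = np.zeros(n_terms)
true_w[:3] = [0.8, 0.5, -0.3]   # only the first three terms carry signal
activity = U @ true_w + rng.normal(scale=0.3, size=n_ligands)

coef, intercept = pls1_fit(U, activity, n_components=3)
predicted = U @ coef + intercept
r2 = 1.0 - np.sum((activity - predicted) ** 2) / np.sum(
    (activity - activity.mean()) ** 2
)
print(f"fitted r2 = {r2:.3f}")
print("terms with largest |weight|:", np.argsort(np.abs(coef))[::-1][:3])
```

The fitted `r2` here describes the training data only; in practice the number of latent variables and the selection of energy terms would be judged by cross-validated predictive ability rather than by the quality of the fit.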

A QSAR model is derived for each target receptor studied with the COMBINE method as the method was specifically designed for ligand optimization. Thus, a derived regression model is not applicable to all ligand-receptor interactions in the way that a general purpose empirical "scoring function" derived from statistical analysis of a diverse set of protein-ligand complexes is designed to be. The philosophy is to account for peculiarities in the modelling and parameterization of a given set of compounds, so that both optimal and inexpensive predictive models can be derived.


In ``3D QSAR in Drug Design.
Volume 2: Ligand Protein Interactions and Molecular Similarity'' and Perspectives in Drug Discovery and Design (1998) 9, 19-34.
Eds. Kubinyi,H., Folkers,G. and Martin,Y.
Kluwer Academic Publishers, Dordrecht, The Netherlands. (1998).

