R. C. Wade, A. R. Ortiz and F. Gago
Classical regression techniques have long been used to correlate
the properties of a series of molecules with their biological activities
in order to derive quantitative structure-activity relationships (QSARs)
that assist the design of more active compounds (1). This approach has been
successfully extended to three dimensions by using molecular coordinates
of the ligands to derive 3D-QSARs (2). However, the availability of the
three-dimensional structures of many macromolecular drug targets has opened
an alternative approach to drug design, namely structure-based drug design
(SBDD), in which the physicochemical interactions between the receptor
and a series of ligands are used to rationalise the binding affinities
(3,4). SBDD makes use of techniques ranging from those employing simple
scoring functions through molecular mechanics calculations to detailed
free energy perturbation calculations employing molecular dynamics simulation
(5). Now, particularly as a result of recent developments in the design
of targeted combinatorial libraries of compounds (6), it is becoming increasingly
common to have data on the activities of a family of compounds and knowledge
of the three-dimensional structure of the target macromolecule to
which they bind. While the activities of these compounds could be improved
using the techniques of classical QSAR, 3D-QSAR or SBDD, none of these
alone makes full, simultaneous and systematic use of all the available
information. Making such use of the data is the purpose of Comparative
Binding Energy (COMBINE) analysis.
The "COMBINE" acronym refers to combinations of both data and
techniques:
- data on ligand-receptor structures and the measured activities of
a series of ligands are combined;
- molecular mechanics and chemometrics are combined for the analysis.
In outline, COMBINE analysis involves generating molecular mechanics models
of a series of ligands in complex with their receptor, and of the ligands
and the receptor in unbound forms, and then subjecting the computed ligand-receptor
interaction energies to regression analysis in order to derive a
QSAR relating ligand binding constants or activities to weighted,
selected components of the ligand-receptor interaction energy. While the
chemometric analysis performed is similar to that in a Comparative Molecular
Field Analysis (CoMFA), the data analysed in COMBINE analysis differ by
explicitly including information about the receptor-ligand interaction
energies rather than only about the interaction properties of the ligands.
In contrast to free energy perturbation methods, a full sampling of
phase space is not performed in COMBINE analysis; it is instead assumed
that one or a few representative structures of the molecules are sufficient
when experimental information about binding free energies is used for model
derivation. Although any error in the modelling would introduce "noise"
into the dataset, this can be filtered out by means of the subsequent chemometric
analysis.
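The outline above can be restated compactly as a sketch. The symbols are notation assumed here for illustration and do not come from the text: U_LR, U_L and U_R are the molecular mechanics energies of the complex, the unbound ligand and the unbound receptor; the Delta u_i are the N components into which the computed binding energy is partitioned; S is the subset of components retained by the regression; w_i are the fitted weights; and C is a constant.

```latex
\Delta U = U_{LR} - U_{L} - U_{R} = \sum_{i=1}^{N} \Delta u_i ,
\qquad
\Delta G \approx \sum_{i \in S} w_i \, \Delta u_i + C
```

In this reading, the regression step determines both which components enter S and the values of the weights w_i, using the measured activities of the ligand series.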
Although occasionally there is a linear relationship between binding
free energy and computed binding energy derived from molecular mechanics
calculations for single conformations of the bound and unbound states of
a series of ligand-receptor pairs, this is not the case in general. This
is because the entropic contribution to binding is rarely constant for
a series of ligands and because sufficiently accurate modelling of a full
series of compounds can be difficult to achieve. A number of authors have
correlated binding free energies with a few terms, defined according to
physical interaction type, of the computed binding energies by linear regression.
A physical basis for such an analysis is provided by linear response theory,
which relates the electrostatic binding energy to the electrostatic binding
free energy. The COMBINE method differs from these approaches in that more
extensive partitioning of the binding energy is considered and multivariate
regression analysis is used to derive a model. This is important for two
reasons. First, from a modelling perspective, it is not assumed
that the computed components of the binding free energy can be calculated
with high accuracy. Rather, one of the foundations of COMBINE analysis
is the realization that such calculations are usually noisy, which is
why only those contributions to the binding energy that show the best
predictive ability are selected and weighted in the resultant model. Second,
it is realized that binding free energies are rarely a linear function of binding
energy. The extensive decomposition allows those components that are predictive
of binding free energy to be detected, and these may implicitly represent
other physically important interactions or even entropic terms.
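The regression step described above can be illustrated with a small sketch. The text specifies only "multivariate regression analysis" similar to that used in CoMFA; the use of partial least squares (PLS) here is an assumption, as are the function names `fit_pls1` and `predict_pls1`, the three energy terms, and all numerical values, which are invented for illustration. The sketch uses only the Python standard library.

```python
import math

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model (NIPALS-style) on mean-centred data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                # y residual
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}

def predict_pls1(model, x):
    """Predict the activity for one vector of energy components."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Hypothetical interaction-energy components (kcal/mol) for six complexes;
# each column stands for one term of the partitioned binding energy.
ENERGY_TERMS = [
    [-3.0, -1.0, 0.2],
    [-2.5, -0.5, 0.1],
    [-4.0, -1.5, 0.4],
    [-1.0, -2.0, 0.3],
    [-3.5, -0.2, 0.0],
    [-2.0, -1.2, 0.5],
]
# Synthetic activities that depend on the first two terms only.
ACTIVITIES = [5.0 - 0.8 * r[0] - 0.3 * r[1] for r in ENERGY_TERMS]

model = fit_pls1(ENERGY_TERMS, ACTIVITIES, n_components=3)
predicted = [predict_pls1(model, row) for row in ENERGY_TERMS]
print([round(v, 3) for v in predicted])
```

In a real application the number of components would be chosen by cross-validation rather than fixed, and the energy matrix would contain many more columns than complexes, which is the situation in which a latent-variable method is preferred over ordinary least squares.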
A separate QSAR model is derived for each target receptor studied,
because the COMBINE method was specifically designed for ligand
optimization. Thus, a derived regression model is not applicable to all
ligand-receptor interactions in the way that a general purpose empirical
"scoring function" derived from statistical analysis of a diverse set of
protein-ligand complexes is designed to be. The philosophy is to account
for peculiarities in the modelling and parameterization of a given
set of compounds, so that predictive models that are both optimal and
inexpensive can be derived.
In "3D QSAR in Drug Design.
Volume 2: Ligand-Protein Interactions and Molecular Similarity" and
Perspectives in Drug Discovery and Design (1998) 9, 19-34.
Eds. Kubinyi, H., Folkers, G. and Martin, Y.
Kluwer Academic Publishers, Dordrecht, The Netherlands (1998).