Chains of macromolecules were tested for interactions
(less than 6 Å between atoms of different chains; only chains longer
than 40 amino acids). This gave 2783 pairwise interchain interactions.
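The contact criterion above can be sketched as follows. This is only a
minimal illustration, not the code used to build the list: the function
name, the argument layout (atoms as x, y, z tuples) and the brute-force
all-against-all search are assumptions.

```python
from itertools import product
from math import dist

CONTACT_CUTOFF = 6.0  # angstroms, as stated in the text
MIN_LENGTH = 40       # chains must be longer than this many residues


def chains_interact(atoms_a, atoms_b, len_a, len_b,
                    cutoff=CONTACT_CUTOFF, min_length=MIN_LENGTH):
    """Return True if two chains count as interacting.

    atoms_a, atoms_b: lists of (x, y, z) coordinates, one per atom.
    len_a, len_b: number of residues in each chain.
    """
    # Both chains must be longer than the minimum length.
    if len_a <= min_length or len_b <= min_length:
        return False
    # One atom pair closer than the cutoff is enough.
    return any(dist(p, q) < cutoff for p, q in product(atoms_a, atoms_b))
```

For real structures a spatial index would be much faster than comparing
every atom pair; the brute-force loop is used here only for clarity.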
Sequences of the interacting proteins were compared
with the homomeric and monomeric proteins in PQS (7000+4771 macromolecules).
All hits with a similarity measure (BLAST
e-value) better (i.e. lower) than 1e-20 were recorded.
===================================
Database reference line
Protein-1 information
homologues:
Protein-1 homologue-1
Protein-1 homologue-2
...
++++++++++
Protein-2 information
homologues:
Protein-2 homologue-1
Protein-2 homologue-2
...
++++++++++
Annotation of homologues:
PDB ID, number of amino acids in the PDB file,
sequence similarity (e-value) and PDB annotation (absent if not available).
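The record layout above could be read with a small parser like the one
below. It is a sketch under assumptions: that every record starts after a
line made only of "=" characters, that each protein block ends with a line
made only of "+" characters, and that the first line after the separator
is the database reference line. Homologue lines are kept as raw strings,
because the field delimiters are not specified here. The function name and
the dictionary keys are hypothetical.

```python
def parse_records(text):
    """Split the listing into records of the form
    {"reference": str, "proteins": [{"info": str, "homologues": [str]}]}.
    """
    records = []
    current = None  # record being filled
    block = None    # protein block being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A run of '=' opens a new record.
            current = {"reference": None, "proteins": []}
            records.append(current)
            block = None
            continue
        if current is None:
            continue  # free text before the first record
        if set(line) == {"+"}:
            block = None  # a run of '+' closes the protein block
            continue
        if current["reference"] is None:
            current["reference"] = line
            continue
        if block is None:
            # First line of a block is the protein information line.
            block = {"info": line, "homologues": []}
            current["proteins"].append(block)
            continue
        if line.lower() == "homologues:":
            continue
        block["homologues"].append(line)
    return records
```

Each parsed record then holds two protein blocks, one per interacting
chain, each with its own list of homologue lines.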
Remarks:
NMR coordinates do not seem to be included in PQS,
so NMR structures do not appear in this list either.
The difference between monomers and homomers
can be seen from the PDB IDs of the homologues. Monomers have no chain
ID, although they sometimes have a molecule ID such as _1, _2, etc.: 4pti and
1aal_1 are monomeric structures of BPTI. Homomers always have a chain ID:
6ptiA is from a homomeric structure of BPTI (whatever this means).
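The ID convention just described can be checked mechanically. The sketch
below assumes that an ID is a four-character PDB code followed by either
nothing, a molecule ID of the form _N, or a single chain character; the
function name and the regular expression are illustrative assumptions.

```python
import re

# Four-character PDB code, then optionally "_<digits>" (molecule ID)
# or a single chain character.
_ID_PATTERN = re.compile(
    r"^(?P<pdb>[0-9][A-Za-z0-9]{3})"
    r"(?:(?P<mol>_\d+)|(?P<chain>[A-Za-z0-9]))?$"
)


def classify_homologue(homologue_id):
    """Return "homomer" if the ID carries a chain ID, else "monomer"."""
    match = _ID_PATTERN.match(homologue_id)
    if match is None:
        raise ValueError(f"unrecognised ID: {homologue_id!r}")
    return "homomer" if match.group("chain") else "monomer"
```

With the examples from the text, 4pti and 1aal_1 are classified as
monomers and 6ptiA as a homomer.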
A mono/homomeric homologue is not necessarily
the same protein as the one in the heteromer, and the e-value cannot tell
this, because it depends on the length and complexity of the aligned parts
of the sequences, not only on sequence identity (almost identical short
sequences, 50/51 aa, can have an e-value of 4e-26, for example). Looking
at the sequence lengths together with the e-values gives a better idea
about this.
No checks of source organisms were done, so
there should be cases in which the proteins in the heteromer and in the
mono/homomer are from different organisms.
Not all related homologues are necessarily
found with the cutoff of 1e-20. I adjusted this number so that all free
structures used in docking papers are found, such as 1aapA with an e-value
of 1e-32 to 1brcI and 1bpi with an e-value of 4e-26 to 1brbI.
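As a small illustration of the cutoff, the check below confirms that the
two pairs quoted above pass it. The function and variable names are
assumptions; only the cutoff and the e-values come from the text.

```python
EVALUE_CUTOFF = 1e-20  # cutoff stated in the text


def is_recorded(evalue, cutoff=EVALUE_CUTOFF):
    """True if the hit is better (lower e-value) than the cutoff."""
    return evalue < cutoff


# The two free/bound pairs quoted above, with their e-values.
docking_pairs = {
    ("1aapA", "1brcI"): 1e-32,
    ("1bpi", "1brbI"): 4e-26,
}

all_found = all(is_recorded(e) for e in docking_pairs.values())
```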
The number of homologues listed is never more
than 100.
No attempt was made to combine interacting
chains; for example, antibody-antigen cases appear as 3 interactions: between
L and H, L and antigen, and H and antigen.
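A reader who wants the combined complexes could group the pairwise
interactions by PDB entry, roughly as sketched below. This assumes that
each chain is identified by a four-character PDB code followed by its
chain ID (as in 1brcI); the function name and the entry code 1xyz in the
usage note are hypothetical.

```python
from collections import defaultdict


def group_by_entry(pairs):
    """Collect the chains of each PDB entry from pairwise interactions.

    pairs: iterable of (chain_id_1, chain_id_2) strings, each a
    four-character PDB code plus a chain ID.
    Returns {pdb_code: sorted list of chain IDs}.
    """
    complexes = defaultdict(set)
    for first, second in pairs:
        entry = first[:4]
        if second[:4] != entry:
            raise ValueError("interacting chains must share a PDB entry")
        complexes[entry].update((first[4:], second[4:]))
    return {entry: sorted(chains) for entry, chains in complexes.items()}
```

For a hypothetical antibody-antigen entry 1xyz with chains L, H and A,
the three listed interactions collapse into a single group of three
chains.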
Annotations do not always describe
the specific chain, because they are generated automatically from the
COMPND records of the PDB files, and these records sometimes do not
describe all chains.
There are many cases (like hemoglobin) in which
the heteromer is almost the same as the homomer.
In some cases the same protein has several
chains in some PDB files and only one in others (like thrombin), and
the homologues in fact correspond to both chains simultaneously. This can
be traced by comparing the homologue lists (whether they are the same for
both interacting chains) and the residue lengths.
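The comparison of homologue lists can be sketched as a set comparison.
This covers only the first half of the check (identical lists); the
residue lengths still have to be inspected separately. The function name
is an assumption, and the inputs are taken to be lists of homologue PDB
IDs.

```python
def same_homologue_list(homologues_1, homologues_2):
    """True if both interacting chains list exactly the same homologues.

    Empty lists never count as a match, since they carry no evidence.
    """
    ids_1, ids_2 = set(homologues_1), set(homologues_2)
    return bool(ids_1) and ids_1 == ids_2
```

A match suggests that the two interacting chains are parts of one protein
that appears as a single chain in the homologous structures.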