1943 complexes for re-docking



Procedure:
  • 1984 heteromeric macromolecules were downloaded from PQS (http://pqs.ebi.ac.uk/) in January 2002. Heteromeric means that the macromolecule has at least 2 different chains; see http://pqs.ebi.ac.uk/pqs-doc/pqs-help.html for what "different chains" means.
  • Chains of each macromolecule were tested for interactions (atoms of different chains less than 6 Å apart; only chains longer than 40 amino acids were considered). This gave 2783 pairwise interchain interactions.
  • Sequences of the interacting proteins were compared to the homomeric + monomeric proteins in PQS (from 7000 + 4771 macromolecules).
  • All hits with a similarity measure (BLAST e-value) better than 1e-20 were recorded.
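
The two filtering steps above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the script that produced the list. The input layout (a dict of atom coordinates per chain, a dict of chain lengths, hits as (pdb-id, e-value) pairs) and all function names are assumptions. Sorting the hits by e-value before applying the cap of 100 mentioned in the remarks is also an assumption.

```python
from itertools import combinations
from math import dist  # Python 3.8+

CONTACT_CUTOFF = 6.0    # Angstrom; atoms closer than this count as a contact
MIN_CHAIN_LENGTH = 40   # only chains longer than this are considered
EVALUE_CUTOFF = 1e-20   # hits must be better (smaller) than this
MAX_HITS = 100          # the list never shows more than 100 homologues


def chains_in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """True if any atom of one chain is within `cutoff` of any atom of the other."""
    return any(dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def interacting_pairs(chains, lengths):
    """Return pairs of chain IDs that are long enough and in contact.

    chains  -- hypothetical input: {chain_id: [(x, y, z), ...]}
    lengths -- hypothetical input: {chain_id: number of amino acids}
    """
    eligible = sorted(c for c in chains if lengths[c] > MIN_CHAIN_LENGTH)
    return [
        (a, b)
        for a, b in combinations(eligible, 2)
        if chains_in_contact(chains[a], chains[b])
    ]


def filter_hits(hits, cutoff=EVALUE_CUTOFF, max_hits=MAX_HITS):
    """Keep (pdb_id, e_value) hits better than the cutoff, best first, capped."""
    kept = sorted((h for h in hits if h[1] < cutoff), key=lambda h: h[1])
    return kept[:max_hits]
```

The all-against-all atom comparison is quadratic and slow for large chains; a spatial grid or k-d tree would be the usual replacement if speed matters.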



    Result:

    Annotated list of complexes: here (4.5 MB file; a zipped version, 436 KB, is here).
    Here is a selected list of complexes that have free-form structures of both subunits.


    Format:

    ===================================
    Database reference line
    Protein-1 information
    homologues:
    Protein-1 homologue-1
    Protein-1 homologue-2
    ...
    ++++++++++
    Protein-2 information
    homologues:
    Protein-2 homologue-1
    Protein-2 homologue-2
    ...
    ++++++++++

    Annotation of homologues:
    pdb-id, number of amino acids in the pdb file, sequence similarity (e-value), and pdb annotation (absent if not available).
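
A file in the format above could be read with a small parser along the following lines. This is a sketch under several assumptions that the description does not confirm: the fields of a homologue line are separated by whitespace, each "Protein information" entry occupies a single line, and separator lines start with "=====" or "+++++". All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Homologue:
    pdb_id: str
    n_residues: int
    e_value: float
    annotation: str = ""   # empty when the file gives no annotation


@dataclass
class Protein:
    info: str
    homologues: list = field(default_factory=list)


@dataclass
class Record:
    reference: str
    proteins: list = field(default_factory=list)


def parse_homologue(line):
    """Parse 'pdb-id length e-value [annotation]' (assumed whitespace-separated)."""
    parts = line.split(None, 3)
    e_text = parts[2]
    if e_text.lower().startswith("e"):   # BLAST may print 'e-100' for '1e-100'
        e_text = "1" + e_text
    annotation = parts[3].strip() if len(parts) > 3 else ""
    return Homologue(parts[0], int(parts[1]), float(e_text), annotation)


def parse_records(text):
    """Split the annotated list into records with their proteins and homologues."""
    records, record, protein = [], None, None
    in_homologues = expect_reference = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("====="):        # start of a new record
            record = Record(reference="")
            records.append(record)
            protein, in_homologues, expect_reference = None, False, True
        elif record is None:
            continue                        # ignore text before the first record
        elif expect_reference:              # database reference line
            record.reference, expect_reference = line, False
        elif line.startswith("+++++"):      # end of one protein block
            protein, in_homologues = None, False
        elif line == "homologues:":
            in_homologues = True
        elif in_homologues and protein is not None:
            protein.homologues.append(parse_homologue(line))
        else:                               # protein information line
            protein = Protein(info=line)
            record.proteins.append(protein)
    return records
```

Each returned record should hold two proteins, one per interacting chain, each with its own homologue list.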



    Remarks:
  • It seems that NMR coordinates are not in PQS, so they do not appear in this list either.
  • The difference between monomers and homomers can be traced from the pdb-id of the homologues. Monomers do not have a chain ID, although they sometimes have a molecule ID such as _1, _2, etc.: 4pti and 1aal_1 are monomeric structures of BPTI. Homomers always have a chain ID: 6ptiA is from a homomeric structure of BPTI (whatever this means).
  • A mono/homomeric homologue is not necessarily the same protein as in the heteromer, and the e-value alone cannot tell this, because it depends on the length and complexity of the aligned parts of the sequences, not only on sequence identity (for example, almost identical short sequences, 50/51 aa, can have an e-value of 4e-26). Looking at the sequence lengths together with the e-values gives a better idea.
  • No checks of source organisms were done, i.e. there should be cases where the proteins in the heteromer and in the mono/homomer are from different organisms.
  • Not all related homologues are necessarily found with the 1e-20 cutoff. I adjusted this number so that all free structures used in docking papers are found, e.g. 1aapA with an e-value of 1e-32 to 1brcI, and 1bpi with an e-value of 4e-26 to 1brbI.
  • The number of homologues listed is never more than 100.
  • No attempt was made to combine interacting chains; for example, antibody-antigen cases appear as 3 interactions: between L and H, L and antigen, and H and antigen.
  • Annotations do not always describe the specific chain, because they are generated automatically from the COMPND records of pdb files, and these records sometimes lack descriptions of some chains.
  • There are many cases (like hemoglobin) where the heteromer is almost the same as the homomer.
  • In some cases the same protein has several chains in some pdb files and only one in others (like thrombin), and the homologues are in fact for both chains simultaneously. This can be traced by comparing the homologue lists (whether they are the same for both interacting chains) and the residue counts.
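
Two of the remarks above lend themselves to a quick automated check: telling monomers from homomers by the pdb-id, and spotting interacting chains that share one homologue list. The sketch below assumes that an identifier is a 4-character PDB code, optionally followed by a one-character chain ID and/or an underscore plus a molecule number, which is consistent with the examples 4pti, 1aal_1 and 6ptiA but is not a documented grammar. The function names are hypothetical.

```python
import re

# Assumed identifier layout: 4-character PDB code, optional chain ID,
# optional "_<molecule number>".
_ID_PATTERN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])?(?:_([0-9]+))?$")


def classify_homologue(pdb_id):
    """Return (kind, entry, chain, molecule) for a homologue identifier.

    kind is "homomer" when a chain ID is present and "monomer" otherwise,
    following the rule given in the remarks.
    """
    match = _ID_PATTERN.match(pdb_id)
    if match is None:
        raise ValueError("unrecognised homologue identifier: %r" % pdb_id)
    entry, chain, molecule = match.groups()
    kind = "homomer" if chain else "monomer"
    return kind, entry, chain, molecule


def share_homologue_list(ids_a, ids_b):
    """True when both interacting chains list exactly the same homologues.

    A match hints at the thrombin-like case, where the two chains of the
    heteromer correspond to a single chain in other pdb files.
    """
    return bool(ids_a) and set(ids_a) == set(ids_b)
```

A shared list is only a hint; the residue counts of the two chains and of the homologues still need to be compared by hand.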


