BioLiP mismatch with updated PDB atom records (ligand residue name & identifier)

Posted: Mon Oct 02, 2023 1:37 pm
by cpfxyz
Hello,
I have recently detected a small number of mismatches between BioLiP and the current PDB entries in (i) ligand residue names and (ii) ligand residue indices. This is most likely because the PDB revised the mmCIF atom records between BioLiP releases. Here are a few examples I came across:

pdb_code ligand_chain ligand_name ligand_index
2h35 A HEC 142: Ligand "HEC" is not present in the current version of the mmCIF (or the PDB) file. It seems to have been replaced by "HEM".
2m6z A HEC 201: The same change here, "HEC":"HEM".
3pse B 3CN 157: "3CN":"4LJ".
1t4c A COA 1: Here the ligand name stays the same, but the index (residue sequence number) is changed from 1 to 501.

In my experience, changes in ligand name or index are not very common in the PDB, but apparently they do happen. Is there a plan to update BioLiP to reflect the current atom records?

P.S. Thank you for implementing the ligand residue index (sequence) identifier in BioLiP.txt.gz, it is a life saver.

Re: BioLiP mismatch with updated PDB atom records (ligand residue name & identifier)

Posted: Tue Oct 03, 2023 1:03 am
by zcx@umich.edu
Most likely we will not update PDB records that are changed after initial release, as regenerating the full database for every weekly release is a huge undertaking. By the way, for the cases you mentioned where the ligand name is changed, it is not entirely true that the original ligand is absent from the current version of the mmCIF file. For example, in https://files.rcsb.org/view/2H35.cif, the following block shows that the author originally assigned HEC rather than HEM as the ligand name:

#
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.ndb_seq_num
_pdbx_nonpoly_scheme.pdb_seq_num
_pdbx_nonpoly_scheme.auth_seq_num
_pdbx_nonpoly_scheme.pdb_mon_id
_pdbx_nonpoly_scheme.auth_mon_id
_pdbx_nonpoly_scheme.pdb_strand_id
_pdbx_nonpoly_scheme.pdb_ins_code
E 3 HEM 1 142 142 HEM HEC A .
F 3 HEM 1 147 147 HEM HEC B .
G 3 HEM 1 142 142 HEM HEC C .
H 3 HEM 1 147 147 HEM HEC D .
#
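For anyone who wants to recover the author-assigned name programmatically, here is a minimal Python sketch that reads the _pdbx_nonpoly_scheme loop and lists ligands whose auth_mon_id differs from the current mon_id. The function names parse_loop and find_renamed_ligands are made up for illustration, and the parser assumes simple whitespace-separated values with no quoted strings, as in the block above. It is not a general mmCIF parser, and it will not handle entries where the category is written as single key-value items instead of a loop.

```python
# Excerpt of the block quoted above from 2H35.cif.
EXAMPLE = """\
#
loop_
_pdbx_nonpoly_scheme.asym_id
_pdbx_nonpoly_scheme.entity_id
_pdbx_nonpoly_scheme.mon_id
_pdbx_nonpoly_scheme.ndb_seq_num
_pdbx_nonpoly_scheme.pdb_seq_num
_pdbx_nonpoly_scheme.auth_seq_num
_pdbx_nonpoly_scheme.pdb_mon_id
_pdbx_nonpoly_scheme.auth_mon_id
_pdbx_nonpoly_scheme.pdb_strand_id
_pdbx_nonpoly_scheme.pdb_ins_code
E 3 HEM 1 142 142 HEM HEC A .
F 3 HEM 1 147 147 HEM HEC B .
G 3 HEM 1 142 142 HEM HEC C .
H 3 HEM 1 147 147 HEM HEC D .
#
"""


def parse_loop(cif_text, category="_pdbx_nonpoly_scheme"):
    """Return the rows of one looped mmCIF category as a list of dicts.

    Assumption: values are whitespace-separated and contain no quotes.
    """
    prefix = category + "."
    columns, rows = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_" or line.startswith("#"):
            if rows:
                break  # the target loop has ended
            columns = []  # start collecting a fresh set of column names
        elif line.startswith("_"):
            if rows:
                break
            columns.append(line.split()[0])
        elif line and columns and all(c.startswith(prefix) for c in columns):
            values = line.split()
            if len(values) == len(columns):
                # Strip the category prefix so keys read e.g. "mon_id".
                rows.append({c[len(prefix):]: v for c, v in zip(columns, values)})
    return rows


def find_renamed_ligands(rows):
    """List (chain, author_name, current_name, residue_number) for renamed ligands."""
    return [
        (r["pdb_strand_id"], r["auth_mon_id"], r["mon_id"], r["pdb_seq_num"])
        for r in rows
        if r["auth_mon_id"] != r["mon_id"]
    ]


rows = parse_loop(EXAMPLE)
renamed = find_renamed_ligands(rows)
for chain, old_name, new_name, seq_num in renamed:
    print(chain, seq_num, old_name, "->", new_name)
```

The same comparison between auth_seq_num and pdb_seq_num could flag renumbered ligands, assuming the file keeps the author's original number in auth_seq_num.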

Re: BioLiP mismatch with updated PDB atom records (ligand residue name & identifier)

Posted: Wed Oct 04, 2023 1:14 pm
by cpfxyz
Thank you for the reply. The atom records section you quoted is where I got the hint for the revisions.
Unfortunately, these records are not used by molecular visualization applications or parsers (they can only "see" the revised ligand). I understand it is impractical to rebuild the database weekly or even monthly. I was mostly wondering whether there is a plan to revise it less often, perhaps annually, to reflect potentially meaningful updates (ligand name/index). But again, in my experience such revisions are not that common anyway.
Thank you again!