PPLM: protein-protein language model for inter-protein contact & interaction prediction


PPLM

Method of PPLM

PPLM is a protein–protein language model that learns directly from paired sequences through a novel attention architecture that explicitly captures inter-chain context. Building on PPLM, we developed PPLM-PPI, PPLM-Affinity, and PPLM-Contact for predicting protein–protein interactions, estimating binding affinity, and identifying interface residue contacts, respectively. In PPLM-PPI, the embeddings and attention matrices generated by PPLM are first pooled using max and mean pooling, and then passed through a multilayer perceptron to predict the interaction probability between the input sequences. In PPLM-Affinity, the final layer of PPLM is fine-tuned on binding affinity data, and its embeddings are used to predict the binding affinity of the receptor and ligand sequences through max pooling and two fully connected layers. In PPLM-Contact, the inter-protein attention matrices generated by PPLM are integrated with MSA-derived features and monomer distance maps to capture both evolutionary and structural information; the combined features are then used to predict interface residue contacts through a novel inter-protein transformer network.
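The pooling-then-MLP step described for PPLM-PPI can be illustrated with a minimal sketch. It is not the PPLM implementation: the function names, the single hidden layer, the ReLU activation, and the sigmoid output are all assumptions made for illustration, and the sketch pools only a per-residue embedding matrix (how the attention matrices are pooled is not specified here).

```python
import math

def max_mean_pool(matrix):
    """Pool a (length x dim) matrix over its rows.

    Returns the per-column maxima followed by the per-column means,
    giving a fixed-size vector regardless of sequence length.
    """
    columns = list(zip(*matrix))
    maxima = [max(col) for col in columns]
    means = [sum(col) / len(col) for col in columns]
    return maxima + means

def dense(vector, weights, bias):
    """Apply one linear layer: weights is a list of rows, one per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def interaction_probability(features, w1, b1, w2, b2):
    """Hypothetical two-layer perceptron: linear, ReLU, linear, sigmoid."""
    hidden = [max(0.0, h) for h in dense(features, w1, b1)]
    logit = dense(hidden, w2, b2)[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: a 2-residue sequence pair with 2-dimensional embeddings.
embedding = [[1.0, 2.0], [3.0, 0.0]]
features = max_mean_pool(embedding)
```

Concatenating max and mean pooling keeps both the strongest per-feature signal and the overall average, which is one common way to turn variable-length inputs into a fixed-size vector for a classifier.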

Figure 1. Overview of the PPLM framework and its downstream applications. (A) PPLM architecture. Paired protein sequences are independently tokenized and then concatenated. The concatenated tokens are processed through transformer blocks equipped with a cross-protein attention mechanism, generating embeddings and attention matrices for the paired sequences. (B) PPLM-Contact pipeline. The inter-protein attention matrix from PPLM is integrated with MSA-derived features and monomer distance maps to capture both evolutionary and structural information. These features are processed by a specialized inter-protein transformer composed of three core modules, each adopting a parallel architecture to update inter-chain representations using both intra- and inter-protein information. (C) PPLM-Affinity pipeline. The last layer of PPLM is fine-tuned on an affinity dataset to generate embeddings of the input receptor and ligand sequences. These embeddings are aggregated using max pooling across the sequence length, followed by two fully connected layers to predict the binding affinity. (D) PPLM-PPI pipeline. PPLM is used to generate the embeddings, the intra-protein attention matrices, and the inter-protein attention matrix for the input sequences. These features are aggregated using max and mean pooling, followed by linear layers, and then concatenated. The resulting representations are passed to a multilayer perceptron to predict the interaction probability between the input sequences.
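Because the two sequences are concatenated before attention is computed, a single attention matrix over the concatenated tokens can be viewed as four blocks: two intra-protein blocks and two inter-protein blocks. The sketch below shows that block view under simplifying assumptions: one attention head, no special tokens, and an averaging step to symmetrize the inter-protein block. The function name and the averaging are illustrative choices, not details taken from PPLM.

```python
def split_attention(attn, len_a, len_b):
    """Split a square attention matrix over concatenated tokens into blocks.

    attn is a (len_a + len_b) x (len_a + len_b) list of lists, with the
    tokens of protein A first and the tokens of protein B second.
    """
    n = len_a + len_b
    if len(attn) != n or any(len(row) != n for row in attn):
        raise ValueError("attention matrix must be (len_a + len_b) square")

    intra_a = [row[:len_a] for row in attn[:len_a]]    # A attends to A
    intra_b = [row[len_a:] for row in attn[len_a:]]    # B attends to B
    inter_ab = [row[len_a:] for row in attn[:len_a]]   # A attends to B
    inter_ba = [row[:len_a] for row in attn[len_a:]]   # B attends to A

    # Assumption: average the two directions into one len_a x len_b map.
    inter = [[(inter_ab[i][j] + inter_ba[j][i]) / 2.0 for j in range(len_b)]
             for i in range(len_a)]
    return {"intra_a": intra_a, "intra_b": intra_b, "inter": inter}

# Toy example: protein A has 1 token, protein B has 2 tokens.
blocks = split_attention([[1, 2, 3], [4, 5, 6], [7, 8, 9]], len_a=1, len_b=2)
```

The `inter` block has one row per residue of the first protein and one column per residue of the second, which is the same shape as an inter-protein contact map.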

PPLM Server

The PPLM server contains two modes: PPLM-Contact for protein-protein contact prediction and PPLM-PPI for protein-protein interaction prediction. Users can choose the mode through the selection box.

PPLM-PPI

Input
PPLM-PPI accepts paired protein sequences in FASTA format, with the two sequences of each pair separated by a colon (':'), and supports up to 50 sequence pairs per job. Users must also provide an email address for job completion notifications. After a job is submitted, an email is sent to confirm that the job is running.
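Before submitting, it may help to check an input file locally. The following sketch validates a paired FASTA input under an assumed layout: each record has a '>' header line followed by one or more sequence lines that, joined together, contain exactly one colon. The function name and this exact layout are assumptions for illustration; the server's own parser may be more or less strict.

```python
MAX_PAIRS = 50  # limit stated for PPLM-PPI submissions

def parse_paired_fasta(text, max_pairs=MAX_PAIRS):
    """Return a list of (header, first_sequence, second_sequence) tuples."""
    records = []  # each entry is [header, list_of_sequence_lines]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), []])
        elif not records:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[-1][1].append(line)

    if len(records) > max_pairs:
        raise ValueError(f"{len(records)} pairs given, at most {max_pairs} allowed")

    pairs = []
    for header, parts in records:
        joined = "".join(parts)
        if joined.count(":") != 1:
            raise ValueError(f"record '{header}' must contain exactly one ':'")
        first, second = joined.split(":")
        if not first or not second:
            raise ValueError(f"record '{header}' has an empty sequence")
        pairs.append((header, first.upper(), second.upper()))
    return pairs

example = ">pair1\nMKT:GAV\n>pair2\nAAA\nCC:DD\n"
pairs = parse_paired_fasta(example)
```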

Output
The output of PPLM-PPI includes the predicted interaction probability along with the sequences of each protein pair.

PPLM-Affinity

Input
PPLM-Affinity accepts receptor and ligand sequences as input in FASTA format, with support for multiple sequences in the receptor or ligand. Users must also provide an email address for job completion notifications. After a job is submitted, an email is sent to confirm that the job is running.

Output
The output of PPLM-Affinity includes the predicted binding affinity along with the sequences of receptor and ligand.

PPLM-Contact

Input
PPLM-Contact accepts two monomer protein structures in PDB format as input, along with an email address for job completion notifications. After a job is submitted, an email is sent to confirm that the job is running.
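PPLM-Contact derives monomer distance maps from the submitted structures. As a rough illustration of what such a map is, the sketch below reads C-alpha coordinates from the fixed-column ATOM records of a PDB file and computes all pairwise distances. Using C-alpha atoms, keeping only the first alternate location, and the function names are assumptions; the atoms and preprocessing actually used by the server are not stated here.

```python
import math

def read_ca_coordinates(pdb_text):
    """Collect (x, y, z) for each C-alpha atom in the ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, TER, REMARK, ...
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        if line[16:17] not in (" ", "A"):  # alternate location, column 17
            continue
        x = float(line[30:38])            # columns 31-38
        y = float(line[38:46])            # columns 39-46
        z = float(line[46:54])            # columns 47-54
        coords.append((x, y, z))
    return coords

def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

A call such as `distance_map(read_ca_coordinates(open("monomerA.pdb").read()))` would give one L x L map per monomer, where the file name is a placeholder.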

Output
The output of PPLM-Contact contains the predicted contact map and detailed information on each predicted contact. ResIdx1 and ResIdx2 indicate the residue indexes in the first (A) and second (B) proteins, respectively. ResType1 and ResType2 represent the amino acid types of ResIdx1 and ResIdx2, respectively. Contact_Probability denotes the probability that ResIdx1 and ResIdx2 are in contact.
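For downstream analysis, the contact list can be filtered by probability. The sketch below assumes the results are available as a whitespace-delimited table whose header row uses the column names described above; the file layout, the function name, and the 0.5 cutoff are assumptions for illustration and should be adapted to the actual download format.

```python
def top_contacts(table_text, min_probability=0.5):
    """Return contacts with probability >= min_probability, highest first.

    Columns are located by header name, so their order does not matter.
    """
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    col = {name: index for index, name in enumerate(header)}

    contacts = []
    for row in body:
        probability = float(row[col["Contact_Probability"]])
        if probability >= min_probability:
            contacts.append((
                int(row[col["ResIdx1"]]), row[col["ResType1"]],
                int(row[col["ResIdx2"]]), row[col["ResType2"]],
                probability,
            ))
    contacts.sort(key=lambda contact: contact[4], reverse=True)
    return contacts

# Made-up rows, used only to show the expected shape of the input.
example = """ResIdx1 ResType1 ResIdx2 ResType2 Contact_Probability
12 A 40 L 0.91
15 G 42 K 0.30
20 W 7 D 0.75
"""
selected = top_contacts(example)
```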


References

Jun Liu, Hungyu Chen, Yang Zhang. A Protein-Protein Language Model for Interaction, Binding Affinity, and Interface Contact Prediction. In preparation.
