#################################################################################################### _____ _____ _ __ __ | __ \ | __ \ | | | \/ | | |__) || |__) || | | \ / | | ___/ | ___/ | | | |\/| | | | | | | |____ | | | | |_| |_| |______||_| |_| (Version 1.0, 03/25/2025) (Copyrighted by the Regents of the National University of Singapore, All rights reserved) PPLM is a protein–protein language model that learns directly from paired sequences through a novel attention architecture, explicitly capturing inter-protein context. Building on PPLM, we developed PPLM-PPI, PPLM-Affinity, and PPLM-Contact for predicting protein–protein interactions, estimating binding affinity, and identifying interface residue contacts, respectively. Author: Jun Liu, Hungyu Chen, and Yang Zhang. For bug reports and inquiries, please contact: junl_sg@nus.edu.sg If you use this program, please cite: Jun Liu, Hungyu Chen, Yang Zhang. A Corporative Language Model for Protein-Protein Interaction, Binding Affinity, and Interface Contact Prediction. In preparation. This is the stand-alone program. Alternatively, users can submit jobs online at: https://zhanggroup.org/PPLM/ The source code is freely available to academic and non-profit users under the PolyForm Noncommercial License #################################################################################################### ####################################### System Requirements ######################################## x86_64 machine, Linux kernel OS. ######################### Software & Dataset Requirements for PPLM-Contact ######################### 1. HH-suite3 for MSA Search. Install HH-suite3 (https://github.com/soedinglab/hh-suite) and set the "hhsuite_dir" parameter in the "pplm_contact/config.py" file. 2. Uniclust Database for MSA Search. Download the Uniclust database (http://wwwuser.gwdg.de/~compbiol/uniclust/2021_03/)and set the "UniRef_database" parameter in the "config.py" file. 3. CCMpred for Direct Coupling Analysis (DCA). Install ccmpred (https://github.com/soedinglab/CCMpred), or use the pre-packaged version in the "pplm_contact/external_tools" directory. Set the "ccmpred" parameter in the "config.py" file. 4. LoadHHM for PSSM Calculation. Download "LoadHHM.py" (https://github.com/j3xugit/RaptorX-Contact/blob/master/Common/LoadHHM.py) and place the file in the "pplm_contact" directory of the PPLM package, or use the pre-packaged version within the "pplm_contact" directory. 5. ESM-MSA for Feature Generation. Install the ESM package (https://github.com/facebookresearch/esm), or use the pre-packaged version within "pplm_contact/external_tools" directory. Download the pre-trained model and set the "esm_msa_model" parameter in the "config.py" file. (https://dl.fbaipublicfiles.com/fair-esm/models/esm_msa1_t12_100M_UR50S.pt) ############################################## Usage ############################################### 1. Install PPLM environment: conda env create -f environment.yml 2. Activate PPLM environment: conda activate PPLM 3. Run PPLM-PPI for a batch of paired sequences: python run_pplm-ppi.py example/seq_pairs.fasta example/seq_pairs.results You can also run PPLM-PPI for two individual sequences: python pplm_ppi/predict.py example/seq1.fasta example/seq2.fasta 4. Run PPLM-Affinity for receptor and ligand sequences: python run_pplm-affinity.py example/receptor.fasta example/ligand.fasta 5. Run PPLM-Contact for protein homodimer: python run_pplm-contact.py example/protein.pdb example/protein.pdb example/homo_example 6. Run PPLM-Contact for protein heterodimer: python run_pplm-contact.py example/protein1.pdb example/protein2.pdb example/hetero_example 7. Run PPLM to generate embeddings and attention weights for other applications: python run_pplm.py example/seq1.fasta example/seq2.fasta example/seq1-seq2.pplm.pkl ########################################## Example Output ########################################## 1. Output of PPLM-PPI: (a) Run PPLM-PPI: python run_pplm-ppi.py example/seq_pairs.fasta example/seq_pairs.results (b) The predicted interaction probabilities are saved in example/seq_pairs.results. The file is structured as follows: >10090.ENSMUSP00000085394:10090.ENSMUSP00000116785 0.001926 >10090.ENSMUSP00000043111:10090.ENSMUSP00000102211 0.991765 >10090.ENSMUSP00000134644:10090.ENSMUSP00000131939 0.000425 >10090.ENSMUSP00000104648:10090.ENSMUSP00000095136 0.060997 >10090.ENSMUSP00000131855:10090.ENSMUSP00000118766 0.004577 >10090.ENSMUSP00000008036:10090.ENSMUSP00000046016 0.929329 ... Each entry consists of: • Protein Pair: Represented in the format >Protein1:Protein2. • Interaction Probability: The likelihood of interaction between the given protein pair. 2. Output of PPLM-Affinity: (a) Run PPLM-PPI: python run_pplm-ppi.py example/receptor.fasta example/ligand.fasta (b) The predicted binding affinity will be directly printed to the command line. Predicted binding affinity: -7.6090136 3. Output of PPLM-Contact: (a) Run PPLM-Contact: python run_pplm-contact.py example/protein.pdb example/protein.pdb example/homo_example (b) The predicted contacts are saved in example/homo_example/homo_example.pred_contact.txt. The file is structured as follows: Rank ResIdx1 ResType1 ResIdx2 ResType2 Contact_Probability 1 23:A MET 26:B CYS 0.976151 2 26:A CYS 23:B MET 0.974481 3 22:A ILE 26:B CYS 0.971633 4 23:A MET 30:B GLN 0.971191 5 30:A GLN 22:B ILE 0.970514 6 27:A GLY 23:B MET 0.970334 7 22:A ILE 30:B GLN 0.970124 8 30:A GLN 23:B MET 0.96919 9 23:A MET 27:B GLY 0.966725 10 23:A MET 23:B MET 0.966512 ... • ResIdx1 and ResIdx2: Residue indexes of first (A) and second (B) proteins, respectively. • ResTyep1 and ResType2: Amino acid types corresponding to ResIdx1 and ResIdx2. • Contact_Probability: The predicted probability of residue contact. ####################################################################################################