ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties

ImmunoStruct

ImmunoStruct: A multimodal neural network framework for immunogenicity prediction
from peptide-MHC sequence, structure, and biochemical properties

Table of Contents
  1. About The Project
  2. Citation
  3. Getting Started
  4. Usage
  5. Troubleshooting
  6. License
  7. Contact

About The Project

[Figure: ImmunoStruct architecture]

ImmunoStruct is a multimodal deep learning framework that integrates sequence, structural, and biochemical information to predict multi-allele class-I peptide-MHC immunogenicity. By leveraging multimodal data from ~27,000 peptide-MHCs and jointly modeling sequence and structure, ImmunoStruct significantly improves immunogenicity prediction performance for both infectious disease epitopes and cancer neoepitopes.

Key Features

  • Multimodal Integration: Combines peptide-MHC protein sequence, structure, and biochemical properties
  • Novel Cancer-Wildtype Contrastive Learning: Enhances specificity for cancer neoepitope detection
  • Enhanced Interpretability: Provides insights into the substructural basis of immunogenicity

[Figure: contrastive learning approach]

Citation

If you use ImmunoStruct in your research, please cite our paper:

@article{givechian2024immunostruct,
  title={ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties},
  author={Givechian, Kevin Bijan and Rocha, Joao Felipe and Yang, Edward and Liu, Chen and Greene, Kerrie and Ying, Rex and Caron, Etienne and Iwasaki, Akiko and Krishnaswamy, Smita},
  journal={bioRxiv},
  pages={2024--11},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

Getting Started

To get ImmunoStruct up and running locally, follow these steps.

Pre-requisites

Before installation, ensure you have:

  • Python 3.10+
  • CUDA-compatible GPU (recommended)
  • Conda package manager
  • Weights & Biases account for experiment tracking

Dependencies

  • python 3.10
  • torch 2.1.2
  • dgl
  • torch_geometric 2.5.3

Installation

  1. Clone the repository

    git clone https://github.com/KrishnaswamyLab/ImmunoStruct.git
    cd ImmunoStruct
  2. Create and activate conda environment

    conda create --name immuno python=3.10 -c anaconda -c conda-forge
    conda activate immuno
  3. Install core dependencies

    conda install cudatoolkit=11.2 wandb pydantic -c conda-forge
    conda install scikit-image pillow matplotlib seaborn tqdm -c anaconda
  4. Install PyTorch

    python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
  5. Install DGL

    python -m pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu118/repo.html
    python -m pip install torchdata==0.7.1
  6. Install PyTorch Geometric and related packages

    python -m pip install torch-scatter==2.1.2+pt21cu118 torch-sparse==0.6.18+pt21cu118 torch-cluster==1.6.3+pt21cu118 torch-spline-conv==1.2.2+pt21cu118 torch_geometric==2.5.3 numpy==1.26.3 -f https://data.pyg.org/whl/torch-2.1.2+cu118.html
  7. Install additional packages

    python -m pip install "graphein[extras]"
    python -m pip install lifelines
    python -m pip install -U phate
    python -m pip install multiscale-phate
  8. Set up environment variables (if needed)

    export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
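After the installation steps, it can help to confirm that the environment picked up the pinned versions. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are copied from the commands above.

```python
from importlib import metadata

# Pins copied from the Dependencies list and installation commands above.
EXPECTED = {
    "torch": "2.1.2",
    "torch_geometric": "2.5.3",
    "torchdata": "0.7.1",
    "numpy": "1.26.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Wheels may carry a local suffix such as "+cu118"; compare the base version only.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Run inside the activated `immuno` environment, the script prints nothing when every pinned package matches.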

Usage

Data Preparation

Place the following files in the data/ folder:

  • cedar_data_final_with_mprop1_mprop2_v2.txt
  • complete_score_Mprops_1_2_smoothed_sasa_v2.txt
  • HLA_27_seqs_csv.csv

Additionally, ensure you have these folders:

  • graph_pyg_Cancer
  • graph_pyg_IEDB
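Before training, the inputs listed above can be checked with a short script. This is a sketch under two assumptions that the README does not state: that the two graph folders live inside data/ alongside the text files, and that the script is run from the repository root. The function name `missing_inputs` is hypothetical.

```python
from pathlib import Path

# File and folder names copied from the lists above.
REQUIRED_FILES = [
    "cedar_data_final_with_mprop1_mprop2_v2.txt",
    "complete_score_Mprops_1_2_smoothed_sasa_v2.txt",
    "HLA_27_seqs_csv.csv",
]
# Assumption: these folders sit inside the data directory as well.
REQUIRED_FOLDERS = ["graph_pyg_Cancer", "graph_pyg_IEDB"]


def missing_inputs(data_dir):
    """Return the names of required files and folders absent from data_dir."""
    root = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    absent = missing_inputs("data")
    print("All inputs found." if not absent else f"Missing: {', '.join(absent)}")
```

If the graph folders are kept elsewhere in your checkout, adjust `REQUIRED_FOLDERS` or the directory passed to `missing_inputs` accordingly.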

Generate PyG graph files:

Generate these PyG graph files from the corresponding AlphaFold folders with the following command:

python immunostruct/preprocessing/cancer_graph_construction_new_KBG.py

Training and Testing

  1. Set up Weights & Biases

    Create a project on Weights & Biases whose name matches the project name used by the training script.

  2. Run Experiments

    # HybridModelv2 with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModelv2 --wandb-username YOUR_WANDB_USERNAME
    
    # HybridModel with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence with fingerprint model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceFpModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceModel --wandb-username YOUR_WANDB_USERNAME
    
    # Structure-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --model StructureModel --wandb-username YOUR_WANDB_USERNAME
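The five commands above differ only in the model name and in whether `--sequence-loss` is passed, so they can be generated from a small table. The sketch below is a hypothetical convenience wrapper, not part of the repository; it uses only the script name and flags shown above, and defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
import sys

SCRIPT = "train_PropIEDB_PropCancer_ImmunoCancer.py"

# Model names copied from the commands above. The value records whether the
# listed command passes --sequence-loss (only the structure-only run omits it).
MODELS = {
    "HybridModelv2": True,
    "HybridModel": True,
    "SequenceFpModel": True,
    "SequenceModel": True,
    "StructureModel": False,
}


def build_command(model, use_sequence_loss, wandb_username):
    """Assemble the argument list for one training run."""
    command = [sys.executable, SCRIPT, "--full-sequence"]
    if use_sequence_loss:
        command.append("--sequence-loss")
    command += ["--model", model, "--wandb-username", wandb_username]
    return command


def run_all(wandb_username, dry_run=True):
    """Print each command; execute it sequentially unless dry_run is set."""
    for model, use_sequence_loss in MODELS.items():
        command = build_command(model, use_sequence_loss, wandb_username)
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all("YOUR_WANDB_USERNAME", dry_run=True)
```

Passing `dry_run=False` runs the five experiments one after another and stops at the first failure because of `check=True`.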

Troubleshooting

Common Issues

GLIBCXX Error

ImportError: $some_path/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found

Solution: Add your conda environment path to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
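To confirm that the conda environment's libstdc++ actually provides the missing symbol version before changing `LD_LIBRARY_PATH`, the library file can be scanned for its embedded `GLIBCXX_*` tags. This is a rough sketch that searches the raw bytes of the file, similar in spirit to `strings libstdc++.so.6 | grep GLIBCXX`; the function names are hypothetical and the path is the same placeholder used above.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4.29 in the raw bytes of the library.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(library_path):
    """Return the set of GLIBCXX version strings embedded in a shared library."""
    data = Path(library_path).read_bytes()
    return {match.decode("ascii") for match in GLIBCXX_TAG.findall(data)}


def provides(library_path, required="3.4.29"):
    """Report whether the library advertises the required GLIBCXX version."""
    return required in glibcxx_versions(library_path)


if __name__ == "__main__":
    # Placeholder path, as in the export command above; substitute your own.
    library = Path("/path/to/conda/envs/immuno/lib/libstdc++.so.6")
    if library.exists():
        print("GLIBCXX_3.4.29 available:", provides(library))
    else:
        print(f"{library} not found; edit the path first.")
```

If the check reports that the version is absent, pointing `LD_LIBRARY_PATH` at that directory is unlikely to resolve the import error.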

CUDA Compatibility Issues

  • Ensure your CUDA version matches the PyTorch installation
  • Verify GPU availability with torch.cuda.is_available()
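Both checks can be combined in one short report. The sketch below is illustrative; `cuda_report` is a hypothetical helper that relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it returns a reduced report if PyTorch is not installed.

```python
def cuda_report():
    """Summarize the CUDA state PyTorch sees; also works if torch is absent."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the installation commands above, `built_for_cuda` would be expected to read 11.8, since the wheels come from the cu118 index.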

Memory Issues

  • Reduce batch size in training scripts
  • Use gradient checkpointing for large models

Wandb Authentication

  • Log in to Wandb: wandb login
  • Ensure project names match between script and Wandb dashboard

License

Distributed under the Yale License. See LICENSE.txt for more information.

Contact

Krishnaswamy Lab - @KrishnaswamyLab

Project Link: https://github.com/KrishnaswamyLab/ImmunoStruct
