ImmunoStruct: A multimodal neural network framework for immunogenicity prediction
from peptide-MHC sequence, structure, and biochemical properties
ImmunoStruct is a multimodal deep learning framework that integrates sequence, structural, and biochemical information to predict multi-allele class-I peptide-MHC immunogenicity. By leveraging multimodal data from ~27,000 peptide-MHCs and jointly modeling sequence and structure, ImmunoStruct significantly improves immunogenicity prediction performance for both infectious disease epitopes and cancer neoepitopes.
- Multimodal Integration: Combines peptide-MHC protein sequence, structure, and biochemical properties
- Novel Cancer-Wildtype Contrastive Learning: Enhances specificity for cancer neoepitope detection
- Enhanced Interpretability: Provides insights into the substructural basis of immunogenicity
If you use ImmunoStruct in your research, please cite our paper:
@article{givechian2024immunostruct,
title={ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties},
author={Givechian, Kevin Bijan and Rocha, Joao Felipe and Yang, Edward and Liu, Chen and Greene, Kerrie and Ying, Rex and Caron, Etienne and Iwasaki, Akiko and Krishnaswamy, Smita},
journal={bioRxiv},
pages={2024--11},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
To get ImmunoStruct up and running locally, follow these steps.
Before installation, ensure you have:
- Python 3.10+
- CUDA-compatible GPU (recommended)
- Conda package manager
- Weights & Biases account for experiment tracking
Dependencies:
- python 3.10
- torch 2.1.2
- dgl
- torch_geometric 2.5.3
- Clone the repository
git clone https://github.com/KrishnaswamyLab/ImmunoStruct.git
cd ImmunoStruct
- Create and activate conda environment
conda create --name immuno python=3.10 -c anaconda -c conda-forge
conda activate immuno
- Install core dependencies
conda install cudatoolkit=11.2 wandb pydantic -c conda-forge
conda install scikit-image pillow matplotlib seaborn tqdm -c anaconda
- Install PyTorch
python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
- Install DGL
python -m pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu118/repo.html
python -m pip install torchdata==0.7.1
- Install PyTorch Geometric and related packages
python -m pip install torch-scatter==2.1.2+pt21cu118 torch-sparse==0.6.18+pt21cu118 torch-cluster==1.6.3+pt21cu118 torch-spline-conv==1.2.2+pt21cu118 torch_geometric==2.5.3 numpy==1.26.3 -f https://data.pyg.org/whl/torch-2.1.2+cu118.html
- Install additional packages
python -m pip install "graphein[extras]"
python -m pip install lifelines
python -m pip install -U phate
python -m pip install multiscale-phate
- Set up environment variables (if needed)
export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
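After the installation steps, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, not part of the repository: the `EXPECTED` table and the `check_versions` helper are hypothetical names, and the version numbers are copied from the pip commands above.

```python
from importlib import metadata

# Versions pinned in the installation steps above; None means "any version".
# This table is an illustrative assumption, not a file shipped with the repo.
EXPECTED = {
    "torch": "2.1.2",
    "torch_geometric": "2.5.3",
    "dgl": None,
    "numpy": "1.26.3",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, ok)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Compare only the public part, so "2.1.2+cu118" matches "2.1.2".
        public = installed.split("+")[0]
        report[name] = (installed, wanted is None or public == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "CHECK"
        print(f"{name:16s} {installed or 'not installed':16s} {status}")
```

Run it inside the activated `immuno` environment; any row marked `CHECK` points at a package that is missing or at a different version than the one pinned above.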
Place the following files in the data/ folder:
cedar_data_final_with_mprop1_mprop2_v2.txt
complete_score_Mprops_1_2_smoothed_sasa_v2.txt
HLA_27_seqs_csv.csv
Additionally, ensure you have these folders:
graph_pyg_Cancer
graph_pyg_IEDB
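A quick pre-flight check can catch a missing input before a long training run starts. The sketch below is illustrative only: `missing_inputs` is a hypothetical helper, and it assumes the two graph folders sit inside `data/` alongside the files, which the text above does not state explicitly. Adjust the paths if your layout differs.

```python
from pathlib import Path

# File and folder names as listed above.
REQUIRED_FILES = [
    "cedar_data_final_with_mprop1_mprop2_v2.txt",
    "complete_score_Mprops_1_2_smoothed_sasa_v2.txt",
    "HLA_27_seqs_csv.csv",
]
# Assumption: the graph folders live under the same data directory.
REQUIRED_DIRS = ["graph_pyg_Cancer", "graph_pyg_IEDB"]


def missing_inputs(data_dir="data"):
    """Return the names of required files and folders not found in data_dir."""
    root = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_inputs("data")
    if problems:
        print("Missing inputs:", ", ".join(problems))
    else:
        print("All expected inputs found.")
```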
Generate PyG graph files:
These PyG graph files can be generated from the corresponding AlphaFold folders with the command below.
python immunostruct/preprocessing/cancer_graph_construction_new_KBG.py
- Set up Weights & Biases
Create a project on Weights & Biases whose name matches the project name used by the training script.
- Run Experiments
# HybridModelv2 with full sequence and sequence loss
python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModelv2 --wandb-username YOUR_WANDB_USERNAME
# HybridModel with full sequence and sequence loss
python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModel --wandb-username YOUR_WANDB_USERNAME
# Sequence with fingerprint model
python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceFpModel --wandb-username YOUR_WANDB_USERNAME
# Sequence-only model
python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceModel --wandb-username YOUR_WANDB_USERNAME
# Structure-only model
python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --model StructureModel --wandb-username YOUR_WANDB_USERNAME
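To run all five configurations in sequence, the commands above can be generated from a small table instead of being typed one by one. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helpers, not part of the repository, and the script is assumed to be launched from the directory containing the training script. Only flags that appear in the commands above are used.

```python
import shlex
import subprocess
import sys

SCRIPT = "train_PropIEDB_PropCancer_ImmunoCancer.py"

# Model name -> whether the commands above pass --sequence-loss for it.
MODELS = {
    "HybridModelv2": True,
    "HybridModel": True,
    "SequenceFpModel": True,
    "SequenceModel": True,
    "StructureModel": False,
}


def build_command(model, wandb_username):
    """Build the argument list for one training run."""
    cmd = [sys.executable, SCRIPT, "--full-sequence"]
    if MODELS[model]:
        cmd.append("--sequence-loss")
    cmd += ["--model", model, "--wandb-username", wandb_username]
    return cmd


def run_all(wandb_username, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for model in MODELS:
        cmd = build_command(model, wandb_username)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("YOUR_WANDB_USERNAME", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the flags before committing GPU time.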
GLIBCXX Error
ImportError: $some_path/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found
Solution: Add your conda environment's lib directory to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
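Before changing LD_LIBRARY_PATH, you may want to confirm that the conda environment's libstdc++ actually provides the requested symbol version. The sketch below does the equivalent of searching the library's strings for the version tag. `provides_symbol_version` is a hypothetical helper, and the library location is an assumption based on the export line above (`sys.prefix` is the environment root when Python runs from an activated conda environment).

```python
import sys
from pathlib import Path


def provides_symbol_version(lib_path, version=b"GLIBCXX_3.4.29"):
    """Return True if the library's bytes contain the exact version tag.

    Version strings are stored NUL-terminated, so the terminator is included
    to avoid matching a longer tag that merely starts with the same text.
    """
    data = Path(lib_path).read_bytes()
    return version + b"\x00" in data


if __name__ == "__main__":
    # Assumed location: <conda env>/lib/libstdc++.so.6
    candidate = Path(sys.prefix) / "lib" / "libstdc++.so.6"
    if candidate.exists():
        print(candidate, "->", provides_symbol_version(candidate))
    else:
        print("No libstdc++.so.6 found under", candidate.parent)
```

If this prints `True` for the conda copy, prepending that `lib` directory to LD_LIBRARY_PATH as shown above should make the loader pick it up instead of the older system library.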
CUDA Compatibility Issues
- Ensure your CUDA version matches the PyTorch installation
- Verify GPU availability with torch.cuda.is_available()
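The two checks above can be combined into one short report. This is a minimal sketch; `cuda_report` is a hypothetical helper name. It uses only standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`) and degrades gracefully when PyTorch is not installed.

```python
import importlib.util


def cuda_report():
    """Summarize the PyTorch build and the GPUs it can see."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    available = torch.cuda.is_available()
    devices = []
    if available:
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the installation steps above, `built_for_cuda` is expected to report an 11.8 build. If `cuda_available` is `False` on a machine with a GPU, the driver is a likely place to look first.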
Memory Issues
- Reduce batch size in training scripts
- Use gradient checkpointing for large models
Wandb Authentication
- Log in to Wandb: wandb login
- Ensure project names match between script and Wandb dashboard
Distributed under the Yale License. See LICENSE.txt for more information.
Krishnaswamy Lab - @KrishnaswamyLab
Project Link: https://github.com/KrishnaswamyLab/ImmunoStruct