PhaseMotif is a sequence-based classifier of phase-separating (PS) intrinsically disordered regions (IDRs), built on an interpretable deep attention framework. Phase separation is crucial for the formation of biomolecular condensates, which play key roles in regulating cellular activities such as gene expression and signal transduction. This repository provides the source code for PhaseMotif, enabling researchers to explore and analyze the key regions within IDRs that drive phase separation.
📄 This work has been published — see the Citation section for the full reference.
For an easier way to use the tool, visit the PhaseMotif Website.
# Step 1: Create a new conda virtual environment with Python 3.8
conda create --name myenv python=3.8
# Step 2: Activate the new environment
conda activate myenv
# Step 3: Install the specified versions of PyTorch and related libraries
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
# Note: If the download fails, please visit the official PyTorch website (https://pytorch.org/) to find and install the PyTorch version that matches your CUDA version.
# Step 4: Clone the PhaseMotif repository from GitHub
git clone https://github.com/TingtingLiGroup/PhaseMotif.git
# Step 5: Navigate into the cloned repository directory
cd PhaseMotif
# Step 6: Install the package using setup.py
python setup.py install

If you do not want to configure CUDA and dependencies manually, you can use our prebuilt Docker image directly.
The image already includes pretrained weights under /app/model_save, so it works out of the box.
docker pull ghcr.io/tingtingligroup/phasemotif:latest
docker run --rm -it --gpus all ghcr.io/tingtingligroup/phasemotif:latest
Inside Python:
import PhaseMotif as pm
idr_list = ["HNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM", "GPTLSEDNLSYYKSQPGFQKMSADKMPPPSPDSENGFYPGLPSSMNPAFFPSFSPVSPHGCTGLSVPTSGGGGGGFGGPFSATAVPPPPPPAMNIPQQQPPPPAAPQQPQSRRSPVSPQLQQQHQAAAAAFLQQRNSYNHHQPLLKQSPWSNHQSSGWGTGSMSWGAMHGRDHRRTGNMGIPGTMNQISPLKKPFSGNVIAPPKFTRSTPSLTPKSWIEDNVFRTDNNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNGRSSLFPIDDGLLDDGHSDQVGVLNSPTCYSAHQNGE"]
idr_name = ["TDP43", "CPEB2"]
# Analyse
pm.analyse_main(idr_list)
pm.analyse_main(idr_list, idr_name, paint=True)
# Predict
df = pm.predict_main(idr_list, idr_name)
print(df)
Linux:
# -i keeps stdin open so the heredoc script reaches the Python interpreter
docker run --rm -i --gpus all ghcr.io/tingtingligroup/phasemotif:latest - <<'PY'
import PhaseMotif as pm
idr_list = ["HNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM", "GPTLSEDNLSYYKSQPGFQKMSADKMPPPSPDSENGFYPGLPSSMNPAFFPSFSPVSPHGCTGLSVPTSGGGGGGFGGPFSATAVPPPPPPAMNIPQQQPPPPAAPQQPQSRRSPVSPQLQQQHQAAAAAFLQQRNSYNHHQPLLKQSPWSNHQSSGWGTGSMSWGAMHGRDHRRTGNMGIPGTMNQISPLKKPFSGNVIAPPKFTRSTPSLTPKSWIEDNVFRTDNNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNGRSSLFPIDDGLLDDGHSDQVGVLNSPTCYSAHQNGE"]
idr_name = ["TDP43", "CPEB2"]
print(pm.predict_main(idr_list, idr_name))
PY
Windows PowerShell:
docker run --rm --gpus all ghcr.io/tingtingligroup/phasemotif:latest -c `
"import PhaseMotif as pm; `
idr_list = ['HNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM', 'GPTLSEDNLSYYKSQPGFQKMSADKMPPPSPDSENGFYPGLPSSMNPAFFPSFSPVSPHGCTGLSVPTSGGGGGGFGGPFSATAVPPPPPPAMNIPQQQPPPPAAPQQPQSRRSPVSPQLQQQHQAAAAAFLQQRNSYNHHQPLLKQSPWSNHQSSGWGTGSMSWGAMHGRDHRRTGNMGIPGTMNQISPLKKPFSGNVIAPPKFTRSTPSLTPKSWIEDNVFRTDNNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNGRSSLFPIDDGLLDDGHSDQVGVLNSPTCYSAHQNGE']
idr_name = ['TDP43', 'CPEB2']
print(pm.predict_main(idr_list, idr_name))"
- Make sure your host has an NVIDIA GPU with the proper driver installed.
- Install the NVIDIA Container Toolkit so that `--gpus all` works inside Docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- The container runs a Python interpreter by default, so you can either start an interactive session (`-it`) or run inline scripts with `-c "..."`.
| Function | Description | Parameters | Returns |
|---|---|---|---|
| `analyse_main` | Analyzes IDR sequences and optionally generates visualizations. | `idr_list, idr_name=None, paint=False` | DataFrame with analysis results |
| `predict_main` | Predicts the results for IDR sequences. | `idr_list, idr_name=None` | DataFrame with prediction results |
| `generate` | Generates new sequences based on a specified cluster. | `cluster, epoch=20, overLap=3, nomalize_threshold=0.95` | DataFrame |
The analyse_main function performs analysis on Intrinsically Disordered Regions (IDRs). It evaluates the density of significant points, frequency of selections, important points, and cluster labels for each IDR sequence.
- idr_list (list of str): List of IDR sequences to be analyzed.
- idr_name (list of str, optional): List of names for each IDR sequence. If not provided, names will be generated automatically.
- paint (bool, optional): Flag indicating whether to generate visualizations for the results. Default is `False`.
- analyse_result_df (pd.DataFrame): DataFrame containing the analysis results including IDR name, sequence, density, important positions, frequency of selections, cluster labels, and key regions.
import PhaseMotif as pm
# Sample IDR sequences
idr_list = ["HNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM", "GPTLSEDNLSYYKSQPGFQKMSADKMPPPSPDSENGFYPGLPSSMNPAFFPSFSPVSPHGCTGLSVPTSGGGGGGFGGPFSATAVPPPPPPAMNIPQQQPPPPAAPQQPQSRRSPVSPQLQQQHQAAAAAFLQQRNSYNHHQPLLKQSPWSNHQSSGWGTGSMSWGAMHGRDHRRTGNMGIPGTMNQISPLKKPFSGNVIAPPKFTRSTPSLTPKSWIEDNVFRTDNNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNGRSSLFPIDDGLLDDGHSDQVGVLNSPTCYSAHQNGE"]
idr_name = ["TDP43", "CPEB2"]
# Analyze the IDR sequences without naming or painting
pm.analyse_main(idr_list)
# Analyze with name and visualization
pm.analyse_main(idr_list, idr_name, paint=True)
# If you need to further manipulate or visualize the results, you can store them in a DataFrame
# results_df = pm.analyse_main(idr_list)

- The function checks whether the lengths of `idr_list` and `idr_name` match. If they don't, it raises a `ValueError`.
- Each element in `idr_name` must be a non-empty string. If any element doesn't meet this criterion, a `ValueError` is raised.
- If `paint` is set to `True`, visualizations for each IDR are saved in the `PM_analyse/Pic_result` directory.
- The results DataFrame is also saved as `PM_analyse/PM_analyse_result.csv`.
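
The input checks listed above can be reproduced before calling PhaseMotif, which is handy when sequences come from an upstream pipeline. The following is a minimal sketch, not PhaseMotif's own code: `validate_idr_inputs` is a hypothetical helper, and the `IDR_<index>` fallback naming is an assumption (the names PhaseMotif generates automatically may differ).

```python
from typing import List, Optional


def validate_idr_inputs(idr_list: List[str],
                        idr_name: Optional[List[str]] = None) -> List[str]:
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    if idr_name is None:
        # Assumed fallback naming scheme; PhaseMotif's automatic names may differ.
        return [f"IDR_{i}" for i in range(len(idr_list))]
    if len(idr_list) != len(idr_name):
        raise ValueError(
            f"idr_list has {len(idr_list)} items but idr_name has {len(idr_name)}"
        )
    for name in idr_name:
        # Every name must be a string and must not be empty.
        if not isinstance(name, str) or not name:
            raise ValueError("each element of idr_name must be a non-empty string")
    return list(idr_name)
```

Running this check first surfaces mismatched inputs before any model is loaded.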
The predict_main function runs the classifier on a list of Intrinsically Disordered Regions (IDRs) and computes a prediction score for each IDR sequence.
- idr_list (list of str): List of IDR sequences to be analyzed.
- idr_name (list of str, optional): List of names for each IDR sequence. If not provided, names will be generated automatically.
- predict_result_list (pd.DataFrame): DataFrame containing the prediction results, including IDR name, sequence, and prediction score.
import PhaseMotif as pm
idr_list = ["HNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM", "GPTLSEDNLSYYKSQPGFQKMSADKMPPPSPDSENGFYPGLPSSMNPAFFPSFSPVSPHGCTGLSVPTSGGGGGGFGGPFSATAVPPPPPPAMNIPQQQPPPPAAPQQPQSRRSPVSPQLQQQHQAAAAAFLQQRNSYNHHQPLLKQSPWSNHQSSGWGTGSMSWGAMHGRDHRRTGNMGIPGTMNQISPLKKPFSGNVIAPPKFTRSTPSLTPKSWIEDNVFRTDNNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNGRSSLFPIDDGLLDDGHSDQVGVLNSPTCYSAHQNGE"]
idr_name = ["TDP43", "CPEB2"]
# Predict without naming the sequences
pm.predict_main(idr_list)
# Predict with named sequences
pm.predict_main(idr_list, idr_name)
# you can also use results_df = pm.predict_main(...) directly for further manipulation.
results_df = pm.predict_main(idr_list, idr_name)
print(results_df)

- The function checks whether the lengths of `idr_list` and `idr_name` match. If they don't, it raises a `ValueError`.
- Each element in `idr_name` must be a non-empty string. If any element doesn't meet this criterion, a `ValueError` is raised.
- The results DataFrame is saved as `PM_analyse/PM_predict_result.csv`.
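
IDR sequences are often stored in FASTA files rather than typed as Python lists. The sketch below shows one way to build `idr_list` and `idr_name` from FASTA text using only the standard library. `read_fasta` is a hypothetical helper, not part of PhaseMotif, and it assumes the first whitespace-delimited token of each header line is the name you want.

```python
from typing import List, Tuple


def read_fasta(text: str) -> Tuple[List[str], List[str]]:
    """Parse FASTA text into (sequences, names). Hypothetical helper."""
    names: List[str] = []
    seqs: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first token of the header as the sequence name.
            header = line[1:].split()
            names.append(header[0] if header else f"IDR_{len(names)}")
            seqs.append("")
        elif names:
            # Sequence lines may be wrapped; join them into one string.
            seqs[-1] += line.upper()
    return seqs, names
```

With a file on disk (the name `idrs.fasta` is only an example), the result can be passed straight to the documented call: `idr_list, idr_name = read_fasta(open("idrs.fasta").read())`, then `pm.predict_main(idr_list, idr_name)`.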
The generate function uses a Variational Autoencoder (VAE) to generate new sequences based on a specified cluster. It normalizes, filters, and merges the generated sequences and saves them to a CSV file.
- cluster (str): The cluster name to use for generation. Must be one of `['0', 'polar', 'pos_neg', 'P', 'G', 'pos', 'aliphatic', 'neg', 'Q']`.
- epoch (int, optional): Number of generations to perform. Default is 20.
- overLap (int, optional): Overlap parameter for merging sequences. Default is 3.
- nomalize_threshold (float, optional): Normalization threshold to filter values. Default is 0.95.
- pd.DataFrame: DataFrame containing the generated sequences.
import PhaseMotif as pm
# For quick and easy use:
pm.generate('polar')
# For more detailed customization, pass the optional parameters explicitly:
result_df = pm.generate(cluster='polar', epoch=30, overLap=5, nomalize_threshold=0.9)
# Print the resulting DataFrame
print(result_df)

- The function raises a `ValueError` if the provided `cluster` is not in the predefined list.
- Generates sequences using a VAE model and filters them based on the normalization threshold.
- Merges sequences based on the specified overlap and retains sequences that match the target cluster.
- Saves the results to `PM_generate/generate_{cluster}.csv`. If the file exists, it appends the data; otherwise, it creates a new file.
- You can further manipulate the resulting DataFrame as needed.
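
The cluster check and the append-or-create file behavior described above can be illustrated with the standard library alone. This is a sketch under stated assumptions, not PhaseMotif's implementation: `save_generated` is a hypothetical function, the single `sequence` column is an assumed layout, and the real CSV may contain different columns.

```python
import csv
import os
from typing import Iterable

# Cluster names as listed in the parameter description.
VALID_CLUSTERS = ["0", "polar", "pos_neg", "P", "G", "pos", "aliphatic", "neg", "Q"]


def save_generated(cluster: str, sequences: Iterable[str],
                   out_dir: str = "PM_generate") -> str:
    """Append sequences to generate_{cluster}.csv, creating it if needed."""
    if cluster not in VALID_CLUSTERS:
        raise ValueError(f"cluster must be one of {VALID_CLUSTERS}, got {cluster!r}")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"generate_{cluster}.csv")
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["sequence"])  # assumed column name, header written once
        for seq in sequences:
            writer.writerow([seq])
    return path
```

Because repeated runs append to the same file, deleting or renaming `PM_generate/generate_{cluster}.csv` between experiments keeps results from different parameter settings apart.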
If you use PhaseMotif in your research, please cite:
H. Yang, K. You, L. Ma, X. Wang, G. Pei, T. Li, Interpretable and generative deep learning models explicate phase separating intrinsically disordered motifs, Nat Commun 17 (2026) 2571. https://doi.org/10.1038/s41467-026-69252-z
@article{Yang2026PhaseMotif,
author = {Yang, H. and You, K. and Ma, L. and Wang, X. and Pei, G. and Li, T.},
title = {Interpretable and generative deep learning models explicate phase separating intrinsically disordered motifs},
journal = {Nature Communications},
volume = {17},
pages = {2571},
year = {2026},
doi = {10.1038/s41467-026-69252-z},
url = {https://doi.org/10.1038/s41467-026-69252-z}
}