Code for the paper "High-Throughput DNA melt measurements enable improved models of DNA folding thermodynamics" https://doi.org/10.1101/2024.01.08.574731. "NNN" stands for "Not-Nearest-Neighbor"
Jupyter notebooks 01.1 to 01.5 correspond to the 5 main figures.
Notebook 01.0_DataPrep.ipynb
performs data cleaning and train-val-test split from the output of the preprocessing pipeline.
Functions used for generation of figures are defined in nnn/
.
Three major conda environments were used:
- `nnn.yml` most analysis in the repository
- `torch.yml` for training and running graph neural networks
- `nn_train.yml` for fitting and running linear regression models.
Also available as a singularity container as defined in `nn_train.def`.
To install, make sure conda
is already installed, then run conda env create -f {path/to/yml/file/name}
. For example, conda env create -f envs/nnn.yml
.
The local conda environment nnn
is also directly exported to envs/nnn_environment.yml
. It is provide for record keeping, and envs/nnn.yml
is still recommended for installing from scratch.
Note for container users: If you're running this in a minimal linux container, you may need to manually install system build tools beforehand. For example,
sudo apt install build-essential
.
The RiboGraphViz
package is required for some visualization tasks.
By default, it is not directly available on PyPI or conda. To install, you can either:
Option 1 (recommended by original authors):
Clone and install manually, following the instructions at
https://github.com/DasLab/RiboGraphViz
git clone https://github.com/DasLab/RiboGraphViz.git
cd RiboGraphViz
pip install .
Option 2 (alternative): Install directly via pip with dependencies:
pip install networkx matplotlib seaborn git+https://github.com/DasLab/RiboGraphViz
Note: While the RiboGraphViz developers have expressed interest in adding the package to PyPI, this has not yet been done as of this writing.
NUPACK4 (v4.0.0.27) was manually installed from file as it requires a free licence for download (https://docs.nupack.org/). As of Oct. 2024, NUPACK is temporarily free for academic personal users but may need paid subscription in the future.
- Register a new account to get an academic licence. Verify your email.
- After logging in and acknowledging the licence, you will find download links at https://www.nupack.org/download/software. Click
NUPACK 4.0.0.28
to download the zip file. - Unzip the zip file. Go to the directory
nupack-4.0.0.28/package
. - Choose one of the four
cp38
wheel files depending on your operating system. For example,nupack-4.0.0.28-cp38-cp38-macosx_10_13_x86_64.whl
on macos. - Run
conda activate nnn
to activate thennn
environment. - Run
pip install {path/to/your/nupack/whl/file}
to install the NUPACK python module.
The parameter estimation process could be replicated by following these steps:
-
Prepare the environment. Install the conda environment specified in
envs/nn_train.yml
withconda env create -f envs/nn_train.yml
. This yaml file specifies the required packages and their versions.a. Alternatively, use a singularity container. The build file of this singularity container is
envs/nn_train.def
. -
Activate the environment.
conda activate nn_train
-
Run the script. Enter
python run_nn_train.py
in the command line. You may also submit it as a job, usingrun_nn_train.sh
as a template (you will need to modify the slurm settings). Inrun_nn_train.py
, edit theconfig
dictionary to run models with different settings; edit themyrange
list to change the percentage of training data used for the plots.a. Alternatively, run the notebook interactively. Launch
jupyter lab
and run the notebook03.2_TrainNN.ipynb
in thennn_paper
repository.b. Note that in either script or notebook settings, you will be prompted to login to
wandb
to log the model training runs. This helps to keep track of models trained with different settings.
Python scripts in scripts/
generates the sequences in the variant library and are helpful to understand library design logics.
Run gnn_run.py
in torch
envoronment, pointing to the path of the saved model state dict file.
For any questions, contact Yuxi Ke ([email protected]).
Jan. 2024, updated Oct. 2024