zjlucam/ordinal-solvation-framework

Ordinal Solvation Framework (OSF)

[Figure: OSF overview]

This repository implements the Ordinal Solvation Framework (OSF) for predicting polymer–solvent solvation behaviour. OSF models solvation behaviour at four resolutions: binary p(2), three-state p(3), four-state p(4), and six-state p(6). It integrates these through ordinal aggregation into a continuous interaction coordinate s, which can also be projected back onto the six-state solvation axis.
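
To make the idea of collapsing several ordinal resolutions into one coordinate concrete, here is a minimal illustrative sketch. It is not the aggregation implemented in src/osf/model. It assumes each head returns a probability vector over its ordered states, takes the expected state index rescaled to [0, 1], averages the heads with equal weight, and projects back by nearest equally spaced state. The function names and the equal-weight rule are hypothetical.

```python
from typing import Dict, Sequence


def expected_position(probs: Sequence[float]) -> float:
    """Expected state index of one ordinal head, rescaled to [0, 1]."""
    k = len(probs)
    if k < 2:
        raise ValueError("an ordinal head needs at least two states")
    total = sum(probs)  # renormalise in case the inputs do not sum to 1
    return sum(i * p for i, p in enumerate(probs)) / (total * (k - 1))


def aggregate(heads: Dict[str, Sequence[float]]) -> float:
    """Equal-weight mean of the per-head positions (an assumed rule)."""
    positions = [expected_position(p) for p in heads.values()]
    return sum(positions) / len(positions)


def project_to_states(s: float, n_states: int = 6) -> int:
    """Map s in [0, 1] to the nearest of n_states equally spaced states."""
    index = int(s * (n_states - 1) + 0.5)
    return min(n_states - 1, max(0, index))
```

Calling aggregate with one probability vector per resolution (lengths 2, 3, 4, and 6) gives a value in [0, 1], and project_to_states maps it to an index from 0 to 5. The scale and direction of the actual coordinate s in OSF may differ.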

The repository is structured for:

  • inference on new polymer–solvent systems
  • evaluation using the included reproducibility bundle
  • reproduction of key model outputs

Installation

Clone the repository and create the environment:

git clone https://github.com/zjlucam/ordinal-solvation-framework.git
cd ordinal-solvation-framework
conda env create -f environment.yml
conda activate osf
python -m pip install -e .

If RDKit is not installed correctly through the environment file on your platform, install it separately:

conda install -c conda-forge rdkit

Quick start

Inference

Run OSF on the example input:

python scripts/predict_osf.py --checkpoint checkpoints/osf_pretrained.pt --input examples/public_inference_example.csv --output predictions.csv --device auto

This generates predictions.csv, including:

  • p2_pred
  • p3_pred
  • p4_pred
  • p6_pred
  • s_osf
  • osf_pred_6
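
As a quick sanity check after inference, the sketch below confirms that an output file contains the six columns listed above. It uses only the standard library. The helper names are hypothetical and are not part of the osf package.

```python
import csv
from typing import IO, Iterable, List

# Column names as listed in this README.
EXPECTED_COLUMNS = (
    "p2_pred", "p3_pred", "p4_pred", "p6_pred", "s_osf", "osf_pred_6",
)


def missing_columns(header: Iterable[str]) -> List[str]:
    """Return the expected columns that are absent from a CSV header."""
    present = {name.strip() for name in header}
    return [col for col in EXPECTED_COLUMNS if col not in present]


def check_predictions(handle: IO[str]) -> List[str]:
    """Read only the header row of an open CSV file and report gaps."""
    reader = csv.DictReader(handle)
    return missing_columns(reader.fieldnames or [])
```

To use it, open predictions.csv with open("predictions.csv", newline="") and pass the handle to check_predictions. An empty list means every expected column is present.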

Evaluation

Evaluate the pretrained checkpoint on the included processed bundle:

python scripts/evaluate_osf.py --checkpoint checkpoints/osf_pretrained.pt --bundle data/processed/osf_train_bundle.pt --metrics-out metrics.json --test-predictions-out test_predictions.csv

This generates:

  • metrics.json
  • test_predictions.csv

The evaluation script reports a compact set of the metrics used in the paper for the following subsets:

  • full
  • homopolymer
  • copolymer
  • binary_solvent

For each subset, the following values are reported:

  • n
  • accuracy_p2
  • qwk_p2
  • adjacent_acc_p6
  • qwk_p6
  • adjacent_acc_osf
  • qwk_osf
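
For readers unfamiliar with these metric names, the sketch below shows the standard textbook definitions that they most plausibly refer to: quadratic weighted kappa for the qwk_* entries, and the fraction of predictions within one ordinal state of the label for adjacent_acc_*. This reading of the names is an assumption. The repository's own implementation under src/osf/training may differ in detail.

```python
import math
from typing import Sequence


def adjacent_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions at most one ordinal state from the label."""
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> float:
    """Cohen's kappa with quadratic disagreement weights."""
    n = len(y_true)
    # Confusion matrix: rows are true labels, columns are predictions.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    true_hist = [sum(row) for row in observed]
    pred_hist = [
        sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)
    ]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independent marginals.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    if weighted_expected == 0.0:
        return math.nan  # undefined when labels and predictions share one class
    return 1.0 - weighted_observed / weighted_expected
```

Labels are assumed to be integers from 0 to n_classes - 1. With two classes the quadratic weights reduce to plain Cohen's kappa.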

Data and reproducibility

The repository includes a public reproducibility bundle containing the dataset used in the study:

data/processed/osf_train_bundle.pt

This bundle contains:

  • SMILES strings required to regenerate Morgan fingerprints
  • precomputed non-fingerprint features
  • labels
  • split indices
  • metadata and descriptors required for evaluation

The bundle is sufficient to reproduce the public inference and evaluation workflows included in this repository.

A small example workbook illustrating the structure of the original data format is also included at:

examples/data_schema_example.xlsx

This file is provided for orientation only and is not required for inference or evaluation.


Licensing

  • Code in this repository is released under the MIT License.
  • Non-code research artifacts, including the processed reproducibility bundle and pretrained checkpoint, are provided under the terms in LICENSE_DATA.md.

Included files

The repository includes:

  • source code under src/osf/
  • command-line scripts under scripts/
  • configuration files under configs/
  • example inputs under examples/
  • public reproducibility bundle at data/processed/osf_train_bundle.pt
  • pretrained checkpoint at checkpoints/osf_pretrained.pt

Example inputs

The examples/ directory contains small test files for:

  • single-solvent inference
  • copolymer inference
  • binary-solvent inference
  • unseen external inference

Example:

python scripts/predict_osf.py --checkpoint checkpoints/osf_pretrained.pt --input examples/binary_solvent_example.csv --output binary_predictions.csv --device auto

Optional smoke test

Windows (Anaconda Prompt)

run_scripts\smoke_test_public.bat

PowerShell

.\run_scripts\smoke_test_public.ps1

Expected outputs:

  • predictions.csv
  • metrics.json
  • test_predictions.csv
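
The provided smoke-test scripts target Windows shells. On other platforms, a rough equivalent can be sketched in Python by running the two Quick start commands in sequence. This assumes the smoke test amounts to those two commands, which matches the expected outputs listed above but has not been checked against the scripts themselves. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List

CHECKPOINT = "checkpoints/osf_pretrained.pt"


def smoke_commands(python: str = sys.executable) -> List[List[str]]:
    """Build the inference and evaluation commands from the Quick start."""
    predict = [
        python, "scripts/predict_osf.py",
        "--checkpoint", CHECKPOINT,
        "--input", "examples/public_inference_example.csv",
        "--output", "predictions.csv",
        "--device", "auto",
    ]
    evaluate = [
        python, "scripts/evaluate_osf.py",
        "--checkpoint", CHECKPOINT,
        "--bundle", "data/processed/osf_train_bundle.pt",
        "--metrics-out", "metrics.json",
        "--test-predictions-out", "test_predictions.csv",
    ]
    return [predict, evaluate]


def run_smoke_test() -> None:
    """Run both commands from the repository root, stopping on failure."""
    for command in smoke_commands():
        subprocess.run(command, check=True)
```

Call run_smoke_test() from the repository root with the osf environment active. A non-zero exit code from either script raises subprocess.CalledProcessError.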

Repository structure

📦 ordinal-solvation-framework
 ┣ 📂assets             README figures and visual assets
 ┣ 📂checkpoints        Pretrained checkpoint and checkpoint notes
 ┣ 📂configs            Training and inference configuration files
 ┣ 📂data               Public reproducibility bundle and data notes
 ┣ 📂examples           Example CSV and workbook inputs for inference and schema reference
 ┣ 📂notebooks          Exploratory, figure-generation, and SI notebooks
 ┣ 📂run_scripts        Smoke tests and convenience run scripts
 ┣ 📂scripts            CLI entry points for training, inference, evaluation, and data export
 ┣ 📂src/osf
 ┃ ┣ 📂analysis         Ablation, runtime, SHAP, and error analysis
 ┃ ┣ 📂data             Dataset loading, labels, splits, and processed-bundle handling
 ┃ ┣ 📂features         Fingerprints, descriptors, scaling, and feature builders
 ┃ ┣ 📂inference        Prediction and external inference utilities
 ┃ ┣ 📂model            OSF architecture, ordinal aggregation, and losses
 ┃ ┣ 📂plotting         Plotting utilities for figures, dataset visualisation, and SI
 ┃ ┣ 📂training         Training, checkpointing, evaluation, and metrics
 ┃ ┗ 📜core utilities   Config, constants, paths, and I/O helpers
 ┣ 📂tests              Unit and pipeline tests
 ┣ 📜.gitignore
 ┣ 📜LICENSE
 ┣ 📜LICENSE_DATA.md
 ┣ 📜README.md
 ┣ 📜environment.yml
 ┣ 📜pyproject.toml
 ┗ 📜requirements.txt

Citation

If you use this code or model, please cite the associated manuscript.
