Harnessing homolytic bond energetics to steer inverse radical design

Abstract and workflow

Homolytic bond dissociation energy (BDE) governs radical formation and bond-breaking thermodynamics, yet in molecular design it is typically evaluated only after candidate structures are proposed. Here we treat BDE as a continuous generative design coordinate rather than a post hoc descriptor, enabling distributional control over bond strength at the regime level rather than exact set-point attainment. A BDE-conditioned transformer generates radical fragment pairs steered toward prescribed single-bond strength regimes (50-130 kcal·mol⁻¹), achieving monotonic, rank-resolved energetic control with 81-94% validity and 84-92% novelty across targets. At the screening level, an MPNN screener trained on the same data source provides high-throughput energetic ranking consistent with regime-level steering. Same-functional M06-2X/def2-TZVP recalculation on randomly sampled novel generations yields target-attainment MAE of 7.7-12.7 kcal·mol⁻¹ consistent with regime-level steering, isolating generative accuracy from inter-functional offsets. Separate cross-functional evaluation at the ωB97X-D3BJ/def2-TZVP level shows preserved energetic ordering (DFT-calculated means spanning ~64 to ~119 kcal·mol⁻¹), indicating that the learned BDE axis reflects transferable thermodynamic structure rather than functional-specific artifacts. Spin-delocalization analysis further reveals a statistically significant correlation between radical localization and bond strength across the DFT-validated set. More broadly, reaction-relevant thermodynamic quantities may serve as primary generative coordinates, with PFAS-relevant C-F bond strengths motivating extension into higher-energy regimes.

Training Data (BDE labels + SMILES)
        │
        ├──► Train Generator (GPT-2 + FiLM conditioning)
        └──► Train Screener  (MPNN Graph Neural Network)
                    │
          Generate novel radical pairs
          at target BDE values (55–125 kcal/mol)
                    │
          Screen & filter with GNN predictions
                    │
          Sample novel candidates for DFT
                    │
          Write ORCA input scripts
          (Geometry Relax → SCF → Freq)
                    │
          Calculate DFT BDEs & compare to model

Repository Structure

Code/
├── data/                              # Datasets
│   ├── bde_rdf_with_multi_halo_cfc_model_3.pkl.zip # Main training data (use Pandas ".read_pickle" to read it)
│   ├── test_set_exp_bdes.csv                       # Experimental BDE test set
│   └── test_set_pfas.csv                           # PFAS test set
│
├── datasets/                          # PyTorch dataset classes
│   ├── generator.py                   # RadicalPairDataset (tokenized SMILES for GPT)
│   └── screener.py                    # BDEFragmentDataset (molecular graphs for GNN)
│
├── models/                            # Neural network architectures
│   ├── generator.py                   # BDEConditionedGPT (GPT-2 + FiLM conditioning)
│   └── screener.py                    # BDEFragmentModel (MPNN encoder)
│
├── training/                          # Training entry points
│   ├── generator.py                   # Train the generator
│   └── screener.py                    # Train the screener
│
├── design_and_evaluation/             # Generation, filtering, and analysis
│   ├── generate.py                    # Sample novel pairs from trained generator
│   ├── combine_radicals.py            # Bond two radical fragments into a parent molecule
│   ├── screener_prediction.py         # Predict BDE for generated pairs using screener
│   ├── evaluate_generated_pairs.py    # Validity / uniqueness / novelty / Tanimoto metrics
│   ├── sample_examples_for_dft.py     # Select novel candidates for DFT validation
│   └── calculate_dft_bde.py           # Parse ORCA outputs and compute DFT BDEs
│
└── write_orca_scripts/                # Quantum chemistry input generation
    ├── generate_relax_scripts.py      # SMILES → 3D coords → ORCA geometry relaxation inputs
    └── generate_scf_freq_scripts.py   # Optimized geometry → ORCA SCF + frequency inputs
│
└── generated_structures/              # Radical pairs generated after model training
    ├── generated_candidates_full_list.csv   # Full list with only the screener-predicted BDEs
    └── dft_candidates_with_dft_predicted_bde.csv  # Randomly-sampled list from the full list for dft validation

Models

Generator — `BDEConditionedGPT`

A GPT-2 language model conditioned on a target BDE value using FiLM (Feature-wise Linear Modulation). Given a target BDE, it autoregressively generates a SMILES string encoding a fragment1 <SEP> fragment2 radical pair.

Tokenizer: ChemBERTa
Architecture: GPT-2 (n_embd=256, n_layer=8, n_head=8)
Training: 10 epochs, AdamW, batch size 128

Screener — `BDEFragmentModel`

A message-passing neural network (MPNN) that predicts BDE from molecular graphs. It encodes the parent molecule and both fragments separately, then computes:

BDE = E(fragment1) + E(fragment2) − E(parent)

Node features: atomic number, degree, formal charge, aromaticity, ring membership, H count
Edge features: bond type, conjugation, ring membership
Training: 50 epochs, Adam (lr=5e-4), L1 loss, ReduceLROnPlateau

Data Format

The main dataset (data/bde_rdf_with_multi_halo_cfc_model_3.csv) contains:

Column	Description
`molecule`	Parent molecule SMILES
`fragment1`	First radical fragment SMILES
`fragment2`	Second radical fragment SMILES
`bond_type`	Type of cleaved bond (e.g., C-F, C-Cl)
`bde`	Bond dissociation energy (kcal/mol)
`set`	`train`, `valid`, or `test` split

Usage

1. Train the models

# Train the GNN screener
python -m training.screener

# Train the GPT generator
python -m training.generator

2. Generate novel radical pairs

python -m design_and_evaluation.generate

Generates 1000 fragment pairs at each of 8 target BDE values: 55, 65, 75, 85, 95, 105, 115, 125 kcal/mol.

3. Screen and filter

python -m design_and_evaluation.screener_prediction

4. Evaluate generated pairs

python -m design_and_evaluation.evaluate_generated_pairs

Reports validity, uniqueness, novelty, and Tanimoto similarity to training data.

5. Sample candidates for DFT

python -m design_and_evaluation.sample_examples_for_dft

Selects up to 20 novel pairs per BDE target and reconstructs parent molecules.

6. Write ORCA input scripts

# Step 1: Geometry relaxation
python -m write_orca_scripts.generate_relax_scripts

# Step 2: SCF + frequency calculations (after relaxation completes)
python -m write_orca_scripts.generate_scf_freq_scripts

7. Calculate DFT BDEs

python -m design_and_evaluation.calculate_dft_bde

Parses ORCA output files and computes BDE from SCF energies and zero-point energies (ZPE), converting from Hartree to kcal/mol (×627.509).

Libraries and computational dependencies

PyTorch + PyTorch Geometric — model training and GNN
Transformers (HuggingFace) — GPT-2 and ChemBERTa tokenizer
RDKit — cheminformatics (SMILES parsing, 3D coordinate generation)
pandas, numpy — data handling
tqdm — progress bars
ORCA — external quantum chemistry package for DFT calculations

Citation

@article{SheshanarayanaRadGen2026,
  author    = {R. Sheshanarayana and Fengqi You},
  title     = {Harnessing homolytic bond energetics to steer inverse radical design},
  journal   = {insert after publication},
  year      = {insert after publication},
  volume    = {insert after publication},
  pages     = {insert after publication},
  doi       = {insert after publication}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Harnessing homolytic bond energetics to steer inverse radical design

Abstract and workflow

Repository Structure

Models

Generator — `BDEConditionedGPT`

Screener — `BDEFragmentModel`

Data Format

Usage

1. Train the models

2. Generate novel radical pairs

3. Screen and filter

4. Evaluate generated pairs

5. Sample candidates for DFT

6. Write ORCA input scripts

7. Calculate DFT BDEs

Libraries and computational dependencies

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
data		data
datasets		datasets
design_and_evaluation		design_and_evaluation
generated_structures		generated_structures
models		models
training		training
write_orca_scripts		write_orca_scripts
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Harnessing homolytic bond energetics to steer inverse radical design

Abstract and workflow

Repository Structure

Models

Generator — BDEConditionedGPT

Screener — BDEFragmentModel

Data Format

Usage

1. Train the models

2. Generate novel radical pairs

3. Screen and filter

4. Evaluate generated pairs

5. Sample candidates for DFT

6. Write ORCA input scripts

7. Calculate DFT BDEs

Libraries and computational dependencies

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Generator — `BDEConditionedGPT`

Screener — `BDEFragmentModel`

Packages