HFDP is research software for predicting pathologic complete response (pCR) in breast cancer from dynamic contrast‑enhanced MRI (DCE‑MRI). It models enhancement dynamics (time‑intensity curves) and learns a generative habitat factorization via diffusion‑inspired denoising reconstruction; the resulting image features can optionally be fused with clinical covariates.
HFDP is organized as a two-stage pipeline:
- Stage 1 (pretrain): learn a habitat decomposer that factorizes a DCE time series into K soft spatial habitats and K corresponding enhancement curves (curve shape + timing), trained via noise-conditioned denoising reconstruction (diffusion-inspired) + diagnostics.
- Stage 2 (downstream): freeze the decomposer, cache curve-dynamics features (and optional per-habitat tokens), then train a lightweight classifier and fusion head (cov-only / img-only / fused) to predict pCR.
Stage 3 (end-to-end finetuning) is planned but not implemented.
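To make the Stage 1 output concrete, here is a minimal sketch of what a factorization into K soft habitats and K curves could look like. It assumes a simple mixture form in which each voxel's time series is approximated by a membership-weighted sum of the habitat curves. Both the mixture form and the name `reconstruct_from_habitats` are illustrative assumptions, not HFDP's actual model or API.

```python
from typing import List, Sequence


def reconstruct_from_habitats(
    memberships: Sequence[Sequence[float]],
    curves: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Approximate each voxel's time series as a weighted sum of K curves.

    memberships: V rows of K soft weights (assumed non-negative, summing to 1).
    curves:      K rows of T enhancement values, one curve per habitat.
    Returns V rows of T reconstructed values.
    """
    k = len(curves)
    t = len(curves[0])
    out = []
    for weights in memberships:
        if len(weights) != k:
            raise ValueError("each voxel needs one weight per habitat")
        out.append(
            [sum(w * curves[j][i] for j, w in enumerate(weights)) for i in range(t)]
        )
    return out


# Two hypothetical habitats: fast wash-in/wash-out vs. slow persistent uptake.
curves = [[0.0, 1.0, 0.5], [0.0, 0.25, 0.5]]
# Three voxels: pure habitat 0, pure habitat 1, and an even mix of both.
memberships = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
recon = reconstruct_from_habitats(memberships, curves)
```

In this toy setting, the downstream stage would work from the K curves (shape and timing) rather than from the raw voxel series.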
- Pre‑alpha: APIs/configs may change without warning.
- Not clinically validated: do not use for medical decision-making.
- No patient data included: you provide your own DCE volumes, masks, and clinical metadata.
- pretrain.py: stage 1 habitat decomposer pretraining.
- train.py: stage 2 downstream pCR training (cached habitat features + covariate fusion).
- hfdp/: library code (data, models, training, utils).
- configs/: minimal example configs (see configs/README.md).
- docs/: technical docs and diagrams.
Useful dataset/process notes:
- Duke whole-breast + HFDP preprocessing
- Duke whole-breast nnU-Net v2 runbook
- Whole-breast segmentation standardized preprocessing + training
- x0: [T, Z, X, Y] (DCE time series)
- times_sec: [T] (acquisition times)
- key_padding_mask: [T] (True = padded)
- breast_mask: [Z, X, Y]
- tumor_mask: [Z, X, Y]
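The shape contract above can be checked before training. The following is a minimal sketch that works on plain shape tuples, so it makes no assumption about the array library; the helper name `check_sample_shapes` is hypothetical and not part of the hfdp package.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def check_sample_shapes(shapes: Dict[str, Shape]) -> None:
    """Raise ValueError if the per-exam array shapes are mutually inconsistent."""
    x0 = shapes["x0"]
    if len(x0) != 4:
        raise ValueError(f"x0 must be [T, Z, X, Y], got {x0}")
    t, spatial = x0[0], x0[1:]
    # Per-timepoint arrays must match the time axis of x0.
    for name in ("times_sec", "key_padding_mask"):
        if shapes[name] != (t,):
            raise ValueError(f"{name} must have shape ({t},), got {shapes[name]}")
    # Masks must match the spatial axes of x0.
    for name in ("breast_mask", "tumor_mask"):
        if shapes[name] != spatial:
            raise ValueError(f"{name} must have shape {spatial}, got {shapes[name]}")


example = {
    "x0": (6, 4, 8, 8),
    "times_sec": (6,),
    "key_padding_mask": (6,),
    "breast_mask": (4, 8, 8),
    "tumor_mask": (4, 8, 8),
}
check_sample_shapes(example)  # passes silently
```

With NumPy or PyTorch arrays, the input could be built as `{name: tuple(arr.shape) for name, arr in sample.items()}`.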
git clone git@github.com:uchicago-dsi/hfdp.git
cd hfdp
git submodule update --init --recursive
If you do not already have micromamba:
macOS (Homebrew)
brew install micromamba
Linux (x86_64)
mkdir -p ~/.local/bin
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj -C ~/.local/bin --strip-components=1 bin/micromamba
export PATH="$HOME/.local/bin:$PATH"
# enable `micromamba activate` (restart your shell after this)
micromamba shell init -s bash -p ~/.micromamba
For other platforms, see the official micromamba installation docs.
Then create the recommended HFDP environment:
micromamba env create -f environment.yml
micromamba activate hfdp
python -m pip install --no-build-isolation -r requirements-dev.txt
python -m pip install -e .
If ffmpeg -version fails inside the env with
libopenh264.so.5: cannot open shared object file, repair the env-local ABI
link once:
ln -sf "$CONDA_PREFIX/lib/libopenh264.so.2.1.1" "$CONDA_PREFIX/lib/libopenh264.so.5"
environment.yml is the default Linux + NVIDIA recipe used for HFDP work in
this repo. It installs:
- Python 3.11
- PyTorch + CUDA runtime
- ffmpeg for mask_debug overlay movies
- the PyRadiomics build prerequisites needed for a clean micromamba install
- PYTHONNOUSERSITE=1 inside the env so ~/.local packages do not leak in
The explicit pip install --no-build-isolation -r requirements-dev.txt step is
intentional: pyradiomics==3.0.1 needs versioneer and the in-env numpy
visible at build time, which standard isolated builds do not provide.
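Because the non-isolated build relies on packages already present in the environment, it can help to confirm they are importable before running pip. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two build-time prerequisites named above for pyradiomics==3.0.1.
missing = missing_modules(["versioneer", "numpy"])
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("Build prerequisites are importable.")
```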
Verify the editable path points at this checkout:
python - <<'PY'
import hfdp
print(hfdp.__file__)
PY
This environment.yml path is the supported install flow for this repo; do not
expect a raw pip install -r requirements*.txt install to reproduce the same
environment.
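The two environment properties described above (the editable install resolves to this checkout, and user-site packages are not on the path) can also be checked programmatically. This is a minimal sketch; the helper names `is_inside` and `user_site_disabled` are hypothetical, while `site.ENABLE_USER_SITE` is the standard-library flag that is `True` only when the user site directory was added to `sys.path`.

```python
import site
from pathlib import Path


def is_inside(path: str, root: str) -> bool:
    """True if `path` resolves to `root` itself or to a location under it."""
    resolved, base = Path(path).resolve(), Path(root).resolve()
    return resolved == base or base in resolved.parents


def user_site_disabled() -> bool:
    """True unless the user site directory (e.g. ~/.local) was added to sys.path."""
    return site.ENABLE_USER_SITE is not True


# Usage, from the repository root inside the activated env:
#   import hfdp
#   print(is_inside(hfdp.__file__, "."), user_site_disabled())
```

If either value is `False`, the interpreter is probably picking up a different install or packages from outside the environment.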
For one-shot commands without activation, keep the same isolation explicitly:
PYTHONNOUSERSITE=1 micromamba run -n hfdp python train.py --config <yaml>
If you need a CPU-only or non-NVIDIA setup, keep the same editable-install step
but swap the PyTorch lines in environment.yml for the appropriate packages for
your platform.
data:
  mode: breast_volume
  slice_cache:
    intensity_normalization: per_exam_minmax
    enforce_left_on_left: true
pretrain:
  training:
    max_epochs: 10
  decomposer:
    enabled: true
    k_habitats: 8
    target_grid_zyx: [96, 144, 144]
    input_representation: delta_t0
Before running, edit the example configs under configs/ to point at your data:
- data.paths.dataset_root (required)
- data.paths.mask_dirs and data.paths.breast_mask_dirs (required)
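A quick pre-flight check can catch a config that still lacks these required paths. The sketch below assumes the YAML has already been loaded into a nested dict (for example with a YAML parser of your choice); `missing_config_keys` is a hypothetical helper, not part of the hfdp package, and HFDP's own validation may differ.

```python
from collections.abc import Mapping
from typing import Any, Iterable, List

REQUIRED_KEYS = (
    "data.paths.dataset_root",
    "data.paths.mask_dirs",
    "data.paths.breast_mask_dirs",
)


def missing_config_keys(
    cfg: Mapping, dotted_keys: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the dotted keys that are absent or empty in a nested config dict."""
    missing = []
    for dotted in dotted_keys:
        node: Any = cfg
        for part in dotted.split("."):
            if not isinstance(node, Mapping) or part not in node:
                node = None
                break
            node = node[part]
        # Treat unset and empty values alike: both need editing before a run.
        if node in (None, "", [], {}):
            missing.append(dotted)
    return missing


# Placeholder values for illustration only.
cfg = {"data": {"paths": {"dataset_root": "/path/to/dataset", "mask_dirs": []}}}
print(missing_config_keys(cfg))
```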
Stage 1 (habitat decomposer pretraining):
python pretrain.py --config configs/pretrain/habitat_decomposer_mvp.yaml --debug
Stage 2 (downstream fusion head):
python train.py --config configs/downstream/habitat_fusion_baseline.yaml --debug
The foreman owns queued experiment selection and normal Slurm throughput. It
materializes launches through run_with_submitit.py, which remains the
job-level submitit adapter for script/config/fold execution and CV aggregation.
Use run_with_submitit.py directly only for one-off debug or intentionally
manual launches.
For downstream-only radiomics runs, you can execute all CV folds inside one Slurm slot without requesting GPUs:
python run_with_submitit.py \
--cpu-only \
--single-job-cv \
--parallel-folds-per-job 5 \
--folds all \
--cpus_per_task 32 \
--mem_gb_per_gpu 192 \
--partition general \
--timeout 720 \
--constraint "a100|h100|h200" \
--name <run_name> \
train.py --config <config.yaml>
This mode is designed for slot-limited sweeps. Tune data.num_workers per
config to avoid CPU oversubscription when multiple folds run concurrently.
For radiomics backends, also set train.cache.radiomics_exam_shards (usually
equal to --parallel-folds-per-job) so folds cooperatively build exam-cache
shards instead of waiting on a single cache lock.
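As a starting point for tuning data.num_workers, the CPU budget can be split evenly across the concurrent folds. The sketch below is a rule of thumb rather than HFDP's policy: it assumes each fold's main process needs one CPU and gives the remainder to data-loading workers. The function name `workers_per_fold` is hypothetical.

```python
def workers_per_fold(cpus_per_task: int, parallel_folds: int, reserve_main: int = 1) -> int:
    """Suggest a per-fold worker count that keeps total processes within the CPU budget."""
    if cpus_per_task < 1 or parallel_folds < 1:
        raise ValueError("cpus_per_task and parallel_folds must be positive")
    per_fold = cpus_per_task // parallel_folds  # CPUs available to each fold
    return max(0, per_fold - reserve_main)  # leave room for the fold's main process


# With the example launch above: 32 CPUs shared by 5 concurrent folds.
print(workers_per_fold(32, 5))  # 5 workers + 1 main process per fold = 30 of 32 CPUs
```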
Set the ground-truth column via data.paths.label_column (defaults to pcr):
data:
  paths:
    label_column: pcr
This repo ships a CITATION.cff file for GitHub’s citation UI.
MIT (see LICENSE).