HFDP — Habitat Factorized Dynamics-derived Phenotypes

HFDP is research software for predicting pathologic complete response (pCR) in breast cancer from dynamic contrast‑enhanced MRI (DCE‑MRI). It models enhancement dynamics (time‑intensity curves) and learns a generative habitat factorization via diffusion‑inspired denoising reconstruction. The resulting imaging features can optionally be fused with clinical covariates.

HFDP is organized as a two-stage pipeline:

Stage 1 (pretrain): learn a habitat decomposer that factorizes a DCE time series into K soft spatial habitats and K corresponding enhancement curves (curve shape and timing), trained via noise-conditioned (diffusion-inspired) denoising reconstruction, plus diagnostics.
Stage 2 (downstream): freeze the decomposer, cache curve-dynamics features (and optional per-habitat tokens), then train a lightweight classifier and fusion head (covariate-only, image-only, or fused) to predict pCR.

Stage 3 (end-to-end finetuning) is planned but not implemented.
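
One way to read the Stage 1 factorization is as a sum of habitat-weighted curves. The equation below is an illustrative assumption, not the repository's confirmed model: the symbols h_k, c_k, v, and the plain sum-of-products form are introduced here for exposition only, and the actual decomposer may combine habitats and curves differently.

```latex
\hat{x}(t, v) \;=\; \sum_{k=1}^{K} h_k(v)\, c_k(t),
\qquad h_k(v) \ge 0, \qquad \sum_{k=1}^{K} h_k(v) = 1
```

Here v indexes a voxel, t indexes a time point, h_k(v) is the soft membership of voxel v in habitat k, and c_k(t) is the enhancement curve of habitat k. Under this reading, "soft" means each voxel spreads a unit of membership across the K habitats rather than belonging to exactly one.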

Project status

Pre‑alpha: APIs/configs may change without warning.
Not clinically validated: do not use for medical decision-making.
No patient data included: you provide your own DCE volumes, masks, and clinical metadata.

Repository layout

pretrain.py: stage 1 habitat decomposer pretraining.
train.py: stage 2 downstream pCR training (cached habitat features + covariate fusion).
hfdp/: library code (data, models, training, utils).
configs/: minimal example configs (see configs/README.md).
docs/: technical docs and diagrams.

Useful dataset/process notes:

Core tensors (stage 1)

x0: [T, Z, X, Y] (DCE time series)
times_sec: [T] (acquisition times)
key_padding_mask: [T] (True = padded)
breast_mask: [Z, X, Y]
tumor_mask: [Z, X, Y]
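
The shape contract above can be checked before a sample reaches the model. The following is a minimal sketch, not code from this repo: pad_times and check_stage1_shapes are hypothetical helper names, and padding times with 0.0 is an assumption (only the "True = padded" mask convention comes from the list above). Shapes are passed as plain tuples, for example tuple(tensor.shape).

```python
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def pad_times(times_sec: Sequence[float], t_max: int) -> Tuple[List[float], List[bool]]:
    """Pad acquisition times to length t_max and build key_padding_mask (True = padded)."""
    n = len(times_sec)
    if n > t_max:
        raise ValueError(f"{n} time points exceed t_max={t_max}")
    # Assumption: padded slots hold 0.0; the mask is what marks them as invalid.
    padded = list(times_sec) + [0.0] * (t_max - n)
    mask = [False] * n + [True] * (t_max - n)
    return padded, mask


def check_stage1_shapes(shapes: Dict[str, Shape]) -> None:
    """Raise ValueError if the stage 1 tensor shapes disagree with each other."""
    x0 = shapes["x0"]
    if len(x0) != 4:
        raise ValueError(f"x0 must be [T, Z, X, Y], got {x0}")
    t, spatial = x0[0], x0[1:]
    # Per-time-point vectors must have length T.
    for name in ("times_sec", "key_padding_mask"):
        if shapes[name] != (t,):
            raise ValueError(f"{name} must be [T]=({t},), got {shapes[name]}")
    # Masks must match the spatial grid of x0.
    for name in ("breast_mask", "tumor_mask"):
        if shapes[name] != spatial:
            raise ValueError(f"{name} must be [Z, X, Y]={spatial}, got {shapes[name]}")


# Usage: three real acquisitions padded to T=5 on a small spatial grid.
times, mask = pad_times([0.0, 60.0, 120.0], t_max=5)
check_stage1_shapes({
    "x0": (5, 4, 8, 8),
    "times_sec": (len(times),),
    "key_padding_mask": (len(mask),),
    "breast_mask": (4, 8, 8),
    "tumor_mask": (4, 8, 8),
})
print(mask)  # [False, False, False, True, True]
```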

Installation

git clone git@github.com:uchicago-dsi/hfdp.git
cd hfdp
git submodule update --init --recursive

Install micromamba (optional)

If you do not already have micromamba:

macOS (Homebrew)

brew install micromamba

Linux (x86_64)

mkdir -p ~/.local/bin
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj -C ~/.local/bin --strip-components=1 bin/micromamba
export PATH="$HOME/.local/bin:$PATH"

# enable `micromamba activate` (restart your shell after this)
micromamba shell init -s bash -p ~/.micromamba

For other platforms, see the official micromamba installation docs.

Then create the recommended HFDP environment:

micromamba env create -f environment.yml
micromamba activate hfdp
python -m pip install --no-build-isolation -r requirements-dev.txt
python -m pip install -e .

If ffmpeg -version fails inside the env with libopenh264.so.5: cannot open shared object file, repair the env-local ABI link once:

ln -sf "$CONDA_PREFIX/lib/libopenh264.so.2.1.1" "$CONDA_PREFIX/lib/libopenh264.so.5"

environment.yml is the default Linux + NVIDIA recipe used for HFDP work in this repo. It installs:

Python 3.11
PyTorch + CUDA runtime
ffmpeg for mask_debug overlay movies
the PyRadiomics build prerequisites needed for a clean micromamba install
PYTHONNOUSERSITE=1 inside the env so ~/.local packages do not leak in

The explicit pip install --no-build-isolation -r requirements-dev.txt step is intentional: pyradiomics==3.0.1 needs versioneer and the in-env numpy visible at build time, which standard isolated builds do not provide.

Verify the editable path points at this checkout:

python - <<'PY'
import hfdp
print(hfdp.__file__)
PY

This environment.yml path is the supported install flow for this repo; a raw pip install -r requirements*.txt is not expected to reproduce the same environment.

For one-shot commands without activation, keep the same isolation explicitly:

PYTHONNOUSERSITE=1 micromamba run -n hfdp python train.py --config <yaml>

If you need a CPU-only or non-NVIDIA setup, keep the same editable-install step but swap the PyTorch lines in environment.yml for the appropriate packages for your platform.

Config quickstart

data:
  mode: breast_volume
  slice_cache:
    intensity_normalization: per_exam_minmax
    enforce_left_on_left: true
pretrain:
  training:
    max_epochs: 10
  decomposer:
    enabled: true
    k_habitats: 8
    target_grid_zyx: [96, 144, 144]
    input_representation: delta_t0

Before running, edit the example configs under configs/ to point at your data:

data.paths.dataset_root (required)
data.paths.mask_dirs and data.paths.breast_mask_dirs (required)
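
A quick pre-flight check can catch unset paths before a run starts. The sketch below is not part of HFDP: lookup, is_unset, and missing_required are hypothetical helpers that operate on an already-parsed config dictionary (for instance, the result of PyYAML's yaml.safe_load on one of the example configs), and the /data/dce path is a placeholder. Only the three dotted keys come from the list above.

```python
from typing import Any, Dict, List

# The required keys named in this README, as dotted paths into the config.
REQUIRED_KEYS = (
    "data.paths.dataset_root",
    "data.paths.mask_dirs",
    "data.paths.breast_mask_dirs",
)


def lookup(cfg: Dict[str, Any], dotted_key: str) -> Any:
    """Walk a nested dict along a dotted key; return None if any part is missing."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def is_unset(value: Any) -> bool:
    """Treat None, empty strings, and empty lists/dicts as 'not configured'."""
    return value is None or value == "" or value == [] or value == {}


def missing_required(cfg: Dict[str, Any]) -> List[str]:
    """Return the required dotted keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if is_unset(lookup(cfg, key))]


# Usage with a placeholder config: dataset_root is set, the mask dirs are not.
cfg = {
    "data": {
        "mode": "breast_volume",
        "paths": {"dataset_root": "/data/dce", "mask_dirs": []},
    }
}
print(missing_required(cfg))
# ['data.paths.mask_dirs', 'data.paths.breast_mask_dirs']
```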

Quickstart (debug)

Stage 1 (habitat decomposer pretraining):

python pretrain.py --config configs/pretrain/habitat_decomposer_mvp.yaml --debug

Stage 2 (downstream fusion head):

python train.py --config configs/downstream/habitat_fusion_baseline.yaml --debug

Submitit: single-slot CPU CV

The foreman owns queued experiment selection and normal Slurm throughput. It materializes launches through run_with_submitit.py, which remains the job-level submitit adapter for script/config/fold execution and CV aggregation. Use run_with_submitit.py directly only for one-off debug or intentionally manual launches.

For downstream-only radiomics runs, you can execute all CV folds inside one Slurm slot without requesting GPUs:

python run_with_submitit.py \
  --cpu-only \
  --single-job-cv \
  --parallel-folds-per-job 5 \
  --folds all \
  --cpus_per_task 32 \
  --mem_gb_per_gpu 192 \
  --partition general \
  --timeout 720 \
  --constraint "a100|h100|h200" \
  --name <run_name> \
  train.py --config <config.yaml>

This mode is designed for slot-limited sweeps. Tune data.num_workers per config to avoid CPU oversubscription when multiple folds run concurrently. For radiomics backends, also set train.cache.radiomics_exam_shards (usually equal to --parallel-folds-per-job) so folds cooperatively build exam-cache shards instead of waiting on a single cache lock.
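
As a rough starting point for data.num_workers, the CPU budget can be split evenly across concurrent folds. This is a heuristic sketch rather than a rule enforced by the repo: max_workers_per_fold and its reserved_per_fold parameter (CPUs held back for each fold's main process) are hypothetical, and real workloads with multi-threaded radiomics or BLAS calls may need a lower value.

```python
def max_workers_per_fold(cpus_per_task: int, parallel_folds: int,
                         reserved_per_fold: int = 1) -> int:
    """Largest per-fold worker count that keeps total processes within the CPU budget."""
    if cpus_per_task < 1 or parallel_folds < 1:
        raise ValueError("cpus_per_task and parallel_folds must be positive")
    # Split CPUs evenly across folds, then hold some back for each fold's main process.
    per_fold = cpus_per_task // parallel_folds
    return max(0, per_fold - reserved_per_fold)


# Values from the command above: --cpus_per_task 32, --parallel-folds-per-job 5.
workers = max_workers_per_fold(cpus_per_task=32, parallel_folds=5)
print(workers)  # 5
```

With these numbers each fold gets 32 // 5 = 6 CPUs, one of which is reserved for the main process, so 5 folds × (5 workers + 1 main) = 30 processes fit within the 32 allocated CPUs.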

Clinical labels

Set the ground-truth column via data.paths.label_column (defaults to pcr):

data:
  paths:
    label_column: pcr

Citation

This repo ships a CITATION.cff file for GitHub’s citation UI.

License

MIT (see LICENSE).
