This repository contains Jupyter notebooks for extracting deep learning features from medical images (specifically chest X-rays) using a pre-trained RADDINO model.
- Global Feature Extraction (`Global_feature_extraction.ipynb`) - Extracts global image features using the RADDINO model.
- Patch Feature Extraction (`Patch_feature_extraction.ipynb`) - Extracts patch-level features from medical images using the RADDINO model.
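To make the difference between global and patch-level features concrete, the sketch below counts the tokens a ViT-style encoder produces for a square image and how they split into one global vector plus a grid of patch vectors. The `token_layout` helper, the 518-pixel input size, and the 14-pixel patch size are assumptions for illustration, not values read from the notebooks; check the model configuration before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLayout:
    grid_side: int    # patches along one side of the image
    num_patches: int  # patch-level feature vectors per image
    num_tokens: int   # patch tokens plus one global (CLS) token


def token_layout(image_size: int, patch_size: int) -> TokenLayout:
    """Describe the token sequence of a ViT-style encoder for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid_side = image_size // patch_size
    num_patches = grid_side * grid_side
    return TokenLayout(grid_side, num_patches, num_patches + 1)


# Assumed values (hypothetical): 518 x 518 input, 14 x 14 patches.
layout = token_layout(image_size=518, patch_size=14)
print(layout)
```

Under these assumptions, global extraction keeps a single vector per image, while patch extraction keeps one vector per grid cell, so patch outputs need considerably more disk space.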
Both notebooks implement an efficient pipeline leveraging:
- PyTorch and PyTorch Lightning for deep learning operations
- MONAI (Medical Open Network for AI) for medical image-specific processing
- Parallel execution and persistent caching for performance optimization
- GPU acceleration for faster processing
Dependencies are managed via uv, with a lock file included to ensure reproducible environments.
# Install uv if you haven't already
pip install uv
# Clone the repository
git clone https://github.com/f10409/RAD-DINO_Embedding_Extractor.git
cd RAD-DINO_Embedding_Extractor
# Create a virtual environment and install dependencies using uv
uv sync

- Update the `BASE_PATH` variable in the `get_data_dict_part()` function
- Configure parameters like cache settings, batch size, and GPU selection
- Provide a CSV file with image paths in the `ImagePath` column
- Run the notebook to extract and save features to the specified output directory
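As a rough illustration of the CSV input described above, the sketch below turns an `ImagePath` column into a list of dictionaries of the kind dictionary-based transforms usually consume. The helper name `build_data_dicts`, the `"image"` key, and the way the base path is joined are assumptions; the notebooks' own `get_data_dict_part()` may differ.

```python
import csv
from pathlib import Path


def build_data_dicts(csv_path, base_path, column="ImagePath", key="image"):
    """Return one {key: full_path} dict per non-empty row of the CSV."""
    base = Path(base_path)
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"CSV has no '{column}' column")
        return [
            # Join each relative path onto the base directory.
            {key: str(base / row[column].strip())}
            for row in reader
            if row[column] and row[column].strip()
        ]
```

A call such as `build_data_dicts("images.csv", "/data/cxr")` (both paths hypothetical) would then yield entries like `{"image": "/data/cxr/<relative path from the CSV>"}`.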
- Data loading from CSV containing image paths
- Medical image-specific preprocessing
- Dataset preparation with persistent caching
- Model initialization
- Validation through visual spot-checking
- Feature extraction using PyTorch Lightning
The extracted features are saved to disk for downstream tasks like classification or clustering.
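The persistent caching step in the pipeline can be pictured with the standard-library sketch below: the result of an expensive preprocessing call is written to disk once and reloaded on later runs. This is only an illustration of the idea, not MONAI's implementation; `cached_call` and its arguments are hypothetical names.

```python
import hashlib
import pickle
from pathlib import Path


def cached_call(func, item, cache_dir):
    """Run func(item) once; later calls load the pickled result from disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache file on the input so each item gets its own entry.
    digest = hashlib.sha256(repr(item).encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{digest}.pkl"
    if cache_file.exists():
        with open(cache_file, "rb") as handle:
            return pickle.load(handle)
    result = func(item)
    with open(cache_file, "wb") as handle:
        pickle.dump(result, handle)
    return result
```

Because the key in this sketch depends only on the input item, the cache directory should be cleared whenever the preprocessing function changes; otherwise stale results would be returned.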