This project implements a simple pipeline for training and evaluating a deep learning model to classify chest X-rays (CXRs) as positive or negative for pneumothorax.
A pneumothorax, commonly known as a collapsed lung, occurs when air enters the space between the chest wall and the lung (pleural space). This air accumulation can compress the lung, causing partial or complete collapse. Treatment typically involves air removal using a needle or chest tube inserted between the ribs.
Reference: Cleveland Clinic
Reference: Radiopaedia
The project utilizes three subsets sampled from the Emory CXR dataset:
- Data1: 2,256 images for training
- Data2: 2,256 images for training
- Data3: 553 images for external validation
All images are frontal view chest X-rays, pre-processed by resizing and padding.
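The exact resizing and padding procedure is not described here, so the following is only a minimal sketch of one common approach: scale the image so its longer side matches a square canvas, then pad the shorter side evenly. The function name `resize_and_pad_geometry` and the default `target=256` are hypothetical and are not taken from this project.

```python
def resize_and_pad_geometry(width, height, target=256):
    """Compute the resized size and padding needed to fit an image on a square canvas.

    Assumption: aspect-preserving resize followed by centered padding.
    The target size of 256 is a placeholder, not the project's actual value.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")

    longest = max(width, height)
    # Scale both sides by the same factor so the longer side equals `target`.
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))

    # Split the leftover space as evenly as possible between the two sides.
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "resized": (new_w, new_h),
        "padding": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


print(resize_and_pad_geometry(2000, 1000, 256))
```

For a 2000 x 1000 image and a 256-pixel canvas, this gives a resized size of 256 x 128 with 64 pixels of padding above and below.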
Keras is a high-level neural network API designed for ease of use and rapid prototyping of deep learning models. This project uses it with the TensorFlow backend.
For more information: https://keras.io/getting_started/
- Go to terminal:
- Navigate to home directory:
  cd ~
- Clone the repository:
  git clone https://github.com/f10409/Datathon24_CXR_Pipeline.git
- Navigate to the project folder:
  cd ~/Datathon24_CXR_Pipeline
- Copy necessary files from FSX:
  cp /fsx/embed/summer-school-24/Datathon24_SummerSchool_CXR/Data_Info/Data*.csv ./
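After the copy step, it can be useful to confirm that the three dataset CSV files actually landed in the project folder. The sketch below is one way to check this from Python; the helper name `missing_data_files` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# File names taken from the copy step above (Data*.csv).
EXPECTED_FILES = ("Data1.csv", "Data2.csv", "Data3.csv")


def missing_data_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected CSV file names that are not present in project_dir."""
    root = Path(project_dir).expanduser()
    return [name for name in expected if not (root / name).is_file()]


# Check the current working directory; an empty list means all files were found.
print(missing_data_files("."))
```

Run it from inside `~/Datathon24_CXR_Pipeline`, or pass that path explicitly instead of `"."`.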
The project folder is organized as follows:
- Jupyter Notebooks: A series of notebooks prefixed with numbers (e.g., 1_1_Model_training.ipynb, 1_2_Model_Evaluation.ipynb). The numbers indicate the recommended execution order.
- Dataset Information: Three CSV files named Data1.csv, Data2.csv, and Data3.csv, containing metadata and labels for the respective datasets.
- Results: A directory named results where model evaluation outputs are stored.
- Supplementary Material: A folder named supplementary containing additional code and resources for those interested in deeper exploration of the project.
Datathon24_CXR_Pipeline/
│
├── 1_1_Model_training.ipynb
├── 1_2_Model_Evaluation.ipynb
├── 2_0_Data_Exploration.ipynb
│ ...
├── Data1.csv
├── Data2.csv
├── Data3.csv
│
├── results/
│   ├── Model1_External.csv
│   ├── Model1_TestSet.csv
│   └── ...
│
└── supplementary/
    ├── 1_a_Sample_biased_data.ipynb
    ├── 1_c_Preprocessing.ipynb
    └── ...
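Because the numeric prefixes encode the recommended execution order, the notebooks can be sorted programmatically. The following is a small sketch, not code from the repository: the function names are hypothetical, and treating letter suffixes (as in the supplementary notebooks) as sorting after digits is an assumption.

```python
import re


def execution_key(filename):
    """Build a sort key from a prefix such as '1_2_' or '1_a_'."""
    match = re.match(r"(\d+)_([0-9A-Za-z]+)_", filename)
    if match is None:
        raise ValueError(f"no numeric prefix in {filename!r}")
    major, minor = match.groups()
    # Assumption: numeric minor parts come first, letter parts after them.
    minor_key = (0, int(minor)) if minor.isdigit() else (1, minor)
    return (int(major), minor_key)


def sort_notebooks(filenames):
    """Return notebook file names in their recommended execution order."""
    return sorted(filenames, key=execution_key)


notebooks = [
    "2_0_Data_Exploration.ipynb",
    "1_2_Model_Evaluation.ipynb",
    "1_1_Model_training.ipynb",
]
for name in sort_notebooks(notebooks):
    print(name)
```

The numeric comparison matters once a prefix reaches two digits: a plain string sort would place `1_10_` before `1_2_`.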
If the existing environment doesn't have all the required packages, you can create a virtual environment.
To set up the project environment, follow these steps:
- Create a new conda environment:
  conda create -y -n Datathon24 python=3.10
- Configure the shell:
  conda init bash
- Close the terminal and open it again.
- Activate the environment:
  conda activate Datathon24
- Navigate to the project folder:
  cd ~/Datathon24_CXR_Pipeline
- Install TensorFlow and CUDA dependencies:
  pip install -r requirements-tensorflow-cuda.txt
- Install other project dependencies:
  pip install -r requirement.txt
- Install a custom Jupyter kernel named "Datathon24":
  python -m ipykernel install --user --name Datathon24 --display-name "Datathon24"
- Restart the server.
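Once the environment is set up, a quick sanity check is to confirm that key packages can be found by the interpreter. The sketch below uses only the standard library. The module list (`tensorflow`, `keras`, `ipykernel`) is an assumption based on the tools named in this document, not the actual contents of the requirements files, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be located in this environment."""
    # find_spec locates a top-level module without importing it,
    # so this check is fast even for large packages.
    return [name for name in module_names if find_spec(name) is None]


# Assumed module list; adjust it to match the project's requirements files.
print(missing_modules(["tensorflow", "keras", "ipykernel"]))
```

Run this inside the activated `Datathon24` environment (or a notebook using the "Datathon24" kernel); an empty list suggests the listed modules are installed.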






