Early detection saves lives. This repo contains an AI system for early-stage lung cancer detection from CT scans.
We train a hybrid CNN with strong pre-processing and data augmentation to perform well even on limited datasets, aiming to support clinicians with reliable triage signals.
Lung cancer remains one of the world's most prevalent cancers, and outcomes improve markedly when it is identified early.
This project proposes a hybrid CNN pipeline that predicts lung disease from CT images using:
- Targeted augmentations (rotation, shift, zoom, flip) to combat data scarcity,
- Image pre-processing (normalization/resizing, artifact-safe transforms),
- Hybrid architecture (feature fusion across complementary CNN backbones).
The hybrid model reports 87.30% accuracy and a recall of 1.00 on a small dataset, and the pipeline is designed to be reproducible on Colab or local GPUs.
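
As a rough illustration of the augmentation and pre-processing steps listed above, the sketch below uses Keras preprocessing layers. The layer choices, the ranges, the `[0, 1]` scaling, and the names `build_augmenter` and `preprocess` are assumptions made for illustration; the notebooks may implement these steps differently.

```python
import tensorflow as tf
from tensorflow.keras import layers


def preprocess(images, size=(224, 224)):
    """Resize a batch of images and scale pixel values to [0, 1] (assumed scheme)."""
    images = tf.image.resize(tf.cast(images, tf.float32), size)
    return images / 255.0


def build_augmenter(seed=42):
    """Random rotation, shift, zoom, and flip; all ranges are illustrative guesses."""
    return tf.keras.Sequential(
        [
            # factor is a fraction of a full turn: 0.03 is roughly +/- 11 degrees
            layers.RandomRotation(0.03, seed=seed),
            # shift by up to 5% of the image height and width
            layers.RandomTranslation(0.05, 0.05, seed=seed),
            # zoom in or out by up to 10%
            layers.RandomZoom(0.1, seed=seed),
            layers.RandomFlip("horizontal", seed=seed),
        ],
        name="ct_augmenter",
    )
```

The random layers only transform their input when called with `training=True`, for example `build_augmenter()(preprocess(batch), training=True)`; at inference time they pass images through unchanged.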
- Lung_Cancer_Detection.pdf # Full research write-up
- models_training.ipynb # End-to-end training pipeline
- performance_comparison.ipynb # Baselines vs. hybrid model
- propose_hybrid_model.ipynb # Hybrid architecture details
- README.md
📄 For methodology, experiments, and metrics, see the PDF paper.
- Python 3.9+
- Google Colab or Jupyter Notebook
- TensorFlow (>= 2.11) / Keras
- OpenCV
- Matplotlib
- scikit-learn
- Pandas
- NumPy
- pickle-mixin (for storing metrics/artifacts)
Install in one go:
pip install "tensorflow>=2.11" keras opencv-python matplotlib scikit-learn pandas numpy pickle-mixin

💡 Use the TensorFlow build that matches your hardware (e.g., add the tensorflow-metal plugin on macOS M-series chips).
- Architecture: Hybrid CNN combining DenseNet169 and MobileNet backbones.
- Input: Chest CT scan images (resized to 224×224).
- Pre-processing: Image normalization and resizing, plus rotation, shift, zoom, and flip augmentations.
- Training: 50 epochs, Adam optimizer, categorical cross-entropy loss.
- Metrics: Accuracy, Precision, Recall, F1-score, and Confusion Matrix.
Fusing the two backbones is intended to improve both feature extraction and generalization while keeping precision and recall balanced, which is a vital factor in medical diagnostics.
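
One way the DenseNet169 + MobileNet fusion summarized above could be wired up in Keras is sketched below. This is not the notebook code: the pooling-then-concatenation fusion, the dropout rate, the extra recall metric, and the names `build_hybrid_model`, `feature_fusion`, and `class_probs` are all assumptions. Only the backbones, the 224×224 input, the Adam optimizer, and the categorical cross-entropy loss come from the summary.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_hybrid_model(num_classes=4, input_shape=(224, 224, 3), weights=None):
    """Fuse pooled DenseNet169 and MobileNet features into one softmax head.

    Pass weights="imagenet" to start from pretrained backbones (downloads weights).
    """
    inputs = tf.keras.Input(shape=input_shape, name="ct_image")

    # Each backbone is used as a headless feature extractor.
    densenet = tf.keras.applications.DenseNet169(
        include_top=False, weights=weights, input_shape=input_shape
    )
    mobilenet = tf.keras.applications.MobileNet(
        include_top=False, weights=weights, input_shape=input_shape
    )

    # Collapse each spatial feature map into a single vector per image.
    dense_vec = layers.GlobalAveragePooling2D(name="densenet_gap")(densenet(inputs))
    mobile_vec = layers.GlobalAveragePooling2D(name="mobilenet_gap")(mobilenet(inputs))

    # Feature fusion by concatenation (assumed; other fusion schemes are possible).
    fused = layers.Concatenate(name="feature_fusion")([dense_vec, mobile_vec])
    fused = layers.Dropout(0.3, name="fusion_dropout")(fused)  # assumed rate
    outputs = layers.Dense(num_classes, activation="softmax", name="class_probs")(fused)

    model = tf.keras.Model(inputs, outputs, name="hybrid_densenet169_mobilenet")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.Recall(name="recall")],
    )
    return model
```

With one-hot labels, training would then look like `build_hybrid_model(weights="imagenet").fit(train_data, validation_data=val_data, epochs=50)`, where `train_data` and `val_data` are placeholders for your own data pipelines.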
- Open `models_training.ipynb` in Google Colab.
- Mount your Google Drive and load the dataset.
- Execute all cells sequentially to train and evaluate the model.
cd Lung-Cancer-Detection-Using-AI-Based-Hybrid-CNN-Models

⚙️ Install Dependencies and Launch Jupyter Notebook

pip install -r requirements.txt
jupyter notebook

Then open and run `models_training.ipynb`.
Organize your dataset as follows before training:
dataset/
├── train/
│ ├── Adenocarcinoma/
│ ├── Large_Cell/
│ ├── Squamous_Cell/
│ └── Normal/
├── val/
│ └── (same class folders)
└── test/
└── (same class folders)
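
Before training, it can help to confirm that the folder layout above is actually in place. The helper below is a minimal sketch using only the standard library; the function name `summarize_dataset` and the list of accepted image extensions are assumptions, so adjust them to match your files.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
CLASSES = ("Adenocarcinoma", "Large_Cell", "Squamous_Cell", "Normal")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed extensions


def summarize_dataset(root):
    """Count images per (split, class) folder and list any missing folders."""
    root = Path(root)
    counts, missing = {}, []
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            if not folder.is_dir():
                missing.append(f"{split}/{cls}")
                continue
            counts[(split, cls)] = sum(
                1
                for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts, missing
```

Calling `summarize_dataset("dataset")` returns the per-folder image counts plus a list of expected folders that were not found, which also gives a quick view of class imbalance.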
🗂️ Update dataset paths in the notebooks if your directory structure differs (e.g., local vs. Colab).

- Baselines tested: DenseNet, MobileNet, InceptionV3, Xception, VGG19, ResNet50, and EfficientNetB4
- Proposed model: Hybrid of DenseNet169 + MobileNet
| Metric | Score |
|---|---|
| Accuracy | 87.30% |
| Recall | 1.00 (perfect sensitivity) |
| Loss | 0.3445 (lowest among the models compared) |
📈 Visualizations such as training curves and confusion matrices are included in the notebooks and detailed in the research paper.
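
For readers who want to see how accuracy, precision, recall, and F1 relate to a confusion matrix, here is a small standard-library sketch. The function name `per_class_metrics` and the two-class toy matrix are illustrative only and are not results from this project; how the reported recall is aggregated across the four classes is described in the paper, not here.

```python
def per_class_metrics(cm, labels):
    """Return overall accuracy and per-class precision/recall/F1.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # true class i, predicted as something else
        fp = sum(cm[r][i] for r in range(n)) - tp   # predicted i, true class is something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = correct / total if total else 0.0
    return accuracy, report


# Toy example: no cancer case is missed, but two normal scans are flagged.
toy_cm = [[8, 2],    # true Normal: 8 predicted Normal, 2 predicted Cancer
          [0, 10]]   # true Cancer: 0 predicted Normal, 10 predicted Cancer
accuracy, report = per_class_metrics(toy_cm, ["Normal", "Cancer"])
```

In this toy case the "Cancer" recall is 1.0 while overall accuracy is 0.9, which shows how perfect sensitivity can coexist with imperfect accuracy.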
- Set consistent random seeds (`tf.random.set_seed`, `np.random.seed`) for reproducibility.
- Maintain moderate augmentations to prevent label drift.
- Use class weights to manage data imbalance.
- Prioritize recall: missing a cancer case (a false negative) can be critical in screening contexts.
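
The seeding and class-weight tips above could look roughly like this. The helper names `set_seeds` and `balanced_class_weights` are hypothetical, and the weighting uses the common "balanced" heuristic (total samples divided by the number of classes times the per-class count), which is an assumption about what suits this dataset rather than the scheme used in the notebooks.

```python
import random
from collections import Counter


def set_seeds(seed=42):
    """Seed Python, and NumPy / TensorFlow when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=weights)`, so that errors on under-represented classes contribute more to the loss.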
The following outputs are automatically generated during training:
- ✅ Model weights (`.h5` or `.pkl`)
- 📈 Accuracy and loss plots
- 🧩 Confusion matrix
- 📊 Metrics dictionary (`.pkl`, via `pickle-mixin`)
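
As a minimal sketch of how a metrics dictionary can be written to and read back from a `.pkl` file, the example below uses Python's standard `pickle` module rather than `pickle-mixin`, whose usage is not shown in this README. The function names and the dictionary keys are assumptions; the example values are copied from the results table.

```python
import pickle
from pathlib import Path

# Values copied from the results table; the key names are illustrative.
EXAMPLE_METRICS = {"accuracy": 0.8730, "recall": 1.00, "loss": 0.3445}


def save_metrics(metrics, path):
    """Pickle a metrics dictionary, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(metrics, fh)
    return path


def load_metrics(path):
    """Load a metrics dictionary previously written by save_metrics."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.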
- Integrate 3D CT volume analysis
- Add calibrated probability estimation
- Implement Test-Time Augmentation (TTA)
- Deploy a lightweight TensorFlow Lite version for real-world medical use