PrachiJainxD/Lung-Cancer-Detection-Using-Hybrid-CNN-Models

Lung Cancer Detection Using Hybrid CNN Models

Early detection saves lives. This repo contains an AI system for early-stage lung cancer detection from CT scans.
We train a hybrid CNN with strong pre-processing and data augmentation to perform well even on limited datasets, aiming to support clinicians with reliable triage signals.


🔍 Project Abstract

Lung cancer remains among the world’s most prevalent cancers, where early identification dramatically improves outcomes.
This project proposes a hybrid CNN pipeline that predicts lung disease from CT images using:

  • Targeted augmentations (rotation, shift, zoom, flip) to combat data scarcity,
  • Image pre-processing (normalization/resizing, artifact-safe transforms),
  • Hybrid architecture (feature fusion across complementary CNN backbones).

On a small dataset, the approach reaches 87.30% accuracy with a recall of 1.00 (see Experiments & Results), and it is designed to be reproducible on Colab or local GPUs.
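
The pre-processing and augmentation steps listed above could be sketched with Keras preprocessing layers as follows. This is a minimal sketch, not the notebooks' actual code: the layer choice, the augmentation strengths (0.03, 0.1), and the helper name `prepare` are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 224  # input size stated in the Model Overview

# Deterministic pre-processing: resize, then scale pixel values to [0, 1].
preprocess = tf.keras.Sequential(
    [
        tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE),
        tf.keras.layers.Rescaling(1.0 / 255.0),
    ],
    name="preprocess",
)

# Random augmentations (assumed strengths; tune for your data).
augment = tf.keras.Sequential(
    [
        tf.keras.layers.RandomRotation(0.03),         # factor is a fraction of a full turn (~11 degrees)
        tf.keras.layers.RandomTranslation(0.1, 0.1),  # shift up to 10% vertically/horizontally
        tf.keras.layers.RandomZoom(0.1),              # zoom in/out by up to 10%
        tf.keras.layers.RandomFlip("horizontal"),
    ],
    name="augment",
)


def prepare(images, training):
    """Pre-process a batch; apply random augmentations only while training."""
    x = preprocess(images)
    return augment(x, training=True) if training else x


# Stand-in batch of two random 300x300 RGB images (not real CT data).
batch = np.random.randint(0, 256, size=(2, 300, 300, 3)).astype("float32")
out = prepare(batch, training=True)
print(out.shape)
```

Augmentations are applied only when `training=True`, so validation and test images pass through the deterministic resize and rescale steps alone.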


📦 Repository Contents

  • Lung_Cancer_Detection.pdf # Full research write-up
  • models_training.ipynb # End-to-end training pipeline
  • performance_comparison.ipynb # Baselines vs. hybrid model
  • propose_hybrid_model.ipynb # Hybrid architecture details
  • README.md

📄 For methodology, experiments, and metrics, see the PDF paper.


⚙️ Prerequisites

  • Python 3.9+
  • Google Colab or Jupyter Notebook
  • TensorFlow (>= 2.11) / Keras
  • OpenCV
  • Matplotlib
  • scikit-learn
  • Pandas
  • NumPy
  • pickle-mixin (for storing metrics/artifacts)

Install in one go:

pip install "tensorflow>=2.11" keras opencv-python matplotlib scikit-learn pandas numpy pickle-mixin

💡 Install the TensorFlow build that matches your hardware (e.g., on Apple silicon Macs, add the tensorflow-metal plugin alongside TensorFlow for GPU acceleration).

🧠 Model Overview

  • Architecture: Hybrid CNN combining DenseNet169 and MobileNet backbones.
  • Input: Chest CT scan images (resized to 224×224).
  • Pre-processing: Image normalization, rotation, shift, and zoom augmentations.
  • Training: 50 epochs, Adam optimizer, categorical cross-entropy loss.
  • Metrics: Accuracy, Precision, Recall, F1-score, and Confusion Matrix.

Fusing the two backbones is intended to combine complementary features and improve generalization. The goal is balanced precision and recall, which is vital in medical diagnostics.
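
One plausible way to wire such a hybrid is to run both backbones on the same input, pool each feature map, and concatenate the results before the classifier. The sketch below assumes that fusion strategy, a dropout rate of 0.3, and four output classes (taken from the dataset folders). The function name `build_hybrid` is hypothetical; see propose_hybrid_model.ipynb for the actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import DenseNet169, MobileNet


def build_hybrid(num_classes=4, input_shape=(224, 224, 3), weights=None):
    """Two-backbone CNN with feature fusion by concatenation (assumed design)."""
    inputs = layers.Input(shape=input_shape)

    # weights=None gives random initialization; pass "imagenet" for transfer learning.
    dense_net = DenseNet169(include_top=False, weights=weights, input_shape=input_shape)
    mobile_net = MobileNet(include_top=False, weights=weights, input_shape=input_shape)

    # Pool each backbone's feature map to a vector, then fuse the two vectors.
    dense_feat = layers.GlobalAveragePooling2D()(dense_net(inputs))
    mobile_feat = layers.GlobalAveragePooling2D()(mobile_net(inputs))
    fused = layers.Concatenate(name="fused_features")([dense_feat, mobile_feat])

    x = layers.Dropout(0.3)(fused)  # assumed regularization
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = Model(inputs, outputs, name="hybrid_densenet169_mobilenet")
    # Optimizer and loss as listed in the Model Overview.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_hybrid()
print(model.output_shape)
```

Note that the two backbones were originally trained with different input scaling, so with pretrained weights you may want to apply each network's own `preprocess_input` rather than a shared normalization.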


🚀 Quick Start

Run on Google Colab (Recommended)

  1. Open models_training.ipynb in Google Colab.
  2. Mount your Google Drive and load the dataset.
  3. Execute all cells sequentially to train and evaluate the model.

📍 Navigate to the Project Folder

cd Lung-Cancer-Detection-Using-Hybrid-CNN-Models

⚙️ Install Dependencies and Launch Jupyter Notebook

pip install -r requirements.txt
jupyter notebook

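Then open and run models_training.ipynb. If requirements.txt is not present in your checkout (it is not listed under Repository Contents), install the packages with the pip command from the Prerequisites section instead.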

📁 Dataset Structure

Organize your dataset as follows before training:

dataset/
├── train/
│   ├── Adenocarcinoma/
│   ├── Large_Cell/
│   ├── Squamous_Cell/
│   └── Normal/
├── val/
│   └── (same class folders)
└── test/
    └── (same class folders)

🗂️ Update dataset paths in the notebooks if your directory structure differs (e.g., local vs. Colab).
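
Before training, it can help to confirm that the folder layout matches the tree above. The helper below is a small sketch using only the standard library; the function name `count_images` and the accepted file extensions are assumptions, so adjust them to your data.

```python
from pathlib import Path

SPLITS = ["train", "val", "test"]
CLASSES = ["Adenocarcinoma", "Large_Cell", "Squamous_Cell", "Normal"]
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}  # assumed image formats


def count_images(root):
    """Return {(split, class): image count}; raise if a class folder is missing."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            if not folder.is_dir():
                raise FileNotFoundError(f"Missing class folder: {folder}")
            counts[(split, cls)] = sum(
                1
                for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_EXTS
            )
    return counts
```

Calling `count_images("dataset")` returns one count per split and class, which also gives a quick view of class imbalance.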

📊 Experiments & Results

  • Baselines tested: DenseNet, MobileNet, InceptionV3, Xception, VGG19, ResNet50, and EfficientNetB4
  • Proposed model: Hybrid of DenseNet169 + MobileNet

🧾 Performance Highlights

Metric     Score
Accuracy   87.30%
Recall     1.00 (perfect sensitivity)
Loss       0.3445 (lower than all baselines tested)

📈 Visualizations such as training curves and confusion matrices are included in the notebooks and detailed in the research paper.
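
To make the reported metrics concrete, the sketch below derives accuracy and per-class precision, recall, and F1 from a confusion matrix using NumPy. The labels are toy values chosen for illustration, not results from this project. scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_metrics(cm):
    """Return (accuracy, precision, recall, f1); the last three are per class."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Toy labels for four classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=4)
accuracy, precision, recall, f1 = per_class_metrics(cm)
print(accuracy)  # 0.75
print(recall)    # per-class recall: 0.5, 1.0, 1.0, 0.5
```

In a screening setting, the recall of each cancer class is the number to watch, since a low value there means missed cases.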


🧪 Reproducibility Tips

  • Set fixed random seeds (tf.random.set_seed, np.random.seed) before building and training the model.
  • Keep augmentations moderate so that transformed scans still match their labels; aggressive transforms can distort or crop out the features that define a class.
  • Use class weights to manage data imbalance.
  • Prioritize recall: missing a cancer case (a false negative) can be critical in screening contexts.
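
The seeding and class-weight tips above could look like the sketch below. The helper names `set_seeds` and `balanced_class_weights` are hypothetical, and the "balanced" formula (n_samples / (n_classes * class_count)) is one common heuristic, not necessarily the one used in the notebooks. The labels are toy values.

```python
import random
from collections import Counter

import numpy as np


def set_seeds(seed=42):
    """Seed Python and NumPy RNGs. With TensorFlow, also call tf.random.set_seed(seed)."""
    random.seed(seed)
    np.random.seed(seed)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


set_seeds(42)

# Toy integer labels: class 0 is three times as frequent as the others.
toy_labels = [0] * 6 + [1] * 2 + [2] * 2 + [3] * 2
weights = balanced_class_weights(toy_labels)
print(weights)  # {0: 0.5, 1: 1.5, 2: 1.5, 3: 1.5}
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=weights)`, so that errors on rare classes contribute more to the loss.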

🧱 Saved Artifacts

The following outputs are automatically generated during training:

  • ✅ Model weights (.h5 or .pkl)
  • 📈 Accuracy and loss plots
  • 🧩 Confusion matrix
  • 📊 Metrics dictionary (.pkl, via pickle-mixin)
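
As an illustration of how a metrics dictionary can be written and read back, here is a minimal sketch using Python's standard-library pickle module. The function names, the file path, and the dictionary contents are assumptions; the notebooks may store different keys.

```python
import pickle
from pathlib import Path


def save_metrics(metrics, path):
    """Pickle a metrics dictionary, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(metrics, f)


def load_metrics(path):
    """Load a metrics dictionary previously written by save_metrics."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, after `history = model.fit(...)`, calling `save_metrics(history.history, "artifacts/metrics.pkl")` would store the per-epoch loss and accuracy. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.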

📝 Future Enhancements

  • Integrate 3D CT volume analysis
  • Add calibrated probability estimation
  • Implement Test-Time Augmentation (TTA)
  • Deploy a lightweight TensorFlow Lite version for real-world medical use
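
Of the items above, Test-Time Augmentation is the simplest to prototype. The sketch below averages class probabilities over the original batch and a horizontally flipped copy. It is a hypothetical illustration of the idea, not part of the current code. `predict_fn` stands for any callable that maps an image batch to class probabilities, such as `model.predict`.

```python
import numpy as np


def predict_with_tta(predict_fn, images):
    """Average predictions over the original and horizontally flipped batch.

    images: array of shape (batch, height, width, channels).
    """
    views = [images, images[:, :, ::-1, :]]  # reverse the width axis to flip
    probs = [np.asarray(predict_fn(view)) for view in views]
    return np.mean(probs, axis=0)
```

More views (small rotations or shifts) can be added to the list, at the cost of one extra forward pass per view.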
