Lung Cancer Detection Using Hybrid CNN Models

Early detection saves lives. This repo contains an AI system for early-stage lung cancer detection from CT scans.
We train a hybrid CNN with strong pre-processing and data augmentation to perform well even on limited datasets, aiming to support clinicians with reliable triage signals.

🔍 Project Abstract

Lung cancer remains among the world’s most prevalent cancers, where early identification dramatically improves outcomes.
This project proposes a hybrid CNN pipeline that predicts lung disease from CT images using:

Targeted augmentations (rotation, shift, zoom, flip) to combat data scarcity,
Image pre-processing (normalization/resizing, artifact-safe transforms),
Hybrid architecture (feature fusion across complementary CNN backbones).

On a small dataset, the approach reaches 87.30% accuracy with a recall of 1.00 (see Performance Highlights), and it is designed to be reproducible on Colab or local GPUs.
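The pre-processing and augmentation steps listed above can be illustrated with a small NumPy-only sketch. This is not the notebooks' actual pipeline: the function names, the 10% shift limit, and the zero-filled border are assumptions made for illustration. Rotation and zoom are left out because they need an interpolation routine, and resizing to the model's input size (for example with OpenCV) is assumed to happen beforehand.

```python
import numpy as np


def normalize(image):
    """Scale 8-bit pixel values to the [0, 1] range as float32."""
    return image.astype(np.float32) / 255.0


def random_flip(image, rng):
    """Flip the image left-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def random_shift(image, rng, max_frac=0.1):
    """Shift by up to max_frac of each dimension (hypothetical limit).

    Pixels exposed by the shift are filled with zeros.
    """
    h, w = image.shape[:2]
    max_dy, max_dx = int(h * max_frac), int(w * max_frac)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))

    out = np.zeros_like(image)
    # Source and destination windows have the same size by construction.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def augment(image, rng):
    """Normalize, then apply a random flip and a random shift."""
    return random_shift(random_flip(normalize(image), rng), rng)
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps the augmentations repeatable between runs.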

📦 Repository Contents

Lung_Cancer_Detection.pdf # Full research write-up
models_training.ipynb # End-to-end training pipeline
Performance_Comparison.ipynb # Baselines vs. hybrid model
Propose_Hybrid_Model.ipynb # Hybrid architecture details
README.md

📄 For methodology, experiments, and metrics, see the PDF paper.

⚙️ Prerequisites

Python 3.9+
Google Colab or Jupyter Notebook
TensorFlow (>= 2.11) / Keras
OpenCV
Matplotlib
scikit-learn
Pandas
NumPy
pickle-mixin (for storing metrics/artifacts)

Install in one go:

pip install "tensorflow>=2.11" keras opencv-python matplotlib scikit-learn pandas numpy pickle-mixin

💡 Use the TensorFlow setup that matches your hardware (e.g., on macOS with M-series chips, install the tensorflow-metal plugin alongside TensorFlow for GPU acceleration).

🧠 Model Overview

Architecture: Hybrid CNN combining DenseNet169 and MobileNet backbones.
Input: Chest CT scan images (resized to 224×224).
Pre-processing: Image normalization and resizing, plus rotation, shift, zoom, and flip augmentations.
Training: 50 epochs, Adam optimizer, categorical cross-entropy loss.
Metrics: Accuracy, Precision, Recall, F1-score, and Confusion Matrix.

Fusing the two backbones is intended to improve both feature extraction and generalization while keeping precision and recall balanced, which is vital in medical diagnostics.
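One way to picture the fusion of the two backbones is the Keras sketch below. It is not the exact architecture from Propose_Hybrid_Model.ipynb: concatenating globally pooled features, the dropout rate, the single softmax head, and the function name `build_hybrid_model` are all assumptions. It also feeds the same input tensor to both backbones, whereas Keras ships a separate `preprocess_input` for each of them.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import DenseNet169, MobileNet


def build_hybrid_model(num_classes=4, input_shape=(224, 224, 3), weights=None):
    """Fuse DenseNet169 and MobileNet features by concatenation (assumed design).

    Pass weights="imagenet" to start from pretrained backbones;
    weights=None builds randomly initialized ones without downloading anything.
    """
    inputs = layers.Input(shape=input_shape)

    # Each backbone ends in global average pooling, so it yields one vector per image.
    densenet = DenseNet169(include_top=False, weights=weights,
                           input_shape=input_shape, pooling="avg")
    mobilenet = MobileNet(include_top=False, weights=weights,
                          input_shape=input_shape, pooling="avg")

    # Feature fusion: join the two feature vectors side by side.
    fused = layers.Concatenate()([densenet(inputs), mobilenet(inputs)])
    fused = layers.Dropout(0.3)(fused)  # hypothetical regularization strength
    outputs = layers.Dense(num_classes, activation="softmax")(fused)

    model = Model(inputs, outputs, name="hybrid_densenet169_mobilenet")
    # Optimizer and loss follow the training settings listed above.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_hybrid_model()` and then `model.summary()` shows the two backbones and the fused classification head. Training for 50 epochs would then go through `model.fit(..., epochs=50)`.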

🚀 Quick Start

Run on Google Colab (Recommended)

Open models_training.ipynb in Google Colab.
Mount your Google Drive and load the dataset.
Execute all cells sequentially to train and evaluate the model.

📍 Navigate to the Project Folder

cd Lung-Cancer-Detection-Using-Hybrid-CNN-Models

⚙️ Install Dependencies and Launch Jupyter Notebook

pip install -r requirements.txt
jupyter notebook

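Then open and run models_training.ipynb. If requirements.txt is missing from your checkout (it is not listed under Repository Contents), install the packages with the pip command from the Prerequisites section instead.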

📁 Dataset Structure

Organize your dataset as follows before training:

dataset/
├── train/
│   ├── Adenocarcinoma/
│   ├── Large_Cell/
│   ├── Squamous_Cell/
│   └── Normal/
├── val/
│   └── (same class folders)
└── test/
    └── (same class folders)

🗂️ Update dataset paths in the notebooks if your directory structure differs (e.g., local vs. Colab).
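Before training, it can help to confirm that the folder layout matches the tree above. The following is a minimal standard-library sketch. The helper name `count_images` and the accepted file extensions are assumptions, since the README does not state which image formats the dataset uses.

```python
from pathlib import Path

EXPECTED_SPLITS = ("train", "val", "test")
EXPECTED_CLASSES = ("Adenocarcinoma", "Large_Cell", "Squamous_Cell", "Normal")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image formats


def count_images(root):
    """Return {(split, class_name): image_count} for the expected layout.

    Raises FileNotFoundError if any split/class folder is missing.
    """
    root = Path(root)
    counts = {}
    for split in EXPECTED_SPLITS:
        for class_name in EXPECTED_CLASSES:
            folder = root / split / class_name
            if not folder.is_dir():
                raise FileNotFoundError(f"Missing class folder: {folder}")
            counts[(split, class_name)] = sum(
                1
                for path in folder.iterdir()
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Running `count_images("dataset")` before training also gives a quick view of class imbalance in each split.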

📊 Experiments & Results

Baselines tested: DenseNet, MobileNet, InceptionV3, Xception, VGG19, ResNet50, and EfficientNetB4
Proposed model: Hybrid of DenseNet169 + MobileNet

🧾 Performance Highlights

Metric     Score
Accuracy   87.30%
Recall     1.00 (perfect sensitivity)
Loss       0.3445 (lower than every baseline)

📈 Visualizations such as training curves and confusion matrices are included in the notebooks and detailed in the research paper.
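For readers who want to see how accuracy, precision, recall, and F1-score relate to a confusion matrix, here is a small pure-Python sketch. The function name is hypothetical, and the matrix in the usage note is made up for illustration. It is not a result from this project.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class precision, recall, and F1.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    report = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        report[k] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report
```

With the made-up matrix `[[8, 2], [0, 10]]`, class 1 has a recall of 1.0 because none of its samples are missed, while overall accuracy is 0.9 because two class-0 samples are misclassified. This is how a perfect recall can coexist with an accuracy below 100%.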

🧪 Reproducibility Tips

Set consistent random seeds (tf.random.set_seed, np.random.seed) for reproducibility.
Keep augmentations moderate so that transformed scans still match their labels.
Use class weights to manage data imbalance.
Prioritize recall: in screening contexts, missing a cancer case (false negative) can be critical.
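The seeding and class-weight tips above might look like this in practice. This is a sketch under assumptions: the helper names are hypothetical, NumPy and TensorFlow are seeded only if they are installed, and the weighting uses the common "balanced" heuristic n_samples / (n_classes * class_count), which may differ from what the notebooks do.

```python
import random
from collections import Counter


def set_seeds(seed=42):
    """Seed Python, and NumPy / TensorFlow when they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Returns {class_label: n_samples / (n_classes * class_count)}.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

With integer class indices as labels, the resulting dictionary can be passed to Keras through the `class_weight` argument of `model.fit`, so that errors on rarer classes contribute more to the loss.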

🧱 Saved Artifacts

The following outputs are automatically generated during training:

✅ Model weights (.h5 or .pkl)
📈 Accuracy and loss plots
🧩 Confusion matrix
📊 Metrics dictionary (.pkl, via pickle-mixin)
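Storing and reloading a metrics dictionary needs only Python's built-in pickle module, as the sketch below shows. The helper names, the file path, and the dictionary keys are assumptions. The example values in the usage note are the scores reported under Performance Highlights.

```python
import pickle
from pathlib import Path


def save_metrics(metrics, path):
    """Write a metrics dictionary to a .pkl file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(metrics, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_metrics(path):
    """Read a metrics dictionary back from disk.

    Only unpickle files you trust: pickle can execute code on load.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, `save_metrics({"accuracy": 0.8730, "recall": 1.00, "loss": 0.3445}, "artifacts/metrics.pkl")` followed by `load_metrics("artifacts/metrics.pkl")` returns an equal dictionary.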

📝 Future Enhancements

Integrate 3D CT volume analysis
Add calibrated probability estimation
Implement Test-Time Augmentation (TTA)
Deploy a lightweight TensorFlow Lite version for real-world medical use
