COVID-19 Detection Model

A Python-based machine learning project that predicts COVID-19 test results using patient symptoms and demographic data. The model analyzes various health indicators to classify coronavirus test outcomes with high accuracy using multiple ML algorithms.

Features

Analyzes COVID-19 patient data from 2020-2021
Data preprocessing with Label Encoding for categorical variables
Multiple machine learning algorithms comparison
Achieves 91.53% accuracy with Random Forest Classifier
Comprehensive data visualization and analysis
Training vs Testing performance comparison
Built using Jupyter Notebook for interactive experimentation.

How to Run

Clone the repository:

   git clone https://github.com/AdarshZolekar/COVID19-Detection-Model.git

Prepare the dataset:

Download or add the covid_data_2020-2021.csv.zip file to your Google Drive.
Update the file path in the script to match your location.

Run the script:

Open a terminal or command prompt.
Navigate to the project directory:

   cd COVID19-Detection-Model

Run the Jupyter Notebook:

   jupyter notebook Corona-Detection-Model.ipynb

Or run the Python script directly:

   python Corona-Detection-Model.py

Execute the notebook cells step-by-step to train models and visualize results.

How It Works

Data Loading: Reads the COVID-19 dataset containing patient symptoms and test results from 2020-2021.
Data Preprocessing:

Removes unnecessary columns (test_date)
Checks for duplicates and null values
Converts categorical variables to numerical using Label Encoding

Feature Engineering: Separates features (symptoms, age, gender) from target variable (corona_result).
Model Training & Comparison: Tests multiple algorithms to find the best performer:
- Logistic Regression (90.84% accuracy)
- Random Forest Classifier (91.53% accuracy) Best
- Naive Bayes Classifier
- XGBoost Classifier (91.39% accuracy)
Visualization: Generates plots showing predictions, error distribution and feature correlations.

Dependencies

pandas – Data handling and manipulation
numpy – Numerical computations
scikit-learn – Machine learning models and preprocessing
xgboost – Gradient boosting classifier
matplotlib – Data visualization
seaborn – Statistical data visualization
Jupyter Notebook – Interactive environment

Install them manually if needed:

pip install pandas numpy scikit-learn xgboost matplotlib seaborn jupyter

Dataset

File: covid_data_2020-2021.csv.zip
Description: Contains COVID-19 patient records with symptoms and test results
Features:
- cough – Presence of cough symptom (Yes/No)
- fever – Presence of fever (Yes/No)
- sore_throat – Sore throat symptom (Yes/No)
- shortness_of_breath – Breathing difficulty (Yes/No)
- head_ache – Headache symptom (Yes/No)
- age_60_and_above – Age category (Yes/No)
- gender – Patient gender (Male/Female)
- test_indication – Reason for testing
- corona_result – COVID-19 test result (Positive/Negative) [Target Variable]

Results

Model Performance Comparison:

Algorithm	Training Accuracy	Testing Accuracy
Logistic Regression	90.84%	90.84%
Random Forest Classifier	~100%	91.53%
Naive Bayes	Lower	Lower
XGBoost	~100%	91.39%

Best Model: Random Forest Classifier with 91.53% test accuracy

Visualizations Include:

Actual vs Predicted comparison plots
Error distribution charts
Feature histograms
Correlation scatter plots.

Model Insights

The Random Forest Classifier provides the best balance between training and testing accuracy, indicating:

Strong predictive capability for COVID-19 detection
Good generalization to unseen data
Effective use of symptom-based features
Minimal overfitting compared to other models.

Future Improvements

Implement deep learning models (Neural Networks) for better accuracy
Add cross-validation for more robust evaluation
Include feature importance analysis
Expand dataset with more recent COVID-19 variants
Build a web interface using Flask or Streamlit for real-time predictions
Add SHAP or LIME for model interpretability
Implement ensemble methods combining multiple models.

License

This project is open-source under the MIT License.

Contributions

Contributions are welcome!

Open an issue for bugs or feature requests
Submit a pull request for improvements.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Corona-Detection-Model.ipynb		Corona-Detection-Model.ipynb
Corona-Detection-Model.py		Corona-Detection-Model.py
Covid Data 2020-2021.zip		Covid Data 2020-2021.zip
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

COVID-19 Detection Model

Features

How to Run

How It Works

Dependencies

Dataset

Results

Model Insights

Future Improvements

License

Contributions

About

Uh oh!

Releases

Packages

Languages

AdarshZolekar/COVID19-Detection-Model

Folders and files

Latest commit

History

Repository files navigation

COVID-19 Detection Model

Features

How to Run

How It Works

Dependencies

Dataset

Results

Model Insights

Future Improvements

License

Contributions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages