This project performs Exploratory Data Analysis (EDA) on global and country-level COVID-19 data. Through various visualizations and statistical techniques, this project aims to:
- Analyze COVID-19 trends globally and locally (by country).
- Understand key features of the data such as infection rates, mortality rates, and vaccination progress.
- Identify patterns, outliers, and correlations in the dataset.
- Create actionable insights using various plots and reports.
The goal of this project is to provide clear insights into the spread and impact of COVID-19 worldwide. Key questions addressed include:
- Which regions have been most affected by COVID-19?
- How have case numbers and deaths evolved over time?
- What are the trends in COVID-19 vaccination?
- Python: For data processing and analysis.
- Libraries: pandas, numpy, matplotlib, seaborn, plotly
- Jupyter Notebook: For running and documenting the analysis.
- Dash/Plotly: For creating interactive dashboards.
- Git: For version control and collaboration.
- GitHub: For repository hosting and sharing.
This project draws on several datasets from reliable public sources to explore COVID-19 trends globally and at the country level. The following datasets are used:
- Johns Hopkins University COVID-19 Dataset
  - Source: GitHub Repository - Johns Hopkins University COVID-19
  - Description: This dataset contains daily reports of confirmed COVID-19 cases, deaths, and recoveries at a global level and by country.
  - Data Included:
    - Confirmed cases
    - Deaths and recoveries
    - Geographical information (e.g., country, province)
- Our World in Data COVID-19 Dataset
  - Source: GitHub Repository - Our World in Data COVID-19
  - Description: This dataset provides detailed COVID-19 data on confirmed cases, deaths, and vaccination information by country.
  - Data Included:
    - Daily reported cases and deaths
    - Vaccination data by country
    - Demographic data and health indicators
- COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University
  - Source: GitHub Repository - CSSE COVID-19 Data
  - Description: Another source from Johns Hopkins University that provides global case data, including confirmed cases, deaths, and recoveries.
  - Data Included:
    - Global case counts
    - Country-level statistics for the spread of COVID-19
- Confirmed Cases: The number of COVID-19 confirmed cases reported globally and at the country level.
- Deaths and Recoveries: Information on the number of deaths and recoveries due to COVID-19.
- Vaccination Data: Global and country-wise vaccination statistics to understand the progress of vaccination campaigns.
- Time Period: The data spans from the beginning of the COVID-19 pandemic to the present, with daily updates.
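Before the datasets above can be combined, time-series tables that store one column per date usually need to be reshaped into one row per country and date. The sketch below shows one way to do that with `pandas.DataFrame.melt`. It assumes a wide layout with `Province/State`, `Country/Region`, `Lat`, and `Long` identifier columns and date columns named like `1/22/20`; the function name `wide_to_long` and the sample values are hypothetical and not taken from this project's notebooks.

```python
import pandas as pd


def wide_to_long(wide: pd.DataFrame, value_name: str = "Confirmed") -> pd.DataFrame:
    """Reshape a wide table (one column per date) into long format.

    Assumption: every column that is not an identifier column is a date
    column whose name follows the M/D/YY pattern.
    """
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    present = [c for c in id_cols if c in wide.columns]
    # melt turns each date column into a (Date, value) pair of rows
    long = wide.melt(id_vars=present, var_name="Date", value_name=value_name)
    long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
    return long.sort_values(["Country/Region", "Date"]).reset_index(drop=True)


# Illustrative sample only; these are not real case counts.
sample = pd.DataFrame(
    {
        "Province/State": [None, None],
        "Country/Region": ["Italy", "Brazil"],
        "Lat": [41.9, -14.2],
        "Long": [12.6, -51.9],
        "1/22/20": [0, 0],
        "1/23/20": [2, 1],
    }
)
long_df = wide_to_long(sample)
print(long_df[["Country/Region", "Date", "Confirmed"]])
```

The long format makes later steps such as grouping by country, filtering by date range, and plotting with seaborn or plotly more direct, since each observation sits on its own row.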
The project structure is organized as follows:
COVID19_EDA_Project/
├── data/
│ ├── raw/
│ └── processed/
├── notebooks/
│ ├── data_cleaning.ipynb
│ ├── exploratory_analysis.ipynb
│ └── visualizations.ipynb
├── dashboard/
│ ├── app.py
│ ├── Procfile
│ └── runtime.txt
├── reports/
│ └── visuals/
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
- `data/raw/`: Contains the raw datasets as downloaded from their sources, before any processing.
- `data/processed/`: Contains cleaned and pre-processed data, ready for analysis.
- `notebooks/`: Jupyter notebooks for data cleaning, analysis, and visualization.
- `dashboard/`: Contains the Dash application files (`app.py`, `Procfile`).
- `reports/visuals/`: Stores the saved visualizations (graphs, plots, charts) for reports.
- `.gitignore`: Ensures that unnecessary or temporary files (such as compiled Python files, logs, or system files) are not tracked by Git.
- `LICENSE`: Outlines the licensing terms for the project.
- `README.md`: The file you are currently reading, which contains project information and guidelines.
- `requirements.txt`: Lists the Python libraries required to run the project (you can generate it by running `pip freeze > requirements.txt`).
The cleaning process involved the following steps for each dataset:
- Handling Missing Values: Replaced missing values with `0` in numerical columns.
- Standardizing Column Names: Renamed columns for consistency (e.g., "Country/Region" → "Country").
- Ensuring Consistent Date Formats: Ensured all date columns follow the `YYYY-MM-DD` format.
- Removing Duplicates: Checked for and removed duplicate rows, if any.
- Dropping Irrelevant Columns: Removed columns that do not contribute to the analysis.
- Saving Cleaned Data: All cleaned datasets were saved in the `data/processed/` folder for further analysis:
  - `cleaned_confirmed_cases.csv`
  - `cleaned_deaths.csv`
  - `cleaned_recovered.csv`
  - `cleaned_vaccinations.csv`
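The cleaning steps above can be sketched as a single pandas function. This is a minimal illustration, not the code from `data_cleaning.ipynb`: the function name `clean_dataset`, the choice of `Lat` and `Long` as the columns to drop, the `Date` column name, and the input date format are all assumptions, and the sample rows are made up.

```python
import pandas as pd


def clean_dataset(
    df: pd.DataFrame,
    drop_cols=("Lat", "Long"),   # assumed to be irrelevant for the analysis
    date_format="%m/%d/%y",      # assumed format of the raw Date column
) -> pd.DataFrame:
    out = df.copy()
    # Standardize column names
    out = out.rename(columns={"Country/Region": "Country", "Province/State": "Province"})
    # Drop irrelevant columns (silently skip any that are absent)
    out = out.drop(columns=list(drop_cols), errors="ignore")
    # Ensure a consistent YYYY-MM-DD date format
    out["Date"] = pd.to_datetime(out["Date"], format=date_format).dt.strftime("%Y-%m-%d")
    # Replace missing values with 0 in numerical columns only
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(0)
    # Remove duplicate rows
    return out.drop_duplicates().reset_index(drop=True)


# Illustrative sample only; these are not real case counts.
raw = pd.DataFrame(
    {
        "Country/Region": ["Italy", "Italy", "Brazil"],
        "Lat": [41.9, 41.9, -14.2],
        "Long": [12.6, 12.6, -51.9],
        "Date": ["1/22/20", "1/22/20", "1/23/20"],
        "Confirmed": [2.0, 2.0, None],
    }
)
cleaned = clean_dataset(raw)
print(cleaned)

# Saving step (path taken from the project structure; uncomment when the folder exists):
# cleaned.to_csv("data/processed/cleaned_confirmed_cases.csv", index=False)
```

Note that filling missing counts with `0` treats "not reported" as "zero reported", which is simple but can understate totals for regions with reporting gaps.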
The following insights were drawn using data visualization techniques.
- Confirmed Cases Over Time: Observed exponential growth globally with peaks during major pandemic waves.
- Mortality Rates: Fluctuated by region, with some areas significantly higher.
- Vaccination Trends: Developed countries showed rapid increases compared to developing nations.
- Top 10 Affected Countries: USA, India, and Brazil consistently led in confirmed cases and deaths.
- Case Fatality Rate (CFR): Regions like Italy and the UK showed a higher CFR during initial waves.
- Top Vaccinated Countries: Countries like Israel, UAE, and the USA showed high vaccination rates early.
- Effectiveness: Countries with higher vaccination rates showed a decline in mortality rates.
- Global Trends: COVID-19 cases and deaths followed a cyclical pattern, correlating with waves of infection.
- Regional Variations: Developed countries had better healthcare responses but also experienced higher initial cases due to robust testing.
- Vaccination Impact: The data strongly suggest that vaccination campaigns reduced mortality rates in most regions.
- Case Fatality Rate: Globally, the CFR ranged between 1% and 3%, with significant variance across regions.
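For reference, the case fatality rate is the number of deaths divided by the number of confirmed cases, expressed as a percentage. The sketch below computes it per country with pandas. It assumes a long-format table with `Country`, `Confirmed`, and `Deaths` columns holding cumulative counts, so that the per-country maximum is the latest total; the function name `case_fatality_rate` and the sample numbers are hypothetical.

```python
import pandas as pd


def case_fatality_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-country totals with CFR (%) = deaths / confirmed * 100."""
    # Assumption: counts are cumulative, so max() gives the latest total
    totals = df.groupby("Country")[["Confirmed", "Deaths"]].max()
    # Mask zero-case countries so the division yields NaN instead of inf
    confirmed = totals["Confirmed"].where(totals["Confirmed"] > 0)
    totals["CFR_percent"] = (totals["Deaths"] / confirmed * 100).round(2)
    return totals


# Illustrative sample only; these are not real figures.
data = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "Confirmed": [100, 200, 0, 0],
        "Deaths": [1, 4, 0, 0],
    }
)
cfr = case_fatality_rate(data)
print(cfr)
```

Because confirmed counts depend on how much testing a country performs, CFR values computed this way are best compared with that caveat in mind.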
An interactive dashboard has been created to visualize the COVID-19 data and provide real-time insights. The dashboard is built using Dash/Plotly and includes the following features:
- Global Overview: Displays global statistics including total confirmed cases, deaths, and recoveries.
- Country Comparison: Allows users to compare COVID-19 statistics between different countries.
- Trend Analysis: Shows trends over time for cases, deaths, and vaccinations.
- Vaccination Progress: Visualizes the progress of vaccination campaigns globally and by country.
- Interactive Maps: Provides geographical representations of COVID-19 data, highlighting affected regions.
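Trend views like the ones listed above typically plot daily new cases and a smoothed average rather than raw cumulative totals. The following sketch derives both with pandas. It assumes cumulative counts in a long-format table with `Country`, `Date`, and `Confirmed` columns; the function name `add_daily_and_rolling`, the output column names, and the sample values are hypothetical and not taken from `app.py`.

```python
import pandas as pd


def add_daily_and_rolling(
    df: pd.DataFrame, value_col: str = "Confirmed", window: int = 7
) -> pd.DataFrame:
    """Add per-country daily increments and their rolling mean."""
    out = df.sort_values(["Country", "Date"]).copy()
    # Daily increments from cumulative counts; the first day has no
    # predecessor and is set to 0. Negative corrections are clipped to 0.
    out["Daily"] = out.groupby("Country")[value_col].diff().fillna(0).clip(lower=0)
    # Rolling mean computed within each country
    out["Rolling"] = out.groupby("Country")["Daily"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out


# Illustrative sample only; these are not real case counts.
series = pd.DataFrame(
    {
        "Country": ["A"] * 4,
        "Date": pd.date_range("2020-03-01", periods=4),
        "Confirmed": [10, 15, 15, 27],
    }
)
trend = add_daily_and_rolling(series, window=2)
print(trend)
```

A 7-day window is a common default because it averages out weekly reporting cycles; the example uses a window of 2 only to keep the sample small.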
- Ensure you have all the required libraries installed. You can install them using the `requirements.txt` file: `pip install -r requirements.txt`
- Navigate to the `dashboard/` directory: `cd dashboard`
- Run the Dash application: `python app.py`

Open your web browser and go to https://covid19edaproject-production.up.railway.app/ to view the dashboard.
- `app.py`: The main file that runs the Dash application.
- `callbacks.py`: Contains callback functions for interactivity in the dashboard.
- `layout.py`: Defines the layout and structure of the dashboard.
- `assets/`: Contains CSS and other assets for styling the dashboard.
The dashboard provides an intuitive and interactive way to explore the COVID-19 data, making it easier to derive insights and understand trends.
We welcome contributions to this project! If you'd like to contribute, please follow these guidelines:
- Fork the repository.
- Create a new branch for your feature (`git checkout -b feature-branch`).
- Make your changes and commit them (`git commit -m 'Add feature'`).
- Push your changes to your forked repository (`git push origin feature-branch`).
- Open a pull request to the `main` branch of the original repository.
Please ensure that your contributions follow the project’s coding style, and add tests/documentation if applicable.
This project is licensed under the MIT License – see the LICENSE file for details.
- Special thanks to the data sources used for this project: Johns Hopkins University (CSSE) and Our World in Data.
- Thanks to the open-source community for the Python libraries used in this project: pandas, numpy, matplotlib, seaborn, plotly, dash.
- Thanks to all contributors and collaborators who made this project possible.