This project aims to analyze the COVID-19 pandemic using publicly available data. The project includes a Jupyter notebook with Python code to extract, clean, and visualize COVID-19 data from various sources. Additionally, the project provides a dashboard to interactively explore the data.

Covid-19 India Analysis

Introduction

The COVID-19 pandemic has had a profound impact on public health, and understanding the data is crucial for decision-making and resource allocation. This project, assigned by our training institution "Masai" with a tight 7-day deadline, aims to provide valuable insights and create a data-driven dashboard to aid in understanding the pandemic's progression and its effects on different regions of India.

Problem Statement

  • Extract COVID-19 data for India using Python and the requests module.
  • Parse and clean the data, ensuring consistency and handling missing values.
  • Perform SQL aggregations to create an Excel dashboard with the following insights:
    • Weekly evolution of confirmed cases, recoveries, deaths, and tests.
    • Categorize districts based on testing ratios and analyze deaths across categories.
  • Compare delta7 (seven-day) confirmed cases with vaccination data.
    • Create KPIs to assess case severity in different states.
    • Categorize total confirmed cases by month to identify the worst month for India.

Process

Data Retrieval

Extracted a comprehensive dataset of COVID-19 information from the covid19india.org API using Python's requests module, ensuring data integrity.


    import requests
    # Making a GET request to the server:
    response = requests.get("https://data.covid19india.org/v4/min/timeseries.min.json")
    response.raise_for_status()  # fail early if the request was unsuccessful

Data Transformation

Transformed the API's JSON-format data into a structured dataframe.


    import pandas as pd

    # Converting the response to a Python data structure and storing it in a variable:
    data = response.json()

    # Making an empty DataFrame:
    df = pd.DataFrame()
    # Iterating through each state:
    for state_code, state_data in data.items():
        # Getting all the dates for the state and storing them in a variable:
        timeseries_data = state_data.get('dates', {})
        # Iterating through each date of the state:
        for date, date_data in timeseries_data.items():
            # Flattening the nested dictionary into a single row:
            date_df = pd.json_normalize(date_data)
            # Storing the date in a column:
            date_df['Date'] = date
            # Storing the state code in a column:
            date_df['State_Code'] = state_code
            # Concatenating each date's data:
            df = pd.concat([df, date_df], ignore_index=True)

Data Cleaning & Processing

Processed the dataset by handling null values, eliminating unwanted entries, and conducting outlier analysis. The cleaned data was then exported as a CSV file.


    # Filling the null values with 0:
    for i in df:
        df[i] = df[i].fillna(0)
        # Changing the data types:
        try:
            if i == "Date":
                df[i] = pd.to_datetime(df[i], format="%Y-%m-%d")
            else:
                df[i] = df[i].astype(int)
        except (ValueError, TypeError):
            pass  # leave columns that cannot be converted (e.g. State_Code) as-is

    # Removing outliers with the 1.5 * IQR rule:
    for i in df.select_dtypes(include=int):
        q1 = df[i].quantile(0.25)
        q3 = df[i].quantile(0.75)
        iqr = q3 - q1
        ul = q3 + 1.5 * iqr   # upper limit
        ll = q1 - 1.5 * iqr   # lower limit
        df = df[(df[i] >= ll) & (df[i] <= ul)]
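
The export itself is not shown in the excerpt above; a minimal sketch of that step (the file name is an assumption):

    # Exporting the cleaned data for the SQL stage; the file name is an assumption:
    df.to_csv("covid19_timeseries_cleaned.csv", index=False)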

Database Management with SQL

Imported the CSV file into a database and employed SQL queries for data organization and transformation. This included creating structured tables for efficient data handling and categorizing data by testing ratios for further analysis.


    #Creating Categories based on Testing Ratio:
    WITH cte as (SELECT *, IFNULL(Total_Tested/Population,0) AS Testing_Ratio FROM District_Data)
    SELECT *,
    	CASE
    		WHEN  Testing_Ratio BETWEEN 0.05 AND 0.1 THEN 'Category A'
    		WHEN  Testing_Ratio > 0.1 AND Testing_Ratio <= 0.3 THEN 'Category B'
    		WHEN  Testing_Ratio > 0.3 AND Testing_Ratio <= 0.5 THEN 'Category C'
    		WHEN  Testing_Ratio > 0.5 AND Testing_Ratio <= 0.75 THEN 'Category D'
    		WHEN  Testing_Ratio > 0.75 AND Testing_Ratio <= 1 THEN 'Category E'
    		ELSE '' 
    	END AS Category 
    FROM cte
    ORDER BY State, District;
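
The CSV-to-MySQL import step is not shown above; a minimal sketch using pandas and SQLAlchemy, where the connection string, credentials, file name, and table name are assumptions:

    # Loading the cleaned district CSV into a MySQL table (connection details are assumptions):
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/covid19")
    district_df = pd.read_csv("district_cleaned.csv")
    district_df.to_sql("District_Data", con=engine, if_exists="replace", index=False)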

Data Analysis

Employed pivot tables to summarize and aggregate the data, facilitating the creation of necessary charts and visualizations for the dashboard.
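
The pivot tables themselves were built in Excel; as an illustrative pandas equivalent of the weekly-evolution aggregation, assuming the cleaned DataFrame df from the steps above and flattened column names such as "delta.confirmed":

    # Sketch: weekly evolution of key metrics across India.
    # Column names like "delta.confirmed" are assumptions based on the flattened API fields.
    weekly = (df.set_index("Date")
                .resample("W")[["delta.confirmed", "delta.recovered",
                                "delta.deceased", "delta.tested"]]
                .sum())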

Dashboard Creation

Developed a dynamic Excel dashboard for effective data presentation. The dashboard featured various interactive elements, including slicers and bookmark features, allowing seamless switching between district and state-level analysis views.

Dashboard

  • The dashboard offers two views (state and district), allowing users to switch between them using bookmarks.
  • It includes various filters for exploring different scenarios.

Screenshot 2023-09-09 155926

Screenshot 2023-09-09 183418

Key Insights

Monthly Trend Analysis

A consistent monthly trend emerged each year: confirmed cases remained relatively low and rose gradually in the months leading up to May, when they surged sharply.
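
A rough sketch of how this monthly categorization can be reproduced in pandas, again assuming the cleaned DataFrame and the flattened column name "delta.confirmed":

    # Sketch: total new confirmed cases per calendar month, to locate the peak month.
    # "delta.confirmed" is an assumption based on the flattened API fields.
    monthly_totals = df.groupby(df["Date"].dt.to_period("M"))["delta.confirmed"].sum()
    worst_month = monthly_totals.idxmax()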

Testing Ratio Impact

Districts exhibiting the lowest testing ratios tended to experience higher numbers of cases and deaths. This suggests a potential correlation between testing efforts and case outcomes.

Population Density Effect

States with higher population density, such as Maharashtra, tended to report more cases and deaths. In contrast, regions with lower population density, like Lakshadweep, recorded the lowest numbers of confirmed cases.

Seven-Day Moving Average

An analysis of the seven-day moving average of confirmed cases revealed Kerala as a standout, demonstrating a substantial difference compared to other states. This indicates unique trends in the disease's progression in Kerala.
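
A minimal sketch of this moving-average calculation in pandas, assuming the cleaned DataFrame and the flattened column name "delta.confirmed":

    # Sketch: seven-day moving average of new confirmed cases per state.
    # "delta.confirmed" is an assumption based on the flattened API fields.
    df = df.sort_values(["State_Code", "Date"])
    df["MA7_Confirmed"] = (df.groupby("State_Code")["delta.confirmed"]
                             .transform(lambda s: s.rolling(7, min_periods=1).mean()))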

Challenges Faced

Data Extraction

We initially faced difficulties in getting data from the server, but we persevered and figured it out.

Data Cleanup

Dealing with messy data, including missing values and confusing entries, was a challenge. We brainstormed and tried different methods to clean it up and make it useful.

Data Visualization

Visualizing the data in a way that answered the project's questions was tough. We had to think creatively and work as a team to present the data effectively.

Time Management

Meeting project deadlines required good time management skills, and we learned to prioritize tasks efficiently.

Learnings

Persistence

We learned to keep trying when faced with data extraction problems.

Data Cleaning

We improved our skills in making messy data useful through experimentation.

Creative Problem-Solving

We found creative ways to visualize complex data through teamwork.

Teamwork and Communication

Working with diverse viewpoints improved our communication and collaboration skills.

Time Management

Efficiently managing our time became crucial to meet project deadlines.

Files Information

Result

  • Dashboard: Excel file with sheets for state data, district data, pivot tables, and the dashboard.

Data

  • TimeSeries_Data: Cleaned time-series data, extracted from the API and exported after EDA.
  • District_Data: Cleaned district-level data, extracted from the API and exported after EDA.

Code

  • Python Code: Contains the code used to extract, clean, and prepare data for analysis.
  • SQL Queries: Contains the SQL queries used for aggregation and for creating tables for analysis and the dashboard.

Tech Stack

  • Python: Data extraction and EDA
  • MySQL: Aggregation and table creation
  • Excel: Dashboard creation and data presentation
  • Power Query: Minor data adjustments
  • API: covid19india.org API used as the data source
  • Jupyter Notebook: Used as the IDE

Data Sources

Data Info

Project Presentation

Team Members

License

This project is licensed under the MIT License, allowing you to use, modify, and distribute the code and visuals while maintaining the original license terms.


For questions or feedback, please contact: [email protected]

Enjoy exploring the project!
