To classify countries according to socio-economic and health indicators that determine the overall development of a country.

azizzoaib786/countries-dataset-clustering

Clustering - Countries by Socio-economic and Health Indicators (Unsupervised Learning)

Overview

This project applies unsupervised clustering techniques to categorize countries based on socio-economic and health indicators, facilitating analysis of their overall development levels.

Project Goals

  • Classify countries into meaningful clusters using unsupervised learning.
  • Visualize and analyze patterns across socio-economic and health-related data.
  • Use dimensionality reduction and clustering techniques to group countries with similar indicator profiles.

Steps Included

1. Exploratory Data Analysis (EDA)

  • Initial data exploration
  • Data cleaning and preprocessing
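
As a rough illustration of the exploration and cleaning step, the sketch below uses pandas on a small made-up table. The column names (`income`, `life_expectancy`), the values, and the choice of median imputation are assumptions for demonstration only; the notebook may handle cleaning differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset's columns and values are not shown here.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "D"],
    "income": [1200.0, 45000.0, np.nan, 9800.0, 9800.0],
    "life_expectancy": [58.1, 81.3, 72.4, 69.0, 69.0],
})

# Initial exploration: shape, summary statistics, and missing values per column.
print(df.shape)
print(df.describe())
missing_per_column = df.isna().sum()
print(missing_per_column)

# Cleaning: drop exact duplicate rows, then fill numeric gaps with the column median.
clean = df.drop_duplicates().reset_index(drop=True)
numeric_cols = clean.select_dtypes(include="number").columns
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
print(clean)
```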

2. Data Visualization

  • Geographical Maps
  • Heatmaps
  • Histograms for understanding feature distributions

3. Data Scaling

  • Normalization so that features measured on different scales contribute comparably to distance-based clustering
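
A minimal sketch of the scaling step, assuming scikit-learn's `StandardScaler` (zero mean, unit variance per feature). The README does not say which scaler the notebook uses, so `MinMaxScaler` or another method may be the actual choice; the feature matrix here is made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: rows are countries, columns are indicators
# on very different scales (e.g. income in dollars vs. a rate in percent).
X = np.array([
    [1200.0, 5.2],
    [45000.0, 1.1],
    [9800.0, 3.4],
    [30000.0, 2.0],
])

# Standardize each column to zero mean and unit variance so that no single
# indicator dominates the Euclidean distances used later by K-Means.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```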

4. Feature Engineering

  • Principal Component Analysis (PCA) for dimensionality reduction and feature extraction
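
The PCA step could look roughly like the following, using scikit-learn's `PCA` on synthetic data. The 90% explained-variance threshold is an illustrative assumption, not a value taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the indicator table: 50 rows, 6 correlated features
# generated from 2 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 6))

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (0.90 is an assumption).
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)
print(pca.explained_variance_ratio_.cumsum())
```

The reduced matrix `X_pca` can then be passed to the clustering step in place of the full feature set.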

5. Clustering Analysis

  • Elbow Method for choosing the number of clusters
  • Silhouette Score Analysis for validating cluster quality
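
One way to carry out both checks is to fit K-Means for a range of `k` values, record the inertia (for the elbow plot) and the silhouette score for each. The sketch below runs on synthetic blobs with three well-separated centers; the data, the range of `k`, and the `random_state` values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three clearly separated groups (not the country dataset).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

inertias = {}
silhouettes = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                         # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, model.labels_)  # ranges from -1 to 1

# Elbow: look for the k after which inertia stops dropping sharply.
# Silhouette: prefer the k with the highest average score.
best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes[k], 3))
print("k with highest silhouette:", best_k)
```

In practice the inertia values are usually plotted against `k` and the elbow is read off visually, with the silhouette score serving as a cross-check.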

6. K-Means Clustering

  • Application of the K-Means algorithm (unsupervised learning) to segment countries
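
Applying K-Means and attaching the resulting labels back to each country might be sketched as follows. The table, the column names, and `n_clusters=2` are hypothetical; the notebook's actual features and cluster count are not stated in this README.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy table with two obviously different groups of countries.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "income": [900.0, 1100.0, 1000.0, 42000.0, 45000.0, 47000.0],
    "child_mortality": [90.0, 85.0, 95.0, 4.0, 5.0, 3.0],
})
features = ["income", "child_mortality"]

# Scale, fit K-Means, and store each country's cluster label.
X_scaled = StandardScaler().fit_transform(df[features])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(X_scaled)

# Per-cluster means in the original units help interpret what each cluster represents.
profile = df.groupby("cluster")[features].mean()
print(df[["country", "cluster"]])
print(profile)
```

Cluster labels are arbitrary integers, so interpreting them (for example as "lower" or "higher" development) relies on the per-cluster profile rather than on the label value itself.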

How to Run

To run this project locally:

  1. Clone the repository:

    git clone https://github.com/azizzoaib786/countries-dataset-clustering.git
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Run the notebook:

    • Open countries-dataset-clustering.ipynb in Jupyter Notebook or JupyterLab.
    • Execute all cells (Run All).

Requirements

  • Python (3.7 or higher recommended)
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • geopandas (for map visualizations)
  • Jupyter Notebook/JupyterLab

Contributions

Contributions are encouraged! Fork the repository, make improvements, and submit a pull request.

License

This project is licensed under the MIT License.
