This project applies unsupervised clustering techniques to categorize countries based on socio-economic and health indicators, facilitating analysis of their overall development levels.
Objectives:
- Classify countries into meaningful clusters using unsupervised learning.
- Visualize and analyze patterns across socio-economic and health-related data.
- Utilize dimensionality reduction and clustering techniques for accurate grouping.
Methodology:
- Initial data exploration
- Data cleaning and preprocessing
- Geographical maps
- Heatmaps
- Histograms for understanding feature distributions
- Normalization to ensure equitable influence of all features
- Principal Component Analysis (PCA) for dimensionality reduction and feature extraction
- Elbow Method for optimal cluster determination
- Silhouette Score Analysis for validating cluster quality
- Application of the K-Means algorithm (unsupervised learning) to segment countries
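The data cleaning step can be sketched as follows. This is a minimal illustration, not the notebook's code: the table, its column names (`country`, `income`, `life_expec`) and the median-imputation strategy are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table standing in for the project's CSV (which would normally
# be loaded with pd.read_csv); country names and values are illustrative only.
raw = pd.DataFrame({
    "country": ["A", "B", "B", "C", "D"],
    "income": [1610.0, 9930.0, 9930.0, np.nan, 41400.0],
    "life_expec": [56.2, 76.3, 76.3, 76.5, 82.0],
})

# Inspect missing values per column before deciding how to handle them.
print(raw.isna().sum())

# Remove exact duplicate rows (country "B" appears twice).
clean = raw.drop_duplicates()

# Impute remaining gaps in numeric columns with the column median
# (an assumed strategy; dropping incomplete rows is another option).
numeric_cols = clean.select_dtypes(include="number").columns
clean = clean.fillna({col: clean[col].median() for col in numeric_cols})

# Index by country so each row is one country's indicator vector.
clean = clean.set_index("country")
print(clean)
```

The median is used here because socio-economic indicators such as income tend to be skewed, and the median is less sensitive to extreme values than the mean.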
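The histogram and heatmap steps can be illustrated with synthetic data. This sketch assumes pandas' `DataFrame.hist` for distributions and seaborn's `heatmap` for the correlation matrix; the indicator names and generated values are hypothetical and only mimic the kind of skew and correlation such data often shows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical indicators; column names and distributions are illustrative only.
df = pd.DataFrame({
    "income": rng.lognormal(mean=9, sigma=1, size=100),   # right-skewed
    "life_expec": rng.normal(loc=70, scale=8, size=100),
})
# Make child mortality fall as life expectancy rises, plus some noise.
df["child_mort"] = 200 - 2 * df["life_expec"] + rng.normal(0, 5, size=100)

# Histograms reveal each feature's distribution (e.g. the long income tail).
df.hist(bins=20, figsize=(9, 3), layout=(1, 3))
plt.tight_layout()
plt.close("all")  # in a notebook the figure would be displayed inline instead

# Pairwise Pearson correlations, shown as an annotated heatmap.
corr = df.corr()
fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
fig.tight_layout()
plt.close(fig)

print(df.skew().round(2))
print(corr.round(2))
```

Strong skew in the histograms is one reason to normalize before clustering, and strongly correlated pairs in the heatmap are what PCA can compress into fewer components.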
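Normalization puts indicators measured in very different units (for example income versus child mortality) on a comparable scale, so that no feature dominates the distance computations K-Means relies on. A minimal sketch, assuming z-score standardization with scikit-learn's `StandardScaler`; the notebook may use a different scaler (such as min-max), and the values below are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: rows are countries, columns are indicators
# (income per capita, life expectancy, child mortality). Illustrative values only.
X = np.array([
    [1610.0, 56.2, 90.2],
    [9930.0, 76.3, 16.6],
    [12900.0, 76.5, 27.3],
    [41400.0, 82.0, 4.2],
])

# Standardize each column to zero mean and unit variance so that no single
# indicator dominates Euclidean distances purely because of its units.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.round(2))
```

Without this step, the income column (values in the tens of thousands) would account for almost all of the Euclidean distance between two countries.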
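PCA projects the standardized indicators onto a few orthogonal components that capture most of the variance, which simplifies both visualization and clustering. A sketch assuming scikit-learn's `PCA`; the synthetic `make_blobs` data and the 90% explained-variance threshold are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country indicators: 150 samples, 9 features.
X, _ = make_blobs(n_samples=150, n_features=9, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components first to inspect the explained variance.
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Keep the smallest number of components explaining at least 90% of the
# variance (an assumed threshold; the notebook may choose differently).
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X_scaled)

print(n_components, cumulative.round(3))
```

scikit-learn can also make this selection directly via `PCA(n_components=0.90)`; the explicit version above just makes the cumulative-variance reasoning visible.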
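The Elbow Method and Silhouette Score Analysis can be run in a single loop over candidate cluster counts. This sketch assumes scikit-learn's `KMeans` and `silhouette_score` on synthetic data; the range of `k` values and the random seeds are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with four underlying groups.
X, _ = make_blobs(n_samples=300, n_features=4, centers=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    # Inertia = within-cluster sum of squared distances; plotted against k,
    # the "elbow" is where adding clusters stops reducing it sharply.
    inertias[k] = km.inertia_
    # The silhouette score is undefined for a single cluster, so start at k = 2.
    if k >= 2:
        silhouettes[k] = silhouette_score(X_scaled, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
for k, inertia in inertias.items():
    sil = silhouettes.get(k)
    sil_text = f"{sil:.3f}" if sil is not None else "n/a"
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil_text}")
print("Highest mean silhouette at k =", best_k)
```

The two criteria complement each other: the elbow is read visually from a plot, while the silhouette score (between -1 and 1, higher is better) gives a numeric check that the chosen clusters are compact and well separated.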
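Once a cluster count is chosen, the final segmentation is a single K-Means fit. In this sketch the data, the country labels and `k = 3` are hypothetical; comparing per-cluster feature means is shown as one possible way to interpret clusters as development levels, not necessarily the notebook's approach.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; k = 3 is an assumed choice (e.g. from the elbow/silhouette step).
X, _ = make_blobs(n_samples=120, n_features=5, centers=3, random_state=1)
country_names = [f"country_{i}" for i in range(len(X))]  # hypothetical labels
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means on the scaled features and get one cluster label per country.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
labels = kmeans.fit_predict(X_scaled)

# Profile each cluster by the mean of its members' original (unscaled) features;
# comparing these profiles is one way to read clusters as development levels.
profiles = np.array([X[labels == c].mean(axis=0) for c in range(3)])

for c in range(3):
    members = [name for name, lab in zip(country_names, labels) if lab == c]
    print(f"cluster {c}: {len(members)} countries, e.g. {members[:3]}")
print(profiles.round(2))
```

Cluster IDs returned by K-Means are arbitrary, so any mapping such as "cluster 0 = least developed" has to come from inspecting the profiles, not from the label numbers themselves.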
To run this project locally:
- Clone the repository:
  `git clone https://github.com/azizzoaib786/countries-dataset-clustering.git`
- Install the required dependencies:
  `pip install -r requirements.txt`
- Run the notebook:
  - Open `countries-dataset-clustering.ipynb` in Jupyter Notebook or JupyterLab.
  - Execute all cells (Run All).
Requirements:
- Python (3.7 or higher recommended)
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- geopandas (for map visualizations)
- Jupyter Notebook/JupyterLab
Contributions are encouraged! Fork the repository, make improvements, and submit a pull request.
Contact:
- Email: [email protected]
This project is licensed under the MIT License.