Telecom Churn Predictor

Overview

This project focuses on predicting customer churn using machine learning models. By analyzing a dataset of customer behaviors and service usage, the project identifies key patterns and provides actionable insights to help businesses retain at-risk customers. The primary objective is to minimize churn by proactively targeting customers likely to leave.

Objectives

Churn Prediction: Predict which customers are likely to churn using Logistic Regression, Random Forest, and XGBoost classifiers.
Customer Insights: Identify key characteristics and behaviors that correlate with churn.
Business Impact: Enable proactive retention strategies by prioritizing high-risk customers.

Dataset

The dataset consists of 5,000 customer records, including:

Categorical Features: state, international_plan, voice_mail_plan.
Numerical Features: Usage metrics such as total_day_minutes and total_night_calls.
Target Variable: Churn (binary classification: 1 for churn, 0 for no churn).

The dataset is clean, with no missing values, so only the light preprocessing described under Methodology was needed.

Methodology

Data Preprocessing

Dropped redundant features (e.g., phone_number).
Encoded categorical variables using LabelEncoder.
Addressed class imbalance in the target variable with class weights, and used stratified sampling so that each split keeps the same churn rate.
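
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame is synthetic, the lowercase target column name churn, the 75/25 split, and the random_state value are all assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Tiny synthetic frame standing in for the real data (all values are made up).
df = pd.DataFrame({
    "phone_number": [f"555-{i:04d}" for i in range(20)],
    "state": ["CA", "NY", "TX", "WA"] * 5,
    "international_plan": ["yes", "no"] * 10,
    "voice_mail_plan": ["no", "no", "yes", "yes"] * 5,
    "total_day_minutes": [100.0 + 5 * i for i in range(20)],
    "churn": [1, 0, 0, 0, 0] * 4,  # assumed target column name
})

# Drop the identifier column, which carries no predictive signal.
df = df.drop(columns=["phone_number"])

# Encode each categorical column as integer codes, one encoder per column.
for col in ["state", "international_plan", "voice_mail_plan"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=["churn"])
y = df["churn"]

# Stratify on the target so train and test keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(y_train.mean(), y_test.mean())
```

Note that scikit-learn documents LabelEncoder as a target encoder. It works on feature columns as shown, but the integer codes imply an ordering, which may matter for Logistic Regression more than for the tree-based models.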

Exploratory Data Analysis (EDA)

Visualized customer distribution across features.
Analyzed correlations to identify relationships between variables and churn.
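
As a rough sketch of the correlation step, the snippet below ranks features by the strength of their Pearson correlation with the target. The data values are invented for illustration and the target column name churn is an assumption; only the feature names come from this README.

```python
import pandas as pd

# Made-up numeric data; values do not come from the real dataset.
df = pd.DataFrame({
    "total_day_minutes": [120.0, 250.0, 180.0, 300.0, 90.0, 310.0],
    "number_customer_service_calls": [1, 4, 0, 5, 1, 6],
    "total_night_calls": [100, 95, 110, 90, 105, 99],
    "churn": [0, 1, 0, 1, 0, 1],  # assumed target column name
})

# Pearson correlation of every feature with the target,
# ordered by absolute strength so negative relationships are not buried.
churn_corr = (
    df.corr()["churn"]
    .drop("churn")
    .sort_values(key=abs, ascending=False)
)
print(churn_corr)
```

The full matrix from df.corr() can also be passed to seaborn.heatmap to produce a correlation plot.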

Machine Learning Models

The following models were trained and evaluated:

Logistic Regression
- Scaled features using StandardScaler.
- Used class weights to handle imbalance in Churn.
Random Forest
- Leveraged for feature importance and robust performance.
- No scaling required due to tree-based nature.
XGBoost
- Tuned scale_pos_weight to address class imbalance.
- Achieved the highest recall and best overall performance.
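
The model setup described above might look something like the sketch below. It runs on synthetic data from make_classification, and every hyperparameter value (n_estimators, max_iter, the split size) is an assumption rather than the notebook's setting. The XGBoost part is limited to computing a candidate scale_pos_weight so the example runs with scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (roughly 15% positives) standing in for the churn set.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # Scaling sits inside the pipeline so it is fit on training data only.
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    # Trees split on thresholds, so no feature scaling is applied.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# A common starting value for XGBoost's scale_pos_weight: negatives / positives.
# With xgboost installed, it would be passed as
# XGBClassifier(scale_pos_weight=scale_pos_weight).
scale_pos_weight = float(np.sum(y_train == 0)) / float(np.sum(y_train == 1))
print(f"candidate scale_pos_weight: {scale_pos_weight:.2f}")

# Fit each model and record recall on the held-out split.
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))
    print(f"{name}: recall={recalls[name]:.2f}")
```

The recall values printed here describe the synthetic data only and say nothing about the results reported for the real dataset.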

Key Results

Best Model: XGBoost achieved a recall of 0.84, the highest of the three models.
Feature Importance:
- Key drivers of churn include international_plan, number_customer_service_calls, and voice_mail_plan.
Churn Strategy: Ranked the 500 customers with the highest predicted churn probability for proactive engagement.
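
The ranking step can be illustrated with a small helper. The function name rank_high_risk and the sample IDs and scores are hypothetical; in practice the probabilities would likely come from the trained model's predict_proba output for the positive class.

```python
def rank_high_risk(customer_ids, churn_probabilities, top_n=500):
    """Return (customer_id, probability) pairs for the top_n highest churn probabilities."""
    if len(customer_ids) != len(churn_probabilities):
        raise ValueError("customer_ids and churn_probabilities must have equal length")
    ranked = sorted(
        zip(customer_ids, churn_probabilities),
        key=lambda pair: pair[1],
        reverse=True,  # highest probability first
    )
    return ranked[:top_n]


# Hypothetical scores, e.g. model.predict_proba(X)[:, 1] in a real run.
ids = ["c1", "c2", "c3", "c4", "c5"]
probs = [0.10, 0.92, 0.47, 0.88, 0.05]
print(rank_high_risk(ids, probs, top_n=3))
# → [('c2', 0.92), ('c4', 0.88), ('c3', 0.47)]
```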

Tools and Technologies

Programming: Python (Pandas, NumPy, Scikit-learn, XGBoost).
Visualization: Matplotlib, Seaborn.
Model Evaluation: Confusion matrix, ROC curve, classification report.
Cross-Validation: Recall as the primary metric for imbalanced classification.
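
Recall is the share of actual churners the model catches, TP / (TP + FN), which is why it suits a setting where missing a churner costs more than a false alarm. A minimal sketch of cross-validating on recall is shown below, assuming five stratified folds and a Logistic Regression stand-in model on synthetic data; none of these choices are confirmed by the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data standing in for the churn set.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.85, 0.15], random_state=1
)

model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratified folds keep the churn rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# scoring="recall" evaluates recall of the positive (churn) class per fold.
recall_scores = cross_val_score(model, X, y, cv=cv, scoring="recall")
print(f"mean recall: {recall_scores.mean():.2f} (+/- {recall_scores.std():.2f})")
```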

Visualizations

Correlation Matrix: Explored relationships between features.
Feature Importance: Highlighted key predictors of churn.
Customer Segmentation: Visualized churn probabilities by feature subsets.

Future Work

Tune hyperparameters more thoroughly to improve model performance.
Deploy the model for real-time churn prediction.
Integrate external factors (e.g., customer demographics) for richer insights.

Installation and Usage

Clone the repository:

git clone https://github.com/your-username/churn-prediction.git
cd churn-prediction
