Employee Attrition Prediction using Machine Learning

Goal: Predict employee turnover and understand the key drivers behind it using SHAP explainability.

Business Impact

Helps HR teams identify at-risk employees early
Supports retention planning:
- Reduce overtime for specific roles
- Invest in improving satisfaction and engagement
- Focus retention efforts on younger employees
Potential to simulate attrition costs and optimize workforce strategy

About the Dataset

This project uses a fictional dataset created by IBM data scientists to simulate real-world HR data. The purpose is to uncover what factors contribute to employee attrition (i.e., leaving the company), and to help businesses proactively reduce turnover.

Source: IBM HR Analytics Attrition Dataset (Kaggle)
Rows: 1470
Target variable: Attrition (Yes/No)

Key Features:

Demographics: Age, Gender, MaritalStatus, Education
Job Info: Department, JobRole, YearsAtCompany, JobSatisfaction
Performance: PerformanceRating, OverTime, MonthlyIncome
Label: Attrition — whether the employee left or stayed

Dashboard to visualize some features

A dynamic dashboard, with more interactivity and extra detail in the tooltips, can be found here.

Exploratory Data Analysis

Checked class imbalance: ~16% of employees left (Attrition=Yes)
Explored patterns by marital status, satisfaction levels, income, and overtime
Removed irrelevant or duplicate columns (e.g., EmployeeCount, Over18)
Identified clear correlations between attrition and:
- Overtime work
- Younger age
- Low satisfaction
- High distance from home
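The checks above can be sketched with pandas. This is a minimal illustration on a tiny made-up sample, not the project's notebook code; only the column names Attrition and OverTime come from the dataset.

```python
import pandas as pd

# Tiny made-up sample; the real dataset has 1470 rows.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "No"],
    "OverTime":  ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No"],
})

# Share of each class in the target column (the class imbalance check).
class_share = df["Attrition"].value_counts(normalize=True)

# Attrition rate within each OverTime group.
left = df["Attrition"].eq("Yes")
rate_by_overtime = left.groupby(df["OverTime"]).mean()

print(class_share)
print(rate_by_overtime)
```

The same groupby pattern works for any other categorical column, such as MaritalStatus.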

Preprocessing & Feature Engineering

Label Mapping: Attrition, Gender, OverTime, and Over18 were mapped to binary values
Ordinal Encoding: BusinessTravel
One-Hot Encoding: Applied to Department, EducationField, JobRole, MaritalStatus
Scaling: StandardScaler applied to continuous numerical features
Balancing: Used RandomOverSampler to handle class imbalance in training data
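A minimal sketch of these steps on made-up rows follows. Several things here are assumptions rather than the project's code: the ordering used for BusinessTravel, the single scaled column, and the oversampling itself, which is written by hand with NumPy as a stand-in for imbalanced-learn's RandomOverSampler. In a real pipeline the scaler would be fitted, and the oversampling applied, on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows that reuse column names from the dataset.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "No", "No"],
    "OverTime": ["Yes", "No", "Yes", "No", "No", "No"],
    "BusinessTravel": ["Travel_Frequently", "Non-Travel", "Travel_Rarely",
                       "Travel_Rarely", "Non-Travel", "Travel_Rarely"],
    "Department": ["Sales", "Sales", "Research & Development",
                   "Human Resources", "Sales", "Research & Development"],
    "MonthlyIncome": [2000, 5000, 4000, 6000, 8000, 5000],
})

# Label mapping: binary columns become 0/1.
df["Attrition"] = df["Attrition"].map({"No": 0, "Yes": 1})
df["OverTime"] = df["OverTime"].map({"No": 0, "Yes": 1})

# Ordinal encoding: this particular order is an assumption.
travel_order = {"Non-Travel": 0, "Travel_Rarely": 1, "Travel_Frequently": 2}
df["BusinessTravel"] = df["BusinessTravel"].map(travel_order)

# One-hot encoding for a nominal column.
df = pd.get_dummies(df, columns=["Department"], dtype=int)

X = df.drop(columns="Attrition")
y = df["Attrition"]

# Scaling: standardize a continuous feature to zero mean and unit variance.
scaler = StandardScaler()
X["MonthlyIncome"] = scaler.fit_transform(X[["MonthlyIncome"]]).ravel()

# Balancing: resample minority rows with replacement until classes match.
rng = np.random.default_rng(0)
minority_idx = np.flatnonzero(y.to_numpy() == 1)
majority_idx = np.flatnonzero(y.to_numpy() == 0)
extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx), replace=True)
keep = np.concatenate([majority_idx, minority_idx, extra])
X_bal, y_bal = X.iloc[keep], y.iloc[keep]

print(y_bal.value_counts())
```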

Modeling

Trained and evaluated six classification models:

Model	Accuracy	Key Observations
Logistic Regression	76%	Good recall on attrition class (66%)
K-Nearest Neighbors	67%	Weak precision/recall for attrition
Decision Tree	74%	Low performance on the attrition class (class 1)
Random Forest	85%	High accuracy, weak recall for attrition
Support Vector Machine	60%	High recall but very low precision
XGBoost	86%	Best overall performance, selected for SHAP
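Because only about 16% of employees left, accuracy alone can hide a model that misses the attrition class, which is why the table also comments on precision and recall. The sketch below illustrates the effect with made-up labels and predictions; the numbers are not results from this project.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 16 of 100 employees left (1), 84 stayed (0).
y_true = [1] * 16 + [0] * 84

# A baseline that always predicts "stayed".
y_always_stay = [0] * 100

# A hypothetical model that finds 10 of the 16 leavers but raises 14 false alarms.
y_model = [1] * 10 + [0] * 6 + [1] * 14 + [0] * 70

for name, y_pred in [("always stay", y_always_stay), ("model", y_model)]:
    acc = accuracy_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec = recall_score(y_true, y_pred, zero_division=0)
    print(f"{name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

The baseline scores higher on accuracy than the hypothetical model, yet it never flags a single at-risk employee.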

Model Explainability with SHAP

Used SHAP (SHapley Additive exPlanations) to understand why the model predicts attrition.
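To give a feel for what a SHAP value is, the sketch below computes exact Shapley values by brute force for a toy scoring function. It does not use the shap library or the trained XGBoost model; `toy_risk`, its weights, and the baseline employee are all hypothetical. The shap library computes the same kind of attribution with far more efficient algorithms.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features can be switched from baseline to instance."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with an OverTime x DistanceFromHome interaction."""
    score = 0.10
    score += 0.30 * x["OverTime"]
    score += 0.10 * (x["Age"] < 30)
    score += 0.20 * x["OverTime"] * (x["DistanceFromHome"] > 20)
    return score

baseline = {"OverTime": 0, "Age": 40, "DistanceFromHome": 5}
employee = {"OverTime": 1, "Age": 25, "DistanceFromHome": 25}

phi = shapley_values(toy_risk, employee, baseline)
print(phi)

# Additivity: the contributions sum to the gap between the two predictions.
print(sum(phi.values()), toy_risk(employee) - toy_risk(baseline))
```

The interaction term is split evenly between OverTime and DistanceFromHome, which is how Shapley values share credit for effects that only appear when both features change together.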

Key Drivers of Attrition:

Feature	Insight
OverTime	Strongest predictor of attrition — frequent overtime increases risk
Age	Younger employees more likely to leave
EnvironmentSatisfaction	Dissatisfied employees more likely to leave
DistanceFromHome	Longer commutes correlate with higher attrition
JobSatisfaction	Low job satisfaction strongly tied to attrition

SHAP Summary Plot:

Shows direction and strength of impact for each feature
Confirms that the model aligns with real-world HR logic

🔝 Top Positive Drivers (Increase Attrition Risk)	🔽 Top Negative Drivers (Reduce Attrition Risk)
OverTime – Working overtime increases risk.	Older Age – Older employees are more stable.
Low Job Satisfaction – Unhappy in role.	Higher Monthly Income – More financial comfort.
Low Environment Satisfaction – Poor work environment.	Higher Stock Option Level – Incentivized to stay.
More Companies Worked For – Suggests instability.	Higher Job Involvement – More engaged employees.
Low Relationship Satisfaction – Poor manager/peer relationships.	Longer Years With Current Manager – Manager stability helps.

SHAP Dependence Plot:

Interaction between DistanceFromHome and OverTime:
- Employees living close by are often assigned overtime
- Employees who live far away and also work overtime show higher attrition

Web App: Employee Attrition Predictor (for HR decision making)

To make the solution actionable for HR teams, I developed a Streamlit web app that predicts attrition risk for individual employees based on input features like age, job satisfaction, overtime, income, and more.

Features of the App:

Dynamic form for entering employee details
Real-time prediction of attrition risk (Yes/No)
Probability score showing model confidence
Built-in explanations of the key risk and retention factors (based on SHAP findings)

ML Backend:

Model: XGBoost (best performance, 86% accuracy)
Preprocessing: One-hot encoding, scaling with StandardScaler
Class balancing handled with RandomOverSampler
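One detail worth showing is how a single form submission can be shaped to match the columns the model was trained on, since one-hot encoding a single row only produces the categories present in that row. The sketch below is an assumption about how this could be done, not the app's actual code: `TRAINING_COLUMNS`, `build_feature_row`, `risk_label`, and the 0.5 threshold are all hypothetical. Scaling and the model call would follow this step.

```python
import pandas as pd

# Hypothetical subset of the columns the model was trained on.
TRAINING_COLUMNS = [
    "Age", "MonthlyIncome", "OverTime",
    "Department_Human Resources", "Department_Research & Development", "Department_Sales",
]

def build_feature_row(form_values, training_columns):
    """Turn one employee's form inputs into a single row with the training column layout."""
    row = pd.DataFrame([form_values])
    row["OverTime"] = row["OverTime"].map({"No": 0, "Yes": 1})
    row = pd.get_dummies(row, columns=["Department"], dtype=int)
    # Add one-hot columns absent from this single row and fix the column order.
    return row.reindex(columns=training_columns, fill_value=0)

def risk_label(probability, threshold=0.5):
    """Convert a predicted probability into the Yes/No answer shown to the user."""
    return "Yes" if probability >= threshold else "No"

form_values = {"Age": 29, "MonthlyIncome": 3200, "OverTime": "Yes", "Department": "Sales"}
row = build_feature_row(form_values, TRAINING_COLUMNS)

print(row.to_dict(orient="records")[0])
print(risk_label(0.73))
```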

Why It Matters:

Empowers HR teams to proactively assess and mitigate employee churn
Encourages data-driven retention strategies
Bridges the gap between machine learning output and business decision-making

Screenshot: you can find the interactive tool here

▶ To Run Locally:

pip install streamlit pandas numpy scikit-learn joblib xgboost
streamlit run Employee_Attrition_App.py
