This project predicts whether a patient has diabetes from medical measurements.
- Source: Pima Indians Diabetes Dataset
- Features: Glucose, BMI, Age, Blood Pressure, Insulin, etc.
- Target: Outcome (0 = No Diabetes, 1 = Diabetes)
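
As a rough sketch of how the features and the target might be separated, assuming the data is already loaded into a pandas DataFrame with an `Outcome` column. The helper name `split_features_target` and the two sample rows are made up for illustration; they are not records from the dataset.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "Outcome"):
    """Return (X, y): every column except `target`, and the target column."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Two placeholder rows, invented for illustration only (not real records).
example = pd.DataFrame(
    {
        "Glucose": [120, 95],
        "BMI": [30.5, 24.1],
        "Age": [45, 29],
        "Outcome": [1, 0],
    }
)

X, y = split_features_target(example)
print(list(X.columns))  # ['Glucose', 'BMI', 'Age']
print(y.tolist())       # [1, 0]
```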

Models compared:

- Random Forest
- K-Nearest Neighbors (KNN)
- Logistic Regression
- XGBoost
- SVC
- LinearSVC

| Model | Accuracy |
|---|---|
| Random Forest | 0.753 |
| KNN | 0.701 |
| Logistic Regression | 0.759 |
| XGBoost | 0.740 |
| SVC | 0.772 |
| LinearSVC | 0.740 |

Best Model: SVC (Accuracy = 0.772)

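
A comparison like the one above could be run along the following lines. This is a minimal sketch, not the project's actual code: it uses synthetic data from `make_classification` as a stand-in for the real dataset, default hyperparameters, and a `StandardScaler` in front of the distance- and margin-based models, all of which are assumptions. XGBoost is included only if the package is installed. The scores it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in data (assumption); swap in the real features and Outcome.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Candidate models; scaling is an assumption, not something the README states.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=10000)),
}

# XGBoost is optional here so the sketch still runs without it.
try:
    from xgboost import XGBClassifier

    models["XGBoost"] = XGBClassifier(random_state=42)
except ImportError:
    pass

# Fit every model and keep track of the best test accuracy seen so far.
scores = {}
best_name, best_score = None, -1.0
for name, model in models.items():
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)  # mean accuracy on the held-out split
    scores[name] = score
    if score > best_score:
        best_name, best_score = name, score

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print(f"Best model: {best_name} (accuracy = {best_score:.3f})")
```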

Techniques covered:

- Comparing six classification models in one pipeline
- Tracking the best model automatically
- Hyperparameter tuning with RandomizedSearchCV, then GridSearchCV
- Visualizing the confusion matrix for the best model
- Using PolynomialFeatures in a classification problem
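
The two-stage tuning, polynomial features, and confusion matrix listed above can be sketched as follows. This is an illustration under stated assumptions rather than the project's own code: the data is synthetic, the search ranges for `C` and `gamma` are arbitrary, and the step names `poly`, `scale`, and `svc` are hypothetical. The idea is that a random search narrows down a promising region cheaply, and a small grid search then refines around it.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); swap in the real features and Outcome.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Degree-2 polynomial features, then scaling, then the classifier.
pipe = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("scale", StandardScaler()),
        ("svc", SVC()),
    ]
)

# Stage 1: coarse random search over wide, log-scaled ranges (arbitrary).
coarse = RandomizedSearchCV(
    pipe,
    param_distributions={
        "svc__C": loguniform(1e-2, 1e2),
        "svc__gamma": loguniform(1e-4, 1e0),
    },
    n_iter=10,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
coarse.fit(X_train, y_train)
c0 = coarse.best_params_["svc__C"]
g0 = coarse.best_params_["svc__gamma"]

# Stage 2: small exhaustive grid centred on the best coarse result.
fine = GridSearchCV(
    pipe,
    param_grid={
        "svc__C": [c0 / 2, c0, c0 * 2],
        "svc__gamma": [g0 / 2, g0, g0 * 2],
    },
    cv=3,
    scoring="accuracy",
)
fine.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, fine.predict(X_test))
print(fine.best_params_)
print(cm)
```

To plot the matrix, `cm` can be passed to `seaborn.heatmap(cm, annot=True, fmt="d")` or to `sklearn.metrics.ConfusionMatrixDisplay`.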

Dependencies: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost