Healthcare Risk Prediction with Random Forest

This project is a healthcare AI model built using Python and scikit-learn to predict patient health risk levels (Low, Moderate, High) based on demographic, socioeconomic, and medical history data.

💡 Project Goal

The goal of this project is to explore machine learning in healthcare by building a classification model that predicts a patient's health risk level (Low, Moderate, or High). Such a model could help identify high-risk individuals and support early intervention strategies.

📊 Dataset

The dataset includes synthetic healthcare data with the following features:

Age
Gender
Ethnicity
Income Level
Employment Status
Medical History
Health Risk Score (Target)

Additional sample records were created to balance the dataset and improve classification performance.
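The README does not say how the additional records were created. As one possible approach, the sketch below checks the class distribution and then balances it by random oversampling with replacement, using `sklearn.utils.resample`. The toy records and the reduced column set are hypothetical and only illustrate the idea.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy records; the real dataset has more columns and rows.
df = pd.DataFrame({
    "Age": [30, 42, 55, 61, 70, 76, 81, 35],
    "Health Risk Score": ["Low", "Low", "Low", "Low",
                          "Moderate", "Moderate", "High", "Low"],
})
target = "Health Risk Score"

# Inspect how many records each risk class has before balancing.
print(df[target].value_counts())

# Oversample every class (with replacement) up to the size of the largest one.
largest = df[target].value_counts().max()
balanced = pd.concat(
    [
        resample(group, replace=True, n_samples=largest, random_state=0)
        for _, group in df.groupby(target)
    ],
    ignore_index=True,
)

print(balanced[target].value_counts())
```

If oversampling is used, it is generally safer to apply it to the training split only, so that duplicated records do not leak into the test set.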

🛠 Tools & Technologies

Python
Pandas for data processing
scikit-learn for model training and evaluation
RandomForestClassifier for classification
LabelEncoder for categorical feature encoding
Google Colab for development and execution
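As a minimal sketch of how LabelEncoder can be applied to categorical feature columns with Pandas, assuming column names that mirror the dataset description (the values are made up):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy records; column names and values are illustrative only.
df = pd.DataFrame({
    "Age": [34, 58, 45, 71],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Employment Status": ["Employed", "Retired", "Unemployed", "Retired"],
})

# Fit one encoder per categorical column and keep it, so the same mapping
# can be reused on new data or inverted later.
encoders = {}
for column in ["Gender", "Employment Status"]:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
print(list(encoders["Employment Status"].classes_))
```

LabelEncoder assigns integer codes in sorted order of the category names. The scikit-learn documentation describes it as intended for target labels, so OrdinalEncoder or OneHotEncoder may be worth considering for input features.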

📈 Key Steps

Data cleaning and encoding of categorical features
Class filtering to remove underrepresented health risk classes
Splitting data into training and testing sets (70/30)
Training a Random Forest model
Evaluating model performance with:
- Confusion matrix
- Classification report (accuracy, precision, recall, F1-score)
Exporting the trained model using joblib
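The steps above can be sketched roughly as follows. This is not the notebook's actual code: the data is randomly generated and already encoded, the `min_count` threshold for class filtering is an assumed value, and the stratified split and `random_state` settings are choices made here for reproducibility.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical, randomly generated stand-in for the encoded dataset.
rng = np.random.default_rng(0)
n = 300
target = "Health Risk Score"
df = pd.DataFrame({
    "Age": rng.integers(18, 90, size=n),
    "Gender": rng.integers(0, 2, size=n),
    "Income Level": rng.integers(0, 3, size=n),
    "Medical History": rng.integers(0, 4, size=n),
    target: rng.choice(["Low", "Moderate", "High"], size=n),
})
df.loc[0, target] = "Critical"  # a single-record class, to show the filtering

# Class filtering: drop classes with fewer than min_count records (assumed threshold).
min_count = 5
counts = df[target].value_counts()
df = df[df[target].isin(counts[counts >= min_count].index)]

X = df.drop(columns=[target])
y = df[target]

# 70/30 split; stratifying keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out 30%.
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Export the trained model.
joblib.dump(model, "random_forest_model.joblib")
```

Because the labels here are random, the printed scores only demonstrate the report format and say nothing about the real model.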

⚙️ Model Performance

The initial model achieved ~62% accuracy on the held-out test set (30% of the data). Possible ways to improve performance include:

Hyperparameter tuning
Adding more diverse training data
Exploring alternative classification algorithms

🚀 Future Improvements

Implement cross-validation
Tune hyperparameters using GridSearchCV
Add model-explanation visualizations (e.g., SHAP values to show how each feature contributes to predictions)
Improve dataset size and balance for better generalization
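A minimal sketch of how the first two items could be combined: GridSearchCV evaluates each hyperparameter combination with stratified 5-fold cross-validation. The data comes from `make_classification` as a stand-in for the real dataset, and the parameter grid is an illustrative assumption rather than a recommended search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic three-class stand-in for the healthcare dataset.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

# Illustrative grid; the values are assumptions, not tuned recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated accuracy of the best setting
```

After fitting, `search.best_estimator_` holds a model refit on all the supplied data with the best parameters found.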

📂 Output

Trained model saved as random_forest_model.joblib
The notebook includes the full pipeline, from preprocessing to evaluation
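A saved model of this kind can be reloaded with `joblib.load` and used for prediction. So that the sketch below runs on its own, it first writes a tiny stand-in model to a temporary directory. In the project the file would come from the notebook, and the two feature columns used here are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model; the feature columns and labels are illustrative only.
train = pd.DataFrame({
    "Age": [25, 40, 63, 78, 33, 70],
    "Medical History": [0, 1, 2, 3, 0, 3],
})
labels = ["Low", "Low", "Moderate", "High", "Low", "High"]
stand_in = RandomForestClassifier(n_estimators=10, random_state=0).fit(train, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "random_forest_model.joblib")
    joblib.dump(stand_in, model_path)

    # Reload the model from disk, as a downstream script would.
    model = joblib.load(model_path)

# New records must have the same columns, in the same order and with the
# same encoding, as the data the model was trained on.
new_patients = pd.DataFrame({"Age": [29, 75], "Medical History": [0, 3]})
predictions = model.predict(new_patients)
print(predictions)
```

Since joblib files are pickle-based, they should only be loaded from trusted sources.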

🔗 Try It Out

This notebook was developed in Google Colab and can be adapted for use with any structured healthcare dataset.

Project by Patricia L Johnson
