Data Prediction Project

Introduction

This project explores different regression models to predict a target variable (Y) based on a given dataset. The models used include:

- Support Vector Regression (SVR)
- Polynomial Regression
- Random Forest Regression
- XGBoost Regression

Each model was trained and evaluated using different preprocessing techniques and hyperparameter tuning to optimize performance.

Data Preprocessing

Handling Missing Values:
- X9 (categorical) was filled with the mode.
- X2 (numerical) was filled with the mean.
Feature Selection:
- X1 (product ID) was dropped due to lack of relevance.
- X2 and X8 were dropped due to weak correlation with Y.
One-Hot Encoding:
- X5, X7, and X11 were one-hot encoded.
- Only X7_0, X7_4, X11_0, X11_1, and X11_3 were kept based on correlation.
Label Encoding:
- X3, X9, and X10 were label-encoded but later dropped due to weak correlation.
Standardization and Normalization:
- X6 was standardized for SVR and normalized for Polynomial Regression and XGBoost, so that its scale does not dominate the other features.
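The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame, its values, and the `preprocess` helper are hypothetical, and only the column names come from this README. Label encoding of X3, X9, and X10 is skipped because those columns are dropped anyway.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame; the real dataset and its values are not shown in this README.
df = pd.DataFrame({
    "X1": ["P1", "P2", "P3", "P4"],
    "X2": [1.0, np.nan, 3.0, 5.0],
    "X5": ["u", "v", "u", "v"],
    "X6": [10.0, 20.0, 30.0, 40.0],
    "X7": [0, 4, 0, 2],
    "X9": ["a", None, "a", "b"],
    "X11": [0, 1, 3, 2],
    "Y": [1.0, 2.0, 3.0, 4.0],
})

KEPT_DUMMIES = ["X7_0", "X7_4", "X11_0", "X11_1", "X11_3"]
DROPPED = ["X1", "X2", "X3", "X8", "X9", "X10"]


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # Fill missing values: mode for categorical X9, mean for numerical X2.
    out["X9"] = out["X9"].fillna(out["X9"].mode()[0])
    out["X2"] = out["X2"].fillna(out["X2"].mean())
    # One-hot encode X5, X7, and X11, then keep only the listed indicator columns.
    out = pd.get_dummies(out, columns=["X5", "X7", "X11"], dtype=int)
    dummies = [c for c in out.columns if c.startswith(("X5_", "X7_", "X11_"))]
    out = out.drop(columns=[c for c in dummies if c not in KEPT_DUMMIES])
    # Drop the ID column and the weakly correlated features.
    out = out.drop(columns=DROPPED, errors="ignore")
    # Standardize X6 to zero mean and unit variance.
    out["X6"] = StandardScaler().fit_transform(out[["X6"]]).ravel()
    return out


clean = preprocess(df)
print(clean.columns.tolist())
```

In a real run the imputation values and the scaler would normally be fitted on the training split only and then reused on the test split, to avoid leaking test information.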

Models & Performance

1️⃣ Support Vector Regression (SVR) - Best Performing Model

Score: 0.370

Used Grid Search with 5-fold Cross-Validation for hyperparameter tuning.
Key parameters: C = 50, epsilon = 0.005, gamma = 0.1, kernel = rbf.
Standardization applied to X6.
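A grid search of this kind might look like the sketch below. The synthetic data, the candidate grid, and the scoring metric are assumptions made for illustration; the README only reports the selected values (C = 50, epsilon = 0.005, gamma = 0.1, kernel = rbf) and the use of 5 folds.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data; the project's real features and target are not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=60)

# Hypothetical search grid that contains the reported best values.
param_grid = {
    "C": [1, 10, 50],
    "epsilon": [0.005, 0.05],
    "gamma": [0.01, 0.1],
    "kernel": ["rbf"],
}

search = GridSearchCV(
    SVR(),
    param_grid,
    cv=5,  # 5-fold cross-validation, as described above
    scoring="neg_root_mean_squared_error",  # assumed metric; the README does not name one
)
search.fit(X, y)
print(search.best_params_)
```

The parameters chosen on this synthetic data will generally differ from the ones reported above, since they depend on the dataset.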

2️⃣ Polynomial Regression

Score: 0.400

Polynomial features with degree 3 were selected.
Normalization applied to X6.
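One possible way to set this up with scikit-learn is a pipeline, sketched below. Two assumptions are made here: "normalization" is read as min-max scaling, and the scaler is applied to every column of the synthetic data rather than to X6 alone. The data itself is made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic stand-in data with a cubic relationship.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = X[:, 0] ** 3 - X[:, 1] + rng.normal(scale=0.1, size=80)

poly_model = make_pipeline(
    MinMaxScaler(),                                    # assumed meaning of "normalization"
    PolynomialFeatures(degree=3, include_bias=False),  # degree 3, as selected above
    LinearRegression(),
)
poly_model.fit(X, y)
print(poly_model.predict(X[:3]))
```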

3️⃣ Random Forest Regression

Score: 0.385

Grid Search Optimization resulted in:
- max_depth = 10
- n_estimators = 200
- min_samples_split = 10
Standard preprocessing applied.
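As an illustration, the reported parameters can be plugged into scikit-learn's `RandomForestRegressor` and checked with cross-validation. The synthetic data, the 5-fold setting, and the RMSE metric are assumptions for this sketch and are not taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the project's real dataset is not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

rf = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    min_samples_split=10,
    random_state=0,  # fixed seed so repeated runs give the same forest
)

# scikit-learn reports negated RMSE, so flip the sign to get an error value.
scores = cross_val_score(rf, X, y, cv=5, scoring="neg_root_mean_squared_error")
rmse = -scores.mean()
print(f"mean CV RMSE: {rmse:.3f}")
```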

4️⃣ XGBoost Regression

Score: 0.481

Used the same preprocessing steps as Polynomial Regression.
Key parameters: n_estimators = 200, learning_rate = 0.3, max_depth = 6.

Conclusion

| Model | Score |
| --- | --- |
| Support Vector Regression (SVR) | 0.370 (Best Performance) |
| Polynomial Regression | 0.400 |
| Random Forest Regression | 0.385 |
| XGBoost Regression | 0.481 |

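SVR provided the best overall results with a score of 0.370, balancing accuracy and model reliability. The metric is not named in this README; because the lowest value is the one marked as best, lower scores are taken to be better.
Random Forest Regression (0.385) and Polynomial Regression (0.400) came close, while XGBoost Regression (0.481) trailed.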
Feature selection and hyperparameter tuning played a crucial role in performance.
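The ranking can be reproduced from the reported scores with a few lines. This sketch assumes that a lower score is better, which is inferred from SVR's 0.370 being labeled the best result and is not stated explicitly in this README.

```python
# Scores as reported in the table above.
scores = {
    "Support Vector Regression (SVR)": 0.370,
    "Polynomial Regression": 0.400,
    "Random Forest Regression": 0.385,
    "XGBoost Regression": 0.481,
}

# Assumption: the score is an error-style metric, so sort ascending.
ranking = sorted(scores.items(), key=lambda item: item[1])

for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```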

How to Run

Clone the repository:

git clone https://github.com/basemw0/data-prediction-project.git

Install dependencies:
```
pip install -r requirements.txt
```
Run the training script:
```
python train.py
```

📌 Future Improvements

Hyperparameter Optimization: Fine-tuning with Bayesian Optimization.
More Models: Exploring Neural Networks and LightGBM.

✨ Author

Basem Walid
