This project explores different regression models to predict a target variable (Y) based on a given dataset. The models used include:
- Support Vector Regression (SVR)
- Polynomial Regression
- Random Forest Regression
- XGBoost Regression
Each model was trained and evaluated using different preprocessing techniques and hyperparameter tuning to optimize performance.
- Handling Missing Values:
  - `X9` (categorical) was filled with the mode.
  - `X2` (numerical) was filled with the mean.
- Feature Selection:
  - `X1` (product ID) was dropped due to lack of relevance.
  - `X2` and `X8` were dropped due to weak correlation with `Y`.
- One-Hot Encoding:
  - `X5`, `X7`, and `X11` were one-hot encoded.
  - Only `X7_0`, `X7_4`, `X11_0`, `X11_1`, and `X11_3` were kept based on correlation.
- Label Encoding:
  - `X3`, `X9`, and `X10` were label-encoded but later dropped due to weak correlation.
- Standardization and Normalization:
  - `X6` was standardized for better model performance.
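The preprocessing steps above can be sketched with pandas and scikit-learn. The tiny DataFrame below is synthetic stand-in data covering only a subset of the `X1`..`X11` columns (the real dataset is not shown in this README), so treat it as an illustration of the steps rather than the project's actual pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset; column names mirror the features above.
df = pd.DataFrame({
    "X1": [101, 102, 103, 104],          # product ID
    "X2": [1.0, np.nan, 3.0, 4.0],       # numerical, has a missing value
    "X5": ["a", "b", "a", "b"],
    "X6": [10.0, 20.0, 30.0, 40.0],
    "X7": [0, 4, 0, 2],
    "X8": [5, 6, 7, 8],
    "X9": ["red", None, "red", "blue"],  # categorical, has a missing value
    "X11": [0, 1, 3, 1],
    "Y": [1.2, 3.4, 2.1, 4.4],
})

# Missing values: mode for the categorical X9, mean for the numerical X2.
df["X9"] = df["X9"].fillna(df["X9"].mode()[0])
df["X2"] = df["X2"].fillna(df["X2"].mean())

# Feature selection: drop the product ID and the weakly correlated columns
# (X2 was filled above but is still dropped, mirroring the steps listed).
df = df.drop(columns=["X1", "X2", "X8", "X9"])

# One-hot encode X5, X7, X11, then keep only the dummies listed above.
df = pd.get_dummies(df, columns=["X5", "X7", "X11"])
keep = {"X7_0", "X7_4", "X11_0", "X11_1", "X11_3"}
drop = [c for c in df.columns
        if c.startswith(("X5_", "X7_", "X11_")) and c not in keep]
df = df.drop(columns=drop)

# Standardize X6 to zero mean and unit variance.
df["X6"] = StandardScaler().fit_transform(df[["X6"]])
print(sorted(df.columns))
```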
### Support Vector Regression (SVR)
Score: 0.370
- Used Grid Search with 5-fold Cross-Validation for hyperparameter tuning.
- Key parameters:
  `C = 50`, `epsilon = 0.005`, `gamma = 0.1`, `kernel = 'rbf'`
- Standardization applied to `X6`.
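A minimal sketch of this tuning step, assuming scikit-learn's `GridSearchCV` with a `Pipeline` so standardization is fit inside each fold. The data is synthetic and the grid ranges (centered on the reported best values) are illustrative:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data stands in for the preprocessed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=80)

pipe = Pipeline([("scale", StandardScaler()),
                 ("svr", SVR(kernel="rbf"))])

# Candidate values bracket the best parameters reported above.
param_grid = {
    "svr__C": [10, 50, 100],
    "svr__epsilon": [0.001, 0.005, 0.01],
    "svr__gamma": [0.05, 0.1, 0.5],
}

# 5-fold cross-validated grid search, as described above.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```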
### Polynomial Regression
Score: 0.400
- Polynomial features with degree 3 were selected.
- Normalization applied to `X6`.
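One way to sketch this model is a scikit-learn pipeline that normalizes the inputs, expands them with degree-3 polynomial features, and fits a linear regression on top. The data here is synthetic, chosen so a cubic fit is appropriate:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic data with a cubic relationship, standing in for the real features.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 2))
y = X[:, 0] ** 3 - X[:, 1] + rng.normal(scale=0.01, size=60)

model = Pipeline([
    ("norm", MinMaxScaler()),               # normalization, as noted above
    ("poly", PolynomialFeatures(degree=3)), # degree-3 features, as noted above
    ("lin", LinearRegression()),
])
model.fit(X, y)
print(round(model.score(X, y), 3))  # training R^2
```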
### Random Forest Regression
Score: 0.385
- Grid Search Optimization resulted in:
  `max_depth = 10`, `n_estimators = 200`, `min_samples_split = 10`
- Standard preprocessing applied.
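With those tuned values, the model can be sketched as a plain `RandomForestRegressor`; the data below is synthetic and the `random_state` is only for reproducibility of the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Parameters found by the grid search described above.
rf = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    min_samples_split=10,
    random_state=0,
)
rf.fit(X, y)
print(round(rf.score(X, y), 3))  # training R^2
```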
### XGBoost Regression
Score: 0.481
- Used the same preprocessing steps as Polynomial Regression.
- Key parameters:
  `n_estimators = 200`, `learning_rate = 0.3`, `max_depth = 6`
| Model | Score (lower is better) |
|---|---|
| Support Vector Regression (SVR) | 0.370 (Best Performance) |
| Polynomial Regression | 0.400 |
| Random Forest Regression | 0.385 |
| XGBoost Regression | 0.481 |
- SVR provided the best overall results, balancing accuracy and model reliability.
- Polynomial Regression also showed strong results.
- Feature selection and hyperparameter tuning played a crucial role in performance.
- Clone the repository:
  ```bash
  git clone https://github.com/basemw0/data-prediction-project.git
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the training script:
  ```bash
  python train.py
  ```
- Hyperparameter Optimization: Fine-tuning with Bayesian Optimization.
- More Models: Exploring Neural Networks and LightGBM.