Group project analyzing customer demographics and behavior to predict spending scores and loyalty program participation using multiple machine learning models.
This project was built by Anam Ahamed, Mihika Grover, Preksha, Siddhant Grover, and Siddhi.
This project explores how different models perform in understanding and predicting customer behavior in a marketing dataset. We applied clustering, regression, and classification methods to gain insights into customer spending patterns and loyalty program participation.
Key components:
- Clustering (K-Means): Segmented customers into 5 clusters using the elbow method; standardized variables ensured uniform analysis.
- ANOVA & Normality Testing: Validated regression assumptions; assessed normality with Q-Q plots alongside ANOVA.
- Multiple Linear Regression: Modeled spending score with predictors Age, Income, and Online Shopping Frequency; explained 94.8% of variance.
- KNN: Determined optimal k=11; analyzed accuracy, precision, recall, and F1 scores at cluster level.
- Naive Bayes: Compared cluster-wise accuracy and recall; highlighted challenges with false negatives.
- CART (Decision Trees): Derived interpretable decision rules; Income and Spending Score emerged as strongest loyalty predictors.
- Logistic Regression: Modeled probability of loyalty program participation; moderate predictive accuracy (45–55%).
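The clustering step above can be sketched as follows. The cluster count (k=5), elbow method, and standardization follow the write-up; the synthetic data and column meanings are illustrative assumptions, not the project dataset.

```python
# Sketch of the K-Means step: standardize features, inspect inertia over a
# range of k (elbow method), then fit the final model with k=5.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # stand-in for Age, Income, Spending Score

X_std = StandardScaler().fit_transform(X)  # uniform scale across variables

# Elbow method: within-cluster sum of squares (inertia) for k = 1..10
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std).inertia_
            for k in range(1, 11)]

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_std)
labels = kmeans.labels_  # cluster assignment per customer
```

Plotting `inertias` against k and picking the bend in the curve is how the elbow choice is typically made.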
```bash
git clone https://github.com/<your-username>/<repo-name>
cd <Marketing_Data_Analysis>
pip install -r requirements.txt
```

Then open `Marketing_Analysis.ipynb` to run the full workflow.

🧰 Tech Stack
- Python, Google Colab → core environment
- Libraries: NumPy, Pandas, Matplotlib, seaborn, scikit-learn
- Methods Applied: K-Means clustering, Multiple Linear Regression, K-Nearest Neighbors (KNN), Naive Bayes, CART (Decision Trees), Logistic Regression
📊 Key Findings
Spending Score Predictors:
- Online Shopping Frequency (+0.914 SD) and Income (+0.343 SD) had the strongest positive effects on spending score.
- Age was statistically insignificant.
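Standardized coefficients like those above (effects expressed in standard-deviation units) can be obtained by z-scoring both predictors and target before fitting. A minimal sketch on synthetic data, where the variable names and the planted effect sizes are assumptions chosen to mirror the findings:

```python
# Fit OLS on z-scored data so coefficients are in SD units and directly
# comparable across predictors. Data is synthetic: a strong frequency
# effect, a weaker income effect, and no real Age effect.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 300
age_z = rng.normal(size=n)
income_z = rng.normal(size=n)
freq_z = rng.normal(size=n)
spend = 0.914 * freq_z + 0.343 * income_z + 0.2 * rng.normal(size=n)

X = np.column_stack([age_z, income_z, freq_z])
X_z = StandardScaler().fit_transform(X)
y_z = (spend - spend.mean()) / spend.std()

model = LinearRegression().fit(X_z, y_z)
coef = dict(zip(["Age", "Income", "OnlineShoppingFreq"], model.coef_))
```

Because everything is in SD units, a coefficient near zero (as for Age here) reads directly as a negligible effect.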
Model Performance:
- Regression: R² = 0.948 → strong explanatory power.
- KNN: Optimal k=11, but accuracy ~50%.
- Naive Bayes: Accuracy peaked at ~54%, but recall varied across clusters.
- CART: Cluster 0 performed best (53% accuracy), with clear decision rules based on Income & Spending Score.
- Logistic Regression: Accuracies between 45–55%, F1 scores between 0.37–0.55.
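The KNN evaluation described above (scanning k, then reporting accuracy, precision, recall, and F1) can be sketched like this. The synthetic features and the binary loyalty label are placeholder assumptions:

```python
# Scan odd k values on a held-out split, pick the best by accuracy, then
# compute precision/recall/F1 for that k.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=1.5, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for k in range(1, 21, 2):  # odd k avoids voting ties in binary problems
    pred = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).predict(X_te)
    scores[k] = accuracy_score(y_te, pred)

best_k = max(scores, key=scores.get)
pred = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr).predict(X_te)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
```

In the project the same metrics were additionally broken down per cluster; here a single split keeps the sketch short.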
Further key visualizations are available in the project slides (`Marketing_DA.pdf`).
Recommendations:
- Retain high-income, frequent shoppers with exclusive loyalty offers.
- Promote cost-saving campaigns for low-income/infrequent shoppers.
- Use CART decision rules for real-time, personalized marketing strategies.
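Extracting human-readable decision rules from a fitted CART model, as the last recommendation suggests, can be sketched as follows. The two features match the predictors the project found strongest, but the data and loyalty label here are synthetic assumptions:

```python
# Fit a shallow decision tree and export its splits as plain if/else rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))                    # Income, Spending Score
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)  # toy loyalty label

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["Income", "SpendingScore"])
print(rules)  # one indented line per split, leaves show the predicted class
```

Capping `max_depth` keeps the rule set short enough to hand to a marketing team.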
📂 Repo Structure:
- `Marketing_Analysis.ipynb` → main notebook with full workflow
- `Marketing_DA.pdf` → project presentation slides
- `AIDA_Dataset(2).xlsx` → dataset
- `requirements.txt` → Python dependencies
ACKNOWLEDGEMENT: Project completed as part of the Artificial Intelligence in Data Analytics (AIDA) course under the guidance of Professor Tushar Jaruhar.