anam04/Marketing_Data_Analysis

Model-Wise Analysis on Marketing Data


Group project analyzing customer demographics and behavior to predict spending scores and loyalty program participation using multiple machine learning models.
This project was built by Anam Ahamed, Mihika Grover, Preksha, Siddhant Grover, and Siddhi.


🔍 Project Overview

This project explores how different models perform in understanding and predicting customer behavior in a marketing dataset. We applied clustering, regression, and classification methods to gain insights into customer spending patterns and loyalty program participation.

Key components:

  • Clustering (K-Means): Segmented customers into 5 clusters using the elbow method; standardized variables ensured uniform analysis.
  • ANOVA & Normality Testing: Checked the assumptions behind the regression using ANOVA and Q-Q plots; the Q-Q plots indicated approximately normal distributions.
  • Multiple Linear Regression: Modeled spending score with predictors Age, Income, and Online Shopping Frequency; explained 94.8% of variance.
  • KNN: Determined optimal k=11; analyzed accuracy, precision, recall, and F1 scores at cluster level.
  • Naive Bayes: Compared cluster-wise accuracy and recall; highlighted challenges with false negatives.
  • CART (Decision Trees): Derived interpretable decision rules; Income and Spending Score emerged as strongest loyalty predictors.
  • Logistic Regression: Modeled probability of loyalty program participation; moderate predictive accuracy (45–55%).
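The clustering step above (standardize, then pick the number of clusters with the elbow method) can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the three synthetic columns, the range of k values, and the `random_state` settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (NOT the project's data):
# the columns play the role of age, income, and spending score.
rng = np.random.default_rng(0)
centers = np.array([[25, 30, 80], [45, 90, 20], [60, 50, 50]])
X = np.vstack([c + rng.normal(scale=[3, 5, 5], size=(100, 3)) for c in centers])

# Standardize so that no single variable dominates the Euclidean distance.
X_std = StandardScaler().fit_transform(X)

# Elbow method: fit K-Means for a range of k and record the inertia
# (within-cluster sum of squared distances).
k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")

# Fit the final model with the k read off the elbow. The project settled
# on 5 clusters; this synthetic data was generated with 3 groups.
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)
labels = final.labels_
```

Plotting `inertias` against `k_values` gives the elbow curve; the chosen k is where adding clusters stops reducing inertia by much.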

🚀 Quickstart

Run in Colab (Recommended)

Open In Colab

Run Locally

git clone https://github.com/anam04/Marketing_Data_Analysis
cd Marketing_Data_Analysis
pip install -r requirements.txt
jupyter notebook Marketing_Analysis.ipynb

🧰 Tech Stack

  • Python, Google Colab → core environment
  • Libraries: NumPy, Pandas, Matplotlib, seaborn, scikit-learn
  • Methods applied: K-Means clustering, Multiple Linear Regression, K-Nearest Neighbors (KNN), Naive Bayes, CART (Decision Trees), Logistic Regression

📊 Key Findings

Spending Score Predictors:

  • Online Shopping Frequency (+0.914 SD) and Income (+0.3429 SD) had the strongest positive effect on spending score.
  • Age was not a statistically significant predictor.
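The SD units on the coefficients above suggest standardized regression coefficients: a one-standard-deviation increase in a predictor corresponds to that many standard deviations of change in spending score. A minimal sketch of how such coefficients and R² can be computed, using NumPy least squares on synthetic data, is shown below. The data-generating numbers are invented for illustration and will not reproduce the project's values.

```python
import numpy as np

# Synthetic data for illustration only (NOT the project's dataset).
rng = np.random.default_rng(42)
n = 500
age = rng.normal(40, 12, n)
income = rng.normal(60_000, 15_000, n)
online_freq = rng.normal(10, 4, n)
# Spending depends on income and online frequency, but not on age.
spending = 0.001 * income + 5.0 * online_freq + rng.normal(0, 5, n)


def zscore(v):
    """Center a variable to mean 0 and scale it to standard deviation 1."""
    return (v - v.mean()) / v.std()


# Standardize predictors and response so coefficients are in SD units.
X = np.column_stack([zscore(age), zscore(income), zscore(online_freq)])
y = zscore(spending)

# No intercept column is needed because every variable is centered.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 = 1 - SS_residual / SS_total (y is centered, so SS_total = y @ y).
residuals = y - X @ beta
r_squared = 1.0 - (residuals @ residuals) / (y @ y)

for name, b in zip(["Age", "Income", "Online Shopping Frequency"], beta):
    print(f"{name}: {b:+.3f} SD")
print(f"R^2 = {r_squared:.3f}")
```

Because all variables share the same scale after standardization, the coefficient magnitudes can be compared directly, which is how a ranking such as "online shopping frequency first, income second" can be read off.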

Model Performance:

  • Regression: R² = 0.948 → strong explanatory power.
  • KNN: Optimal k=11, but accuracy ~50%.
  • Naive Bayes: Accuracy peaked at ~54%, but recall varied across clusters.
  • CART: Cluster 0 performed best (53% accuracy), with clear decision rules based on Income & Spending Score.
  • Logistic Regression: Accuracies between 45–55%, F1 scores between 0.37–0.55.
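One way to arrive at an optimal k for KNN and to report accuracy, precision, recall, and F1 per cluster is sketched below with scikit-learn. The notebook's own selection procedure is not described here, so the cross-validation setup, the candidate k values, the synthetic data, and the random cluster labels are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (NOT the project's dataset); the
# target stands in for loyalty program participation.
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
# Hypothetical cluster labels standing in for the K-Means segments.
clusters = np.random.default_rng(0).integers(0, 3, size=len(y))

X_train, X_test, y_train, y_test, _, c_test = train_test_split(
    X, y, clusters, test_size=0.3, random_state=0, stratify=y)

# Pick k by 5-fold cross-validated accuracy over odd values of k.
candidate_ks = list(range(1, 22, 2))
cv_scores = {}
for k in candidate_ks:
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()
best_k = max(cv_scores, key=cv_scores.get)

# Refit on the full training split with the selected k.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report the metrics separately for each cluster of test customers.
per_cluster = {}
for c in np.unique(c_test):
    mask = c_test == c
    per_cluster[int(c)] = {
        "accuracy": accuracy_score(y_test[mask], y_pred[mask]),
        "precision": precision_score(y_test[mask], y_pred[mask], zero_division=0),
        "recall": recall_score(y_test[mask], y_pred[mask], zero_division=0),
        "f1": f1_score(y_test[mask], y_pred[mask], zero_division=0),
    }

print("best k:", best_k)
for c, metrics in per_cluster.items():
    print(c, {name: round(value, 2) for name, value in metrics.items()})
```

The same per-cluster loop works for any fitted classifier, so Naive Bayes, CART, or logistic regression could be swapped in for the KNN pipeline to compare models on equal terms.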

For further key visualizations:

Open In Colab

Recommendations:

  • Retain high-income, frequent shoppers with exclusive loyalty offers.
  • Promote cost-saving campaigns for low-income/infrequent shoppers.
  • Use CART decision rules for real-time, personalized marketing strategies.
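As a rough illustration of the last recommendation, a fitted decision tree can be exported as readable if/else rules and then used to score individual customers. The sketch below uses scikit-learn's `export_text` on synthetic data; the feature names, value ranges, and the labeling rule are invented for the example and are not the project's actual rules.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data (NOT the project's dataset). The labeling rule below is
# invented purely so the tree has something to recover.
rng = np.random.default_rng(1)
income = rng.uniform(20, 120, 400)        # hypothetical units: thousands
spending_score = rng.uniform(0, 100, 400)
X = np.column_stack([income, spending_score])
loyal = ((income > 70) & (spending_score > 50)).astype(int)

# A shallow tree keeps the rule set short enough to read and act on.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, loyal)

# Export the learned splits as plain-text decision rules.
rules = export_text(tree, feature_names=["Income", "SpendingScore"])
print(rules)

# Score a new (hypothetical) customer against the learned rules.
new_customer = np.array([[95.0, 80.0]])
prediction = int(tree.predict(new_customer)[0])
print("predicted loyalty participation:", prediction)
```

Limiting `max_depth` trades some accuracy for rules that a marketing team can read directly; deeper trees fit more detail but produce rule sets that are harder to apply by hand.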

📂 Repo Structure:

  • Marketing_Analysis.ipynb → main notebook with full workflow
  • Marketing_DA.pdf → project presentation slides
  • AIDA_Dataset(2).xlsx → dataset
  • requirements.txt → Python dependencies

Acknowledgement

Project completed as part of the Artificial Intelligence in Data Analytics (AIDA) course under the guidance of Professor Tushar Jaruhar.

About

Marketing analytics project applying K-Means, regression, and ML classifiers to segment customers and predict loyalty program participation with actionable insights.
