Amazon ML Challenge 2025 – Multi-Modal Price Prediction. Built a multi-modal price prediction system using text and image data. Engineered features with advanced text parsing, KNN similarity, and embeddings from five SentenceTransformer models and EfficientNet. An ensemble of LightGBM and CatBoost reached a validation SMAPE of 53.99%.

Prahlad-07/Amazon-ML-Team-int_64t

🏆 Amazon ML Challenge 2025: Smart Product Pricing

Multi-Modal Deep Learning Pipeline for Intelligent Price Prediction


🎯 Problem Statement

Predict product prices using multi-modal data:

  • 📝 Product descriptions (text)
  • 🖼️ Product images
  • 💰 Historical pricing

🏗️ Solution Architecture

High-Level Pipeline Overview

    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃              INPUT LAYER (Raw Data)                   ┃
    ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
    ┃  📝 Product Text  │  🖼️ Product Images  │  💰 Prices   ┃
    ┗━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┛
                 ║                         ║
                 ▼                         ▼
    ┏━━━━━━━━━━━━━━━━━━━━━━┓  ┏━━━━━━━━━━━━━━━━━━━━━━┓
    ┃   STAGE 1: Extract   ┃  ┃  STAGE 2: Generate   ┃
    ┃   Base Features      ┃  ┃  Multi-Modal Data    ┃
    ┃                      ┃  ┃                      ┃
    ┃  • Text Parsing      ┃  ┃  • Text Embeddings   ┃
    ┃  • Unit Convert      ┃  ┃  • Image Embeddings  ┃
    ┃  • Brand Encoding    ┃  ┃  • KNN Features      ┃
    ┃  • TF-IDF Features   ┃  ┃                      ┃
    ┃                      ┃  ┃  Output: 3,212 feat. ┃
    ┃  Output: 520 feat.   ┃  ┃                      ┃
    ┗━━━━━━━━┬━━━━━━━━━━━━┛  ┗━━━━━━━━┬━━━━━━━━━━━━┛
             │                        │
             └────────────┬───────────┘
                          ▼
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃         STAGE 3: ENSEMBLE MODELING                    ┃
    ┃                                                       ┃
    ┃   📊 Consolidated Features: ~3,732                   ┃
    ┃                                                       ┃
    ┃   ┌──────────────────────┐  ┌──────────────────────┐ ┃
    ┃   │   LightGBM Model     │  │  CatBoost Model      │ ┃
    ┃   │   • 1000 trees       │  │  • 1000 trees        │ ┃
    ┃   │   • Depth: 7         │  │  • Depth: 7          │ ┃
    ┃   │   • LR: 0.05         │  │  • LR: 0.05          │ ┃
    ┃   │   • SMAPE: 54.78%    │  │  • SMAPE: 54.52%     │ ┃
    ┃   └──────────┬───────────┘  └──────────┬───────────┘ ┃
    ┃              │                         │              ┃
    ┃              └─────────────┬───────────┘              ┃
    ┃                            ▼                          ┃
    ┃               Final = (LightGBM + CatBoost) / 2      ┃
    ┃                                                       ┃
    ┗━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
                              ▼
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃         🎯 OUTPUT: Price Predictions                 ┃
    ┃         ✅ SMAPE: 53.99%                              ┃
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

STAGE 1: Feature Extraction (520 features)

INPUT: Product Text Descriptions
    ▼
┌─────────────────────────────────────────────────────────┐
│              TEXT PARSING & CLEANING                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │  Pack Qty    │  │   Weight     │  │   Volume     │  │
│  │  Extraction  │  │  Extraction  │  │  Extraction  │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└────────────────┬────────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────────────────────────┐
│           UNIT STANDARDIZATION                          │
│  Weight → grams  │  Volume → milliliters  │ Values OK  │
└────────────────┬────────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────────────────────────┐
│              FEATURE ENGINEERING                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │  • Brand Target Encoding (smoothed)              │  │
│  │  • TF-IDF Features from description (500 feat.)  │  │
│  │  • Interaction Features                          │  │
│  │    - pack_weight_ratio                           │  │
│  │    - brand_avg_price                             │  │
│  │  • Price Normalization & Smoothing               │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────┬────────────────────────────────────────┘
                 ▼
         ✅ OUTPUT: 520 numerical features
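Two of the Stage 1 steps, unit standardization and smoothed brand target encoding, can be sketched as follows. This is a minimal illustration and not the code in `enhanced_feature_extraction.py`: the unit table, the regular expressions, the function names, and the shrinkage formula are all assumptions, since the README only names the techniques.

```python
import re
from collections import defaultdict

# Conversion factors to the canonical units named in the diagram
# (grams for weight, milliliters for volume). The unit list is an assumption.
TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592}
TO_ML = {"ml": 1.0, "l": 1000.0, "fl oz": 29.5735}

# Longest unit names first so "fl oz" is tried before "oz".
_UNITS = sorted(list(TO_GRAMS) + list(TO_ML), key=len, reverse=True)
_QTY_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(" + "|".join(re.escape(u) for u in _UNITS) + r")\b",
    re.IGNORECASE,
)
_PACK_RE = re.compile(r"pack of (\d+)|(\d+)\s*-?\s*pack", re.IGNORECASE)


def parse_text(text):
    """Return the first weight (g), first volume (ml), and pack quantity found."""
    out = {"weight_grams": None, "volume_ml": None, "pack_quantity": 1}
    for value, unit in _QTY_RE.findall(text):
        unit = unit.lower()
        if unit in TO_GRAMS and out["weight_grams"] is None:
            out["weight_grams"] = float(value) * TO_GRAMS[unit]
        elif unit in TO_ML and out["volume_ml"] is None:
            out["volume_ml"] = float(value) * TO_ML[unit]
    pack = _PACK_RE.search(text)
    if pack:
        out["pack_quantity"] = int(pack.group(1) or pack.group(2))
    return out


def smoothed_target_encoding(brands, prices, smoothing=10.0):
    """Map each brand to its mean price, shrunk toward the global mean.

    Brands with few samples stay close to the global mean. The formula
    (sum + m * global_mean) / (count + m) is one common choice, assumed here.
    """
    global_mean = sum(prices) / len(prices)
    totals, counts = defaultdict(float), defaultdict(int)
    for brand, price in zip(brands, prices):
        totals[brand] += price
        counts[brand] += 1
    return {
        b: (totals[b] + smoothing * global_mean) / (counts[b] + smoothing)
        for b in counts
    }
```

For example, `parse_text("Organic Honey 500 g, Pack of 2")` yields a weight of 500 grams and a pack quantity of 2. In practice a target encoding like this is usually fitted out-of-fold so that a row's own price does not leak into its feature.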

STAGE 2: Multi-Modal Embeddings (3,212 features)

Part A: Parallel Embedding Generation

┌────────────────────────────────────┐    ┌────────────────────────────────────┐
│    TEXT EMBEDDINGS                 │    │    IMAGE EMBEDDINGS                │
│    (5 Transformer Models)          │    │    (EfficientNet-B0 CNN)           │
├────────────────────────────────────┤    ├────────────────────────────────────┤
│                                    │    │                                    │
│  1️⃣  MiniLM-L6-v2                  │    │  🖼️  EfficientNet-B0               │
│      └─ 384-dim vectors            │    │      └─ Pre-trained on ImageNet    │
│                                    │    │      └─ Global Average Pooling    │
│  2️⃣  Multilingual MiniLM           │    │      └─ 1280-dimensional output   │
│      └─ 384-dim vectors            │    │                                    │
│                                    │    │                                    │
│  3️⃣  all-MiniLM-L12                │    │                                    │
│      └─ 384-dim vectors            │    │                                    │
│                                    │    │                                    │
│  4️⃣  distiluse-base                │    │                                    │
│      └─ 384-dim vectors            │    │                                    │
│                                    │    │                                    │
│  5️⃣  all-MiniLM-L6                 │    │                                    │
│      └─ 384-dim vectors            │    │                                    │
│                                    │    │                                    │
│  Total: 5 × 384 = 1920 features   │    │  Total: 1280 features             │
└────────────────┬───────────────────┘    └────────────────┬───────────────────┘
                 │                                        │
                 └────────────────┬─────────────────────────┘
                                  ▼
                ┌──────────────────────────────────────┐
                │ CONCATENATED EMBEDDING SPACE         │
                │ 1920 + 1280 = 3200 dimensions       │
                └──────────────┬───────────────────────┘
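The concatenation step in Part A can be sketched in plain Python. The block below assumes the per-model vectors have already been computed (for instance by the Sentence Transformers and timm models listed in the Tech Stack) and only shows how five 384-dim text vectors and one 1280-dim image vector combine into a single 3200-dim vector. The L2 normalization of each block is an assumption added for illustration; the README does not say whether the embeddings are normalized. The function names are hypothetical.

```python
import math

TEXT_DIMS = [384] * 5  # five text models, 384 dims each (per the diagram)
IMAGE_DIM = 1280       # EfficientNet-B0 pooled output (per the diagram)


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_embeddings(text_vecs, image_vec, normalize=True):
    """Concatenate per-model text vectors and one image vector for one product."""
    blocks = list(text_vecs) + [image_vec]
    expected = TEXT_DIMS + [IMAGE_DIM]
    actual = [len(b) for b in blocks]
    if actual != expected:
        raise ValueError(f"expected block sizes {expected}, got {actual}")
    if normalize:
        # Assumption: normalize each block so no single model dominates
        # distances computed later in the combined space.
        blocks = [l2_normalize(b) for b in blocks]
    return [x for block in blocks for x in block]
```

Normalizing per block keeps each model's contribution to a Euclidean distance comparable, which matters if the combined vector is later used for nearest-neighbor search.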

Part B: KNN Similarity Features

Using the text (1,920-dim), image (1,280-dim), and concatenated (3,200-dim) embeddings
    ▼
┌──────────────────────────────────────────────────────────────┐
│              K-NEAREST NEIGHBORS ANALYSIS (K=10)             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  📊 TEXT SPACE KNN                                           │
│  ├─ Mean Price of 10 nearest neighbors                       │
│  ├─ Std Dev of neighbor prices                              │
│  ├─ Min price among neighbors                               │
│  └─ Max price among neighbors                               │
│                                                              │
│  🖼️ IMAGE SPACE KNN                                          │
│  ├─ Mean Price of 10 nearest neighbors                       │
│  ├─ Std Dev of neighbor prices                              │
│  ├─ Min price among neighbors                               │
│  └─ Max price among neighbors                               │
│                                                              │
│  🎯 COMBINED SPACE KNN                                       │
│  ├─ Mean Price of 10 nearest neighbors                       │
│  ├─ Std Dev of neighbor prices                              │
│  ├─ Min price among neighbors                               │
│  └─ Max price among neighbors                               │
│                                                              │
│  Total KNN Features: 4 metrics × 3 spaces = 12 features    │
└──────────────────┬───────────────────────────────────────────┘
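The four neighbor statistics can be sketched with a brute-force search. This is a stand-in for illustration only: the Tech Stack lists FAISS for similarity search, while the version below uses an exhaustive Euclidean scan from the standard library. The function name, the `exclude_index` parameter, and the min/max feature names are assumptions; the mean/std names follow the pattern seen in the feature importance list (`knn_mean_price_combined`).

```python
import math
import statistics


def knn_price_features(query, train_vecs, train_prices, space, k=10,
                       exclude_index=None):
    """Mean, std, min, and max price of the k nearest training items.

    `space` is a label such as "text", "image", or "combined".
    `exclude_index` skips one training row, so a training item does not
    count itself as its own neighbor (which would leak its price).
    """
    dists = []
    for i, vec in enumerate(train_vecs):
        if i == exclude_index:
            continue
        dists.append((math.dist(query, vec), i))
    nearest = [train_prices[i] for _, i in sorted(dists)[:k]]
    return {
        f"knn_mean_price_{space}": statistics.fmean(nearest),
        f"knn_std_price_{space}": statistics.pstdev(nearest),
        f"knn_min_price_{space}": min(nearest),
        f"knn_max_price_{space}": max(nearest),
    }
```

Calling this once per embedding space (text, image, combined) gives the 4 × 3 = 12 features in the diagram. For training rows, passing the row's own index as `exclude_index` avoids feeding the label back in through the neighbor mean.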

Part C: Stage 2 Output Summary

┌─────────────────────────────────────────────────┐
│   STAGE 2 CONSOLIDATED OUTPUT                   │
├─────────────────────────────────────────────────┤
│  • Text Embeddings (5 models):    1920 feat.   │
│  • Image Embeddings (1 model):    1280 feat.   │
│  • KNN Features (3 spaces):          12 feat.   │
│  ─────────────────────────────────────────────  │
│  TOTAL:                            3212 feat.  │
└─────────────────────────────────────────────────┘

STAGE 3: Ensemble Modeling (~3,732 features)

┌─────────────────────────────────────────────────────────────┐
│        CONSOLIDATED FEATURE MATRIX                          │
├─────────────────────────────────────────────────────────────┤
│  From STAGE 1:  Numerical Features        ~520 feat.       │
│  From STAGE 2:  Text Embeddings           1920 feat.       │
│  From STAGE 2:  Image Embeddings          1280 feat.       │
│  From STAGE 2:  KNN Features                 12 feat.       │
│  ─────────────────────────────────────────────────────────  │
│  TOTAL:                                   ~3,732 feat.     │
└────────────────┬────────────────────────────────────────────┘
                 ▼
     ┌──────────────────────────────────────┐
     │   SPLIT: Train & Validation Data     │
     └──────────────────┬───────────────────┘
                        ▼
        ┌────────────────────────────────────┐
        │     GRADIENT BOOSTING MODELS       │
        ├────────────────────────────────────┤
        │                                    │
        │  🟦 LightGBM                       │
        │  ├─ Trees: 1000                   │
        │  ├─ Max Depth: 7                  │
        │  ├─ Learning Rate: 0.05           │
        │  ├─ Num Leaves: 31                │
        │  └─ SMAPE: 54.78%                 │
        │                                    │
        │  🟪 CatBoost                       │
        │  ├─ Trees: 1000                   │
        │  ├─ Max Depth: 7                  │
        │  ├─ Learning Rate: 0.05           │
        │  ├─ Handle Cat Features: Yes      │
        │  └─ SMAPE: 54.52%                 │
        │                                    │
        └────────────────┬───────────────────┘
                         ▼
        ┌────────────────────────────────────┐
        │   ENSEMBLE AVERAGING               │
        │   Final = (LightGBM + CatBoost)/2  │
        └────────────────┬───────────────────┘
                         ▼
        ✅ FINAL PREDICTIONS
           SMAPE: 53.99% ⭐
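The Stage 3 settings and the averaging step can be written down as a short sketch. The dictionaries below restate the hyperparameters from the diagram using the keyword names of the scikit-learn style `lightgbm.LGBMRegressor` and `catboost.CatBoostRegressor` constructors; whether `validate_and_submit_ensemble.py` uses those classes is an assumption. `average_ensemble` is a hypothetical helper for the equal-weight average.

```python
# Hyperparameters as listed in the diagram. Intended use (assumed):
#   lightgbm.LGBMRegressor(**LGBM_PARAMS)
#   catboost.CatBoostRegressor(**CATBOOST_PARAMS)
LGBM_PARAMS = {
    "n_estimators": 1000,
    "max_depth": 7,
    "learning_rate": 0.05,
    "num_leaves": 31,
}
CATBOOST_PARAMS = {
    "iterations": 1000,
    "depth": 7,
    "learning_rate": 0.05,
}


def average_ensemble(*model_preds):
    """Equal-weight average of per-model prediction lists."""
    if not model_preds:
        raise ValueError("need at least one prediction list")
    n = len(model_preds[0])
    if any(len(p) != n for p in model_preds):
        raise ValueError("all prediction lists must have the same length")
    return [sum(vals) / len(model_preds) for vals in zip(*model_preds)]
```

With two models this reduces to `Final = (LightGBM + CatBoost) / 2`. Averaging helps most when the two models make partly uncorrelated errors, which is plausible here since the two libraries grow trees differently.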

📊 Complete Data Flow Diagram

                    ┌──────────────────┐
                    │   INPUT DATA     │
                    └────────┬─────────┘
                             │
                    ┌────────┴─────────┐
                    ▼                  ▼
            ┌───────────────┐  ┌──────────────┐
            │  TEXT DATA    │  │  IMAGE DATA  │
            └───────┬───────┘  └──────┬───────┘
                    │                 │
                    └────────┬────────┘
                             ▼
            ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
            ┃  STAGE 1: EXTRACT FEATURES  ┃
            ┃  Output: 520 features       ┃
            ┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
                       ▼
            ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
            ┃ STAGE 2: EMBEDDINGS + KNN   ┃
            ┃ Output: 3,212 features      ┃
            ┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
                       ▼
            ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
            ┃ STAGE 3: ENSEMBLE TRAINING  ┃
            ┃ 3,732 consolidated features ┃
            ┃ LightGBM + CatBoost average │
            ┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
                       ▼
            ┌──────────────────────────┐
            │  🎯 PRICE PREDICTIONS    │
            │  ✅ SMAPE: 53.99%        │
            └──────────────────────────┘

📈 Feature Importance Distribution

┌─────────────────────────────────────────────────┐
│        TOP 10 FEATURE CONTRIBUTIONS             │
├─────────────────────────────────────────────────┤
│                                                 │
│  1. knn_mean_price_combined  ████████████ 18.2%│
│  2. brand_target_encoded     ████████ 12.4%    │
│  3. knn_mean_price_text      ██████ 9.8%       │
│  4. weight_grams             █████ 7.3%        │
│  5. knn_std_price_combined   ████ 6.5%         │
│  6. image_embedding_0        ███ 5.1%          │
│  7. pack_quantity            ███ 4.9%          │
│  8. knn_mean_price_image     ██ 4.2%           │
│  9. volume_ml                ██ 3.8%           │
│  10. text_embedding_1_0      ██ 3.1%           │
│                                                 │
└─────────────────────────────────────────────────┘

📊 Model Performance

╔═══════════════════════════════════════════════════════════╗
║              PERFORMANCE METRICS                          ║
╠═══════════════════════════════════════════════════════════╣
║                                                           ║
║  📊 LightGBM (Single Model)        →  54.78% SMAPE       ║
║  📊 CatBoost (Single Model)        →  54.52% SMAPE       ║
║                                                           ║
║  ✅ ENSEMBLE (Average)             →  53.99% SMAPE ⭐    ║
║                                                           ║
║  🚀 Ensemble vs. best single model:  ↓ 0.53 pp SMAPE    ║
║                                                           ║
╚═══════════════════════════════════════════════════════════╝
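For reference, SMAPE can be computed as in the sketch below. It assumes the common definition with the mean of |actual| and |predicted| in the denominator, which ranges from 0% to 200%; the README does not spell out the exact formula used for scoring, so treat this as one plausible version.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        # Convention assumed here: a pair of zeros contributes zero error.
        total += abs(p - t) / denom if denom > 0 else 0.0
    return 100.0 * total / len(y_true)
```

Under this definition, predicting 50 for a true price of 100 scores about 66.7%. The 0.53 figure in the table is the gap between the ensemble (53.99%) and the better single model, CatBoost (54.52%).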

🚀 Quick Start

# Stage 1: Extract features
python enhanced_feature_extraction.py

# Stage 2: Generate embeddings (GPU recommended)
python generate_all_embeddings.py

# Stage 3: Train and predict
python validate_and_submit_ensemble.py

📦 Tech Stack

┌─────────────────────────────────────────────┐
│  DEEP LEARNING                              │
│  • PyTorch                                  │
│  • Sentence Transformers                    │
│  • EfficientNet (timm)                      │
├─────────────────────────────────────────────┤
│  GRADIENT BOOSTING                          │
│  • LightGBM                                 │
│  • CatBoost                                 │
├─────────────────────────────────────────────┤
│  SIMILARITY SEARCH                          │
│  • FAISS (Facebook AI)                      │
├─────────────────────────────────────────────┤
│  DATA PROCESSING                            │
│  • Pandas, NumPy, Scikit-learn              │
└─────────────────────────────────────────────┘

💡 Key Innovation

Our solution's strength lies in three-level similarity features:

  1. 🔤 Text-based neighbors → Semantically similar products
  2. 🖼️ Image-based neighbors → Visually similar products
  3. 🎯 Combined neighbors → Holistically similar products

Each level captures different pricing patterns, creating a robust feature set.


🎓 Results Summary

  • Validation SMAPE: 53.99%
  • ~3,732 engineered features
  • 5 text embedding models
  • KNN price features in 3 embedding spaces (text, image, combined)
  • Equal-weight LightGBM + CatBoost ensemble


⭐ Star this repo if you found it helpful! ⭐

Made with ❤️ for Amazon ML Challenge 2025
