Predict product prices using multi-modal data:
- 📝 Product descriptions (text)
- 🖼️ Product images
- 💰 Historical pricing
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ INPUT LAYER (Raw Data) ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ 📝 Product Text │ 🖼️ Product Images │ 💰 Prices ┃
┗━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┛
║ ║
▼ ▼
┏━━━━━━━━━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━━━━━┓
┃ STAGE 1: Extract ┃ ┃ STAGE 2: Generate ┃
┃ Base Features ┃ ┃ Multi-Modal Data ┃
┃ ┃ ┃ ┃
┃ • Text Parsing ┃ ┃ • Text Embeddings ┃
┃ • Unit Convert ┃ ┃ • Image Embeddings ┃
┃ • Brand Encoding ┃ ┃ • KNN Features ┃
┃ • TF-IDF Features ┃ ┃ ┃
┃ ┃ ┃ Output: 3,212 feat. ┃
┃ Output: 520 feat. ┃ ┃ ┃
┗━━━━━━━━┬━━━━━━━━━━━━┛ ┗━━━━━━━━┬━━━━━━━━━━━━┛
│ │
└────────────┬───────────┘
▼
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STAGE 3: ENSEMBLE MODELING ┃
┃ ┃
┃ 📊 Consolidated Features: ~3,732 ┃
┃ ┃
┃ ┌──────────────────────┐ ┌──────────────────────┐ ┃
┃ │ LightGBM Model │ │ CatBoost Model │ ┃
┃ │ • 1000 trees │ │ • 1000 trees │ ┃
┃ │ • Depth: 7 │ │ • Depth: 7 │ ┃
┃ │ • LR: 0.05 │ │ • LR: 0.05 │ ┃
┃ │ • SMAPE: 54.78% │ │ • SMAPE: 54.52% │ ┃
┃ └──────────┬───────────┘ └──────────┬───────────┘ ┃
┃ │ │ ┃
┃ └─────────────┬───────────┘ ┃
┃ ▼ ┃
┃ Final = (LightGBM + CatBoost) / 2 ┃
┃ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
▼
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 🎯 OUTPUT: Price Predictions ┃
┃ ✅ SMAPE: 53.99% ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
INPUT: Product Text Descriptions
▼
┌─────────────────────────────────────────────────────────┐
│ TEXT PARSING & CLEANING │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Pack Qty │ │ Weight │ │ Volume │ │
│ │ Extraction │ │ Extraction │ │ Extraction │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────────────┬────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ UNIT STANDARDIZATION │
│ Weight → grams │ Volume → milliliters │ Values OK │
└────────────────┬────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ FEATURE ENGINEERING │
│ ┌──────────────────────────────────────────────────┐ │
│ │ • Brand Target Encoding (smoothed) │ │
│ │ • TF-IDF Features from description (500 feat.) │ │
│ │ • Interaction Features │ │
│ │ - pack_weight_ratio │ │
│ │ - brand_avg_price │ │
│ │ • Price Normalization & Smoothing │ │
│ └──────────────────────────────────────────────────┘ │
└────────────────┬────────────────────────────────────────┘
▼
✅ OUTPUT: 520 numerical features
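The Stage 1 steps above can be sketched in a few lines. This is a minimal illustration, not the code in `enhanced_feature_extraction.py`: the function names, the list of recognised unit spellings, and the `smoothing` value are all assumptions. It shows one way to pull the first weight and volume out of a description and convert them to grams and milliliters, and one common form of smoothed target encoding that blends each brand's mean price with the global mean.

```python
import re
from collections import defaultdict

# Conversion factors to the base units named in the diagram (grams, milliliters).
# The set of recognised unit spellings is an assumption for illustration.
WEIGHT_TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592}
VOLUME_TO_ML = {"ml": 1.0, "l": 1000.0, "fl oz": 29.5735}

# Longer unit names come first so that "fl oz" is tried before "oz" and "lb" before "l".
_UNITS = sorted(list(WEIGHT_TO_GRAMS) + list(VOLUME_TO_ML), key=len, reverse=True)
_QUANTITY_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(" + "|".join(re.escape(u) for u in _UNITS) + r")\b",
    re.IGNORECASE,
)


def extract_quantities(text):
    """Return (weight_grams, volume_ml) from the first mention of each; None if absent."""
    weight = volume = None
    for value, unit in _QUANTITY_RE.findall(text):
        unit = unit.lower()
        if unit in WEIGHT_TO_GRAMS and weight is None:
            weight = float(value) * WEIGHT_TO_GRAMS[unit]
        elif unit in VOLUME_TO_ML and volume is None:
            volume = float(value) * VOLUME_TO_ML[unit]
    return weight, volume


def smoothed_target_encoding(brands, prices, smoothing=10.0):
    """Map each brand to (n * brand_mean + m * global_mean) / (n + m), with m = smoothing."""
    global_mean = sum(prices) / len(prices)
    totals, counts = defaultdict(float), defaultdict(int)
    for brand, price in zip(brands, prices):
        totals[brand] += price
        counts[brand] += 1
    # totals[brand] equals n * brand_mean, so the sum can be used directly.
    return {
        brand: (totals[brand] + smoothing * global_mean) / (counts[brand] + smoothing)
        for brand in counts
    }
```

Brands with few products are pulled toward the global mean, which limits overfitting to rare brands. The encoding should be fitted on training rows only, since it is computed from the target price.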
┌────────────────────────────────────┐ ┌────────────────────────────────────┐
│ TEXT EMBEDDINGS │ │ IMAGE EMBEDDINGS │
│ (5 Transformer Models) │ │ (EfficientNet-B0 CNN) │
├────────────────────────────────────┤ ├────────────────────────────────────┤
│ │ │ │
│ 1️⃣ MiniLM-L6-v2 │ │ 🖼️ EfficientNet-B0 │
│ └─ 384-dim vectors │ │ └─ Pre-trained on ImageNet │
│ │ │ └─ Global Average Pooling │
│ 2️⃣ Multilingual MiniLM │ │ └─ 1280-dimensional output │
│ └─ 384-dim vectors │ │ │
│ │ │ │
│ 3️⃣ all-MiniLM-L12 │ │ │
│ └─ 384-dim vectors │ │ │
│ │ │ │
│ 4️⃣ distiluse-base │ │ │
│ └─ 384-dim vectors │ │ │
│ │ │ │
│ 5️⃣ all-MiniLM-L6 │ │ │
│ └─ 384-dim vectors │ │ │
│ │ │ │
│ Total: 5 × 384 = 1920 features │ │ Total: 1280 features │
└────────────────┬───────────────────┘ └────────────────┬───────────────────┘
│ │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────┐
│ CONCATENATED EMBEDDING SPACE │
│ 1920 + 1280 = 3200 dimensions │
└──────────────┬───────────────────────┘
Using 3,200-dimensional Concatenated Embeddings
▼
┌──────────────────────────────────────────────────────────────┐
│ K-NEAREST NEIGHBORS ANALYSIS (K=10) │
├──────────────────────────────────────────────────────────────┤
│ │
│ 📊 TEXT SPACE KNN │
│ ├─ Mean Price of 10 nearest neighbors │
│ ├─ Std Dev of neighbor prices │
│ ├─ Min price among neighbors │
│ └─ Max price among neighbors │
│ │
│ 🖼️ IMAGE SPACE KNN │
│ ├─ Mean Price of 10 nearest neighbors │
│ ├─ Std Dev of neighbor prices │
│ ├─ Min price among neighbors │
│ └─ Max price among neighbors │
│ │
│ 🎯 COMBINED SPACE KNN │
│ ├─ Mean Price of 10 nearest neighbors │
│ ├─ Std Dev of neighbor prices │
│ ├─ Min price among neighbors │
│ └─ Max price among neighbors │
│ │
│ Total KNN Features: 4 metrics × 3 spaces = 12 features │
└──────────────────┬───────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ STAGE 2 CONSOLIDATED OUTPUT │
├─────────────────────────────────────────────────┤
│ • Text Embeddings (5 models): 1920 feat. │
│ • Image Embeddings (1 model): 1280 feat. │
│ • KNN Features (3 spaces): 12 feat. │
│ ───────────────────────────────────────────── │
│ TOTAL: 3212 feat. │
└─────────────────────────────────────────────────┘
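The 12 KNN features can be illustrated with a small brute-force sketch. The pipeline lists FAISS for similarity search; the version below swaps in plain NumPy so it stays self-contained, and it assumes cosine similarity and a leave-one-out lookup on the training set. Both choices, and the function names, are assumptions rather than the repository's actual implementation.

```python
import numpy as np


def knn_price_features(query_emb, train_emb, train_prices, k=10, exclude_self=False):
    """Return an (n_query, 4) array: mean, std, min, max of the k nearest training prices."""
    # Cosine similarity via L2-normalised dot products (the metric is an assumption).
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    sims = q @ t.T
    if exclude_self:
        # When query_emb is train_emb, drop each row's match with itself so a
        # product's own price does not leak into its features.
        np.fill_diagonal(sims, -np.inf)
    # Indices of the k most similar training rows for every query row (unordered).
    nearest = np.argpartition(-sims, kth=k - 1, axis=1)[:, :k]
    neighbor_prices = train_prices[nearest]
    return np.column_stack([
        neighbor_prices.mean(axis=1),
        neighbor_prices.std(axis=1),
        neighbor_prices.min(axis=1),
        neighbor_prices.max(axis=1),
    ])


def all_knn_features(text_emb, image_emb, prices, k=10):
    """Stack the 4 statistics for the text, image, and combined spaces: shape (n, 12)."""
    combined = np.hstack([text_emb, image_emb])  # e.g. 1920 + 1280 = 3200 dims
    blocks = [
        knn_price_features(emb, emb, prices, k=k, exclude_self=True)
        for emb in (text_emb, image_emb, combined)
    ]
    return np.hstack(blocks)
```

For test products, the same function would be called with the test embeddings as `query_emb`, the training embeddings as `train_emb`, and `exclude_self=False`.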
┌─────────────────────────────────────────────────────────────┐
│ CONSOLIDATED FEATURE MATRIX │
├─────────────────────────────────────────────────────────────┤
│ From STAGE 1: Numerical Features ~520 feat. │
│ From STAGE 2: Text Embeddings 1920 feat. │
│ From STAGE 2: Image Embeddings 1280 feat. │
│ From STAGE 2: KNN Features 12 feat. │
│ ───────────────────────────────────────────────────────── │
│ TOTAL: ~3,732 feat. │
└────────────────┬────────────────────────────────────────────┘
▼
┌──────────────────────────────────────┐
│ SPLIT: Train & Validation Data │
└──────────────────┬───────────────────┘
▼
┌────────────────────────────────────┐
│ GRADIENT BOOSTING MODELS │
├────────────────────────────────────┤
│ │
│ 🟦 LightGBM │
│ ├─ Trees: 1000 │
│ ├─ Max Depth: 7 │
│ ├─ Learning Rate: 0.05 │
│ ├─ Num Leaves: 31 │
│ └─ SMAPE: 54.78% │
│ │
│ 🟪 CatBoost │
│ ├─ Trees: 1000 │
│ ├─ Max Depth: 7 │
│ ├─ Learning Rate: 0.05 │
│ ├─ Handle Cat Features: Yes │
│ └─ SMAPE: 54.52% │
│ │
└────────────────┬───────────────────┘
▼
┌────────────────────────────────────┐
│ ENSEMBLE AVERAGING │
│ Final = (LightGBM + CatBoost)/2 │
└────────────────┬───────────────────┘
▼
✅ FINAL PREDICTIONS
SMAPE: 53.99% ⭐
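The averaging step and the metric can be written out as follows. This is a sketch that assumes the common SMAPE definition (mean of |pred - actual| divided by the average of |pred| and |actual|, scaled to a percentage between 0 and 200); the function names are illustrative and not taken from `validate_and_submit_ensemble.py`.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Treat a 0/0 term as zero error instead of NaN.
    ratio = np.divide(
        np.abs(y_pred - y_true), denom, out=np.zeros_like(denom), where=denom != 0
    )
    return 100.0 * ratio.mean()


def average_ensemble(*model_preds):
    """Unweighted mean of per-model prediction arrays, as in Final = (LightGBM + CatBoost) / 2."""
    return np.mean(np.vstack(model_preds), axis=0)
```

A simple average can score better than either model alone when the two models' errors partly cancel, which is consistent with the ensemble SMAPE reported above being lower than both single-model scores.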
┌──────────────────┐
│ INPUT DATA │
└────────┬─────────┘
│
┌────────┴─────────┐
▼ ▼
┌───────────────┐ ┌──────────────┐
│ TEXT DATA │ │ IMAGE DATA │
└───────┬───────┘ └──────┬───────┘
│ │
└────────┬────────┘
▼
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STAGE 1: EXTRACT FEATURES ┃
┃ Output: 520 features ┃
┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
▼
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STAGE 2: EMBEDDINGS + KNN ┃
┃ Output: 3,212 features ┃
┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
▼
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STAGE 3: ENSEMBLE TRAINING ┃
┃ 3,732 consolidated features ┃
┃ LightGBM + CatBoost average ┃
┗━━━━━━━━━━┬━━━━━━━━━━━━━━━━━┛
▼
┌──────────────────────────┐
│ 🎯 PRICE PREDICTIONS │
│ ✅ SMAPE: 53.99% │
└──────────────────────────┘
┌─────────────────────────────────────────────────┐
│ TOP 10 FEATURE CONTRIBUTIONS │
├─────────────────────────────────────────────────┤
│ │
│ 1. knn_mean_price_combined ████████████ 18.2%│
│ 2. brand_target_encoded ████████ 12.4% │
│ 3. knn_mean_price_text ██████ 9.8% │
│ 4. weight_grams █████ 7.3% │
│ 5. knn_std_price_combined ████ 6.5% │
│ 6. image_embedding_0 ███ 5.1% │
│ 7. pack_quantity ███ 4.9% │
│ 8. knn_mean_price_image ██ 4.2% │
│ 9. volume_ml ██ 3.8% │
│ 10. text_embedding_1_0 ██ 3.1% │
│ │
└─────────────────────────────────────────────────┘
╔═══════════════════════════════════════════════════════════╗
║ PERFORMANCE METRICS ║
╠═══════════════════════════════════════════════════════════╣
║ ║
║ 📊 LightGBM (Single Model) → 54.78% SMAPE ║
║ 📊 CatBoost (Single Model) → 54.52% SMAPE ║
║ ║
║ ✅ ENSEMBLE (Average) → 53.99% SMAPE ⭐ ║
║ ║
║ 🚀 Improvement vs. CatBoost: ↓ 0.53 pts SMAPE ║
║ ║
╚═══════════════════════════════════════════════════════════╝
# Stage 1: Extract features
python enhanced_feature_extraction.py
# Stage 2: Generate embeddings (GPU recommended)
python generate_all_embeddings.py
# Stage 3: Train and predict
python validate_and_submit_ensemble.py
┌─────────────────────────────────────────────┐
│ DEEP LEARNING │
│ • PyTorch │
│ • Sentence Transformers │
│ • EfficientNet (timm) │
├─────────────────────────────────────────────┤
│ GRADIENT BOOSTING │
│ • LightGBM │
│ • CatBoost │
├─────────────────────────────────────────────┤
│ SIMILARITY SEARCH │
│ • FAISS (Facebook AI) │
├─────────────────────────────────────────────┤
│ DATA PROCESSING │
│ • Pandas, NumPy, Scikit-learn │
└─────────────────────────────────────────────┘
Our solution's strength lies in three-level similarity features:
- 🔤 Text-based neighbors → Semantically similar products
- 🖼️ Image-based neighbors → Visually similar products
- 🎯 Combined neighbors → Holistically similar products
Each level captures different pricing patterns, creating a robust feature set.
✅ Validation SMAPE: 53.99%
✅ 3,732 engineered features
✅ 5 text embedding models
✅ Multi-space KNN features
✅ Robust ensemble approach
⭐ Star this repo if you found it helpful! ⭐
Made with ❤️ for Amazon ML Challenge 2025