
Model-agnostic cultural bias agent with VLM evaluation, automated correction, and self-improving data collection pipeline


cmubig/ccub2-agent


CCUB2 Agent

Model-Agnostic Cultural Bias Mitigation System

Automatically detect and correct cultural biases in generative image models using VLM-based evaluation, RAG-enhanced cultural knowledge, and automatic model-specific prompt optimization.

🎯 Key Innovation

One Universal Instruction → 6+ Model-Optimized Prompts

Our system automatically adapts editing instructions to each model's optimal format:

  • FLUX Kontext: Context-preserving instructions
  • Qwen Image Edit: Detailed, specific requirements
  • Stable Diffusion 3.5: Structured with quality tags
  • HiDream, NextStep, Custom models...

Result: Each model receives instructions in the format it handles best, with no manual per-model prompt tuning.
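As a rough illustration of the adaptation step, the sketch below renders one universal instruction through a per-model template table. The template strings and the model keys (`flux_kontext`, `qwen_image_edit`, `sd35`) are hypothetical placeholders chosen to echo the styles listed above; they are not the prompt adapter shipped in `ccub2_agent/modules/`.

```python
# Hypothetical per-model templates; the real adapter may use different wording.
PROMPT_TEMPLATES: dict[str, str] = {
    # Context-preserving phrasing (FLUX Kontext style)
    "flux_kontext": "{instruction}, while keeping the rest of the image unchanged",
    # Detailed, explicit requirements (Qwen Image Edit style)
    "qwen_image_edit": "Edit the image: {instruction}. Requirements: {details}",
    # Structured prompt with quality tags (SD 3.5 style)
    "sd35": "{instruction}, {details}, high quality, detailed",
}

DEFAULT_DETAILS = "culturally accurate details"


def adapt_instruction(model: str, instruction: str, details: str = DEFAULT_DETAILS) -> str:
    """Render one universal editing instruction into a model-specific prompt."""
    template = PROMPT_TEMPLATES.get(model)
    if template is None:
        # Unknown models receive the universal instruction unchanged.
        return instruction
    # str.format ignores keyword arguments that a template does not use.
    return template.format(instruction=instruction, details=details)
```

In this sketch, supporting another model is one more dictionary entry, and unknown models fall back to the raw instruction.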

🚀 Quick Start

Setup

# 1. Clone repository
git clone https://github.com/cmubig/ccub2-agent.git
cd ccub2-agent

# 2. Install dependencies
pip install -r requirements.txt

# 3. Firebase Setup (Required)
# Contact [email protected] for credentials:
#   - firebase-service-account.json
#   - .firebase_config.json
# Save both files to project root

# 4. Test Firebase connection
python scripts/05_utils/test_firebase_connection.py

# 5. Initialize dataset (first-time: ~2-5 hours)
python scripts/01_setup/init_dataset.py --country korea

# 6. Run interactive workflow
python scripts/04_testing/test_model_agnostic_editing.py

The interactive CLI guides you through:

  • T2I model selection (SDXL, FLUX)
  • I2I model selection (Qwen, SDXL, FLUX, or test all)
  • Country & category selection
  • Prompt input
  • Automatic cultural evaluation & refinement

Command-Line Mode

# Direct execution with parameters
python scripts/04_testing/test_model_agnostic_editing.py \
  --prompt "A Korean woman in traditional hanbok" \
  --model qwen \
  --t2i-model sdxl \
  --country korea \
  --category traditional_clothing
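For readers wiring the script into their own tooling, the flags shown above map naturally onto an `argparse` parser. This is a hypothetical sketch: the defaults and the `choices` lists (including `all` for testing every I2I model) are assumptions based on the interactive options listed earlier, not the script's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags; defaults and choices are assumptions."""
    parser = argparse.ArgumentParser(description="T2I -> I2I cultural editing test")
    parser.add_argument("--prompt", required=True, help="Text prompt for the T2I model")
    parser.add_argument("--model", default="qwen", choices=["qwen", "sdxl", "flux", "all"],
                        help="I2I editing model ('all' tests every model)")
    parser.add_argument("--t2i-model", default="sdxl", choices=["sdxl", "flux"],
                        help="T2I model that produces the initial image")
    parser.add_argument("--country", default="korea", help="Country pack to evaluate against")
    parser.add_argument("--category", default="traditional_clothing", help="Cultural category")
    return parser
```

Note that `argparse` exposes `--t2i-model` as the attribute `t2i_model` on the parsed namespace.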

📂 Project Structure

ccub2-agent/                    # Code repository
├── scripts/                    # Organized workflow scripts
│   ├── 01_setup/               # Initial setup
│   ├── 02_data_processing/     # Data enhancement
│   ├── 03_indexing/            # Build indices
│   ├── 04_testing/             # Main testing interface
│   └── 05_utils/               # Utilities
├── ccub2_agent/                # Core Python library
│   ├── modules/                # VLM, CLIP, RAG, prompt adapter
│   ├── models/                 # Universal I2I interface
│   ├── pipelines/              # Iterative editing
│   └── adapters/               # Image editing adapters
├── metric/                     # Cultural metric evaluation
├── docs/                       # Documentation
└── data/                       # Contributions CSV

~/ccub2-agent-data/             # Generated data (not in repo)
├── country_packs/korea/
│   ├── approved_dataset_enhanced.json    # VLM-enhanced
│   └── images/                           # 338 images
├── cultural_knowledge/         # Extracted knowledge
├── cultural_index/korea/       # RAG text index
└── clip_index/korea/           # CLIP image index

🎯 Current Status

  • ✅ Firebase Direct Integration - Real-time data access from Firestore
  • ✅ GPT-OSS-20B - Upgraded question model for better cultural evaluation (20B params)
  • ✅ Qwen3-VL-8B - Vision-Language Model for image analysis
  • ✅ Self-Improving System - Automatic gap detection → job creation → retraining
  • ✅ Model-Agnostic I2I - Universal interface for 6+ image editing models
  • ✅ 575+ Cultural Images - VLM-enhanced captions with cultural knowledge

📝 Key Scripts

| Script | Purpose | Usage |
|---|---|---|
| test_firebase_connection.py | Test Firebase connectivity and data access | No arguments needed |
| init_dataset.py | Initialize dataset from Firebase (auto-detects new data) | --country korea |
| test_model_agnostic_editing.py | Interactive T2I→I2I pipeline with cultural evaluation | --prompt "text" --model qwen |
| extract_cultural_knowledge.py | Extract structured knowledge from verified images | --max-images 5 --load-in-4bit |
| test_vlm_detector.py | Test VLM cultural bias detection | --image-path <path> |
| build_clip_image_index.py | Build CLIP FAISS index for reference images | --data-dir <path> |

💡 How It Works

The Problem

Generative AI models often produce culturally inaccurate images due to:

  • Limited cultural knowledge in training data
  • Bias towards Western/dominant culture representations
  • Lack of visual details about authentic cultural elements

Our Solution: Self-Improving Cultural Agent

1. Firebase-Powered Knowledge Base

  • Direct integration with Firestore (575+ verified cultural images)
  • Real-time data updates from crowd-sourced contributions
  • Automatic detection of data gaps

2. Dual-Model Evaluation

  • GPT-OSS-20B (20B params): Generates detailed cultural verification questions
  • Qwen3-VL-8B: Analyzes images and answers questions about cultural accuracy
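One simple way to turn the question/answer exchange into a number is to ask yes/no verification questions and report the fraction answered "yes". The sketch below assumes that scheme; the `CheckResult` type and the scoring rule are illustrative and may not match how CCUB2 Agent actually scores images.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    question: str  # verification question from the question model
    answer: str    # free-text VLM answer, expected to begin with "yes" or "no"


def cultural_score(results: list[CheckResult]) -> float:
    """Fraction of verification questions the VLM answered affirmatively."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r.answer.strip().lower().startswith("yes"))
    return passed / len(results)
```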

3. RAG-Enhanced Context

  • CLIP-based reference image retrieval
  • VLM-extracted cultural knowledge from verified images
  • Text + visual guidance for precise evaluation
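Reference retrieval over CLIP embeddings reduces to nearest-neighbor search by cosine similarity. The pure-Python sketch below ranks a toy index the same way a FAISS `IndexFlatIP` does when every vector is L2-normalized; the image IDs and 2-D vectors are made-up stand-ins for real CLIP embeddings.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force cosine top-k: inner product of normalized vectors, highest first."""
    q = normalize(query)
    scored = [
        (image_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for image_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real deployment would precompute normalized embeddings once and let FAISS do the search; the brute-force loop only shows what is being ranked.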

4. Model-Agnostic Image Editing

  • Universal prompt adapter for 6+ I2I models
  • Automatic optimization for each model's format
  • Iterative refinement based on VLM feedback
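The iterative refinement step can be sketched as a bounded loop: evaluate, edit with the VLM's feedback folded into the instruction, and stop once the score clears a threshold. The `edit` and `evaluate` callables, the threshold of 0.8, and the three-iteration budget are assumptions for illustration, not the pipeline's actual settings.

```python
from typing import Any, Callable


def refine(image: Any, instruction: str,
           edit: Callable[[Any, str], Any],
           evaluate: Callable[[Any], tuple[float, str]],
           threshold: float = 0.8, max_iters: int = 3) -> tuple[Any, list[float]]:
    """Edit until the VLM score reaches `threshold` or `max_iters` edits have run."""
    score, feedback = evaluate(image)
    history = [score]
    for _ in range(max_iters):
        if score >= threshold:
            break
        # Fold the VLM's feedback into the next editing instruction.
        image = edit(image, f"{instruction}. Fix: {feedback}")
        score, feedback = evaluate(image)
        history.append(score)
    return image, history
```

Capping the number of iterations keeps cost bounded when an editor cannot reach the threshold.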

5. Self-Improving Loop

User generates → VLM detects gap ("Not enough jeogori collar data")
→ System creates Firebase job → Users upload authentic images
→ RAG auto-updates (89% faster!) → Accuracy improves (15% → 95%)
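The gap-detection step can be approximated by counting verified reference images per cultural element and opening a collection job for any element below a minimum. The element names, the `min_images` threshold, and the job fields are hypothetical; the real system creates its jobs in Firebase.

```python
from collections import Counter


def detect_gaps(image_tags: list[str], required: list[str], min_images: int = 10) -> list[dict]:
    """Return one collection-job record per required element with too few reference images."""
    counts = Counter(image_tags)  # number of verified images tagged with each element
    jobs = []
    for element in required:
        have = counts[element]  # Counter returns 0 for missing keys
        if have < min_images:
            jobs.append({
                "element": element,
                "have": have,
                "needed": min_images - have,
                "status": "open",
            })
    return jobs
```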

Impact

  • Cultural Accuracy: 30-40% → 70-90%+ with visual knowledge
  • Model Coverage: Works with FLUX, SD3.5, Qwen, HiDream, etc.
  • Update Speed: 89% faster with incremental FAISS updates
  • Continuous Learning: The knowledge base grows as detected gaps are filled with newly verified images

🔧 Requirements

Hardware

  • GPU: 8GB+ VRAM (4-bit quantization) or 24GB+ for full precision
  • Storage: ~50GB for models + data
  • RAM: 16GB+ recommended
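The VRAM guidance follows from simple arithmetic on weight storage: parameters × bits per parameter ÷ 8 gives bytes. The helper below counts weights only (no activations, KV cache, or CUDA context), so treat its output as a lower bound rather than a full memory budget.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9
```

For a nominal 8B-parameter model such as Qwen3-VL-8B, this gives about 16 GB at 16-bit precision and about 4 GB at 4-bit, before runtime overhead is added.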

Software

  • Python 3.10+
  • PyTorch 2.0+
  • CUDA 11.8+ (for GPU acceleration)

Models Used

  • GPT-OSS-20B - Question generation (16GB VRAM)
  • Qwen3-VL-8B-Instruct - Image evaluation (8GB VRAM)
  • CLIP - Image similarity search
  • FLUX/SD3.5/Qwen - Image generation & editing

Installation

pip install -r requirements.txt

Note: Firebase credentials required for data access (contact: [email protected])

📊 Data Pipeline

1. Firebase Firestore (575+ contributions)
   ↓
2. init_dataset.py - Auto-detects new data (incremental update)
   ↓
3. VLM Caption Enhancement (Qwen3-VL)
   ↓
4. Cultural Knowledge Extraction (GPT-OSS-20B + Qwen3-VL)
   ↓
5. FAISS Index Building (Text RAG + CLIP Image Index)
   ↓
6. Ready for Cultural Evaluation!
   ↓
7. VLM Evaluation → Gap Detection → Job Creation → Loop back to step 1

Key Feature: Incremental updates only process new data (89% time savings!)
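Incremental updating amounts to diffing the IDs available remotely against the IDs already indexed and processing only the difference. The sketch assumes each Firestore contribution has a stable string ID; if per-item cost is roughly constant, the fraction of time saved is about `1 - new / total`. Both helpers are hypothetical.

```python
def find_new_records(remote_ids: list[str], indexed_ids: set[str]) -> list[str]:
    """IDs present remotely but not yet indexed, in their remote order."""
    return [rid for rid in remote_ids if rid not in indexed_ids]


def estimated_time_savings(total_items: int, new_items: int) -> float:
    """Fraction of work avoided by processing only new items, assuming constant per-item cost."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 1 - new_items / total_items
```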

📚 Documentation

💾 Data Paths

All data paths can be configured via command-line arguments. Default structure:

| Data | Default path |
|---|---|
| Images | data/country_packs/korea/images/ |
| Enhanced captions | data/country_packs/korea/approved_dataset_enhanced.json |
| Output knowledge | data/cultural_knowledge/korea_knowledge.json |
| RAG index | data/cultural_index/korea/ |

Note: Large data files are not included in the repository. Download separately or use your own dataset.
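The default layout in the table can be expressed as a function of a data root and a country code, which makes pointing the tools at another location (for example `~/ccub2-agent-data`) a one-argument change. The `default_paths` helper and its dictionary keys are hypothetical, not part of the repository.

```python
from pathlib import Path

DATA_ROOT = Path("data")  # default root used in the Data Paths table


def default_paths(country: str, root: Path = DATA_ROOT) -> dict[str, Path]:
    """Hypothetical helper mapping a country code to the default data locations."""
    pack = root / "country_packs" / country
    return {
        "images": pack / "images",
        "captions": pack / "approved_dataset_enhanced.json",
        "knowledge": root / "cultural_knowledge" / f"{country}_knowledge.json",
        "rag_index": root / "cultural_index" / country,
    }
```

For example, `default_paths("korea", Path.home() / "ccub2-agent-data")` resolves the same layout under the home-directory data folder.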

🐛 Troubleshooting

Firebase Issues

Firebase connection failed?

# Test Firebase connectivity
python scripts/05_utils/test_firebase_connection.py

# System automatically falls back to CSV if Firebase is unavailable
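The CSV fallback follows a common try-then-degrade pattern. In this hypothetical sketch, `fetch_firestore` stands in for whatever Firestore query the project performs, and the CSV is read with `csv.DictReader`; the file location and column names depend on the actual contributions file.

```python
import csv
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def load_contributions(fetch_firestore: Callable[[], list[dict]], csv_path: Path) -> list[dict]:
    """Try Firestore first; on any failure, fall back to the local contributions CSV."""
    try:
        return fetch_firestore()
    except Exception as exc:  # e.g. missing credentials or network errors
        logger.warning("Firebase unavailable (%s); falling back to %s", exc, csv_path)
        with csv_path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
```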

Need Firebase credentials?

  • Contact: [email protected]
  • You'll receive: firebase-service-account.json and .firebase_config.json
  • Place both files in project root directory

GPU/Memory Issues

Out of GPU memory?

python scripts/extract_cultural_knowledge.py --load-in-4bit

Resume from checkpoint?

python scripts/extract_cultural_knowledge.py --resume

Test before full run?

python scripts/extract_cultural_knowledge.py --max-images 5
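The `--resume` and `--max-images` options suggest a checkpointed loop. The sketch below shows one way such a loop can work: processed IDs are saved to a JSON file after every image, so a rerun skips them. The function, the checkpoint format, and the per-run meaning of `max_images` are assumptions, not the behavior of `extract_cultural_knowledge.py`.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Optional


def process_with_checkpoint(image_ids: Iterable[str], process: Callable[[str], None],
                            checkpoint: Path, max_images: Optional[int] = None) -> list[str]:
    """Process images in order, skipping IDs already recorded in `checkpoint`."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    processed_now: list[str] = []
    for image_id in image_ids:
        if max_images is not None and len(processed_now) >= max_images:
            break  # cap work per run, similar in spirit to --max-images
        if image_id in done:
            continue  # already handled in an earlier run (resume)
        process(image_id)
        done.add(image_id)
        processed_now.append(image_id)
        # Persist after every image so an interruption loses at most the current one.
        checkpoint.write_text(json.dumps(sorted(done)))
    return processed_now
```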

📄 License

MIT License - See LICENSE file for details.

📝 Citation

If you use CCUB2-Agent in your research, please cite our paper:

@misc{seo2025exposingblindspotsculturalbias,
      title={Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models},
      author={Huichan Seo and Sieun Choi and Minki Hong and Yi Zhou and Junseo Kim and Lukman Ismaila and Naome Etori and Mehul Agarwal and Zhixuan Liu and Jihie Kim and Jean Oh},
      year={2025},
      eprint={2510.20042},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.20042},
}

Paper: Exposing Blindspots: Cultural Bias Evaluation in Generative Image Models

📧 Contact

For Firebase credentials or questions about the project:

🔗 Related Projects

  • WorldCCUB App - Crowdsourcing platform for cultural data collection
