
🧠 REA AI Photo Enrichment

Overview

Project: rea-ai-photo-enrichment

Goal: Build a backend that classifies property photos by room type and visible attributes, then enriches them with semantic insights (e.g., style, notable features). The output enhances REA’s search, filtering, and listing relevance.


Architecture Summary

```mermaid
flowchart TD
  U["REA Platform / API Client"] -->|POST /api/enrich| A[FastAPI Backend]
  U -->|POST /api/batch_enrich| A
  A --> P["Pre-selection (Dedup + Quality Filter)"]
  P --> M["Vision Model (OpenCLIP + LoRA / ONNX Triton)"]
  M --> L["Light LLM Enricher (Gemini Flash / Claude Haiku)"]
  L --> C[(LRU Cache)]
  L -->|Structured JSON| A
  subgraph Observability
    ML["MLflow (training)"]
    LF["Langfuse (LLM traces)"]
  end
  A -->|Response| U
```

Core Features

| Layer | Description |
| --- | --- |
| Prefilter | Removes duplicate / low-quality images before inference. |
| Weak Labelling | A captioning model (BLIP) generates open-vocabulary descriptions for images. |
| LLM Refinement | A lightweight LLM (Gemini/Claude) filters captions and extracts structured JSON. |
| Vision Model | OpenCLIP fine-tuned with LoRA adapters on the generated captions. |
| Enrichment (LLM) | Adds style, design keywords, and notable features. |
| Caching | In-memory cache stores results by a hash of the inputs (TTL 24 h); see the sketch below. |
| Observability | MLflow logs model metrics; Langfuse logs LLM traces. |
| Deployment | Dockerized (FastAPI + Triton) for Cloud Run / GKE. |
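
As a concrete illustration of the caching row above: results live in a bounded in-memory cache keyed by a hash of the request inputs and expire after 24 hours. The sketch below assumes the `cachetools` package and a hypothetical key function; the project's actual cache wiring may differ.

```python
import hashlib
import json

from cachetools import TTLCache  # assumed dependency; any TTL-aware cache works

# Illustrative only: bounded in-memory cache with a 24 h TTL, mirroring the
# "hash of the inputs, TTL 24 h" behaviour described in the table above.
enrichment_cache: TTLCache = TTLCache(maxsize=10_000, ttl=24 * 60 * 60)


def make_cache_key(image_url: str, include_semantics: bool, model_version: str) -> str:
    """Derive a stable key from the request inputs (hypothetical field set)."""
    payload = json.dumps(
        {"image_url": image_url, "semantics": include_semantics, "model": model_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def get_or_compute(key: str, compute):
    """Return the cached result if present; otherwise compute and store it."""
    if key in enrichment_cache:  # expired entries behave as absent
        return enrichment_cache[key]
    result = compute()
    enrichment_cache[key] = result
    return result
```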

API Endpoints

POST /api/enrich

Analyzes and enriches a single photo.

Input:

```json
{
  "image_url": "https://example.com/photo.jpg",
  "listing_id": "REA123",
  "include_semantics": true
}
```

Output:

```json
{
  "listing_id": "REA123",
  "room_type": {"label": "kitchen", "confidence": 0.91},
  "attributes": [{"label": "oven", "confidence": 0.89}],
  "semantics": {
    "style": "modern",
    "notable_features": ["open layout", "stainless appliances"],
    "confidence": 0.92
  },
  "model_version": "openclip_v1_lora_v1",
  "prompt_version": "enrich_v1_0_0"
}
```
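
For illustration, calling this endpoint from Python might look like the following; `httpx` and the placeholder base URL and token are assumptions, not project requirements.

```python
import httpx

# Hypothetical client call; adjust the base URL and bearer token to your deployment.
response = httpx.post(
    "http://localhost:8080/api/enrich",
    headers={"Authorization": "Bearer changeme"},
    json={
        "image_url": "https://example.com/photo.jpg",
        "listing_id": "REA123",
        "include_semantics": True,
    },
    timeout=30.0,
)
response.raise_for_status()
print(response.json()["room_type"])  # e.g. {"label": "kitchen", "confidence": 0.91}
```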

POST /api/batch_enrich

Asynchronous endpoint for multiple images.
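
The batch request schema is not shown in this README; the sketch below assumes a list of the same image descriptors accepted by /api/enrich and should be checked against the actual request model.

```python
import httpx

# Hypothetical batch payload: a list of the objects accepted by /api/enrich.
batch = {
    "listing_id": "REA123",
    "images": [
        {"image_url": "https://example.com/photo1.jpg"},
        {"image_url": "https://example.com/photo2.jpg"},
    ],
    "include_semantics": True,
}
response = httpx.post(
    "http://localhost:8080/api/batch_enrich",
    headers={"Authorization": "Bearer changeme"},
    json=batch,
    timeout=120.0,
)
print(response.status_code)  # an asynchronous endpoint may return 202 Accepted
```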

POST /api/files/analyze

Accepts one or more directly uploaded files for analysis.
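
A direct upload could be exercised as below; the multipart field name `files` is an assumption to verify against the router.

```python
import httpx

# Hypothetical multipart upload; the form field name "files" is assumed.
with open("kitchen.jpg", "rb") as f:
    response = httpx.post(
        "http://localhost:8080/api/files/analyze",
        headers={"Authorization": "Bearer changeme"},
        files=[("files", ("kitchen.jpg", f, "image/jpeg"))],
    )
print(response.json())
```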

POST /api/feedback (Optional)

Collects human corrections for continuous improvement.


Quick Start

1. Clone & Setup

```bash
git clone https://github.com/rea-group/rea-ai-photo-enrichment.git
cd rea-ai-photo-enrichment
poetry install
```

2. Run Locally (Docker)

```bash
docker compose up --build
```
  • FastAPI: http://localhost:8080
  • Triton Server: http://localhost:8000

3. Deploy

```bash
gcloud run deploy rea-photo-enrichment-api \
  --region australia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars SECRET_KEY=changeme,MODEL_PROVIDER=openclip_lora
```

Business Impact & Observability

| KPI | Definition | Target Gain |
| --- | --- | --- |
| Search Conversion Rate | % of searches leading to enquiry or click | +3 pp |
| Avg Search Time to Match | Median seconds to find relevant property | −20 % |
| Filter Usage Uplift | Users applying new visual filters | +30 % |
| Manual Tagging Reduction | Human labeling workload | −80 % |
| Latency Budget | End-to-end p95 latency | ≤ 700 ms |

Monitored Metrics:

  • Technical: Top-1/Top-3 accuracy, Attribute F1, Latency p95, Cache hit ratio, Error rate.
  • LLM (via Langfuse): usage and cost.

Development Process

This project followed a structured plan outlined in DEVELOPMENT_PLAN.md. The plan breaks the work into clear, sequential phases, from initial setup to final deployment and observability, and was executed by the project owner with assistance from Cline and Gemini Pro.

The development process adhered to a set of defined programming rules, with a strong emphasis on the principles from Robert C. Martin's "Clean Code". This phased, disciplined approach meant each component was built and tested systematically against defined goals and deliverables, making it easier to manage complexity, track progress against clear success criteria, and meet all requirements in a structured way.


Development Environment

  • Python: 3.11
  • ML: PyTorch + PEFT (LoRA), OpenCLIP
  • LLM: LangChain (supporting Gemini Flash, Claude Haiku, and others)
  • Inference: ONNXRuntime / Triton GPU
  • Authentication: Bearer token (SECRET_KEY); a sketch follows this list
  • Cache: LRU Cache
  • Monitoring: MLflow + Langfuse
  • Testing: pytest + pytest-asyncio (≥ 90 % coverage)
  • Code Quality: Ruff + Black + Mypy (120-character line length)
  • Pre-commit: Hooks for formatting, linting, and type-checking on every commit.
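
As a rough sketch of the bearer-token scheme listed above (the dependency name, route wiring, and error handling are assumptions, not the project's actual implementation):

```python
import os

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer_scheme = HTTPBearer()


def verify_token(credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme)) -> None:
    """Reject requests whose bearer token does not match SECRET_KEY (illustrative)."""
    if credentials.credentials != os.environ["SECRET_KEY"]:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")


app = FastAPI()


@app.post("/api/enrich", dependencies=[Depends(verify_token)])
async def enrich() -> dict:  # request/response models omitted for brevity
    return {"status": "ok"}
```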

Directory Layout

```
.
├── ai_enrichment/
│   ├── app/
│   ├── config/
│   ├── enrichment/
│   ├── prefilter/
│   ├── prompts/
│   ├── routers/
│   ├── schemas/
│   ├── utils/
│   └── vision/
├── data/
├── mlruns/
├── models/
├── property_photo_1k_sample/
├── reports/
└── tests/
```

Data and Labelling Workflow

This project uses an open-vocabulary labelling pipeline to generate rich, semantic labels for training.

  1. Weak Label Generation: The ai_enrichment/vision/run_open_labelling.py script uses a vision-language model to generate captions for images.
  2. LLM Refinement: A lightweight LLM filters these captions and outputs a structured JSON object containing room_type, attributes, and style (a schema sketch follows this list).
  3. Cleaning and Balancing: The ai_enrichment/vision/cleaning_labels.py script normalizes synonymous labels, removes rare or invalid ones, and oversamples minority classes to create a balanced dataset for training.
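
For step 2, the refined-label JSON could be validated with a schema along these lines. This is a sketch assuming Pydantic v2; only the room_type, attributes, and style fields come from the text above, and the exact types are assumptions.

```python
from pydantic import BaseModel, Field


# Illustrative schema for the refined-label JSON described in step 2.
class RefinedLabel(BaseModel):
    room_type: str = Field(..., description="e.g. 'kitchen', 'bedroom'")
    attributes: list[str] = Field(default_factory=list, description="visible features, e.g. 'oven'")
    style: str | None = Field(None, description="e.g. 'modern', 'rustic'")


# Parsing the LLM's output raises a ValidationError on malformed labels.
label = RefinedLabel.model_validate_json(
    '{"room_type": "kitchen", "attributes": ["oven"], "style": "modern"}'
)
print(label.room_type)  # "kitchen"
```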

Testing

Run all tests:

```bash
poetry run pytest
```

Coverage Target: ≥ 90%


Future Work

  • Cloud Data Storage: Migrate datasets to a cloud storage solution (S3/GCS).
  • Improved Training Data: Source higher-quality, human-verified labels.
  • Hyperparameter Tuning: Conduct a more extensive search for optimal LoRA and training settings.
  • Cache Upgrade to Redis: Replace the in-memory cache with a distributed Redis cache.
  • Observability Enhancements: Integrate Prometheus/Grafana and automate evaluation reports.
  • Human Feedback Loop: Implement a /api/feedback endpoint to collect user corrections for continuous improvement.

Trade-offs and Limitations

This prototype trains a LoRA adapter on weak zero-shot pseudo-labels. Given the limited and imbalanced data, high precision was prioritized over recall.

Current limitations:

  • Noisy pseudo-labels from zero-shot CLIP
  • Class imbalance and a limited dataset size (< 5k images)
  • No human validation set
