
🚗 Real-Time LLM-based Visual Grounding in 3D Driving Scenes

This project integrates object detection, 3D reconstruction, and GPT-4-based scene reasoning to allow natural language interaction with 3D driving environments. Built as an expert-level AI Engineering portfolio project.


🧠 Features

  • 3D Scene Reconstruction from RGB-D (TUM Dataset), sketched in code after this list
  • Object Detection using Detectron2
  • Scene Graph Creation for spatial reasoning
  • Natural Language Question Answering via GPT-4
  • Real-Time 3D Visualization with Panda3D
  • Object Highlighting based on user queries such as:
    • "Which cars are behind the truck?"
    • "Where is the pedestrian near the stop sign?"

🖼️ Pipeline Diagram

See pipeline.png for the end-to-end pipeline diagram.


📂 Project Structure

```
VisualGroundingAutonomy/
├── data/                 # RGB-D images and depth maps
├── reconstruction/       # Backproject RGB-D → 3D point cloud
├── scene_graph/          # Scene graph builder
├── grounding/            # LLM interface (GPT or CLIP)
├── utils/                # Mapping objects to points
├── visualizer/           # Panda3D real-time viewer
├── main.py               # Full pipeline
├── requirements.txt
└── README.md
```
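
To make the scene_graph/ stage concrete, here is a minimal sketch of deriving pairwise relations from per-object 3D centroids. The relation definitions ("behind" = larger camera-frame depth with a 0.5 m margin, "near" = within a distance threshold) are illustrative assumptions, not the repository's actual rules; serializing the returned dict with json.dump would produce a file in the spirit of scene_graph.json below.

```python
import numpy as np

def build_scene_graph(objects, near_thresh=2.0):
    """Derive pairwise spatial relations from 3D object centroids.

    `objects` maps ids like "car_0" to (x, y, z) centroids in camera
    coordinates (z pointing forward). Relation definitions are assumptions:
    "behind" = larger z by at least 0.5 m, "near" = within `near_thresh` metres.
    """
    relations = []
    ids = list(objects)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            pa, pb = np.asarray(objects[a]), np.asarray(objects[b])
            if pa[2] > pb[2] + 0.5:
                relations.append((a, "behind", b))
            elif pb[2] > pa[2] + 0.5:
                relations.append((b, "behind", a))
            if np.linalg.norm(pa - pb) < near_thresh:
                relations.append((a, "near", b))
    return {"objects": objects, "relations": relations}
```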

🧪 Example Query

User: "Which objects are behind car_0?"
🧠 GPT-4: "Pedestrian_2 and car_3 are behind car_0 based on spatial relationships."
✅ Viewer: Highlights those objects in red
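
A sketch of how the grounding step might pose such a query to GPT-4, assuming the OpenAI Python client (v1 style, with OPENAI_API_KEY set in the environment) and a JSON-serializable scene graph; the prompt wording and the answer_spatial_query name are illustrative, not the repository's actual grounding/ interface.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_spatial_query(scene_graph: dict, question: str) -> str:
    """Serialize the scene graph into the prompt and let GPT-4 reason over it."""
    prompt = (
        "You are reasoning over a 3D driving scene.\n"
        "Scene graph (objects with 3D centroids and spatial relations):\n"
        f"{json.dumps(scene_graph)}\n"
        f"Question: {question}\n"
        "Answer with the matching object ids and a one-sentence justification."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# e.g. answer_spatial_query(graph, "Which objects are behind car_0?")
```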

📦 Output Samples

  • 🔴 scene.ply — Reconstructed scene
  • 📄 scene_graph.json — Full graph with relations
  • 🎥 demo.gif — Panda3D video output (recorded)
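
A quick way to inspect the first two outputs, assuming scene.ply loads as a point cloud with Trimesh and scene_graph.json follows an objects/relations layout like the sketch above; both file layouts are assumptions about the repository's format.

```python
import json
import trimesh

scene = trimesh.load("scene.ply")            # PointCloud/Trimesh with .vertices
with open("scene_graph.json") as f:
    graph = json.load(f)

print(len(scene.vertices), "reconstructed points")
print(len(graph.get("relations", [])), "spatial relations")
```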

💻 Technologies

Python · PyTorch · Detectron2 · Panda3D · GPT-4 · NumPy · Trimesh · LangChain


🧾 License

MIT License
