Commit 089a141

Enable visual prompts for YOLOE in the custom frontend demo (#761)

* feat: add YOLOE with visual prompts
* feat: add precision option for YOLOE models
* fix: default model and precision args
* style: formatting
* docs: improve model and precision description
* small typo fix
* unrelated chore: corrected ordering
* feat: add encoding and high-res visualization
* refactor: model arg renaming
* docs: fix model arg name in readme
* docs: add argument changing instructions
* chore: upgrade depthai and depthai-nodes versions
* feat: add bbox drawing functionality
* fix: remove device argument
* feat: preserve background in overlay node
* feat: add yolo-world fp16 version
* feat: add notification toasts
* chore: update depthai-nodes version
* feat: improve toast notifications
* feat: make YOLOE fp16 default
* feat: add unified YOLOE model with both text and image prompts
* style: formatting

---------

Co-authored-by: klemen1999 <[email protected]>

1 parent 0c7f284 commit 089a141

22 files changed: +924 -153 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -16,5 +16,6 @@ repos:
     rev: 0.7.10
     hooks:
       - id: mdformat
+        args: [--number]
         additional_dependencies:
           - mdformat-gfm==0.3.6

Lines changed: 24 additions & 16 deletions
@@ -1,6 +1,6 @@
 # Dynamic YOLO World/YOLOE

-This example demonstrates an advanced use of a custom frontend. On the DepthAI backend, it runs either the **YOLO-World** (default) or **YOLOE** model on-device, with configurable class labels and confidence threshold — both controllable via the frontend.
+This example demonstrates an advanced use of a custom frontend. On the DepthAI backend, it runs either **YOLOE** (default) or **YOLO-World** on-device, with configurable class labels and confidence threshold — both controllable via the frontend.
 The frontend, built using the `@luxonis/depthai-viewer-common` package, displays a real-time video stream with detections. It is combined with the [default oakapp docker image](https://hub.docker.com/r/luxonis/oakapp-base), which enables remote access via WebRTC.

 > **Note:** This example works only on RVC4 in standalone mode.
@@ -16,30 +16,34 @@ Running this example requires a **Luxonis device** connected to your computer. R
 Here is a list of all available parameters:

 ```
--d DEVICE, --device DEVICE
-                      Optional name, DeviceID or IP of the camera to connect to. (default: None)
--fps FPS_LIMIT, --fps-limit FPS_LIMIT
-                      FPS limit. (default: None)
--ip IP, --ip IP       IP address to serve the frontend on. (default: None)
--p PORT, --port PORT  Port to serve the frontend on. (default: None)
--n MODEL_NAME, --model-name MODEL_NAME
-                      Name of the model to use: yolo-world or yoloe (default: yolo-world)
+-fps FPS_LIMIT, --fps_limit FPS_LIMIT
+                      FPS limit. (default: None)
+-ip IP, --ip IP       IP address to serve the frontend on. (default: None)
+-p PORT, --port PORT  Port to serve the frontend on. (default: None)
+-m MODEL, --model MODEL
+                      Name of the model to use: yolo-world or yoloe (default: yoloe)
+--precision PRECISION
+                      Model precision for YOLOE models: int8 (faster) or fp16 (more accurate) (default: fp16)
 ```

 ### Model Options

-This example supports two different YOLO models:
+This example supports two YOLO models:

-- **YOLO-World** (default): An open-vocabulary object detection model that supports both text-based class definitions and image-based prompting (upload an image to detect similar objects)
-- **YOLOE**: A fast and efficient object detection model with enhanced visualization features including instance segmentation
+- **YOLOE** (default): Supports both text prompts and image prompts (visual prompts). The model outputs 160 classes in total: indices 0–79 correspond to text prompts, and indices 80–159 correspond to image prompts. When only one prompt type is provided, dummy inputs are sent for the other and ignored by the model.
+- **YOLO-World**: Open-vocabulary detection with text prompts and optional image prompting (CLIP visual encoder).
+
+Notes:
+
+- Backend function `extract_image_prompt_embeddings(image, max_num_classes=80, model_name, mask_prompt=None)` accepts an optional `mask_prompt` of shape `(80,80)` or `(1,1,80,80)` for `yoloe`. When `None`, a default central mask is used.

 ### Prerequisites

 Before running the example you’ll need to first build the frontend. Follow these steps:

 1. Install FE dependencies: `cd frontend/ && npm i`
-1. Build the FE: `npm run build`
-1. Move back to origin directory: `cd ..`
+2. Build the FE: `npm run build`
+3. Move back to origin directory: `cd ..`

 ## Standalone Mode (RVC4 only)

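Review note: the index layout that the new README text describes for the unified YOLOE model can be illustrated with a short sketch. It relies only on what the README states (160 output classes, indices 0–79 for text prompts, indices 80–159 for image prompts). The helper name `split_class_index` and the constant `MAX_NUM_CLASSES` are hypothetical and are not taken from the backend code in this commit.

```python
# Hypothetical helper for illustration; not part of the example's backend.
MAX_NUM_CLASSES = 80  # classes per prompt type, as stated in the README


def split_class_index(class_idx: int) -> tuple[str, int]:
    """Map a model output class index to (prompt_type, index within that prompt list)."""
    if not 0 <= class_idx < 2 * MAX_NUM_CLASSES:
        raise ValueError(f"class index {class_idx} is outside 0..{2 * MAX_NUM_CLASSES - 1}")
    if class_idx < MAX_NUM_CLASSES:
        # First half of the output range: text prompts.
        return "text", class_idx
    # Second half of the output range: image (visual) prompts.
    return "image", class_idx - MAX_NUM_CLASSES
```

Under this reading, a detection with class index 85 would refer to the sixth image prompt slot, while index 5 would refer to the sixth text label.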
@@ -55,9 +59,13 @@ oakctl app run .

 Once the app is built and running you can access the DepthAI Viewer locally by opening `https://<OAK4_IP>:9000/` in your browser (the exact URL will be shown in the terminal output).

-This will run the example with default argument values (YOLO-World model). If you want to change these values you need to edit the `oakapp.toml` file (refer [here](https://docs.luxonis.com/software-v3/oak-apps/configuration/) for more information about this configuration file).
+This will run the example with default argument values (YOLOE model). If you want to change these values you need to edit the `backend-run.sh` file to pass the arguments to the backend. Example:
+
+```bash
+python3.12 /app/backend/src/main.py --model yoloe --precision fp16 --fps_limit 10
+```

 ### Remote access

 1. You can upload oakapp to Luxonis Hub via oakctl
-1. And then you can just remotly open App UI via App detail
+2. And then you can just remotely open App UI via App detail
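
Review note: the argument list in the updated README could be produced by a parser along the following lines. This is a minimal sketch reconstructed from the help text and the `backend-run.sh` example only. The actual parsing code in `main.py` is not shown in this excerpt, so the function name `build_parser`, the `int` types, and the use of `choices` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser mirroring the README help text (assumed, not the real main.py)."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("-fps", "--fps_limit", type=int, default=None,
                        help="FPS limit.")
    parser.add_argument("-ip", "--ip", type=str, default=None,
                        help="IP address to serve the frontend on.")
    parser.add_argument("-p", "--port", type=int, default=None,
                        help="Port to serve the frontend on.")
    # Restricting values with `choices` is an assumption; the README only lists the names.
    parser.add_argument("-m", "--model", choices=["yolo-world", "yoloe"], default="yoloe",
                        help="Name of the model to use: yolo-world or yoloe")
    parser.add_argument("--precision", choices=["int8", "fp16"], default="fp16",
                        help="Model precision for YOLOE models: int8 (faster) or fp16 (more accurate)")
    return parser


# Same arguments as the backend-run.sh example in the README.
args = build_parser().parse_args(
    ["--model", "yoloe", "--precision", "fp16", "--fps_limit", "10"]
)
```

With no arguments, this sketch falls back to `yoloe` and `fp16`, matching the defaults the README documents after this commit.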
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+model: yolo-world-l:640x640-host-decoding-fp16
+platform: RVC4

custom-frontend/dynamic-yolo-world/backend/src/depthai_models/yoloe_v8_l.RVC4.yaml

Lines changed: 0 additions & 2 deletions
This file was deleted.
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+model: yoloe-v8-l:640x640-text-visual-prompt:e23da0f
+platform: RVC4