Feature: Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities. #100
Draft
volkermauel wants to merge 4 commits into open-webui:main from
…use (open-webui#44)

Introduce an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.

When enabled, exposes:

- GET /desktop – status (running, screen size, ports)
- POST /desktop/start – start Xvfb, VNC, noVNC, window manager
- POST /desktop/stop – tear down all desktop processes
- POST /desktop/screenshot – capture PNG (base64 JSON or raw binary)
- POST /desktop/click – mouse click at (x, y)
- POST /desktop/mouse_move – move cursor
- POST /desktop/drag – mouse drag operation
- POST /desktop/type – type text into focused window
- POST /desktop/key – press key / key combo (e.g. ctrl+c)
- POST /desktop/scroll – scroll at position

All endpoints are behind the existing API key auth. Desktop info is added to the system prompt and /api/config when enabled so agents can discover the capability.

Includes an integration test script (test_desktop.sh, 27 test cases) that verifies the full lifecycle against a running container.
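The start/stop lifecycle described above can be sketched as follows. This is a minimal illustration, not the PR's DesktopManager: the environment variable names other than OPEN_TERMINAL_ENABLE_DESKTOP, every default (display :99, 1280x720, VNC port 5900), the noVNC web root /usr/share/novnc, and the use of websockify to serve noVNC are all assumptions.

```python
import os
import subprocess


# Hypothetical: only OPEN_TERMINAL_ENABLE_DESKTOP is named in the PR; the other
# names reuse that prefix, and every default below is an assumption.
def load_desktop_config(env=None):
    env = os.environ if env is None else env

    def get(name, default):
        return env.get(f"OPEN_TERMINAL_{name}", default)

    return {
        # Feature gate: anything other than an explicit truthy value keeps it off.
        "enabled": get("ENABLE_DESKTOP", "false").strip().lower() in {"1", "true", "yes"},
        "display": get("DESKTOP_DISPLAY", ":99"),
        "screen_size": get("DESKTOP_SCREEN_SIZE", "1280x720"),
        "vnc_port": int(get("DESKTOP_VNC_PORT", "5900")),
        "novnc_port": int(get("DESKTOP_NOVNC_PORT", "6080")),
    }


def desktop_commands(cfg, novnc_web="/usr/share/novnc"):
    """Return (name, argv) pairs in start order: X server, VNC, noVNC, WM."""
    width, height = cfg["screen_size"].lower().split("x")
    return [
        ("xvfb", ["Xvfb", cfg["display"], "-screen", "0", f"{width}x{height}x24"]),
        # -nopw: no VNC password, so the raw VNC port must not be published.
        ("x11vnc", ["x11vnc", "-display", cfg["display"],
                    "-rfbport", str(cfg["vnc_port"]), "-forever", "-shared", "-nopw"]),
        # websockify bridges the browser's WebSocket to the VNC port and serves
        # the noVNC web client from novnc_web (the path is distro-specific).
        ("novnc", ["websockify", "--web", novnc_web,
                   str(cfg["novnc_port"]), f"localhost:{cfg['vnc_port']}"]),
        ("openbox", ["openbox"]),
    ]


def start_desktop(cfg):
    """Launch every process with DISPLAY pointing at the virtual screen.

    A real implementation should wait until Xvfb accepts connections before
    starting the rest; this sketch does not.
    """
    env = dict(os.environ, DISPLAY=cfg["display"])
    return [(name, subprocess.Popen(argv, env=env)) for name, argv in desktop_commands(cfg)]


def stop_desktop(procs):
    """Tear down in reverse start order so clients exit before the X server."""
    for _name, proc in reversed(procs):
        proc.terminate()
    for _name, proc in reversed(procs):
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
```

Keeping command construction separate from process launching makes the argv lists easy to unit-test without an X server.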
volkermauel added a commit to volkermauel/open-webui that referenced this pull request on Mar 31, 2026
- Backend: WebSocket proxy for noVNC/websockify with JWT auth via query param
- Frontend: DesktopViewer component with start/stop/refresh/fullscreen controls
- API helpers for desktop status, start, stop
- Unit tests for backend router and frontend API helpers

Ref: open-webui/open-terminal#100
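Browsers cannot attach an Authorization header to a WebSocket handshake, which is presumably why the proxy takes the JWT as a query parameter. Below is a minimal stdlib-only sketch of such a check, assuming an HS256-signed token in a parameter named `token` and a hypothetical `/ws/desktop` path. Open WebUI's actual token handling may differ, and production code should rely on a maintained JWT library rather than this hand-rolled verifier.

```python
import base64
import hashlib
import hmac
import json
import time
from urllib.parse import parse_qs, urlsplit


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment):
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims, secret):
    """Issue an HS256 token (used here only to exercise the verifier)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


def verify_ws_token(url, secret, now=None):
    """Pull ?token=... off a WebSocket URL and verify it before proxying.

    Returns the claims on success; raises PermissionError otherwise.
    """
    token = parse_qs(urlsplit(url).query).get("token", [""])[0]
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("missing or malformed token")
    header_b64, body_b64, sig_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
        claims = json.loads(_b64url_decode(body_b64))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        raise PermissionError("undecodable token") from None
    # Pin the algorithm so a token cannot downgrade to "none".
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("bad signature")
    exp = claims.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise PermissionError("token expired")
    return claims
```

Because query strings tend to end up in access logs, short-lived tokens (a small `exp`) limit the damage if one leaks.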
In your testing, did you notice how much the memory footprint changed with these enabled? Just curious.
Author
I did not, to be honest. Sorry.
Author
The grounding for clicking seems to be off; I'm currently investigating with grid overlays and by instructing the model as part of the desktop_screenshot result. Converting to draft for now, until I feel this is more mature.
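One way a grid overlay can help grounding is to let the model name a labelled cell and have the server convert that label into pixel coordinates. The sketch below assumes a hypothetical scheme (columns lettered A to Z, rows numbered from 1, "A1" at the top-left). It also includes a rescaling helper, since a model that picks points on a downscaled screenshot will miss if those points are not mapped back to screen pixels; that is one common source of offsets, not a diagnosis of this PR.

```python
import string


def cell_center(label, screen_w, screen_h, cols, rows):
    """Centre pixel of grid cell `label` ("A1" = top-left) on the real screen.

    Columns are letters A..Z (so at most 26), rows are 1-based numbers.
    """
    if not 0 < cols <= 26:
        raise ValueError("this sketch labels columns A-Z only")
    col = string.ascii_uppercase.find(label[:1].upper())
    row = int(label[1:]) - 1
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"cell {label!r} is outside a {cols}x{rows} grid")
    # Float cell sizes keep the centres correct when the screen size is not
    # an exact multiple of the grid size.
    return int((col + 0.5) * screen_w / cols), int((row + 0.5) * screen_h / rows)


def to_screen(x, y, image_w, image_h, screen_w, screen_h):
    """Map a point chosen on a resized screenshot back to screen pixels."""
    return round(x * screen_w / image_w), round(y * screen_h / image_h)
```

For a 1280x720 screen with a 10x10 grid, `cell_center("A1", 1280, 720, 10, 10)` gives `(64, 36)`, which could then be passed to `/desktop/click`.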
…mputer use desktop

- Add desktop_windows and desktop_window_focus endpoints for listing and focusing windows on the virtual desktop
- Add persistent cursor overlay (Xlib) visible in VNC after xdotool moves, hides during clicks to avoid blocking input
- Auto-start desktop on first tool call, hide non-essential tools from harness model when grounding model is configured
- Remove screenshots from action tool results (status-only returns)
- Add dmz-cursor-theme and x11-xserver-utils to Dockerfile
- Remove hardcoded screen dimensions from tool descriptions
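Auto-starting the desktop on the first tool call is a lazy-initialisation pattern. A minimal sketch follows, where the hypothetical `start_fn` stands in for whatever launches the desktop processes; a double-checked lock ensures concurrent first calls start it only once.

```python
import threading


class LazyDesktop:
    """Start the virtual desktop the first time any desktop tool runs.

    `start_fn` is a hypothetical stand-in for whatever launches Xvfb,
    x11vnc, noVNC and openbox.
    """

    def __init__(self, start_fn):
        self._start_fn = start_fn
        self._lock = threading.Lock()
        self._running = False

    def ensure_running(self):
        if self._running:  # fast path once started
            return
        with self._lock:
            # Re-check under the lock: another caller may have started it.
            if not self._running:
                self._start_fn()
                self._running = True

    def mark_stopped(self):
        """Call after an explicit stop so the next tool call restarts it."""
        with self._lock:
            self._running = False

    def run_tool(self, action, *args, **kwargs):
        self.ensure_running()
        return action(*args, **kwargs)
```

If `start_fn` raises, the flag stays false and the next tool call retries. In an async server, `asyncio.Lock` would play the role of `threading.Lock` here.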
Summary
Implements #44 — Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities.
Adds an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.

Changes
- open_terminal/utils/desktop.py (new): DesktopManager class managing the Xvfb, x11vnc, noVNC, and openbox lifecycle, with screenshot (scrot), mouse (xdotool), and keyboard input methods
- open_terminal/env.py: added 5 config vars (ENABLE_DESKTOP, DESKTOP_DISPLAY, DESKTOP_SCREEN_SIZE, DESKTOP_VNC_PORT, DESKTOP_NOVNC_PORT)
- open_terminal/main.py: 10 new API endpoints; desktop info in the system prompt and /api/config
- Dockerfile: added xvfb, x11vnc, novnc, openbox, xdotool, scrot, chromium, and fonts; exposed port 6080
- entrypoint.sh: DISPLAY setup and stale X11 lock file cleanup when the desktop is enabled
- test_desktop.sh (new): 27 integration tests covering the full desktop lifecycle

API Endpoints
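- GET /desktop – status (running, screen size, ports)
- POST /desktop/start – start Xvfb, VNC, noVNC, window manager
- POST /desktop/stop – tear down all desktop processes
- POST /desktop/screenshot – capture PNG (base64 JSON or raw binary)
- POST /desktop/click – mouse click at (x, y)
- POST /desktop/mouse_move – move cursor
- POST /desktop/drag – mouse drag operation
- POST /desktop/type – type text into focused window
- POST /desktop/key – press key / key combo (e.g. ctrl+c)
- POST /desktop/scroll – scroll at position

Usage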
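Beyond the browser viewer, the endpoints can be scripted. Below is a minimal Python client sketch assuming the API listens on http://localhost:8000, that the API key is sent as `Authorization: Bearer <key>`, and that the click payload and the base64 screenshot use the hypothetical field names `x`, `y`, and `image`; none of these request details are confirmed by the PR.

```python
import base64
import json
import urllib.request

# Assumptions: base URL, Bearer auth scheme, and payload field names.
BASE_URL = "http://localhost:8000"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_request(method, path, api_key, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the /desktop endpoints."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def send(request, timeout=30):
    """Send a request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def decode_screenshot(body, field="image"):
    """Turn the base64 JSON form of /desktop/screenshot into raw PNG bytes."""
    png = base64.b64decode(body[field])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return png
```

For example, after `send(build_request("POST", "/desktop/start", key))`, a click would be `send(build_request("POST", "/desktop/click", key, {"x": 100, "y": 200}))`. Only the standard library is used, so the sketch runs without extra dependencies.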
- API: http://localhost:8000
- noVNC viewer: http://localhost:6080/vnc.html

Test Results
All 27 integration tests pass against a locally built container.