Feature: Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities. by volkermauel · Pull Request #100 · open-webui/open-terminal

volkermauel · 2026-03-31T14:29:53Z

Summary

Implements #44 — Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities.

Adds an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.
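A minimal sketch of how a boolean feature gate like OPEN_TERMINAL_ENABLE_DESKTOP can be parsed so that an unset variable leaves the desktop disabled. The helper name `env_flag` and the set of accepted truthy spellings are assumptions for illustration, not taken from open_terminal/env.py.

```python
import os

# Hypothetical helper: the accepted spellings are an assumption,
# not necessarily what open_terminal/env.py accepts.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True only when the variable is set to a recognised truthy value."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # unset: keep the default (desktop off)
    return raw.strip().lower() in TRUTHY


ENABLE_DESKTOP = env_flag("OPEN_TERMINAL_ENABLE_DESKTOP")
```

Defaulting to `False` on anything unrecognised is what keeps existing deployments unaffected: only an explicit opt-in starts Xvfb and the VNC stack.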

Changes

open_terminal/utils/desktop.py (new) — DesktopManager class managing Xvfb, x11vnc, noVNC, and openbox lifecycle, with screenshot (scrot), mouse (xdotool), and keyboard input methods
open_terminal/env.py — Added 5 config vars: ENABLE_DESKTOP, DESKTOP_DISPLAY, DESKTOP_SCREEN_SIZE, DESKTOP_VNC_PORT, DESKTOP_NOVNC_PORT
open_terminal/main.py — 10 new API endpoints, desktop info in system prompt and /api/config
Dockerfile — Added xvfb, x11vnc, novnc, openbox, xdotool, scrot, chromium, fonts; exposed port 6080
entrypoint.sh — DISPLAY setup and stale X11 lock file cleanup when desktop is enabled
test_desktop.sh (new) — 27 integration tests covering full desktop lifecycle
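entrypoint.sh does its stale-lock cleanup in shell. The Python sketch below only illustrates which files matter, assuming Xvfb's standard naming (`/tmp/.X<N>-lock` plus the socket `/tmp/.X11-unix/X<N>` for display `:<N>`). The function names are hypothetical. Cleanup like this is only safe when no X server is actually running for that display, for example right after a container restart.

```python
from pathlib import Path


def x11_lock_paths(display: str, tmp_dir: Path = Path("/tmp")) -> list[Path]:
    """Return the lock file and socket Xvfb creates for a local display like ':99'."""
    number = display.lstrip(":").split(".")[0]  # ":99.0" -> "99"
    if not number.isdigit():
        raise ValueError(f"unexpected DISPLAY value: {display!r}")
    return [tmp_dir / f".X{number}-lock", tmp_dir / ".X11-unix" / f"X{number}"]


def remove_stale_x11_locks(display: str, tmp_dir: Path = Path("/tmp")) -> list[Path]:
    """Delete leftover lock/socket files so Xvfb can claim the display again."""
    removed = []
    for path in x11_lock_paths(display, tmp_dir):
        if path.exists() or path.is_symlink():
            path.unlink()
            removed.append(path)
    return removed
```

Without this step, a restarted container can find the previous run's lock file and Xvfb refuses to start on the same display number.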

API Endpoints

Method	Path	Description
GET	`/desktop`	Desktop status (running, screen size, ports)
POST	`/desktop/start`	Start virtual desktop
POST	`/desktop/stop`	Stop virtual desktop
POST	`/desktop/screenshot`	Capture PNG screenshot (base64 JSON or raw binary)
POST	`/desktop/click`	Mouse click at (x, y)
POST	`/desktop/mouse_move`	Move cursor to (x, y)
POST	`/desktop/drag`	Mouse drag operation
POST	`/desktop/type`	Type text (with human-like randomized delay by default)
POST	`/desktop/key`	Press key / key combo (e.g. `ctrl+c`)
POST	`/desktop/scroll`	Scroll at position
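One plausible way the input endpoints could map onto xdotool invocations inside DesktopManager. The function names and defaults are assumptions; only the xdotool subcommands themselves (`mousemove`, `click`, `mousedown`, `mouseup`, `type`, `key`) and the X11 button numbers are real. The functions build argument lists without running anything, which keeps them easy to test.

```python
# Hypothetical mapping of /desktop input endpoints onto xdotool argv lists.
# Function names and defaults are assumptions; the xdotool subcommands are real.

SCROLL_BUTTONS = {"up": 4, "down": 5, "left": 6, "right": 7}  # X11 wheel buttons


def mouse_move(x: int, y: int) -> list[str]:
    return ["xdotool", "mousemove", str(x), str(y)]


def click(x: int, y: int, button: int = 1) -> list[str]:
    # xdotool chains commands: move first, then click at the new position
    return mouse_move(x, y) + ["click", str(button)]


def drag(x1: int, y1: int, x2: int, y2: int, button: int = 1) -> list[str]:
    # press at the start point, move while held, release at the end point
    return (mouse_move(x1, y1) + ["mousedown", str(button)]
            + ["mousemove", str(x2), str(y2), "mouseup", str(button)])


def type_text(text: str, delay_ms: int = 12) -> list[str]:
    # "--" ends option parsing so text starting with "-" is typed literally
    return ["xdotool", "type", "--delay", str(delay_ms), "--", text]


def key(combo: str) -> list[str]:
    # xdotool accepts combos such as "ctrl+c" or "alt+Tab" directly
    return ["xdotool", "key", combo]


def scroll(x: int, y: int, direction: str = "down", amount: int = 3) -> list[str]:
    button = SCROLL_BUTTONS[direction]
    return mouse_move(x, y) + ["click", "--repeat", str(amount), str(button)]
```

To execute one, pass it to `subprocess.run(argv, env={**os.environ, "DISPLAY": ":99"}, check=True)`, with `:99` standing in for whatever DESKTOP_DISPLAY is set to. Some applications need a short pause between `mousedown` and the second `mousemove` before they register a drag.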

Usage

docker run -p 8000:8000 -p 6080:6080 \
  -e OPEN_TERMINAL_API_KEY=your-key \
  -e OPEN_TERMINAL_ENABLE_DESKTOP=true \
  open-terminal

API: http://localhost:8000
noVNC web client: http://localhost:6080/vnc.html
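A minimal client-side sketch using only the standard library. Two details are assumptions rather than facts from this PR: that the API key is sent as `Authorization: Bearer <key>`, and that the base64 JSON screenshot response stores the image under a field named `image`. The click payload shape `{"x": ..., "y": ...}` is likewise hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # API address from the docker run example
API_KEY = "your-key"                # value of OPEN_TERMINAL_API_KEY
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def desktop_request(method: str, path: str, payload: Optional[dict] = None,
                    base_url: str = BASE_URL,
                    api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to a /desktop endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    data = None
    if method == "POST":
        data = json.dumps(payload or {}).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def decode_screenshot(body: bytes, field: str = "image") -> bytes:
    """Extract PNG bytes from a base64 JSON screenshot response."""
    png = base64.b64decode(json.loads(body)[field])  # field name is hypothetical
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return png
```

Against a running container, `urllib.request.urlopen(desktop_request("POST", "/desktop/screenshot"))` would send the request, and `decode_screenshot(resp.read())` would return the PNG bytes if the response shape matches the assumption.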

Test Results

All 27 integration tests pass against a locally built container.

…use (open-webui#44)

Introduce an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected. When enabled, exposes:

- GET /desktop: status (running, screen size, ports)
- POST /desktop/start: start Xvfb, VNC, noVNC, window manager
- POST /desktop/stop: tear down all desktop processes
- POST /desktop/screenshot: capture PNG (base64 JSON or raw binary)
- POST /desktop/click: mouse click at (x, y)
- POST /desktop/mouse_move: move cursor
- POST /desktop/drag: mouse drag operation
- POST /desktop/type: type text into focused window
- POST /desktop/key: press key / key combo (e.g. ctrl+c)
- POST /desktop/scroll: scroll at position

All endpoints are behind the existing API key auth. Desktop info is added to the system prompt and /api/config when enabled so agents can discover the capability. Includes an integration test script (test_desktop.sh, 27 test cases) that verifies the full lifecycle against a running container.

…ype endpoint

- Backend: WebSocket proxy for noVNC/websockify with JWT auth via query param
- Frontend: DesktopViewer component with start/stop/refresh/fullscreen controls
- API helpers for desktop status, start, stop
- Unit tests for backend router and frontend API helpers

Ref: open-webui/open-terminal#100
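The browser WebSocket API cannot attach custom headers such as `Authorization` to the handshake, which is a common reason to carry a JWT in the query string instead. The sketch below shows both sides of that pattern; the parameter name `token` and the example path are hypothetical, not the names used by the open-webui proxy.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

WS_SCHEMES = {"http": "ws", "https": "wss"}


def websocket_url(http_url: str, token: str, param: str = "token") -> str:
    """Turn an http(s) URL into a ws(s) URL carrying the JWT as a query parameter."""
    parts = urlsplit(http_url)
    query = urlencode({param: token})
    if parts.query:  # keep any existing query parameters
        query = f"{parts.query}&{query}"
    return urlunsplit((WS_SCHEMES[parts.scheme], parts.netloc, parts.path, query, ""))


def token_from_url(url: str, param: str = "token") -> Optional[str]:
    """Server side: pull the token back out before validating it."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None
```

Query strings tend to end up in proxy and access logs, so short-lived tokens reduce the exposure of this approach.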

Arokha · 2026-03-31T22:00:36Z

In your testing, did you notice how much the memory footprint changed with these enabled? Just curious.

volkermauel · 2026-03-31T22:05:50Z

I did not, to be honest, sorry.

volkermauel · 2026-04-01T07:13:17Z

The grounding for clicking seems to be off. I'm currently investigating grid overlays and adding instructions for the model to the desktop_screenshot result.

Converting to draft for now, until I feel this is more mature.
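One frequent cause of off-target clicks (not confirmed as the cause here) is that the model sees a resized screenshot, so its coordinates must be scaled back to screen pixels before they reach xdotool. The sketch below shows that scaling plus the pixel offsets a labelled grid overlay could be drawn at; both helpers and the 100 px default step are hypothetical.

```python
# Hypothetical helpers for grounding model-predicted clicks.


def to_screen(x: float, y: float, image_size: tuple[int, int],
              screen_size: tuple[int, int]) -> tuple[int, int]:
    """Map a point from screenshot (image) pixels to screen pixels, clamped."""
    iw, ih = image_size
    sw, sh = screen_size
    sx = round(x * sw / iw)
    sy = round(y * sh / ih)
    # keep the result inside the visible screen
    return min(max(sx, 0), sw - 1), min(max(sy, 0), sh - 1)


def grid_lines(width: int, height: int, step: int = 100) -> tuple[list[int], list[int]]:
    """Pixel offsets of vertical and horizontal grid lines for an overlay."""
    return list(range(0, width, step)), list(range(0, height, step))
```

For example, a point at (640, 360) in a 1280x720 screenshot of a 1920x1080 screen maps to (960, 540). Labelling the grid lines with these screen-space offsets lets the model read coordinates directly instead of estimating them.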

…mputer use desktop

- Add desktop_windows and desktop_window_focus endpoints for listing and focusing windows on the virtual desktop
- Add persistent cursor overlay (Xlib) visible in VNC after xdotool moves; it hides during clicks to avoid blocking input
- Auto-start desktop on first tool call; hide non-essential tools from the harness model when a grounding model is configured
- Remove screenshots from action tool results (status-only returns)
- Add dmz-cursor-theme and x11-xserver-utils to Dockerfile
- Remove hardcoded screen dimensions from tool descriptions

volkermauel added 2 commits March 31, 2026 16:18

feat: add human-like randomized delay between keystrokes in desktop type endpoint (eee0abd)

fixed chrome not launching (b8d74ca)
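The randomized-delay commit's actual distribution and parameters are not shown in this PR. The sketch below is one way to produce human-like keystroke timing, assuming Gaussian jitter around a mean, a hard floor, and a slightly longer pause at spaces; all of these values and the function names are hypothetical. It spawns one xdotool process per character, which is simple but slower than a single `xdotool type --delay` call with a fixed delay.

```python
import os
import random
import subprocess
import time
from typing import Optional


def keystroke_delays(text: str, mean_ms: float = 80.0, jitter_ms: float = 30.0,
                     min_ms: float = 20.0, seed: Optional[int] = None) -> list[float]:
    """One delay (in ms) per character, drawn around mean_ms with Gaussian jitter."""
    rng = random.Random(seed)  # seedable for reproducible tests
    delays = []
    for ch in text:
        delay = rng.gauss(mean_ms, jitter_ms)
        if ch == " ":
            delay += mean_ms / 2  # assumed: slightly longer pause between words
        delays.append(max(min_ms, delay))  # never drop below the floor
    return delays


def type_humanlike(text: str, display: str = ":99", seed: Optional[int] = None) -> None:
    """Type one character per xdotool call, sleeping between keystrokes."""
    env = {**os.environ, "DISPLAY": display}  # ":99" is a placeholder display
    for ch, delay_ms in zip(text, keystroke_delays(text, seed=seed)):
        subprocess.run(["xdotool", "type", "--", ch], env=env, check=True)
        time.sleep(delay_ms / 1000.0)
```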

volkermauel mentioned this pull request Mar 31, 2026

feat: adding open-terminal virtual desktop open-webui/open-webui#23274 (Draft, 12 tasks)

volkermauel marked this pull request as draft April 1, 2026 07:13

volkermauel wants to merge 4 commits into open-webui:main from volkermauel:desktop