Feature: Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities. #100

Draft
volkermauel wants to merge 4 commits into open-webui:main from volkermauel:desktop

@volkermauel

Summary

Implements #44 — Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities.

Adds an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.
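
As a rough illustration of the feature gate described above, the sketch below parses a boolean environment variable that stays off unless explicitly enabled. Only the variable name OPEN_TERMINAL_ENABLE_DESKTOP and its default of false come from this PR; the helper name `env_flag` and the set of accepted truthy strings are assumptions, and the real parsing in open_terminal/env.py may differ.

```python
import os

# Hypothetical helper: the accepted truthy spellings are an assumption,
# not necessarily what open_terminal/env.py does.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Return True if the named environment variable holds a truthy string."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Desktop support stays off unless the variable is set to a truthy value.
ENABLE_DESKTOP = env_flag("OPEN_TERMINAL_ENABLE_DESKTOP", default=False)
```

Passing `environ` explicitly keeps the helper testable without touching the process environment.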

Changes

  • open_terminal/utils/desktop.py (new) — DesktopManager class managing Xvfb, x11vnc, noVNC, and openbox lifecycle, with screenshot (scrot), mouse (xdotool), and keyboard input methods
  • open_terminal/env.py — Added 5 config vars: ENABLE_DESKTOP, DESKTOP_DISPLAY, DESKTOP_SCREEN_SIZE, DESKTOP_VNC_PORT, DESKTOP_NOVNC_PORT
  • open_terminal/main.py — 10 new API endpoints, desktop info in system prompt and /api/config
  • Dockerfile — Added xvfb, x11vnc, novnc, openbox, xdotool, scrot, chromium, fonts; exposed port 6080
  • entrypoint.sh — DISPLAY setup and stale X11 lock file cleanup when desktop is enabled
  • test_desktop.sh (new) — 27 integration tests covering full desktop lifecycle
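
To make the lifecycle in the list above more concrete, here is a minimal sketch of the command lines such a manager might launch, plus the files a crashed X server can leave behind. It is not the code in open_terminal/utils/desktop.py. The names `DesktopConfig`, `stale_lock_paths`, and `startup_commands`, the default display, screen size, and VNC port, the use of `websockify` to serve noVNC, and the `/usr/share/novnc` path are all assumptions. Only port 6080 comes from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesktopConfig:
    # Defaults are illustrative assumptions, except 6080 (exposed by the Dockerfile).
    display: str = ":99"
    screen_size: str = "1280x800"
    vnc_port: int = 5900
    novnc_port: int = 6080


def stale_lock_paths(display):
    """Lock file and socket an X server uses for a display such as ':99'."""
    number = display.lstrip(":").split(".")[0]
    return [f"/tmp/.X{number}-lock", f"/tmp/.X11-unix/X{number}"]


def startup_commands(cfg):
    """Argument vectors for the four processes, in an assumed start order."""
    return [
        # Virtual framebuffer X server with a 24-bit screen.
        ["Xvfb", cfg.display, "-screen", "0", f"{cfg.screen_size}x24"],
        # Window manager; it reads DISPLAY from the environment.
        ["openbox"],
        # VNC server attached to the virtual display, without a password.
        ["x11vnc", "-display", cfg.display, "-rfbport", str(cfg.vnc_port),
         "-forever", "-shared", "-nopw"],
        # WebSocket bridge that also serves the noVNC web client files.
        ["websockify", "--web", "/usr/share/novnc",
         str(cfg.novnc_port), f"localhost:{cfg.vnc_port}"],
    ]
```

Each list could be handed to `subprocess.Popen` with `DISPLAY` set in the child environment; the sketch stops short of spawning anything so it can be inspected safely.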

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /desktop | Desktop status (running, screen size, ports) |
| POST | /desktop/start | Start virtual desktop |
| POST | /desktop/stop | Stop virtual desktop |
| POST | /desktop/screenshot | Capture PNG screenshot (base64 JSON or raw binary) |
| POST | /desktop/click | Mouse click at (x, y) |
| POST | /desktop/mouse_move | Move cursor to (x, y) |
| POST | /desktop/drag | Mouse drag operation |
| POST | /desktop/type | Type text (with human-like randomized delay by default) |
| POST | /desktop/key | Press key / key combo (e.g. ctrl+c) |
| POST | /desktop/scroll | Scroll at position |
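
For anyone wanting to poke at these endpoints from Python, the following sketch builds (but does not send) an authenticated request using only the standard library. The Bearer header scheme and the JSON field names `x` and `y` are assumptions; the PR only states that the endpoints sit behind the existing API key auth and that click takes a position (x, y). `build_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_request(base_url, api_key, method, path, payload=None):
    """Build (but do not send) an authenticated request to a desktop endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}  # header scheme is assumed
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Field names "x" and "y" are assumptions based on the "(x, y)" in the table.
click = build_request("http://localhost:8000", "your-key", "POST",
                      "/desktop/click", {"x": 640, "y": 400})
# urllib.request.urlopen(click) would send it to a running container.
```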

Usage

docker run -p 8000:8000 -p 6080:6080 \
  -e OPEN_TERMINAL_API_KEY=your-key \
  -e OPEN_TERMINAL_ENABLE_DESKTOP=true \
  open-terminal
  • API: http://localhost:8000
  • noVNC web client: http://localhost:6080/vnc.html
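
With the container running, a screenshot returned in the base64 JSON form could be turned back into PNG bytes roughly as follows. The response field name `image` is a placeholder assumption, since the PR does not show the response schema, and `decode_screenshot` is a hypothetical helper.

```python
import base64
import json

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(body, field="image"):
    """Extract PNG bytes from a JSON body carrying a base64 string.

    The default field name "image" is an assumption, not the PR's schema.
    """
    payload = json.loads(body)
    png = base64.b64decode(payload[field], validate=True)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return png
```

Checking the signature is a cheap way to catch a wrong field name or an error payload before writing the bytes to disk.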

Test Results

All 27 integration tests pass against a locally built container.

…use (open-webui#44)

Introduce an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox)
that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs.

Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing
deployments are unaffected. When enabled, exposes:

  GET  /desktop          – status (running, screen size, ports)
  POST /desktop/start    – start Xvfb, VNC, noVNC, window manager
  POST /desktop/stop     – tear down all desktop processes
  POST /desktop/screenshot – capture PNG (base64 JSON or raw binary)
  POST /desktop/click    – mouse click at (x, y)
  POST /desktop/mouse_move – move cursor
  POST /desktop/drag     – mouse drag operation
  POST /desktop/type     – type text into focused window
  POST /desktop/key      – press key / key combo (e.g. ctrl+c)
  POST /desktop/scroll   – scroll at position
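
The input endpoints above are described in the PR as backed by xdotool. A minimal sketch of how click, key, and human-like typing might map onto xdotool invocations is shown here; the function names and delay bounds are assumptions, and the actual DesktopManager methods may be structured differently.

```python
import random


def click_argv(x, y, button=1):
    """xdotool command chain: move the pointer, then click a button."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def key_argv(combo):
    """xdotool accepts '+'-joined combos such as 'ctrl+c'."""
    return ["xdotool", "key", combo]


def typing_plan(text, rng, min_delay=0.03, max_delay=0.12):
    """Pair each character with a randomized pause (in seconds) before it.

    The delay bounds are illustrative assumptions.
    """
    return [(ch, rng.uniform(min_delay, max_delay)) for ch in text]
```

A caller could sleep for each delay and then type the paired character, which spreads keystrokes unevenly the way a person would; passing a seeded `random.Random` makes the plan reproducible in tests.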

All endpoints are behind the existing API key auth. Desktop info is added to the
system prompt and /api/config when enabled so agents can discover the capability.

Includes integration test script (test_desktop.sh, 27 test cases) that verifies
the full lifecycle against a running container.
volkermauel added a commit to volkermauel/open-webui that referenced this pull request Mar 31, 2026
- Backend: WebSocket proxy for noVNC/websockify with JWT auth via query param
- Frontend: DesktopViewer component with start/stop/refresh/fullscreen controls
- API helpers for desktop status, start, stop
- Unit tests for backend router and frontend API helpers

Ref: open-webui/open-terminal#100
@Arokha
Arokha commented Mar 31, 2026

In your testing, did you notice how much the memory footprint changed with these enabled? Just curious.

@volkermauel
Author

I did not, to be honest, sorry.

@volkermauel
Author

The grounding for clicking seems to be off. I'm currently investigating with grid overlays and with instructions to the model included in the result of desktop_screenshot.

Converting to draft for now, until I feel this is more mature.

@volkermauel volkermauel marked this pull request as draft April 1, 2026 07:13
…mputer use desktop

- Add desktop_windows and desktop_window_focus endpoints for listing and
  focusing windows on the virtual desktop
- Add persistent cursor overlay (Xlib) visible in VNC after xdotool moves,
  hides during clicks to avoid blocking input
- Auto-start desktop on first tool call, hide non-essential tools from
  harness model when grounding model is configured
- Remove screenshots from action tool results (status-only returns)
- Add dmz-cursor-theme and x11-xserver-utils to Dockerfile
- Remove hardcoded screen dimensions from tool descriptions
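
The auto-start behavior in the commit above could follow a simple start-once pattern, sketched here under assumptions: `LazyDesktop` and `start_fn` are hypothetical names, and the PR does not show how the first tool call actually triggers startup.

```python
import threading


class LazyDesktop:
    """Start the desktop on first use; later calls reuse the running one."""

    def __init__(self, start_fn):
        self._start_fn = start_fn      # hypothetical callable that boots the desktop
        self._started = False
        self._lock = threading.Lock()  # avoids double starts from concurrent calls

    def ensure_started(self):
        with self._lock:
            if not self._started:
                self._start_fn()
                self._started = True
```

Each tool handler would call `ensure_started()` before acting, so only the first call pays the startup cost.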