Feature Request: Virtual Desktop & noVNC Integration for GUI-based "Computer Use" #44

@flozi00

I propose extending the Open Terminal Docker image and API to support a virtual desktop environment:
- X Server & Desktop Environment: Include Xvfb and a lightweight desktop environment (such as XFCE) in the Docker container.
- noVNC Integration: Integrate a web-based VNC client (noVNC) so that users can watch the agent's desktop interaction directly in the browser.
- Enhanced API for Screenshots/Input: Extend the REST API so that agents can capture screenshots of the virtual display and send input events (mouse clicks, keystrokes).
- Persistent Browser Sessions: Pre-install tools like Playwright or Selenium in headed ("headful") mode so that agents can manage complex web sessions (e.g., Microsoft 365) that need a persistent session to avoid repeated MFA prompts.
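
To make the proposal more concrete, here is a rough sketch of how the pieces could fit together. It only builds the command lines and does not start anything. All names in it (`DesktopConfig`, `stack_commands`, `input_command`, `screenshot_command`, the event fields, the ports, and the noVNC path) are assumptions for illustration and not part of Open Terminal today. It also assumes `Xvfb`, `startxfce4`, `x11vnc`, `websockify`, `xdotool`, and ImageMagick's `import` are installed in the image.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DesktopConfig:
    """Hypothetical settings; none of these names exist in Open Terminal."""
    display: str = ":99"
    width: int = 1280
    height: int = 800
    depth: int = 24
    vnc_port: int = 5900
    web_port: int = 6080
    novnc_dir: str = "/usr/share/novnc"  # assumed install path, varies by distro


def stack_commands(cfg: DesktopConfig) -> list[list[str]]:
    """Return the commands, in start order, for the virtual desktop stack.

    startxfce4 must be run with DISPLAY set to cfg.display.
    """
    geometry = f"{cfg.width}x{cfg.height}x{cfg.depth}"
    return [
        # Virtual framebuffer X server
        ["Xvfb", cfg.display, "-screen", "0", geometry],
        # Desktop session on that display
        ["startxfce4"],
        # VNC server exporting the display; -nopw is for a local sketch only
        ["x11vnc", "-display", cfg.display, "-rfbport", str(cfg.vnc_port),
         "-forever", "-shared", "-nopw"],
        # WebSocket bridge that also serves the noVNC web client
        ["websockify", "--web", cfg.novnc_dir, str(cfg.web_port),
         f"localhost:{cfg.vnc_port}"],
    ]


def input_command(event: dict) -> list[str]:
    """Translate a hypothetical JSON input event into an xdotool command."""
    kind = event["type"]
    if kind == "click":
        button = str(event.get("button", 1))  # 1 = left mouse button
        return ["xdotool", "mousemove", str(event["x"]), str(event["y"]),
                "click", button]
    if kind == "type":
        # "--" stops option parsing so text starting with "-" is typed literally
        return ["xdotool", "type", "--", event["text"]]
    if kind == "key":
        return ["xdotool", "key", event["key"]]
    raise ValueError(f"unsupported event type: {kind!r}")


def screenshot_command(path: str) -> list[str]:
    """Capture the whole virtual display with ImageMagick's import tool."""
    return ["import", "-window", "root", path]
```

A REST handler could pass these lists to `subprocess.run` with `DISPLAY` set to the configured display, for example `input_command({"type": "click", "x": 100, "y": 200})` for a left click. Returning argument lists instead of shell strings means agent-supplied text is never interpreted by a shell.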

How would this improve the project?
- Enables "Computer Use" Capabilities: Brings Open Terminal in line with current agent tooling such as Anthropic's Computer Use, where the agent interacts with a full OS UI.
- Visual Debugging for Users: Users can see exactly what the agent is doing in real time via the noVNC interface.
- Bypassing API Restrictions: Lets agents work with services for which no API token can be generated, such as corporate Outlook accounts with App Passwords disabled.
