I propose extending the Open Terminal Docker image and API to support a virtual desktop environment:
X-Server & Desktop Environment: Include a lightweight DE (like XFCE) and Xvfb within the Docker container, so GUI applications have a virtual display to render to.
noVNC Integration: Integrate a web-based VNC client (noVNC) so that users can view the agent's desktop interaction directly in the browser.
Enhanced API for Screenshots/Input: Extend the REST API to allow agents to capture screenshots of the virtual display and send input events (mouse clicks, keystrokes).
Persistent Browser Sessions: Pre-install tools like Playwright or Selenium and run them in headed ("headful") mode so that agents can manage complex web sessions (e.g., Microsoft 365), where a persistent session avoids repeated MFA prompts.
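
To make the screenshot/input part of the proposal more concrete, here is a minimal sketch of how API handlers might translate requests into commands run against the virtual display. It assumes Xvfb is listening on display :99 and that xdotool and ImageMagick's import are installed in the image; the display number and all function names are hypothetical and not part of Open Terminal today.

```python
import os
import subprocess
from typing import List

# Assumption: Xvfb was started inside the container as `Xvfb :99 ...`.
DISPLAY = ":99"


def click_command(x: int, y: int, button: int = 1) -> List[str]:
    """Build an xdotool argv that moves the pointer to (x, y) and clicks."""
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def type_command(text: str) -> List[str]:
    """Build an xdotool argv that types the given text as keystrokes."""
    return ["xdotool", "type", text]


def screenshot_command(path: str) -> List[str]:
    """Build an ImageMagick `import` argv that captures the root window."""
    return ["import", "-window", "root", path]


def run_on_display(argv: List[str], display: str = DISPLAY) -> None:
    """Run a command with DISPLAY pointing at the virtual X server."""
    env = dict(os.environ, DISPLAY=display)
    # Passing a list (no shell) keeps user-supplied text out of shell parsing.
    subprocess.run(argv, env=env, check=True)
```

A REST handler for a click request could then call run_on_display(click_command(100, 200)), and a screenshot endpoint could call run_on_display(screenshot_command("/tmp/shot.png")) and return the resulting file. Keeping command construction separate from execution makes the mapping easy to test without a running X server.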
How would this improve the project?
Enables "Computer Use" Capabilities: Aligns Open Terminal with modern agentic trends (like Anthropic's Computer Use) where the agent interacts with a full OS UI.
Visual Debugging for Users: Users can see exactly what the agent is doing in real time via the noVNC interface.
Bypassing API Restrictions: Allows agents to work with services where no API token can be generated, such as corporate Outlook accounts with disabled App Passwords.