Summary
Support a hybrid workflow where an agent running in a remote environment can interact with and observe GUI surfaces on the user's local machine.
Problem
In remote agent setups, the agent can execute tasks on a server, but it usually cannot directly access or control GUI windows that exist on the user's local device.
This breaks an important class of workflows:
- GUI-based testing
- local browser or desktop app interaction
- visual verification of results
- step-by-step debugging that depends on what the user can see locally
In practice, the compute environment the agent needs may be remote, while the GUI surface the user needs is local. Today, those two environments are disconnected.
Request
It would be valuable for saneagent to support a local companion or bridge that allows a remotely running agent to interact with the user's local GUI environment.
At a high level, the desired workflow is:
- The main agent runs remotely.
- A local companion process exposes controlled access to local GUI surfaces.
- The remote agent can launch, attach to, inspect, and interact with local applications or windows.
- The user can directly observe the same local GUI while the agent continues operating remotely.
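The workflow above implies a message boundary between the remote agent and the local companion. Below is a minimal sketch that assumes a JSON request/response envelope, with an allowlist as one possible "controlled access" mechanism. The action names, `make_request`, and `LocalCompanion` are hypothetical and are not part of saneagent. The transport (for example, a WebSocket or an SSH-forwarded socket) is left out.

```python
import json
from dataclasses import dataclass, field

# Hypothetical action names; saneagent does not define this protocol today.
ALLOWED_ACTIONS = {"list_windows", "launch", "attach", "click", "type_text", "screenshot"}


def make_request(request_id: int, action: str, params: dict) -> str:
    """Serialize a request sent from the remote agent to the local companion."""
    return json.dumps({"id": request_id, "action": action, "params": params})


@dataclass
class LocalCompanion:
    """Local process that decides which GUI actions the remote agent may perform."""
    allowed: set = field(default_factory=lambda: set(ALLOWED_ACTIONS))
    audit_log: list = field(default_factory=list)

    def handle(self, raw: str) -> str:
        msg = json.loads(raw)
        action = msg.get("action")
        # Record every request so the user can review what the agent asked for.
        self.audit_log.append(action)
        if action not in self.allowed:
            return json.dumps({"id": msg.get("id"), "ok": False,
                               "error": f"action not permitted: {action}"})
        # A real companion would call into the local GUI layer here;
        # this sketch only acknowledges the request.
        return json.dumps({"id": msg.get("id"), "ok": True, "action": action})


if __name__ == "__main__":
    companion = LocalCompanion(allowed={"list_windows", "screenshot"})
    print(companion.handle(make_request(1, "list_windows", {})))
    # prints {"id": 1, "ok": true, "action": "list_windows"}
    print(companion.handle(make_request(2, "click", {"x": 10, "y": 20})))
    # prints {"id": 2, "ok": false, "error": "action not permitted: click"}
```

Putting the permission check on the local side keeps the user in control of what the remote agent can do, even if the remote environment is misconfigured.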
Expected Capabilities
A future implementation could make it possible for a remote agent to:
- launch or attach to local GUI applications
- inspect available local windows or active sessions
- perform GUI-driven interactions on the local machine
- stream enough state back to the remote agent (for example, window lists, focus changes, or screenshots) for reliable automation
- let the user observe and validate actions in real time
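One way to read the capability list above is as a small interface that a local backend implements. The sketch below is a hypothetical illustration, not a saneagent API. `FakeDesktop` keeps windows in memory instead of calling OS automation or accessibility APIs. `snapshot()` stands in for the state streamed back to the remote agent, and the observer callbacks stand in for the user watching actions in real time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Window:
    window_id: int
    title: str
    focused: bool = False


@dataclass
class FakeDesktop:
    """Hypothetical in-memory stand-in for a local GUI; a real backend would use OS APIs."""
    windows: Dict[int, Window] = field(default_factory=dict)
    observers: List[Callable[[str], None]] = field(default_factory=list)
    next_id: int = 1

    def _notify(self, event: str) -> None:
        # Lets the user (or an on-screen overlay) see each action as it happens.
        for observer in self.observers:
            observer(event)

    def launch(self, title: str) -> int:
        window = Window(self.next_id, title)
        self.windows[window.window_id] = window
        self.next_id += 1
        self._notify(f"launch {title}")
        return window.window_id

    def attach(self, title: str) -> int:
        # Attach to an already-open window by exact title match.
        for window in self.windows.values():
            if window.title == title:
                self._notify(f"attach {title}")
                return window.window_id
        raise LookupError(f"no window titled {title!r}")

    def list_windows(self) -> List[Window]:
        return list(self.windows.values())

    def focus(self, window_id: int) -> dict:
        # A simple GUI-driven interaction: move focus to one window.
        for window in self.windows.values():
            window.focused = window.window_id == window_id
        self._notify(f"focus {window_id}")
        return self.snapshot()

    def snapshot(self) -> dict:
        # State returned to the remote agent after each action.
        return {
            "windows": [(w.window_id, w.title) for w in self.windows.values()],
            "focused": next((w.window_id for w in self.windows.values() if w.focused), None),
        }


if __name__ == "__main__":
    events: List[str] = []
    desktop = FakeDesktop(observers=[events.append])
    desktop.launch("Editor")
    desktop.launch("Browser")
    state = desktop.focus(desktop.attach("Browser"))
    print(state)   # {'windows': [(1, 'Editor'), (2, 'Browser')], 'focused': 2}
    print(events)  # ['launch Editor', 'launch Browser', 'attach Browser', 'focus 2']
```

If a fake backend kept the same method names as a real one, the remote side could be tested without a display. A production backend would replace the in-memory dictionary with platform automation calls.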
Why This Fits saneagent
This feels aligned with the direction described in this repository:
- desktop GUI support
- cross-platform operation
- compatibility across agent harnesses
- acting as an interface for driving other clients
A remote-to-local GUI bridge would make saneagent substantially more useful for real development and testing workflows, especially when execution belongs on a server but interaction and verification belong on the user's machine.
Notes
The key need here is not simply "remote GUI on the server." The more important capability is enabling a remote agent to work with GUI surfaces that are local to the user.
That distinction matters because, in many practical workflows, the user wants to see and validate the interaction on their own device rather than through a server-hosted desktop session.
Tagging @minpeter for context and discussion.