Problem
When an agent is asked to interact with an app during computer use, it may jump straight to manual desktop/browser control even when a dedicated skill, MCP server, plugin, or connector could handle the app more reliably.
That can make the interaction slower, more brittle, and less auditable than using a purpose-built integration. It also means users may miss the chance to install an available capability that would improve future app interactions.
Proposed solution
Before manually controlling an app, the agent/runtime should check whether an installable or already-available capability exists for that app or workflow.
Suggested behavior:
- When the agent is asked to interact with an app, infer the target app/service and intended task.
- Search available skills, MCP servers, plugins, and connectors for a matching capability before falling back to manual computer control.
- If an appropriate capability is already installed/enabled, prefer it over direct manual control when it can satisfy the request.
- If an appropriate capability exists but is not installed, ask the user whether to install/enable it before proceeding.
- If the user declines, no matching capability exists, installation fails, or the requested task still requires visual/manual interaction, continue with normal computer control.
- Keep the check lightweight and explainable, and scope it so it does not prompt the user for every small action.
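The suggested behavior above can be sketched as a single decision function. This is a minimal illustration, not a proposed implementation: `Capability`, `Action`, `choose_action`, the `registry` list, and the `confirm_install` / `install` callbacks are all hypothetical names, and matching on an exact app name and task string is a simplifying assumption standing in for whatever lookup the runtime would actually use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class Action(Enum):
    USE_CAPABILITY = "use_capability"
    MANUAL_CONTROL = "manual_control"


@dataclass(frozen=True)
class Capability:
    # All fields are hypothetical; a real registry may look different.
    name: str
    kind: str            # "skill", "mcp_server", "plugin", or "connector"
    app: str             # app/service the capability targets
    installed: bool
    tasks: frozenset     # task identifiers the capability can satisfy


def choose_action(
    app: str,
    task: str,
    registry: Iterable[Capability],
    confirm_install: Callable[[Capability], bool],
    install: Callable[[Capability], bool],
    needs_visual: bool = False,
) -> Tuple[Action, Optional[Capability]]:
    """Decide between a purpose-built capability and manual control."""
    # Tasks that inherently need visual/manual interaction skip the lookup.
    if needs_visual:
        return Action.MANUAL_CONTROL, None

    matches = [c for c in registry if c.app == app and task in c.tasks]

    # Prefer a capability that is already installed/enabled.
    installed = next((c for c in matches if c.installed), None)
    if installed is not None:
        return Action.USE_CAPABILITY, installed

    # Otherwise ask about at most one candidate to keep prompts quiet.
    candidate = next((c for c in matches if not c.installed), None)
    if candidate is not None and confirm_install(candidate) and install(candidate):
        return Action.USE_CAPABILITY, candidate

    # No match, user declined, or installation failed: fall back.
    return Action.MANUAL_CONTROL, None
```

Passing the confirmation and installation steps in as callbacks keeps the decision logic free of UI and side effects, so each fallback path (declined, failed install, no match) can be exercised in isolation.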
Alternatives considered
- Always use manual computer control first and let the agent discover better tools after it gets stuck. This preserves current behavior, but reliable integrations are found only after effort has already been wasted.
- Always auto-install matching capabilities without asking. This is faster but too surprising for user trust, permissions, and workspace hygiene.
- Require users to explicitly request a skill/MCP lookup. This avoids extra prompts but puts the burden on users to know integrations exist.
Success criteria
- When a user asks the agent to interact with an app, the runtime can detect likely app/service targets and perform a bounded capability lookup before manual control.
- Matching installed capabilities are preferred over manual computer control when they can complete the task.
- Matching uninstalled skills/MCP servers/plugins/connectors trigger a clear user confirmation flow before installation/enabling.
- The agent falls back cleanly to manual control when no good capability exists or the user declines installation.
- Prompts are rate-limited or scoped so users are not repeatedly asked about the same capability during a single task.
- Tests cover installed-match, uninstalled-match, no-match, declined-install, failed-install, and manual-fallback paths.
- User-facing copy clearly distinguishes skills, MCP servers, plugins/connectors, and manual computer control without exposing internal implementation details unnecessarily.
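One way to meet the prompt-scoping criterion is to remember the user's answer per capability for the duration of a task. The sketch below is illustrative only: `InstallPromptGate` and its `ask` callback are hypothetical names, and it assumes one gate instance is created per task and discarded afterwards.

```python
from typing import Callable, Dict


class InstallPromptGate:
    """Asks about each capability at most once per task (hypothetical helper)."""

    def __init__(self, ask: Callable[[str], bool]) -> None:
        self._ask = ask                      # shows the confirmation prompt
        self._answers: Dict[str, bool] = {}  # capability name -> user's answer

    def confirm(self, capability_name: str) -> bool:
        # Reuse the earlier answer instead of prompting again.
        if capability_name not in self._answers:
            self._answers[capability_name] = bool(self._ask(capability_name))
        return self._answers[capability_name]
```

Because a declined answer is cached as well, a user who says no once is not asked about the same capability again within that task, which also gives the declined-install test path a stable result to assert on.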
Area
Agent runtime