feat: discover app skills before manual computer control #16

@snowdamiz

Problem

When an agent is asked to interact with an app during computer use, it may jump straight to manual desktop/browser control even when a dedicated skill, MCP server, plugin, or connector could handle the app more reliably.

That can make the interaction slower, more brittle, and less auditable than using a purpose-built integration. It also means users may miss the chance to install an available capability that would improve future app interactions.

Proposed solution

Before manually controlling an app, the agent/runtime should check whether an installable or already-available capability exists for that app or workflow.
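One way to picture the lookup is a small catalog query that returns at most a few candidates, with already-installed ones first. This is only a sketch: `Capability`, `Kind`, `find_capabilities`, the `limit` parameter, and the sample catalog entries are hypothetical names made up for illustration, not part of any existing runtime API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Kind(Enum):
    # The four capability types named in this issue.
    SKILL = "skill"
    MCP_SERVER = "mcp_server"
    PLUGIN = "plugin"
    CONNECTOR = "connector"


@dataclass(frozen=True)
class Capability:
    # Hypothetical catalog record; field names are assumptions.
    name: str
    kind: Kind
    apps: frozenset  # lowercase app/service names this capability targets
    installed: bool


def find_capabilities(app: str, catalog: Iterable[Capability],
                      limit: int = 3) -> List[Capability]:
    """Return up to `limit` capabilities targeting `app`, installed ones first."""
    target = app.strip().lower()
    matches = [c for c in catalog if target in c.apps]
    # Stable sort: installed entries (key False) come before uninstalled (key True).
    matches.sort(key=lambda c: not c.installed)
    return matches[:limit]


catalog = [
    Capability("calendar-connector", Kind.CONNECTOR, frozenset({"calendar"}), installed=False),
    Capability("calendar-skill", Kind.SKILL, frozenset({"calendar"}), installed=True),
    Capability("mail-mcp", Kind.MCP_SERVER, frozenset({"mail"}), installed=True),
]

print([c.name for c in find_capabilities(" Calendar ", catalog)])
# prints ['calendar-skill', 'calendar-connector']
```

Capping the result count is one simple way to keep the lookup bounded; a real implementation would likely also need fuzzier matching than exact app names.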

Suggested behavior:

  • When the agent is asked to interact with an app, infer the target app/service and intended task.
  • Search available skills, MCP servers, plugins, and connectors for a matching capability before falling back to manual computer control.
  • If an appropriate capability is already installed/enabled, prefer it over direct manual control when it can satisfy the request.
  • If an appropriate capability exists but is not installed, ask the user whether to install/enable it before proceeding.
  • If the user declines, no matching capability exists, installation fails, or the requested task still requires visual/manual interaction, continue with normal computer control.
  • Make the check lightweight and explainable so it does not create noisy prompts for every small action.
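The bullets above amount to a short decision function. The following is a minimal sketch of that flow, assuming the lookup has already produced at most one best match; `Route`, `Match`, `choose_route`, and the `ask_user` / `install` callbacks are hypothetical names introduced here, not an existing interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    USE_CAPABILITY = "use_capability"
    MANUAL_CONTROL = "manual_control"


@dataclass(frozen=True)
class Match:
    # Hypothetical result of the capability lookup.
    name: str
    installed: bool
    can_satisfy_request: bool  # False when the task still needs visual/manual interaction


def choose_route(match: Optional[Match],
                 ask_user: Callable[[str], bool],
                 install: Callable[[str], bool]) -> Route:
    """Pick between a dedicated capability and manual computer control."""
    # No match, or the match cannot handle this task: fall back without prompting.
    if match is None or not match.can_satisfy_request:
        return Route.MANUAL_CONTROL
    # Already installed and suitable: prefer it.
    if match.installed:
        return Route.USE_CAPABILITY
    # Available but not installed: never install without asking.
    if not ask_user(match.name):
        return Route.MANUAL_CONTROL
    # User agreed, but installation may still fail.
    if not install(match.name):
        return Route.MANUAL_CONTROL
    return Route.USE_CAPABILITY


# Example: the user accepts, but the install fails, so the agent falls back.
route = choose_route(
    Match("calendar-skill", installed=False, can_satisfy_request=True),
    ask_user=lambda name: True,
    install=lambda name: False,
)
print(route)
# prints Route.MANUAL_CONTROL
```

Checking `can_satisfy_request` before prompting keeps the user from being asked to install something that would not help with the current task.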

Alternatives considered

  • Always use manual computer control first and let the agent discover better tools after it gets stuck. This preserves current behavior, but the agent finds reliable integrations only after wasting effort on manual control.
  • Always auto-install matching capabilities without asking. This is faster, but installing without consent is too surprising: it undermines user trust, may change permissions, and clutters the workspace.
  • Require users to explicitly request a skill/MCP lookup. This avoids extra prompts but puts the burden on users to know integrations exist.

Success criteria

  • When a user asks the agent to interact with an app, the runtime can detect likely app/service targets and perform a bounded capability lookup before manual control.
  • Matching installed capabilities are preferred over manual computer control when they can complete the task.
  • Matching uninstalled skills/MCP servers/plugins/connectors trigger a clear user confirmation flow before installation/enabling.
  • The agent falls back cleanly to manual control when no good capability exists or the user declines installation.
  • Prompts are rate-limited or scoped so users are not repeatedly asked about the same capability during a single task.
  • Tests cover installed-match, uninstalled-match, no-match, declined-install, failed-install, and manual-fallback paths.
  • User-facing copy clearly distinguishes skills, MCP servers, plugins/connectors, and manual computer control without exposing internal implementation details unnecessarily.
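The prompt-scoping criterion could be met with a small per-task record of which capabilities the user has already been asked about. This is a sketch under assumptions: `PromptLedger`, `should_prompt`, and the task and capability identifiers are hypothetical, and a real runtime might prefer time-based rate limiting or persistence across sessions.

```python
class PromptLedger:
    """Remembers which (task, capability) pairs have already produced a prompt."""

    def __init__(self) -> None:
        self._asked = set()

    def should_prompt(self, task_id: str, capability: str) -> bool:
        """Return True only the first time a capability comes up within a task."""
        key = (task_id, capability)
        if key in self._asked:
            return False
        self._asked.add(key)
        return True


ledger = PromptLedger()
print(ledger.should_prompt("task-1", "calendar-skill"))  # prints True
print(ledger.should_prompt("task-1", "calendar-skill"))  # prints False
print(ledger.should_prompt("task-2", "calendar-skill"))  # prints True
```

Keying on the task rather than the whole session means a declined install is not re-offered mid-task, while a later, separate task can still surface the capability once.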

Area

Agent runtime

Metadata

    Labels

    enhancement: New feature or request
