feat(mcp-tool-proxy): private MCP integration via per-tool SLXs #43

Open
theyashl wants to merge 36 commits into main from feat/private-mcp-integration

@theyashl theyashl commented May 25, 2026

Contributor

Summary

  • New codebundles/mcp-tool-proxy/ that proxies a single MCP tool call: Python script does initialize + tools/call (handles both JSON and SSE responses), Robot wrapper dynamically imports per-tool parameters from MCP_INPUT_SCHEMA.
  • Generation rule + SLX/Runbook templates render one SLX + Runbook per discovered mcp_tool resource. Generated SLXs carry platform: mcp, resource_name: {server}, resource_type: mcp_server, mcp_server: {server}, mcp_tool: {tool} tags, an additionalContext.hierarchy: [platform, mcp_server, mcp_tool] for grouping in the platform UI, and an access tag rendered from spec.access (per-tool, classified by the indexer; defaults to read-write when absent).
  • SLX alias is rendered as <server> - <tool> (dash separator); the runbook task name is <server>_<tool>. Both surface the server scope alongside the tool name so the UI doesn't show 37 anonymous "list_*" tasks.
  • SLX template sets additionalContext.resourcePath: mcp/<server> (2 keys) — distinct from the 3-key hierarchy. UI grouping uses hierarchy; resource addressing uses resourcePath. Requires the paired compute_resource_path_from_hierarchy change in runwhen-local#798 so the explicit value isn't overwritten.
  • Runtime var defaults from the MCP input schema are always rendered as YAML strings — numbers/bools/lists/dicts are JSON-encoded first (so 42 becomes "42", true becomes "true", [1,2,3] becomes '"[1, 2, 3]"'). Robot Framework treats runtime vars as strings; un-coerced defaults caused type-mismatch failures in the runner.
  • Required parameters from the schema's top-level required array get a (required) suffix appended to their rendered description ("Required parameter." as a fallback when the schema description is empty), so downstream UI/agent surfaces know which inputs are mandatory.
  • Tool args from agentfarm arrive as strings (Robot's Import User Variable always returns string); the proxy coerces them to the JSON-Schema types in MCP_INPUT_SCHEMA before tools/call so boolean/integer/number/array/object parameters don't trip the MCP server's input validator.
  • Error policy split: transport / initialize failures exit 1 (task fails); tools/call errors and result.isError=true are surfaced as task output (rc=0) so agentfarm can read and react to them.
  • Templates use | tojson for any value sourced from upstream MCP descriptions/defaults (which may contain colons, control characters, or newlines), and the standard labels: \n {% include "common-labels.yaml" %} pattern so the SLX YAML always parses cleanly.
  • Pairs with runwhen-local#798 (the mcp_tools indexer that drives this codebundle from Helm-provided mcpConfig values).
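
The "handles both JSON and SSE responses" point in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `parse_jsonrpc_body` is hypothetical, and it assumes the first SSE event that carries `data:` lines holds the JSON-RPC response.

```python
import json


def parse_jsonrpc_body(content_type: str, body: str) -> dict:
    """Return the JSON-RPC message from a plain JSON or an SSE response body.

    Hypothetical helper: assumes the first SSE event with data is the response.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/event-stream":
        return json.loads(body)

    data_lines = []
    for line in body.splitlines():
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line terminates the event; stop at the first one with data.
            break
    # Multiple data lines of one event are joined with newlines.
    return json.loads("\n".join(data_lines))
```

Branching on the response `Content-Type` keeps the caller identical for servers that answer `tools/call` with a single JSON document and for servers that stream the answer as an event.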

Design spec: docs/superpowers/specs/2026-05-20-private-mcp-integration-design.md.

Configuring MCP servers

MCP servers are declared on the workspace-builder side, not in this codecollection — see codebundles/mcp-tool-proxy/README.md for the full configuration example. Short version:

runwhenLocal:
  workspaceBuilder:
    workspaceInfo:
      configMap:
        data:
          mcpConfig:
            servers:
              - display_name: jira
                url: https://jira-mcp.internal:443/mcp
                secret_ref: jira-mcp-token

secret_ref points to a k8s Secret with data.token: <bearer>. The same secret must be reachable from runner pods at execution time (the generated Runbook references it via secretsProvided.workspaceKey).

Optional per-server verify_tls: false skips TLS verification for environments where the pod's CA bundle doesn't yet trust the MCP server's issuer.
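
As a rough illustration of the entry shape above, a validation pass over one `servers` item might look like the sketch below. The helper name `validate_server_entry` is hypothetical, and treating `display_name`, `url`, and `secret_ref` as all mandatory is an assumption made for the example; the real indexer in runwhen-local may apply different rules.

```python
from urllib.parse import urlparse


def validate_server_entry(entry: dict) -> dict:
    """Check one mcpConfig server entry and fill in the verify_tls default.

    Hypothetical helper; the required-key set is an assumption for this sketch.
    """
    for key in ("display_name", "url", "secret_ref"):
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"mcpConfig server entry needs a non-empty string {key!r}")

    parsed = urlparse(entry["url"])
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {entry['url']!r}")

    # verify_tls is optional and defaults to true.
    verify_tls = entry.get("verify_tls", True)
    if not isinstance(verify_tls, bool):
        raise ValueError("verify_tls must be a boolean when present")

    return {**entry, "verify_tls": verify_tls}
```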

Test Plan

  • Python unit tests: cd codebundles/mcp-tool-proxy && PYTHONPATH=. .venv/bin/pytest tests/ -v — 22 pass + 1 skip (JSON-RPC envelope parsing, SSE handling, tool output rendering, error envelopes, transport failures, JSON-Schema arg coercion)
  • Robot file parses: from robot.api import get_model; get_model('runbook.robot') clean
  • Generation rule validates against runwhen-local's generation-rule-schema.json
  • SLX + Runbook templates render against a synthetic mcp_tool resource with realistic tool descriptions (colons, newlines, embedded quotes) — asserts on tags, additionalContext, runtimeVarsProvided validation blocks, string-coerced defaults, (required) suffix on mandatory params, configProvided
  • Local dry-run (./.test/dry-run.sh) — stub MCP server + script round-trip; bypasses Robot since RW.Core ships only in the runner image (documented in .test/README.md)
  • End-to-end inside a real workspace-builder pod against https://mcp.test.runwhen.com/mcp: 37 MCP tools discovered, 37 SLX + Runbook pairs rendered and uploaded to PAPI

Out of scope (deferred follow-ups)

  • Production CA-bundle distribution to remove the temporary verify_tls: false escape hatch (tracked in RW-1146)

🤖 Generated with Claude Code

theyashl and others added 27 commits May 28, 2026 18:22
…P integration

Spec covers four approaches (A multi-task SLX, B in-VPC gateway, C SLX-per-server,
D SLX-per-tool) and recommends D. Plan scopes to the codecollection mcp-tool-proxy
codebundle + the mcp_tools indexer in runwhen-local; papi DB/API/UI work is a
separate plan. Defaults locked for §10 open decisions in the plan header.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…p path, split error policy

- Discovery: D1 → D2 (no papi work needed for v1; reads MCP_CONFIG setting from
  Helm-provided mcpConfig values, mirroring CLOUD_CONFIG_SETTING pattern).
- SLX template: additionalContext gets path/hierarchy = "mcp/{server}"; access
  tag flipped to read-only as safe default until we can classify tools.
- Error policy split: tools/call errors and result.isError surface as task
  output (rc=0) so agentfarm can read and react; transport + initialize errors
  fail the task (rc=1). Reflected in invoke_tool (returns string vs raises) and
  main() exit codes.
- Tests rewritten accordingly; Phase 4 papi HTTP fetch replaced with
  Helm-config parsing + validation; Phase 5 E2E drops papi mock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
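
The error policy split described in this commit could be sketched along these lines. It is an illustration under assumptions, not the actual `invoke_tool` / `main()` code: `TransportError`, `render_tool_response`, and `run` are hypothetical names, and the output wording is invented for the example.

```python
import sys


class TransportError(Exception):
    """Hypothetical: raised when the HTTP request or `initialize` fails."""


def render_tool_response(response: dict) -> str:
    """Turn a tools/call JSON-RPC response into task output text."""
    if "error" in response:
        err = response["error"]
        return f"MCP error {err.get('code')}: {err.get('message')}"
    result = response.get("result", {})
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        return f"Tool reported an error:\n{text}"
    return text


def run(call_tool) -> int:
    """Return the process exit code for one proxied tool call."""
    try:
        response = call_tool()
    except TransportError as exc:
        # Transport / initialize failures fail the task (rc=1).
        print(f"transport/initialize failure: {exc}", file=sys.stderr)
        return 1
    # tools/call errors and isError results are ordinary output (rc=0).
    print(render_tool_response(response))
    return 0
```

Returning a string for tool-level errors and raising only for transport problems lets the caller map "could not reach the tool" and "the tool said no" to different exit codes without inspecting message text.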
…list

SLX YAMLs generated by workspace-builder don't carry configProvided
(that lives on the Runbook for the runner to read at exec time).
additionalContext.hierarchy is a list of tag keys the UI walks to build
the tree view, not a slash-path string. Use [source, mcp_server] so MCP
SLXs group by source → server → tool, and surface tool_name as its own
tag so the rendered alias isn't the only place it shows up.
RuntimeVarEntry in corestate-operator api/v1/common_types.go only
declares name/default/description/validation. The Runbook CRD validation
will reject envelopes with extra fields, so 'required' and 'type'
(carried over from MCP's JSON Schema) had to come out. The Robot wrapper
still receives the full input_schema via MCP_INPUT_SCHEMA so per-tool
required-arg enforcement happens at MCP call time, not at Runbook level.
Maps MCP JSON Schema property metadata onto RuntimeVarValidation:
  - properties[x].enum    -> validation.type=enum,  values=[...]
  - properties[x].pattern -> validation.type=regex, pattern=...
  - neither               -> validation.type=regex, pattern='.*'

CRD constrains validation.type to {enum, regex} (corestate-operator
common_types.go:53-63), so the catch-all fallback is a permissive regex
rather than 'optional / nothing'. Every emitted runtime var now carries
a validation block, which matches the CRD's expectation in practice.
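
The three-way mapping above can be written as a small function. This is a sketch of the rule as stated in the commit message, with a hypothetical name (`validation_for`); the actual logic lives in the Jinja template.

```python
def validation_for(prop_schema: dict) -> dict:
    """Map one JSON Schema property onto a RuntimeVarValidation-shaped dict."""
    if "enum" in prop_schema:
        return {"type": "enum", "values": list(prop_schema["enum"])}
    if "pattern" in prop_schema:
        return {"type": "regex", "pattern": prop_schema["pattern"]}
    # The CRD only allows enum or regex, so the fallback is a permissive regex.
    return {"type": "regex", "pattern": ".*"}
```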
…hars

Live MCP servers return tool/property descriptions that may contain
colons, control characters (U+0080 seen on the RunWhen platform MCP
server), embedded quotes, and newlines — all of which break YAML when
emitted as `field: "<raw>"`. Our previous escape-only-quotes approach
caught quotes but missed everything else, leading to render errors:

  Unexpected error rendering mcp-tool-proxy-slx.yaml:
    mapping values are not allowed here
    unacceptable character #x0080: special characters are not allowed

Switching to the `| tojson` filter produces JSON-escaped strings which
are also valid YAML scalars — handles quotes, backslashes, newlines,
and control characters in one go.

Affected fields:
  - SLX template:    spec.alias, spec.statement
  - Runbook template: runtimeVarsProvided[].description / default /
                      validation.values / validation.pattern

Verified end-to-end against https://mcp.test.runwhen.com/mcp (37 tools
discovered; previous render-stage errors gone).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
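
To illustrate why JSON encoding is the more robust choice here, the sketch below compares the quote-only escape with `json.dumps`, which stands in for Jinja's `| tojson` (the Jinja filter is built on `json.dumps` and additionally escapes a few HTML-sensitive characters). The sample description is made up.

```python
import json

# A made-up description with a colon, quotes, a newline, and U+0080.
raw = 'Status: one of "open"\nor closed\u0080'

# Previous approach: escape only the double quotes.
naive = 'statement: "' + raw.replace('"', '\\"') + '"'

# JSON encoding escapes quotes, backslashes, newlines, and non-ASCII
# characters, and the result is also a valid double-quoted YAML scalar.
safe = "statement: " + json.dumps(raw)

print(naive)
print(safe)
```

The naive line still contains the raw newline and the raw U+0080 character, while the JSON-encoded line is a single line of ASCII that decodes back to the original text.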
Other codebundles' SLX templates use:

    labels:
      {% include "common-labels.yaml" %}

Ours had it on the same line:

    labels: {% include "common-labels.yaml" %}

which, when the include's first line starts with content, expands to
`labels:     slx: <name>` — YAML reads that as `labels` with a scalar
value that contains a colon, so the parser bails with "mapping values
are not allowed here" on every mcp-tool SLX.

Verified by capturing the raw rendered output in-pod: this line was
the actual breakage, not the alias/statement scalars I fixed in
54198cd. tojson is still the right thing for those — keeping that.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Explains where mcpConfig.servers lives in Helm values and the
workspaceinfo ConfigMap, what fields each server entry takes,
the expected k8s Secret shape for bearer tokens, and links to
the canonical workspaceInfo docs in runwhen-local.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PAPI groups SLXs by resourcePath (the qualified_name minus the leaf
component). With hierarchy: [source, mcp_server] every tool on a given
server collapses to the same resourcePath (e.g. mcp/linear-mcp), which
hits a per-resource cap of 10 SLXs on the platform side — meaning
servers with >10 tools silently lose the surplus on upload.

Adding mcp_tool as a third hierarchy level keeps the qualified_name the
same (mcp/{server}/{tool}) but makes each tool its own resourcePath, so
the per-resource cap no longer applies.

Verified end-to-end against a linear MCP server with 41 tools: with
2-level hierarchy 10/41 stored, with 3-level all 41 stored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ate defaults

PAPI's taskiq worker clones rw-generic-codecollection at whichever ref
the uploaded Runbook YAML specifies in `codeBundle.ref` to attach a
runbook (and its tasks) to each SLX. With a hardcoded `ref: main`, that
clone fails for any environment where the codebundle has not yet landed
on main:

  PathNotFoundError: path codebundles/mcp-tool-proxy not found in local
  clone /tmp/rw_upload_*/...rw-generic-codecollection_main

The error fires inside the runbook post-sync hook *after* the SLX row is
committed, so SLXs end up persisted with `runbook: null` and the entire
batch task aborts before remaining SLXs are processed.

Templates now read `match_resource.spec.codecollection_ref` (threaded
from the mcpConfig server entry by the indexer; defaults to "main") so a
workspace can point at a branch / tag while a change is in review.

Also switches Python-style `or ""` fallbacks to Jinja's `| default("")`
filter for `pschema.description`, `pschema.default`, and SLX `statement`
to tolerate MCP tools whose property schemas don't carry those keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The workspace-builder generation-rules engine populates `{{ ref }}`
with the codecollection ref the template was loaded from
(generation_rules.py:643). Using it for codeBundle.ref keeps the runner-
side clone pinned to whatever codecollection ref the workspace builder
was already pointed at via codeCollections — no extra knob needed on
mcpConfig.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t resourcePath

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ k8s-prefixed secret

- SLX qualified_name now stops at server level (mcp/<server>); drop tool
  from hierarchy so PAPI's resourcePath shows the parent path, not the leaf.
- Drop the "MCP: " alias prefix; alias is just "<server> / <tool>".
- Runbook task name uses ${MCP_TOOL_NAME} so reports show the actual tool.
- Runbook secret read uses the standard k8s:file@secret/<name>:token prefix
  (matches kubernetes-auth.yaml / azure-auth.yaml convention). Workspace-vault
  support tracked separately in RW-1150.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
theyashl and others added 6 commits May 28, 2026 18:22
…rchy uses resource_name

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…erver → mcp_tool)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the workspace-builder indexer escape hatch. New MCP_VERIFY_TLS
configProvided var (defaults to true) flows from the indexer's verify_tls
field → Runbook configProvided → Robot env → mcp_tool_proxy.py. Sets
session.verify and passes verify= per-request (REQUESTS_CA_BUNDLE
otherwise overrides it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re tools/call

Robot's `Import User Variable` always returns a string, but MCP servers
schema-check the JSON-RPC payload — sending `"true"` (string) where the
schema says boolean fails with `invalid_type` (e.g. linear's list_teams
rejecting `includeArchived: "true"`).

The proxy now parses MCP_INPUT_SCHEMA (already passed as configProvided)
and casts each arg to the declared JSON-Schema type: boolean / integer
/ number / array / object / string. Unknown types and coercion failures
pass through so the MCP server can surface its own validation error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
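
A minimal sketch of the coercion described in this commit, assuming a flat `properties` map and string inputs. The function names `coerce_arg` and `coerce_args` are hypothetical, and details such as which spellings count as boolean are assumptions for the example.

```python
import json


def coerce_arg(value: str, json_type):
    """Cast a string arg to its declared JSON-Schema type.

    Unknown types and failed coercions return the original string so the
    MCP server can surface its own validation error.
    """
    try:
        if json_type == "boolean":
            lowered = value.strip().lower()
            if lowered in ("true", "false"):
                return lowered == "true"
            return value
        if json_type == "integer":
            return int(value)
        if json_type == "number":
            return float(value)
        if json_type in ("array", "object"):
            parsed = json.loads(value)
            expected = list if json_type == "array" else dict
            return parsed if isinstance(parsed, expected) else value
    except (ValueError, TypeError):
        # json.JSONDecodeError is a ValueError subclass.
        return value
    return value


def coerce_args(args: dict, input_schema: dict) -> dict:
    """Coerce every arg according to input_schema['properties']."""
    props = input_schema.get("properties", {})
    return {
        name: coerce_arg(value, props.get(name, {}).get("type", "string"))
        for name, value in args.items()
    }
```

For example, `coerce_args({"includeArchived": "true"}, schema)` with a boolean `includeArchived` property yields a real `True`, which is the shape the Linear `list_teams` case above needs.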
…fault read-write

Renders the SLX `access` tag from `match_resource.spec.access` so the
workspace-builder indexer can classify each MCP tool independently (via
`readOnlyHint` + tool-name verb heuristic). Defaults to `read-write`
when the spec field is absent — safer to over-mark write capability than
to silently flag a write tool as read-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lx, task title prefixed with server

- SLX alias: server/tool joined with " - " instead of " / " so the platform
  UI shows e.g. "linear-mcp - list_teams" without path-like separators
  inside the alias text.
- SLX additionalContext.resourcePath: "mcp/<server>" — explicit field
  alongside qualified_name for downstream consumers that key off
  resourcePath rather than qualified_name.
- Robot task title now "<server>_<tool>" instead of just "<tool>", so
  tasks from different MCP servers don't collide on identically named
  tools (e.g. multiple servers exposing "list_projects"). Plumbed
  MCP_SERVER_DISPLAY_NAME through configProvided + Suite Initialization
  so the variable resolves before the task name binds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@theyashl theyashl force-pushed the feat/private-mcp-integration branch from a622144 to 8ac5413 Compare May 28, 2026 12:56
theyashl and others added 3 commits May 29, 2026 18:50
Robot Framework runtime vars are always strings. If the MCP tool's input
schema has a numeric/bool/list/dict default, the previous template let
YAML parse it as that native type, which the runner rejects on type
mismatch. Coerce non-string defaults through tojson first so YAML sees
a string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
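
The effect of coercing through `tojson` first can be sketched in plain Python, with `json.dumps` standing in for the Jinja filter. The helper name `render_default` is hypothetical; the real logic is in the Runbook template.

```python
import json


def render_default(default) -> str:
    """Render a schema default as a quoted scalar that YAML reads as a string."""
    # Non-string defaults are JSON-encoded first, so 42 becomes the text "42".
    as_text = default if isinstance(default, str) else json.dumps(default)
    # Encoding the text again yields a double-quoted, YAML-safe scalar.
    return json.dumps(as_text)


for value in (42, True, [1, 2, 3], "open"):
    print(f"default: {render_default(value)}")
```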
…n description

JSON Schema lists required fields at the schema's top level, not as a
per-property flag. Read that array and append "(required)" to the
description of any parameter listed there so downstream UI/agent surfaces
know which inputs are mandatory. If the parameter has no description,
fall back to "Required parameter." instead of an awkward "(required)" alone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
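
The rule in this commit amounts to a few lines. The sketch below uses a hypothetical helper name (`describe_param`) and an illustrative schema; the actual implementation is in the template.

```python
def describe_param(name: str, input_schema: dict) -> str:
    """Return the rendered description for one parameter."""
    prop = input_schema.get("properties", {}).get(name, {})
    description = (prop.get("description") or "").strip()
    # `required` is a top-level array of names, not a per-property flag.
    if name not in input_schema.get("required", []):
        return description
    return f"{description} (required)" if description else "Required parameter."
```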
Robot's `'''${schema_json}'''` interpolates the value as Python source
text, so a `\"` inside any MCP tool description (e.g. Linear's
list_issues description containing 'or "me"') gets re-interpreted by
Python's string-literal parser and the JSON is corrupted before
json.loads ever runs. Reproduced via runner log:

  JSONDecodeError: Expecting ',' delimiter: line 1 column 206 (char 205)

Switch to `$schema_json` / `$tool_args` (no curly braces), which binds
the value to the expression's namespace as a Python object — no source
substitution, no escape re-parsing. Same fix applied to the FOR-loop
conditional (Run Keyword If → IF) so it doesn't break on values that
happen to contain triple quotes or backslashes either.

Adds a regression test file that pins both halves: a failing fixture
reproducing the original crash with the Linear schema, a passing
fixture using the object-pass form, and a static guard that fails the
build if any executable line ever reintroduces `'''${var}'''`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
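
The failure mode can be reproduced outside Robot. In the sketch below, plain `eval` stands in for Robot's expression evaluation, and the schema is a made-up one-property example rather than the real Linear schema.

```python
import json

# JSON text containing an escaped quote: {"description": "Assignee name, or \"me\""}
schema_json = json.dumps({"description": 'Assignee name, or "me"'})

# Source-text substitution, as with '''${schema_json}''': the value is pasted
# into Python source, so the string-literal parser turns \" into a bare quote
# and the JSON is corrupted before json.loads sees it.
source = "json.loads('''" + schema_json + "''')"
try:
    eval(source, {"json": json})
    corrupted = False
except json.JSONDecodeError as exc:
    corrupted = True
    print(f"substitution form failed: {exc}")

# Object-pass form, as with $schema_json: the value is bound as a variable in
# the evaluation namespace, so no escape sequences are re-parsed.
parsed = eval("json.loads(schema_json)", {"json": json, "schema_json": schema_json})
print(parsed["description"])
```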