fix(tools): stop workspace guard misreading scheme-less URLs #2965

Open
maxmilian wants to merge 2 commits into sipeed:main from maxmilian:fix/exec-guard-bare-domain-falsepos
Conversation

@maxmilian
Description

When restrict_to_workspace is enabled, the exec tool's guardCommand
extracted every /...-led run from the command and treated each as an absolute
path. A scheme-less URL — which curl accepts without an explicit http://
prefix, e.g. curl -s "wttr.in/Beijing?T" — had its /Beijing?T component
matched, resolved with filepath.Abs, and filepath.Rel(cwd, …) turned it into
../../Beijing?T, tripping the .. "path outside working dir" check. Reporters
note this makes most curl-based skills unusable.
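
The false positive described above can be reproduced outside the tool. The following is a minimal sketch, not the real guardCommand: the function name flaggedMatches and the working directory /srv/workspace are hypothetical, and only the regular expression is taken from this PR's Technical Context. It assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// Pattern quoted in the PR's Technical Context: a "/" followed by a run of
// characters that are not whitespace or quotes.
var absolutePathPattern = regexp.MustCompile(`/[^\s"']+`)

// flaggedMatches is a hypothetical, simplified stand-in for the pre-fix guard.
// It returns the cwd-relative form of every "/..." match that lands outside cwd.
func flaggedMatches(cmd, cwd string) []string {
	var flagged []string
	for _, m := range absolutePathPattern.FindAllString(cmd, -1) {
		abs, err := filepath.Abs(m)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(cwd, abs)
		if err != nil {
			continue
		}
		// A relative form that starts with ".." points outside cwd.
		if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			flagged = append(flagged, rel)
		}
	}
	return flagged
}

func main() {
	// The URL's path component is matched and treated as an absolute path.
	fmt.Println(flaggedMatches(`curl -s "wttr.in/Beijing?T"`, "/srv/workspace"))
}
```

With a working directory two levels deep, the match /Beijing?T becomes ../../Beijing?T, which is the form that trips the ".." check.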

Fix

Skip a /path match only when the run of characters immediately before the
slash parses as a hostname: host-legal bytes [A-Za-z0-9.-], not starting with
"-", containing a ".", and not starting or ending with ".".

Because a skipped match is glued to its host with no separator, the shell
treats the whole word as a single URL/relative token — never a standalone
absolute path. The skip therefore cannot whitelist a real escape:

| Command | Left token | Skipped? | Result |
| --- | --- | --- | --- |
| curl wttr.in/Beijing | wttr.in | ✅ host | allowed |
| curl 192.168.1.1/status | 192.168.1.1 | ✅ host | allowed |
| tar -xC/etc/cron.d … | -xC | no (starts with -) | blocked |
| cat $x/etc/passwd | x | no (no dot) | blocked |
| java -cp app.jar:/etc/passwd | empty | no (walk stops at :) | blocked |
| scp host:/etc/passwd | empty | no (walk stops at :) | blocked |
| cat /etc/passwd | empty | no (walk stops at space) | blocked |

.. traversal is still caught by the existing pre-loop check; the existing
web-scheme (//host after http:/https:/…) skip is unchanged.
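
The hostname check described in this section could look roughly like the sketch below. The name hostLikeBeforeSlash comes from this PR, but the body is an assumption reconstructed from the rules stated above, not the code in pkg/tools/shell.go; firstSlashSkipped is a hypothetical helper that only inspects the first slash in a command.

```go
package main

import (
	"fmt"
	"strings"
)

// isHostByte reports whether b is one of the host-legal bytes [A-Za-z0-9.-].
// Note that ':' is deliberately not included.
func isHostByte(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	}
	return b == '.' || b == '-'
}

// hostLikeBeforeSlash walks left from s[slash] over host-legal bytes and
// applies the stated rules: non-empty, no leading '-', contains a '.',
// and no leading or trailing '.'. Assumed shape, not the PR's actual code.
func hostLikeBeforeSlash(s string, slash int) bool {
	start := slash
	for start > 0 && isHostByte(s[start-1]) {
		start--
	}
	host := s[start:slash]
	if host == "" || host[0] == '-' || host[0] == '.' || host[len(host)-1] == '.' {
		return false
	}
	return strings.Contains(host, ".")
}

// firstSlashSkipped is a hypothetical helper: it reports whether the first
// "/" in cmd would be skipped as the path component of a scheme-less URL.
func firstSlashSkipped(cmd string) bool {
	i := strings.IndexByte(cmd, '/')
	return i >= 0 && hostLikeBeforeSlash(cmd, i)
}

func main() {
	for _, cmd := range []string{
		"curl wttr.in/Beijing",
		"tar -xC/etc/cron.d",
		"java -cp app.jar:/etc/passwd",
	} {
		fmt.Printf("%-32s skipped=%v\n", cmd, firstSlashSkipped(cmd))
	}
}
```

Because the check looks only at the characters left of the slash, a sketch like this also accepts any dotted relative path such as foo.bar/etc/hosts, which is the case raised in the review comments.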

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

AI Code Generation

  • 🤖 Fully AI-generated (AI-written, human-reviewed + validated)

The fix approach was human-selected and the security boundary was hardened
across three independent adversarial review passes — two of which found and
closed sandbox-escape classes (flag-glued -C paths, and : path-lists) before
this PR was opened.

Related Issue

Fixes #1042

Technical Context

  • Root cause: pkg/tools/shell.go absolutePathPattern (/[^\s"']+) matches a
    URL's host-relative path component; the prior exemption only covered // after
    a web scheme, not scheme-less hosts.
  • New helper hostLikeBeforeSlash gates the skip on a hostname check.
  • : is deliberately excluded from the host-character walk so classpath /
    remote-spec path-lists (java -cp a.jar:/abs, scp host:/abs) are not mistaken
    for host:port and stay blocked.

Test Environment

  • OS: macOS (Darwin) · Go 1.25
  • go test ./pkg/tools/: new TestShellTool_SchemelessURLGuard passes (5
    scheme-less URLs allowed + 5 escape regressions blocked); existing
    TestShellTool_URLsNotBlocked / URLBypassPrevented / PathTraversalVariants
    / RestrictToWorkspace / SafePaths all pass.
  • TestShellTool_DevNullAllowed and TestShellTool_FileURISandboxing fail
    locally only due to the macOS /tmp -> /private/tmp symlink (pre-existing on
    main, unrelated to this change); they pass on Linux CI.

Checklist

  • I have read and tested the AI-generated code
  • gofmt-clean and go vet passes
  • Added test coverage (both directions, incl. escape regressions)
  • No new shell-injection / path-traversal surface

…1042)

When restrict_to_workspace is enabled, guardCommand extracted any
"/..."-led run from the command and treated it as an absolute path. A
scheme-less URL such as `curl -s "wttr.in/Beijing?T"` (curl accepts URLs
without an explicit http:// prefix) had its "/Beijing?T" component
matched, resolved to an absolute path, and blocked as a workspace escape
-- making most curl-based skills unusable.

Skip a "/path" match only when the run of characters immediately before
the slash parses as a hostname (host-legal bytes [A-Za-z0-9.-], not
starting with "-", containing a "."). Because a skipped match is glued
to its host with no separator, the shell treats it as a single
URL/relative token, never a standalone absolute path. Flag-glued paths
("tar -xC/etc/cron.d"), variable-prefixed paths ("cat $x/etc/passwd")
and colon path-lists ("java -cp app.jar:/etc/passwd", "scp
host:/etc/passwd") are not host-like and stay blocked; ".." traversal is
caught earlier.

Add TestShellTool_SchemelessURLGuard covering both directions, including
the flag/var/colon escape regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maxmilian maxmilian marked this pull request as ready for review May 29, 2026 01:02
@afjcjsbx
Collaborator

afjcjsbx commented May 29, 2026

Hi @maxmilian , thanks for the PR,

The changes in pkg/tools/shell.go introduce a new sandbox bypass for dotted relative paths. hostLikeBeforeSlash() treats any foo.bar/... token as URL-like and skips the existing filepath.Abs / EvalSymlinks / Rel checks entirely.

In practice, in a workspace that contains a versioned symlink like foo.bar -> /, running cat foo.bar/etc/hosts now bypasses the guard and reads outside the workspace, whereas it was blocked prior to this PR.

We need to either properly identify a true URL token (instead of just matching any prefix with a dot) or re-run the validation on the full token before triggering the continue statement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@afjcjsbx
Collaborator

hostLikeBeforeSlash() now treats any foo.bar/... token as a scheme-less URL and skips the path check before EvalSymlinks(). That means a relative path inside the workspace can evade the guard if it contains a dot and resolves through a symlink outside the workspace.

Repro:

  1. Create safe.example -> /tmp/secret inside the workspace
  2. Run cat safe.example/secret.txt

On main this is blocked by the workspace guard; on this PR it succeeds and reads the file, because safe.example/secret.txt is classified as a URL-like token.

Could we tighten the heuristic and add a regression test for a dotted symlink/path case? Right now this looks like a security regression, not just an edge case.
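
One way to read the "re-run the validation on the full token" suggestion is sketched below. Every name here (tokenEscapesWorkspace, runRepro) is hypothetical and none of it is taken from the PR; the idea is to resolve the whole token against the workspace, follow symlinks on its longest existing prefix, and only allow the URL skip when that prefix stays inside the workspace.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// tokenEscapesWorkspace reports whether token, read as a path relative to
// workspace, resolves through symlinks to a location outside workspace.
// A leaf that does not exist is handled by checking its nearest existing parent.
func tokenEscapesWorkspace(workspace, token string) bool {
	root, err := filepath.EvalSymlinks(workspace)
	if err != nil {
		return true // fail closed if the workspace cannot be resolved
	}
	p := token
	if !filepath.IsAbs(p) {
		p = filepath.Join(root, p)
	}
	for {
		resolved, err := filepath.EvalSymlinks(p)
		if err == nil {
			rel, err := filepath.Rel(root, resolved)
			if err != nil {
				return true
			}
			return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		}
		parent := filepath.Dir(p)
		if parent == p {
			return true // nothing resolvable up to the filesystem root
		}
		p = parent
	}
}

// runRepro rebuilds the reviewer's scenario in a temporary directory:
// a workspace containing the symlink safe.example -> a directory outside it.
func runRepro() (map[string]bool, error) {
	base, err := os.MkdirTemp("", "guard-repro")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(base)
	ws, secret := filepath.Join(base, "workspace"), filepath.Join(base, "secret")
	for _, d := range []string{ws, secret} {
		if err := os.Mkdir(d, 0o755); err != nil {
			return nil, err
		}
	}
	if err := os.WriteFile(filepath.Join(secret, "secret.txt"), []byte("x"), 0o600); err != nil {
		return nil, err
	}
	if err := os.Symlink(secret, filepath.Join(ws, "safe.example")); err != nil {
		return nil, err
	}
	results := map[string]bool{}
	for _, tok := range []string{"safe.example/secret.txt", "safe.example/new.txt", "wttr.in/Beijing"} {
		results[tok] = tokenEscapesWorkspace(ws, tok)
	}
	return results, nil
}

func main() {
	results, err := runRepro()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	for tok, escapes := range results {
		fmt.Printf("%-26s escapes=%v\n", tok, escapes)
	}
}
```

Under this approach a token such as wttr.in/Beijing, which matches nothing on disk, still qualifies for the skip, while the dotted symlink case is reported as an escape. A regression test along the lines of runRepro would cover the case requested above.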

Development

Successfully merging this pull request may close these issues.

[BUG] Problem with the exec tool's guardCommand method

2 participants