Skip to content

Host-side API server experiment#28

Open
maruiz93 wants to merge 4 commits into
mainfrom
experiment/host-side-api-server
Open

Host-side API server experiment#28
maruiz93 wants to merge 4 commits into
mainfrom
experiment/host-side-api-server

Conversation

@maruiz93
Copy link
Copy Markdown
Contributor

Summary

Closes #25

  • Add design spec and implementation plan for testing host-side API servers callable from inside OpenShell sandboxes through the L7 proxy
  • Implement builder (Go) and provisioner (Python) API servers with uniform process contract (--port, --token, /healthz, /tools.json, SIGTERM), openshell sandbox download/upload for file transfer, and fullsend run integration with 6 harness configs (3 discovery methods × 2 network policies)
  • Run all 6 harnesses and document findings: canonical host.openshell.internal hostname, provider-based bearer token auth, L7 policy structure, orchestrator lifecycle, bugs found, token usage analysis, and recommendation for /tools.json as the standard API discovery format

Test plan

  • All 6 harnesses executed (baked-instructions-full/restricted, openapi-discovery-full/restricted, tooluse-discovery-full/restricted)
  • All full-access runs pass (clone + build + image upload)
  • Restricted runs correctly show 403s for blocked endpoints
  • JSONL transcripts and replay HTML generated for each run
  • Findings document covers architecture, bugs, and recommendations

🤖 Generated with Claude Code

@maruiz93 maruiz93 requested a review from a team as a code owner May 20, 2026 03:31
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 9197ac7 to 038e446 Compare May 20, 2026 03:36
@fullsend-ai-review
Copy link
Copy Markdown

fullsend-ai-review Bot commented May 20, 2026

Review

Findings

Critical

  • [command-injection] host-side-api-server/servers/builder/main.go:155,164,193,202,234 — User-controlled request fields (Tag, Dockerfile, ContextDir, Dest, Sandbox) are passed unsanitized as arguments to exec.Command calls for podman and openshell CLIs. While exec.Command avoids shell metacharacter injection, path traversal is possible: req.Dockerfile containing ../../etc/shadow would resolve outside the temp context directory when concatenated at line 164 (localContext + "/" + req.Dockerfile), and req.Tag passed directly to podman push (line 234) as a positional argument could be interpreted as a flag (e.g., --tls-verify=false). The agent inside the sandbox controls these inputs and could be compromised via prompt injection from malicious repository content.
    Remediation: Validate all request fields — Tag against a strict image-tag regex rejecting leading -; Dockerfile via filepath.Base() rejecting path separators and ..; ContextDir and Dest via filepath.Clean() rejecting .. components; Sandbox against an alphanumeric pattern.

High

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:162,165 — The repo parameter is validated only with not repo or "/" not in repo, allowing values with path traversal sequences or extra path segments. The value is interpolated into https://x-access-token:{github_token}@github.com/{repo}.git, so a crafted repo could redirect the clone to an unexpected GitHub repository, leaking the embedded github_token.
    Remediation: Validate repo against ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:170 — The ref parameter is passed directly to git clone --branch {ref} with no validation. While --branch consumes the next argument as its value, the lack of any validation is a defense-in-depth failure.
    Remediation: Validate ref against ^[a-zA-Z0-9_./-]+$ and reject values starting with -. Alternatively, add -- before the clone URL to terminate option parsing.

  • [command-injection] host-side-api-server/servers/repo-provisioner/server.py:178-181 — The sandbox and dest parameters are passed directly to subprocess.run(["openshell", "sandbox", "upload", sb, local_path, dest]) with no validation. A sandbox value starting with -- could be interpreted as a flag; dest could contain path traversal sequences.
    Remediation: Validate sandbox against an alphanumeric-plus-hyphens pattern. Validate dest as a relative path with no .. components. Add -- before positional arguments.

Medium

  • [auth-weakness] host-side-api-server/servers/builder/main.go:69 — Bearer token comparison uses != (non-constant-time). The code comments acknowledge this should use crypto/subtle.ConstantTimeCompare.
    Remediation: Apply the fix noted in the comment.

  • [auth-weakness] host-side-api-server/servers/repo-provisioner/server.py:293 — Bearer token comparison uses != instead of hmac.compare_digest(). Same issue as the Go server.
    Remediation: Use hmac.compare_digest(parts[1], bearer_token).

  • [credential-exposure] host-side-api-server/servers/repo-provisioner/server.py:255 — When clone_repo raises CalledProcessError, the log() call logs str(exc) directly, which includes the full command with the embedded GitHub token in the clone URL. The _scrub_credentials() function is only applied to the result dict, not the log output.
    Remediation: Apply _scrub_credentials(str(exc)) in the log call.

  • [json-injection] host-side-api-server/servers/builder/main.go:107,270 — Error responses use fmt.Sprintf to interpolate error messages into JSON string templates. If error messages contain quotes or backslashes, the resulting JSON is malformed.
    Remediation: Use json.Marshal or json.NewEncoder for all JSON responses including errors.

  • [logic-error] host-side-api-server/servers/builder/main.go:138-205buildHandler returns HTTP 200 on all failure paths (download failure, build error, save failure, upload failure). Clients must inspect the JSON body to detect failures.
    Remediation: Return appropriate HTTP error status codes (500 for build failures, 400 for validation errors).

  • [credential-exposure] host-side-api-server/harness/*.yaml:19 — All six harness files copy the GCP service account key file into the sandbox via host_files, undermining the credential isolation goal stated in the experiment design.
    Remediation: Document this as a known limitation in findings.md, or use an OpenShell provider for GCP credentials.

Low

  • [auth-gap] host-side-api-server/servers/builder/main.go:511-512/openapi.json and /tools.json endpoints are unauthenticated. Likely intentional for tool discovery; add a code comment documenting the decision.

  • [error-handling] host-side-api-server/servers/repo-provisioner/server.py:310Content-Length parsing does not handle non-numeric values, resulting in an unhandled ValueError.

Previous run

Review

Outcome: comment · 4 high · 5 medium · 3 low · 5 info

This PR adds a well-structured experiment testing host-side API servers callable from inside OpenShell sandboxes. The experiment design is solid, the shell orchestration is clean (set -euo pipefail, proper traps), and the findings document captures useful architectural insights. The code below surfaces security patterns worth addressing — especially relevant since this experiment specifically tests trust boundaries between sandboxed agents and host services.


High

H1 · Argument injection via unvalidated request fields — main.go

req.Tag, req.Dockerfile, req.ContextDir, and req.Dest are passed directly to exec.Command arguments for podman build, podman push, and openshell sandbox download/upload. While Go's exec.Command avoids shell interpretation, crafted values can still inject flags (e.g., Tag = "--build-arg=SECRET=..." or a tag pointing to an attacker-controlled registry). Dockerfile = "../../etc/passwd" resolves outside the build context.

Remediation: Validate Tag against a strict image-reference regex, reject Dockerfile/ContextDir/Dest containing .. or leading /, and validate Sandbox against ^[a-zA-Z0-9_-]+$.

H2 · Argument injection and weak validation in Python provisioner — server.py

The repo field is validated only with "/" in repo, allowing values like ../../etc/passwd. The ref parameter has no validation — a value like --upload-pack=/bin/malicious is passed as a git argument. The dest and sandbox fields are similarly unvalidated.

Remediation: Validate repo with ^[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+$, validate ref with ^[a-zA-Z0-9._/-]+$, and sanitize dest/sandbox as in H1.

H3 · GitHub token leaked in error messages — server.py

When clone_repo raises CalledProcessError, exc.stderr or str(exc) includes the full clone URL with the embedded x-access-token:{github_token}@github.com credential. This is returned in the HTTP response body, making the token visible to any authenticated caller (including the sandboxed agent).

Remediation: Scrub the clone URL from error messages before returning them, or return a generic error and log the details server-side only.

H4 · GCP service account key copied into sandbox — harness/*.yaml

All six harness configs copy ${GOOGLE_APPLICATION_CREDENTIALS} into the sandbox at /tmp/workspace/.gcp-credentials.json. This contradicts the design principle of credential isolation (servers hold credentials, agents never see them). A compromised agent can read the GCP key and exfiltrate it.

Remediation: Consider a host-side proxy for Vertex AI calls (similar to how the API servers proxy other capabilities), or document this as an accepted trade-off for the experiment.


Medium

M1 · Bearer token comparison is not timing-safe — main.go, server.py

Both servers use != for token comparison. Go should use crypto/subtle.ConstantTimeCompare(); Python should use hmac.compare_digest(). In this experiment context (short-lived tokens, non-internet-facing), exploitability is low, but it's a bad pattern to carry forward.

M2 · Servers bind to all interfaces — main.go, server.py

Both servers listen on 0.0.0.0. On a shared host, other users could probe these APIs. Should bind to 127.0.0.1 since only local sandbox traffic (via L7 proxy) needs access.

M3 · Bearer token printed to stdout — run.sh

echo "Token: $API_TOKEN" logs the per-run bearer token. If captured in CI logs, this persists the credential beyond the run lifetime.

M4 · No request body size limits — main.go, server.py

Neither server caps request body size. http.MaxBytesReader (Go) or a Content-Length check (Python) would prevent memory exhaustion from oversized payloads.

M5 · Error responses leak internal paths — main.go

Raw stderr from podman/openshell commands is included in JSON error responses, revealing host filesystem structure and tool versions.


Low

L1 · operations dict grows without bound — server.py

The in-memory operations store is never pruned. Acceptable for a short-lived experiment but would leak memory in longer runs.

L2 · healthz endpoints are shallow — main.go, server.py

Both /healthz handlers return {"status": "ok"} without checking that dependencies (podman, openshell) are available.

L3 · Design doc references stale hostname — 2026-05-14-host-side-api-server-design.md

The design doc mentions host.docker.internal in two places; the implementation and findings use host.openshell.internal. Should be updated for consistency.


Info

  • No hardcoded secrets: No real tokens, keys, or credentials are committed. Variables use proper ${} expansion. Result files contain no sensitive data.
  • SKILL.md files are clean: No prompt injection patterns found in agent instructions. The injection patterns appearing in result files are scan findings from the target repo, not this PR's own content.
  • Shell scripting quality is good: set -euo pipefail, proper quoting, cleanup traps, and structured logging throughout run.sh and setup.sh.
  • Documentation is accurate: README, HOW_TO, and findings align with the implemented code (except the hostname noted in L3).
  • PR body claims match the diff: 6 harness configs, 3 discovery methods × 2 network policies, all result artifacts present. The claim of "no behavior changes" is not made — the PR accurately describes new functionality.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label May 20, 2026
maruiz93 added a commit that referenced this pull request May 29, 2026
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93
Copy link
Copy Markdown
Contributor Author

Addressed in 4fbee55:

Finding Action
H1, H2 (input validation) Skipped — experiment PoC, not production code
H3 (token in error messages) Fixed — added _scrub_credentials() to strip tokens from error output
H4 (GCP key in sandbox) Skipped — accepted trade-off, documented as Tier 4 in ADR-0025
M1 (timing-safe comparison) Added comments noting production should use constant_time_compare/compare_digest
M2 (bind address) Fixed — both servers now bind to 127.0.0.1
M3 (token in stdout) Fixed — removed echo "Token: $API_TOKEN" from run.sh
M4 (body size limits) Fixed — added MaxBytesReader (Go) and Content-Length check (Python), 1 MB cap
M5 (path leakage) Skipped — acceptable for experiment
L1 (unbounded operations dict) Added comment acknowledging, acceptable for experiment scope
L2 (shallow healthz) Skipped — experiment scope
L3 (stale hostname) Fixedhost.docker.internalhost.openshell.internal in design + plan docs

maruiz93 and others added 4 commits May 29, 2026 13:57
Add design spec exploring how sandboxed agents can call host-side API
servers through the L7 network proxy, and implementation plan for
integrating the experiment with fullsend run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
Add builder (Go) and provisioner (Python) API servers with uniform
process contract: --port, --token, /healthz, /tools.json, SIGTERM
handling. Both use openshell sandbox download/upload for host-sandbox
file transfer with per-request sandbox identification.

Add fullsend run integration: 6 harness configs (3 discovery methods
x 2 network policies), skills for baked-instructions/openapi/tooluse
discovery, OpenShell provider for bearer token auth via credential
placeholders, L7 policies with host.openshell.internal hostname
matching, and run.sh wrapper managing server lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
Add JSONL transcripts, replay HTML, and agent output for all 6 harness
runs. Add findings.md covering architecture validated (canonical
hostname, server contract, provider auth, file transfer, network
policies, orchestrator lifecycle), bugs found, token usage analysis,
and recommendations for the host-side API server pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
- Scrub credentials from error messages in provisioner (H3)
- Add timing-safe comparison comments in both servers (M1)
- Bind servers to 127.0.0.1 instead of 0.0.0.0 (M2)
- Remove bearer token from stdout in run.sh (M3)
- Add request body size limits in both servers (M4)
- Add comment about unbounded operations dict (L1)
- Fix stale host.docker.internal references in design docs (L3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Marta Anon <manon@redhat.com>
@maruiz93 maruiz93 force-pushed the experiment/host-side-api-server branch from 4fbee55 to 0df51be Compare May 29, 2026 11:57
Copy link
Copy Markdown

@fullsend-ai-review fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.


localContext := tmpDir + "/context"
log.Printf("Downloading build context from sandbox %s:%s", sb, req.ContextDir)
dlCmd := exec.Command("openshell", "sandbox", "download", sb, req.ContextDir, localContext)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[critical] command-injection

User-controlled request fields (Tag, Dockerfile, ContextDir, Dest, Sandbox) are passed unsanitized as arguments to exec.Command calls for podman and openshell CLIs. Path traversal via req.Dockerfile (line 164) can escape the temp context directory; req.Tag passed to podman push (line 234) as a positional argument could be interpreted as a CLI flag.

Suggested fix: Validate all request fields: Tag against a strict image-tag regex rejecting leading dashes; Dockerfile via filepath.Base() rejecting path separators and ..; ContextDir and Dest via filepath.Clean() rejecting .. components; Sandbox against an alphanumeric pattern.



def clone_repo(repo: str, ref: str, dest: str) -> None:
"""Clone a GitHub repo into dest, checking out ref."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[high] command-injection

The repo parameter is validated only with 'not repo or / not in repo', allowing path traversal and extra path segments. The value is interpolated into the clone URL with an embedded github_token, so a crafted repo could leak the token to an unexpected repository.

Suggested fix: Validate repo against ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$.

clone_url = f"https://github.com/{repo}.git"

subprocess.run(
["git", "clone", "--depth", "1", "--branch", ref, clone_url, dest],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[high] command-injection

The ref parameter is passed directly to git clone --branch with no validation. While --branch consumes the next argument, the lack of validation is a defense-in-depth failure.

Suggested fix: Validate ref against ^[a-zA-Z0-9_./-]+$ and reject values starting with -. Add -- before the clone URL.



def upload_to_sandbox_named(sb: str, local_path: str, dest: str) -> None:
"""Upload files into the sandbox via openshell sandbox upload."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[high] command-injection

The sandbox and dest parameters are passed directly to subprocess.run for openshell sandbox upload with no validation. sandbox starting with -- could be interpreted as a flag; dest could contain path traversal sequences.

Suggested fix: Validate sandbox against alphanumeric-plus-hyphens. Validate dest as a relative path with no .. components. Add -- before positional arguments.

}

// NOTE: production code should use crypto/subtle.ConstantTimeCompare
if parts[1] != bearerToken {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] auth-weakness

Bearer token comparison uses != (non-constant-time). Timing side-channel could leak token information.

Suggested fix: Use crypto/subtle.ConstantTimeCompare as noted in the existing code comment.

with operations_lock:
operations[op_id] = result
log(f"Failed to provision {repo}: {exc}")
return result
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] credential-exposure

CalledProcessError log message includes the full command with embedded GitHub token in the clone URL. _scrub_credentials() is only applied to the result dict, not the log output.

Suggested fix: Apply _scrub_credentials(str(exc)) in the log call.

r.Body = http.MaxBytesReader(w, r.Body, maxRequestBodySize)
var req BuildRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, fmt.Sprintf(`{"error":"invalid JSON: %s"}`, err), http.StatusBadRequest)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] json-injection

Error responses use fmt.Sprintf to interpolate into JSON string templates. Error messages containing quotes or backslashes produce malformed JSON.

Suggested fix: Use json.Marshal or json.NewEncoder for all JSON responses including errors.


// Download build context from sandbox
if sb == "" {
resp.Status = "failed"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] logic-error

buildHandler returns HTTP 200 on all failure paths. Clients must inspect the JSON body to detect failures, violating REST conventions.

Suggested fix: Return appropriate HTTP error status codes (500 for build failures, 400 for validation errors).

dest: /tmp/workspace/.env.d/gcp-vertex.env
expand: true
- src: ${GOOGLE_APPLICATION_CREDENTIALS}
dest: /tmp/workspace/.gcp-credentials.json
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] credential-exposure

All six harness files copy the GCP service account key file into the sandbox via host_files, undermining the credential isolation goal stated in the experiment design.

Suggested fix: Document this as a known limitation or use an OpenShell provider for GCP credentials.

@fullsend-ai-review fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label May 29, 2026
Copy link
Copy Markdown

@ben-alkov ben-alkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Experiment: host-side API server for sandboxed agents

2 participants