Harden generated output handling #581
Open
glmgbj233 wants to merge 3 commits into
Keep generated output behind the intended trust boundary while preserving the normal safe workflow. Add regression coverage for the unsafe flow and the expected safe behavior.
Contributor
Performance
✓ No regressions detected
Contributor
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
Contributor
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
❌ Patch gate failed
Vulnerability
This fixes a prompt-injection data flow in which attacker-controlled or otherwise untrusted input influences LLM output, and that model output then reaches a privileged sink.
(*Client).ExecuteToolCallLoopResult - sdk/go/ai/tool_calling.go:169 - sink kind: llm_api_response
(*Agent).Call - sdk/go/agent/agent.go:1596 - sink kind: http_request
Attack scenario
An attacker can influence the LLM output at (*Client).ExecuteToolCallLoopResult (sdk/go/ai/tool_calling.go:169, sink kind: llm_api_response) so the model selects a URL or request target. If that output reaches (*Agent).Call (sdk/go/agent/agent.go:1596, sink kind: http_request), generated text can drive an unintended network request.
Attack path
Fix
The fix adds a trust-boundary check before LLM-derived data can reach the sink, while preserving the intended benign workflow. The dangerous effect is blocked after the LLM step instead of only reformatting or re-marshalling model output.
Changed files:
sdk/go/agent/agent.go
sdk/go/agent/agent_test.go
Why this mitigates the issue
Prompt injection is mitigated by preventing model-controlled text from directly selecting, poisoning, replaying, executing, or persisting a privileged action across the identified boundary. Benign model output can still follow the constrained safe path.
Functionality preservation
The repair is intended to keep normal safe behavior available and to restrict only unsafe LLM-derived behavior at the sink boundary. Reviewers should check the normal-case and exploit-regression coverage in the changed test file, sdk/go/agent/agent_test.go.
Tests / verification
Local Go verification: passed, return code 0
Release notes
Security fix only; no user-facing release note is required.
Repository release-note guidance checked:
.github/PULL_REQUEST_TEMPLATE.md
CHANGELOG.md