Skip to content

feat(ci): include human PR discussion in review prompts#582

Open
mariusvniekerk wants to merge 14 commits intomainfrom
ci-poller-include-pr-comments
Open

feat(ci): include human PR discussion in review prompts#582
mariusvniekerk wants to merge 14 commits intomainfrom
ci-poller-include-pr-comments

Conversation

@mariusvniekerk
Copy link
Copy Markdown
Collaborator

Summary

  • include human PR discussion in CI poller review prompts as non-authoritative context
  • order discussion newest-first and exclude bot/self roborev comments
  • persist the precomputed CI review prompt so worker execution preserves that discussion context

Verification

  • go vet ./...
  • go test ./...

Add human PR discussion to CI poller review prompts, ordered newest-first and filtered to exclude bots and roborev's own comments. Persist the precomputed prompt for CI review jobs so worker execution preserves that context.

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (79aa2b8)

Verdict: Changes introduce a high-severity prompt-injection risk in CI review handling and a medium-severity retry inconsistency in stored prompts.

High

  • Untrusted PR discussion is injected into the agent prompt
    • Locations: internal/daemon/ci_poller.go:387, internal/daemon/ci_poller.go:1597, internal/github/pr_discussion.go:53, internal/prompt/prompt.go:753
    • Issue: PR discussion content from GitHub is fetched and inserted into the review prompt, with recent comments given extra weight. That creates a new trust-boundary problem: untrusted PR authors or reviewers can place instructions in comments that may steer, suppress, or otherwise manipulate the automated review. For tool-enabled agents, this is effectively a remote prompt-injection path.
    • Recommendation: Do not pass raw external PR discussion directly into the review prompt. Limit ingestion to trusted actors, or summarize discussion in a separate constrained step and pass only the summary forward. If any discussion must be included, treat it strictly as untrusted data with strong delimiting/escaping and explicit instructions that it must never override code-based findings.

Medium

  • Stored prompt marker is lost after decode, making retries non-deterministic
    • Location: internal/daemon/worker.go:383
    • Issue: The worker decodes the stored prompt marker and then continues with the raw prompt body. If the same CI job is retried later, the marker is no longer present, so DecodeStoredReviewPrompt will not match and the worker may rebuild the prompt instead of reusing the original precomputed one. That can silently drop PR discussion context and change retry behavior.
    • Recommendation: Preserve the encoded marker when persisting the job prompt, or store an explicit flag/field indicating that the prompt is a precomputed full review prompt so retries always reuse the exact same prompt.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

mariusvniekerk and others added 2 commits March 27, 2026 10:15
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (47d1a7a)

Verdict: One medium-severity regression risk was found; otherwise the changes look clean.

Medium

  • Stored prompt override is lost on retry/reclaim
    • Location: internal/daemon/worker.go:383, internal/prompt/prompt.go:261
    • Issue: The worker consumes the stored-review marker and persists only the decoded prompt body. If the same CI job is retried or later reclaimed, DecodeStoredReviewPrompt will no longer match, so the worker rebuilds a fresh prompt instead of reusing the original stored prompt. That can silently drop the trusted PR discussion context injected at enqueue time.
    • Suggested fix: Preserve the marker in job.Prompt through processing, or persist an explicit flag/type indicating the prompt is a precomputed stored review prompt so retries continue using the original override.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (190df5b)

Verdict: Changes introduce one high-severity prompt-injection risk and several medium-severity correctness/safety issues.

High

  • Prompt injection via trusted PR discussion
    Location: internal/daemon/ci_poller.go#L603, internal/daemon/ci_poller.go#L1641
    PR discussion fetched from GitHub is injected verbatim into the main review prompt and framed as relevant context. Even with admin/maintain filtering, this creates a remote prompt-injection channel: a compromised maintainer account or malicious insider could steer the reviewer, suppress findings, or influence any tool-using agent behavior.
    Suggested fix: Do not place free-form PR discussion directly into the execution prompt. If it must be included, treat it as untrusted quoted data, sanitize markdown/control characters, and add an explicit rule that discussion cannot override code or diff evidence.

Medium

  • Stored prompt can diverge from the prompt actually executed on retry
    Location: internal/daemon/worker.go:382-413
    promptToPersist is initialized from job.Prompt for every job, not only the stored-review override case. On a retried normal review job, the worker rebuilds a fresh prompt for execution but persists the previous attempt’s prompt back to the database, leaving stored state out of sync with what actually ran.
    Suggested fix: Only preserve job.Prompt when it decodes as an explicit stored-review override; otherwise persist the newly built prompt or reuse the stored prompt as the retry input.

  • Stored prompt wrapper leaks into prompt-inspection/read paths
    Location: internal/daemon/ci_poller.go:606-630, internal/daemon/worker.go:382-413, internal/prompt/prompt.go:265-284
    CI prompts are now stored in jobs.prompt with <roborev-stored-review-prompt> wrapper tags, but decoding was only added on the worker execution path. Any other consumer that reads jobs.prompt for display/inspection will now see the sentinel markup rather than the real prompt text.
    Suggested fix: Keep the override marker separate from user-visible prompt text, or decode it everywhere jobs.prompt is read.

  • Unescaped comment.Path allows prompt injection through filenames
    Location: internal/daemon/ci_poller.go#L1670
    comment.Path is inserted into the prompt without escaping. Since Git paths can contain backticks and control characters, a crafted filename can break out of the intended formatting and inject arbitrary prompt text via an inline review comment, bypassing the collaborator-author trust check.
    Suggested fix: Safely encode paths before embedding them in prompt text, for example with strconv.Quote or JSON string encoding, or strip control characters/backticks.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Co-authored-by: OpenAI Codex <codex@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 27, 2026

roborev: Combined Review (0e0c566)

Summary verdict: Changes are mostly sound, but there is one medium-severity correctness issue to address before merge.

Medium

  • Stored review prompts are sent back to the agent with persistence markers instead of the decoded prompt body
    Location: internal/daemon/worker.go:382
    Problem: When DecodeStoredReviewPrompt succeeds, the worker re-encodes the prompt and passes the marker-wrapped value to agent.Review instead of the decoded prompt body. This causes <roborev-stored-review-prompt> markers to become part of the model input for CI jobs using a stored prompt, and it can also mutate legacy stored prompts on retry.
    Fix: Pass decodedStoredPrompt as reviewPrompt to the agent, and reserve the encoded form only for persistence in job.Prompt / SaveJobPrompt.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

mariusvniekerk and others added 2 commits March 27, 2026 13:24
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 28, 2026

roborev: Combined Review (607d777)

Verdict: Changes are mostly sound, but there is one medium-severity regression in stored prompt handling that should be fixed before merge.

Medium

  • Stored review prompts are sent to the agent with wrapper markup still attached
    Location: internal/daemon/worker.go:384
    Precomputed review prompts are persisted with <roborev-stored-review-prompt> wrapper tags, and the worker now forwards that wrapped value directly to the agent instead of decoding it first. As a result, CI reviews that reuse stored PR-discussion context receive sentinel markup around the actual prompt text, which can alter agent behavior and break the intended prompt format.
    Suggested fix: Decode stored review prompts before invoking the agent, while continuing to store the encoded form so retry logic can still identify exact persisted payloads.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 28, 2026

roborev: Combined Review (ca85404)

Verdict: Changes look mostly sound, but there is one medium-severity regression risk that should be addressed before merge.

Medium

  • internal/daemon/ci_poller.go:607: The new PR-discussion path builds the full review prompt eagerly during enqueue, and a BuildWithAdditionalContext failure now causes processPR to abort before creating any jobs. Because PR discussion is optional context, a transient prompt-build/git/db failure can suppress the entire CI review batch instead of falling back to the existing worker-side prompt generation and retry path.
    • Suggested fix: if stored-prompt construction fails, log the error and enqueue the job without Prompt so the worker can build and retry it as before.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 28, 2026

roborev: Combined Review (ef45ad7)

Verdict: Changes are not ready to merge due to 1 High and 1 Medium issue.

High

  • internal/github/pr_discussion.go:171
    jq's @base64 formatter only accepts strings. Encoding objects directly with .[] | @base64 will fail with object cannot be base64-encoded, which means PR discussion loading will break entirely.
    Fix: encode JSON strings instead, e.g. .[] | tojson | @base64.

Medium

  • internal/daemon/worker.go:384
    The worker determines whether job.Prompt is a prebuilt review prompt by checking for the literal substring <untrusted-pr-discussion>. That content-based heuristic can misclassify unrelated jobs whose prompt or diff happens to contain that text, causing normal prompt rebuilding and/or prompt-native preamble handling to be skipped based on reviewed content instead of job metadata.
    Fix: record prompt mode explicitly in job metadata or storage and branch on that flag rather than scanning prompt contents. Add a regression test covering a normal prompt that contains the marker literally.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 29, 2026

roborev: Combined Review (37f7218)

Verdict: 2 medium-severity issues found; no high or critical findings.

Medium

  • Stale prompt reuse can be triggered by prompt content

    • Location: worker.go
    • Issue: The worker infers whether a saved prompt is an exact prebuilt review prompt by checking for the substring "<untrusted-pr-discussion>" in the prompt body. Because this marker can also appear in diff content, guidelines, or prior review text, unrelated retries can be misclassified and incorrectly reuse stale prompt content.
    • Recommended fix: Persist explicit job metadata indicating whether the prompt is an exact prebuilt prompt instead of inferring it from prompt text.
  • PR discussion context is lost on prompt prebuild failure

    • Location: ci_poller.go
    • Issue: If prompt prebuild fails, the job is queued with an empty Prompt and the PR discussion context is dropped. The worker then rebuilds a standard review prompt without the discussion, so transient prebuild failures silently remove maintainer feedback from the review context.
    • Recommended fix: Store the PR discussion context, or the structured prompt-build inputs, separately so the worker can reconstruct the intended prompt later.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Co-authored-by: OpenAI Codex
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (410d379)

PR is not ready to merge: 2 high-severity issues and 1 medium-severity regression need to be addressed.

High

  • Credential leakage and stale auth in clone URLs
    Locations: internal/github/repo_ops.go:102, internal/daemon/ci_poller.go:917
    CloneURL embeds the GitHub token directly in the HTTPS remote, and ghClone uses that URL for git clone. This exposes the token in process arguments, can persist it in .git/config as remote.origin.url, and can leak it through clone/fetch error output. It also creates a correctness problem for GitHub App installation tokens: once the stored remote URL contains an expired token, later fetches against an existing clone will start failing. Use a transient auth mechanism instead of credential-bearing URLs, or at minimum ensure the remote is immediately rewritten to a token-free URL and all error/log output is sanitized.

  • Daemon can panic on valid GitHub comment content
    Location: internal/daemon/ci_poller.go (writeEscapedPromptXML)
    xml.EscapeText can return an error for Unicode code points that are invalid in XML 1.0. Because that error path currently panics, a PR author can include such characters in discussion text and crash the daemon. Filter invalid XML characters during sanitization or handle the escape error without panicking.

Medium

  • Regression: gh auth login fallback no longer works
    Locations: cmd/roborev/ci.go:395, internal/daemon/ci_poller.go:983
    The new GitHub client path only supports GitHub App tokens or GH_TOKEN/GITHUB_TOKEN. The prior gh-based flow also worked for users authenticated via gh auth login without token env vars, so existing setups may lose the ability to list PRs, clone private repos, or post comments after upgrade. Restore a gh credential fallback or make the new token requirement explicit and enforced as a migration.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The new GitHub client path started requiring GH_TOKEN or GITHUB_TOKEN directly, which regressed setups that relied on gh auth login for local authentication. That broke the prior operator workflow for CI comment posting and daemon poller access even though gh was still authenticated.

This change centralizes token resolution so the new client code still prefers explicit and environment-provided tokens, but falls back to gh auth token when no direct token source is available. The accompanying test updates keep the pre-commit lint rules green while locking in the restored fallback behavior for both the CLI CI path and the daemon poller path.

Validation: go fmt ./...; go vet ./...; go test ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (281292f)

Verdict: Changes introduce two high-severity regressions and one medium-severity reliability issue that should be addressed before merge.

High

  1. GitHub token is exposed via authenticated clone URLs

    • Locations: internal/github/repo_ops.go:101, internal/daemon/ci_poller.go:917
    • CloneURL now embeds the token directly in the HTTPS remote, and ghClone passes that URL to git clone. This can leak credentials through process arguments, persist them in the cloned repo's .git/config, and surface them in clone error output.
    • Recommended fix: Clone the plain repository URL and pass credentials out-of-band, such as http.extraHeader, GIT_ASKPASS, or a temporary credential helper. Redact clone errors so tokens cannot be echoed.
  2. GitHub Enterprise/custom host support was dropped

    • Locations: internal/github/client.go, internal/github/repo_ops.go, internal/daemon/ci_poller.go
    • The new client/clone path hardcodes the default GitHub API and web hosts instead of propagating the configured base URL/host. Any GitHub Enterprise or other non-default-host setup will send API requests and clone operations to the wrong host.
    • Recommended fix: Thread the configured GitHub base URL/host into NewClient via WithBaseURL, and derive clone URLs from repo/provider configuration instead of hardcoding github.com.

Medium

  1. Commit status updates can hang indefinitely
    • Location: internal/daemon/ci_poller.go
    • setCommitStatus uses context.Background() with a client path that no longer enforces the prior HTTP timeout, so a stalled GitHub request can block a daemon worker indefinitely.
    • Recommended fix: Use context.WithTimeout(...) for status updates and/or construct the GitHub client with an http.Client that has a timeout.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

mariusvniekerk and others added 3 commits March 30, 2026 14:29
The CI poller was still embedding GitHub tokens directly into clone remotes and leaving those URLs behind in managed clones. That leaked credentials into process arguments and git config, and it made cached clones brittle once installation tokens expired.

This change switches clone and fetch operations to transient git auth via command-scoped extra headers, rewrites managed clone remotes back to token-free HTTPS URLs, and adds regression coverage for both the transport hardening and the remote rewrite path. It also strips XML-invalid runes from PR discussion content before escaping so malformed-but-valid GitHub text cannot panic the daemon.

Validation: go fmt ./...; go vet ./...; go test ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 30, 2026

roborev: Combined Review (0393a51)

Verdict: 3 medium-severity issues remain; no high or critical findings were identified.

Medium

  • internal/daemon/ci_poller.go (setCommitStatus), internal/github/client.go (NewClient)
    Commit status updates no longer have a request timeout. The previous implementation used an http.Client with a 30s timeout, but the new path calls client.SetCommitStatus(context.Background(), ...) via a client built on http.DefaultClient, which has no timeout. A stalled GitHub request can now block CI status reporting indefinitely.
    Suggested fix: Use a bounded context such as context.WithTimeout around SetCommitStatus, or construct the GitHub client with an HTTP client that enforces a timeout.

  • internal/github/repo_ops.go (listUserRepos)
    Errors from Repositories.ListByAuthenticatedUser are silently ignored with break, so wildcard expansion for user-owned repos can return only public repos from ListByUser and quietly omit private repos. This is a behavioral regression from the previous gh repo list flow, and the partial result may be cached for an hour as if it were complete.
    Suggested fix: Propagate the authenticated-user listing error, or explicitly mark the expansion as degraded instead of returning a silent partial result.

  • internal/github/pr_discussion.go (lines 59, 82, 113)
    GetCreatedAt() and GetSubmittedAt() return *github.Timestamp. If GitHub omits those fields, dereferencing .Time directly can panic on a nil pointer.
    Suggested fix: Guard the timestamp pointer before accessing .Time, e.g. if t := item.GetCreatedAt(); t != nil { ... }.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant