Implement reproducer agent#610
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new "reproducer" agent skill to Ymir, designed to automatically create, verify, and publish minimal reproducers for RHEL bugs and CVEs. It adds the reproducer workflow, prompts, BeakerLib templates, Testing Farm integration tools (for machine reservation, remote command execution, and file copying), and cleanup middleware. The feedback focuses on critical improvements: wrapping synchronous blocking calls (such as API requests and directory removals) in asyncio.to_thread to prevent event loop starvation, fixing a scoping bug in the BeakerLib test template where variables are lost across subshells, robustly parsing LLM JSON responses to handle markdown code blocks, and using Path.is_relative_to for safer path validation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
I am having trouble creating individual review comments. Click here to see my feedback.
ymir/tools/privileged/testing_farm.py (346)
The _testing_farm_api_post call is synchronous and blocking. Since this tool is executed within an asynchronous context (async def _run), calling it directly blocks the event loop and can starve other concurrent tasks. Wrap the call in asyncio.to_thread to run it in a separate thread.
response = await asyncio.to_thread(_testing_farm_api_post, "requests", json=body)
ymir/tools/privileged/testing_farm.py (397-409)
There are two issues here:
- The
_testing_farm_api_getcall is synchronous and blocking. It should be run in a separate thread usingasyncio.to_threadto avoid blocking the event loop during polling. - Catching only
requests.HTTPErrormeans other requests-related exceptions (likeConnectionErrororTimeout) will propagate to the genericexcept Exceptionblock and immediately abort the polling loop. Catchingrequests.RequestExceptionand treating connection/timeout errors as transient makes the polling loop much more resilient to temporary network issues.
response = await asyncio.to_thread(_testing_farm_api_get, f"requests/{tool_input.request_id}")
except requests.RequestException as e:
is_transient = False
if isinstance(e, requests.HTTPError) and e.response is not None:
if e.response.status_code in _TRANSIENT_HTTP_CODES:
is_transient = True
elif isinstance(e, (requests.ConnectionError, requests.Timeout)):
is_transient = True
if is_transient:
logger.warning(
"Transient error %s polling TF %s (attempt %d/%d)",
e, tool_input.request_id, attempt, max_attempts,
)
if attempt < max_attempts:
await asyncio.sleep(poll_interval)
continue
raise ToolError(
f"Failed to get Testing Farm reservation details {tool_input.request_id}: {e}"
) from e
ymir/tools/privileged/testing_farm.py (424)
The requests.get call to fetch pipeline.log is synchronous and blocking. Wrap it in asyncio.to_thread to prevent blocking the event loop.
log_resp = await asyncio.to_thread(requests.get, f"{artifacts_url}/pipeline.log", timeout=30)
ymir/tools/privileged/testing_farm.py (487)
The _testing_farm_api_delete call is synchronous and blocking. Wrap it in asyncio.to_thread to avoid blocking the event loop.
await asyncio.to_thread(_testing_farm_api_delete, f"requests/{request_id}")
ymir/tools/privileged/gitlab.py (505)
The shutil.rmtree call is synchronous and blocking. Removing a cloned git repository can take several seconds, which will block the event loop. Wrap it in asyncio.to_thread to run it asynchronously.
await asyncio.to_thread(shutil.rmtree, clone_path)
ymir/agents/prompts/reproducer/prompt.j2 (300-302)
In BeakerLib, each rlRun invocation executes in a separate subshell. Assigning TmpDir=\$(mktemp -d) inside rlRun means the variable is lost immediately after that command exits. Subsequent commands like rlRun "pushd \$TmpDir" will fail because TmpDir is empty. Assign TmpDir in the parent shell context instead, and let it expand in the parent shell when passing to rlRun.
TmpDir=\$(mktemp -d)
ORIG_DIR=\$(pwd)
rlRun "pushd \$TmpDir"
ymir/agents/prompts/reproducer/prompt.j2 (331)
Since TmpDir is now defined in the parent shell context, do not escape the $ in the rlRun call so that it expands in the parent shell before being executed.
rlRun "rm -rf \$TmpDir" 0 "Removing tmp directory"
agents_as_skills/reproducer/SKILL.md (300-302)
In BeakerLib, each rlRun invocation executes in a separate subshell. Assigning TmpDir=\$(mktemp -d) inside rlRun means the variable is lost immediately after that command exits. Subsequent commands like rlRun "pushd \$TmpDir" will fail because TmpDir is empty. Assign TmpDir in the parent shell context instead, and let it expand in the parent shell when passing to rlRun.
TmpDir=\$(mktemp -d)
ORIG_DIR=\$(pwd)
rlRun "pushd \$TmpDir"
agents_as_skills/reproducer/SKILL.md (331)
Since TmpDir is now defined in the parent shell context, do not escape the $ in the rlRun call so that it expands in the parent shell before being executed.
rlRun "rm -rf \$TmpDir" 0 "Removing tmp directory"
ymir/agents/reproducer_agent.py (238)
Directly calling OutputSchema.model_validate_json on the raw LLM response text is fragile. If the LLM wraps the JSON in markdown code blocks (e.g., ```json ... ```), validation will fail. Clean the response text to robustly extract the JSON payload before validation.
raw_text = response.last_message.text.strip()
if raw_text.startswith("```"):
lines = raw_text.splitlines()
if lines[0].startswith("```"):
lines = lines[1:]
if lines and lines[-1].startswith("```"):
lines = lines[:-1]
raw_text = "\n".join(lines).strip()
state.result = OutputSchema.model_validate_json(raw_text)
ymir/tools/privileged/testing_farm.py (571)
Define _ALLOWED_COPY_BASES as Path objects to enable more robust and idiomatic path validation using is_relative_to instead of string prefix matching.
_ALLOWED_COPY_BASES = (Path("/git-repos"), Path("/tmp"))
ymir/tools/privileged/testing_farm.py (580-587)
Use is_relative_to for path validation to be consistent with the rest of the codebase (e.g., gitlab.py) and to avoid fragile string-based prefix matching.
@field_validator("local_paths")
@classmethod
def validate_local_paths(cls, v: list[str]) -> list[str]:
for p in v:
resolved = Path(p).resolve()
if not any(resolved.is_relative_to(base) for base in _ALLOWED_COPY_BASES):
raise ValueError(f"Path {p} is not under an allowed directory")
return v
1ef1ca7 to
a3589b5
Compare
Add ReproducerInputSchema and ReproducerOutputSchema for structured agent input/output with validation. Includes comprehensive unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add REPRODUCER_QUEUE name and lifecycle labels (REPRODUCER_TRIAGED, REPRODUCER_COMPLETED, REPRODUCER_FAILED, REPRODUCER_ERRORED) for tracking reproducer agent pipeline state in Jira. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make TF API URL configurable via TESTING_FARM_API_URL env var (defaults to testing-farm.io). Add _redact_secrets for safe logging of API payloads, _testing_farm_api_delete helper, _ensure_gateway_ssh_key for gateway-managed SSH key generation, and _parse_tf_request helper. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three MCP tools for the TF machine reservation lifecycle: - ReserveTestingFarmMachineTool: provisions machines with SSH access, security group ingress rules, and configurable compose/arch/duration - GetTestingFarmReservationDetailsTool: polls SSH availability from pipeline.log with built-in retry (20 attempts, 30s intervals) and transient HTTP 502/503/504 retry - CancelTestingFarmRequestTool: releases reserved machines via API Includes input validation with regex patterns for request_id and remote_dir parameters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two MCP tools for executing commands on reserved TF machines: - RunRemoteCommandTool: runs commands via SSH with configurable timeout, working directory, and gateway SSH key - CopyFilesToRemoteTool: transfers files via SCP with path validation against an allowlist of safe base directories Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive test suite covering all TF tools: reservation, polling with retry/transient-error handling, cancellation, remote command execution, file copying, input validation patterns, SSH key management, and dry-run mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Before: push failures returned only "Failed to push to the specified repository" with no detail. Worse, git stderr could contain credential helper output (Authorization headers, Basic auth tokens) which would be logged or returned to the LLM agent verbatim. Fix: capture stderr via subprocess.PIPE + communicate(), filter it through _sanitize_git_stderr (strips lines matching auth-related patterns), and include the sanitized output in the error message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Before: clone_repository unconditionally called mkdir(exist_ok=True) then git clone, which fails with "destination path already exists" if a previous clone attempt left a partial directory behind (e.g. a network failure mid-clone). Fix: for branchless clones (the common path), remove the target directory if it already exists before cloning. The rmtree is guarded by an allowlist of safe parent directories (/git-repos/, /tmp/) to prevent accidental deletion of arbitrary paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `except Exception` handlers in 5 tools (AddMergeRequestLabels, AddMergeRequestComment, GetAuthorizedCommentsFromMergeRequest, GetMergeRequestDetails, SearchGitlabProjectMrs) catch ToolError alongside genuine exceptions, wrapping it in a second ToolError and losing the specific error message. Add `except ToolError: raise` before each `except Exception` to preserve the original error. Also remove the stale local `from beeai_framework.tools import ToolError` in SearchGitlabProjectMrsTool in favor of the existing top-level import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Event-based middleware that tracks active Testing Farm reservations during agent runs. If the agent crashes or is interrupted, any reservations left in the tracker are cancelled on cleanup, preventing orphaned machines from burning Testing Farm quota. Listens for tool success/error events and maintains a set of active request IDs, removing them when the agent cancels them normally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
717-line structured prompt that guides the LLM through the full reproducer workflow: analyzing the Jira issue, designing a BeakerLib test, provisioning a Testing Farm machine, iterating on the test until it reliably demonstrates the bug, and opening a merge request. Extracted as a Jinja2 template to keep the Python agent code focused on orchestration rather than prompt content. Uses template variables for issue metadata, tool names, and configuration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main agent module that wires together the reproducer workflow: - Dedup guard: checks Jira labels before starting to prevent duplicate runs across concurrent deployments - Tool registration: assembles the MCP gateway tools (GitLab, Testing Farm, SSH) with Jira and file-access tools - TF cleanup middleware: attaches crash-recovery reservation tracking - Structured output: parses LLM response into ReproducerOutputSchema - Retry logic: up to 3 attempts on transient LLM/tool failures - Jira lifecycle labels: marks issues through queued → in-progress → done/failed/error states Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- compose.yaml: Add reproducer-agent service definition with MCP gateway dependency, Jira/TF environment variables, and volume mounts for git repos and prompt templates - Makefile: Add reproducer-related targets (run, test, lint) - .gitignore: Exclude secrets directory and agent runtime artifacts - templates/beeai-agent.env: Add REPRODUCER_QUEUE variable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SKILL.md that registers the reproducer as a Claude Code slash-command skill. Defines the input schema (Jira issue key), required environment variables, and invocation instructions for running the reproducer through the Claude Code CLI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8728f06 to
f74052a
Compare
for more information, see https://pre-commit.ci
|
ad gemini findings:
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new 'reproducer' agent to automate the creation, verification, and publication of minimal reproducers for RHEL bugs and CVEs. It adds the core reproducer agent workflow, prompt templates, Testing Farm reservation cleanup middleware, and several new Testing Farm tools. The review feedback highlights several key areas for robustness and performance improvements, including wrapping blocking synchronous calls in asyncio.to_thread, resolving allowed parent paths to absolute paths to prevent validation failures, defensively parsing LLM JSON responses that may contain markdown wrapping, and safely accessing event attributes in the cleanup middleware.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| ) | ||
|
|
||
| try: | ||
| _ensure_gateway_ssh_key() |
There was a problem hiding this comment.
Calling _ensure_gateway_ssh_key() synchronously inside an async def _run blocks the event loop because it performs synchronous file I/O and runs a subprocess. Wrap this call in asyncio.to_thread to run it asynchronously.
| _ensure_gateway_ssh_key() | |
| await asyncio.to_thread(_ensure_gateway_ssh_key) |
| } | ||
| ) | ||
|
|
||
| _ensure_gateway_ssh_key() |
There was a problem hiding this comment.
Calling _ensure_gateway_ssh_key() synchronously inside an async def _run blocks the event loop because it performs synchronous file I/O and runs a subprocess. Wrap this call in asyncio.to_thread to run it asynchronously.
| _ensure_gateway_ssh_key() | |
| await asyncio.to_thread(_ensure_gateway_ssh_key) |
| Path(os.environ.get("GIT_REPO_BASEPATH", "/git-repos")), | ||
| Path("/tmp"), # noqa: S108 | ||
| } | ||
| if not any(clone_path.resolve().is_relative_to(p) for p in allowed_parents): |
There was a problem hiding this comment.
If GIT_REPO_BASEPATH is configured as a relative path or contains symlinks, clone_path.resolve().is_relative_to(p) can raise a ValueError or return False due to mixing absolute and relative paths. To prevent false-positive directory traversal blocks, resolve all paths in allowed_parents to absolute paths using .resolve().
| Path(os.environ.get("GIT_REPO_BASEPATH", "/git-repos")), | |
| Path("/tmp"), # noqa: S108 | |
| } | |
| if not any(clone_path.resolve().is_relative_to(p) for p in allowed_parents): | |
| Path(os.environ.get("GIT_REPO_BASEPATH", "/git-repos")).resolve(), | |
| Path("/tmp").resolve(), # noqa: S108 | |
| } | |
| if not any(clone_path.resolve().is_relative_to(p) for p in allowed_parents): |
| ), | ||
| **get_agent_execution_config(), | ||
| ) | ||
| state.result = OutputSchema.model_validate_json(response.last_message.text) |
There was a problem hiding this comment.
LLMs frequently wrap JSON responses in markdown code blocks (e.g., ```json ... ```) even when instructed not to. Directly calling model_validate_json on such wrapped text will raise a JSONDecodeError. Extract the raw JSON string from markdown code blocks before parsing to make the validation more robust.
| state.result = OutputSchema.model_validate_json(response.last_message.text) | |
| raw_text = response.last_message.text.strip() | |
| if raw_text.startswith("```"): | |
| lines = raw_text.splitlines() | |
| if lines[0].startswith("```"): | |
| lines = lines[1:] | |
| if lines and lines[-1].startswith("```"): | |
| lines = lines[:-1] | |
| raw_text = "\n".join(lines).strip() | |
| state.result = OutputSchema.model_validate_json(raw_text) |
| async def _on_reserve(self, event: ToolSuccessEvent, meta: EventMeta) -> None: | ||
| request_id = event.output.result.get("id") | ||
| if request_id: | ||
| self._reserved.add(request_id) | ||
| logger.debug("Tracked TF reservation %s", request_id) | ||
|
|
||
| async def _on_cancel(self, event: ToolSuccessEvent, meta: EventMeta) -> None: | ||
| request_id = event.input.request_id if hasattr(event.input, "request_id") else None | ||
| if request_id: | ||
| self._cancelled.add(request_id) | ||
| logger.debug("Tracked TF cancellation %s", request_id) |
There was a problem hiding this comment.
To prevent potential runtime crashes in the event listeners (which could disrupt the agent's execution), use defensive programming when accessing event.input and event.output. event.input and event.output can be dictionaries or Pydantic models depending on the framework adapter, and event.output.result can occasionally be None.
async def _on_reserve(self, event: ToolSuccessEvent, meta: EventMeta) -> None:
output = getattr(event, "output", None)
result = getattr(output, "result", None) if output else None
request_id = result.get("id") if isinstance(result, dict) else None
if request_id:
self._reserved.add(request_id)
logger.debug("Tracked TF reservation %s", request_id)
async def _on_cancel(self, event: ToolSuccessEvent, meta: EventMeta) -> None:
request_id = None
tool_input = getattr(event, "input", None)
if isinstance(tool_input, dict):
request_id = tool_input.get("request_id")
elif tool_input:
request_id = getattr(tool_input, "request_id", None)
if request_id:
self._cancelled.add(request_id)
logger.debug("Tracked TF cancellation %s", request_id)
TODO:
test_push_to_remote_repositoryfor newcommunicate()APIagents_as_skills/reproducer/SKILL.md)Background
I've been using and improving a custom reproducer skill over the last few months. After a meeting with @TomasKorbar, I took my reproducer skill and @mkyral's cve-test-generator (kudos to him) as references, and let Claude Opus 4.6 read the existing ai-workflows agents and codebase to prepare an implementation plan: basic functionality + tests, monofunctional commits. Once it was working, I started an iterative implement - review - fix - test run loop with Claude. This PR is the result after 16 runs.
What
Adds a reproducer agent that takes a Jira issue, provisions a Testing Farm machine, designs and iterates on a BeakerLib reproducer test against real RHEL, and opens a merge request with the working test.
New components
testing_farm.py):ReserveTestingFarmMachine,GetTestingFarmReservationDetails,CancelTestingFarmRequest,RunRemoteCommand,CopyFilesToRemote- full reservation lifecycle and remote execution via the MCP gatewaytf_cleanup_middleware.py): event-based tracker that cancels leaked Testing Farm reservations on agent crash, preventing orphaned machines from burning quotareproducer_agent.py): orchestration with dedup guard (prevents duplicate runs across concurrent deployments), structured output, retry logic, and Jira lifecycle labelsprompts/reproducer/prompt.j2): 717-line Jinja2 template guiding the LLM through issue analysis, test design, TF provisioning, iterative verification, and MR submissionagents_as_skills/reproducer/SKILL.md): registers the agent as a Claude Code slash-command skillBug fixes in existing GitLab tools
push_to_remote_repository: was silently failing with a generic error and leaking credentials via git stderr. Now captures stderr viasubprocess.PIPE + communicate(), filters auth-related lines through_sanitize_git_stderr, and includes sanitized output in the error message.clone_repository: failed on retry when a previous clone left a partial directory behind. Now removes the target directory before cloning (guarded by an allowlist of safe parent paths).except Exceptionhandlers that caughtToolErrorand re-wrapped it, losing the specific error message. Addedexcept ToolError: raisebefore eachexcept Exception.Test coverage
test_push_to_remote_repositoryfor the newcommunicate()APITest runs and results
The agent was validated over 8 end-to-end runs against real Jira issues.
The cost difference is driven by bug complexity: ksh required 22 remote commands (valgrind, pmap, multiple reproducer variants, RSS threshold tuning) vs file's 6 (create a 32-byte binary, run
file, check output). Both runs demonstrate the agent generalizes across different bug types.Merge requests created by the agent
Related to
#553