perf: bypass HTTP self-call from /run to /v1/responses by ananthsub · Pull Request #1439 · NVIDIA-NeMo/Gym

ananthsub · 2026-05-28T07:19:45Z

Each agent's /run handler currently makes an HTTP self-call to its own /v1/responses endpoint. This adds an extra aiohttp round-trip, a pass through the FastAPI middleware, and pydantic re-validation on every rollout.

This PR converts all production agents that still do this to call the implementation in-process.

To keep backwards compatibility, the refactor extracts the existing responses() body into a private helper. The public responses(request, response, body) method keeps its FastAPI signature.
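A minimal sketch of the shape of this refactor, under stated assumptions: `SketchAgent`, `ResponsesBody`, `FakeRequest`, and `FakeResponse` are hypothetical stand-ins for the real agent class, the pydantic body, and the FastAPI request/response objects, and the "session" cookie logic is invented purely for illustration. The `_responses(body, cookies) -> (response, set_cookies)` shape follows the commit message; everything else is illustrative and not the actual agent code.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ResponsesBody:
    """Hypothetical stand-in for the pydantic request body."""
    input: str


@dataclass
class FakeRequest:
    """Hypothetical stand-in for fastapi.Request: only carries inbound cookies."""
    cookies: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for fastapi.Response: only records cookies to set."""
    cookies: dict = field(default_factory=dict)

    def set_cookie(self, key, value):
        self.cookies[key] = value


class SketchAgent:
    async def _responses(self, body, cookies):
        # Private helper holding what used to be the responses() body.
        # It takes and returns plain data, so no HTTP objects are involved.
        session = cookies.get("session", "new-session")  # illustrative logic only
        return {"output": body.input.upper()}, {"session": session}

    async def responses(self, request, response, body):
        # Public endpoint keeps its (request, response, body) signature and
        # only adapts between the HTTP objects and the helper.
        result, set_cookies = await self._responses(body, dict(request.cookies))
        for key, value in set_cookies.items():
            response.set_cookie(key, value)
        return result

    async def run(self, body, cookies):
        # /run calls the helper in-process instead of POSTing to /v1/responses.
        result, cookies = await self._responses(body, cookies)
        return {"response": result, "cookies": cookies}
```

In this sketch both entry points share one implementation, so an external HTTP caller of `responses` and an in-process caller via `run` get the same payload and the same cookie updates.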

Microbenchmarks:

Model inference latency still dominates all of this framework overhead.
With a real OpenAI endpoint for 4.1o, the per-rollout savings amount to ~7% rps on simple_agent.
The speedup is more pronounced for agents like proof_refinement_agent, which make more HTTP hops as part of their implementation.

| agent | HTTP (requests/second) | in-process (requests/second) | speedup |
| --- | --- | --- | --- |
| `simple_agent` | 2.95 | 3.17 | +7.5% |
| `proof_refinement_agent` | 1.18 | 1.34 | +13.3% |

copy-pr-bot · 2026-05-28T07:19:49Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…sponses

Each agent's `/run` handler previously made an HTTP self-call to its own `/v1/responses` endpoint, paying for aiohttp round-trip, FastAPI middleware, and pydantic re-validation on every rollout. This converts all production agents that still did this to call the implementation in-process.

The refactor extracts the existing `responses()` body into a private `_responses(body, cookies) -> (response, set_cookies)` helper. The public `responses(request, response, body)` method keeps its FastAPI signature and is now a 4-line adapter that delegates to `_responses`, so external HTTP callers see no change. `/run` replaces the 7-line self-call block with two lines that call `_responses` directly.

Agents converted: simple_agent, proof_refinement_agent, non_executing_simple_agent, speed_bench_agent, cvdp_agent, finance_agent, browsecomp_agent, hermes_agent, claude_code_agent, and the langgraph_agent base + 4 subclasses (rewoo, orchestrator, parallel_thinking, reflection). swe_agents, aviary_agent, verifiers_agent, and stirrup_agent were already on this pattern.

Microbenchmark shows ~0.7-1.2 ms framework overhead per self-call at concurrency=1, scaling to a ~70-150x rps multiplier in the LLM-overhead-isolated case. End-to-end with a mock model: simple_agent +25% rps / 20% wall-time reduction; proof_refinement_agent (3 self-calls per rollout) up to +2x rps / 50% wall-time reduction at concurrency=256. With a real OpenAI model in the loop the absolute per-rollout savings carry over (~7% rps on simple, ~13% on pref) and the HTTP path also hit FD exhaustion at concurrency=1024 where the in-process path did not.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

cmunley1 · 2026-05-29T21:51:35Z

Did you run training or benchmarks to confirm no accuracy or convergence degradation? Seems fine to me otherwise, although the logic around cookies seems to change a bit.

ananthsub requested review from adil-a, bxyu-nvidia and cmunley1 May 28, 2026 07:19

ananthsub force-pushed the inprocess-self-call-perf branch from 9dc0d1c to ed66904 Compare May 28, 2026 07:22

ananthsub force-pushed the inprocess-self-call-perf branch from ed66904 to 75473cb Compare May 28, 2026 07:36

ananthsub marked this pull request as ready for review May 28, 2026 12:54

copy-pr-bot Bot temporarily deployed to public May 28, 2026 12:55 Inactive

copy-pr-bot Bot temporarily deployed to public May 28, 2026 12:56 Inactive

ananthsub wants to merge 1 commit into NVIDIA-NeMo:main from ananthsub:inprocess-self-call-perf
