openenvrolloutprocessor #336

Merged
shreymodi1 merged 11 commits into main from
shrey/OpenEnvRolloutProcessor
Nov 20, 2025

Conversation

@shreymodi1
Contributor

@shreymodi1 shreymodi1 commented Nov 17, 2025



Note

Adds a generic OpenEnv rollout processor with vLLM/TRL integration, a VLLMPolicy adapter, an execution metadata bag, and new integration tests plus optional OpenEnv deps.

  • OpenEnv Integration:
    • OpenEnvRolloutProcessor (eval_protocol/pytest/openenv_rollout_processor.py): generic processor for OpenEnv HTTPEnvClient; runs rollout loops, builds prompts, calls policy (default LiteLLMPolicy or injected), tracks token usage, collects per-step rewards, and stores prompt_ids/completion_ids in execution_metadata.extra.
    • create_openenv_vllm_rollout_func (eval_protocol/pytest/integrations/openenv_trl_vllm.py): bridges TRL with OpenEnv using VLLMPolicy; supports task rotation; returns GRPO-style prompt_ids/completion_ids/eval_score.
  • LLM Policy:
    • VLLMPolicy (eval_protocol/mcp/execution/vllm_policy.py): converts chat messages to a prompt, calls TRL vLLM (server or colocated), decodes output, and returns OpenAI-compatible responses including raw token IDs.
  • Models:
    • Extend ExecutionMetadata with extra: Dict[str, Any] for integration-specific data (e.g., step rewards, token IDs).
  • Tests:
    • Add integration tests for BrowserGym (basic and eval), Echo (Hub), and TextArena (Docker) under tests/pytest/* (skipped on CI).
  • Config:
    • Add openenv optional dependency group in pyproject.toml for OpenEnv packages.
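The `extra` field described above can be pictured with a minimal sketch. It assumes a plain dataclass stand-in (`ExecutionMetadataSketch` is a hypothetical name, not the project's actual `ExecutionMetadata` model), and the key names simply mirror the ones mentioned in the summary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExecutionMetadataSketch:
    """Hypothetical stand-in for ExecutionMetadata; only the new field is shown."""

    # Free-form bag for integration-specific data. default_factory gives
    # every instance its own dict instead of one shared mutable default.
    extra: Dict[str, Any] = field(default_factory=dict)


meta = ExecutionMetadataSketch()
# Example payload a rollout processor might attach (values are made up).
meta.extra["step_rewards"] = [0.0, 1.0]
meta.extra["prompt_ids"] = [101, 102]
meta.extra["completion_ids"] = [201, 202, 203]
```

Keeping such data in one untyped bag avoids adding a new typed field per integration, at the cost of callers having to agree on key names.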

Written by Cursor Bugbot for commit 707f7cd.

@jspisak

jspisak commented Nov 18, 2025

Love seeing this PR!

"""Process a single row with OpenEnv rollout."""
start_time = time.perf_counter()

print(f"\n[OpenEnvRolloutProcessor] Starting rollout for row...", flush=True)
Collaborator

It's best practice to use a logger, not print statements. Log level is an important part of debugging that you should call out yourself.
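A minimal sketch of what this suggestion could look like, using the standard `logging` module. The function name `process_row` and its `row_id` parameter are hypothetical stand-ins for the method under review.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def process_row(row_id: str) -> str:
    """Hypothetical stand-in for the per-row rollout entry point."""
    # INFO marks a lifecycle event; per-step detail would fit DEBUG.
    # Lazy %-style arguments skip formatting when the level is disabled.
    logger.info("[OpenEnvRolloutProcessor] Starting rollout for row %s", row_id)
    return row_id
```

Choosing the level explicitly lets whoever runs the tests decide how much rollout output they see, which a bare `print` cannot offer.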

Collaborator

@dphuang2 dphuang2 left a comment

shouldn't there be dependency changes for openenv in pyproject.toml?

pytestmark = pytest.mark.skipif(os.getenv("CI") == "true", reason="Skip OpenEnv integration tests on CI")


@pytest.mark.integration
Collaborator

why integration? How long do these tests take to run? If they take a while then we should run this as part of e2e-smoke-test.yml

Comment on lines +37 to +55
def _extract_goal_url_title(observation: Any) -> tuple[str, str, str]:
    goal = getattr(observation, "goal", "") or ""
    url = getattr(observation, "url", "") or ""
    title = ""
    metadata = getattr(observation, "metadata", {}) or {}
    obs_dict = metadata.get("browsergym_obs", {}) or {}
    if not goal:
        goal = obs_dict.get("goal") or ""
    if not url:
        url = obs_dict.get("url") or ""
    titles = obs_dict.get("open_pages_titles") or ()
    active_idx = _as_scalar(obs_dict.get("active_page_index"))
    try:
        active_idx = int(active_idx)
    except Exception:
        active_idx = 0
    if isinstance(titles, (list, tuple)) and 0 <= active_idx < len(titles):
        title = titles[active_idx] or ""
    return goal, url, title
Collaborator

did you write all this glue code? If you copy-pasted it, can we import directly from openenv instead?

Comment on lines +26 to +30
def prompt_builder(observation: Any, step: int, history: List[str]) -> str:
    """
    Echo env is very simple; we just send a short instruction.
    """
    return "Please repeat back the next message exactly."
Collaborator

Same here, it's not a ton of code. But if you copy-pasted it, can we import the implementation directly?

"""Process evaluation rows and return async tasks."""

semaphore = config.semaphore
max_steps = config.steps or 8
Collaborator

steps already has a default (30); why did you add 8 here?
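To make the concern concrete, here is a small sketch of how `config.steps or 8` behaves. `RolloutConfigSketch` is a hypothetical stand-in that only assumes, per the comment, that `steps` defaults to 30; the real config class is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutConfigSketch:
    """Hypothetical config; the default of 30 is taken from the review comment."""

    steps: Optional[int] = 30


def resolve_max_steps(config: RolloutConfigSketch) -> int:
    # Pattern from the diff: falls back to 8 for any falsy value (None or 0).
    return config.steps or 8


def resolve_max_steps_explicit(config: RolloutConfigSketch, fallback: int = 30) -> int:
    # Alternative: only treat None as "unset" and reuse a single default.
    return fallback if config.steps is None else config.steps
```

With the default in place the `or 8` branch is never taken, so it only matters when a caller passes `None` or `0`, and in that case it silently introduces a second, different default.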

"completion_ids": episode_completion_ids, # List[List[int]] - tokens per episode
"logprobs": episode_logprobs, # List[List[float]] - logprobs per episode
"eval_score": eval_scores,
}

Bug: Missing rewards in rollout function return value

The rollout_func computes total_rewards at line 439 (summing step rewards per episode for GRPO training) but doesn't include it in the return dictionary. The function returns prompt_ids, completion_ids, logprobs, and eval_score, but GRPO training requires rewards to update the policy. The computed total_rewards variable is never used, causing the training loop to lack the reward signal needed for reinforcement learning.
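One possible shape of the fix is sketched below. The helper name `build_rollout_result`, its parameters, and in particular the `"total_reward"` key are assumptions made for illustration; the key the trainer actually expects is not stated in this thread.

```python
from typing import Any, Dict, List


def build_rollout_result(
    episode_prompt_ids: List[List[int]],
    episode_completion_ids: List[List[int]],
    episode_logprobs: List[List[float]],
    step_rewards: List[List[float]],
    eval_scores: List[float],
) -> Dict[str, Any]:
    """Hypothetical helper that assembles the rollout return value."""
    # Sum the per-step rewards of each episode into one scalar per episode.
    total_rewards = [float(sum(rewards)) for rewards in step_rewards]
    return {
        "prompt_ids": episode_prompt_ids,
        "completion_ids": episode_completion_ids,
        "logprobs": episode_logprobs,
        "eval_score": eval_scores,
        # Assumed key name: surfaces the reward that was computed but dropped.
        "total_reward": total_rewards,
    }
```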

top_p=kwargs.get("top_p"),
top_k=kwargs.get("top_k"),
**kwargs,
)

Bug: Duplicate keyword arguments in VLLMPolicy instantiation

The vllm_policy_factory function passes top_p and top_k both explicitly (extracted from kwargs at lines 233-234) and as part of **kwargs at line 235. If kwargs contains top_p or top_k keys, Python will raise a TypeError for receiving multiple values for the same keyword argument when instantiating VLLMPolicy. The explicit parameters should be removed from kwargs before unpacking, or the explicit extraction should be removed.
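The failure mode and the first suggested fix can be reproduced in isolation. In this sketch `FakePolicy` is a hypothetical stand-in for `VLLMPolicy`, and both factory functions are illustrative rather than the code in the PR.

```python
from typing import Any, Optional


class FakePolicy:
    """Hypothetical stand-in for VLLMPolicy; only the relevant parameters."""

    def __init__(self, top_p: Optional[float] = None,
                 top_k: Optional[int] = None, **extra: Any) -> None:
        self.top_p = top_p
        self.top_k = top_k
        self.extra = extra


def make_policy_buggy(**kwargs: Any) -> FakePolicy:
    # top_p/top_k are passed explicitly AND remain inside **kwargs, so the
    # call raises TypeError whenever the caller supplies either key.
    return FakePolicy(top_p=kwargs.get("top_p"), top_k=kwargs.get("top_k"), **kwargs)


def make_policy_fixed(**kwargs: Any) -> FakePolicy:
    # pop() removes the keys before unpacking, so each is passed only once.
    top_p = kwargs.pop("top_p", None)
    top_k = kwargs.pop("top_k", None)
    return FakePolicy(top_p=top_p, top_k=top_k, **kwargs)
```

Because `**kwargs` in the factory signature already creates a fresh dict, popping from it does not mutate anything owned by the caller.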

@shreymodi1 shreymodi1 merged commit 425b882 into main Nov 20, 2025
9 checks passed
@shreymodi1 shreymodi1 deleted the shrey/OpenEnvRolloutProcessor branch November 20, 2025 08:32
3 participants