Skip to content

[RFC 006] add external environment import rfc#731

Open
burtenshaw wants to merge 1 commit into
huggingface:mainfrom
burtenshaw:codex/rfc-external-environment-imports
Open

[RFC 006] add external environment import rfc#731
burtenshaw wants to merge 1 commit into
huggingface:mainfrom
burtenshaw:codex/rfc-external-environment-imports

Conversation

@burtenshaw

Copy link
Copy Markdown
Collaborator

This PR adds RFC 006 for deterministic external environment imports.

It documents the design behind PR #726, including the openenv import CLI, importer registry, AST-based ORS/OpenReward and Verifiers detection, generated wrappers, task/split discovery, MCP-style tool calls, and the boundary rules needed to keep task discovery infrastructure-facing.

It also updates the RFC index and changes the old roadmap reference to avoid reusing RFC 006 for a different future proposal.

Validation: git diff --cached --check before commit.

@burtenshaw burtenshaw changed the title [codex] add external environment import rfc [RFC] add external environment import rfc Jun 1, 2026
@burtenshaw burtenshaw marked this pull request as ready for review June 1, 2026 11:52

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Tier 1: Bugs & Issues

  • RFC number collision: Commit 2ef60ddf already added rfcs/006-hf-rl-environment-datasets.md claiming RFC 006. This PR also uses number 006. One must be renumbered.

  • RFC vs implementation mismatch: The RFC describes TaskProvider as a separate optional Protocol, but the actual implementation in PR #726 places list_splits, list_tasks, num_tasks, get_task, get_task_range directly on the Environment base class with raise NotImplementedError fallbacks. The RFC must be corrected to match the implementation, or vice versa.

  • Task API routes not gated: The RFC states HTTP task routes are "infrastructure-only", but register_routes() in http_server.py registers them unconditionally in both SIMULATION and PRODUCTION modes. Only /reset, /step, /state are behind the simulation-mode gate. The RFC's access-control claims don't match the implementation.

  • Roadmap item 5 mismatch: rfcs/000-project-phases.md changes item 5 from "RFC 006" to "upcoming RFC", but item 5 is about "production performance simulation" — unrelated to either RFC 006 candidate. This introduces confusion.

Tier 2: Alignment

ALIGNMENT FLAG: Task-discovery methods on Environment base class

  • Invariant at risk: Gymnasium API signatures must not change (INVARIANTS.md §1). Adding 5 new methods to the base class modifies the contract for every environment builder. The RFC claims a Protocol approach but the implementation chose base-class mixin. This trade-off needs explicit justification or the implementation should be revised.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: Task API HTTP routes not gated behind infrastructure-only access

  • Invariant at risk: Dual API boundary (INVARIANTS.md §1). In production mode, an agent-facing deployment could potentially reach /{env_name}/splits or /{env_name}/task since these routes have no mode guard.
  • Suggested reviewer: @Darktex

Summary

RFC 006 has a numbering collision with an existing RFC 006, factual discrepancies between the described design (separate TaskProvider Protocol) and the implementation (base class mixin), and the HTTP task routes aren't gated infrastructure-only as claimed. These should be resolved before merge.


Automated review by Claude Code | Learn more

@burtenshaw burtenshaw changed the title [RFC] add external environment import rfc [RFC 006] add external environment import rfc Jun 1, 2026

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: N/A — RFC-only change (markdown files). No Python files modified.
  • Debug code: CLEAN.

Tier 1: Fixes Required

  • RFC numbering conflictrfcs/006-external-environment-imports.md claims RFC ID 006, but PR #624 (rfcs/006-multi-component-rewards.md by @adithya-s-k) already claims the same number. One must be renumbered before merge.

  • rfcs/000-project-phases.md — Line 56 was changed from "RFC 006" to "upcoming RFC" for "production performance simulation," but the new RFC 006 is about external imports, not simulation. The roadmap item numbering should be reconciled.

Tier 2: Alignment Discussion

ALIGNMENT FLAG: TaskProvider protocol is well-scoped and aligned

  • The RFC correctly keeps task discovery as an optional protocol separate from the base Environment ABC, preserving the invariant that agents cannot access simulation controls.
  • Generated wrappers correctly use ListToolsAction/CallToolAction for the agent-facing surface, aligning with RFC 003 (MCP as universal standard).
  • The boundary rule "Agents interact through MCP-style tools only" is explicitly stated and enforced.

ALIGNMENT FLAG: HTTP compatibility routes are appropriately marked as transitional

  • The RFC correctly identifies the ORS-compatible HTTP task routes as infrastructure-only metadata routes that should not expose agent actions.
  • The plan to mirror these through WebSocket before 1.0 aligns with the ongoing HTTP deprecation (PR #252).

ALIGNMENT FLAG: Vendoring security exclusions are necessary but not sufficient

  • The RFC excludes .env, credentials, and common secrets from vendoring — good. But it correctly notes "This does not remove the need for review before publishing." The RFC should consider whether a --dry-run flag for openenv import would help reviewers inspect what will be vendored before committing.

Overall, this RFC is well-structured and respects OpenEnv's invariants. The main blocker is the RFC numbering conflict.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment Review — RFC 006: External Environment Imports

Verdict: Request Changes

Overall: The RFC is well-structured and demonstrates clear thinking about the agent/orchestration boundary and static-detection safety. Two mechanical fixes are required before merge (stale cross-references), and five alignment questions need human sign-off — the HTTP-routes question is the most time-sensitive given the active HTTP deprecation effort.


Tier 1: Required Fixes

1. Stale RFC 006 cross-references in existing files (blockers)

This PR correctly changes rfcs/000-project-phases.md to replace (RFC 006) with (upcoming RFC) for the production-performance simulation item. However, two other files still hard-reference RFC 006 to mean that different (not-yet-written) production-simulation RFC:

  • rfcs/001-abstractions.md: "**See RFC 006** for how we simulate production performance characteristics..."
  • rfcs/003-mcp-support.md: "injecting performance metadata (see RFC 006)"

After this PR merges, 006-external-environment-imports.md exists with a completely different topic, making those cross-references incorrect and misleading. Both lines should be updated to (see upcoming RFC) or a named placeholder in this same PR.


Tier 2: Alignment Flags (human decision required)

ALIGNMENT FLAG 1: HTTP compatibility routes introduced while HTTP is being deprecated

  • Principle at stake: INVARIANTS.md §Communication patterns — "WebSocket for all environment communication." Note: "We are in the process of deprecating HTTP (see PR #252) in favor of WebSocket-only."
  • The concern: The "Compatibility Task Routes" section introduces six new HTTP endpoints for orchestration-layer task discovery. Decision 5 and Open Question 1 acknowledge the tension but leave the question of whether HTTP can ship before a WebSocket equivalent unresolved. Landing new HTTP infrastructure endpoints while PR #252 is actively deprecating HTTP could entrench ORS-style clients. This trade-off needs explicit sign-off.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 2: No server-enforced access control between agent MCP endpoint and TaskProvider HTTP routes

  • Principle at stake: INVARIANTS.md §Agent isolation — "Agents cannot access reset/simulation controls."
  • The concern: The TaskProvider protocol is correctly scoped to orchestration, but the RFC does not describe how the server prevents agents from calling the infrastructure HTTP routes (e.g., /{env_name}/splits). If both the MCP endpoint and the task routes are served on the same port without access controls, an agent that can reach the MCP path could also reach the task routes.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 3: task_spec wire type left unresolved — violates Pydantic invariant

  • Principle at stake: INVARIANTS.md §Pydantic serialization — "All wire types must be Pydantic models."
  • The concern: Open Question 2 asks whether a canonical TaskSpec Pydantic model should be defined or whether task specs remain source-library-specific Any-typed dicts. If task_spec travels over the wire as an unstructured dict, it violates the Pydantic invariant. This must be resolved with a concrete decision — not left as a post-merge open question.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 4: Vendored import isolation explicitly deferred with no deadline

  • Principle at stake: PRINCIPLES.md — "Container isolation for reproducibility and security."
  • The concern: The Non-Goals section defers "complete Python dependency isolation between multiple imported wrappers in the same process" to future work. The specific risk is sys.path insertion causing import name shadowing. The RFC should state a concrete milestone and name the specific shadow-import risk.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 5: Verifiers external reward model may violate rewards-inside-environment invariant

  • Principle at stake: INVARIANTS.md §Rewards in environment — "Reward computation must stay inside environment boundary."
  • The concern: The Verifiers mapping states the submit tool scores through "the environment's harness or rubric when available." Some Verifiers harnesses make external RPC calls to reward models. The RFC does not state whether out-of-process reward calls are permitted or blocked. If a harness phones home to an external scoring service, reward computation leaves the environment boundary.
  • Suggested reviewer: @Darktex

What's Well Done

  • The static AST detection approach is exactly right and aligns with "no arbitrary code execution at import time."
  • TaskProvider as a Protocol (not added to the base Environment ABC) is the correct choice.
  • The explicit "wrappers must not auto-create an episode on tool call before reset" rule preserves the agents-cannot-reset invariant cleanly.
  • The credential-exclusion list for vendoring addresses the most common accidental leak vectors.

Note: This is an automated review by Claude Code, not a human review.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment Review — PR #731

Tier 1: Observations

  • No Python code changes (Markdown-only RFC document) — lint/debug checks N/A.
  • Minor header format inconsistencies with RFC template (non-blocking).
  • The RFC explicitly documents a post-hoc rationale for already-implemented PR #726 — this is unusual and should be prominently noted.

Tier 2: Alignment Flags (for human review)

  1. HTTP compatibility routes lack agent/infrastructure enforcement. Six new HTTP routes (GET /list_environments, GET /{env_name}/splits, POST /{env_name}/tasks, etc.) are described as "infrastructure-facing" but no concrete guard (auth, session token, network separation) prevents agents from calling them. Adding new HTTP surface during an active HTTP deprecation effort (PR #252) is in tension with that trajectory.
  2. Post-hoc RFC process violation. The RFC documents PR #726 which was already written. An RFC written after code lands cannot perform its primary function of allowing the team to reject or change the approach before implementation cost is sunk. Should be acknowledged as a one-time exception with justification.
  3. TaskProvider protocol boundary erosion. Generated wrappers implement both Environment and TaskProvider on the same class. If orchestration code assumes any Environment may implement TaskProvider, this erodes the "one canonical API" guarantee.
  4. Non-deterministic reset() fallback. When no task selector is provided, reset() calls list_splits() and list_tasks() from the external library — potentially non-reproducible. Conflicts with seed/episode_id reproducibility invariants.

Verdict: request_changes — The HTTP route enforcement gap needs a concrete answer before merge. The post-hoc nature of the RFC should be explicitly acknowledged.

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review: RFC 006 - External Environment Imports

The motivation is clear and the AST-based detection approach (no arbitrary code execution during import) is a strong safety property. The generated package pattern with vendored source is the right instinct. I have one hard blocker and several alignment flags that need human discussion.

Tier 1: Fixes Required

[ ] RFC number collision must be resolved before merge.

Branch origin/pr/624 contains commit abcebe0c titled [RFC] RFC 006: Multi-Component Reward Signals. PR #727 also claims RFC 006. At most one RFC can hold this number. One of the following must happen:

  • Confirm the competing RFC 006 is rejected/superseded and document why.
  • Renumber this RFC to the next available number.

[ ] RFC process order: implementation preceded RFC.

The PR description states this RFC "documents the design behind PR #726." rfcs/README.md is explicit: RFCs are submitted for review before implementation, not after. If PR #726 has already merged, the RFC status should be Implemented, not In Review.

Tier 2: Alignment Discussion

ALIGNMENT FLAG 1: ORS-compatible HTTP task routes at the agent/infrastructure boundary

  • Principle at stake: Dual API boundary (INVARIANTS.md)
  • The concern: The RFC proposes ORS-compatible HTTP task routes (GET /tasks, GET /splits) on the environment server. If these live on the same server as /reset and /step, there is no architectural barrier preventing an agent from calling GET /tasks to discover the task distribution, enabling reward hacking. The RFC must describe access control separating these routes from the agent boundary.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 2: MCP task/split discovery tools accessible to agents

  • Principle at stake: "Agents cannot reset" (INVARIANTS.md)
  • The concern: If task selection (which split, which task ID) is surfaced as an MCP tool, the agent gains indirect simulation control. Task/split selection must be infrastructure-only, never callable by the agent.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 3: Generated reward wrappers and the "rewards inside environment" boundary

  • Principle at stake: Rewards inside environment (INVARIANTS.md, RFC 002, RFC 004)
  • The concern: If the generated wrapper calls back into the external library's reward function outside the OpenEnv container boundary, reward partially escapes. The RFC should explicitly state vendored source is fully self-contained and the reward call chain does not escape the container.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 4: Importer registry as a second canonical environment creation path

  • Principle at stake: "Flexibility for simplicity" (PRINCIPLES.md)
  • The concern: PRINCIPLES.md trades flexibility for simplicity: "one canonical way to build environments." The importer registry introduces a second path alongside openenv init. The trade-offs (maintenance surface, security, divergence) should be addressed explicitly.
  • Suggested reviewer: @Darktex

Summary

One hard blocker (RFC number collision) and four alignment flags requiring human review. The core mechanism (AST detection, vendored source, generated packages) is sound — the alignment concerns are about boundary rules that need to be made explicit in the document, not objections to the feature itself.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS (Markdown-only PR)
  • Debug code: CLEAN

Tier 1: Fixes Required

None (documentation-only RFC PR).

Tier 2: Alignment Discussion

ALIGNMENT FLAG 1: New HTTP task-discovery routes (/list_environments, /{env_name}/splits, etc.) widen the HTTP surface during an active HTTP→WebSocket deprecation. No committed version deadline for WebSocket equivalents.

  • Principle: INVARIANTS.md §4 — WebSocket for all environment communication
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 2: TaskProvider protocol placed in interfaces.py alongside Environment base class — no mechanical guard against accidental MCP exposure.

  • Principle: INVARIANTS.md §1 — Dual API boundary
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 3: reset() accepting task_spec, split, index kwargs extends the Gymnasium API signature. Confirm base Environment.reset already accepts **kwargs; if not, this is a breaking change per INVARIANTS.md.

  • Principle: INVARIANTS.md §API — Gymnasium API signatures must not change without major version bump
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 4: Client-server separation for vendored source in generated server/ packages is documented but not enforced (no lint rule or import guard).

  • Principle: INVARIANTS.md §2 — Clients must never import from server/
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 5: RFC 001 forward-reference to "RFC 006 = production simulation" will become stale once this RFC lands as RFC 006 (external imports). RFC 001 needs a corresponding edit.

Summary

  • 0 mechanical issues
  • 5 alignment points for human review

Automated review by Claude Code

@ab-naidu

Copy link
Copy Markdown

Architectural Security Review: Address Task Discovery API Boundary Leak (Alignment Flags 1 & 2)

Hi @burtenshaw,

Thanks for proposing RFC 006. Standardizing external environment imports is a great direction! However, this RFC introduces a critical security boundary violation (Alignment Flags 1 & 2) that we should address to ensure agents remain fully sandboxed.

The Threat Model (Reward Hacking)
The RFC proposes introducing new infrastructure-facing HTTP task-discovery routes on the environment server (/list_environments, /{env_name}/splits, /{env_name}/tasks, /{env_name}/num_tasks, /{env_name}/task, and /{env_name}/task_range).

Currently, these routes are registered unconditionally on the same FastAPI server instance and port as the agent-facing /mcp endpoints, with no access control. In a containerized training setup, the agent runs in a sandbox container. If the agent can simply query http://localhost:8000/{env_name}/splits or request tasks via local HTTP calls, it can easily:

  1. Discover the task distribution and dataset labels before resetting.
  2. Exploit ground-truth task specifications to perform reward hacking (cheating by reading the answers directly rather than solving the task).

Proposed Solutions

I propose two complementary alignment fixes to secure the dual API boundary:

  1. Network Segregation (Port Separation) - Recommended

    • Expose the environment server on two distinct ports or bind them differently:
      • Agent Port (e.g., 8001): Serves only the /mcp POST and /mcp WebSocket routes. This is the only port exposed and mapped into the agent's sandboxed container.
      • Orchestration Port (e.g., 8000): Serves /reset, /step, /state, and all the new TaskProvider task-discovery routes. This port is bound exclusively to 127.0.0.1 on the host machine and is never exposed or mapped to the agent container.
    • Benefits: Provides a mathematically guaranteed physical boundary. The agent's container network namespace cannot reach the host's loopback port, preventing any unauthorized calls.
  2. Accidental MCP Exposure Guard (Addressing Alignment Flag 2)

    • OpenEnv already uses a RESERVED_TOOL_NAMES list to prevent core methods from being exposed as tools. We should simply append the new TaskProvider protocol methods (list_splits, list_tasks, get_task, etc.) to this existing blocklist.
    • If an imported environment attempts to register any of these reserved names as tools, the server will raise a ValueError at startup. This prevents developer error from accidentally leaking task controls to the agent.

We'd love to hear your thoughts on implementing Port Separation or a token-based access control approach to keep the agent/control-plane boundaries mathematically secure.

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS — no Python source files changed; markdown-only diff.
  • Debug code: CLEAN — no debug artifacts introduced by this PR.

Tier 1: Fixes Required

  • rfcs/006-external-environment-imports.mdWrong org in PR link. The Summary / Implementation Notes link to https://github.com/huggingface/OpenEnv/pull/726, but the canonical repo is meta-pytorch/OpenEnv. Correct URL: https://github.com/meta-pytorch/OpenEnv/pull/726. Appears in more than one place — fix all occurrences.

  • rfcs/006-external-environment-imports.md (~line 259) — step() signature diverges from the INVARIANTS.md Gymnasium contract. The generated wrapper contract shows def step(self, action, timeout_s=None, **kwargs). INVARIANTS.md §API Invariants mandates step(action) -> Observation as the immutable signature. Existing RFCs (001–004) use only step(action). Either remove timeout_s from the documented contract or add an explicit note that this is an orchestration-side extension point wrapping the base step(action) and does not change the public API signature.

  • rfcs/006-external-environment-imports.md (~lines 294-298 vs 369-376) — Type inconsistency for list_splits() between two sections. The ORS/OpenReward Mapping section documents list_splits() -> list[dict], but the TaskProvider Protocol section defines def list_splits(self) -> list[Any]. Pick one and make both sections agree.


Tier 2: Alignment Discussion

ALIGNMENT FLAG: Compatibility HTTP task routes introduce a new infrastructure-facing HTTP surface during an active HTTP→WebSocket deprecation

  • Principle at stake: INVARIANTS.md §Communication patterns — "WebSocket for all environment communication … HTTP being deprecated (see PR #252)".
  • Concern: The "Compatibility Task Routes" section adds new HTTP GET/POST routes as a bridge for ORS clients. The RFC acknowledges this tension and Open Question 1 surfaces it, but there is no concrete gate/timeline for a WebSocket-native equivalent before 1.0. Adding HTTP routes during the migration risks entrenching HTTP further.

ALIGNMENT FLAG: Task discovery protocol placement — infrastructure vs dual-use

  • Principle at stake: INVARIANTS.md §Dual API boundary — agents must not access reset/simulation controls or task selection.
  • Concern: TaskProvider methods are implemented on the same ImportedEnvironment object that handles agent step() calls. The boundary is enforced by documentation/convention rather than structural separation. Worth discussing whether the TaskProvider surface should live on a separate object not reachable via MCP.

ALIGNMENT FLAG: reset() accepting task_spec, split, index as optional kwargs

  • Principle at stake: INVARIANTS.md §API Invariants — reset(seed?, episode_id?) -> Observation.
  • Concern: The wrapper's reset() extends the documented signature. It is infrastructure-facing and not exposed to agents (so it doesn't violate "agents cannot reset"), but the divergence from the documented Gymnasium signature should be an explicit, recorded decision rather than implicit.

Summary

  • 3 Tier 1 doc issues to fix (wrong repo org in link(s), step() signature divergence from INVARIANTS, list_splits type inconsistency).
  • 3 Tier 2 alignment points for human review (HTTP routes during deprecation, structural vs convention-based task-discovery boundary, reset() signature extension).

These are RFC-text consistency items, not code bugs. Flagging for author follow-up.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS — this PR only modifies markdown files; no Python code to lint.
  • Debug code: CLEAN — no debug artifacts introduced.

Tier 1: Fixes Required

1. RFC number collision leaves stale cross-references in RFC 001 and RFC 003 (blocking)

This PR claims RFC slot 006 for "External Environment Imports" and relabels the old RFC 006 ("production performance simulation") as "upcoming" in rfcs/000-project-phases.md. But two existing documents still hardcode RFC 006 as the production-simulation RFC:

  • rfcs/001-abstractions.md:170 — "See RFC 006 for how we simulate production performance characteristics (latency, reliability) during training to minimize the training-production delta."
  • rfcs/003-mcp-support.md:17 — "injecting performance metadata (see RFC 006), and A/B testing tool implementations."

After this merges, both references will point readers to the wrong document. Fix by either (a) updating RFC 001 and RFC 003 to remove/renumber the production-simulation reference, or (b) renumbering this RFC to a free slot and keeping 006 for production-simulation. The choice should be consistent with the roadmap change already made in RFC 000.

2. rfcs/006-external-environment-imports.md:20 — heading format inconsistency

Heading is # RFC 006: External Environment Imports. RFC 000–003 use # RFC: [Title]. Consistent with 004/005 but differs from the template in rfcs/README.md. Cosmetic; recommend aligning with the template.

3. rfcs/006-external-environment-imports.md:259step() signature diverges from INVARIANTS.md

The ImportedEnvironment contract shows def step(self, action, timeout_s=None, **kwargs) -> Observation:. INVARIANTS.md defines the canonical signature as step(action) -> Observation. If the timeout_s/**kwargs extension is intentional for imported wrappers, the RFC should explicitly call out that it's an extension and justify why it doesn't break the frozen Gymnasium signature.


Tier 2: Alignment Discussion

ALIGNMENT FLAG 1 — RFC number squatting displaces the committed production-simulation RFC slot. RFC 001 and RFC 003 were reviewed and accepted with RFC 006 meaning production-performance simulation, which is tied directly to the "minimize lifecycle deltas" principle. The core team should explicitly decide whether that RFC is being deferred/dropped or just renumbered. (Suggested reviewer: @Darktex)

ALIGNMENT FLAG 2 — Compatibility HTTP routes expand the HTTP surface against INVARIANTS.md §4 and the active HTTP deprecation work (PR #252). The RFC proposes six new HTTP REST routes. It acknowledges the tension (Decision 5, Open Question 1) but the mitigation is non-binding. Recommend committing to a specific milestone/version gate at which the HTTP compatibility bridge must be replaced by a WebSocket-native metadata channel. (Suggested reviewer: @Darktex)

ALIGNMENT FLAG 3 — TaskProvider boundary enforcement is underspecified. The RFC states TaskProvider methods "must not be exposed as MCP tools" but doesn't specify whether this is enforced structurally (separate router, never registered on the MCP handler) or only by convention. If only conventional, a future contributor could inadvertently expose task selection to agents — touching the "agents cannot reset" / dual-API-boundary invariants. Specify the enforcement mechanism. (Suggested reviewer: @Darktex)

ALIGNMENT FLAG 4 — SUPPORTS_CONCURRENT_SESSIONS = False interacts with the "one env = one trajectory" batching model in an under-documented way. The RFC doesn't address whether an EnvPool can safely stack multiple imported-wrapper instances, or whether vendored libraries with global state would interfere. Recommend flagging this as a known limitation and noting what users must verify before stacking imported environments. (Suggested reviewer: @Darktex)


Summary

  • 3 mechanical issues (1 blocking: broken cross-references in RFC 001/003; 1 signature deviation; 1 minor format)
  • 4 alignment points for human decision

The RFC is well-structured, clearly motivated, and the boundary rules are generally sound. The blocking issue is the RFC-number collision leaving stale cross-references. The alignment flags involve trade-offs the core team should explicitly sign off on before implementation is accepted.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS (lint hook runs against Python source; this PR adds only Markdown files — no lint violations introduced)
  • Debug code: CLEAN (no Python code added by this PR)

Open RFCs Context

Existing RFCs on disk: 000 through 005. There is no existing 006-*.md file, so the numeric slot is unoccupied. However, rfcs/000-project-phases.md (pre-patch) explicitly reserved RFC 006 for "Enabling production performance simulation to minimize training-production delta." This PR claims the same slot for a different topic (External Environment Imports) and neutralizes the conflict by rewriting that line to say "upcoming RFC" — which is the mechanical fix, but the roadmap intent question remains open.

Tier 1: Fixes Required

  • rfcs/000-project-phases.md — After stripping the "RFC 006" label from item 5, item 6 still says "RFC 007" for MCP observability while the repurposed slot does not carry a number. The numbering is now asymmetric. The roadmap should either (a) cross-reference the new RFC 006 at item 5 with a note that the topic changed, or (b) renumber forward consistently. As written, readers see an unexplained gap.

  • rfcs/006-external-environment-imports.md (line ~213) — openrewardstandard appears as a single identifier in the list of detected import paths (ors, openreward, openreward.environments, openrewardstandard). If the canonical package name contains a separator (e.g. openreward_standard), this should match the actual module name exactly. Verify the string matches the real import path.

  • rfcs/README.md — The new "External Environment Imports" section is added after "Agentic Harnesses" with its own ### External Environment Imports heading but no owning thematic category (existing sections group under "Core Abstractions & Design," "MCP Integration," etc.). Minor: consider whether it belongs under an existing category.

Tier 2: Alignment Discussion

ALIGNMENT FLAG: RFC claims a pre-reserved roadmap slot for an unrelated topic

  • Principle at stake: "Key Decisions Made... should not be changed without a new RFC" (PRINCIPLES.md); RFC 000 roadmap coherence
  • The concern: RFC 000 explicitly named slot 006 for "production performance simulation." This PR repurposes 006 for External Environment Imports and silently retires the simulation RFC by dropping it to "upcoming RFC." There is no discussion of what happens to the production simulation design slot or whether a new number will be assigned.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: Implementation-before-RFC process inversion

  • Principle at stake: RFC process purpose — propose/discuss/document significant changes before implementation (rfcs/README.md)
  • The concern: The RFC states it "documents the design implemented in PR #726 ... even though the code already exists." The RFC can only ratify the design after the fact, not influence it — for a feature introducing a new CLI command, a TaskProvider protocol, new HTTP routes, and vendoring behavior.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: New HTTP task routes vs. active WebSocket-first migration

  • Principle at stake: "WebSocket for all environment communication" (INVARIANTS.md §4); HTTP deprecation in progress (PR #252)
  • The concern: The RFC proposes six new HTTP routes as an "ORS-compatible compatibility bridge." INVARIANTS.md notes HTTP is being deprecated in favor of WebSocket-only. Adding new HTTP surface during an active deprecation widens the migration scope. The RFC acknowledges this (Decision 5, Open Question 1) but defers it to "before 1.0" rather than resolving it.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: TaskProvider protocol extends the orchestration boundary in a new direction without doc updates

  • Principle at stake: Dual API boundary (INVARIANTS.md §1); Gymnasium-style API (PRINCIPLES.md)
  • The concern: TaskProvider is correctly kept off the agent/MCP surface and the base Environment class is not modified (good). But it is a new formal orchestration-side extension point not described in INVARIANTS.md or PRINCIPLES.md. If accepted, those docs should be updated to describe it as a sanctioned orchestration-side extension pattern.
  • Suggested reviewer: @Darktex

Summary

  • 3 mechanical issues to fix (roadmap gap explanation, module name verification, README categorization)
  • 4 alignment points for human review (roadmap slot conflict, implementation-before-RFC process, new HTTP routes during active deprecation, TaskProvider boundary documentation)

Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


RFC 006: External Environment Imports — Review

Automated Checks

  • Lint: N/A — this PR adds only Markdown documents; source code changes are in the referenced PR #726, not this PR.
  • Debug code: N/A — no code in this PR.

Tier 1: Document Quality Issues

1. Title format is inconsistent with the RFC template. Existing RFCs use # RFC: [Title] (with a colon) as their H1. RFC 006 uses # RFC 006: External Environment Imports. The **RFC ID**: 006 header block is fine. Fix the H1 to match convention: # RFC: External Environment Imports.

2. RFC 000 change is a revert, not a redirect — but the old text was misleading. The diff changes line 9 of rfcs/000-project-phases.md from (RFC 006) to (upcoming RFC). The old text pointed to RFC 006 as if it covered "production performance simulation," but this RFC 006 is about external environment imports. Either name the actual topic ((see RFC 006: External Environment Imports)) or, if production-performance simulation is still unwritten, keep (upcoming RFC) — but the current change leaves a misleading gap.

3. Implementation-first disclosure needs clearer framing. The Summary notes the RFC "documents the design implemented in PR #726 … even though the code already exists." This is a legitimate post-hoc RFC pattern, but rfcs/README.md does not mention it and the RFC has no Status annotation for this case. Add something like "Implementation status: Code exists in PR #726; this RFC is the retroactive design review before merge."

4. Missing "Alternatives Considered" entry for the TaskProvider protocol placement. The RFC adds TaskProvider to src/openenv/core/env_server/interfaces.py (core). An alternative — placing task discovery in an optional mixin or the importers package only — is not discussed. Core changes carry more stability obligation than importer-specific code.

5. The "Dependency Updates" section recommends a TOML parser but does not name one. "Use a TOML parser/writer rather than string replacement" is a good rule, but name the preferred library (tomllib/tomli_w, tomlkit, …) or reference existing OpenEnv tooling so implementers do not diverge.

6. Confirm rendering of the ORS base-class import path list. The prose lists ors, openreward, openreward.environments, and openrewardstandard; verify the spacing/newline renders correctly (appears as a possible run-on in the raw diff).

7. Open Question 1 is a prerequisite, not an open question. "Should task discovery become a first-class WebSocket metadata message before the HTTP compatibility routes are merged?" directly affects whether Decision 5 is acceptable as-is. Surface it as a pending blocker, not an easily-missed open question.


Tier 2: Alignment Discussion

ALIGNMENT FLAG: HTTP compatibility routes added without a WebSocket equivalent plan

  • Principle at stake: "WebSocket for all environment communication" (INVARIANTS.md §Architectural); PRINCIPLES.md "WebSocket for step loop"; RFC 003 / PR #252 deprecating HTTP in favor of WebSocket-only.
  • The concern: Decision 5 acknowledges the tension but defers the WebSocket-native equivalent to "before 1.0." The PR adds seven new HTTP routes (/list_environments, /{env_name}/splits, and five more) to http_server.py. Adding new HTTP surface during an active HTTP deprecation widens the very surface being shrunk, and Open Question 1 leaves it unresolved. The RFC should specify a concrete milestone for the WebSocket equivalent so these routes don't become permanent.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: generated wrapper reset(split, index) and the agent/orchestration boundary

  • Principle at stake: "Agents cannot reset" (INVARIANTS.md §Security).
  • The concern: The RFC correctly states task discovery is orchestration-only. But the generated wrapper's reset(split, index) means anything that can reach reset() can select the task. The RFC should explicitly describe how the server enforces that only the orchestration WebSocket path (not MCP) can invoke reset() for generated wrappers, since the generated code is new and reviewable.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: SUPPORTS_CONCURRENT_SESSIONS = False hardcoded in generated wrappers

  • Principle at stake: "One env = one trajectory" (PRINCIPLES.md).
  • The concern: Consistent with the principle, but set as a class constant with no discussion of EnvPool implications or whether the generated Dockerfile/openenv.yaml and the server runtime actually enforce single-session semantics. Low severity, worth confirming.
  • Suggested reviewer: @Darktex

Summary

  • 7 document-quality issues to address (all in the RFC Markdown itself)
  • 3 alignment points for human review; the HTTP route expansion is the highest-priority concern given the active HTTP deprecation

The RFC is well-structured, clearly written, and correctly identifies the major design decisions. The biggest item before approval is Open Question 1: decide whether the HTTP compatibility routes can merge ahead of WebSocket parity or must wait — that decision also determines whether Decision 5's rationale is complete.


Automated review by Claude Code | Learn more

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


This RFC proposes a deterministic CLI-based import path for third-party RL environment libraries (ORS/OpenReward and Prime Intellect Verifiers). The design is substantial and clearly written: static AST detection, a generated-package strategy, an opt-in TaskProvider protocol, and carefully scoped compatibility HTTP routes. The boundary rules section explicitly preserves all four OpenEnv invariants (agents cannot reset, dual API boundary, rewards inside environment, client-server separation). Overall this is a well-structured addition. However, there is one blocking correctness issue and two non-blocking alignment points.

Tier 1: Correctness & Consistency

  • rfcs/001-abstractions.md:170 and rfcs/003-mcp-support.md:17 — dangling "RFC 006" cross-references point to the wrong topic. The PR correctly un-assigns the old "production performance simulation" meaning from RFC 006 in rfcs/000-project-phases.md, but leaves two live references in other RFCs that still read "RFC 006" expecting that old topic. rfcs/001-abstractions.md says "See RFC 006 for how we simulate production performance characteristics (latency, reliability) during training", and rfcs/003-mcp-support.md says "injecting performance metadata (see RFC 006)". After this PR lands, "RFC 006" is the external-environment-imports RFC and those two sentences become factually wrong. Both references must be updated before this merges.

  • RFC header format inconsistency. Every prior RFC uses # RFC: [Title] (e.g., # RFC: OpenEnv Basic Abstractions). This document uses # RFC 006: External Environment Imports — omitting the colon and the RFC: convention. Minor, but consistency matters for tooling and grep patterns. Please align to the established format.

Tier 2: Alignment

ALIGNMENT FLAG: HTTP compatibility task routes added without a WebSocket-first migration commitment.

  • Principle at stake: INVARIANTS.md §4 "Communication patterns — WebSocket for all environment communication".
  • The concern: The RFC adds six new HTTP routes (GET /list_environments, GET /{env_name}/splits, POST /{env_name}/tasks, etc.) as a "transitional compatibility bridge". The RFC says these should be available over WebSocket before 1.0, but there is no concrete pre-condition — the implementation could land and these routes remain HTTP-only indefinitely. Calling them "transitional" while shipping them without a paired WebSocket implementation or a committed follow-up issue means the HTTP surface widens without a clear deprecation trigger.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: SUPPORTS_CONCURRENT_SESSIONS = False is mentioned in the generated wrapper contract but not explained or enforced.

  • Principle at stake: PRINCIPLES.md — "One env = one trajectory".
  • The concern: The generated wrapper stub shows SUPPORTS_CONCURRENT_SESSIONS = False (correct for the invariant), but nowhere explains what happens if an implementer flips it to True. If this is a safety guard, the RFC should state explicitly that generated wrappers must not flip this flag without an explicit RFC — otherwise a contributor editing a generated wrapper could unknowingly violate the one-env-one-trajectory invariant.
  • Suggested reviewer: @Darktex

Verdict

The RFC is well-aligned with OpenEnv's core principles and explicitly preserves all key invariants. However, the two dangling cross-references in RFC 001 and RFC 003 are factual errors introduced by this PR and must be fixed before merging — they will confuse any reader who follows the links after this RFC lands. The header format inconsistency is smaller but still worth fixing given how consistently the convention is applied elsewhere. The two alignment flags are informational. Verdict: request_changes (fix the cross-references and header).


Automated review by Claude Code | Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants