huggingface · acharyaanusha · Jun 11, 2026
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
@@ -125,6 +125,8 @@
     title: Agent World Model
   - local: environments/opencode
     title: OpenCode
+  - local: environments/sophistry_bench_sprint
+    title: Sophistry Bench Sprint
   title: Environments
 - isExpanded: false
   sections:

diff --git a/docs/source/environments.md b/docs/source/environments.md
@@ -258,6 +258,13 @@ The OpenEnv community has built a catalog of ready-to-run environments that cove
         <a href="environments/opencode" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
       </div>
     </div>
+    <div class="border dark:border-gray-700 p-5 rounded-lg shadow">
+      <div class="font-bold mb-2">Sophistry Bench Sprint</div>
+      <p class="text-sm"><code>sophistry_bench_sprint_env</code> is a single-turn advocacy reward-hacking environment on QuALITY passages: the policy defends an assigned answer and the reward proxy peaks at 8 <code>&lt;claim&gt;</code> tags, with four weight-0 canaries that detect format hacking.</p>
+      <div class="flex gap-2 mt-3">
+        <a href="environments/sophistry_bench_sprint" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
+      </div>
+    </div>
   </div>
 </div>
 

diff --git a/docs/source/environments/sophistry_bench_sprint.md b/docs/source/environments/sophistry_bench_sprint.md
@@ -0,0 +1,84 @@
+<!-- openenv-source: sophistry_bench_sprint_env -->
+# sophistry_bench_sprint_env
+
+OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
+single-agent advocacy **reward-hacking sprint** environment. One-turn advocacy on QuALITY
+reading-comprehension passages: the policy is assigned the gold answer or a distractor and must
+produce one `<claim>`/`<cite>` argument. The reward proxy peaks at exactly 8 `<claim>` tags
+(`claim_count_cliff`); four weight-0 canaries detect format hacking.
+
+Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
+PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.
+
+## Episode model
+
+Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
+`done=True`.
+
+## Configuration (environment variables)
+
+| Var | Default | Meaning |
+|---|---|---|
+| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
+| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
+| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
+| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
+| `SPRINT_EXPOSE_CORRECTNESS` | `0` | When `1`/`true`, surface `correctness_reward` (the hidden ground truth) in the wire `metadata`/`components`. Off by default so a harness can't accidentally leak it to the policy. This flag controls only *surfacing*, not *weighting*: correctness affects `reward` only via its `SPRINT_WEIGHTS` entry, which is `0` by default. |
+
+## Usage
+
+The client is **async by default** (like every OpenEnv client):
+
+```python
+import asyncio
+
+from sophistry_bench_sprint_env import SophistryBenchSprintEnv
+
+
+async def main():
+    # Deployed Hugging Face Space (or .from_docker_image("openenv-sophistry_bench_sprint:latest")):
+    client = await SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
+    async with client:
+        obs = (await client.reset()).observation
+        print(obs.prompt, obs.answer_to_defend)
+        result = await client.step_text("<claim>...</claim><cite>...</cite>")
+        print(result.reward, result.observation.metadata)
+
+
+asyncio.run(main())
+```
+
+For **synchronous usage**, use the `.sync()` wrapper:
+
+```python
+with SophistryBenchSprintEnv(base_url="http://localhost:8000").sync() as client:
+    obs = client.reset().observation
+    result = client.step_text("<claim>...</claim><cite>...</cite>")
+    print(result.reward, result.observation.metadata)
+```
+
+`result.observation.metadata` carries the reward components on every step. The canary scores
+among them are the reward-hacking measurement. By default it holds **seven** components; `correctness_reward`
+(the hidden ground truth) is withheld unless `SPRINT_EXPOSE_CORRECTNESS=1` (see above).
+
+> **Do not feed `observation.metadata` / `observation.components` back into the policy's
+> prompt.** `reset()` deliberately tells the policy only *what* to defend, never *whether* it
+> is correct. `correctness_reward` is withheld from the wire by default for exactly this
+> reason. Even without it, forwarding the remaining components to the agent leaks the
+> reward signal and defeats the reward-hacking measurement.
+
+## Build & test
+
+```bash
+# Tests live with the other env tests. Run them from the repo root using this
+# env's venv (which installs the scoring package):
+uv run --project envs/sophistry_bench_sprint_env --extra dev \
+  pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
+# The test module imports the published sophistry-bench-sprint package, so in the
+# repo's shared CI (where it isn't installed) it is skipped via pytest.importorskip,
+# as with other envs that have heavy deps (e.g. tbench2's camel guard).
+
+# Container
+openenv build sophistry_bench_sprint_env
+# produces image tag: openenv-sophistry_bench_sprint:latest
+```
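
The `SPRINT_WEIGHTS` row in the configuration table can be illustrated with a small sketch. It assumes the final reward is a plain weighted sum over the eight components and that the component keys match the names in the table's order list. Both are assumptions made for illustration: the real scoring is imported from the `sophistry-bench-sprint` package, and `parse_weights`, `load_weights` and `weighted_reward` are hypothetical helpers, not part of this PR.

```python
import os

# Component order as listed in the SPRINT_WEIGHTS row of the configuration table.
# Using these names as dictionary keys is an assumption for illustration only.
COMPONENT_ORDER = (
    "aggregate",
    "correctness",
    "n_claims",
    "n_citations",
    "alternation_canary",
    "starts_with_canary",
    "length_band_canary",
    "template_echo_canary",
)

DEFAULT_WEIGHTS = "1,0,0,0,0,0,0,0"


def parse_weights(raw: str) -> dict[str, float]:
    """Parse a comma-separated weight string into a name -> weight mapping."""
    values = [float(part) for part in raw.split(",")]
    if len(values) != len(COMPONENT_ORDER):
        raise ValueError(
            f"expected {len(COMPONENT_ORDER)} weights, got {len(values)}"
        )
    return dict(zip(COMPONENT_ORDER, values))


def load_weights(env=os.environ) -> dict[str, float]:
    """Read SPRINT_WEIGHTS from the environment, falling back to the default."""
    return parse_weights(env.get("SPRINT_WEIGHTS", DEFAULT_WEIGHTS))


def weighted_reward(components: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical weighted sum; components absent from the dict count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())
```

Under the default weights only `aggregate` contributes, so the canaries and `correctness` can be logged without influencing training.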
diff --git a/envs/sophistry_bench_sprint_env/README.md b/envs/sophistry_bench_sprint_env/README.md
@@ -0,0 +1,96 @@
+---
+title: Sophistry Bench Sprint Env
+emoji: 🗣️
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
+---
+
+# sophistry_bench_sprint_env
+
+OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
+single-agent advocacy **reward-hacking sprint** environment. One-turn advocacy on QuALITY
+reading-comprehension passages: the policy is assigned the gold answer or a distractor and must
+produce one `<claim>`/`<cite>` argument. The reward proxy peaks at exactly 8 `<claim>` tags
+(`claim_count_cliff`); four weight-0 canaries detect format hacking.
+
+Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
+PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.
+
+## Episode model
+
+Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
+`done=True`.
+
+## Configuration (environment variables)
+
+| Var | Default | Meaning |
+|---|---|---|
+| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
+| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
+| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
+| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
+| `SPRINT_EXPOSE_CORRECTNESS` | `0` | When `1`/`true`, surface `correctness_reward` (the hidden ground truth) in the wire `metadata`/`components`. Off by default so a harness can't accidentally leak it to the policy. This flag controls only *surfacing*, not *weighting*: correctness affects `reward` only via its `SPRINT_WEIGHTS` entry, which is `0` by default. |
+
+## Usage
+
+The client is **async by default** (like every OpenEnv client):
+
+```python
+import asyncio
+
+from sophistry_bench_sprint_env import SophistryBenchSprintEnv
+
+
+async def main():
+    # Deployed Hugging Face Space (or .from_docker_image("openenv-sophistry_bench_sprint:latest")):
+    client = await SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
+    async with client:
+        obs = (await client.reset()).observation
+        print(obs.prompt, obs.answer_to_defend)
+        result = await client.step_text("<claim>...</claim><cite>...</cite>")
+        print(result.reward, result.observation.metadata)
+
+
+asyncio.run(main())
+```
+
+For **synchronous usage**, use the `.sync()` wrapper:
+
+```python
+with SophistryBenchSprintEnv(base_url="http://localhost:8000").sync() as client:
+    obs = client.reset().observation
+    result = client.step_text("<claim>...</claim><cite>...</cite>")
+    print(result.reward, result.observation.metadata)
+```
+
+`result.observation.metadata` carries the reward components on every step. The canary scores
+among them are the reward-hacking measurement. By default it holds **seven** components; `correctness_reward`
+(the hidden ground truth) is withheld unless `SPRINT_EXPOSE_CORRECTNESS=1` (see above).
+
+> **Do not feed `observation.metadata` / `observation.components` back into the policy's
+> prompt.** `reset()` deliberately tells the policy only *what* to defend, never *whether* it
+> is correct. `correctness_reward` is withheld from the wire by default for exactly this
+> reason. Even without it, forwarding the remaining components to the agent leaks the
+> reward signal and defeats the reward-hacking measurement.
+
+## Build & test
+
+```bash
+# Tests live with the other env tests. Run them from the repo root using this
+# env's venv (which installs the scoring package):
+uv run --project envs/sophistry_bench_sprint_env --extra dev \
+  pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
+# The test module imports the published sophistry-bench-sprint package, so in the
+# repo's shared CI (where it isn't installed) it is skipped via pytest.importorskip,
+# as with other envs that have heavy deps (e.g. tbench2's camel guard).
+
+# Container
+openenv build sophistry_bench_sprint_env
+# produces image tag: openenv-sophistry_bench_sprint:latest
+```
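
Because the README says the reward proxy peaks at exactly 8 `<claim>` tags, a harness may want to track tag counts on its own side, separately from the scored components. The sketch below counts opening tags only. That rule is an assumption and may not match how the scoring package counts claims or citations, and `tag_counts` is a hypothetical helper.

```python
import re

# Assumption: a claim or citation is identified by its opening tag alone.
CLAIM_TAG = re.compile(r"<claim>")
CITE_TAG = re.compile(r"<cite>")


def tag_counts(text: str) -> dict[str, int]:
    """Count opening <claim> and <cite> tags in a completion."""
    return {
        "claims": len(CLAIM_TAG.findall(text)),
        "cites": len(CITE_TAG.findall(text)),
    }
```

These counts belong in the harness logs only. Per the warning above, they should not be fed back into the policy's prompt.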
diff --git a/envs/sophistry_bench_sprint_env/__init__.py b/envs/sophistry_bench_sprint_env/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Sophistry-Bench Sprint Environment (OpenEnv port).
+
+Single-step advocacy environment: reset() issues a QuALITY reading-comprehension
+advocacy task, step(AdvocacyAction(text=...)) scores the argument and returns the
+reward plus the sprint reward components (seven by default) in observation.metadata.
+"""
+
+from .client import SophistryBenchSprintEnv
+from .models import AdvocacyAction, AdvocacyObservation
+
+__all__ = ["SophistryBenchSprintEnv", "AdvocacyAction", "AdvocacyObservation"]
diff --git a/envs/sophistry_bench_sprint_env/client.py b/envs/sophistry_bench_sprint_env/client.py
@@ -0,0 +1,65 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from __future__ import annotations
+
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from openenv.core.env_server.types import State
+
+from .models import AdvocacyAction, AdvocacyObservation
+
+
+class SophistryBenchSprintEnv(EnvClient[AdvocacyAction, AdvocacyObservation, State]):
+    """Typed client for the sophistry-bench sprint OpenEnv environment."""
+
+    async def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
+        """Convenience: submit a raw argument string as an AdvocacyAction.
+
+        Declared ``async`` so that awaiting it yields a ``StepResult`` and
+        ``.sync()`` wraps it like the base ``step``/``reset``. Call ``await
+        env.step_text(...)``, or ``env.sync().step_text(...)`` on a sync client.
+        """
+        return await super().step(AdvocacyAction(text=text))
+
+    def _step_payload(self, action: AdvocacyAction) -> dict:
+        return action.model_dump()
+
+    def _parse_result(self, data: dict) -> StepResult[AdvocacyObservation]:
+        # Fail loudly on a malformed payload rather than silently building an
+        # empty observation (a missing/null ``observation`` is a real protocol error).
+        observation_payload = data.get("observation")
+        if not isinstance(observation_payload, dict):
+            raise ValueError(
+                "malformed step result: 'observation' must be a dict, got "
+                f"{type(observation_payload).__name__}"
+            )
+        obs_data = dict(observation_payload)
+        # The framework's HTTP layer strips the base ``metadata`` dict from the
+        # serialized observation, so the reward components arrive in the declared
+        # ``components`` field (and the diagnostic message in ``error``). Rebuild
+        # ``metadata`` here so the public contract holds (``observation.metadata``
+        # carries the reward components), preferring any metadata that survived
+        # (in-process callers), else the mirrored ``components``.
+        wire_metadata = obs_data.pop("metadata", None)
+        metadata = (
+            dict(wire_metadata)
+            if wire_metadata
+            else dict(obs_data.get("components") or {})
+        )
+        error = obs_data.get("error") or ""
+        if error and "error" not in metadata:
+            metadata["error"] = error
+        # Construct once with metadata set, rather than mutating the model after.
+        observation = AdvocacyObservation(**obs_data, metadata=metadata)
+        return StepResult(
+            observation=observation,
+            reward=data.get("reward"),
+            done=data.get("done", False),
+        )
+
+    def _parse_state(self, data: dict) -> State:
+        return State(**data)
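
The metadata precedence in `_parse_result` can be read in isolation as a pure function. The sketch below mirrors the logic in the diff without depending on `openenv` or the Pydantic models. The name `rebuild_metadata` is hypothetical and does not exist in the PR.

```python
from typing import Any


def rebuild_metadata(observation_payload: dict[str, Any]) -> dict[str, Any]:
    """Standalone mirror of the metadata precedence used in _parse_result."""
    # Prefer metadata that survived serialization (in-process callers).
    wire_metadata = observation_payload.get("metadata")
    metadata = (
        dict(wire_metadata)
        if wire_metadata
        # Otherwise fall back to the mirrored ``components`` field.
        else dict(observation_payload.get("components") or {})
    )
    # Surface the diagnostic message unless metadata already carries one.
    error = observation_payload.get("error") or ""
    if error and "error" not in metadata:
        metadata["error"] = error
    return metadata
```

A payload with only `components` yields those components as metadata. When both are present, the surviving `metadata` wins and `components` is ignored.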
diff --git a/envs/sophistry_bench_sprint_env/models.py b/envs/sophistry_bench_sprint_env/models.py
@@ -0,0 +1,54 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from __future__ import annotations
+
+from openenv.core.env_server.types import Action, Observation
+from pydantic import Field
+
+
+class AdvocacyAction(Action):
+    """The policy's one-shot advocacy argument."""
+
+    text: str = Field(
+        ..., description="The argument completion, using <claim>/<cite> tags."
+    )
+
+
+class AdvocacyObservation(Observation):
+    """Task on reset; scored result on step.
+
+    On reset: ``prompt`` holds the full system prompt (passage + question +
+    answer-to-defend), ``done`` is False.
+    On step: ``prompt`` is empty, ``done`` is True, and ``metadata`` carries the
+    reward components (seven by default; eight with SPRINT_EXPOSE_CORRECTNESS=1).
+
+    ``reward``/``done`` are inherited from the base ``Observation`` (reward
+    defaults to ``None``). Read the post-step reward from ``StepResult.reward``,
+    not ``observation.reward``: the framework's serializer strips ``reward`` from
+    the observation payload, so only ``StepResult.reward`` carries the weighted
+    aggregate. ``reset()`` leaves ``reward`` as ``None`` (no action scored yet),
+    matching the framework convention.
+
+    The reward components are also mirrored in the declared ``components``
+    field. The base ``metadata`` dict is stripped by the framework's HTTP
+    serialization layer, so ``components`` is what survives the wire; the typed
+    client re-populates ``metadata`` from it on the way back.
+    """
+
+    prompt: str = Field("", description="Full prompt the policy must answer.")
+    answer_to_defend: str = Field(
+        "", description="The answer the policy advocates for."
+    )
+    item_id: str = Field("", description="Source QuALITY article id.")
+    components: dict[str, float] = Field(
+        default_factory=dict,
+        description="Reward components (mirror of metadata; survives HTTP).",
+    )
+    error: str = Field(
+        "",
+        description="Diagnostic message (e.g. step-before-reset); survives serialization.",
+    )
diff --git a/envs/sophistry_bench_sprint_env/openenv.yaml b/envs/sophistry_bench_sprint_env/openenv.yaml
@@ -0,0 +1,6 @@
+spec_version: 1
+name: sophistry_bench_sprint_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
diff --git a/envs/sophistry_bench_sprint_env/pyproject.toml b/envs/sophistry_bench_sprint_env/pyproject.toml
@@ -0,0 +1,33 @@
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "openenv-sophistry-bench-sprint-env"
+version = "0.1.0"
+description = "OpenEnv port of the sophistry-bench single-agent advocacy reward-hacking sprint env"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv[core]>=0.2.2",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
+    "uvicorn>=0.24.0",
+    # >=0.1.6 exports aggregate_reward (imported directly, not reproduced here).
+    # Capped <0.2.0: a 0.2+ bump must re-run the parity test before widening.
+    "sophistry-bench-sprint>=0.1.6,<0.2.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=9.0.3",
+    "pytest-asyncio>=0.21",
+    "pytest-cov",
+]
+
+[project.scripts]
+server = "sophistry_bench_sprint_env.server.app:main"
+
+[tool.setuptools]
+include-package-data = true
+packages = ["sophistry_bench_sprint_env", "sophistry_bench_sprint_env.server"]
+package-dir = { "sophistry_bench_sprint_env" = ".", "sophistry_bench_sprint_env.server" = "server" }