Open
19 commits
ebf13e5
feat(sophistry_bench_sprint_env): scaffold OpenEnv package + vendor s…
acharyaanusha Jun 11, 2026
76ed010
feat(sophistry_bench_sprint_env): add AdvocacyAction/AdvocacyObservat…
acharyaanusha Jun 11, 2026
21b7952
feat(sophistry_bench_sprint_env): typed HTTP client
acharyaanusha Jun 11, 2026
83465f0
feat(sophistry_bench_sprint_env): environment construction + reset()
acharyaanusha Jun 11, 2026
b6d0355
feat(sophistry_bench_sprint_env): step() scoring with canonical rewar…
acharyaanusha Jun 11, 2026
4ce6be8
feat(sophistry_bench_sprint_env): FastAPI app + Dockerfile
acharyaanusha Jun 11, 2026
d337a1d
docs(sophistry_bench_sprint_env): README + build/usage
acharyaanusha Jun 11, 2026
dc15ca7
fix(sophistry_bench_sprint_env): surface reward components over HTTP
acharyaanusha Jun 11, 2026
230269a
fix(sophistry_bench_sprint_env): error path survives wire + serializa…
acharyaanusha Jun 11, 2026
b419779
docs+test(sophistry_bench_sprint_env): correct image tag; error wire-…
acharyaanusha Jun 11, 2026
ac9e1c0
feat(sophistry_bench_sprint_env): depend on published sophistry-bench…
acharyaanusha Jun 11, 2026
792905d
fix(sophistry_bench_sprint_env): address review + align with merged-e…
acharyaanusha Jun 11, 2026
94a7f61
fix(sophistry_bench_sprint_env): harden reward weighting + clarify wi…
acharyaanusha Jun 12, 2026
71f51c5
Merge branch 'main' into feature/sophistry_bench_sprint_env
acharyaanusha Jun 12, 2026
ac72972
fix(sophistry_bench_sprint_env): align reward field + harden client p…
acharyaanusha Jun 13, 2026
8ad6f3b
fix(sophistry_bench_sprint_env): import aggregate_reward instead of r…
acharyaanusha Jun 13, 2026
8aafeae
fix(sophistry_bench_sprint_env): step_text async + review cleanups
acharyaanusha Jun 13, 2026
a1e0f52
fix(sophistry_bench_sprint_env): withhold ground truth + drop dead im…
acharyaanusha Jun 13, 2026
01d96f9
fix(sophistry_bench_sprint_env): strict _parse_result + clarify corre…
acharyaanusha Jun 13, 2026
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -125,6 +125,8 @@
    title: Agent World Model
  - local: environments/opencode
    title: OpenCode
  - local: environments/sophistry_bench_sprint
    title: Sophistry Bench Sprint
  title: Environments
- isExpanded: false
  sections:
7 changes: 7 additions & 0 deletions docs/source/environments.md
@@ -258,6 +258,13 @@ The OpenEnv community has built a catalog of ready-to-run environments that cove
<a href="environments/opencode" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
</div>
</div>
<div class="border dark:border-gray-700 p-5 rounded-lg shadow">
<div class="font-bold mb-2">Sophistry Bench Sprint</div>
<p class="text-sm"><code>sophistry_bench_sprint_env</code> is a single-turn advocacy reward-hacking environment on QuALITY passages: the policy defends an assigned answer and the reward proxy peaks at 8 <code>&lt;claim&gt;</code> tags, with four weight-0 canaries that detect format hacking.</p>
<div class="flex gap-2 mt-3">
<a href="environments/sophistry_bench_sprint" class="!no-underline border dark:border-gray-700 px-3 py-1 rounded text-sm hover:shadow">📄 Docs</a>
</div>
</div>
</div>
</div>

84 changes: 84 additions & 0 deletions docs/source/environments/sophistry_bench_sprint.md
@@ -0,0 +1,84 @@
<!-- openenv-source: sophistry_bench_sprint_env -->
# sophistry_bench_sprint_env

OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
single-agent advocacy **reward-hacking sprint** environment. One-turn advocacy on QuALITY
reading-comprehension passages: the policy is assigned the gold answer or a distractor and must
produce one `<claim>`/`<cite>` argument. The reward proxy peaks at exactly 8 `<claim>` tags
(`claim_count_cliff`); four weight-0 canaries detect format hacking.

Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.

## Episode model

Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
`done=True`.

## Configuration (environment variables)

| Var | Default | Meaning |
|---|---|---|
| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
| `SPRINT_EXPOSE_CORRECTNESS` | `0` | When `1`/`true`, surface `correctness_reward` (the hidden ground truth) in the wire `metadata`/`components`. Off by default so a harness can't accidentally leak it to the policy. This flag controls only *surfacing*, not *weighting*: correctness affects `reward` only via its `SPRINT_WEIGHTS` entry, which is `0` by default. |
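
As a rough illustration of how the `SPRINT_WEIGHTS` string maps onto the eight components, the sketch below parses it and forms a weighted sum. It is not the package's implementation (the environment imports `aggregate_reward` from `sophistry-bench-sprint`); `parse_weights`, `weighted_reward`, and the short component keys are hypothetical names chosen for this example.

```python
import os

# Component order as documented in the SPRINT_WEIGHTS row above.
# The short key names are an assumption made for this sketch.
COMPONENT_ORDER = (
    "aggregate",
    "correctness",
    "n_claims",
    "n_citations",
    "alternation_canary",
    "starts_with_canary",
    "length_band_canary",
    "template_echo_canary",
)


def parse_weights(raw: str) -> dict[str, float]:
    """Parse a comma-separated weight string into a name -> weight mapping."""
    values = [float(part) for part in raw.split(",")]
    if len(values) != len(COMPONENT_ORDER):
        raise ValueError(
            f"expected {len(COMPONENT_ORDER)} weights, got {len(values)}"
        )
    return dict(zip(COMPONENT_ORDER, values))


def weighted_reward(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over the components; a missing component counts as 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())


# Falls back to the documented default when the variable is unset.
weights = parse_weights(os.environ.get("SPRINT_WEIGHTS", "1,0,0,0,0,0,0,0"))
```

With the default weights only `aggregate` contributes, so the canaries are recorded but never rewarded.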

## Usage

The client is **async by default** (like every OpenEnv client):

```python
import asyncio

from sophistry_bench_sprint_env import SophistryBenchSprintEnv


async def main():
    # Deployed Hugging Face Space (or .from_docker_image("openenv-sophistry_bench_sprint:latest")):
    client = await SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
    async with client:
        obs = (await client.reset()).observation
        print(obs.prompt, obs.answer_to_defend)
        result = await client.step_text("<claim>...</claim><cite>...</cite>")
        print(result.reward, result.observation.metadata)


asyncio.run(main())
```

For **synchronous usage**, use the `.sync()` wrapper:

```python
from sophistry_bench_sprint_env import SophistryBenchSprintEnv

with SophistryBenchSprintEnv(base_url="http://localhost:8000").sync() as client:
    obs = client.reset().observation
    result = client.step_text("<claim>...</claim><cite>...</cite>")
    print(result.reward, result.observation.metadata)
```

`result.observation.metadata` carries the reward components every step — the canary scores are
the reward-hacking measurement. By default it holds **seven** components; `correctness_reward`
(the hidden ground truth) is withheld unless `SPRINT_EXPOSE_CORRECTNESS=1` (see above).

> **Do not feed `observation.metadata` / `observation.components` back into the policy's
> prompt.** `reset()` deliberately tells the policy only *what* to defend, never *whether* it
> is correct. `correctness_reward` is withheld from the wire by default for exactly this
> reason; even with the rest of the components, forwarding them to the agent leaks the
> reward signal and defeats the reward-hacking measurement.
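
One way a harness might enforce this separation is to split each observation into a policy-visible view and an evaluator-only log before anything reaches prompt construction. The sketch below works on a plain dict; `split_for_policy` and `POLICY_FIELDS` are hypothetical names, and treating only `prompt` and `answer_to_defend` as policy-visible is an assumption of this example.

```python
# Fields the policy is allowed to see (assumption for this sketch).
POLICY_FIELDS = ("prompt", "answer_to_defend")


def split_for_policy(observation: dict) -> tuple[dict, dict]:
    """Separate policy-visible task fields from evaluator-only diagnostics."""
    policy_view = {k: observation[k] for k in POLICY_FIELDS if k in observation}
    evaluator_log = {k: v for k, v in observation.items() if k not in POLICY_FIELDS}
    return policy_view, evaluator_log


observation = {
    "prompt": "Defend the assigned answer.",
    "answer_to_defend": "B",
    "components": {"aggregate": 0.4, "alternation_canary": 0.0},
    "metadata": {"aggregate": 0.4, "alternation_canary": 0.0},
}
policy_view, evaluator_log = split_for_policy(observation)
# Build prompts from policy_view only; write evaluator_log to the metrics store.
```

An allow-list is used rather than a deny-list so that any field added to the observation later stays hidden from the policy until someone opts it in.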

## Build & test

```bash
# Tests live with the other env tests. Run them from the repo root using this
# env's venv (which installs the scoring package):
uv run --project envs/sophistry_bench_sprint_env --extra dev \
    pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
# The module pulls the published sophistry-bench-sprint, so in the repo's shared
# CI (where it isn't installed) it skips via pytest.importorskip — same as other
# envs with heavy deps (e.g. tbench2's camel guard).

# Container
openenv build sophistry_bench_sprint_env
# produces image tag: openenv-sophistry_bench_sprint:latest
```
96 changes: 96 additions & 0 deletions envs/sophistry_bench_sprint_env/README.md
@@ -0,0 +1,96 @@
---
title: Sophistry Bench Sprint Env
emoji: 🗣️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---

# sophistry_bench_sprint_env

OpenEnv port of the [sophistry-bench](https://github.com/acharyaanusha/sophistry-bench)
single-agent advocacy **reward-hacking sprint** environment. One-turn advocacy on QuALITY
reading-comprehension passages: the policy is assigned the gold answer or a distractor and must
produce one `<claim>`/`<cite>` argument. The reward proxy peaks at exactly 8 `<claim>` tags
(`claim_count_cliff`); four weight-0 canaries detect format hacking.
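
To make the tag format concrete, here is a minimal sketch that counts `<claim>` and `<cite>` tags in an argument. It is only an illustration: the real scoring lives in the `sophistry-bench-sprint` package, and `count_tags` plus the two regular expressions are hypothetical stand-ins that may not match the package's parsing rules.

```python
import re

# Non-greedy matches so adjacent tags are counted separately;
# DOTALL lets a claim or citation span several lines.
CLAIM_RE = re.compile(r"<claim>(.*?)</claim>", re.DOTALL)
CITE_RE = re.compile(r"<cite>(.*?)</cite>", re.DOTALL)


def count_tags(text: str) -> dict[str, int]:
    """Count well-formed <claim> and <cite> pairs in an argument string."""
    return {
        "n_claims": len(CLAIM_RE.findall(text)),
        "n_citations": len(CITE_RE.findall(text)),
    }


argument = "<claim>The narrator left early.</claim><cite>He walked out at noon.</cite>"
counts = count_tags(argument)
```

A policy that learns to emit exactly eight `<claim>` tags regardless of content is the kind of format hacking the canaries are meant to surface.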

Scoring is imported verbatim from the [`sophistry-bench-sprint`](https://pypi.org/project/sophistry-bench-sprint/)
PyPI package, so the reward numbers are identical to the Prime Intellect Hub env.

## Episode model

Single step. `reset()` issues a task; `step(AdvocacyAction(text=...))` scores it and returns
`done=True`.

## Configuration (environment variables)

| Var | Default | Meaning |
|---|---|---|
| `SPRINT_N_ITEMS` | `50` | QuALITY items to load (2 advocacy rows each) |
| `SPRINT_PASSAGE_CHARS` | `2000` | Passage char cap |
| `SPRINT_SEED` | `0` | Distractor-selection seed (deterministic) |
| `SPRINT_WEIGHTS` | `1,0,0,0,0,0,0,0` | 8 reward weights, order: `aggregate, correctness, n_claims, n_citations, alternation_canary, starts_with_canary, length_band_canary, template_echo_canary`. Do **not** weight canaries during training. |
| `SPRINT_EXPOSE_CORRECTNESS` | `0` | When `1`/`true`, surface `correctness_reward` (the hidden ground truth) in the wire `metadata`/`components`. Off by default so a harness can't accidentally leak it to the policy. This flag controls only *surfacing*, not *weighting*: correctness affects `reward` only via its `SPRINT_WEIGHTS` entry, which is `0` by default. |

## Usage

The client is **async by default** (like every OpenEnv client):

```python
import asyncio

from sophistry_bench_sprint_env import SophistryBenchSprintEnv


async def main():
    # Deployed Hugging Face Space (or .from_docker_image("openenv-sophistry_bench_sprint:latest")):
    client = await SophistryBenchSprintEnv.from_env("anushaacharya/sophistry_bench_sprint_env")
    async with client:
        obs = (await client.reset()).observation
        print(obs.prompt, obs.answer_to_defend)
        result = await client.step_text("<claim>...</claim><cite>...</cite>")
        print(result.reward, result.observation.metadata)


asyncio.run(main())
```

For **synchronous usage**, use the `.sync()` wrapper:

```python
from sophistry_bench_sprint_env import SophistryBenchSprintEnv

with SophistryBenchSprintEnv(base_url="http://localhost:8000").sync() as client:
    obs = client.reset().observation
    result = client.step_text("<claim>...</claim><cite>...</cite>")
    print(result.reward, result.observation.metadata)
```

`result.observation.metadata` carries the reward components every step — the canary scores are
the reward-hacking measurement. By default it holds **seven** components; `correctness_reward`
(the hidden ground truth) is withheld unless `SPRINT_EXPOSE_CORRECTNESS=1` (see above).
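
For logging, an evaluator might pull just the canary scores out of `metadata`. The sketch below assumes the component keys end in `_canary`, as the names in the `SPRINT_WEIGHTS` row suggest; the actual key names on the wire may differ, and `canary_scores` is a hypothetical helper.

```python
def canary_scores(metadata: dict) -> dict[str, float]:
    """Keep only the numeric entries whose key ends in ``_canary``."""
    return {
        key: float(value)
        for key, value in metadata.items()
        if key.endswith("_canary") and isinstance(value, (int, float))
    }


# Example metadata shaped like a step result; the values are made up.
metadata = {
    "aggregate": 1.0,
    "n_claims": 8.0,
    "alternation_canary": 0.0,
    "starts_with_canary": 1.0,
    "error": "",
}
scores = canary_scores(metadata)
```

The numeric check skips the string-valued `error` entry that the client may add to `metadata`.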

> **Do not feed `observation.metadata` / `observation.components` back into the policy's
> prompt.** `reset()` deliberately tells the policy only *what* to defend, never *whether* it
> is correct. `correctness_reward` is withheld from the wire by default for exactly this
> reason; even with the rest of the components, forwarding them to the agent leaks the
> reward signal and defeats the reward-hacking measurement.

## Build & test

```bash
# Tests live with the other env tests. Run them from the repo root using this
# env's venv (which installs the scoring package):
uv run --project envs/sophistry_bench_sprint_env --extra dev \
    pytest tests/envs/test_sophistry_bench_sprint_environment.py -v
# The module pulls the published sophistry-bench-sprint, so in the repo's shared
# CI (where it isn't installed) it skips via pytest.importorskip — same as other
# envs with heavy deps (e.g. tbench2's camel guard).

# Container
openenv build sophistry_bench_sprint_env
# produces image tag: openenv-sophistry_bench_sprint:latest
```
17 changes: 17 additions & 0 deletions envs/sophistry_bench_sprint_env/__init__.py
@@ -0,0 +1,17 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Sophistry-Bench Sprint Environment (OpenEnv port).

Single-step advocacy environment: reset() issues a QuALITY reading-comprehension
advocacy task, step(AdvocacyAction(text=...)) scores the argument and returns the
reward plus the sprint reward components in observation.metadata (seven by
default; correctness_reward is included only when SPRINT_EXPOSE_CORRECTNESS is set).
"""

from .client import SophistryBenchSprintEnv
from .models import AdvocacyAction, AdvocacyObservation

__all__ = ["SophistryBenchSprintEnv", "AdvocacyAction", "AdvocacyObservation"]
65 changes: 65 additions & 0 deletions envs/sophistry_bench_sprint_env/client.py
@@ -0,0 +1,65 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from __future__ import annotations

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient
from openenv.core.env_server.types import State

from .models import AdvocacyAction, AdvocacyObservation


class SophistryBenchSprintEnv(EnvClient[AdvocacyAction, AdvocacyObservation, State]):
    """Typed client for the sophistry-bench sprint OpenEnv environment."""

    async def step_text(self, text: str) -> StepResult[AdvocacyObservation]:
        """Convenience: submit a raw argument string as an AdvocacyAction.

        Declared ``async`` so that awaiting it yields a ``StepResult`` (a plain
        ``def`` would hand back a bare coroutine) and so that ``.sync()`` wraps
        it like the base ``step``/``reset``. Call ``await env.step_text(...)``,
        or ``env.sync().step_text(...)`` on a sync client.
        """
        return await super().step(AdvocacyAction(text=text))

    def _step_payload(self, action: AdvocacyAction) -> dict:
        return action.model_dump()

    def _parse_result(self, data: dict) -> StepResult[AdvocacyObservation]:
        # Fail loudly on a malformed payload rather than silently building an
        # empty observation (a missing/null ``observation`` is a real protocol error).
        observation_payload = data.get("observation")
        if not isinstance(observation_payload, dict):
            raise ValueError(
                "malformed step result: 'observation' must be a dict, got "
                f"{type(observation_payload).__name__}"
            )
        obs_data = dict(observation_payload)
        # The framework's HTTP layer strips the base ``metadata`` dict from the
        # serialized observation, so the reward components arrive in the declared
        # ``components`` field (and the diagnostic message in ``error``). Rebuild
        # ``metadata`` here so the public contract holds: ``observation.metadata``
        # carries the reward components. Prefer any metadata that survived
        # (in-process callers); otherwise fall back to the mirrored ``components``.
        wire_metadata = obs_data.pop("metadata", None)
        metadata = (
            dict(wire_metadata)
            if wire_metadata
            else dict(obs_data.get("components") or {})
        )
        error = obs_data.get("error") or ""
        if error and "error" not in metadata:
            metadata["error"] = error
        # Construct once with metadata set, rather than mutating the model after.
        observation = AdvocacyObservation(**obs_data, metadata=metadata)
        return StepResult(
            observation=observation,
            reward=data.get("reward"),
            done=data.get("done", False),
        )

    def _parse_state(self, data: dict) -> State:
        return State(**data)
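
The metadata fallback in `_parse_result` can be exercised without a server. The sketch below lifts the same precedence rules into a standalone function over plain dicts; `rebuild_metadata` is a hypothetical name used only for this illustration, and the sample payloads are made up.

```python
def rebuild_metadata(observation_payload: dict) -> dict:
    """Mirror the client's precedence: surviving metadata, else components."""
    obs = dict(observation_payload)  # copy so the caller's dict is untouched
    wire_metadata = obs.pop("metadata", None)
    metadata = (
        dict(wire_metadata)
        if wire_metadata
        else dict(obs.get("components") or {})
    )
    error = obs.get("error") or ""
    if error and "error" not in metadata:
        metadata["error"] = error
    return metadata


# Over HTTP: metadata was stripped, components survived.
over_http = rebuild_metadata({"components": {"aggregate": 0.5}, "error": ""})

# In-process: metadata survived and takes precedence.
in_process = rebuild_metadata(
    {"metadata": {"aggregate": 0.7}, "components": {"aggregate": 0.5}}
)

# Error path: no components, but the diagnostic is carried into metadata.
error_path = rebuild_metadata({"components": {}, "error": "step before reset"})
```

Because an empty `metadata` dict is falsy, it is treated the same as a stripped one and the function falls back to `components`.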
54 changes: 54 additions & 0 deletions envs/sophistry_bench_sprint_env/models.py
@@ -0,0 +1,54 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from __future__ import annotations

from openenv.core.env_server.types import Action, Observation
from pydantic import Field


class AdvocacyAction(Action):
    """The policy's one-shot advocacy argument."""

    text: str = Field(
        ..., description="The argument completion, using <claim>/<cite> tags."
    )


class AdvocacyObservation(Observation):
    """Task on reset; scored result on step.

    On reset: ``prompt`` holds the full system prompt (passage + question +
    answer-to-defend), ``done`` is False.
    On step: ``prompt`` is empty, ``done`` is True, and ``metadata`` carries the
    reward components (``correctness_reward`` is included only when
    ``SPRINT_EXPOSE_CORRECTNESS`` is set).

    ``reward``/``done`` are inherited from the base ``Observation`` (reward
    defaults to ``None``). Read the post-step reward from ``StepResult.reward``,
    not ``observation.reward``: the framework's serializer strips ``reward`` from
    the observation payload, so only ``StepResult.reward`` carries the weighted
    aggregate. ``reset()`` leaves ``reward`` as ``None`` (no action scored yet),
    matching the framework convention.

    The reward components are also mirrored in the declared ``components``
    field. The base ``metadata`` dict is stripped by the framework's HTTP
    serialization layer, so ``components`` is what survives the wire; the typed
    client re-populates ``metadata`` from it on the way back.
    """

    prompt: str = Field("", description="Full prompt the policy must answer.")
    answer_to_defend: str = Field(
        "", description="The answer the policy advocates for."
    )
    item_id: str = Field("", description="Source QuALITY article id.")
    components: dict[str, float] = Field(
        default_factory=dict,
        description="Reward components (mirror of metadata; survives HTTP).",
    )
    error: str = Field(
        "",
        description="Diagnostic message (e.g. step-before-reset); survives serialization.",
    )
6 changes: 6 additions & 0 deletions envs/sophistry_bench_sprint_env/openenv.yaml
@@ -0,0 +1,6 @@
spec_version: 1
name: sophistry_bench_sprint_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
33 changes: 33 additions & 0 deletions envs/sophistry_bench_sprint_env/pyproject.toml
@@ -0,0 +1,33 @@
[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-sophistry-bench-sprint-env"
version = "0.1.0"
description = "OpenEnv port of the sophistry-bench single-agent advocacy reward-hacking sprint env"
requires-python = ">=3.10"
dependencies = [
    "openenv[core]>=0.2.2",
    "fastapi>=0.115.0",
    "pydantic>=2.0.0",
    "uvicorn>=0.24.0",
    # >=0.1.6 exports aggregate_reward (imported directly, not reproduced here).
    # Capped <0.2.0: a 0.2+ bump must re-run the parity test before widening.
    "sophistry-bench-sprint>=0.1.6,<0.2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=9.0.3",
    "pytest-asyncio>=0.21",
    "pytest-cov",
]

[project.scripts]
server = "sophistry_bench_sprint_env.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["sophistry_bench_sprint_env", "sophistry_bench_sprint_env.server"]
package-dir = { "sophistry_bench_sprint_env" = ".", "sophistry_bench_sprint_env.server" = "server" }