50 changes: 50 additions & 0 deletions environments/reasoning_gym_claude_code/README.md
@@ -0,0 +1,50 @@
# Reasoning Gym — Claude Code Agent

Claude Code agent harness for Reasoning Gym tasks.

Source benchmark: https://github.com/open-thought/reasoning-gym

## Configuration

Set Anthropic credentials in `env.yaml`:

```yaml
anthropic_api_key: sk-ant-...
anthropic_model_name: claude-sonnet-4-6
anthropic_base_url: null
```

For a local vLLM or Ollama endpoint that serves the Anthropic Messages API:

```yaml
anthropic_api_key: EMPTY
anthropic_model_name: Qwen/Qwen3-4B-Instruct-2507
anthropic_base_url: http://localhost:8000
```

`anthropic_base_url` should not include `/v1`. Claude Code appends `/v1/messages` itself.
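As a minimal sketch of that rule, a hypothetical helper `normalize_anthropic_base_url` (not part of this environment or of Claude Code) could strip a trailing `/v1` before the value is written to `env.yaml`, so the final request path contains `/v1/messages` exactly once. `messages_url` only mirrors the README's statement about what Claude Code appends; it is not Claude Code's own code.

```python
from typing import Optional


def normalize_anthropic_base_url(url: Optional[str]) -> Optional[str]:
    """Hypothetical helper: drop a trailing '/v1' so the client can append '/v1/messages'."""
    if url is None:
        # `null` in env.yaml: leave it unset so the client uses its default endpoint.
        return None
    url = url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url


def messages_url(base_url: str) -> str:
    # Assumption taken from the README: Claude Code appends '/v1/messages' to the base URL.
    return normalize_anthropic_base_url(base_url) + "/v1/messages"


print(normalize_anthropic_base_url("http://localhost:8000/v1/"))  # http://localhost:8000
print(messages_url("http://localhost:8000/v1"))  # http://localhost:8000/v1/messages
```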

See [`responses_api_agents/claude_code_agent`](../../responses_api_agents/claude_code_agent/README.md) for the full set of agent options (`thinking`, `max_thinking_tokens`, `allowed_tools`, `disallowed_tools`, `max_turns`, `timeout`, etc.).

## Quick start

```bash
ng_run "+config_paths=[environments/reasoning_gym_claude_code/config.yaml]"
```

```bash
ng_collect_rollouts \
+agent_name=reasoning_gym_claude_code_agent \
+input_jsonl_fpath=environments/reasoning_gym_claude_code/data/example.jsonl \
+output_jsonl_fpath=results/reasoning_gym_claude_code_rollouts.jsonl
```

## Prepare training data

```bash
python environments/reasoning_gym_claude_code/prepare.py \
--task knights_knaves \
--size 1000 \
--output environments/reasoning_gym_claude_code/data/train_knights_knaves.jsonl
```

See `prepare.py` for all available tasks, categories, and config options.

Alternatively, a pre-built dataset is hosted on Hugging Face at [nvidia/Nemotron-RL-ReasoningGym-v1](https://huggingface.co/datasets/nvidia/Nemotron-RL-ReasoningGym-v1).
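Every line of `data/example.jsonl` has the same shape: a `responses_create_params.input` list holding a single user message, plus top-level `question`, `answer`, and `metadata` fields. If you build data by other means, a sketch like the following produces rows of that shape. `make_record` and `write_jsonl` are hypothetical names for illustration, not functions from `prepare.py`.

```python
import json
from typing import Any, Dict, Iterable


def make_record(question: str, answer: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build one row in the same shape as data/example.jsonl."""
    return {
        # The prompt the agent receives, as a single user message.
        "responses_create_params": {"input": [{"role": "user", "content": question}]},
        "question": question,
        "answer": answer,
        "metadata": metadata,
    }


def write_jsonl(records: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count
```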
34 changes: 34 additions & 0 deletions environments/reasoning_gym_claude_code/config.yaml
@@ -0,0 +1,34 @@
reasoning_gym:
  resources_servers:
    reasoning_gym:
      entrypoint: app.py
      domain: knowledge
      verified: true
      description: Claude Code agent harness for reasoning gym tasks
      value: Evaluate model capabilities in the Claude Code agent harness

reasoning_gym_claude_code_agent:
  responses_api_agents:
    claude_code_agent:
      entrypoint: app.py
      resources_server:
        type: resources_servers
        name: reasoning_gym
      concurrency: 32
      model: ${anthropic_model_name}
      anthropic_api_key: ${anthropic_api_key}
      anthropic_base_url: ${anthropic_base_url}
      max_turns: 30
      timeout: 300
      thinking: disabled
      system_prompt: |
        You are a precise reasoning assistant. You have access to Bash to run Python for calculations.
        For every problem: think step by step, use code to verify when helpful, and state your final answer clearly.
      datasets:
      - name: example
        type: example
        jsonl_fpath: environments/reasoning_gym_claude_code/data/example.jsonl
      - name: train
        type: train
        jsonl_fpath: environments/reasoning_gym_claude_code/data/train_knights_knaves.jsonl
        license: Apache 2.0
5 changes: 5 additions & 0 deletions environments/reasoning_gym_claude_code/data/.gitignore
@@ -0,0 +1,5 @@
*train.jsonl
*validation.jsonl
*train_prepare.jsonl
*validation_prepare.jsonl
*example_prepare.jsonl
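These patterns match on the file name. Python's `fnmatch.fnmatchcase` approximates gitignore matching for simple, slash-free patterns like these (it is not a full gitignore implementation), which makes it a quick way to check which generated files would be ignored. For example, `train_knights_knaves.jsonl`, the file written by the README's `prepare.py` command and read by `config.yaml`, ends in `knaves.jsonl`, so none of these patterns match it. `is_ignored` is a hypothetical helper for illustration.

```python
from fnmatch import fnmatchcase

# Patterns copied from data/.gitignore.
PATTERNS = [
    "*train.jsonl",
    "*validation.jsonl",
    "*train_prepare.jsonl",
    "*validation_prepare.jsonl",
    "*example_prepare.jsonl",
]


def is_ignored(filename: str) -> bool:
    """Approximate check: does any slash-free pattern match this bare file name?"""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["train.jsonl", "example.jsonl", "example_prepare.jsonl", "train_knights_knaves.jsonl"]:
    print(f"{name}: {'ignored' if is_ignored(name) else 'not ignored'}")
```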
5 changes: 5 additions & 0 deletions environments/reasoning_gym_claude_code/data/example.jsonl
@@ -0,0 +1,5 @@
{"responses_create_params": {"input": [{"role": "user", "content": "A very special island is inhabited only by sages and fools. Sages always tell the truth, and fools always lie. You meet 2 inhabitants: Zoey, and Riley. Zoey commented, \"Riley is a fool\". In Riley's words: \"Zoey is a sage or Riley is a sage\". So who is a sage and who is a fool? (Format your answer like: \"Zoey is a sage/fool, and Riley is a sage/fool\")"}]}, "question": "A very special island is inhabited only by sages and fools. Sages always tell the truth, and fools always lie. You meet 2 inhabitants: Zoey, and Riley. Zoey commented, \"Riley is a fool\". In Riley's words: \"Zoey is a sage or Riley is a sage\". So who is a sage and who is a fool? (Format your answer like: \"Zoey is a sage/fool, and Riley is a sage/fool\")", "answer": "Zoey is a fool, and Riley is a sage.", "metadata": {"source_dataset": "knights_knaves", "source_index": 0, "statements": [["lying", 1], ["or", ["telling-truth", 0], ["telling-truth", 1]]], "solution": [false, true], "names": ["Zoey", "Riley"], "knight_knave_terms": {"knight": "sage", "knave": "fool", "a_knight": "a sage", "a_knave": "a fool", "Knight": "Sage", "Knave": "Fool"}, "difficulty": {"n_people": 2, "depth_constraint": 2, "width_constraint": 2}}}
{"responses_create_params": {"input": [{"role": "user", "content": "A very special island is inhabited only by pioneers and laggards. Pioneers always tell the truth, and laggards always lie. You meet 2 inhabitants: Riley, and Olivia. \"if Riley is a pioneer then Olivia is a laggard\" - Riley. Olivia stated, \"Olivia is a pioneer and Riley is a laggard\". So who is a pioneer and who is a laggard? (Format your answer like: \"Riley is a pioneer/laggard, and Olivia is a pioneer/laggard\")"}]}, "question": "A very special island is inhabited only by pioneers and laggards. Pioneers always tell the truth, and laggards always lie. You meet 2 inhabitants: Riley, and Olivia. \"if Riley is a pioneer then Olivia is a laggard\" - Riley. Olivia stated, \"Olivia is a pioneer and Riley is a laggard\". So who is a pioneer and who is a laggard? (Format your answer like: \"Riley is a pioneer/laggard, and Olivia is a pioneer/laggard\")", "answer": "Riley is a pioneer, and Olivia is a laggard.", "metadata": {"source_dataset": "knights_knaves", "source_index": 1, "statements": [["->", ["telling-truth", 0], ["lying", 1]], ["and", ["telling-truth", 1], ["lying", 0]]], "solution": [true, false], "names": ["Riley", "Olivia"], "knight_knave_terms": {"knight": "pioneer", "knave": "laggard", "a_knight": "a pioneer", "a_knave": "a laggard", "Knight": "Pioneer", "Knave": "Laggard"}, "difficulty": {"n_people": 2, "depth_constraint": 2, "width_constraint": 2}}}
{"responses_create_params": {"input": [{"role": "user", "content": "A very special island is inhabited only by saints and sinners. Saints always tell the truth, and sinners always lie. You meet 2 inhabitants: Samuel, and Jacob. Samuel expressed that if Samuel is a saint then Jacob is a sinner. Jacob was heard saying, \"if Samuel is a saint then Samuel is a sinner\". So who is a saint and who is a sinner? (Format your answer like: \"Samuel is a saint/sinner, and Jacob is a saint/sinner\")"}]}, "question": "A very special island is inhabited only by saints and sinners. Saints always tell the truth, and sinners always lie. You meet 2 inhabitants: Samuel, and Jacob. Samuel expressed that if Samuel is a saint then Jacob is a sinner. Jacob was heard saying, \"if Samuel is a saint then Samuel is a sinner\". So who is a saint and who is a sinner? (Format your answer like: \"Samuel is a saint/sinner, and Jacob is a saint/sinner\")", "answer": "Samuel is a saint, and Jacob is a sinner.", "metadata": {"source_dataset": "knights_knaves", "source_index": 2, "statements": [["->", ["telling-truth", 0], ["lying", 1]], ["->", ["telling-truth", 0], ["lying", 0]]], "solution": [true, false], "names": ["Samuel", "Jacob"], "knight_knave_terms": {"knight": "saint", "knave": "sinner", "a_knight": "a saint", "a_knave": "a sinner", "Knight": "Saint", "Knave": "Sinner"}, "difficulty": {"n_people": 2, "depth_constraint": 2, "width_constraint": 2}}}
{"responses_create_params": {"input": [{"role": "user", "content": "A very special island is inhabited only by pioneers and laggards. Pioneers always tell the truth, and laggards always lie. You meet 2 inhabitants: Olivia, and Lily. \"Lily is a laggard and Lily is a pioneer\" - Olivia. Lily noted, \"if Olivia is a pioneer then Lily is a pioneer\". So who is a pioneer and who is a laggard? (Format your answer like: \"Olivia is a pioneer/laggard, and Lily is a pioneer/laggard\")"}]}, "question": "A very special island is inhabited only by pioneers and laggards. Pioneers always tell the truth, and laggards always lie. You meet 2 inhabitants: Olivia, and Lily. \"Lily is a laggard and Lily is a pioneer\" - Olivia. Lily noted, \"if Olivia is a pioneer then Lily is a pioneer\". So who is a pioneer and who is a laggard? (Format your answer like: \"Olivia is a pioneer/laggard, and Lily is a pioneer/laggard\")", "answer": "Olivia is a laggard, and Lily is a pioneer.", "metadata": {"source_dataset": "knights_knaves", "source_index": 3, "statements": [["and", ["lying", 1], ["telling-truth", 1]], ["->", ["telling-truth", 0], ["telling-truth", 1]]], "solution": [false, true], "names": ["Olivia", "Lily"], "knight_knave_terms": {"knight": "pioneer", "knave": "laggard", "a_knight": "a pioneer", "a_knave": "a laggard", "Knight": "Pioneer", "Knave": "Laggard"}, "difficulty": {"n_people": 2, "depth_constraint": 2, "width_constraint": 2}}}
{"responses_create_params": {"input": [{"role": "user", "content": "A very special island is inhabited only by altruists and egoists. Altruists always tell the truth, and egoists always lie. You meet 2 inhabitants: Mason, and Jack. Mason expressed that if Jack is an egoist then Mason is an altruist. In Jack's words: \"Mason is an egoist\". So who is an altruist and who is an egoist? (Format your answer like: \"Mason is a altruist/egoist, and Jack is a altruist/egoist\")"}]}, "question": "A very special island is inhabited only by altruists and egoists. Altruists always tell the truth, and egoists always lie. You meet 2 inhabitants: Mason, and Jack. Mason expressed that if Jack is an egoist then Mason is an altruist. In Jack's words: \"Mason is an egoist\". So who is an altruist and who is an egoist? (Format your answer like: \"Mason is a altruist/egoist, and Jack is a altruist/egoist\")", "answer": "Mason is an altruist, and Jack is an egoist.", "metadata": {"source_dataset": "knights_knaves", "source_index": 4, "statements": [["->", ["lying", 1], ["telling-truth", 0]], ["not", ["telling-truth", 0]]], "solution": [true, false], "names": ["Mason", "Jack"], "knight_knave_terms": {"knight": "altruist", "knave": "egoist", "a_knight": "an altruist", "a_knave": "an egoist", "Knight": "Altruist", "Knave": "Egoist"}, "difficulty": {"n_people": 2, "depth_constraint": 2, "width_constraint": 2}}}
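In these rows, `metadata.statements` encodes each inhabitant's claim as a nested prefix expression over `telling-truth` and `lying` atoms (indexed by position in `names`), combined with `not`, `and`, `or`, and `->`; `metadata.solution` lists who tells the truth. A minimal brute-force sketch can check an assignment: person *i* is truthful exactly when statement *i* evaluates to true. For each of the five rows above it finds `solution` as the only consistent assignment. The helper names are hypothetical, only the operators that appear in this file are handled, and this is not taken from `prepare.py` or the `reasoning_gym` resources server. `format_answer` rebuilds the `answer` string for the two-person case only.

```python
from itertools import product
from typing import Any, Dict, List


def evaluate(expr: List[Any], truthful: List[bool]) -> bool:
    """Evaluate one statement tree under an assignment (True = truth-teller)."""
    op = expr[0]
    if op == "telling-truth":
        return truthful[expr[1]]
    if op == "lying":
        return not truthful[expr[1]]
    if op == "not":
        return not evaluate(expr[1], truthful)
    if op == "and":
        return all(evaluate(e, truthful) for e in expr[1:])
    if op == "or":
        return any(evaluate(e, truthful) for e in expr[1:])
    if op == "->":
        return (not evaluate(expr[1], truthful)) or evaluate(expr[2], truthful)
    # Only the operators seen in example.jsonl are handled in this sketch.
    raise ValueError(f"unsupported operator: {op!r}")


def consistent_assignments(statements: List[List[Any]]) -> List[List[bool]]:
    """Brute force over all assignments: person i is truthful iff statement i is true."""
    results = []
    for assignment in product([True, False], repeat=len(statements)):
        truthful = list(assignment)
        if all(evaluate(s, truthful) == truthful[i] for i, s in enumerate(statements)):
            results.append(truthful)
    return results


def format_answer(metadata: Dict[str, Any]) -> str:
    """Rebuild the two-person answer string, e.g. 'Zoey is a fool, and Riley is a sage.'"""
    terms = metadata["knight_knave_terms"]
    parts = [
        f"{name} is {terms['a_knight'] if truthful else terms['a_knave']}"
        for name, truthful in zip(metadata["names"], metadata["solution"])
    ]
    if len(parts) != 2:
        raise ValueError("only the two-person answer format is sketched here")
    return f"{parts[0]}, and {parts[1]}."


meta = {  # metadata from the first row above (abridged)
    "statements": [["lying", 1], ["or", ["telling-truth", 0], ["telling-truth", 1]]],
    "solution": [False, True],
    "names": ["Zoey", "Riley"],
    "knight_knave_terms": {"a_knight": "a sage", "a_knave": "a fool"},
}
print(consistent_assignments(meta["statements"]))  # [[False, True]]
print(format_answer(meta))  # Zoey is a fool, and Riley is a sage.
```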