README.md (1 addition, 1 deletion)
@@ -294,9 +294,9 @@ The Dataset column links to publicly available datasets (e.g., on HuggingFace).
| Terminus Judge | agent | single-step terminal based task (simple judge prompt) | Improve on terminal-style tasks | ✓ | ✓ | Apache 2.0 | <a href='resources_servers/terminus_judge/configs/terminus_judge_simple.yaml'>terminus_judge_simple.yaml</a> | - |
| Terminus Judge | agent | single-step terminal based task (string similarity only) | Improve on terminal-style tasks | ✓ | ✓ | Apache 2.0 | <a href='resources_servers/terminus_judge/configs/terminus_judge_string_only.yaml'>terminus_judge_string_only.yaml</a> | - |
| Text To Sql | coding | Text-to-SQL generation with LLM-as-a-judge equivalence checking | Improve text-to-SQL capabilities across multiple dialects | - | - | - | <a href='resources_servers/text_to_sql/configs/text_to_sql.yaml'>text_to_sql.yaml</a> | - |
-| Turing Vif | instruction_following | Turing VIF instruction following validators with rule-based and LLM judge support | Improve instruction following capabilities with comprehensive validation | - | - | - | <a href='resources_servers/turing_vif/configs/turing_vif.yaml'>turing_vif.yaml</a> | - |
| Ugphysics Judge | knowledge | Undergraduate physics QA verified by a TRUE/FALSE LLM judge with math-verify symbolic fallback | Score undergraduate-physics benchmarks (e.g. UGPhysics) where the judge is a TRUE/FALSE equivalence grader using a reference solution | - | - | - | <a href='resources_servers/ugphysics_judge/configs/ugphysics_judge.yaml'>ugphysics_judge.yaml</a> | - |
| Verifiers Agent | math | Prime intellect verifiers and environments hub integration, ace-reason math environment example. | Improve math reasoning capabilities. | ✓ | - | - | <a href='responses_api_agents/verifiers_agent/configs/acereason-math.yaml'>acereason-math.yaml</a> | - |
+| Verifif | instruction_following | VerifIF instruction following validators with rule-based and LLM judge support | Improve instruction following capabilities with comprehensive validation | - | - | - | <a href='resources_servers/verifif/configs/verifif.yaml'>verifif.yaml</a> | - |
| Vlm Eval Kit | other | - | Measure VLM capabilities | - | ✓ | - | <a href='resources_servers/vlm_eval_kit/configs/MMBench_DEV_EN_V11.yaml'>MMBench_DEV_EN_V11.yaml</a> | - |
| Vlm Eval Kit | other | - | Measure VLM capabilities | - | ✓ | - | <a href='resources_servers/vlm_eval_kit/configs/OCRBench.yaml'>OCRBench.yaml</a> | - |
| Vlm Eval Kit | other | Run all supported VLMEvalKit benchmarks. | Measure VLM capabilities | - | ✓ | - | <a href='resources_servers/vlm_eval_kit/configs/vlm_eval_kit.yaml'>vlm_eval_kit.yaml</a> | - |
@@ -1,6 +1,6 @@
-# Turing VIF Resource Server
+# VerifIF Resource Server

-A NeMo Gym resource server that integrates **Turing VIF** (Verifiable Instruction Following) validators for comprehensive instruction-following evaluation in reinforcement learning training.
+A NeMo Gym resource server that integrates **VerifIF** (Verifiable Instruction Following) validators for comprehensive instruction-following evaluation in reinforcement learning training.

## Overview

@@ -31,22 +31,22 @@ policy_model_name: gpt-5-2025-08-07 # or gpt-4.1-2025-04-14
```bash
cd /path/to/Gym
source .venv/bin/activate
-ng_run "+config_paths=[resources_servers/turing_vif/configs/turing_vif.yaml,responses_api_models/openai_model/configs/openai_model.yaml]"
+ng_run "+config_paths=[resources_servers/verifif/configs/verifif.yaml,responses_api_models/openai_model/configs/openai_model.yaml]"
```

### 3. Run a test

```bash
ng_collect_rollouts \
-+agent_name=turing_vif_simple_agent \
-+input_jsonl_fpath=resources_servers/turing_vif/data/example.jsonl \
++agent_name=verifif_simple_agent \
++input_jsonl_fpath=resources_servers/verifif/data/example.jsonl \
+output_jsonl_fpath=results.jsonl
```

## Architecture

```
-turing_vif/
+verifif/
├── app.py # Main resource server (TuringVIFResourcesServer)
├── vif_validators/ # Validation logic
│ ├── __init__.py
@@ -56,7 +56,7 @@ turing_vif/
│ ├── subinstruction_definition.csv
│ └── evaluation_modes.csv
├── configs/
-│ └── turing_vif.yaml # Server configuration
+│ └── verifif.yaml # Server configuration
├── data/
│ └── example.jsonl # Example dataset
├── tests/
@@ -90,12 +90,12 @@ turing_vif/

## Configuration

-### Server Config (`configs/turing_vif.yaml`)
+### Server Config (`configs/verifif.yaml`)

```yaml
-turing_vif:
+verifif:
  resources_servers:
-    turing_vif:
+    verifif:
      entrypoint: app.py
      domain: instruction_following
      # Reward aggregation mode
@@ -123,9 +123,9 @@ Override in your experiment YAML:
```yaml
env:
  nemo_gym:
-    turing_vif:
+    verifif:
      resources_servers:
-        turing_vif:
+        verifif:
          aggregation_mode: mean
```

@@ -168,7 +168,7 @@ Each entry in your JSONL dataset should have:
```bash
cd /path/to/Gym
source .venv/bin/activate
-pytest resources_servers/turing_vif/tests/ -v
+pytest resources_servers/verifif/tests/ -v
```

## API Endpoints
@@ -237,7 +237,7 @@ For high-throughput training:
| `401 Unauthorized` | Check `policy_api_key` in `env.yaml` |
| `400 Bad Request` with GPT-5 | Ensure you're using the latest `app.py` with Responses API support |
| `ModuleNotFoundError` | Run `ray stop --force` and restart servers |
-| Server won't start | Delete `.venv` in `resources_servers/turing_vif/` and restart |
+| Server won't start | Delete `.venv` in `resources_servers/verifif/` and restart |

### Debugging

@@ -14,9 +14,9 @@
# limitations under the License.

"""
-Turing VIF Resource Server for NeMo Gym.
+VerifIF Resource Server for NeMo Gym.

-This resource server integrates the Turing VIF (Verifiable Instruction Following)
+This resource server integrates the VerifIF (Verifiable Instruction Following)
validators into NeMo Gym's reinforcement learning framework. It supports both
fast rule-based validators and async LLM-based judge validators.
"""
@@ -99,7 +99,7 @@ class AggregationMode(str, Enum):


class TuringVIFResourcesServerConfig(BaseResourcesServerConfig):
-    """Configuration for the Turing VIF Resource Server."""
+    """Configuration for the VerifIF Resource Server."""

    judge_server_name: Optional[str] = Field(
        default=None,
@@ -165,7 +165,7 @@ class LLMJudgeItem(BaseModel):


class TuringVIFRunRequest(BaseRunRequest):
-    """Request model for the Turing VIF resource server."""
+    """Request model for the VerifIF resource server."""

    id: int = Field(default=0, description="Request identifier")
    instructions: List[Dict[str, Any]] = Field(
@@ -285,7 +285,7 @@ def _extract_text_from_response(response, exclude_thinking: bool = True) -> str:

class TuringVIFResourcesServer(SimpleResourcesServer):
    """
-    Turing VIF Resource Server for NeMo Gym.
+    VerifIF Resource Server for NeMo Gym.

    Validates LLM responses against instruction-following criteria using both
    fast rule-based validators and async LLM-as-a-judge validators.
@@ -1,14 +1,14 @@
-# Turing VIF Resource Server Configuration
+# VerifIF Resource Server Configuration
# This server validates LLM responses against instruction-following criteria
# using both fast rule-based validators and async LLM-as-a-judge validators.

-turing_vif:
+verifif:
  resources_servers:
-    turing_vif:
+    verifif:
      entrypoint: app.py
      domain: instruction_following
      verified: false
-      description: Turing VIF instruction following validators with rule-based and LLM judge support
+      description: VerifIF instruction following validators with rule-based and LLM judge support
      value: Improve instruction following capabilities with comprehensive validation
      # Reward aggregation: how individual check verdicts combine into the final reward
      # Options: all (AND), any (OR), mean, min, max
@@ -22,17 +22,17 @@ turing_vif:
      judge_top_p: 0.8
      judge_max_tokens: 10000

-turing_vif_simple_agent:
+verifif_simple_agent:
  responses_api_agents:
    simple_agent:
      entrypoint: app.py
      resources_server:
        type: resources_servers
-        name: turing_vif
+        name: verifif
      model_server:
        type: responses_api_models
        name: policy_model
      datasets:
      - name: example
        type: example
-        jsonl_fpath: resources_servers/turing_vif/data/example.jsonl
+        jsonl_fpath: resources_servers/verifif/data/example.jsonl
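The server config's comment lists five reward-aggregation options: `all` (AND), `any` (OR), `mean`, `min` and `max`. As a rough illustration only, the sketch below assumes each check produces a boolean pass/fail verdict. The enum is a stand-in for the `AggregationMode` enum referenced in `app.py`; its member names and the `aggregate_reward` helper are hypothetical and are not the server's actual implementation.

```python
from enum import Enum
from typing import Sequence, Union


class AggregationMode(str, Enum):
    # Hypothetical stand-in: values mirror the options named in the config comment.
    ALL = "all"    # AND over all checks
    ANY = "any"    # OR over all checks
    MEAN = "mean"  # fraction of checks that passed
    MIN = "min"
    MAX = "max"


def aggregate_reward(verdicts: Sequence[bool], mode: Union[AggregationMode, str]) -> float:
    """Combine per-check pass/fail verdicts into one scalar reward (hypothetical helper)."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    # Accepts a plain string such as "mean" (as read from YAML) or an enum member;
    # an unknown string raises ValueError.
    mode = AggregationMode(mode)
    scores = [1.0 if v else 0.0 for v in verdicts]
    if mode is AggregationMode.ALL:
        return 1.0 if all(verdicts) else 0.0
    if mode is AggregationMode.ANY:
        return 1.0 if any(verdicts) else 0.0
    if mode is AggregationMode.MEAN:
        return sum(scores) / len(scores)
    if mode is AggregationMode.MIN:
        return min(scores)
    return max(scores)  # only AggregationMode.MAX remains


if __name__ == "__main__":
    checks = [True, False, True, True]
    for m in AggregationMode:
        print(m.value, aggregate_reward(checks, m))
```

With boolean verdicts, `min` and `max` coincide with `all` and `any`; they would only diverge if checks returned graded scores, which this sketch does not assume.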
@@ -1,7 +1,7 @@
{
"name": "example",
"type": "example",
-"jsonl_fpath": "resources_servers/turing_vif/data/example.jsonl",
+"jsonl_fpath": "resources_servers/verifif/data/example.jsonl",
"num_repeats": 1,
"gitlab_identifier": null,
"huggingface_identifier": null,