diff --git a/.claude/skills/nemo-gym-pivot-datasets b/.claude/skills/nemo-gym-pivot-datasets new file mode 120000 index 000000000..507a7bf28 --- /dev/null +++ b/.claude/skills/nemo-gym-pivot-datasets @@ -0,0 +1 @@ +../../skills/nemo-gym-pivot-datasets \ No newline at end of file diff --git a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md similarity index 67% rename from .claude/skills/nemo-gym-pivot-datasets/SKILL.md rename to skills/nemo-gym-pivot-datasets/SKILL.md index 5636e796f..2c93b6ba6 100644 --- a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md +++ b/skills/nemo-gym-pivot-datasets/SKILL.md @@ -1,14 +1,34 @@ --- name: nemo-gym-pivot-datasets +license: Apache-2.0 description: >- - Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout, - trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym - Responses-style row conversion, pivot selection, single-step tool-use configs, - agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage. + Create and validate Nemo Gym single-step pivot datasets from trajectory or + rollout artifacts. Not for reward profiling or debugging runs. +metadata: + author: NVIDIA + tags: + - pivot-dataset + - dataset-conversion + - reinforcement-learning + - single-step + - trajectory --- # Nemo Gym Pivot Datasets +## Purpose + +Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot +datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym +config can be used together. + +## Prerequisites + +- Source artifacts to convert: rollout, trajectory, chat-completion, Responses API, or tool-call data. +- Python to run `scripts/validate_pivot_dataset.py` and the reference converters. +- The target Gym config (agent and resource-server names) the pivot rows must align with. +- Optionally a Gym checkout (`--gym-repo`) to validate against resource-server Pydantic models. + ## Paper Reference This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local @@ -21,11 +41,17 @@ Use this skill when the task is to turn existing agent trajectories or rollout a Nemo Gym pivot dataset, or to validate whether a pivot JSONL/config pair can be used for single-step local RL or evaluation. +Do not activate this skill for these adjacent tasks: + +- Running or profiling rewards on an existing dataset. Use `nemo-gym-reward-profiling`. +- Debugging a failed or crashed run (Ray/vLLM stack traces, empty output). Use `nemo-gym-debugging`. +- Adding or scaffolding a new benchmark or training environment. Use `add-benchmark`. + Before writing a converter, inspect representative source rows and the target resource server. Do not assume the source field names are the contract. Convert by reconstructing the semantic pieces needed by Gym's Responses-style row format. -## Core Workflow +## Instructions 1. Inspect the source data shape and count the candidate assistant decision points. 2. Identify the semantic fields needed for each pivot: @@ -117,3 +143,31 @@ resource-server request model. The validator accepts both supported expected-action types by default (`function_call` and `message`) and prints an end summary split between tool-call and message pivots. + +## Examples + +Converting chat-completion logs: inspect representative rows, identify each +assistant decision point, and reconstruct `responses_create_params`, +`expected_action` (a single `function_call` or `message`), and `agent_ref` for +each accepted pivot. Route turns with more than one tool call into a skipped-row +audit. Borrow from +`scripts/reference/chat_messages_to_pivot_dataset_reference.py` rather than +running it unchanged. + +Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the +expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available +to also validate against the resource-server Pydantic models. + +## Limitations + +- `expected_action` is singular; source turns with more than one tool call are filtered out, not split. +- Reference converters under `scripts/reference/` are dataset-specific examples, not commands to run unchanged. +- A valid JSONL file can still be unusable if the agent and resource-server names do not line up. + +## Troubleshooting + +| Symptom | Likely cause | Resolution | +|---|---|---| +| Validator rejects rows | `agent_ref.name` does not match the config's agent block | Align `agent_ref.name` with the agent used by the generated config | +| Tool-argument matches fail | String-argument threshold too strict | Tune `word_count_similarity_threshold` for the single-step tool-use verifier | +| Structured-decoding path taken unexpectedly | `tool_choice: "required"` routes some engines there | Use `tool_choice: "auto"` for these rows | diff --git a/skills/nemo-gym-pivot-datasets/evals/evals.json b/skills/nemo-gym-pivot-datasets/evals/evals.json new file mode 100644 index 000000000..869f6575d --- /dev/null +++ b/skills/nemo-gym-pivot-datasets/evals/evals.json @@ -0,0 +1,62 @@ +[ + { + "id": "nemo-gym-pivot-datasets-positive-001", + "question": "Convert these tool-call trajectories in a JSONL file into a NeMo Gym pivot dataset I can use for single-step training.", + "expected_skill": "nemo-gym-pivot-datasets", + "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, inspects representative source rows and the target resource server before writing a converter, identifies the semantic fields needed per pivot, converts each accepted decision point to one row with responses_create_params, expected_action, and agent_ref, and runs the bundled validator against the output.", + "expected_behavior": [ + "The agent read nemo-gym-pivot-datasets/SKILL.md before acting", + "The agent inspected representative source rows before writing a converter", + "The agent emitted one pivot row per accepted decision point with responses_create_params, expected_action, and agent_ref", + "The agent filtered out source turns with more than one tool call rather than emitting multi-action rows", + "The agent ran scripts/validate_pivot_dataset.py on the output" + ] + }, + { + "id": "nemo-gym-pivot-datasets-positive-002", + "question": "Validate this pivot.jsonl file against the resources-server request models and the agent_ref I expect — is it usable for training?", + "expected_skill": "nemo-gym-pivot-datasets", + "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill and runs the bundled validator with both --agent-ref and --gym-repo, checks that agent_ref matches the config's agent block, and confirms the row contract for single_step_tool_use_with_argument_comparison.", + "expected_behavior": [ + "The agent read nemo-gym-pivot-datasets/SKILL.md before acting", + "The agent ran scripts/validate_pivot_dataset.py with --agent-ref", + "The agent passed --gym-repo to validate against the resources-server Pydantic models when the Gym repo is available", + "The agent confirmed agent_ref.name matches the agent block used by the config" + ] + }, + { + "id": "nemo-gym-pivot-datasets-positive-003", + "question": "I have a batch of chat-completion rollouts from a different framework. Build me a NeMo Gym pivot dataset and the matching Gym YAML config.", + "expected_skill": "nemo-gym-pivot-datasets", + "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, normalizes the chat-completion rows into Gym Responses-style pivot rows (one expected_action per row), uses tool_choice: auto in the generated config, points the train dataset entry at the pivot JSONL, and aligns agent_ref with the agent block before validating.", + "expected_behavior": [ + "The agent read nemo-gym-pivot-datasets/SKILL.md before acting", + "The agent normalized the chat-completion rows into Responses-style pivot rows", + "The agent set tool_choice: auto in the generated config rather than required", + "The agent pointed the config's train dataset entry directly at the pivot JSONL", + "The agent ensured row-level agent_ref matches the config's agent block" + ] + }, + { + "id": "nemo-gym-pivot-datasets-negative-001", + "question": "Add the cuOpt vehicle-routing benchmark to NeMo-Gym, including data prep and the resources server.", + "expected_skill": null, + "should_trigger": false, + "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a new-benchmark integration task. It should use the add-benchmark skill instead.", + "expected_behavior": [ + "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md", + "The agent recognized this as a benchmark integration task" + ] + }, + { + "id": "nemo-gym-pivot-datasets-negative-002", + "question": "My ng_reward_profile job is producing empty profile rows for half the tasks. Help me figure out what's wrong.", + "expected_skill": null, + "should_trigger": false, + "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a debugging task on an existing reward profiling run. It should use the nemo-gym-debugging skill instead.", + "expected_behavior": [ + "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md", + "The agent recognized this as a debugging task" + ] + } +] diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md b/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md rename to skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md b/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md rename to skills/nemo-gym-pivot-datasets/references/conversion-patterns.md diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/row-contract.md b/skills/nemo-gym-pivot-datasets/references/row-contract.md similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/references/row-contract.md rename to skills/nemo-gym-pivot-datasets/references/row-contract.md diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py rename to skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py rename to skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py rename to skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py rename to skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py b/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py similarity index 100% rename from .claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py rename to skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py