NVIDIA-NeMo · ananthsub · May 28, 2026 · May 29, 2026 · May 29, 2026 · May 30, 2026
diff --git a/.claude/skills/nemo-gym-pivot-datasets b/.claude/skills/nemo-gym-pivot-datasets
@@ -0,0 +1 @@
+../../skills/nemo-gym-pivot-datasets
diff --git a/...e/skills/nemo-gym-pivot-datasets/SKILL.md → skills/nemo-gym-pivot-datasets/SKILL.md b/...e/skills/nemo-gym-pivot-datasets/SKILL.md → skills/nemo-gym-pivot-datasets/SKILL.md
@@ -1,14 +1,34 @@
 ---
 name: nemo-gym-pivot-datasets
+license: Apache-2.0
 description: >-
-  Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout,
-  trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym
-  Responses-style row conversion, pivot selection, single-step tool-use configs,
-  agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage.
+  Create and validate Nemo Gym single-step pivot datasets from trajectory or
+  rollout artifacts. Not for reward profiling or debugging runs.
+metadata:
+  author: NVIDIA <nemo-gym@nvidia.com>
+  tags:
+    - pivot-dataset
+    - dataset-conversion
+    - reinforcement-learning
+    - single-step
+    - trajectory
 ---
 
 # Nemo Gym Pivot Datasets
 
+## Purpose
+
+Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot
+datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym
+config can be used together.
+
+## Prerequisites
+
+- Source artifacts to convert: rollout, trajectory, chat-completion, Responses API, or tool-call data.
+- Python to run `scripts/validate_pivot_dataset.py` and the reference converters.
+- The target Gym config (agent and resource-server names) the pivot rows must align with.
+- Optionally a Gym checkout (`--gym-repo`) to validate against resource-server Pydantic models.
+
 ## Paper Reference
 
 This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local
@@ -21,11 +41,17 @@ Use this skill when the task is to turn existing agent trajectories or rollout a
 Nemo Gym pivot dataset, or to validate whether a pivot JSONL/config pair can be used for
 single-step local RL or evaluation.
 
+Do not activate this skill for these adjacent tasks:
+
+- Running or profiling rewards on an existing dataset. Use `nemo-gym-reward-profiling`.
+- Debugging a failed or crashed run (Ray/vLLM stack traces, empty output). Use `nemo-gym-debugging`.
+- Adding or scaffolding a new benchmark or training environment. Use `add-benchmark`.
+
 Before writing a converter, inspect representative source rows and the target resource server.
 Do not assume the source field names are the contract. Convert by reconstructing the semantic
 pieces needed by Gym's Responses-style row format.
 
-## Core Workflow
+## Instructions
 
 1. Inspect the source data shape and count the candidate assistant decision points.
 2. Identify the semantic fields needed for each pivot:
@@ -117,3 +143,31 @@ resource-server request model.
 
 The validator accepts both supported expected-action types by default (`function_call` and `message`)
 and prints an end summary split between tool-call and message pivots.
+
+## Examples
+
+Converting chat-completion logs: inspect representative rows, identify each
+assistant decision point, and reconstruct `responses_create_params`,
+`expected_action` (a single `function_call` or `message`), and `agent_ref` for
+each accepted pivot. Route turns with more than one tool call into a skipped-row
+audit. Borrow from
+`scripts/reference/chat_messages_to_pivot_dataset_reference.py` rather than
+running it unchanged.
+
+Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the
+expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available
+to also validate against the resource-server Pydantic models.
+
+## Limitations
+
+- `expected_action` is singular; source turns with more than one tool call are filtered out, not split.
+- Reference converters under `scripts/reference/` are dataset-specific examples, not commands to run unchanged.
+- A valid JSONL file can still be unusable if the agent and resource-server names do not line up.
+
+## Troubleshooting
+
+| Symptom | Likely cause | Resolution |
+|---|---|---|
+| Validator rejects rows | `agent_ref.name` does not match the config's agent block | Align `agent_ref.name` with the agent used by the generated config |
+| Tool-argument matches fail | String-argument threshold too strict | Tune `word_count_similarity_threshold` for the single-step tool-use verifier |
+| Structured-decoding path taken unexpectedly | `tool_choice: "required"` routes some engines there | Use `tool_choice: "auto"` for these rows |
diff --git a/skills/nemo-gym-pivot-datasets/evals/evals.json b/skills/nemo-gym-pivot-datasets/evals/evals.json
@@ -0,0 +1,62 @@
+[
+  {
+    "id": "nemo-gym-pivot-datasets-positive-001",
+    "question": "Convert these tool-call trajectories in a JSONL file into a NeMo Gym pivot dataset I can use for single-step training.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, inspects representative source rows and the target resource server before writing a converter, identifies the semantic fields needed per pivot, converts each accepted decision point to one row with responses_create_params, expected_action, and agent_ref, and runs the bundled validator against the output.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent inspected representative source rows before writing a converter",
+      "The agent emitted one pivot row per accepted decision point with responses_create_params, expected_action, and agent_ref",
+      "The agent filtered out source turns with more than one tool call rather than emitting multi-action rows",
+      "The agent ran scripts/validate_pivot_dataset.py on the output"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-002",
+    "question": "Validate this pivot.jsonl file against the resources-server request models and the agent_ref I expect — is it usable for training?",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill and runs the bundled validator with both --agent-ref and --gym-repo, checks that agent_ref matches the config's agent block, and confirms the row contract for single_step_tool_use_with_argument_comparison.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent ran scripts/validate_pivot_dataset.py with --agent-ref",
+      "The agent passed --gym-repo to validate against the resources-server Pydantic models when the Gym repo is available",
+      "The agent confirmed agent_ref.name matches the agent block used by the config"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-003",
+    "question": "I have a batch of chat-completion rollouts from a different framework. Build me a NeMo Gym pivot dataset and the matching Gym YAML config.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, normalizes the chat-completion rows into Gym Responses-style pivot rows (one expected_action per row), uses tool_choice: auto in the generated config, points the train dataset entry at the pivot JSONL, and aligns agent_ref with the agent block before validating.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent normalized the chat-completion rows into Responses-style pivot rows",
+      "The agent set tool_choice: auto in the generated config rather than required",
+      "The agent pointed the config's train dataset entry directly at the pivot JSONL",
+      "The agent ensured row-level agent_ref matches the config's agent block"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-001",
+    "question": "Add the cuOpt vehicle-routing benchmark to NeMo-Gym, including data prep and the resources server.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a new-benchmark integration task. It should use the add-benchmark skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a benchmark integration task"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-002",
+    "question": "My ng_reward_profile job is producing empty profile rows for half the tasks. Help me figure out what's wrong.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a debugging task on an existing reward profiling run. It should use the nemo-gym-debugging skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a debugging task"
+    ]
+  }
+]
diff --git a/...ferences/config-training-and-agent-ref.md → ...ferences/config-training-and-agent-ref.md b/...ferences/config-training-and-agent-ref.md → ...ferences/config-training-and-agent-ref.md
diff --git a/...atasets/references/conversion-patterns.md → ...atasets/references/conversion-patterns.md b/...atasets/references/conversion-patterns.md → ...atasets/references/conversion-patterns.md
diff --git a/...pivot-datasets/references/row-contract.md → ...pivot-datasets/references/row-contract.md b/...pivot-datasets/references/row-contract.md → ...pivot-datasets/references/row-contract.md
diff --git a/...at_messages_to_pivot_dataset_reference.py → ...at_messages_to_pivot_dataset_reference.py b/...at_messages_to_pivot_dataset_reference.py → ...at_messages_to_pivot_dataset_reference.py
diff --git a/...al_messages_to_pivot_dataset_reference.py → ...al_messages_to_pivot_dataset_reference.py b/...al_messages_to_pivot_dataset_reference.py → ...al_messages_to_pivot_dataset_reference.py
diff --git a/...erence/generic_pivot_dataset_reference.py → ...erence/generic_pivot_dataset_reference.py b/...erence/generic_pivot_dataset_reference.py → ...erence/generic_pivot_dataset_reference.py
diff --git a/...ol_messages_to_pivot_dataset_reference.py → ...ol_messages_to_pivot_dataset_reference.py b/...ol_messages_to_pivot_dataset_reference.py → ...ol_messages_to_pivot_dataset_reference.py
diff --git a/...atasets/scripts/validate_pivot_dataset.py → ...atasets/scripts/validate_pivot_dataset.py b/...atasets/scripts/validate_pivot_dataset.py → ...atasets/scripts/validate_pivot_dataset.py