From b78d327005c73889cda4832ae9224043c755db15 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam
Date: Thu, 28 May 2026 16:58:26 -0700
Subject: [PATCH 1/4] ci(nemo-gym-pivot-datasets): migrate to skills/ for NVSkills CI

## Summary

- Move `.claude/skills/nemo-gym-pivot-datasets/` to top-level `skills/nemo-gym-pivot-datasets/` so this PR touches files under the central `team-request.yml` trigger allowlist (`skills/`, `team-skills/`, `rules/team-rules/`, `plugins/`).
- Replace `.claude/skills/nemo-gym-pivot-datasets/` with a symlink to `../../skills/nemo-gym-pivot-datasets` so Claude Code and Cursor continue to discover the skill via the conventional `.claude/skills//SKILL.md` path with no tool-side change.
- Add `license: Apache-2.0` to `skills/nemo-gym-pivot-datasets/SKILL.md` frontmatter.
- Add `skills/nemo-gym-pivot-datasets/evals/evals.json` with positive trigger cases (build pivot from rollouts, single-step tool-use configs, agent_ref alignment) and negative cases that delegate to sibling skills.

## Motivation

Prepares the `nemo-gym-pivot-datasets` skill for NVSkills CI signing. Per-skill scope keeps the diff small and lets NVSkills CI evaluate one skill at a time. Other skills remain at `.claude/skills//` until each has its own migration PR.

## Test plan

- [ ] Comment `/nvskills-ci` on this PR. Expect the request workflow to dispatch (not skip) and `svc-nvskills-signing` to attach `skill-card.md` and `skill.oms.sig` under `skills/nemo-gym-pivot-datasets/`.
- [ ] Claude Code discovers `nemo-gym-pivot-datasets` via `.claude/skills/nemo-gym-pivot-datasets/SKILL.md` (follows symlink).

Signed-off-by: Ananth Subramaniam
---
 .claude/skills/nemo-gym-pivot-datasets        |  1 +
 .../nemo-gym-pivot-datasets/SKILL.md          |  1 +
 .../nemo-gym-pivot-datasets/evals/evals.json  | 62 +++++++++++++++++++
 .../config-training-and-agent-ref.md          |  0
 .../references/conversion-patterns.md         |  0
 .../references/row-contract.md                |  0
 ...hat_messages_to_pivot_dataset_reference.py |  0
 ...nal_messages_to_pivot_dataset_reference.py |  0
 .../generic_pivot_dataset_reference.py        |  0
 ...ool_messages_to_pivot_dataset_reference.py |  0
 .../scripts/validate_pivot_dataset.py         |  0
 11 files changed, 64 insertions(+)
 create mode 120000 .claude/skills/nemo-gym-pivot-datasets
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/SKILL.md (99%)
 create mode 100644 skills/nemo-gym-pivot-datasets/evals/evals.json
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/conversion-patterns.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/row-contract.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py (100%)

diff --git a/.claude/skills/nemo-gym-pivot-datasets b/.claude/skills/nemo-gym-pivot-datasets
new file mode 120000
index 000000000..507a7bf28
--- /dev/null
+++ b/.claude/skills/nemo-gym-pivot-datasets
@@ -0,0 +1 @@
+../../skills/nemo-gym-pivot-datasets
\ No newline at end of file
diff --git a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
similarity index 99%
rename from .claude/skills/nemo-gym-pivot-datasets/SKILL.md
rename to skills/nemo-gym-pivot-datasets/SKILL.md
index 5636e796f..59b085c12 100644
--- a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -1,5 +1,6 @@
 ---
 name: nemo-gym-pivot-datasets
+license: Apache-2.0
 description: >-
   Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout,
   trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym
diff --git a/skills/nemo-gym-pivot-datasets/evals/evals.json b/skills/nemo-gym-pivot-datasets/evals/evals.json
new file mode 100644
index 000000000..869f6575d
--- /dev/null
+++ b/skills/nemo-gym-pivot-datasets/evals/evals.json
@@ -0,0 +1,62 @@
+[
+  {
+    "id": "nemo-gym-pivot-datasets-positive-001",
+    "question": "Convert these tool-call trajectories in a JSONL file into a NeMo Gym pivot dataset I can use for single-step training.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, inspects representative source rows and the target resource server before writing a converter, identifies the semantic fields needed per pivot, converts each accepted decision point to one row with responses_create_params, expected_action, and agent_ref, and runs the bundled validator against the output.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent inspected representative source rows before writing a converter",
+      "The agent emitted one pivot row per accepted decision point with responses_create_params, expected_action, and agent_ref",
+      "The agent filtered out source turns with more than one tool call rather than emitting multi-action rows",
+      "The agent ran scripts/validate_pivot_dataset.py on the output"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-002",
+    "question": "Validate this pivot.jsonl file against the resources-server request models and the agent_ref I expect — is it usable for training?",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill and runs the bundled validator with both --agent-ref and --gym-repo, checks that agent_ref matches the config's agent block, and confirms the row contract for single_step_tool_use_with_argument_comparison.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent ran scripts/validate_pivot_dataset.py with --agent-ref",
+      "The agent passed --gym-repo to validate against the resources-server Pydantic models when the Gym repo is available",
+      "The agent confirmed agent_ref.name matches the agent block used by the config"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-003",
+    "question": "I have a batch of chat-completion rollouts from a different framework. Build me a NeMo Gym pivot dataset and the matching Gym YAML config.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, normalizes the chat-completion rows into Gym Responses-style pivot rows (one expected_action per row), uses tool_choice: auto in the generated config, points the train dataset entry at the pivot JSONL, and aligns agent_ref with the agent block before validating.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent normalized the chat-completion rows into Responses-style pivot rows",
+      "The agent set tool_choice: auto in the generated config rather than required",
+      "The agent pointed the config's train dataset entry directly at the pivot JSONL",
+      "The agent ensured row-level agent_ref matches the config's agent block"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-001",
+    "question": "Add the cuOpt vehicle-routing benchmark to NeMo-Gym, including data prep and the resources server.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a new-benchmark integration task. It should use the add-benchmark skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a benchmark integration task"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-002",
+    "question": "My ng_reward_profile job is producing empty profile rows for half the tasks. Help me figure out what's wrong.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a debugging task on an existing reward profiling run. It should use the nemo-gym-debugging skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a debugging task"
+    ]
+  }
+]
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md b/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
rename to skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md b/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
rename to skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/row-contract.md b/skills/nemo-gym-pivot-datasets/references/row-contract.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/row-contract.md
rename to skills/nemo-gym-pivot-datasets/references/row-contract.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py b/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py
rename to skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py

From 270fbd1b59f8680cef028da0239a591d84885eaf Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam
Date: Fri, 29 May 2026 09:03:28 -0700
Subject: [PATCH 2/4] ci(nemo-gym-pivot-datasets): address NVSkills CI content feedback

Apply the same content fixes validated on add-benchmark:

- Add metadata.author and metadata.tags.
- Tighten the description and add negative triggers (not for general reward profiling or debugging runs).
- Add a Purpose section and an Examples section, and rename Core Workflow to Instructions.

Signed-off-by: Ananth Subramaniam
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 40 +++++++++++++++++++++----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 59b085c12..4fc9e8957 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -2,14 +2,30 @@
 name: nemo-gym-pivot-datasets
 license: Apache-2.0
 description: >-
-  Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout,
-  trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym
-  Responses-style row conversion, pivot selection, single-step tool-use configs,
-  agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage.
+  Create, validate, or document Nemo Gym pivot datasets from rollout, trajectory,
+  chat-completion, Responses API, or tool-call artifacts: Gym Responses-style row
+  conversion, pivot selection, single-step tool-use configs, agent_ref alignment,
+  verifier knobs, expected-action row contracts, and train/eval usage. Not for
+  general reward profiling (use nemo-gym-reward-profiling) or debugging runs (use
+  nemo-gym-debugging).
+metadata:
+  author: NVIDIA
+  tags:
+    - pivot-dataset
+    - dataset-conversion
+    - reinforcement-learning
+    - single-step
+    - trajectory
 ---
 
 # Nemo Gym Pivot Datasets
 
+## Purpose
+
+Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot
+datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym
+config can be used together.
+
 ## Paper Reference
 
 This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local
@@ -26,7 +42,7 @@ Before writing a converter, inspect representative source rows and the target re
 Do not assume the source field names are the contract. Convert by reconstructing the
 semantic pieces needed by Gym's Responses-style row format.
 
-## Core Workflow
+## Instructions
 
 1. Inspect the source data shape and count the candidate assistant decision points.
 2. Identify the semantic fields needed for each pivot:
@@ -118,3 +134,17 @@ resource-server request model.
 
 The validator accepts both supported expected-action types by default (`function_call`
 and `message`) and prints an end summary split between tool-call and message pivots.
+
+## Examples
+
+Converting chat-completion logs: inspect representative rows, identify each
+assistant decision point, and reconstruct `responses_create_params`,
+`expected_action` (a single `function_call` or `message`), and `agent_ref` for
+each accepted pivot. Route turns with more than one tool call into a skipped-row
+audit. Borrow from
+`scripts/reference/chat_messages_to_pivot_dataset_reference.py` rather than
+running it unchanged.
+
+Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the
+expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available
+to also validate against the resource-server Pydantic models.
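
The Examples section added in this patch describes the chat-completion conversion only in prose. A minimal Python sketch of that flow follows, under stated assumptions. The top-level keys `responses_create_params`, `expected_action`, and `agent_ref` and the `function_call` / `message` types come from the skill text. The nested field names (`input`, `tools`, `name`, `arguments`, `content`), the function `chat_to_pivot_rows`, and the skip reasons are hypothetical and not taken from the bundled reference converters. The source shape assumed is the common chat-completions layout with `tool_calls[*].function.name` and `.arguments`.

```python
from typing import Any


def chat_to_pivot_rows(
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    agent_ref: dict[str, Any],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Turn one chat-completion conversation into pivot rows plus a skipped-row audit.

    Hypothetical helper: the nested row layout is an illustration, not the Gym contract.
    """
    rows: list[dict[str, Any]] = []
    skipped: list[dict[str, Any]] = []
    for turn, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue  # only assistant turns are candidate decision points
        tool_calls = message.get("tool_calls") or []
        if len(tool_calls) > 1:
            # expected_action is singular, so multi-call turns are audited, not split
            skipped.append({"turn": turn, "reason": "multiple_tool_calls"})
            continue
        if tool_calls:
            function = tool_calls[0]["function"]
            expected_action = {
                "type": "function_call",
                "name": function["name"],
                "arguments": function["arguments"],
            }
        elif message.get("content"):
            expected_action = {"type": "message", "content": message["content"]}
        else:
            skipped.append({"turn": turn, "reason": "empty_assistant_turn"})
            continue
        rows.append(
            {
                # Assumption: prior turns are passed through unchanged here; a real
                # converter would normalize them into Responses-style input items.
                "responses_create_params": {"input": messages[:turn], "tools": tools},
                "expected_action": expected_action,
                "agent_ref": agent_ref,
            }
        )
    return rows, skipped
```

Returning the skipped turns alongside the rows keeps the audit trail the skill asks for. The output of any such converter should still go through `scripts/validate_pivot_dataset.py`, since this sketch does not check the real row contract.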
From e8b55090b621194f840ec1c59bbf3f8d257cce39 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam
Date: Fri, 29 May 2026 15:10:51 -0700
Subject: [PATCH 3/4] ci(nemo-gym-pivot-datasets): clear low-severity quality advisories

- Shorten the description to under 150 characters.
- Add Prerequisites, Limitations, and Troubleshooting sections.

Signed-off-by: Ananth Subramaniam
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 29 ++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 4fc9e8957..2a19ce9eb 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -2,12 +2,8 @@
 name: nemo-gym-pivot-datasets
 license: Apache-2.0
 description: >-
-  Create, validate, or document Nemo Gym pivot datasets from rollout, trajectory,
-  chat-completion, Responses API, or tool-call artifacts: Gym Responses-style row
-  conversion, pivot selection, single-step tool-use configs, agent_ref alignment,
-  verifier knobs, expected-action row contracts, and train/eval usage. Not for
-  general reward profiling (use nemo-gym-reward-profiling) or debugging runs (use
-  nemo-gym-debugging).
+  Create and validate Nemo Gym single-step pivot datasets from trajectory or
+  rollout artifacts. Not for reward profiling or debugging runs.
 metadata:
   author: NVIDIA
   tags:
@@ -26,6 +22,13 @@ Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot
 datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym
 config can be used together.
 
+## Prerequisites
+
+- Source artifacts to convert: rollout, trajectory, chat-completion, Responses API, or tool-call data.
+- Python to run `scripts/validate_pivot_dataset.py` and the reference converters.
+- The target Gym config (agent and resource-server names) the pivot rows must align with.
+- Optionally a Gym checkout (`--gym-repo`) to validate against resource-server Pydantic models.
+
 ## Paper Reference
 
 This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local
@@ -148,3 +151,17 @@ running it unchanged.
 Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the
 expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available
 to also validate against the resource-server Pydantic models.
+
+## Limitations
+
+- `expected_action` is singular; source turns with more than one tool call are filtered out, not split.
+- Reference converters under `scripts/reference/` are dataset-specific examples, not commands to run unchanged.
+- A valid JSONL file can still be unusable if the agent and resource-server names do not line up.
+
+## Troubleshooting
+
+| Symptom | Likely cause | Resolution |
+|---|---|---|
+| Validator rejects rows | `agent_ref.name` does not match the config's agent block | Align `agent_ref.name` with the agent used by the generated config |
+| Tool-argument matches fail | String-argument threshold too strict | Tune `word_count_similarity_threshold` for the single-step tool-use verifier |
+| Structured-decoding path taken unexpectedly | `tool_choice: "required"` routes some engines there | Use `tool_choice: "auto"` for these rows |

From 293203f014690a9a60369fcc3fc73a84df9fba51 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam
Date: Fri, 29 May 2026 21:54:21 -0700
Subject: [PATCH 4/4] ci(nemo-gym-pivot-datasets): add do-not-activate routing to siblings

Add explicit routing in the invocation check so the skill defers to nemo-gym-reward-profiling, nemo-gym-debugging, and add-benchmark for adjacent tasks, reducing misrouted activations in agent evaluation.

Signed-off-by: Ananth Subramaniam
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 2a19ce9eb..2c93b6ba6 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -41,6 +41,12 @@ Use this skill when the task is to turn existing agent trajectories or rollout a
 Nemo Gym pivot dataset, or to validate whether a pivot JSONL/config pair can be used for
 single-step local RL or evaluation.
 
+Do not activate this skill for these adjacent tasks:
+
+- Running or profiling rewards on an existing dataset. Use `nemo-gym-reward-profiling`.
+- Debugging a failed or crashed run (Ray/vLLM stack traces, empty output). Use `nemo-gym-debugging`.
+- Adding or scaffolding a new benchmark or training environment. Use `add-benchmark`.
+
 Before writing a converter, inspect representative source rows and the target resource server.
 Do not assume the source field names are the contract. Convert by reconstructing the
 semantic pieces needed by Gym's Responses-style row format.
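
The do-not-activate routing in this patch pairs with the negative cases in `evals/evals.json` from patch 1. Below is a minimal sketch of a consistency check over that file, assuming the conventions visible in the series: positive cases set `expected_skill` to the skill name and omit `should_trigger`, while negative cases set `should_trigger` to `false` and `expected_skill` to `null`. The function name `check_eval_cases` is hypothetical, and the actual NVSkills CI schema may enforce more or different rules.

```python
from typing import Any


def check_eval_cases(cases: list[dict[str, Any]], skill_name: str) -> list[str]:
    """Return human-readable problems; an empty list means the cases look consistent.

    Hypothetical helper: the rules mirror the evals.json in this series,
    not a published NVSkills CI schema.
    """
    problems: list[str] = []
    seen_ids: set[str] = set()
    for case in cases:
        case_id = case.get("id", "<missing id>")
        if case_id in seen_ids:
            problems.append(f"{case_id}: duplicate id")
        seen_ids.add(case_id)

        # Assumption: a missing should_trigger means the case is a positive trigger.
        should_trigger = case.get("should_trigger", True)
        expected_skill = case.get("expected_skill")
        if should_trigger and expected_skill != skill_name:
            problems.append(f"{case_id}: positive case must expect {skill_name!r}")
        if not should_trigger and expected_skill is not None:
            problems.append(f"{case_id}: negative case must set expected_skill to null")

        if not case.get("expected_behavior"):
            problems.append(f"{case_id}: expected_behavior is missing or empty")
    return problems
```

To run it against the real file, load `skills/nemo-gym-pivot-datasets/evals/evals.json` with `json.load` and pass the resulting list together with `"nemo-gym-pivot-datasets"`.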