From b78d327005c73889cda4832ae9224043c755db15 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam <ansubramania@nvidia.com>
Date: Thu, 28 May 2026 16:58:26 -0700
Subject: [PATCH 1/4] ci(nemo-gym-pivot-datasets): migrate to skills/ for
 NVSkills CI

## Summary

- Move `.claude/skills/nemo-gym-pivot-datasets/` to top-level
  `skills/nemo-gym-pivot-datasets/` so this PR touches files under
  the central `team-request.yml` trigger allowlist (`skills/`,
  `team-skills/`, `rules/team-rules/`, `plugins/`).
- Replace `.claude/skills/nemo-gym-pivot-datasets/` with a symlink to
  `../../skills/nemo-gym-pivot-datasets` so Claude Code and Cursor
  continue to discover the skill via the conventional
  `.claude/skills/<name>/SKILL.md` path with no tool-side change.
- Add `license: Apache-2.0` to
  `skills/nemo-gym-pivot-datasets/SKILL.md` frontmatter.
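- Add `skills/nemo-gym-pivot-datasets/evals/evals.json` with three
  positive trigger cases (convert tool-call trajectories, validate a
  pivot JSONL against `agent_ref` and the resources-server models,
  build a pivot dataset plus Gym config from chat-completion rollouts)
  and two negative cases that should route to sibling skills
  (`add-benchmark`, `nemo-gym-debugging`).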
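
A minimal local sanity check for the new evals file, sketched in Python
under stated assumptions. The conventions it enforces are inferred from
the five entries added here, not from a published NVSkills schema:
positive cases name this skill and omit `should_trigger`, and negative
cases set `should_trigger: false` and `expected_skill: null`.
`check_evals` and `check_evals_file` are hypothetical helpers.

```python
from __future__ import annotations

import json
from pathlib import Path

SKILL = "nemo-gym-pivot-datasets"
REQUIRED_KEYS = ("id", "question", "ground_truth", "expected_behavior")


def check_evals(cases: list[dict], skill: str = SKILL) -> tuple[int, int]:
    """Return (positive, negative) case counts; raise ValueError on a malformed case."""
    positive = negative = 0
    seen_ids: set[str] = set()
    for case in cases:
        case_id = case.get("id")
        for key in REQUIRED_KEYS:
            if not case.get(key):
                raise ValueError(f"{case_id!r}: missing or empty {key}")
        if case_id in seen_ids:
            raise ValueError(f"duplicate id: {case_id}")
        seen_ids.add(case_id)
        # Assumed convention: a case without should_trigger is a positive case.
        if case.get("should_trigger", True):
            if case.get("expected_skill") != skill:
                raise ValueError(f"{case_id}: positive case must expect {skill}")
            positive += 1
        else:
            if case.get("expected_skill") is not None:
                raise ValueError(f"{case_id}: negative case must set expected_skill to null")
            negative += 1
    return positive, negative


def check_evals_file(path: Path) -> tuple[int, int]:
    """Load a JSON array of eval cases from disk and check it."""
    return check_evals(json.loads(path.read_text(encoding="utf-8")))
```

For the file in this PR,
`check_evals_file(Path("skills/nemo-gym-pivot-datasets/evals/evals.json"))`
should return `(3, 2)`.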

## Motivation

Prepares the `nemo-gym-pivot-datasets` skill for NVSkills CI signing.
Per-skill scope keeps the diff small and lets NVSkills CI evaluate
one skill at a time. Other skills remain at `.claude/skills/<name>/`
until each has its own migration PR.

## Test plan

- [ ] Comment `/nvskills-ci` on this PR. Expect the request workflow
      to dispatch (not skip) and `svc-nvskills-signing` to attach
      `skill-card.md` and `skill.oms.sig` under
      `skills/nemo-gym-pivot-datasets/`.
- [ ] Claude Code still discovers `nemo-gym-pivot-datasets` via
      `.claude/skills/nemo-gym-pivot-datasets/SKILL.md`, resolved
      through the symlink.
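
The symlink half of the test plan can also be checked without Claude
Code. This is a minimal Python sketch. It assumes a checkout on a
filesystem that preserves symlinks; with `core.symlinks` disabled, for
example on some Windows setups, Git writes the link as a plain file.
`check_skill_symlink` is a hypothetical helper. It confirms only that
the link resolves, not how any tool discovers skills.

```python
import os
from pathlib import Path


def check_skill_symlink(repo_root: Path, name: str = "nemo-gym-pivot-datasets") -> Path:
    """Check that .claude/skills/<name> is a relative symlink to skills/<name>.

    Returns the SKILL.md path as seen through the symlink.
    """
    link = repo_root / ".claude" / "skills" / name
    target = repo_root / "skills" / name
    if not link.is_symlink():
        raise ValueError(f"{link} is not a symlink")
    # A relative link target keeps working wherever the repo is cloned.
    if os.path.isabs(os.readlink(link)):
        raise ValueError(f"{link} uses an absolute target")
    if link.resolve() != target.resolve():
        raise ValueError(f"{link} resolves to {link.resolve()}, not {target}")
    skill_md = link / "SKILL.md"
    if not skill_md.is_file():
        raise ValueError(f"{skill_md} is not reachable through the symlink")
    return skill_md
```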

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
---
 .claude/skills/nemo-gym-pivot-datasets        |  1 +
 .../nemo-gym-pivot-datasets/SKILL.md          |  1 +
 .../nemo-gym-pivot-datasets/evals/evals.json  | 62 +++++++++++++++++++
 .../config-training-and-agent-ref.md          |  0
 .../references/conversion-patterns.md         |  0
 .../references/row-contract.md                |  0
 ...hat_messages_to_pivot_dataset_reference.py |  0
 ...nal_messages_to_pivot_dataset_reference.py |  0
 .../generic_pivot_dataset_reference.py        |  0
 ...ool_messages_to_pivot_dataset_reference.py |  0
 .../scripts/validate_pivot_dataset.py         |  0
 11 files changed, 64 insertions(+)
 create mode 120000 .claude/skills/nemo-gym-pivot-datasets
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/SKILL.md (99%)
 create mode 100644 skills/nemo-gym-pivot-datasets/evals/evals.json
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/conversion-patterns.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/references/row-contract.md (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py (100%)
 rename {.claude/skills => skills}/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py (100%)
diff --git a/.claude/skills/nemo-gym-pivot-datasets b/.claude/skills/nemo-gym-pivot-datasets
new file mode 120000
index 000000000..507a7bf28
--- /dev/null
+++ b/.claude/skills/nemo-gym-pivot-datasets
@@ -0,0 +1 @@
+../../skills/nemo-gym-pivot-datasets
\ No newline at end of file
diff --git a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
similarity index 99%
rename from .claude/skills/nemo-gym-pivot-datasets/SKILL.md
rename to skills/nemo-gym-pivot-datasets/SKILL.md
index 5636e796f..59b085c12 100644
--- a/.claude/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -1,5 +1,6 @@
 ---
 name: nemo-gym-pivot-datasets
+license: Apache-2.0
 description: >-
   Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout,
   trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym
diff --git a/skills/nemo-gym-pivot-datasets/evals/evals.json b/skills/nemo-gym-pivot-datasets/evals/evals.json
new file mode 100644
index 000000000..869f6575d
--- /dev/null
+++ b/skills/nemo-gym-pivot-datasets/evals/evals.json
@@ -0,0 +1,62 @@
+[
+  {
+    "id": "nemo-gym-pivot-datasets-positive-001",
+    "question": "Convert these tool-call trajectories in a JSONL file into a NeMo Gym pivot dataset I can use for single-step training.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, inspects representative source rows and the target resource server before writing a converter, identifies the semantic fields needed per pivot, converts each accepted decision point to one row with responses_create_params, expected_action, and agent_ref, and runs the bundled validator against the output.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent inspected representative source rows before writing a converter",
+      "The agent emitted one pivot row per accepted decision point with responses_create_params, expected_action, and agent_ref",
+      "The agent filtered out source turns with more than one tool call rather than emitting multi-action rows",
+      "The agent ran scripts/validate_pivot_dataset.py on the output"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-002",
+    "question": "Validate this pivot.jsonl file against the resources-server request models and the agent_ref I expect — is it usable for training?",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill and runs the bundled validator with both --agent-ref and --gym-repo, checks that agent_ref matches the config's agent block, and confirms the row contract for single_step_tool_use_with_argument_comparison.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent ran scripts/validate_pivot_dataset.py with --agent-ref",
+      "The agent passed --gym-repo to validate against the resources-server Pydantic models when the Gym repo is available",
+      "The agent confirmed agent_ref.name matches the agent block used by the config"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-positive-003",
+    "question": "I have a batch of chat-completion rollouts from a different framework. Build me a NeMo Gym pivot dataset and the matching Gym YAML config.",
+    "expected_skill": "nemo-gym-pivot-datasets",
+    "ground_truth": "The agent loads the nemo-gym-pivot-datasets skill, normalizes the chat-completion rows into Gym Responses-style pivot rows (one expected_action per row), uses tool_choice: auto in the generated config, points the train dataset entry at the pivot JSONL, and aligns agent_ref with the agent block before validating.",
+    "expected_behavior": [
+      "The agent read nemo-gym-pivot-datasets/SKILL.md before acting",
+      "The agent normalized the chat-completion rows into Responses-style pivot rows",
+      "The agent set tool_choice: auto in the generated config rather than required",
+      "The agent pointed the config's train dataset entry directly at the pivot JSONL",
+      "The agent ensured row-level agent_ref matches the config's agent block"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-001",
+    "question": "Add the cuOpt vehicle-routing benchmark to NeMo-Gym, including data prep and the resources server.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a new-benchmark integration task. It should use the add-benchmark skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a benchmark integration task"
+    ]
+  },
+  {
+    "id": "nemo-gym-pivot-datasets-negative-002",
+    "question": "My ng_reward_profile job is producing empty profile rows for half the tasks. Help me figure out what's wrong.",
+    "expected_skill": null,
+    "should_trigger": false,
+    "ground_truth": "The agent should not activate the nemo-gym-pivot-datasets skill for a debugging task on an existing reward profiling run. It should use the nemo-gym-debugging skill instead.",
+    "expected_behavior": [
+      "The agent did not read or activate nemo-gym-pivot-datasets/SKILL.md",
+      "The agent recognized this as a debugging task"
+    ]
+  }
+]
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md b/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
rename to skills/nemo-gym-pivot-datasets/references/config-training-and-agent-ref.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md b/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
rename to skills/nemo-gym-pivot-datasets/references/conversion-patterns.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/references/row-contract.md b/skills/nemo-gym-pivot-datasets/references/row-contract.md
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/references/row-contract.md
rename to skills/nemo-gym-pivot-datasets/references/row-contract.md
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/chat_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/conversational_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/generic_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py b/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
rename to skills/nemo-gym-pivot-datasets/scripts/reference/tool_messages_to_pivot_dataset_reference.py
diff --git a/.claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py b/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py
similarity index 100%
rename from .claude/skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py
rename to skills/nemo-gym-pivot-datasets/scripts/validate_pivot_dataset.py

From 270fbd1b59f8680cef028da0239a591d84885eaf Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam <ansubramania@nvidia.com>
Date: Fri, 29 May 2026 09:03:28 -0700
Subject: [PATCH 2/4] ci(nemo-gym-pivot-datasets): address NVSkills CI content
 feedback

Apply the content fixes already validated on the add-benchmark skill:
- Add metadata.author and metadata.tags.
- Tighten the description and add negative triggers (not for general
  reward profiling or debugging runs).
- Add a Purpose section and an Examples section, and rename Core
  Workflow to Instructions.
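
This commit changes both the frontmatter and the section layout, so a
rough structural check can catch regressions before CI runs. The sketch
below uses only the Python standard library. It matches lines rather
than parsing YAML, and its required and retired section names come from
this commit message, not from a documented NVSkills rule set.
`skill_md_problems` is a hypothetical helper.

```python
from __future__ import annotations

import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
REQUIRED_SECTIONS = ("Purpose", "Instructions", "Examples")
RETIRED_SECTIONS = ("Core Workflow",)


def skill_md_problems(text: str) -> list[str]:
    """Return problems found in a SKILL.md string; an empty list means none were found."""
    match = FRONTMATTER.match(text)
    if match is None:
        return ["missing YAML frontmatter"]
    frontmatter = match.group(1)
    problems = []
    # Line-based checks, not a YAML parser: expects "metadata:" with two-space-indented keys.
    if not re.search(r"^metadata:[ \t]*$", frontmatter, re.MULTILINE):
        problems.append("missing metadata block")
    for key in ("author", "tags"):
        if not re.search(rf"^  {key}:", frontmatter, re.MULTILINE):
            problems.append(f"missing metadata.{key}")
    # Level-2 markdown headings anywhere in the file.
    headings = re.findall(r"^## (.+?)[ \t]*$", text, re.MULTILINE)
    problems += [f"missing section: {h}" for h in REQUIRED_SECTIONS if h not in headings]
    problems += [f"retired section still present: {h}" for h in RETIRED_SECTIONS if h in headings]
    return problems
```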

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 40 +++++++++++++++++++++----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 59b085c12..4fc9e8957 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -2,14 +2,30 @@
 name: nemo-gym-pivot-datasets
 license: Apache-2.0
 description: >-
-  Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout,
-  trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym
-  Responses-style row conversion, pivot selection, single-step tool-use configs,
-  agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage.
+  Create, validate, or document Nemo Gym pivot datasets from rollout, trajectory,
+  chat-completion, Responses API, or tool-call artifacts: Gym Responses-style row
+  conversion, pivot selection, single-step tool-use configs, agent_ref alignment,
+  verifier knobs, expected-action row contracts, and train/eval usage. Not for
+  general reward profiling (use nemo-gym-reward-profiling) or debugging runs (use
+  nemo-gym-debugging).
+metadata:
+  author: NVIDIA <nemo-gym@nvidia.com>
+  tags:
+    - pivot-dataset
+    - dataset-conversion
+    - reinforcement-learning
+    - single-step
+    - trajectory
 ---
 
 # Nemo Gym Pivot Datasets
 
+## Purpose
+
+Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot
+datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym
+config can be used together.
+
 ## Paper Reference
 
 This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local
@@ -26,7 +42,7 @@ Before writing a converter, inspect representative source rows and the target re
 Do not assume the source field names are the contract. Convert by reconstructing the semantic
 pieces needed by Gym's Responses-style row format.
 
-## Core Workflow
+## Instructions
 
 1. Inspect the source data shape and count the candidate assistant decision points.
 2. Identify the semantic fields needed for each pivot:
@@ -118,3 +134,17 @@ resource-server request model.
 
 The validator accepts both supported expected-action types by default (`function_call` and `message`)
 and prints an end summary split between tool-call and message pivots.
+
+## Examples
+
+Converting chat-completion logs: inspect representative rows, identify each
+assistant decision point, and reconstruct `responses_create_params`,
+`expected_action` (a single `function_call` or `message`), and `agent_ref` for
+each accepted pivot. Route turns with more than one tool call into a skipped-row
+audit. Borrow from
+`scripts/reference/chat_messages_to_pivot_dataset_reference.py` rather than
+running it unchanged.
+
+Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the
+expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available
+to also validate against the resource-server Pydantic models.

From e8b55090b621194f840ec1c59bbf3f8d257cce39 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam <ansubramania@nvidia.com>
Date: Fri, 29 May 2026 15:10:51 -0700
Subject: [PATCH 3/4] ci(nemo-gym-pivot-datasets): clear low-severity quality
 advisories

- Shorten the description to under 150 characters, dropping the
  sibling-skill names from the negative triggers.
- Add Prerequisites, Limitations, and Troubleshooting sections.
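
The 150-character limit is easy to check locally once the folded scalar
is unfolded. This minimal Python sketch assumes the `description: >-`
layout used in this file and takes the frontmatter text between the
`---` delimiters as input. `folded_description` and `description_fits`
are hypothetical helpers, not a YAML parser. The limit is the one quoted
in this commit, not a value read from NVSkills CI.

```python
from __future__ import annotations

MAX_DESCRIPTION_CHARS = 150


def folded_description(frontmatter: str) -> str:
    """Return the description value from SKILL.md frontmatter text.

    Handles a single-line value or a `>`/`>-` folded block of indented,
    non-blank lines. Ignores the trailing newline that `>` (unlike `>-`)
    keeps; anything fancier needs a real YAML parser.
    """
    lines = frontmatter.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("description:"):
            continue
        value = line[len("description:"):].strip()
        if value not in (">", ">-"):
            return value
        parts = []
        for cont in lines[i + 1:]:
            # The block ends at the first unindented or blank line.
            if not cont.startswith(" ") or not cont.strip():
                break
            parts.append(cont.strip())
        # Folded style joins adjacent non-empty lines with single spaces.
        return " ".join(parts)
    raise ValueError("no description key in frontmatter")


def description_fits(frontmatter: str, limit: int = MAX_DESCRIPTION_CHARS) -> bool:
    """True if the unfolded description is strictly under the limit."""
    return len(folded_description(frontmatter)) < limit
```

For the description in this patch, the unfolded value is 137 characters.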

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 29 ++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 4fc9e8957..2a19ce9eb 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -2,12 +2,8 @@
 name: nemo-gym-pivot-datasets
 license: Apache-2.0
 description: >-
-  Create, validate, or document Nemo Gym pivot datasets from rollout, trajectory,
-  chat-completion, Responses API, or tool-call artifacts: Gym Responses-style row
-  conversion, pivot selection, single-step tool-use configs, agent_ref alignment,
-  verifier knobs, expected-action row contracts, and train/eval usage. Not for
-  general reward profiling (use nemo-gym-reward-profiling) or debugging runs (use
-  nemo-gym-debugging).
+  Create and validate Nemo Gym single-step pivot datasets from trajectory or
+  rollout artifacts. Not for reward profiling or debugging runs.
 metadata:
   author: NVIDIA <nemo-gym@nvidia.com>
   tags:
@@ -26,6 +22,13 @@ Convert agent trajectories and rollout artifacts into single-step Nemo Gym pivot
 datasets for local RL or evaluation, and validate that a pivot JSONL and its Gym
 config can be used together.
 
+## Prerequisites
+
+- Source artifacts to convert: rollout, trajectory, chat-completion, Responses API, or tool-call data.
+- Python to run `scripts/validate_pivot_dataset.py` and the reference converters.
+- The target Gym config (agent and resource-server names) the pivot rows must align with.
+- Optionally, a Gym checkout (`--gym-repo`) to validate against resource-server Pydantic models.
+
 ## Paper Reference
 
 This skill operationalizes [PivotRL](https://arxiv.org/html/2603.21383v1): create local
@@ -148,3 +151,17 @@ running it unchanged.
 Validating a finished dataset: run `scripts/validate_pivot_dataset.py` with the
 expected `--agent-ref`, and add `--gym-repo` when the Gym checkout is available
 to also validate against the resource-server Pydantic models.
+
+## Limitations
+
+- `expected_action` is singular; source turns with more than one tool call are filtered out, not split.
+- Reference converters under `scripts/reference/` are dataset-specific examples, not commands to run unchanged.
+- A valid JSONL file can still be unusable if the agent and resource-server names do not line up.
+
+## Troubleshooting
+
+| Symptom | Likely cause | Resolution |
+|---|---|---|
+| Validator rejects rows | `agent_ref.name` does not match the config's agent block | Align `agent_ref.name` with the agent used by the generated config |
+| Tool-argument matches fail | String-argument threshold too strict | Tune `word_count_similarity_threshold` for the single-step tool-use verifier |
+| Structured-decoding path taken unexpectedly | `tool_choice: "required"` routes some engines there | Use `tool_choice: "auto"` for these rows |

From 293203f014690a9a60369fcc3fc73a84df9fba51 Mon Sep 17 00:00:00 2001
From: Ananth Subramaniam <ansubramania@nvidia.com>
Date: Fri, 29 May 2026 21:54:21 -0700
Subject: [PATCH 4/4] ci(nemo-gym-pivot-datasets): add do-not-activate routing
 to siblings

Add an explicit do-not-activate list to the invocation check so the
skill defers to nemo-gym-reward-profiling, nemo-gym-debugging, and
add-benchmark for adjacent tasks. This is intended to reduce misrouted
activations in agent evaluation.
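
A small cross-check can keep the new routing list and the negative eval
cases from drifting apart. This Python sketch assumes each routing
bullet ends with "Use `<skill-name>`." as in this patch, and it relies
on the `should_trigger: false` convention in `evals.json`. Both helpers
are hypothetical. The substring match on `ground_truth` is a heuristic,
not a grader.

```python
from __future__ import annotations

import re

# Matches bullets shaped like "- <adjacent task>. Use `skill-name`."
ROUTING_BULLET = re.compile(r"^- .*\bUse `([a-z0-9-]+)`\.[ \t]*$", re.MULTILINE)


def routing_targets(skill_md: str) -> set[str]:
    """Collect the sibling skill names named in do-not-activate bullets."""
    return set(ROUTING_BULLET.findall(skill_md))


def unrouted_negatives(cases: list[dict], targets: set[str]) -> list[str]:
    """Return ids of negative eval cases whose ground_truth names no routing target."""
    return [
        case["id"]
        for case in cases
        if case.get("should_trigger", True) is False
        and not any(target in case.get("ground_truth", "") for target in targets)
    ]
```

Against the SKILL.md and evals.json in this series, both negative cases
name a target (`add-benchmark` and `nemo-gym-debugging`), so the
returned list should be empty.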

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
---
 skills/nemo-gym-pivot-datasets/SKILL.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/skills/nemo-gym-pivot-datasets/SKILL.md b/skills/nemo-gym-pivot-datasets/SKILL.md
index 2a19ce9eb..2c93b6ba6 100644
--- a/skills/nemo-gym-pivot-datasets/SKILL.md
+++ b/skills/nemo-gym-pivot-datasets/SKILL.md
@@ -41,6 +41,12 @@ Use this skill when the task is to turn existing agent trajectories or rollout a
 Nemo Gym pivot dataset, or to validate whether a pivot JSONL/config pair can be used for
 single-step local RL or evaluation.
 
+Do not activate this skill for these adjacent tasks:
+
+- Running or profiling rewards on an existing dataset. Use `nemo-gym-reward-profiling`.
+- Debugging a failed or crashed run (Ray/vLLM stack traces, empty output). Use `nemo-gym-debugging`.
+- Adding or scaffolding a new benchmark or training environment. Use `add-benchmark`.
+
 Before writing a converter, inspect representative source rows and the target resource server.
 Do not assume the source field names are the contract. Convert by reconstructing the semantic
 pieces needed by Gym's Responses-style row format.