NVIDIA · bryan-anthropic · May 29, 2026
@@ -25,8 +25,8 @@ default_prompts:
 # Reproduced under plugins/<name>/skills/<skill-basename>/ in the form
 # selected by `skill_files:` (defined in plugins.d/_defaults.yml).
 include_skills:
-  - skills/cuopt/cuopt-routing-api-python/
-  - skills/cuopt/cuopt-user-rules/
+  - skills/cuopt-routing-api-python/
+  - skills/cuopt-user-rules/
 
 # Inherits from plugins.d/_defaults.yml:
 #   version, author, homepage, repository, license,

@@ -0,0 +1,99 @@
+# Evaluation Report
+
+Evaluation of the `cuopt-routing-api-python` skill before publication through NVSkills-Eval.
+
+This report summarizes the 3-tier NVSkills-Eval results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.
+
+## Evaluation Summary
+
+- Skill: `cuopt-routing-api-python`
+- Evaluation date: 2026-05-29
+- NVSkills-Eval profile: `external`
+- Environment: `local`
+- Dataset: 1 evaluation task
+- Attempts per task: 2
+- Pass threshold: 50%
+- Overall verdict: FAIL
+
+## Agents Used
+
+- `claude-code`
+- `codex`
+
+## Metrics Used
+
+Reported benchmark dimensions:
+
+- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
+- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
+- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
+- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
+- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.
+
+Underlying evaluation signals used in this run:
+
+- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
+- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
+- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
+- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
+- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
+- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
+- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.
+
+## Test Tasks
+
+The benchmark dataset contained 1 evaluation task:
+
+- Positive tasks: 1 task where the skill was expected to activate.
+- Negative tasks: 0 tasks where no skill was expected.
+- Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.
+
+Task composition is derived from the evaluation dataset when possible. Entries with `expected_skill` set are treated as positive skill-activation cases, while entries with `expected_skill: null` are treated as negative activation cases.
+
+## Results
+
+| Dimension | Num | `claude-code` | `codex` |
+|---|---:|---:|---:|
+| Security | 2 | 100% (+0%) | 100% (+0%) |
+| Correctness | 2 | 100% (+0%) | 95% (+3%) |
+| Discoverability | 2 | 100% (+0%) | 70% (-5%) |
+| Effectiveness | 2 | 83% (+14%) | 83% (+12%) |
+| Efficiency | 2 | 93% (-0%) | 56% (-5%) |
+
+Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
+
+## Tier 1: Static Validation Summary
+
+Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 8 total findings.
+
+Top findings:
+
+- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/cuopt-routing-api-python/SKILL.md`)
+- MEDIUM SECURITY/Unknown (SQP-2): Binding the cuOpt server to 0.0.0.0 exposes it on all network interfaces, making it accessible to any host that can reac (`references/server_examples.md:7`)
+- LOW QUALITY/quality_discoverability: No '## Purpose' section (`skills/cuopt-routing-api-python/SKILL.md`)
+- LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`skills/cuopt-routing-api-python/SKILL.md`)
+- LOW QUALITY/quality_reliability: No limitations documented (`skills/cuopt-routing-api-python/SKILL.md`)
+
+## Tier 2: Deduplication Summary
+
+Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 4 total findings.
+
+Top findings:
+
+- HIGH DUPLICATE/duplicate: Duplicate content found within references/server_examples.md:
+  "# Poll for solution" in references/server_examples.md (lines 45-51)
+  vs "# Poll for solution" in references/server_examples.md (lines 156-162) (`references/server_examples.md:45`)
+- HIGH DUPLICATE/duplicate: Duplicate content found across SKILL.md and references/examples.md:
+  "# Capacities" in SKILL.md (lines 30-35)
+  vs "# Add capacity dimension (name, demand_per_order, capacity_per_vehicle)" in references/examples.md (lines 73-75)
+  vs "# Add capacity dimension" in references/examples.md (lines 156-158) (`SKILL.md:30`)
+- HIGH DUPLICATE/duplicate: Duplicate content found across references/examples.md and references/server_examples.md:
+  "## Additional References (tested in CI)" in references/examples.md (lines 237-249)
+  vs "## Additional References (tested in CI)" in references/server_examples.md (lines 193-204) (`references/examples.md:237`)
+- HIGH DUPLICATE/duplicate: Duplicate content found across assets/pdp_basic/README.md and assets/pdp_basic/model.py:
+  "# Pickup-Delivery (PDP)" in assets/pdp_basic/README.md (lines 1-7)
+  vs "(module docstring)" in assets/pdp_basic/model.py (lines 1-2) (`assets/pdp_basic/README.md:1`)
+
+## Publication Recommendation
+
+The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
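The task-composition rule in the report's "Test Tasks" section (entries with `expected_skill` set are positive, entries with `expected_skill: null` are negative) can be sketched in a few lines. This is an illustrative reading of that rule, not NVSkills-Eval's implementation: the function names are hypothetical, and treating a missing `expected_skill` key as "unlabeled" is an assumption the report does not spell out.

```python
import json
from collections import Counter


def classify_task(entry: dict) -> str:
    """Classify one dataset entry by its expected_skill field."""
    if "expected_skill" not in entry:
        # Assumption: a missing key means intent cannot be inferred.
        return "unlabeled"
    return "negative" if entry["expected_skill"] is None else "positive"


def summarize(entries: list) -> dict:
    """Count positive, negative, and unlabeled tasks in a dataset."""
    counts = Counter(classify_task(e) for e in entries)
    return {k: counts.get(k, 0) for k in ("positive", "negative", "unlabeled")}


# Trimmed copy of the single entry in this PR's evaluation dataset.
sample = json.loads(
    '[{"id": "rt-py-eval-001-vrptw-api-call-sequence",'
    ' "expected_skill": "cuopt-routing-api-python"}]'
)
print(summarize(sample))  # {'positive': 1, 'negative': 0, 'unlabeled': 0}
```

With one positive task and no negative tasks, the Discoverability score only measures whether the skill loads when relevant, not whether it stays out of the way when irrelevant.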
@@ -0,0 +1,113 @@
+---
+name: cuopt-routing-api-python
+version: "26.08.00"
+description: Vehicle routing (VRP, TSP, PDP) with cuOpt — Python API only. Use when the user is building or solving routing in Python.
+license: Apache-2.0
+metadata:
+  author: NVIDIA cuOpt Team
+  tags:
+    - cuopt
+    - routing
+    - vrp
+    - tsp
+    - python
+---
+
+
+
+# cuOpt Routing — Python API
+
+Confirm problem type (TSP, VRP, PDP) and data (locations, orders, fleet, constraints) before coding.
+
+This skill is **Python only**. Routing has no C API in cuOpt.
+
+## Minimal VRP Example
+
+```python
+import cudf
+from cuopt import routing
+
+cost_matrix = cudf.DataFrame([...], dtype="float32")
+dm = routing.DataModel(n_locations=4, n_fleet=2, n_orders=3)
+dm.add_cost_matrix(cost_matrix)
+dm.set_order_locations(cudf.Series([1, 2, 3], dtype="int32"))
+solution = routing.Solve(dm, routing.SolverSettings())
+
+if solution.get_status() == 0:
+    solution.display_routes()
+```
+
+## Adding Constraints
+
+```python
+# Time windows
+dm.add_transit_time_matrix(transit_time_matrix)
+dm.set_order_time_windows(earliest_series, latest_series)
+
+# Capacities
+dm.add_capacity_dimension("weight", demand_series, capacity_series)
+dm.set_order_service_times(service_times)
+dm.set_vehicle_locations(start_locations, end_locations)
+dm.set_vehicle_time_windows(earliest_start, latest_return)
+
+# Pickup-delivery pairs
+dm.set_pickup_delivery_pairs(pickup_indices, delivery_indices)
+
+# Precedence (requires `import numpy as np`)
+dm.add_order_precedence(node_id=2, preceding_nodes=np.array([0, 1]))
+```
+
+## Solution Checking
+
+```python
+status = solution.get_status()  # 0=SUCCESS, 1=FAIL, 2=TIMEOUT, 3=EMPTY
+if status == 0:
+    route_df = solution.get_route()
+    total_cost = solution.get_total_objective()
+else:
+    print(solution.get_error_message())
+    print(solution.get_infeasible_orders().to_list())
+```
+
+## Data Types (use explicit dtypes)
+
+```python
+cost_matrix = cost_matrix.astype("float32")
+order_locations = cudf.Series([...], dtype="int32")
+demand = cudf.Series([...], dtype="int32")
+```
+
+## Solver Settings
+
+```python
+ss = routing.SolverSettings()
+ss.set_time_limit(30)
+ss.set_verbose_mode(True)
+ss.set_error_logging_mode(True)
+```
+
+## Common Issues
+
+| Problem | Fix |
+|---------|-----|
+| Empty solution | Widen time windows or check travel times |
+| Infeasible orders | Increase fleet or capacity |
+| Status != 0 with time windows | Call `add_transit_time_matrix()` before solving |
+| Wrong cost | Check cost_matrix is symmetric |
+| `compute_waypoint_sequence` alters route_df | It replaces the `location` column with waypoint ids in place — pass `route_df.copy()` if you still need cost-matrix indices (e.g. when iterating per truck) |
+
+## Debugging
+
+**When status != 0:** Print `solution.get_error_message()` to see why the solve failed and `solution.get_infeasible_orders().to_list()` to see which orders are infeasible.
+
+**Data types:** Use explicit dtypes (float32, int32) for matrices and series to avoid silent errors.
+
+## Examples
+
+- [examples.md](references/examples.md) — VRP, PDP, multi-depot
+- [server_examples.md](references/server_examples.md) — REST client (curl, Python)
+- **Reference models:** This skill's `assets/` — [vrp_basic](assets/vrp_basic/), [pdp_basic](assets/pdp_basic/). See [assets/README.md](assets/README.md).
+
+## Escalate
+
+For contribution or build-from-source, see the developer skill.
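The "Common Issues" table in SKILL.md suggests checking that `cost_matrix` is symmetric when the reported cost looks wrong. A minimal pre-flight sketch of that check, on plain Python lists so it runs without a GPU, might look like the following. The function name and the extra checks (squareness, order locations in range) are assumptions added for illustration and are not part of cuOpt. An asymmetric matrix can also be intentional, for example with one-way streets, so a finding here is a prompt to look, not necessarily an error.

```python
def check_cost_matrix(matrix, order_locations):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        # Symmetry is undefined for a non-square matrix, so stop here.
        return ["cost matrix is not square"]
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                problems.append(f"asymmetric entries at ({i}, {j}) and ({j}, {i})")
    for k, loc in enumerate(order_locations):
        if not 0 <= loc < n:
            problems.append(f"order {k} references location {loc}, outside 0..{n - 1}")
    return problems


# Values copied from assets/vrp_basic/model.py in this PR.
matrix = [
    [0, 10, 15, 20],
    [10, 0, 12, 18],
    [15, 12, 0, 10],
    [20, 18, 10, 0],
]
print(check_cost_matrix(matrix, [1, 2, 3]))  # []
```

Running the check on host-side lists before building the `cudf.DataFrame` keeps it independent of cuOpt and cudf.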
@@ -0,0 +1,10 @@
+# Assets — reference routing models
+
+Routing reference implementations (Python). Use as reference when building new applications; do not edit in place.
+
+| Model | Type | Description |
+|-------|------|-------------|
+| [vrp_basic](vrp_basic/) | VRP | Minimal VRP: 4 locations, 1 vehicle, 3 orders |
+| [pdp_basic](pdp_basic/) | PDP | Pickup-delivery pairs, capacity dimension |
+
+**Run:** From each subdir, `python model.py` (requires cuOpt and cudf). See [references/examples.md](../references/examples.md) for more patterns (time windows, multi-depot).
@@ -0,0 +1,7 @@
+# Pickup-Delivery (PDP)
+
+2 pickup-delivery pairs (4 orders), 2 vehicles. Pickup must occur before delivery; capacity dimension.
+
+**Run:** `python model.py`
+
+**See also:** [references/examples.md](../../references/examples.md) for more PDP and VRP patterns.
@@ -0,0 +1,56 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+PDP: 2 pickup-delivery pairs, 2 vehicles. Pickup before delivery; capacity dimension.
+"""
+
+import cudf
+from cuopt import routing
+
+cost_matrix = cudf.DataFrame(
+    [
+        [0, 10, 20, 30, 40],
+        [10, 0, 15, 25, 35],
+        [20, 15, 0, 10, 20],
+        [30, 25, 10, 0, 15],
+        [40, 35, 20, 15, 0],
+    ],
+    dtype="float32",
+)
+
+transit_time_matrix = cost_matrix.copy(deep=True)
+n_fleet = 2
+n_orders = 4
+
+order_locations = cudf.Series([1, 2, 3, 4], dtype="int32")
+pickup_indices = cudf.Series([0, 2], dtype="int32")
+delivery_indices = cudf.Series([1, 3], dtype="int32")
+demand = cudf.Series([10, -10, 15, -15], dtype="int32")
+vehicle_capacity = cudf.Series([50, 50], dtype="int32")
+
+dm = routing.DataModel(
+    n_locations=cost_matrix.shape[0],
+    n_fleet=n_fleet,
+    n_orders=n_orders,
+)
+dm.add_cost_matrix(cost_matrix)
+dm.add_transit_time_matrix(transit_time_matrix)
+dm.set_order_locations(order_locations)
+dm.add_capacity_dimension("load", demand, vehicle_capacity)
+dm.set_pickup_delivery_pairs(pickup_indices, delivery_indices)
+dm.set_vehicle_locations(
+    cudf.Series([0, 0], dtype="int32"),
+    cudf.Series([0, 0], dtype="int32"),
+)
+
+ss = routing.SolverSettings()
+ss.set_time_limit(10)
+solution = routing.Solve(dm, ss)
+
+print(f"Status: {solution.get_status()}")
+if solution.get_status() == 0:
+    solution.display_routes()
+    print(f"Total cost: {solution.get_total_objective()}")
+else:
+    print(solution.get_error_message())
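The PDP model above encodes each pickup as a positive demand and its paired delivery as the matching negative demand. A small sanity check for that convention, written against plain lists so it needs neither cuOpt nor cudf, could look like the sketch below. The function name is hypothetical, and the rules it enforces are assumptions drawn from the values in `model.py`, not documented cuOpt requirements.

```python
def check_pdp_demands(pickups, deliveries, demand, capacity):
    """Return a list of problems in pickup-delivery demand data; empty if none."""
    problems = []
    if len(pickups) != len(deliveries):
        return ["pickup and delivery index lists differ in length"]
    largest_vehicle = max(capacity)
    for p, d in zip(pickups, deliveries):
        if demand[p] <= 0:
            problems.append(f"pickup order {p} has non-positive demand {demand[p]}")
        if demand[p] + demand[d] != 0:
            problems.append(f"pair ({p}, {d}) does not net to zero load")
        if demand[p] > largest_vehicle:
            problems.append(f"pickup order {p} exceeds every vehicle's capacity")
    return problems


# Values copied from assets/pdp_basic/model.py in this PR.
print(check_pdp_demands([0, 2], [1, 3], [10, -10, 15, -15], [50, 50]))  # []
```

A pair that does not net to zero would leave load on the vehicle after delivery, which can surface later as orders reported infeasible.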
@@ -0,0 +1,7 @@
+# Minimal VRP
+
+4 locations (depot 0 + 3 customers), 1 vehicle, 3 orders. Cost matrix only; no time windows or capacity.
+
+**Run:** `python model.py`
+
+**See also:** [references/examples.md](../../references/examples.md) for VRP with time windows, capacity, and multi-depot.
@@ -0,0 +1,31 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+Minimal VRP: 4 locations, 1 vehicle, 3 orders. Cost matrix only.
+"""
+
+import cudf
+from cuopt import routing
+
+cost_matrix = cudf.DataFrame(
+    [
+        [0, 10, 15, 20],
+        [10, 0, 12, 18],
+        [15, 12, 0, 10],
+        [20, 18, 10, 0],
+    ],
+    dtype="float32",
+)
+
+dm = routing.DataModel(n_locations=4, n_fleet=1, n_orders=3)
+dm.add_cost_matrix(cost_matrix)
+dm.set_order_locations(cudf.Series([1, 2, 3], dtype="int32"))
+
+solution = routing.Solve(dm, routing.SolverSettings())
+
+if solution.get_status() == 0:
+    solution.display_routes()
+    print(f"Total cost: {solution.get_total_objective()}")
+else:
+    print(f"Status: {solution.get_status()}", solution.get_error_message())
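Both reference models print the raw integer from `solution.get_status()`. The comment in SKILL.md lists the codes as 0=SUCCESS, 1=FAIL, 2=TIMEOUT, 3=EMPTY, so a small lookup can make logs easier to read. The `RoutingStatus` enum and `describe_status` helper below are hypothetical names for this sketch; they are not part of the cuOpt API, and the mapping is taken only from that comment.

```python
from enum import IntEnum


class RoutingStatus(IntEnum):
    """Status codes as listed in the SKILL.md comment (assumed complete)."""
    SUCCESS = 0
    FAIL = 1
    TIMEOUT = 2
    EMPTY = 3


def describe_status(code: int) -> str:
    """Map a status integer to a readable name, tolerating unknown codes."""
    try:
        return RoutingStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"


print(describe_status(0))  # SUCCESS
print(describe_status(7))  # UNKNOWN(7)
```

In the scripts above, the helper would be called as `describe_status(solution.get_status())`.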
@@ -0,0 +1,19 @@
+[
+  {
+    "id": "rt-py-eval-001-vrptw-api-call-sequence",
+    "question": "For a VRP with time windows in cuopt (Python), list the API calls I need in order — name each method on routing.DataModel and routing.Solve, and one-line what each does. Don't write a full runnable script.",
+    "expected_skill": "cuopt-routing-api-python",
+    "expected_script": null,
+    "ground_truth": "The agent produces an ordered list of API calls without writing executable code. The list, in order: (1) Construct routing.DataModel(n_locations, n_fleet, n_orders). (2) add_cost_matrix(cost_matrix) — pass as a cudf.DataFrame with float32 dtype. (3) add_transit_time_matrix(transit_time_matrix) — required when time windows are used; omitting it causes Solve to return a non-zero status. (4) set_order_locations(series) — cudf.Series of int32 node indices. (5) set_order_time_windows(earliest, latest) — two int32 cudf.Series. (6) Construct routing.SolverSettings(); call set_time_limit() and optionally set_verbose_mode(). (7) Call routing.Solve(dm, ss) to get a solution object. (8) Check solution.get_status() == 0 before reading the route; on a non-zero status, inspect solution.get_error_message() and solution.get_infeasible_orders().to_list(). (9) On success, retrieve the route via solution.get_route() or display it via solution.display_routes(). The agent mentions explicit dtypes (float32 for the matrices, int32 for index series) as a class-level note. Does not embed full executable code, does not invent method names that aren't in the skill (e.g. no fictitious set_time_windows or add_vehicle), and flags that the user must supply real numeric data.",
+    "expected_behavior": [
+      "Lists the API methods in order without producing a full executable script",
+      "Names routing.DataModel with n_locations / n_fleet / n_orders",
+      "Names add_cost_matrix and add_transit_time_matrix, and flags that transit_time_matrix is required for time windows",
+      "Names set_order_locations and set_order_time_windows",
+      "Names routing.SolverSettings (and set_time_limit) and routing.Solve",
+      "Mentions checking solution.get_status() == 0, and get_error_message / get_infeasible_orders for the failure path",
+      "Mentions explicit dtypes (float32 for matrices, int32 for index series)",
+      "Does not invent method names that are not in the skill"
+    ]
+  }
+]
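The ground truth for this task expects the API calls to be listed in a specific order. As a rough illustration of what "in order" means, the sketch below checks whether each expected name first appears after the previous one in an answer string. It is not how NVSkills-Eval grades answers (the PR does not describe the grader). The function name, the regex-based matching, and the shortened name list are assumptions made for this example.

```python
import re

# Call order taken from the ground_truth field above (steps 1-8, abbreviated).
EXPECTED_ORDER = [
    "routing.DataModel",
    "add_cost_matrix",
    "add_transit_time_matrix",
    "set_order_locations",
    "set_order_time_windows",
    "routing.SolverSettings",
    "routing.Solve",
    "get_status",
]


def follows_expected_order(answer: str, names=EXPECTED_ORDER) -> bool:
    """True if every name appears and first occurrences are strictly increasing."""
    positions = []
    for name in names:
        # (?!\w) stops "routing.Solve" from matching inside "routing.SolverSettings".
        match = re.search(re.escape(name) + r"(?!\w)", answer)
        if match is None:
            return False
        positions.append(match.start())
    return all(a < b for a, b in zip(positions, positions[1:]))


good = (
    "1. routing.DataModel(n_locations, n_fleet, n_orders) 2. add_cost_matrix "
    "3. add_transit_time_matrix 4. set_order_locations 5. set_order_time_windows "
    "6. routing.SolverSettings and set_time_limit 7. routing.Solve(dm, ss) "
    "8. get_status() == 0"
)
print(follows_expected_order(good))  # True
```

A string check like this only covers ordering and presence. The remaining expected behaviors, such as not inventing method names or mentioning explicit dtypes, would need separate checks.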