4 changes: 2 additions & 2 deletions plugins.d/nvidia-skills.yml
@@ -25,8 +25,8 @@ default_prompts:
# Reproduced under plugins/<name>/skills/<skill-basename>/ in the form
# selected by `skill_files:` (defined in plugins.d/_defaults.yml).
include_skills:
-  - skills/cuopt/cuopt-routing-api-python/
-  - skills/cuopt/cuopt-user-rules/
+  - skills/cuopt-routing-api-python/
+  - skills/cuopt-user-rules/

# Inherits from plugins.d/_defaults.yml:
# version, author, homepage, repository, license,
@@ -0,0 +1,99 @@
# Evaluation Report

Evaluation of the `cuopt-routing-api-python` skill before publication through NVSkills-Eval.

This benchmark summarizes the results of the NVSkills-Eval 3-tier evaluation of the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

## Evaluation Summary

- Skill: `cuopt-routing-api-python`
- Evaluation date: 2026-05-29
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 1 evaluation task
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: FAIL

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contained 1 evaluation task:

- Positive tasks: 1 task where the skill was expected to activate.
- Negative tasks: 0 tasks where no skill was expected.
- Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.

Task composition is derived from the evaluation dataset when possible. Entries with `expected_skill` set are treated as positive skill-activation cases, while entries with `expected_skill: null` are treated as negative activation cases.
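
The labeling rule above can be sketched in a few lines. This is only an illustration, not NVSkills-Eval's own code: the function names and the sample entries are hypothetical, and treating a missing `expected_skill` key as "unlabeled" is an assumption, since the report does not say how that case is detected.

```python
import json
from collections import Counter


def classify_task(entry: dict) -> str:
    """Label one dataset entry as positive, negative, or unlabeled."""
    if "expected_skill" not in entry:
        # Assumption: a missing key means intent cannot be inferred.
        return "unlabeled"
    # JSON null (Python None) marks a negative activation case.
    return "negative" if entry["expected_skill"] is None else "positive"


def summarize(dataset_json: str) -> Counter:
    """Count task labels across a JSON array of dataset entries."""
    return Counter(classify_task(entry) for entry in json.loads(dataset_json))


# Hypothetical entries, one per label.
sample = json.dumps([
    {"id": "task-a", "expected_skill": "cuopt-routing-api-python"},
    {"id": "task-b", "expected_skill": None},
    {"id": "task-c"},
])
counts = summarize(sample)
print(dict(counts))
```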

## Results

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 2 | 100% (+0%) | 100% (+0%) |
| Correctness | 2 | 100% (+0%) | 95% (+3%) |
| Discoverability | 2 | 100% (+0%) | 70% (-5%) |
| Effectiveness | 2 | 83% (+14%) | 83% (+12%) |
| Efficiency | 2 | 93% (-0%) | 56% (-5%) |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
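
To read a cell such as `83% (+14%)`, the no-skill baseline can be recovered by subtracting the uplift from the score. The sketch below assumes the uplift is expressed in percentage points rather than as a relative change; the report does not state which, and `baseline_from_cell` is a hypothetical helper name.

```python
import re

# Matches cells like "83% (+14%)" or "70% (-5%)".
CELL = re.compile(r"(\d+)%\s*\(([+-]\d+)%\)")


def baseline_from_cell(cell: str) -> int:
    """Return the implied no-skill score, assuming uplift is in percentage points."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized results cell: {cell!r}")
    score, uplift = int(match.group(1)), int(match.group(2))
    return score - uplift


for cell in ["83% (+14%)", "70% (-5%)", "93% (-0%)"]:
    print(cell, "-> baseline", baseline_from_cell(cell))
```

Under that assumption, a negative uplift means the agent scored higher without the skill than with it.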

## Tier 1: Static Validation Summary

Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 8 total findings.

Top findings:

- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/cuopt-routing-api-python/SKILL.md`)
- MEDIUM SECURITY/Unknown (SQP-2): Binding the cuOpt server to 0.0.0.0 exposes it on all network interfaces, making it accessible to any host that can reac (`references/server_examples.md:7`)
- LOW QUALITY/quality_discoverability: No '## Purpose' section (`skills/cuopt-routing-api-python/SKILL.md`)
- LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`skills/cuopt-routing-api-python/SKILL.md`)
- LOW QUALITY/quality_reliability: No limitations documented (`skills/cuopt-routing-api-python/SKILL.md`)

## Tier 2: Deduplication Summary

Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 4 total findings.

Top findings:

- HIGH DUPLICATE/duplicate: Duplicate content found within references/server_examples.md:
"# Poll for solution" in references/server_examples.md (lines 45-51)
vs "# Poll for solution" in references/server_examples.md (lines 156-162) (`references/server_examples.md:45`)
- HIGH DUPLICATE/duplicate: Duplicate content found across SKILL.md and references/examples.md:
"# Capacities" in SKILL.md (lines 30-35)
vs "# Add capacity dimension (name, demand_per_order, capacity_per_vehicle)" in references/examples.md (lines 73-75)
vs "# Add capacity dimension" in references/examples.md (lines 156-158) (`SKILL.md:30`)
- HIGH DUPLICATE/duplicate: Duplicate content found across references/examples.md and references/server_examples.md:
"## Additional References (tested in CI)" in references/examples.md (lines 237-249)
vs "## Additional References (tested in CI)" in references/server_examples.md (lines 193-204) (`references/examples.md:237`)
- HIGH DUPLICATE/duplicate: Duplicate content found across assets/pdp_basic/README.md and assets/pdp_basic/model.py:
"# Pickup-Delivery (PDP)" in assets/pdp_basic/README.md (lines 1-7)
vs "(module docstring)" in assets/pdp_basic/model.py (lines 1-2) (`assets/pdp_basic/README.md:1`)
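
The report does not describe how NVSkills-Eval detects duplicates. As a rough illustration of the idea only, the sketch below compares whitespace-normalized text blocks with the standard library's `difflib.SequenceMatcher`; the threshold, function names, and sample blocks are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after collapsing whitespace."""
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_duplicates(blocks: dict[str, str], threshold: float = 0.9):
    """Return (name, name, score) for every pair at or above the threshold."""
    names = sorted(blocks)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = similarity(blocks[first], blocks[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical blocks: a and b differ only in indentation.
blocks = {
    "block_a": "# Poll for solution\nwhile not done:\n    poll()",
    "block_b": "# Poll for solution\nwhile not done:\n        poll()",
    "block_c": "# Capacities\ndm.add_capacity_dimension(name, demand, capacity)",
}
duplicates = find_duplicates(blocks)
print(duplicates)
```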

## Publication Recommendation

The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
113 changes: 113 additions & 0 deletions plugins/nvidia-skills/skills/cuopt-routing-api-python/SKILL.md
@@ -0,0 +1,113 @@
---
name: cuopt-routing-api-python
version: "26.08.00"
description: Vehicle routing (VRP, TSP, PDP) with cuOpt — Python API only. Use when the user is building or solving routing in Python.
license: Apache-2.0
metadata:
  author: NVIDIA cuOpt Team
  tags:
    - cuopt
    - routing
    - vrp
    - tsp
    - python
---



# cuOpt Routing — Python API

Confirm problem type (TSP, VRP, PDP) and data (locations, orders, fleet, constraints) before coding.

This skill is **Python only**. Routing has no C API in cuOpt.

## Minimal VRP Example

```python
import cudf
from cuopt import routing

cost_matrix = cudf.DataFrame([...], dtype="float32")
dm = routing.DataModel(n_locations=4, n_fleet=2, n_orders=3)
dm.add_cost_matrix(cost_matrix)
dm.set_order_locations(cudf.Series([1, 2, 3], dtype="int32"))
solution = routing.Solve(dm, routing.SolverSettings())

if solution.get_status() == 0:
    solution.display_routes()
```

## Adding Constraints

```python
import numpy as np  # used by the precedence example at the end of this block

# Time windows
dm.add_transit_time_matrix(transit_time_matrix)
dm.set_order_time_windows(earliest_series, latest_series)

# Capacities
dm.add_capacity_dimension("weight", demand_series, capacity_series)
dm.set_order_service_times(service_times)
dm.set_vehicle_locations(start_locations, end_locations)
dm.set_vehicle_time_windows(earliest_start, latest_return)

# Pickup-delivery pairs
dm.set_pickup_delivery_pairs(pickup_indices, delivery_indices)

# Precedence
dm.add_order_precedence(node_id=2, preceding_nodes=np.array([0, 1]))
```

## Solution Checking

```python
status = solution.get_status() # 0=SUCCESS, 1=FAIL, 2=TIMEOUT, 3=EMPTY
if status == 0:
    route_df = solution.get_route()
    total_cost = solution.get_total_objective()
else:
    print(solution.get_error_message())
    print(solution.get_infeasible_orders().to_list())
```
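
For readable logs, the integer codes from the comment above can be wrapped in a small enum. `RoutingStatus` and `describe_status` are hypothetical helpers written for this sketch, not part of cuOpt; only the code-to-name mapping is taken from the snippet above.

```python
from enum import IntEnum


class RoutingStatus(IntEnum):
    """Status codes as listed in the comment above (not a cuOpt class)."""
    SUCCESS = 0
    FAIL = 1
    TIMEOUT = 2
    EMPTY = 3


def describe_status(code: int) -> str:
    """Map a status code to its name, tolerating codes not listed above."""
    try:
        return RoutingStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"


for code in (0, 2, 7):
    print(code, describe_status(code))
```

In a real script, `describe_status(solution.get_status())` would sit in the failure branch next to `get_error_message()`.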

## Data Types (use explicit dtypes)

```python
cost_matrix = cost_matrix.astype("float32")
order_locations = cudf.Series([...], dtype="int32")
demand = cudf.Series([...], dtype="int32")
```

## Solver Settings

```python
ss = routing.SolverSettings()
ss.set_time_limit(30)
ss.set_verbose_mode(True)
ss.set_error_logging_mode(True)
```

## Common Issues

| Problem | Fix |
|---------|-----|
| Empty solution | Widen time windows or check travel times |
| Infeasible orders | Increase fleet or capacity |
| Status != 0 with time windows | Add `add_transit_time_matrix()` |
| Wrong cost | Check cost_matrix is symmetric |
| `compute_waypoint_sequence` alters route_df | It replaces the `location` column with waypoint ids in place — pass `route_df.copy()` if you still need cost-matrix indices (e.g. when iterating per truck) |
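
The symmetry check from the table can be done on plain nested lists before the data is handed to cuDF. This is a minimal sketch with a hypothetical helper name. Note that an asymmetric matrix is not always a mistake (one-way streets produce one), so the result is a list of entries to review rather than a hard failure.

```python
def asymmetric_pairs(matrix, tol=1e-6):
    """Return (i, j) index pairs where matrix[i][j] and matrix[j][i] differ."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("cost matrix must be square")
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(matrix[i][j] - matrix[j][i]) > tol
    ]


symmetric = [
    [0, 10, 15, 20],
    [10, 0, 12, 18],
    [15, 12, 0, 10],
    [20, 18, 10, 0],
]
# Same matrix with one entry changed so that [0][1] != [1][0].
skewed = [row[:] for row in symmetric]
skewed[0][1] = 11

print(asymmetric_pairs(symmetric))
print(asymmetric_pairs(skewed))
```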

## Debugging

**When status != 0:** `print(solution.get_error_message())` and `print(solution.get_infeasible_orders().to_list())` to see which orders are infeasible.
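
Some infeasible orders can be spotted before solving. The sketch below is a hypothetical pre-check, not a cuOpt feature: it assumes every vehicle leaves location 0 at time 0 and flags orders whose window is inverted or closes before a direct trip from the depot could arrive. Passing this check does not guarantee feasibility.

```python
def unreachable_orders(transit_time, order_locations, earliest, latest, depot=0):
    """Flag orders that cannot be served even by a direct trip from the depot."""
    flagged = []
    for order, location in enumerate(order_locations):
        if earliest[order] > latest[order]:
            flagged.append((order, "window closes before it opens"))
        elif transit_time[depot][location] > latest[order]:
            flagged.append((order, "cannot be reached from the depot in time"))
    return flagged


# Illustrative data only.
transit_time = [
    [0, 10, 15, 20],
    [10, 0, 12, 18],
    [15, 12, 0, 10],
    [20, 18, 10, 0],
]
flagged = unreachable_orders(
    transit_time,
    order_locations=[1, 2, 3],
    earliest=[0, 0, 40],
    latest=[5, 20, 30],
)
for order, reason in flagged:
    print(f"order {order}: {reason}")
```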

**Data types:** Use explicit dtypes (float32, int32) for matrices and series to avoid silent errors.
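
To see why narrow dtypes deserve attention, the standard library's `struct` module can show what a 32-bit representation does to a value. This illustrates float32 and int32 limits in general; it makes no claim about how cuOpt or cuDF convert data, and both helper names are hypothetical.

```python
import struct


def fits_int32(value: int) -> bool:
    """True if the value can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
    except struct.error:
        return False
    return True


def float32_roundtrip(value: float) -> float:
    """Return the value after being squeezed through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(fits_int32(2**31 - 1), fits_int32(2**31))
print(float32_roundtrip(10.0))          # small integers survive exactly
print(float32_roundtrip(16777217.0))    # 2**24 + 1 is not representable
```

Large costs (above 2**24) or large ids (above 2**31 - 1) are the cases worth checking before casting.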

## Examples

- [examples.md](references/examples.md) — VRP, PDP, multi-depot
- [server_examples.md](references/server_examples.md) — REST client (curl, Python)
- **Reference models:** This skill's `assets/` — [vrp_basic](assets/vrp_basic/), [pdp_basic](assets/pdp_basic/). See [assets/README.md](assets/README.md).

## Escalate

For contribution or build-from-source, see the developer skill.
@@ -0,0 +1,10 @@
# Assets — reference routing models

Routing reference implementations (Python). Use as reference when building new applications; do not edit in place.

| Model | Type | Description |
|-------|------|-------------|
| [vrp_basic](vrp_basic/) | VRP | Minimal VRP: 4 locations, 1 vehicle, 3 orders |
| [pdp_basic](pdp_basic/) | PDP | Pickup-delivery pairs, capacity dimension |

**Run:** From each subdir, `python model.py` (requires cuOpt and cudf). See [references/examples.md](../references/examples.md) for more patterns (time windows, multi-depot).
@@ -0,0 +1,7 @@
# Pickup-Delivery (PDP)

2 pickup-delivery pairs (4 orders), 2 vehicles. Pickup must occur before delivery; capacity dimension.

**Run:** `python model.py`

**See also:** [references/examples.md](../../references/examples.md) for more PDP and VRP patterns.
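
The demand values in `model.py` follow a sign convention: positive at the pickup, negative at the matching delivery. The sketch below checks that convention on plain lists. The helper is hypothetical, and requiring each pair to net to zero is an assumption of this sketch rather than a documented cuOpt rule.

```python
def check_pdp_demands(demand, pickups, deliveries):
    """Return a list of problems found in pickup-delivery demand data."""
    problems = []
    if len(pickups) != len(deliveries):
        problems.append("pickup and delivery lists differ in length")
    for pickup, delivery in zip(pickups, deliveries):
        if demand[pickup] <= 0:
            problems.append(f"order {pickup}: pickup demand should be positive")
        if demand[pickup] + demand[delivery] != 0:
            problems.append(
                f"orders {pickup}/{delivery}: demands do not cancel out"
            )
    return problems


# Values from assets/pdp_basic/model.py.
ok = check_pdp_demands([10, -10, 15, -15], pickups=[0, 2], deliveries=[1, 3])
# Hypothetical broken input: the first delivery drops off less than was picked up.
bad = check_pdp_demands([10, -5, 15, -15], pickups=[0, 2], deliveries=[1, 3])
print(ok)
print(bad)
```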
@@ -0,0 +1,56 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""
PDP: 2 pickup-delivery pairs, 2 vehicles. Pickup before delivery; capacity dimension.
"""

import cudf
from cuopt import routing

cost_matrix = cudf.DataFrame(
    [
        [0, 10, 20, 30, 40],
        [10, 0, 15, 25, 35],
        [20, 15, 0, 10, 20],
        [30, 25, 10, 0, 15],
        [40, 35, 20, 15, 0],
    ],
    dtype="float32",
)

transit_time_matrix = cost_matrix.copy(deep=True)
n_fleet = 2
n_orders = 4

order_locations = cudf.Series([1, 2, 3, 4], dtype="int32")
pickup_indices = cudf.Series([0, 2], dtype="int32")
delivery_indices = cudf.Series([1, 3], dtype="int32")
demand = cudf.Series([10, -10, 15, -15], dtype="int32")
vehicle_capacity = cudf.Series([50, 50], dtype="int32")

dm = routing.DataModel(
    n_locations=cost_matrix.shape[0],
    n_fleet=n_fleet,
    n_orders=n_orders,
)
dm.add_cost_matrix(cost_matrix)
dm.add_transit_time_matrix(transit_time_matrix)
dm.set_order_locations(order_locations)
dm.add_capacity_dimension("load", demand, vehicle_capacity)
dm.set_pickup_delivery_pairs(pickup_indices, delivery_indices)
dm.set_vehicle_locations(
    cudf.Series([0, 0], dtype="int32"),
    cudf.Series([0, 0], dtype="int32"),
)

ss = routing.SolverSettings()
ss.set_time_limit(10)
solution = routing.Solve(dm, ss)

print(f"Status: {solution.get_status()}")
if solution.get_status() == 0:
    solution.display_routes()
    print(f"Total cost: {solution.get_total_objective()}")
else:
    print(solution.get_error_message())
@@ -0,0 +1,7 @@
# Minimal VRP

4 locations (depot 0 + 3 customers), 1 vehicle, 3 orders. Cost matrix only; no time windows or capacity.

**Run:** `python model.py`

**See also:** [references/examples.md](../../references/examples.md) for VRP with time windows, capacity, and multi-depot.
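
With only three customers, the best route can be found by brute force and used as a sanity check on the solver's result. The sketch below uses the cost matrix from `model.py` and assumes the single vehicle starts and ends at location 0, which `model.py` does not set explicitly.

```python
from itertools import permutations

# Cost matrix copied from assets/vrp_basic/model.py.
COST = [
    [0, 10, 15, 20],
    [10, 0, 12, 18],
    [15, 12, 0, 10],
    [20, 18, 10, 0],
]


def route_cost(route):
    """Sum the cost of each consecutive leg along the route."""
    return sum(COST[a][b] for a, b in zip(route, route[1:]))


def best_route(depot=0):
    """Try every customer ordering; assumes a depot-to-depot round trip."""
    customers = [i for i in range(len(COST)) if i != depot]
    candidates = ((depot, *perm, depot) for perm in permutations(customers))
    return min(candidates, key=route_cost)


best = best_route()
print(best, route_cost(best))
```

Under that round-trip assumption the cheapest tour costs 52, so a noticeably different `get_total_objective()` value would suggest different start or end locations or a different objective.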
@@ -0,0 +1,31 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""
Minimal VRP: 4 locations, 1 vehicle, 3 orders. Cost matrix only.
"""

import cudf
from cuopt import routing

cost_matrix = cudf.DataFrame(
    [
        [0, 10, 15, 20],
        [10, 0, 12, 18],
        [15, 12, 0, 10],
        [20, 18, 10, 0],
    ],
    dtype="float32",
)

dm = routing.DataModel(n_locations=4, n_fleet=1, n_orders=3)
dm.add_cost_matrix(cost_matrix)
dm.set_order_locations(cudf.Series([1, 2, 3], dtype="int32"))

solution = routing.Solve(dm, routing.SolverSettings())

if solution.get_status() == 0:
    solution.display_routes()
    print(f"Total cost: {solution.get_total_objective()}")
else:
    print(f"Status: {solution.get_status()}", solution.get_error_message())
@@ -0,0 +1,19 @@
[
{
"id": "rt-py-eval-001-vrptw-api-call-sequence",
"question": "For a VRP with time windows in cuopt (Python), list the API calls I need in order — name each method on routing.DataModel and routing.Solve, and one-line what each does. Don't write a full runnable script.",
"expected_skill": "cuopt-routing-api-python",
"expected_script": null,
"ground_truth": "The agent produces an ordered list of API calls without writing executable code. The list, in order: (1) Construct routing.DataModel(n_locations, n_fleet, n_orders). (2) add_cost_matrix(cost_matrix) — pass as a cudf.DataFrame with float32 dtype. (3) add_transit_time_matrix(transit_time_matrix) — required when time windows are used; omitting it causes Solve to return a non-zero status. (4) set_order_locations(series) — cudf.Series of int32 node indices. (5) set_order_time_windows(earliest, latest) — two int32 cudf.Series. (6) Construct routing.SolverSettings(); call set_time_limit() and optionally set_verbose_mode(). (7) Call routing.Solve(dm, ss) to get a solution object. (8) Check solution.get_status() == 0 before reading the route; on a non-zero status, inspect solution.get_error_message() and solution.get_infeasible_orders().to_list(). (9) On success, retrieve the route via solution.get_route() or display it via solution.display_routes(). The agent mentions explicit dtypes (float32 for the matrices, int32 for index series) as a class-level note. Does not embed full executable code, does not invent method names that aren't in the skill (e.g. no fictitious set_time_windows or add_vehicle), and flags that the user must supply real numeric data.",
"expected_behavior": [
"Lists the API methods in order without producing a full executable script",
"Names routing.DataModel with n_locations / n_fleet / n_orders",
"Names add_cost_matrix and add_transit_time_matrix, and flags that transit_time_matrix is required for time windows",
"Names set_order_locations and set_order_time_windows",
"Names routing.SolverSettings (and set_time_limit) and routing.Solve",
"Mentions checking solution.get_status() == 0, and get_error_message / get_infeasible_orders for the failure path",
"Mentions explicit dtypes (float32 for matrices, int32 for index series)",
"Does not invent method names that are not in the skill"
]
}
]
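
The last expected behavior, not inventing method names, lends itself to a mechanical check. The sketch below is a hypothetical illustration and not how NVSkills-Eval grades answers: it pulls `add_`, `set_`, and `get_` identifiers out of an answer and compares them with the methods named in `ground_truth`.

```python
import re

# Methods named in the ground_truth text of the task above.
ALLOWED = {
    "add_cost_matrix",
    "add_transit_time_matrix",
    "set_order_locations",
    "set_order_time_windows",
    "set_time_limit",
    "set_verbose_mode",
    "get_status",
    "get_error_message",
    "get_infeasible_orders",
    "get_route",
}

METHOD = re.compile(r"\b(?:add|set|get)_[a-z_]+")


def invented_methods(answer: str) -> set[str]:
    """Return add_/set_/get_ identifiers that are not in the allowed set."""
    return set(METHOD.findall(answer)) - ALLOWED


good = "Call add_cost_matrix, then set_order_time_windows, then get_status()."
bad = "Call set_time_windows() and add_vehicle(), then get_status()."
print(invented_methods(good))
print(sorted(invented_methods(bad)))
```

A real grader would need a broader allow-list, since the skill documents more methods than this one task mentions.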