tests/tasks/uipath-human-in-the-loop/smoke_01_explicit.yaml (38 changes: 26 additions & 12 deletions)
@@ -13,17 +13,32 @@ initial_prompt: |
 I have a UiPath Flow. Add a Human-in-the-Loop node before the final data
 write step so a manager can review and approve the data before it is posted.

-Write a recommendation.json file with:
+Recommend whether HITL is needed and identify which canonical HITL
+pattern from the `uipath-human-in-the-loop` skill applies. The skill's
+`references/hitl-patterns.md` enumerates six canonical patterns; pick
+exactly one and emit its machine name verbatim.
+
+Write a `recommendation.json` file with this exact shape:
 {
 "hitl_needed": <true or false>,
-"pattern": "<which business pattern applies>",
+"pattern": "<one of the canonical pattern names listed below>",
 "proposed_schema": {
 "inputs": ["<field names the human will see>"],
 "outputs": ["<field names the human fills in>"],
 "outcomes": ["<action button names>"]
 }
 }

+Use EXACTLY one of these machine names for `pattern` (lowercase,
+hyphenated, no extra adjectives or prefixes — names mirror the
+section titles in `references/hitl-patterns.md`):
+- approval-gate
+- exception-escalation
+- data-enrichment
+- compliance-checkpoint
+- write-back-validation
+- agentic-output-review
+
 success_criteria:
 - type: file_exists
 description: "Agent wrote a recommendation.json"
@@ -40,20 +55,19 @@ success_criteria:
 pass_threshold: 1.0

 - type: json_check
-description: "Agent named an approval or write-back pattern (case-tolerant)"
+description: >
+Agent picked a canonical HITL pattern that fits an "approve before
+writing" scenario. Either approval-gate (manager approves the
+artifact) or write-back-validation (HITL gates a write to a
+system of record) is documented in `references/hitl-patterns.md`
+as applicable here; both are accepted.
 path: "recommendation.json"
 assertions:
 - expression: "pattern"
-operator: contains
-expected: "pprov"
-- expression: "pattern"
-operator: contains
-expected: "rite"
-- expression: "pattern"
-operator: contains
-expected: "alid"
+operator: regex
+expected: "^(approval-gate|write-back-validation)$"
 weight: 1.5
-pass_threshold: 0.33
+pass_threshold: 1.0

 - type: file_contains
 description: "Agent proposed a schema with outcomes"
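The effect of the smoke_01 change can be checked with a small simulation. The sketch below assumes a hypothetical evaluator in which each `json_check` assertion is tested against the `pattern` string and a criterion passes when the fraction of true assertions reaches `pass_threshold`; the real harness may score differently. Under that assumption the old `0.33` threshold over three `contains` checks meant a single substring hit was enough. `assertion_holds`, `criterion_passes`, `OLD_SMOKE_01` and `NEW_SMOKE_01` are illustrative names, not harness APIs.

```python
import re


def assertion_holds(value: str, operator: str, expected: str) -> bool:
    """Hypothetical evaluator for one json_check assertion on a string field."""
    if operator == "contains":
        return expected in value
    if operator == "equals":
        return value == expected
    if operator == "regex":
        return re.search(expected, value) is not None
    raise ValueError(f"unsupported operator: {operator!r}")


def criterion_passes(value: str, assertions: list, pass_threshold: float) -> bool:
    """Assumed rule: the fraction of assertions that hold must reach pass_threshold."""
    held = sum(assertion_holds(value, op, exp) for op, exp in assertions)
    return held / len(assertions) >= pass_threshold


# smoke_01 before and after this change, transcribed from the diff above.
OLD_SMOKE_01 = ([("contains", "pprov"), ("contains", "rite"), ("contains", "alid")], 0.33)
NEW_SMOKE_01 = ([("regex", r"^(approval-gate|write-back-validation)$")], 1.0)

for answer in ["approval-gate", "Manager Approval", "invalid", "rewrite"]:
    print(f"{answer!r}: old={criterion_passes(answer, *OLD_SMOKE_01)}, "
          f"new={criterion_passes(answer, *NEW_SMOKE_01)}")
```

Under this assumed scoring, `Manager Approval`, `invalid` and `rewrite` each clear the old criterion on a single substring hit (`pprov`, `alid`, `rite`) but fail the new one, while `approval-gate` passes both.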
@@ -14,17 +14,32 @@ initial_prompt: |
 finance — but a manager must approve each expense report before the email
 is sent.

-Write a recommendation.json file with:
+Recommend whether HITL is needed and identify which canonical HITL
+pattern from the `uipath-human-in-the-loop` skill applies. The skill's
+`references/hitl-patterns.md` enumerates six canonical patterns; pick
+exactly one and emit its machine name verbatim.
+
+Write a `recommendation.json` file with this exact shape:
 {
 "hitl_needed": <true or false>,
-"pattern": "<which business pattern applies>",
+"pattern": "<one of the canonical pattern names listed below>",
 "proposed_schema": {
 "inputs": ["<field names>"],
 "outputs": ["<field names>"],
 "outcomes": ["<button names>"]
 }
 }

+Use EXACTLY one of these machine names for `pattern` (lowercase,
+hyphenated, no extra adjectives or prefixes — names mirror the
+section titles in `references/hitl-patterns.md`):
+- approval-gate
+- exception-escalation
+- data-enrichment
+- compliance-checkpoint
+- write-back-validation
+- agentic-output-review
+
 success_criteria:
 - type: file_exists
 description: "Agent wrote a recommendation.json"
@@ -41,11 +56,11 @@ success_criteria:
 pass_threshold: 1.0

 - type: json_check
-description: "Agent identified an approval gate pattern (case-tolerant)"
+description: "Agent identified the canonical approval-gate pattern"
 path: "recommendation.json"
 assertions:
 - expression: "pattern"
-operator: contains
-expected: "pprov"
+operator: equals
+expected: "approval-gate"
 weight: 1.5
 pass_threshold: 1.0
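The prompts now pin `recommendation.json` to a fixed shape. A standalone checker for the `proposed_schema` variant of that contract (used by smoke_01, smoke_05 and the expense-report prompt directly above; smoke_04 asks for `reason` and `proposed_insertion_point` instead) might look like the sketch below. `validate_recommendation` is a hypothetical helper, not part of the test harness, and the example payload's field names are illustrative; it only restates the rules written in the prompt.

```python
import json

# The six machine names the prompts enumerate for `pattern`.
CANONICAL_PATTERNS = {
    "approval-gate",
    "exception-escalation",
    "data-enrichment",
    "compliance-checkpoint",
    "write-back-validation",
    "agentic-output-review",
}


def validate_recommendation(text: str) -> list:
    """Return a list of problems with a recommendation.json payload; empty means valid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    if not isinstance(data.get("hitl_needed"), bool):
        problems.append("hitl_needed must be true or false")
    pattern = data.get("pattern")
    # Check the type first: unhashable values (lists, dicts) cannot be looked up in a set.
    if not isinstance(pattern, str) or pattern not in CANONICAL_PATTERNS:
        problems.append(f"pattern {pattern!r} is not a canonical name")
    schema = data.get("proposed_schema")
    if not isinstance(schema, dict):
        problems.append("proposed_schema must be an object")
    else:
        for key in ("inputs", "outputs", "outcomes"):
            if not isinstance(schema.get(key), list):
                problems.append(f"proposed_schema.{key} must be a list")
    return problems


example = json.dumps({
    "hitl_needed": True,
    "pattern": "approval-gate",
    "proposed_schema": {
        "inputs": ["employee", "amount"],
        "outputs": ["comment"],
        "outcomes": ["Approve", "Reject"],
    },
})
print(validate_recommendation(example))  # []
print(validate_recommendation(example.replace("approval-gate", "Approval Gate")))
# ["pattern 'Approval Gate' is not a canonical name"]
```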
tests/tasks/uipath-human-in-the-loop/smoke_04_writeback.yaml (40 changes: 27 additions & 13 deletions)
@@ -14,15 +14,29 @@ initial_prompt: |
 enriches the missing vendor and cost-center fields using company data, and
 writes the corrected records back to SAP.

-Analyze whether this flow needs any human checkpoints. Write a
-recommendation.json file with:
+Analyze whether this flow needs any human checkpoints. If yes, identify
+which canonical HITL pattern from the `uipath-human-in-the-loop` skill
+applies. The skill's `references/hitl-patterns.md` enumerates six
+canonical patterns; pick exactly one and emit its machine name verbatim.
+
+Write a `recommendation.json` file with this exact shape:
 {
 "hitl_needed": <true or false>,
-"pattern": "<pattern name if applicable>",
+"pattern": "<one of the canonical pattern names listed below>",
 "reason": "<why HITL is or is not needed>",
 "proposed_insertion_point": "<where in the flow>"
 }

+Use EXACTLY one of these machine names for `pattern` (lowercase,
+hyphenated, no extra adjectives or prefixes — names mirror the
+section titles in `references/hitl-patterns.md`):
+- approval-gate
+- exception-escalation
+- data-enrichment
+- compliance-checkpoint
+- write-back-validation
+- agentic-output-review
+
 success_criteria:
 - type: file_exists
 description: "Agent wrote a recommendation.json"
@@ -39,17 +53,17 @@ success_criteria:
 pass_threshold: 1.0

 - type: json_check
-description: "Agent named a write-back / validation / enrichment pattern (any one, case-tolerant)"
+description: >
+Agent picked a canonical HITL pattern that fits an "AI enriches data,
+writes it back to a system of record" scenario. Write-back-validation
+(HITL before write to system of record), data-enrichment (HITL fills
+/ validates incomplete data), and agentic-output-review (human
+verifies AI output before downstream use) are all documented in
+`references/hitl-patterns.md` as applicable here.
 path: "recommendation.json"
 assertions:
 - expression: "pattern"
-operator: contains
-expected: "rite"
-- expression: "pattern"
-operator: contains
-expected: "alid"
-- expression: "pattern"
-operator: contains
-expected: "nrich"
+operator: regex
+expected: "^(write-back-validation|data-enrichment|agentic-output-review)$"
 weight: 1.5
-pass_threshold: 0.33
+pass_threshold: 1.0
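The new smoke_04 check depends on its `^` and `$` anchors. Without them, a regex alternation matches any answer that merely contains a canonical name, which would reintroduce the looseness of the old substring checks. The old substrings also appear to have dropped each name's first letter (`rite`, `alid`, `nrich`) so that capitalised answers still matched; the new regex is case-sensitive, in line with the prompt's "lowercase" requirement. A quick illustration, where `matches` is a hypothetical helper and Python's `re` stands in for whatever regex engine the harness actually uses:

```python
import re

# The smoke_04 regex from the diff, and the same alternation without anchors.
ANCHORED = r"^(write-back-validation|data-enrichment|agentic-output-review)$"
UNANCHORED = r"write-back-validation|data-enrichment|agentic-output-review"


def matches(regex: str, value: str) -> bool:
    """True if `regex` matches anywhere in `value` (re.search semantics)."""
    return re.search(regex, value) is not None


for value in ["data-enrichment", "ai-data-enrichment-review", "Data-Enrichment"]:
    print(f"{value!r}: anchored={matches(ANCHORED, value)}, "
          f"unanchored={matches(UNANCHORED, value)}")
```

Only the exact lowercase name passes the anchored form; the decorated name passes only the unanchored one, and the capitalised name fails both. One Python-specific caveat: `$` also matches just before a single trailing newline, so `"data-enrichment\n"` passes the anchored search, whereas `re.fullmatch` with the unanchored alternation rejects it. Whether the harness strips whitespace first is not visible in this diff.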
tests/tasks/uipath-human-in-the-loop/smoke_05_compliance.yaml (49 changes: 31 additions & 18 deletions)
@@ -9,21 +9,38 @@ sandbox:
 python: {}

 initial_prompt: |
-Automate GDPR data deletion requests. Each request requires documented
-sign-off from our data privacy officer before the deletion actually runs —
-we need an audit trail for every decision.
+Automate GDPR data deletion requests. These are subject to regulatory
+compliance requirements: each request needs documented regulatory
+sign-off from our data privacy officer before the deletion runs, plus
+an audit trail for every decision (required for compliance with GDPR
+Article 17 — Right to Erasure).

-Write a recommendation.json file with:
+Recommend whether HITL is needed and identify which canonical HITL
+pattern from the `uipath-human-in-the-loop` skill applies. The skill's
+`references/hitl-patterns.md` enumerates six canonical patterns; pick
+exactly one and emit its machine name verbatim.
+
+Write a `recommendation.json` file with this exact shape:
 {
 "hitl_needed": <true or false>,
-"pattern": "<which business pattern applies>",
+"pattern": "<one of the canonical pattern names listed below>",
 "proposed_schema": {
 "inputs": ["<what the privacy officer will see>"],
 "outputs": ["<what they fill in>"],
 "outcomes": ["<their decision options>"]
 }
 }

+Use EXACTLY one of these machine names for `pattern` (lowercase,
+hyphenated, no extra adjectives or prefixes — names mirror the
+section titles in `references/hitl-patterns.md`):
+- approval-gate
+- exception-escalation
+- data-enrichment
+- compliance-checkpoint
+- write-back-validation
+- agentic-output-review
+
 success_criteria:
 - type: file_exists
 description: "Agent wrote a recommendation.json"
@@ -40,20 +57,16 @@ success_criteria:
 pass_threshold: 1.0

 - type: json_check
-description: "Agent identified a compliance / audit / sign-off / approval pattern (any one, case-tolerant)"
+description: >
+Agent picked a canonical HITL pattern that fits a regulatory-sign-off
+scenario. Compliance-checkpoint is the textbook fit (the doc lists
+"regulatory sign-off" and "GDPR consent flows" as its examples), but
+approval-gate is also defensible since "sign off" is a shared signal
+phrase. Both are accepted.
 path: "recommendation.json"
 assertions:
 - expression: "pattern"
-operator: contains
-expected: "udit"
-- expression: "pattern"
-operator: contains
-expected: "omplianc"
-- expression: "pattern"
-operator: contains
-expected: "ign"
-- expression: "pattern"
-operator: contains
-expected: "pprov"
+operator: regex
+expected: "^(compliance-checkpoint|approval-gate)$"
 weight: 1.5
-pass_threshold: 0.25
+pass_threshold: 1.0
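Across the three files that used several substring checks, it is worth confirming which of the six canonical names the old and new criteria accept. The sketch assumes a `contains` criterion passes when the fraction of matching substrings reaches `pass_threshold` (so one hit suffices at `0.33` over three substrings or `0.25` over four); `CHECKS`, `old_accepts` and `coverage_change` are hypothetical names. Under that assumption the canonical-name coverage of smoke_01 and smoke_05 is unchanged, and the tightening only rejects off-list answers, while smoke_04 additionally starts accepting `agentic-output-review`, which none of `rite`, `alid` or `nrich` matched.

```python
import re

CANONICAL = [
    "approval-gate", "exception-escalation", "data-enrichment",
    "compliance-checkpoint", "write-back-validation", "agentic-output-review",
]

# (old `contains` substrings, old pass_threshold, new anchored regex), transcribed from the diff.
CHECKS = {
    "smoke_01_explicit": (["pprov", "rite", "alid"], 0.33,
                          r"^(approval-gate|write-back-validation)$"),
    "smoke_04_writeback": (["rite", "alid", "nrich"], 0.33,
                           r"^(write-back-validation|data-enrichment|agentic-output-review)$"),
    "smoke_05_compliance": (["udit", "omplianc", "ign", "pprov"], 0.25,
                            r"^(compliance-checkpoint|approval-gate)$"),
}


def old_accepts(value: str, substrings: list, threshold: float) -> bool:
    # Assumed scoring: fraction of `contains` assertions that hold must reach the threshold.
    hits = sum(s in value for s in substrings)
    return hits / len(substrings) >= threshold


def coverage_change(substrings: list, threshold: float, regex: str):
    """Return (canonical names newly accepted, canonical names no longer accepted)."""
    old = {p for p in CANONICAL if old_accepts(p, substrings, threshold)}
    new = {p for p in CANONICAL if re.search(regex, p)}
    return sorted(new - old), sorted(old - new)


for name, check in CHECKS.items():
    gained, lost = coverage_change(*check)
    print(f"{name}: newly accepted={gained}, no longer accepted={lost}")
```

The same assumed scoring shows what the old smoke_05 criterion let through off-list: an answer such as `design-review` contains `ign` and would have passed, which the anchored regex now rejects.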