test(hitl): migrate 4 HITL smoke tasks to canonical pattern vocabulary by tmatup · Pull Request #556 · UiPath/skills

tmatup · 2026-05-04T22:53:45Z

Summary

Migrates 10 brittle json_check.contains substring assertions across four HITL smoke tasks to strict canonical-name checks aligned with references/hitl-patterns.md. Companion to #555 (smoke_03_escalation).

| Task | Before | After |
|---|---|---|
| `smoke_01_explicit` | `pattern contains pprov \| rite \| alid` (3 fragments, OR @ 0.33) | `pattern regex ^(approval-gate\|write-back-validation)$` |
| `smoke_02_approval_gate` | `pattern contains pprov` | `pattern equals approval-gate` |
| `smoke_04_writeback` | `pattern contains rite \| alid \| nrich` (3 fragments, OR @ 0.33) | `pattern regex ^(write-back-validation\|data-enrichment\|agentic-output-review)$` |
| `smoke_05_compliance` | `pattern contains udit \| omplianc \| ign \| pprov` (4 fragments, OR @ 0.25) | `pattern regex ^(compliance-checkpoint\|approval-gate)$` (+ prompt strengthened with explicit "regulatory compliance" framing and GDPR Article 17 reference) |

Each task's prompt now lists the same 6 canonical pattern machine names mirroring the section titles in references/hitl-patterns.md:

approval-gate, exception-escalation, data-enrichment, compliance-checkpoint, write-back-validation, agentic-output-review

The agent is required to emit exactly one of those names verbatim. Where the scenario maps to multiple defensible canonical answers per the skill doc, the criterion uses regex and accepts each documented option (mirroring the approach already used in #555 for exception-escalation vs agentic-output-review).
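A minimal sketch of how the two criterion styles behave, written in Python for illustration. The helper names `check_equals` and `check_regex` are hypothetical and are not the coder_eval implementation; the sketch only assumes that the `regex` operator applies the pattern to the emitted `pattern` value, with the `^...$` anchors forcing a whole-value match.

```python
import re

# The six canonical machine names listed in each task prompt.
CANONICAL = {
    "approval-gate",
    "exception-escalation",
    "data-enrichment",
    "compliance-checkpoint",
    "write-back-validation",
    "agentic-output-review",
}


def check_equals(actual: str, expected: str) -> bool:
    """Strict comparison, for scenarios with a single defensible answer."""
    return actual == expected


def check_regex(actual: str, pattern: str) -> bool:
    """Anchored alternation, for scenarios with several defensible answers."""
    return re.search(pattern, actual) is not None


# Criterion used by smoke_01_explicit after this PR.
SMOKE_01 = r"^(approval-gate|write-back-validation)$"

print(check_regex("approval-gate", SMOKE_01))                      # accepted
print(check_regex("write-back-validation", SMOKE_01))              # accepted
print(check_regex("an approval gate before the write", SMOKE_01))  # rejected: free-form prose
print(check_regex("data-enrichment", SMOKE_01))                    # rejected: canonical, but not documented for this scenario
print(check_equals("approval-gate", "approval-gate"))              # smoke_02 style
```

Because the alternation is anchored on both sides, a canonical name embedded in a longer sentence does not pass; only the bare machine name does.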

Why

The original assertions used short fragments, 3 to 8 characters long, of words associated with the patterns (e.g. expected: "scalat" for "escalation", expected: "udit" for "audit", expected: "ign" for "sign-off"). When the agent's free-form English answer didn't happen to contain that exact fragment, the smoke task failed, even when the answer was semantically correct. Local triage of run 2026-05-04_04-05-27 traced one such failure to expected: "scalat".
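The fragility cuts both ways, as the following Python sketch illustrates. The `contains` helper is a hypothetical stand-in for a plain substring check, and both answer strings are invented for illustration; neither is taken from the run logs.

```python
def contains(answer: str, fragment: str) -> bool:
    """Plain substring check, standing in for the old assertion style."""
    return fragment in answer


# False negative: a reasonable paraphrase that never spells out the fragment.
paraphrase = "Route the low-confidence case to a human reviewer"
print(contains(paraphrase, "scalat"))  # False

# False positive: a 3-character fragment also matches unrelated words.
unrelated = "Assign the ticket to the design team"
print(contains(unrelated, "ign"))  # True
```

A semantically correct answer can fail the first check, while an answer that has nothing to do with sign-off passes the second.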

Replacing the substring trap with a controlled vocabulary:

Decouples the test from agent paraphrase variance.
Forces the agent to consult references/hitl-patterns.md (the prompt names the file) and pick a documented pattern.
Surfaces test/skill drift loudly: if the skill ever renames a pattern, the test fails immediately rather than slowly drifting into uselessness.
Matches what #555 did for smoke_03_escalation.

Test plan

Local: coder-eval run --repeats 3 --max-parallel 6 against this branch — 12/12 SUCCESS at score 1.0 across the four migrated tasks.
No remaining json_check.contains violators with < 8-char literals in tests/tasks/ after this PR and #555 land.
CI smoke-skills run (will fire on PR open).

Merge ordering

This PR plus #555 are the prerequisites for the matching coder_eval validator (UiPath/coder_eval#212), which hard-fails any task YAML using json_check.contains with a < 8-char literal at task-load time. Land both #555 and this PR before merging UiPath/coder_eval#212.
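To make the ordering constraint concrete, here is a rough Python sketch of what a load-time check of this kind could look like. The dict layout (`criteria`, `type`, `operator`, `expected`), the `TaskLoadError` class, and `validate_task` are all assumptions made for illustration; the actual task schema and the validator in UiPath/coder_eval#212 are not shown in this PR.

```python
MIN_LITERAL_LEN = 8  # threshold quoted in the PR description


class TaskLoadError(ValueError):
    """Raised when a task definition uses a too-short substring literal."""


def validate_task(task: dict) -> None:
    """Reject json_check criteria whose `contains` literal is under 8 chars.

    The criteria layout used here is an assumed schema, not the real
    coder_eval task format.
    """
    for index, criterion in enumerate(task.get("criteria", [])):
        if criterion.get("type") != "json_check":
            continue
        if criterion.get("operator") != "contains":
            continue
        expected = str(criterion.get("expected", ""))
        if len(expected) < MIN_LITERAL_LEN:
            raise TaskLoadError(
                f"criterion {index}: contains literal {expected!r} has "
                f"{len(expected)} chars, minimum is {MIN_LITERAL_LEN}"
            )


old_task = {"criteria": [
    {"type": "json_check", "operator": "contains", "expected": "pprov"},
]}
new_task = {"criteria": [
    {"type": "json_check", "operator": "equals", "expected": "approval-gate"},
]}

validate_task(new_task)  # migrated task loads without error
try:
    validate_task(old_task)
except TaskLoadError as err:
    print(err)
```

Under this sketch, any task still carrying a fragment such as "pprov" would stop loading as soon as the validator ships, which is why the migrations need to land first.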

🤖 Generated with Claude Code

Replace 10 brittle `json_check.contains` substring assertions across four HITL smoke tasks (smoke_01_explicit, smoke_02_approval_gate, smoke_04_writeback, smoke_05_compliance) with strict canonical-name checks aligned with `references/hitl-patterns.md`.

Each task's prompt now lists the same 6 canonical pattern machine names and requires the agent to emit one verbatim. Each task's `pattern` criterion uses `equals` (single canonical answer) or `regex` (multiple defensible canonical answers per the skill doc):

- smoke_01_explicit: `equals approval-gate` -> `regex (approval-gate|write-back-validation)`. The "approve before write" scenario documents both as applicable.
- smoke_02_approval_gate: `equals approval-gate`. Single canonical fit, manager-must-approve language.
- smoke_04_writeback: `regex (write-back-validation|data-enrichment|agentic-output-review)`. The AI-enriches-then-writes-to-SAP scenario matches three patterns in the doc.
- smoke_05_compliance: `regex (compliance-checkpoint|approval-gate)`. GDPR scenario; "regulatory sign-off" steers toward compliance-checkpoint, but "sign off" alone is also an approval-gate signal phrase. Prompt also strengthened with explicit "regulatory compliance" framing and a GDPR Article 17 reference.

Companion to #555 (smoke_03_escalation), which migrated the same anti-pattern in a separate, focused PR.

Verified: all four tasks pass 3/3 reps each (12/12 SUCCESS at score 1.0) using local coder_eval against this branch.

This PR plus #555 unblocks the matching coder_eval validator (UiPath/coder_eval#212), which hard-fails any task YAML using `json_check.contains` with a `< 8`-char literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-04T22:53:59Z

Claude finished @tmatup's task in 2m 34s

PR Review: test(hitl): migrate 4 HITL smoke tasks to canonical pattern vocabulary

Read review criteria and project conventions
Read full diff and changed files
Validate canonical names against references/hitl-patterns.md
Check for remaining short-literal contains assertions
Check for secrets, cross-skill refs, repo hygiene
Post review

Summary

Replaces 10 brittle json_check.contains substring assertions (e.g. expected: "pprov", expected: "udit") across 4 HITL smoke tasks with strict regex/equals checks against canonical pattern machine names. Each prompt now explicitly lists the 6 accepted pattern names and instructs the agent to emit one verbatim. Well-motivated by real test failures traced to substring fragility.

Change-by-Change Review

1. `tests/tasks/uipath-human-in-the-loop/smoke_01_explicit.yaml`

Severity: OK

Prompt upgraded with canonical vocabulary block. Three contains assertions ("pprov", "rite", "alid" at pass_threshold: 0.33) replaced with a single regex assertion: ^(approval-gate|write-back-validation)$ at pass_threshold: 1.0. Both patterns are defensible for an "approve before writing" scenario per hitl-patterns.md. Clean.
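The "OR @ 0.33" shorthand can be read as follows, assuming the score is the fraction of satisfied checks and a task passes when that fraction reaches `pass_threshold`. This scoring rule is an assumption inferred from the thresholds quoted in the PR, and `passes` is a hypothetical helper, not a coder_eval function.

```python
def passes(results: list[bool], pass_threshold: float) -> bool:
    """Pass when the fraction of satisfied checks reaches the threshold."""
    score = sum(results) / len(results)
    return score >= pass_threshold


# Old smoke_01: three substring checks, any single hit was enough.
print(passes([True, False, False], 0.33))   # True, since 1/3 is about 0.333
print(passes([False, False, False], 0.33))  # False

# New smoke_01: one anchored regex check that must hold.
print(passes([True], 1.0))   # True
print(passes([False], 1.0))  # False
```

Read this way, the old thresholds turned each group of fragments into a logical OR, while the new single check at 1.0 leaves no partial credit.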

2. `tests/tasks/uipath-human-in-the-loop/smoke_02_approval_gate.yaml`

Severity: OK

Prompt upgraded with canonical vocabulary block. Single contains "pprov" replaced with equals "approval-gate". This is the strictest task — only one valid answer, which is correct since the scenario is unambiguously an approval gate. Clean.

3. `tests/tasks/uipath-human-in-the-loop/smoke_04_writeback.yaml`

Severity: OK

Prompt upgraded with canonical vocabulary block. Three contains assertions ("rite", "alid", "nrich" at pass_threshold: 0.33) replaced with regex: ^(write-back-validation|data-enrichment|agentic-output-review)$. All three patterns are documented as applicable for "AI enriches data, writes it back to a system of record." Clean.

4. `tests/tasks/uipath-human-in-the-loop/smoke_05_compliance.yaml`

Severity: OK

Prompt strengthened with explicit "regulatory compliance" framing and GDPR Article 17 reference. Four contains assertions ("udit", "omplianc", "ign", "pprov" at pass_threshold: 0.25) replaced with regex: ^(compliance-checkpoint|approval-gate)$. Both patterns are defensible per hitl-patterns.md — compliance-checkpoint is the primary fit, approval-gate covers the "sign off" signal overlap. Clean.

5. Cross-cutting: Canonical vocabulary block duplication

Severity: Low

The 10-line "Use EXACTLY one of these machine names..." block is copy-pasted identically across all 4 files. This is expected and correct — each task YAML must be self-contained for the coder_eval runner. Not a real duplication issue.

6. Cross-cutting: Prompt claims "six canonical patterns" but `hitl-patterns.md` has 10 sections

Severity: Low

Each prompt says "The skill's references/hitl-patterns.md enumerates six canonical patterns" — but the file actually has 10 ###-level pattern sections (the 6 core patterns plus 4 industry-specific ones: IT/change-management, HR/contract, customer-communication, financial-transaction). The agent gets the explicit list of 6 names so this doesn't cause test failures (12/12 confirmed), but the claim is factually inaccurate when the agent reads the file. If the agent notices the discrepancy, it could hesitate.

Consider either:

Changing "six" to "six core" in the prompts, or
Adding a --- separator or heading in hitl-patterns.md between the core and industry-specific patterns to make the distinction clearer
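A quick way to check the section count is to list the `###`-level headings. The Python sketch below does this for an in-memory string; the sample document and its `##` grouping headings are made up for illustration and do not reproduce the real `hitl-patterns.md`.

```python
import re


def pattern_sections(markdown: str) -> list[str]:
    """Return the titles of all ###-level headings in a markdown document."""
    return re.findall(r"^###[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)


# Hypothetical excerpt; the real file is not shown in this PR.
sample = """\
## Core patterns
### approval-gate
### exception-escalation
## Industry-specific patterns
### financial-transaction
"""

titles = pattern_sections(sample)
print(len(titles), titles)
```

Running the same function over the real file would show whether the count the prompt claims matches what the agent sees when it opens the reference.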


What's Missing

smoke_03_escalation.yaml still uses old-style contains with short fragments ("scalat" = 6 chars, "lassif" = 6 chars). This is acknowledged: companion PR #555 covers it. No action needed here.
Nothing else identified.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No skill frontmatter changed; test task files only |
| E2E Tests | OK | Smoke tasks correctly migrated, regex patterns syntactically valid |
| Skill Body | N/A | No skill doc changes |
| References & Assets | OK | `hitl-patterns.md` verified: canonical names match section titles |
| Repo Hygiene | OK | No secrets, no cross-skill refs, changes scoped to 4 test files |

Issues for Manual Review

Confirm that the coder_eval runner supports the regex operator for json_check assertions. The PR description and 12/12 local test results imply it does, but this can't be verified from this repo alone.
Confirm merge ordering: this PR and #555 must land before UiPath/coder_eval#212 (which hard-fails short-literal contains).

Conclusion

Clean PR. The migration from substring fragments to canonical pattern vocabulary is well-motivated, correctly implemented, and consistent across all 4 files. The only finding is a Low severity factual inaccuracy in the prompts ("six" vs. 10 patterns in the referenced file) that doesn't affect test outcomes but could be tightened. Approve.

tmatup · 2026-05-04T23:09:29Z

Superseded by #555 — its commit was cherry-picked into that PR for a unified history. Closing.

tmatup requested a review from dushyant-uipath as a code owner May 4, 2026 22:53

tmatup mentioned this pull request May 4, 2026

test(hitl): migrate HITL smoke tasks to canonical pattern vocabulary #555

Merged

3 tasks

tmatup closed this May 4, 2026

tmatup deleted the chore/migrate-canonical-criteria branch May 4, 2026 23:09

tmatup wants to merge 1 commit into main from chore/migrate-canonical-criteria
