fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for #561
Local 3-rep baseline of `skill-flow-ipe-ceql-where` on current main:
0/3 PASS at score 0.273. Three reps split across two checker-driven
failure modes that aren't really agent failures:
1. Connector-key invention (reps 00 + 02): agent wrote
`uipath-microsoft-entra` / `uipath-microsoft-entra-id`, derived from
the current product name "Microsoft Entra ID". The registered
connector key is the legacy `uipath-microsoft-azureactivedirectory`,
and nothing in the prompt or the skill's SKILL.md surfaces that.
2. Empty `inputs.detail` (rep 01): agent built a structurally valid
flow with `inputs: {}` and ran `uip maestro flow validate`, which
returned "Status: Valid". The CLI itself does not require
`inputs.detail` for a connector node; only this test's checker did.
Since the prompt explicitly forbids `uip flow node configure` (no
live tenant), populating `inputs.detail` would require the agent to
reverse-engineer what the CLI would emit, and that is not what the
prompt asks for.
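Failure mode 1 comes down to a missing lookup from display names to registry keys. As a minimal sketch (the table and `resolve_connector_key` are hypothetical, not part of the skill or the `uip` CLI, and only the mappings named in this PR are included), a guard that rejects invented keys could look like this:

```python
# Hypothetical lookup: display names an agent might derive from product names,
# mapped to registered connector keys. Only mappings named in this PR are listed.
REGISTERED_KEYS = {
    "microsoft entra id": "uipath-microsoft-azureactivedirectory",
    "microsoft entra": "uipath-microsoft-azureactivedirectory",
    "onedrive": "uipath-microsoft-onedrive",
    "sharepoint": "uipath-microsoft-onedrive",
    "m365": "uipath-microsoft-onedrive",
}

# The set of keys that are themselves valid registry entries.
KNOWN_KEYS = set(REGISTERED_KEYS.values())


def resolve_connector_key(name: str) -> str:
    """Return the registered key for a display name or an already-valid key.

    Raises KeyError for anything else, e.g. an invented key such as
    'uipath-microsoft-entra'.
    """
    normalized = name.strip().lower()
    if normalized in KNOWN_KEYS:
        return normalized
    if normalized in REGISTERED_KEYS:
        return REGISTERED_KEYS[normalized]
    raise KeyError(f"{name!r} is not a registered connector key or known display name")
```

Failing loudly on unknown names, rather than guessing a close match, is the point: the invented keys in reps 00 and 02 were plausible-looking strings.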
Fix follows the same playbook as
#555 (smoke_03_escalation):
give the agent the canonical vocabulary and grade the artifacts the
prompt actually asks for.
Prompt updates:
- Names the registered connector key verbatim
(`uipath-microsoft-azureactivedirectory`) and warns that display
names like "Microsoft Entra" / "Microsoft Entra ID" are NOT registry
keys.
- Points the agent at the canonical filter-tree shape: section "Filter
Trees (CEQL)" in
`skills/uipath-platform/references/integration-service/activities.md`
(added by #492) and Step 6a in the Maestro Flow connector plugin's
impl.md (which cross-references it).
- Enumerates the required `where_detail.json` shape inline:
groupOperator + filters[].id + PascalCase operator + WorkflowValue-
wrapped value.
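The inline shape in the last bullet can be sketched in Python. This is illustrative only: `equals_literal_filter` is a hypothetical helper, the key names are copied from the filter tree the reps converged on, and it assumes `where_detail.json` stores the tree under a top-level `filter` key (the only key the checker is described as reading). The meaning of `groupOperator: 0` and `index: 0` is taken as given, not explained here.

```python
import json


def equals_literal_filter(field_id: str, literal: str) -> dict:
    """Build a single-leaf CEQL filter tree: numeric groupOperator,
    filters[].id, PascalCase operator, WorkflowValue-style wrapped value."""
    return {
        "groupOperator": 0,
        "index": 0,
        "filters": [
            {
                "id": field_id,
                "operator": "Equals",
                "value": {
                    "value": literal,
                    # rawString holds the JSON-encoded literal, e.g. "\"active\"".
                    "rawString": json.dumps(literal),
                    "isLiteral": True,
                },
            }
        ],
        "groups": [],
    }


# Assumed file layout: the tree lives under a top-level "filter" key.
where_detail = {"filter": equals_literal_filter("displayName", "active")}
print(json.dumps(where_detail, indent=2))
```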
Checker updates:
- Validates `where_detail.json`'s `filter` against the canonical shape
(groupOperator numeric, filters[] non-empty, leaves carry
PascalCase operator + id referencing displayName + value 'active').
This is the artifact the prompt asks the agent to plan.
- Validates the .flow file at the structural level only: registered
connector key present + List Groups operation referenced + Decision
+ Terminate nodes. No more inputs.detail / queryParameters.where /
configuration / =jsonString:... reverse-engineering — those are the
CLI's job, not the agent's.
- Drops the `inputs.detail` / =js: expression form path (the CLI's
own validator accepts `inputs: {}`, so we should too).
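A rough sketch of the two checks described above, assuming the `filter` tree has already been parsed from `where_detail.json` and the `.flow` file is available as raw text. `check_where_detail` and `check_flow_text` are hypothetical names, not the real checker. The registered connector key comes from this PR, but the strings standing in for the List Groups operation and the Decision/Terminate node types are placeholders, since the `.flow` schema isn't shown here. The sketch enforces PascalCase on `operator` as the first bullet describes, and deliberately never inspects `inputs.detail`.

```python
import re

# Registered key from this PR; the other markers are placeholder strings for
# however the .flow file names the operation and the node types.
REQUIRED_FLOW_MARKERS = (
    "uipath-microsoft-azureactivedirectory",
    "List Groups",  # placeholder operation label
    "Decision",     # placeholder node-type label
    "Terminate",    # placeholder node-type label
)

PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_number(x) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_where_detail(tree: dict) -> list:
    """Return a list of problems with a `filter` tree; empty means pass."""
    problems = []
    if not is_number(tree.get("groupOperator")):
        problems.append("groupOperator must be numeric")
    filters = tree.get("filters")
    if not isinstance(filters, list) or not filters:
        return problems + ["filters[] must be a non-empty list"]
    for i, leaf in enumerate(filters):
        if not isinstance(leaf, dict):
            problems.append(f"filters[{i}] must be an object")
            continue
        if leaf.get("id") != "displayName":
            problems.append(f"filters[{i}].id must reference displayName")
        op = leaf.get("operator")
        if not isinstance(op, str) or not PASCAL_CASE.fullmatch(op):
            problems.append(f"filters[{i}].operator must be PascalCase")
        wrapped = leaf.get("value")
        if not isinstance(wrapped, dict) or wrapped.get("value") != "active":
            problems.append(f"filters[{i}].value must wrap the literal 'active'")
    return problems


def check_flow_text(flow_text: str, markers=REQUIRED_FLOW_MARKERS) -> list:
    """Structural check only: each marker must appear somewhere in the .flow text.
    inputs.detail and queryParameters.where are intentionally not examined."""
    return [f"missing {m!r}" for m in markers if m not in flow_text]
```

Returning a list of problems instead of a bare boolean keeps failure reports specific, which helps separate an agent mistake from a checker that is stricter than the prompt.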
Verified locally on this branch with 3 fresh reps: 3/3 SUCCESS at
score 1.0 (durations 298s / 602s / 346s, vs baseline 835s / 402s /
1086s). Average duration drops from ~774s to ~415s, roughly 45% less,
as the canonical-key directive cuts the agent's trial-and-error loops
on connector picking and validator schema.
All three reps converged on the canonical filter-tree shape verbatim.
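The averages can be reproduced from the per-rep durations above. This only checks the arithmetic on the reported wall-clock times; the drop in the mean comes to about 46%, in line with the ~45% quoted.

```python
from statistics import mean

baseline_s = [835, 402, 1086]  # origin/main reps, as reported in this PR
after_fix_s = [298, 602, 346]  # this branch's reps, as reported in this PR

baseline_avg = mean(baseline_s)
after_avg = mean(after_fix_s)
reduction = 1 - after_avg / baseline_avg

print(f"baseline {baseline_avg:.0f}s, after {after_avg:.0f}s, reduction {reduction:.0%}")
# prints: baseline 774s, after 415s, reduction 46%
```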
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @tmatup's task in 1m 59s.
PR Review: fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for
Summary
Replaces a brittle checker that graded …
Change-by-Change Review
| Area | Status | Notes |
|---|---|---|
| Frontmatter | N/A | No skill SKILL.md changed — only test task files |
| E2E Tests | OK | Existing smoke-tagged task updated; checker aligns with prompt |
| Skill Body | N/A | No skill body changed |
| References & Assets | OK | Both referenced doc paths verified to exist |
| Repo Hygiene | OK | CODEOWNERS already covers uipath-maestro-flow; no secrets; changes scoped to the task being fixed |
Issues for Manual Review
- Domain accuracy of the connector key: The PR asserts `uipath-microsoft-azureactivedirectory` is the correct registered key for the Microsoft Entra ID connector. This is a runtime fact that can only be verified against a live tenant's registry. The 3/3 test pass rate strongly suggests it's correct.
- Filter-tree canonical shape: The checker validates the shape documented in `activities.md`. If that doc's shape changes upstream, both the prompt and checker would need updating. This is a known coupling, documented in the PR body.
Conclusion
Clean, well-motivated change that fixes a real test reliability issue (0/3 → 3/3). The approach — "grade what the prompt asks for, give the agent the vocabulary it needs" — is sound and consistent with the companion PR (#555). Two Low items noted (unclosed file handles, PascalCase not enforced) — neither blocks merge. Approve.
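The first Low item (unclosed file handles) has a standard Python fix: open files in a `with` block, or use `Path.read_text`, so handles are closed even when parsing raises. A minimal sketch, assuming the checker reads `where_detail.json` and the `.flow` file from paths it is given and that the tree sits under a top-level `filter` key; the helper names are hypothetical and not taken from the actual checker:

```python
import json
from pathlib import Path


def load_where_detail(path) -> dict:
    """Read where_detail.json and return its `filter` tree.

    The `with` block closes the handle even if json.load raises.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["filter"]


def load_flow_text(path) -> str:
    # Path.read_text opens and closes the file internally.
    return Path(path).read_text(encoding="utf-8")
```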
Summary
- … `uipath-microsoft-azureactivedirectory`) and a pointer to the Filter Trees (CEQL) doc in the prompt.
- … `inputs.detail` be hand-populated: `uip maestro flow validate` accepts a connector node with `inputs: {}`, so we shouldn't be stricter than the CLI.
Background
Local 3-rep baseline on current `origin/main` for `skill-flow-ipe-ceql-where`: 0/3 PASS, all three FAILURE at score 0.273. Three reps split across two checker-driven failure modes that aren't really agent failures:
1. Connector-key invention (reps 00 + 02): agent wrote `uipath-microsoft-entra` / `uipath-microsoft-entra-id`, derived from the current product name "Microsoft Entra ID". The registered key is the legacy `uipath-microsoft-azureactivedirectory`, and nothing in the prompt or `SKILL.md` surfaced that.
2. Empty `inputs.detail` (rep 01): agent built a flow with `inputs: {}` and ran `uip maestro flow validate` → "Status: Valid". The CLI itself does not require `inputs.detail` populated; only this test's checker did. The prompt forbids `uip flow node configure` (no live tenant), which is the command that populates `inputs.detail`. Asking the agent to reverse-engineer that population is asking for something the prompt forbids and the CLI doesn't enforce.
The pattern is identical to #555 (smoke_03_escalation): the checker grades artifacts/shapes that the prompt doesn't ask for and the agent has no reliable signal to know about.
Approach
Mirror the HITL fix's playbook: give the agent the canonical vocabulary, grade what the prompt asks for.
Prompt
- Name the registered connector key verbatim (`uipath-microsoft-azureactivedirectory`) and explicitly warn that "Microsoft Entra" / "Microsoft Entra ID" are display names, not registry keys.
- Point the agent at the "Filter Trees (CEQL)" section of `skills/uipath-platform/references/integration-service/activities.md` (added by #492, "docs: add CEQL filter trees section for IS list activities") and the cross-reference in Step 6a of the Maestro Flow connector plugin's `impl.md`.
- Enumerate the required `where_detail.json` shape inline: `groupOperator` + `filters[].id` + PascalCase `operator` + `WorkflowValue`-wrapped `value`.
Checker
- Validate `where_detail.json`'s `filter` against the canonical shape (the artifact the prompt asks the agent to plan).
- Validate the `.flow` file at structural level only: registered connector key present + List Groups operation referenced + Decision + Terminate nodes. No more `inputs.detail` / `queryParameters.where` / `configuration: '=jsonString:...'` reverse-engineering; those are what `uip flow node configure` produces, and that command is forbidden by the prompt.
- Drop the `=js:` expression-form fallback path (only meaningful for the persisted shape we're no longer grading).
Test plan
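Baseline (current `origin/main`):
| Rep | Result | Failure mode |
|---|---|---|
| 00 | FAILURE (0.273) | `uipath-microsoft-entra` (invented key) |
| 01 | FAILURE (0.273) | empty `inputs.detail` |
| 02 | FAILURE (0.273) | `uipath-microsoft-entra-id` (invented key) |
After fix (this branch): 3/3 SUCCESS at score 1.0 (durations 298s / 602s / 346s).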
Average duration: 415s (vs 774s baseline — the canonical-key directive shaves ~45% off the agent's trial-and-error on connector picking and validator schema reverse-engineering). All three reps converged on the canonical filter-tree shape verbatim:
```json
{
  "groupOperator": 0,
  "index": 0,
  "filters": [
    {
      "id": "displayName",
      "operator": "Equals",
      "value": { "value": "active", "rawString": "\"active\"", "isLiteral": true }
    }
  ],
  "groups": []
}
```
Companion PRs in flight
- … `skill-hitl-smoke-escalation` + four sibling HITL smoke tasks.
- … `json_check.contains` short-literal assertions at task-load time. Independent of this PR (CEQL fix doesn't touch `json_check.contains`).
Follow-ups (out of scope)
- … `uipath-microsoft-azureactivedirectory`, OneDrive/SharePoint/M365 → `uipath-microsoft-onedrive`, etc.) would close that gap for future tasks. Documented but not bundled here.
🤖 Generated with Claude Code