fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for by tmatup · Pull Request #561 · UiPath/skills

tmatup · 2026-05-04T23:50:49Z

Summary

Replaces brittle .flow-file-internals checks with structural+canonical-shape grading.
Agent now gets the registered connector key (uipath-microsoft-azureactivedirectory) and a pointer to the Filter Trees (CEQL) doc in the prompt.
Drops the requirement that inputs.detail be hand-populated — uip maestro flow validate accepts a connector node with inputs: {}, so we shouldn't be stricter than the CLI.
Local verification: 3/3 SUCCESS at score 1.0 vs 0/3 baseline.

Background

Local 3-rep baseline on current origin/main for skill-flow-ipe-ceql-where: 0/3 PASS, all three FAILURE at score 0.273. Three reps split across two checker-driven failure modes that aren't really agent failures:

Failure mode	Reps	Why
Connector-key invention	00 + 02	Agent wrote `uipath-microsoft-entra` / `uipath-microsoft-entra-id`, derived from the current product name "Microsoft Entra ID". The registered key is the legacy `uipath-microsoft-azureactivedirectory`, and nothing in the prompt or `SKILL.md` surfaced that.
Empty `inputs.detail`	01	Agent built a structurally-valid flow with `inputs: {}` and ran `uip maestro flow validate` → "Status: Valid". The CLI itself does not require `inputs.detail` populated; only this test's checker did. The prompt forbids `uip flow node configure` (no live tenant) — the command that populates `inputs.detail`. Asking the agent to reverse-engineer that population is asking for something the prompt forbids and the CLI doesn't enforce.

The pattern is identical to #555 (smoke_03_escalation): the checker grades artifacts/shapes that the prompt doesn't ask for and the agent has no reliable signal to know about.

Approach

Mirror the HITL fix's playbook: give the agent the canonical vocabulary, grade what the prompt asks for.

Prompt

Name the registered connector key verbatim (uipath-microsoft-azureactivedirectory) and explicitly warn that "Microsoft Entra" / "Microsoft Entra ID" are display names, not registry keys.
Point the agent at the canonical filter-tree shape: the "Filter Trees (CEQL)" section in skills/uipath-platform/references/integration-service/activities.md (added by docs: add CEQL filter trees section for IS list activities #492) and the cross-reference in Step 6a of the Maestro Flow connector plugin's impl.md.
Enumerate the required where_detail.json shape inline: groupOperator + filters[].id + PascalCase operator + WorkflowValue-wrapped value.

Checker

Validates where_detail.json's filter against the canonical shape (the artifact the prompt asks the agent to plan).
Validates the .flow file at structural level only: registered connector key present + List Groups operation referenced + Decision + Terminate nodes. No more inputs.detail / queryParameters.where / configuration: '=jsonString:...' reverse-engineering — those are what uip flow node configure produces, and that command is forbidden by the prompt.
Drops the =js: expression-form fallback path (only meaningful for the persisted shape we're no longer grading).

Test plan

Baseline (current origin/main):

rep 00: FAILURE 0.273 — uipath-microsoft-entra (invented key)
rep 01: FAILURE 0.273 — empty inputs.detail
rep 02: FAILURE 0.273 — uipath-microsoft-entra-id (invented key)

After fix (this branch):

rep 00: SUCCESS 1.0 in 298s
rep 01: SUCCESS 1.0 in 602s
rep 02: SUCCESS 1.0 in 346s

Average duration: 415s (vs 774s baseline — the canonical-key directive shaves ~45% off the agent's trial-and-error on connector picking and validator schema reverse-engineering). All three reps converged on the canonical filter-tree shape verbatim:

{
  "groupOperator": 0, "index": 0,
  "filters": [
    { "id": "displayName", "operator": "Equals",
      "value": { "value": "active", "rawString": "\"active\"", "isLiteral": true } }
  ],
  "groups": []
}

Companion PRs in flight

test(hitl): migrate HITL smoke tasks to canonical pattern vocabulary #555 — same-spirit fix for skill-hitl-smoke-escalation + four sibling HITL smoke tasks.
UiPath/coder_eval#212 — Pydantic validator that rejects brittle json_check.contains short-literal assertions at task-load time. Independent of this PR (CEQL fix doesn't touch json_check.contains).

Follow-ups (out of scope)

Tier-1 connector cheat sheet in the maestro-flow skill. All three baseline reps independently invented connector keys from product names. Adding a "display name → registry key" table for Tier-1 connectors (Microsoft Entra ID → uipath-microsoft-azureactivedirectory, OneDrive/SharePoint/M365 → uipath-microsoft-onedrive, etc.) would close that gap for future tasks. Documented but not bundled here.

🤖 Generated with Claude Code

… asks for Local 3-rep baseline of `skill-flow-ipe-ceql-where` on current main: 0/3 PASS at score 0.273. Three reps split across two checker-driven failure modes that aren't really agent failures: 1. Connector-key invention (reps 00 + 02) — agent wrote `uipath-microsoft-entra` / `uipath-microsoft-entra-id`, derived from the current product name "Microsoft Entra ID". The registered connector key is the legacy `uipath-microsoft-azureactivedirectory`, and nothing in the prompt or skill SKILL.md surfaces that. 2. Empty inputs.detail (rep 01) — agent built a structurally-valid flow with `inputs: {}` and ran `uip maestro flow validate`, which returned "Status: Valid". The CLI itself does not require inputs.detail for a connector node; only this test's checker did. Since the prompt explicitly forbids `uip flow node configure` (no live tenant), populating inputs.detail requires the agent to reverse-engineer what the CLI would emit — that's not what the prompt asks for. Fix follows the same playbook as #555 (smoke_03_escalation): give the agent the canonical vocabulary and grade the artifacts the prompt actually asks for. Prompt updates: - Names the registered connector key verbatim (`uipath-microsoft-azureactivedirectory`) and warns that display names like "Microsoft Entra" / "Microsoft Entra ID" are NOT registry keys. - Points the agent at the canonical filter-tree shape: section "Filter Trees (CEQL)" in `skills/uipath-platform/references/integration-service/activities.md` (added by #492) and Step 6a in the Maestro Flow connector plugin's impl.md (which cross-references it). - Enumerates the required `where_detail.json` shape inline: groupOperator + filters[].id + PascalCase operator + WorkflowValue- wrapped value. Checker updates: - Validates `where_detail.json`'s `filter` against the canonical shape (groupOperator numeric, filters[] non-empty, leaves carry PascalCase operator + id referencing displayName + value 'active'). This is the artifact the prompt asks the agent to plan. - Validates the .flow file at the structural level only: registered connector key present + List Groups operation referenced + Decision + Terminate nodes. No more inputs.detail / queryParameters.where / configuration / =jsonString:... reverse-engineering — those are the CLI's job, not the agent's. - Drops the `inputs.detail` / =js: expression form path (the CLI's own validator accepts `inputs: {}`, so we should too). Verified locally on this branch with 3 fresh reps: 3/3 SUCCESS at score 1.0 (durations 298s / 602s / 346s, vs baseline 835s / 402s / 1086s — the canonical-key directive shaves ~45% off the agent's trial-and-error loops on connector picking and validator schema). All three reps converged on the canonical filter-tree shape verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-04T23:51:12Z

Claude finished @tmatup's task in 1m 59s —— View job

PR Review: fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for

Read review criteria and project conventions
Read full diff and existing PR comments
Read full changed files for context
Provide review feedback

Summary

Replaces a brittle checker that graded .flow file internals (inputs.detail, queryParameters.where, savedFilterTrees, =jsonString: configuration blobs) with a two-part checker: (1) validate the agent's planned where_detail.json against the canonical CEQL filter-tree shape, and (2) validate the .flow file at structural level only (connector key present, List Groups operation, Decision + Terminate nodes). The prompt is updated to give the agent the registered connector key verbatim and pointers to the canonical filter-tree documentation.

Change-by-Change Review

1. `tests/tasks/uipath-maestro-flow/connector_features/ceql_where.yaml`

Severity: OK

Well-structured changes:

Description (ceql_where.yaml:2-9): Updated to accurately describe what's now graded — the planned where_detail.json artifact and structural flow checks, not .flow internals. Clear and accurate.
Prompt (ceql_where.yaml:26-70): The key improvement. The agent now gets:
- The exact registered connector key (uipath-microsoft-azureactivedirectory) with explicit warning about display names
- Pointers to canonical filter-tree docs (two specific file paths)
- Inline enumeration of the required where_detail.json shape
- A registry search command to confirm the key
Success criteria (ceql_where.yaml:79-84): Description updated to match new grading scope. Weights unchanged (1.0 file_exists + 8.0 checker + 2.0 validate command = sensible distribution).

One minor observation: the prompt references skills/uipath-platform/references/integration-service/activities.md and skills/uipath-maestro-flow/references/author/references/plugins/connector/impl.md — both files confirmed to exist in the repo.

2. `tests/tasks/uipath-maestro-flow/connector_features/check_ceql_where_flow.py`

Severity: Low (two minor items, overall well done)

What's good:

Clean separation into _check_where_detail() and _check_flow_structure() — each validates its own artifact.
_assert_filter_tree_shape is now parametrized with source for clear error messages.
_leaf_field priority reordered (check_ceql_where_flow.py:46) to prefer id (canonical) over fieldName/field/name — correct since the prompt explicitly asks for id.
Removed the =js: expression fallback path — no longer needed since we don't grade inputs.detail.
The CONNECTOR_KEYS tuple simplified to a single CONNECTOR_KEY string — cleaner now that the prompt gives exactly one key.
Error messages are actionable and reference the doc source.

Minor items:

Unclosed file handles (check_ceql_where_flow.py:97,119): json.load(open("where_detail.json")) and open(flow_path).read() — these rely on CPython's refcounting for immediate close. Not a correctness issue for a short-lived checker script, but with open(...) as f: is the idiomatic pattern. Low — no functional impact.
PascalCase operator not enforced (check_ceql_where_flow.py:57-60): The docstring says "PascalCase operator" and the prompt explicitly requires "operator": PascalCase ("Equals", "Contains", ...), but the actual check at line 74 only verifies isinstance(n.get("operator"), str). An agent writing "operator": "equals" (lowercase) would pass. If PascalCase is truly part of the canonical contract, consider adding a check like operator[0].isupper(). Low — the prompt is clear enough that agents will likely produce PascalCase, and the canonical doc drives the shape.

What's Missing

report.json not validated by checker: The prompt asks the agent to write report.json with specific fields (scenario, uses_filter_tree, used_filter_expression_string, notes), but neither success_criteria nor the checker validates it. This is consistent with the pre-existing behavior (report.json was also not validated before this PR), so not a regression — but worth noting for a follow-up if report.json carries signal.
Nothing else identified.

Area Ratings

Area	Status	Notes
Frontmatter	N/A	No skill SKILL.md changed — only test task files
E2E Tests	OK	Existing smoke-tagged task updated; checker aligns with prompt
Skill Body	N/A	No skill body changed
References & Assets	OK	Both referenced doc paths verified to exist
Repo Hygiene	OK	CODEOWNERS already covers `uipath-maestro-flow`; no secrets; changes scoped to the task being fixed

Issues for Manual Review

Domain accuracy of the connector key: The PR asserts uipath-microsoft-azureactivedirectory is the correct registered key for the Microsoft Entra ID connector. This is a runtime fact that can only be verified against a live tenant's registry. The 3/3 test pass rate strongly suggests it's correct.
Filter-tree canonical shape: The checker validates the shape documented in activities.md. If that doc's shape changes upstream, both the prompt and checker would need updating. This is a known coupling, documented in the PR body.

Conclusion

Clean, well-motivated change that fixes a real test reliability issue (0/3 → 3/3). The approach — "grade what the prompt asks for, give the agent the vocabulary it needs" — is sound and consistent with the companion PR (#555). Two Low items noted (unclosed file handles, PascalCase not enforced) — neither blocks merge. Approve.

tmatup requested a review from baishalighosh May 4, 2026 23:56

baishalighosh approved these changes May 5, 2026

View reviewed changes

baishalighosh merged commit 33c2c72 into main May 5, 2026
6 of 7 checks passed

baishalighosh deleted the fix/ceql-where-canonical-filter-tree branch May 5, 2026 03:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for#561

fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for#561
baishalighosh merged 1 commit intomainfrom
fix/ceql-where-canonical-filter-tree

tmatup commented May 4, 2026

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tmatup commented May 4, 2026

Summary

Background

Approach

Prompt

Checker

Test plan

Companion PRs in flight

Follow-ups (out of scope)

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for

Summary

Change-by-Change Review

1. tests/tasks/uipath-maestro-flow/connector_features/ceql_where.yaml

2. tests/tasks/uipath-maestro-flow/connector_features/check_ceql_where_flow.py

What's Missing

Area Ratings

Issues for Manual Review

Conclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 4, 2026 •

edited

Loading

1. `tests/tasks/uipath-maestro-flow/connector_features/ceql_where.yaml`

2. `tests/tasks/uipath-maestro-flow/connector_features/check_ceql_where_flow.py`