fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for #561

Merged
baishalighosh merged 1 commit into main from fix/ceql-where-canonical-filter-tree
May 5, 2026
@tmatup tmatup commented May 4, 2026

Summary

  • Replaces brittle .flow-file-internals checks with structural+canonical-shape grading.
  • Agent now gets the registered connector key (uipath-microsoft-azureactivedirectory) and a pointer to the Filter Trees (CEQL) doc in the prompt.
  • Drops the requirement that inputs.detail be hand-populated — uip maestro flow validate accepts a connector node with inputs: {}, so we shouldn't be stricter than the CLI.
  • Local verification: 3/3 SUCCESS at score 1.0 vs 0/3 baseline.

Background

Local 3-rep baseline on current origin/main for skill-flow-ipe-ceql-where: 0/3 PASS, all three FAILURE at score 0.273. Three reps split across two checker-driven failure modes that aren't really agent failures:

| Failure mode | Reps | Why |
| --- | --- | --- |
| Connector-key invention | 00 + 02 | Agent wrote uipath-microsoft-entra / uipath-microsoft-entra-id, derived from the current product name "Microsoft Entra ID". The registered key is the legacy uipath-microsoft-azureactivedirectory, and nothing in the prompt or SKILL.md surfaced that. |
| Empty inputs.detail | 01 | Agent built a structurally-valid flow with inputs: {} and ran uip maestro flow validate → "Status: Valid". The CLI itself does not require inputs.detail populated; only this test's checker did. The prompt forbids uip flow node configure (no live tenant), which is the command that populates inputs.detail. Asking the agent to reverse-engineer that population is asking for something the prompt forbids and the CLI doesn't enforce. |

The pattern is identical to #555 (smoke_03_escalation): the checker grades artifacts/shapes that the prompt doesn't ask for and the agent has no reliable signal to know about.

Approach

Mirror the HITL fix's playbook: give the agent the canonical vocabulary, grade what the prompt asks for.

Prompt

  • Name the registered connector key verbatim (uipath-microsoft-azureactivedirectory) and explicitly warn that "Microsoft Entra" / "Microsoft Entra ID" are display names, not registry keys.
  • Point the agent at the canonical filter-tree shape: the "Filter Trees (CEQL)" section in skills/uipath-platform/references/integration-service/activities.md (added by docs: add CEQL filter trees section for IS list activities #492) and the cross-reference in Step 6a of the Maestro Flow connector plugin's impl.md.
  • Enumerate the required where_detail.json shape inline: groupOperator + filters[].id + PascalCase operator + WorkflowValue-wrapped value.

Checker

  • Validates where_detail.json's filter against the canonical shape (the artifact the prompt asks the agent to plan).
  • Validates the .flow file at structural level only: registered connector key present + List Groups operation referenced + Decision + Terminate nodes. No more inputs.detail / queryParameters.where / configuration: '=jsonString:...' reverse-engineering — those are what uip flow node configure produces, and that command is forbidden by the prompt.
  • Drops the =js: expression-form fallback path (only meaningful for the persisted shape we're no longer grading).
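
A minimal sketch of the where_detail.json shape check described above, for illustration only. The function name `check_filter_tree` and the error strings are hypothetical and are not taken from check_ceql_where_flow.py; the conditions follow the ones listed in this PR (numeric groupOperator, non-empty filters[], leaves with a string operator, an id of displayName, and a value of "active").

```python
def check_filter_tree(tree, source="where_detail.json"):
    """Return a list of problems; an empty list means the shape looks canonical.

    Hypothetical helper: the real checker may structure this differently.
    """
    problems = []

    # groupOperator must be numeric; bool is rejected because it subclasses int.
    group_op = tree.get("groupOperator")
    if isinstance(group_op, bool) or not isinstance(group_op, (int, float)):
        problems.append(f"{source}: groupOperator must be numeric")

    filters = tree.get("filters")
    if not isinstance(filters, list) or not filters:
        problems.append(f"{source}: filters[] must be a non-empty list")
        return problems

    for i, leaf in enumerate(filters):
        where = f"{source}: filters[{i}]"
        if not isinstance(leaf, dict):
            problems.append(f"{where} must be an object")
            continue
        if leaf.get("id") != "displayName":
            problems.append(f"{where}.id must reference displayName")
        if not isinstance(leaf.get("operator"), str):
            problems.append(f"{where}.operator must be a string")
        value = leaf.get("value")
        if not isinstance(value, dict) or value.get("value") != "active":
            problems.append(f"{where}.value must wrap the literal 'active'")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every shape error at once.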

Test plan

Baseline (current origin/main):

  • rep 00: FAILURE 0.273 — uipath-microsoft-entra (invented key)
  • rep 01: FAILURE 0.273 — empty inputs.detail
  • rep 02: FAILURE 0.273 — uipath-microsoft-entra-id (invented key)

After fix (this branch):

  • rep 00: SUCCESS 1.0 in 298s
  • rep 01: SUCCESS 1.0 in 602s
  • rep 02: SUCCESS 1.0 in 346s

Average duration: 415s (vs 774s baseline — the canonical-key directive shaves ~45% off the agent's trial-and-error on connector picking and validator schema reverse-engineering). All three reps converged on the canonical filter-tree shape verbatim:

{
  "groupOperator": 0, "index": 0,
  "filters": [
    { "id": "displayName", "operator": "Equals",
      "value": { "value": "active", "rawString": "\"active\"", "isLiteral": true } }
  ],
  "groups": []
}
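
For reference, the tree above can be built programmatically. This is a sketch under one assumption: that rawString is the JSON-encoded form of the literal, which is what the single example above suggests but which this PR does not state. The helper names `wrap_literal` and `equals_filter` are hypothetical.

```python
import json


def wrap_literal(value):
    """Wrap a literal in the WorkflowValue-style envelope seen in the example.

    Assumption: rawString is json.dumps(value), so "active" becomes "\"active\"".
    """
    return {"value": value, "rawString": json.dumps(value), "isLiteral": True}


def equals_filter(field, value):
    """Build one leaf with the field id, a PascalCase operator, and a wrapped value."""
    return {"id": field, "operator": "Equals", "value": wrap_literal(value)}


tree = {
    "groupOperator": 0,
    "index": 0,
    "filters": [equals_filter("displayName", "active")],
    "groups": [],
}

print(json.dumps(tree, indent=2))
```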

Companion PRs in flight

Follow-ups (out of scope)

  • Tier-1 connector cheat sheet in the maestro-flow skill. All three baseline reps independently invented connector keys from product names. Adding a "display name → registry key" table for Tier-1 connectors (Microsoft Entra ID → uipath-microsoft-azureactivedirectory, OneDrive/SharePoint/M365 → uipath-microsoft-onedrive, etc.) would close that gap for future tasks. Documented but not bundled here.
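
The proposed cheat sheet could be as small as a lookup table. The sketch below contains only the mappings named in this PR; the names `REGISTRY_KEYS` and `registry_key` are hypothetical, and any further Tier-1 entries would need to be confirmed against the connector registry.

```python
from typing import Optional

# Display name -> registered connector key. Only the pairs mentioned in this PR.
REGISTRY_KEYS = {
    "microsoft entra id": "uipath-microsoft-azureactivedirectory",
    "onedrive": "uipath-microsoft-onedrive",
    "sharepoint": "uipath-microsoft-onedrive",
    "m365": "uipath-microsoft-onedrive",
}


def registry_key(display_name: str) -> Optional[str]:
    """Return the registered key for a display name, or None if it is unknown."""
    return REGISTRY_KEYS.get(display_name.strip().lower())


print(registry_key("Microsoft Entra ID"))
```

Returning None for unknown names, instead of deriving a key from the product name, avoids the connector-key invention seen in the baseline reps.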

🤖 Generated with Claude Code

… asks for

Local 3-rep baseline of `skill-flow-ipe-ceql-where` on current main:
0/3 PASS at score 0.273. Three reps split across two checker-driven
failure modes that aren't really agent failures:

1. Connector-key invention (reps 00 + 02) — agent wrote
   `uipath-microsoft-entra` / `uipath-microsoft-entra-id`, derived from
   the current product name "Microsoft Entra ID". The registered
   connector key is the legacy `uipath-microsoft-azureactivedirectory`,
   and nothing in the prompt or skill SKILL.md surfaces that.
2. Empty inputs.detail (rep 01) — agent built a structurally-valid
   flow with `inputs: {}` and ran `uip maestro flow validate`, which
   returned "Status: Valid". The CLI itself does not require
   inputs.detail for a connector node; only this test's checker did.
   Since the prompt explicitly forbids `uip flow node configure` (no
   live tenant), populating inputs.detail requires the agent to
   reverse-engineer what the CLI would emit — that's not what the
   prompt asks for.

Fix follows the same playbook as
#555 (smoke_03_escalation):
give the agent the canonical vocabulary and grade the artifacts the
prompt actually asks for.

Prompt updates:
- Names the registered connector key verbatim
  (`uipath-microsoft-azureactivedirectory`) and warns that display
  names like "Microsoft Entra" / "Microsoft Entra ID" are NOT registry
  keys.
- Points the agent at the canonical filter-tree shape: section "Filter
  Trees (CEQL)" in
  `skills/uipath-platform/references/integration-service/activities.md`
  (added by #492) and Step 6a in the Maestro Flow connector plugin's
  impl.md (which cross-references it).
- Enumerates the required `where_detail.json` shape inline:
  groupOperator + filters[].id + PascalCase operator + WorkflowValue-
  wrapped value.

Checker updates:
- Validates `where_detail.json`'s `filter` against the canonical shape
  (groupOperator numeric, filters[] non-empty, leaves carry
  PascalCase operator + id referencing displayName + value 'active').
  This is the artifact the prompt asks the agent to plan.
- Validates the .flow file at the structural level only: registered
  connector key present + List Groups operation referenced + Decision
  + Terminate nodes. No more inputs.detail / queryParameters.where /
  configuration / =jsonString:... reverse-engineering — those are the
  CLI's job, not the agent's.
- Drops the `inputs.detail` / =js: expression form path (the CLI's
  own validator accepts `inputs: {}`, so we should too).

Verified locally on this branch with 3 fresh reps: 3/3 SUCCESS at
score 1.0 (durations 298s / 602s / 346s, vs baseline 835s / 402s /
1086s — the canonical-key directive shaves ~45% off the agent's
trial-and-error loops on connector picking and validator schema).
All three reps converged on the canonical filter-tree shape verbatim.
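
The duration comparison can be reproduced from the per-rep numbers quoted above. The variable names are illustrative only.

```python
from statistics import mean

# Per-rep durations in seconds, as quoted in this commit message.
baseline = [835, 402, 1086]
fixed = [298, 602, 346]

baseline_avg = mean(baseline)
fixed_avg = mean(fixed)
reduction = 1 - fixed_avg / baseline_avg

print(f"{baseline_avg:.0f}s -> {fixed_avg:.0f}s ({reduction:.0%} shorter)")
```

With three reps per side, the averages are sensitive to a single slow run, so the percentage is best read as a rough indication.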

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot commented May 4, 2026

Claude finished @tmatup's task in 1m 59s


PR Review: fix(maestro-flow): grade CEQL where test against artifacts the prompt asks for

  • Read review criteria and project conventions
  • Read full diff and existing PR comments
  • Read full changed files for context
  • Provide review feedback

Summary

Replaces a brittle checker that graded .flow file internals (inputs.detail, queryParameters.where, savedFilterTrees, =jsonString: configuration blobs) with a two-part checker: (1) validate the agent's planned where_detail.json against the canonical CEQL filter-tree shape, and (2) validate the .flow file at structural level only (connector key present, List Groups operation, Decision + Terminate nodes). The prompt is updated to give the agent the registered connector key verbatim and pointers to the canonical filter-tree documentation.

Change-by-Change Review

1. tests/tasks/uipath-maestro-flow/connector_features/ceql_where.yaml

Severity: OK

Well-structured changes:

  • Description (ceql_where.yaml:2-9): Updated to accurately describe what's now graded — the planned where_detail.json artifact and structural flow checks, not .flow internals. Clear and accurate.
  • Prompt (ceql_where.yaml:26-70): The key improvement. The agent now gets:
    • The exact registered connector key (uipath-microsoft-azureactivedirectory) with explicit warning about display names
    • Pointers to canonical filter-tree docs (two specific file paths)
    • Inline enumeration of the required where_detail.json shape
    • A registry search command to confirm the key
  • Success criteria (ceql_where.yaml:79-84): Description updated to match new grading scope. Weights unchanged (1.0 file_exists + 8.0 checker + 2.0 validate command = sensible distribution).

One minor observation: the prompt references skills/uipath-platform/references/integration-service/activities.md and skills/uipath-maestro-flow/references/author/references/plugins/connector/impl.md — both files confirmed to exist in the repo.

2. tests/tasks/uipath-maestro-flow/connector_features/check_ceql_where_flow.py

Severity: Low (two minor items, overall well done)

What's good:

  • Clean separation into _check_where_detail() and _check_flow_structure() — each validates its own artifact.
  • _assert_filter_tree_shape is now parametrized with source for clear error messages.
  • _leaf_field priority reordered (check_ceql_where_flow.py:46) to prefer id (canonical) over fieldName/field/name — correct since the prompt explicitly asks for id.
  • Removed the =js: expression fallback path — no longer needed since we don't grade inputs.detail.
  • The CONNECTOR_KEYS tuple simplified to a single CONNECTOR_KEY string — cleaner now that the prompt gives exactly one key.
  • Error messages are actionable and reference the doc source.

Minor items:

  1. Unclosed file handles (check_ceql_where_flow.py:97,119): json.load(open("where_detail.json")) and open(flow_path).read() — these rely on CPython's refcounting for immediate close. Not a correctness issue for a short-lived checker script, but with open(...) as f: is the idiomatic pattern. Low — no functional impact.

  2. PascalCase operator not enforced (check_ceql_where_flow.py:57-60): The docstring says "PascalCase operator" and the prompt explicitly requires "operator": PascalCase ("Equals", "Contains", ...), but the actual check at line 74 only verifies isinstance(n.get("operator"), str). An agent writing "operator": "equals" (lowercase) would pass. If PascalCase is truly part of the canonical contract, consider adding a check like operator[0].isupper(). Low — the prompt is clear enough that agents will likely produce PascalCase, and the canonical doc drives the shape.
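
Both minor items could be addressed along these lines. This is a sketch, not the checker's actual code: `load_json` and `is_pascal_case` are hypothetical helpers, and the regex is one possible definition of PascalCase (leading uppercase letter, then letters and digits only).

```python
import json
import re

# Hypothetical definition: starts with an uppercase letter, alphanumeric throughout.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def load_json(path):
    """Read a JSON file with a context manager so the handle is closed promptly."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_pascal_case(name):
    """True for strings like "Equals" or "Contains"; False for "equals", "", or non-strings."""
    return isinstance(name, str) and PASCAL_CASE.fullmatch(name) is not None
```

Compared with `operator[0].isupper()`, the regex form also rejects an empty string without raising an IndexError and rejects operators containing spaces or underscores.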

What's Missing

  • report.json not validated by checker: The prompt asks the agent to write report.json with specific fields (scenario, uses_filter_tree, used_filter_expression_string, notes), but neither success_criteria nor the checker validates it. This is consistent with the pre-existing behavior (report.json was also not validated before this PR), so not a regression — but worth noting for a follow-up if report.json carries signal.
  • Nothing else identified.
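
If report.json is graded in a follow-up, a presence check on the four fields named above would be a starting point. This sketch checks keys only, since the expected value types are not stated in this PR; `missing_report_fields` and the sample report values are hypothetical.

```python
import json

# Field names as listed in the review; value types are not specified there.
REQUIRED_REPORT_FIELDS = (
    "scenario",
    "uses_filter_tree",
    "used_filter_expression_string",
    "notes",
)


def missing_report_fields(report):
    """Return the required field names that are absent from the report dict."""
    return [name for name in REQUIRED_REPORT_FIELDS if name not in report]


# Hypothetical report content, for illustration only.
sample = json.loads('{"scenario": "ceql_where", "uses_filter_tree": true, "notes": ""}')
print(missing_report_fields(sample))
```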

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | No skill SKILL.md changed, only test task files |
| E2E Tests | OK | Existing smoke-tagged task updated; checker aligns with prompt |
| Skill Body | N/A | No skill body changed |
| References & Assets | OK | Both referenced doc paths verified to exist |
| Repo Hygiene | OK | CODEOWNERS already covers uipath-maestro-flow; no secrets; changes scoped to the task being fixed |

Issues for Manual Review

  • Domain accuracy of the connector key: The PR asserts uipath-microsoft-azureactivedirectory is the correct registered key for the Microsoft Entra ID connector. This is a runtime fact that can only be verified against a live tenant's registry. The 3/3 test pass rate strongly suggests it's correct.
  • Filter-tree canonical shape: The checker validates the shape documented in activities.md. If that doc's shape changes upstream, both the prompt and checker would need updating. This is a known coupling, documented in the PR body.

Conclusion

Clean, well-motivated change that fixes a real test reliability issue (0/3 → 3/3). The approach — "grade what the prompt asks for, give the agent the vocabulary it needs" — is sound and consistent with the companion PR (#555). Two Low items noted (unclosed file handles, PascalCase not enforced) — neither blocks merge. Approve.

@tmatup tmatup requested a review from baishalighosh May 4, 2026 23:56
@baishalighosh baishalighosh merged commit 33c2c72 into main May 5, 2026
6 of 7 checks passed
@baishalighosh baishalighosh deleted the fix/ceql-where-canonical-filter-tree branch May 5, 2026 03:36
2 participants