`skills/uipath-agents/SKILL.md` (1 addition, 0 deletions)

@@ -46,6 +46,7 @@ Determine the agent mode before proceeding:
| Add guardrails (PII, harmful content, custom rules) to a low-code agent | Low-code | [lowcode/lowcode.md](references/lowcode/lowcode.md) § Capability Registry | `lowcode/capabilities/guardrails/guardrails.md` |
| Add escalation guardrail (escalate action / Action Center app) | Low-code | [lowcode/capabilities/guardrails/guardrails.md](references/lowcode/capabilities/guardrails/guardrails.md) § escalate — Hand Off to Action Center | Run `uip solution resource list --kind App` to confirm app exists |
| Embed a low-code agent inline in a flow, or wire a multi-agent solution | Low-code | [lowcode/lowcode.md](references/lowcode/lowcode.md) § Capability Registry | `lowcode/capabilities/inline-in-flow/inline-in-flow.md`, `lowcode/capabilities/process/solution-agent.md` |
| Run low-code evaluations | Low-code | [lowcode/evaluations/evaluate.md](references/lowcode/evaluations/evaluate.md) | `lowcode/evaluations/evaluators.md`, `lowcode/evaluations/evaluation-sets.md`, `lowcode/evaluations/running-evaluations.md` |
| Validate, pack, publish, upload, or deploy a low-code agent | Low-code | [lowcode/lowcode.md](references/lowcode/lowcode.md) | `lowcode/project-lifecycle.md`, `lowcode/solution-resources.md` |

## Resources
`skills/uipath-agents/references/lowcode/evaluations/evaluate.md` (90 additions, 0 deletions)

@@ -0,0 +1,90 @@
# Evaluate Low-Code Agents

Design and run evaluations against low-code agents using the `uip agent eval` CLI.

## Quick Reference

```bash
# Add a test case
uip agent eval add happy-path --set "Default Evaluation Set" --inputs '{"input":"hello"}' --expected '{"content":"greeting"}' --path ./my-agent --output json

# Run evals and wait for results
uip agent eval run start --set "Default Evaluation Set" --path ./my-agent --wait --output json

# Check results (failures only, with justifications)
uip agent eval run results <run_id> --set "Default Evaluation Set" --only-failed --verbose --path ./my-agent --output json
```

## Prerequisites

- Agent project initialized (`uip agent init <path>`)
- `entry-points.json` present (defines `input`/`output` schema that test case `--inputs`/`--expected` must conform to)
- `uip agent validate --output json` passes (validate also checks evals and evaluators)
- Agent pushed to Studio Web (`uip agent push`) — required for running evals (the Agent Runtime executes test cases in the cloud)

Local operations (managing evaluators, eval sets, test cases) do **not** require authentication or a cloud connection. Only `uip agent eval run *` commands require cloud connectivity.
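
Putting the prerequisites together, a minimal first-run sequence looks like the sketch below. One assumption: `--path` on `validate` and `push` is inferred from the eval commands above; adjust if your CLI version expects the project directory differently.

```bash
# Scaffold the project, then check it locally (no auth needed)
uip agent init ./my-agent
uip agent validate --path ./my-agent --output json   # also checks evals and evaluators

# Authenticate and push so the cloud Agent Runtime can execute test cases
uip login --output json
uip agent push --path ./my-agent --output json

# Only now will an eval run resolve the solution in the cloud
uip agent eval run start --set "Default Evaluation Set" --path ./my-agent --wait --output json
```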

## Reference Navigation

- [Evaluators](evaluators.md) — evaluator types, adding/removing, default prompts
- [Evaluation Sets and Test Cases](evaluation-sets.md) — creating sets, adding test cases, simulation options
- [Running Evaluations](running-evaluations.md) — start, status, results, compare

Read Evaluators before choosing an evaluator type, and Evaluation Sets before writing test cases.

## File Structure

After `uip agent init`, the project structure is:

```
my-agent/
  agent.json
  entry-points.json                      # Input/output schema — test case --inputs / --expected must match
  project.uiproj
  flow-layout.json
  evals/
    evaluators/
      evaluator-default.json             # name: "Default Evaluator" (semantic-similarity)
      evaluator-default-trajectory.json  # name: "Default Trajectory Evaluator"
    eval-sets/
      evaluation-set-default.json        # name: "Default Evaluation Set" (references both evaluators)
```

Evaluators live in `evals/evaluators/` and eval sets (with inline test cases) live in `evals/eval-sets/`. Both are auto-discovered by the CLI from these directories.

CLI-added evaluators are written as `evaluator-<uuid8>.json` (first 8 hex chars of the evaluator UUID). The `<name>` argument populates the `name` field inside the JSON, NOT the filename. Reference evaluators in eval sets by `id` (UUID), not filename.
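
Because filenames do not encode the evaluator name, locate evaluators on disk by searching the JSON contents. A quick sketch (assumes `jq` is installed and that `name`/`id` sit at the top level of each evaluator file, as in the default files above):

```bash
# Print file, name, and id for every evaluator; the ids are what evaluatorRefs reference
for f in evals/evaluators/*.json; do
  jq -r --arg f "$f" '"\($f)  \(.name)  \(.id)"' "$f"
done
```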

## Key Differences from Coded Agent Evals

| Aspect | Coded (`uip codedagent eval`) | Low-code (`uip agent eval`) |
|--------|-------------------------------|------------------------------|
| Execution | Local Python process | Cloud-based via Agent Runtime |
| Auth required | Only for `--report` | For `eval run *` commands (cloud execution) |
| Prerequisite | `entry-points.json` | `uip agent push` |
| Mocking | `@mockable()` decorator + declarative | Simulation instructions only |
| CLI prefix | `uip codedagent eval` | `uip agent eval` |

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| Solution ID could not be resolved | Agent not pushed to Studio Web | Run `uip agent push --output json`, or pass `--solution-id <id>` explicitly to `uip agent eval run start` |
| `No evaluators found` | Empty `evals/evaluators/` directory | Run `uip agent eval evaluator add` or re-init with `uip agent init` |
| `No test cases in eval set` | Eval set has no evaluations | Run `uip agent eval add` to add test cases |
| `Unknown evaluator type "X"` | Wrong case on `--type` value | Use kebab-case only: `semantic-similarity`, `trajectory` |
| `Evaluator '<id>' is an LLM-based evaluator but 'model' is not set in its evaluatorConfig.` | LLM evaluator JSON has empty/missing `model` and is not `same-as-agent` | Set `"model"` in the evaluator JSON to a valid model (e.g. `claude-haiku-4-5-20251001`), or set it to `"same-as-agent"` and ensure `agent.json` has a model |
| `'same-as-agent' model option requires agent settings. Ensure agent.json contains valid model settings.` | Evaluator uses `"model": "same-as-agent"` but `agent.json` has no resolvable model | Set a model in `agent.json`, or override the evaluator with an explicit model |
| `401 Unauthorized` | Auth expired | Run `uip login --output json` |
| Eval run timeout (with `--wait`) | Agent taking too long or stuck | Increase `--timeout` or check agent health in Studio Web. Note: this only stops the local CLI from blocking; the run continues server-side — query with `uip agent eval run status <run_id>` |
| Validate fails with eval errors | Eval set references an evaluator that no longer exists, OR evaluator JSON missing required field, OR `category`/`type` mismatch (see [evaluators.md](evaluators.md) § What `uip agent validate` Checks) | Re-run `uip agent eval evaluator list` and reconcile `evaluatorRefs`; fix per the validate error message |

The two model-resolution errors above are **runtime checks in the cloud eval worker**, not validate-time checks — `uip agent validate` will not catch them. They surface only after `uip agent eval run start`. To pre-empt them, inspect each evaluator's `model` field locally before pushing.
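
One way to do that local inspection, as a sketch (assumes `jq`, and that `model` nests under `evaluatorConfig` as the error messages indicate; non-LLM evaluators may legitimately report no model):

```bash
# Report each evaluator's model, flagging missing/empty values before push
jq -r 'if (.evaluatorConfig.model // "") == ""
       then "\(.name): model MISSING"
       else "\(.name): \(.evaluatorConfig.model)" end' evals/evaluators/*.json
```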

## Anti-patterns

- **Don't run `uip agent eval run start` before `uip agent push`.** The Agent Runtime executes against the pushed agent. Local edits to `agent.json` after the last push will not be reflected in the run.
- **Don't skip `uip agent validate` before push.** Validate checks `evals/eval-sets/` and `evals/evaluators/`; broken eval JSON will not block push but will surface as runtime errors.
- **Don't hand-edit `id` or `evaluatorRefs` UUIDs.** Eval sets reference evaluators by UUID. Renaming an evaluator file or copy-pasting a UUID across evaluators silently breaks resolution.
- **Don't expect filenames to match `<name>`.** CLI-generated evaluator files use `evaluator-<uuid8>.json`, not `<name>.json`. Look up evaluators by the `name` field inside the JSON, not by filename.
- **Don't pass `--type` in PascalCase.** The CLI rejects `SemanticSimilarity`. Only kebab-case is accepted.
- **Don't reference evaluators across projects.** Each agent project has its own `evals/evaluators/` directory; UUIDs are not portable.
`skills/uipath-agents/references/lowcode/evaluations/evaluation-sets.md` (153 additions, 0 deletions)

@@ -0,0 +1,153 @@
# Evaluation Sets and Test Cases

Evaluation sets group test cases and reference which evaluators to use. Each set is a JSON file in `evals/eval-sets/`. Test cases are stored inline within the eval set.

## Managing Eval Sets

### Add an eval set

```bash
uip agent eval set add <name> --path <agent_dir> --output json
```

**Options:**

| Flag | Required | Description | Default |
|------|----------|-------------|---------|
| `--evaluators <ids>` | No | Comma-separated evaluator IDs | All existing evaluators |
| `--path <path>` | No | Agent project directory | `.` |

When `--evaluators` is not provided, the new eval set automatically references **all** evaluators in the project.
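
For example, to scope a new set to two specific evaluators (UUIDs as reported by `uip agent eval evaluator list`):

```bash
uip agent eval set add "Regression Set" \
  --evaluators "<evaluator-uuid-1>,<evaluator-uuid-2>" \
  --path ./my-agent --output json
```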

### List eval sets

```bash
uip agent eval set list --path <agent_dir> --output json
```

### Remove an eval set

```bash
uip agent eval set remove <id_or_name> --path <agent_dir> --output json
```

## Managing Test Cases

Test cases live inside eval sets. Each test case defines an input, expected output, and optional behavior expectations.

### Add a test case

```bash
uip agent eval add <name> \
--set "<eval_set_name>" \
--inputs '{"input":"hello"}' \
--expected '{"content":"greeting response"}' \
--path <agent_dir> \
--output json
```

**Options:**

| Flag | Required | Description | Default |
|------|----------|-------------|---------|
| `--set <name>` | Yes | Eval set name or ID | — |
| `--inputs <json>` | Yes | Input values as JSON | — |
| `--expected <json>` | No | Expected output as JSON | `{}` |
| `--expected-agent-behavior <text>` | No | Description of expected behavior (used by trajectory evaluator) | `""` |
| `--simulation-instructions <text>` | No | Instructions for simulating agent behavior | `""` |
| `--simulate-input` | No | Enable input simulation | `false` |
| `--simulate-tools` | No | Enable tool simulation | `false` |
| `--input-generation-instructions <text>` | No | Instructions for generating synthetic inputs | `""` |
| `--path <path>` | No | Agent project directory | `.` |

### List test cases

```bash
uip agent eval list --set "<eval_set_name>" --path <agent_dir> --output json
```

### Remove a test case

```bash
uip agent eval remove <id_or_name> --set "<eval_set_name>" --path <agent_dir> --output json
```

## Test Case Design

### Aligning `--inputs` with `entry-points.json`

`--inputs` JSON keys must match the `input` schema in `entry-points.json`. Mismatched keys do not block `eval add` (the CLI stores the JSON verbatim) but will fail at run time when the Agent Runtime invokes the agent. Run `uip agent validate --output json` after adding test cases to surface schema drift.
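
A quick local check is to compare the declared keys against a candidate payload. The sketch below is hedged: the internal shape of `entry-points.json` is not documented here, so the `.input.properties` path is an assumption to adapt to your file:

```bash
# Keys the entry-point schema declares (adjust the jq path to your schema's shape)
jq -r '.input.properties | keys[]' entry-points.json

# Keys a candidate --inputs payload would send
echo '{"input":"hello"}' | jq -r 'keys[]'
```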

### Matching evaluator to test case fields

The `--inputs` and `--expected` flags populate `inputs` and `expectedOutput` on the test case. Each evaluator type sources its placeholder values from a different combination of test-case fields and agent run trace:

| Evaluator Type | From test case | From agent run |
|----------------|---------------|----------------|
| Semantic Similarity (type 5) | `expectedOutput` → `{{ExpectedOutput}}` | Agent output → `{{ActualOutput}}` |
| Trajectory (type 7) | `expectedAgentBehavior` → `{{ExpectedAgentBehavior}}`, `inputs` → `{{UserOrSyntheticInput}}`, `simulationInstructions` → `{{SimulationInstructions}}` | Trace → `{{AgentRunHistory}}` |
| Exact match (type 1) | `expectedOutput` (compared verbatim, no placeholders) | Agent output (compared verbatim) |
| JSON similarity (type 6) | `expectedOutput` (tree-compared, no placeholders) | Agent output (tree-compared) |

For trajectory evaluation, write `--expected-agent-behavior` as a natural language description of what the agent should do, not what it should output:

```bash
uip agent eval add tool-usage-test \
--set "Default Evaluation Set" \
--inputs '{"input":"What is the weather in NYC?"}' \
--expected-agent-behavior "Agent should call the weather tool with location NYC and return a formatted weather summary" \
--path ./my-agent --output json
```

### Simulation options

- `--simulate-input` — runtime generates synthetic input variations based on the provided input
- `--simulate-tools` — tool calls are simulated rather than executed against real services
- `--input-generation-instructions` — guides synthetic input generation (e.g., "generate edge cases with empty strings and special characters")
- `--simulation-instructions` — guides overall simulation behavior

Use these to expand test coverage without writing every input by hand.
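
A sketch combining these flags on one test case (instruction strings are illustrative):

```bash
uip agent eval add edge-cases \
  --set "Default Evaluation Set" \
  --inputs '{"input":"hello"}' \
  --simulate-input \
  --simulate-tools \
  --input-generation-instructions "generate edge cases with empty strings and special characters" \
  --simulation-instructions "tools return plausible canned responses" \
  --path ./my-agent --output json
```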

## Eval Set JSON Format

```json
{
  "fileName": "evaluation-set-default.json",
  "id": "<uuid>",
  "name": "Default Evaluation Set",
  "batchSize": 10,
  "evaluatorRefs": ["<evaluator-uuid-1>", "<evaluator-uuid-2>"],
  "evaluations": [
    {
      "id": "<uuid>",
      "name": "happy-path",
      "inputs": {"input": "hello"},
      "expectedOutput": {"content": "greeting"},
      "expectedAgentBehavior": "",
      "simulationInstructions": "",
      "simulateInput": false,
      "simulateTools": false,
      "inputGenerationInstructions": "",
      "evalSetId": "<eval-set-uuid>",
      "source": "manual",
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "modelSettings": [],
  "agentMemoryEnabled": false,
  "agentMemorySettings": [],
  "lineByLineEvaluation": false,
  "createdAt": "...",
  "updatedAt": "..."
}
```

The `source` field indicates how the test case was created. CLI-added test cases are always `"manual"` (verified). Other observed values from Studio Web include `"debugRun"`, `"runtimeRun"`, `"simulatedRun"`, and `"autopilotUserInitiated"` — treat the `source` field as an enum but do not set it manually; the CLI and Studio Web own this value.

## Anti-patterns

- **Don't hand-write `evalSetId` or test case `id` UUIDs.** Use `uip agent eval add` so the CLI keeps `evaluations[].evalSetId` consistent with the parent eval set's `id`.
- **Don't add `--inputs` keys that are not in `entry-points.json`.** The runtime will reject the test case at execution time. Run `uip agent validate` to catch this before push.
- **Don't leave both `--expected` empty (`{}`) and `--expected-agent-behavior` blank.** The semantic-similarity evaluator then scores against an empty `{{ExpectedOutput}}` and the trajectory evaluator against an empty `{{ExpectedAgentBehavior}}`, so every run scores low for non-actionable reasons.
- **Don't set the `source` field manually.** Owned by CLI and Studio Web; hand-edits may be overwritten on the next sync.