mathaix
diff --git a/.claude/commands/create-flow.md b/.claude/commands/create-flow.md
Lines changed: 98 additions & 0 deletions
diff --git a/.claude/commands/test-flow.md b/.claude/commands/test-flow.md
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,98 @@
+# Create a New Test Flow
+
+Create a new YAML test flow for LLM compliance testing.
+
+## Usage
+```
+/create-flow <flow_name>
+```
+
+## Instructions
+
+Create a new flow spec file at `src/backend/tests/integration/flows/$ARGUMENTS.yml`.
+
+### Template
+
+```yaml
+name: $ARGUMENTS
+description: |
+  [DESCRIBE: What this flow tests and why it matters]
+
+version: "1.0"
+
+context:
+  goal: "[DESCRIBE: The project goal for this test scenario]"
+  project_type: "ma_due_diligence"
+
+session:
+  project_id: test-$ARGUMENTS
+
+steps:
+  - name: initial_step
+    description: "[DESCRIBE: What this step does]"
+    user_says: "[USER MESSAGE: What the user sends]"
+    expect:
+      phase: goal_understanding
+      event: CUSTOM
+      event_name: clara:ask
+      cards:
+        must_include_types:
+          - stepper
+          - snapshot
+
+  # Add more steps as needed...
+
+assertions:
+  - name: primary_assertion
+    description: "[DESCRIBE: What we're validating]"
+    critical: true
+    check: |
+      # Python validation code
+      pass
+
+compliance_notes: |
+  If this flow fails:
+  1. Check the relevant prompt file
+  2. Verify that the LLM outputs the correct card types
+  3. Review the event trace for debugging
+
+failure_actions:
+  - action: show_event_trace
+    description: "Display the actual event for debugging"
+```
+
+### Steps to Complete
+
+1. Create the file with the template above
+2. Replace every bracketed placeholder (`[DESCRIBE: ...]`, `[USER MESSAGE: ...]`) with actual values
+3. Add the appropriate steps for this flow
+4. Define expectations for each step using:
+   - `phase`: Expected phase (goal_understanding, agent_configuration, blueprint_design)
+   - `event`: Event type (CUSTOM, TEXT_MESSAGE_START, etc.)
+   - `event_name`: For CUSTOM events (clara:ask, clara:confirm, etc.)
+   - `cards.must_include_types`: List of required card types
+   - `cards.must_include`: Detailed card requirements with body validation
+   - `cards.stepper_current_step_contains`: Text in active stepper step
+
+### Available Card Types
+- `stepper` - Progress indicator
+- `snapshot` - Project snapshot
+- `domain_setup` - Domain configuration
+- `personas` - Persona selection
+- `info` - General information
+- `agent_configured` - Agent configuration complete
+
+### Verify the Flow
+
+```bash
+cd src/backend
+
+# Check it appears in the list
+uv run python -m clara.testing.flow_runner --list
+
+# Run the flow (requires backend running)
+uv run python -m clara.testing.flow_runner $ARGUMENTS
+```
+
+### Example Flows to Reference
+- `personas_step.yml` - Tests persona card type compliance
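Review note on `create-flow.md`: step 2 of "Steps to Complete" depends on every bracketed placeholder being replaced by hand. A minimal sketch of a pre-run check follows, assuming the only placeholder forms are the `[DESCRIBE: ...]` and `[USER MESSAGE: ...]` markers used in the template. The names `PLACEHOLDER` and `find_placeholders` are hypothetical and are not part of `clara.testing.flow_runner`.

```python
import re

# Matches the bracketed placeholders used in the template,
# e.g. "[DESCRIBE: ...]" or "[USER MESSAGE: ...]".
PLACEHOLDER = re.compile(r"\[(?:DESCRIBE|USER MESSAGE):[^\]]*\]")


def find_placeholders(spec_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for each unreplaced placeholder."""
    found = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((lineno, match.group(0)))
    return found


if __name__ == "__main__":
    # Hypothetical, partially filled-in spec used only for illustration.
    sample = '''name: my_flow
description: |
  [DESCRIBE: What this flow tests and why it matters]
steps:
  - name: initial_step
    user_says: "Confirm"
'''
    for lineno, placeholder in find_placeholders(sample):
        print(f"line {lineno}: {placeholder}")
```

Reading the spec as plain text rather than parsing the YAML keeps the sketch free of third-party dependencies and lets it report line numbers directly.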
@@ -0,0 +1,68 @@
+# Run LLM Compliance Flow Test
+
+Run a Design Assistant flow compliance test to verify that the LLM outputs correct AG-UI events.
+
+## Usage
+```
+/test-flow <flow_name>
+```
+
+## Available Flows
+- `personas_step` - Tests that the personas step outputs `type: "personas"` cards (not `type: "info"`)
+
+## Instructions
+
+Run the specified flow test against the running Design Assistant:
+
+1. First, ensure the backend is running:
+   ```bash
+   cd src/backend && uv run uvicorn clara.main:app --reload --port 8000
+   ```
+
+2. Run the flow test:
+   ```bash
+   cd src/backend && uv run python -m clara.testing.flow_runner $ARGUMENTS
+   ```
+
+3. If the test fails:
+   - Review the error output to understand which card types or events are incorrect
+   - Check the compliance notes in the flow spec for recommended fixes
+   - The most common issue is the LLM outputting `type: "info"` instead of `type: "personas"` at the personas step
+
+4. Report the results including:
+   - Which steps passed/failed
+   - The specific errors encountered
+   - Recommended prompt fixes if applicable
+
+## Flow Specs Location
+Flow specs are YAML files in: `src/backend/tests/integration/flows/`
+
+## Example Output
+```
+============================================================
+Running flow: personas_step_compliance
+Description: Verify the LLM outputs correct card types at the Personas step.
+============================================================
+
+Created session: abc123
+
+Step 1/4: initial_goal
+  Sending: "I want to build an IT incident discovery system for M&A due diligence"
+  ✓ PASSED
+
+Step 2/4: confirm_domain
+  Sending: "Confirm"
+  ✓ PASSED
+
+Step 3/4: skip_context
+  Sending: "Skip"
+  ✓ PASSED
+
+Step 4/4: personas_step
+  Sending: "Continue"
+  ✓ PASSED
+
+============================================================
+Results: 4/4 steps passed
+============================================================
+```
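Review note on `test-flow.md`: step 4 asks for a report of which steps passed or failed. A minimal sketch of extracting that from the runner's text output follows, assuming the output always has the shape shown under "Example Output". The function name `summarize` is hypothetical. The source shows only `✓ PASSED` lines, so the sketch treats any step without a line ending in `PASSED` as not passed instead of guessing the failure marker.

```python
import re

STEP_RE = re.compile(r"^Step (\d+)/(\d+): (\S+)")
RESULTS_RE = re.compile(r"^Results: (\d+)/(\d+) steps passed")


def summarize(output: str) -> dict:
    """Collect per-step status and the final tally from flow runner output."""
    steps = []
    current = None
    passed = total = None
    for raw in output.splitlines():
        line = raw.strip()
        step = STEP_RE.match(line)
        if step:
            # A new step starts; assume not passed until a PASSED line appears.
            current = {"name": step.group(3), "passed": False}
            steps.append(current)
            continue
        if current is not None and line.endswith("PASSED"):
            current["passed"] = True
            continue
        results = RESULTS_RE.match(line)
        if results:
            passed, total = int(results.group(1)), int(results.group(2))
    return {
        "steps": steps,
        "passed": passed,
        "total": total,
        "failed_steps": [s["name"] for s in steps if not s["passed"]],
    }
```

The returned `failed_steps` list gives the step names to look up in the flow spec's `compliance_notes` when writing the report.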