Commit 7172b07

Merge pull request #54 from mathaix/feature/integration-testing-framework
test: Add 4-layer integration testing framework
2 parents 55fb7e4 + 3024c68 commit 7172b07

20 files changed: +3207 -279 lines changed

.claude/commands/create-flow.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
# Create a New Test Flow

Create a new YAML test flow for LLM compliance testing.

## Usage
```
/create-flow <flow_name>
```

## Instructions

Create a new flow spec file at `src/backend/tests/integration/flows/$ARGUMENTS.yml`.

### Template

```yaml
name: $ARGUMENTS
description: |
  [DESCRIBE: What this flow tests and why it matters]

version: "1.0"

context:
  goal: "[DESCRIBE: The project goal for this test scenario]"
  project_type: "ma_due_diligence"

session:
  project_id: test-$ARGUMENTS

steps:
  - name: initial_step
    description: "[DESCRIBE: What this step does]"
    user_says: "[USER MESSAGE: What the user sends]"
    expect:
      phase: goal_understanding
      event: CUSTOM
      event_name: clara:ask
      cards:
        must_include_types:
          - stepper
          - snapshot

  # Add more steps as needed...

assertions:
  - name: primary_assertion
    description: "[DESCRIBE: What we're validating]"
    critical: true
    check: |
      # Python validation code
      pass

compliance_notes: |
  If this flow fails:
  1. Check the relevant prompt file
  2. Verify the LLM outputs correct card types
  3. Review the event trace for debugging

failure_actions:
  - action: show_event_trace
    description: "Display the actual event for debugging"
```

### Steps to Complete

1. Create the file with the template above
2. Replace all `[PLACEHOLDERS]` with actual values
3. Add the appropriate steps for this flow
4. Define expectations for each step using:
   - `phase`: Expected phase (goal_understanding, agent_configuration, blueprint_design)
   - `event`: Event type (CUSTOM, TEXT_MESSAGE_START, etc.)
   - `event_name`: For CUSTOM events (clara:ask, clara:confirm, etc.)
   - `cards.must_include_types`: List of required card types
   - `cards.must_include`: Detailed card requirements with body validation
   - `cards.stepper_current_step_contains`: Text in active stepper step

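Step 2 above is easy to miss in a long spec, so a quick scan for leftover placeholders can help before running the flow. The following is a minimal sketch, not part of the framework: `find_placeholders` is a hypothetical helper, and the pattern assumes placeholders always use the `[DESCRIBE: ...]` or `[USER MESSAGE: ...]` form shown in the template.

```python
import re

# Assumption: placeholders follow the "[DESCRIBE: ...]" / "[USER MESSAGE: ...]"
# form used in the template; other bracketed text is left alone.
PLACEHOLDER_RE = re.compile(r"\[(?:DESCRIBE|USER MESSAGE):[^\]]*\]")


def find_placeholders(spec_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every unfilled placeholder."""
    hits = []
    for lineno, line in enumerate(spec_text.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        'description: "[DESCRIBE: What this step does]"\n'
        'user_says: "Confirm"\n'
    )
    for lineno, placeholder in find_placeholders(sample):
        print(f"line {lineno}: {placeholder}")
```

An empty result means every template placeholder has been replaced; it says nothing about whether the values themselves are sensible.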
### Available Card Types
- `stepper` - Progress indicator
- `snapshot` - Project snapshot
- `domain_setup` - Domain configuration
- `personas` - Persona selection
- `info` - General information
- `agent_configured` - Agent configuration complete

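To illustrate what a `cards.must_include_types` expectation amounts to, here is a rough sketch of the comparison. It is not the flow runner's implementation: the helper names are hypothetical, and it assumes each card in an event is a dict carrying a `type` key.

```python
# Card types listed in this command file.
KNOWN_CARD_TYPES = {
    "stepper",
    "snapshot",
    "domain_setup",
    "personas",
    "info",
    "agent_configured",
}


def missing_card_types(cards: list[dict], required: list[str]) -> list[str]:
    """Return the required card types that do not appear in `cards`.

    Assumption: each card is a dict with a "type" key.
    """
    present = {card.get("type") for card in cards}
    return [card_type for card_type in required if card_type not in present]


def unknown_card_types(required: list[str]) -> list[str]:
    """Return entries in `required` that are not known card types (likely typos)."""
    return [card_type for card_type in required if card_type not in KNOWN_CARD_TYPES]


if __name__ == "__main__":
    cards = [{"type": "stepper"}, {"type": "info"}]
    print(missing_card_types(cards, ["stepper", "personas"]))
    print(unknown_card_types(["stepper", "persona"]))
```

Checking the spec for unknown type names separately from checking the event keeps a typo in the spec from being reported as an LLM compliance failure.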
### Verify the Flow

```bash
cd src/backend

# Check it appears in the list
uv run python -m clara.testing.flow_runner --list

# Run the flow (requires backend running)
uv run python -m clara.testing.flow_runner $ARGUMENTS
```

### Example Flows to Reference
- `personas_step.yml` - Tests persona card type compliance

.claude/commands/test-flow.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Run LLM Compliance Flow Test

Run a Design Assistant flow compliance test to verify that the LLM outputs correct AG-UI events.

## Usage
```
/test-flow <flow_name>
```

## Available Flows
- `personas_step` - Tests that the personas step outputs `type: "personas"` cards (not `"info"`)

## Instructions

Run the specified flow test against the running Design Assistant:

1. First, ensure the backend is running:
   ```bash
   cd src/backend && uv run uvicorn clara.main:app --reload --port 8000
   ```

2. Run the flow test:
   ```bash
   cd src/backend && uv run python -m clara.testing.flow_runner $ARGUMENTS
   ```

3. If the test fails:
   - Review the error output to understand which card types or events are incorrect
   - Check the compliance notes in the flow spec for recommended fixes
   - The most common issue is the LLM outputting `type: "info"` instead of `type: "personas"` at the personas step

4. Report the results including:
   - Which steps passed/failed
   - The specific errors encountered
   - Recommended prompt fixes if applicable

## Flow Specs Location
Flow specs are YAML files in: `src/backend/tests/integration/flows/`

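Since each flow is one `.yml` file in that directory, the available flow names can be read straight from the file system. This is a minimal sketch under that assumption; `list_flow_names` is a hypothetical helper and is not claimed to match what `flow_runner --list` does internally.

```python
from pathlib import Path


def list_flow_names(flows_dir: str) -> list[str]:
    """Return the sorted flow names, taken from the *.yml file names in flows_dir.

    Assumption: a flow's name on the command line matches its file stem,
    e.g. personas_step.yml -> personas_step.
    """
    return sorted(path.stem for path in Path(flows_dir).glob("*.yml"))


# Example (run from the repository root):
#   list_flow_names("src/backend/tests/integration/flows")
```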
## Example Output
```
============================================================
Running flow: personas_step_compliance
Description: Verify the LLM outputs correct card types at the Personas step.
============================================================

Created session: abc123

Step 1/4: initial_goal
Sending: "I want to build an IT incident discovery system for M&A due diligence"
✓ PASSED

Step 2/4: confirm_domain
Sending: "Confirm"
✓ PASSED

Step 3/4: skip_context
Sending: "Skip"
✓ PASSED

Step 4/4: personas_step
Sending: "Continue"
✓ PASSED

============================================================
Results: 4/4 steps passed
============================================================
```
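For the reporting step, output in the shape shown above can be turned into a per-step summary. The sketch below assumes the runner prints exactly the `Step N/M: <name>`, `✓ PASSED`, and `Results: X/Y steps passed` lines from the example; `parse_runner_output` is a hypothetical helper, and because the example shows no failure marker, any step without a `✓ PASSED` line is simply treated as not passed.

```python
import re

# Line shapes taken from the example output; anything else is ignored.
STEP_RE = re.compile(r"^\s*Step (\d+)/(\d+): (\S+)")
RESULT_RE = re.compile(r"^\s*Results: (\d+)/(\d+) steps passed")


def parse_runner_output(output: str):
    """Return ({step_name: passed}, (passed_count, total) or None)."""
    steps = {}
    summary = None
    current = None
    for line in output.splitlines():
        step_match = STEP_RE.match(line)
        if step_match:
            current = step_match.group(3)
            steps[current] = False  # stays False unless a PASSED line follows
            continue
        if current is not None and line.strip().startswith("✓ PASSED"):
            steps[current] = True
            continue
        result_match = RESULT_RE.match(line)
        if result_match:
            summary = (int(result_match.group(1)), int(result_match.group(2)))
    return steps, summary


if __name__ == "__main__":
    sample = 'Step 1/1: initial_goal\nSending: "hi"\n✓ PASSED\n\nResults: 1/1 steps passed\n'
    step_results, totals = parse_runner_output(sample)
    for name, passed in step_results.items():
        print(f"{name}: {'passed' if passed else 'not passed'}")
    print(totals)
```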
