**Philosophy**: Feature flows include testing instructions. Follow them to verify everything works.
**Last Updated**: 2025-12-08
Each feature flow document (docs/memory/feature-flows/*.md) includes:
## Testing
### Prerequisites
- [ ] Services running (backend, frontend, Redis, etc.)
- [ ] Test user logged in
- [ ] Agent running (if testing agent features)
### Test Steps
#### 1. [Feature Action Name]
**Action**: Describe what to do
**Expected**: What should happen
**Verify**:
- [ ] UI shows correct state
- [ ] API returns expected response
- [ ] Database updated correctly
### Edge Cases
- [ ] What happens if...
- [ ] Test with invalid data...
**Last Tested**: YYYY-MM-DD
**Status**: ✅ Working / ⚠️ Issues Found / ❌ Broken

When implementing or modifying a feature:
- Read the feature flow document
- Follow the Testing section step-by-step
- Verify each step works as documented
- Update "Last Tested" timestamp
- Document any issues found
Manual testing (following documented steps) catches:
- Integration issues
- UX problems
- Edge cases
- Real-world workflows
Automated tests only when:
- Feature breaks repeatedly
- Critical path that must never break
- Regression prevention needed
All feature flows are indexed in docs/memory/feature-flows.md. Key flows:
| Flow | Document | Status |
|---|---|---|
| Agent Lifecycle | agent-lifecycle.md | ✅ Tested 2025-12-07 |
| Agent Terminal | agent-terminal.md | ✅ Working |
| Authentication | email-authentication.md | ✅ Working |
| Agent Network | agent-network.md | ✅ Tested 2025-12-07 |
| Execution Queue | execution-queue.md | ✅ Ready for testing |
| Credential Injection | credential-injection.md | ✅ Working |
| Agent Scheduling | scheduling.md | ✅ Working |
| File Browser | file-browser.md | ✅ Working |
| Agent Sharing | agent-sharing.md | ✅ Working |
| MCP Orchestration | mcp-orchestration.md | ✅ Working |
| Activity Stream | activity-stream.md | ✅ Working |
Copy this into each feature flow:
## Testing
### Prerequisites
- [ ] Backend running at http://localhost:8000
- [ ] Frontend running at http://localhost
- [ ] Docker daemon running
- [ ] Redis running (for queue/credential features)
- [ ] Logged in (via email auth or admin login)
- [ ] Test agent created and running
### Test Steps
#### 1. [Feature Action Name]
**Action**:
- Step-by-step description
- Include specific URLs, buttons, inputs
**Expected**:
- What should happen immediately
- Any WebSocket updates
- Toast notifications
**Verify**:
- [ ] UI: Check specific element states
- [ ] API: `curl` command or browser DevTools
- [ ] Database: SQL query or API response
- [ ] Docker: Container state if applicable
#### 2. [Next Action]
...
### Edge Cases
- [ ] Invalid input: What error message?
- [ ] Unauthorized access: 403 response?
- [ ] Concurrent operations: Race conditions?
- [ ] Network failure: Graceful degradation?
### Cleanup
- [ ] Delete test data
- [ ] Reset state
- [ ] Verify no orphaned resources
**Last Tested**: YYYY-MM-DD
**Tested By**: claude / human
**Status**: ✅ All tests passed
**Issues**: None (or list issues found)

## Testing
### Prerequisites
- [ ] Backend running at http://localhost:8000
- [ ] Frontend running at http://localhost
- [ ] Docker daemon running
- [ ] Logged in as test@ability.ai
### 1. Create Agent
**Action**:
- Navigate to http://localhost
- Click "Create Agent" button
- Enter name: "test-lifecycle"
- Select template: "local:default"
- Click "Create"
**Expected**:
- Agent appears in agent list
- Status shows "running"
- SSH port assigned (2290+)
- WebSocket broadcast received
**Verify**:
- [ ] UI shows agent card with correct name
- [ ] API: GET /api/agents includes agent
- [ ] Docker: `docker ps | grep test-lifecycle` shows container
- [ ] Database: agent_ownership record exists
- [ ] Container has correct labels
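The API check above can be scripted with `requests`; a minimal sketch, assuming the response is a JSON list of agent objects (the token is a placeholder):

```python
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer <your-token>"}  # placeholder token

def find_agent(agents, name):
    """Return the agent dict with the given name, or None if absent."""
    return next((a for a in agents if a.get("name") == name), None)

def verify_created(name="test-lifecycle"):
    """Live check against a running backend."""
    agents = requests.get(f"{BASE}/api/agents", headers=HEADERS).json()
    agent = find_agent(agents, name)
    assert agent is not None, f"{name} missing from GET /api/agents"
    assert agent.get("status") == "running", f"unexpected status: {agent.get('status')}"
    return agent
```

Call `verify_created()` once the UI shows the new agent card.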
### 2. Start Agent
**Action**: Click "Start" button on stopped agent
**Expected**:
- Button shows loading spinner
- Status changes to "running"
- Trinity meta-prompt injected
**Verify**:
- [ ] UI shows "running" badge
- [ ] Docker: container status is "Up"
- [ ] Trinity injection: Agent has planning commands
### 3. Stop Agent
**Action**: Click "Stop" button
**Expected**:
- Status changes to "stopped"
- Container stops but isn't removed
### 4. Delete Agent
**Action**: Click trash icon, confirm deletion
**Expected**:
- Agent removed from list
- Container deleted
- All resources cleaned up
**Edge Cases**:
- [ ] Duplicate name (should fail with 400)
- [ ] Unauthorized delete (should fail with 403)
- [ ] Start already running agent (idempotent)
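The edge cases above can be scripted too; a sketch assuming agents are created via `POST /api/agents` (endpoint and payload shape are assumptions, and the token is a placeholder):

```python
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer <your-token>"}  # placeholder token

def check(label, expected, actual):
    """Format one pass/fail line comparing HTTP status codes."""
    mark = "✅" if expected == actual else "❌"
    return f"{mark} {label}: expected {expected}, got {actual}"

def run_edge_cases():
    """Live checks against a running backend."""
    # Duplicate name should be rejected with 400
    dup = requests.post(f"{BASE}/api/agents", headers=HEADERS,
                        json={"name": "test-lifecycle", "template": "local:default"})
    print(check("duplicate name", 400, dup.status_code))

    # Delete without credentials should be rejected with 403
    unauth = requests.delete(f"{BASE}/api/agents/test-lifecycle")
    print(check("unauthorized delete", 403, unauth.status_code))
```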
**Cleanup**:
- [ ] Delete test-lifecycle if exists
- [ ] `docker ps -a | grep test-` - verify no orphans
**Last Tested**: 2025-12-07
**Status**: ✅ All tests passed

When implementing a feature:
- Read feature flow: Understand what to build
- Implement the feature: Write the code
- Follow testing instructions: Execute each test step
- Document results: Update "Last Tested" and "Status"
- Report issues: If anything fails, document in "Issues"

When modifying a feature:
- Update feature flow: Document the changes
- Update testing section: Add new test steps
- Run all tests: Ensure nothing broke
- Update timestamp: Mark as tested

When debugging:
- Read testing section: See how feature should work
- Reproduce issue: Follow test steps
- Identify failure point: Find where actual ≠ expected
- Fix and retest: Follow all steps again
Before marking a feature as ✅ Implemented:
- [ ] Feature flow document exists
- [ ] Testing section completed with instructions
- [ ] All test steps executed successfully
- [ ] Edge cases tested
- [ ] "Last Tested" timestamp updated
- [ ] "Status" marked as ✅ Working
- [ ] Changelog entry added
- [ ] Requirements.md updated if new feature
Add automated tests only when:
- Feature broke in production - Prevent regression
- Critical user path - Must never break (auth, agent creation)
- Complex edge cases - Hard to test manually every time
- API contract - External integrations that need stability
How to add:
- Create `tests/integration/test_{feature}.py`
- Link to the feature flow in a header comment
- Focus on the specific scenario that needs automation
- Keep tests simple and focused
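A skeleton following these conventions might look like this (the endpoint and expected codes are illustrative, not the project's actual suite):

```python
# tests/integration/test_agent_lifecycle.py  (illustrative)
# Feature flow: docs/memory/feature-flows/agent-lifecycle.md
import requests

BASE = "http://localhost:8000"

def auth_headers(token):
    """Build the Authorization header used by every request."""
    return {"Authorization": f"Bearer {token}"}

def test_agent_list_requires_auth():
    # Unauthenticated requests should be rejected
    resp = requests.get(f"{BASE}/api/agents")
    assert resp.status_code in (401, 403)
```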
- **Python `requests` library**:

```python
import requests

token = 'eyJhbGc...'
headers = {'Authorization': f'Bearer {token}'}
response = requests.get('http://localhost:8000/api/agents', headers=headers)
```
- **Browser DevTools** - Best for integration testing user flows
  - Network tab: Monitor API calls, check request/response
  - WebSocket frames: Verify real-time updates
  - Application tab: Check localStorage persistence
- **Postman/Insomnia** - GUI tools for manual API testing
**curl with bash variables** - Tokens can be truncated due to shell escaping:

```bash
# DON'T DO THIS - token may be truncated
TOKEN='eyJhbGc...'
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/api/agents
```

Why it fails:
- Bash variable expansion mishandles special characters
- JWT tokens contain `-`, `_`, `.` which can cause issues
- Use Python or save the token to a file instead
If you get 401 Unauthorized:
- ✅ Test with Python `requests` first (rules out client issues)
- ✅ Check backend logs for the actual error
- ✅ Verify token expiration (`exp` claim)
- ❌ Don't assume it's a backend bug - verify with multiple clients
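Checking the `exp` claim needs no extra library; a JWT payload is just base64url-encoded JSON. This sketch decodes it without verifying the signature, which is fine for debugging:

```python
import base64, json, time

def jwt_payload(token):
    """Decode a JWT's payload segment without verifying the signature."""
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))

def is_expired(token):
    return jwt_payload(token).get("exp", 0) < time.time()

# Throwaway token built inline just to demonstrate the check
payload = base64.urlsafe_b64encode(json.dumps({"exp": 0}).encode()).decode().rstrip("=")
demo = f"header.{payload}.signature"
print(is_expired(demo))  # an exp of 0 is long past, so this prints True
```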
Many Trinity features use WebSocket for real-time updates:
- Browser DevTools → Network → WS
- Click the WebSocket connection (`ws://localhost:8000/ws`)
- View the Messages tab for incoming events
| Event | Trigger | Payload |
|---|---|---|
| `agent_created` | Create agent | `{name, type, status}` |
| `agent_started` | Start agent | `{name, trinity_injection}` |
| `agent_stopped` | Stop agent | `{name}` |
| `agent_deleted` | Delete agent | `{name}` |
| `agent_collaboration` | Agent-to-agent chat | `{source_agent, target_agent}` |
| `agent_activity` | Tool calls, chat | `{agent_name, activity_type}` |
- Open two browser tabs
- Trigger action in Tab 1
- Verify Tab 2 receives WebSocket update
- Check DevTools for event payload
Many features interact with Docker containers:
```bash
# List Trinity agents
docker ps --filter "label=trinity.platform=agent"

# Check agent container
docker inspect agent-{name} | grep -E '"Status"|"Running"'

# View agent logs
docker logs agent-{name} --tail 50

# Check container labels
docker inspect agent-{name} --format '{{json .Config.Labels}}' | jq

# Execute command in agent
docker exec agent-{name} ls -la /home/developer/
```

Quick health checks:

```bash
# Agent running?
docker ps | grep agent-{name}

# Agent has Trinity injection?
docker exec agent-{name} ls -la /home/developer/.trinity/

# Agent has planning commands?
docker exec agent-{name} ls -la /home/developer/.claude/commands/trinity/
```

Most database state can be verified via API:
```
# Agent ownership
GET /api/agents/{name}   # Returns owner info

# Chat sessions
GET /api/agents/{name}/chat/sessions

# Activities
GET /api/activities/timeline?activity_types=agent_collaboration
```

Or query the database directly:

```bash
# Connect to database
sqlite3 ~/trinity-data/trinity.db
```

```sql
-- Check agent ownership
SELECT * FROM agent_ownership WHERE agent_name = 'test-agent';

-- Check activities
SELECT * FROM agent_activities ORDER BY created_at DESC LIMIT 10;

-- Check chat sessions
SELECT * FROM chat_sessions WHERE agent_name = 'test-agent';
```

✅ Working - All tests pass, feature works as documented
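The same queries can be scripted with the standard library's `sqlite3` module. The sketch below runs against an in-memory database with a guessed schema (the real `trinity.db` columns may differ):

```python
import sqlite3

# Point this at ~/trinity-data/trinity.db on a live install;
# ":memory:" plus a guessed schema keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_ownership (agent_name TEXT, owner TEXT)")
conn.execute("INSERT INTO agent_ownership VALUES ('test-agent', 'test@ability.ai')")

row = conn.execute(
    "SELECT owner FROM agent_ownership WHERE agent_name = ?", ("test-agent",)
).fetchone()
print(row[0])  # test@ability.ai
```

Parameterized queries (`?` placeholders) avoid the quoting problems that plague shell one-liners.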
| Feature | Flow Doc | Last Tested | Status |
|---|---|---|---|
| Agent Create/Start/Stop/Delete | agent-lifecycle.md | 2025-12-07 | ✅ |
| Agent Terminal | agent-terminal.md | 2025-12-25 | ✅ |
| Agent Network Dashboard | agent-network.md | 2025-12-07 | ✅ |
| Execution Queue | execution-queue.md | 2025-12-06 | ✅ |
| Activity Stream | activity-stream.md | 2025-12-02 | ✅ |
| Agent Sharing | agent-sharing.md | 2025-11-28 | ✅ |
| Scheduling | scheduling.md | 2025-11-28 | ✅ |
| File Browser | file-browser.md | 2025-12-01 | ✅ |
| Credential Injection | credential-injection.md | 2025-12-01 | ✅ |
| MCP Orchestration | mcp-orchestration.md | 2025-11-27 | ✅ |
| Email Authentication | email-authentication.md | 2025-12-26 | ✅ |
| GitHub Sync | github-sync.md | 2025-11-29 | ✅ |
| Agent Replay Mode | agent-network-replay-mode.md | 2025-12-02 | ✅ |
| Agents Page UI | agents-page-ui-improvements.md | 2025-12-07 | ✅ |
| System Settings | system-wide-trinity-prompt.md | 2025-12-14 | ✅ 19/19 |
- Before implementing: Read the feature flow document
- During development: Use TodoWrite to track test steps
- After implementing: Execute all tests, update timestamps
- On issues: Document in flow, create bug fix task
Located in the `tests/` directory. Run with pytest:

```bash
cd tests
source .venv/bin/activate
python -m pytest -v --tb=short
```

Latest Results (2025-12-09):
- 142 tests collected
- 110 passed (77.5%)
- 25 skipped (agent-server direct tests - require running agent)
- 0 failures ✅
Test Categories:
| Category | Tests | Status |
|---|---|---|
| Authentication | 12 | ✅ All pass |
| Agent Lifecycle | 21 | ✅ All pass |
| Agent Chat | 11 | ✅ All pass |
| Agent Files | 8 | ✅ All pass |
| Agent Sharing | 7 | ✅ All pass |
| Credentials | 11 | ✅ All pass |
| Execution Queue | 6 | ✅ All pass |
| MCP Keys | 8 | ✅ All pass |
| Schedules | 9 | ✅ All pass |
| Templates | 7 | ✅ All pass |
| Git Sync | 6 | ✅ All pass |
Reports:
- HTML Report: `tests/reports/test-report.html`
- Coverage Report: `tests/reports/coverage/index.html`
**Approach**: Manual testing via documented instructions > Automated tests
**Principle**: Load context first, test thoroughly, then mark complete