Skip to content

Commit eb1a7ff

Browse files
committed
Add process tests
1 parent 4b94b36 commit eb1a7ff

16 files changed

Lines changed: 6221 additions & 3 deletions
Lines changed: 310 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,310 @@
1+
# AI Agent — LLM Programming Guidelines
2+
3+
Supplement to `CCTBX_LLM_PROGRAMMING_GUIDELINES.md`.
4+
This document covers patterns and pitfalls specific to
5+
the AI Agent codebase (`agent/`, `knowledge/`,
6+
`programs/ai_agent.py`). Read the CCTBX guide first —
7+
everything there applies here too.
8+
9+
---
10+
11+
## 1. Parameter Verification Against programs.yaml
12+
13+
LLMs frequently hallucinate PHENIX command-line
14+
arguments. The agent has a specific defense: the
15+
command sanitizer strips any parameter not in the
16+
program's `strategy_flags` allowlist in
17+
`knowledge/programs.yaml`.
18+
19+
When generating or modifying agent code that builds
20+
commands:
21+
22+
- Check `programs.yaml` for the program's
23+
`strategy_flags` section. If a parameter is not
24+
listed, the sanitizer will strip it — your code
25+
will silently have no effect.
26+
- If you need a new strategy flag, add it to the
27+
YAML first, then use it in code.
28+
- The YAML also defines `inputs` (required/optional
29+
file slots), `invariants` (auto-fill rules), and
30+
`defaults` (always appended). Understand all four
31+
sections before modifying command building.
32+
33+
Example: to add `rebuild_in_place=False` for
34+
autobuild, you must first verify that
35+
`rebuild_in_place` is in autobuild's `strategy_flags`.
36+
If it isn't, the sanitizer strips it and the parameter
37+
never reaches PHENIX.
38+
39+
---
40+
41+
## 2. Logging Conventions
42+
43+
The codebase uses three logging patterns in different
44+
contexts. Don't mix them.
45+
46+
**`agent/` and `knowledge/` modules:**
47+
Use `logging.getLogger(__name__)`:
48+
49+
```python
50+
logger = logging.getLogger(__name__)
51+
logger.debug("update failed", exc_info=True)
52+
```
53+
54+
**`programs/ai_agent.py`:**
55+
Use `self.vlog` at the appropriate verbosity level:
56+
57+
```python
58+
self.vlog.quiet("Fatal: %s" % error) # always
59+
self.vlog.normal("Decision: %s" % prog) # default
60+
self.vlog.verbose("File list: %s" % f) # detail
61+
```
62+
63+
**Test files:**
64+
Use `print()`:
65+
66+
```python
67+
print(" PASS: test_name")
68+
print(" SKIP (ai_agent.py not found)")
69+
```
70+
71+
No `print()` in `agent/` modules. No `logger` in
72+
test files.
73+
74+
---
75+
76+
## 3. Import Fallbacks
77+
78+
The agent code runs in two contexts: inside PHENIX
79+
(where modules live under `libtbx.langchain`) and
80+
standalone (where they're imported directly). All
81+
cross-module imports in `agent/` and `knowledge/`
82+
must have a fallback:
83+
84+
```python
85+
try:
86+
from libtbx.langchain.agent.structure_model \
87+
import StructureModel
88+
except ImportError:
89+
from agent.structure_model import StructureModel
90+
```
91+
92+
Never use bare `from libtbx.langchain.X import Y`
93+
without a fallback `except ImportError` block.
94+
95+
This is different from the general cctbx pattern
96+
(where optional imports set the name to `None`).
97+
In the agent, both paths must resolve to the same
98+
class — the code that follows uses it unconditionally.
99+
100+
---
101+
102+
## 4. Analysis Mode Routing
103+
104+
`ai_analysis.py` has five `analysis_mode` values, but
105+
the AI Agent only uses four of them. Understanding
106+
which modes need the server is critical:
107+
108+
| Mode | Used by | Needs server |
109+
|------|---------|:------------:|
110+
| `standard` | `phenix.ai_analysis` (standalone) | **Yes** (RAG DB) |
111+
| `directive_extraction` | Agent (session start) | No |
112+
| `advice_preprocessing` | Agent (session start) | No |
113+
| `failure_diagnosis` | Agent (on terminal error) | No |
114+
| `agent_session` | Agent (session end) | No |
115+
116+
The agent **never uses `standard` mode**. That mode is
117+
the standalone `phenix.ai_analysis` program which does
118+
retrieval-augmented generation against the Phenix
119+
knowledge base.
120+
121+
The four agent modes are pure LLM calls routed locally
122+
when `run_on_server=False` or `provider=ollama`. If you
123+
add a new analysis mode, decide whether it needs the
124+
RAG database and add it to `_LLM_ONLY_MODES` in
125+
`run_job_on_server_or_locally()` if it doesn't.
126+
127+
The agent's THINK node does its own log analysis via
128+
`thinking_prompts.py` and the expert knowledge base —
129+
it does not go through `ai_analysis.py` at all.
130+
131+
---
132+
133+
## 5. Three Error Classification Systems
134+
135+
Error handling is split across three independent
136+
systems. When adding a new error pattern, you must
137+
check all three to ensure they agree:
138+
139+
1. **`agent/error_classifier.py`** — called by
140+
PERCEIVE at the start of the next cycle. Five
141+
categories: TERMINAL, PHIL_ERROR, AMBIGUOUS_PHIL,
142+
LABEL_ERROR, RETRYABLE. Feeds `should_pivot()`.
143+
144+
2. **`agent/error_analyzer.py`** — YAML-driven.
145+
`recoverable_errors.yaml` for auto-fixable errors
146+
(ambiguous labels). `diagnosable_errors.yaml` for
147+
terminal errors needing LLM diagnosis.
148+
149+
3. **`programs/ai_agent.py::_classify_error()`**
150+
oldest classifier. Maps to INPUT_ERROR (agent's
151+
fault, don't count) or REAL_FAILURE (count it).
152+
Used only for cycle history bookkeeping.
153+
154+
See ARCHITECTURE.md "Design tensions" for the full
155+
overlap analysis and recommended consolidation path.
156+
157+
---
158+
159+
## 6. Client-Server Code Path Awareness
160+
161+
Changes to files in `agent/` and `knowledge/` are
162+
server-side — they take effect immediately for all
163+
users. Changes to `programs/ai_agent.py` are
164+
client-side — users must update their install.
165+
166+
When modifying `programs/ai_agent.py`, identify
167+
whether your change is in the **top half** (client
168+
code: `run()`, `_run_single_cycle()`,
169+
`_inject_user_params()`) or the **bottom half**
170+
(server code called when `run_on_server=False`).
171+
Top-half changes require user updates; bottom-half
172+
changes are effectively server-side.
173+
174+
See ARCHITECTURE.md "Client-Server Update Model"
175+
for the full execution split diagram.
176+
177+
---
178+
179+
## 7. Session State Persistence
180+
181+
Every piece of state the agent needs across cycles
182+
must appear in three places:
183+
184+
1. Written to `session.data["key"]` after the cycle
185+
2. Read from `session.data["key"]` on resume
186+
3. Included in `create_initial_state()` if passed
187+
through the graph
188+
189+
When you add a new state field, grep for all three
190+
patterns and verify the field appears in each. Also
191+
register new `session_info` fields in
192+
`agent/contract.py` with a default value.
193+
194+
---
195+
196+
## 8. CHANGELOG Format
197+
198+
CHANGELOG entries for the agent follow a specific
199+
format:
200+
201+
```
202+
## Version NNN.NN (Short Title)
203+
204+
### Summary
205+
One paragraph describing the change.
206+
207+
### New files (N)
208+
| File | Lines | Purpose |
209+
|------|-------|---------|
210+
| `path/to/file.py` | 500 | Description |
211+
212+
### Modified files (N)
213+
| File | Changes |
214+
|------|---------|
215+
| `path/to/file.py` | What changed |
216+
217+
### Tests
218+
N new tests.
219+
Run with: `python3 tests/run_all_tests.py`
220+
```
221+
222+
See `docs/project/CHANGELOG.md` for examples.
223+
224+
---
225+
226+
## 9. Test Patterns Specific to the Agent
227+
228+
### File-exists guards
229+
230+
Tests that open files outside `agent/` and
231+
`knowledge/` (e.g., `programs/ai_agent.py`,
232+
`wxGUI2/Programs/AIAgent.py`) MUST guard with
233+
`os.path.isfile()` because these directories may
234+
not be present in all deployment contexts:
235+
236+
```python
237+
def test_something_in_ai_agent():
238+
path = os.path.join(PROJECT_ROOT,
239+
"programs", "ai_agent.py")
240+
if not os.path.isfile(path):
241+
print(" SKIP (ai_agent.py not found)")
242+
return
243+
with open(path) as f:
244+
source = f.read()
245+
assert_in("expected_string", source)
246+
```
247+
248+
Files in `agent/` and `knowledge/` are always
249+
present — no guard needed.
250+
251+
### Source-scanning search windows
252+
253+
Tests that scan source code for specific patterns
254+
must account for function length:
255+
256+
```python
257+
# WRONG — function may be longer than 5000 chars
258+
idx = src.find("def my_function")
259+
assert "expected" in src[idx:idx + 5000]
260+
261+
# RIGHT — search to the next function definition
262+
idx = src.find("def my_function")
263+
next_def = src.find("\ndef ", idx + 1)
264+
assert "expected" in src[idx:next_def]
265+
```
266+
267+
### Don't swallow test failures
268+
269+
`run_tests_with_fail_fast()` raises `AssertionError`
270+
on failure. If your `run_all_tests()` wraps it in a
271+
try/except, failures are silently swallowed:
272+
273+
```python
274+
# WRONG — test failures discarded
275+
def run_all_tests():
276+
try:
277+
run_tests_with_fail_fast()
278+
except Exception:
279+
pass
280+
281+
# RIGHT — let failures propagate
282+
def run_all_tests():
283+
run_tests_with_fail_fast()
284+
```
285+
286+
---
287+
288+
## 10. Checklist (Agent-Specific Additions)
289+
290+
In addition to the CCTBX checklist:
291+
292+
- [ ] New strategy flags added to `programs.yaml`
293+
before use in command-building code
294+
- [ ] New error patterns checked against all three
295+
classification systems
296+
- [ ] New `analysis_mode` values added to
297+
`_LLM_ONLY_MODES` if they don't need the RAG DB
298+
- [ ] Client vs server code path identified for any
299+
changes to `programs/ai_agent.py`
300+
- [ ] Session state fields appear in `session.data`,
301+
`create_initial_state()`, and `contract.py`
302+
- [ ] File-exists guards for tests opening files
303+
outside `agent/` or `knowledge/`
304+
- [ ] Source-scanning tests use function-boundary
305+
windows, not fixed character counts
306+
- [ ] CHANGELOG entry follows the standard table
307+
format
308+
- [ ] Documentation updated (OVERVIEW.md,
309+
ARCHITECTURE.md, DEVELOPER_GUIDE.md) for
310+
user-visible or architectural changes

0 commit comments

Comments
 (0)