Current state: prototype. It works but is fragile: all trust is implicit, all files are world-writable, and nothing is validated. Fine for a rental box experiment, not for production or untrusted agents.

| Threat | Severity | Current State |
|---|---|---|
| Concurrent write corruption | HIGH | No file locking on results.tsv |
| best/config race condition | HIGH | No read/write lock |
| Score self-reporting | MEDIUM | Agents report their own scores |
| Blackboard injection | MEDIUM | No message validation |
| No audit trail | MEDIUM | Append-only by convention only |
| Stale prompt persistence | HIGH (proven) | Prompts written once at launch |
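Serialize appends to results.tsv with `flock`:

    # Replace: echo "$result" >> results.tsv
    # Hold an exclusive lock on fd 9 for the duration of the append.
    ( flock 9 && printf '%s\n' "$result" >> results.tsv ) 9>/tmp/results.lock

The harness should write the score to a machine-readable file. The agent reads that file instead of parsing logs.

Make results.tsv append-only at the filesystem level:

    chattr +a results.tsv  # Linux only; requires root

Allow only the CLAIM, RESPONSE, REFUTE, and REQUEST prefixes on blackboard messages. Reject anything else.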
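
The prefix allow-list for blackboard messages could be enforced with a small filter like the one below. This is a minimal sketch, not the project's implementation. It assumes one message per line in the form `PREFIX: body` or `PREFIX body`, which the notes above do not specify. The function names `is_valid_message` and `post_message`, and passing the blackboard path as an argument, are hypothetical.

```sh
#!/bin/sh
# Sketch: accept a blackboard message only if it starts with an allowed prefix.
# Assumption (not from the source): one message per line, formatted as
# "PREFIX: body" or "PREFIX body".

is_valid_message() {
    # Reject multi-line input so one post cannot smuggle in a second message.
    newline='
'
    case $1 in
        *"$newline"*) return 1 ;;
    esac
    # The prefix must be followed by ":" or a space, so "CLAIMS ..." is rejected.
    for prefix in CLAIM RESPONSE REFUTE REQUEST; do
        case $1 in
            "${prefix}:"*|"${prefix} "*) return 0 ;;
        esac
    done
    return 1
}

# post_message MESSAGE BOARD_FILE: append MESSAGE to BOARD_FILE if it is valid.
# BOARD_FILE is a hypothetical path; the source does not name the blackboard file.
post_message() {
    if is_valid_message "$1"; then
        printf '%s\n' "$1" >> "$2"
    else
        echo "rejected blackboard post" >&2
        return 1
    fi
}
```

Validating at write time keeps malformed or injected text off the blackboard entirely, so readers do not each need their own filter.
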
Read the prompt from a shared location every round instead of from the static .agent-prompt.txt.
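
One way to re-read the prompt each round is sketched below. It assumes the shared prompt is a single file; the names `load_prompt`, `publish_prompt`, `PROMPT_FILE`, and `run_round` are hypothetical. The publisher writes a temporary file and renames it into place, so a reader never sees a half-written prompt (a rename within one filesystem is atomic).

```sh
#!/bin/sh
# Sketch: re-read the prompt at the start of every round.
# All names here are illustrative, not taken from the project.

# load_prompt PATH: print the current prompt, or fail if it is missing or empty.
load_prompt() {
    if [ -r "$1" ] && [ -s "$1" ]; then
        cat "$1"
    else
        echo "prompt file missing or empty: $1" >&2
        return 1
    fi
}

# publish_prompt NEW_FILE PATH: replace the shared prompt atomically.
# The temp file sits next to PATH so that mv is a same-filesystem rename.
publish_prompt() {
    tmp="$2.tmp.$$"
    cp "$1" "$tmp" && mv "$tmp" "$2"
}

# Usage inside an agent loop (run_round stands in for the per-round work):
# while :; do
#     prompt=$(load_prompt "$PROMPT_FILE") || { sleep 5; continue; }
#     run_round "$prompt"
# done
```
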
Run a watchdog that alerts if any agent hasn't written to run.log in more than 10 minutes.
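
The staleness check behind such a watchdog might look like the sketch below. It assumes each agent has its own run.log and that `find` supports `-mmin` (GNU and BSD `find` do, but it is not in POSIX). The function name `stale_logs`, the `agents/*/run.log` layout, and treating a missing log as stale are all assumptions.

```sh
#!/bin/sh
# Sketch: report run.log files that have not been modified recently.

# stale_logs MINUTES LOG...: print each LOG that is missing or was last
# modified more than MINUTES minutes ago.
stale_logs() {
    limit=$1
    shift
    for log in "$@"; do
        if [ ! -e "$log" ]; then
            # Assumption: an agent that never wrote a log counts as stale.
            echo "$log"
        elif [ -n "$(find "$log" -mmin +"$limit")" ]; then
            echo "$log"
        fi
    done
}

# Usage (the agents/*/run.log layout and the stderr alert are placeholders):
# while :; do
#     stale_logs 10 agents/*/run.log | while read -r log; do
#         echo "ALERT: no write to $log in more than 10 minutes" >&2
#     done
#     sleep 60
# done
```
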
If agents were different models or adversarial:
- Cryptographic signing of results
- Consensus on best/ updates
- Rate limiting on blackboard posts
- Isolated containers per agent
- Human-in-the-loop for best/ updates
See docs/ARCHITECTURE.md for the full security analysis.