AnimaWorks runs autonomous AI agents with tool access, persistent memory, and inter-agent communication. This creates a fundamentally different threat surface than stateless LLM wrappers — agents can read files, execute commands, send messages, and operate on schedules without human intervention.
This document describes the layered security model and an adversarial threat analysis based on cutting-edge LLM/agent attack research (OWASP Top 10 for LLM 2025, AdapTools, MemoryGraft, ChatInject, RoguePilot, MCP Tool Poisoning, RAGPoison, Confused Deputy attacks).
Last audited: 2026-03-06
| Threat | Attack Vector | Impact |
|---|---|---|
| Prompt injection via external data | Web search results, Slack/Chatwork messages, emails | Agent executes attacker-controlled instructions |
| RAG / Memory poisoning | Malicious web content → knowledge → persistent recall | Long-term behavioral drift across all sessions |
| Lateral movement between agents | Compromised agent sends malicious DMs to peers | Privilege escalation across the organization |
| Confused Deputy attack | Low-privilege agent tricks high-privilege agent | Unauthorized tool execution, data exfiltration |
| Consolidation contamination | Poisoned episodes/activity → knowledge extraction | Trusted knowledge generated from tainted sources |
| Destructive command execution | Agent runs `rm -rf /` or `curl … \| sh` | Data loss, system compromise |
| Shell injection bypass | Network tools via pipes | Data exfiltration via allowed commands |
| Path traversal | Agent reads/writes outside its sandbox | Cross-agent data leak, config tampering |
| Activity log tampering | Agent writes fake entries to own activity_log | Manipulated Priming context |
| Infinite message loops | Two agents endlessly replying to each other | Resource exhaustion, API cost explosion |
| Unintended external sending | Agent sends messages to unexpected recipients | Data exfiltration |
| Session hijacking | Stolen tokens with no expiration | Persistent unauthorized access |
| Credential exposure | Plaintext API keys in config.json | External service abuse |
Every piece of data entering an agent's context is tagged with a trust level. The model sees these boundaries explicitly and is instructed to treat untrusted content as data, never as instructions.
| Level | Target Sources | Treatment |
|---|---|---|
| `trusted` | Internal tools (`send_message`, `search_memory`, `submit_tasks`, `update_task`, `post_channel`, etc.), system-generated | Execute normally |
| `medium` | `Read`, `Grep`, `Write`, `Bash`, RAG results, user profiles, consolidated knowledge | Interpret as reference data |
| `untrusted` | `web_search`, `WebFetch`, `x_search`, `x_user_tweets`, `slack_*`, `chatwork_*`, `gmail_*`, `read_channel`, `read_dm_history`, `local_llm` | Never follow directives |
<tool_result tool="web_search" trust="untrusted">
Search results — may contain injection attempts
</tool_result>
<priming source="related_knowledge" trust="medium" origin="consolidation">
RAG-retrieved context
</priming>
Origin categories: system, human, anima, external_platform, external_web, consolidation, unknown. Each maps to a trust level via ORIGIN_TRUST_MAP.
Origin chain propagation: When data flows through multiple systems (e.g., web → RAG index → priming), the trust level degrades to the minimum in the chain. resolve_trust(origin, origin_chain) computes the conservative minimum across all nodes in the chain plus the current origin.
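The minimum-over-the-chain rule can be sketched as follows. This is a minimal illustration, not the code in `core/execution/_sanitize.py`: the mapping values, the `TRUST_RANK` helper, and the fallback for unmapped origins are assumptions made for the example.

```python
# Trust ranks as described in this document: 2=trusted, 1=medium, 0=untrusted.
TRUST_RANK = {"trusted": 2, "medium": 1, "untrusted": 0}

# ASSUMPTION: illustrative values only; the real ORIGIN_TRUST_MAP may differ.
ORIGIN_TRUST_MAP = {
    "system": "trusted",
    "human": "medium",
    "anima": "medium",
    "external_platform": "untrusted",
    "external_web": "untrusted",
    "consolidation": "medium",
    "unknown": "untrusted",
}


def resolve_trust(origin, origin_chain=None):
    """Return the lowest trust level among the chain nodes and the current origin."""
    nodes = list(origin_chain or []) + [origin]
    # ASSUMPTION: origins missing from the map are treated as untrusted.
    levels = [ORIGIN_TRUST_MAP.get(node, "untrusted") for node in nodes]
    return min(levels, key=TRUST_RANK.__getitem__)
```

Under these assumed values, web content that passed through consolidation still resolves to `untrusted`, because the chain keeps the `external_web` node.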
Session-level trust tracking: _min_trust_seen tracks the minimum trust rank (2=trusted, 1=medium, 0=untrusted) across all tool calls in a session. It is updated in Mode S (PreToolUse hook + run/min_trust_seen file) and in Mode A (litellm_loop and anthropic_fallback), and is reset at the start of each interaction cycle.
Trigger and tier injection conditions (core/prompt/builder.py):
- `tool_data_interpretation` is in Group 1 but is not injected when `trigger="task"` (TaskExec). TaskExec runs with minimal context, so the model does not receive trust boundary interpretation instructions. Tool results are still wrapped with `wrap_tool_result`, so tags are applied, but note that the "tag interpretation rules" instruction to the model is omitted.
- `permissions` is injected only when `tier != TIER_MINIMAL`. When context is under 16k (TIER_MINIMAL), permissions are omitted.
- `behavior_rules` applies only to TIER_FULL and TIER_STANDARD. Omitted for TIER_LIGHT / TIER_MINIMAL.
- Tier boundaries: 128k+ = FULL, 32k–128k = STANDARD, 16k–32k = LIGHT, under 16k = MINIMAL.
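The injection conditions above can be summarized in a short sketch. The `Tier` enum, `select_tier`, and `injected_sections` are hypothetical names for illustration; the sketch also assumes that "k" means 1,000 tokens and that each lower bound is inclusive, neither of which is stated in this document.

```python
from enum import Enum


class Tier(Enum):
    FULL = "full"
    STANDARD = "standard"
    LIGHT = "light"
    MINIMAL = "minimal"


def select_tier(context_tokens):
    """Map a context window size to a tier using the boundaries listed above."""
    # ASSUMPTION: "k" = 1,000 tokens; lower bounds are inclusive.
    if context_tokens >= 128_000:
        return Tier.FULL
    if context_tokens >= 32_000:
        return Tier.STANDARD
    if context_tokens >= 16_000:
        return Tier.LIGHT
    return Tier.MINIMAL


def injected_sections(tier, trigger):
    """Model only the three conditions described in this section."""
    sections = set()
    if trigger != "task":
        sections.add("tool_data_interpretation")
    if tier is not Tier.MINIMAL:
        sections.add("permissions")
    if tier in (Tier.FULL, Tier.STANDARD):
        sections.add("behavior_rules")
    return sections
```

The combination worth noticing is `trigger="task"` on a MINIMAL tier, where none of the three sections reaches the model.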
Key files: core/execution/_sanitize.py (trust resolution, boundary wrapping, TOOL_TRUST_LEVELS, ORIGIN_TRUST_MAP), core/prompt/builder.py (trigger/tier prompt construction, tool_data_interpretation injection conditions), templates/{locale}/prompts/tool_data_interpretation.md (trust level and origin chain interpretation instructions; locale depends on config.locale)
When an agent writes to knowledge/*.md, the system checks _min_trust_seen for the session. If the session has processed untrusted (rank 0) or medium (rank 1) tool results, an origin frontmatter is added:
- Rank 0 (untrusted) → `origin: external_web`
- Rank 1 (medium) → `origin: mixed`
- Rank 2 (trusted) → no origin tag (clean knowledge)
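A minimal sketch of the rank-to-origin mapping, assuming the origin is written as a plain YAML frontmatter block at the top of the file. The function names are hypothetical, and merging with frontmatter that already exists in the file is not handled here.

```python
def origin_for_rank(min_trust_seen):
    """Map the session's lowest trust rank to an origin tag (None means clean)."""
    if min_trust_seen <= 0:
        return "external_web"
    if min_trust_seen == 1:
        return "mixed"
    return None


def with_origin_frontmatter(body, min_trust_seen):
    """Prepend an origin frontmatter block when the session saw lower-trust data."""
    origin = origin_for_rank(min_trust_seen)
    if origin is None:
        return body
    # ASSUMPTION: a fresh frontmatter block; the real layout may carry more keys.
    return f"---\norigin: {origin}\n---\n{body}"
```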
The origin is passed to the RAG indexer and stored in ChromaDB chunk metadata.
index_file() accepts an origin parameter and stores it as metadata["origin"] in chunk metadata.
When Priming retrieves related knowledge via RAG, each chunk's origin metadata is evaluated with resolve_trust(). Chunks are split into:
- trusted/medium → `related_knowledge` (wrapped with `trust="medium"`)
- untrusted → `related_knowledge_external` (wrapped with `trust="untrusted"`, `origin="external_platform"`)
Budget prioritizes trusted/medium content first; untrusted content fills remaining budget.
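The split and the budget ordering might look like the sketch below. It assumes each chunk already carries a resolved trust level and that the budget is measured in characters; the real Priming code may count tokens and may truncate chunks instead of skipping them.

```python
def split_and_budget(chunks, budget_chars):
    """chunks: list of (text, trust) pairs; returns the texts selected per channel."""
    preferred = [c for c in chunks if c[1] in ("trusted", "medium")]
    external = [c for c in chunks if c[1] == "untrusted"]
    selected = {"related_knowledge": [], "related_knowledge_external": []}
    remaining = budget_chars
    # Trusted/medium chunks are placed first; untrusted chunks use what is left.
    for channel, group in (
        ("related_knowledge", preferred),
        ("related_knowledge_external", external),
    ):
        for text, _trust in group:
            if len(text) > remaining:
                continue  # ASSUMPTION: oversized chunks are skipped, not truncated
            selected[channel].append(text)
            remaining -= len(text)
    return selected
```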
Daily consolidation reads YAML frontmatter origin: from source knowledge files. If any source has external origin (external_web, mixed, consolidation_external), the consolidated output is downgraded to origin: consolidation_external (resolves to untrusted).
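The downgrade rule can be illustrated with a small frontmatter reader. This is a sketch under assumptions: `read_origin` and `consolidated_origin` are hypothetical helpers, the frontmatter is parsed line by line rather than with a YAML library, and `consolidation` as the origin of a clean result is assumed from the priming example earlier in this document.

```python
EXTERNAL_ORIGINS = {"external_web", "mixed", "consolidation_external"}


def read_origin(markdown_text):
    """Return the `origin:` value from a leading frontmatter block, or None."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "origin":
            return value.strip()
    return None


def consolidated_origin(source_texts):
    """Downgrade the output when any source file carries an external origin."""
    if any(read_origin(text) in EXTERNAL_ORIGINS for text in source_texts):
        return "consolidation_external"
    # ASSUMPTION: clean consolidation output is tagged "consolidation".
    return "consolidation"
```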
Key files: core/tooling/handler_memory.py (write_memory_file origin propagation), core/memory/rag/indexer.py (origin in chunk metadata), core/memory/priming.py (Channel C trust splitting), core/memory/consolidation.py (origin chain tracking)
Agents can execute shell commands. Five independent layers prevent abuse:
Blocks shell metacharacters that could chain or inject commands:
- Semicolons (`;`), backticks (`` ` ``), newlines (`\n`)
- Command substitution (`$()`, `${}`, `$VAR`)
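A rough sketch of such a metacharacter check is shown below. The pattern is an assumption written for this example and is not the project's `_INJECTION_RE`; pipes are deliberately left out because piped commands are checked per segment in a later layer.

```python
import re

# ASSUMPTION: illustrative pattern covering the characters listed above.
INJECTION_RE = re.compile(r"[;`\n]|\$\(|\$\{|\$[A-Za-z_]")


def has_injection(command):
    """Return True when the command contains a chaining or substitution token."""
    return INJECTION_RE.search(command) is not None
```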
Pattern-matched commands that are always blocked regardless of permissions:
| Pattern | Reason |
|---|---|
| `rm -rf`, `rm -r` | Recursive deletion |
| `mkfs` | Filesystem creation |
| `dd of=/dev/` | Direct disk write |
| `curl\|sh`, `wget\|sh` | Remote code execution |
| `\| sh`, `\| bash`, `\| python`, `\| perl`, `\| ruby`, `\| node` | Pipe to interpreter |
| `nc`, `ncat`, `socat`, `telnet` | Network exfiltration tools |
| `curl -d/-F/-T`, `curl --data`, `wget --post` | Data upload / exfiltration |
| `chmod *7*` | World-writable permissions |
| `shutdown`, `reboot` | System shutdown |
| `> /dev/sd*`, `> /dev/nvme*`, `> /etc/` | Device/system file redirect |
Each agent's permissions.json can define commands.denied_commands for additional blocked commands.
permissions.json uses an "Open by Default, Deny by Exception" model. When commands.allow_all is true (default), all commands are permitted except those in denied_commands and the hardcoded blocklist. When false, only commands in commands.allowlist are permitted.
When commands.allow_all is false, only commands matching the agent's allowlist are permitted.
Command arguments are checked for path traversal patterns (../).
Pipeline segment checking: Each segment of piped commands is checked independently.
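To show how the later layers could compose per pipeline segment, here is a simplified sketch. It is not `_check_command_permission`: the tool sets are small illustrative subsets, `check_command` is a hypothetical name, and the naive split on `|` is an assumption (a quoted pipe is also split, which tends to reject rather than allow).

```python
import re
import shlex

# ASSUMPTION: small illustrative subsets, not the project's real lists.
RECURSIVE_RM = re.compile(r"\brm\s+-[A-Za-z]*r")
NETWORK_TOOLS = {"nc", "ncat", "socat", "telnet"}
INTERPRETERS = {"sh", "bash", "python", "perl", "ruby", "node"}


def check_command(command, allow_all=True, allowlist=(), denied=()):
    """Return (allowed, reason); every pipeline segment must pass every check."""
    for index, segment in enumerate(command.split("|")):
        try:
            argv = shlex.split(segment)
        except ValueError:
            return False, "unparseable segment"
        if not argv:
            return False, "empty pipeline segment"
        program = argv[0]
        # Hardcoded blocklist (subset): applies regardless of permissions.
        if index > 0 and program in INTERPRETERS:
            return False, f"pipe to interpreter: {program}"
        if program in NETWORK_TOOLS or RECURSIVE_RM.search(segment):
            return False, f"blocked pattern: {segment.strip()}"
        # Per-agent denied commands.
        if program in denied:
            return False, f"denied command: {program}"
        # Allowlist applies only when allow_all is false.
        if not allow_all and program not in allowlist:
            return False, f"not in allowlist: {program}"
        # Path traversal: reject any argument with a ".." path component.
        if any(".." in arg.split("/") for arg in argv[1:]):
            return False, f"path traversal: {segment.strip()}"
    return True, "ok"
```

Because each segment is evaluated on its own, an allowed first command cannot smuggle a blocked one behind a pipe.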
Key files: core/tooling/handler_base.py (_BLOCKED_CMD_PATTERNS, _INJECTION_RE), core/tooling/handler_perms.py (_check_command_permission)
Each agent operates within its own directory (~/.animaworks/animas/{name}/). File access is controlled by file_roots in permissions.json — default ["/"] grants full access within the anima directory; restricted roots limit writable paths. Mode C (Codex) sandbox adapts dynamically: file_roots: ["/"] → danger-full-access; restricted → workspace-write with dynamic writable_roots.
These cannot be written by the agent that owns them:
- `permissions.json` — Pydantic-validated tool, file, and command permissions (replaces legacy `permissions.md`)
- `permissions.md` — Legacy file; protected when present (auto-migrated to JSON)
- `identity.md` — Core personality (immutable baseline)
- `bootstrap.md` — First-run instructions
- `activity_log/` — Activity log directory; only `ActivityLogger` (code-level) may append entries
| Path | Direct Report | All Descendants |
|---|---|---|
| `activity_log/` | Read | Read |
| `state/current_state.md`, `pending.md` | — | Read |
| `state/task_queue.jsonl`, `pending/` | — | Read |
| `status.json` | Read/Write | Read |
| `identity.md` | — | Read |
| `injection.md` | Read/Write | Read |
| `cron.md`, `heartbeat.md` | Read/Write | — |
Descendant resolution uses BFS with cycle detection. Peers (same supervisor) can read each other's activity_log/.
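Breadth-first descendant resolution with cycle detection can be sketched as follows. The `direct_reports` mapping and the function name are assumptions for the example; how the hierarchy is actually stored is not described here.

```python
from collections import deque


def resolve_descendants(supervisor, direct_reports):
    """direct_reports maps an agent name to the list of its direct reports."""
    seen = {supervisor}  # visited set doubles as cycle detection
    order = []
    queue = deque(direct_reports.get(supervisor, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited, or a cycle back to an ancestor
        seen.add(name)
        order.append(name)
        queue.extend(direct_reports.get(name, []))
    return order
```

A misconfigured hierarchy in which a subordinate lists its own supervisor terminates normally, because the supervisor is in the visited set from the start.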
Key files: core/tooling/handler_base.py (_PROTECTED_FILES, _PROTECTED_DIRS, _is_protected_write), core/tooling/handler_perms.py (_check_file_permission)
Each agent runs as an independent OS process:
- Process isolation: Crash in one agent doesn't affect others
- Unix Domain Socket IPC: Inter-process communication over filesystem sockets (no TCP)
- Independent locks: Chat, Inbox, and background tasks use separate asyncio locks
- Socket directory: `~/.animaworks/run/sockets/{name}.sock`, with stale socket cleanup on startup
Key files: core/supervisor/manager.py, core/supervisor/ipc.py, core/supervisor/runner.py
- No duplicate DM to the same recipient
- Max 2 distinct DM recipients per execution
- One channel post per channel per session
- Cross-session channel post cooldown (`channel_post_cooldown_s`)
- Persisted to `run/replied_to.jsonl`
- Configurable per-agent send limits (hourly and daily)
- Computed from `activity_log` sliding window
- `ack`, `error`, `system_alert` messages are exempt
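A sliding-window send check along these lines might look like the sketch below. The entry format and the `may_send` name are hypothetical, and the sketch assumes exempt message types neither count toward the window nor get blocked by it.

```python
EXEMPT_TYPES = {"ack", "error", "system_alert"}


def may_send(log_entries, now, hourly_limit, daily_limit, msg_type="message"):
    """log_entries: iterable of (timestamp_seconds, msg_type) for past sends."""
    if msg_type in EXEMPT_TYPES:
        return True
    # ASSUMPTION: exempt messages are not counted against the limits.
    counted = [ts for ts, kind in log_entries if kind not in EXEMPT_TYPES]
    last_hour = sum(1 for ts in counted if now - ts < 3600)
    last_day = sum(1 for ts in counted if now - ts < 86400)
    return last_hour < hourly_limit and last_day < daily_limit
```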
Recent outbound messages (last 2 hours, max 3) are injected into the system prompt via Priming.
- Conversation depth limiter: Configurable max turns within `depth_window_s`
- Inbox rate limiter: Cooldown, cascade detection, per-sender rate limit
- Fail-closed: Returns `False` on activity log read failure
Key files: core/tooling/handler_comms.py, core/cascade_limiter.py, core/supervisor/inbox_rate_limiter.py, core/memory/priming.py
| Mode | Use Case |
|---|---|
| `local_trust` | Development — localhost requests bypass auth |
| `password` | Single-user password protection |
| `multi_user` | Multiple users with individual accounts |
- Argon2id password hashing (memory-hard, side-channel resistant)
- 48-byte URL-safe tokens (cryptographically random)
- Max 10 sessions per user — oldest evicted on overflow
- Session TTL — `config.server.session_ttl_days` (default: 7 days). Expired sessions are rejected and removed in `validate_session()`.
- Password change revokes sessions — `change_password()` calls `revoke_all_sessions()` to invalidate all sessions
- Cookie-based session transport with middleware guard on `/api/` and `/ws` routes
- Config files saved with 0600 permissions
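The token, TTL, and eviction rules above can be sketched with an in-memory store. `SessionStore` and its storage layout are assumptions for illustration; password hashing is omitted, and the real logic lives in `core/auth/manager.py`.

```python
import secrets
import time

MAX_SESSIONS_PER_USER = 10


class SessionStore:
    def __init__(self, ttl_days=7):
        self.ttl_seconds = ttl_days * 86400
        self._sessions = {}  # token -> (username, created_at)

    def create_session(self, username, now=None):
        now = time.time() if now is None else now
        owned = sorted(
            (created, token)
            for token, (user, created) in self._sessions.items()
            if user == username
        )
        # Evict the oldest sessions until there is room for one more.
        while len(owned) >= MAX_SESSIONS_PER_USER:
            _, oldest = owned.pop(0)
            del self._sessions[oldest]
        token = secrets.token_urlsafe(48)  # 48 random bytes, URL-safe encoding
        self._sessions[token] = (username, now)
        return token

    def validate_session(self, token, now=None):
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None:
            return None
        username, created = entry
        if now - created > self.ttl_seconds:
            del self._sessions[token]  # expired sessions are removed on check
            return None
        return username

    def revoke_all_sessions(self, username):
        stale = [t for t, (user, _) in self._sessions.items() if user == username]
        for token in stale:
            del self._sessions[token]
```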
When trust_localhost is enabled, requests from loopback addresses are authenticated automatically. Origin and Host header checks mitigate CSRF.
Key files: core/auth/manager.py, server/app.py, server/localhost.py
| Platform | Method | Replay Protection |
|---|---|---|
| Slack | HMAC-SHA256 (signing secret) | Timestamp check (5-minute window) |
| Chatwork | HMAC-SHA256 (webhook token) | — |
Both use constant-time comparison (hmac.compare_digest).
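For the Slack row, the verification can be sketched as below. It follows Slack's published signing scheme (`v0:{timestamp}:{body}` signed with HMAC-SHA256 and compared against a `v0=` prefixed hex digest); the function name and parameters are hypothetical and this is not the code in `server/routes/webhooks.py`.

```python
import hashlib
import hmac
import time


def verify_slack_signature(signing_secret, timestamp, body, signature,
                           now=None, max_skew_s=300):
    """Check the 5-minute replay window, then the HMAC in constant time."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(now - sent_at) > max_skew_s:
        return False  # outside the replay window
    base = b"v0:" + str(timestamp).encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    expected = "v0=" + digest
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Here `body` is the raw request bytes; verifying a re-serialized JSON payload instead would generally produce a different digest.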
Key file: server/routes/webhooks.py
The media proxy (/api/media/proxy) fetches external images for display in the UI:
- HTTPS only
- Domain allowlist or open-with-scan — configurable via `MediaProxyConfig.mode`
- Private IP blocking — localhost, RFC 1918, link-local, multicast, reserved
- DNS resolution check — prevents DNS rebinding
- Content-Type validation — only `image/jpeg`, `image/png`, `image/gif`, `image/webp`; SVG blocked
- Magic bytes verification — validates actual file format matches declared content-type
- Size limit — `max_bytes` (default 5 MB)
- Redirect validation — redirect targets re-validated; max redirect count enforced
- Per-IP rate limiting — configurable (default 30 req/min)
- Security headers — `X-Content-Type-Options: nosniff`
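Two of these checks, private IP blocking and magic bytes verification, can be sketched with the standard library alone. The helper names are hypothetical; the address check should run on the resolved IP rather than the hostname, and the signatures are the standard leading bytes of each format.

```python
import ipaddress


def is_blocked_address(address):
    """Return True for loopback, private, link-local, multicast, or reserved IPs."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # fail closed on anything that is not a literal IP
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


MAGIC_BYTES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def matches_declared_type(content_type, data):
    """Check that the payload starts with the signature of the declared type."""
    if content_type == "image/webp":
        # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
        return data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    return any(data.startswith(sig) for sig in MAGIC_BYTES.get(content_type, ()))
```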
Key file: server/routes/media_proxy.py
When running on Claude Agent SDK (Mode S), additional guardrails apply via PreToolUse hooks:
- Bash command filtering: Separate blocklist for SDK (includes Chatwork CLI bypass prevention, network exfiltration tools, data upload patterns)
- File write protection: Validates against protected file list and sandbox
- File read restriction: Blocks access to other agents' directories (except subordinate/peer activity_log, subordinate management files)
- Output truncation: Bash output capped at 10KB; file reads default-limited to 500 lines; grep/glob also limited
- Trust tracking: `_SDK_TOOL_TRUST` mapping; persisted to `run/min_trust_seen`
Key files: core/execution/_sdk_security.py, core/execution/_sdk_hooks.py
resolve_recipient() prevents agents from sending to unintended recipients:
- Exact match against known agent names (case-sensitive)
- User alias lookup (case-insensitive)
- Platform-prefixed recipients
- Slack User ID pattern match
- Case-insensitive agent name match
- Unknown recipients → RecipientNotFoundError (fail-closed)
Key file: core/outbound.py
DMs carry origin_chain metadata, built by build_outgoing_origin_chain(). Receivers can evaluate the trust lineage of messages.
Messenger.receive() validates from_person against known_animas (config.animas). Unknown from_person is rejected and logged.
Inbox directories are created with 0o700 permissions.
_SAFE_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]{0,30}$") prevents path traversal.
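A short illustration of how this pattern rejects traversal attempts. The `is_safe_name` wrapper is hypothetical, and how the pattern is applied in the codebase is not shown in this document. The sketch uses `fullmatch` because in Python `$` also matches just before a trailing newline, so `match` alone would accept a name such as `"ops\n"`.

```python
import re

_SAFE_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]{0,30}$")


def is_safe_name(name):
    """Accept 1 to 31 characters: a lowercase letter, then [a-z0-9_-]."""
    return _SAFE_NAME_RE.fullmatch(name) is not None
```

Since `/` and `.` are outside the character class, a name that passes cannot contain a path separator or a `..` component.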
Channel posts are limited to max_length=10000 via Pydantic.
Key files: core/messenger.py, core/tooling/handler_comms.py, core/tooling/handler_base.py
Vulnerabilities identified in the initial audit that have been addressed:
| ID | Severity | Title | Resolution |
|---|---|---|---|
| RAG-1 | Critical → Mitigated | Web → Knowledge → RAG Persistent Poisoning | write_memory_file propagates _min_trust_seen as origin frontmatter; RAG indexer stores origin in chunk metadata; Priming Channel C splits trusted/untrusted |
| CON-1 | High → Mitigated | Consolidation Pipeline Contamination | _has_external_origin_in_files() checks source file origins; output downgraded to consolidation_external when external origin present |
| MSG-1 | High → Mitigated | Inbox File-Level Spoofing | from_person validated against known_animas; inbox dirs set to 0o700 |
| BOARD-1 | High → Mitigated | Board Channel Broadcast Poisoning | Auth middleware protects channel POST; content limited to 10,000 chars; channel name regex validation |
| ALOG-1 | High → Resolved | Activity Log Tampering | activity_log/ in _PROTECTED_DIRS; writes blocked via _is_protected_write |
| CMD-1 | High → Resolved | Shell Mode Network Exfiltration | nc, ncat, socat, telnet, curl -d/--data, wget --post added to blocklist |
| AUTH-1 | High → Resolved | Perpetual Session Tokens | TTL check in validate_session() (default 7 days); change_password() calls revoke_all_sessions() |
| DEPUTY-1 | Medium → Mitigated | Confused Deputy Privilege Escalation | origin_chain metadata in messages; from_person validation; trust boundary instructions in tool_data_interpretation |
| ID | Category | Title | Status |
|---|---|---|---|
| CFG-1 | Config | Plaintext Credential Storage | Partial (per-tool env_var fallback exists; no env-only mode in CredentialConfig) |
| ID | Category | Title | Status |
|---|---|---|---|
| IPC-1 | Network | Socket File Permissions | Not implemented (no chmod 0o700 on Unix sockets) |
| WS-1 | Network | Voice WebSocket Audio Injection | Partial (60s buffer max; no max frame size or PCM format validation) |
| OB-1 | Rate Limit | Multi-Agent Distributed Spam | Not implemented (per-sender rate limit only; no per-recipient aggregate) |
| PR-1 | Memory | PageRank Graph Manipulation | Not implemented (no trust-weighted PageRank) |
| SKILL-1 | Memory | Skill Description Keyword Stuffing | Not implemented (no mitigation in 3-tier matching) |
| PI-1 | Prompt | Tool Trust Level Registration Gap | Not implemented (unlisted tools fall back to untrusted; no CI check) |
| CMD-2 | Execution | Denied List Partial Match Bypass | Not implemented (substring matching; no shutil.which() resolution) |
| EXT-1 | External | Indirect Injection via External Sources | Mitigated by trust labeling; no additional regex filter |
| LEAK-1 | Info Disclosure | System Prompt Leakage | Partial (trust rules exist; no explicit anti-leak instruction) |
| ID | Category | Title | Status |
|---|---|---|---|
| AUTH-2 | Auth | Localhost Trust Over-Permission | Not implemented (no X-Forwarded-For support) |
| FILE-1 | File | Symlink Following in allowed_dirs | Not implemented (uses resolve(); no strict symlink rejection) |
| WS-2 | Network | WebSocket JSON Schema Laxity | Not implemented (no Pydantic validation for voice WebSocket JSON) |
| OB-2 | Rate Limit | Activity Log Write Bypass | Not implemented (send does not depend on activity log success) |
| ACCESS-1 | Memory | RAG Access Count Inflation | Not implemented (no access_count cap) |
┌─────────────────────────────────────────────────────────┐
│ External Data │
│ (Web, Slack, email, Board, DM, etc.) │
└────────────────────────┬────────────────────────────────┘
│
┌──────────▼──────────┐
│ Trust Boundary │ ← untrusted/medium/trusted tags
│ Labeling │ ← origin chain propagation
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Auth & Session │ ← Argon2id, TTL-enforced sessions
│ Management │ ← Webhook HMAC verification
└──────────┬──────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌────▼────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Command │ │ File Access │ │ Outbound │
│ Security│ │ Control │ │ Rate Limit │
│ (5-layer│ │ (sandbox + │ │ (3-layer + │
│ check) │ │ ACL) │ │ cascade) │
└────┬────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────┐ │ ┌────────────────┘
│ │ │
┌──────▼──▼──▼────────┐
│ Memory Provenance │ ← origin tracking in RAG/knowledge
│ (trust chain) │ ← Channel C trust splitting
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Process Isolation │ ← per-agent OS process
│ (Unix sockets) │ ← independent locks
└─────────────────────┘
Each layer operates independently. A failure in one layer is caught by others.
| Priority | ID | Action | Effort |
|---|---|---|---|
| 1 | IPC-1 | `chmod 0o700` on socket files and `run/` directory | XS |
| 2 | PI-1 | CI check for tool trust level registration completeness | XS |
| 3 | ACCESS-1 | Access count cap + per-session deduplication | XS |
| Priority | ID | Action | Effort |
|---|---|---|---|
| 4 | CFG-1 | Env-var-only credential mode; agent-unreadable paths for `config.json` | M |
| 5 | WS-1 | Max frame size + PCM format validation | S |
| 6 | OB-1 | Per-recipient rate limit across all agents | S |
| 7 | LEAK-1 | Anti-leak instruction in system prompt; output monitoring | S |
| 8 | CMD-2 | `shutil.which()` resolution + basename comparison | S |
| Priority | ID | Action | Effort |
|---|---|---|---|
| 9 | PR-1 | Trust-weighted PageRank | M |
| 10 | EXT-1 | Injection pattern regex filter on external data | M |
| 11 | AUTH-2 | Reverse proxy guidance; `X-Forwarded-For` support | S |
| 12 | ALOG+ | Append-only hash chain for activity log | M |
| 13 | MSG+ | HMAC message signing between agents | L |
Effort scale: XS = less than 1 hour, S = 1–4 hours, M = 4–16 hours, L = more than 16 hours
| Document | Description |
|---|---|
| Provenance Foundation | Trust resolution and origin categories |
| Input Boundary Labeling | Tool result and priming trust tagging |
| Trust Propagation | Origin chain across data flows |
| RAG Provenance | Trust tracking in vector search |
| Mode S Trust | Agent SDK security hooks |
| Command Injection Fix | Pipe and newline injection |
| Path Traversal Fix | common_knowledge and create_anima path validation |
| Memory Write Security | Protected files and cross-mode hardening |