security: Phase B/C top-10 remediation + full re-audit wave (1 critical + highs) by AVADSA25 · Pull Request #154 · AVADSA25/codec

AVADSA25 · 2026-05-29T10:43:49Z

Supersedes #153 — same Phase B/C commits plus a full re-audit remediation wave. Phase A (C1/C2/C3/C6/C7) is already on main (PR #152). One squash-merge lands everything below.

All TDD'd; 224 tests green across touched modules; ruff clean; rebased cleanly on current main.

Phase B/C — top-10 completion

Fix	Findings
C4 — `run_with_timeout`, kill hanging ThreadPoolExecutor	concurrency hang
C5/H14/M6 — SQLite RLock + cross-process file_lock	state races
H1 — SSRF guard (web_fetch + crew)	SSRF
H2/H6 — approve refuses shadowing a pinned built-in	skill overwrite
C8 — jsonstore Phase 0/2/5 + STATE-FILES.md	ad-hoc state writers
C9 — chat_completion → helpers (356→252)	god-module
test coverage — A-12 CI guard + OAuth scope tests	—

Re-audit wave (fresh specialist sweep of the post-remediation tree)

Fix	Sev	What
N1	🔴 Critical	`/ws/voice` was unauthenticated — BaseHTTPMiddleware never runs on the websocket scope. Added `_ws_authorized` mirroring HTTP auth; reject before `accept()`.
N3	🟠 High	SSRF guard bypassed by HTTP redirects — both fetch sites now follow redirects manually, validating every hop. (patched the Fix #7 gap)
N4/N10	🟠 High	`file_write`/`file_ops` resolved only the parent dir → symlinked final component slipped the blocklist. Realpath the full path + open the resolved target.
N5	🟠 High	TOTP bypass via the `?s=` session fallback (skipped `totp_verified`). Both paths now route through `_session_token_valid`.
N7	🟠 High	vision chat path raised a raw 500 on backend hiccup → graceful 502.
N8	🟠 High	observer leaked `sys.path` every poll (dead lines) → removed.
N9	🟠 High	`_research_jobs` unbounded + unlocked → lock + TTL eviction (mirrors `_agent_jobs`).
N19/N20/N21	🟢 Low	AST gate now blocks `import builtins`; skill-approve fails closed when the manifest can't be read; PIN compared with `hmac.compare_digest`.

Still requires you (not in this PR)

Fix fix(tests): skip destructive skills in test_mcp_all_tools + incident report #6 + the dependency CVEs — the live-env work: fastmcp 9.3 SSRF (CVE‑2026‑32871) is installed on the internet-facing codec-mcp-http; plus urllib3/pillow/cryptography/requests/starlette CVEs and the PM2 interpreter pin. requirements.txt already floors the safe versions; the installed venvs are stale. Needs you to confirm the venv path + watch the pm2 restart.
Deferred follow-ons: feat(self_improve): migrate to plugin hook architecture (Phase 1 Step 4) #8 routes/chat.py module split, feat(observer): continuous observation loop (Phase 2 Step 5) #9 jsonstore Phases 1/3/4 (docs/STATE-FILES.md), plus the medium re-audit items (consent gate on chat/MCP skill paths, atomic config writes, step-budget off-by-one) and the ~515 LOC dead-code cleanup.

🤖 Generated with Claude Code

The idiom `with ThreadPoolExecutor() as ex: fut.result(timeout=T)` defeats its own timeout: the context-manager __exit__ calls shutdown(wait=True), which blocks until the runaway task finishes. A 50ms timeout on a 5s task took ~5s, so codec_mcp tool dispatch and codec_observer OCR could hang on a slow skill or a stuck screencapture popup. Add codec_concurrency.run_with_timeout (daemon thread + join(timeout), no shutdown(wait=True)): surfaces the timeout promptly, abandons the daemon worker (never blocks shutdown), re-raises fn exceptions, and raises TimeoutError (== concurrent.futures.TimeoutError on py3.11+). Migrate both call sites; mirrors the proven codec_hooks._run_hook_with_timeout pattern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CodecMemory shared one sqlite3 connection across threads (check_same_thread=False) with no app-level lock — concurrent save/search on the same connection can corrupt cursor state or even segfault (the new test reproduced a hard crash before the fix). Add a threading.RLock and serialize every runtime connection use (save, search, search_recent, get_sessions, cleanup, rebuild_fts, close). RLock so get_context->search and cleanup->close don't self-deadlock; WAL + busy_timeout stay the cross-process layer. Route the unguarded ~/.codec JSON read-modify-writes through the cross-process file_lock: - grant_permission (routes/agents.py): new codec_agent_plan.grants_lock (mirrors _status_lock) held across load_grants -> modify -> save_grants -> set_grants_hash, so concurrent /grant calls can't clobber grants.json. - both notifications.json writers in codec_ask_user (question-write + mark-read) now also hold file_lock(NOTIFICATIONS_PATH), matching the pairing the pending_questions path already used. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…s (H1) Attacker-influenceable URLs (chat injection, clipboard, crew task) could be fetched against 127.0.0.1, 10/8, 192.168/16, 169.254.169.254 (cloud metadata), etc. — and the body flows back into the LLM transcript, so a read is an exfil path even with egress otherwise denied. Add codec_ssrf.validate_url: enforces http(s), resolves the host, and rejects if ANY resolved address is loopback / private / link-local / multicast / reserved / unspecified (dual-record + IPv4-mapped-IPv6 aware). Wire it BEFORE the HTTP call in skills/web_fetch.py (which clipboard_url_fetch delegates to) and in codec_agents._web_fetch. DNS-rebinding TOCTOU limitation documented. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…(H2/H6) The review-and-approve flow wrote an approved skill to the skills dir by basename with no name-collision check. An approved skill named after a hash-pinned built-in (calculator.py, file_write.py, ...) would take that trusted name — shadowing the real built-in, or overwriting it if the write dir is the repo skills dir. skill_approve now refuses any filename matching a name in the committed skills/.manifest.json (case-insensitive, so the guard also holds on case-insensitive filesystems), emitting skill_approve_blocked. PR-1A's load-time hash/AST gate remains the last line of defense; this refuses at the write point so the trusted file is never displaced. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#10) - tests/test_a12_invariant.py: fails if a new inline `.post(...chat/completions ...)` appears outside codec_llm. Allowlists the documented vision sites (dashboard, watcher, both bridges, screenshot_text) + codec_core's generated session-script string. Runs in CI via the full pytest suite (no separate workflow step needed). Includes a synthetic-violation test proving the guard is not a no-op. - tests/test_oauth_provider.py: regression tests for the refresh-token scope-escalation defense (requested scopes must be a subset of the original grant) — previously untested. permission_gate mutation cases are already covered by test_agent_runner.py (8 tests incl. D-5 traversal/symlink); not duplicated. The flaky time.sleep -> threading.Event refactor is deferred — no specific flaky test identified, and rewriting passing tests without evidence risks regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…store registry) Design-only, per CLAUDE.md §11 + the SECURITY-REMEDIATION-DESIGN gate for the two remaining structural fixes. No production code. Each note grounds scope, seam, migration phasing, test plan, rollback, and open decisions in the actual HEAD surface (codec_dashboard chat cluster ~lines 2207-3005; ~30 ad-hoc json.dump writers + 2 duplicate atomic helpers). Awaiting sign-off before implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…k + registry (C8) Fix #9, scoped to the high-value, low-risk core (recommended decisions applied): - Phase 0: atomic_write_json gains `default=` (custom serializer, e.g. str for datetime) and `sort_keys=` passthrough; new read_modify_write(path, fn) standardizes lock+load+mutate+atomic-write so a future RMW can't forget the file_lock. - Phase 2: routes/_shared._save_notification now holds the cross-process file_lock across its load->insert->write (it ran under the in-process _notif_lock only, so separate PM2 daemons clobbered each other). - Phase 5: docs/STATE-FILES.md registry of every ~/.codec state file + lock policy + migration status. Deferred to a documented follow-on (see STATE-FILES.md): Phase 1 mass overwrite-durability swaps; Phase 3 helper convergence (entangles the don't-touch pending_questions/grants write paths — byte-preserving, reviewed per-helper); Phase 4 don't-touch files (config.json, OAuth/Google tokens) with per-file sign-off. The broad json.dump CI ratchet was rejected as low-signal (rationale in STATE-FILES.md). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…start) Fix #8, intra-file CC reduction. The routes/chat.py module split is deferred: chat_completion shares _load_prompt_overrides + CHAT_SYSTEM_PROMPT with the prompt-override endpoints, so a clean split needs those relocated first (tracked in docs/FIX8-ROUTES-CHAT-EXTRACTION-DESIGN.md). Extract two of the heaviest blocks out of the 356-line chat_completion into named, independently-testable helpers — behavior-preserving (the 48-test chat/stream/budget/escalation baseline is byte-identical before/after): - _build_chat_system_prompt(config, budget, has_attachment, last_user_text): override + step-budget + attachment / content-rewrite / observer-inject logic. - _chat_vision_response(body, messages): the image -> vision-model branch. chat_completion: 356 -> 252 lines. Adds tests/test_chat_helpers.py (5 unit tests now possible because the logic is no longer trapped inside the handler). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AuthMiddleware subclasses Starlette BaseHTTPMiddleware, which only runs on the `http` scope — so the /ws/voice WebSocket handshake was completely unauthenticated. Combined with the voice pipeline running any trigger-matched skill (terminal/file_ops/pilot, no allowlist), an exposed dashboard (Cloudflare tunnel / dashboard_host=0.0.0.0) was an open, auth-free path to code execution. Add _ws_authorized() mirroring the HTTP AuthMiddleware Layers 0/1/2 (open when nothing configured; else dashboard_token via ?token=/Bearer, OR a valid TOTP-verified biometric session cookie) and reject before accept(). The middleware author already intended /ws to be gated (the 401 branch checks path.startswith("/ws")) — it just never executed on the websocket scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…re-audit N3) Fix #7 validated only the INITIAL URL, but both fetch sites follow redirects by default (requests allow_redirects=True; httpx follow_redirects=True), so a public URL returning 302 -> http://169.254.169.254/ (cloud metadata) or http://127.0.0.1/ reached the internal target unchecked and returned its body into the LLM transcript — defeating the H1 guard. Both sites now follow redirects MANUALLY with allow_redirects/follow_redirects False, calling codec_ssrf.validate_url on every hop and capping at 5 redirects. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… (re-audit N4/N10) Both _is_safe_target (file_write) and _is_safe_path (file_ops) realpath-resolved only the PARENT dir then re-joined the raw basename, so a symlinked FINAL component (e.g. ~/Documents/x.md -> ~/.zshrc, or x.json -> ~/.codec/ oauth_state.json) passed the blocklist and open() followed it — a write-through to ~/.zshrc/~/.codec (RCE/secret overwrite) or a read-exfil of secrets over the MCP-exposed surface, defeating D-4. Realpath the FULL path in both safety checks (resolves a symlinked basename to its blocked target), and open()/read()/write() the resolved target so the op can't be redirected between check and open (TOCTOU). realpath handles not-yet-existing files. Matches codec_agents._file_write, which already did it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AuthMiddleware's ?s=<token> GET fallback (for mobile img/stream URLs) checked only token existence + age — NOT totp_verified, unlike the cookie path. Since auth_verify/auth_pin hand out a usable session token BEFORE TOTP is completed, a pre-TOTP (or stolen pre-TOTP) token could reach any GET /api endpoint via ?s=, bypassing 2FA. Extract _session_token_valid() (existence + age + TOTP) and route BOTH the cookie path (_verify_biometric_session) and the ?s= fallback through it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…lock/TTL (re-audit N7/N8/N9) N7: _chat_vision_response runs OUTSIDE chat_completion's try/except, so a non-200/malformed vision-backend response surfaced as a raw 500. Guard the POST + parse and return a graceful 502. N8: codec_observer._ocr_call did sys.path.insert() twice per poll (up to ~5760 entries/day) — and the very next comment says no import happens. Dead + a slow unbounded leak; removed. N9: _research_jobs had no lock and no eviction (unlike _agent_jobs post-H-4) — unbounded memory growth + a dict-changed-size race vs the worker. Add _evict_stale_research_jobs (+ _research_jobs_lock) mirroring the agent-jobs pattern; deep_research_start evicts + guards the add/update under the lock. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…re-audit N19/N20/N21) N19: add `builtins` to _SKILL_DANGEROUS_MODULES — `import builtins; builtins.exec(src)` bypassed the gate (builtins.exec is an Attribute call, not the bare-name Call the dangerous-calls check catches). N20: _pinned_builtin_names() returned an empty set on a manifest read failure, so skill_approve's pinned-name guard blocked nothing during a transient failure. Return None on failure and FAIL CLOSED (refuse the approve). N21: routes/auth.py compared the PIN hash with `==` (timing side-channel) — use hmac.compare_digest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

#6) Live-env investigation (read-only + staged, rolled back) found: the deployed python3.13 runs fastmcp 3.1.1 while requirements declares >=3.3.1, and an in-place upgrade to 3.3.1 left a broken install state on that interpreter. In a CLEAN env the code imports fine on 3.3.1 — so the fix is a clean reinstall, NOT a code port. The fastmcp SSRF (CVE-2026-32871) is in OpenAPIProvider, which CODEC does not use, so it isn't exploitable here (hygiene, not an emergency). - requirements.txt: add advisory-safe floors for the transitive/runtime deps that were unpinned — urllib3>=2.7.0, pillow>=12.2.0, cryptography>=46.0.7. (fastmcp>=3.3.1 and requests>=2.34.2 were already correct; the deployed interpreters were just stale vs the declared floors.) - tests/test_fastmcp_compat.py: assert fastmcp meets the 3.3.1 floor and that codec_mcp + codec_oauth_provider import against it — runs in CI and would have caught both the version drift and any future fastmcp API move. NO live mutation: the deployed venvs are untouched (rolled back to their original state). Refresh them at a maintenance window — see PR runbook. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AVADSA25 · 2026-05-29T10:54:01Z

Fix #6 — live dependency-refresh runbook (run at a maintenance window)

The deployed interpreters are stale vs the (now CVE-safe) requirements.txt floors. None of the advisories is exploitable in CODEC's usage (fastmcp SSRF = OpenAPIProvider, unused; pillow = image-parsing, only generates QR; urllib3 = ProxyManager / malicious-server vs SSRF-guarded web_fetch) — so this is hygiene. The codec-mcp-http interpreter (/usr/local/bin/python3.13) had a broken in-place fastmcp upgrade state, so use a clean reinstall, and verify imports BEFORE restarting the service:

# 1) codec-mcp-http interpreter — clean fastmcp reinstall (in-place upgrade left a broken state)
/usr/local/bin/python3.13 -m pip uninstall -y fastmcp fastmcp-slim
/usr/local/bin/python3.13 -m pip install "fastmcp>=3.3.1" "urllib3>=2.7.0" "pillow>=12.2.0" "cryptography>=46.0.7" "requests>=2.34.2"
# VERIFY before any restart — must print OK:
/usr/local/bin/python3.13 -c "import codec_mcp, codec_oauth_provider; from fastmcp import FastMCP; print('OK')"
pm2 restart codec-mcp-http && pm2 logs codec-mcp-http --lines 30   # confirm it comes up clean

# 2) the Homebrew interpreter (dashboard/bridges/etc.)
/opt/homebrew/bin/python3 -m pip install "urllib3>=2.7.0" "pillow>=12.2.0" "cryptography>=46.0.7" "requests>=2.34.2"
pm2 restart codec-dashboard   # + any other Homebrew-python services as needed

starlette (BadHost, CVE-2026-48710) is a major bump requiring a coordinated fastapi upgrade + middleware review — deliberately deferred, not in the floors above.

The repo + tests are already on the safe floors; tests/test_fastmcp_compat.py guards the fastmcp floor↔import consistency in CI going forward.

…mediums) CHAIN-002: skill_forge writes forged code to disk WITHOUT the review gate, yet was on CHAT_SKILL_ALLOWLIST — a prompt-injected [SKILL:skill_forge:...] tag could persist an unsandboxed skill. Removed it (+ the stale ask_codec_to_build entry that had no backing skill); create_skill stays as the review-gated path. Atomic config writes (truncated-read → empty-fallback race): route save_triggers (custom_triggers.json), custom-agent save, and update_schedule (schedules.json) through codec_jsonstore.atomic_write_json instead of truncate-then-write, so a concurrent reader can't catch a half-written file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The Step-3 strict-consent gate lives only in codec_agent_runner; the chat [SKILL:] + MCP tool paths reach high-power skills with only is_dangerous / path-blocklists (red-team CHAIN-001/002/006). This >1-module, UX-changing fix is design-gated per CLAUDE.md §11 — note covers the two chokepoints (codec_dispatch.run_skill, codec_mcp.tool_fn), the per-transport consent UX (MCP hard-refuse / chat prompt / voice announce), destructive classification, test/rollback, and the open decisions. No code until approved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…1/C) — and regen skill manifest Consent gate, increment 1 (Decisions B1 + C): - codec_consent: is_destructive_skill (per-skill SKILL_DESTRUCTIVE flag via the registry + _HTTP_BLOCKED backstop + known high-power built-ins) + gate_enabled kill switch (CONSENT_GATE_ENABLED). - codec_skill_registry: AST-extract SKILL_DESTRUCTIVE + get_destructive() accessor (parallel to SKILL_MCP_EXPOSE) — the extensible per-skill path. - codec_mcp.tool_fn: hard-refuse destructive skills over MCP (claude.ai can't consent at the operator tier) — closes the red-team's file_ops/imessage_send/ pilot-over-MCP chains; emits a `denied` audit. Also REGENERATES skills/.manifest.json: the earlier Fix #7 (web_fetch) and N4/N10 (file_write, file_ops) edits changed those built-ins' bytes, so their hash-pins were stale — the load-time gate would have AST-refused file_write/ file_ops (they import os) in production, and CI --check would fail. Manifest now matches the 76 built-ins. Chat-side consent (A1: consent_required + re-dispatch) is the next increment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Consent gate increment 2 (Decision A2 — reuse the proven Phase 1 Step 3 PWA consent panel, no new frontend). Both chat skill-runners now gate destructive skills before firing: - _try_skill (pre-LLM hijack) and _try_skill_by_name (post-LLM [SKILL:] tag — the prompt-injection vector) call codec_consent.chat_consent_ok, which for a destructive skill routes through codec_ask_user.ask(destructive=True, asked_from="chat") — literal verb-match consent rendered by the existing PWA panel; declined/timeout/ask_user-unavailable → skill skipped (fail-closed). - Non-destructive skills are unaffected (no prompt). Kill switch: CONSENT_GATE_ENABLED=false. With B1 (MCP hard-refuse) this closes the red-team's cross-cutting root cause: the chat [SKILL:] + MCP paths reached high-power skills with only is_dangerous. is_dangerous is retained as the layered terminal command heuristic (Decision D). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AVADSA25 · 2026-05-29T11:37:25Z

Update — re-audit mediums + consent gate landed

Added since the PR opened:

skill_forge off the chat allowlist + atomic config writes (mediums).
Strict-consent gate (design: docs/REAUDIT-CONSENT-GATE-DESIGN.md):
- B1 — MCP hard-refuse of destructive skills (codec_mcp.tool_fn) — closes the red-team's file_ops/imessage_send/pilot-over-MCP chains.
- A2 — chat consent via the existing AskUserQuestion PWA panel: destructive skills (pre-LLM hijack + post-LLM [SKILL:] tag) require literal-verb consent before firing; benign skills unaffected; kill switch CONSENT_GATE_ENABLED.
- C — per-skill SKILL_DESTRUCTIVE flag (registry AST-extracted) + _HTTP_BLOCKED/known-builtin backstop; D — is_dangerous retained as the layered terminal heuristic.
Manifest regen — the Fix fix(tests): hotfix v2 — skip Terminal-popup skills (memory_search/clipboard/self_improve) #7/N4/N10 edits to web_fetch/file_write/file_ops changed those built-ins' bytes; skills/.manifest.json is regenerated so they stay hash-pinned-trusted (otherwise file_write/file_ops would be AST-refused at load, and CI --check would fail). --check now clean.

230 tests green across the touched modules. Every Critical/High/actionable-Medium from both audits is now in this PR.

Mikarina13 and others added 14 commits May 29, 2026 11:39

AVADSA25 mentioned this pull request May 29, 2026

security: audit remediation Phase B/C (#4,#5,#7,#9,#10 + #8 partial) #153

Closed

Mikarina13 and others added 4 commits May 29, 2026 13:02

AVADSA25 merged commit 6d27e0b into main May 29, 2026
1 check passed

AVADSA25 mentioned this pull request May 29, 2026

security: re-audit tail — traversal guard, step-budget parity, jsonstore Phase 1 #155

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: Phase B/C top-10 remediation + full re-audit wave (1 critical + highs)#154

security: Phase B/C top-10 remediation + full re-audit wave (1 critical + highs)#154
AVADSA25 merged 19 commits into
mainfrom
security-reaudit-wave2

AVADSA25 commented May 29, 2026

Uh oh!

AVADSA25 commented May 29, 2026

Uh oh!

AVADSA25 commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AVADSA25 commented May 29, 2026

Phase B/C — top-10 completion

Re-audit wave (fresh specialist sweep of the post-remediation tree)

Still requires you (not in this PR)

Uh oh!

AVADSA25 commented May 29, 2026

Fix #6 — live dependency-refresh runbook (run at a maintenance window)

Uh oh!

AVADSA25 commented May 29, 2026

Update — re-audit mediums + consent gate landed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants