feat: pets system, GPU passthrough, NemoClaw health scan & secrets hardening#84
Conversation
Note — Reviews paused: It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
📝 Walkthrough

Adds an MCP gateway service and GPU Ollama variant to docker-compose; integrates MCP-based GitHub context and a leaderboard into broski-pets-bridge; introduces live code health scanning (script, cached report, Discord cog/commands), pre-build safety checks, secrets/env additions, config tweaks (ignore empty envs), and documentation/CLAUDE updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Bridge as Broski Bridge
    participant MCP as MCP Gateway
    participant GitHub as GitHub (via MCP tool)
    participant LLM as Local LLM / Ollama
    rect rgba(173,216,230,0.5)
        Bridge->>MCP: GET /tools/list (discover available tools)
        MCP-->>Bridge: tools metadata
    end
    rect rgba(144,238,144,0.5)
        Bridge->>MCP: POST /tools/call (invoke GitHub search tool + args)
        MCP->>GitHub: execute tool request
        GitHub-->>MCP: tool output (chunked text)
        MCP-->>Bridge: aggregated tool output (SSE / JSON-RPC)
    end
    rect rgba(255,228,181,0.5)
        Bridge->>LLM: construct prompt (include GitHub context) and send
        LLM-->>Bridge: response (stream/SSE)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Around line 1723-1727: Add a start_period to the GPU container healthcheck to
avoid false-negative restarts during GPU driver/CUDA warm-up: update the
healthcheck block (the keys test: ["CMD", "ollama", "list"], interval, timeout,
retries) to include start_period: 30s–60s (use 60s for slower cold starts) so
the container won’t be marked unhealthy while the runtime warms; also apply the
same change to the CPU healthcheck block for parity.
- Around line 1700-1735: The GPU and CPU Ollama services conflict on port,
volume, and DNS; make them mutually exclusive by putting the CPU variant behind
a "cpu" profile (add profile: ["cpu"] to the hypercode-ollama service), remove
the host port publication and the network alias from hypercode-ollama-gpu (drop
ports: and the agents-net.aliases entry) so it does not collide, and set
container_name and/or hostname "hypercode-ollama" on each variant
(hypercode-ollama and hypercode-ollama-gpu) so whichever profile is active
answers at the expected name; also update dependents (e.g., broski-pets-bridge
depends_on) to rely on the canonical name or ensure they are profile-aware; if
you ever need both to run, give hypercode-ollama-gpu a distinct host port and
alias instead.
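The start_period suggestion in the first comment can be sketched as a compose healthcheck fragment (a sketch, using the 60s upper bound recommended for slow cold starts):

```yaml
healthcheck:
  test: ["CMD", "ollama", "list"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s  # grace window: probe failures during GPU/CUDA warm-up don't count toward retries
```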
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```yaml
hypercode-ollama-gpu:
  image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
  profiles: ["gpu"]
  volumes:
    - ollama-data:/root/.ollama
  ports:
    - "127.0.0.1:11434:11434"
  networks:
    agents-net:
      aliases:
        - hypercode-ollama
  deploy:
    resources:
      limits:
        cpus: "2"
        memory: 3G
      reservations:
        cpus: "1"
        memory: 1G
        devices:
          - driver: nvidia
            count: all
            capabilities: ["gpu"]
  healthcheck:
    test: ["CMD", "ollama", "list"]
    interval: 30s
    timeout: 10s
    retries: 3
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
  security_opt:
    - no-new-privileges:true
```
🧩 Analysis chain
🌐 Web query:
Docker Compose deploy.resources.reservations.devices nvidia GPU syntax latest
💡 Result:
The syntax for reserving NVIDIA GPUs in Docker Compose under deploy.resources.reservations.devices is as follows (latest as of official Docker docs):

```yaml
services:
  your-service:
    image: nvidia/cuda:12.0-base-ubuntu22.04  # or your GPU-enabled image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1  # or 'all', or integer
              capabilities: [gpu]
```

Key points:
- Required: `driver: nvidia` and `capabilities: [gpu]`
- Optional: `count: 1` (number of GPUs or `all`), `device_ids: ['0', '1']` (specific GPU IDs from nvidia-smi; mutually exclusive with `count`), `options: {key: value}` for driver options.
- Use `docker compose up` (not `docker stack deploy` for Swarm, as `deploy` is ignored in Swarm mode per older notes).
- Prerequisites: NVIDIA Container Toolkit installed, Docker daemon configured for GPUs.

Example for all GPUs:

```yaml
devices:
  - driver: nvidia
    count: all
    capabilities: [gpu]
```

Example for specific devices:

```yaml
devices:
  - driver: nvidia
    device_ids: ['0', '3']
    capabilities: [gpu]
```

This matches the official Docker Compose GPU support documentation and NVIDIA examples.
Citations:
- 1: https://docs.docker.com/compose/how-tos/gpu-support/
- 2: https://docs.docker.com/compose/gpu-support/
- 3: https://docs-stage.docker.com/compose/how-tos/gpu-support/
- 4: https://docs.nvidia.com/ai-workbench/user-guide/latest/reference/projects/compose-patterns-reference.html
🏁 Script executed:
```bash
#!/bin/bash
# Search for documentation files mentioning NVIDIA Container Toolkit or GPU setup
echo "=== Searching for GPU/NVIDIA documentation in repo ==="
# Search in common doc locations
find . -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name "SETUP*" -o -name "*install*" -o -name "*docker*" \) 2>/dev/null | head -20

# Search for NVIDIA/GPU mentions
echo ""
echo "=== Searching for 'nvidia' or 'GPU' mentions in documentation ==="
rg -i "nvidia.*toolkit|gpu.*prerequisite|container.*toolkit" --type md --type txt 2>/dev/null || echo "No matches found"

# Check if there's a docker or compose specific setup doc
echo ""
echo "=== Checking for docker-compose related docs ==="
find . -type f \( -name "*.md" -o -name "*.txt" \) -exec grep -l "docker-compose\|docker compose" {} \; 2>/dev/null | head -10
```

Repository: welshDog/HyperCode-V2.4
Length of output: 2989
🏁 Script executed:
```bash
#!/bin/bash
# Read the files that mention NVIDIA Container Toolkit
echo "=== Content of hyperstudio-platform/README.md (NVIDIA section) ==="
grep -A 5 -B 2 "NVIDIA" hyperstudio-platform/README.md 2>/dev/null || echo "File not found or no match"
echo ""
echo "=== Content of hyperstudio-platform/hyperstudio-setup.md (NVIDIA section) ==="
grep -A 5 -B 2 "NVIDIA" hyperstudio-platform/hyperstudio-setup.md 2>/dev/null || echo "File not found or no match"
echo ""
echo "=== Content of docs/NEXT_MOVES.md (NVIDIA/GPU section) ==="
grep -A 5 -B 2 "NVIDIA\|GPU" docs/NEXT_MOVES.md 2>/dev/null || echo "File not found or no match"
# Check main README for GPU setup info
echo ""
echo "=== Main README.md GPU/NVIDIA references ==="
grep -i "gpu\|nvidia\|ollama" README.md 2>/dev/null | head -20
```

Repository: welshDog/HyperCode-V2.4
Length of output: 6580
Critical: CPU and GPU Ollama services collide on host port, shared volume, and network alias.
The hypercode-ollama-gpu service is additive (profile-gated) whilst hypercode-ollama has no profile and therefore always starts. Running docker compose --profile gpu up brings up both simultaneously, which produces three concurrent conflicts:
- Host port bind clash — both publish `127.0.0.1:11434:11434`. The second container to start will fail with `bind: address already in use`.
- Concurrent writes to the same data dir — both mount the `ollama-data` volume at `/root/.ollama`. Two `ollama serve` processes writing to the same blob store, manifests, and lockfile risk data corruption and flapping health checks.
- Duplicate DNS alias on `agents-net` — the GPU variant aliases itself as `hypercode-ollama`, which is also the default service's implicit alias. Docker's embedded DNS will round-robin across both IPs, so clients like `hypercode-core`, `broski-pets-bridge`, and `agent-x` (all hardcoded to `http://hypercode-ollama:11434` per `backend/app/core/config.py:41`, `agents/agent-x/designer.py:22`, `agents/broski-pets-bridge/main.py:467`) may land on either endpoint unpredictably.

Additionally, `broski-pets-bridge` has `depends_on: hypercode-ollama` only, so activating the GPU profile does not gate dependents on GPU readiness.
Make the two variants mutually exclusive so operators pick one. The cleanest pattern is to put the CPU variant behind a cpu profile, drop the duplicate host-port publication and DNS alias from the GPU variant (internal service name hypercode-ollama-gpu), and let callers continue to resolve hypercode-ollama by promoting whichever variant is active via container_name/hostname.
🛠️ Suggested fix — profile-gate the CPU variant and keep a single service name resolvable
```diff
 hypercode-ollama:
+  profiles: ["cpu", "default"]
   image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
   container_name: hypercode-ollama
   volumes:
     - ollama-data:/root/.ollama
   ports:
     - "127.0.0.1:11434:11434"
   networks:
     - agents-net

 hypercode-ollama-gpu:
   image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
+  container_name: hypercode-ollama  # reuse the canonical name so clients resolve it
   profiles: ["gpu"]
   volumes:
     - ollama-data:/root/.ollama
-  ports:
-    - "127.0.0.1:11434:11434"
   networks:
-    agents-net:
-      aliases:
-        - hypercode-ollama
+    - agents-net
```

Operators then run either `docker compose --profile cpu up` (default) or `docker compose --profile gpu up`, never both. If you need both variants to coexist for benchmarking, give the GPU variant a distinct host port (e.g. `11435:11434`) and a distinct service DNS name, and add an environment switch for `OLLAMA_HOST` on clients.
Note: The GPU syntax deploy.resources.reservations.devices with driver: nvidia, count: all, capabilities: [gpu] is confirmed to match the latest Docker Compose specification. NVIDIA Container Toolkit prerequisites are documented in docs/NEXT_MOVES.md with platform-specific installation instructions and verification steps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` around lines 1700 - 1735, The GPU and CPU Ollama services
conflict on port, volume, and DNS; make them mutually exclusive by putting the
CPU variant behind a "cpu" profile (add profile: ["cpu"] to the hypercode-ollama
service), remove the host port publication and the network alias from
hypercode-ollama-gpu (drop ports: and the agents-net.aliases entry) so it does
not collide, and set container_name and/or hostname "hypercode-ollama" on each
variant (hypercode-ollama and hypercode-ollama-gpu) so whichever profile is
active answers at the expected name; also update dependents (e.g.,
broski-pets-bridge depends_on) to rely on the canonical name or ensure they are
profile-aware; if you ever need both to run, give hypercode-ollama-gpu a
distinct host port and alias instead.
Add detailed system documentation covering architecture, roadmap, security posture, and operational procedures to help the AI assistant understand the HyperCode ecosystem. This includes repository structure, completed phases, network configuration, known issues, and key paths. Also add openssl command to Claude settings for security testing purposes.
Actionable comments posted: 6
🧹 Nitpick comments (5)
.claude/CLAUDE_CONTEXT.md (2)
205-217: Add an explicit safety preface for command scope/environment before destructive ops.

The command block includes high-impact operations (`docker compose build --no-cache`, `alembic upgrade head`). Add a brief "pre-flight" note (target environment, backup/rollback expectation, and repo-root requirement) to reduce accidental misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/CLAUDE_CONTEXT.md around lines 205 - 217, Add a short pre-flight safety preface above the command block that clearly states the target environment (e.g., local/dev vs staging/prod), requires running from the repository root, and reminds users to have a backup or rollback plan before running destructive commands; mention specific high-impact commands by name (docker compose build --no-cache, docker compose up -d, alembic upgrade head, bash scripts/network-migrate.sh) and suggest using --dry-run or confirmation steps where available.
103-113: Move volatile CVE/version status to a generated report, then link it here.

These sections present time-sensitive values (CVE counts, "all Debian-unfixable", resolved package status) as fixed truth. In an always-read boot/context file, this can go stale quickly and mislead security decisions.
Based on learnings, this file class is treated as a first-read entrypoint, so stale operational/security facts have outsized impact.
Also applies to: 241-259
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/CLAUDE_CONTEXT.md around lines 103 - 113, The "## ✅ Phase 9 — CVE Elimination Results (April 14, 2026)" section (including the CVE table, the "13 Remaining HIGHs" list and the `starlette` resolved note) contains volatile, time-sensitive security status and should be removed from this static CLAUDE_CONTEXT.md and instead generated automatically at build/runtime; move its content into a generated report endpoint/file (e.g., a CI-produced security_report) and replace the section here with a short stable link and a one-line note pointing to that generated report (e.g., "See generated security report for current CVE status"), ensuring any references to `docker.io/runc`, `libexpat1`, `libncursesw6`, `libnghttp2`, `libsystemd0`, and the `starlette`/`fastapi==0.135.3`+`starlette==0.47.2` resolution are preserved only in the generated report.

agents/broski-pets-bridge/main.py (2)
178-195: `_mcp_connected` swallows a 1 s timeout — OK, but be aware `/health` still pays the latency.

`/health` is a liveness/readiness probe that is hit every 30 s by Docker, once per request by load balancers, and on every dashboard refresh. Adding an outbound HTTP round-trip (up to 1 s on failure, more on success if the gateway is slow) on the health path can turn `/health` into a blocking call and cascade failures (e.g. the container gets marked unhealthy because the gateway is slow, even though the pets-bridge itself is fine). Consider one of:

- Cache the last probe result with a short TTL (e.g. 15 s) and return the cached value on `/health` calls.
- Move `mcp_connected` out of `/health` into a `/readyz` or a dedicated `/diag` endpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/broski-pets-bridge/main.py` around lines 178 - 195, The _mcp_connected function performs an outbound JSON-RPC call on each health probe, which can add up to 1s latency and make /health blocking; change this by either (A) introducing a small TTL cache for the last _mcp_connected result (e.g., store last_result and last_checked timestamp and return cached value if now - last_checked < 15s) and have the /health handler read the cached value instead of calling _mcp_connected directly, or (B) remove the MCP check from the /health handler and instead expose it via a separate endpoint such as /readyz or /diag that calls _mcp_connected; update the code paths that call _mcp_connected (the /health handler function and any readiness routes) to use the chosen approach and ensure cache state is thread-safe/async-safe (e.g., protect with an asyncio.Lock or atomic variables).
690-695: Bare `except Exception` is fine here — but consider logging the swallowed error.

The MCP call is best-effort and falling back to an empty context on failure is the right UX call. However, silently swallowing every exception makes it very hard to diagnose "why is GitHub context never populated" in production. At minimum, log at `warning` level so the failure surfaces in the structured log pipeline (Loki/Promtail) that this repo already ships.

```diff
     try:
         github_context = await _github_context_via_mcp(body.message)
-    except Exception:
+    except Exception as exc:
+        # MCP is optional; keep the chat flow resilient but surface the failure.
+        import logging
+        logging.getLogger(__name__).warning(
+            "MCP GitHub context fetch failed: %s: %s", type(exc).__name__, exc
+        )
         github_context = ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/broski-pets-bridge/main.py` around lines 690 - 695, The try/except around the _github_context_via_mcp(body.message) call swallows all exceptions silently; change it to catch Exception as e and emit a warning-level log that includes the exception details (e.g. logger.warning("Failed to retrieve GitHub context via MCP", exc_info=True) or include e in the message) before falling back to github_context = ""; update the block that sets github_context when body.mode == "ask" to log the failure while preserving the existing empty-string fallback.

docker-compose.yml (1)
916: Avoid `:latest` for `docker/mcp-gateway` — pin a stable version tag.

`docker/mcp-gateway:latest` means every `docker compose pull` or fresh node join can silently introduce breaking changes to the MCP protocol, CLI flags, or runtime behaviour — all of which will then cascade into `broski-pets-bridge` (which already makes assumptions about the SSE/JSON-RPC contract). Every other first-party service in this file pins a version (`redis:7-alpine`, `postgres:15-alpine`, `ollama/ollama:0.3.14`, `prom/prometheus:v2.55.1`, etc.) via `${..._IMAGE:-<pinned>}`. Do the same here.

♻️ Suggested diff

```diff
-    image: docker/mcp-gateway:latest
+    image: ${MCP_GATEWAY_IMAGE:-docker/mcp-gateway:v2}
```
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` at line 916, Replace the unstable image tag docker/mcp-gateway:latest with a pinned, explicit version to prevent silent breaking changes; locate the image declaration "image: docker/mcp-gateway:latest" in the docker-compose service for the MCP gateway (the service consumed by broski-pets-bridge) and change it to a specific version string (e.g., docker/mcp-gateway:<stable-tag> or use an env variable like ${MCP_GATEWAY_IMAGE:-docker/mcp-gateway:<stable-tag>}) consistent with how other services are pinned (redis, postgres, ollama, prom/prometheus).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.env.example:
- Line 30: The MCP_GATEWAY_URL default is inconsistent: .env.example uses
"http://" while docker-compose.yml uses "tcp://"; update the docker-compose.yml
default for MCP_GATEWAY_URL (the env default that currently starts with
"tcp://mcp-gateway:8099") to use "http://mcp-gateway:8099" so both defaults
match the actual adapter transport (note agents/broski-pets-bridge/main.py
currently normalises tcp:// → http:// but we should align the defaults to
"http://").
- Line 31: Remove the bare GITHUB_TOKEN= entry from .env.example or at least
mark it as a secret and relocate it into the External APIs block to match the
file's convention; specifically, either comment the GITHUB_TOKEN line and add a
pointer to secrets/github_token.txt (following the pattern used for OPENAI,
DISCORD, API_KEY, etc.) or move the key next to the External APIs comments and
annotate it as a secret, and update docker-compose.yml to read it via a Docker
secret or _FILE indirection if necessary; ensure key ordering/sorting matches
surrounding keys (MCP_GATEWAY_URL, PETS_BRIDGE_URL) to satisfy dotenv-linter.
In `@agents/broski-pets-bridge/main.py`:
- Around line 156-175: The loop in _mcp_jsonrpc_call currently returns
immediately after a 200 response but before parsing, so if
_parse_mcp_sse_payload raises (e.g. ValueError) the fallback endpoint is never
tried; change the logic in _mcp_jsonrpc_call to call _parse_mcp_sse_payload
inside a try/except when resp.status_code == 200, capture any parse exceptions
(ValueError or general Exception) into last_err and continue to the next
endpoint instead of letting them bubble, and only return when parsing succeeds
(still preserving the sid extraction from resp.headers and using session_id as
fallback).
- Around line 225-255: _build_tool_args currently assigns the full "owner/name"
string into args["repo"], which breaks tools that expect only the repository
name; change the logic to split the GITHUB_CONTEXT_REPO into owner and repo_name
up front (e.g. owner, name = repo.split("/",1)) and then set args values based
on which props are present: if props contains both "owner" and "repo" set
args["owner"]=owner and args["repo"]=name; if props contains only "repo" set
args["repo"]=name (not owner/name); if props contains only "owner" set
args["owner"]=owner; for fields like "repository" or query templates that expect
owner/name use owner + "/" + name; keep existing defaults for limit/per_page.
Ensure you update references to repo in the query construction and to the
symbols repo, owner, name, props, args in _build_tool_args.
- Around line 258-272: After receiving the initialize response in
_github_context_via_mcp, send an MCP notification for
"notifications/initialized" (i.e., a JSON-RPC message with no "id" and "method"
set to "initialized") using the same session id returned as sid so the session
is marked initialized before any further requests; call _mcp_jsonrpc_call with
the notification payload (include session id in params) immediately after the
initialize response and before issuing subsequent requests like tools/list.
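The handshake ordering can be sketched with plain JSON-RPC payloads. The helper names and the `protocolVersion` value here are illustrative, not taken from the bridge code; the one structural requirement is that the notification carries no `"id"` member.

```python
def initialize_request(request_id: int = 1) -> dict:
    # First message of the MCP session: a normal request with an id.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {"protocolVersion": "2024-11-05", "capabilities": {}},
    }

def initialized_notification() -> dict:
    # Sent after the initialize *response* and before tools/list.
    # Notifications carry no "id": the server must not reply to them.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}

req = initialize_request()
note = initialized_notification()
```

In the bridge, the notification would go through the same `_mcp_jsonrpc_call` path with the session id (`sid`) from the initialize response attached as the `Mcp-Session-Id` header.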
In `@docker-compose.yml`:
- Around line 922-934: The mcp-gateway service is inconsistent: either remove
the docker socket bind-mount and rely on the docker-socket-proxy, or drop the
proxy dependency; prefer the proxy pattern—remove the /var/run/docker.sock
volume, keep depends_on: docker-socket-proxy, and add
DOCKER_HOST=tcp://docker-socket-proxy:2375 (the docker/mcp-gateway image honors
DOCKER_HOST). Also stop passing GITHUB_TOKEN and GITHUB_PERSONAL_ACCESS_TOKEN as
plain env vars: switch them to the repo’s secret/file-backed pattern (use Docker
secrets or file-referenced env vars per .env.example and secrets/*.txt) so
credentials aren’t exposed via docker inspect.
---
Nitpick comments:
In @.claude/CLAUDE_CONTEXT.md:
- Around line 205-217: Add a short pre-flight safety preface above the command
block that clearly states the target environment (e.g., local/dev vs
staging/prod), requires running from the repository root, and reminds users to
have a backup or rollback plan before running destructive commands; mention
specific high-impact commands by name (docker compose build --no-cache, docker
compose up -d, alembic upgrade head, bash scripts/network-migrate.sh) and
suggest using --dry-run or confirmation steps where available.
- Around line 103-113: The "## ✅ Phase 9 — CVE Elimination Results (April 14,
2026)" section (including the CVE table, the "13 Remaining HIGHs" list and the
`starlette` resolved note) contains volatile, time-sensitive security status and
should be removed from this static CLAUDE_CONTEXT.md and instead generated
automatically at build/runtime; move its content into a generated report
endpoint/file (e.g., a CI-produced security_report) and replace the section here
with a short stable link and a one-line note pointing to that generated report
(e.g., "See generated security report for current CVE status"), ensuring any
references to `docker.io/runc`, `libexpat1`, `libncursesw6`, `libnghttp2`,
`libsystemd0`, and the `starlette`/`fastapi==0.135.3`+`starlette==0.47.2`
resolution are preserved only in the generated report.
In `@agents/broski-pets-bridge/main.py`:
- Around line 178-195: The _mcp_connected function performs an outbound JSON-RPC
call on each health probe, which can add up to 1s latency and make /health
blocking; change this by either (A) introducing a small TTL cache for the last
_mcp_connected result (e.g., store last_result and last_checked timestamp and
return cached value if now - last_checked < 15s) and have the /health handler
read the cached value instead of calling _mcp_connected directly, or (B) remove
the MCP check from the /health handler and instead expose it via a separate
endpoint such as /readyz or /diag that calls _mcp_connected; update the code
paths that call _mcp_connected (the /health handler function and any readiness
routes) to use the chosen approach and ensure cache state is
thread-safe/async-safe (e.g., protect with an asyncio.Lock or atomic variables).
- Around line 690-695: The try/except around the
_github_context_via_mcp(body.message) call swallows all exceptions silently;
change it to catch Exception as e and emit a warning-level log that includes the
exception details (e.g. logger.warning("Failed to retrieve GitHub context via
MCP", exc_info=True) or include e in the message) before falling back to
github_context = ""; update the block that sets github_context when body.mode ==
"ask" to log the failure while preserving the existing empty-string fallback.
In `@docker-compose.yml`:
- Line 916: Replace the unstable image tag docker/mcp-gateway:latest with a
pinned, explicit version to prevent silent breaking changes; locate the image
declaration "image: docker/mcp-gateway:latest" in the docker-compose service for
the MCP gateway (the service consumed by broski-pets-bridge) and change it to a
specific version string (e.g., docker/mcp-gateway:<stable-tag> or use an env
variable like ${MCP_GATEWAY_IMAGE:-docker/mcp-gateway:<stable-tag>}) consistent
with how other services are pinned (redis, postgres, ollama, prom/prometheus).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e7b0231f-5946-49da-ad7f-861b09057e31
📒 Files selected for processing (5)
- .claude/CLAUDE_CONTEXT.md
- .claude/settings.local.json
- .env.example
- agents/broski-pets-bridge/main.py
- docker-compose.yml
✅ Files skipped from review due to trivial changes (1)
- .claude/settings.local.json
```
PETS_DISCORD_ID=
PETS_WEBHOOK_SECRET=
PETS_OLLAMA_MODEL=qwen2.5:7b
MCP_GATEWAY_URL=http://mcp-gateway:8099
```
Default here diverges from docker-compose.yml.
This file defaults MCP_GATEWAY_URL to http://mcp-gateway:8099, but docker-compose.yml line 968 defaults it to tcp://mcp-gateway:8099. The Python code in agents/broski-pets-bridge/main.py (lines 127‑133) normalises tcp:// → http://, so behaviour is fine, but the two defaults should still agree to avoid confusion when operators copy .env.example to .env (the compose default is then overridden with the HTTP form). Pick one — http:// is the correct scheme for the streaming transport and is what the adapter actually speaks.
♻️ Proposed alignment
In docker-compose.yml line 968:
```diff
-      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
+      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}
```

🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 30-30: [UnorderedKey] The MCP_GATEWAY_URL key should go before the PETS_BRIDGE_URL key
(UnorderedKey)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.env.example at line 30, The MCP_GATEWAY_URL default is inconsistent:
.env.example uses "http://" while docker-compose.yml uses "tcp://"; update the
docker-compose.yml default for MCP_GATEWAY_URL (the env default that currently
starts with "tcp://mcp-gateway:8099") to use "http://mcp-gateway:8099" so both
defaults match the actual adapter transport (note
agents/broski-pets-bridge/main.py currently normalises tcp:// → http:// but we
should align the defaults to "http://").
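The tcp:// → http:// normalisation the comment refers to can be sketched as a stand-alone illustration (the function name is hypothetical; it is not the bridge's actual code):

```python
def normalize_gateway_url(url: str) -> str:
    # The MCP gateway's streaming transport speaks HTTP, so rewrite a
    # tcp:// default (as currently set in docker-compose.yml) to http://.
    if url.startswith("tcp://"):
        return "http://" + url[len("tcp://"):]
    return url

base = normalize_gateway_url("tcp://mcp-gateway:8099")
```

Aligning both defaults on `http://` makes this rewrite a no-op in the common case, which is the point of the suggested change.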
```
PETS_WEBHOOK_SECRET=
PETS_OLLAMA_MODEL=qwen2.5:7b
MCP_GATEWAY_URL=http://mcp-gateway:8099
GITHUB_TOKEN=
```
GITHUB_TOKEN is a secret and should not live in .env.example.
The header of this file explicitly states "Secrets (passwords, tokens, keys) → use Docker secrets instead" (line 5), and every other token/key in the file (OPENAI, PERPLEXITY, DISCORD, JWT, MEMORY, MINIO, API_KEY, …) is commented out with a pointer to secrets/*.txt. Adding a bare GITHUB_TOKEN= slot contradicts that convention and nudges operators to paste a PAT into a file that is frequently committed by mistake.
Consider either:
- Following the established pattern — keep the slot commented and point to a secret file, and have `docker-compose.yml` read it via a Docker secret / `_FILE` indirection; or
- At minimum, annotate it as a secret and move it next to the other External APIs comments so its sensitivity is unambiguous.
♻️ Minimal suggested change (option 2)
```diff
-MCP_GATEWAY_URL=http://mcp-gateway:8099
-GITHUB_TOKEN=
+MCP_GATEWAY_URL=http://mcp-gateway:8099
+# GITHUB_TOKEN → secrets/github_token.txt ✅ (used by mcp-gateway for github MCP server)
```

Note the static analysis hint about key ordering (GITHUB_TOKEN before MCP_GATEWAY_URL, both before PETS_BRIDGE_URL) — if you keep the key here, re-sort; if you move it to the External APIs block, the dotenv-linter warnings go away naturally.
🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 31-31: [UnorderedKey] The GITHUB_TOKEN key should go before the MCP_GATEWAY_URL key
(UnorderedKey)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.env.example at line 31, Remove the bare GITHUB_TOKEN= entry from
.env.example or at least mark it as a secret and relocate it into the External
APIs block to match the file's convention; specifically, either comment the
GITHUB_TOKEN line and add a pointer to secrets/github_token.txt (following the
pattern used for OPENAI, DISCORD, API_KEY, etc.) or move the key next to the
External APIs comments and annotate it as a secret, and update
docker-compose.yml to read it via a Docker secret or _FILE indirection if
necessary; ensure key ordering/sorting matches surrounding keys
(MCP_GATEWAY_URL, PETS_BRIDGE_URL) to satisfy dotenv-linter.
```python
async def _mcp_jsonrpc_call(message: dict, timeout_s: float, session_id: str | None = None) -> tuple[dict, str | None]:
    base = _mcp_base_url()
    endpoints = [f"{base}/mcp", base]
    headers = {"Accept": "application/json, text/event-stream", **_mcp_headers()}
    if session_id:
        headers["Mcp-Session-Id"] = session_id

    async with httpx.AsyncClient(timeout=timeout_s, headers=headers) as client:
        last_err: Exception | None = None
        for url in endpoints:
            try:
                resp = await client.post(url, json=message)
                if resp.status_code == 200:
                    sid = resp.headers.get("Mcp-Session-Id") or session_id
                    return _parse_mcp_sse_payload(resp.text), sid
                last_err = HTTPException(status_code=resp.status_code, detail=resp.text)
            except Exception as e:
                last_err = e
                continue
        raise last_err or RuntimeError("MCP call failed")
```
Endpoint fallback swallows SSE parse errors on the first endpoint.
When the first endpoint ({base}/mcp) returns HTTP 200 but _parse_mcp_sse_payload raises ValueError (empty body, non-SSE content, or malformed data: lines), the exception is not caught inside the loop, so the fallback to base never runs and the whole call fails. Only transport exceptions (client.post raising) or non-200 statuses currently trigger fallback.
♻️ Proposed fix — treat parse failures as a retryable error
for url in endpoints:
try:
resp = await client.post(url, json=message)
if resp.status_code == 200:
sid = resp.headers.get("Mcp-Session-Id") or session_id
- return _parse_mcp_sse_payload(resp.text), sid
+ try:
+ return _parse_mcp_sse_payload(resp.text), sid
+ except ValueError as parse_err:
+ last_err = parse_err
+ continue
last_err = HTTPException(status_code=resp.status_code, detail=resp.text)
except Exception as e:
last_err = e
                continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-pets-bridge/main.py` around lines 156 - 175, The loop in
_mcp_jsonrpc_call currently returns immediately after a 200 response but before
parsing, so if _parse_mcp_sse_payload raises (e.g. ValueError) the fallback
endpoint is never tried; change the logic in _mcp_jsonrpc_call to call
_parse_mcp_sse_payload inside a try/except when resp.status_code == 200, capture
any parse exceptions (ValueError or general Exception) into last_err and
continue to the next endpoint instead of letting them bubble, and only return
when parsing succeeds (still preserving the sid extraction from resp.headers and
using session_id as fallback).
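The fallback logic above hinges on `_parse_mcp_sse_payload` raising `ValueError` for non-SSE bodies. That helper isn't shown in this diff; a minimal sketch of what such a parser might look like (the function name and the bare-JSON fallback are assumptions for illustration, not the repo's actual implementation):

```python
import json


def parse_mcp_sse_payload(body: str) -> dict:
    """Extract the first JSON-RPC object from an SSE response body.

    SSE frames carry payloads on lines prefixed with "data:"; anything
    else (empty body, malformed frames) raises ValueError so callers
    can treat it as a retryable endpoint failure.
    """
    for line in body.splitlines():
        if line.startswith("data:"):
            chunk = line[len("data:"):].strip()
            if chunk:
                return json.loads(chunk)
    # Some gateways answer with bare JSON instead of an SSE stream.
    stripped = body.strip()
    if stripped.startswith("{"):
        return json.loads(stripped)
    raise ValueError("no data: lines found in SSE payload")
```

Streamable-HTTP gateways typically emit `event: message` / `data: {...}` frames, so scanning for `data:` lines covers the common case.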
def _build_tool_args(tool: dict, query: str) -> dict:
    schema = tool.get("inputSchema") or {}
    props = schema.get("properties") if isinstance(schema, dict) else {}
    if not isinstance(props, dict):
        props = {}

    args: dict[str, object] = {}
    repo = os.getenv("GITHUB_CONTEXT_REPO", "welshDog/HyperCode-V2.4")

    q_key = "query" if "query" in props else ("q" if "q" in props else None)
    if q_key:
        args[q_key] = f"repo:{repo} {query}"
    else:
        args["query"] = f"repo:{repo} {query}"

    if "repo" in props:
        args["repo"] = repo
    if "repository" in props:
        args["repository"] = repo
    if "owner" in props and "repo" in props:
        if "/" in repo:
            owner, name = repo.split("/", 1)
            args["owner"] = owner
            args["repo"] = name

    if "limit" in props:
        args["limit"] = 5
    if "per_page" in props:
        args["per_page"] = 5

    return args
_build_tool_args can emit a malformed repo value for tools that only accept a bare repository name.
Flow for GITHUB_CONTEXT_REPO=welshDog/HyperCode-V2.4:
- Lines 240–241: `"repo" in props` → `args["repo"] = "welshDog/HyperCode-V2.4"` (owner/name).
- Lines 244–248: if the schema also has `owner`, this is corrected to `args["repo"] = "HyperCode-V2.4"` + `args["owner"] = "welshDog"`.
- But if the schema has `repo` without `owner`, line 248 never runs and we send `"welshDog/HyperCode-V2.4"` into a field that most GitHub MCP tools expect to be just the name.
A slightly safer default is to split the owner/name up-front and only concatenate when the schema clearly wants owner/name:
♻️ Suggested reshape
- if "repo" in props:
- args["repo"] = repo
- if "repository" in props:
- args["repository"] = repo
- if "owner" in props and "repo" in props:
- if "/" in repo:
- owner, name = repo.split("/", 1)
- args["owner"] = owner
- args["repo"] = name
+ owner, name = (repo.split("/", 1) + [repo])[:2] if "/" in repo else ("", repo)
+ if "owner" in props and "repo" in props:
+ args["owner"] = owner or repo
+ args["repo"] = name
+ elif "repo" in props:
+ args["repo"] = name # bare repo name for single-field schemas
+ if "repository" in props:
+        args["repository"] = repo  # typically full owner/name

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-pets-bridge/main.py` around lines 225 - 255, _build_tool_args
currently assigns the full "owner/name" string into args["repo"], which breaks
tools that expect only the repository name; change the logic to split the
GITHUB_CONTEXT_REPO into owner and repo_name up front (e.g. owner, name =
repo.split("/",1)) and then set args values based on which props are present: if
props contains both "owner" and "repo" set args["owner"]=owner and
args["repo"]=name; if props contains only "repo" set args["repo"]=name (not
owner/name); if props contains only "owner" set args["owner"]=owner; for fields
like "repository" or query templates that expect owner/name use owner + "/" +
name; keep existing defaults for limit/per_page. Ensure you update references to
repo in the query construction and to the symbols repo, owner, name, props, args
in _build_tool_args.
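The splitting logic the reshape suggests can be pulled into small, testable helpers. A sketch (the names `split_repo` and `repo_args` are illustrative, not from the repo):

```python
def split_repo(repo: str) -> tuple[str, str]:
    """Split "owner/name" into parts; a bare name yields an empty owner."""
    if "/" in repo:
        owner, name = repo.split("/", 1)
        return owner, name
    return "", repo


def repo_args(props: dict, repo: str) -> dict:
    """Populate tool arguments based on which schema properties exist."""
    owner, name = split_repo(repo)
    args: dict[str, str] = {}
    if "owner" in props and "repo" in props:
        args["owner"] = owner or repo
        args["repo"] = name
    elif "repo" in props:
        args["repo"] = name  # bare name for single-field schemas
    if "repository" in props:
        args["repository"] = repo  # typically full owner/name
    return args
```

Splitting once up front means every branch works from the same `owner`/`name` pair, so a `repo`-only schema can never receive the combined string by accident.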
async def _github_context_via_mcp(question: str) -> str:
    _resp, sid = await _mcp_jsonrpc_call(
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
            },
        },
        timeout_s=10.0,
    )
🧩 Analysis chain
🌐 Web query:
MCP Model Context Protocol specification initialize notifications/initialized required before tools/list
💡 Result:
Yes, according to the Model Context Protocol (MCP) specification, the notifications/initialized notification is required before tools/list requests. The protocol defines a strict lifecycle where the client sends an 'initialize' request, receives the server's response (including capabilities like tools), and then MUST send the 'notifications/initialized' notification to signal readiness for normal operations. Only after this can operational requests like tools/list be sent. Key rules from the lifecycle specification:

- Client SHOULD NOT send requests other than pings before server responds to initialize.
- Server SHOULD NOT send requests other than pings/logging before receiving initialized.
- This ensures proper capability negotiation before feature usage.

Tools/list is a normal operation request during the operation phase, after initialization completes with the initialized notification. Servers rejecting tools/list before initialized (as seen in implementation issues) enforce this strictly. The tools specification confirms servers declare tools capability during initialize, but listing occurs post-initialized.
Citations:
- 1: https://modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle
- 2: Missing MCP Notifications/Initialized in Handshake Protocol anthropics/claude-code#1604
- 3: https://modelcontextprotocol.io/specification/latest/basic/lifecycle
- 4: https://modelcontextprotocol.io/specification/2024-11-05/basic/lifecycle/
- 5: https://modelcontextprotocol.io/specification/latest/server/tools
- 6: https://www.grizzlypeaksoftware.com/library/mcp-protocol-specification-deep-dive-b1sbttpr
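The lifecycle the query result describes boils down to three JSON-RPC messages in strict order, sketched here as plain dicts (payload shapes follow the 2024-11-05 spec; the client name is illustrative):

```python
# 1. Client -> server: initialize request (has an "id", so it expects a response).
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    },
}

# 2. Client -> server: initialized notification (no "id", fire-and-forget).
#    Must be sent before any operational request on the session.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

# 3. Only now may the client issue operational requests like tools/list.
tools_list = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
```

The distinguishing detail is the missing `"id"` on the notification: per JSON-RPC, a message without an id is a notification and receives no response.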
🏁 Script executed:
# First, locate and read the file mentioned in the review
find . -name "main.py" -path "*/broski-pets-bridge/*" | head -20
Repository: welshDog/HyperCode-V2.4
Length of output: 102
🏁 Script executed:
# Search for files in the agents directory
git ls-files agents/ | grep -E "broski.*main.py"
Repository: welshDog/HyperCode-V2.4
Length of output: 259
🏁 Script executed:
# Look for the specific function and surrounding context
rg "_github_context_via_mcp" -A 30 -B 5
Repository: welshDog/HyperCode-V2.4
Length of output: 5041
🏁 Script executed:
# Check if notifications/initialized is being called anywhere in the codebase
rg "notifications/initialized" --type py
Repository: welshDog/HyperCode-V2.4
Length of output: 201
🏁 Script executed:
# Find the _mcp_jsonrpc_call function definition to understand its parameters
rg "_mcp_jsonrpc_call" -A 15 --type py | head -100
Repository: welshDog/HyperCode-V2.4
Length of output: 5396
🏁 Script executed:
# Verify the exact line numbers for the _github_context_via_mcp function
cat -n agents/broski-pets-bridge/main.py | grep -A 20 "async def _github_context_via_mcp"
Repository: welshDog/HyperCode-V2.4
Length of output: 889
Send notifications/initialized notification after receiving initialize response.
Per the MCP specification, a client must send an initialized notification after receiving the initialize response and before issuing any subsequent requests. Several gateway implementations—including the Docker MCP gateway in streaming mode—will reject tools/list and other operational requests on sessions that have not been marked initialised, or silently return no tools. This is likely the cause if you're seeing tools unavailable in development.
The function signature already supports the required session_id parameter to pass to subsequent calls.
♻️ Suggested implementation
_resp, sid = await _mcp_jsonrpc_call(
{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
},
},
timeout_s=10.0,
)
+ # Required per MCP spec before issuing further requests on this session.
+ try:
+ await _mcp_jsonrpc_call(
+ {"jsonrpc": "2.0", "method": "notifications/initialized", "params": {}},
+ timeout_s=5.0,
+ session_id=sid,
+ )
+ except Exception:
+ pass # best-effort; some gateways accept tools/list without it📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def _github_context_via_mcp(question: str) -> str:
    _resp, sid = await _mcp_jsonrpc_call(
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
            },
        },
        timeout_s=10.0,
    )
    # Required per MCP spec before issuing further requests on this session.
    try:
        await _mcp_jsonrpc_call(
            {"jsonrpc": "2.0", "method": "notifications/initialized", "params": {}},
            timeout_s=5.0,
            session_id=sid,
        )
    except Exception:
        pass  # best-effort; some gateways accept tools/list without it
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-pets-bridge/main.py` around lines 258 - 272, After receiving
the initialize response in _github_context_via_mcp, send an MCP notification for
"notifications/initialized" (i.e., a JSON-RPC message with no "id" and "method"
set to "initialized") using the same session id returned as sid so the session
is marked initialized before any further requests; call _mcp_jsonrpc_call with
the notification payload (include session id in params) immediately after the
initialize response and before issuing subsequent requests like tools/list.
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - DOCKER_MCP_IN_CONTAINER=1
      - GITHUB_TOKEN=${GITHUB_TOKEN}
      - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_TOKEN}
    expose:
      - "8099"
    networks:
      - agents-net
    depends_on:
      docker-socket-proxy:
        condition: service_started
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether docker-socket-proxy endpoints enabled are sufficient for docker/mcp-gateway,
# and whether any other service in the compose file already demonstrates the DOCKER_HOST-over-proxy pattern.
rg -nP --type=yaml -C2 'DOCKER_HOST\s*[:=]\s*tcp://docker-socket-proxy'
rg -nP --type=yaml -C3 'docker-socket-proxy' docker-compose.yml
Repository: welshDog/HyperCode-V2.4
Length of output: 7894
🏁 Script executed:
# Get the full context around lines 922-934 in docker-compose.yml
sed -n '900,940p' docker-compose.yml
Repository: welshDog/HyperCode-V2.4
Length of output: 991
🏁 Script executed:
# Get the docker-socket-proxy service configuration to see what endpoints are enabled
sed -n '742,800p' docker-compose.yml
Repository: welshDog/HyperCode-V2.4
Length of output: 1506
🏁 Script executed:
# Search for mcp-gateway or gateway service to understand what it needs
rg -nP 'mcp.*gateway|gateway.*mcp|mcp-gateway' docker-compose.yml -i -A 5 -B 2
Repository: welshDog/HyperCode-V2.4
Length of output: 937
🌐 Web query:
docker/mcp-gateway DOCKER_HOST environment variable support
💡 Result:
The docker/mcp-gateway (Docker MCP Gateway) supports the DOCKER_HOST environment variable for connecting to a custom Docker daemon, particularly in containerized or remote Docker Engine setups without Docker Desktop. Key evidence:

- In GitHub issue #140, users set DOCKER_HOST=tcp://dind:2375 in a Docker Compose file for the gateway service to connect to a Docker-in-Docker (DinD) daemon. Developers confirmed it should be observed during engine discovery, noting "DOCKER_HOST is special. We should be observing that when doing engine discovery."
- The gateway initializes a docker.Client (standard Go Docker client library, which respects DOCKER_HOST by default), as seen in source code pkg/gateway/run.go.
- Official docs and README mention running independently of Docker Desktop (e.g., Docker CE, WSL2) with DOCKER_MCP_IN_CONTAINER=1 to bypass Desktop checks, implying standard Docker client behavior including DOCKER_HOST support.
- Examples like docker-in-docker in the Docker Hub repo further indicate remote daemon compatibility.

No explicit documentation lists DOCKER_HOST, but practical usage and code confirm support. For a containerized gateway, pass it via environment in Compose or docker run.
Citations:
- 1: https://github.com/docker/mcp-gateway
- 2: https://docs.docker.com/reference/cli/docker/mcp/gateway/gateway_run
- 3: https://hub.docker.com/r/docker/mcp-gateway
- 4: Env variables getting overwritten docker/mcp-gateway#140
- 5: https://github.com/docker/mcp-gateway?tab=readme-ov-file
- 6: https://github.com/docker/mcp-gateway/blob/main/docs/mcp-gateway.md
- 7: https://github.com/docker/mcp-gateway/blob/main/pkg/gateway/run.go
Remove direct socket mount or route through the socket proxy — not both.
Two inconsistencies in the mcp-gateway service (lines 922–934):
- Direct socket mount alongside unused proxy dependency. The service bind-mounts `/var/run/docker.sock` read-only and declares `depends_on: docker-socket-proxy`, yet does not use it — there is no `DOCKER_HOST` environment variable pointing to the proxy. Other services in the same compose file (coder-agent, throttle-agent, api-engine) demonstrate the correct pattern: they route through `DOCKER_HOST=tcp://docker-socket-proxy:2375` without any direct socket mount. Either drop the `depends_on` and explicitly document why the gateway requires direct socket access (consider an ADR), or adopt the proxy pattern by removing the volume mount and adding `DOCKER_HOST=tcp://docker-socket-proxy:2375`. The docker/mcp-gateway image honours `DOCKER_HOST` via the standard Go Docker client, so the latter approach is viable.
- `GITHUB_TOKEN` as plain environment variables. Both `GITHUB_TOKEN` and `GITHUB_PERSONAL_ACCESS_TOKEN` are exposed as process environment variables. Per the repository convention (documented in `.env.example` and the `secrets/*.txt` pattern), sensitive credentials should be delivered via Docker secrets or file-backed environment references to prevent leakage through `docker inspect`, child processes, and future `docker exec` invocations.
🛡️ Suggested hardening — route through the socket proxy
command:
- --servers=github
- --transport=streaming
- --port=8099
- volumes:
- - /var/run/docker.sock:/var/run/docker.sock:ro
environment:
- DOCKER_MCP_IN_CONTAINER=1
+ - DOCKER_HOST=tcp://docker-socket-proxy:2375
- GITHUB_TOKEN=${GITHUB_TOKEN}
  - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_TOKEN}

The docker-socket-proxy service already enables CONTAINERS, IMAGES, INFO, PING, and VERSION endpoints, which cover standard Docker operations required by the gateway.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` around lines 922 - 934, The mcp-gateway service is
inconsistent: either remove the docker socket bind-mount and rely on the
docker-socket-proxy, or drop the proxy dependency; prefer the proxy
pattern—remove the /var/run/docker.sock volume, keep depends_on:
docker-socket-proxy, and add DOCKER_HOST=tcp://docker-socket-proxy:2375 (the
docker/mcp-gateway image honors DOCKER_HOST). Also stop passing GITHUB_TOKEN and
GITHUB_PERSONAL_ACCESS_TOKEN as plain env vars: switch them to the repo’s
secret/file-backed pattern (use Docker secrets or file-referenced env vars per
.env.example and secrets/*.txt) so credentials aren’t exposed via docker
inspect.
- Add Anthropic API key as a Docker secret for secure credential management
- Replace OpenAI API key references with Anthropic API key across services
- Update default OLLAMA model to qwen2.5-coder:3b for improved coding assistance
- Add Memstream API configuration for enhanced memory management capabilities
- Include additional diagnostic commands in Claude settings for debugging

- Mount NemoClaw agent and source directories in Docker for live scanning
- Add `/health` Discord command with real-time code analysis and grading
- Create standalone health report script with webhook posting capability
- Implement Makefile targets for quick health checks and automated scans
- Include initial health report showing current codebase status
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Makefile (1)
4-4: ⚠️ Potential issue | 🟡 Minor

Minor: `python` vs `python3` inconsistency and missing `.PHONY` entries.

Every other script-invocation target in this Makefile uses `python3` explicitly (lines 111, 133, 135, 155, 158, 215). On hosts where only `python3` is on `PATH` (most modern Linux distros, CI runners with `setup-python`), `make health` / `make health-quick` will fail with "python: command not found". Also, both new targets are absent from the `.PHONY` line, so a stray file called `health` in the repo would suppress the recipe.

♻️ Proposed tweak

-.PHONY: help build up down logs status clean test restart network-init init start agents stop setup dev prod scan scan-quick scan-sast scan-secrets scan-deps scan-iac scan-licenses scan-report pre-commit-install scan-agent scan-all scan-build trivy-hook-install calm
+.PHONY: help build up down logs status clean test restart network-init init start agents stop setup dev prod scan scan-quick scan-sast scan-secrets scan-deps scan-iac scan-licenses scan-report pre-commit-install scan-agent scan-all scan-build trivy-hook-install calm health health-quick
@@
 health: ## 🏥 NemoClaw code health scan + Discord webhook post
 	@echo "🔍 Running NemoClaw health scan..."
-	@python scripts/health_report.py --webhook
+	@python3 scripts/health_report.py --webhook

 health-quick: ## 🏥 NemoClaw scan — terminal only, no webhook
-	@python scripts/health_report.py
+	@python3 scripts/health_report.py

Also applies to: 120-125
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` at line 4, Update the Makefile so the health and health-quick targets invoke python3 (not python) and add both targets to the top .PHONY list; specifically change any "python" invocations inside the health and health-quick recipes to "python3" and append "health" and "health-quick" to the existing .PHONY declaration so a file named health won't override the targets.
🧹 Nitpick comments (2)
agents/broski-pets-bridge/main.py (2)
4-4: Unused `asyncio` import.

Nothing in the diff (or the surrounding file) actually references `asyncio` — the new MCP helpers use `httpx.AsyncClient` directly. Trim it to keep the import list clean.

♻️ Proposed tweak

-import asyncio
 import subprocess

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/broski-pets-bridge/main.py` at line 4, Remove the unused asyncio import at the top of the module: delete the lone "import asyncio" line in main.py (no other symbols reference asyncio; the MCP helpers use httpx.AsyncClient directly), then run the linter/tests to confirm no remaining references; if any async helpers are later needed, re-introduce asyncio where actually used.
825-868: `/leaderboard` blocks the event loop and does an O(N) Redis scan on every call — fine for now, but plan for growth.

Two related concerns worth noting:

- The handler is `async def` but uses the synchronous `redis.Redis` client (`r.scan_iter`, `r.get`) — consistent with the rest of the file, but every call blocks the FastAPI event loop for as long as the full keyspace walk takes. Once the pets corpus grows past a few thousand keys this will start eating p99.
- Scanning every `pet:*` key and JSON-decoding each entry on every leaderboard request is expensive. A Redis sorted set (`ZADD pet:xp <xp> <discord_id>` on every XP award, then `ZREVRANGE pet:xp 0 9 WITHSCORES`) would turn the endpoint into O(log N + 10) and remove the blocking walk entirely. Pet metadata (name/species/level) can still be fetched as a pipelined `MGET` on just the top 10 keys.

Not a release blocker at current scale, but a sensible target before the pets feature goes broadly live.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/broski-pets-bridge/main.py` around lines 825 - 868, The /leaderboard handler (async def leaderboard) currently uses the synchronous redis client and a full scan of keys (r.scan_iter and r.get on pet:*) which blocks the event loop and is O(N); switch to either an async Redis client (redis.asyncio or aioredis) so leaderboard awaits nonblocking calls, or run the blocking scan/get in a threadpool, and change the data model to maintain a Redis sorted set (e.g., key "pet:xp") updated whenever pet XP changes (call ZADD pet:xp <xp> <discord_id> from your XP-award codepath) then implement leaderboard to ZREVRANGE pet:xp 0 9 WITHSCORES to get top IDs and scores and MGET (or pipeline GET) the corresponding pet:<discord_id> entries for metadata; this makes leaderboard O(log N + 10) and avoids scanning all pet:* keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/broski-bot/src/cogs/health_check.py`:
- Around line 101-116: The live report keys must match the cached schema to
avoid KeyError: modify _run_scan() to include the writer's keys in its returned
dict (add files_scanned, lint_issues, ast_issues, total_issues, top_lint,
top_ast alongside the existing files, lint, ast, total, top) so both consumers
work; additionally update _build_embed() (and any code that reads the cached
payload, e.g., _load_cached fallback) to prefer the writer keys but fall back to
the existing keys using r.get("files_scanned", r.get("files")) etc., ensuring no
KeyError when a cached report is used.
- Around line 77-99: The health-check scan blocks the event loop because
_ruff_scan uses subprocess.run(timeout=60) synchronously (and _ast_scan
walks/parses files synchronously) so update _ruff_scan to use
asyncio.create_subprocess_exec and await proc.communicate() (making it async)
and offload CPU/IO heavy work in _ast_scan/_run_scan to a thread executor (e.g.,
asyncio.to_thread or loop.run_in_executor) so the slash command handler awaits
non-blocking coroutines; locate and change functions named _ruff_scan, _ast_scan
and where _run_scan is awaited to call the new async/threaded helpers.
- Around line 168-172: HealthCheck currently sets self._nemoclaw_ready based on
_NEMOCLAW.exists(), but the scanners (_ast_scan, _ruff_scan, _py_files) actually
need the ruff executable and scanable source directories; update
HealthCheck.__init__ to compute _nemoclaw_ready by checking the real scanner
prerequisites: verify ruff is available (e.g. shutil.which('ruff') or a
lightweight subprocess run like ['ruff','--version'] that returns success) and
that at least one target scan directory (the ones used by _py_files:
/app/backend_src or /app/project_agents) contains Python files (or is present);
set _nemoclaw_ready True only if both conditions pass so the cached-report
fallback is only used when live scanning truly isn’t possible. Ensure you
reference and reuse the same logic/paths used by _py_files/_ruff_scan to avoid
drift.
In `@docker-compose.secrets.yml`:
- Line 50: The ANTHROPIC_API_KEY_FILE secret is incorrectly assigned to the
hypercode-core service; remove ANTHROPIC_API_KEY_FILE:
/run/secrets/anthropic_api_key from the hypercode-core service and add
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key to the broski-bot and
project-strategist service blocks so those services receive the secret at
runtime (they are the ones that call os.getenv("ANTHROPIC_API_KEY")); ensure the
anthropic_api_key secret exists in the secrets section and that both broski-bot
and project-strategist have the environment variable set to use the secret file.
In `@docker-compose.yml`:
- Line 347: The bind-mount for the health report is currently writable
("./reports/broski-analysis:/app/health_reports") but the container only reads
it (see agents/broski-bot/src/cogs/health_check.py using
LATEST_JSON.read_text()) while host script scripts/health_report.py writes it;
change the compose mount to be read-only by appending :ro to the mount
("./reports/broski-analysis:/app/health_reports:ro") so the container cannot
modify the cached report and matches the other :ro mounts.
In `@scripts/health_report.py`:
- Around line 120-133: The report schema in scripts/health_report.py must match
the consumer in agents/broski-bot/src/cogs/health_check.py (_build_embed expects
r["files"], r["lint"], r["ast"], r["total"], r.get("top")); update the report
dict keys accordingly: rename files_scanned→files, lint_issues→lint,
ast_issues→ast, total_issues→total, and provide a single top key (e.g., merge
top_lint and top_ast into one list or pick top_lint if present) so r.get("top")
returns the expected value; keep other fields (score, grade, label, emoji,
colour, scanned_at) unchanged.
- Around line 136-137: The write to the latest report currently uses
Path.write_text without an explicit encoding which will use the platform default
and can raise UnicodeEncodeError on Windows; update the write call that uses
latest (the Path named latest under REPORTS_DIR) to pass encoding='utf-8' when
writing the json.dumps(report, indent=2) so emoji and non‑ASCII grade labels are
written safely.
---
Outside diff comments:
In `@Makefile`:
- Line 4: Update the Makefile so the health and health-quick targets invoke
python3 (not python) and add both targets to the top .PHONY list; specifically
change any "python" invocations inside the health and health-quick recipes to
"python3" and append "health" and "health-quick" to the existing .PHONY
declaration so a file named health won't override the targets.
---
Nitpick comments:
In `@agents/broski-pets-bridge/main.py`:
- Line 4: Remove the unused asyncio import at the top of the module: delete the
lone "import asyncio" line in main.py (no other symbols reference asyncio; the
MCP helpers use httpx.AsyncClient directly), then run the linter/tests to
confirm no remaining references; if any async helpers are later needed,
re-introduce asyncio where actually used.
- Around line 825-868: The /leaderboard handler (async def leaderboard)
currently uses the synchronous redis client and a full scan of keys (r.scan_iter
and r.get on pet:*) which blocks the event loop and is O(N); switch to either an
async Redis client (redis.asyncio or aioredis) so leaderboard awaits nonblocking
calls, or run the blocking scan/get in a threadpool, and change the data model
to maintain a Redis sorted set (e.g., key "pet:xp") updated whenever pet XP
changes (call ZADD pet:xp <xp> <discord_id> from your XP-award codepath) then
implement leaderboard to ZREVRANGE pet:xp 0 9 WITHSCORES to get top IDs and
scores and MGET (or pipeline GET) the corresponding pet:<discord_id> entries for
metadata; this makes leaderboard O(log N + 10) and avoids scanning all pet:*
keys.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a4b1956b-b382-4359-9a19-6c4c39135364
📒 Files selected for processing (10)
- .claude/settings.local.json
- Makefile
- agents/broski-bot/src/bot.py
- agents/broski-bot/src/cogs/health_check.py
- agents/broski-bot/src/cogs/pets.py
- agents/broski-pets-bridge/main.py
- docker-compose.secrets.yml
- docker-compose.yml
- reports/broski-analysis/latest.json
- scripts/health_report.py
✅ Files skipped from review due to trivial changes (1)
- reports/broski-analysis/latest.json
🚧 Files skipped from review as they are similar to previous changes (1)
- .claude/settings.local.json
def _ruff_scan(roots: list[Path]) -> list[dict]:
    import subprocess, json as _json
    paths = [str(r) for r in roots if r.exists()]
    if not paths:
        return []
    try:
        result = subprocess.run(
            ["ruff", "check"] + paths + ["--output-format", "json"],
            capture_output=True, text=True, timeout=60,
        )
        raw = _json.loads(result.stdout)
        out = []
        for item in raw:
            loc = item.get("location") or {}
            out.append({
                "file": Path(item.get("filename", "?")).name,
                "line": loc.get("row"),
                "msg": str(item.get("message", ""))[:65],
            })
        return out
    except Exception:
        return []
Major: the /health scan blocks the Discord event loop for up to ~60 s.
_ruff_scan calls subprocess.run(..., timeout=60) synchronously, and _ast_scan walks every .py file under /app/backend_src and /app/project_agents, reading and AST‑parsing each one. Both run on _run_scan() at line 202, which is awaited inline in the command handler — so the entire bot's coroutine scheduler is frozen during the scan. Slash-command ack is already out (via defer), but heartbeats, reaction handlers, other /commands, and Prometheus HTTP responses all stall until the scan finishes.
Offload to a worker thread so the event loop stays responsive:
🛠️ Proposed fix
+import asyncio
@@
- try:
- report = _run_scan()
+ try:
+ report = await asyncio.to_thread(_run_scan)
except Exception as e:
await msg.edit(content=f"❌ Scan failed: {e}", embed=None)
        return

For `_ruff_scan` specifically, `asyncio.create_subprocess_exec` + `await proc.communicate()` is the more idiomatic non-blocking path if you'd rather not add a thread.
Also applies to: 101-116, 174-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-bot/src/cogs/health_check.py` around lines 77 - 99, The
health-check scan blocks the event loop because _ruff_scan uses
subprocess.run(timeout=60) synchronously (and _ast_scan walks/parses files
synchronously) so update _ruff_scan to use asyncio.create_subprocess_exec and
await proc.communicate() (making it async) and offload CPU/IO heavy work in
_ast_scan/_run_scan to a thread executor (e.g., asyncio.to_thread or
loop.run_in_executor) so the slash command handler awaits non-blocking
coroutines; locate and change functions named _ruff_scan, _ast_scan and where
_run_scan is awaited to call the new async/threaded helpers.
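The `asyncio.create_subprocess_exec` path the prompt mentions can be sketched generically; for ruff the call would be `scan_subprocess(["ruff", "check", *paths, "--output-format", "json"])` (the function name is illustrative, and the timeout/kill semantics below approximate `subprocess.run`'s, as an assumption):

```python
import asyncio
import json


async def scan_subprocess(cmd: list[str], timeout_s: float = 60.0) -> list:
    # Non-blocking replacement for subprocess.run: the event loop keeps
    # servicing heartbeats, other slash commands, and HTTP handlers
    # while the external scanner runs.
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout_s)
    except asyncio.TimeoutError:
        proc.kill()  # approximate subprocess.run's timeout behaviour
        await proc.wait()
        return []
    try:
        data = json.loads(stdout.decode() or "[]")
    except ValueError:
        return []
    return data if isinstance(data, list) else []
```

The CPU-bound AST walk still needs `asyncio.to_thread` (or `loop.run_in_executor`), since it is pure Python work rather than a subprocess.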
def _run_scan() -> dict:
    files = _py_files(_SCAN_ROOTS)
    lint = _ruff_scan(_SCAN_ROOTS)
    ast_issues = _ast_scan(files)
    total = len(lint) + len(ast_issues)
    score = max(0, min(100, round(100 - (total / max(len(files), 1)) * 40)))
    letter, label, colour, emoji = _grade(score)
    return {
        "score": score, "grade": letter, "label": label,
        "colour": colour, "emoji": emoji,
        "files": len(files), "lint": len(lint), "ast": len(ast_issues),
        "total": total,
        "top": (lint + ast_issues)[:5],
        "scanned_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
        "live": True,
    }
Critical: cached-report schema mismatch — /health-last and the /health cache fallback will KeyError.
_run_scan() returns a dict keyed as files, lint, ast, total, top. The cached report loaded by _load_cached() is produced by scripts/health_report.py (see lines 120‑133 of that file), which writes files_scanned, lint_issues, ast_issues, total_issues, top_lint, top_ast instead. _build_embed then does bracket‑indexing on the live names (r["files"], r["lint"], r["ast"], r["total"]) and r.get("top") (always None in the cached payload).
Consequences:
- `/health-last` raises `KeyError: 'files'` on the very first invocation whenever a cached report exists — the command never works.
- The fallback branch inside `/health` (lines 179‑190) has the same bug: when NemoClaw isn't mounted, the "helpful cached embed" also crashes instead of rendering.
Pick one canonical schema and align both sides. Adapting the consumer is the smallest change:
🛠️ Proposed fix in _run_scan() + _build_embed() — match the writer's schema
- return {
- "score": score, "grade": letter, "label": label,
- "colour": colour, "emoji": emoji,
- "files": len(files), "lint": len(lint), "ast": len(ast_issues),
- "total": total,
- "top": (lint + ast_issues)[:5],
- "scanned_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
- "live": True,
- }
+ return {
+ "score": score, "grade": letter, "label": label,
+ "colour": colour, "emoji": emoji,
+ "files_scanned": len(files),
+ "lint_issues": len(lint),
+ "ast_issues": len(ast_issues),
+ "total_issues": total,
+ "top_lint": lint[:5],
+ "top_ast": ast_issues[:5],
+ "scanned_at": datetime.now(timezone.utc).isoformat(),
+ "live": True,
+ }
@@
- embed.add_field(name="📄 Files", value=str(r["files"]), inline=True)
- embed.add_field(name="🔍 Lint", value=str(r["lint"]), inline=True)
- embed.add_field(name="🌳 AST", value=str(r["ast"]), inline=True)
- embed.add_field(name="⚠️ Total", value=str(r["total"]), inline=True)
+ embed.add_field(name="📄 Files", value=str(r.get("files_scanned", 0)), inline=True)
+ embed.add_field(name="🔍 Lint", value=str(r.get("lint_issues", 0)), inline=True)
+ embed.add_field(name="🌳 AST", value=str(r.get("ast_issues", 0)), inline=True)
+ embed.add_field(name="⚠️ Total", value=str(r.get("total_issues", 0)), inline=True)
@@
- if r.get("top"):
- top_text = ""
- for i in r["top"][:5]:
+ top_entries = (r.get("top_lint") or []) + (r.get("top_ast") or [])
+ if top_entries:
+ top_text = ""
+        for i in top_entries[:5]:

Also applies to: 128-149
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-bot/src/cogs/health_check.py` around lines 101 - 116, The live
report keys must match the cached schema to avoid KeyError: modify _run_scan()
to include the writer's keys in its returned dict (add files_scanned,
lint_issues, ast_issues, total_issues, top_lint, top_ast alongside the existing
files, lint, ast, total, top) so both consumers work; additionally update
_build_embed() (and any code that reads the cached payload, e.g., _load_cached
fallback) to prefer the writer keys but fall back to the existing keys using
r.get("files_scanned", r.get("files")) etc., ensuring no KeyError when a cached
report is used.
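If you'd rather keep both producers untouched, the consumer can normalize either schema at load time. A sketch under the assumption that the two key sets are exactly the ones quoted above (`normalize_report` is a hypothetical helper, not an existing function in the cog):

```python
def normalize_report(r: dict) -> dict:
    # Prefer the cached writer's keys (scripts/health_report.py), fall back to
    # the live-scan names, so neither payload can KeyError the embed builder.
    top = r.get("top")
    if top is None:
        top = (r.get("top_lint") or []) + (r.get("top_ast") or [])
    return {
        "files": r.get("files_scanned", r.get("files", 0)),
        "lint": r.get("lint_issues", r.get("lint", 0)),
        "ast": r.get("ast_issues", r.get("ast", 0)),
        "total": r.get("total_issues", r.get("total", 0)),
        "top": top[:5],
    }
```

`_build_embed` would then index the normalized dict only, and a future schema change breaks one function instead of three call sites.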
class HealthCheck(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot
        self._nemoclaw_ready = _NEMOCLAW.exists()
|
Minor: self._nemoclaw_ready doesn't actually gate scanning capability.
The check at line 171 is _NEMOCLAW.exists() (/app/nemoclaw_agent), but _ast_scan/_ruff_scan/_py_files never use anything from that mount — they walk /app/backend_src and /app/project_agents directly and shell out to system ruff. So _nemoclaw_ready = False can fire the cached-report fallback even when a live scan would work fine, and conversely a mounted-but-empty NemoClaw dir would still advertise "live" mode.
Consider gating on what the scanner actually needs:
♻️ Proposed tweak
- self._nemoclaw_ready = _NEMOCLAW.exists()
+ # We need at least one scan root and the `ruff` binary (or AST-only mode).
+        self._nemoclaw_ready = any(r.exists() for r in _SCAN_ROOTS)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  class HealthCheck(commands.Cog):
      def __init__(self, bot: commands.Bot):
          self.bot = bot
-         self._nemoclaw_ready = _NEMOCLAW.exists()
+         # We need at least one scan root and the `ruff` binary (or AST-only mode).
+         self._nemoclaw_ready = any(r.exists() for r in _SCAN_ROOTS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/broski-bot/src/cogs/health_check.py` around lines 168 - 172,
HealthCheck currently sets self._nemoclaw_ready based on _NEMOCLAW.exists(), but
the scanners (_ast_scan, _ruff_scan, _py_files) actually need the ruff
executable and scanable source directories; update HealthCheck.__init__ to
compute _nemoclaw_ready by checking the real scanner prerequisites: verify ruff
is available (e.g. shutil.which('ruff') or a lightweight subprocess run like
['ruff','--version'] that returns success) and that at least one target scan
directory (the ones used by _py_files: /app/backend_src or /app/project_agents)
contains Python files (or is present); set _nemoclaw_ready True only if both
conditions pass so the cached-report fallback is only used when live scanning
truly isn’t possible. Ensure you reference and reuse the same logic/paths used
by _py_files/_ruff_scan to avoid drift.
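The stricter readiness check described in the prompt could look like this — `scanner_ready` is an illustrative name, and the `require_ruff` flag models the "AST-only mode" fallback mentioned above:

```python
import shutil
from pathlib import Path

def scanner_ready(scan_roots, require_ruff=True):
    # Live mode needs real inputs: at least one scan root that actually
    # contains Python files, not merely a mounted (possibly empty) directory.
    has_source = any(
        root.is_dir() and any(root.rglob("*.py"))
        for root in map(Path, scan_roots)
    )
    if not has_source:
        return False
    # Only demand the ruff binary when lint results are part of the score.
    return shutil.which("ruff") is not None if require_ruff else True
```

`shutil.which` avoids the cost of shelling out to `ruff --version` at startup while still catching a missing binary.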
HYPERCODE_JWT_SECRET_FILE: /run/secrets/jwt_secret
JWT_SECRET_FILE: /run/secrets/jwt_secret
PERPLEXITY_API_KEY_FILE: /run/secrets/perplexity_api_key
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm the backend either reads ANTHROPIC_API_KEY_FILE or has a _FILE → env bootstrap.
rg -nP --type=py -C2 '\bANTHROPIC_API_KEY(_FILE)?\b' backend/ || echo "no backend references"
# Also look for any generic _FILE → env loader pattern (e.g. Docker-secrets helper)
rg -nP --type=py -C3 '_FILE.*(open|read_text|Path)' backend/ | head -60Repository: welshDog/HyperCode-V2.4
Length of output: 88
🏁 Script executed:
# First, confirm backend directory exists and see its structure
find . -type d -name backend -o -type d -name src 2>/dev/null | head -20
ls -la backend/ 2>/dev/null || echo "No backend dir"Repository: welshDog/HyperCode-V2.4
Length of output: 1061
🏁 Script executed:
# Search for Anthropic references more broadly across the entire codebase
rg -i "anthropic" --type=py -l 2>/dev/null | head -20
rg -i "api.?key" --type=py -l 2>/dev/null | head -20Repository: welshDog/HyperCode-V2.4
Length of output: 1404
🏁 Script executed:
# Check the docker-compose files mentioned in the review
cat docker-compose.yml 2>/dev/null | head -250
cat docker-compose.secrets.yml 2>/dev/nullRepository: welshDog/HyperCode-V2.4
Length of output: 11030
🏁 Script executed:
# Search specifically in backend/app directory for Anthropic references
rg -i "anthropic" backend/app/ --type=py -n -C3Repository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Check for environment loading patterns and _FILE handling
rg "os.getenv|os.environ|\.get\(" backend/app/ --type=py -A1 -B1 | grep -i "api\|key\|secret" | head -50Repository: welshDog/HyperCode-V2.4
Length of output: 4068
🏁 Script executed:
# Look at the test_docker_secrets.py file that was listed
cat tests/unit/test_docker_secrets.pyRepository: welshDog/HyperCode-V2.4
Length of output: 8887
🏁 Script executed:
# Check if there's a helper function for loading _FILE variables
rg -i "file.*open|read_text|_FILE" backend/ --type=py -B2 -A2 | head -100Repository: welshDog/HyperCode-V2.4
Length of output: 6107
🏁 Script executed:
# Check if anthropic_api_key is actually used/referenced in docker-compose.secrets.yml
grep -n "anthropic" docker-compose.secrets.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 228
🏁 Script executed:
# Search for Anthropic API usage in agents/ directory
rg -i "anthropic" agents/ --type=py -lRepository: welshDog/HyperCode-V2.4
Length of output: 668
🏁 Script executed:
# Check if broski-bot or any other service in secrets overlay uses anthropic_api_key
grep -B5 -A5 "anthropic_api_key" docker-compose.secrets.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 1087
🏁 Script executed:
# Look for any Anthropic client or SDK imports in Python files
rg "from anthropic|import anthropic|Anthropic\(" --type=pyRepository: welshDog/HyperCode-V2.4
Length of output: 2153
🏁 Script executed:
# Check if agents services are in docker-compose.secrets.yml and if they get anthropic_api_key
grep -A20 "agents\|coder-agent\|architect" docker-compose.secrets.yml | grep -B5 -A10 "anthropic"Repository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Check if agents are defined in base docker-compose.yml and how they get their config
grep -A15 "agents\|coder-agent\|architect\|frontend-specialist" docker-compose.yml | head -100Repository: welshDog/HyperCode-V2.4
Length of output: 2947
🏁 Script executed:
# Look at one of the agent files to see how anthropic_api_key is passed to the config
cat agents/01-frontend-specialist/base_agent.py | head -80Repository: welshDog/HyperCode-V2.4
Length of output: 2483
🏁 Script executed:
# Check if agent services are listed in docker-compose.yml
grep -E "^\s{2}[a-z]+-agent:" docker-compose.yml | head -20Repository: welshDog/HyperCode-V2.4
Length of output: 129
🏁 Script executed:
# Check one agent's environment variables in base compose
grep -A30 "coder-agent:" docker-compose.yml | grep -A20 "environment:"Repository: welshDog/HyperCode-V2.4
Length of output: 798
🏁 Script executed:
# Confirm agents don't have ANTHROPIC_API_KEY_FILE in base compose either
grep -B5 -A15 "ANTHROPIC_API_KEY" docker-compose.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 2240
🏁 Script executed:
# Check all services in secrets overlay that reference anthropic
grep -B20 "anthropic_api_key" docker-compose.secrets.yml | grep -E "^\s{2}[a-z]+-[a-z]+:|anthropic"Repository: welshDog/HyperCode-V2.4
Length of output: 234
🏁 Script executed:
# Check if broski-bot gets anthropic_api_key in secrets overlay
grep -A40 "broski-bot:" docker-compose.secrets.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 1171
🏁 Script executed:
# List all services in base compose that have ANTHROPIC_API_KEY set
grep -B10 "ANTHROPIC_API_KEY=" docker-compose.yml | grep "^\s\s[a-z]" | grep ":"Repository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Confirm hypercode-core backend does NOT use Anthropic by checking all its imports
rg "import.*anthropic|from.*anthropic" backend/app/ --type=pyRepository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Check if hypercode-core has any Anthropic client instantiation
rg "Anthropic\(|AsyncAnthropic\(" backend/app/ --type=py -C2Repository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Verify broski-bot gets plain ANTHROPIC_API_KEY from base compose
grep -A50 "broski-bot:" docker-compose.yml | grep "ANTHROPIC_API_KEY"Repository: welshDog/HyperCode-V2.4
Length of output: 113
🏁 Script executed:
# Check if other agents besides project-strategist also use ANTHROPIC_API_KEY
grep -B5 "ANTHROPIC_API_KEY=" docker-compose.yml | grep -B5 "ANTHROPIC_API_KEY" | grep "^\s\s[a-z].*:" | tail -10Repository: welshDog/HyperCode-V2.4
Length of output: 49
🏁 Script executed:
# Double-check: list all services in base compose with ANTHROPIC_API_KEY
grep -n "ANTHROPIC_API_KEY=" docker-compose.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 169
🏁 Script executed:
# Show context for each service that has ANTHROPIC_API_KEY
awk '/^ [a-z-]+:$/ {service=$0} /ANTHROPIC_API_KEY=/ {print NR": "service" -> "$0}' docker-compose.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 213
🏁 Script executed:
# Check if anthropic_api_key secret is referenced by any other service in secrets overlay
grep -n "anthropic_api_key" docker-compose.secrets.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 228
🏁 Script executed:
# Verify the complete picture: all services in secrets overlay and their secrets
grep -A5 "^\s\s[a-z-]*bot:\|^\s\s[a-z-]*agent:\|^\s\shypercode-core:" docker-compose.secrets.yml | grep -A4 "secrets:"Repository: welshDog/HyperCode-V2.4
Length of output: 274
🏁 Script executed:
# Check full secrets override for project-strategist and all agent services
grep -A10 "project-strategist:\|coder-agent:\|tips-tricks-writer:" docker-compose.secrets.ymlRepository: welshDog/HyperCode-V2.4
Length of output: 767
🏁 Script executed:
# Verify which services actually need ANTHROPIC_API_KEY by checking agent code
rg "os.getenv\(.*ANTHROPIC" agents/ --type=py -B2 -A2Repository: welshDog/HyperCode-V2.4
Length of output: 5103
Incorrect service assignment — anthropic_api_key secret is wired to the wrong service.
The anthropic_api_key secret is being passed to hypercode-core, which does not use Anthropic at all. The backend has no Anthropic imports or client instantiation. However, broski-bot and project-strategist agents—which do read os.getenv("ANTHROPIC_API_KEY")—are not receiving ANTHROPIC_API_KEY_FILE overrides in the secrets compose. When operators run with the secrets overlay and no plain .env var set, these agents will silently resolve the key to an empty string and fail at runtime with authentication errors.
Move anthropic_api_key from the hypercode-core service override to the services that actually need it (broski-bot and project-strategist), and provide ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key environment variable to each.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.secrets.yml` at line 50, The ANTHROPIC_API_KEY_FILE secret is
incorrectly assigned to the hypercode-core service; remove
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key from the hypercode-core
service and add ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key to the
broski-bot and project-strategist service blocks so those services receive the
secret at runtime (they are the ones that call os.getenv("ANTHROPIC_API_KEY"));
ensure the anthropic_api_key secret exists in the secrets section and that both
broski-bot and project-strategist have the environment variable set to use the
secret file.
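For the services that gain the `_FILE` override, the agent code also needs a `_FILE` → env bootstrap so `os.getenv("ANTHROPIC_API_KEY")` callers keep working. A sketch of that generic Docker-secrets loader pattern (`read_secret` is a hypothetical helper; the repo may already have its own variant):

```python
import os

def read_secret(name, env=None, default=""):
    # Docker-secrets convention: NAME_FILE (a mounted path) wins over plain NAME.
    env = os.environ if env is None else env
    path = env.get(f"{name}_FILE")
    if path:
        try:
            with open(path, encoding="utf-8") as fh:
                # Secrets files commonly end with a newline; strip it.
                return fh.read().strip()
        except OSError:
            pass  # fall through to the plain env var
    return env.get(name, default)
```

With this in place, `read_secret("ANTHROPIC_API_KEY")` works identically with the secrets overlay or a plain `.env`.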
- ./agents/broski-nemoclaw-agent:/app/nemoclaw_agent:ro
- ./backend:/app/backend_src:ro
- ./agents:/app/project_agents:ro
- ./reports/broski-analysis:/app/health_reports
Mount the health-report directory read-only.
./reports/broski-analysis:/app/health_reports is read-write, but the only consumer inside the container — agents/broski-bot/src/cogs/health_check.py — just calls LATEST_JSON.read_text() (line 122). Nothing in the bot writes back; the writer is scripts/health_report.py running on the host. Dropping write access inside the container matches the :ro pattern already used for nemoclaw_agent, backend_src, and project_agents on lines 344‑346, and prevents a compromised bot process from tampering with the cached report.
♻️ Proposed tweak
- - ./reports/broski-analysis:/app/health_reports
+ - ./reports/broski-analysis:/app/health_reports:ro

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` at line 347, The bind-mount for the health report is
currently writable ("./reports/broski-analysis:/app/health_reports") but the
container only reads it (see agents/broski-bot/src/cogs/health_check.py using
LATEST_JSON.read_text()) while host script scripts/health_report.py writes it;
change the compose mount to be read-only by appending :ro to the mount
("./reports/broski-analysis:/app/health_reports:ro") so the container cannot
modify the cached report and matches the other :ro mounts.
report = {
    "score": score,
    "grade": letter,
    "label": label,
    "emoji": emoji,
    "colour": colour,
    "files_scanned": len(files),
    "lint_issues": len(ruff_issues),
    "ast_issues": len(ast_issues),
    "total_issues": total,
    "top_lint": top_lint,
    "top_ast": top_ast,
    "scanned_at": datetime.now(timezone.utc).isoformat(),
}
Field naming diverges from the Discord cog that consumes this file — cached fallback is broken.
reports/broski-analysis/latest.json produced here uses files_scanned, lint_issues, ast_issues, total_issues, top_lint, top_ast. The new consumer agents/broski-bot/src/cogs/health_check.py indexes r["files"], r["lint"], r["ast"], r["total"], r.get("top") (see _build_embed lines 138‑149 in that file), so /health-last and the /health cache-fallback path will raise KeyError the first time they're exercised. See the root‑cause comment on health_check.py — pick one canonical schema and align both producers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/health_report.py` around lines 120 - 133, The report schema in
scripts/health_report.py must match the consumer in
agents/broski-bot/src/cogs/health_check.py (_build_embed expects r["files"],
r["lint"], r["ast"], r["total"], r.get("top")); update the report dict keys
accordingly: rename files_scanned→files, lint_issues→lint, ast_issues→ast,
total_issues→total, and provide a single top key (e.g., merge top_lint and
top_ast into one list or pick top_lint if present) so r.get("top") returns the
expected value; keep other fields (score, grade, label, emoji, colour,
scanned_at) unchanged.
latest = REPORTS_DIR / "latest.json"
latest.write_text(json.dumps(report, indent=2))
Specify UTF‑8 when writing the report — emoji in label/emoji will blow up on Windows.
Path.write_text without encoding= uses locale.getpreferredencoding(), which is typically cp1252 on Windows. The report embeds emoji (🏆, 🆘, etc.) and the grade labels, so this call will raise UnicodeEncodeError on the very same platform you already added sys.stdout.reconfigure / PYTHONUTF8=1 to protect against (lines 24‑27). Lock the file encoding explicitly so the make health target doesn't crash mid-run for contributors on Windows.
♻️ Proposed fix
- latest.write_text(json.dumps(report, indent=2))
+ latest.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  latest = REPORTS_DIR / "latest.json"
- latest.write_text(json.dumps(report, indent=2))
+ latest.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/health_report.py` around lines 136 - 137, The write to the latest
report currently uses Path.write_text without an explicit encoding which will
use the platform default and can raise UnicodeEncodeError on Windows; update the
write call that uses latest (the Path named latest under REPORTS_DIR) to pass
encoding='utf-8' when writing the json.dumps(report, indent=2) so emoji and
non‑ASCII grade labels are written safely.
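The fix is a one-liner, but it's worth seeing the round trip that fails without it. A self-contained sketch (the `write_report` wrapper is illustrative, not an existing function in the script):

```python
import json
from pathlib import Path

def write_report(path: Path, report: dict) -> None:
    # Explicit UTF-8 sidesteps the cp1252 locale default on Windows;
    # ensure_ascii=False additionally keeps the emoji readable in the file
    # itself instead of \uXXXX escapes.
    path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
```

Readers should pass `encoding="utf-8"` symmetrically (`latest.read_text(encoding="utf-8")` in the cog) so the round trip is locale-independent on both sides.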
Remove explicit bind mount configurations for redis, postgres, and minio volumes to use Docker-managed volumes instead. This eliminates dependency on host-specific paths that may not exist. Set default passwords for PostgreSQL and MinIO services to prevent runtime failures when environment variables are unset. This ensures services can start in development environments without requiring all secrets to be configured.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Around line 974-975: Add a healthcheck to the mcp-gateway service that
verifies the MCP streaming transport is accepting connections (e.g., a TCP probe
to :8099 or an HTTP GET /health) and change broski-pets-bridge's depends_on from
service_started to service_healthy so Docker waits for mcp-gateway readiness;
update the docker-compose mcp-gateway service block to include a minimal
healthcheck stanza and adjust broski-pets-bridge depends_on accordingly so the
bridge's _mcp_jsonrpc_call won't run before the gateway is actually ready.
- Line 959: Update the MCP_GATEWAY_URL default in the docker-compose.yml to use
the HTTP scheme so it matches .env.example and any consumers that expect an HTTP
URL: change MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} to use
http:// (e.g. MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099});
reference the MCP_GATEWAY_URL variable and the
agents/broski-pets-bridge/main.py::_mcp_base_url rewrite when making the change
so the code no longer relies on a tcp:// → http:// shim.
- Line 130: Replace unsafe hardcoded defaults like POSTGRES_PASSWORD:
${POSTGRES_PASSWORD:-hypercode} and similar occurrences for MINIO_ROOT_USER,
MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) across services (hypercode-core,
celery-worker, crew-orchestrator, healer-agent, hyper-architect, hyper-worker,
hyperhealth-api, hyperhealth-worker) with a fail-fast or environment-gated
approach: either require the variable with the Docker Compose parameter
expansion form ${VAR:?error message} so compose aborts when missing, or
conditionally allow a development-only fallback when ENVIRONMENT==development
and refuse to start (or read from docker-compose.secrets.yml) when
ENVIRONMENT!=development; update the referenced env uses in docker-compose.yml
accordingly so production/staging never silently get the literal "hypercode"
password.
In `@reports/docker-environment-status-report.md`:
- Around line 125-133: The report flags four services (agent-x, hyper-architect,
hyper-observer, hyper-worker) as publishing on 0.0.0.0 but omits the compose
binding context; update the HIGH finding to add a single clarifying sentence
that explains the observation methodology (e.g., that the port list came from
docker ps / host netstat behavior on Docker Desktop/WSL2 where vpnkit often
shows 0.0.0.0) and cross-reference docker-compose.yml bindings (which show
127.0.0.1:<port>:<port>) so readers understand the apparent discrepancy and that
host ports may actually be loopback-only.
- Around line 110-170: The remediation guidance is stale because redis-data,
postgres-data and minio_data were converted from local bind mounts to
Docker-managed volumes in this PR; update
reports/docker-environment-status-report.md to either regenerate the report
after this PR or add a short “Superseded by PR `#84`” note clarifying that the
CRITICAL advice to create H:\HyperStation zone\HyperCode\volumes\* no longer
applies to redis-data, postgres-data and minio_data, while keeping the remaining
bind-mounts (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data,
alertmanager-data, chroma_data, agent_memory) and HC_DATA_ROOT remediation
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 127f6b26-cf1b-43be-9364-8c492c0276f9
📒 Files selected for processing (2)
- docker-compose.yml
- reports/docker-environment-status-report.md
  environment:
    POSTGRES_USER: ${POSTGRES_USER:-postgres}
-   POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-}
+   POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode}
Hardcoded hypercode default credentials are a production footgun.
POSTGRES_PASSWORD, MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) and every ${POSTGRES_PASSWORD:-hypercode} in downstream URLs (hypercode-core, celery-worker, crew-orchestrator, healer-agent, hyper-architect, hyper-worker, hyperhealth-api, hyperhealth-worker) will now silently fall back to the literal string hypercode when the env var is unset. That avoids dev startup failures as intended, but it also means a misconfigured prod/staging bring-up gets a predictable, publicly-documented password on postgres and the MinIO console — with MinIO's console and Postgres reachable from any container on data-net.
At minimum, gate the fallback on a non-production flag, or refuse to start when ENVIRONMENT!=development and the secret is missing. Consider reading the secret from the existing docker-compose.secrets.yml mechanism instead of env-var defaults.
🛡️ One possible shape
- POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode}
+ POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set (see .env.example / secrets/)}

`${VAR:?message}` aborts docker compose with a clear error instead of booting with a known-bad credential.
Also applies to: 281-286, 497-498, 1126-1126, 1793-1793
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` at line 130, Replace unsafe hardcoded defaults like
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode} and similar occurrences for
MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) across services
(hypercode-core, celery-worker, crew-orchestrator, healer-agent,
hyper-architect, hyper-worker, hyperhealth-api, hyperhealth-worker) with a
fail-fast or environment-gated approach: either require the variable with the
Docker Compose parameter expansion form ${VAR:?error message} so compose aborts
when missing, or conditionally allow a development-only fallback when
ENVIRONMENT==development and refuse to start (or read from
docker-compose.secrets.yml) when ENVIRONMENT!=development; update the referenced
env uses in docker-compose.yml accordingly so production/staging never silently
get the literal "hypercode" password.
- PETS_OLLAMA_MODEL=${PETS_OLLAMA_MODEL:-qwen2.5:7b}
- WORKSPACE_PATH=/workspace
- SQUAD_JSON_PATH=/workspace/squad.json
- MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
Default scheme mismatch: compose uses tcp://, .env.example uses http://.
MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} works because agents/broski-pets-bridge/main.py::_mcp_base_url rewrites tcp:// → http://, but the PR's .env.example default is http://mcp-gateway:8099. Align the two so operators copying .env.example don't end up with a different effective URL than the unset-env default, and so anything else that reads this var (e.g. future consumers without the rewrite shim) sees a valid HTTP URL.
♻️ Proposed tweak
- - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
+ - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
+ - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` at line 959, Update the MCP_GATEWAY_URL default in the
docker-compose.yml to use the HTTP scheme so it matches .env.example and any
consumers that expect an HTTP URL: change
MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} to use http:// (e.g.
MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}); reference the
MCP_GATEWAY_URL variable and the
agents/broski-pets-bridge/main.py::_mcp_base_url rewrite when making the change
so the code no longer relies on a tcp:// → http:// shim.
mcp-gateway:
  condition: service_started
mcp-gateway has no healthcheck, so service_started is effectively a no-op gate.
broski-pets-bridge uses depends_on: mcp-gateway: service_started, which only waits for the container process to spawn — not for the MCP streaming transport on :8099 to be accepting requests. The bridge's own healthcheck (curl /health) will then run MCP calls (see _mcp_jsonrpc_call) that may fail during the gateway's warm-up, leading to avoidable mcp_connected=false in the first few health cycles.
Adding even a minimal TCP-probe healthcheck to mcp-gateway and switching the dependent to service_healthy would make startup ordering deterministic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yml` around lines 974 - 975, Add a healthcheck to the
mcp-gateway service that verifies the MCP streaming transport is accepting
connections (e.g., a TCP probe to :8099 or an HTTP GET /health) and change
broski-pets-bridge's depends_on from service_started to service_healthy so
Docker waits for mcp-gateway readiness; update the docker-compose mcp-gateway
service block to include a minimal healthcheck stanza and adjust
broski-pets-bridge depends_on accordingly so the bridge's _mcp_jsonrpc_call
won't run before the gateway is actually ready.
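For reference, the TCP probe the suggested healthcheck performs is just a connect attempt. A Python equivalent (the compose stanza itself would typically use `curl`, `wget`, or a shell `/dev/tcp` redirect inside the gateway image — whichever is available):

```python
import socket

def tcp_ready(host, port, timeout=2.0):
    # Same signal a compose TCP healthcheck gives: can we complete a connect
    # to host:port before the timeout? True means something is listening.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The bridge could also call something like `tcp_ready("mcp-gateway", 8099)` before its first `_mcp_jsonrpc_call` as a belt-and-braces guard on top of `service_healthy`.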
## Volume Mount Integrity

### CRITICAL

Multiple Docker volumes are configured as local-driver bind mounts to:
- H:/HyperStation zone/HyperCode/volumes/<service>

On this Windows filesystem, H:\HyperStation zone\HyperCode\volumes does not exist.

Observed symptoms consistent with missing/invalid bind targets:
- redis log indicates failure saving RDB temp file under /data
- minio reports “no online disks found” and quorum failures

## Port Exposure Validation

### HIGH (Public binds on 0.0.0.0)

The following running containers publish ports on all interfaces:
- agent-x: 8080/tcp (0.0.0.0 and [::])
- hyper-architect: 8091/tcp (0.0.0.0 and [::])
- hyper-observer: 8092/tcp (0.0.0.0 and [::])
- hyper-worker: 8093/tcp (0.0.0.0 and [::])

Everything else observed is localhost-bound (127.0.0.1) or internal-only.

## Security Vulnerabilities (Images)

### HIGH (Security visibility gap)

The always-on Trivy scanner container is running, but it is not producing the expected output file:
- Expected: reports/security/trivy-hypercode-core.json
- Current state: missing
- Scanner log shows: context deadline exceeded during analysis of hypercode-core:latest

Impact:
- Current HIGH/CRITICAL vulnerability counts for hypercode-core cannot be confirmed from this environment snapshot.

## Environment Variable Consistency

Compose evaluation reports unset variables (defaulting to blank string):
- API_KEY
- DISCORD_TOKEN
- HYPERCODE_JWT_SECRET
- HYPERCODE_MEMORY_KEY
- MINIO_ROOT_USER
- MINIO_ROOT_PASSWORD

Observed runtime impact:
- broski-bot fails because DB password resolves to blank.

## Remediation Recommendations

### CRITICAL (Do first)

- Fix persistent volume bind targets:
  - Create the missing host directory tree under H:\HyperStation zone\HyperCode\volumes\*
  - Or update HC_DATA_ROOT so it points to an existing, Docker-shared path and recreate the volumes
- Restore data services:
  - Bring up postgres + redis and confirm they become healthy before starting dependent services
- Fix required secrets/env:
  - Ensure POSTGRES_PASSWORD (or DB_PASSWORD) is set so broski-bot can initialize its DB
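The unset-variable findings above are the kind of thing a fail-fast boot check catches before a service limps along with a blank password. A sketch of such a guard, using the variable names from this report (the helper itself is illustrative, not code from the repo):

```python
import os

# Variable names taken from the report's "Environment Variable Consistency" section.
REQUIRED_VARS = [
    "API_KEY", "DISCORD_TOKEN", "HYPERCODE_JWT_SECRET",
    "HYPERCODE_MEMORY_KEY", "MINIO_ROOT_USER", "MINIO_ROOT_PASSWORD",
]

def missing_required(env=None, required=None):
    """Return names that are unset or blank; compose silently defaults both to ''."""
    env = os.environ if env is None else env
    required = REQUIRED_VARS if required is None else required
    return [name for name in required if not env.get(name, "").strip()]

def assert_env_ok(env=None):
    """Raise at startup instead of letting a blank password reach the DB driver."""
    missing = missing_required(env)
    if missing:
        raise RuntimeError("Required env vars unset or blank: " + ", ".join(missing))
```

Calling `assert_env_ok()` from the service entrypoint turns the broski-bot symptom (blank DB password at connect time) into a clear boot error.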
Remediation section is already partly stale relative to this PR's compose changes.
The "Volume Mount Integrity / CRITICAL" finding and the first "Remediation Recommendations → CRITICAL" bullet both tell operators to “Create the missing host directory tree under H:\HyperStation zone\HyperCode\volumes\*” for postgres, redis and minio. In this same PR the redis-data, postgres-data and minio_data volumes were switched from local-driver bind mounts to Docker-managed volumes, so those three specific symptoms (redis RDB write failures under /data, minio "no online disks", postgres client-disconnect storms tied to the bind path) are no longer reproducible from a fresh docker compose up.
Either regenerate the report after this PR lands, or add a short "Superseded by PR #84" note so readers don't waste time creating Windows directories that the compose file no longer references. The observability-stack volumes (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data, alertmanager-data, chroma_data, agent_memory) are still bind-mounted to ${HC_DATA_ROOT}, so that subset of the finding is still accurate and worth keeping.
🧰 Tools
🪛 LanguageTool
[grammar] ~117-~117: The singular determiner ‘this’ may not agree with the plural noun ‘windows’. Did you mean “these”?
Context: ...on zone/HyperCode/volumes/ On this Windows filesystem, H:\HyperStation zon...
(THIS_NNS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@reports/docker-environment-status-report.md` around lines 110 - 170, The
remediation guidance is stale because redis-data, postgres-data and minio_data
were converted from local bind mounts to Docker-managed volumes in this PR;
update reports/docker-environment-status-report.md to either regenerate the
report after this PR or add a short “Superseded by PR `#84`” note clarifying that
the CRITICAL advice to create H:\HyperStation zone\HyperCode\volumes\* no longer
applies to redis-data, postgres-data and minio_data, while keeping the remaining
bind-mounts (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data,
alertmanager-data, chroma_data, agent_memory) and HC_DATA_ROOT remediation
intact.
### HIGH (Public binds on 0.0.0.0)

The following running containers publish ports on all interfaces:
- agent-x: 8080/tcp (0.0.0.0 and [::])
- hyper-architect: 8091/tcp (0.0.0.0 and [::])
- hyper-observer: 8092/tcp (0.0.0.0 and [::])
- hyper-worker: 8093/tcp (0.0.0.0 and [::])

Everything else observed is localhost-bound (127.0.0.1) or internal-only.
Port exposure finding is incomplete — include protocol/interface assumptions or cross-reference the compose bindings.
The four 0.0.0.0 publishes listed (agent-x:8080, hyper-architect:8091, hyper-observer:8092, hyper-worker:8093) match the hyper profile services in docker-compose.yml, all of which declare 127.0.0.1:<port>:<port>. On Docker Desktop / WSL2 the 127.0.0.1: prefix is frequently reported as 0.0.0.0 inside docker ps because of the vpnkit proxy, even though the host port is in fact loopback-only. Without calling this out, the HIGH rating will confuse anyone who greps the compose file and sees the 127.0.0.1: prefix and assumes the report is wrong. A single-sentence note on the observation methodology (e.g. netstat -an on the Windows host vs docker ps) would disambiguate.
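To make the methodology note concrete: the ambiguity only exists because Docker's short port syntax defaults to 0.0.0.0 when no host IP is given. A small classifier sketch for compose port strings (IPv4 short syntax only; bracketed IPv6 specs are out of scope here):

```python
def binds_all_interfaces(port_spec: str) -> bool:
    """True when a compose short-syntax port spec publishes on all interfaces.

    Specs look like "8080:8080" (no host IP, so Docker binds 0.0.0.0) or
    "127.0.0.1:8080:8080" (loopback only).
    """
    parts = port_spec.split(":")
    if len(parts) < 3:          # no explicit host IP in the spec
        return True
    host_ip = parts[0]
    return host_ip in ("0.0.0.0", "")
```

Running this over the compose file's `ports:` entries would show the four hyper-profile services as loopback-only, matching the `127.0.0.1:<port>:<port>` bindings the comment describes.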
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@reports/docker-environment-status-report.md` around lines 125 - 133, The
report flags four services (agent-x, hyper-architect, hyper-observer,
hyper-worker) as publishing on 0.0.0.0 but omits the compose binding context;
update the HIGH finding to add a single clarifying sentence that explains the
observation methodology (e.g., that the port list came from docker ps / host
netstat behavior on Docker Desktop/WSL2 where vpnkit often shows 0.0.0.0) and
cross-reference docker-compose.yml bindings (which show 127.0.0.1:<port>:<port>)
so readers understand the apparent discrepancy and that host ports may actually
be loopback-only.
…M crashes

- Add pre-build-check.sh script to verify disk space and memory before builds
- Integrate safety check into Makefile build process
- Apply memory limits to all services in docker-compose.yml to prevent cascade failures
- Document OOM recovery procedure and prevention measures in runbook
- Update project status files to reflect new safety measures
- Add `env_ignore_empty=True` to Pydantic settings to ignore empty environment variables
- Use Header alias "Stripe-Signature" for webhook endpoint to match Stripe's header naming
- Validate Stripe-Signature header presence in production environment
- Skip signature verification only when both webhook secret is missing AND environment is not production
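For readers unfamiliar with `env_ignore_empty=True`: it makes pydantic-settings treat an empty-string environment variable as unset, so field defaults still apply. A stdlib stand-in that mimics just that lookup rule (not the actual pydantic implementation):

```python
import os

def resolve_setting(name, default, env=None, ignore_empty=True):
    """Mimic the pydantic-settings env lookup: with ignore_empty, '' falls back to default."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:            # variable not present at all
        return default
    if ignore_empty and value == "":
        return default           # present but blank: treated as unset
    return value
```

This is exactly the failure mode the compose report flagged: without the flag, a variable that compose defaulted to `""` overrides the field default with a blank string.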
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/RUNBOOK.md (1)
313-316: ⚠️ Potential issue | 🟡 Minor — Orphan table row misplaced inside the Alpha Routing section.

Line 316 contains a single table row for pgAdmin dangling immediately after the rollback validation bullets. It is clearly a fragment of the Service Access Points table (lines 241–252) that was accidentally pushed out during an edit — it will render as a broken one-row table in the middle of the Alpha Routing docs.

📝 Proposed fix — move the row back into the services table

At lines 241–252, add it alongside the other services:

 | MinIO | http://127.0.0.1:9000 | Object storage |
+| pgAdmin | http://127.0.0.1:59080 | DB admin UI |

And remove the orphan at line 316:

 Validation after rollback:
 - no OpenRouter requests occur
 - local Ollama and existing cloud fallback behavior remain unchanged
-| pgAdmin | http://127.0.0.1:59080 | DB admin UI |

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/RUNBOOK.md` around lines 313 - 316, The markdown has an orphan table row "pgAdmin | http://127.0.0.1:59080 | DB admin UI" placed immediately after the "Validation after rollback:" bullet list inside the Alpha Routing section; move that single row back into the Service Access Points table (the table that lists services alongside the other rows) so it becomes a proper table row with the other service entries, and remove the orphan row from beneath the rollback validation bullets to prevent a broken one-row table in the docs.

WHATS_DONE.md (1)
20-29: ⚠️ Potential issue | 🟡 Minor — Container count inconsistency between bullets.

Line 20 still claims 29/29 Docker containers — all healthy, while the new April‑17 bullet on line 29 reports recovery to 24/24 containers. Worth reconciling so the "current state" bullet at line 20 is updated to reflect the post-OOM baseline (or the April‑17 entry is reworded to make clear it's a post-incident snapshot) — otherwise future readers won't know which number is authoritative.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@WHATS_DONE.md` around lines 20 - 29, There's a mismatch between the top "current state" bullet ("29/29 Docker containers — all healthy") and the April‑17 incident bullet ("24/24 containers restored"); update the document so the authoritative current-state bullet and the April‑17 snapshot are consistent: either change the line that reads "29/29 Docker containers — all healthy" to "24/24 Docker containers — all healthy (post‑OOM baseline)" or reword the April‑17 line to explicitly mark it as an incident snapshot (e.g., "post‑incident: 24/24 restored on April 17"), ensuring both bullets reference the same canonical container count.
🧹 Nitpick comments (1)
scripts/pre-build-check.sh (1)
8-8: Consider `set -euo pipefail` for stricter failure semantics.

`set -e` alone will not catch unset variables or broken pipes in subshells (e.g. the `docker images -f "dangling=true" -q | wc -l` pipeline on line 69). For a safety guard script, `set -euo pipefail` is the idiomatic choice and would surface unexpected environment issues instead of silently producing zero counts.

-set -e
+set -euo pipefail

Purely optional — the current checks do defensive fallbacks already.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/pre-build-check.sh` at line 8, Replace the current loose failure mode by changing the shell options: update the script's "set -e" to "set -euo pipefail" to catch unset variables and pipeline failures (this will affect constructs like the pipeline "docker images -f \"dangling=true\" -q | wc -l"); ensure any intentionally optional variables are given safe defaults or are referenced with ${VAR:-} so the new -u flag does not cause false failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/routes/stripe.py`:
- Around line 111-122: The webhook handler currently allows skipping signature
verification when STRIPE_WEBHOOK_SECRET is missing even in production; update
the logic in the handler (the block that sets is_production and inspects
webhook_secret and stripe_signature in backend/app/routes/stripe.py) so that if
webhook_secret is missing and is_production is True you log an error and
return/raise an HTTPException (fail closed), otherwise if webhook_secret is
missing and not production you emit a clear dev-mode warning and skip
verification; also separate the case where stripe_signature is missing in dev to
emit a distinct warning and only skip in non-production. Additionally, add a
startup validation in validate_security() (backend/app/core/config.py) to check
STRIPE_WEBHOOK_SECRET and raise/log an error when running in production so the
misconfiguration is detected at boot.
In `@scripts/pre-build-check.sh`:
- Around line 47-58: The memory check can silently skip when awk doesn't find
"MemAvailable" and FREE_MEM_MB becomes empty; update the logic around
FREE_MEM_MB so you validate its value before the numeric comparison: after
reading /proc/meminfo (using the existing FREE_MEM_MB assignment), check if
FREE_MEM_MB is empty or non-numeric (e.g. with a regex or POSIX arithmetic
test), and if so set FREE_MEM_MB=0 and emit a warning message that
"MemAvailable" was not found in /proc/meminfo and the check is falling back to
0; then keep the existing comparison against MIN_MEM_MB and the same exit
behavior. Reference symbols: FREE_MEM_MB, MIN_MEM_MB, the /proc/meminfo read and
the if [ "$FREE_MEM_MB" -gt 0 ] guard.
---
Outside diff comments:
In `@docs/RUNBOOK.md`:
- Around line 313-316: The markdown has an orphan table row "pgAdmin |
http://127.0.0.1:59080 | DB admin UI" placed immediately after the "Validation
after rollback:" bullet list inside the Alpha Routing section; move that single
row back into the Service Access Points table (the table that lists services
alongside the other rows) so it becomes a proper table row with the other
service entries, and remove the orphan row from beneath the rollback validation
bullets to prevent a broken one-row table in the docs.
In `@WHATS_DONE.md`:
- Around line 20-29: There's a mismatch between the top "current state" bullet
("29/29 Docker containers — all healthy") and the April‑17 incident bullet
("24/24 containers restored"); update the document so the authoritative
current-state bullet and the April‑17 snapshot are consistent: either change the
line that reads "29/29 Docker containers — all healthy" to "24/24 Docker
containers — all healthy (post‑OOM baseline)" or reword the April‑17 line to
explicitly mark it as an incident snapshot (e.g., "post‑incident: 24/24 restored
on April 17"), ensuring both bullets reference the same canonical container
count.
---
Nitpick comments:
In `@scripts/pre-build-check.sh`:
- Line 8: Replace the current loose failure mode by changing the shell options:
update the script's "set -e" to "set -euo pipefail" to catch unset variables and
pipeline failures (this will affect constructs like the pipeline "docker images
-f \"dangling=true\" -q | wc -l"); ensure any intentionally optional variables
are given safe defaults or are referenced with ${VAR:-} so the new -u flag does
not cause false failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e5e3adc2-1d88-48dc-9692-185f8cb583b1
📒 Files selected for processing (8)
- CLAUDE.md
- Makefile
- WHATS_DONE.md
- backend/app/core/config.py
- backend/app/routes/stripe.py
- docker-compose.yml
- docs/RUNBOOK.md
- scripts/pre-build-check.sh
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
is_production = settings.ENVIRONMENT.lower() == "production"

if (not webhook_secret) or ((not is_production) and (not stripe_signature)):
    logger.warning("STRIPE_WEBHOOK_SECRET not set — skipping signature check (dev mode)")
    try:
        import json
        event = stripe.Event.construct_from(json.loads(payload), stripe.api_key)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid payload: {e}")
else:
    if not stripe_signature:
        raise HTTPException(status_code=400, detail="Missing Stripe-Signature header")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm there's no other guard refusing to boot the app in production without STRIPE_WEBHOOK_SECRET
rg -nP -C2 'STRIPE_WEBHOOK_SECRET' --type=py
ast-grep --pattern $'def validate_security($_) -> None:
$$$'

Repository: welshDog/HyperCode-V2.4
Length of output: 2022
🏁 Script executed:
#!/bin/bash
# Check if validate_security() is called at app startup
rg -n 'validate_security\(\)' --type=py
# Find the main/startup entry point
fd -name 'main.py\|__main__.py\|app.py' --type f | head -5
# Check the stripe webhook endpoint definition for decorators
rg -nB5 'async def webhook' backend/app/routes/stripe.py | head -30

Repository: welshDog/HyperCode-V2.4
Length of output: 441
🏁 Script executed:
#!/bin/bash
# Search for rate limiting decorators/imports in stripe.py
rg -n '@.*rate\|from.*rate\|import.*rate' backend/app/routes/stripe.py
# Check the entire route/endpoint structure
head -120 backend/app/routes/stripe.py

Repository: welshDog/HyperCode-V2.4
Length of output: 4732
Production webhook endpoint silently skips signature verification when STRIPE_WEBHOOK_SECRET is unset.
The current condition at line 113 is:
if (not webhook_secret) or ((not is_production) and (not stripe_signature)):
    # skip verification

This allows unsigned webhooks in production when STRIPE_WEBHOOK_SECRET is missing or empty — a plausible misconfiguration scenario. The logic should gate the missing-secret skip exclusively to non-production environments. Currently:
- Production + secret missing → skips verification (vulnerability ✗)
- Production + secret set + signature missing → rejects (correct)
- Non-production + secret missing → skips verification (acceptable for development)
Additionally, validate_security() at startup in backend/app/core/config.py does not check for STRIPE_WEBHOOK_SECRET, so this misconfiguration only surfaces when the first webhook arrives, not at boot time.
Secondary issue: the warning message on line 114 is emitted for both "secret missing (dev mode)" and "secret set but signature missing in dev" cases, which will confuse log triage.
Fix: Fail closed in production when the secret is missing, and add STRIPE_WEBHOOK_SECRET validation to the startup validate_security() check.
Proposed code structure
is_production = settings.ENVIRONMENT.lower() == "production"

if not webhook_secret:
    if is_production:
        logger.error("STRIPE_WEBHOOK_SECRET missing in production — refusing to process webhook")
        raise HTTPException(status_code=500, detail="Webhook not configured")
    logger.warning("STRIPE_WEBHOOK_SECRET not set — skipping signature check (dev mode)")
    skip_verification = True
elif (not is_production) and (not stripe_signature):
    logger.warning("Stripe-Signature header missing in dev — skipping signature check")
    skip_verification = True
else:
    skip_verification = False

if skip_verification:
    # Construct event without verification
else:
    # Verify signature

🤖 Prompt for AI Agents
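The startup half of the fix can be a pure function that validate_security() calls, so a missing secret fails the boot rather than the first webhook. A sketch under the same assumed setting names (ENVIRONMENT, STRIPE_WEBHOOK_SECRET); the function name is hypothetical:

```python
from typing import List, Optional

def stripe_config_errors(environment: str, webhook_secret: Optional[str]) -> List[str]:
    """Collect Stripe misconfigurations; the caller raises at startup if any are returned."""
    errors: List[str] = []
    # Both None and "" count as missing, matching the blank-env failure mode.
    if environment.lower() == "production" and not webhook_secret:
        errors.append("STRIPE_WEBHOOK_SECRET must be set in production")
    return errors
```

Returning a list rather than raising keeps the helper composable with other checks validate_security() may already perform.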
Verify each finding against the current code and only fix it if needed.
In `@backend/app/routes/stripe.py` around lines 111 - 122, The webhook handler
currently allows skipping signature verification when STRIPE_WEBHOOK_SECRET is
missing even in production; update the logic in the handler (the block that sets
is_production and inspects webhook_secret and stripe_signature in
backend/app/routes/stripe.py) so that if webhook_secret is missing and
is_production is True you log an error and return/raise an HTTPException (fail
closed), otherwise if webhook_secret is missing and not production you emit a
clear dev-mode warning and skip verification; also separate the case where
stripe_signature is missing in dev to emit a distinct warning and only skip in
non-production. Additionally, add a startup validation in validate_security()
(backend/app/core/config.py) to check STRIPE_WEBHOOK_SECRET and raise/log an
error when running in production so the misconfiguration is detected at boot.
FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0")

echo ""
if [ "$FREE_MEM_MB" -gt 0 ]; then
  echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
  if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
    echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
    exit 1
  else
    echo -e "${GREEN}✅ Memory OK${NC}"
  fi
fi
Memory check silently skipped when /proc/meminfo lacks MemAvailable or awk yields empty.
If /proc/meminfo exists but awk doesn't match MemAvailable (older kernels, non-standard layouts, or a WSL VM edge case), FREE_MEM_MB becomes an empty string, the fallback || echo "0" only kicks in when awk exits non-zero, and the if [ "$FREE_MEM_MB" -gt 0 ] guard fails silently — the entire memory section is skipped without any warning, defeating the purpose of the guard.
🛡️ Small hardening
-FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0")
+FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null)
+FREE_MEM_MB=${FREE_MEM_MB:-0}
echo ""
-if [ "$FREE_MEM_MB" -gt 0 ]; then
+if [ "$FREE_MEM_MB" -gt 0 ]; then
echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
exit 1
else
echo -e "${GREEN}✅ Memory OK${NC}"
fi
+else
+ echo -e "${YELLOW}⚠️ Memory check skipped (could not read MemAvailable from /proc/meminfo)${NC}"
fi

Also worth noting in a comment: on Docker Desktop for Windows/macOS, /proc/meminfo reflects the Linux VM's memory, not the host's, so the threshold should be tuned accordingly.
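The same parsing can be expressed as a pure function over the file contents, which makes the absent-key case an explicit None instead of a silently skipped check (a sketch, not the script's actual implementation):

```python
def mem_available_mb(meminfo_text: str):
    """Return MemAvailable in MB parsed from /proc/meminfo text, or None when the key is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            fields = line.split()  # e.g. ["MemAvailable:", "8192000", "kB"]
            if len(fields) >= 2 and fields[1].isdigit():
                return int(fields[1]) // 1024
    return None
```

On Linux you would feed it `open("/proc/meminfo").read()`; a None result maps to the warning branch the hardening diff above adds, rather than skipping the check.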
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0")
echo ""
if [ "$FREE_MEM_MB" -gt 0 ]; then
  echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
  if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
    echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
    exit 1
  else
    echo -e "${GREEN}✅ Memory OK${NC}"
  fi
fi
FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null)
FREE_MEM_MB=${FREE_MEM_MB:-0}
echo ""
if [ "$FREE_MEM_MB" -gt 0 ]; then
  echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
  if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
    echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
    exit 1
  else
    echo -e "${GREEN}✅ Memory OK${NC}"
  fi
else
  echo -e "${YELLOW}⚠️ Memory check skipped (could not read MemAvailable from /proc/meminfo)${NC}"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/pre-build-check.sh` around lines 47 - 58, The memory check can
silently skip when awk doesn't find "MemAvailable" and FREE_MEM_MB becomes
empty; update the logic around FREE_MEM_MB so you validate its value before the
numeric comparison: after reading /proc/meminfo (using the existing FREE_MEM_MB
assignment), check if FREE_MEM_MB is empty or non-numeric (e.g. with a regex or
POSIX arithmetic test), and if so set FREE_MEM_MB=0 and emit a warning message
that "MemAvailable" was not found in /proc/meminfo and the check is falling back
to 0; then keep the existing comparison against MIN_MEM_MB and the same exit
behavior. Reference symbols: FREE_MEM_MB, MIN_MEM_MB, the /proc/meminfo read and
the if [ "$FREE_MEM_MB" -gt 0 ] guard.
…, no cold-load on repeat requests

GPU profile (hypercode-ollama-gpu) already in compose — activate with --profile gpu when NVIDIA adapter available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NER, 26 GitHub tools now live

Removing DOCKER_MCP_IN_CONTAINER=1 stops the gateway trying to resolve secrets via the Docker Desktop gRPC resolver (which times out in containers on Windows WSL2). Gateway now uses GITHUB_PERSONAL_ACCESS_TOKEN env var directly.

Result: mcp_connected=true in pets-bridge health, 26 GitHub MCP tools available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Session update — Moves 1, 2, 3 complete

New commits pushed to this branch:

Verified:

🤖 Generated with Claude Code
Add nvidia-smi and Docker GPU test commands to help diagnose GPU availability issues in development environment.
Summary
- hypercode-ollama container
- broski-pets-bridge /ask endpoint
- make health / make health-quick + Discord /health slash command (live scan, grade S–D, webhook posting); first scan scored S-LEGENDARY 99/100
- PETS_WEBHOOK_SECRET generated and wired; all .env.* files reviewed and cleaned; no real keys in any committed file

Test plan

- docker compose up -d --no-deps broski-pets-bridge broski-bot — confirm both start healthy
- make health-quick — confirm NemoClaw scanner runs and prints grade
- /health in Discord — confirm embed appears with score + grade
- docker exec healer-agent env | grep ANTHROPIC — confirm key present in container
- curl http://localhost:8000/metrics — confirm Prometheus endpoint still live
- git log -S "sk-ant-" --oneline — confirm returns nothing (no keys in history)

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores / Configuration
Documentation