
feat: pets system, GPU passthrough, NemoClaw health scan & secrets hardening#84

Merged
welshDog merged 13 commits into main from feat/pets-gpu-mcp-leaderboard
Apr 17, 2026

Conversation


@welshDog welshDog commented Apr 16, 2026

Summary

  • BROski Pets — chat, ask, feed, powers commands + pet XP leaderboard endpoint
  • NVIDIA GPU passthrough — enabled for hypercode-ollama container
  • MCP GitHub context — wired into broski-pets-bridge /ask endpoint
  • LLM stack hardened — Anthropic API key added as Docker secret; OpenAI replaced by Anthropic + Perplexity across all services; priority chain: Claude → Perplexity → Ollama → fallback
  • NemoClaw health scan — make health / make health-quick + Discord /health slash command (live scan, grades S–D, webhook posting); first scan scored S-LEGENDARY 99/100
  • Secrets audit — PETS_WEBHOOK_SECRET generated and wired; all .env.* files reviewed and cleaned; no real keys in any committed file
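
For reference, the S–D grading described above can be sketched as a simple threshold map (the cut-offs here are illustrative assumptions, not the values used by scripts/health_report.py):

```python
# Illustrative sketch only: these thresholds are assumptions, not the
# actual cut-offs implemented in scripts/health_report.py.
def grade(score: int) -> str:
    """Map a 0-100 health score onto the S-D grade ladder."""
    if score >= 95:
        return "S"
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 50:
        return "C"
    return "D"
```

Under these assumed cut-offs, the 99/100 first scan lands in the S band.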

Test plan

  • docker compose up -d --no-deps broski-pets-bridge broski-bot — confirm both start healthy
  • make health-quick — confirm NemoClaw scanner runs and prints grade
  • /health in Discord — confirm embed appears with score + grade
  • docker exec healer-agent env | grep ANTHROPIC — confirm key present in container
  • curl http://localhost:8000/metrics — confirm Prometheus endpoint still live
  • git log -S "sk-ant-" --oneline — confirm returns nothing (no keys in history)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Pet leaderboard, live code health scan via Discord, GitHub context for the bot, an MCP gateway with configurable gateway URL, and Ollama GPU support.
    • Pre-build safety check and health-reporting commands added.
  • Chores / Configuration

    • Default credentials/fallbacks updated and service resource limits set.
    • New secret handling and service startup ordering tweaks.
  • Documentation

    • Expanded runbooks, reports and project context/roadmap added.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds an MCP gateway service and GPU Ollama variant to docker-compose; integrates MCP-based GitHub context and a leaderboard into broski-pets-bridge; introduces live code health scanning (script, cached report, Discord cog/commands), pre-build safety checks, secrets/env additions, config tweaks (ignore empty envs), and documentation/CLAUDE updates.

Changes

Cohort / File(s) Summary
Docker Compose & secrets
docker-compose.yml, docker-compose.secrets.yml
Removed host bind-backed volume driver_opts for several volumes; set default Postgres and MinIO credential fallbacks; added mcp-gateway (agents profile) with docker socket mount and resource/security constraints; exposed hypercode-ollama on localhost and added hypercode-ollama-gpu (gpu profile) with NVIDIA reservation and healthcheck; wired anthropic_api_key Docker secret and updated service envs; added MCP gateway URL envs.
Broski bridge (MCP integration)
agents/broski-pets-bridge/main.py
Added MCP helpers (URL/auth normaliser, SSE data: parsing, JSON‑RPC caller with timeouts/session propagation), MCP probe (_mcp_connected), best‑effort GitHub-context fetch via MCP tools and injection into pet chat prompts; /health now reports mcp_connected; new /leaderboard endpoint reading Redis pet:* keys.
Broski bot — commands & cogs
agents/broski-bot/src/cogs/health_check.py, agents/broski-bot/src/cogs/pets.py, agents/broski-bot/src/bot.py
Added health_check cog providing /health and /health-last (live AST + ruff scan or cached fallback, grading and embed generation); added pet leaderboard command calling bridge /leaderboard; registered health cog in bot setup.
Health tooling & build checks
scripts/health_report.py, scripts/pre-build-check.sh, Makefile, reports/broski-analysis/latest.json
New scripts/health_report.py that runs AST+ruff scans, grades repo, writes reports/broski-analysis/latest.json, and can post a Discord webhook; new scripts/pre-build-check.sh for disk/memory/docker sanity checks; Makefile build depends on pre-build-check and new health/health-quick targets; added sample latest JSON report.
Backend config & routes
backend/app/core/config.py, backend/app/routes/stripe.py
Set Pydantic Settings env_ignore_empty=True to ignore empty env vars; updated Stripe webhook handler to read Stripe-Signature header explicitly and tighten signature verification behaviour based on environment.
Docs, runbooks & CLAUDE
.claude/CLAUDE_CONTEXT.md, .claude/settings.local.json, CLAUDE.md, WHATS_DONE.md, docs/RUNBOOK.md, reports/docker-environment-status-report.md
Added extensive CLAUDE context doc and extended local CLAUDE allowlist; updated CLAUDE.md/WHATS_DONE.md and RUNBOOK to V2.4, OOM recovery, memory limits and pre-build guards; added Docker environment status report.
Env examples & CI helpers
.env.example, docker-compose.*
Added MCP_GATEWAY_URL and GITHUB_TOKEN examples to .env.example; various compose env updates to embed new default password fallbacks and added MCP/MEMSTREAM/ANTHROPIC envs.
Docker secrets & service envs
docker-compose.secrets.yml, docker-compose.yml (services: hypercode-core, broski-bot, healer-agent, project-strategist, etc.)
Added anthropic_api_key secret; updated many services to reference Anthropic/Perplexity envs and removed OPENAI_API_KEY where applicable; unified DB connection strings to fall back to hypercode password.
Static reports and miscellaneous
reports/broski-analysis/latest.json, reports/docker-environment-status-report.md
New generated reports: detailed code health JSON report and Docker environment status Markdown.
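
The SSE `data:` parsing that the MCP helpers in broski-pets-bridge rely on can be sketched roughly like this (function name and exact behaviour are assumptions; the real helper in agents/broski-pets-bridge/main.py may differ):

```python
import json

def parse_sse_payload(raw: str) -> dict:
    """Return the first JSON object carried on an SSE 'data:' line.

    Sketch of the helper style described above; the real function in
    main.py (and its error handling) may differ.
    """
    for line in raw.splitlines():
        if line.startswith("data:"):
            chunk = line[len("data:"):].strip()
            if chunk:
                return json.loads(chunk)
    raise ValueError("no JSON 'data:' line found in SSE payload")
```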

Sequence Diagram(s)

sequenceDiagram
  participant Bridge as Broski Bridge
  participant MCP as MCP Gateway
  participant GitHub as GitHub (via MCP tool)
  participant LLM as Local LLM / Ollama

  rect rgba(173,216,230,0.5)
    Bridge->>MCP: GET /tools/list (discover available tools)
    MCP-->>Bridge: tools metadata
  end

  rect rgba(144,238,144,0.5)
    Bridge->>MCP: POST /tools/call (invoke GitHub search tool + args)
    MCP->>GitHub: execute tool request
    GitHub-->>MCP: tool output (chunked text)
    MCP-->>Bridge: aggregated tool output (SSE / JSON-RPC)
  end

  rect rgba(255,228,181,0.5)
    Bridge->>LLM: construct prompt (include GitHub context) and send
    LLM-->>Bridge: response (stream/SSE)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through compose and scripts tonight,
gateways, probes and GPUs now in sight,
I fetched some repos via MCP’s hand,
scanned the code and ranked the land,
a tidy report — the rabbit’s delight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning — Docstring coverage is 8.11%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed — The title directly reflects the main features added: pets system, GPU passthrough, NemoClaw health scanning, and secrets hardening, matching the substantial changes across docker-compose.yml, broski-pets-bridge, the health_check cog, and security hardening.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose.yml`:
- Around line 1723-1727: Add a start_period to the GPU container healthcheck to
avoid false-negative restarts during GPU driver/CUDA warm-up: update the
healthcheck block (the keys test: ["CMD", "ollama", "list"], interval, timeout,
retries) to include start_period: 30s–60s (use 60s for slower cold starts) so
the container won’t be marked unhealthy while the runtime warms; also apply the
same change to the CPU healthcheck block for parity.
- Around line 1700-1735: The GPU and CPU Ollama services conflict on port,
volume, and DNS; make them mutually exclusive by putting the CPU variant behind
a "cpu" profile (add profile: ["cpu"] to the hypercode-ollama service), remove
the host port publication and the network alias from hypercode-ollama-gpu (drop
ports: and the agents-net.aliases entry) so it does not collide, and set
container_name and/or hostname "hypercode-ollama" on each variant
(hypercode-ollama and hypercode-ollama-gpu) so whichever profile is active
answers at the expected name; also update dependents (e.g., broski-pets-bridge
depends_on) to rely on the canonical name or ensure they are profile-aware; if
you ever need both to run, give hypercode-ollama-gpu a distinct host port and
alias instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b28f10cb-3b2c-4ff6-9ede-872ce707015e

📥 Commits

Reviewing files that changed from the base of the PR and between d9ba50a and 9f466a1.

📒 Files selected for processing (1)
  • docker-compose.yml

Comment thread docker-compose.yml
Comment on lines +1700 to +1735
  hypercode-ollama-gpu:
    image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
    profiles: ["gpu"]
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "127.0.0.1:11434:11434"
    networks:
      agents-net:
        aliases:
          - hypercode-ollama
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 3G
        reservations:
          cpus: "1"
          memory: 1G
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    security_opt:
      - no-new-privileges:true


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Docker Compose deploy.resources.reservations.devices nvidia GPU syntax latest

💡 Result:

The syntax for reserving NVIDIA GPUs in Docker Compose under deploy.resources.reservations.devices is as follows (latest per the official Docker docs):

services:
  your-service:
    image: nvidia/cuda:12.0-base-ubuntu22.04  # or your GPU-enabled image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1          # number of GPUs, or 'all'
              capabilities: [gpu]

Key points:

  • Required: driver: nvidia and capabilities: [gpu].
  • Optional: count: 1 (number of GPUs or 'all'), device_ids: ['0', '1'] (specific GPU IDs from nvidia-smi; mutually exclusive with count), options: {key: value} for driver options.
  • Use docker compose up (not docker stack deploy; the deploy block is ignored in Swarm mode per older notes).
  • Prerequisites: NVIDIA Container Toolkit installed and the Docker daemon configured for GPUs.

Example for all GPUs:

          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Example for specific devices:

          devices:
            - driver: nvidia
              device_ids: ['0', '3']
              capabilities: [gpu]

This matches the official Docker Compose GPU support documentation and NVIDIA examples.

🏁 Script executed:

#!/bin/bash

# Search for documentation files mentioning NVIDIA Container Toolkit or GPU setup
echo "=== Searching for GPU/NVIDIA documentation in repo ==="

# Search in common doc locations
find . -type f \( -name "README*" -o -name "CONTRIBUTING*" -o -name "SETUP*" -o -name "*install*" -o -name "*docker*" \) 2>/dev/null | head -20

# Search for NVIDIA/GPU mentions
echo ""
echo "=== Searching for 'nvidia' or 'GPU' mentions in documentation ==="
rg -i "nvidia.*toolkit|gpu.*prerequisite|container.*toolkit" --type md --type txt 2>/dev/null || echo "No matches found"

# Check if there's a docker or compose specific setup doc
echo ""
echo "=== Checking for docker-compose related docs ==="
find . -type f \( -name "*.md" -o -name "*.txt" \) -exec grep -l "docker-compose\|docker compose" {} \; 2>/dev/null | head -10

Repository: welshDog/HyperCode-V2.4

Length of output: 2989


🏁 Script executed:

#!/bin/bash

# Read the files that mention NVIDIA Container Toolkit
echo "=== Content of hyperstudio-platform/README.md (NVIDIA section) ==="
grep -A 5 -B 2 "NVIDIA" hyperstudio-platform/README.md 2>/dev/null || echo "File not found or no match"

echo ""
echo "=== Content of hyperstudio-platform/hyperstudio-setup.md (NVIDIA section) ==="
grep -A 5 -B 2 "NVIDIA" hyperstudio-platform/hyperstudio-setup.md 2>/dev/null || echo "File not found or no match"

echo ""
echo "=== Content of docs/NEXT_MOVES.md (NVIDIA/GPU section) ==="
grep -A 5 -B 2 "NVIDIA\|GPU" docs/NEXT_MOVES.md 2>/dev/null || echo "File not found or no match"

# Check main README for GPU setup info
echo ""
echo "=== Main README.md GPU/NVIDIA references ==="
grep -i "gpu\|nvidia\|ollama" README.md 2>/dev/null | head -20

Repository: welshDog/HyperCode-V2.4

Length of output: 6580


Critical: CPU and GPU Ollama services collide on host port, shared volume, and network alias.

The hypercode-ollama-gpu service is additive (profile-gated) whilst hypercode-ollama has no profile and therefore always starts. Running docker compose --profile gpu up brings up both simultaneously, which produces three concurrent conflicts:

  1. Host port bind clash — both publish 127.0.0.1:11434:11434. The second container to start will fail with bind: address already in use.
  2. Concurrent writes to the same data dir — both mount the ollama-data bind volume at /root/.ollama. Two ollama serve processes writing to the same blob store, manifests, and lockfile risk data corruption and flapping health checks.
  3. Duplicate DNS alias on agents-net — the GPU variant aliases itself as hypercode-ollama, which is also the default service's implicit alias. Docker's embedded DNS will round-robin across both IPs, so clients like hypercode-core, broski-pets-bridge, and agent-x (all hardcoded to http://hypercode-ollama:11434 per backend/app/core/config.py:41, agents/agent-x/designer.py:22, agents/broski-pets-bridge/main.py:467) may land on either endpoint unpredictably.

Additionally, broski-pets-bridge has depends_on: hypercode-ollama only, so activating the GPU profile does not gate dependents on GPU readiness.

Make the two variants mutually exclusive so operators pick one. The cleanest pattern is to put the CPU variant behind a cpu profile, drop the duplicate host-port publication and DNS alias from the GPU variant (internal service name hypercode-ollama-gpu), and let callers continue to resolve hypercode-ollama by promoting whichever variant is active via container_name/hostname.

🛠️ Suggested fix — profile-gate the CPU variant and keep a single service name resolvable
   hypercode-ollama:
+    profiles: ["cpu", "default"]
     image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
     container_name: hypercode-ollama
     volumes:
       - ollama-data:/root/.ollama
     ports:
       - "127.0.0.1:11434:11434"
     networks:
       - agents-net
   hypercode-ollama-gpu:
-    image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
+    image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14}
+    container_name: hypercode-ollama        # reuse the canonical name so clients resolve it
     profiles: ["gpu"]
     volumes:
       - ollama-data:/root/.ollama
     ports:
       - "127.0.0.1:11434:11434"
     networks:
-      agents-net:
-        aliases:
-          - hypercode-ollama
+      - agents-net

Operators then run either docker compose --profile cpu up (default) or docker compose --profile gpu up, never both. If you need both variants to coexist for benchmarking, give the GPU variant a distinct host port (e.g. 11435:11434) and a distinct service DNS name, and add an environment switch for OLLAMA_HOST on clients.

Note: The GPU syntax deploy.resources.reservations.devices with driver: nvidia, count: all, capabilities: [gpu] is confirmed to match the latest Docker Compose specification. NVIDIA Container Toolkit prerequisites are documented in docs/NEXT_MOVES.md with platform-specific installation instructions and verification steps.


Comment thread docker-compose.yml
Add detailed system documentation covering architecture, roadmap, security posture, and operational procedures to help the AI assistant understand the HyperCode ecosystem. This includes repository structure, completed phases, network configuration, known issues, and key paths.

Also add openssl command to Claude settings for security testing purposes.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
.claude/CLAUDE_CONTEXT.md (2)

205-217: Add an explicit safety preface for command scope/environment before destructive ops.

The command block includes high-impact operations (docker compose build --no-cache, alembic upgrade head). Add a brief “pre-flight” note (target environment, backup/rollback expectation, and repo-root requirement) to reduce accidental misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/CLAUDE_CONTEXT.md around lines 205 - 217, Add a short pre-flight
safety preface above the command block that clearly states the target
environment (e.g., local/dev vs staging/prod), requires running from the
repository root, and reminds users to have a backup or rollback plan before
running destructive commands; mention specific high-impact commands by name
(docker compose build --no-cache, docker compose up -d, alembic upgrade head,
bash scripts/network-migrate.sh) and suggest using --dry-run or confirmation
steps where available.

103-113: Move volatile CVE/version status to a generated report, then link it here.

These sections present time-sensitive values (CVE counts, “all Debian-unfixable”, resolved package status) as fixed truth. In an always-read boot/context file, this can go stale quickly and mislead security decisions.

Based on learnings, this file class is treated as a first-read entrypoint, so stale operational/security facts have outsized impact.

Also applies to: 241-259

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/CLAUDE_CONTEXT.md around lines 103 - 113, The "## ✅ Phase 9 — CVE
Elimination Results (April 14, 2026)" section (including the CVE table, the "13
Remaining HIGHs" list and the `starlette` resolved note) contains volatile,
time-sensitive security status and should be removed from this static
CLAUDE_CONTEXT.md and instead generated automatically at build/runtime; move its
content into a generated report endpoint/file (e.g., a CI-produced
security_report) and replace the section here with a short stable link and a
one-line note pointing to that generated report (e.g., "See generated security
report for current CVE status"), ensuring any references to `docker.io/runc`,
`libexpat1`, `libncursesw6`, `libnghttp2`, `libsystemd0`, and the
`starlette`/`fastapi==0.135.3`+`starlette==0.47.2` resolution are preserved only
in the generated report.
agents/broski-pets-bridge/main.py (2)

178-195: _mcp_connected swallows a 1 s timeout — OK, but be aware /health still pays the latency.

/health is a liveness/readiness probe that is hit every 30 s by Docker, once per request by load balancers, and on every dashboard refresh. Adding an outbound HTTP round-trip (up to 1 s on failure, more on success if the gateway is slow) on the health path can turn /health into a blocking call and cascade failures (e.g. the container gets marked unhealthy because the gateway is slow, even though the pets-bridge itself is fine).

Consider one of:

  1. Cache the last probe result with a short TTL (e.g. 15 s) and return the cached value on /health calls.
  2. Move mcp_connected out of /health into a /readyz or a dedicated /diag endpoint.
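
Option 1 can be sketched as a small TTL wrapper (shown synchronous for brevity; the real probe is async and would need an asyncio-safe variant, e.g. guarded by an asyncio.Lock):

```python
import time

class TTLProbe:
    """Cache a boolean probe result for ttl seconds (sync sketch of option 1)."""

    def __init__(self, probe, ttl: float = 15.0):
        self._probe = probe
        self._ttl = ttl
        self._last_checked = float("-inf")  # force a probe on first check
        self._last_result = False

    def check(self) -> bool:
        now = time.monotonic()
        if now - self._last_checked >= self._ttl:
            self._last_result = self._probe()
            self._last_checked = now
        return self._last_result
```

The /health handler would then read probe.check() instead of awaiting the gateway on every hit.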
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 178 - 195, The _mcp_connected
function performs an outbound JSON-RPC call on each health probe, which can add
up to 1s latency and make /health blocking; change this by either (A)
introducing a small TTL cache for the last _mcp_connected result (e.g., store
last_result and last_checked timestamp and return cached value if now -
last_checked < 15s) and have the /health handler read the cached value instead
of calling _mcp_connected directly, or (B) remove the MCP check from the /health
handler and instead expose it via a separate endpoint such as /readyz or /diag
that calls _mcp_connected; update the code paths that call _mcp_connected (the
/health handler function and any readiness routes) to use the chosen approach
and ensure cache state is thread-safe/async-safe (e.g., protect with an
asyncio.Lock or atomic variables).

690-695: Bare except Exception is fine here — but consider logging the swallowed error.

The MCP call is best-effort and falling back to an empty context on failure is the right UX call. However, silently swallowing every exception makes it very hard to diagnose "why is GitHub context never populated" in production. At minimum, log at warning level so the failure surfaces in the structured log pipeline (Loki/Promtail) that this repo already ships.

         try:
             github_context = await _github_context_via_mcp(body.message)
-        except Exception:
+        except Exception as exc:
+            # MCP is optional; keep the chat flow resilient but surface the failure.
+            import logging
+            logging.getLogger(__name__).warning(
+                "MCP GitHub context fetch failed: %s: %s", type(exc).__name__, exc
+            )
             github_context = ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 690 - 695, The try/except
around the _github_context_via_mcp(body.message) call swallows all exceptions
silently; change it to catch Exception as e and emit a warning-level log that
includes the exception details (e.g. logger.warning("Failed to retrieve GitHub
context via MCP", exc_info=True) or include e in the message) before falling
back to github_context = ""; update the block that sets github_context when
body.mode == "ask" to log the failure while preserving the existing empty-string
fallback.
docker-compose.yml (1)

916-916: Avoid :latest for docker/mcp-gateway — pin a stable version tag.

docker/mcp-gateway:latest means every docker compose pull or fresh node join can silently introduce breaking changes to the MCP protocol, CLI flags, or runtime behaviour — all of which will then cascade into broski-pets-bridge (which already makes assumptions about the SSE/JSON-RPC contract). Every other first-party service in this file pins a version (redis:7-alpine, postgres:15-alpine, ollama/ollama:0.3.14, prom/prometheus:v2.55.1, etc.) via ${..._IMAGE:-<pinned>}. Do the same here.

♻️ Suggested diff
-    image: docker/mcp-gateway:latest
+    image: ${MCP_GATEWAY_IMAGE:-docker/mcp-gateway:v2}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 916, Replace the unstable image tag
docker/mcp-gateway:latest with a pinned, explicit version to prevent silent
breaking changes; locate the image declaration "image:
docker/mcp-gateway:latest" in the docker-compose service for the MCP gateway
(the service consumed by broski-pets-bridge) and change it to a specific version
string (e.g., docker/mcp-gateway:<stable-tag> or use an env variable like
${MCP_GATEWAY_IMAGE:-docker/mcp-gateway:<stable-tag>}) consistent with how other
services are pinned (redis, postgres, ollama, prom/prometheus).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.env.example:
- Line 30: The MCP_GATEWAY_URL default is inconsistent: .env.example uses
"http://" while docker-compose.yml uses "tcp://"; update the docker-compose.yml
default for MCP_GATEWAY_URL (the env default that currently starts with
"tcp://mcp-gateway:8099") to use "http://mcp-gateway:8099" so both defaults
match the actual adapter transport (note agents/broski-pets-bridge/main.py
currently normalises tcp:// → http:// but we should align the defaults to
"http://").
- Line 31: Remove the bare GITHUB_TOKEN= entry from .env.example or at least
mark it as a secret and relocate it into the External APIs block to match the
file's convention; specifically, either comment the GITHUB_TOKEN line and add a
pointer to secrets/github_token.txt (following the pattern used for OPENAI,
DISCORD, API_KEY, etc.) or move the key next to the External APIs comments and
annotate it as a secret, and update docker-compose.yml to read it via a Docker
secret or _FILE indirection if necessary; ensure key ordering/sorting matches
surrounding keys (MCP_GATEWAY_URL, PETS_BRIDGE_URL) to satisfy dotenv-linter.
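
The tcp:// → http:// normalisation referenced in the first comment can be sketched like this (standalone illustration; the actual helper in agents/broski-pets-bridge/main.py may differ):

```python
from urllib.parse import urlsplit, urlunsplit

def normalise_gateway_url(url: str) -> str:
    """Rewrite tcp:// gateway URLs to http://; leave other schemes alone."""
    parts = urlsplit(url)
    if parts.scheme == "tcp":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)
```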

In `@agents/broski-pets-bridge/main.py`:
- Around line 156-175: The loop in _mcp_jsonrpc_call currently returns
immediately after a 200 response but before parsing, so if
_parse_mcp_sse_payload raises (e.g. ValueError) the fallback endpoint is never
tried; change the logic in _mcp_jsonrpc_call to call _parse_mcp_sse_payload
inside a try/except when resp.status_code == 200, capture any parse exceptions
(ValueError or general Exception) into last_err and continue to the next
endpoint instead of letting them bubble, and only return when parsing succeeds
(still preserving the sid extraction from resp.headers and using session_id as
fallback).
- Around line 225-255: _build_tool_args currently assigns the full "owner/name"
string into args["repo"], which breaks tools that expect only the repository
name; change the logic to split the GITHUB_CONTEXT_REPO into owner and repo_name
up front (e.g. owner, name = repo.split("/",1)) and then set args values based
on which props are present: if props contains both "owner" and "repo" set
args["owner"]=owner and args["repo"]=name; if props contains only "repo" set
args["repo"]=name (not owner/name); if props contains only "owner" set
args["owner"]=owner; for fields like "repository" or query templates that expect
owner/name use owner + "/" + name; keep existing defaults for limit/per_page.
Ensure you update references to repo in the query construction and to the
symbols repo, owner, name, props, args in _build_tool_args.
- Around line 258-272: After receiving the initialize response in
_github_context_via_mcp, send an MCP notification for
"notifications/initialized" (i.e., a JSON-RPC message with no "id" and "method"
set to "initialized") using the same session id returned as sid so the session
is marked initialized before any further requests; call _mcp_jsonrpc_call with
the notification payload (include session id in params) immediately after the
initialize response and before issuing subsequent requests like tools/list.

In `@docker-compose.yml`:
- Around line 922-934: The mcp-gateway service is inconsistent: either remove
the docker socket bind-mount and rely on the docker-socket-proxy, or drop the
proxy dependency; prefer the proxy pattern—remove the /var/run/docker.sock
volume, keep depends_on: docker-socket-proxy, and add
DOCKER_HOST=tcp://docker-socket-proxy:2375 (the docker/mcp-gateway image honors
DOCKER_HOST). Also stop passing GITHUB_TOKEN and GITHUB_PERSONAL_ACCESS_TOKEN as
plain env vars: switch them to the repo’s secret/file-backed pattern (use Docker
secrets or file-referenced env vars per .env.example and secrets/*.txt) so
credentials aren’t exposed via docker inspect.
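
The owner/name splitting fix described for _build_tool_args can be sketched as follows (function name and the shape of props are assumptions for illustration):

```python
def build_repo_args(repo: str, props: set) -> dict:
    """Split an 'owner/name' string and fill only the args a tool declares.

    Hypothetical helper mirroring the suggested _build_tool_args fix.
    """
    owner, name = repo.split("/", 1)
    args = {}
    if "owner" in props and "repo" in props:
        args["owner"], args["repo"] = owner, name
    elif "repo" in props:
        args["repo"] = name  # bare repository name, not owner/name
    elif "owner" in props:
        args["owner"] = owner
    if "repository" in props:
        args["repository"] = f"{owner}/{name}"  # tools expecting the full slug
    return args
```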


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e7b0231f-5946-49da-ad7f-861b09057e31

📥 Commits

Reviewing files that changed from the base of the PR and between 9f466a1 and d91e8ad.

📒 Files selected for processing (5)
  • .claude/CLAUDE_CONTEXT.md
  • .claude/settings.local.json
  • .env.example
  • agents/broski-pets-bridge/main.py
  • docker-compose.yml
✅ Files skipped from review due to trivial changes (1)
  • .claude/settings.local.json

Comment thread .env.example
PETS_DISCORD_ID=
PETS_WEBHOOK_SECRET=
PETS_OLLAMA_MODEL=qwen2.5:7b
MCP_GATEWAY_URL=http://mcp-gateway:8099

⚠️ Potential issue | 🟡 Minor

Default here diverges from docker-compose.yml.

This file defaults MCP_GATEWAY_URL to http://mcp-gateway:8099, but docker-compose.yml line 968 defaults it to tcp://mcp-gateway:8099. The Python code in agents/broski-pets-bridge/main.py (lines 127-133) normalises tcp:// → http://, so behaviour is fine, but the two defaults should still agree to avoid confusion when operators copy .env.example to .env (the compose default is then overridden with the HTTP form). Pick one — http:// is the correct scheme for the streaming transport and is what the adapter actually speaks.

♻️ Proposed alignment

In docker-compose.yml line 968:

-      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
+      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}
🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 30-30: [UnorderedKey] The MCP_GATEWAY_URL key should go before the PETS_BRIDGE_URL key

(UnorderedKey)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.example at line 30, The MCP_GATEWAY_URL default is inconsistent:
.env.example uses "http://" while docker-compose.yml uses "tcp://"; update the
docker-compose.yml default for MCP_GATEWAY_URL (the env default that currently
starts with "tcp://mcp-gateway:8099") to use "http://mcp-gateway:8099" so both
defaults match the actual adapter transport (note
agents/broski-pets-bridge/main.py currently normalises tcp:// → http:// but we
should align the defaults to "http://").
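The tcp:// → http:// normalisation the adapter reportedly performs (main.py lines 127-133) amounts to something like the sketch below; the function name and fallback default are assumptions for illustration, not the file's actual code.

```python
import os

def mcp_base_url(default: str = "http://mcp-gateway:8099") -> str:
    """Read MCP_GATEWAY_URL and normalise a tcp:// scheme to http://."""
    url = os.getenv("MCP_GATEWAY_URL", default).rstrip("/")
    if url.startswith("tcp://"):
        url = "http://" + url[len("tcp://"):]
    return url
```

Either default therefore works at runtime, which is exactly why the divergence is cosmetic but still worth aligning.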

Comment thread .env.example
PETS_WEBHOOK_SECRET=
PETS_OLLAMA_MODEL=qwen2.5:7b
MCP_GATEWAY_URL=http://mcp-gateway:8099
GITHUB_TOKEN=

⚠️ Potential issue | 🟠 Major

GITHUB_TOKEN is a secret and should not live in .env.example.

The header of this file explicitly states "Secrets (passwords, tokens, keys) → use Docker secrets instead" (line 5), and every other token/key in the file (OPENAI, PERPLEXITY, DISCORD, JWT, MEMORY, MINIO, API_KEY, …) is commented out with a pointer to secrets/*.txt. Adding a bare GITHUB_TOKEN= slot contradicts that convention and nudges operators to paste a PAT into a file that is frequently committed by mistake.

Consider either:

  1. Following the established pattern — keep the slot commented and point to a secret file, and have docker-compose.yml read it via a Docker secret / _FILE indirection; or
  2. At minimum, annotate it as a secret and move it next to the other External APIs comments so its sensitivity is unambiguous.
♻️ Minimal suggested change (option 2)
-MCP_GATEWAY_URL=http://mcp-gateway:8099
-GITHUB_TOKEN=
+MCP_GATEWAY_URL=http://mcp-gateway:8099
+# GITHUB_TOKEN → secrets/github_token.txt ✅  (used by mcp-gateway for github MCP server)

Note the static analysis hint about key ordering (GITHUB_TOKEN before MCP_GATEWAY_URL, both before PETS_BRIDGE_URL) — if you keep the key here, re-sort; if you move it to the External APIs block, the dotenv-linter warnings go away naturally.

🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 31-31: [UnorderedKey] The GITHUB_TOKEN key should go before the MCP_GATEWAY_URL key

(UnorderedKey)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.example at line 31, Remove the bare GITHUB_TOKEN= entry from
.env.example or at least mark it as a secret and relocate it into the External
APIs block to match the file's convention; specifically, either comment the
GITHUB_TOKEN line and add a pointer to secrets/github_token.txt (following the
pattern used for OPENAI, DISCORD, API_KEY, etc.) or move the key next to the
External APIs comments and annotate it as a secret, and update
docker-compose.yml to read it via a Docker secret or _FILE indirection if
necessary; ensure key ordering/sorting matches surrounding keys
(MCP_GATEWAY_URL, PETS_BRIDGE_URL) to satisfy dotenv-linter.

Comment on lines +156 to +175
async def _mcp_jsonrpc_call(message: dict, timeout_s: float, session_id: str | None = None) -> tuple[dict, str | None]:
    base = _mcp_base_url()
    endpoints = [f"{base}/mcp", base]
    headers = {"Accept": "application/json, text/event-stream", **_mcp_headers()}
    if session_id:
        headers["Mcp-Session-Id"] = session_id

    async with httpx.AsyncClient(timeout=timeout_s, headers=headers) as client:
        last_err: Exception | None = None
        for url in endpoints:
            try:
                resp = await client.post(url, json=message)
                if resp.status_code == 200:
                    sid = resp.headers.get("Mcp-Session-Id") or session_id
                    return _parse_mcp_sse_payload(resp.text), sid
                last_err = HTTPException(status_code=resp.status_code, detail=resp.text)
            except Exception as e:
                last_err = e
                continue
        raise last_err or RuntimeError("MCP call failed")

⚠️ Potential issue | 🟡 Minor

Endpoint fallback swallows SSE parse errors on the first endpoint.

When the first endpoint ({base}/mcp) returns HTTP 200 but _parse_mcp_sse_payload raises ValueError (empty body, non-SSE content, or malformed data: lines), the exception is not caught inside the loop, so the fallback to base never runs and the whole call fails. Only transport exceptions (client.post raising) or non-200 statuses currently trigger fallback.

♻️ Proposed fix — treat parse failures as a retryable error
         for url in endpoints:
             try:
                 resp = await client.post(url, json=message)
                 if resp.status_code == 200:
                     sid = resp.headers.get("Mcp-Session-Id") or session_id
-                    return _parse_mcp_sse_payload(resp.text), sid
+                    try:
+                        return _parse_mcp_sse_payload(resp.text), sid
+                    except ValueError as parse_err:
+                        last_err = parse_err
+                        continue
                 last_err = HTTPException(status_code=resp.status_code, detail=resp.text)
             except Exception as e:
                 last_err = e
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 156 - 175, The loop in
_mcp_jsonrpc_call currently returns immediately after a 200 response but before
parsing, so if _parse_mcp_sse_payload raises (e.g. ValueError) the fallback
endpoint is never tried; change the logic in _mcp_jsonrpc_call to call
_parse_mcp_sse_payload inside a try/except when resp.status_code == 200, capture
any parse exceptions (ValueError or general Exception) into last_err and
continue to the next endpoint instead of letting them bubble, and only return
when parsing succeeds (still preserving the sid extraction from resp.headers and
using session_id as fallback).

Comment on lines +225 to +255
def _build_tool_args(tool: dict, query: str) -> dict:
    schema = tool.get("inputSchema") or {}
    props = schema.get("properties") if isinstance(schema, dict) else {}
    if not isinstance(props, dict):
        props = {}

    args: dict[str, object] = {}
    repo = os.getenv("GITHUB_CONTEXT_REPO", "welshDog/HyperCode-V2.4")

    q_key = "query" if "query" in props else ("q" if "q" in props else None)
    if q_key:
        args[q_key] = f"repo:{repo} {query}"
    else:
        args["query"] = f"repo:{repo} {query}"

    if "repo" in props:
        args["repo"] = repo
    if "repository" in props:
        args["repository"] = repo
    if "owner" in props and "repo" in props:
        if "/" in repo:
            owner, name = repo.split("/", 1)
            args["owner"] = owner
            args["repo"] = name

    if "limit" in props:
        args["limit"] = 5
    if "per_page" in props:
        args["per_page"] = 5

    return args

⚠️ Potential issue | 🟡 Minor

_build_tool_args can emit a malformed repo value for tools that only accept a bare repository name.

Flow for GITHUB_CONTEXT_REPO=welshDog/HyperCode-V2.4:

  • Line 240‑241: "repo" in props → args["repo"] = "welshDog/HyperCode-V2.4" (owner/name).
  • Line 244‑248: if the schema also has owner, this is corrected to args["repo"] = "HyperCode-V2.4" + args["owner"] = "welshDog".
  • But if the schema has repo without owner, line 248 never runs and we send "welshDog/HyperCode-V2.4" into a field that most GitHub MCP tools expect to be just the name.

A slightly safer default is to split the owner/name up-front and only concatenate when the schema clearly wants owner/name:

♻️ Suggested reshape
-    if "repo" in props:
-        args["repo"] = repo
-    if "repository" in props:
-        args["repository"] = repo
-    if "owner" in props and "repo" in props:
-        if "/" in repo:
-            owner, name = repo.split("/", 1)
-            args["owner"] = owner
-            args["repo"] = name
+    owner, name = (repo.split("/", 1) + [repo])[:2] if "/" in repo else ("", repo)
+    if "owner" in props and "repo" in props:
+        args["owner"] = owner or repo
+        args["repo"] = name
+    elif "repo" in props:
+        args["repo"] = name  # bare repo name for single-field schemas
+    if "repository" in props:
+        args["repository"] = repo  # typically full owner/name
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 225 - 255, _build_tool_args
currently assigns the full "owner/name" string into args["repo"], which breaks
tools that expect only the repository name; change the logic to split the
GITHUB_CONTEXT_REPO into owner and repo_name up front (e.g. owner, name =
repo.split("/",1)) and then set args values based on which props are present: if
props contains both "owner" and "repo" set args["owner"]=owner and
args["repo"]=name; if props contains only "repo" set args["repo"]=name (not
owner/name); if props contains only "owner" set args["owner"]=owner; for fields
like "repository" or query templates that expect owner/name use owner + "/" +
name; keep existing defaults for limit/per_page. Ensure you update references to
repo in the query construction and to the symbols repo, owner, name, props, args
in _build_tool_args.
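The safer split-up-front shape suggested above, reduced to just the repo-field logic (prop names and the owner/name convention follow the snippet; the helper name is illustrative):

```python
def repo_args(props: dict, repo: str) -> dict:
    """Map an 'owner/name' repo string onto whichever fields the tool schema declares."""
    if "/" in repo:
        owner, name = repo.split("/", 1)
    else:
        owner, name = "", repo
    args: dict[str, object] = {}
    if "owner" in props and "repo" in props:
        args["owner"] = owner or repo
        args["repo"] = name
    elif "repo" in props:
        args["repo"] = name          # bare name for single-field schemas
    if "repository" in props:
        args["repository"] = repo    # typically expects the full owner/name
    return args
```

The key difference from the original: a schema with `repo` but no `owner` now receives the bare repository name rather than the combined owner/name string.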

Comment on lines +258 to +272
async def _github_context_via_mcp(question: str) -> str:
    _resp, sid = await _mcp_jsonrpc_call(
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
            },
        },
        timeout_s=10.0,
    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

MCP Model Context Protocol specification initialize notifications/initialized required before tools/list

💡 Result:

Yes, according to the Model Context Protocol (MCP) specification, the notifications/initialized notification is required before tools/list requests. The protocol defines a strict lifecycle where the client sends an 'initialize' request, receives the server's response (including capabilities like tools), and then MUST send the 'notifications/initialized' notification to signal readiness for normal operations. Only after this can operational requests like tools/list be sent. Key rules from the lifecycle specification:
  • Client SHOULD NOT send requests other than pings before the server responds to initialize.
  • Server SHOULD NOT send requests other than pings/logging before receiving initialized.
  • This ensures proper capability negotiation before feature usage.
tools/list is a normal operation request during the operation phase, after initialization completes with the initialized notification. Servers rejecting tools/list before initialized (as seen in implementation issues) enforce this strictly. The tools specification confirms servers declare the tools capability during initialize, but listing occurs post-initialized.


🏁 Script executed:

# First, locate and read the file mentioned in the review
find . -name "main.py" -path "*/broski-pets-bridge/*" | head -20

Repository: welshDog/HyperCode-V2.4

Length of output: 102


🏁 Script executed:

# Search for files in the agents directory
git ls-files agents/ | grep -E "broski.*main.py"

Repository: welshDog/HyperCode-V2.4

Length of output: 259


🏁 Script executed:

# Look for the specific function and surrounding context
rg "_github_context_via_mcp" -A 30 -B 5

Repository: welshDog/HyperCode-V2.4

Length of output: 5041


🏁 Script executed:

# Check if notifications/initialized is being called anywhere in the codebase
rg "notifications/initialized" --type py

Repository: welshDog/HyperCode-V2.4

Length of output: 201


🏁 Script executed:

# Find the _mcp_jsonrpc_call function definition to understand its parameters
rg "_mcp_jsonrpc_call" -A 15 --type py | head -100

Repository: welshDog/HyperCode-V2.4

Length of output: 5396


🏁 Script executed:

# Verify the exact line numbers for the _github_context_via_mcp function
cat -n agents/broski-pets-bridge/main.py | grep -A 20 "async def _github_context_via_mcp"

Repository: welshDog/HyperCode-V2.4

Length of output: 889


Send notifications/initialized notification after receiving initialize response.

Per the MCP specification, a client must send an initialized notification after receiving the initialize response and before issuing any subsequent requests. Several gateway implementations—including the Docker MCP gateway in streaming mode—will reject tools/list and other operational requests on sessions that have not been marked initialised, or silently return no tools. This is likely the cause if you're seeing tools unavailable in development.

The function signature already supports the required session_id parameter to pass to subsequent calls.

♻️ Suggested implementation
     _resp, sid = await _mcp_jsonrpc_call(
         {
             "jsonrpc": "2.0",
             "id": 1,
             "method": "initialize",
             "params": {
                 "protocolVersion": "2024-11-05",
                 "capabilities": {},
                 "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
             },
         },
         timeout_s=10.0,
     )
+    # Required per MCP spec before issuing further requests on this session.
+    try:
+        await _mcp_jsonrpc_call(
+            {"jsonrpc": "2.0", "method": "notifications/initialized", "params": {}},
+            timeout_s=5.0,
+            session_id=sid,
+        )
+    except Exception:
+        pass  # best-effort; some gateways accept tools/list without it
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 258 - 272, After receiving
the initialize response in _github_context_via_mcp, send an MCP notification for
"notifications/initialized" (i.e., a JSON-RPC message with no "id" and "method"
set to "notifications/initialized") using the same session id returned as sid so the session
is marked initialized before any further requests; call _mcp_jsonrpc_call with
the notification payload (passing session_id=sid so it is sent in the Mcp-Session-Id header) immediately after the
initialize response and before issuing subsequent requests like tools/list.

Comment thread docker-compose.yml
Comment on lines +922 to +934
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - DOCKER_MCP_IN_CONTAINER=1
      - GITHUB_TOKEN=${GITHUB_TOKEN}
      - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_TOKEN}
    expose:
      - "8099"
    networks:
      - agents-net
    depends_on:
      docker-socket-proxy:
        condition: service_started

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether docker-socket-proxy endpoints enabled are sufficient for docker/mcp-gateway,
# and whether any other service in the compose file already demonstrates the DOCKER_HOST-over-proxy pattern.
rg -nP --type=yaml -C2 'DOCKER_HOST\s*[:=]\s*tcp://docker-socket-proxy'
rg -nP --type=yaml -C3 'docker-socket-proxy' docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 7894


🏁 Script executed:

# Get the full context around lines 922-934 in docker-compose.yml
sed -n '900,940p' docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 991


🏁 Script executed:

# Get the docker-socket-proxy service configuration to see what endpoints are enabled
sed -n '742,800p' docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 1506


🏁 Script executed:

# Search for mcp-gateway or gateway service to understand what it needs
rg -nP 'mcp.*gateway|gateway.*mcp|mcp-gateway' docker-compose.yml -i -A 5 -B 2

Repository: welshDog/HyperCode-V2.4

Length of output: 937


🌐 Web query:

docker/mcp-gateway DOCKER_HOST environment variable support

💡 Result:

The docker/mcp-gateway (Docker MCP Gateway) supports the DOCKER_HOST environment variable for connecting to a custom Docker daemon, particularly in containerized or remote Docker Engine setups without Docker Desktop. Key evidence:
  • In GitHub issue #140, users set DOCKER_HOST=tcp://dind:2375 in a Docker Compose file for the gateway service to connect to a Docker-in-Docker (DinD) daemon. Developers confirmed it should be observed during engine discovery, noting "DOCKER_HOST is special. We should be observing that when doing engine discovery."
  • The gateway initializes a docker.Client (the standard Go Docker client library, which respects DOCKER_HOST by default), as seen in source code pkg/gateway/run.go.
  • Official docs and README mention running independently of Docker Desktop (e.g., Docker CE, WSL2) with DOCKER_MCP_IN_CONTAINER=1 to bypass Desktop checks, implying standard Docker client behavior including DOCKER_HOST support.
  • Examples like docker-in-docker in the Docker Hub repo further indicate remote daemon compatibility.
No explicit documentation lists DOCKER_HOST, but practical usage and code confirm support. For a containerized gateway, pass it via environment in Compose or docker run.


Remove direct socket mount or route through the socket proxy — not both.

Two inconsistencies in the mcp-gateway service (lines 922–934):

  1. Direct socket mount alongside unused proxy dependency. The service bind-mounts /var/run/docker.sock read-only and declares depends_on: docker-socket-proxy, yet does not use it — there is no DOCKER_HOST environment variable pointing to the proxy. Other services in the same compose file (coder-agent, throttle-agent, api-engine) demonstrate the correct pattern: they route through DOCKER_HOST=tcp://docker-socket-proxy:2375 without any direct socket mount. Either drop the depends_on and explicitly document why the gateway requires direct socket access (consider an ADR), or adopt the proxy pattern by removing the volume mount and adding DOCKER_HOST=tcp://docker-socket-proxy:2375. The docker/mcp-gateway image honours DOCKER_HOST via the standard Go Docker client, so the latter approach is viable.

  2. GITHUB_TOKEN as plain environment variables. Both GITHUB_TOKEN and GITHUB_PERSONAL_ACCESS_TOKEN are exposed as process environment variables. Per the repository convention (documented in .env.example and the secrets/*.txt pattern), sensitive credentials should be delivered via Docker secrets or file-backed environment references to prevent leakage through docker inspect, child processes, and future docker exec invocations.

🛡️ Suggested hardening — route through the socket proxy
     command:
       - --servers=github
       - --transport=streaming
       - --port=8099
-    volumes:
-      - /var/run/docker.sock:/var/run/docker.sock:ro
     environment:
       - DOCKER_MCP_IN_CONTAINER=1
+      - DOCKER_HOST=tcp://docker-socket-proxy:2375
       - GITHUB_TOKEN=${GITHUB_TOKEN}
       - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_TOKEN}

The docker-socket-proxy service already enables CONTAINERS, IMAGES, INFO, PING, and VERSION endpoints, which cover standard Docker operations required by the gateway.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` around lines 922 - 934, The mcp-gateway service is
inconsistent: either remove the docker socket bind-mount and rely on the
docker-socket-proxy, or drop the proxy dependency; prefer the proxy
pattern—remove the /var/run/docker.sock volume, keep depends_on:
docker-socket-proxy, and add DOCKER_HOST=tcp://docker-socket-proxy:2375 (the
docker/mcp-gateway image honors DOCKER_HOST). Also stop passing GITHUB_TOKEN and
GITHUB_PERSONAL_ACCESS_TOKEN as plain env vars: switch them to the repo’s
secret/file-backed pattern (use Docker secrets or file-referenced env vars per
.env.example and secrets/*.txt) so credentials aren’t exposed via docker
inspect.

- Add Anthropic API key as a Docker secret for secure credential management
- Replace OpenAI API key references with Anthropic API key across services
- Update default OLLAMA model to qwen2.5-coder:3b for improved coding assistance
- Add Memstream API configuration for enhanced memory management capabilities
- Include additional diagnostic commands in Claude settings for debugging
- Mount NemoClaw agent and source directories in Docker for live scanning
- Add `/health` Discord command with real-time code analysis and grading
- Create standalone health report script with webhook posting capability
- Implement Makefile targets for quick health checks and automated scans
- Include initial health report showing current codebase status

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Makefile (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Minor: python vs python3 inconsistency and missing .PHONY entries.

Every other script-invocation target in this Makefile uses python3 explicitly (lines 111, 133, 135, 155, 158, 215). On hosts where only python3 is on PATH (most modern Linux distros, CI runners with setup-python), make health / make health-quick will fail with "python: command not found". Also, both new targets are absent from the .PHONY line, so a stray file called health in the repo would suppress the recipe.

♻️ Proposed tweak
-.PHONY: help build up down logs status clean test restart network-init init start agents stop setup dev prod scan scan-quick scan-sast scan-secrets scan-deps scan-iac scan-licenses scan-report pre-commit-install scan-agent scan-all scan-build trivy-hook-install calm
+.PHONY: help build up down logs status clean test restart network-init init start agents stop setup dev prod scan scan-quick scan-sast scan-secrets scan-deps scan-iac scan-licenses scan-report pre-commit-install scan-agent scan-all scan-build trivy-hook-install calm health health-quick
@@
 health: ## 🏥 NemoClaw code health scan + Discord webhook post
 	@echo "🔍 Running NemoClaw health scan..."
-	@python scripts/health_report.py --webhook
+	@python3 scripts/health_report.py --webhook

 health-quick: ## 🏥 NemoClaw scan — terminal only, no webhook
-	@python scripts/health_report.py
+	@python3 scripts/health_report.py

Also applies to: 120-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 4, Update the Makefile so the health and health-quick
targets invoke python3 (not python) and add both targets to the top .PHONY list;
specifically change any "python" invocations inside the health and health-quick
recipes to "python3" and append "health" and "health-quick" to the existing
.PHONY declaration so a file named health won't override the targets.
🧹 Nitpick comments (2)
agents/broski-pets-bridge/main.py (2)

4-4: Unused asyncio import.

Nothing in the diff (or the surrounding file) actually references asyncio — the new MCP helpers use httpx.AsyncClient directly. Trim it to keep the import list clean.

♻️ Proposed tweak
-import asyncio
 import subprocess
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` at line 4, Remove the unused asyncio
import at the top of the module: delete the lone "import asyncio" line in
main.py (no other symbols reference asyncio; the MCP helpers use
httpx.AsyncClient directly), then run the linter/tests to confirm no remaining
references; if any async helpers are later needed, re-introduce asyncio where
actually used.

825-868: /leaderboard blocks the event loop and does an O(N) Redis scan on every call — fine for now, but plan for growth.

Two related concerns worth noting:

  1. The handler is async def but uses the synchronous redis.Redis client (r.scan_iter, r.get) — consistent with the rest of the file, but every call blocks the FastAPI event loop for as long as the full keyspace walk takes. Once the pets corpus grows past a few thousand keys this will start eating p99.
  2. Scanning every pet:* key and JSON‑decoding each entry on every leaderboard request is expensive. A Redis sorted set (ZADD pet:xp <xp> <discord_id> on every XP award, then ZREVRANGE pet:xp 0 9 WITHSCORES) would turn the endpoint into O(log N + 10) and remove the blocking walk entirely. Pet metadata (name/species/level) can still be fetched as a pipelined MGET on just the top 10 keys.

Not a release blocker at current scale, but a sensible target before the pets feature goes broadly live.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-pets-bridge/main.py` around lines 825 - 868, The /leaderboard
handler (async def leaderboard) currently uses the synchronous redis client and
a full scan of keys (r.scan_iter and r.get on pet:*) which blocks the event loop
and is O(N); switch to either an async Redis client (redis.asyncio or aioredis)
so leaderboard awaits nonblocking calls, or run the blocking scan/get in a
threadpool, and change the data model to maintain a Redis sorted set (e.g., key
"pet:xp") updated whenever pet XP changes (call ZADD pet:xp <xp> <discord_id>
from your XP-award codepath) then implement leaderboard to ZREVRANGE pet:xp 0 9
WITHSCORES to get top IDs and scores and MGET (or pipeline GET) the
corresponding pet:<discord_id> entries for metadata; this makes leaderboard
O(log N + 10) and avoids scanning all pet:* keys.
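The sorted-set shape suggested above can be demonstrated with a minimal in-memory stand-in for ZADD/ZREVRANGE; with a real client these become `r.zadd` and `r.zrevrange(..., withscores=True)`, and every name here (`MiniSortedSet`, `award_xp`, the `pet:<id>` key layout) is illustrative rather than the repo's actual schema.

```python
import json

class MiniSortedSet:
    """In-memory stand-in for a Redis sorted set (real client: r.zadd / r.zrevrange)."""
    def __init__(self):
        self._scores: dict[str, float] = {}

    def zadd(self, member: str, score: float) -> None:
        self._scores[member] = score

    def zrevrange_withscores(self, start: int, stop: int):
        ranked = sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[start:stop + 1]

def award_xp(zset: MiniSortedSet, pets: dict, discord_id: str, xp: int) -> None:
    """Update both the metadata record and the ranking set on every XP change."""
    pet = json.loads(pets[f"pet:{discord_id}"])
    pet["xp"] += xp
    pets[f"pet:{discord_id}"] = json.dumps(pet)
    zset.zadd(discord_id, pet["xp"])

def leaderboard(zset: MiniSortedSet, pets: dict, top: int = 10):
    """Rank from the sorted set, then fetch metadata for only the top entries."""
    rows = []
    for discord_id, xp in zset.zrevrange_withscores(0, top - 1):
        meta = json.loads(pets[f"pet:{discord_id}"])
        rows.append({"id": discord_id, "xp": int(xp), "name": meta["name"]})
    return rows
```

Because ranking lives in the sorted set, the leaderboard read never walks the full `pet:*` keyspace, which removes the blocking O(N) scan from the request path.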
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents/broski-bot/src/cogs/health_check.py`:
- Around line 101-116: The live report keys must match the cached schema to
avoid KeyError: modify _run_scan() to include the writer's keys in its returned
dict (add files_scanned, lint_issues, ast_issues, total_issues, top_lint,
top_ast alongside the existing files, lint, ast, total, top) so both consumers
work; additionally update _build_embed() (and any code that reads the cached
payload, e.g., _load_cached fallback) to prefer the writer keys but fall back to
the existing keys using r.get("files_scanned", r.get("files")) etc., ensuring no
KeyError when a cached report is used.
- Around line 77-99: The health-check scan blocks the event loop because
_ruff_scan uses subprocess.run(timeout=60) synchronously (and _ast_scan
walks/parses files synchronously) so update _ruff_scan to use
asyncio.create_subprocess_exec and await proc.communicate() (making it async)
and offload CPU/IO heavy work in _ast_scan/_run_scan to a thread executor (e.g.,
asyncio.to_thread or loop.run_in_executor) so the slash command handler awaits
non-blocking coroutines; locate and change functions named _ruff_scan, _ast_scan
and where _run_scan is awaited to call the new async/threaded helpers.
- Around line 168-172: HealthCheck currently sets self._nemoclaw_ready based on
_NEMOCLAW.exists(), but the scanners (_ast_scan, _ruff_scan, _py_files) actually
need the ruff executable and scanable source directories; update
HealthCheck.__init__ to compute _nemoclaw_ready by checking the real scanner
prerequisites: verify ruff is available (e.g. shutil.which('ruff') or a
lightweight subprocess run like ['ruff','--version'] that returns success) and
that at least one target scan directory (the ones used by _py_files:
/app/backend_src or /app/project_agents) contains Python files (or is present);
set _nemoclaw_ready True only if both conditions pass so the cached-report
fallback is only used when live scanning truly isn’t possible. Ensure you
reference and reuse the same logic/paths used by _py_files/_ruff_scan to avoid
drift.

In `@docker-compose.secrets.yml`:
- Line 50: The ANTHROPIC_API_KEY_FILE secret is incorrectly assigned to the
hypercode-core service; remove ANTHROPIC_API_KEY_FILE:
/run/secrets/anthropic_api_key from the hypercode-core service and add
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key to the broski-bot and
project-strategist service blocks so those services receive the secret at
runtime (they are the ones that call os.getenv("ANTHROPIC_API_KEY")); ensure the
anthropic_api_key secret exists in the secrets section and that both broski-bot
and project-strategist have the environment variable set to use the secret file.

In `@docker-compose.yml`:
- Line 347: The bind-mount for the health report is currently writable
("./reports/broski-analysis:/app/health_reports") but the container only reads
it (see agents/broski-bot/src/cogs/health_check.py using
LATEST_JSON.read_text()) while host script scripts/health_report.py writes it;
change the compose mount to be read-only by appending :ro to the mount
("./reports/broski-analysis:/app/health_reports:ro") so the container cannot
modify the cached report and matches the other :ro mounts.

In `@scripts/health_report.py`:
- Around line 120-133: The report schema in scripts/health_report.py must match
the consumer in agents/broski-bot/src/cogs/health_check.py (_build_embed expects
r["files"], r["lint"], r["ast"], r["total"], r.get("top")); update the report
dict keys accordingly: rename files_scanned→files, lint_issues→lint,
ast_issues→ast, total_issues→total, and provide a single top key (e.g., merge
top_lint and top_ast into one list or pick top_lint if present) so r.get("top")
returns the expected value; keep other fields (score, grade, label, emoji,
colour, scanned_at) unchanged.
- Around line 136-137: The write to the latest report currently uses
Path.write_text without an explicit encoding which will use the platform default
and can raise UnicodeEncodeError on Windows; update the write call that uses
latest (the Path named latest under REPORTS_DIR) to pass encoding='utf-8' when
writing the json.dumps(report, indent=2) so emoji and non‑ASCII grade labels are
written safely.

---

Outside diff comments:
In `@Makefile`:
- Line 4: Update the Makefile so the health and health-quick targets invoke
python3 (not python) and add both targets to the top .PHONY list; specifically
change any "python" invocations inside the health and health-quick recipes to
"python3" and append "health" and "health-quick" to the existing .PHONY
declaration so a file named health won't override the targets.

---

Nitpick comments:
In `@agents/broski-pets-bridge/main.py`:
- Line 4: Remove the unused asyncio import at the top of the module: delete the
lone "import asyncio" line in main.py (no other symbols reference asyncio; the
MCP helpers use httpx.AsyncClient directly), then run the linter/tests to
confirm no remaining references; if any async helpers are later needed,
re-introduce asyncio where actually used.
- Around line 825-868: The /leaderboard handler (async def leaderboard)
currently uses the synchronous redis client and a full scan of keys (r.scan_iter
and r.get on pet:*) which blocks the event loop and is O(N); switch to either an
async Redis client (redis.asyncio or aioredis) so leaderboard awaits nonblocking
calls, or run the blocking scan/get in a threadpool, and change the data model
to maintain a Redis sorted set (e.g., key "pet:xp") updated whenever pet XP
changes (call ZADD pet:xp <xp> <discord_id> from your XP-award codepath) then
implement leaderboard to ZREVRANGE pet:xp 0 9 WITHSCORES to get top IDs and
scores and MGET (or pipeline GET) the corresponding pet:<discord_id> entries for
metadata; this makes leaderboard O(log N + 10) and avoids scanning all pet:*
keys.
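The sorted-set shape described above can be illustrated with an in-memory stand-in; a real implementation would issue ZADD/ZREVRANGE against key "pet:xp" via redis.asyncio, and MiniZSet plus the sample Discord IDs here are purely illustrative:

```python
class MiniZSet:
    """In-memory stand-in illustrating the ZADD / ZREVRANGE leaderboard
    shape; production code would use redis.asyncio against key "pet:xp"."""

    def __init__(self) -> None:
        self._scores: dict[str, float] = {}

    def zadd(self, member: str, score: float) -> None:
        # call this from the XP-award codepath whenever pet XP changes
        self._scores[member] = score

    def zrevrange_withscores(self, start: int, stop: int) -> list[tuple[str, float]]:
        # highest score first, inclusive stop index (Redis ZREVRANGE semantics)
        ranked = sorted(self._scores.items(), key=lambda kv: -kv[1])
        return ranked[start : stop + 1]


z = MiniZSet()
for discord_id, xp in [("111", 120.0), ("222", 300.0), ("333", 45.0)]:
    z.zadd(discord_id, xp)

top_ten = z.zrevrange_withscores(0, 9)  # O(log N + 10) with a real ZSET
```

The /leaderboard handler would then MGET (or pipeline GET) the pet:&lt;discord_id&gt; hashes for the ten returned members instead of scanning every pet:* key.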
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a4b1956b-b382-4359-9a19-6c4c39135364

📥 Commits

Reviewing files that changed from the base of the PR and between d91e8ad and 30ecc9d.

📒 Files selected for processing (10)
  • .claude/settings.local.json
  • Makefile
  • agents/broski-bot/src/bot.py
  • agents/broski-bot/src/cogs/health_check.py
  • agents/broski-bot/src/cogs/pets.py
  • agents/broski-pets-bridge/main.py
  • docker-compose.secrets.yml
  • docker-compose.yml
  • reports/broski-analysis/latest.json
  • scripts/health_report.py
✅ Files skipped from review due to trivial changes (1)
  • reports/broski-analysis/latest.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • .claude/settings.local.json

Comment on lines +77 to +99
def _ruff_scan(roots: list[Path]) -> list[dict]:
    import subprocess, json as _json
    paths = [str(r) for r in roots if r.exists()]
    if not paths:
        return []
    try:
        result = subprocess.run(
            ["ruff", "check"] + paths + ["--output-format", "json"],
            capture_output=True, text=True, timeout=60,
        )
        raw = _json.loads(result.stdout)
        out = []
        for item in raw:
            loc = item.get("location") or {}
            out.append({
                "file": Path(item.get("filename", "?")).name,
                "line": loc.get("row"),
                "msg": str(item.get("message", ""))[:65],
            })
        return out
    except Exception:
        return []


⚠️ Potential issue | 🟠 Major

Major: the /health scan blocks the Discord event loop for up to ~60 s.

_ruff_scan calls subprocess.run(..., timeout=60) synchronously, and _ast_scan walks every .py file under /app/backend_src and /app/project_agents, reading and AST‑parsing each one. Both run inside _run_scan() at line 202, which is awaited inline in the command handler, so the entire bot's coroutine scheduler is frozen during the scan. The slash-command ack is already out (via defer), but heartbeats, reaction handlers, other /commands, and Prometheus HTTP responses all stall until the scan finishes.

Offload to a worker thread so the event loop stays responsive:

🛠️ Proposed fix
+import asyncio
@@
-    try:
-        report = _run_scan()
+    try:
+        report = await asyncio.to_thread(_run_scan)
     except Exception as e:
         await msg.edit(content=f"❌ Scan failed: {e}", embed=None)
         return

For _ruff_scan specifically, asyncio.create_subprocess_exec + await proc.communicate() is the more idiomatic non-blocking path if you'd rather not add a thread.

Also applies to: 101-116, 174-208

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-bot/src/cogs/health_check.py` around lines 77 - 99, The
health-check scan blocks the event loop because _ruff_scan uses
subprocess.run(timeout=60) synchronously (and _ast_scan walks/parses files
synchronously) so update _ruff_scan to use asyncio.create_subprocess_exec and
await proc.communicate() (making it async) and offload CPU/IO heavy work in
_ast_scan/_run_scan to a thread executor (e.g., asyncio.to_thread or
loop.run_in_executor) so the slash command handler awaits non-blocking
coroutines; locate and change functions named _ruff_scan, _ast_scan and where
_run_scan is awaited to call the new async/threaded helpers.
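A minimal sketch of the two offloading options the prompt names, using only the standard library. `_cpu_heavy_scan`, `run_scan_nonblocking`, and the `echo` placeholder are illustrative stand-ins, not the repo's actual functions:

```python
import asyncio
import time


def _cpu_heavy_scan() -> dict:
    # stand-in for a blocking _run_scan(): must not run on the event loop
    time.sleep(0.05)
    return {"total": 0}


async def run_scan_nonblocking() -> dict:
    # Option 1: push the whole blocking scan onto a worker thread
    return await asyncio.to_thread(_cpu_heavy_scan)


async def subprocess_scan(paths: list[str]) -> str:
    # Option 2: replace subprocess.run with an awaitable child process
    # ("echo" stands in for the real ["ruff", "check", ...] invocation)
    proc = await asyncio.create_subprocess_exec(
        "echo", *paths, stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


report = asyncio.run(run_scan_nonblocking())
ruff_out = asyncio.run(subprocess_scan(["backend_src"]))
```

Either variant keeps the Discord heartbeat alive while the scan runs; `asyncio.to_thread` is the smaller diff, `create_subprocess_exec` avoids tying up a thread for the ruff subprocess.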

Comment on lines +101 to +116
def _run_scan() -> dict:
    files = _py_files(_SCAN_ROOTS)
    lint = _ruff_scan(_SCAN_ROOTS)
    ast_issues = _ast_scan(files)
    total = len(lint) + len(ast_issues)
    score = max(0, min(100, round(100 - (total / max(len(files), 1)) * 40)))
    letter, label, colour, emoji = _grade(score)
    return {
        "score": score, "grade": letter, "label": label,
        "colour": colour, "emoji": emoji,
        "files": len(files), "lint": len(lint), "ast": len(ast_issues),
        "total": total,
        "top": (lint + ast_issues)[:5],
        "scanned_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
        "live": True,
    }

⚠️ Potential issue | 🔴 Critical

Critical: cached-report schema mismatch — /health-last and the /health cache fallback will KeyError.

_run_scan() returns a dict keyed as files, lint, ast, total, top. The cached report loaded by _load_cached() is produced by scripts/health_report.py (see lines 120‑133 of that file), which writes files_scanned, lint_issues, ast_issues, total_issues, top_lint, top_ast instead. _build_embed then does bracket‑indexing on the live names (r["files"], r["lint"], r["ast"], r["total"]) and r.get("top") (always None in the cached payload).

Consequences:

  • /health-last raises KeyError: 'files' on the very first invocation whenever a cached report exists — the command never works.
  • The fallback branch inside /health (lines 179‑190) has the same bug: when NemoClaw isn't mounted, the "helpful cached embed" also crashes instead of rendering.

Pick one canonical schema and align both sides. Adapting the consumer is the smallest change:

🛠️ Proposed fix in _run_scan() + _build_embed() — match the writer's schema
-    return {
-        "score": score, "grade": letter, "label": label,
-        "colour": colour, "emoji": emoji,
-        "files": len(files), "lint": len(lint), "ast": len(ast_issues),
-        "total": total,
-        "top": (lint + ast_issues)[:5],
-        "scanned_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
-        "live": True,
-    }
+    return {
+        "score": score, "grade": letter, "label": label,
+        "colour": colour, "emoji": emoji,
+        "files_scanned": len(files),
+        "lint_issues": len(lint),
+        "ast_issues": len(ast_issues),
+        "total_issues": total,
+        "top_lint": lint[:5],
+        "top_ast": ast_issues[:5],
+        "scanned_at": datetime.now(timezone.utc).isoformat(),
+        "live": True,
+    }
@@
-    embed.add_field(name="📄 Files",  value=str(r["files"]), inline=True)
-    embed.add_field(name="🔍 Lint",   value=str(r["lint"]),  inline=True)
-    embed.add_field(name="🌳 AST",    value=str(r["ast"]),   inline=True)
-    embed.add_field(name="⚠️ Total",  value=str(r["total"]), inline=True)
+    embed.add_field(name="📄 Files",  value=str(r.get("files_scanned", 0)), inline=True)
+    embed.add_field(name="🔍 Lint",   value=str(r.get("lint_issues", 0)),   inline=True)
+    embed.add_field(name="🌳 AST",    value=str(r.get("ast_issues", 0)),    inline=True)
+    embed.add_field(name="⚠️ Total",  value=str(r.get("total_issues", 0)),  inline=True)
@@
-    if r.get("top"):
-        top_text = ""
-        for i in r["top"][:5]:
+    top_entries = (r.get("top_lint") or []) + (r.get("top_ast") or [])
+    if top_entries:
+        top_text = ""
+        for i in top_entries[:5]:

Also applies to: 128-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-bot/src/cogs/health_check.py` around lines 101 - 116, The live
report keys must match the cached schema to avoid KeyError: modify _run_scan()
to include the writer's keys in its returned dict (add files_scanned,
lint_issues, ast_issues, total_issues, top_lint, top_ast alongside the existing
files, lint, ast, total, top) so both consumers work; additionally update
_build_embed() (and any code that reads the cached payload, e.g., _load_cached
fallback) to prefer the writer keys but fall back to the existing keys using
r.get("files_scanned", r.get("files")) etc., ensuring no KeyError when a cached
report is used.

Comment on lines +168 to +172
class HealthCheck(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot
        self._nemoclaw_ready = _NEMOCLAW.exists()


⚠️ Potential issue | 🟡 Minor

Minor: self._nemoclaw_ready doesn't actually gate scanning capability.

The check at line 171 is _NEMOCLAW.exists() (/app/nemoclaw_agent), but _ast_scan/_ruff_scan/_py_files never use anything from that mount — they walk /app/backend_src and /app/project_agents directly and shell out to system ruff. So _nemoclaw_ready = False can fire the cached-report fallback even when a live scan would work fine, and conversely a mounted-but-empty NemoClaw dir would still advertise "live" mode.

Consider gating on what the scanner actually needs:

♻️ Proposed tweak
-        self._nemoclaw_ready = _NEMOCLAW.exists()
+        # We need at least one scan root and the `ruff` binary (or AST-only mode).
+        self._nemoclaw_ready = any(r.exists() for r in _SCAN_ROOTS)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:
    class HealthCheck(commands.Cog):
        def __init__(self, bot: commands.Bot):
            self.bot = bot
            self._nemoclaw_ready = _NEMOCLAW.exists()

After:
    class HealthCheck(commands.Cog):
        def __init__(self, bot: commands.Bot):
            self.bot = bot
            # We need at least one scan root and the `ruff` binary (or AST-only mode).
            self._nemoclaw_ready = any(r.exists() for r in _SCAN_ROOTS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/broski-bot/src/cogs/health_check.py` around lines 168 - 172,
HealthCheck currently sets self._nemoclaw_ready based on _NEMOCLAW.exists(), but
the scanners (_ast_scan, _ruff_scan, _py_files) actually need the ruff
executable and scanable source directories; update HealthCheck.__init__ to
compute _nemoclaw_ready by checking the real scanner prerequisites: verify ruff
is available (e.g. shutil.which('ruff') or a lightweight subprocess run like
['ruff','--version'] that returns success) and that at least one target scan
directory (the ones used by _py_files: /app/backend_src or /app/project_agents)
contains Python files (or is present); set _nemoclaw_ready True only if both
conditions pass so the cached-report fallback is only used when live scanning
truly isn’t possible. Ensure you reference and reuse the same logic/paths used
by _py_files/_ruff_scan to avoid drift.

HYPERCODE_JWT_SECRET_FILE: /run/secrets/jwt_secret
JWT_SECRET_FILE: /run/secrets/jwt_secret
PERPLEXITY_API_KEY_FILE: /run/secrets/perplexity_api_key
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the backend either reads ANTHROPIC_API_KEY_FILE or has a _FILE → env bootstrap.
rg -nP --type=py -C2 '\bANTHROPIC_API_KEY(_FILE)?\b' backend/ || echo "no backend references"
# Also look for any generic _FILE → env loader pattern (e.g. Docker-secrets helper)
rg -nP --type=py -C3 '_FILE.*(open|read_text|Path)' backend/ | head -60

Repository: welshDog/HyperCode-V2.4

Length of output: 88


🏁 Script executed:

# First, confirm backend directory exists and see its structure
find . -type d -name backend -o -type d -name src 2>/dev/null | head -20
ls -la backend/ 2>/dev/null || echo "No backend dir"

Repository: welshDog/HyperCode-V2.4

Length of output: 1061


🏁 Script executed:

# Search for Anthropic references more broadly across the entire codebase
rg -i "anthropic" --type=py -l 2>/dev/null | head -20
rg -i "api.?key" --type=py -l 2>/dev/null | head -20

Repository: welshDog/HyperCode-V2.4

Length of output: 1404


🏁 Script executed:

# Check the docker-compose files mentioned in the review
cat docker-compose.yml 2>/dev/null | head -250
cat docker-compose.secrets.yml 2>/dev/null

Repository: welshDog/HyperCode-V2.4

Length of output: 11030


🏁 Script executed:

# Search specifically in backend/app directory for Anthropic references
rg -i "anthropic" backend/app/ --type=py -n -C3

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Check for environment loading patterns and _FILE handling
rg "os.getenv|os.environ|\.get\(" backend/app/ --type=py -A1 -B1 | grep -i "api\|key\|secret" | head -50

Repository: welshDog/HyperCode-V2.4

Length of output: 4068


🏁 Script executed:

# Look at the test_docker_secrets.py file that was listed
cat tests/unit/test_docker_secrets.py

Repository: welshDog/HyperCode-V2.4

Length of output: 8887


🏁 Script executed:

# Check if there's a helper function for loading _FILE variables
rg -i "file.*open|read_text|_FILE" backend/ --type=py -B2 -A2 | head -100

Repository: welshDog/HyperCode-V2.4

Length of output: 6107


🏁 Script executed:

# Check if anthropic_api_key is actually used/referenced in docker-compose.secrets.yml
grep -n "anthropic" docker-compose.secrets.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 228


🏁 Script executed:

# Search for Anthropic API usage in agents/ directory
rg -i "anthropic" agents/ --type=py -l

Repository: welshDog/HyperCode-V2.4

Length of output: 668


🏁 Script executed:

# Check if broski-bot or any other service in secrets overlay uses anthropic_api_key
grep -B5 -A5 "anthropic_api_key" docker-compose.secrets.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 1087


🏁 Script executed:

# Look for any Anthropic client or SDK imports in Python files
rg "from anthropic|import anthropic|Anthropic\(" --type=py

Repository: welshDog/HyperCode-V2.4

Length of output: 2153


🏁 Script executed:

# Check if agents services are in docker-compose.secrets.yml and if they get anthropic_api_key
grep -A20 "agents\|coder-agent\|architect" docker-compose.secrets.yml | grep -B5 -A10 "anthropic"

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Check if agents are defined in base docker-compose.yml and how they get their config
grep -A15 "agents\|coder-agent\|architect\|frontend-specialist" docker-compose.yml | head -100

Repository: welshDog/HyperCode-V2.4

Length of output: 2947


🏁 Script executed:

# Look at one of the agent files to see how anthropic_api_key is passed to the config
cat agents/01-frontend-specialist/base_agent.py | head -80

Repository: welshDog/HyperCode-V2.4

Length of output: 2483


🏁 Script executed:

# Check if agent services are listed in docker-compose.yml
grep -E "^\s{2}[a-z]+-agent:" docker-compose.yml | head -20

Repository: welshDog/HyperCode-V2.4

Length of output: 129


🏁 Script executed:

# Check one agent's environment variables in base compose
grep -A30 "coder-agent:" docker-compose.yml | grep -A20 "environment:"

Repository: welshDog/HyperCode-V2.4

Length of output: 798


🏁 Script executed:

# Confirm agents don't have ANTHROPIC_API_KEY_FILE in base compose either
grep -B5 -A15 "ANTHROPIC_API_KEY" docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 2240


🏁 Script executed:

# Check all services in secrets overlay that reference anthropic
grep -B20 "anthropic_api_key" docker-compose.secrets.yml | grep -E "^\s{2}[a-z]+-[a-z]+:|anthropic"

Repository: welshDog/HyperCode-V2.4

Length of output: 234


🏁 Script executed:

# Check if broski-bot gets anthropic_api_key in secrets overlay
grep -A40 "broski-bot:" docker-compose.secrets.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 1171


🏁 Script executed:

# List all services in base compose that have ANTHROPIC_API_KEY set
grep -B10 "ANTHROPIC_API_KEY=" docker-compose.yml | grep "^\s\s[a-z]" | grep ":"

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Confirm hypercode-core backend does NOT use Anthropic by checking all its imports
rg "import.*anthropic|from.*anthropic" backend/app/ --type=py

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Check if hypercode-core has any Anthropic client instantiation
rg "Anthropic\(|AsyncAnthropic\(" backend/app/ --type=py -C2

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Verify broski-bot gets plain ANTHROPIC_API_KEY from base compose
grep -A50 "broski-bot:" docker-compose.yml | grep "ANTHROPIC_API_KEY"

Repository: welshDog/HyperCode-V2.4

Length of output: 113


🏁 Script executed:

# Check if other agents besides project-strategist also use ANTHROPIC_API_KEY
grep -B5 "ANTHROPIC_API_KEY=" docker-compose.yml | grep -B5 "ANTHROPIC_API_KEY" | grep "^\s\s[a-z].*:" | tail -10

Repository: welshDog/HyperCode-V2.4

Length of output: 49


🏁 Script executed:

# Double-check: list all services in base compose with ANTHROPIC_API_KEY
grep -n "ANTHROPIC_API_KEY=" docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 169


🏁 Script executed:

# Show context for each service that has ANTHROPIC_API_KEY
awk '/^  [a-z-]+:$/ {service=$0} /ANTHROPIC_API_KEY=/ {print NR": "service" -> "$0}' docker-compose.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 213


🏁 Script executed:

# Check if anthropic_api_key secret is referenced by any other service in secrets overlay
grep -n "anthropic_api_key" docker-compose.secrets.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 228


🏁 Script executed:

# Verify the complete picture: all services in secrets overlay and their secrets
grep -A5 "^\s\s[a-z-]*bot:\|^\s\s[a-z-]*agent:\|^\s\shypercode-core:" docker-compose.secrets.yml | grep -A4 "secrets:"

Repository: welshDog/HyperCode-V2.4

Length of output: 274


🏁 Script executed:

# Check full secrets override for project-strategist and all agent services
grep -A10 "project-strategist:\|coder-agent:\|tips-tricks-writer:" docker-compose.secrets.yml

Repository: welshDog/HyperCode-V2.4

Length of output: 767


🏁 Script executed:

# Verify which services actually need ANTHROPIC_API_KEY by checking agent code
rg "os.getenv\(.*ANTHROPIC" agents/ --type=py -B2 -A2

Repository: welshDog/HyperCode-V2.4

Length of output: 5103


Incorrect service assignment — anthropic_api_key secret is wired to the wrong service.

The anthropic_api_key secret is being passed to hypercode-core, which does not use Anthropic at all. The backend has no Anthropic imports or client instantiation. However, broski-bot and project-strategist agents—which do read os.getenv("ANTHROPIC_API_KEY")—are not receiving ANTHROPIC_API_KEY_FILE overrides in the secrets compose. When operators run with the secrets overlay and no plain .env var set, these agents will silently resolve the key to an empty string and fail at runtime with authentication errors.

Move anthropic_api_key from the hypercode-core service override to the services that actually need it (broski-bot and project-strategist), and provide ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key environment variable to each.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.secrets.yml` at line 50, The ANTHROPIC_API_KEY_FILE secret is
incorrectly assigned to the hypercode-core service; remove
ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key from the hypercode-core
service and add ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key to the
broski-bot and project-strategist service blocks so those services receive the
secret at runtime (they are the ones that call os.getenv("ANTHROPIC_API_KEY"));
ensure the anthropic_api_key secret exists in the secrets section and that both
broski-bot and project-strategist have the environment variable set to use the
secret file.
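The `_FILE → env` convention this finding relies on can be sketched as a generic loader; `env_or_secret` is a hypothetical helper name and the repo may already implement an equivalent, so treat this as an illustration of the pattern only:

```python
import os
import tempfile


def env_or_secret(name: str, default: str = "") -> str:
    # Docker-secrets convention: NAME_FILE points at a mounted secret
    # (e.g. /run/secrets/anthropic_api_key); prefer it over a plain env var
    path = os.getenv(f"{name}_FILE")
    if path and os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip()
    return os.getenv(name, default)


# demo: simulate a secret mounted by the compose secrets overlay
with tempfile.NamedTemporaryFile("w", suffix=".key", delete=False) as fh:
    fh.write("example-secret\n")
    secret_path = fh.name
os.environ["DEMO_API_KEY_FILE"] = secret_path

key = env_or_secret("DEMO_API_KEY")
```

A service that only receives ANTHROPIC_API_KEY_FILE still needs a loader like this at startup; without it, os.getenv("ANTHROPIC_API_KEY") resolves to the empty default, which is exactly the silent failure mode described above.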

Comment thread docker-compose.yml
- ./agents/broski-nemoclaw-agent:/app/nemoclaw_agent:ro
- ./backend:/app/backend_src:ro
- ./agents:/app/project_agents:ro
- ./reports/broski-analysis:/app/health_reports

⚠️ Potential issue | 🟡 Minor

Mount the health-report directory read-only.

./reports/broski-analysis:/app/health_reports is read-write, but the only consumer inside the container — agents/broski-bot/src/cogs/health_check.py — just calls LATEST_JSON.read_text() (line 122). Nothing in the bot writes back; the writer is scripts/health_report.py running on the host. Dropping write access inside the container matches the :ro pattern already used for nemoclaw_agent, backend_src, and project_agents on lines 344‑346, and prevents a compromised bot process from tampering with the cached report.

♻️ Proposed tweak
-      - ./reports/broski-analysis:/app/health_reports
+      - ./reports/broski-analysis:/app/health_reports:ro
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 347, The bind-mount for the health report is
currently writable ("./reports/broski-analysis:/app/health_reports") but the
container only reads it (see agents/broski-bot/src/cogs/health_check.py using
LATEST_JSON.read_text()) while host script scripts/health_report.py writes it;
change the compose mount to be read-only by appending :ro to the mount
("./reports/broski-analysis:/app/health_reports:ro") so the container cannot
modify the cached report and matches the other :ro mounts.

Comment thread scripts/health_report.py
Comment on lines +120 to +133
report = {
    "score": score,
    "grade": letter,
    "label": label,
    "emoji": emoji,
    "colour": colour,
    "files_scanned": len(files),
    "lint_issues": len(ruff_issues),
    "ast_issues": len(ast_issues),
    "total_issues": total,
    "top_lint": top_lint,
    "top_ast": top_ast,
    "scanned_at": datetime.now(timezone.utc).isoformat(),
}

⚠️ Potential issue | 🔴 Critical

Field naming diverges from the Discord cog that consumes this file — cached fallback is broken.

reports/broski-analysis/latest.json produced here uses files_scanned, lint_issues, ast_issues, total_issues, top_lint, top_ast. The new consumer agents/broski-bot/src/cogs/health_check.py indexes r["files"], r["lint"], r["ast"], r["total"], r.get("top") (see _build_embed lines 138‑149 in that file), so /health-last and the /health cache-fallback path will raise KeyError the first time they're exercised. See the root‑cause comment on health_check.py — pick one canonical schema and align both producers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/health_report.py` around lines 120 - 133, The report schema in
scripts/health_report.py must match the consumer in
agents/broski-bot/src/cogs/health_check.py (_build_embed expects r["files"],
r["lint"], r["ast"], r["total"], r.get("top")); update the report dict keys
accordingly: rename files_scanned→files, lint_issues→lint, ast_issues→ast,
total_issues→total, and provide a single top key (e.g., merge top_lint and
top_ast into one list or pick top_lint if present) so r.get("top") returns the
expected value; keep other fields (score, grade, label, emoji, colour,
scanned_at) unchanged.

Comment thread scripts/health_report.py
Comment on lines +136 to +137
latest = REPORTS_DIR / "latest.json"
latest.write_text(json.dumps(report, indent=2))

⚠️ Potential issue | 🟡 Minor

Specify UTF‑8 when writing the report — emoji in label/emoji will blow up on Windows.

Path.write_text without encoding= uses locale.getpreferredencoding(), which is typically cp1252 on Windows. The report embeds emoji (🏆, 🆘, etc.) and the grade labels, so this call will raise UnicodeEncodeError on the very same platform you already added sys.stdout.reconfigure / PYTHONUTF8=1 to protect against (lines 24‑27). Lock the file encoding explicitly so the make health target doesn't crash mid-run for contributors on Windows.

♻️ Proposed fix
-    latest.write_text(json.dumps(report, indent=2))
+    latest.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
📝 Committable suggestion


Suggested change
Before:
    latest = REPORTS_DIR / "latest.json"
    latest.write_text(json.dumps(report, indent=2))

After:
    latest = REPORTS_DIR / "latest.json"
    latest.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/health_report.py` around lines 136 - 137, The write to the latest
report currently uses Path.write_text without an explicit encoding which will
use the platform default and can raise UnicodeEncodeError on Windows; update the
write call that uses latest (the Path named latest under REPORTS_DIR) to pass
encoding='utf-8' when writing the json.dumps(report, indent=2) so emoji and
non‑ASCII grade labels are written safely.
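The fix above can be exercised end to end with a throwaway directory (the report contents and path here are illustrative, not the repo's real latest.json):

```python
import json
import pathlib
import tempfile

report = {"grade": "S", "label": "LEGENDARY", "emoji": "🏆", "score": 99}
latest = pathlib.Path(tempfile.mkdtemp()) / "latest.json"

# explicit encoding avoids UnicodeEncodeError under cp1252 on Windows;
# ensure_ascii=False keeps the emoji human-readable in the file on disk
latest.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")

round_trip = json.loads(latest.read_text(encoding="utf-8"))
```

Reading back with the same explicit encoding guarantees the Discord cog sees the emoji intact regardless of the host locale.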

@welshDog welshDog changed the title feat: enable NVIDIA GPU passthrough for hypercode-ollama feat: pets system, GPU passthrough, NemoClaw health scan & secrets hardening Apr 17, 2026
Remove explicit bind mount configurations for redis, postgres, and minio volumes to use Docker-managed volumes instead. This eliminates dependency on host-specific paths that may not exist.

Set default passwords for PostgreSQL and MinIO services to prevent runtime failures when environment variables are unset. This ensures services can start in development environments without requiring all secrets to be configured.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-compose.yml`:
- Around line 974-975: Add a healthcheck to the mcp-gateway service that
verifies the MCP streaming transport is accepting connections (e.g., a TCP probe
to :8099 or an HTTP GET /health) and change broski-pets-bridge's depends_on from
service_started to service_healthy so Docker waits for mcp-gateway readiness;
update the docker-compose mcp-gateway service block to include a minimal
healthcheck stanza and adjust broski-pets-bridge depends_on accordingly so the
bridge's _mcp_jsonrpc_call won't run before the gateway is actually ready.
- Line 959: Update the MCP_GATEWAY_URL default in the docker-compose.yml to use
the HTTP scheme so it matches .env.example and any consumers that expect an HTTP
URL: change MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} to use
http:// (e.g. MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099});
reference the MCP_GATEWAY_URL variable and the
agents/broski-pets-bridge/main.py::_mcp_base_url rewrite when making the change
so the code no longer relies on a tcp:// → http:// shim.
- Line 130: Replace unsafe hardcoded defaults like POSTGRES_PASSWORD:
${POSTGRES_PASSWORD:-hypercode} and similar occurrences for MINIO_ROOT_USER,
MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) across services (hypercode-core,
celery-worker, crew-orchestrator, healer-agent, hyper-architect, hyper-worker,
hyperhealth-api, hyperhealth-worker) with a fail-fast or environment-gated
approach: either require the variable with the Docker Compose parameter
expansion form ${VAR:?error message} so compose aborts when missing, or
conditionally allow a development-only fallback when ENVIRONMENT==development
and refuse to start (or read from docker-compose.secrets.yml) when
ENVIRONMENT!=development; update the referenced env uses in docker-compose.yml
accordingly so production/staging never silently get the literal "hypercode"
password.

In `@reports/docker-environment-status-report.md`:
- Around line 125-133: The report flags four services (agent-x, hyper-architect,
hyper-observer, hyper-worker) as publishing on 0.0.0.0 but omits the compose
binding context; update the HIGH finding to add a single clarifying sentence
that explains the observation methodology (e.g., that the port list came from
docker ps / host netstat behavior on Docker Desktop/WSL2 where vpnkit often
shows 0.0.0.0) and cross-reference docker-compose.yml bindings (which show
127.0.0.1:<port>:<port>) so readers understand the apparent discrepancy and that
host ports may actually be loopback-only.
- Around line 110-170: The remediation guidance is stale because redis-data,
postgres-data and minio_data were converted from local bind mounts to
Docker-managed volumes in this PR; update
reports/docker-environment-status-report.md to either regenerate the report
after this PR or add a short “Superseded by PR `#84`” note clarifying that the
CRITICAL advice to create H:\HyperStation zone\HyperCode\volumes\* no longer
applies to redis-data, postgres-data and minio_data, while keeping the remaining
bind-mounts (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data,
alertmanager-data, chroma_data, agent_memory) and HC_DATA_ROOT remediation
intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 127f6b26-cf1b-43be-9364-8c492c0276f9

📥 Commits

Reviewing files that changed from the base of the PR and between 30ecc9d and 47341a3.

📒 Files selected for processing (2)
  • docker-compose.yml
  • reports/docker-environment-status-report.md

Comment thread docker-compose.yml
environment:
POSTGRES_USER: ${POSTGRES_USER:-postgres}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode}

⚠️ Potential issue | 🟠 Major

Hardcoded hypercode default credentials are a production footgun.

POSTGRES_PASSWORD, MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) and every ${POSTGRES_PASSWORD:-hypercode} in downstream URLs (hypercode-core, celery-worker, crew-orchestrator, healer-agent, hyper-architect, hyper-worker, hyperhealth-api, hyperhealth-worker) will now silently fall back to the literal string hypercode when the env var is unset. That avoids dev startup failures as intended, but it also means a misconfigured prod/staging bring-up gets a predictable, publicly-documented password on postgres and the MinIO console — with MinIO's console and Postgres reachable from any container on data-net.

At minimum, gate the fallback on a non-production flag, or refuse to start when ENVIRONMENT!=development and the secret is missing. Consider reading the secret from the existing docker-compose.secrets.yml mechanism instead of env-var defaults.

🛡️ One possible shape
-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode}
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set (see .env.example / secrets/)}

${VAR:?message} aborts docker compose with a clear error instead of booting with a known-bad credential.

Also applies to: 281-286, 497-498, 1126-1126, 1793-1793
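
One environment-gated shape, sketched below under the assumption that a development-only `docker-compose.override.yml` (auto-loaded by Compose locally, never shipped to prod) is acceptable — file layout and service names are illustrative:

```yaml
# docker-compose.yml — fail fast everywhere by default:
# services:
#   postgres:
#     environment:
#       POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set (see .env.example / secrets/)}

# docker-compose.override.yml — development-only fallback, excluded from prod deploys
services:
  postgres:
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode}
```

With this split, a prod bring-up that forgets the secret aborts with a clear error, while local `docker compose up` keeps the convenient default.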

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 130, Replace unsafe hardcoded defaults like
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode} and similar occurrences for
MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, DB_PASSWORD (broski-bot) across services
(hypercode-core, celery-worker, crew-orchestrator, healer-agent,
hyper-architect, hyper-worker, hyperhealth-api, hyperhealth-worker) with a
fail-fast or environment-gated approach: either require the variable with the
Docker Compose parameter expansion form ${VAR:?error message} so compose aborts
when missing, or conditionally allow a development-only fallback when
ENVIRONMENT==development and refuse to start (or read from
docker-compose.secrets.yml) when ENVIRONMENT!=development; update the referenced
env uses in docker-compose.yml accordingly so production/staging never silently
get the literal "hypercode" password.

Comment thread docker-compose.yml
- PETS_OLLAMA_MODEL=${PETS_OLLAMA_MODEL:-qwen2.5:7b}
- WORKSPACE_PATH=/workspace
- SQUAD_JSON_PATH=/workspace/squad.json
- MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}

⚠️ Potential issue | 🟡 Minor

Default scheme mismatch: compose uses tcp://, .env.example uses http://.

MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} works because agents/broski-pets-bridge/main.py::_mcp_base_url rewrites tcp:// to http://, but the PR's .env.example default is http://mcp-gateway:8099. Align the two so operators copying .env.example don't end up with a different effective URL than the unset-env default, and so anything else that reads this var (e.g. future consumers without the rewrite shim) sees a valid HTTP URL.

♻️ Proposed tweak
-      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099}
+      - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` at line 959, Update the MCP_GATEWAY_URL default in the
docker-compose.yml to use the HTTP scheme so it matches .env.example and any
consumers that expect an HTTP URL: change
MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} to use http:// (e.g.
MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-http://mcp-gateway:8099}); reference the
MCP_GATEWAY_URL variable and the
agents/broski-pets-bridge/main.py::_mcp_base_url rewrite when making the change
so the code no longer relies on a tcp:// → http:// shim.

Comment thread docker-compose.yml
Comment on lines +974 to +975
mcp-gateway:
condition: service_started

⚠️ Potential issue | 🟡 Minor

mcp-gateway has no healthcheck, so service_started is effectively a no-op gate.

broski-pets-bridge uses depends_on: mcp-gateway: service_started, which only waits for the container process to spawn — not for the MCP streaming transport on :8099 to be accepting requests. The bridge's own healthcheck (curl /health) will then run MCP calls (see _mcp_jsonrpc_call) that may fail during the gateway's warm-up, leading to avoidable mcp_connected=false in the first few health cycles.

Adding even a minimal TCP-probe healthcheck to mcp-gateway and switching the dependent to service_healthy would make startup ordering deterministic.
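
A minimal stanza of the kind suggested above might look like this. The python3 TCP probe is an assumption about what is available in the gateway image; substitute `curl -f http://localhost:8099/health` or `wget` if the gateway ships those and exposes an HTTP health route:

```yaml
services:
  mcp-gateway:
    healthcheck:
      # TCP probe against the MCP streaming port
      test: ["CMD-SHELL", "python3 -c \"import socket; socket.create_connection(('127.0.0.1', 8099), 2)\""]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s

  broski-pets-bridge:
    depends_on:
      mcp-gateway:
        condition: service_healthy
```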

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.yml` around lines 974 - 975, Add a healthcheck to the
mcp-gateway service that verifies the MCP streaming transport is accepting
connections (e.g., a TCP probe to :8099 or an HTTP GET /health) and change
broski-pets-bridge's depends_on from service_started to service_healthy so
Docker waits for mcp-gateway readiness; update the docker-compose mcp-gateway
service block to include a minimal healthcheck stanza and adjust
broski-pets-bridge depends_on accordingly so the bridge's _mcp_jsonrpc_call
won't run before the gateway is actually ready.

Comment on lines +110 to +170
## Volume Mount Integrity

### CRITICAL

Multiple Docker volumes are configured as local-driver bind mounts to:
- H:/HyperStation zone/HyperCode/volumes/<service>

On this Windows filesystem, H:\HyperStation zone\HyperCode\volumes does not exist.

Observed symptoms consistent with missing/invalid bind targets:
- redis log indicates failure saving RDB temp file under /data
- minio reports “no online disks found” and quorum failures

## Port Exposure Validation

### HIGH (Public binds on 0.0.0.0)

The following running containers publish ports on all interfaces:
- agent-x: 8080/tcp (0.0.0.0 and [::])
- hyper-architect: 8091/tcp (0.0.0.0 and [::])
- hyper-observer: 8092/tcp (0.0.0.0 and [::])
- hyper-worker: 8093/tcp (0.0.0.0 and [::])

Everything else observed is localhost-bound (127.0.0.1) or internal-only.

## Security Vulnerabilities (Images)

### HIGH (Security visibility gap)

The always-on Trivy scanner container is running, but it is not producing the expected output file:
- Expected: reports/security/trivy-hypercode-core.json
- Current state: missing
- Scanner log shows: context deadline exceeded during analysis of hypercode-core:latest

Impact:
- Current HIGH/CRITICAL vulnerability counts for hypercode-core cannot be confirmed from this environment snapshot.

## Environment Variable Consistency

Compose evaluation reports unset variables (defaulting to blank string):
- API_KEY
- DISCORD_TOKEN
- HYPERCODE_JWT_SECRET
- HYPERCODE_MEMORY_KEY
- MINIO_ROOT_USER
- MINIO_ROOT_PASSWORD

Observed runtime impact:
- broski-bot fails because DB password resolves to blank.

## Remediation Recommendations

### CRITICAL (Do first)

- Fix persistent volume bind targets:
- Create the missing host directory tree under H:\HyperStation zone\HyperCode\volumes\*
- Or update HC_DATA_ROOT so it points to an existing, Docker-shared path and recreate the volumes
- Restore data services:
- Bring up postgres + redis and confirm they become healthy before starting dependent services
- Fix required secrets/env:
- Ensure POSTGRES_PASSWORD (or DB_PASSWORD) is set so broski-bot can initialize its DB

⚠️ Potential issue | 🟡 Minor

Remediation section is already partly stale relative to this PR's compose changes.

The "Volume Mount Integrity / CRITICAL" finding and the first "Remediation Recommendations → CRITICAL" bullet both tell operators to “Create the missing host directory tree under H:\HyperStation zone\HyperCode\volumes\*” for postgres, redis and minio. In this same PR the redis-data, postgres-data and minio_data volumes were switched from local-driver bind mounts to Docker-managed volumes, so those three specific symptoms (redis RDB write failures under /data, minio "no online disks", postgres client-disconnect storms tied to the bind path) are no longer reproducible from a fresh docker compose up.

Either regenerate the report after this PR lands, or add a short "Superseded by PR #84" note so readers don't waste time creating Windows directories that the compose file no longer references. The observability-stack volumes (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data, alertmanager-data, chroma_data, agent_memory) are still bind-mounted to ${HC_DATA_ROOT}, so that subset of the finding is still accurate and worth keeping.

🧰 Tools
🪛 LanguageTool

[grammar] ~117-~117: The singular determiner ‘this’ may not agree with the plural noun ‘windows’. Did you mean “these”?
Context: ...on zone/HyperCode/volumes/ On this Windows filesystem, H:\HyperStation zon...

(THIS_NNS)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@reports/docker-environment-status-report.md` around lines 110 - 170, The
remediation guidance is stale because redis-data, postgres-data and minio_data
were converted from local bind mounts to Docker-managed volumes in this PR;
update reports/docker-environment-status-report.md to either regenerate the
report after this PR or add a short “Superseded by PR `#84`” note clarifying that
the CRITICAL advice to create H:\HyperStation zone\HyperCode\volumes\* no longer
applies to redis-data, postgres-data and minio_data, while keeping the remaining
bind-mounts (grafana-data, prometheus-data, ollama-data, tempo-data, loki-data,
alertmanager-data, chroma_data, agent_memory) and HC_DATA_ROOT remediation
intact.

Comment on lines +125 to +133
### HIGH (Public binds on 0.0.0.0)

The following running containers publish ports on all interfaces:
- agent-x: 8080/tcp (0.0.0.0 and [::])
- hyper-architect: 8091/tcp (0.0.0.0 and [::])
- hyper-observer: 8092/tcp (0.0.0.0 and [::])
- hyper-worker: 8093/tcp (0.0.0.0 and [::])

Everything else observed is localhost-bound (127.0.0.1) or internal-only.

⚠️ Potential issue | 🟡 Minor

Port exposure finding is incomplete — include protocol/interface assumptions or cross-reference the compose bindings.

The four 0.0.0.0 publishes listed (agent-x:8080, hyper-architect:8091, hyper-observer:8092, hyper-worker:8093) match the hyper profile services in docker-compose.yml, all of which declare 127.0.0.1:<port>:<port>. On Docker Desktop / WSL2 the 127.0.0.1: prefix is frequently reported as 0.0.0.0 inside docker ps because of the vpnkit proxy, even though the host port is in fact loopback-only. Without calling this out, the HIGH rating will confuse anyone who greps the compose file and sees the 127.0.0.1: prefix and assumes the report is wrong. A single-sentence note on the observation methodology (e.g. netstat -an on the Windows host vs docker ps) would disambiguate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@reports/docker-environment-status-report.md` around lines 125 - 133, The
report flags four services (agent-x, hyper-architect, hyper-observer,
hyper-worker) as publishing on 0.0.0.0 but omits the compose binding context;
update the HIGH finding to add a single clarifying sentence that explains the
observation methodology (e.g., that the port list came from docker ps / host
netstat behavior on Docker Desktop/WSL2 where vpnkit often shows 0.0.0.0) and
cross-reference docker-compose.yml bindings (which show 127.0.0.1:<port>:<port>)
so readers understand the apparent discrepancy and that host ports may actually
be loopback-only.

…M crashes

- Add pre-build-check.sh script to verify disk space and memory before builds
- Integrate safety check into Makefile build process
- Apply memory limits to all services in docker-compose.yml to prevent cascade failures
- Document OOM recovery procedure and prevention measures in runbook
- Update project status files to reflect new safety measures
- Add `env_ignore_empty=True` to Pydantic settings to ignore empty environment variables
- Use Header alias "Stripe-Signature" for webhook endpoint to match Stripe's header naming
- Validate Stripe-Signature header presence in production environment
- Skip signature verification only when both webhook secret is missing AND environment is not production

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/RUNBOOK.md (1)

313-316: ⚠️ Potential issue | 🟡 Minor

Orphan table row misplaced inside the Alpha Routing section.

Line 316 contains a single table row for pgAdmin dangling immediately after the rollback validation bullets. It is clearly a fragment of the Service Access Points table (lines 241–252) that was accidentally pushed out during an edit — it will render as a broken one-row table in the middle of the Alpha Routing docs.

📝 Proposed fix — move the row back into the services table

At lines 241–252, add it alongside the other services:

 | MinIO | http://127.0.0.1:9000 | Object storage |
+| pgAdmin | http://127.0.0.1:59080 | DB admin UI |

And remove the orphan at line 316:

 Validation after rollback:
 - no OpenRouter requests occur
 - local Ollama and existing cloud fallback behavior remain unchanged
-| pgAdmin | http://127.0.0.1:59080 | DB admin UI |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/RUNBOOK.md` around lines 313 - 316, The markdown has an orphan table row
"pgAdmin | http://127.0.0.1:59080 | DB admin UI" placed immediately after the
"Validation after rollback:" bullet list inside the Alpha Routing section; move
that single row back into the Service Access Points table (the table that lists
services alongside the other rows) so it becomes a proper table row with the
other service entries, and remove the orphan row from beneath the rollback
validation bullets to prevent a broken one-row table in the docs.
WHATS_DONE.md (1)

20-29: ⚠️ Potential issue | 🟡 Minor

Minor: container count inconsistency between bullets.

Line 20 still claims 29/29 Docker containers — all healthy, while the new April‑17 bullet on line 29 reports recovery to 24/24 containers. Worth reconciling so the "current state" bullet at line 20 is updated to reflect the post-OOM baseline (or the April‑17 entry is reworded to make clear it's a post-incident snapshot) — otherwise future readers won't know which number is authoritative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@WHATS_DONE.md` around lines 20 - 29, There's a mismatch between the top
"current state" bullet ("29/29 Docker containers — all healthy") and the
April‑17 incident bullet ("24/24 containers restored"); update the document so
the authoritative current-state bullet and the April‑17 snapshot are consistent:
either change the line that reads "29/29 Docker containers — all healthy" to
"24/24 Docker containers — all healthy (post‑OOM baseline)" or reword the
April‑17 line to explicitly mark it as an incident snapshot (e.g.,
"post‑incident: 24/24 restored on April 17"), ensuring both bullets reference
the same canonical container count.
🧹 Nitpick comments (1)
scripts/pre-build-check.sh (1)

8-8: Consider set -euo pipefail for stricter failure semantics.

set -e alone will not catch unset variables or broken pipes in subshells (e.g. the docker images -f "dangling=true" -q | wc -l pipeline on line 69). For a safety guard script, set -euo pipefail is the idiomatic choice and would surface unexpected environment issues instead of silently producing zero counts.

-set -e
+set -euo pipefail

Purely optional — the current checks do defensive fallbacks already.
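
To see the difference concretely, here is a small sketch using throwaway bash subshells (nothing project-specific is assumed):

```shell
# Plain `set -e` only checks a pipeline's last command, so a failing
# left-hand side is masked and execution continues:
out_plain=$(bash -c 'set -e; false | wc -l; echo survived')

# `pipefail` makes the pipeline's status reflect the failing `false`,
# and `set -e` then aborts before the final echo:
out_strict=$(bash -c 'set -euo pipefail; false | wc -l; echo survived' || echo aborted)

echo "plain:  $out_plain"
echo "strict: $out_strict"
```

The first subshell prints "survived"; the second aborts mid-script, which is the behavior you want from a pre-build guard.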

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/pre-build-check.sh` at line 8, Replace the current loose failure mode
by changing the shell options: update the script's "set -e" to "set -euo
pipefail" to catch unset variables and pipeline failures (this will affect
constructs like the pipeline "docker images -f \"dangling=true\" -q | wc -l");
ensure any intentionally optional variables are given safe defaults or are
referenced with ${VAR:-} so the new -u flag does not cause false failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/routes/stripe.py`:
- Around line 111-122: The webhook handler currently allows skipping signature
verification when STRIPE_WEBHOOK_SECRET is missing even in production; update
the logic in the handler (the block that sets is_production and inspects
webhook_secret and stripe_signature in backend/app/routes/stripe.py) so that if
webhook_secret is missing and is_production is True you log an error and
return/raise an HTTPException (fail closed), otherwise if webhook_secret is
missing and not production you emit a clear dev-mode warning and skip
verification; also separate the case where stripe_signature is missing in dev to
emit a distinct warning and only skip in non-production. Additionally, add a
startup validation in validate_security() (backend/app/core/config.py) to check
STRIPE_WEBHOOK_SECRET and raise/log an error when running in production so the
misconfiguration is detected at boot.

In `@scripts/pre-build-check.sh`:
- Around line 47-58: The memory check can silently skip when awk doesn't find
"MemAvailable" and FREE_MEM_MB becomes empty; update the logic around
FREE_MEM_MB so you validate its value before the numeric comparison: after
reading /proc/meminfo (using the existing FREE_MEM_MB assignment), check if
FREE_MEM_MB is empty or non-numeric (e.g. with a regex or POSIX arithmetic
test), and if so set FREE_MEM_MB=0 and emit a warning message that
"MemAvailable" was not found in /proc/meminfo and the check is falling back to
0; then keep the existing comparison against MIN_MEM_MB and the same exit
behavior. Reference symbols: FREE_MEM_MB, MIN_MEM_MB, the /proc/meminfo read and
the if [ "$FREE_MEM_MB" -gt 0 ] guard.

---

Outside diff comments:
In `@docs/RUNBOOK.md`:
- Around line 313-316: The markdown has an orphan table row "pgAdmin |
http://127.0.0.1:59080 | DB admin UI" placed immediately after the "Validation
after rollback:" bullet list inside the Alpha Routing section; move that single
row back into the Service Access Points table (the table that lists services
alongside the other rows) so it becomes a proper table row with the other
service entries, and remove the orphan row from beneath the rollback validation
bullets to prevent a broken one-row table in the docs.

In `@WHATS_DONE.md`:
- Around line 20-29: There's a mismatch between the top "current state" bullet
("29/29 Docker containers — all healthy") and the April‑17 incident bullet
("24/24 containers restored"); update the document so the authoritative
current-state bullet and the April‑17 snapshot are consistent: either change the
line that reads "29/29 Docker containers — all healthy" to "24/24 Docker
containers — all healthy (post‑OOM baseline)" or reword the April‑17 line to
explicitly mark it as an incident snapshot (e.g., "post‑incident: 24/24 restored
on April 17"), ensuring both bullets reference the same canonical container
count.

---

Nitpick comments:
In `@scripts/pre-build-check.sh`:
- Line 8: Replace the current loose failure mode by changing the shell options:
update the script's "set -e" to "set -euo pipefail" to catch unset variables and
pipeline failures (this will affect constructs like the pipeline "docker images
-f \"dangling=true\" -q | wc -l"); ensure any intentionally optional variables
are given safe defaults or are referenced with ${VAR:-} so the new -u flag does
not cause false failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5e3adc2-1d88-48dc-9692-185f8cb583b1

📥 Commits

Reviewing files that changed from the base of the PR and between 47341a3 and b0c11f3.

📒 Files selected for processing (8)
  • CLAUDE.md
  • Makefile
  • WHATS_DONE.md
  • backend/app/core/config.py
  • backend/app/routes/stripe.py
  • docker-compose.yml
  • docs/RUNBOOK.md
  • scripts/pre-build-check.sh
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • Makefile

Comment on lines +111 to +122
is_production = settings.ENVIRONMENT.lower() == "production"

if (not webhook_secret) or ((not is_production) and (not stripe_signature)):
logger.warning("STRIPE_WEBHOOK_SECRET not set — skipping signature check (dev mode)")
try:
import json
event = stripe.Event.construct_from(json.loads(payload), stripe.api_key)
except Exception as e:
raise HTTPException(status_code=400, detail=f"Invalid payload: {e}")
else:
if not stripe_signature:
raise HTTPException(status_code=400, detail="Missing Stripe-Signature header")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm there's no other guard refusing to boot the app in production without STRIPE_WEBHOOK_SECRET
rg -nP -C2 'STRIPE_WEBHOOK_SECRET' --type=py
ast-grep --pattern $'def validate_security($_) -> None:
  $$$'

Repository: welshDog/HyperCode-V2.4

Length of output: 2022


🏁 Script executed:

#!/bin/bash
# Check if validate_security() is called at app startup
rg -n 'validate_security\(\)' --type=py

# Find the main/startup entry point
fd -name 'main.py\|__main__.py\|app.py' --type f | head -5

# Check the stripe webhook endpoint definition for decorators
rg -nB5 'async def webhook' backend/app/routes/stripe.py | head -30

Repository: welshDog/HyperCode-V2.4

Length of output: 441


🏁 Script executed:

#!/bin/bash
# Search for rate limiting decorators/imports in stripe.py
rg -n '@.*rate\|from.*rate\|import.*rate' backend/app/routes/stripe.py

# Check the entire route/endpoint structure
head -120 backend/app/routes/stripe.py

Repository: welshDog/HyperCode-V2.4

Length of output: 4732


Production webhook endpoint silently skips signature verification when STRIPE_WEBHOOK_SECRET is unset.

The current condition at line 113 is:

if (not webhook_secret) or ((not is_production) and (not stripe_signature)):
    # skip verification

This allows unsigned webhooks in production when STRIPE_WEBHOOK_SECRET is missing or empty—a plausible misconfiguration scenario. The logic should gate missing-secret failures exclusively to non-production environments. Currently:

  • Production + secret missing → skips verification (vulnerability ✗)
  • Production + secret set + signature missing → rejects (correct)
  • Non-production + secret missing → skips verification (acceptable for development)

Additionally, validate_security() at startup in backend/app/core/config.py does not check for STRIPE_WEBHOOK_SECRET, so this misconfiguration only surfaces when the first webhook arrives, not at boot time.

Secondary issue: the warning message on line 114 is emitted for both "secret missing (dev mode)" and "secret set but signature missing in dev" cases, which will confuse log triage.

Fix: Fail closed in production when the secret is missing, and add STRIPE_WEBHOOK_SECRET validation to the startup validate_security() check.

Proposed code structure
is_production = settings.ENVIRONMENT.lower() == "production"

if not webhook_secret:
    if is_production:
        logger.error("STRIPE_WEBHOOK_SECRET missing in production — refusing to process webhook")
        raise HTTPException(status_code=500, detail="Webhook not configured")
    logger.warning("STRIPE_WEBHOOK_SECRET not set — skipping signature check (dev mode)")
    skip_verification = True
elif (not is_production) and (not stripe_signature):
    logger.warning("Stripe-Signature header missing in dev — skipping signature check")
    skip_verification = True
else:
    skip_verification = False

if skip_verification:
    # Construct event without verification (dev mode only)
    event = stripe.Event.construct_from(json.loads(payload), stripe.api_key)
else:
    if not stripe_signature:
        raise HTTPException(status_code=400, detail="Missing Stripe-Signature header")
    # Verify signature
    event = stripe.Webhook.construct_event(payload, stripe_signature, webhook_secret)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/routes/stripe.py` around lines 111 - 122, The webhook handler
currently allows skipping signature verification when STRIPE_WEBHOOK_SECRET is
missing even in production; update the logic in the handler (the block that sets
is_production and inspects webhook_secret and stripe_signature in
backend/app/routes/stripe.py) so that if webhook_secret is missing and
is_production is True you log an error and return/raise an HTTPException (fail
closed), otherwise if webhook_secret is missing and not production you emit a
clear dev-mode warning and skip verification; also separate the case where
stripe_signature is missing in dev to emit a distinct warning and only skip in
non-production. Additionally, add a startup validation in validate_security()
(backend/app/core/config.py) to check STRIPE_WEBHOOK_SECRET and raise/log an
error when running in production so the misconfiguration is detected at boot.

Comment on lines +47 to +58
FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0")

echo ""
if [ "$FREE_MEM_MB" -gt 0 ]; then
echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
exit 1
else
echo -e "${GREEN}✅ Memory OK${NC}"
fi
fi

⚠️ Potential issue | 🟡 Minor

Memory check silently skipped when /proc/meminfo lacks MemAvailable or awk yields empty.

If /proc/meminfo exists but awk doesn't match MemAvailable (older kernels, non-standard layouts, or a WSL VM edge case), FREE_MEM_MB becomes an empty string, the fallback || echo "0" only kicks in when awk exits non-zero, and the if [ "$FREE_MEM_MB" -gt 0 ] guard fails silently — the entire memory section is skipped without any warning, defeating the purpose of the guard.
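
A quick way to reproduce the failure mode in isolation, using /dev/null as a stand-in for a meminfo file with no MemAvailable line:

```shell
# awk exits 0 even when the pattern never matches, so the `|| echo "0"`
# fallback never fires and the variable ends up empty, not "0":
FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /dev/null 2>/dev/null || echo "0")
echo "raw value: [${FREE_MEM_MB}]"

# A parameter-expansion default closes the gap:
FREE_MEM_MB=${FREE_MEM_MB:-0}
echo "defaulted: [${FREE_MEM_MB}]"
```

The first echo prints empty brackets; only the `${VAR:-0}` expansion guarantees a numeric value for the comparison that follows.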

🛡️ Small hardening
-FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0")
+FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null)
+FREE_MEM_MB=${FREE_MEM_MB:-0}
 
 echo ""
-if [ "$FREE_MEM_MB" -gt 0 ]; then
+if [ "$FREE_MEM_MB" -gt 0 ]; then
   echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}"
   if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then
     echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM free (${FREE_MEM_MB}MB available)${NC}"
     exit 1
   else
     echo -e "${GREEN}✅ Memory OK${NC}"
   fi
+else
+  echo -e "${YELLOW}⚠️  Memory check skipped (could not read MemAvailable from /proc/meminfo)${NC}"
 fi

Also worth noting in a comment: on Docker Desktop for Windows/macOS, /proc/meminfo reflects the Linux VM's memory, not the host's, so the threshold should be tuned accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `scripts/pre-build-check.sh` around lines 47-58, the memory check can
silently skip when awk doesn't find "MemAvailable" and FREE_MEM_MB becomes
empty; update the logic around FREE_MEM_MB so you validate its value before the
numeric comparison: after reading /proc/meminfo (using the existing FREE_MEM_MB
assignment), check if FREE_MEM_MB is empty or non-numeric (e.g. with a regex or
POSIX arithmetic test), and if so set FREE_MEM_MB=0 and emit a warning message
that "MemAvailable" was not found in /proc/meminfo and the check is falling back
to 0; then keep the existing comparison against MIN_MEM_MB and the same exit
behavior. Reference symbols: FREE_MEM_MB, MIN_MEM_MB, the /proc/meminfo read and
the if [ "$FREE_MEM_MB" -gt 0 ] guard.

welshDog and others added 2 commits April 17, 2026 21:25
…, no cold-load on repeat requests

GPU profile (hypercode-ollama-gpu) already in compose — activate with --profile gpu when NVIDIA adapter available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
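
The --profile gpu activation mentioned above can be sketched in compose roughly as follows. This is a sketch only: the service name comes from the commit message, the image and environment values are assumptions, and the device block follows Compose's standard NVIDIA reservation syntax — the repo's actual file may differ.

```yaml
services:
  hypercode-ollama-gpu:
    image: ollama/ollama        # assumed image
    profiles: ["gpu"]           # only starts with: docker compose --profile gpu up -d
    environment:
      OLLAMA_KEEP_ALIVE: 24h    # keep the model hot between requests
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```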
…NER, 26 GitHub tools now live

Removing DOCKER_MCP_IN_CONTAINER=1 stops the gateway trying to resolve secrets
via the Docker Desktop gRPC resolver (which times out in containers on Windows WSL2).
Gateway now uses GITHUB_PERSONAL_ACCESS_TOKEN env var directly.

Result: mcp_connected=true in pets-bridge health, 26 GitHub MCP tools available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@welshDog

✅ Session update — Moves 1, 2, 3 complete

New commits pushed to this branch:

  • feat(ollama) — OLLAMA_KEEP_ALIVE=24h + OLLAMA_NUM_PARALLEL=2 + OLLAMA_MAX_LOADED_MODELS=2. Model stays hot in RAM. GPU profile already in compose for NVIDIA activation.
  • feat(pets) — Fixed DOCKER_MCP_IN_CONTAINER=1 flag causing gRPC secrets resolver timeout on WSL2. Removed it → 26 GitHub MCP tools live. mcp_connected: true in health.
  • Leaderboard deployed via image rebuild — /leaderboard returning ranked pets ✅

Verified:

  • curl http://127.0.0.1:8098/health → mcp_connected: true
  • curl http://127.0.0.1:8098/leaderboard → ranked pets JSON
  • POST /pet/{id}/chat mode=ask → Ollama reply + GitHub context ✅
  • All 24 containers healthy
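
The /health verification above can be scripted for CI. A sketch — the endpoint and the mcp_connected field come from this PR's verification notes, while the sample payload is a placeholder for the real response:

```shell
#!/bin/sh
# Fail fast when the pets-bridge reports MCP as disconnected.
# In CI, replace the sample with: health_json=$(curl -fsS http://127.0.0.1:8098/health)
health_json='{"status":"ok","mcp_connected":true}'

if printf '%s' "$health_json" | grep -Eq '"mcp_connected"[[:space:]]*:[[:space:]]*true'; then
  echo "MCP gateway connected"
else
  echo "MCP gateway NOT connected" >&2
  exit 1
fi
```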

🤖 Generated with Claude Code

welshDog and others added 2 commits April 17, 2026 22:04
Add nvidia-smi and Docker GPU test commands to help diagnose GPU availability issues in development environment.
@welshDog welshDog merged commit 5f81ae6 into main Apr 17, 2026
8 of 21 checks passed
