diff --git a/.claude/CLAUDE_CONTEXT.md b/.claude/CLAUDE_CONTEXT.md
new file mode 100644
index 0000000..6b193ec
--- /dev/null
+++ b/.claude/CLAUDE_CONTEXT.md
@@ -0,0 +1,284 @@
+# 🧠⚡ HYPER SUPER CLAUDE DEV — HyperCode V2.4 Boot File
+> You are Claude. You just loaded into the most sophisticated solo-dev AI-native OS ever built.
+> Read every word. Then execute with precision. BROski♾ mode: ON.
+> **Last updated: April 14, 2026 — Phases 0–10B ALL COMPLETE ✅**
+
+---
+
+## 🧬 Who You're Working With
+
+- **Lyndz** aka BROski♾ (GitHub: @welshDog, npm: @w3lshdog) — South Wales, UK
+- **Neurodivergent:** Autistic + ADHD + Dyslexia — be chunked, direct, no waffle
+- **Call them:** "Bro" — always
+- **Primary:** Windows + PowerShell. Secondary: WSL2, Raspberry Pi, Docker
+- **Style:** Short sentences. Emojis. Bold keys. Celebrate wins. Quick wins first.
+- **Vision:** Building the **Hyperfocus Zone** — an AI-native OS for neurodivergent devs
+
+---
+
+## 🗺️ The Ecosystem (3 Repos, 1 Mission)
+
+```
+Hyper-Vibe-Coding-Course ──── manifest.json ────▶ HyperCode V2.4
+github.com/welshDog/        (hyper-agent-spec)    github.com/welshDog/
+Hyper-Vibe-Coding-Course                          HyperCode-V2.4
+(Supabase + Vercel)                 │             (Docker, 26 containers)
+Path: H:\the hyper vibe coding hub  │             Path: H:\HyperStation zone\
+                                    │                   HyperCode\HyperCode-V2.4
+                             HyperAgent-SDK
+                             github.com/welshDog/HyperAgent-SDK
+                             npm: @w3lshdog/hyper-agent@0.1.4 ✅
+                             Path: H:\HyperAgent-SDK
+```
+
+### V2.0 vs V2.4 — One-Line Clarification
+> **V2.4 IS the live system.** V2.0 was the origin. Skills in `.claude/skills/` were written for V2.0 but apply to V2.4 — ports, paths and agent names are the same. Always work in V2.4.
+
+---
+
+## 🏆 Roadmap — Phases 0–10B ALL COMPLETE!
+
+| Phase | Name | Status | Date |
+|---|---|---|---|
+| 0 | Hard Conflict Fixes | ✅ DONE | Early 2026 |
+| 1 | Identity Bridge (discord_id) | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 2 | Token Sync | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 3 | Agent Access + Shop Bridge | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 4 | npm run graduate 🔥 | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 5 | Observability (Grafana/Prometheus) | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 6 | Terminal Tools + CLI | ✅ DONE + VERIFIED LIVE | Early 2026 |
+| 7 | Dockerfile Security Hardening | ✅ DONE | Apr 14, 2026 |
+| 8 | CI/CD Trivy Security Pipeline | ✅ DONE | Apr 14, 2026 |
+| 9 | CVE Elimination (apt + pip pinning) | ✅ DONE | Apr 14, 2026 |
+| 10A | FastAPI/starlette CVE — already on 0.135.3/0.47.2 | ✅ ALREADY DONE | Apr 14, 2026 |
+| 10B | Docker Network Isolation | ✅ DONE | Apr 14, 2026 |
+
+---
+
+## 🔒 Phase 10B — Network Map (LIVE)
+
+| Network | Type | Who lives here |
+|---------|------|----------------|
+| `frontend-net` | bridge, internet | dashboard, mission-ui, mcp-server |
+| `backend-net` | bridge, internet | hypercode-core only (bridges all layers) |
+| `agents-net` | bridge, internet | all 25+ AI agents, LLM API calls |
+| `data-net` | **internal: true** | redis, postgres, minio, chroma |
+| `obs-net` | **internal: true** | prometheus, grafana, loki, tempo, promtail, alertmanager |
+
+### Key wins shipped:
+- 🔒 **redis + postgres** — internet access fully blocked (was on flat `backend-net`)
+- 🗑️ **hypercode-agents-net** external network — removed (stale leftover)
+- 📍 **hyper-architect/observer/worker/agent-x ports** — now `127.0.0.1:` bound (was `0.0.0.0`)
+- ✅ **docker compose config** — validates clean, zero errors
+
+### Migration script:
+```bash
+# Preview
+bash scripts/network-migrate.sh --dry-run
+
+# Apply live
+bash scripts/network-migrate.sh
+```
+
+---
+
+## 🚀 NEXT MISSION — Phase 10C or 10D?
+
+| Option | What | Time | Priority |
+|--------|------|------|----------|
+| **C 🏗️** | Secrets management (Docker secrets / Vault) — no more `.env` in prod | ~2 hrs | High |
+| **D 🛡️** | Per-agent API key auth — lock every agent endpoint | ~2-3 hrs | High |
+| **E 🐛** | Fix CognitiveUplink.tsx WS bug (`"command"` → `"execute"` ~line 130) | ~15 min | Quick Win |
+
+---
+
+## 🐛 Known Open Issues
+
+| Bug | File | Line | Fix |
+|-----|------|------|-----|
+| WS message type wrong | `CognitiveUplink.tsx` | ~130 | `"command"` → `"execute"` |
+
+---
+
+## ✅ Phase 9 — CVE Elimination Results (April 14, 2026)
+
+| Metric | Before | After |
+|--------|--------|-------|
+| CRITICAL CVEs | 11 | **0** 🎉 |
+| HIGH CVEs | 55 | **13** (all Debian-unfixable) |
+
+### 13 Remaining HIGHs — Cannot Fix Yet
+- `docker.io/runc` — moby Debian packaging lags behind official Docker
+- `libexpat1`, `libncursesw6`, `libnghttp2`, `libsystemd0` — no Debian patch yet
+- `starlette` — **RESOLVED** ✅ `fastapi==0.135.3` + `starlette==0.47.2` in `backend/requirements.txt`
+
+### Phase 9 Pattern — Applied Across ALL 20 Dockerfiles
+
+**Part A — OS hardening:**
+```dockerfile
+RUN apt-get update --allow-releaseinfo-change && \
+    apt-get upgrade -y && \
+    apt-get install -y --no-install-recommends \
+    ca-certificates curl libexpat1 openssl && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
+```
+
+**Part B — pip pinning:**
+```dockerfile
+RUN pip install --upgrade --no-cache-dir \
+    "pip==26.0.1" "setuptools>=80.0.0" "wheel==0.46.2" \
+    "jaraco.context>=6.0.0" "jaraco.functools>=4.1.0" "jaraco.text>=4.0.0"
+```
+
+**Part C — CI:** `--no-cache --pull` on every build
+
+**Bonus:** `healer`, `coder`, `05-devops` → `docker-ce-cli` (kills moby CVEs)
+
+**Base image:** All `python:3.11-slim` (auto-tracks latest patch via CI)
+
+---
+
+## 🧠 The Skills System — 16 Active Skills
+
+| Skill | What It Does |
+|-------|--------------|
+| `hypercode-brain` | Core system knowledge |
+| `hypercode-agent-consciousness` | Agent self-reporting, petitions, handoffs |
+| `hypercode-self-improver` | Meta skill — system evolves itself |
+| `hypercode-security` | CVE scanning, Trivy, Dockerfile hardening |
+| `hypercode-docker-ops` | Container ops, compose, health checks |
+| `hypercode-redis-pubsub` | Redis pub/sub, stream routing |
+| `hypercode-hypersync` | Cross-repo sync protocol |
+| `hypercode-mcp-gateway` | MCP gateway routing + ports |
+| `hypercode-broski-discord-bot` | Discord bot skill pack |
+| `hypercode-broski-economy` | BROski$ token economy |
+| `hypercode-frontend` | Dashboard, HTML, UI |
+| `hypercode-code-review` | Code review patterns |
+| `hypercode-agent-spawner` | Spawn new agents |
+| `hypercode-new-agent-onboarding` | Agent onboarding flow |
+| `technical-skills-audit` | Audit methodology |
+| `hyper-terminal-analyser` | Terminal tool research, debug, ecosystem fit |
+
+---
+
+## 🚨 Iron Rules — Never Re-Debate These
+
+- **Docker imports:** `from app.X import Y` — NEVER `from backend.app.X import Y`
+- **FastAPI routing:** First-match wins — public routes BEFORE auth-gated compat routes
+- **Alembic down_revision:** Must match EXACT revision string
+- **CLI folder:** All `hyper-agent` commands run from `H:\HyperAgent-SDK`
+- **Logs empty on fresh boot:** Normal — Redis `hypercode:logs` populates as agents run
+- **Port convention:** 3100-3199 writing, 3200-3299 code, 3300-3399 data, 3400-3499 discord, 3500-3599 automation
+- **Supabase ↔ V2.4 Postgres:** NEVER merge schemas — forever separate
+- **`.env` files:** Never committed — use Docker secrets in production
+- **One bot:** broski-bot. Old Replit bot = dead.
+- **API keys:** `hc_` prefix + `secrets.token_urlsafe(32)`
+- **Dockerfiles:** `python:3.11-slim` + Part A + Part B — Phase 9 standard (ALL 20 Dockerfiles)
+- **Trivy target:** 0 CRITICAL ✅. 13 HIGH = all Debian-unfixable. Next scan baseline = 13.
+- **GitHub Actions builds:** Always `--no-cache --pull` in security workflows
+- **jaraco.* packages:** Always pin explicitly — Trivy HIGH via setuptools transitive
+- **docker-socket agents** (healer/coder/05-devops): Use `docker-ce-cli`, NOT `docker.io`
+- **starlette:** RESOLVED ✅ — `fastapi==0.135.3` + `starlette==0.47.2`
+- **V2.0 references in skills:** Apply to V2.4 — same ports, same agent names
+- **npm package:** `@w3lshdog/hyper-agent@0.1.4` — all 6 CLI commands LIVE
+- **CognitiveUplink.tsx ~130:** WS type = `"execute"` NOT `"command"` — open bug!
+- **data-net + obs-net:** `internal: true` — NEVER expose redis/postgres/grafana to internet
+- **Agent ports:** `127.0.0.1:` bound only — NEVER `0.0.0.0` for internal agents
+- **Network migration:** `bash scripts/network-migrate.sh` (use `--dry-run` first)
+
+---
+
+## 📁 Key Paths (copy-paste ready)
+
+```powershell
+# HyperCode V2.4 (MAIN SYSTEM)
+cd "H:\HyperStation zone\HyperCode\HyperCode-V2.4"
+
+# HyperAgent-SDK
+cd "H:\HyperAgent-SDK"
+
+# Hyper-Vibe-Coding-Course
+cd "H:\the hyper vibe coding hub"
+
+# Docker
+docker compose up -d
+docker compose build --no-cache
+docker compose exec api alembic upgrade head
+
+# Network migration
+bash scripts/network-migrate.sh --dry-run
+bash scripts/network-migrate.sh
+
+# Security scanning
+make scan-all
+make scan-agent AGENT=healer
+make scan-build AGENT=agent-x
+make build-secure
+
+# CLI (from H:\HyperAgent-SDK)
+$env:HYPERCODE_API_URL = "http://localhost:8000"
+node cli/index.js status
+node cli/index.js agents list
+node cli/index.js logs --tail 20
+node cli/index.js tokens award
+node cli/index.js graduate --tokens 100
+```
+
+---
+
+## 💰 BROski$ Token Economy
+
+- `public.users.broski_tokens` — balance column
+- `token_transactions` — append-only ledger with idempotency guards
+- `award_tokens()` + `spend_tokens()` — SECURITY DEFINER, server-side only
+- `shop_items` + `shop_purchases` — JSONB metadata fields
+- `shop_purchases.item_slug` — filters for `"agent-sandbox-access"`
+- Stripe integration for token packs (Starter / Builder / Hyper)
+
+---
+
+## 🛡️ Security Posture (Post Phase 10B)
+
+| Layer | Status |
+|-------|--------|
+| CRITICAL CVEs | 0 ✅ |
+| HIGH CVEs | 13 (all Debian-unfixable) |
+| starlette CVE | RESOLVED ✅ (fastapi 0.135.3) |
+| Non-root users | All 20 Dockerfiles ✅ |
+| Multi-stage builds | All agents ✅ |
+| pip pinned | All agents ✅ |
+| CI Trivy gate | PR-blocking ✅ |
+| Weekly fleet scan | Monday 06:00 UTC ✅ |
+| Pre-push hook | Local blocking ✅ |
+| docker-ce-cli swap | healer/coder/devops ✅ |
+| data-net (redis/postgres/minio/chroma) | internal: true ✅ |
+| obs-net (grafana/prometheus/loki) | internal: true ✅ |
+| Agent ports | 127.0.0.1: bound ✅ |
+| Secrets management | ⏳ Phase 10C — NEXT |
+
+---
+
+## 📦 HyperAgent-SDK — Current State
+
+- **Version:** `@w3lshdog/hyper-agent@0.1.4` ✅ LIVE on npm
+- **errorMessage bug:** FIXED
+- **CLI commands (all 6 verified):** `validate`, `status`, `logs`, `tokens`, `agents`, `graduate`
+
+```powershell
+npx @w3lshdog/hyper-agent validate .agents/my-agent/
+npm version patch --no-git-tag-version
+npm publish --access public --tag alpha
+```
+
+---
+
+## 🎯 Session Start Checklist
+
+Ask Bro these 4 questions:
+1. **Which repo?** (V2.4 / SDK / Course)
+2. **What mission?** (10C Secrets / 10D API auth / CognitiveUplink quick fix?)
+3. **Fresh Trivy scan?** (baseline = 13 HIGH, all Debian-unfixable)
+4. **PowerShell or WSL2?**
+
+Then: short sentences, emojis, bold keys, quick wins first. LFG! 🔥
πŸ”₯ diff --git a/.claude/settings.local.json b/.claude/settings.local.json index 10b4d56..bbf4605 100644 --- a/.claude/settings.local.json +++ b/.claude/settings.local.json @@ -175,14 +175,18 @@ "Bash(MSYS_NO_PATHCONV=1 docker exec hypercode-core grep -c \"cache_response\" /app/app/cache/multi_tier.py)", "Bash(MSYS_NO_PATHCONV=1 docker exec hypercode-core grep -c \"limiter.limit\" /app/app/routes/stripe.py)", "Bash(./node_modules/.bin/tsc --noEmit)", - "Bash(MSYS_NO_PATHCONV=1 docker exec docker-socket-proxy *)", - "Bash(MSYS_NO_PATHCONV=1 docker exec docker-socket-proxy-build *)", - "Bash(MSYS_NO_PATHCONV=1 docker exec postgres cat /var/lib/postgresql/data/pg_hba.conf)", - "Bash(MSYS_NO_PATHCONV=1 docker exec postgres *)", - "Bash(sed -n '344,350p' docker-compose.yml)", - "Bash(sed -n '1079,1090p' docker-compose.yml)", - "Bash(sed -n '1921,1930p' docker-compose.yml)", - "Bash(MSYS_NO_PATHCONV=1 docker run *)" + "Bash(openssl rand *)", + "Bash(sed -n '295,320p' \"H:/HyperStation zone/HyperCode/HyperCode-V2.4/docker-compose.yml\")", + "Bash(sed -n '1183,1200p' \"H:/HyperStation zone/HyperCode/HyperCode-V2.4/docker-compose.yml\")", + "Bash(echo \"GW_PRIMARY=$\\(openssl rand -hex 32\\)\")", + "Bash(echo \"GW_SECONDARY=$\\(openssl rand -hex 32\\)\")", + "Bash(echo \"GW_ADMIN=$\\(openssl rand -hex 32\\)\")", + "Bash(\"/c/Windows/System32/nvidia-smi.exe\")", + "Bash(\"/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe\")", + "Read(//c/Program Files/**)", + "Read(//c/Windows//**)", + "Bash(MSYS_NO_PATHCONV=1 docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi)", + "Bash(MSYS_NO_PATHCONV=1 docker exec broski-pets-bridge *)" ] }, "enableAllProjectMcpServers": true, diff --git a/.env.example b/.env.example index ff4c690..7a4c324 100644 --- a/.env.example +++ b/.env.example @@ -27,6 +27,8 @@ PETS_BRIDGE_URL=http://127.0.0.1:8098 PETS_DISCORD_ID= PETS_WEBHOOK_SECRET= PETS_OLLAMA_MODEL=qwen2.5:7b +MCP_GATEWAY_URL=http://mcp-gateway:8099 
+GITHUB_TOKEN=
 
 # ── CORS ────────────────────────────────────────────────────
 CORS_ALLOW_ORIGINS=http://localhost:8088,http://127.0.0.1:8088,http://localhost:3000
diff --git a/CLAUDE.md b/CLAUDE.md
index 4e2fbd4..4f06712 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,7 +2,7 @@
 > **This file is Claude's brain for this project.**
 > Read this first. Every session. No exceptions.
-> Last updated: April 17, 2026 | Status: 29/29 containers 🟢 | Grade A 🏅 | Phases 0–10O COMPLETE ✅
+> Last updated: April 17, 2026 | Status: 24/24 containers 🟢 | Grade A 🏅 | Phases 0–10O COMPLETE ✅
 
 ---
 
@@ -350,9 +350,9 @@ docker compose --profile agents up -d
 - ✅ Docker build cache pruned — 60GB freed
 - ✅ **Gordon Tier 2 COMPLETE** — OTLP traces, Redis cache, rate limits, circuit breakers
 - ✅ **Course → Stripe frontend wired** — full money path live (April 16)
-- ✅ RUNBOOK OOM Recovery playbook added (exit 137/128 legend + exact recovery commands)
-- ✅ Agent X is now intentionally gated (profile-only) to prevent accidental OOM cascades
-- ✅ Doc triad synced: WHATS_DONE.md + CLAUDE.md + docs/RUNBOOK.md (single source of truth)
+- ✅ **OOM crash recovered** — 34.4GB freed, stack restored (April 17)
+- ✅ **Memory limits on ALL services** — every container capped, no more cascade kills (April 17)
+- ✅ **pre-build-check.sh** — disk + memory guard wired into `make build` (April 17)
 
 ---
 
@@ -364,10 +364,12 @@ Hey Claude! You’re working with Lyndz Williams on HyperCode V2.4.
 2. **Check CLAUDE_CONTEXT.md** — phase-by-phase source of truth (Phases 0–10O all ✅)
 3. **All Gordon Tier 1 + Tier 2 DONE** ✅ — OTLP, cache, rate limits, circuit breakers
 4. **Course → Stripe frontend DONE** ✅ — `/pricing` → checkout → `/payment-success` → enrolled
-5. **Agent X is gated/capped on purpose:** It’s behind the `hyper` profile because it’s build + model heavy (easy OOM on low-RAM). Don’t “turn it on by default” unless you’ve got headroom.
-6. **Next options:** Gordon Tier 3 (DB pooling + async queues) OR E2E checkout test with `stripe listen`
-7. **Style:** Short. Friendly. BROski energy. Celebrate wins. 🏆
-8. **Never:** Wall of text. Never debate the Sacred Rules.
+5. **Memory limits on ALL services** ✅ — see docker-compose.yml `deploy.resources` on every service
+6. **Agent X is capped at 1G** — it caused an OOM crash (April 17) by building 30+ images unlimited
+7. **Pre-build guard** — `make build` runs `scripts/pre-build-check.sh` first, aborts if <15GB free
+8. **Next options:** Gordon Tier 3 (DB pooling + async queues) OR E2E checkout test with `stripe listen`
+9. **Style:** Short. Friendly. BROski energy. Celebrate wins. 🏆
+10. **Never:** Wall of text. Never debate the Sacred Rules.
 
 > *“You built the future people keep saying they want. You actually did it.” — Gordon, Docker AI*
 
diff --git a/Makefile b/Makefile
index faeda67..52af28d 100644
--- a/Makefile
+++ b/Makefile
@@ -43,8 +43,12 @@ network-init:
	@echo "Ensuring Docker network 'hypercode_public_net' exists..."
	@docker network ls --format '{{.Name}}' | grep -q '^hypercode_public_net$$' || docker network create hypercode_public_net
 
+# Pre-build safety check (disk + memory guard)
+pre-build-check:
+	@bash scripts/pre-build-check.sh
+
 # Build all containers
-build: network-init
+build: pre-build-check network-init
	@echo "Building all agent containers..."
	docker-compose -f docker-compose.yml --profile agents --env-file .env.agents build
 
@@ -117,6 +121,13 @@ clean:
	docker-compose -f docker-compose.yml --profile agents down -v --remove-orphans
	docker system prune -f
 
+health: ## 🏥 NemoClaw code health scan + Discord webhook post
+	@echo "🔍 Running NemoClaw health scan..."
+	@python scripts/health_report.py --webhook
+
+health-quick: ## 🏥 NemoClaw scan — terminal only, no webhook
+	@python scripts/health_report.py
+
 # Full Docker Health Check System
 full-docker-health:
	@echo "🚀 Starting Full Docker Health Check Pipeline..."
diff --git a/WHATS_DONE.md b/WHATS_DONE.md
index 8b3f405..acb4963 100644
--- a/WHATS_DONE.md
+++ b/WHATS_DONE.md
@@ -1,6 +1,6 @@
 # ✅ WHATS_DONE.md — HyperCode Ecosystem
 > One file. Short bullets. No walls of text.
-> **Updated: April 16, 2026 (evening)** — update this every session.
+> **Updated: April 17, 2026** — update this every session.
 
 ---
 
@@ -21,6 +21,12 @@
 - 5 isolated networks — `data-net` + `obs-net` internal (no internet) ✅
 - Docker secrets pattern — `.txt` files, never baked into images ✅
 - Kubernetes + Helm charts in `k8s/` + `helm/` — scale path ready ✅
+- **Memory limits on ALL services** — every container capped, OOM cascades impossible ✅ ← **April 17**
+  - agent-x hard-capped at 1G RAM (was unlimited — caused OOM crash April 17)
+  - healer, alertmanager, hyper-agents, all specialists, all infra — all capped
+- `scripts/pre-build-check.sh` — disk + memory guard before any Docker build ✅ ← **April 17**
+  - `make build` now runs it automatically — aborts if <15GB free
+- **OOM recovery completed April 17** — 34.4GB freed, 24/24 containers restored ✅
 
 ### Observability
 - Prometheus 7/7 targets UP — `monitoring/prometheus/prometheus.yml` is the live config ✅
@@ -88,6 +94,8 @@
 - GitHub Actions CI — Trivy on every push/PR ✅
 - Phase 7–9: Dockerfile hardening, CVE elimination, secrets management ✅
 - Stripe keys rotated + scrubbed from 218 commits with `git filter-repo` ✅ ← **April 16**
+- OOM crash root cause: Agent X built 30+ images with no memory limit — fixed ✅ ← **April 17**
+  - Exit 137 = OOM killed | Exit 128 = SIGTERM under stress (reference for future debugging)
 
 ### Celery
 - Celery + Redis task queue running ✅
@@ -161,6 +169,9 @@ Stripe webhook: ALWAYS rate-limit exempt — never add limiter to /api/stripe/w
 Alembic: if missing alembic_version table → run 'alembic stamp 006' first
 Supabase table: courses uses price_pence (int) + is_active (bool)
 Docker context: must be 'desktop-linux' on Windows
+Memory limits: ALL services capped in docker-compose.yml — agent-x=1G, core=1.5G, postgres=2G
+Pre-build check: make build → auto-runs scripts/pre-build-check.sh (aborts if <15GB free)
+OOM exit codes: 137=OOM killed | 128=SIGTERM under stress
 ```
 
 ---
diff --git a/agents/broski-bot/src/bot.py b/agents/broski-bot/src/bot.py
index 13bf236..6e4647e 100644
--- a/agents/broski-bot/src/bot.py
+++ b/agents/broski-bot/src/bot.py
@@ -64,7 +64,8 @@ def __init__(self) -> None:
             "src.cogs.life_engine",
             "src.cogs.profile",
             "src.cogs.course_stats",
-            "src.cogs.ops_alerts",  # Phase 5: health-poll → #ops-alerts Discord alerts
+            "src.cogs.ops_alerts",  # Phase 5: health-poll → #ops-alerts Discord alerts
+            "src.cogs.health_check",  # /health — NemoClaw live code health scan
         ]
 
     async def setup_hook(self) -> None:
diff --git a/agents/broski-bot/src/cogs/health_check.py b/agents/broski-bot/src/cogs/health_check.py
new file mode 100644
index 0000000..c5fec68
--- /dev/null
+++ b/agents/broski-bot/src/cogs/health_check.py
@@ -0,0 +1,225 @@
+# pylint: disable=broad-exception-caught
+"""
+/health — NemoClaw live code health scan for HyperCode V2.4.
+Grade chase: S-LEGENDARY → A-CLEAN → B-GOOD → C-NEEDS WORK → D-SOS MODE
+"""
+from __future__ import annotations
+
+import ast
+import json
+import os
+import sys
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+
+import discord
+from discord import app_commands
+from discord.ext import commands
+
+# Nemoclaw analyzer mounted at /app/nemoclaw_agent in Docker
+_NEMOCLAW = Path("/app/nemoclaw_agent")
+if _NEMOCLAW.exists():
+    sys.path.insert(0, str(_NEMOCLAW))
+
+# Dirs to scan inside the Docker container
+_SCAN_ROOTS = [
+    Path("/app/backend_src"),     # ./backend
+    Path("/app/project_agents"),  # ./agents
+]
+_SKIP = frozenset({
+    "__pycache__", "node_modules", ".venv", "venv",
+    "tests", "htmlcov", ".mypy_cache", ".next", "dist", "build",
+})
+
+GRADE_MAP = [
+    (95, "S", "LEGENDARY", 0x00FF88, "🏆"),
+    (80, "A", "CLEAN", 0x00BFFF, "✅"),
+    (65, "B", "GOOD", 0xFFD700, "👍"),
+    (50, "C", "NEEDS WORK", 0xFF8C00, "⚠️"),
+    (0, "D", "SOS MODE", 0xFF0000, "🆘"),
+]
+LATEST_JSON = Path("/app/health_reports/latest.json")
+
+
+def _grade(score: int) -> tuple[str, str, int, str]:
+    for threshold, letter, label, colour, emoji in GRADE_MAP:
+        if score >= threshold:
+            return letter, label, colour, emoji
+    return "D", "SOS MODE", 0xFF0000, "🆘"
+
+
+def _py_files(roots: list[Path]) -> list[Path]:
+    files = []
+    for root in roots:
+        if not root.exists():
+            continue
+        for f in root.rglob("*.py"):
+            if not any(d in f.parts for d in _SKIP):
+                files.append(f)
+    return files
+
+
+def _ast_scan(files: list[Path]) -> list[dict]:
+    issues = []
+    for fp in files:
+        try:
+            tree = ast.parse(fp.read_text(errors="ignore"))
+        except SyntaxError as e:
+            issues.append({"file": fp.name, "line": e.lineno, "msg": str(e)[:60]})
+            continue
+        for node in ast.walk(tree):
+            if isinstance(node, ast.ExceptHandler) and node.type is None:
+                issues.append({"file": fp.name, "line": node.lineno, "msg": "Bare except"})
+    return issues
+
+
+def _ruff_scan(roots: list[Path]) -> list[dict]:
+    import subprocess, json as _json
+    paths = [str(r) for r in roots if r.exists()]
+    if not paths:
+        return []
+    try:
+        result = subprocess.run(
+            ["ruff", "check"] + paths + ["--output-format", "json"],
+            capture_output=True, text=True, timeout=60,
+        )
+        raw = _json.loads(result.stdout)
+        out = []
+        for item in raw:
+            loc = item.get("location") or {}
+            out.append({
+                "file": Path(item.get("filename", "?")).name,
+                "line": loc.get("row"),
+                "msg": str(item.get("message", ""))[:65],
+            })
+        return out
+    except Exception:
+        return []
+
+
+def _run_scan() -> dict:
+    files = _py_files(_SCAN_ROOTS)
+    lint = _ruff_scan(_SCAN_ROOTS)
+    ast_issues = _ast_scan(files)
+    total = len(lint) + len(ast_issues)
+    score = max(0, min(100, round(100 - (total / max(len(files), 1)) * 40)))
+    letter, label, colour, emoji = _grade(score)
+    return {
+        "score": score, "grade": letter, "label": label,
+        "colour": colour, "emoji": emoji,
+        "files": len(files), "lint": len(lint), "ast": len(ast_issues),
+        "total": total,
+        "top": (lint + ast_issues)[:5],
+        "scanned_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
+        "live": True,
+    }
+
+
+def _load_cached() -> dict | None:
+    try:
+        if LATEST_JSON.exists():
+            return json.loads(LATEST_JSON.read_text())
+    except Exception:
+        pass
+    return None
+
+
+def _build_embed(r: dict) -> discord.Embed:
+    letter, label, colour, emoji = r["grade"], r["label"], r["colour"], r["emoji"]
+    score = r["score"]
+    source = "live scan" if r.get("live") else f"cached • {r.get('scanned_at','?')}"
+
+    embed = discord.Embed(
+        title=f"{emoji} HyperCode Health — {letter} | {label}",
+        description=f"**Score: {score}/100** • {source}",
+        colour=colour,
+    )
+    embed.add_field(name="📄 Files", value=str(r["files"]), inline=True)
+    embed.add_field(name="🔍 Lint", value=str(r["lint"]), inline=True)
+    embed.add_field(name="🌳 AST", value=str(r["ast"]), inline=True)
+    embed.add_field(name="⚠️ Total", value=str(r["total"]), inline=True)
+    embed.add_field(name="🏅 Grade", value=f"**{letter} — {label}**", inline=True)
+
+    # Progress bar
+    filled = round(score / 10)
+    bar = "█" * filled + "░" * (10 - filled)
+    embed.add_field(name="📊 Health", value=f"`{bar}` {score}%", inline=False)
+
+    if r.get("top"):
+        top_text = ""
+        for i in r["top"][:5]:
+            f = i.get("file", "?")
+            ln = i.get("line", "?")
+            msg = i.get("msg", "")[:50]
+            top_text += f"`{f}:{ln}` {msg}\n"
+        embed.add_field(name="🔎 Top Issues", value=top_text or "None! 🎉", inline=False)
+    else:
+        embed.add_field(name="🎉 Issues", value="Zero issues found!", inline=False)
+
+    # Next grade hint
+    hints = {"D": "Fix syntax errors → escape SOS", "C": "Tackle bare excepts → reach B",
+             "B": "Clear lint issues → reach A-CLEAN", "A": "Chase that S-LEGENDARY!",
+             "S": "S-LEGENDARY! You're the GOAT 🐐"}
+    embed.set_footer(text=hints.get(letter, "") + " • /health to rescan")
+    return embed
+
+
+class HealthCheck(commands.Cog):
+    def __init__(self, bot: commands.Bot):
+        self.bot = bot
+        self._nemoclaw_ready = _NEMOCLAW.exists()
+
+    @app_commands.command(name="health", description="🏥 Run NemoClaw live code health scan — chase the S grade!")
+    async def health(self, interaction: discord.Interaction) -> None:
+        await interaction.response.defer(thinking=True)
+
+        if not self._nemoclaw_ready:
+            # Fall back to cached report
+            cached = _load_cached()
+            if cached:
+                cached["live"] = False
+                embed = _build_embed(cached)
+                embed.description += "\n⚠️ *Live scan unavailable — showing last cached result*"
+                await interaction.followup.send(embed=embed)
+            else:
+                await interaction.followup.send(
+                    "❌ NemoClaw not mounted and no cached report found.\n"
+                    "Run `make health` from the repo first.",
+                    ephemeral=True,
+                )
+            return
+
+        # Scanning message
+        scanning_embed = discord.Embed(
+            title="🔍 NemoClaw Scanning...",
+            description="Analysing Python files across backend + agents...",
+            colour=0x5865F2,
+        )
+        msg = await interaction.followup.send(embed=scanning_embed, wait=True)
+
+        try:
+            report = _run_scan()
+        except Exception as e:
+            await msg.edit(content=f"❌ Scan failed: {e}", embed=None)
+            return
+
+        embed = _build_embed(report)
+        await msg.edit(embed=embed)
+
+    @app_commands.command(name="health-last", description="📋 Show last cached NemoClaw health report")
+    async def health_last(self, interaction: discord.Interaction) -> None:
+        cached = _load_cached()
+        if not cached:
+            await interaction.response.send_message(
+                "No cached report yet. Run `make health` or `/health` first.",
+                ephemeral=True,
+            )
+            return
+        cached["live"] = False
+        embed = _build_embed(cached)
+        await interaction.response.send_message(embed=embed)
+
+
+async def setup(bot: commands.Bot) -> None:
+    await bot.add_cog(HealthCheck(bot))
diff --git a/agents/broski-bot/src/cogs/pets.py b/agents/broski-bot/src/cogs/pets.py
index 5f17f13..b99e73e 100644
--- a/agents/broski-bot/src/cogs/pets.py
+++ b/agents/broski-bot/src/cogs/pets.py
@@ -241,6 +241,46 @@ async def feed(self, interaction: discord.Interaction) -> None:
         embed.add_field(name="Happiness", value=f"{happiness}/100", inline=True)
         await interaction.followup.send(embed=embed, ephemeral=True)
 
+    @pet.command(name="leaderboard", description="Show the top BROskiPets by XP")
+    async def leaderboard(self, interaction: discord.Interaction) -> None:
+        await interaction.response.defer(ephemeral=False)
+
+        try:
+            res = await self._bridge_get("/leaderboard")
+        except Exception as e:
+            logger.error("Pet leaderboard request failed", error=str(e))
+            await interaction.followup.send("❌ Leaderboard service is unavailable right now.", ephemeral=False)
+            return
+
+        if res.status_code != 200:
+            logger.warning("Pet leaderboard non-200", status_code=res.status_code, body=res.text)
+            await interaction.followup.send("❌ Couldn’t fetch leaderboard right now.", ephemeral=False)
+            return
+
+        data = res.json()
+        if not isinstance(data, list) or len(data) == 0:
+            await interaction.followup.send("🏆 No leaderboard entries yet. Hatch some pets first!", ephemeral=False)
+            return
+
+        lines: list[str] = []
+        for row in data:
+            if not isinstance(row, dict):
+                continue
+            rank = int(row.get("rank", 0))
+            name = str(row.get("name", "Unknown"))
+            species = str(row.get("species", "Unknown"))
+            level = int(row.get("level", 1))
+            xp = int(row.get("xp", 0))
+            prefix = "⭐ " if rank == 1 else ""
+            lines.append(f"{prefix}#{rank} {name} ({species}) — Lvl {level} — {xp} XP")
+
+        embed = discord.Embed(
+            title="🏆 BROskiPets Leaderboard",
+            description="\n".join(lines)[:3900],
+            color=discord.Color.gold(),
+        )
+        await interaction.followup.send(embed=embed, ephemeral=False)
+
 
 async def setup(bot: commands.Bot) -> None:
     await bot.add_cog(Pets(bot))
diff --git a/agents/broski-pets-bridge/main.py b/agents/broski-pets-bridge/main.py
index 9833348..870c264 100644
--- a/agents/broski-pets-bridge/main.py
+++ b/agents/broski-pets-bridge/main.py
@@ -1,6 +1,7 @@
 import os
 import json
 import uuid
+import asyncio
 import subprocess
 from datetime import datetime, timezone
 from secrets import SystemRandom
@@ -123,6 +124,196 @@ def _load_squad() -> dict:
     return {}
 
 
+def _mcp_base_url() -> str:
+    raw = os.getenv("MCP_GATEWAY_URL", "http://mcp-gateway:8099").strip().rstrip("/")
+    if raw.startswith("tcp://"):
+        raw = "http://" + raw[len("tcp://") :]
+    if not raw.startswith(("http://", "https://")):
+        raw = "http://" + raw
+    return raw
+
+
+def _mcp_headers() -> dict[str, str]:
+    token = os.getenv("MCP_GATEWAY_AUTH_TOKEN", "").strip()
+    if not token:
+        return {}
+    return {"Authorization": f"Bearer {token}"}
+
+
+def _parse_mcp_sse_payload(text: str) -> dict:
+    for line in text.splitlines():
+        if not line.startswith("data:"):
+            continue
+        raw = line[len("data:") :].strip()
+        if not raw:
+            continue
+        try:
+            return json.loads(raw)
+        except Exception:
+            continue
+    raise ValueError("Invalid MCP SSE response")
+
+
+async def _mcp_jsonrpc_call(message: dict, timeout_s: float, session_id: str | None = None) -> tuple[dict, str | None]:
+    base = _mcp_base_url()
+    endpoints = [f"{base}/mcp", base]
+    headers = {"Accept": "application/json, text/event-stream", **_mcp_headers()}
+    if session_id:
+        headers["Mcp-Session-Id"] = session_id
+
+    async with httpx.AsyncClient(timeout=timeout_s, headers=headers) as client:
+        last_err: Exception | None = None
+        for url in endpoints:
+            try:
+                resp = await client.post(url, json=message)
+                if resp.status_code == 200:
+                    sid = resp.headers.get("Mcp-Session-Id") or session_id
+                    return _parse_mcp_sse_payload(resp.text), sid
+                last_err = HTTPException(status_code=resp.status_code, detail=resp.text)
+            except Exception as e:
+                last_err = e
+                continue
+        raise last_err or RuntimeError("MCP call failed")
+
+
+async def _mcp_connected(timeout_s: float = 1.0) -> bool:
+    try:
+        _resp, _sid = await _mcp_jsonrpc_call(
+            {
+                "jsonrpc": "2.0",
+                "id": 1,
+                "method": "initialize",
+                "params": {
+                    "protocolVersion": "2024-11-05",
+                    "capabilities": {},
+                    "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
+                },
+            },
+            timeout_s=timeout_s,
+        )
+        return True
+    except Exception:
+        return False
+
+
+def _select_github_search_tool(tools: list[dict]) -> dict | None:
+    preferred = [
+        "github.search_issues",
+        "github.search_pull_requests",
+        "github.search_issues_and_prs",
+        "github.search",
+        "search_issues",
+        "search_pull_requests",
+        "search_issues_and_prs",
+        "search",
+    ]
+    by_name: dict[str, dict] = {}
+    for t in tools:
+        name = t.get("name")
+        if isinstance(name, str):
+            by_name[name] = t
+
+    for name in preferred:
+        if name in by_name:
+            return by_name[name]
+
+    for name, tool in by_name.items():
+        if "search" in name:
+            return tool
+    return None
+
+
+def _build_tool_args(tool: dict, query: str) -> dict:
+    schema = tool.get("inputSchema") or {}
+    props = schema.get("properties") if isinstance(schema, dict) else {}
+    if not isinstance(props, dict):
+        props = {}
+
+    args: dict[str, object] = {}
+    repo = os.getenv("GITHUB_CONTEXT_REPO", "welshDog/HyperCode-V2.4")
+
+    q_key = "query" if "query" in props else ("q" if "q" in props else None)
+    if q_key:
+        args[q_key] = f"repo:{repo} {query}"
+    else:
+        args["query"] = f"repo:{repo} {query}"
+
+    if "repo" in props:
+        args["repo"] = repo
+    if "repository" in props:
+        args["repository"] = repo
+    if "owner" in props and "repo" in props:
+        if "/" in repo:
+            owner, name = repo.split("/", 1)
+            args["owner"] = owner
+            args["repo"] = name
+
+    if "limit" in props:
+        args["limit"] = 5
+    if "per_page" in props:
+        args["per_page"] = 5
+
+    return args
+
+
+async def _github_context_via_mcp(question: str) -> str:
+    _resp, sid = await _mcp_jsonrpc_call(
+        {
+            "jsonrpc": "2.0",
+            "id": 1,
+            "method": "initialize",
+            "params": {
+                "protocolVersion": "2024-11-05",
+                "capabilities": {},
+                "clientInfo": {"name": "broski-pets-bridge", "version": "2.4"},
+            },
+        },
+        timeout_s=10.0,
+    )
+
+    tools_resp, sid = await _mcp_jsonrpc_call(
+        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
+        timeout_s=15.0,
+        session_id=sid,
+    )
+    tools = tools_resp.get("result", {}).get("tools", [])
+    if not isinstance(tools, list):
+        return ""
+
+    tool = _select_github_search_tool([t for t in tools if isinstance(t, dict)])
+    if not tool:
+        return ""
+
+    tool_name = tool.get("name")
+    if not isinstance(tool_name, str):
+        return ""
+
+    args = _build_tool_args(tool, question)
+    call_resp, _sid = await _mcp_jsonrpc_call(
+        {
+            "jsonrpc": "2.0",
+            "id": 3,
+            "method": "tools/call",
+            "params": {"name": tool_name, "arguments": args},
+        },
+        timeout_s=30.0,
+        session_id=sid,
+    )
+
+    result = call_resp.get("result") or {}
+    content = result.get("content") if isinstance(result, dict) else None
+    if not isinstance(content, list):
+        return ""
+
+    chunks: list[str] = []
+    for item in content:
+        if not isinstance(item, dict):
+            continue
+        if item.get("type") == "text" and isinstance(item.get("text"), str):
+            chunks.append(item["text"])
+    out = "\n\n".join(chunks).strip()
+    return out[:2000]
+
+
 def _pet_key(discord_id: str) -> str:
     return f"pet:{discord_id}"
 
@@ -496,6 +687,13 @@ async def pet_chat(discord_id: str, body: ChatRequest) -> dict[str, object]:
     if body.mode == "ask":
         style = "You are a senior pair-programmer. Be concrete, step-by-step, and focused on solving the coding question."
 
+    github_context = ""
+    if body.mode == "ask":
+        try:
+            github_context = await _github_context_via_mcp(body.message)
+        except Exception:
+            github_context = ""
+
     prompt = (
         f"{style}\n"
         f"Pet: {name} | Species: {species} | Rarity: {rarity} | Stage: {level}\n"
@@ -504,6 +702,7 @@ async def pet_chat(discord_id: str, body: ChatRequest) -> dict[str, object]:
         f"Capabilities: {', '.join(caps) if caps else 'none'}\n\n"
         f"Recent git diff (may be empty):\n{git_diff}\n\n"
         f"WHATS_DONE.md (may be empty):\n{whats_done}\n\n"
+        f"GitHub context via MCP (may be empty):\n{github_context}\n\n"
         f"User message:\n{body.message}\n"
     )
 
@@ -623,6 +822,52 @@ async def pet_powers(discord_id: str) -> dict[str, object]:
     }
 
 
+@app.get("/leaderboard")
+async def leaderboard() -> list[dict[str, object]]:
+    r = _redis()
+    out: list[dict[str, object]] = []
+
+    for key in r.scan_iter(match="pet:*"):
+        if not isinstance(key, str):
+            continue
+        discord_id = key[len("pet:") :]
+        raw = r.get(key)
+        if not raw:
+            continue
+        try:
+            pet = json.loads(raw)
+        except Exception:
+            continue
+        if not isinstance(pet, dict):
+            continue
+
+        out.append(
+            {
+                "discord_id": discord_id,
+                "name": str(pet.get("name", "Unknown")),
+                "species": str(pet.get("species", "Unknown")),
+                "level": int(pet.get("level", 1)),
+                "xp": int(pet.get("xp", 0)),
+            }
+        )
+
+    out.sort(key=lambda x: int(x.get("xp", 0)), reverse=True)
+    top = out[:10]
+    ranked: list[dict[str, object]] = []
+    for i, row in enumerate(top, start=1):
+        ranked.append(
+            {
+                "rank": i,
+                "discord_id": row["discord_id"],
+                "name": row["name"],
+                "species": row["species"],
+                "level": row["level"],
+                "xp": row["xp"],
+            }
) + return ranked + + @app.get("/health") async def health() -> dict[str, object]: ollama_url = os.getenv("OLLAMA_URL", "http://hypercode-ollama:11434").rstrip("/") @@ -644,11 +889,18 @@ async def health() -> dict[str, object]: except Exception: redis_connected = False + mcp_ok = False + try: + mcp_ok = await _mcp_connected() + except Exception: + mcp_ok = False + return { "status": "ok", "service": "broski-pets-bridge", "pets_enabled": pets_enabled, "ollama_connected": ollama_connected, "redis_connected": redis_connected, + "mcp_connected": mcp_ok, "redis_db": 3, } diff --git a/backend/app/core/config.py b/backend/app/core/config.py index 3377d80..33fce58 100644 --- a/backend/app/core/config.py +++ b/backend/app/core/config.py @@ -125,6 +125,7 @@ def validate_security(self) -> None: env_file=".env", env_file_encoding="utf-8", case_sensitive=True, + env_ignore_empty=True, extra="ignore" # Allow extra fields in env ) diff --git a/backend/app/routes/stripe.py b/backend/app/routes/stripe.py index 4478838..af73683 100644 --- a/backend/app/routes/stripe.py +++ b/backend/app/routes/stripe.py @@ -8,6 +8,7 @@ from fastapi import APIRouter, HTTPException, Request, Header from pydantic import BaseModel from typing import Optional +from app.core.config import settings from app.services.stripe_service import ( create_checkout_session, create_course_checkout_session, @@ -98,7 +99,7 @@ async def get_plans(request: Request): @router.post("/webhook") async def stripe_webhook( request: Request, - stripe_signature: Optional[str] = Header(None), + stripe_signature: Optional[str] = Header(None, alias="Stripe-Signature"), ): """ Handle incoming Stripe webhook events. 
@@ -107,7 +108,9 @@ async def stripe_webhook( payload = await request.body() webhook_secret = os.getenv("STRIPE_WEBHOOK_SECRET", "") - if not webhook_secret: + is_production = settings.ENVIRONMENT.lower() == "production" + + if (not webhook_secret) or ((not is_production) and (not stripe_signature)): logger.warning("STRIPE_WEBHOOK_SECRET not set β€” skipping signature check (dev mode)") try: import json @@ -115,6 +118,8 @@ async def stripe_webhook( except Exception as e: raise HTTPException(status_code=400, detail=f"Invalid payload: {e}") else: + if not stripe_signature: + raise HTTPException(status_code=400, detail="Missing Stripe-Signature header") try: event = stripe.Webhook.construct_event( payload, stripe_signature, webhook_secret diff --git a/docker-compose.secrets.yml b/docker-compose.secrets.yml index ada4407..15f38d7 100644 --- a/docker-compose.secrets.yml +++ b/docker-compose.secrets.yml @@ -8,6 +8,8 @@ # Solution: Both use plain env var from .env ONLY β€” no _FILE override here secrets: + anthropic_api_key: + file: ./secrets/anthropic_api_key.txt postgres_password: file: ./secrets/postgres_password.txt api_key: @@ -45,11 +47,13 @@ services: HYPERCODE_JWT_SECRET_FILE: /run/secrets/jwt_secret JWT_SECRET_FILE: /run/secrets/jwt_secret PERPLEXITY_API_KEY_FILE: /run/secrets/perplexity_api_key + ANTHROPIC_API_KEY_FILE: /run/secrets/anthropic_api_key secrets: - api_key - jwt_secret - perplexity_api_key - memory_key + - anthropic_api_key broski-bot: environment: diff --git a/docker-compose.yml b/docker-compose.yml index 52d7ecb..fa361cd 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -31,17 +31,7 @@ networks: volumes: trivy-cache: redis-data: - driver: local - driver_opts: - type: none - o: bind - device: ${HC_DATA_ROOT}/redis postgres-data: - driver: local - driver_opts: - type: none - o: bind - device: ${HC_DATA_ROOT}/postgres grafana-data: driver: local driver_opts: @@ -67,11 +57,6 @@ volumes: o: bind device: ${HC_DATA_ROOT}/agent_memory 
minio_data: - driver: local - driver_opts: - type: none - o: bind - device: ${HC_DATA_ROOT}/minio chroma_data: driver: local driver_opts: @@ -142,7 +127,7 @@ services: container_name: postgres environment: POSTGRES_USER: ${POSTGRES_USER:-postgres} - POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-} + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-hypercode} POSTGRES_DB: ${POSTGRES_DB:-hypercode} volumes: - postgres-data:/var/lib/postgresql/data @@ -191,7 +176,7 @@ services: - ENVIRONMENT=development - PRISMA_PY_DEBUG=0 - HYPERCODE_REDIS_URL=${HYPERCODE_REDIS_URL} - - HYPERCODE_DB_URL=postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-}@postgres:5432/${POSTGRES_DB:-hypercode} + - HYPERCODE_DB_URL=postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/${POSTGRES_DB:-hypercode} - HYPERCODE_MEMORY_KEY=${HYPERCODE_MEMORY_KEY} - API_KEY=${API_KEY} - HYPERCODE_JWT_SECRET=${HYPERCODE_JWT_SECRET} @@ -293,22 +278,24 @@ services: environment: - DISCORD_TOKEN=${DISCORD_TOKEN} - DISCORD_GUILD_ID=${DISCORD_GUILD_ID} - - DATABASE_URL=postgresql+asyncpg://postgres:${POSTGRES_PASSWORD:-}@postgres:5432/broski + - DATABASE_URL=postgresql+asyncpg://postgres:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/broski - DB_HOST=postgres - DB_PORT=5432 - DB_NAME=broski - DB_USER=postgres - - DB_PASSWORD=${POSTGRES_PASSWORD:-} + - DB_PASSWORD=${POSTGRES_PASSWORD:-hypercode} - REDIS_URL=redis://redis:6379/0 + - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY} - PERPLEXITY_API_KEY=${PERPLEXITY_API_KEY} - - OPENAI_API_KEY=${OPENAI_API_KEY} + - MEMSTREAM_API_TOKEN=${MEMSTREAM_API_TOKEN} + - MEMSTREAM_API_URL=${MEMSTREAM_API_URL:-http://memstream:8011} - ENABLE_AUTO_MODERATION=true - ENABLE_ANALYTICS=true - PROMETHEUS_ENABLED=true - PROMETHEUS_PORT=8000 - HYPERCODE_CORE_URL=http://hypercode-core:8000 - OLLAMA_HOST=http://hypercode-ollama:11434 - - OLLAMA_MODEL=phi3:latest + - OLLAMA_MODEL=${OLLAMA_MODEL:-qwen2.5-coder:3b} depends_on: redis: condition: service_healthy @@ -339,6 +326,10 
@@ services: volumes: - ./agents/broski-bot/src:/app/src - ./docs/outputs:/app/outputs + - ./agents/broski-nemoclaw-agent:/app/nemoclaw_agent:ro + - ./backend:/app/backend_src:ro + - ./agents:/app/project_agents:ro + - ./reports/broski-analysis:/app/health_reports # OBSERVABILITY prometheus: @@ -452,6 +443,14 @@ services: retries: 3 start_period: 40s restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.25" + memory: 256M + reservations: + cpus: "0.05" + memory: 64M security_opt: - no-new-privileges:true logging: @@ -503,8 +502,8 @@ services: container_name: minio command: server /data --console-address ":9001" environment: - - MINIO_ROOT_USER=${MINIO_ROOT_USER} - - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD} + - MINIO_ROOT_USER=${MINIO_ROOT_USER:-hypercode} + - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-hypercode} - MINIO_PROMETHEUS_AUTH_TYPE=public volumes: - minio_data:/data @@ -520,6 +519,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M security_opt: - no-new-privileges:true logging: @@ -547,6 +554,14 @@ services: timeout: 10s retries: 3 start_period: 30s + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 256M logging: driver: "json-file" options: @@ -638,6 +653,14 @@ services: retries: 3 start_period: 20s restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.25" + memory: 256M + reservations: + cpus: "0.05" + memory: 128M logging: driver: "json-file" options: @@ -659,7 +682,7 @@ services: CELERY_BROKER_URL: ${CELERY_BROKER_URL:-redis://redis:6379/0} CELERY_RESULT_BACKEND: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1} PYTHONPATH: /home/appuser/.local/lib/python3.11/site-packages:/app - HYPERCODE_DB_URL: postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-}@postgres:5432/${POSTGRES_DB:-hypercode} + HYPERCODE_DB_URL: 
postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/${POSTGRES_DB:-hypercode} HYPERCODE_REDIS_URL: redis://redis:6379/0 HYPERCODE_MEMORY_KEY: ${HYPERCODE_MEMORY_KEY} PERPLEXITY_API_KEY: ${PERPLEXITY_API_KEY} @@ -718,6 +741,14 @@ services: celery-worker: condition: service_healthy restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.25" + memory: 256M + reservations: + cpus: "0.05" + memory: 64M logging: driver: "json-file" options: @@ -748,6 +779,14 @@ services: timeout: 10s retries: 3 restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.25" + memory: 128M + reservations: + cpus: "0.05" + memory: 32M networks: - agents-net volumes: @@ -802,6 +841,14 @@ services: timeout: 10s retries: 3 start_period: 10s + deploy: + resources: + limits: + cpus: "0.25" + memory: 128M + reservations: + cpus: "0.05" + memory: 32M auto-prune: profiles: ["ops", "health"] @@ -823,6 +870,14 @@ services: timeout: 10s retries: 3 start_period: 10s + deploy: + resources: + limits: + cpus: "0.1" + memory: 128M + reservations: + cpus: "0.02" + memory: 32M logging: driver: "json-file" options: @@ -911,6 +966,45 @@ services: cap_drop: - ALL + mcp-gateway: + profiles: ["agents"] + image: docker/mcp-gateway:latest + container_name: mcp-gateway + command: + - --servers=github + - --transport=streaming + - --port=8099 + volumes: + - /var/run/docker.sock:/var/run/docker.sock:ro + environment: + - GITHUB_TOKEN=${GITHUB_TOKEN} + - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_TOKEN} + expose: + - "8099" + networks: + - agents-net + depends_on: + docker-socket-proxy: + condition: service_started + restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M + logging: + driver: "json-file" + options: + max-size: "10m" + max-file: "3" + security_opt: + - no-new-privileges:true + cap_drop: + - ALL + broski-pets-bridge: profiles: ["agents"] build: @@ -925,6 +1019,7 @@ services: - 
PETS_OLLAMA_MODEL=${PETS_OLLAMA_MODEL:-qwen2.5:7b} - WORKSPACE_PATH=/workspace - SQUAD_JSON_PATH=/workspace/squad.json + - MCP_GATEWAY_URL=${MCP_GATEWAY_URL:-tcp://mcp-gateway:8099} - PETS_WEBHOOK_SECRET=${PETS_WEBHOOK_SECRET:-} - COURSE_SYNC_SECRET=${COURSE_SYNC_SECRET:-} volumes: @@ -939,6 +1034,8 @@ services: condition: service_healthy hypercode-ollama: condition: service_healthy + mcp-gateway: + condition: service_started healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8098/health"] interval: 30s @@ -999,6 +1096,9 @@ services: limits: cpus: "0.5" memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1039,6 +1139,9 @@ services: limits: cpus: "0.5" memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1067,6 +1170,9 @@ services: limits: cpus: "0.5" memory: 256M + reservations: + cpus: "0.05" + memory: 64M logging: driver: "json-file" options: @@ -1089,7 +1195,7 @@ services: - ORCHESTRATOR_REDIS_PORT=6379 - ORCHESTRATOR_ENABLED_AGENTS=${ORCHESTRATOR_ENABLED_AGENTS:-} - POSTGRES_USER=${POSTGRES_USER} - - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-} + - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-hypercode} - ORCHESTRATOR_API_KEY=${ORCHESTRATOR_API_KEY:-} - SMOKE_ENDPOINT_ENABLED=${SMOKE_ENDPOINT_ENABLED:-false} - SMOKE_KEY_ALLOWLIST=${SMOKE_KEY_ALLOWLIST:-} @@ -1144,8 +1250,8 @@ services: - AGENT_PORT=8001 - REDIS_HOST=redis - REDIS_PORT=6379 + - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY} - PERPLEXITY_API_KEY=${PERPLEXITY_API_KEY} - - OPENAI_API_KEY=${OPENAI_API_KEY} volumes: - ./agents/business/project-strategist:/app - ./Configuration_Kit:/app/hive_mind:ro @@ -1166,6 +1272,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1203,6 +1317,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + 
cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1240,6 +1362,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1277,6 +1407,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1314,6 +1452,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1351,6 +1497,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1425,6 +1579,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1462,6 +1624,14 @@ services: interval: 30s timeout: 10s retries: 3 + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 128M logging: driver: "json-file" options: @@ -1672,8 +1842,46 @@ services: container_name: hypercode-ollama volumes: - ollama-data:/root/.ollama + ports: + - "127.0.0.1:11434:11434" networks: - agents-net + environment: + - OLLAMA_KEEP_ALIVE=24h + - OLLAMA_NUM_PARALLEL=2 + - OLLAMA_MAX_LOADED_MODELS=2 + deploy: + resources: + limits: + cpus: "2" + memory: 3G + reservations: + cpus: "1" + memory: 1G + healthcheck: + test: ["CMD", "ollama", "list"] + interval: 30s + timeout: 10s + retries: 3 + logging: + driver: "json-file" + options: + max-size: "10m" + max-file: "3" + security_opt: + - no-new-privileges:true + + 
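Most services in this compose file now carry a `deploy.resources` block. Any stragglers can be caught mechanically by walking the parsed file and flagging services with no memory limit. A minimal sketch, assuming the YAML has already been loaded into a plain dict (e.g. with PyYAML); the service entries below are illustrative, not the real file:

```python
# Sketch: flag compose services that lack deploy.resources.limits.memory.
# Assumes the compose file is already parsed into a dict; names are illustrative.
def services_without_memory_limit(compose: dict) -> list[str]:
    missing = []
    for name, svc in compose.get("services", {}).items():
        limits = ((svc.get("deploy") or {}).get("resources") or {}).get("limits") or {}
        if "memory" not in limits:
            missing.append(name)
    return sorted(missing)


compose = {
    "services": {
        "hypercode-ollama": {"deploy": {"resources": {"limits": {"cpus": "2", "memory": "3G"}}}},
        "agent-x": {"image": "agent-x:latest"},  # no limits -> flagged
    }
}
print(services_without_memory_limit(compose))  # ['agent-x']
```

Running a check like this in CI would have caught the unlimited containers before the OOM incident described in the runbook below.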
hypercode-ollama-gpu: + image: ${OLLAMA_IMAGE:-ollama/ollama:0.3.14} + profiles: ["gpu"] + volumes: + - ollama-data:/root/.ollama + ports: + - "127.0.0.1:11434:11434" + networks: + agents-net: + aliases: + - hypercode-ollama deploy: resources: limits: @@ -1682,6 +1890,10 @@ services: reservations: cpus: "1" memory: 1G + devices: + - driver: nvidia + count: all + capabilities: ["gpu"] healthcheck: test: ["CMD", "ollama", "list"] interval: 30s @@ -1720,6 +1932,16 @@ services: REDIS_URL: redis://redis:6379 POSTGRES_URL: postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/${POSTGRES_DB:-hypercode} DOCKER_HOST: tcp://docker-socket-proxy:2375 + ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY} + PERPLEXITY_API_KEY: ${PERPLEXITY_API_KEY} + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 256M logging: driver: "json-file" options: @@ -1748,6 +1970,14 @@ services: - obs-net - agents-net restart: unless-stopped + deploy: + resources: + limits: + cpus: "0.25" + memory: 256M + reservations: + cpus: "0.05" + memory: 64M logging: driver: "json-file" options: @@ -1783,6 +2013,14 @@ services: hypercode.agent: "true" hypercode.archetype: "architect" hypercode.version: "2.0" + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 256M logging: driver: "json-file" options: @@ -1820,6 +2058,14 @@ services: hypercode.agent: "true" hypercode.archetype: "observer" hypercode.version: "2.0" + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 256M logging: driver: "json-file" options: @@ -1858,6 +2104,14 @@ services: hypercode.agent: "true" hypercode.archetype: "worker" hypercode.version: "2.0" + deploy: + resources: + limits: + cpus: "0.5" + memory: 512M + reservations: + cpus: "0.1" + memory: 256M logging: driver: "json-file" options: @@ -1911,6 +2165,14 @@ services: hypercode.archetype: "architect" hypercode.version: 
"2.0" hypercode.role: "meta-architect" + deploy: + resources: + limits: + cpus: "1" + memory: 1G + reservations: + cpus: "0.25" + memory: 512M logging: driver: "json-file" options: @@ -1929,7 +2191,7 @@ services: - "127.0.0.1:8095:8090" environment: - PORT=8090 - - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-}@postgres:5432/${POSTGRES_DB:-hypercode} + - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/${POSTGRES_DB:-hypercode} - REDIS_URL=redis://redis:6379/0 - HEALER_URL=http://healer-agent:8008 - CORE_URL=http://hypercode-core:8000 @@ -1986,7 +2248,7 @@ services: restart: unless-stopped command: ["python", "-m", "worker"] environment: - - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-}@postgres:5432/${POSTGRES_DB:-hypercode} + - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-hypercode}@postgres:5432/${POSTGRES_DB:-hypercode} - REDIS_URL=redis://redis:6379/0 - HEALER_URL=http://healer-agent:8008 - CORE_URL=http://hypercode-core:8000 diff --git a/docs/RUNBOOK.md b/docs/RUNBOOK.md index f201d8d..45db2d1 100644 --- a/docs/RUNBOOK.md +++ b/docs/RUNBOOK.md @@ -1,7 +1,38 @@ -# πŸ¦… HyperCode V2.0 β€” Golden Startup Runbook +# πŸ¦… HyperCode V2.4 β€” Golden Startup Runbook > **The definitive "never get stuck again" guide.** -> Verified live on March 15, 2026. Keep this updated as the stack evolves. +> Verified live on March 15, 2026. Updated April 17, 2026 β€” OOM recovery section added. + +--- + +## 🚨 OOM / Disk Crash Recovery (April 17, 2026 β€” real incident) + +**What happened:** Agent X built 30+ images with no memory limit β†’ observability stack OOM killed (exit 137) β†’ 6 containers down. 
+
+**Exit code reference:**
+- `137` = 128 + 9 (SIGKILL): OOM killed (container hit its memory limit or the host ran out of RAM)
+- `143` = 128 + 15 (SIGTERM): stopped externally under stress (e.g. `docker stop` or the system killed it)
+
+**Recovery steps (in order):**
+```powershell
+# 1. Check disk + what's using space
+docker system df
+
+# 2. Free space — prune dead containers + dangling images + old build cache
+docker container prune -f
+docker image prune -f --filter until=48h
+docker builder prune -f --keep-storage=5gb
+
+# 3. Restart the OOM-killed services (observability stack in this case)
+docker compose -f docker-compose.yml -f docker-compose.secrets.yml up -d prometheus grafana loki tempo alertmanager
+
+# 4. Verify all back up
+docker compose ps
+```
+
+**Prevention (now in place):**
+- Every service has `deploy.resources.limits` in `docker-compose.yml` — no unlimited containers
+- `make build` runs `scripts/pre-build-check.sh` first — aborts if <15GB free or <1GB RAM
 
 ---
diff --git a/reports/broski-analysis/latest.json b/reports/broski-analysis/latest.json
new file mode 100644
index 0000000..3f3f6fe
--- /dev/null
+++ b/reports/broski-analysis/latest.json
@@ -0,0 +1,59 @@
+{
+  "score": 99,
+  "grade": "S",
+  "label": "LEGENDARY",
+  "emoji": "\ud83c\udfc6",
+  "colour": 65416,
+  "files_scanned": 10783,
+  "lint_issues": 225,
+  "ast_issues": 129,
+  "total_issues": 354,
+  "top_lint": [
+    {
+      "file": "agents\\01-frontend-specialist\\base_agent.py",
+      "line": 14,
+      "msg": "`asyncio` imported but unused"
+    },
+    {
+      "file": "agents\\09-tips-tricks-writer\\agent.py",
+      "line": 6,
+      "msg": "`os` imported but unused"
+    },
+    {
+      "file": "agents\\agent-x\\designer.py",
+      "line": 16,
+      "msg": "`typing.Optional` imported but unused"
+    },
+    {
+      "file": "agents\\agent-x\\designer.py",
+      "line": 132,
+      "msg": "Multiple imports on one line"
+    },
+    {
+      "file": "agents\\agent-x\\designer.py",
+      "line": 206,
+      "msg": "Multiple imports on one line"
+    }
+  ],
+  "top_ast": [
+    {
+      "file": "examples\\api_usage.py",
+      "line": 69,
+      "msg": "Bare except \u2014 catches everything",
+      "severity": "medium"
+    },
+    {
+      "file": "hyperstudio-platform\\api-main.py",
+      "line": 461,
+      "msg": "Bare except \u2014 catches everything",
+      "severity": "medium"
+    },
+    {
+      "file": "scripts\\generate_health_check_compose.py",
+      "line": 274,
+      "msg": "Bare except \u2014 catches everything",
+      "severity": "medium"
+    }
+  ],
+  "scanned_at": "2026-04-17T10:04:07.026309+00:00"
+}
\ No newline at end of file
diff --git a/reports/docker-environment-status-report.md b/reports/docker-environment-status-report.md
new file mode 100644
index 0000000..5137d81
--- /dev/null
+++ b/reports/docker-environment-status-report.md
@@ -0,0 +1,195 @@
+# HyperCode V2.4 — Docker Environment Health Report
+
+Generated: 2026-04-17
+
+## Executive Summary
+
+Overall Status: CRITICAL
+
+Primary blockers:
+- Core data services are down (postgres, redis). Multiple dependent containers are failing or restarting.
+- Persistent volume bind targets resolve to a host path that does not exist on this Windows filesystem, consistent with data-layer instability.
+- Security scanning for hypercode-core via Trivy is timing out, so current vulnerability posture for the largest image is unknown.
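The second blocker (bind-mount targets pointing at a missing host directory) is cheap to pre-flight before `docker compose up`. A minimal sketch; the directory names are an illustrative subset of the `HC_DATA_ROOT` volumes, not the full list:

```python
import os
import tempfile
from pathlib import Path


def missing_bind_targets(data_root: str, names: list[str]) -> list[str]:
    """Return the volume subdirectories that do not exist under data_root."""
    root = Path(data_root)
    return [n for n in names if not (root / n).is_dir()]


# Illustrative check against a throwaway directory standing in for HC_DATA_ROOT.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "grafana"))
    print(missing_bind_targets(tmp, ["grafana", "postgres", "redis"]))  # ['postgres', 'redis']
```

If the returned list is non-empty, create the directories (or fix `HC_DATA_ROOT`) before starting the stack; otherwise redis/minio fail exactly as observed below.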
+ +## Severity Scale + +- CRITICAL: service outage, data risk, or security visibility loss +- HIGH: major degradation, repeated restarts, or broad reliability impact +- MEDIUM: partial degradation or misconfiguration with limited blast radius +- LOW: minor issue, optimization, or hygiene + +## System Baselines (Snapshot) + +### Docker Engine + +- Docker Desktop: 4.69.0 +- Engine: 29.4.0 (linux/amd64, WSL2 kernel 6.6.87.2) +- Context: desktop-linux +- Resources: 6 CPUs, 4.803 GiB memory + +### Docker Daemon Responsiveness + +- docker ps latency: ~249 ms +- docker images latency: ~853 ms +- docker inspect latency: ~299 ms + +### Disk Utilization (docker system df) + +- Images: 71.82 GB (reclaimable 22.08 GB) +- Build cache: 14.81 GB +- Local volumes: 1.089 GB + +Notable large images: +- hypercode-core:latest ~13.9 GB +- hypercode-v24-celery-worker ~13.8 GB + +### Resource Utilization (docker stats --no-stream) + +Highest memory consumers: +- hypercode-core: ~553.5 MiB / 1.5 GiB (36%) +- hypercode-dashboard: ~72.5 MiB / 512 MiB (14%) +- broski-pets-bridge: ~64.5 MiB / 256 MiB (25%) + +Notable CPU spikes: +- celery-exporter: >100% at snapshot +- celery-worker: ~83% at snapshot + +## Container Health & Restart Behavior + +### CRITICAL + +- broski-bot + - Status: restarting + - RestartCount: 49 + - Log root cause: missing required database settings (db_password) +- hypercode-core + - Status: running but failing startup loops + - RestartCount: 35 + - Log root cause: cannot resolve postgres host; startup fails during DB metadata init +- redis (exited) + - Symptoms: shutdown errors and persistence write issues +- postgres (exited) + - Symptoms: repeated broken pipe / client disconnects then shutdown +- minio (exited) + - Symptoms: read/write quorum failures, indicates storage drives not online +- observability stack (prometheus, grafana, loki, tempo, promtail, alertmanager) are not currently running + +### HIGH + +- celery-exporter + - Status: running (health: starting) + - 
RestartCount: 123 + - Log root cause: cannot resolve redis:6379 (name resolution failure) +- celery-worker + - Status: running (health: starting) + - RestartCount: 73 + +### OK (Healthy) + +- broski-pets-bridge (healthy) +- healer-agent (healthy) +- hypercode-dashboard (healthy) +- hypercode-mcp-server (healthy) +- docker-socket-proxy, docker-socket-proxy-build (healthy) +- node-exporter, cadvisor (healthy) +- agent-x, hyper-architect, hyper-worker, hyper-observer, super-hyper-broski-agent (healthy) + +## Network Connectivity + +### Findings + +- From broski-pets-bridge container (agents-net + data-net): + - mcp-gateway:8099 reachable + - hypercode-core:8000 connection refused at test time + - redis:6379 DNS not resolvable (container is stopped, so DNS entry not present) + - postgres:5432 DNS not resolvable (container is stopped, so DNS entry not present) + +Impact: +- Any service requiring redis/postgres is unstable or down. + +## Volume Mount Integrity + +### CRITICAL + +Multiple Docker volumes are configured as local-driver bind mounts to: +- H:/HyperStation zone/HyperCode/volumes/ + +On this Windows filesystem, H:\HyperStation zone\HyperCode\volumes does not exist. + +Observed symptoms consistent with missing/invalid bind targets: +- redis log indicates failure saving RDB temp file under /data +- minio reports β€œno online disks found” and quorum failures + +## Port Exposure Validation + +### HIGH (Public binds on 0.0.0.0) + +The following running containers publish ports on all interfaces: +- agent-x: 8080/tcp (0.0.0.0 and [::]) +- hyper-architect: 8091/tcp (0.0.0.0 and [::]) +- hyper-observer: 8092/tcp (0.0.0.0 and [::]) +- hyper-worker: 8093/tcp (0.0.0.0 and [::]) + +Everything else observed is localhost-bound (127.0.0.1) or internal-only. 
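The `0.0.0.0` binds above can be caught mechanically: any published port whose host IP is not a loopback address is reachable from outside the machine. A sketch over a hypothetical `docker ps`-style snapshot (the bindings dict is illustrative):

```python
def publicly_bound(bindings: dict[str, str]) -> list[str]:
    # HostIp of 0.0.0.0 or [::] means "all interfaces"; only loopback is private.
    loopback = {"127.0.0.1", "::1"}
    return sorted(name for name, ip in bindings.items() if ip not in loopback)


# Illustrative snapshot of container -> published HostIp
bindings = {
    "agent-x": "0.0.0.0",
    "hyper-architect": "0.0.0.0",
    "hypercode-core": "127.0.0.1",
}
print(publicly_bound(bindings))  # ['agent-x', 'hyper-architect']
```

Wiring a check like this into the health report would flag regressions whenever a service's port binding drifts back to all interfaces.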
+ +## Security Vulnerabilities (Images) + +### HIGH (Security visibility gap) + +The always-on Trivy scanner container is running, but it is not producing the expected output file: +- Expected: reports/security/trivy-hypercode-core.json +- Current state: missing +- Scanner log shows: context deadline exceeded during analysis of hypercode-core:latest + +Impact: +- Current HIGH/CRITICAL vulnerability counts for hypercode-core cannot be confirmed from this environment snapshot. + +## Environment Variable Consistency + +Compose evaluation reports unset variables (defaulting to blank string): +- API_KEY +- DISCORD_TOKEN +- HYPERCODE_JWT_SECRET +- HYPERCODE_MEMORY_KEY +- MINIO_ROOT_USER +- MINIO_ROOT_PASSWORD + +Observed runtime impact: +- broski-bot fails because DB password resolves to blank. + +## Remediation Recommendations + +### CRITICAL (Do first) + +- Fix persistent volume bind targets: + - Create the missing host directory tree under H:\HyperStation zone\HyperCode\volumes\* + - Or update HC_DATA_ROOT so it points to an existing, Docker-shared path and recreate the volumes +- Restore data services: + - Bring up postgres + redis and confirm they become healthy before starting dependent services +- Fix required secrets/env: + - Ensure POSTGRES_PASSWORD (or DB_PASSWORD) is set so broski-bot can initialize its DB + +### HIGH + +- Reduce restart storms: + - celery-exporter and celery-worker should not restart repeatedly; validate redis DNS + broker URL and network attachment +- Restore observability stack: + - Bring up prometheus/grafana/loki/tempo/promtail/alertmanager to regain metrics/logs/traces +- Close public port binds unless explicitly intended: + - Consider binding agent ports to 127.0.0.1 or moving them behind a gateway + +### MEDIUM + +- Trivy stability: + - Increase Trivy timeout for hypercode-core scans and disable secret scanning for image scans if needed + - Store the generated JSON under reports/security and track 0 CRITICAL as a release gate + +## 
Next Health Check Run + +Re-run these after remediation to confirm improvement: +- docker compose ps +- docker stats --no-stream +- curl http://127.0.0.1:8000/health +- curl http://127.0.0.1:8098/health +- docker logs broski-bot (should show DB init success) + diff --git a/scripts/health_report.py b/scripts/health_report.py new file mode 100644 index 0000000..76b76f3 --- /dev/null +++ b/scripts/health_report.py @@ -0,0 +1,239 @@ +#!/usr/bin/env python3 +""" +NemoClaw Health Reporter β€” run from repo root. + python scripts/health_report.py + python scripts/health_report.py --webhook # also POST to Discord webhook +""" +from __future__ import annotations + +import ast +import json +import os +import sys +import subprocess +import argparse +import urllib.request +import urllib.error +from pathlib import Path +from datetime import datetime, timezone + +ROOT = Path(__file__).parent.parent +sys.path.insert(0, str(ROOT / "agents" / "broski-nemoclaw-agent")) + +# Windows UTF-8 fix for stdout + subprocess +if hasattr(sys.stdout, "reconfigure"): + sys.stdout.reconfigure(encoding="utf-8", errors="replace") # type: ignore[union-attr] +os.environ.setdefault("PYTHONIOENCODING", "utf-8") +os.environ.setdefault("PYTHONUTF8", "1") + +try: + from analyzer import BROskiAnalyzer, Issue +except ImportError as e: + print(f"❌ Cannot import BROskiAnalyzer: {e}") + sys.exit(1) + +REPORTS_DIR = ROOT / "reports" / "broski-analysis" +REPORTS_DIR.mkdir(parents=True, exist_ok=True) + +SKIP_DIRS = frozenset({ + ".git", ".venv", "venv", "__pycache__", "node_modules", + "backups", "reports", "htmlcov", "tests", ".mypy_cache", + "k8s", "helm", "docs", ".next", "dist", "build", +}) + +GRADE_MAP = [ + (95, "S", "LEGENDARY", 0x00FF88, "πŸ†"), + (80, "A", "CLEAN", 0x00BFFF, "βœ…"), + (65, "B", "GOOD", 0xFFD700, "πŸ‘"), + (50, "C", "NEEDS WORK", 0xFF8C00, "⚠️"), + (0, "D", "SOS MODE", 0xFF0000, "πŸ†˜"), +] + + +def grade_for(score: int) -> tuple[str, str, int, str]: + for threshold, letter, label, 
colour, emoji in GRADE_MAP:
        if score >= threshold:
            return letter, label, colour, emoji
    return "D", "SOS MODE", 0xFF0000, "🆘"


def ast_scan(root: Path, files: list[Path]) -> list[dict]:
    """Flag syntax errors and bare except handlers via the AST."""
    issues = []
    for fp in files:
        if any(d in fp.parts for d in SKIP_DIRS):
            continue
        try:
            # Explicit UTF-8 — matches the Windows-safe policy above
            tree = ast.parse(fp.read_text(encoding="utf-8", errors="ignore"))
        except SyntaxError as e:
            issues.append({"file": str(fp.relative_to(root)), "line": e.lineno, "msg": str(e), "severity": "critical"})
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                issues.append({"file": str(fp.relative_to(root)), "line": node.lineno, "msg": "Bare except — catches everything", "severity": "medium"})
    return issues


def _ruff_utf8(root: Path) -> list[Issue]:
    """Run ruff with explicit UTF-8 encoding — Windows safe."""
    try:
        r = subprocess.run(
            ["ruff", "check", ".", "--output-format", "json"],
            capture_output=True, cwd=root, timeout=120,
            encoding="utf-8", errors="replace",
        )
        raw = json.loads(r.stdout or "[]")
        issues = []
        for item in raw:
            loc = item.get("location") or {}
            issues.append(Issue(
                file=str(item.get("filename", "")),
                line=int(loc["row"]) if loc.get("row") else None,
                severity="high" if str(item.get("code", "")).startswith(("S", "E9", "F8")) else "medium",
                category=f"lint:{item.get('code', '')}",
                message=str(item.get("message", "")),
                auto_fixable=item.get("fix") is not None,
            ))
        return issues
    except Exception as exc:
        print(f"[warn] ruff scan failed: {exc}")
        return []


def run_scan() -> dict:
    analyzer = BROskiAnalyzer(ROOT)
    files = analyzer.py_files()
    ruff_issues = _ruff_utf8(ROOT)
    ast_issues = ast_scan(ROOT, files)

    total = len(ruff_issues) + len(ast_issues)
    # 40-point penalty per issue-per-file, clamped to 0..100
    score = max(0, min(100, round(100 - (total / max(len(files), 1)) * 40)))
    letter, label, colour, emoji = grade_for(score)

    top_lint = [
        {"file": 
i.file.replace(str(ROOT) + "/", "").replace(str(ROOT) + "\\", ""), "line": i.line, "msg": i.message[:70]}
        for i in ruff_issues[:5]
    ]
    top_ast = ast_issues[:3]

    report = {
        "score": score,
        "grade": letter,
        "label": label,
        "emoji": emoji,
        "colour": colour,
        "files_scanned": len(files),
        "lint_issues": len(ruff_issues),
        "ast_issues": len(ast_issues),
        "total_issues": total,
        "top_lint": top_lint,
        "top_ast": top_ast,
        "scanned_at": datetime.now(timezone.utc).isoformat(),
    }

    # Save latest — UTF-8 is required, the payload contains emoji
    latest = REPORTS_DIR / "latest.json"
    latest.write_text(json.dumps(report, indent=2), encoding="utf-8")

    return report


def print_report(r: dict) -> None:
    lines = [
        "",
        "══════════════════════════════════════",
        " 🤖 NemoClaw Code Health Report",
        "══════════════════════════════════════",
        f" Files scanned : {r['files_scanned']}",
        f" Lint issues   : {r['lint_issues']}",
        f" AST issues    : {r['ast_issues']}",
        f" Total issues  : {r['total_issues']}",
        f" Health Score  : {r['score']}/100",
        f" Grade         : {r['grade']} — {r['label']} {r['emoji']}",
        "══════════════════════════════════════",
    ]
    if r["top_lint"]:
        lines.append("\nTOP LINT ISSUES:")
        for i in r["top_lint"]:
            lines.append(f" [LINT] {i['file']}:{i['line']} — {i['msg']}")
    if r["top_ast"]:
        lines.append("\nAST ISSUES:")
        for i in r["top_ast"]:
            lines.append(f" [AST] {i['file']}:{i['line']} — {i['msg']}")
    lines.append("")
    print("\n".join(lines))


def post_discord_webhook(r: dict, webhook_url: str) -> None:
    letter, label, colour, emoji = r["grade"], r["label"], r["colour"], r["emoji"]
    score = r["score"]

    top_issues = ""
    for i in (r["top_lint"] + r["top_ast"])[:5]:
        f = i.get("file", "?")
        ln = i.get("line", "?")
        msg = i.get("msg", "")[:55]
        top_issues += f"`{f}:{ln}` — {msg}\n"

    embed = {
        "title": f"{emoji} NemoClaw Health — {letter} | {label}",
        "color": colour,
        "fields": [
            {"name": "Score", "value": f"**{score}/100**", "inline": True},
            {"name": 
"Grade", "value": f"**{letter} — {label}**", "inline": True},
            {"name": "Files", "value": str(r["files_scanned"]), "inline": True},
            {"name": "Lint", "value": str(r["lint_issues"]), "inline": True},
            {"name": "AST", "value": str(r["ast_issues"]), "inline": True},
            {"name": "Total", "value": str(r["total_issues"]), "inline": True},
        ],
        "footer": {"text": f"HyperCode V2.4 • {r['scanned_at'][:19]}Z"},
    }
    # Always include the field — "None 🎉" when the scan is clean
    embed["fields"].append({"name": "Top Issues", "value": top_issues or "None 🎉", "inline": False})

    payload = json.dumps({"embeds": [embed]}).encode()
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"✅ Discord webhook posted (HTTP {resp.status})")
    except urllib.error.HTTPError as e:
        print(f"❌ Discord webhook failed: HTTP {e.code}")
    except Exception as e:
        print(f"❌ Discord webhook error: {e}")


def main() -> None:
    parser = argparse.ArgumentParser(description="NemoClaw Health Reporter")
    parser.add_argument("--webhook", action="store_true", help="Post result to DISCORD_HEALTH_WEBHOOK")
    parser.add_argument("--json", action="store_true", help="Output raw JSON only")
    args = parser.parse_args()

    if not args.json:
        print("🔍 NemoClaw scanning...")

    report = run_scan()

    if args.json:
        print(json.dumps(report, indent=2))
        return

    print_report(report)

    if args.webhook:
        webhook = os.getenv("DISCORD_HEALTH_WEBHOOK", "")
        if webhook:
            post_discord_webhook(report, webhook)
        else:
            print("⚠️ DISCORD_HEALTH_WEBHOOK not set — skipping webhook post")

    # Exit non-zero on grade D so CI can gate on health
    sys.exit(1 if report["grade"] == "D" else 0)


if __name__ == "__main__":
    main()

diff --git a/scripts/pre-build-check.sh b/scripts/pre-build-check.sh
new file mode 100644
index 0000000..d4df3d3
--- /dev/null
+++ b/scripts/pre-build-check.sh
@@ -0,0 +1,78 @@ 
+#!/bin/bash +# ============================================================ +# HyperCode Pre-Build Disk + Memory Safety Check +# Run this before any docker build to prevent OOM/OOD crashes +# Agent X should call this before spinning up new builds +# ============================================================ + +set -e + +RED='\033[0;31m' +YELLOW='\033[1;33m' +GREEN='\033[0;32m' +BOLD='\033[1m' +NC='\033[0m' + +MIN_DISK_GB=${MIN_DISK_GB:-15} +WARN_DISK_GB=${WARN_DISK_GB:-25} +MIN_MEM_MB=${MIN_MEM_MB:-1024} + +echo "" +echo -e "${BOLD}⚑ HyperCode Pre-Build Safety Check${NC}" +echo "=======================================" + +# ── Disk Check ────────────────────────────────────────────── +AVAILABLE_KB=$(df / | awk 'NR==2 {print $4}') +AVAILABLE_GB=$((AVAILABLE_KB / 1024 / 1024)) + +echo -e "πŸ’Ύ Disk available: ${BOLD}${AVAILABLE_GB}GB${NC}" + +if [ "$AVAILABLE_GB" -lt "$MIN_DISK_GB" ]; then + echo -e "${RED}❌ ABORT: Less than ${MIN_DISK_GB}GB free (${AVAILABLE_GB}GB available)${NC}" + echo -e "${YELLOW} Run: docker system prune -f && docker builder prune -f --keep-storage=5gb${NC}" + exit 1 +elif [ "$AVAILABLE_GB" -lt "$WARN_DISK_GB" ]; then + echo -e "${YELLOW}⚠️ WARNING: Only ${AVAILABLE_GB}GB free β€” consider pruning before building${NC}" + echo -e "${YELLOW} Run: docker image prune -f --filter until=48h${NC}" +else + echo -e "${GREEN}βœ… Disk OK${NC}" +fi + +# ── Docker Disk Check ──────────────────────────────────────── +echo "" +echo "🐳 Docker disk usage:" +docker system df --format " {{.Type}}: {{.Size}} (reclaimable: {{.Reclaimable}})" 2>/dev/null || echo " (docker system df unavailable)" + +# ── Memory Check ──────────────────────────────────────────── +FREE_MEM_MB=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo 2>/dev/null || echo "0") + +echo "" +if [ "$FREE_MEM_MB" -gt 0 ]; then + echo -e "🧠 Memory available: ${BOLD}${FREE_MEM_MB}MB${NC}" + if [ "$FREE_MEM_MB" -lt "$MIN_MEM_MB" ]; then + echo -e "${RED}❌ ABORT: Less than ${MIN_MEM_MB}MB RAM 
free (${FREE_MEM_MB}MB available)${NC}"
+        exit 1
+    else
+        echo -e "${GREEN}✅ Memory OK${NC}"
+    fi
+else
+    echo -e "${YELLOW}⚠️ /proc/meminfo unavailable — skipping memory check${NC}"
+fi
+
+# ── Running Container Count ──────────────────────────────────
+RUNNING=$(docker ps -q | wc -l | tr -d ' ')
+echo ""
+echo -e "📦 Running containers: ${BOLD}${RUNNING}${NC}"
+if [ "$RUNNING" -gt 30 ]; then
+    echo -e "${YELLOW}⚠️ High container count — check if all are needed before building${NC}"
+fi
+
+# ── Stale Image Warning ────────────────────────────────────
+DANGLING=$(docker images -f "dangling=true" -q | wc -l | tr -d ' ')
+if [ "$DANGLING" -gt 5 ]; then
+    echo ""
+    echo -e "${YELLOW}⚠️ ${DANGLING} dangling images detected — run: docker image prune -f${NC}"
+fi
+
+echo ""
+echo -e "${GREEN}${BOLD}✅ Pre-build checks passed — safe to build!${NC}"
+echo "======================================="
+echo ""
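Side note on the grading math in `health_report.py`: the score is 100 minus a 40-point penalty per issue-per-file, clamped to 0–100, then mapped through `GRADE_MAP`. A standalone sketch for playing with those numbers — the names `health_score` and `grade` here are illustrative, not the repo's API:

```python
def health_score(total_issues: int, files_scanned: int) -> int:
    # Same formula as run_scan(): 40-point penalty per issue-per-file, clamped 0..100
    return max(0, min(100, round(100 - (total_issues / max(files_scanned, 1)) * 40)))

# Thresholds mirror GRADE_MAP in health_report.py (colour/emoji dropped for brevity)
GRADES = [(95, "S"), (80, "A"), (65, "B"), (50, "C"), (0, "D")]

def grade(score: int) -> str:
    # First threshold the score clears wins, same as grade_for()
    return next(letter for threshold, letter in GRADES if score >= threshold)

print(grade(health_score(0, 100)))    # zero issues -> score 100 -> S
print(grade(health_score(50, 100)))   # 0.5 issues/file -> score 80 -> A
print(grade(health_score(200, 100)))  # 2 issues/file -> score 20 -> D
```

Handy for tuning the 40-point penalty or the grade cut-offs without rerunning a full scan, Bro.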