diff --git a/README.md b/README.md index e1d1e618..1252cc28 100644 --- a/README.md +++ b/README.md @@ -28,20 +28,21 @@ Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own. -| Category | Skill | What It Does | -|----------|-------|--------------| -| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | -| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | -| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | -| **Analysis** | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | -| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | -| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | -| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | -| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | -| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | -| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | -| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | -| **Integrations** | 
[`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | +| Category | Skill | What It Does | Status | +|----------|-------|--------------|--------| +| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | 🧪 Testing | +| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 Planned | +| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 Planned | +| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ Ready | +| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 Planned | +| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 Planned | +| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 Planned | +| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 Planned | +| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | 📐 Planned | +| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | 📐 Planned | +| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | 📐 Planned | +| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | 
📐 Planned | +| **Integrations** | [`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | 📐 Planned | > **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery. diff --git a/docs/detection-protocol.md b/docs/detection-protocol.md new file mode 100644 index 00000000..865c88b4 --- /dev/null +++ b/docs/detection-protocol.md @@ -0,0 +1,94 @@ +# Detection Skill Protocol + +Communication protocol for DeepCamera detection skills integrated with SharpAI Aegis. + +## Transport + +- **stdin** (Aegis → Skill): frame events and commands +- **stdout** (Skill → Aegis): detection results, ready/error events +- **stderr**: logging only — ignored by Aegis data parser + +Format: **JSON Lines** (one JSON object per line, newline-delimited). + +## Events + +### Ready (Skill → Aegis) + +Emitted after model loads successfully. `fps` reflects the skill's configured processing rate. `available_sizes` lists the model variants the skill supports. + +```jsonl +{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5, "available_sizes": ["nano", "small", "medium", "large"]} +``` + +### Frame (Aegis → Skill) + +Instruction to analyze a specific frame. `frame_id` is an incrementing integer used to correlate request/response. + +```jsonl +{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080} +``` + +### Detections (Skill → Aegis) + +Results of frame analysis. Must echo the same `frame_id` received in the frame event. 
+ +```jsonl +{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [ + {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}, + {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]} +]} +``` + +### Error (Skill → Aegis) + +Indicates a processing error. `retriable: true` means Aegis can send the next frame. + +```jsonl +{"event": "error", "frame_id": 42, "message": "Inference error: ...", "retriable": true} +``` + +### Stop (Aegis → Skill) + +Graceful shutdown command. + +```jsonl +{"command": "stop"} +``` + +## Data Formats + +### Bounding Boxes + +**Format**: `[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy). + +| Field | Type | Description | +|-------|------|-------------| +| `x_min` | int | Left edge (pixels) | +| `y_min` | int | Top edge (pixels) | +| `x_max` | int | Right edge (pixels) | +| `y_max` | int | Bottom edge (pixels) | + +Coordinates are in the original image space (not normalized). + +### Timestamps + +ISO 8601 format: `2026-03-01T14:30:00Z` + +### Frame Transfer + +Frames are written to `/tmp/aegis_detection/frame_{camera_id}.jpg` as JPEG files with recycled per-camera filenames (overwritten each cycle). The `frame_path` in the frame event is the absolute path to the JPEG file. + +## FPS Presets + +| Preset | FPS | Use Case | +|--------|-----|----------| +| Ultra Low | 0.2 | Battery saver | +| Low | 0.5 | Passive surveillance | +| Normal | 1 | Standard monitoring | +| Active | 3 | Active area monitoring | +| High | 5 | Security-critical zones | +| Real-time | 15 | Live tracking | + +## Backpressure + +The protocol is **request-response**: Aegis sends one frame, waits for the detection result, then sends the next. This provides natural backpressure — if the skill is slow, Aegis automatically drops frames (always uses the latest available frame). 
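The request-response loop described above is simple to implement on the skill side. Below is a minimal, hypothetical sketch in Python: the inference call is a placeholder, and only the event shapes are taken from this protocol.

```python
import json
import sys

def emit(obj):
    # Skill -> Aegis: one JSON object per line on stdout
    print(json.dumps(obj), flush=True)

def run_model(frame_path):
    # Placeholder for real inference; bbox is [x_min, y_min, x_max, y_max] in pixels
    return [{"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}]

def main(stdin=sys.stdin):
    # Announce readiness once the (hypothetical) model has loaded
    emit({"event": "ready", "model": "example", "device": "cpu", "classes": 80, "fps": 1})
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("command") == "stop":
            break  # Graceful shutdown
        if msg.get("event") != "frame":
            continue
        try:
            emit({
                "event": "detections",
                "frame_id": msg["frame_id"],  # Echo for request/response correlation
                "camera_id": msg["camera_id"],
                "timestamp": msg["timestamp"],
                "objects": run_model(msg["frame_path"]),
            })
        except Exception as exc:
            # retriable: true tells Aegis it may send the next frame
            emit({"event": "error", "frame_id": msg.get("frame_id"),
                  "message": f"Inference error: {exc}", "retriable": True})
```

Because the loop emits exactly one reply per frame event, backpressure falls out naturally: Aegis will not send the next frame until the previous one is answered. A real skill would load weights once at startup and invoke `main()` under an `if __name__ == "__main__":` guard in `scripts/main.py`.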
diff --git a/docs/legacy-applications.md b/docs/legacy-applications.md index 6db48503..cc0a82fd 100644 --- a/docs/legacy-applications.md +++ b/docs/legacy-applications.md @@ -7,7 +7,7 @@ ## Application 1: Self-supervised Person Recognition (REID) for Intruder Detection -SharpAI yolov7_reid is an open source python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector_cpu.py) +SharpAI yolov7_reid is an open-source Python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector.py) It uses YOLOv7 as the person detector, FastReID for person feature extraction, Milvus as the local vector database for self-supervised learning to identify unseen persons, and Label Studio to host images locally for further use such as labeling data and training your own classifier. It also integrates with Home Assistant to bring AI capabilities to the smart home. diff --git a/docs/skill-development.md b/docs/skill-development.md index d9d0cddc..a3fb8563 100644 --- a/docs/skill-development.md +++ b/docs/skill-development.md @@ -11,7 +11,13 @@ A skill is a self-contained folder that provides an AI capability to [SharpAI Ae ``` skills/<category>/<skill-name>/ ├── SKILL.md # Manifest + setup instructions -├── requirements.txt # Python dependencies +├── config.yaml # Configuration schema for Aegis UI +├── deploy.sh # Zero-assumption installer +├── requirements.txt # Default Python dependencies +├── requirements_cuda.txt # NVIDIA GPU dependencies +├── requirements_rocm.txt # AMD GPU dependencies +├── requirements_mps.txt # Apple Silicon dependencies +├── requirements_cpu.txt # CPU-only dependencies ├── scripts/ │ └── main.py # Entry point ├── assets/ @@ -68,6 +74,70 @@ LLM agent can read and execute.
| `url` | URL input with validation | Server address | | `camera_select` | Camera picker | Target cameras | +## config.yaml — Configuration Schema + +Defines user-configurable options shown in the Aegis Skills UI. Parsed by `parseConfigYaml()`. + +```yaml +params: + - key: auto_start + label: Auto Start + type: boolean + default: false + description: "Start automatically on Aegis launch" + + - key: model_size + label: Model Size + type: select + default: nano + description: "Choose model variant" + options: + - { value: nano, label: "Nano (fastest)" } + - { value: small, label: "Small (balanced)" } + + - key: confidence + label: Confidence + type: number + default: 0.5 + description: "Min confidence (0.1–1.0)" +``` + +### Reserved Keys + +| Key | Type | Behavior | +|-----|------|----------| +| `auto_start` | boolean | Aegis auto-starts the skill on boot when `true` | + +## deploy.sh — Zero-Assumption Installer + +Bootstraps the environment from scratch. Must handle: + +1. **Find Python** — check system → conda → pyenv +2. **Create venv** — isolated `.venv/` inside skill directory +3. **Detect GPU** — CUDA → ROCm → MPS → CPU fallback +4. **Install deps** — from matching `requirements_<backend>.txt` +5. **Verify** — import test + +Emit JSONL progress for Aegis UI: +```bash +echo '{"event": "progress", "stage": "gpu", "backend": "mps"}' +echo '{"event": "complete", "backend": "mps", "message": "Installed!"}' +``` + +## Environment Variables + +Aegis injects these into every skill process: + +| Variable | Description | +|----------|-------------| +| `AEGIS_SKILL_ID` | Skill identifier | +| `AEGIS_SKILL_PARAMS` | JSON string of user config values | +| `AEGIS_GATEWAY_URL` | LLM gateway URL | +| `AEGIS_VLM_URL` | VLM server URL | +| `AEGIS_LLM_MODEL` | Active LLM model name | +| `AEGIS_VLM_MODEL` | Active VLM model name | +| `PYTHONUNBUFFERED` | Set to `1` for real-time output | + ## JSON Lines Protocol Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object.
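The environment-variable contract above can be consumed with a small helper. The sketch below is illustrative (the helper names and default values are not part of the Aegis API): it merges user overrides from `AEGIS_SKILL_PARAMS` over the skill's own `config.yaml` defaults.

```python
import json
import os

# Defaults should mirror the skill's config.yaml schema; these keys are examples
DEFAULTS = {"auto_start": False, "model_size": "nano", "confidence": 0.5}

def load_params(env=os.environ):
    """Merge user overrides from AEGIS_SKILL_PARAMS over config.yaml defaults."""
    try:
        overrides = json.loads(env.get("AEGIS_SKILL_PARAMS", "{}"))
    except json.JSONDecodeError:
        overrides = {}  # Malformed params: run with defaults rather than crash
    return {**DEFAULTS, **overrides}

def load_context(env=os.environ):
    """Collect the Aegis-injected identifiers and service endpoints."""
    return {
        "skill_id": env.get("AEGIS_SKILL_ID"),  # None => standalone mode
        "gateway_url": env.get("AEGIS_GATEWAY_URL"),
        "vlm_url": env.get("AEGIS_VLM_URL"),
        "params": load_params(env),
    }
```

Treating a missing `AEGIS_SKILL_ID` as standalone mode matches the benchmark skill's `IS_SKILL_MODE` convention.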
@@ -108,6 +178,36 @@ Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object. echo '{"event": "frame", "camera_id": "test", "frame_path": "/tmp/test.jpg"}' | python scripts/main.py ``` +## skills.json — Catalog Registration + +Register skills in the repo root `skills.json`: + +```json +{ + "skills": [ + { + "id": "my-skill", + "name": "My Skill", + "description": "What it does", + "category": "detection", + "tags": ["tag1"], + "path": "skills/detection/my-skill", + "status": "testing", + "platforms": ["darwin-arm64", "linux-x64"] + } + ] +} +``` + +### Status Values + +| Status | Emoji | Meaning | +|--------|-------|---------| +| `ready` | ✅ | Production-quality, tested | +| `testing` | 🧪 | Functional, needs validation | +| `experimental` | ⚗️ | Proof of concept | +| `planned` | 📐 | Not yet implemented | + ## Reference See [`skills/detection/yolo-detection-2026/`](../skills/detection/yolo-detection-2026/) for a complete working example. diff --git a/skills.json b/skills.json index 2438d590..50f50d66 100644 --- a/skills.json +++ b/skills.json @@ -48,6 +48,54 @@ "ui_unlocks": [ "benchmark_report" ] + }, + { + "id": "yolo-detection-2026", + "name": "YOLO 2026 Object Detection", + "description": "State-of-the-art real-time object detection — 80+ COCO classes, bounding box overlays, multi-size model selection.", + "version": "1.0.0", + "category": "detection", + "path": "skills/detection/yolo-detection-2026", + "tags": [ + "detection", + "yolo", + "object-detection", + "real-time", + "coco" + ], + "platforms": [ + "linux-x64", + "linux-arm64", + "darwin-arm64", + "darwin-x64", + "win-x64" + ], + "requirements": { + "python": ">=3.9", + "ram_gb": 2 + }, + "capabilities": [ + "live_detection", + "bbox_overlay" + ], + "ui_unlocks": [ + "detection_overlay", + "detection_results" + ], + "fps_presets": [ + 0.2, + 0.5, + 1, + 3, + 5, + 15 + ], + "model_sizes": [ + "nano", + "small", + "medium", + "large" + ] } ] } \ No newline at end of file diff --git 
a/skills/analysis/home-security-benchmark/SKILL.md b/skills/analysis/home-security-benchmark/SKILL.md index c5fba21c..f4fb0b68 100644 --- a/skills/analysis/home-security-benchmark/SKILL.md +++ b/skills/analysis/home-security-benchmark/SKILL.md @@ -5,7 +5,7 @@ version: 2.0.0 category: analysis runtime: node entry: scripts/run-benchmark.cjs -install: none +install: npm --- # Home Security AI Benchmark @@ -14,7 +14,7 @@ Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** acr ## Setup -**No installation required.** This skill has zero external dependencies — it uses only Node.js built-in modules. No `npm install` needed. +**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching. Entry script: `scripts/run-benchmark.cjs` @@ -53,7 +53,13 @@ node scripts/run-benchmark.cjs --no-open | Variable | Default | Description | |----------|---------|-------------| | `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) | +| `AEGIS_LLM_URL` | — | Direct llama-server LLM endpoint | +| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (builtin, openai, etc.) | +| `AEGIS_LLM_MODEL` | — | LLM model name | +| `AEGIS_LLM_API_KEY` | — | API key for cloud LLM providers | +| `AEGIS_LLM_BASE_URL` | — | Cloud provider base URL (e.g. `https://api.openai.com/v1`) | | `AEGIS_VLM_URL` | *(disabled)* | VLM server base URL | +| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID | | `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) | | `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config | @@ -129,5 +135,6 @@ Results are saved to `~/.aegis-ai/benchmarks/` as JSON. 
An HTML report with cros ## Requirements - Node.js ≥ 18 -- Running LLM server (llama-cpp, vLLM, or any OpenAI-compatible API) +- `npm install` (for `openai` SDK dependency) +- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint) - Optional: Running VLM server for scene analysis tests (35 tests) diff --git a/skills/analysis/home-security-benchmark/package-lock.json b/skills/analysis/home-security-benchmark/package-lock.json new file mode 100644 index 00000000..9f787d7f --- /dev/null +++ b/skills/analysis/home-security-benchmark/package-lock.json @@ -0,0 +1,37 @@ +{ + "name": "home-security-benchmark", + "version": "1.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "home-security-benchmark", + "version": "1.0.0", + "license": "ISC", + "dependencies": { + "openai": "^6.27.0" + } + }, + "node_modules/openai": { + "version": "6.27.0", + "resolved": "https://registry.npmjs.org/openai/-/openai-6.27.0.tgz", + "integrity": "sha512-osTKySlrdYrLYTt0zjhY8yp0JUBmWDCN+Q+QxsV4xMQnnoVFpylgKGgxwN8sSdTNw0G4y+WUXs4eCMWpyDNWZQ==", + "license": "Apache-2.0", + "bin": { + "openai": "bin/cli" + }, + "peerDependencies": { + "ws": "^8.18.0", + "zod": "^3.25 || ^4.0" + }, + "peerDependenciesMeta": { + "ws": { + "optional": true + }, + "zod": { + "optional": true + } + } + } + } +} diff --git a/skills/analysis/home-security-benchmark/package.json b/skills/analysis/home-security-benchmark/package.json new file mode 100644 index 00000000..b65304c4 --- /dev/null +++ b/skills/analysis/home-security-benchmark/package.json @@ -0,0 +1,16 @@ +{ + "name": "home-security-benchmark", + "version": "1.0.0", + "description": "", + "main": "index.js", + "scripts": { + "test": "echo \"Error: no test specified\" && exit 1" + }, + "keywords": [], + "author": "", + "license": "ISC", + "type": "commonjs", + "dependencies": { + "openai": "^6.27.0" + } +} diff --git a/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs 
b/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs index 6fd2b46c..b45306cd 100644 --- a/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs +++ b/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs @@ -80,14 +80,45 @@ try { skillParams = JSON.parse(process.env.AEGIS_SKILL_PARAMS || '{}'); } catch // Aegis provides config via env vars; CLI args are fallback for standalone const GATEWAY_URL = process.env.AEGIS_GATEWAY_URL || getArg('gateway', 'http://localhost:5407'); +const LLM_URL = process.env.AEGIS_LLM_URL || getArg('llm', ''); // Direct llama-server LLM port const VLM_URL = process.env.AEGIS_VLM_URL || getArg('vlm', ''); const RESULTS_DIR = getArg('out', path.join(os.homedir(), '.aegis-ai', 'benchmarks')); const IS_SKILL_MODE = !!process.env.AEGIS_SKILL_ID; const NO_OPEN = args.includes('--no-open') || skillParams.noOpen || false; const TEST_MODE = skillParams.mode || 'full'; -const TIMEOUT_MS = 30000; +const IDLE_TIMEOUT_MS = 30000; // Streaming idle timeout — resets on each received token const FIXTURES_DIR = path.join(__dirname, '..', 'fixtures'); +// API type and model info from Aegis (or defaults for standalone) +const LLM_API_TYPE = process.env.AEGIS_LLM_API_TYPE || 'openai'; +const LLM_MODEL = process.env.AEGIS_LLM_MODEL || ''; +const LLM_API_KEY = process.env.AEGIS_LLM_API_KEY || ''; +const LLM_BASE_URL = process.env.AEGIS_LLM_BASE_URL || ''; +const VLM_API_TYPE = process.env.AEGIS_VLM_API_TYPE || 'openai-compatible'; +const VLM_MODEL = process.env.AEGIS_VLM_MODEL || ''; + +// ─── OpenAI SDK Clients ────────────────────────────────────────────────────── +const OpenAI = require('openai'); + +// Resolve LLM base URL — priority: cloud provider → direct llama-server → gateway +const strip = (u) => u.replace(/\/v1\/?$/, ''); +const llmBaseUrl = LLM_BASE_URL + ? `${strip(LLM_BASE_URL)}/v1` + : LLM_URL + ? 
`${strip(LLM_URL)}/v1` + : `${GATEWAY_URL}/v1`; + +const llmClient = new OpenAI({ + apiKey: LLM_API_KEY || 'not-needed', // Local servers don't require auth + baseURL: llmBaseUrl, +}); + +// VLM client — always local llama-server +const vlmClient = VLM_URL ? new OpenAI({ + apiKey: 'not-needed', + baseURL: `${strip(VLM_URL)}/v1`, +}) : null; + // ─── Skill Protocol: JSON lines on stdout, human text on stderr ────────────── /** @@ -127,44 +158,95 @@ const results = { }; async function llmCall(messages, opts = {}) { - const body = { messages, stream: false }; - if (opts.maxTokens) body.max_tokens = opts.maxTokens; - if (opts.temperature !== undefined) body.temperature = opts.temperature; - if (opts.tools) body.tools = opts.tools; - - // Strip trailing /v1 from VLM_URL to avoid double-path (e.g. host:5405/v1/v1/...) - const vlmBase = VLM_URL ? VLM_URL.replace(/\/v1\/?$/, '') : ''; - const url = opts.vlm ? `${vlmBase}/v1/chat/completions` : `${GATEWAY_URL}/v1/chat/completions`; - const response = await fetch(url, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify(body), - signal: AbortSignal.timeout(opts.timeout || TIMEOUT_MS), - }); - - if (!response.ok) { - const errBody = await response.text().catch(() => ''); - throw new Error(`HTTP ${response.status}: ${errBody.slice(0, 200)}`); + // Select the appropriate OpenAI client (LLM or VLM) + const client = opts.vlm ? vlmClient : llmClient; + if (!client) { + throw new Error(opts.vlm ? 
'VLM client not configured' : 'LLM client not configured'); } - const data = await response.json(); - const content = data.choices?.[0]?.message?.content || ''; - const toolCalls = data.choices?.[0]?.message?.tool_calls || null; - const usage = data.usage || {}; - - // Track token totals - results.tokenTotals.prompt += usage.prompt_tokens || 0; - results.tokenTotals.completion += usage.completion_tokens || 0; - results.tokenTotals.total += usage.total_tokens || 0; - - // Capture model name from first response - if (opts.vlm) { - if (!results.model.vlm && data.model) results.model.vlm = data.model; - } else { - if (!results.model.name && data.model) results.model.name = data.model; + const model = opts.model || (opts.vlm ? VLM_MODEL : LLM_MODEL) || undefined; + + // Build request params + const params = { + messages, + stream: true, + ...(model && { model }), + ...(opts.temperature !== undefined && { temperature: opts.temperature }), + ...(opts.maxTokens && { max_completion_tokens: opts.maxTokens }), + ...(opts.tools && { tools: opts.tools }), + }; + + // Use an AbortController with idle timeout that resets on each streamed chunk. 
+ const controller = new AbortController(); + const idleMs = opts.timeout || IDLE_TIMEOUT_MS; + let idleTimer = setTimeout(() => controller.abort(), idleMs); + const resetIdle = () => { clearTimeout(idleTimer); idleTimer = setTimeout(() => controller.abort(), idleMs); }; + + try { + const stream = await client.chat.completions.create(params, { + signal: controller.signal, + }); + + let content = ''; + let reasoningContent = ''; + let toolCalls = null; + let model = ''; + let usage = {}; + let tokenCount = 0; + + for await (const chunk of stream) { + resetIdle(); + + if (chunk.model) model = chunk.model; + + const delta = chunk.choices?.[0]?.delta; + if (delta?.content) content += delta.content; + if (delta?.reasoning_content) reasoningContent += delta.reasoning_content; + if (delta?.content || delta?.reasoning_content) { + tokenCount++; + if (tokenCount % 100 === 0) { + log(` … ${tokenCount} tokens received`); + } + } + + if (delta?.tool_calls) { + if (!toolCalls) toolCalls = []; + for (const tc of delta.tool_calls) { + const idx = tc.index ?? 0; + if (!toolCalls[idx]) { + toolCalls[idx] = { id: tc.id, type: tc.type || 'function', function: { name: '', arguments: '' } }; + } + if (tc.function?.name) toolCalls[idx].function.name += tc.function.name; + if (tc.function?.arguments) toolCalls[idx].function.arguments += tc.function.arguments; + } + } + + if (chunk.usage) usage = chunk.usage; + } + + // If the model only produced reasoning_content (thinking) with no content, + // use the reasoning output as the response content for evaluation purposes. 
+ if (!content && reasoningContent) { + content = reasoningContent; + } + + // Track token totals + results.tokenTotals.prompt += usage.prompt_tokens || 0; + results.tokenTotals.completion += usage.completion_tokens || 0; + results.tokenTotals.total += usage.total_tokens || 0; + + // Capture model name from first response + if (opts.vlm) { + if (!results.model.vlm && model) results.model.vlm = model; + } else { + if (!results.model.name && model) results.model.name = model; + } + + return { content, toolCalls, usage, model }; + } finally { + clearTimeout(idleTimer); } - return { content, toolCalls, usage, model: data.model }; } function stripThink(text) { @@ -1675,28 +1757,33 @@ async function main() { log('╔══════════════════════════════════════════════════════════════════╗'); log('║ Home Security AI Benchmark Suite • DeepCamera / SharpAI ║'); log('╚══════════════════════════════════════════════════════════════════╝'); - log(` Gateway: ${GATEWAY_URL}`); - log(` VLM: ${VLM_URL || '(disabled — use --vlm URL to enable)'}`); + // Resolve the LLM endpoint that will actually be used + const effectiveLlmUrl = LLM_BASE_URL + ? LLM_BASE_URL.replace(/\/v1\/?$/, '') + : LLM_URL + ? LLM_URL.replace(/\/v1\/?$/, '') + : GATEWAY_URL; + + log(` LLM: ${LLM_API_TYPE} @ ${effectiveLlmUrl}${LLM_MODEL ? ' → ' + LLM_MODEL : ''}`); + log(` VLM: ${VLM_URL || '(disabled — use --vlm URL to enable)'}${VLM_MODEL ? ' → ' + VLM_MODEL : ''}`); log(` Results: ${RESULTS_DIR}`); - log(` Mode: ${IS_SKILL_MODE ? 'Aegis Skill' : 'Standalone'}`); + log(` Mode: ${IS_SKILL_MODE ? 
'Aegis Skill' : 'Standalone'} (streaming, ${IDLE_TIMEOUT_MS / 1000}s idle timeout)`); log(` Time: ${new Date().toLocaleString()}`); - // Healthcheck + // Healthcheck — ping the LLM endpoint via SDK try { - const ping = await fetch(`${GATEWAY_URL}/v1/chat/completions`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ messages: [{ role: 'user', content: 'ping' }], stream: false, max_tokens: 1 }), - signal: AbortSignal.timeout(15000), + const ping = await llmClient.chat.completions.create({ + ...(LLM_MODEL && { model: LLM_MODEL }), + messages: [{ role: 'user', content: 'ping' }], + max_completion_tokens: 5, }); - if (!ping.ok) throw new Error(`HTTP ${ping.status}`); - const data = await ping.json(); - results.model.name = data.model || 'unknown'; + results.model.name = ping.model || 'unknown'; log(` Model: ${results.model.name}`); } catch (err) { - log(`\n ❌ Cannot reach LLM gateway: ${err.message}`); - log(' Start the llama-cpp server and gateway, then re-run.\n'); - emit({ event: 'error', message: `Cannot reach LLM gateway: ${err.message}` }); + log(`\n ❌ Cannot reach LLM endpoint: ${err.message}`); + log(` Base URL: ${llmBaseUrl}`); + log(' Check that the LLM server is running.\n'); + emit({ event: 'error', message: `Cannot reach LLM endpoint: ${err.message}` }); process.exit(1); } diff --git a/skills/detection/yolo-detection-2026/SKILL.md b/skills/detection/yolo-detection-2026/SKILL.md index 1cdf6ed5..60677f93 100644 --- a/skills/detection/yolo-detection-2026/SKILL.md +++ b/skills/detection/yolo-detection-2026/SKILL.md @@ -1,15 +1,17 @@ --- name: yolo-detection-2026 -description: "State-of-the-art real-time object detection using YOLO" +description: "YOLO 2026 — state-of-the-art real-time object detection" version: 1.0.0 icon: assets/icon.png +entry: scripts/detect.py parameters: - - name: model - label: "Model" + - name: model_size + label: "Model Size" type: select - options: ["yolov11n", "yolov11s", "yolov11m", 
"yolov10n", "yolov10s", "yolov8n"] - default: "yolov11n" + options: ["nano", "small", "medium", "large"] + default: "nano" + description: "Larger models are more accurate but slower" group: Model - name: confidence @@ -29,18 +31,18 @@ parameters: - name: fps label: "Processing FPS" - type: number - min: 1 - max: 30 + type: select + options: [0.2, 0.5, 1, 3, 5, 15] default: 5 + description: "Frames per second — higher = more CPU/GPU usage" group: Performance - name: device label: "Inference Device" type: select - options: ["auto", "cpu", "cuda", "mps"] + options: ["auto", "cpu", "cuda", "mps", "rocm"] default: "auto" - description: "auto = GPU if available, else CPU" + description: "auto = best available GPU, else CPU" group: Performance capabilities: @@ -49,78 +51,59 @@ capabilities: description: "Real-time object detection on live camera frames" --- -# YOLO Object Detection (2026) - -Real-time object detection using state-of-the-art YOLO models. Detects 80+ COCO object classes including people, vehicles, animals, and everyday objects. Outputs bounding boxes with labels and confidence scores that SharpAI Aegis renders as overlays on the live camera feed. 
- -## What You Get - -When installed in SharpAI Aegis, this skill unlocks: -- **Live detection overlays** on camera feeds — bounding boxes around detected objects -- **Smart alert triggers** — configure alerts when specific objects are detected -- **Detection history** — searchable log of all detections - -## Models - -| Model | Size | Speed (FPS) | Accuracy (mAP) | Best For | -|-------|------|-------------|-----------------|----------| -| YOLOv11n | 6 MB | 30+ | 39.5 | Real-time on CPU | -| YOLOv11s | 22 MB | 20+ | 47.0 | Balanced | -| YOLOv11m | 68 MB | 12+ | 51.5 | High accuracy | -| YOLOv10n | 7 MB | 28+ | 38.5 | Ultra-fast | -| YOLOv10s | 24 MB | 18+ | 46.3 | Balanced (v10) | -| YOLOv8n | 6 MB | 30+ | 37.3 | Legacy compatible | +# YOLO 2026 Object Detection -## Setup +Real-time object detection using the latest YOLO 2026 models. Detects 80+ COCO object classes including people, vehicles, animals, and everyday objects. Outputs bounding boxes with labels and confidence scores. -1. Create a Python virtual environment: - ```bash - python3 -m venv .venv && source .venv/bin/activate - ``` +## Model Sizes -2. Install dependencies: - ```bash - pip install -r requirements.txt - ``` - -3. Download model weights (automatic on first run, or manually): - ```bash - python scripts/download_models.py --model yolov11n - ``` +| Size | Speed | Accuracy | Best For | +|------|-------|----------|----------| +| nano | Fastest | Good | Real-time on CPU, edge devices | +| small | Fast | Better | Balanced speed/accuracy | +| medium | Moderate | High | Accuracy-focused deployments | +| large | Slower | Highest | Maximum detection quality | ## Protocol -This skill communicates with SharpAI Aegis via **JSON lines** over stdin/stdout. - -### Aegis → Skill (stdin): frames to process +Communicates via **JSON lines** over stdin/stdout. 
+### Aegis → Skill (stdin) ```jsonl -{"event": "frame", "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/frame_001.jpg", "width": 1920, "height": 1080} +{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080} ``` -### Skill → Aegis (stdout): detection results - +### Skill → Aegis (stdout) ```jsonl -{"event": "ready", "model": "yolov11n", "device": "mps", "classes": 80} -{"event": "detections", "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [ - {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}, - {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]} +{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5} +{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "objects": [ + {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]} ]} +{"event": "error", "message": "...", "retriable": true} ``` ### Bounding Box Format +`[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy). -`[x_min, y_min, x_max, y_max]` in pixel coordinates. +### Stop Command +```jsonl +{"command": "stop"} +``` + +## Hardware Support -## Hardware Requirements +| Platform | Backend | Performance | +|----------|---------|-------------| +| Apple Silicon (M1+) | MPS | 20-30 FPS | +| NVIDIA GPU | CUDA | 25-60 FPS | +| AMD GPU | ROCm | 15-40 FPS | +| CPU (modern x86) | CPU | 5-15 FPS | +| Raspberry Pi 5 | CPU | 2-5 FPS | -| Device | Performance | -|--------|------------| -| Apple Silicon (M1+) | 20-30 FPS with MPS acceleration | -| NVIDIA GPU | 25-60 FPS with CUDA | -| CPU (modern x86) | 5-15 FPS | -| Raspberry Pi 5 | 2-5 FPS | +## Installation -## Contributing +The `deploy.sh` bootstrapper handles everything — Python environment, GPU backend detection, and dependency installation. No manual setup required. 
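The backend probe that `deploy.sh` performs at install time can be mirrored at runtime for the `device: auto` option. A sketch under the same assumptions as the shell version (the presence of `nvidia-smi`, `rocm-smi`, or `/opt/rocm` is treated as evidence of a usable GPU, with the same CUDA → ROCm → MPS → CPU order):

```python
import os
import platform
import shutil

def detect_backend():
    """Probe compute backends in deploy.sh's order: CUDA, ROCm, MPS, then CPU."""
    if shutil.which("nvidia-smi"):
        return "cuda"   # NVIDIA driver tooling present
    if shutil.which("rocm-smi") or os.path.isdir("/opt/rocm"):
        return "rocm"   # AMD ROCm installation detected
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"    # Apple Silicon
    return "cpu"        # Safe fallback, always available

def requirements_file(backend):
    # Map the backend to the per-backend requirements file in the skill layout
    return f"requirements_{backend}.txt"
```

Falling back to `cpu` keeps the skill runnable everywhere, which corresponds to `deploy.sh`'s exit-code-2 "partial success (CPU-only fallback)" path.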
-This skill is part of the [DeepCamera](https://github.com/SharpAI/DeepCamera) open-source project. Contributions welcome — see [Contributions.md](../../Contributions.md).
+```bash
+./deploy.sh
+```
diff --git a/skills/detection/yolo-detection-2026/config.yaml b/skills/detection/yolo-detection-2026/config.yaml
new file mode 100644
index 00000000..d37254b2
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/config.yaml
@@ -0,0 +1,58 @@
+# YOLO 2026 Detection Skill — Configuration Schema
+# Parsed by Aegis skill-registry-service.cjs → parseConfigYaml()
+# Format: params[] with key, type, label, default, description, options
+
+params:
+  - key: auto_start
+    label: Auto Start
+    type: boolean
+    default: false
+    description: "Start this skill automatically when Aegis launches"
+
+  - key: model_size
+    label: Model Size
+    type: select
+    default: nano
+    description: "YOLO26 model variant — larger = more accurate but slower"
+    options:
+      - { value: nano, label: "Nano (fastest, ~2ms)" }
+      - { value: small, label: "Small (balanced, ~5ms)" }
+      - { value: medium, label: "Medium (accurate, ~12ms)" }
+      - { value: large, label: "Large (most accurate, ~25ms)" }
+
+  - key: confidence
+    label: Confidence Threshold
+    type: number
+    default: 0.5
+    description: "Minimum detection confidence (0.1–1.0)"
+
+  - key: fps
+    label: Frame Rate
+    type: select
+    default: 5
+    description: "Detection processing rate — higher = more CPU/GPU usage"
+    options:
+      - { value: 0.2, label: "Ultra Low (0.2 FPS)" }
+      - { value: 0.5, label: "Low (0.5 FPS)" }
+      - { value: 1, label: "Normal (1 FPS)" }
+      - { value: 3, label: "Active (3 FPS)" }
+      - { value: 5, label: "High (5 FPS)" }
+      - { value: 15, label: "Real-time (15 FPS)" }
+
+  - key: classes
+    label: Detection Classes
+    type: string
+    default: "person,car,dog,cat"
+    description: "Comma-separated COCO class names to detect"
+
+  - key: device
+    label: Inference Device
+    type: select
+    default: auto
+    description: "Compute backend for inference"
+    options:
+      - { value: auto, label: "Auto-detect" }
+      - { value: cpu, label: "CPU" }
+      - { value: cuda, label: "NVIDIA CUDA" }
+      - { value: mps, label: "Apple Silicon (MPS)" }
+      - { value: rocm, label: "AMD ROCm" }
diff --git a/skills/detection/yolo-detection-2026/deploy.sh b/skills/detection/yolo-detection-2026/deploy.sh
new file mode 100755
index 00000000..9ba2bc61
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/deploy.sh
@@ -0,0 +1,158 @@
+#!/usr/bin/env bash
+# deploy.sh — Zero-assumption bootstrapper for YOLO 2026 Detection Skill
+#
+# Probes the system for Python, GPU backends, and installs the minimum
+# viable stack. Called by Aegis skill-runtime-manager during installation.
+#
+# Exit codes:
+#   0 = success
+#   1 = fatal error (no Python found and cannot install)
+#   2 = partial success (CPU-only fallback)
+
+set -euo pipefail
+
+SKILL_DIR="$(cd "$(dirname "$0")" && pwd)"
+VENV_DIR="$SKILL_DIR/.venv"
+LOG_PREFIX="[YOLO-2026-deploy]"
+
+log()  { echo "$LOG_PREFIX $*" >&2; }
+emit() { echo "$1"; }  # JSON to stdout for Aegis to parse
+
+# ─── Step 1: Find or install Python ─────────────────────────────────────────
+
+find_python() {
+  # Check common Python 3 locations
+  for cmd in python3.12 python3.11 python3.10 python3.9 python3; do
+    if command -v "$cmd" &>/dev/null; then
+      local ver
+      ver="$("$cmd" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+')"
+      local major minor
+      major=$(echo "$ver" | cut -d. -f1)
+      minor=$(echo "$ver" | cut -d. -f2)
+      if [ "$major" -ge 3 ] && [ "$minor" -ge 9 ]; then
+        echo "$cmd"
+        return 0
+      fi
+    fi
+  done
+
+  # Check conda
+  if command -v conda &>/dev/null; then
+    log "No system Python >=3.9 found, but conda is available"
+    log "Creating conda environment..."
+    conda create -n aegis-yolo2026 python=3.11 -y >/dev/null 2>&1
+    # find_python runs inside a command substitution, so `conda activate`
+    # would not persist — echo the environment's python path instead.
+    echo "$(conda info --base)/envs/aegis-yolo2026/bin/python"
+    return 0
+  fi
+
+  # Check pyenv
+  if command -v pyenv &>/dev/null; then
+    log "No system Python >=3.9 found, using pyenv..."
+    pyenv install -s 3.11.9
+    pyenv local 3.11.9
+    pyenv which python3
+    return 0
+  fi
+
+  return 1
+}
+
+PYTHON_CMD=$(find_python) || {
+  log "ERROR: No Python >=3.9 found. Install Python 3.9+ and retry."
+  emit '{"event": "error", "stage": "python", "message": "No Python >=3.9 found"}'
+  exit 1
+}
+
+log "Using Python: $PYTHON_CMD ($($PYTHON_CMD --version 2>&1))"
+emit "{\"event\": \"progress\", \"stage\": \"python\", \"message\": \"Found $($PYTHON_CMD --version 2>&1)\"}"
+
+# ─── Step 2: Create virtual environment ─────────────────────────────────────
+
+if [ ! -d "$VENV_DIR" ]; then
+  log "Creating virtual environment..."
+  "$PYTHON_CMD" -m venv "$VENV_DIR"
+fi
+
+# Activate venv
+# shellcheck disable=SC1091
+source "$VENV_DIR/bin/activate"
+PIP="$VENV_DIR/bin/pip"
+
+# Upgrade pip
+"$PIP" install --upgrade pip -q 2>/dev/null || true
+
+emit '{"event": "progress", "stage": "venv", "message": "Virtual environment ready"}'
+
+# ─── Step 3: Detect compute backend ─────────────────────────────────────────
+
+BACKEND="cpu"
+
+detect_gpu() {
+  # NVIDIA CUDA
+  if command -v nvidia-smi &>/dev/null; then
+    local cuda_ver
+    cuda_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -1)
+    if [ -n "$cuda_ver" ]; then
+      BACKEND="cuda"
+      log "Detected NVIDIA GPU (driver: $cuda_ver)"
+      return 0
+    fi
+  fi
+
+  # AMD ROCm
+  if command -v rocm-smi &>/dev/null || [ -d "/opt/rocm" ]; then
+    BACKEND="rocm"
+    log "Detected AMD ROCm"
+    return 0
+  fi
+
+  # Apple Silicon MPS
+  if [ "$(uname)" = "Darwin" ]; then
+    local arch
+    arch=$(uname -m)
+    if [ "$arch" = "arm64" ]; then
+      BACKEND="mps"
+      log "Detected Apple Silicon (MPS)"
+      return 0
+    fi
+  fi
+
+  log "No GPU detected, using CPU backend"
+  return 0
+}
+
+detect_gpu
+emit "{\"event\": \"progress\", \"stage\": \"gpu\", \"backend\": \"$BACKEND\", \"message\": \"Compute backend: $BACKEND\"}"
+
+# ─── Step 4: Install requirements ────────────────────────────────────────────
+
+REQ_FILE="$SKILL_DIR/requirements_${BACKEND}.txt"
+
+if [ ! -f "$REQ_FILE" ]; then
+  log "WARNING: $REQ_FILE not found, falling back to CPU"
+  REQ_FILE="$SKILL_DIR/requirements_cpu.txt"
+  BACKEND="cpu"
+fi
+
+log "Installing dependencies from $REQ_FILE ..."
+emit "{\"event\": \"progress\", \"stage\": \"install\", \"message\": \"Installing $BACKEND dependencies...\"}"
+
+"$PIP" install -r "$REQ_FILE" -q 2>&1 | tail -5 >&2
+
+# ─── Step 5: Verify installation ────────────────────────────────────────────
+
+log "Verifying installation..."
+"$VENV_DIR/bin/python" -c "
+from ultralytics import YOLO
+import torch
+device = 'cpu'
+if torch.cuda.is_available(): device = 'cuda'
+elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available(): device = 'mps'
+print(f'OK: ultralytics loaded, torch device={device}')
+" 2>&1 | while read -r line; do log "$line"; done
+
+emit "{\"event\": \"complete\", \"backend\": \"$BACKEND\", \"message\": \"YOLO 2026 skill installed ($BACKEND backend)\"}"
+log "Done! Backend: $BACKEND"
diff --git a/skills/detection/yolo-detection-2026/requirements_cpu.txt b/skills/detection/yolo-detection-2026/requirements_cpu.txt
new file mode 100644
index 00000000..cdb172fc
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/requirements_cpu.txt
@@ -0,0 +1,9 @@
+# YOLO 2026 — CPU-only requirements
+# Smallest install — no GPU acceleration
+--extra-index-url https://download.pytorch.org/whl/cpu
+torch>=2.4.0
+torchvision>=0.19.0
+ultralytics>=8.3.0
+numpy>=1.24.0
+opencv-python-headless>=4.8.0
+Pillow>=10.0.0
diff --git a/skills/detection/yolo-detection-2026/requirements_cuda.txt b/skills/detection/yolo-detection-2026/requirements_cuda.txt
new file mode 100644
index 00000000..0240bd7b
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/requirements_cuda.txt
@@ -0,0 +1,9 @@
+# YOLO 2026 — CUDA (NVIDIA GPU) requirements
+# Installs PyTorch with CUDA 12.4 support
+--extra-index-url https://download.pytorch.org/whl/cu124
+torch>=2.4.0
+torchvision>=0.19.0
+ultralytics>=8.3.0
+numpy>=1.24.0
+opencv-python-headless>=4.8.0
+Pillow>=10.0.0
diff --git a/skills/detection/yolo-detection-2026/requirements_mps.txt b/skills/detection/yolo-detection-2026/requirements_mps.txt
new file mode 100644
index 00000000..5498200a
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/requirements_mps.txt
@@ -0,0 +1,8 @@
+# YOLO 2026 — MPS (Apple Silicon) requirements
+# Standard PyTorch — MPS backend is included by default on macOS
+torch>=2.4.0
+torchvision>=0.19.0
+ultralytics>=8.3.0
+numpy>=1.24.0
+opencv-python-headless>=4.8.0
+Pillow>=10.0.0
diff --git a/skills/detection/yolo-detection-2026/requirements_rocm.txt b/skills/detection/yolo-detection-2026/requirements_rocm.txt
new file mode 100644
index 00000000..e665dff0
--- /dev/null
+++ b/skills/detection/yolo-detection-2026/requirements_rocm.txt
@@ -0,0 +1,9 @@
+# YOLO 2026 — ROCm (AMD GPU) requirements
+# Installs PyTorch with ROCm 6.2 support
+--extra-index-url https://download.pytorch.org/whl/rocm6.2
+torch>=2.4.0
+torchvision>=0.19.0
+ultralytics>=8.3.0
+numpy>=1.24.0
+opencv-python-headless>=4.8.0
+Pillow>=10.0.0
diff --git a/skills/detection/yolo-detection-2026/scripts/detect.py b/skills/detection/yolo-detection-2026/scripts/detect.py
index c6de996c..903a4348 100644
--- a/skills/detection/yolo-detection-2026/scripts/detect.py
+++ b/skills/detection/yolo-detection-2026/scripts/detect.py
@@ -1,14 +1,14 @@
 #!/usr/bin/env python3
 """
-YOLO Detection Skill — Real-time object detection for SharpAI Aegis.
+YOLO 2026 Detection Skill — Real-time object detection for SharpAI Aegis.

 Communicates via JSON lines over stdin/stdout:
-    stdin:  {"event": "frame", "camera_id": "...", "frame_path": "...", ...}
-    stdout: {"event": "detections", "camera_id": "...", "objects": [...]}
+    stdin:  {"event": "frame", "frame_id": N, "camera_id": "...", "frame_path": "...", ...}
+    stdout: {"event": "detections", "frame_id": N, "camera_id": "...", "objects": [...]}

 Usage:
     python detect.py --config config.json
-    python detect.py --model yolov11n --confidence 0.5 --device auto
+    python detect.py --model-size nano --confidence 0.5 --device auto
 """

 import sys
@@ -17,27 +17,51 @@
 import signal
 from pathlib import Path

+
+# Model size → ultralytics model name mapping (YOLO26, released Jan 2026)
+MODEL_SIZE_MAP = {
+    "nano": "yolo26n",
+    "small": "yolo26s",
+    "medium": "yolo26m",
+    "large": "yolo26l",
+}
+
+
 def parse_args():
-    parser = argparse.ArgumentParser(description="YOLO Detection Skill")
+    parser = argparse.ArgumentParser(description="YOLO 2026 Detection Skill")
     parser.add_argument("--config", type=str, help="Path to config JSON file")
-    parser.add_argument("--model", type=str, default="yolov11n",
-                        choices=["yolov11n", "yolov11s", "yolov11m", "yolov10n", "yolov10s", "yolov8n"])
+    parser.add_argument("--model-size", type=str, default="nano",
+                        choices=["nano", "small", "medium", "large"])
     parser.add_argument("--confidence", type=float, default=0.5)
     parser.add_argument("--classes", type=str, default="person,car,dog,cat")
-    parser.add_argument("--device", type=str, default="auto", choices=["auto", "cpu", "cuda", "mps"])
-    parser.add_argument("--fps", type=int, default=5)
+    parser.add_argument("--device", type=str, default="auto",
+                        choices=["auto", "cpu", "cuda", "mps", "rocm"])
+    parser.add_argument("--fps", type=float, default=5)
     return parser.parse_args()


 def load_config(args):
-    """Load config from JSON file or CLI args."""
+    """Load config from JSON file, CLI args, or AEGIS_SKILL_PARAMS env var."""
+    import os
+
+    # Priority 1: AEGIS_SKILL_PARAMS env var (set by Aegis skill-runtime-manager)
+    env_params = os.environ.get("AEGIS_SKILL_PARAMS")
+    if env_params:
+        try:
+            return json.loads(env_params)
+        except json.JSONDecodeError:
+            pass
+
+    # Priority 2: Config file
     if args.config:
         config_path = Path(args.config)
         if config_path.exists():
             with open(config_path) as f:
                 return json.load(f)
+
+    # Priority 3: CLI args
     return {
-        "model": args.model,
+        "model_size": args.model_size,
         "confidence": args.confidence,
         "classes": args.classes.split(","),
         "device": args.device,
@@ -47,7 +71,7 @@ def load_config(args):

 def select_device(preference: str) -> str:
     """Select the best available inference device."""
-    if preference != "auto":
+    if preference not in ("auto", ""):
         return preference
     try:
         import torch
@@ -55,6 +79,7 @@ def select_device(preference: str) -> str:
             return "cuda"
         if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
             return "mps"
+        # ROCm exposes as CUDA in PyTorch with ROCm builds
     except ImportError:
         pass
     return "cpu"
@@ -69,11 +94,18 @@ def main():
     args = parse_args()
     config = load_config(args)

-    # Select device
+    # Resolve config values
+    model_size = config.get("model_size", "nano")
     device = select_device(config.get("device", "auto"))
-    model_name = config.get("model", "yolov11n")
     confidence = config.get("confidence", 0.5)
+    fps = config.get("fps", 5)
+
+    # Map size to ultralytics model name
+    model_name = MODEL_SIZE_MAP.get(model_size, "yolo26n")
+
+    target_classes = config.get("classes", ["person", "car", "dog", "cat"])
+    if isinstance(target_classes, str):
+        target_classes = [c.strip() for c in target_classes.split(",")]

     # Load YOLO model
     try:
@@ -82,9 +114,12 @@
         model.to(device)
         emit({
             "event": "ready",
-            "model": model_name,
+            "model": f"yolo2026{model_size[0]}",
+            "model_size": model_size,
             "device": device,
             "classes": len(model.names),
+            "fps": fps,
+            "available_sizes": list(MODEL_SIZE_MAP.keys()),
         })
     except Exception as e:
         emit({"event": "error", "message": f"Failed to load model: {e}", "retriable": False})
@@ -117,11 +152,17 @@ def handle_signal(signum, frame):

         if msg.get("event") == "frame":
             frame_path = msg.get("frame_path")
+            frame_id = msg.get("frame_id")
             camera_id = msg.get("camera_id", "unknown")
             timestamp = msg.get("timestamp", "")

             if not frame_path or not Path(frame_path).exists():
-                emit({"event": "error", "message": f"Frame not found: {frame_path}", "retriable": True})
+                emit({
+                    "event": "error",
+                    "frame_id": frame_id,
+                    "message": f"Frame not found: {frame_path}",
+                    "retriable": True,
+                })
                 continue

             # Run inference
@@ -142,12 +183,18 @@ def handle_signal(signum, frame):

                 emit({
                     "event": "detections",
+                    "frame_id": frame_id,
                     "camera_id": camera_id,
                     "timestamp": timestamp,
                     "objects": objects,
                 })
             except Exception as e:
-                emit({"event": "error", "message": f"Inference error: {e}", "retriable": True})
+                emit({
+                    "event": "error",
+                    "frame_id": frame_id,
+                    "message": f"Inference error: {e}",
+                    "retriable": True,
+                })


 if __name__ == "__main__":