diff --git a/README.md b/README.md index e1d1e618..1252cc28 100644 --- a/README.md +++ b/README.md @@ -28,20 +28,21 @@ Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own. -| Category | Skill | What It Does | -|----------|-------|--------------| -| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | -| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | -| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | -| **Analysis** | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | -| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | -| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | -| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | -| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | -| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | -| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | -| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | -| **Integrations** | 
[`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | +| Category | Skill | What It Does | Status | +|----------|-------|--------------|--------| +| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | 🧪 Testing | +| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 Planned | +| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 Planned | +| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ Ready | +| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 Planned | +| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 Planned | +| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 Planned | +| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 Planned | +| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | 📐 Planned | +| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | 📐 Planned | +| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | 📐 Planned | +| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | 
📐 Planned | +| **Integrations** | [`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | 📐 Planned | > **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery. diff --git a/docs/detection-protocol.md b/docs/detection-protocol.md new file mode 100644 index 00000000..865c88b4 --- /dev/null +++ b/docs/detection-protocol.md @@ -0,0 +1,94 @@ +# Detection Skill Protocol + +Communication protocol for DeepCamera detection skills integrated with SharpAI Aegis. + +## Transport + +- **stdin** (Aegis → Skill): frame events and commands +- **stdout** (Skill → Aegis): detection results, ready/error events +- **stderr**: logging only — ignored by Aegis data parser + +Format: **JSON Lines** (one JSON object per line, newline-delimited). + +## Events + +### Ready (Skill → Aegis) + +Emitted after model loads successfully. `fps` reflects the skill's configured processing rate. `available_sizes` lists the model variants the skill supports. + +```jsonl +{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5, "available_sizes": ["nano", "small", "medium", "large"]} +``` + +### Frame (Aegis → Skill) + +Instruction to analyze a specific frame. `frame_id` is an incrementing integer used to correlate request/response. + +```jsonl +{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080} +``` + +### Detections (Skill → Aegis) + +Results of frame analysis. Must echo the same `frame_id` received in the frame event. 
+ +```jsonl +{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [ + {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}, + {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]} +]} +``` + +### Error (Skill → Aegis) + +Indicates a processing error. `retriable: true` means Aegis can send the next frame. + +```jsonl +{"event": "error", "frame_id": 42, "message": "Inference error: ...", "retriable": true} +``` + +### Stop (Aegis → Skill) + +Graceful shutdown command. + +```jsonl +{"command": "stop"} +``` + +## Data Formats + +### Bounding Boxes + +**Format**: `[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy). + +| Field | Type | Description | +|-------|------|-------------| +| `x_min` | int | Left edge (pixels) | +| `y_min` | int | Top edge (pixels) | +| `x_max` | int | Right edge (pixels) | +| `y_max` | int | Bottom edge (pixels) | + +Coordinates are in the original image space (not normalized). + +### Timestamps + +ISO 8601 format: `2026-03-01T14:30:00Z` + +### Frame Transfer + +Frames are written to `/tmp/aegis_detection/frame_{camera_id}.jpg` as JPEG files with recycled per-camera filenames (overwritten each cycle). The `frame_path` in the frame event is the absolute path to the JPEG file. + +## FPS Presets + +| Preset | FPS | Use Case | +|--------|-----|----------| +| Ultra Low | 0.2 | Battery saver | +| Low | 0.5 | Passive surveillance | +| Normal | 1 | Standard monitoring | +| Active | 3 | Active area monitoring | +| High | 5 | Security-critical zones | +| Real-time | 15 | Live tracking | + +## Backpressure + +The protocol is **request-response**: Aegis sends one frame, waits for the detection result, then sends the next. This provides natural backpressure — if the skill is slow, Aegis automatically drops frames (always uses the latest available frame). 
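
A minimal sketch of the skill side of this loop (illustrative only — `run_inference` is a hypothetical placeholder for a real model call, and the ready-event fields are abbreviated):

```python
#!/usr/bin/env python3
"""Skeleton skill demonstrating the JSONL request-response protocol."""
import json
import sys


def run_inference(frame_path):
    # Placeholder: a real skill would run its model on the JPEG at frame_path.
    return [{"class": "person", "confidence": 0.9, "bbox": [0, 0, 10, 10]}]


def handle_message(msg):
    """Map one parsed stdin message to a protocol response (or None)."""
    if msg.get("event") != "frame":
        return None
    try:
        objects = run_inference(msg["frame_path"])
    except Exception as e:
        # Report the failure; retriable=True lets Aegis send the next frame.
        return {"event": "error", "frame_id": msg.get("frame_id"),
                "message": str(e), "retriable": True}
    # Echo frame_id/camera_id so Aegis can correlate request and response.
    return {"event": "detections", "frame_id": msg.get("frame_id"),
            "camera_id": msg.get("camera_id", "unknown"),
            "timestamp": msg.get("timestamp", ""), "objects": objects}


def main():
    # Ready handshake first, then exactly one response per frame event —
    # this one-in, one-out pairing is what provides the backpressure.
    print(json.dumps({"event": "ready", "model": "demo", "device": "cpu",
                      "classes": 80, "fps": 1}), flush=True)
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("command") == "stop":
            break  # graceful shutdown
        reply = handle_message(msg)
        if reply is not None:
            print(json.dumps(reply), flush=True)


if __name__ == "__main__":
    main()
```

Any logging goes to stderr so it never interleaves with protocol output on stdout.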
diff --git a/docs/skill-development.md b/docs/skill-development.md index d9d0cddc..a3fb8563 100644 --- a/docs/skill-development.md +++ b/docs/skill-development.md @@ -11,7 +11,13 @@ A skill is a self-contained folder that provides an AI capability to [SharpAI Ae ``` skills/<category>/<skill-name>/ ├── SKILL.md # Manifest + setup instructions -├── requirements.txt # Python dependencies +├── config.yaml # Configuration schema for Aegis UI +├── deploy.sh # Zero-assumption installer +├── requirements.txt # Default Python dependencies +├── requirements_cuda.txt # NVIDIA GPU dependencies +├── requirements_rocm.txt # AMD GPU dependencies +├── requirements_mps.txt # Apple Silicon dependencies +├── requirements_cpu.txt # CPU-only dependencies ├── scripts/ │ └── main.py # Entry point ├── assets/ @@ -68,6 +74,70 @@ LLM agent can read and execute. | `url` | URL input with validation | Server address | | `camera_select` | Camera picker | Target cameras | +## config.yaml — Configuration Schema + +Defines user-configurable options shown in the Aegis Skills UI. Parsed by `parseConfigYaml()`. + +```yaml +params: + - key: auto_start + label: Auto Start + type: boolean + default: false + description: "Start automatically on Aegis launch" + + - key: model_size + label: Model Size + type: select + default: nano + description: "Choose model variant" + options: + - { value: nano, label: "Nano (fastest)" } + - { value: small, label: "Small (balanced)" } + + - key: confidence + label: Confidence + type: number + default: 0.5 + description: "Min confidence (0.1–1.0)" +``` + +### Reserved Keys + +| Key | Type | Behavior | +|-----|------|----------| +| `auto_start` | boolean | Aegis auto-starts the skill on boot when `true` | + +## deploy.sh — Zero-Assumption Installer + +Bootstraps the environment from scratch. Must handle: + +1. **Find Python** — check system → conda → pyenv +2. **Create venv** — isolated `.venv/` inside skill directory +3. **Detect GPU** — CUDA → ROCm → MPS → CPU fallback +4. 
**Install deps** — from matching `requirements_<backend>.txt` +5. **Verify** — import test + +Emit JSONL progress for Aegis UI: +```bash +echo '{"event": "progress", "stage": "gpu", "backend": "mps"}' +echo '{"event": "complete", "backend": "mps", "message": "Installed!"}' +``` + +## Environment Variables + +Aegis injects these into every skill process: + +| Variable | Description | +|----------|-------------| +| `AEGIS_SKILL_ID` | Skill identifier | +| `AEGIS_SKILL_PARAMS` | JSON string of user config values | +| `AEGIS_GATEWAY_URL` | LLM gateway URL | +| `AEGIS_VLM_URL` | VLM server URL | +| `AEGIS_LLM_MODEL` | Active LLM model name | +| `AEGIS_VLM_MODEL` | Active VLM model name | +| `PYTHONUNBUFFERED` | Set to `1` for real-time output | + ## JSON Lines Protocol Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object. @@ -108,6 +178,36 @@ Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object. echo '{"event": "frame", "camera_id": "test", "frame_path": "/tmp/test.jpg"}' | python scripts/main.py ``` +## skills.json — Catalog Registration + +Register skills in the repo root `skills.json`: + +```json +{ + "skills": [ + { + "id": "my-skill", + "name": "My Skill", + "description": "What it does", + "category": "detection", + "tags": ["tag1"], + "path": "skills/detection/my-skill", + "status": "testing", + "platforms": ["darwin-arm64", "linux-x64"] + } + ] +} +``` + +### Status Values + +| Status | Emoji | Meaning | +|--------|-------|---------| +| `ready` | ✅ | Production-quality, tested | +| `testing` | 🧪 | Functional, needs validation | +| `experimental` | ⚗️ | Proof of concept | +| `planned` | 📐 | Not yet implemented | + ## Reference See [`skills/detection/yolo-detection-2026/`](../skills/detection/yolo-detection-2026/) for a complete working example. 
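
To make the contract concrete, here is one way a skill script might merge `AEGIS_SKILL_PARAMS` over its own defaults — a sketch; the default values shown mirror the YOLO example skill and are illustrative, not required by Aegis:

```python
"""Read user configuration injected by Aegis via AEGIS_SKILL_PARAMS."""
import json
import os


def load_params(defaults=None):
    """Merge Aegis-provided JSON params over the skill's own defaults."""
    params = dict(defaults or {})
    raw = os.environ.get("AEGIS_SKILL_PARAMS", "")
    if raw:
        try:
            params.update(json.loads(raw))
        except json.JSONDecodeError:
            pass  # malformed input: keep defaults rather than crash
    return params


# Illustrative defaults mirroring the YOLO example skill.
params = load_params({"model_size": "nano", "confidence": 0.5, "fps": 5})
```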
diff --git a/skills.json b/skills.json index 2438d590..50f50d66 100644 --- a/skills.json +++ b/skills.json @@ -48,6 +48,55 @@ "ui_unlocks": [ "benchmark_report" ] + }, + { + "id": "yolo-detection-2026", + "name": "YOLO 2026 Object Detection", + "description": "State-of-the-art real-time object detection — 80+ COCO classes, bounding box overlays, multi-size model selection.", + "version": "1.0.0", + "category": "detection", + "path": "skills/detection/yolo-detection-2026", + "status": "testing", + "tags": [ + "detection", + "yolo", + "object-detection", + "real-time", + "coco" + ], + "platforms": [ + "linux-x64", + "linux-arm64", + "darwin-arm64", + "darwin-x64", + "win-x64" + ], + "requirements": { + "python": ">=3.9", + "ram_gb": 2 + }, + "capabilities": [ + "live_detection", + "bbox_overlay" + ], + "ui_unlocks": [ + "detection_overlay", + "detection_results" + ], + "fps_presets": [ + 0.2, + 0.5, + 1, + 3, + 5, + 15 + ], + "model_sizes": [ + "nano", + "small", + "medium", + "large" + ] + } ] } \ No newline at end of file diff --git a/skills/detection/yolo-detection-2026/SKILL.md b/skills/detection/yolo-detection-2026/SKILL.md index 1cdf6ed5..60677f93 100644 --- a/skills/detection/yolo-detection-2026/SKILL.md +++ b/skills/detection/yolo-detection-2026/SKILL.md @@ -1,15 +1,17 @@ --- name: yolo-detection-2026 -description: "State-of-the-art real-time object detection using YOLO" +description: "YOLO 2026 — state-of-the-art real-time object detection" version: 1.0.0 icon: assets/icon.png +entry: scripts/detect.py parameters: - - name: model - label: "Model" + - name: model_size + label: "Model Size" type: select - options: ["yolov11n", "yolov11s", "yolov11m", "yolov10n", "yolov10s", "yolov8n"] - default: "yolov11n" + options: ["nano", "small", "medium", "large"] + default: "nano" + description: "Larger models are more accurate but slower" group: Model - name: confidence @@ -29,18 +31,18 @@ parameters: - name: fps label: "Processing FPS" - type: number - min: 1 - max: 30 + type: select + 
options: [0.2, 0.5, 1, 3, 5, 15] default: 5 + description: "Frames per second — higher = more CPU/GPU usage" group: Performance - name: device label: "Inference Device" type: select - options: ["auto", "cpu", "cuda", "mps"] + options: ["auto", "cpu", "cuda", "mps", "rocm"] default: "auto" - description: "auto = GPU if available, else CPU" + description: "auto = best available GPU, else CPU" group: Performance capabilities: @@ -49,78 +51,59 @@ capabilities: description: "Real-time object detection on live camera frames" --- -# YOLO Object Detection (2026) - -Real-time object detection using state-of-the-art YOLO models. Detects 80+ COCO object classes including people, vehicles, animals, and everyday objects. Outputs bounding boxes with labels and confidence scores that SharpAI Aegis renders as overlays on the live camera feed. - -## What You Get - -When installed in SharpAI Aegis, this skill unlocks: -- **Live detection overlays** on camera feeds — bounding boxes around detected objects -- **Smart alert triggers** — configure alerts when specific objects are detected -- **Detection history** — searchable log of all detections - -## Models - -| Model | Size | Speed (FPS) | Accuracy (mAP) | Best For | -|-------|------|-------------|-----------------|----------| -| YOLOv11n | 6 MB | 30+ | 39.5 | Real-time on CPU | -| YOLOv11s | 22 MB | 20+ | 47.0 | Balanced | -| YOLOv11m | 68 MB | 12+ | 51.5 | High accuracy | -| YOLOv10n | 7 MB | 28+ | 38.5 | Ultra-fast | -| YOLOv10s | 24 MB | 18+ | 46.3 | Balanced (v10) | -| YOLOv8n | 6 MB | 30+ | 37.3 | Legacy compatible | +# YOLO 2026 Object Detection -## Setup +Real-time object detection using the latest YOLO 2026 models. Detects 80+ COCO object classes including people, vehicles, animals, and everyday objects. Outputs bounding boxes with labels and confidence scores. -1. Create a Python virtual environment: - ```bash - python3 -m venv .venv && source .venv/bin/activate - ``` +## Model Sizes -2. 
Install dependencies: - ```bash - pip install -r requirements.txt - ``` - -3. Download model weights (automatic on first run, or manually): - ```bash - python scripts/download_models.py --model yolov11n - ``` +| Size | Speed | Accuracy | Best For | +|------|-------|----------|----------| +| nano | Fastest | Good | Real-time on CPU, edge devices | +| small | Fast | Better | Balanced speed/accuracy | +| medium | Moderate | High | Accuracy-focused deployments | +| large | Slower | Highest | Maximum detection quality | ## Protocol -This skill communicates with SharpAI Aegis via **JSON lines** over stdin/stdout. - -### Aegis → Skill (stdin): frames to process +Communicates via **JSON lines** over stdin/stdout. +### Aegis → Skill (stdin) ```jsonl -{"event": "frame", "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/frame_001.jpg", "width": 1920, "height": 1080} +{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080} ``` -### Skill → Aegis (stdout): detection results - +### Skill → Aegis (stdout) ```jsonl -{"event": "ready", "model": "yolov11n", "device": "mps", "classes": 80} -{"event": "detections", "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [ - {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]}, - {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]} +{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5} +{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "...", "objects": [ + {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]} ]} +{"event": "error", "message": "...", "retriable": true} ``` ### Bounding Box Format +`[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy). -`[x_min, y_min, x_max, y_max]` in pixel coordinates. 
+### Stop Command +```jsonl +{"command": "stop"} +``` + +## Hardware Support -## Hardware Requirements +| Platform | Backend | Performance | +|----------|---------|-------------| +| Apple Silicon (M1+) | MPS | 20-30 FPS | +| NVIDIA GPU | CUDA | 25-60 FPS | +| AMD GPU | ROCm | 15-40 FPS | +| CPU (modern x86) | CPU | 5-15 FPS | +| Raspberry Pi 5 | CPU | 2-5 FPS | -| Device | Performance | -|--------|------------| -| Apple Silicon (M1+) | 20-30 FPS with MPS acceleration | -| NVIDIA GPU | 25-60 FPS with CUDA | -| CPU (modern x86) | 5-15 FPS | -| Raspberry Pi 5 | 2-5 FPS | +## Installation -## Contributing +The `deploy.sh` bootstrapper handles everything — Python environment, GPU backend detection, and dependency installation. No manual setup required. -This skill is part of the [DeepCamera](https://github.com/SharpAI/DeepCamera) open-source project. Contributions welcome — see [Contributions.md](../../Contributions.md). +```bash +./deploy.sh +``` diff --git a/skills/detection/yolo-detection-2026/config.yaml b/skills/detection/yolo-detection-2026/config.yaml new file mode 100644 index 00000000..d37254b2 --- /dev/null +++ b/skills/detection/yolo-detection-2026/config.yaml @@ -0,0 +1,58 @@ +# YOLO 2026 Detection Skill — Configuration Schema +# Parsed by Aegis skill-registry-service.cjs → parseConfigYaml() +# Format: params[] with key, type, label, default, description, options + +params: + - key: auto_start + label: Auto Start + type: boolean + default: false + description: "Start this skill automatically when Aegis launches" + + - key: model_size + label: Model Size + type: select + default: nano + description: "YOLO26 model variant — larger = more accurate but slower" + options: + - { value: nano, label: "Nano (fastest, ~2ms)" } + - { value: small, label: "Small (balanced, ~5ms)" } + - { value: medium, label: "Medium (accurate, ~12ms)" } + - { value: large, label: "Large (most accurate, ~25ms)" } + + - key: confidence + label: Confidence Threshold + type: number + 
default: 0.5 + description: "Minimum detection confidence (0.1–1.0)" + + - key: fps + label: Frame Rate + type: select + default: 5 + description: "Detection processing rate — higher = more CPU/GPU usage" + options: + - { value: 0.2, label: "Ultra Low (0.2 FPS)" } + - { value: 0.5, label: "Low (0.5 FPS)" } + - { value: 1, label: "Normal (1 FPS)" } + - { value: 3, label: "Active (3 FPS)" } + - { value: 5, label: "High (5 FPS)" } + - { value: 15, label: "Real-time (15 FPS)" } + + - key: classes + label: Detection Classes + type: string + default: "person,car,dog,cat" + description: "Comma-separated COCO class names to detect" + + - key: device + label: Inference Device + type: select + default: auto + description: "Compute backend for inference" + options: + - { value: auto, label: "Auto-detect" } + - { value: cpu, label: "CPU" } + - { value: cuda, label: "NVIDIA CUDA" } + - { value: mps, label: "Apple Silicon (MPS)" } + - { value: rocm, label: "AMD ROCm" } diff --git a/skills/detection/yolo-detection-2026/deploy.sh b/skills/detection/yolo-detection-2026/deploy.sh new file mode 100755 index 00000000..9ba2bc61 --- /dev/null +++ b/skills/detection/yolo-detection-2026/deploy.sh @@ -0,0 +1,158 @@ +#!/usr/bin/env bash +# deploy.sh — Zero-assumption bootstrapper for YOLO 2026 Detection Skill +# +# Probes the system for Python, GPU backends, and installs the minimum +# viable stack. Called by Aegis skill-runtime-manager during installation. 
+# +# Exit codes: +# 0 = success +# 1 = fatal error (no Python found and cannot install) +# 2 = partial success (CPU-only fallback) + +set -euo pipefail + +SKILL_DIR="$(cd "$(dirname "$0")" && pwd)" +VENV_DIR="$SKILL_DIR/.venv" +LOG_PREFIX="[YOLO-2026-deploy]" + +log() { echo "$LOG_PREFIX $*" >&2; } +emit() { echo "$1"; } # JSON to stdout for Aegis to parse + +# ─── Step 1: Find or install Python ───────────────────────────────────────── + +find_python() { + # Check common Python 3 locations + for cmd in python3.12 python3.11 python3.10 python3.9 python3; do + if command -v "$cmd" &>/dev/null; then + local ver + ver="$("$cmd" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+')" + local major minor + major=$(echo "$ver" | cut -d. -f1) + minor=$(echo "$ver" | cut -d. -f2) + if [ "$major" -ge 3 ] && [ "$minor" -ge 9 ]; then + echo "$cmd" + return 0 + fi + fi + done + + # Check conda + if command -v conda &>/dev/null; then + log "No system Python >=3.9 found, but conda is available" + log "Creating conda environment..." + conda create -n aegis-yolo2026 python=3.11 -y >/dev/null 2>&1 + # Activating the env here would not persist outside this + # $(find_python) command substitution, so return the new + # environment's interpreter path directly instead + echo "$(conda info --base)/envs/aegis-yolo2026/bin/python" + return 0 + fi + + # Check pyenv + if command -v pyenv &>/dev/null; then + log "No system Python >=3.9 found, using pyenv..." + pyenv install -s 3.11.9 + pyenv local 3.11.9 + echo "$(pyenv which python3)" + return 0 + fi + + return 1 +} + +PYTHON_CMD=$(find_python) || { + log "ERROR: No Python >=3.9 found. Install Python 3.9+ and retry." + emit '{"event": "error", "stage": "python", "message": "No Python >=3.9 found"}' + exit 1 +} + +log "Using Python: $PYTHON_CMD ($($PYTHON_CMD --version 2>&1))" +emit "{\"event\": \"progress\", \"stage\": \"python\", \"message\": \"Found $($PYTHON_CMD --version 2>&1)\"}" + +# ─── Step 2: Create virtual environment ───────────────────────────────────── + +if [ ! -d "$VENV_DIR" ]; then + log "Creating virtual environment..." 
+ "$PYTHON_CMD" -m venv "$VENV_DIR" +fi + +# Activate venv +# shellcheck disable=SC1091 +source "$VENV_DIR/bin/activate" +PIP="$VENV_DIR/bin/pip" + +# Upgrade pip +"$PIP" install --upgrade pip -q 2>/dev/null || true + +emit '{"event": "progress", "stage": "venv", "message": "Virtual environment ready"}' + +# ─── Step 3: Detect compute backend ───────────────────────────────────────── + +BACKEND="cpu" + +detect_gpu() { + # NVIDIA CUDA + if command -v nvidia-smi &>/dev/null; then + local cuda_ver + cuda_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -1) + if [ -n "$cuda_ver" ]; then + BACKEND="cuda" + log "Detected NVIDIA GPU (driver: $cuda_ver)" + return 0 + fi + fi + + # AMD ROCm + if command -v rocm-smi &>/dev/null || [ -d "/opt/rocm" ]; then + BACKEND="rocm" + log "Detected AMD ROCm" + return 0 + fi + + # Apple Silicon MPS + if [ "$(uname)" = "Darwin" ]; then + local arch + arch=$(uname -m) + if [ "$arch" = "arm64" ]; then + BACKEND="mps" + log "Detected Apple Silicon (MPS)" + return 0 + fi + fi + + log "No GPU detected, using CPU backend" + return 0 +} + +detect_gpu +emit "{\"event\": \"progress\", \"stage\": \"gpu\", \"backend\": \"$BACKEND\", \"message\": \"Compute backend: $BACKEND\"}" + +# ─── Step 4: Install requirements ──────────────────────────────────────────── + +REQ_FILE="$SKILL_DIR/requirements_${BACKEND}.txt" + +if [ ! -f "$REQ_FILE" ]; then + log "WARNING: $REQ_FILE not found, falling back to CPU" + REQ_FILE="$SKILL_DIR/requirements_cpu.txt" + BACKEND="cpu" +fi + +log "Installing dependencies from $REQ_FILE ..." +emit "{\"event\": \"progress\", \"stage\": \"install\", \"message\": \"Installing $BACKEND dependencies...\"}" + +"$PIP" install -r "$REQ_FILE" -q 2>&1 | tail -5 >&2 + +# ─── Step 5: Verify installation ──────────────────────────────────────────── + +log "Verifying installation..." 
+"$VENV_DIR/bin/python" -c " +from ultralytics import YOLO +import torch +device = 'cpu' +if torch.cuda.is_available(): device = 'cuda' +elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available(): device = 'mps' +print(f'OK: ultralytics loaded, torch device={device}') +" 2>&1 | while read -r line; do log "$line"; done + +emit "{\"event\": \"complete\", \"backend\": \"$BACKEND\", \"message\": \"YOLO 2026 skill installed ($BACKEND backend)\"}" +log "Done! Backend: $BACKEND" diff --git a/skills/detection/yolo-detection-2026/requirements_cpu.txt b/skills/detection/yolo-detection-2026/requirements_cpu.txt new file mode 100644 index 00000000..cdb172fc --- /dev/null +++ b/skills/detection/yolo-detection-2026/requirements_cpu.txt @@ -0,0 +1,9 @@ +# YOLO 2026 — CPU-only requirements +# Smallest install — no GPU acceleration +--extra-index-url https://download.pytorch.org/whl/cpu +torch>=2.4.0 +torchvision>=0.19.0 +ultralytics>=8.3.0 +numpy>=1.24.0 +opencv-python-headless>=4.8.0 +Pillow>=10.0.0 diff --git a/skills/detection/yolo-detection-2026/requirements_cuda.txt b/skills/detection/yolo-detection-2026/requirements_cuda.txt new file mode 100644 index 00000000..0240bd7b --- /dev/null +++ b/skills/detection/yolo-detection-2026/requirements_cuda.txt @@ -0,0 +1,9 @@ +# YOLO 2026 — CUDA (NVIDIA GPU) requirements +# Installs PyTorch with CUDA 12.4 support +--extra-index-url https://download.pytorch.org/whl/cu124 +torch>=2.4.0 +torchvision>=0.19.0 +ultralytics>=8.3.0 +numpy>=1.24.0 +opencv-python-headless>=4.8.0 +Pillow>=10.0.0 diff --git a/skills/detection/yolo-detection-2026/requirements_mps.txt b/skills/detection/yolo-detection-2026/requirements_mps.txt new file mode 100644 index 00000000..5498200a --- /dev/null +++ b/skills/detection/yolo-detection-2026/requirements_mps.txt @@ -0,0 +1,8 @@ +# YOLO 2026 — MPS (Apple Silicon) requirements +# Standard PyTorch — MPS backend is included by default on macOS +torch>=2.4.0 +torchvision>=0.19.0 +ultralytics>=8.3.0 
+numpy>=1.24.0 +opencv-python-headless>=4.8.0 +Pillow>=10.0.0 diff --git a/skills/detection/yolo-detection-2026/requirements_rocm.txt b/skills/detection/yolo-detection-2026/requirements_rocm.txt new file mode 100644 index 00000000..e665dff0 --- /dev/null +++ b/skills/detection/yolo-detection-2026/requirements_rocm.txt @@ -0,0 +1,9 @@ +# YOLO 2026 — ROCm (AMD GPU) requirements +# Installs PyTorch with ROCm 6.2 support +--extra-index-url https://download.pytorch.org/whl/rocm6.2 +torch>=2.4.0 +torchvision>=0.19.0 +ultralytics>=8.3.0 +numpy>=1.24.0 +opencv-python-headless>=4.8.0 +Pillow>=10.0.0 diff --git a/skills/detection/yolo-detection-2026/scripts/detect.py b/skills/detection/yolo-detection-2026/scripts/detect.py index c6de996c..903a4348 100644 --- a/skills/detection/yolo-detection-2026/scripts/detect.py +++ b/skills/detection/yolo-detection-2026/scripts/detect.py @@ -1,14 +1,14 @@ #!/usr/bin/env python3 """ -YOLO Detection Skill — Real-time object detection for SharpAI Aegis. +YOLO 2026 Detection Skill — Real-time object detection for SharpAI Aegis. 
Communicates via JSON lines over stdin/stdout: - stdin: {"event": "frame", "camera_id": "...", "frame_path": "...", ...} - stdout: {"event": "detections", "camera_id": "...", "objects": [...]} + stdin: {"event": "frame", "frame_id": N, "camera_id": "...", "frame_path": "...", ...} + stdout: {"event": "detections", "frame_id": N, "camera_id": "...", "objects": [...]} Usage: python detect.py --config config.json - python detect.py --model yolov11n --confidence 0.5 --device auto + python detect.py --model-size nano --confidence 0.5 --device auto """ import sys @@ -17,27 +17,51 @@ import signal from pathlib import Path + +# Model size → ultralytics model name mapping (YOLO26, released Jan 2026) +MODEL_SIZE_MAP = { + "nano": "yolo26n", + "small": "yolo26s", + "medium": "yolo26m", + "large": "yolo26l", +} + + def parse_args(): - parser = argparse.ArgumentParser(description="YOLO Detection Skill") + parser = argparse.ArgumentParser(description="YOLO 2026 Detection Skill") parser.add_argument("--config", type=str, help="Path to config JSON file") - parser.add_argument("--model", type=str, default="yolov11n", - choices=["yolov11n", "yolov11s", "yolov11m", "yolov10n", "yolov10s", "yolov8n"]) + parser.add_argument("--model-size", type=str, default="nano", + choices=["nano", "small", "medium", "large"]) parser.add_argument("--confidence", type=float, default=0.5) parser.add_argument("--classes", type=str, default="person,car,dog,cat") - parser.add_argument("--device", type=str, default="auto", choices=["auto", "cpu", "cuda", "mps"]) - parser.add_argument("--fps", type=int, default=5) + parser.add_argument("--device", type=str, default="auto", + choices=["auto", "cpu", "cuda", "mps", "rocm"]) + parser.add_argument("--fps", type=float, default=5) return parser.parse_args() def load_config(args): - """Load config from JSON file or CLI args.""" + """Load config from JSON file, CLI args, or AEGIS_SKILL_PARAMS env var.""" + import os + + # Priority 1: AEGIS_SKILL_PARAMS env var (set 
by Aegis skill-runtime-manager) + env_params = os.environ.get("AEGIS_SKILL_PARAMS") + if env_params: + try: + return json.loads(env_params) + except json.JSONDecodeError: + pass + + # Priority 2: Config file if args.config: config_path = Path(args.config) if config_path.exists(): with open(config_path) as f: return json.load(f) + + # Priority 3: CLI args return { - "model": args.model, + "model_size": args.model_size, "confidence": args.confidence, "classes": args.classes.split(","), "device": args.device, @@ -47,7 +71,7 @@ def load_config(args): def select_device(preference: str) -> str: """Select the best available inference device.""" - if preference != "auto": + if preference not in ("auto", ""): return preference try: import torch @@ -55,6 +79,7 @@ def select_device(preference: str) -> str: return "cuda" if hasattr(torch.backends, "mps") and torch.backends.mps.is_available(): return "mps" + # ROCm exposes as CUDA in PyTorch with ROCm builds except ImportError: pass return "cpu" @@ -69,11 +94,18 @@ def main(): args = parse_args() config = load_config(args) - # Select device + # Resolve config values + model_size = config.get("model_size", "nano") device = select_device(config.get("device", "auto")) - model_name = config.get("model", "yolov11n") confidence = config.get("confidence", 0.5) + fps = config.get("fps", 5) + + # Map size to ultralytics model name + model_name = MODEL_SIZE_MAP.get(model_size, "yolo26n") + target_classes = config.get("classes", ["person", "car", "dog", "cat"]) + if isinstance(target_classes, str): + target_classes = [c.strip() for c in target_classes.split(",")] # Load YOLO model try: @@ -82,9 +114,12 @@ def main(): model.to(device) emit({ "event": "ready", - "model": model_name, + "model": f"yolo2026{model_size[0]}", + "model_size": model_size, "device": device, "classes": len(model.names), + "fps": fps, + "available_sizes": list(MODEL_SIZE_MAP.keys()), }) except Exception as e: emit({"event": "error", "message": f"Failed to load model: 
{e}", "retriable": False}) @@ -117,11 +152,17 @@ def handle_signal(signum, frame): if msg.get("event") == "frame": frame_path = msg.get("frame_path") + frame_id = msg.get("frame_id") camera_id = msg.get("camera_id", "unknown") timestamp = msg.get("timestamp", "") if not frame_path or not Path(frame_path).exists(): - emit({"event": "error", "message": f"Frame not found: {frame_path}", "retriable": True}) + emit({ + "event": "error", + "frame_id": frame_id, + "message": f"Frame not found: {frame_path}", + "retriable": True, + }) continue # Run inference @@ -142,12 +183,18 @@ def handle_signal(signum, frame): emit({ "event": "detections", + "frame_id": frame_id, "camera_id": camera_id, "timestamp": timestamp, "objects": objects, }) except Exception as e: - emit({"event": "error", "message": f"Inference error: {e}", "retriable": True}) + emit({ + "event": "error", + "frame_id": frame_id, + "message": f"Inference error: {e}", + "retriable": True, + }) if __name__ == "__main__":
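
The round trip defined above can be exercised end-to-end with a small supervisor-side harness. This is a sketch only — the inline stub stands in for a real skill process; nothing here shells out to the actual `detect.py`:

```python
"""Spawn a stub skill, wait for ready, send one frame, read the reply, stop."""
import json
import subprocess
import sys

# Stand-in skill: emits ready, answers each frame event, exits on stop.
STUB_SKILL = r'''
import json, sys
print(json.dumps({"event": "ready", "model": "stub", "device": "cpu",
                  "classes": 80, "fps": 1}), flush=True)
for line in sys.stdin:
    msg = json.loads(line)
    if msg.get("command") == "stop":
        break
    print(json.dumps({"event": "detections", "frame_id": msg.get("frame_id"),
                      "camera_id": msg.get("camera_id"), "objects": []}),
          flush=True)
'''


def run_one_frame():
    proc = subprocess.Popen([sys.executable, "-c", STUB_SKILL],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)
    ready = json.loads(proc.stdout.readline())  # ready handshake
    frame = {"event": "frame", "frame_id": 1, "camera_id": "front_door",
             "frame_path": "/tmp/aegis_detection/frame_front_door.jpg"}
    proc.stdin.write(json.dumps(frame) + "\n")
    proc.stdin.flush()
    result = json.loads(proc.stdout.readline())  # block until the reply
    proc.stdin.write(json.dumps({"command": "stop"}) + "\n")
    proc.stdin.flush()
    proc.wait(timeout=10)
    return ready, result


if __name__ == "__main__":
    ready, result = run_one_frame()
    print(ready["event"], result["event"], result["frame_id"])
```

Blocking on `readline()` after each frame is exactly the request-response backpressure the protocol doc describes.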