
Integrate ruvllm_sparse_attention for on-ESP32-S3 temporal modeling + AETHER temporal head #513

@ruvnet

Description


Summary

Two architecture decisions to capture before any code lands, both stemming from the v0.1.1 release of ruvllm_sparse_attention (vendored at vendor/ruvector/crates/ruvllm_sparse_attention):

  1. On-ESP32-S3 temporal modeling at the edge — use the no_std + alloc build (376 KB release rlib on xtensa-esp32s3-none-elf per upstream ADR-192) to add learned temporal heads (gesture classification, fall/anomaly classification with sequence context, breathing-quality scoring) in the firmware itself, alongside the existing physics-only DSP. To our knowledge, no other CSI sensing project ships transformer inference on the MCU.
  2. AETHER contrastive embedding temporal head — the embedding pipeline from ADR-024 currently uses dense MHA over CSI frame sequences. Swap to forward_gqa with KvCache for streaming. At T=1000 frames (10 s @ 100 Hz) the speedup is 30–100×, and KV cache makes incremental decode O(1) per new frame.
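The 30–100× figure at T=1000 is consistent with a back-of-envelope count of attention score operations (an illustrative estimate only, not a benchmark of the crate — constant factors and the FastGRNN gate will move the real number within the quoted range):

```rust
// Rough operation counts per head: dense MHA scores every pair of frames
// (O(T^2)), while O(T log T) sparse attention touches ~T * log2(T) pairs.
fn dense_ops(t: f64) -> f64 {
    t * t
}

fn sparse_ops(t: f64) -> f64 {
    t * t.log2()
}

fn main() {
    let t = 1000.0; // 10 s of CSI frames at 100 Hz
    let speedup = dense_ops(t) / sparse_ops(t);
    println!("estimated speedup at T={}: ~{:.0}x", t, speedup); // ~100x
}
```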

Source crate facts

  • ruvllm_sparse_attention v0.1.1 (published 2026-05-07, MIT)
  • API: SubquadraticSparseAttention, forward / forward_flash / forward_gqa / forward_auto / decode_step, KvCache (FP16 optional), IncrementalLandmarks, FastGrnnGate, RuvLlmSparseBlock, Tensor3
  • no_std + alloc validated on real ESP32-S3 (ADR-192); 376 KB release rlib
  • O(N log N) sparse attention; near-linear O(N) with FastGRNN salience gate
  • Minimal runtime deps — libm only, with optional half/rayon features; no candle/tch/ndarray coupling
  • Validated on Pi 5, Pi Zero 2W, Hailo-10H
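To make the KvCache / decode_step pattern concrete, here is a minimal self-contained sketch of the streaming idea. The names mirror the crate's exported items, but the shapes and signatures are assumptions for illustration — they are not the crate's real API:

```rust
// Hypothetical sketch of streaming KV caching: each incoming CSI frame
// appends one key/value pair, so per-frame decode cost is O(1) amortized
// instead of recomputing attention inputs for the whole sequence.
struct KvCache {
    keys: Vec<Vec<f32>>,   // one cached key vector per frame seen so far
    values: Vec<Vec<f32>>, // one cached value vector per frame seen so far
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    // Append the new frame's projections; earlier frames are reused from
    // the cache. Returns the sequence length after the append.
    fn decode_step(&mut self, k: Vec<f32>, v: Vec<f32>) -> usize {
        self.keys.push(k);
        self.values.push(v);
        self.keys.len()
    }
}

fn main() {
    let mut cache = KvCache::new();
    for _frame in 0..100 {
        cache.decode_step(vec![0.0; 64], vec![0.0; 64]);
    }
    assert_eq!(cache.keys.len(), 100);
}
```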

Scope of THIS issue

ADRs only. No code changes. Two ADRs to be authored:

  • ADR-095 — On-ESP32-S3 temporal modeling via no_std sparse attention
  • ADR-096 — AETHER temporal head: dense MHA → sparse forward_gqa with streaming KV cache

Each ADR should follow the existing docs/adr/ADR-NNN-<slug>.md template (frontmatter with adr / title / status / date / authors / related / tags, then Status / Context / Decision / Consequences sections).
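A skeleton matching the template fields listed above (all values are placeholders, not prescribed content):

```markdown
---
adr: NNN
title: <title>
status: Proposed
date: <YYYY-MM-DD>
authors: [<author>]
related: []
tags: []
---

## Status
## Context
## Decision
## Consequences
```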

Out of scope (separate follow-up issues)

  • Implementing a wifi-densepose-temporal crate
  • Wiring the kernel into the firmware (Rust component inside ESP-IDF)
  • Bumping ruvector workspace deps from 2.0.4 → 2.2.0/2.1.0/2.0.6 (skewed across crates; ruvector-attn-mincut is stuck at 2.0.4 upstream)

Branch

feat/ruvllm-sparse-attention-edge (already pushed)
