Summary
Two architecture decisions to capture before any code lands, both stemming from the v0.1.1 release of `ruvllm_sparse_attention` (vendored at `vendor/ruvector/crates/ruvllm_sparse_attention`):
- On-ESP32-S3 temporal modeling at the edge: use the `no_std` + `alloc` build (376 KB release rlib on `xtensa-esp32s3-none-elf` per upstream ADR-192) to add learned temporal heads (gesture classification, fall/anomaly classification with sequence context, breathing-quality scoring) on the firmware itself, alongside the existing physics-only DSP. To our knowledge, no other CSI sensing project ships transformer inference on the MCU.
- AETHER contrastive embedding temporal head: the embedding pipeline from ADR-024 currently uses dense MHA over CSI frame sequences. Swap to `forward_gqa` with `KvCache` for streaming. At T=1000 frames (10 s @ 100 Hz) the upstream-reported speedup is 30–100×, and the KV cache makes incremental decode O(1) per new frame (see the sketch after this list).
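To make the ADR-096 proposal concrete, here is a minimal sketch of the streaming temporal head. The type and method names (`SubquadraticSparseAttention`, `forward_gqa`, `decode_step`, `KvCache`, `Tensor3`) come from the crate's API list below, but every signature, constructor, and tensor shape here is an assumption for illustration, not the published interface.

```rust
// Sketch only: names are from the ruvllm_sparse_attention API surface listed
// under "Source crate facts"; the signatures below are assumed, not verified.
use ruvllm_sparse_attention::{KvCache, SubquadraticSparseAttention, Tensor3};

/// Hypothetical streaming temporal head for the AETHER embedding pipeline:
/// prefill one 10 s CSI window (T = 1000 frames @ 100 Hz) with sparse GQA,
/// then fold in each new frame incrementally against the KV cache.
struct AetherTemporalHead {
    attn: SubquadraticSparseAttention,
    cache: KvCache,
}

impl AetherTemporalHead {
    /// One-shot prefill over the full window (replaces the dense MHA pass).
    fn prefill(&mut self, csi_window: &Tensor3) -> Tensor3 {
        // Assumed signature: forward_gqa(q, k, v, &mut cache) -> Tensor3
        self.attn
            .forward_gqa(csi_window, csi_window, csi_window, &mut self.cache)
    }

    /// Streaming update: O(1) work per new 100 Hz frame via the KV cache.
    fn push_frame(&mut self, frame: &Tensor3) -> Tensor3 {
        // Assumed signature: decode_step(frame, &mut cache) -> Tensor3
        self.attn.decode_step(frame, &mut self.cache)
    }
}
```

The only point of the sketch is the shape of the data flow: one `forward_gqa` prefill, then `decode_step` per frame, instead of re-running attention over the whole window on every update.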
Source crate facts
`ruvllm_sparse_attention` v0.1.1 (published 2026-05-07, MIT)
- API: `SubquadraticSparseAttention`, `forward` / `forward_flash` / `forward_gqa` / `forward_auto` / `decode_step`, `KvCache` (optional FP16), `IncrementalLandmarks`, `FastGrnnGate`, `RuvLlmSparseBlock`, `Tensor3`
- `no_std` + `alloc` validated on real ESP32-S3 hardware (ADR-192); 376 KB release rlib
- O(N log N) sparse attention; near-linear O(N) with FastGRNN salience gate
- Minimal runtime deps (only `libm`, with optional `half`/`rayon`); no `candle`/`tch`/`ndarray` coupling
- Validated on Pi 5, Pi Zero 2W, Hailo-10H
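As a companion to the streaming sketch above, here is an equally hedged `no_std`-style sketch of the firmware-side head that ADR-095 would describe. Only the type and method names (`FastGrnnGate`, `SubquadraticSparseAttention`, `Tensor3`, `forward_auto`) are taken from the API list; the structure, the signatures, and the gate-then-attend composition are assumptions.

```rust
// no_std + alloc sketch for an on-firmware temporal head (ADR-095).
// Type names are from the crate's API list; every signature, field, and the
// gate-then-attend composition below are illustrative assumptions only.
#![no_std]
extern crate alloc;

use ruvllm_sparse_attention::{FastGrnnGate, SubquadraticSparseAttention, Tensor3};

/// Hypothetical gesture/fall/breathing head running beside the physics DSP.
pub struct FirmwareTemporalHead {
    gate: FastGrnnGate,                // salience gate for near-linear O(N)
    attn: SubquadraticSparseAttention, // O(N log N) sparse attention kernel
}

impl FirmwareTemporalHead {
    pub fn score(&mut self, csi_frames: &Tensor3) -> Tensor3 {
        // Assumed: the gate down-weights low-salience frames before attention.
        let salient = self.gate.forward(csi_frames);
        // Assumed: forward_auto picks the best kernel for the window length.
        self.attn.forward_auto(&salient, &salient, &salient)
    }
}
```

Nothing here is a commitment; the ADR's job is to decide whether this shape (salience gate, then sparse attention, then a small classifier head) fits alongside the existing DSP within the firmware's flash and RAM budget.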
Scope of THIS issue
ADRs only. No code changes. Two ADRs to be authored:
- ADR-095: On-ESP32-S3 temporal modeling via `no_std` sparse attention
- ADR-096: AETHER temporal head, replacing dense MHA with sparse `forward_gqa` and a streaming KV cache
Each ADR should follow the existing `docs/adr/ADR-NNN-<slug>.md` template (frontmatter with `adr` / `title` / `status` / `date` / `authors` / `related` / `tags`, then Status / Context / Decision / Consequences sections).
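For quick reference, a skeleton matching that template; the frontmatter keys and section headings are those named above, while the concrete values (status, date, authors, related ADRs, tags) are placeholders for the ADR author to fill in.

```
---
adr: 095
title: On-ESP32-S3 temporal modeling via no_std sparse attention
status: proposed                 # placeholder
date: TBD                        # placeholder
authors: TBD                     # placeholder
related: [ADR-024, ADR-192]      # placeholder; adjust per ADR
tags: [edge, sparse-attention]   # placeholder
---

## Status
Proposed

## Context
...

## Decision
...

## Consequences
...
```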
Out of scope (separate follow-up issues)
- Implementing a `wifi-densepose-temporal` crate
- Wiring the kernel into the firmware (Rust component inside ESP-IDF)
- Bumping the ruvector workspace deps from 2.0.4 to the current upstream releases (2.2.0/2.1.0/2.0.6, skewed across crates; `ruvector-attn-mincut` is stuck at 2.0.4 upstream)
Branch
`feat/ruvllm-sparse-attention-edge` (already pushed)