Summary
Two architecture decisions to capture before any code lands, both stemming from the v0.1.1 release of `ruvllm_sparse_attention` (vendored at `vendor/ruvector/crates/ruvllm_sparse_attention`):
- On-ESP32-S3 temporal modeling at the edge: use the `no_std` + `alloc` build (376 KB release rlib on `xtensa-esp32s3-none-elf` per upstream ADR-192) to add learned temporal heads (gesture classification, fall/anomaly classification with sequence context, breathing-quality scoring) on the firmware itself, alongside the existing physics-only DSP. To our knowledge, no other CSI sensing project ships transformer inference on the MCU.
- AETHER contrastive embedding temporal head: the embedding pipeline from ADR-024 currently uses dense MHA over CSI frame sequences. Swap to `forward_gqa` with `KvCache` for streaming. At T=1000 frames (10 s @ 100 Hz) the upstream-reported speedup is 30–100×, and the KV cache makes incremental decode O(1) per new frame (see the sketch after this list).
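To make the ADR-096 proposal concrete, here is a minimal sketch of the streaming temporal head. The type and method names (`SubquadraticSparseAttention`, `forward_gqa`, `decode_step`, `KvCache`, `Tensor3`) come from the crate's API list below, but every signature, constructor, and tensor shape here is an assumption for illustration, not the published interface.

```rust
// Sketch only: names are from the ruvllm_sparse_attention API surface listed
// under "Source crate facts"; the signatures below are assumed, not verified.
use ruvllm_sparse_attention::{KvCache, SubquadraticSparseAttention, Tensor3};

/// Hypothetical streaming temporal head for the AETHER embedding pipeline:
/// prefill one 10 s CSI window (T = 1000 frames @ 100 Hz) with sparse GQA,
/// then fold in each new frame incrementally against the KV cache.
struct AetherTemporalHead {
    attn: SubquadraticSparseAttention,
    cache: KvCache,
}

impl AetherTemporalHead {
    /// One-shot prefill over the full window (replaces the dense MHA pass).
    fn prefill(&mut self, csi_window: &Tensor3) -> Tensor3 {
        // Assumed signature: forward_gqa(q, k, v, &mut cache) -> Tensor3
        self.attn
            .forward_gqa(csi_window, csi_window, csi_window, &mut self.cache)
    }

    /// Streaming update: O(1) work per new 100 Hz frame via the KV cache.
    fn push_frame(&mut self, frame: &Tensor3) -> Tensor3 {
        // Assumed signature: decode_step(frame, &mut cache) -> Tensor3
        self.attn.decode_step(frame, &mut self.cache)
    }
}
```

The only point of the sketch is the shape of the data flow: one `forward_gqa` prefill, then `decode_step` per frame, instead of re-running attention over the whole window on every update.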
Source crate facts
`ruvllm_sparse_attention` v0.1.1 (published 2026-05-07, MIT)
- API: `SubquadraticSparseAttention`, `forward` / `forward_flash` / `forward_gqa` / `forward_auto` / `decode_step`, `KvCache` (optional FP16), `IncrementalLandmarks`, `FastGrnnGate`, `RuvLlmSparseBlock`, `Tensor3`
- `no_std` + `alloc` validated on real ESP32-S3 hardware (ADR-192); 376 KB release rlib
- O(N log N) sparse attention; near-linear O(N) with FastGRNN salience gate
- Minimal runtime deps (only `libm`, with optional `half`/`rayon`); no `candle`/`tch`/`ndarray` coupling
- Validated on Pi 5, Pi Zero 2W, Hailo-10H
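As a companion to the streaming sketch above, here is an equally hedged `no_std`-style sketch of the firmware-side head that ADR-095 would describe. Only the type and method names (`FastGrnnGate`, `SubquadraticSparseAttention`, `Tensor3`, `forward_auto`) are taken from the API list; the structure, the signatures, and the gate-then-attend composition are assumptions.

```rust
// no_std + alloc sketch for an on-firmware temporal head (ADR-095).
// Type names are from the crate's API list; every signature, field, and the
// gate-then-attend composition below are illustrative assumptions only.
#![no_std]
extern crate alloc;

use ruvllm_sparse_attention::{FastGrnnGate, SubquadraticSparseAttention, Tensor3};

/// Hypothetical gesture/fall/breathing head running beside the physics DSP.
pub struct FirmwareTemporalHead {
    gate: FastGrnnGate,                // salience gate for near-linear O(N)
    attn: SubquadraticSparseAttention, // O(N log N) sparse attention kernel
}

impl FirmwareTemporalHead {
    pub fn score(&mut self, csi_frames: &Tensor3) -> Tensor3 {
        // Assumed: the gate down-weights low-salience frames before attention.
        let salient = self.gate.forward(csi_frames);
        // Assumed: forward_auto picks the best kernel for the window length.
        self.attn.forward_auto(&salient, &salient, &salient)
    }
}
```

Nothing here is a commitment; the ADR's job is to decide whether this shape (salience gate, then sparse attention, then a small classifier head) fits alongside the existing DSP within the firmware's flash and RAM budget.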
Scope of THIS issue
ADRs only. No code changes. Two ADRs to be authored:
- ADR-095: On-ESP32-S3 temporal modeling via `no_std` sparse attention
- ADR-096: AETHER temporal head, replacing dense MHA with sparse `forward_gqa` and a streaming KV cache
Each ADR should follow the existing `docs/adr/ADR-NNN-<slug>.md` template (frontmatter with `adr` / `title` / `status` / `date` / `authors` / `related` / `tags`, then Status / Context / Decision / Consequences sections).
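For quick reference, a skeleton matching that template; the frontmatter keys and section headings are those named above, while the concrete values (status, date, authors, related ADRs, tags) are placeholders for the ADR author to fill in.

```
---
adr: 095
title: On-ESP32-S3 temporal modeling via no_std sparse attention
status: proposed                 # placeholder
date: TBD                        # placeholder
authors: TBD                     # placeholder
related: [ADR-024, ADR-192]      # placeholder; adjust per ADR
tags: [edge, sparse-attention]   # placeholder
---

## Status
Proposed

## Context
...

## Decision
...

## Consequences
...
```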
Out of scope (separate follow-up issues)
- Implementing a `wifi-densepose-temporal` crate
- Wiring the kernel into the firmware (Rust component inside ESP-IDF)
- Bumping the ruvector workspace deps from 2.0.4 to the current upstream releases (2.2.0/2.1.0/2.0.6, skewed across crates; `ruvector-attn-mincut` is stuck at 2.0.4 upstream)
Branch
`feat/ruvllm-sparse-attention-edge` (already pushed)