
ruvllm_retrieval_diffusion: corpus-agnostic generalisation of sparse-mario #449

@ruvnet

Description

Follow-up to #448. Lifts the sparse-mario algorithm into a domain-agnostic crate so anyone can point it at their own examples instead of just Mario level slices.

Plain-language summary

Sparse-Mario was a working demo of "use a sparse attention kernel as a lookup table over a corpus of examples, no training required." It generated Mario levels.

This new crate is the same idea, but corpus-agnostic — you supply any small token alphabet and a few example sequences, and you get back two pipelines:

  • Stream mode — produce one token at a time, like writing into a text box. ~12 microseconds per token.
  • Fill mode — start from a blank canvas and fill it in everywhere at once over a few rounds. Like content-aware fill, but for tokens. Can also repair partial sequences.

No GPUs, no PyTorch, no model files. The examples are the model.
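To make stream mode concrete, here is a minimal self-contained sketch of the idea — retrieval as a next-token lookup over the example corpus. This is illustrative only, not the crate's actual API: `next_token` and `match_score` are hypothetical names, and the real crate samples via `SamplingConfig` instead of picking greedily.

```rust
/// Length of the shared suffix between `context` and the corpus
/// window ending just before position `i` in `seq`.
fn match_score(context: &[u8], seq: &[u8], i: usize) -> usize {
    context
        .iter()
        .rev()
        .zip(seq[..i].iter().rev())
        .take_while(|(a, b)| a == b)
        .count()
}

/// Stream mode, greedy variant: emit the token that follows the
/// best-matching context anywhere in the corpus.
fn next_token(context: &[u8], corpus: &[Vec<u8>]) -> u8 {
    let mut best = (0usize, 0u8);
    for seq in corpus {
        for i in 0..seq.len() {
            let s = match_score(context, seq, i);
            if s >= best.0 {
                best = (s, seq[i]);
            }
        }
    }
    best.1
}

fn main() {
    // Two tiny example sequences over a 3-token alphabet: the corpus
    // IS the model — no training step anywhere.
    let corpus = vec![vec![0u8, 1, 2, 0, 1, 2], vec![1u8, 2, 0, 1, 2, 0]];
    let mut out = vec![0u8, 1]; // seed context
    for _ in 0..4 {
        let t = next_token(&out, &corpus);
        out.push(t);
    }
    println!("{:?}", out); // -> [0, 1, 2, 0, 1, 2]
}
```

Each emitted token costs one pass over the corpus, which is why a small hand-authored corpus stays in the microsecond range per token.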

What landed

New crate at crates/ruvllm_retrieval_diffusion/:

  • src/lib.rs (~600 lines) — generic Retriever + Diffuser + SamplingConfig, parameterised by RetrievalConfig (vocab_size, head_dim, pos_scale, mask_sentinel, diffusion context weights).
  • examples/drum_patterns.rs — second-domain proof: 5-token drum-machine vocab, 4 hand-authored 16-step patterns as corpus, generates 4-bar loops via both modes (AR 268µs, diffusion 5.7ms on a 9950X).
  • README.md — public-facing writeup.
  • 10 unit tests, all passing.

Branch: sparse-mario (commit 977479eff, just pushed).
Public gist (plain-language version of the README): https://gist.github.com/ruvnet/af1638d7db2961f60d732467b4282ad5

Why this is interesting beyond Mario

The Sparse-Mario benchmark already showed that bidirectional fill mode beats every non-trivial baseline by ~4× on aggregate quality metrics. That win is structural — the Markov-1 baseline has perfect bigram statistics and still loses, because it can't use right-context to inform left-context.
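The structural advantage can be shown in a toy sketch (again illustrative, not the crate's `Diffuser`: `fill_pass` and the single-neighbour scoring are assumptions, and the real implementation runs several weighted refinement rounds over a wider context). A masked slot is resolved using both neighbours, which a left-to-right Markov-1 model cannot do:

```rust
const MASK: u8 = 255; // stand-in for the crate's mask_sentinel

/// One toy fill pass: resolve each masked slot with the corpus token
/// whose immediate left AND right neighbours best agree with the
/// slot's neighbours on the canvas.
fn fill_pass(canvas: &mut Vec<u8>, corpus: &[Vec<u8>]) {
    let snapshot = canvas.clone(); // read frozen state, write into canvas
    for pos in 0..canvas.len() {
        if snapshot[pos] != MASK {
            continue;
        }
        let (mut best_s, mut best_tok) = (-1i32, 0u8);
        for seq in corpus {
            for i in 0..seq.len() {
                let mut s = 0i32;
                // left neighbour agrees?
                if pos > 0 && i > 0 && snapshot[pos - 1] != MASK
                    && seq[i - 1] == snapshot[pos - 1] {
                    s += 1;
                }
                // right neighbour agrees?
                if pos + 1 < snapshot.len() && i + 1 < seq.len()
                    && snapshot[pos + 1] != MASK
                    && seq[i + 1] == snapshot[pos + 1] {
                    s += 1;
                }
                if s > best_s {
                    best_s = s;
                    best_tok = seq[i];
                }
            }
        }
        canvas[pos] = best_tok;
    }
}

fn main() {
    // After token 0, both 1 and 2 occur in the corpus, so left context
    // alone (Markov-1 with perfect bigrams) cannot decide. The right
    // neighbour disambiguates.
    let corpus = vec![vec![0u8, 1, 2], vec![0u8, 2, 1]];
    let mut canvas = vec![0u8, MASK, 2];
    fill_pass(&mut canvas, &corpus);
    println!("{:?}", canvas); // -> [0, 1, 2]
}
```

This is the Markov-1 failure in miniature: its bigram statistics for "0 →" are a coin flip here, while the bidirectional pass recovers the answer exactly.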

This crate makes that bidirectional fill capability available to any small-vocab token domain in four lines of Rust.

Suggested follow-up domains

Easy plug-ins (corpus and tokenizer needed):

  • Terraform / k8s configs — real engineering ROI; team-style templating without a code-generation LLM. Tokenizer = vocab of known keywords + values + structural tokens.
  • MIDI loops — same shape as drum patterns but with pitched notes; output goes directly into a DAW.
  • Log-line templates — corpus = past incidents; generates plausible next-line shapes for synthetic test data.
  • MAGVIT-style visual tokens — if a VQ image codec is wired in, this becomes a tiny Rust-only image-fill demo (matches the original 'diffusion image/video' request from the discussion that spawned all of this).

Suggested architectural follow-ups (already filed in #448)

  • AR positional-bias removal — 3-line change, expected to halve AR L2 distance on grid-shaped corpora.
  • Diffusion floor-anchor pre-step — would close most of the remaining diffusion-to-corpus gap on Mario.

How to access the work

# fetch the branch, run the example, then run the tests
git fetch origin sparse-mario
git checkout sparse-mario
cargo run --release -p ruvllm_retrieval_diffusion --example drum_patterns
cargo test -p ruvllm_retrieval_diffusion

Filed with gh issue create on behalf of @ruvnet from a Claude Code session that drove the generalisation iteration.
