A zero-dependency C implementation of semantic intent recognition using EmbeddingGemma-300M sentence embeddings.
Registers trigger phrases, embeds them with a Gemma 3 transformer, and matches utterances via cosine similarity. All weights are float32.
Export the weights:

```
pip install -r scripts/requirements.txt
python scripts/export-weights.py
```

Output goes to `models/embeddinggemma/` containing:

- `embedding.bin` — float32 transformer + projection weights (~1.2 GB)
- `tokenizer.bin` — SentencePiece tokenizer (262K vocab)
Build and run the test:

```
make
./test_embedding models/embeddinggemma
```

Basic usage:

```c
#include "embedding.h"

// Load model (immutable, thread-safe, load once)
embedding_model *model = embedding_model_load("models/embeddinggemma");

// Create per-thread state (mutable scratch buffers)
// Second arg caps sequence length: 128 ≈ 5.7 MB, 0 = model max (2048 ≈ 92 MB)
embedding_state *state = embedding_state_create(model, 128);

// Get a 768-dim L2-normalized embedding
float emb[768];
embedding_model_embed(model, state, "turn on the lights", emb, 768);

// Intent recognition
intent_recognizer *ir = intent_recognizer_create(model, state, 0.7f);
intent_recognizer_register(ir, "turn on the lights", my_callback, NULL);
intent_recognizer_register(ir, "what is the weather", my_callback, NULL);

// Process from any thread (with its own state)
intent_recognizer_process(ir, model, state, "switch on the lights");

intent_recognizer_free(ir);
embedding_state_free(state);
embedding_model_free(model);
```

Threading model:

- `embedding_model` — immutable after load, share across threads
- `embedding_state` — mutable scratch buffers, one per concurrent call. Pass `max_seq` to control memory usage (~5.7 MB at 128 vs ~92 MB at 2048). For intent recognition, 64-128 is typically sufficient
- `intent_recognizer` — register intents during setup (single-threaded), then `process` is thread-safe with separate states
EmbeddingGemma-300M is a bidirectional transformer based on Gemma 3. The forward pass:
- Tokenize text (SentencePiece, 262K vocab)
- Embed tokens + scale by sqrt(768)
- 24 transformer layers (GQA 3Q/1KV, head_dim=256, RMSNorm, gated GELU MLP, RoPE)
- Mean pool across sequence
- Dense 768 → 3072 → 768
- L2 normalize