Description
Design the pipeline that turns Semfora's existing outputs (toon, sqlite, jsonl) into embeddings for Retrieval-Augmented Generation (RAG).
Goals:
- Generate embeddings from Semfora's lightweight outputs on client machines of unknown compute power.
- Handle massive codebases via chunking, on‑disk vector stores, and incremental updates.
- Keep embeddings up‑to‑date when files change or re‑indexing occurs.
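To make the chunking goal concrete, here is a minimal sketch of a line-based chunker with a small overlap between neighbouring chunks. The function name `chunk_lines` and the `max_chars` and `overlap_lines` parameters are hypothetical and not part of Semfora. It assumes the input is plain text already extracted from one of the outputs, and splitting along symbol or function boundaries would likely work better for code.

```python
from typing import Iterator


def chunk_lines(text: str, max_chars: int = 1200, overlap_lines: int = 2) -> Iterator[str]:
    """Yield chunks made of whole lines.

    Each chunk holds as many consecutive lines as fit in max_chars
    (a single oversized line still becomes its own chunk), and
    consecutive chunks share up to overlap_lines lines of context.
    """
    lines = text.splitlines()
    start = 0
    while start < len(lines):
        size = 0
        end = start
        # Always take at least one line, then keep adding while under budget.
        while end < len(lines) and (end == start or size + len(lines[end]) + 1 <= max_chars):
            size += len(lines[end]) + 1  # +1 accounts for the newline
            end += 1
        yield "\n".join(lines[start:end])
        if end >= len(lines):
            break
        # Step back for overlap, but always advance by at least one line.
        start = max(end - overlap_lines, start + 1)


if __name__ == "__main__":
    sample = "\n".join(f"line{i}" for i in range(10))
    for chunk in chunk_lines(sample, max_chars=20, overlap_lines=1):
        print(repr(chunk))
```

Because it is a generator, chunks can be streamed to the embedder one at a time, which keeps memory use flat on low-powered machines.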
Deliverables:
- Architecture diagram (Mermaid) linking Semfora indexing, chunking, embedding model, and vector DB.
- Recommended embedding models (open‑source sentence‑transformers, OpenAI embeddings, etc.) and fallback strategies.
- Strategy for incremental updates (hash‑based change detection, delta indexing).
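As a starting point for the incremental-update deliverable, hash-based change detection could look roughly like the sketch below. It assumes each chunk has a stable ID and that a manifest mapping chunk ID to content hash is saved after every run. The names `content_hash` and `plan_delta` and the manifest layout are hypothetical, not an existing Semfora API.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_delta(
    previous: Dict[str, str],
    current_chunks: Dict[str, str],
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare the saved manifest against the current chunks.

    previous:       chunk ID -> hash from the last indexing run
    current_chunks: chunk ID -> chunk text from the current run

    Returns (IDs to embed or re-embed, IDs to delete from the vector
    store, new manifest to persist once the updates succeed).
    """
    new_manifest = {cid: content_hash(text) for cid, text in current_chunks.items()}
    # New or modified chunks: hash is missing or differs from last run.
    to_embed = sorted(cid for cid, h in new_manifest.items() if previous.get(cid) != h)
    # Chunks that disappeared since the last run.
    to_delete = sorted(cid for cid in previous if cid not in new_manifest)
    return to_embed, to_delete, new_manifest


if __name__ == "__main__":
    old = {"a": content_hash("x"), "b": content_hash("y"), "c": content_hash("z")}
    new = {"a": "x", "b": "changed", "d": "new"}
    embed, delete, manifest = plan_delta(old, new)
    print(embed, delete)
```

Only the chunks in the first list need to go through the embedding model, so a re-index of a mostly unchanged codebase stays cheap. Persisting the manifest only after the vector store update succeeds avoids marking chunks as embedded when the run was interrupted.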