feat(embedding): authenticated, size-robust API embedder + vector persistence + warm-restart fix by avfirsov · Pull Request #109 · zzet/gortex

avfirsov · 2026-06-18T07:33:58Z

Makes the OpenAI-compatible api embedder usable against real hosted backends and large repos, fixes vector persistence under the bulk loader, and fixes a warm-restart re-index bug. Four focused commits; CI-green; tests included.

What & why

feat(embedding) — authenticated + size-robust embedder
- Send Authorization: Bearer from GORTEX_EMBEDDINGS_API_KEY (falls back to OPENAI_API_KEY only for *openai.com* URLs). The embedder was Ollama-oriented and keyless → OpenAI returned 401.
- Head-truncate each embedding input to 8000 bytes. OpenAI rejects >8192-token inputs with a 400 that aborts the whole vector index; tokens ≤ bytes for ASCII source, so an 8000-byte head is provably safe.
- GORTEX_EMBEDDINGS_MAX_SYMBOLS env override for the vector-index size cap — embedding.max_symbols config did not reach the indexer via the flag/env embedder path.
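The key-selection rule in the first bullet can be sketched as follows. This is a minimal illustration, not the code in api.go: `resolveAPIKey` and `authHeader` are hypothetical names, and the sketch assumes the "openai.com URL" check means the host is `openai.com` or one of its subdomains.

```go
package main

import (
	"net/url"
	"strings"
)

// resolveAPIKey is a hypothetical helper (not taken from the PR).
// It prefers GORTEX_EMBEDDINGS_API_KEY and falls back to OPENAI_API_KEY
// only when the endpoint host is openai.com or a subdomain of it.
// getenv is injected so the rule can be exercised without touching the
// real process environment; pass os.Getenv in production code.
func resolveAPIKey(endpoint string, getenv func(string) string) string {
	if key := getenv("GORTEX_EMBEDDINGS_API_KEY"); key != "" {
		return key
	}
	u, err := url.Parse(endpoint)
	if err != nil {
		return "" // unparsable endpoint: never leak the OpenAI key
	}
	host := strings.ToLower(u.Hostname())
	if host == "openai.com" || strings.HasSuffix(host, ".openai.com") {
		return getenv("OPENAI_API_KEY")
	}
	return ""
}

// authHeader returns the Authorization header value, or "" for keyless
// backends (for example a local Ollama), in which case no header is sent.
func authHeader(key string) string {
	if key == "" {
		return ""
	}
	return "Bearer " + key
}
```

Restricting the fallback to OpenAI hosts keeps an ambient OPENAI_API_KEY from being sent to an unrelated third-party endpoint.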
feat(embedding,indexer,daemon) — vector persistence + warm-restart prefix fix
- Vectors never persisted under the bulk loader (bulkVectorSink): during a bulk index idx.graph is the in-memory shadow (no VectorSearcher), so buildSearchIndex skipped BulkUpsertEmbeddings — the sqlite vectors table stayed empty and a restart had no vectors to restore. Fix: capture the disk store at the shadow swap and persist there.
- Warm-restart prefix bug: single-repo daemons persist file_mtimes under prefix "" but priorMtimesFromStore looked them up under the path basename → 0 rows → every restart did a full cold re-index (and a paid re-embed). Fix: single-repo lookup under "".
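The prefix mismatch can be reproduced with a small model of the lookup. The types and functions below (`mtimeRow`, `mtimeLookupPrefix`, `priorMtimes`) are hypothetical stand-ins rather than the PR's `warmMtimePrefix` or `priorMtimesFromStore`, and the sketch assumes multi-repo mode keeps using the path basename as its prefix.

```go
package main

import "path/filepath"

// mtimeRow models one persisted file_mtimes row (hypothetical shape).
type mtimeRow struct {
	Prefix  string
	Path    string
	ModTime int64
}

// mtimeLookupPrefix returns the prefix to query on warm restart.
// Single-repo daemons index unprefixed, so they must look up under "".
// Assumption: multi-repo mode keys rows by the repo path's basename.
func mtimeLookupPrefix(repoPath string, singleRepo bool) string {
	if singleRepo {
		return ""
	}
	return filepath.Base(repoPath)
}

// priorMtimes returns the stored mtimes whose prefix matches exactly.
// An empty result is what forces a full cold re-index.
func priorMtimes(rows []mtimeRow, prefix string) map[string]int64 {
	out := make(map[string]int64)
	for _, r := range rows {
		if r.Prefix == prefix {
			out[r.Path] = r.ModTime
		}
	}
	return out
}
```

With rows stored under `""`, a lookup under the basename matches nothing, which is the pre-fix behaviour; a lookup under `""` returns every row.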
feat(daemon) — expose --embeddings-url / --embeddings-model on daemon start.
fix(embedding) — probe API embedder dims at startup + tolerate a /v1 base URL.

Tests

internal/embedding/api_test.go (auth header, no-key case, truncation, dims probe / /v1), internal/indexer/vector_persist_test.go, cmd/gortex/daemon_state_test.go. go build ./... clean; embedding / indexer / serverstack / cmd/gortex test packages pass.

These were developed in a downstream fork and split out here as a self-contained, generic bundle. Happy to split further or adjust per your preference.

🤖 Generated with Claude Code

Make the OpenAI-compatible api embedder usable for real hosted backends and large repos.

- api.go: send Authorization: Bearer from GORTEX_EMBEDDINGS_API_KEY (falling back to OPENAI_API_KEY only for *openai.com* URLs). The embedder was Ollama-oriented and keyless → OpenAI returned 401.
- api.go: head-truncate each embedding input to 8000 bytes. OpenAI rejects >8192-token inputs with a 400 that aborts the WHOLE vector index; tokens ≤ bytes for ASCII source, so an 8000-byte head is provably safe.
- indexer.go: GORTEX_EMBEDDINGS_MAX_SYMBOLS env override for the vector-index size cap; the embedding.max_symbols config did not reach the indexer via the flag/env embedder path.

Tests in api_test.go cover the auth header, the no-key case, and truncation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit e31b66e)
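Head truncation to a byte budget might look like the sketch below. `truncateHead` and `maxInputBytes` are hypothetical names, and backing up to a rune boundary is an assumption added here so that a multi-byte character is never split; the commit message does not say whether api.go does this.

```go
package main

import "unicode/utf8"

// maxInputBytes mirrors the 8000-byte budget described in the commit.
const maxInputBytes = 8000

// truncateHead keeps at most limit bytes from the start of s.
// If the cut would land inside a multi-byte UTF-8 sequence, it backs up
// to the start of that rune, so valid input yields valid output.
func truncateHead(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	// s[cut] is the first dropped byte; if it is a continuation byte,
	// the rune it belongs to started earlier and must be dropped whole.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}
```

A caller would apply `truncateHead(text, maxInputBytes)` to each input before building the request body, so one oversized symbol cannot fail the whole batch.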

…tence, warm-restart prefix fix

Four robustness fixes surfaced while indexing a large external repo (Apache Drools, ~10.5k files) with an OpenAI embedder on a memory-constrained host, plus a running operational playbook (AGENTS.md).

internal/embedding (api.go, api_test.go):
- Send `Authorization: Bearer` from GORTEX_EMBEDDINGS_API_KEY (fallback to OPENAI_API_KEY only for *.openai.com). The api-embedder was Ollama-oriented and keyless, so OpenAI returned 401.
- Head-truncate each input to 8000 bytes: OpenAI rejects >8192-token inputs with a 400 that aborts the WHOLE vector index; an 8000-byte head guarantees <=8000 tokens (BPE never emits more tokens than chars; ASCII char==byte).
- Accumulate usage.total_tokens (atomic) and expose TokensUsed(); the indexer logs `embed_tokens` on "vector index built" so a paid pass reports its spend.

internal/indexer (indexer.go, vector_persist_test.go):
- GORTEX_EMBEDDINGS_MAX_SYMBOLS env override for the vector-index size cap that config plumbing didn't reach.
- Persist the vector index under the bulk loader. During a bulk index idx.graph is the in-memory shadow (no graph.VectorSearcher), so buildSearchIndex never called BulkUpsertEmbeddings: vectors lived only in the in-process HNSW, the sqlite `vectors` table stayed empty, and everything was lost on restart. Capture the disk store at the shadow swap (bulkVectorSink) and persist against it (the vectors table has no FK to nodes, so upsert before FlushBulk is safe).

cmd/gortex/daemon (daemon_state.go, daemon_state_test.go):
- Warm-restart prefix fix (warmMtimePrefix). Single-repo daemons index unprefixed (file_mtimes rows keyed by ""), but priorMtimesFromStore looked them up under the path basename, so 0 rows matched and every restart did a full cold re-index (+ a paid re-embed). Look up under "" in single-repo mode.

Together the last two enable a two-pass index on a RAM-tight box: embed-only first (vectors persist to sqlite), then a warm restart restores graph+vectors with no re-parse/re-embed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 7a38cd9)
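The token accounting described in this commit could be sketched as below. Only `TokensUsed()` and the `usage.total_tokens` response field come from the text above; `tokenCounter`, `recordResponse`, and the response structs are hypothetical names for illustration, and `atomic.Int64` assumes Go 1.19 or later.

```go
package main

import (
	"encoding/json"
	"sync/atomic"
)

// embeddingsResponse models only the part of an OpenAI-style embeddings
// response needed for accounting: {"usage": {"total_tokens": N}}.
type embeddingsResponse struct {
	Usage struct {
		TotalTokens int64 `json:"total_tokens"`
	} `json:"usage"`
}

// tokenCounter accumulates token usage across concurrent embed calls.
type tokenCounter struct {
	total atomic.Int64
}

// recordResponse parses one response body and adds its token count.
// A malformed body leaves the running total unchanged.
func (c *tokenCounter) recordResponse(body []byte) error {
	var resp embeddingsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return err
	}
	c.total.Add(resp.Usage.TotalTokens)
	return nil
}

// TokensUsed reports the tokens accumulated so far.
func (c *tokenCounter) TokensUsed() int64 {
	return c.total.Load()
}
```

An atomic counter avoids a mutex on the hot path when batches are embedded from several goroutines, and the indexer can read the total once when the index is built.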

…tart`

Mirror `gortex mcp`'s embedding-API flags onto `gortex daemon start` so the long-lived daemon can use an explicit OpenAI-compatible (or Ollama) embedding endpoint instead of only the built-in GloVe/transformer providers. The flags thread into the existing serverstack.EmbedderRequest{FlagURL,FlagModel} -> ResolveEmbedder path (a non-empty URL forces the api provider; key via $GORTEX_EMBEDDINGS_API_KEY or $OPENAI_API_KEY).

No new embedding code: the OpenAI APIProvider already existed; this just makes the daemon flag-drivable like mcp.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit ee712fa)

An APIProvider reported Dimensions()==0 until its first embed, so the daemon logged dim:0 and the snapshot-vector reload gate (daemon_state.go: vec.Dims == EmbedderDims) rejected a correctly-sized cached index, re-embedding the whole graph on every restart.

Add APIProvider.ProbeDimensions(ctx): one tiny embed call that caches the true width up front. It is idempotent, best-effort (a failure only warns and the lazy path still fills it in), and doubles as an early key/URL connectivity check. NewSharedServer probes any API-backed provider before logging "embeddings enabled", so the width is truthful from the start.

Also fix a double-/v1 bug: NewAPIProvider("…/v1") + embedOpenAI appending "/v1/embeddings" produced "…/v1/v1/embeddings" → 404 → silent fallback to BM25. OpenAI-compatible bases are conventionally given with /v1 (OpenAI, OpenRouter), so append it only when absent.

Tests: probe unit/error/URL-variant tests + a live OpenAI integration test (skipped without a key) asserting a 1536-d width and token accounting. Verified live: daemon now logs "embedding dimension probed dim:1536".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit d4b6a41)
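The "append /v1 only when absent" rule from this commit can be illustrated with a small helper. `embeddingsURL` is a hypothetical name, and trimming trailing slashes before the check is an assumption of this sketch rather than confirmed behaviour of NewAPIProvider.

```go
package main

import "strings"

// embeddingsURL builds the embeddings endpoint from a user-supplied base.
// Bases that already end in /v1 (the usual OpenAI-compatible convention)
// get only "/embeddings"; bare hosts get "/v1/embeddings".
func embeddingsURL(base string) string {
	base = strings.TrimRight(base, "/")
	if strings.HasSuffix(base, "/v1") {
		return base + "/embeddings"
	}
	return base + "/v1/embeddings"
}
```

Under these assumptions, `https://api.openai.com` and `https://api.openai.com/v1` both resolve to the same endpoint, which removes the doubled `/v1/v1/embeddings` path that produced the 404.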

avfirsov and others added 4 commits June 18, 2026 10:32

avfirsov wants to merge 4 commits into zzet:main from avfirsov:pr/embedder-robustness
