feat(embedding): authenticated, size-robust API embedder + vector persistence + warm-restart fix #109
Open
avfirsov wants to merge 4 commits into
Make the OpenAI-compatible `api` embedder usable for real hosted backends and large repos.

- api.go: send `Authorization: Bearer` from GORTEX_EMBEDDINGS_API_KEY (falling back to OPENAI_API_KEY only for *openai.com* URLs). The embedder was Ollama-oriented and keyless → OpenAI returned 401.
- api.go: head-truncate each embedding input to 8000 bytes. OpenAI rejects inputs over 8192 tokens with a 400 that aborts the WHOLE vector index; tokens ≤ bytes for ASCII source, so an 8000-byte head is provably safe.
- indexer.go: GORTEX_EMBEDDINGS_MAX_SYMBOLS env override for the vector-index size cap; the embedding.max_symbols config did not reach the indexer via the flag/env embedder path.

Tests in api_test.go cover the auth header, the no-key case, and truncation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit e31b66e)
…tence, warm-restart prefix fix

Four robustness fixes surfaced while indexing a large external repo (Apache Drools, ~10.5k files) with an OpenAI embedder on a memory-constrained host, plus a running operational playbook (AGENTS.md).

internal/embedding (api.go, api_test.go):
- Send `Authorization: Bearer` from GORTEX_EMBEDDINGS_API_KEY (fallback to OPENAI_API_KEY only for *.openai.com); the api embedder was Ollama-oriented and keyless, so OpenAI returned 401.
- Head-truncate each input to 8000 bytes: OpenAI rejects >8192-token inputs with a 400 that aborts the WHOLE vector index; an 8000-byte head guarantees <=8000 tokens (byte-level BPE never emits more tokens than bytes).
- Accumulate usage.total_tokens (atomically) and expose TokensUsed(); the indexer logs `embed_tokens` on "vector index built" so a paid pass reports its spend.

internal/indexer (indexer.go, vector_persist_test.go):
- GORTEX_EMBEDDINGS_MAX_SYMBOLS env override for the vector-index size cap, which the config plumbing didn't reach.
- Persist the vector index under the bulk loader. During a bulk index, idx.graph is the in-memory shadow (no graph.VectorSearcher), so buildSearchIndex never called BulkUpsertEmbeddings: vectors lived only in the in-process HNSW and were lost on restart, while the sqlite `vectors` table stayed empty. Capture the disk store at the shadow swap (bulkVectorSink) and persist against it (the vectors table has no FK to nodes, so upserting before FlushBulk is safe).

cmd/gortex/daemon (daemon_state.go, daemon_state_test.go):
- Warm-restart prefix fix (warmMtimePrefix). Single-repo daemons index unprefixed (file_mtimes rows keyed by ""), but priorMtimesFromStore looked them up under the path basename, so 0 rows matched and every restart did a full cold re-index (plus a paid re-embed). Look up under "" in single-repo mode.

Together the last two enable a two-pass index on a RAM-tight box: embed-only first (vectors persist to sqlite), then a warm restart restores graph + vectors with no re-parse or re-embed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 7a38cd9)
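The warm-restart prefix mismatch can be sketched in Go as below. `warmMtimePrefix` is the name used in the commit, but this signature and body are hypothetical, as is the assumption that multi-repo rows are keyed by the repo directory's basename; the nested map stands in for the sqlite `file_mtimes` table.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mtimeTable stands in for the sqlite file_mtimes table:
// repo prefix -> file path -> mtime (Unix seconds).
type mtimeTable map[string]map[string]int64

// warmMtimePrefix (hypothetical signature) returns the prefix under which
// file_mtimes rows were written. Single-repo daemons index unprefixed, so
// their rows live under ""; multi-repo rows are assumed here to be keyed by
// the repo directory's basename.
func warmMtimePrefix(repoPath string, singleRepo bool) string {
	if singleRepo {
		return ""
	}
	return filepath.Base(repoPath)
}

// priorMtimes returns the rows a warm restart can reuse. An empty result
// means every file looks new, which forces a cold re-index.
func priorMtimes(tbl mtimeTable, repoPath string, singleRepo bool) map[string]int64 {
	return tbl[warmMtimePrefix(repoPath, singleRepo)]
}

func main() {
	tbl := mtimeTable{
		"": {"src/a.go": 1700000000, "src/b.go": 1700000100},
	}

	// Old behaviour: the single-repo lookup used the basename and matched nothing.
	fmt.Println(len(tbl[filepath.Base("/work/repo")])) // 0

	// Fixed behaviour: single-repo mode looks up under "".
	fmt.Println(len(priorMtimes(tbl, "/work/repo", true))) // 2
}
```

The key point is that the read path must derive its prefix from the same rule as the write path; computing the two independently is what let them drift apart.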
…tart`
Mirror `gortex mcp`'s embedding-API flags (`--embeddings-url`, `--embeddings-model`) onto `gortex daemon start` so the long-lived daemon can use an explicit OpenAI-compatible (or Ollama) embedding endpoint instead of only the built-in GloVe/transformer providers. The flags thread into the existing serverstack.EmbedderRequest{FlagURL,FlagModel} -> ResolveEmbedder path (a non-empty URL forces the api provider; key via $GORTEX_EMBEDDINGS_API_KEY, or $OPENAI_API_KEY for openai.com URLs). No new embedding code: the OpenAI APIProvider already existed; this just makes the daemon flag-drivable like mcp.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit ee712fa)
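A sketch of the flag-to-provider rule, using Go's standard `flag` package for brevity (the real CLI may use a different flag library). `embedderRequest`, `parseDaemonStart`, and `chooseProvider` are hypothetical stand-ins for `serverstack.EmbedderRequest` and `ResolveEmbedder`; only the rule "a non-empty URL forces the api provider" comes from the commit, and the model name in `main` is just an example value.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// embedderRequest is a hypothetical mirror of serverstack.EmbedderRequest,
// reduced to the two fields the commit names.
type embedderRequest struct {
	FlagURL   string
	FlagModel string
}

// chooseProvider applies the rule described above: a non-empty URL forces
// the API provider; otherwise a built-in provider is used.
func chooseProvider(req embedderRequest) string {
	if req.FlagURL != "" {
		return "api"
	}
	return "builtin"
}

// parseDaemonStart parses a hypothetical "daemon start" flag set that
// carries the two embedding flags.
func parseDaemonStart(args []string) (embedderRequest, error) {
	fs := flag.NewFlagSet("daemon start", flag.ContinueOnError)
	url := fs.String("embeddings-url", "", "OpenAI-compatible (or Ollama) embedding endpoint")
	model := fs.String("embeddings-model", "", "embedding model name")
	if err := fs.Parse(args); err != nil {
		return embedderRequest{}, err
	}
	return embedderRequest{FlagURL: *url, FlagModel: *model}, nil
}

func main() {
	req, err := parseDaemonStart([]string{
		"--embeddings-url", "https://api.openai.com/v1",
		"--embeddings-model", "text-embedding-3-small",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(chooseProvider(req), req.FlagModel) // api text-embedding-3-small
}
```

Keeping the daemon's flags as thin inputs to the same request struct `mcp` uses means both commands resolve embedders through one code path.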
An APIProvider reported Dimensions()==0 until its first embed, so the
daemon logged dim:0 and the snapshot-vector reload gate
(daemon_state.go: vec.Dims == EmbedderDims) rejected a correctly-sized
cached index, re-embedding the whole graph on every restart.
Add APIProvider.ProbeDimensions(ctx): one tiny embed call that caches the
true width up front — idempotent, best-effort (a failure only warns and
the lazy path still fills it in), and doubles as an early key/URL
connectivity check. NewSharedServer probes any API-backed provider before
logging "embeddings enabled", so the width is truthful from the start.
Also fix a double-/v1 bug: NewAPIProvider("…/v1") + embedOpenAI appending
"/v1/embeddings" produced "…/v1/v1/embeddings" → 404 → silent fallback to
BM25. OpenAI-compatible bases are conventionally given with /v1 (OpenAI,
OpenRouter), so append it only when absent.
Tests: probe unit/error/URL-variant tests + a live OpenAI integration test
(skipped without a key) asserting a 1536-d width and token accounting.
Verified live: daemon now logs "embedding dimension probed dim:1536".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit d4b6a41)
Makes the OpenAI-compatible api embedder usable against real hosted backends and large repos, fixes vector persistence under the bulk loader, and fixes a warm-restart re-index bug. Four focused commits; CI-green; tests included.
What & why
- feat(embedding): authenticated + size-robust embedder
  - `Authorization: Bearer` from `GORTEX_EMBEDDINGS_API_KEY` (falls back to `OPENAI_API_KEY` only for *openai.com* URLs). The embedder was Ollama-oriented and keyless → OpenAI returned 401.
  - `GORTEX_EMBEDDINGS_MAX_SYMBOLS` env override for the vector-index size cap; the `embedding.max_symbols` config did not reach the indexer via the flag/env embedder path.
- feat(embedding,indexer,daemon): vector persistence + warm-restart prefix fix
  - Persist vectors under the bulk loader (`bulkVectorSink`): during a bulk index `idx.graph` is the in-memory shadow (no `VectorSearcher`), so `buildSearchIndex` skipped `BulkUpsertEmbeddings`; the sqlite `vectors` table stayed empty and a restart had no vectors to restore. Fix: capture the disk store at the shadow swap and persist there.
  - Warm-restart prefix fix (`warmMtimePrefix`): single-repo daemons key `file_mtimes` under prefix `""`, but `priorMtimesFromStore` looked them up under the path basename → 0 rows → every restart did a full cold re-index (and a paid re-embed). Fix: single-repo lookup under `""`.
- feat(daemon): expose `--embeddings-url`/`--embeddings-model` on `daemon start`.
- fix(embedding): probe API embedder dims at startup + tolerate a `/v1` base URL.

Tests
- `internal/embedding/api_test.go` (auth header, no-key case, truncation, dims probe, `/v1` base URL)
- `internal/indexer/vector_persist_test.go`
- `cmd/gortex/daemon_state_test.go`

`go build ./...` is clean; the `embedding`, `indexer`, `serverstack`, and `cmd/gortex` test packages pass.

These were developed in a downstream fork and split out here as a self-contained, generic bundle. Happy to split further or adjust per your preference.
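The bulk-loader persistence bug listed under "What & why" comes down to an optional-interface check in Go: code that persists only when the current graph satisfies a `VectorSearcher`-style interface silently skips persistence while the in-memory shadow is swapped in. This sketch uses hypothetical types (`Store`, `VectorSearcher`, `diskStore`, `shadowStore`, `indexer`) to show the pattern of capturing the disk store at swap time; it is not the PR's code.

```go
package main

import "fmt"

// Store is a hypothetical minimal graph-store interface.
type Store interface {
	Name() string
}

// VectorSearcher is a hypothetical optional capability standing in for
// graph.VectorSearcher; only the disk-backed store implements it.
type VectorSearcher interface {
	BulkUpsertEmbeddings(ids []string, vecs [][]float32) error
}

// diskStore models the sqlite-backed store and its vectors table.
type diskStore struct {
	vectors map[string][]float32
}

func (d *diskStore) Name() string { return "sqlite" }

func (d *diskStore) BulkUpsertEmbeddings(ids []string, vecs [][]float32) error {
	if len(ids) != len(vecs) {
		return fmt.Errorf("got %d ids but %d vectors", len(ids), len(vecs))
	}
	for i, id := range ids {
		d.vectors[id] = vecs[i]
	}
	return nil
}

// shadowStore models the in-memory shadow graph used during a bulk index.
// It has no BulkUpsertEmbeddings method, so it is not a VectorSearcher.
type shadowStore struct{}

func (shadowStore) Name() string { return "in-memory shadow" }

type indexer struct {
	graph          Store
	bulkVectorSink VectorSearcher // disk store captured at the shadow swap
}

// beginBulk swaps in the shadow and remembers the disk store as the sink.
func (idx *indexer) beginBulk(disk *diskStore) {
	idx.bulkVectorSink = disk
	idx.graph = shadowStore{}
}

// persistVectors writes through the live graph when it can, otherwise through
// the captured sink. With neither, it returns nil: the pre-fix silent skip.
func (idx *indexer) persistVectors(ids []string, vecs [][]float32) error {
	if vs, ok := idx.graph.(VectorSearcher); ok {
		return vs.BulkUpsertEmbeddings(ids, vecs)
	}
	if idx.bulkVectorSink != nil {
		return idx.bulkVectorSink.BulkUpsertEmbeddings(ids, vecs)
	}
	return nil
}

func main() {
	disk := &diskStore{vectors: map[string][]float32{}}
	idx := &indexer{graph: disk}
	idx.beginBulk(disk)

	_, isSearcher := idx.graph.(VectorSearcher)
	fmt.Println(idx.graph.Name(), isSearcher) // in-memory shadow false

	if err := idx.persistVectors([]string{"n1", "n2"}, [][]float32{{0.1}, {0.2}}); err != nil {
		panic(err)
	}
	fmt.Println(len(disk.vectors)) // 2
}
```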
🤖 Generated with Claude Code