fix: stabilize hunyuan pglite embeddings and retrieval #765

Open
313094319-sudo wants to merge 1 commit into garrytan:master from 313094319-sudo:wuyun/hunyuan-pglite-fix

313094319-sudo commented May 9, 2026

  • support custom embedding base URL, model, and dimensions from config/env
  • use raw HTTP embeddings for non-OpenAI compatible endpoints
  • add CJK-aware PGLite keyword fallback for Chinese retrieval
  • align chunk metadata and tests with dynamic embedding dimensions
  • document the verified local Hunyuan + PGLite recovery and validation flow
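The first two bullets can be illustrated with a minimal sketch. It is not the PR's code: `EmbeddingConfig`, `resolveEmbeddingConfig`, `buildEmbeddingRequest`, and the `EMBEDDING_*` variable names are hypothetical, and the sketch assumes the custom endpoint accepts an OpenAI-style `POST {baseUrl}/embeddings` body with `model` and `input`.

```typescript
// Hypothetical names throughout; the PR's real config keys are not shown on this page.
interface EmbeddingConfig {
  baseUrl: string;
  model: string;
  dimensions: number;
}

// Fallbacks used when neither env nor config file sets a value.
const DEFAULTS: EmbeddingConfig = {
  baseUrl: "https://api.openai.com/v1",
  model: "text-embedding-3-small",
  dimensions: 1536,
};

// Precedence in this sketch: env var, then config file, then default.
function resolveEmbeddingConfig(
  file: Partial<EmbeddingConfig>,
  env: Record<string, string | undefined>,
): EmbeddingConfig {
  const dims = Number(env["EMBEDDING_DIMENSIONS"] ?? file.dimensions ?? DEFAULTS.dimensions);
  if (!isFinite(dims) || dims <= 0 || Math.floor(dims) !== dims) {
    throw new Error("invalid embedding dimensions: " + String(dims));
  }
  const baseUrl = env["EMBEDDING_BASE_URL"] ?? file.baseUrl ?? DEFAULTS.baseUrl;
  return {
    baseUrl: baseUrl.replace(/\/+$/, ""), // drop trailing slashes
    model: env["EMBEDDING_MODEL"] ?? file.model ?? DEFAULTS.model,
    dimensions: dims,
  };
}

// Builds a raw HTTP request (URL plus fetch-style init) instead of going through an SDK.
function buildEmbeddingRequest(cfg: EmbeddingConfig, apiKey: string, input: string[]) {
  return {
    url: cfg.baseUrl + "/embeddings",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + apiKey,
      },
      body: JSON.stringify({ model: cfg.model, input }),
    },
  };
}

// Rejects vectors whose length does not match the configured dimension,
// so a mismatch surfaces before anything is written to the vector column.
function checkDimensions(vec: number[], cfg: EmbeddingConfig): number[] {
  if (vec.length !== cfg.dimensions) {
    throw new Error("expected " + cfg.dimensions + " dimensions, got " + vec.length);
  }
  return vec;
}
```

With a placeholder local endpoint, `resolveEmbeddingConfig({}, { EMBEDDING_BASE_URL: "http://localhost:8000/v1/", EMBEDDING_DIMENSIONS: "1024" })` yields a base URL without the trailing slash and `dimensions: 1024`; the resulting `url` and `init` can be passed to `fetch`.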

313094319-sudo force-pushed the wuyun/hunyuan-pglite-fix branch from ee0289b to 99bf659 on May 9, 2026, 05:57
garrytan added a commit that referenced this pull request May 10, 2026
…#121)

Two small ergonomics fixes folded together (#765 deferred — see TODOS.md
follow-up; the CJK PGLite extraction was bigger than the plan estimated).

#779 reworked (alexandreroumieu-codeapprentice): silence the
missing-max_batch_tokens startup warning for recipes with genuinely
dynamic batch capacity. New `EmbeddingTouchpoint.no_batch_cap?: true`
field. Set on ollama (capacity depends on locally loaded model +
OLLAMA_NUM_PARALLEL), litellm-proxy (depends on backend), llama-server
(set by --ctx-size at server launch). Three fewer stderr warnings on
every gateway configure; google still warns (it's a real fixed-cap
provider that ought to ship a max_batch_tokens declaration).
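
The suppression rule described above can be sketched as follows. Only `no_batch_cap` and `max_batch_tokens` come from the commit message; the `Recipe` shape, the function name, and the warning text are hypothetical and may differ from gateway.ts.

```typescript
// Field names taken from the commit message; everything else is illustrative.
interface EmbeddingTouchpoint {
  max_batch_tokens?: number;
  no_batch_cap?: true;
}

// Hypothetical minimal recipe shape.
interface Recipe {
  id: string;
  embedding?: EmbeddingTouchpoint;
}

// Returns one warning per recipe that neither declares a fixed cap
// nor opts out with no_batch_cap.
function missingBatchCapWarnings(recipes: Recipe[]): string[] {
  const warnings: string[] = [];
  for (const recipe of recipes) {
    const touchpoint = recipe.embedding;
    if (!touchpoint) continue; // no embedding support, nothing to check
    if (touchpoint.max_batch_tokens !== undefined) continue; // fixed cap declared
    if (touchpoint.no_batch_cap === true) continue; // capacity is dynamic by design
    warnings.push('recipe "' + recipe.id + '" does not declare max_batch_tokens');
  }
  return warnings;
}

const exampleWarnings = missingBatchCapWarnings([
  { id: "ollama", embedding: { no_batch_cap: true } },
  { id: "litellm-proxy", embedding: { no_batch_cap: true } },
  { id: "llama-server", embedding: { no_batch_cap: true } },
  { id: "google", embedding: {} },
]);
```

In this sketch `exampleWarnings` holds a single entry, for google, which matches the regression guard listed in the tests below.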

Bonus: litellm-proxy now declares `user_provided_models: true`, removing
the last consumer of the legacy `recipe.id === 'litellm'` hardcode in
gateway.ts:223 (D8=A wire-through completion).

#121 reworked (vinsew): self-contained API keys. Two parts:

  1. config.ts: ANTHROPIC_API_KEY env merge was silently missing.
     loadConfig() merged OPENAI_API_KEY but not ANTHROPIC_API_KEY into
     the file-config-shape result. One-line addition.

  2. cli.ts:buildGatewayConfig: when ~/.gbrain/config.json declares
     openai_api_key / anthropic_api_key but the process env doesn't
     have those env vars set (common for launchd-spawned daemons,
     agent subprocess tools, containers that don't propagate
     ~/.zshrc), fold the config-file values into the gateway env
     snapshot. Process env still wins (loaded last) so per-process
     overrides keep working.
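
A minimal sketch of the merge order in part 2, assuming a flat string map as the env snapshot. `FileConfig` and `buildGatewayEnv` are hypothetical names; the real logic lives in cli.ts:buildGatewayConfig and may differ.

```typescript
// Config-file keys named in the commit message.
interface FileConfig {
  openai_api_key?: string;
  anthropic_api_key?: string;
}

// Config-file values are written first and process env is loaded last,
// so a variable set in the process always overrides the file value.
function buildGatewayEnv(
  file: FileConfig,
  processEnv: Record<string, string | undefined>,
): Record<string, string> {
  const snapshot: Record<string, string> = {};
  if (file.openai_api_key) snapshot["OPENAI_API_KEY"] = file.openai_api_key;
  if (file.anthropic_api_key) snapshot["ANTHROPIC_API_KEY"] = file.anthropic_api_key;
  for (const key of Object.keys(processEnv)) {
    const value = processEnv[key];
    if (value !== undefined) snapshot[key] = value;
  }
  return snapshot;
}
```

For a launchd-spawned daemon with no key variables in its environment, `buildGatewayEnv({ anthropic_api_key: "from-file" }, {})` carries the file value through; if the process sets `ANTHROPIC_API_KEY` itself, that value replaces it.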

Tests (4 cases in test/ai/no-batch-cap-suppression.test.ts):
- Ollama / LiteLLM / llama-server all declare no_batch_cap: true
- configureGateway does NOT warn for those three
- configureGateway STILL warns for google (regression guard)
- Cross-cutting invariant: empty-models recipes declare user_provided_models

Tests: bun test test/ai/ — 128/128 (4 new + 124 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 9 of 11).
#765 (Hunyuan PGLite + CJK keyword fallback) deferred to TODOS.md
follow-up; the CJK extraction (~150 lines + scoring logic + tests) is
larger than the wave's adjacent-fix lane should carry. Closes that PR
with a deferral note.
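
For context on what a CJK-aware keyword fallback involves, here is a minimal sketch using overlapping character bigrams, a common technique for text written without spaces between words. It is not the deferred PR's implementation, whose extraction and scoring are not shown here; `cjkTerms` and `keywordScore` are hypothetical names.

```typescript
// Runs of Hiragana/Katakana, CJK Unified Ideographs (incl. Extension A) and
// Hangul syllables. BMP ranges only, so one character is one UTF-16 code unit.
const CJK_RUN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+/g;

// Splits each CJK run into overlapping two-character terms, deduplicated.
// A run of length one is kept as a single-character term.
function cjkTerms(query: string): string[] {
  const terms: string[] = [];
  const runs = query.match(CJK_RUN) ?? [];
  for (const run of runs) {
    if (run.length === 1) {
      if (terms.indexOf(run) === -1) terms.push(run);
      continue;
    }
    for (let i = 0; i + 2 <= run.length; i++) {
      const bigram = run.slice(i, i + 2);
      if (terms.indexOf(bigram) === -1) terms.push(bigram);
    }
  }
  return terms;
}

// Fraction of query terms that occur in the chunk text, in [0, 1].
// Returns 0 when the query contains no CJK characters.
function keywordScore(query: string, text: string): number {
  const terms = cjkTerms(query);
  if (terms.length === 0) return 0;
  let hits = 0;
  for (const term of terms) {
    if (text.indexOf(term) !== -1) hits++;
  }
  return hits / terms.length;
}
```

A query such as "向量检索" becomes the terms 向量, 量检, and 检索, so a chunk containing only 向量 scores one third, while a chunk containing the full phrase scores 1.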

Co-Authored-By: alexandreroumieu-codeapprentice <noreply@github.com>
Co-Authored-By: vinsew <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>