fix: support custom model architectures and restore model param wiring in search #79

Open
eugenepro2 wants to merge 1 commit into tirth8205:main from eugenepro2:fix/trust-remote-code

Conversation

@eugenepro2

Title: fix: support custom model architectures and restore model param wiring in search

Summary

  • Pass trust_remote_code=True in LocalEmbeddingProvider so models with custom architectures (e.g. jinaai/jina-embeddings-v2-base-code) load correctly via SentenceTransformer
  • Restore model parameter wiring through the search path (semantic_search_nodes -> hybrid_search -> _embedding_search -> EmbeddingStore) lost during the v2 refactoring

Problem

1. Broken embeddings with custom-architecture models

jinaai/jina-embeddings-v2-base-code uses a custom JinaBertModel with ALiBi positional embeddings. Without trust_remote_code=True, transformers falls back to standard BertModel and randomly initializes missing weights (position_embeddings, all encoder layers).

This produces garbage embeddings where all cosine similarities converge to ~0.77, making semantic search return effectively random results. The issue is silent — no errors are raised, embeddings are stored and searched, but results are meaningless.

Before fix (broken):

Query: "authentication and session management"
  0.7738  renderIndex          (unrelated)
  0.7738  escapeRegExp         (unrelated)
  0.7738  CCookies             (unrelated)

After fix:

Query: "authentication and session management"
  0.7012  getSession           ✓
  0.5719  useSession           ✓
  0.5317  AuthPage             ✓
  0.4914  TrpcAuthMiddleware   ✓
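
The two transcripts differ sharply in similarity spread, which makes the failure mode detectable. As a quick numpy check (illustrative only, not part of this PR; `similarities_collapsed` is a hypothetical helper):

```python
import numpy as np

def similarities_collapsed(embeddings: np.ndarray, tol: float = 1e-2) -> bool:
    """Return True when all pairwise cosine similarities are nearly identical,
    the signature of degenerate (e.g. randomly initialized) embeddings."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T                                # pairwise cosine matrix
    off_diag = sims[~np.eye(len(e), dtype=bool)]  # drop self-similarity
    return float(off_diag.max() - off_diag.min()) < tol
```

A healthy model produces a wide spread of similarities (as in the "after" transcript); the broken load produces near-constant values (~0.77 everywhere).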

2. model parameter dropped during v2 refactoring

In PR #55 (v1.x), semantic_search_nodes passed model directly to EmbeddingStore:

```python
emb_store = EmbeddingStore(db_path, model=model)  # v1.x — worked
```

In v2.0.0, tools.py was split into tools/ sub-modules and search.py was extracted. The model parameter is still accepted by the semantic_search_nodes_tool MCP wrapper, but it is dropped before it reaches EmbeddingStore:

```python
# search.py — v2.0.0
emb_store = EmbeddingStore(store.db_path)  # model lost
```

This forces users to rely solely on the CRG_EMBEDDING_MODEL environment variable.
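
The restored wiring can be sketched with stubbed-out functions (names mirror the PR's call chain; the bodies are illustrative, not the real implementations):

```python
import os

class EmbeddingStore:
    """Stub of the real store: records whichever model name it is given."""
    def __init__(self, db_path, model=None):
        self.db_path = db_path
        # None falls back to the CRG_EMBEDDING_MODEL env var, as before
        self.model = model or os.environ.get("CRG_EMBEDDING_MODEL")

def _embedding_search(store, query, model=None):
    # The fix: forward model instead of dropping it here
    return EmbeddingStore(store.db_path, model=model)

def hybrid_search(store, query, model=None):
    return _embedding_search(store, query, model=model)

def semantic_search_nodes(store, query, model=None):
    return hybrid_search(store, query, model=model)
```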

Changes

  • embeddings.py — Pass trust_remote_code=True and model_kwargs={"trust_remote_code": True} to SentenceTransformer(). Safe for all models (flag is ignored when no custom code exists).
  • search.py — Add model parameter to _embedding_search() and hybrid_search(), forward to EmbeddingStore.
  • tools/query.py — Forward model from semantic_search_nodes() to hybrid_search().
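
In outline, the embeddings.py change amounts to two extra keyword arguments on the SentenceTransformer() call. A minimal sketch (the `build_st_kwargs` helper is hypothetical; the real code passes these inline):

```python
def build_st_kwargs(model_name: str) -> dict:
    """Kwargs for SentenceTransformer(**build_st_kwargs(name)).

    trust_remote_code=True lets sentence-transformers run the model's
    auto_map code (e.g. JinaBertModel); model_kwargs forwards the same
    flag to the underlying transformers AutoModel call.  Both are ignored
    by models that ship no custom code.
    """
    return {
        "model_name_or_path": model_name,
        "trust_remote_code": True,
        "model_kwargs": {"trust_remote_code": True},
    }
```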

Affected models

Any HuggingFace model with auto_map in its config, including:

  • jinaai/jina-embeddings-v2-base-code
  • jinaai/jina-embeddings-v2-base-en
  • Other models with custom architectures

Models without custom code (e.g. all-MiniLM-L6-v2, BAAI/bge-small-en-v1.5) are unaffected — the flag is silently ignored.
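
Whether a given model falls into the affected category can be read off its config.json. A hedged sketch (pure dict inspection; fetching the config file is left to the caller, and the sample configs below are illustrative, not copied from the real repos):

```python
import json

def needs_trust_remote_code(config_json: str) -> bool:
    """True when a model's config.json declares an auto_map section,
    i.e. it ships custom modeling code that only loads faithfully
    with trust_remote_code=True."""
    return "auto_map" in json.loads(config_json)
```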

