chore(providers): bump Gemini defaults to current GA models #370
Conversation
Bundles two upstream PRs into one chore. Both are blocking real users today, and both are simple default-string bumps with no API contract change.

**LLM default** (was PR #368, @yut304)

- `gemini-2.0-flash` is deprecated in Google's Gemini API and returns 429 rate-limit errors under load. Replace the default with `gemini-flash-latest`.
- Users on a pinned `GEMINI_MODEL` in `~/.agentmemory/.env` are unaffected.

**Embedding default** (was PR #246, @AmmarSaleh50)

- `text-embedding-004` is deprecated (shutdown Jan 14 2026). Replace with `gemini-embedding-001` (GA): 100+ languages, MRL dims (768 / 1536 / 3072), 2048-token input.
- The URL path changes from `:batchEmbedContent` to `:batchEmbedContents` (plural: the new model's batch endpoint).
- Each request now sends `outputDimensionality: 768` so the returned vectors match the existing index dim guard from #248; no reindex needed.
- L2-normalize each returned vector before pushing it to the result array. `gemini-embedding-001` does not normalize by default, unlike `text-embedding-004`. Without this, the cosine-similarity math elsewhere in the search pipeline (which assumes unit-length vectors) collapses.

**Verified**

- `npm test` clean: 903 / 903.
- `npm run build` clean.

Closes #368, closes #246.
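The normalization step in the last bullet can be sketched as a small self-contained helper (this mirrors the `l2Normalize` function discussed later in the review thread; treat it as an illustration, not the exact shipped code):

```typescript
// L2-normalize an embedding in place so downstream cosine similarity can
// assume unit-length vectors. gemini-embedding-001 does not normalize its
// output by default, unlike text-embedding-004.
function l2Normalize(vec: Float32Array): Float32Array {
  let sum = 0;
  for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
  const norm = Math.sqrt(sum);
  if (norm === 0) return vec; // degenerate input; see the review discussion below
  for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
  return vec;
}
```

Called once per vector returned by the batch endpoint, before the vector is pushed to the result array.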
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/providers/embedding/gemini.ts (1)
**58-65: ⚡ Quick win: consider logging or throwing on zero-norm vectors.**

The function silently returns the unnormalized vector when `norm === 0` (line 62). A zero-norm embedding from the API would indicate a problem upstream, but this implementation swallows it. Consider logging a warning or throwing an error to surface the issue rather than injecting an unnormalized (zero) vector into results that are expected to be unit-length for cosine similarity.

🔍 Proposed enhancement

```diff
 function l2Normalize(vec: Float32Array): Float32Array {
   let sum = 0;
   for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
   const norm = Math.sqrt(sum);
-  if (norm === 0) return vec;
+  if (norm === 0) {
+    throw new Error("Cannot normalize zero-length embedding vector");
+  }
   for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
   return vec;
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/providers/embedding/gemini.ts` around lines 58 - 65, The l2Normalize function currently returns the original array when norm === 0, silently allowing zero-length embeddings; update l2Normalize to surface this upstream error by either throwing a descriptive Error (e.g., "zero-norm embedding returned from upstream") or logging a warning with context before failing, and ensure callers can handle the exception; refer to the function name l2Normalize and modify its norm === 0 branch to throw or log (and return a safe fallback only if explicitly wanted), including details such as the embedding length or source to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/config.ts`:
- Line 79: The default model string used for the config key "model" (fallback
when env["GEMINI_MODEL"] is unset) should not use the auto-updating alias
"gemini-flash-latest"; change the fallback to a stable pinned identifier such as
"gemini-2.5-flash" so production behavior is deterministic, i.e., update the
expression that sets model (the `model: env["GEMINI_MODEL"] ||
"gemini-flash-latest",` assignment) to use a stable model name as the default.
In `@src/providers/embedding/gemini.ts`:
- Around line 32-36: The requests payload is using camelCase key
outputDimensionality which Gemini expects as snake_case output_dimensionality;
update the chunk.map(...) object so the property is output_dimensionality:
this.dimensions (instead of outputDimensionality) wherever you build requests
for MODEL in the embedding/gemini provider, ensuring any other occurrences of
outputDimensionality are renamed to output_dimensionality so the API receives
the intended 768-dimension vector setting.
---
Nitpick comments:
In `@src/providers/embedding/gemini.ts`:
- Around line 58-65: The l2Normalize function currently returns the original
array when norm === 0, silently allowing zero-length embeddings; update
l2Normalize to surface this upstream error by either throwing a descriptive
Error (e.g., "zero-norm embedding returned from upstream") or logging a warning
with context before failing, and ensure callers can handle the exception; refer
to the function name l2Normalize and modify its norm === 0 branch to throw or
log (and return a safe fallback only if explicitly wanted), including details
such as the embedding length or source to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 69176075-be9c-4340-a799-ed5ea3ec84c2
📒 Files selected for processing (3)
- README.md
- src/config.ts
- src/providers/embedding/gemini.ts
```diff
 requests: chunk.map((t) => ({
-  model: "models/text-embedding-004",
+  model: MODEL,
   content: { parts: [{ text: t }] },
+  outputDimensionality: this.dimensions,
 })),
```
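In context, the batch request this hunk builds looks roughly like the following sketch. Field names and the plural endpoint come from the diff and PR description; `buildBatchBody`, the `MODEL` value, and the simplified URL handling are illustrative assumptions, not the exact provider code:

```typescript
// Hypothetical reconstruction of the batch embedding request. The payload
// shape (requests / model / content.parts / outputDimensionality) is taken
// from the diff above; fetch and error handling are omitted for brevity.
const MODEL = "models/gemini-embedding-001";

function buildBatchBody(chunk: string[], dimensions: number) {
  return {
    requests: chunk.map((t) => ({
      model: MODEL,
      content: { parts: [{ text: t }] },
      outputDimensionality: dimensions, // truncates default 3072 dims to e.g. 768
    })),
  };
}

// Note the plural endpoint required by the new model:
const url = `https://generativelanguage.googleapis.com/v1beta/${MODEL}:batchEmbedContents`;
```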
🧩 Analysis chain
🌐 Web query:
Gemini gemini-embedding-001 outputDimensionality parameter reduce dimensions
💡 Result:

For the Gemini embedding model `gemini-embedding-001`, the parameter you use to reduce the embedding dimensionality is `output_dimensionality` / `outputDimensionality`.

1) What it does
- By default, `gemini-embedding-001` returns 3072-dimensional embeddings. [1]
- You can pass `output_dimensionality` (or `outputDimensionality`) to control the size of the output embedding vector; smaller values truncate the embedding to the requested size (to save storage and improve downstream efficiency). [1][2]

2) Parameter name(s) by API surface
- Gemini API (ai.google.dev docs): use `output_dimensionality` in the embed content request. [3]
- Vertex AI Generative AI API (cloud.google.com docs): use `output_dimensionality` in `EmbedContentConfig` (shown as `output_dimensionality=...` in examples). [1]
- The Vertex AI model reference also describes `outputDimensionality` as an optional int used to specify output embedding size, truncating if set. [2]

3) Practical example (what to change)
- Request without dimensionality: defaults to 3072 dims. [1]
- Request with dimensionality: set `output_dimensionality`/`outputDimensionality` to e.g. 768, 1536, or 3072 (Google recommends 768/1536/3072 for quality/storage tradeoffs). [3][4]

4) Common gotcha
- Ensure your client/library actually sends the parameter; otherwise you'll get the default 3072-dimension vectors, which can break downstream vector databases expecting the reduced dimension. [5]

So, to "reduce dimensions" for `gemini-embedding-001`, set `output_dimensionality` (Gemini API) or `outputDimensionality`/`output_dimensionality` (Vertex AI) in the embedContent request config; e.g., 768 instead of the default 3072. [1][2][3]
Citations:
- 1: https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings#:~:text=For%20gemini%2Dembedding%2D001%20%2C,of%20the%20output%20embedding%20vector.
- 2: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api
- 3: https://ai.google.dev/gemini-api/docs/embeddings
- 4: https://developers.googleblog.com/gemini-embedding-available-gemini-api/
- 5: Google Gemini embedding model outputDimensionality not working vercel/ai#8033
🏁 Script executed:
```shell
cat -n src/providers/embedding/gemini.ts | head -50
```

Repository: rohitg00/agentmemory
Length of output: 2110
Change outputDimensionality to output_dimensionality (snake_case).
The code targets the Gemini API (ai.google.dev), which expects the parameter as output_dimensionality in snake_case, not outputDimensionality in camelCase (which is the Vertex AI convention). Using the wrong parameter name will cause the API to ignore it and return default 3072-dimensional vectors instead of the intended 768 dimensions, breaking compatibility with existing indexes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/providers/embedding/gemini.ts` around lines 32 - 36, The requests payload
is using camelCase key outputDimensionality which Gemini expects as snake_case
output_dimensionality; update the chunk.map(...) object so the property is
output_dimensionality: this.dimensions (instead of outputDimensionality)
wherever you build requests for MODEL in the embedding/gemini provider, ensuring
any other occurrences of outputDimensionality are renamed to
output_dimensionality so the API receives the intended 768-dimension vector
setting.
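Whichever key the wire format actually honors, the failure mode the finding describes (the API silently ignoring the parameter and returning default 3072-dim vectors) is detectable client-side. A minimal guard in the spirit of the dimension guard referenced from PR #248 could look like this sketch; the helper name `assertDims` is hypothetical:

```typescript
// Hypothetical guard: reject embeddings whose length does not match the
// provider's configured dimensionality. If the dimensionality parameter was
// ignored upstream and a default 3072-dim vector came back, this throws
// instead of letting a mismatched vector corrupt the index.
function assertDims(vec: ArrayLike<number>, expected: number): void {
  if (vec.length !== expected) {
    throw new Error(
      `embedding dimension mismatch: got ${vec.length}, expected ${expected}`,
    );
  }
}
```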
…norm

Addresses CodeRabbit findings on PR #370.

1. Pin Gemini LLM default to `gemini-2.5-flash`.
   `gemini-flash-latest` is a moving alias that points to whatever Google promotes next. Production behaviour should be deterministic from a release perspective: users who upgrade agentmemory should not also get a Gemini model rotation in the same step. Switch the default to the current stable GA model `gemini-2.5-flash`. Users who want the moving alias keep getting it via `GEMINI_MODEL=gemini-flash-latest` in `~/.agentmemory/.env`.

2. Warn-once on zero-norm embedding in l2Normalize.
   `gemini-embedding-001` can return a zero-norm vector for degenerate input. The previous code silently returned the zero vector; downstream cosine-similarity math then divides by zero, and the call site sees `NaN` scores with no signal as to why. Emit a one-time stderr warning naming the model and vector length so operators can correlate index quality dips with upstream embedding regressions. Behaviour is otherwise unchanged: return the zero vector and let BM25 carry the search signal. Throwing was the other option; it was rejected because a single bad embedding in a 100-item batch would abort the whole batch and surface as an indexing pipeline halt. Soft-fail + warn matches the rest of the embedding provider error handling.

Skipped finding:
- `outputDimensionality` → `output_dimensionality` snake_case rename. CodeRabbit asserts the REST API expects snake_case. The Gemini REST API actually uses camelCase on the wire, confirmed against ai.google.dev/api/embeddings (field labelled `outputDimensionality` in the REST schema; the Python SDK alone uses snake_case and translates internally). Current code is correct as-shipped; the snake_case rename would silently break the dim override.

Verified: 903 / 903 tests pass; build clean.
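The warn-once soft-fail described in point 2 could look roughly like this sketch (the module-level flag and function names are assumptions for illustration, not the actual patch):

```typescript
// Warn once, then soft-fail: keep returning the zero vector so a single bad
// embedding cannot abort a whole batch, while still surfacing the upstream
// problem on stderr.
let warnedZeroNorm = false;

function l2NormalizeWithWarn(vec: Float32Array, model: string): Float32Array {
  let sum = 0;
  for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
  const norm = Math.sqrt(sum);
  if (norm === 0) {
    if (!warnedZeroNorm) {
      warnedZeroNorm = true;
      console.warn(
        `[agentmemory] ${model} returned a zero-norm embedding ` +
          `(length ${vec.length}); cosine scores for this item will be 0 ` +
          `and BM25 carries its search signal.`,
      );
    }
    return vec; // soft-fail: unchanged zero vector
  }
  for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
  return vec;
}
```

The warning fires at most once per process, matching the "one-time stderr warning" described in the commit message.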
Summary

Bundles two upstream PRs into one chore. Both block real users today, and both are default-string bumps with zero API-contract change.

LLM default
- `gemini-2.0-flash` is deprecated in Google's Gemini API and returns 429 rate-limit errors under load. Default switches to `gemini-flash-latest`.
- Users on a pinned `GEMINI_MODEL` in `~/.agentmemory/.env` are unaffected; defaults only.

Embedding default
- `text-embedding-004` is deprecated (shutdown Jan 14 2026). Default switches to `gemini-embedding-001` (GA): 100+ languages, MRL dims (768 / 1536 / 3072), 2048-token input.

Three implementation details that go with the model swap:
- `:batchEmbedContent` → `:batchEmbedContents` (plural; the new model's batch endpoint).
- `outputDimensionality: 768` is sent on every request so returned vectors match `GeminiEmbeddingProvider.dimensions = 768` and the index-restore dim guard from PR #248 ("fix(embedding): guard provider responses against dimension mismatches"); no reindex needed for existing users.
- Each returned vector is L2-normalized. Unlike `text-embedding-004`, `gemini-embedding-001` does not normalize by default; without this the cosine-similarity math elsewhere in the search pipeline (which assumes unit-length vectors) silently collapses recall.

Closes
- Closes #368 (@yut304's bump)
- Closes #246 (@AmmarSaleh50's bump)

Test plan
- `npm test` passes: 903 / 903.
- `npm run build` clean.
- With `GEMINI_API_KEY=...` set, `npx agentmemory doctor` reports provider = llm, model = `gemini-flash-latest`.