
chore(providers): bump Gemini defaults to current GA models#370

Merged
rohitg00 merged 2 commits into main from chore/gemini-defaults-bump on May 14, 2026

Conversation

@rohitg00
Owner

@rohitg00 rohitg00 commented May 14, 2026

Summary

Bundles two upstream PRs into one chore — both block real users today and both are default-string bumps with zero API-contract change.

| Source PR | Author | Surface |
| --- | --- | --- |
| #368 | @yut304 | Gemini LLM model default |
| #246 | @AmmarSaleh50 | Gemini embedding model + L2 norm + dim plumbing |

LLM default

gemini-2.0-flash is deprecated in Google's Gemini API and returns 429 rate-limit errors under load. Default switches to gemini-flash-latest.

Users on a pinned GEMINI_MODEL in ~/.agentmemory/.env are unaffected — defaults only.
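
For illustration, the fallback is just the default string behind the env override; a minimal sketch of the pattern, assuming a hypothetical helper name (the real `detectProvider()` in `src/config.ts` has more surface):

```ts
// Minimal sketch of the default-model fallback described above; not the exact
// agentmemory source. A pinned GEMINI_MODEL always wins; only the fallback
// string changes in this PR (initially gemini-flash-latest, later pinned to
// gemini-2.5-flash in a follow-up commit on this PR).
function resolveGeminiModel(env: NodeJS.ProcessEnv = process.env): string {
  return env["GEMINI_MODEL"] || "gemini-flash-latest";
}
```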

Embedding default

text-embedding-004 is deprecated (shutdown Jan 14 2026). Default switches to gemini-embedding-001 (GA): 100+ languages, MRL dims (768 / 1536 / 3072), 2048-token input.

Three implementation details that go with the model swap:

  1. URL path: `:batchEmbedContent` → `:batchEmbedContents` (plural; the new model's batch endpoint).
  2. `outputDimensionality: 768` is sent on every request so returned vectors match `GeminiEmbeddingProvider.dimensions = 768` and the index-restore dim guard from #248 (fix(embedding): guard provider responses against dimension mismatches), so no reindex is needed for existing users.
  3. L2-normalize the returned vectors before pushing them onto the result array. Unlike text-embedding-004, gemini-embedding-001 does not normalize by default; without this, the cosine-similarity math elsewhere in the search pipeline (which assumes unit-length vectors) silently collapses recall. A condensed sketch of all three details follows below.
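
A condensed sketch of how the three details combine in the provider's batch call. `MODEL`, `API_BASE`, `embedBatch`, and the response typing are illustrative assumptions; the actual `GeminiEmbeddingProvider` in src/providers/embedding/gemini.ts is shaped differently:

```ts
// Illustrative sketch only, not the agentmemory implementation.
const MODEL = "models/gemini-embedding-001";
// Plural batch endpoint for the new model (detail 1).
const API_BASE = `https://generativelanguage.googleapis.com/v1beta/${MODEL}:batchEmbedContents`;

async function embedBatch(texts: string[], apiKey: string, dims = 768): Promise<Float32Array[]> {
  const res = await fetch(`${API_BASE}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      requests: texts.map((t) => ({
        model: MODEL,
        content: { parts: [{ text: t }] },
        outputDimensionality: dims, // detail 2: keep vectors at 768 so existing indexes still match
      })),
    }),
  });
  const { embeddings } = (await res.json()) as { embeddings: { values: number[] }[] };
  return embeddings.map(({ values }) => {
    const vec = Float32Array.from(values);
    // Detail 3: gemini-embedding-001 vectors are not unit-length by default,
    // so L2-normalize before they reach the cosine-similarity search path.
    let sum = 0;
    for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
    const norm = Math.sqrt(sum);
    if (norm > 0) for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
    return vec;
  });
}
```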

Closes

Closes #368 (@yut304's bump)
Closes #246 (@AmmarSaleh50's bump)

Test plan

  • npm test passes — 903 / 903.
  • npm run build clean.
  • Live smoke: GEMINI_API_KEY=... set, npx agentmemory doctor reports provider = llm, model = gemini-flash-latest.
  • Live smoke: BM25 + vector smart-search round-trip returns expected hits (vector cosine math doesn't collapse).
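
As a side note on the last bullet: with unit-length vectors, cosine similarity reduces to a plain dot product, which is what the "doesn't collapse" check is really exercising. A toy illustration (not agentmemory's actual scoring code):

```ts
// Toy illustration of why unnormalized embeddings skew a dot-product scorer.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * (b[i] ?? 0), 0);
}
function cosine(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}
const query = [0.6, 0.8];  // unit length
const rawDoc = [3, 4];     // same direction, but norm 5 (unnormalized)
console.log(cosine(query, rawDoc)); // 1.0 (true similarity)
console.log(dot(query, rawDoc));    // 5.0 (inflated score when vectors are not L2-normalized)
```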

Summary by CodeRabbit

  • Documentation

    • Updated Gemini embedding provider docs with the latest model specs, language support, dimensional options, and deprecation note for the prior model.
  • New Features

    • Provider now uses the latest Gemini embedding model and offers expanded language & dimension configuration.
    • Embedding outputs are now L2-normalized.
    • Default model fallback updated to the newer Gemini release.

Review Change Stack

Bundles two upstream PRs into one chore — both are blocking real users
today and both are simple default-string bumps with no API contract
change.

LLM default (was PR #368, @yut304)
- `gemini-2.0-flash` is deprecated in Google's Gemini API and returns
  429 rate-limit errors under load. Replace the default with
  `gemini-flash-latest`. Users on a pinned `GEMINI_MODEL` in
  `~/.agentmemory/.env` are unaffected.

Embedding default (was PR #246, @AmmarSaleh50)
- `text-embedding-004` is deprecated (shutdown Jan 14 2026). Replace
  with `gemini-embedding-001` (GA): 100+ languages, MRL dims
  (768 / 1536 / 3072), 2048-token input.
- URL path changes from `:batchEmbedContent` to `:batchEmbedContents`
  (plural — the new model's batch endpoint).
- Each request now sends `outputDimensionality: 768` so the returned
  vectors match the existing index dim guard from #248 — no
  reindex needed.
- L2-normalize each returned vector before pushing to the result
  array. `gemini-embedding-001` does not normalize by default,
  unlike `text-embedding-004`. Without this the cosine-similarity
  math elsewhere in the search pipeline (which assumes unit-length
  vectors) collapses.

Verified
- `npm test` clean: 903 / 903.
- `npm run build` clean.

Closes #368, closes #246.
@vercel

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentmemory | Ready | Preview, Comment | May 14, 2026 0:31am |


@coderabbitai

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 371d90a3-2557-4a06-915e-8543bb78a88b

📥 Commits

Reviewing files that changed from the base of the PR and between 6420ffe and 255105b.

📒 Files selected for processing (2)
  • src/config.ts
  • src/providers/embedding/gemini.ts
✅ Files skipped from review due to trivial changes (1)
  • src/config.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/providers/embedding/gemini.ts

📝 Walkthrough

Switch Gemini embedding to gemini-embedding-001 (API endpoint change), pass outputDimensionality, L2-normalize returned vectors (with zero-norm warning), update default Gemini model fallback to gemini-2.5-flash, and update the README embedding providers table.

Changes

Gemini Provider Migration and Configuration

| Layer / File(s) | Summary |
| --- | --- |
| Model constant and API base (src/providers/embedding/gemini.ts) | Adds a MODEL constant and rebuilds API_BASE to target models/gemini-embedding-001 instead of the old text-embedding endpoint. |
| Batch request payload update (src/providers/embedding/gemini.ts) | Batch embed requests now include model: MODEL and outputDimensionality: this.dimensions in the payload. |
| L2 normalization of returned embeddings (src/providers/embedding/gemini.ts) | Returned embedding vectors are L2-normalized in place; zero-norm vectors are left unchanged and cause a one-time warning written to process.stderr. |
| Default Gemini model fallback (src/config.ts) | detectProvider() now falls back to gemini-2.5-flash when GEMINI_MODEL is not set (replaces gemini-2.0-flash). |
| Documentation update (README.md) | Embedding providers table updated to reference gemini-embedding-001 and replace the previous text-embedding-004 entry with deprecation/shutdown info. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • rohitg00/agentmemory#248: Adds dimension-guarding/validation for provider-returned embedding lengths, which relates to the outputDimensionality and normalization changes in this PR.

Poem

A bunny hops where vectors bloom,
Models swapped to chase the gloom.
I normalize each tiny thread,
Zero norms get a quiet dread.
🐰🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main objective of the PR: updating Gemini LLM and embedding provider defaults to current GA models (gemini-2.5-flash and gemini-embedding-001). |
| Linked Issues check | ✅ Passed | All code changes directly address issue #368 (LLM default update to gemini-2.5-flash) and #246 (embedding migration to gemini-embedding-001 with L2 normalization and outputDimensionality = 768). |
| Out of Scope Changes check | ✅ Passed | All changes are tightly scoped to the two linked issues: LLM default bump, embedding provider migration, L2 normalization, and README documentation updates. No unrelated modifications present. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/providers/embedding/gemini.ts (1)

58-65: ⚡ Quick win

Consider logging or throwing on zero-norm vectors.

The function silently returns the unnormalized vector when norm === 0 (line 62). A zero-norm embedding from the API would indicate a problem upstream, but this implementation swallows it. Consider logging a warning or throwing an error to surface the issue rather than injecting an unnormalized (zero) vector into results that are expected to be unit-length for cosine similarity.

🔍 Proposed enhancement
 function l2Normalize(vec: Float32Array): Float32Array {
   let sum = 0;
   for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
   const norm = Math.sqrt(sum);
-  if (norm === 0) return vec;
+  if (norm === 0) {
+    throw new Error("Cannot normalize zero-length embedding vector");
+  }
   for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
   return vec;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/gemini.ts` around lines 58 - 65, The l2Normalize
function currently returns the original array when norm === 0, silently allowing
zero-length embeddings; update l2Normalize to surface this upstream error by
either throwing a descriptive Error (e.g., "zero-norm embedding returned from
upstream") or logging a warning with context before failing, and ensure callers
can handle the exception; refer to the function name l2Normalize and modify its
norm === 0 branch to throw or log (and return a safe fallback only if explicitly
wanted), including details such as the embedding length or source to aid
debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/config.ts`:
- Line 79: The default model string used for the config key "model" (fallback
when env["GEMINI_MODEL"] is unset) should not use the auto-updating alias
"gemini-flash-latest"; change the fallback to a stable pinned identifier such as
"gemini-2.5-flash" so production behavior is deterministic, i.e., update the
expression that sets model (the `model: env["GEMINI_MODEL"] ||
"gemini-flash-latest",` assignment) to use a stable model name as the default.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69176075-be9c-4340-a799-ed5ea3ec84c2

📥 Commits

Reviewing files that changed from the base of the PR and between a9c3a59 and 6420ffe.

📒 Files selected for processing (3)
  • README.md
  • src/config.ts
  • src/providers/embedding/gemini.ts

Comment thread: src/config.ts (Outdated)

Comment thread: src/providers/embedding/gemini.ts
Comment on lines 32 to 36
   requests: chunk.map((t) => ({
-    model: "models/text-embedding-004",
+    model: MODEL,
     content: { parts: [{ text: t }] },
+    outputDimensionality: this.dimensions,
   })),


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Gemini gemini-embedding-001 outputDimensionality parameter reduce dimensions

💡 Result:

For the Gemini embedding model gemini-embedding-001, the parameter used to reduce the embedding dimensionality is output_dimensionality / outputDimensionality.

1) What it does
   - By default, gemini-embedding-001 returns 3072-dimensional embeddings. [1]
   - You can pass output_dimensionality (or outputDimensionality) to control the size of the output embedding vector; smaller values truncate the embedding to the requested size (to save storage and improve downstream efficiency). [1][2]

2) Parameter name(s) by API surface
   - Gemini API (ai.google.dev docs): use output_dimensionality in the embed content request. [3]
   - Vertex AI Generative AI API (cloud.google.com docs): use output_dimensionality in EmbedContentConfig (shown as output_dimensionality=... in examples). [1]
   - The Vertex AI model reference also describes outputDimensionality as an optional int used to specify output embedding size; the vector is truncated if set. [2]

3) Practical example (what to change)
   - Request without dimensionality: defaults to 3072 dims. [1]
   - Request with dimensionality: set output_dimensionality/outputDimensionality to e.g. 768, 1536, or 3072 (Google recommends 768/1536/3072 for quality/storage trade-offs). [3][4]

4) Common gotcha
   - Ensure your client/library actually sends the parameter; otherwise you'll get the default 3072-dimension vectors, which can break downstream vector databases expecting the reduced dimension. [5]

So, to "reduce dimensions" for gemini-embedding-001, set output_dimensionality (Gemini API) or outputDimensionality/output_dimensionality (Vertex AI) in the embedContent request config, e.g. 768 instead of the default 3072. [1][2][3]

🏁 Script executed:

cat -n src/providers/embedding/gemini.ts | head -50

Repository: rohitg00/agentmemory

Length of output: 2110


Change outputDimensionality to output_dimensionality (snake_case).

The code targets the Gemini API (ai.google.dev), which expects the parameter as output_dimensionality in snake_case, not outputDimensionality in camelCase (which is the Vertex AI convention). Using the wrong parameter name will cause the API to ignore it and return default 3072-dimensional vectors instead of the intended 768 dimensions, breaking compatibility with existing indexes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/gemini.ts` around lines 32 - 36, The requests payload
is using camelCase key outputDimensionality which Gemini expects as snake_case
output_dimensionality; update the chunk.map(...) object so the property is
output_dimensionality: this.dimensions (instead of outputDimensionality)
wherever you build requests for MODEL in the embedding/gemini provider, ensuring
any other occurrences of outputDimensionality are renamed to
output_dimensionality so the API receives the intended 768-dimension vector
setting.

…norm

Addresses CodeRabbit findings on PR #370.

1. Pin Gemini LLM default to gemini-2.5-flash.

   `gemini-flash-latest` is a moving alias that points to whatever
   Google promotes next. Production behaviour should be deterministic
   from a release perspective — users who upgrade agentmemory should
   not also get a Gemini model rotation in the same step. Switch the
   default to the current stable GA model `gemini-2.5-flash`.
   Users who want the moving alias keep getting it via
   `GEMINI_MODEL=gemini-flash-latest` in `~/.agentmemory/.env`.

2. Warn-once on zero-norm embedding in l2Normalize.

   `gemini-embedding-001` can return a zero-norm vector for
   degenerate input. The previous code silently returned the zero
   vector — downstream cosine-similarity math then divides by zero
   and the call site sees `NaN` scores with no signal as to why.

   Emit a one-time stderr warning naming the model + vector length
   so operators can correlate index quality dips with upstream
   embedding regressions. Behaviour otherwise unchanged: return the
   zero vector and let BM25 carry the search signal.

   Throwing was the other option — rejected because a single bad
   embedding in a 100-item batch would abort the whole batch and
   surface as an indexing pipeline halt. Soft-fail + warn matches
   the rest of the embedding provider error handling.
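
Roughly what that warn-once behaviour looks like, based on the l2Normalize shown in the review diff above; the flag name and message wording here are assumptions rather than the shipped source:

```ts
// Sketch of the warn-once zero-norm handling described in point 2 above.
let warnedZeroNorm = false;

function l2Normalize(vec: Float32Array): Float32Array {
  let sum = 0;
  for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
  const norm = Math.sqrt(sum);
  if (norm === 0) {
    if (!warnedZeroNorm) {
      warnedZeroNorm = true;
      process.stderr.write(
        `gemini-embedding-001 returned a zero-norm vector (length ${vec.length}); ` +
          "leaving it unnormalized so BM25 can carry the search signal\n"
      );
    }
    return vec; // soft-fail: one bad embedding must not abort the whole batch
  }
  for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
  return vec;
}
```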

Skipped finding:

- `outputDimensionality` → `output_dimensionality` snake_case rename.
  CodeRabbit asserts the REST API expects snake_case. The Gemini
  REST API actually uses camelCase on the wire — confirmed against
  ai.google.dev/api/embeddings (field labelled
  `outputDimensionality` in the REST schema; the Python SDK alone
  uses snake_case and translates internally). Current code is
  correct as-shipped; the snake_case rename would silently break
  the dim override.

Verified: 903 / 903 tests pass; build clean.
@rohitg00 rohitg00 merged commit 4b354b7 into main May 14, 2026
5 checks passed
@rohitg00 rohitg00 deleted the chore/gemini-defaults-bump branch May 14, 2026 12:35