Add multimodal embedding support for VertexAI (Phase 1) #590

Open
Ndunge-Makau wants to merge 22 commits into crmne:main from
Ndunge-Makau:add-multimodal-embedding-support

Conversation

@Ndunge-Makau Ndunge-Makau commented Jan 30, 2026

What this does

This PR adds multimodal embedding support to RubyLLM, so images and videos can be embedded alongside text for use cases such as semantic image search, video content analysis, and cross-modal retrieval.

Changes:

  • Added Vertex AI's multimodalembedding model to the list of known models
  • Implemented multimodal embedding support for Vertex AI provider
  • Updated models.json (auto-generated from rake task)
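
For readers unfamiliar with the Vertex AI side, here is a minimal sketch of the request and response shapes such a provider has to handle. It is not the code in this PR: `build_instance` and `parse_prediction` are hypothetical helpers, and the field names (`bytesBase64Encoded`, `textEmbedding`, `imageEmbedding`, `videoEmbeddings`) are assumed from Vertex AI's public `multimodalembedding` REST documentation. The sample bytes and response values are made up.

```ruby
require "json"

# Hypothetical helper (not from the PR): builds one entry of the
# `instances` array for a multimodalembedding predict request.
def build_instance(text: nil, image_bytes: nil, video_bytes: nil)
  instance = {}
  instance[:text] = text if text
  # pack("m0") is strict Base64 without line breaks.
  instance[:image] = { bytesBase64Encoded: [image_bytes].pack("m0") } if image_bytes
  instance[:video] = { bytesBase64Encoded: [video_bytes].pack("m0") } if video_bytes
  instance
end

# Hypothetical helper (not from the PR): maps one prediction onto the
# {text:, image:, video:} hash shape shown in the PR description.
def parse_prediction(prediction)
  vectors = {}
  vectors[:text]  = prediction["textEmbedding"]   if prediction.key?("textEmbedding")
  vectors[:image] = prediction["imageEmbedding"]  if prediction.key?("imageEmbedding")
  vectors[:video] = prediction["videoEmbeddings"] if prediction.key?("videoEmbeddings")
  vectors
end

# Placeholder bytes stand in for a real image file.
instance = build_instance(text: "Ruby is elegant and expressive", image_bytes: "fake-png-bytes")
payload = { instances: [instance] }
puts JSON.pretty_generate(payload)

# Made-up response body, only to exercise the parser.
sample_response = {
  "predictions" => [
    { "textEmbedding" => [0.1, 0.2], "imageEmbedding" => [0.3, 0.4] }
  ]
}
vectors = parse_prediction(sample_response["predictions"].first)
p vectors
```

Only the modalities present in the request appear in the payload, and only those present in the response appear in the resulting hash, which would keep a text-only call free of `image` or `video` keys.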

Example usage:

# Create embeddings
RubyLLM.embed "Ruby is elegant and expressive"

# Embedding: #<RubyLLM::Embedding:0x00... >
# Embedding.vectors: {text: [0.214382708, -0.126103446, ... ]}


# Create multimodal embeddings (with supported models)
RubyLLM.embed "Ruby is elegant and expressive", with: ["image.png", "video.mp4"], model: "multimodalembedding"

# Embedding: #<RubyLLM::Embedding:0x00... >
# Embedding.vectors: {text: [-0.00527974777, ...], image: [0.0258393418, ...], video: [{"endOffsetSec" => 0, "startOffsetSec" => 16, "embedding" => [-0.0250446387, 0.0323432237, ...]}, ...]}
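
To illustrate the cross-modal retrieval use case mentioned above, here is a small sketch that compares a text vector against image and video-segment vectors using cosine similarity. It assumes the `vectors` hash has the shape shown in the example output; the three-dimensional values and segment offsets below are made up for illustration, and `cosine_similarity` is a hypothetical helper, not part of RubyLLM.

```ruby
# Hypothetical helper: cosine similarity between two equal-length vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Made-up stand-in for Embedding#vectors, mirroring the shape in the PR
# description: flat arrays for text and image, one hash per video segment.
vectors = {
  text: [1.0, 0.0, 0.0],
  image: [0.8, 0.6, 0.0],
  video: [
    { "startOffsetSec" => 0,  "endOffsetSec" => 16, "embedding" => [0.0, 1.0, 0.0] },
    { "startOffsetSec" => 16, "endOffsetSec" => 32, "embedding" => [0.6, 0.8, 0.0] }
  ]
}

# How close is the image to the text?
text_image_score = cosine_similarity(vectors[:text], vectors[:image])

# Which video segment is closest to the text?
best_segment = vectors[:video].max_by do |segment|
  cosine_similarity(vectors[:text], segment["embedding"])
end

puts "text vs image: #{text_image_score.round(3)}"
puts "best segment starts at #{best_segment['startOffsetSec']}s"
```

Because the video entry is a list of per-segment embeddings rather than a single vector, a search over video has to score each segment separately, as the `max_by` call does here.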

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

  • I opened an issue before writing code and received maintainer approval
  • Linked issue: #529

PRs for new features or enhancements without a prior approved issue will be closed.

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code (required if above is checked)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes


@kaka-ruto kaka-ruto left a comment

LGTM. Well done @Ndunge-Makau

modalities: Capabilities.modalities_for(model_id),
capabilities: Capabilities.capabilities_for(model_id),
pricing: Capabilities.pricing_for(model_id),
metadata: Capabilities.determine_metadata(model_id)

Good job here

Author

Thanks!

@crmne crmne linked an issue Feb 28, 2026 that may be closed by this pull request
6 tasks
Owner

@crmne crmne left a comment

Thanks for the contribution.

My preference is multimodal embeddings for all providers that support them, not only Vertex AI. The issue/PR scope sounded provider-wide, which is why I approved it.

If you want to ship Vertex-only, please make that explicit in both issue and PR (title + description) as "phase 1: Vertex AI only," and add follow-up issues for other providers.

Please confirm direction before I do line-by-line review:

  1. multi-provider in this PR, or
  2. explicit Vertex-only phase 1.

Also please fix:

  • potential regression in Vertex text embeddings
  • unrelated models.json churn

@Ndunge-Makau Ndunge-Makau changed the title from "Add multimodal embedding support" to "Add multimodal embedding support for VertexAI (Phase 1)" Mar 4, 2026
@Ndunge-Makau
Author

Hi @crmne,

Thanks for the feedback. I'd like to confirm that this PR is explicitly for Vertex AI only as phase 1, and I'll create follow-up issues for other providers.

I've also fixed the regression in Vertex text embeddings. As for the models.json changes, the rake task auto-generated them after I added Vertex's multimodalembedding model to the list of known models. Let me know if you'd still like the previous version restored.

@Ndunge-Makau Ndunge-Makau requested a review from crmne March 4, 2026 13:06

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add multimodal embedding support (image and video)

3 participants