
Fix multi-sequence embeddings #2058

Closed

iamlemec wants to merge 1 commit into abetlen:main from iamlemec:fix-batch-embed

@iamlemec

Contributor

Fixes multi-sequence (batch) embeddings by handling n_seq_max and kv_unified flags. See discussion in #2051.

@LimePencil

@abetlen any updates yet?

@freckletonj

Confirming this is still an issue.

@mlisovyi

mlisovyi commented May 5, 2026

Shouldn't n_seq_max also be used in Llama.embed()? One should add `or p_batch == n_seq_max` to the batch-evaluation condition in the loop here. Otherwise there is a danger of collecting a batch that contains more sequences than the configured maximum (if individual inputs are short or the configured n_seq_max is small), which will also lead to the same `llama_decode returned -1` error.
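The suggested guard can be illustrated with a standalone sketch. `plan_batches` below is a hypothetical helper, not the actual `Llama.embed()` code: it only groups inputs by token count, flushing the current batch when the next input would overflow the token budget `n_batch` or when the batch already holds `n_seq_max` sequences (the extra `p_batch == n_seq_max` check proposed above).

```python
from typing import List


def plan_batches(token_lengths: List[int], n_batch: int, n_seq_max: int) -> List[List[int]]:
    """Group input indices into decode batches (hypothetical helper).

    A batch is flushed before adding the next input if that input would push
    the batch past n_batch tokens, or if the batch already holds n_seq_max
    sequences.
    """
    if n_batch <= 0 or n_seq_max <= 0:
        raise ValueError("n_batch and n_seq_max must be positive")

    batches: List[List[int]] = []
    current: List[int] = []
    t_batch = 0  # tokens in the current batch
    p_batch = 0  # sequences in the current batch

    for i, n_tokens in enumerate(token_lengths):
        if n_tokens > n_batch:
            raise ValueError(f"input {i} has {n_tokens} tokens, more than n_batch={n_batch}")
        # Token-budget check plus the sequence-count check suggested above.
        if t_batch + n_tokens > n_batch or p_batch == n_seq_max:
            batches.append(current)
            current, t_batch, p_batch = [], 0, 0
        current.append(i)
        t_batch += n_tokens
        p_batch += 1

    if current:
        batches.append(current)
    return batches
```

For five 5-token inputs with `n_batch=512` and `n_seq_max=2`, the token check alone would put all five inputs in one batch; with the sequence check they are split into `[[0, 1], [2, 3], [4]]`.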

@mlisovyi

mlisovyi commented May 5, 2026

Also, would it make sense to expose those parameters in ModelSettings with some meaningful defaults, so they can be set when running the server?
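One way to picture this is a small settings object that carries the two flags and can be filled from server configuration. The sketch below is hypothetical: it uses a plain stdlib dataclass rather than the server's actual ModelSettings class, the environment keys `N_SEQ_MAX` and `KV_UNIFIED` are made up for illustration, and the defaults (`n_seq_max=1`, `kv_unified=False`) are assumptions meant to preserve single-sequence behaviour.

```python
from dataclasses import dataclass
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class EmbeddingBatchSettings:
    """Hypothetical stand-in for server-side model settings."""

    n_seq_max: int = 1        # assumed default: one sequence per batch
    kv_unified: bool = False  # assumed default

    def __post_init__(self) -> None:
        # Reject values that could never form a valid batch.
        if self.n_seq_max < 1:
            raise ValueError("n_seq_max must be at least 1")

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "EmbeddingBatchSettings":
        """Build settings from string values, e.g. environment variables.

        The key names N_SEQ_MAX and KV_UNIFIED are hypothetical.
        """
        n_seq_max = int(env.get("N_SEQ_MAX", "1"))
        raw = env.get("KV_UNIFIED", "false").strip().lower()
        if raw in _TRUE:
            kv_unified = True
        elif raw in _FALSE:
            kv_unified = False
        else:
            raise ValueError(f"invalid KV_UNIFIED value: {raw!r}")
        return cls(n_seq_max=n_seq_max, kv_unified=kv_unified)
```

Validating in `__post_init__` means an invalid value such as `n_seq_max=0` fails when the server starts rather than surfacing later as a decode error.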

@abetlen

abetlen commented Jun 1, 2026

Owner

@iamlemec thank you, this should have been fixed in v0.3.23

@abetlen abetlen closed this Jun 1, 2026

5 participants