Description
We currently use Ollama (https://ollama.com/) to prompt LLMs in the text-summary and image-summary plugins. Ollama is easy to use and offers a wide variety of LLMs that can be downloaded easily. However, Ollama does not support batch processing of inputs: it always runs one prompt at a time through the LLM. Other LLM libraries such as vLLM (https://github.com/vllm-project/vllm) do support batch processing of inputs. The downside with vLLM is that popular local models such as Llama 3.1, Gemma 3, etc. are "gated" on Huggingface and require Huggingface authentication (e.g. via huggingface-cli) to download.
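For reference, here is a minimal sketch of what batched inference looks like with vLLM: `LLM.generate()` accepts a list of prompts and schedules them together (continuous batching), rather than one at a time as with Ollama. The model name and sampling settings below are placeholders, and vLLM itself requires a GPU at runtime, so the import is done lazily.

```python
from typing import List

def batch_summarize(prompts: List[str], model_name: str = "facebook/opt-125m") -> List[str]:
    """Run all prompts through the LLM in a single batched generate() call.

    vLLM batches the prompts internally, in contrast to Ollama, which
    processes one prompt per request. model_name is a placeholder; a real
    deployment would pick the summarization model the plugins use.
    """
    # Imported lazily: vLLM needs a GPU, which may not be present everywhere.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=128)
    # One call, many prompts -- this is where the batching win comes from.
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Note that gated models would still fail to download here without Huggingface credentials, which is exactly the shipping constraint described below.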
The task is to implement batch processing of prompts through an LLM (multi-modal models must be supported as well) while still letting us ship the code in an easy-to-use manner: we can't ask users to provide Huggingface login info at runtime, for example. We also can't bundle the models inside the installer (which would eliminate the need for runtime downloads from Huggingface) because that would make the installer too large.
A simpler first step would be to implement batching using vLLM or another LLM library and report the performance gains of batch processing versus Ollama.
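For the comparison, a small timing harness keeps the measurement backend-agnostic: pass it any runner (a sequential Ollama loop, a batched vLLM call) and compare prompts-per-second. This is a sketch; the runner signatures are assumptions, not part of either library's API.

```python
import time
from typing import Callable, List

def measure_throughput(run: Callable[[List[str]], List[str]], prompts: List[str]) -> float:
    """Return prompts processed per second for the given runner.

    `run` is any function that maps a list of prompts to a list of outputs,
    e.g. a loop over Ollama requests or a single vLLM generate() call.
    """
    start = time.perf_counter()
    results = run(prompts)
    elapsed = time.perf_counter() - start
    # Sanity check: every prompt must produce exactly one output.
    assert len(results) == len(prompts)
    return len(prompts) / elapsed
```

Running the same prompt set through both runners and dividing the two throughput numbers would give the speedup figure the issue asks for.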