Plugin: Batch processing using LLMs in text-summary and image-summary plugin #92

@prasannals

Description

We currently use Ollama (https://ollama.com/) to prompt LLMs in the text-summary and image-summary plugins. Ollama is easy to use and offers a wide variety of LLMs that can be downloaded with little effort. However, Ollama does not support batch processing of inputs: it always runs one prompt at a time through the LLM. Other LLM libraries such as vLLM (https://github.com/vllm-project/vllm) do support batched inputs. The downside with vLLM is that popular local models such as Llama 3.1, Gemma 3, etc. are "gated" on Huggingface and require authentication (using huggingface-cli) to download.
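The difference between the two invocation patterns can be sketched as below. The function names here are illustrative, not the plugins' actual API; the stub backend stands in for real calls (Ollama exposes one-prompt-per-call generation, while vLLM's `generate` accepts a list of prompts).

```python
from typing import Callable, List

def summarize_sequential(prompts: List[str],
                         generate_one: Callable[[str], str]) -> List[str]:
    # Ollama-style: one prompt per call, so every request pays the
    # full per-call overhead and the GPU processes prompts serially.
    return [generate_one(p) for p in prompts]

def summarize_batched(prompts: List[str],
                      generate_batch: Callable[[List[str]], List[str]],
                      batch_size: int = 8) -> List[str]:
    # vLLM-style: hand the backend a whole chunk of prompts at once
    # so it can schedule them together on the GPU.
    outputs: List[str] = []
    for i in range(0, len(prompts), batch_size):
        outputs.extend(generate_batch(prompts[i:i + batch_size]))
    return outputs

# Stub backend for demonstration only; a real integration would call
# ollama.generate(...) or vllm.LLM.generate(...) here.
def stub_generate_one(prompt: str) -> str:
    return f"summary of: {prompt}"

def stub_generate_batch(prompts: List[str]) -> List[str]:
    return [stub_generate_one(p) for p in prompts]
```

Both paths must return outputs in the same order as the input prompts, so either backend can be swapped in behind the same plugin interface.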

The task is to implement batch processing of prompts through an LLM (multi-modal models must be supported as well) while keeping the code easy to ship and use (we can't ask users to provide Huggingface login info at runtime, for example). We also can't ship the models within the installer (which would eliminate the need for runtime downloads from Huggingface) because that would make the installer too large.

An easier first task would be to implement batching using vLLM or another LLM library and report the performance gains achieved through batch processing versus Ollama.
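For reporting those gains, a small timing harness like the following could work; the harness itself is backend-agnostic, and the stubbed latencies in the usage below are placeholders for real Ollama/vLLM calls, not measured numbers.

```python
import time
from typing import Callable, Dict, List

def _time_run(fn: Callable[[], List[str]]) -> float:
    # Wall-clock time of one full pass over the prompts.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def benchmark(prompts: List[str],
              generate_one: Callable[[str], str],
              generate_batch: Callable[[List[str]], List[str]]) -> Dict[str, float]:
    # Compare one-prompt-at-a-time (Ollama-style) against a single
    # batched call (vLLM-style) over the same prompt set.
    sequential_s = _time_run(lambda: [generate_one(p) for p in prompts])
    batched_s = _time_run(lambda: generate_batch(prompts))
    return {
        "sequential_s": sequential_s,
        "batched_s": batched_s,
        "speedup": sequential_s / batched_s if batched_s > 0 else float("inf"),
    }
```

In a real comparison, `generate_one` would wrap the Ollama client and `generate_batch` would wrap vLLM; running both over the same prompt set keeps the comparison fair.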
