Summary
This issue proposes a refactor of our existing VLLMModel implementation in model_logic.py, which currently wraps an OpenAI-compatible chat completion endpoint using raw HTTPX requests. The goal is to:
- Leverage the official OpenAI Python SDK for native support of OpenAI-hosted models.
- Retain `BaseModel` as an abstract interface and use backend-specific subclasses (`OpenAIModel`, `VLLMModel`, etc.) to support multiple providers (a brief caller-side sketch follows this list).
- Improve maintainability, compatibility with newer OpenAI features, and allow smoother integration of other providers or local models in the future.
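As a rough illustration of what the shared interface buys us, a caller could stream from any backend without caring which subclass it holds. The `answer` helper below is hypothetical and only a sketch:

```python
# Hypothetical caller-side helper: it depends only on the BaseModel interface,
# so OpenAIModel, VLLMModel, etc. are interchangeable here.
async def answer(prompt: str, model: "BaseModel") -> str:
    messages = [{"role": "user", "content": prompt}]
    chunks = []
    async for token in model.call_stream(messages):
        chunks.append(token)  # collect streamed tokens as they arrive
    return "".join(chunks)
```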
Motivation
- Reduced Maintenance Burden: By using the official SDK, we avoid having to manually replicate request formatting, error handling, and streaming logic.
- Better Compatibility: New features (e.g., assistants, tool use, parallel function calls) are only available through the official SDK.
- Modular Architecture: Separating provider logic into individual subclasses of `BaseModel` enables easier extension and testing of alternatives (e.g., local VLLM, Ollama, LM Studio).
- Standardized Behavior: Less likelihood of subtle bugs from inconsistent behavior between our wrapper and the official API.
Proposed Structure
```python
# model_logic.py
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Dict, List


class BaseModel(ABC):
    @abstractmethod
    async def call_stream(self, messages: List[Dict[str, str]], **kwargs) -> AsyncGenerator[str, None]:
        ...

    @abstractmethod
    async def close_client(self): ...


class OpenAIModel(BaseModel):
    def __init__(self, model_name: str, api_key: str, **kwargs): ...
    async def call_stream(self, messages, **kwargs): ...  # use openai.AsyncOpenAI or openai.ChatCompletion.acreate(...)
    async def close_client(self): ...


class VLLMModel(BaseModel):
    # existing behavior preserved for local or remote deployments,
    # refactored to be more modular if needed
    ...
```
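For concreteness, here is one possible shape for the OpenAI-backed subclass on top of the SDK's async streaming client. This is only a sketch under the assumption that the constructor takes just `model_name` and `api_key`; error handling and retries are omitted:

```python
# Sketch of the OpenAI-backed subclass (assumes BaseModel from above)
from typing import AsyncGenerator, Dict, List

import openai


class OpenAIModel(BaseModel):
    def __init__(self, model_name: str, api_key: str):
        self.model_name = model_name
        # AsyncOpenAI manages its own HTTP client internally
        self.client = openai.AsyncOpenAI(api_key=api_key)

    async def call_stream(self, messages: List[Dict[str, str]], **kwargs) -> AsyncGenerator[str, None]:
        # stream=True returns an async iterator of chat completion chunks
        stream = await self.client.chat.completions.create(
            model=self.model_name,
            messages=messages,
            stream=True,
            **kwargs,
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content if chunk.choices else None
            if delta:
                yield delta

    async def close_client(self):
        await self.client.close()
```

Because vLLM exposes an OpenAI-compatible server, `VLLMModel` could optionally reuse the same client by passing `base_url` to `AsyncOpenAI`, or keep its current HTTPX implementation unchanged.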