
feat: improve offline inference interface and fix several tp and vlm bugs. #968

Open
weizhehuang0827 wants to merge 2 commits into jd-opensource:main from weizhehuang0827:offline_rebase_main

Conversation

@weizhehuang0827
Collaborator

Summary

Align offline inference with vLLM-style inputs by unifying multimodal request parsing and enabling LLM to auto-route to VLM based on model metadata, while keeping the existing VLM class.

Key Changes

  • Introduced shared multimodal parsing utilities (mm_utils) for vLLM-style request dicts and image normalization.
  • LLM now infers backend from model config and routes to VLMMaster when appropriate.
  • VLM.generate uses shared parsing and lazily imports multimodal helpers to avoid hard PIL dependency at import time.
  • Exposed get_model_backend to Python via pybind and tightened backend detection to fail fast if unavailable.
  • Packaging updated to include the new mm_utils module.
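
The auto-routing described in the key changes can be sketched as follows. `get_model_backend` and `VLMMaster` are names taken from this PR, but the bodies below are illustrative stand-ins, not the real xllm implementation:

```python
# Hedged sketch of LLM -> VLM auto-routing. Only the names
# `get_model_backend` and `VLMMaster` come from the PR; the
# detection heuristic and the LLM wrapper here are hypothetical.

def get_model_backend(model_path: str) -> str:
    # In the PR this is a C++ function exposed via pybind that
    # inspects model metadata; faked here for illustration.
    return "vlm" if "vl" in model_path.lower() else "llm"

class LLM:
    def __init__(self, model: str):
        backend = get_model_backend(model)
        if backend == "vlm":
            self._impl = "VLMMaster"   # would construct a VLMMaster here
        elif backend == "llm":
            self._impl = "LLMMaster"   # plain text-only backend
        else:
            # "fail fast if unavailable", as the PR description puts it
            raise RuntimeError(f"unknown backend: {backend!r}")

llm = LLM("Qwen2-VL-7B-Instruct")
print(llm._impl)  # VLMMaster
```

The point of the design is that callers construct a single `LLM` regardless of modality; the multimodal path is selected from model metadata rather than by the user picking a class.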


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves the offline inference interface by aligning it with vLLM-style inputs. It introduces a unified LLM class for backend routing, new parameter classes (SamplingParams, BeamSearchParams, PoolingParams), and the mm_utils module for multimodal data, making the Python API cleaner and more intuitive.

However, a high-severity Path Traversal / Local File Inclusion (LFI) vulnerability was identified in the newly added load_from_local function in xllm/core/framework/request/mm_handler.cpp: it reads arbitrary files from the server's filesystem based on user-controlled input in multimodal requests. Strict path validation, or restricting file access to an allow-list of directories, is recommended to mitigate this. In addition, while several bug fixes and robustness improvements are included, a potential race condition in vlm_master.cpp also needs to be addressed.
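
The recommended mitigation (resolve the user-supplied path first, then require it to fall under an allow-listed root) can be illustrated in Python. The actual fix would live in the C++ load_from_local, and `ALLOWED_ROOTS` here is a hypothetical configuration value, not something from the PR:

```python
import os

ALLOWED_ROOTS = ["/data/mm_inputs"]  # hypothetical allow-list of directories

def is_path_allowed(user_path: str, allowed_roots=ALLOWED_ROOTS) -> bool:
    # Resolve symlinks and ".." components BEFORE comparing; otherwise
    # "/data/mm_inputs/../../etc/passwd" would pass a naive prefix check.
    real = os.path.realpath(user_path)
    return any(
        os.path.commonpath([real, os.path.realpath(root)])
        == os.path.realpath(root)
        for root in allowed_roots
    )
```

Checking `commonpath` against the resolved root (rather than `str.startswith`) also avoids the classic false positive where `/data/mm_inputs_evil` matches the prefix `/data/mm_inputs`.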

"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
"<|im_start|>user\n"
"<|vision_start|><|image_pad|><|vision_end|>"
"请描述这张图片。<|im_end|>\n"  // i.e., "Please describe this image."
Collaborator


use English prompt.

bool xllm::SpawnWorkerServer::g_running_ = true;

namespace {
std::string backend_from_worker_type(const std::string& worker_type) {
Collaborator


rename to get_backend_from_worker_type

for key, value in kwargs.items():
self._set_field(key, value)

def _set_field(self, key: str, value):
Collaborator


add a type annotation for value

def __getattr__(self, key: str):
return getattr(self._request_params, key)

def __setattr__(self, key: str, value):
Collaborator


ditto.
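
The annotation both review comments ask for could look like the sketch below. `RequestParams` and its fields are hypothetical stand-ins, not the actual class from this PR; the point is that `typing.Any` is the honest type for a kwargs-driven setter that accepts heterogeneous field values:

```python
from typing import Any

class RequestParams:
    # Hypothetical container in the style of the PR's SamplingParams-like
    # classes, shown only to illustrate the requested annotations.
    def __init__(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            self._set_field(key, value)

    def _set_field(self, key: str, value: Any) -> None:
        # Fields carry mixed types (int, float, str, ...), so `Any`
        # is appropriate unless per-field validation is added here.
        setattr(self, key, value)

p = RequestParams(temperature=0.7, max_tokens=128)
print(p.temperature)  # 0.7
```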

yq33victor
yq33victor previously approved these changes Mar 4, 2026
Collaborator

@yq33victor yq33victor left a comment


LGTM



3 participants