feat: improve offline inference interface and fix several tp and vlm bugs.#968
weizhehuang0827 wants to merge 2 commits into jd-opensource:main
Code Review
This pull request significantly improves the offline inference interface by aligning it with vLLM-style inputs, enhancing usability. It introduces a unified LLM class for backend routing, new parameter classes (SamplingParams, BeamSearchParams, PoolingParams), and the mm_utils module for multimodal data, making the Python API cleaner and more intuitive. However, a high-severity Path Traversal / Local File Inclusion (LFI) vulnerability was identified in the newly added load_from_local function in xllm/core/framework/request/mm_handler.cpp. This function allows reading arbitrary files from the server's filesystem based on user-controlled input in multi-modal requests, and strict path validation or restricting file access to an allow-list of directories is recommended to mitigate this. Furthermore, while several bug fixes and robustness improvements are included, a potential race condition in vlm_master.cpp also needs to be addressed.
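One way to apply the allow-list mitigation suggested above is to resolve the requested path first and accept it only if it lands strictly inside a configured directory. The sketch below is written in Python for brevity; the real load_from_local lives in C++, and both the function name resolve_in_allowed_dirs and the allowed_dirs parameter are hypothetical, not part of this PR.

```python
from pathlib import Path


def resolve_in_allowed_dirs(user_path, allowed_dirs):
    """Return the resolved path if it lies inside one of allowed_dirs.

    Hypothetical helper: the name and signature are illustrative only.
    Raises PermissionError for anything outside the allow-list.
    """
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check runs against the real on-disk location.
    candidate = Path(user_path).resolve()
    for base in allowed_dirs:
        base_resolved = Path(base).resolve()
        # Accept only paths strictly below an allowed directory.
        if base_resolved in candidate.parents:
            return candidate
    raise PermissionError(
        f"path is outside the allowed directories: {user_path!r}"
    )
```

Checking the resolved path rather than the raw string matters here: a plain prefix comparison would accept inputs such as an allowed directory followed by "../" segments.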
54abb4c to 961e21e
"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
"<|im_start|>user\n"
"<|vision_start|><|image_pad|><|vision_end|>"
"请描述这张图片。<|im_end|>\n"
(The Chinese text in the last string means "Please describe this image.")
bool xllm::SpawnWorkerServer::g_running_ = true;

namespace {
std::string backend_from_worker_type(const std::string& worker_type) {
Rename this function to get_backend_from_worker_type.
        for key, value in kwargs.items():
            self._set_field(key, value)

    def _set_field(self, key: str, value):
    def __getattr__(self, key: str):
        return getattr(self._request_params, key)

    def __setattr__(self, key: str, value):
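The hunks above show only the signatures of _set_field and __setattr__, so the following is a minimal sketch of how such an attribute-delegating wrapper could behave, not the PR's actual implementation. RequestParamsStub stands in for the native params object, and ParamsWrapper, the field names, and the unknown-key check are all assumptions made for illustration.

```python
class RequestParamsStub:
    """Hypothetical stand-in for the wrapped native params object."""

    def __init__(self):
        self.temperature = 1.0
        self.max_tokens = 16


class ParamsWrapper:
    """Illustrative wrapper that forwards attribute access to _request_params."""

    def __init__(self, **kwargs):
        # Bypass our own __setattr__, which would otherwise try to
        # forward "_request_params" to an object that does not exist yet.
        object.__setattr__(self, "_request_params", RequestParamsStub())
        for key, value in kwargs.items():
            self._set_field(key, value)

    def _set_field(self, key: str, value):
        # Assumed behavior: reject names the wrapped object does not define.
        if not hasattr(self._request_params, key):
            raise AttributeError(f"unknown parameter: {key}")
        setattr(self._request_params, key, value)

    def __getattr__(self, key: str):
        # __getattr__ runs only when normal lookup fails. Guard the
        # delegate's own name to avoid infinite recursion if it is missing.
        if key == "_request_params":
            raise AttributeError(key)
        return getattr(self._request_params, key)

    def __setattr__(self, key: str, value):
        self._set_field(key, value)


params = ParamsWrapper(temperature=0.5)
params.max_tokens = 32
```

The guard in __getattr__ is worth considering in any design like this: without it, an access to _request_params before it is set (for example during copying or unpickling) recurses until the interpreter raises RecursionError.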
Summary
Align offline inference with vLLM-style inputs by unifying multimodal request parsing and enabling LLM to auto-route to VLM based on model metadata, while keeping the existing VLM class.
Key Changes
- [...] (mm_utils) for vLLM-style request dicts and image normalization.
- LLM now infers backend from model config and routes to VLMMaster when appropriate.
- VLM.generate uses shared parsing and lazily imports multimodal helpers to avoid hard PIL dependency at import time.
- [...] get_model_backend to Python via pybind and tightened backend detection to fail fast if unavailable.
- [...] mm_utils module.
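To illustrate the shared parsing and lazy import described above, here is a rough sketch that assumes requests follow vLLM's convention of a dict with "prompt" and "multi_modal_data" keys. The function names parse_request and load_image are hypothetical; the actual helpers in mm_utils are not shown in this PR page and may differ.

```python
import io
from typing import Any, Dict, List, Tuple, Union

PromptInput = Union[str, Dict[str, Any]]


def parse_request(request: PromptInput) -> Tuple[str, List[Any]]:
    """Split a vLLM-style request into (prompt text, list of raw images).

    Hypothetical helper: accepts a plain string or a dict shaped like
    {"prompt": ..., "multi_modal_data": {"image": ...}}.
    """
    if isinstance(request, str):
        return request, []
    if not isinstance(request, dict) or "prompt" not in request:
        raise ValueError("request must be a str or a dict with a 'prompt' key")
    mm_data = request.get("multi_modal_data") or {}
    images = mm_data.get("image", [])
    # Accept a single image as well as a list of images.
    if not isinstance(images, (list, tuple)):
        images = [images]
    return request["prompt"], list(images)


def load_image(source: Any):
    """Normalize an image input to an RGB PIL image.

    PIL is imported inside the function, so importing this module
    does not require Pillow for text-only use.
    """
    from PIL import Image  # deferred import

    if isinstance(source, Image.Image):
        return source.convert("RGB")
    if isinstance(source, (bytes, bytearray)):
        return Image.open(io.BytesIO(source)).convert("RGB")
    raise TypeError(f"unsupported image input: {type(source).__name__}")
```

A text-only call such as parse_request("Hello") never reaches load_image, so it works on installations without Pillow.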