
feat: improve offline inference interface and fix several tp and vlm bugs. #968

Open
weizhehuang0827 wants to merge 2 commits into jd-opensource:main from weizhehuang0827:offline_rebase_main

Conversation

@weizhehuang0827
Collaborator

Summary

Align offline inference with vLLM-style inputs by unifying multimodal request parsing and enabling LLM to auto-route to VLM based on model metadata, while keeping the existing VLM class.

Key Changes

  • Introduced shared multimodal parsing utilities (mm_utils) for vLLM-style request dicts and image normalization.
  • LLM now infers backend from model config and routes to VLMMaster when appropriate.
  • VLM.generate uses shared parsing and lazily imports multimodal helpers to avoid hard PIL dependency at import time.
  • Exposed get_model_backend to Python via pybind and tightened backend detection to fail fast if unavailable.
  • Packaging updated to include the new mm_utils module.
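
The auto-routing described in the key changes can be sketched as follows. `get_model_backend` and `VLMMaster` are names taken from this PR, but the bodies below are illustrative stand-ins, not the real xllm implementation:

```python
# Hedged sketch of LLM -> VLM auto-routing. Only the names
# `get_model_backend` and `VLMMaster` come from the PR; the
# detection heuristic and the LLM wrapper here are hypothetical.

def get_model_backend(model_path: str) -> str:
    # In the PR this is a C++ function exposed via pybind that
    # inspects model metadata; faked here for illustration.
    return "vlm" if "vl" in model_path.lower() else "llm"

class LLM:
    def __init__(self, model: str):
        backend = get_model_backend(model)
        if backend == "vlm":
            self._impl = "VLMMaster"   # would construct a VLMMaster here
        elif backend == "llm":
            self._impl = "LLMMaster"   # plain text-only backend
        else:
            # "fail fast if unavailable", as the PR description puts it
            raise RuntimeError(f"unknown backend: {backend!r}")

llm = LLM("Qwen2-VL-7B-Instruct")
print(llm._impl)  # VLMMaster
```

The point of the design is that callers construct a single `LLM` regardless of modality; the multimodal path is selected from model metadata rather than by the user picking a class.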


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves the offline inference interface by aligning it with vLLM-style inputs. It introduces a unified LLM class for backend routing, new parameter classes (SamplingParams, BeamSearchParams, PoolingParams), and the mm_utils module for multimodal data, making the Python API cleaner and more intuitive.

However, a high-severity Path Traversal / Local File Inclusion (LFI) vulnerability was identified in the newly added load_from_local function in xllm/core/framework/request/mm_handler.cpp: it reads arbitrary files from the server's filesystem based on user-controlled input in multimodal requests. Strict path validation, or restricting file access to an allow-list of directories, is recommended to mitigate this. In addition, while several bug fixes and robustness improvements are included, a potential race condition in vlm_master.cpp also needs to be addressed.
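
The recommended mitigation (resolve the user-supplied path first, then require it to fall under an allow-listed root) can be illustrated in Python. The actual fix would live in the C++ load_from_local, and `ALLOWED_ROOTS` here is a hypothetical configuration value, not something from the PR:

```python
import os

ALLOWED_ROOTS = ["/data/mm_inputs"]  # hypothetical allow-list of directories

def is_path_allowed(user_path: str, allowed_roots=ALLOWED_ROOTS) -> bool:
    # Resolve symlinks and ".." components BEFORE comparing; otherwise
    # "/data/mm_inputs/../../etc/passwd" would pass a naive prefix check.
    real = os.path.realpath(user_path)
    return any(
        os.path.commonpath([real, os.path.realpath(root)])
        == os.path.realpath(root)
        for root in allowed_roots
    )
```

Checking `commonpath` against the resolved root (rather than `str.startswith`) also avoids the classic false positive where `/data/mm_inputs_evil` matches the prefix `/data/mm_inputs`.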

"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
"<|im_start|>user\n"
"<|vision_start|><|image_pad|><|vision_end|>"
"请描述这张图片。<|im_end|>\n"  // i.e., "Please describe this image."
Collaborator


use English prompt.

bool xllm::SpawnWorkerServer::g_running_ = true;

namespace {
std::string backend_from_worker_type(const std::string& worker_type) {
Collaborator


rename to get_backend_from_worker_type

for key, value in kwargs.items():
self._set_field(key, value)

def _set_field(self, key: str, value):
Collaborator


add a type annotation for value

def __getattr__(self, key: str):
return getattr(self._request_params, key)

def __setattr__(self, key: str, value):
Collaborator


ditto.
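
The annotation both review comments ask for could look like the sketch below. `RequestParams` and its fields are hypothetical stand-ins, not the actual class from this PR; the point is that `typing.Any` is the honest type for a kwargs-driven setter that accepts heterogeneous field values:

```python
from typing import Any

class RequestParams:
    # Hypothetical container in the style of the PR's SamplingParams-like
    # classes, shown only to illustrate the requested annotations.
    def __init__(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            self._set_field(key, value)

    def _set_field(self, key: str, value: Any) -> None:
        # Fields carry mixed types (int, float, str, ...), so `Any`
        # is appropriate unless per-field validation is added here.
        setattr(self, key, value)

p = RequestParams(temperature=0.7, max_tokens=128)
print(p.temperature)  # 0.7
```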

yq33victor
yq33victor previously approved these changes Mar 4, 2026
Collaborator

@yq33victor yq33victor left a comment


LGTM



3 participants