Benedict-Y commented on Oct 9, 2025

PR: Fix TypeError in Online Serving + vLLM video inference parameters

Hi Kwai Team, thank you for the great work and for reviewing this PR.

Summary

  • Fix a TypeError in the Online Serving demo caused by passing an unsupported return_video_kwargs argument to process_vision_info.
  • Add recommended vLLM serve flags for stable video inference.

Changes

  • Remove return_video_kwargs=True from the demo/serving code.
  • Keep the call as:
    image_inputs, video_inputs, video_kwargs = process_vision_info(video_message)
    The function already returns video_kwargs as its third value (see the sketch after this list).
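
A minimal before/after sketch, assuming process_vision_info is importable from the package top level and that video_message follows the demo's message format (the import path, video path, and prompt text here are illustrative, not part of this diff):

    from keye_vl_utils import process_vision_info  # assumed top-level export

    # Hypothetical message in the demo's format; the video path is a placeholder.
    video_message = [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": "file:///path/to/sample.mp4"},
                {"type": "text", "text": "Describe this video."},
            ],
        }
    ]

    # Before (raises TypeError: process_vision_info() got an unexpected
    # keyword argument 'return_video_kwargs'):
    # image_inputs, video_inputs, video_kwargs = process_vision_info(
    #     video_message, return_video_kwargs=True
    # )

    # After: the function already returns video_kwargs as its third element.
    image_inputs, video_inputs, video_kwargs = process_vision_info(video_message)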

Rationale

  • process_vision_info in src/keye_vl_utils/vision_process.py does not accept return_video_kwargs; passing it raises:
    TypeError: process_vision_info() got an unexpected keyword argument 'return_video_kwargs'

vLLM Notes (to avoid freezes with long videos)

vllm serve Kwai-Keye/Keye-VL-8B-Preview \
  --tensor-parallel-size 1 \
  --enable-prefix-caching \
  --gpu-memory-utilization 0.6 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-num-batched-tokens 80960 \
  --max-model-len 80960 \
  --trust-remote-code
  • --max-num-batched-tokens 80960: raises the scheduler's per-iteration token budget so prompts whose video frames expand to long token sequences can still be scheduled
  • --max-model-len 80960: sets the maximum context length so the extended context contributed by video frames fits in the window

A sample client request against this server is sketched after this list.
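
The sketch assumes vLLM's OpenAI-compatible endpoint at http://localhost:8000/v1 and that the served model accepts video_url content parts; the video URL and prompt are illustrative placeholders:

    from openai import OpenAI

    # vLLM's OpenAI-compatible server; api_key is unused but required by the client.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="Kwai-Keye/Keye-VL-8B-Preview",
        messages=[
            {
                "role": "user",
                "content": [
                    # video_url content part; the URL is a placeholder.
                    {"type": "video_url", "video_url": {"url": "https://example.com/sample.mp4"}},
                    {"type": "text", "text": "Describe this video."},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)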

Testing

  • Re-ran the Online Serving demo with a sample video.
  • Verified:
    • No TypeError occurs.
    • Chat completion succeeds.
