auto-derive max_req_total_len from model config #1297
Owleye4 wants to merge 10 commits into ModelTC:main from
Conversation
Code Review
This pull request implements automatic derivation of max_req_total_len from model configurations, replacing the previous hardcoded default. It introduces logic to handle various RoPE scaling types and ensures consistency across processes by publishing the effective, KV-cache-clamped value via shared memory. Documentation has been updated to reflect these changes. Feedback suggests using canonical paths for cached configuration lookups, defining the safety margin for token clamping as a named constant, and considering a more dynamic cap for CUDA graph capture lengths.
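A minimal sketch of how such a derivation could look, assuming a parsed `config.json` dict; the helper name, the handled scaling types, and the fallback constant below are illustrative, not the PR's actual code:

```python
# Hypothetical sketch of the derivation described above; field names follow
# common Hugging Face config.json conventions, and the fallback constant is a
# placeholder rather than the PR's actual default.
DEFAULT_MAX_REQ_TOTAL_LEN = 16384


def derive_max_req_total_len(config: dict) -> int:
    """Derive a request length limit from a parsed model config dict."""
    # Prefer an explicit sequence-length field when present.
    base_len = config.get("max_sequence_length") or config.get("max_position_embeddings")
    if not base_len:
        return DEFAULT_MAX_REQ_TOTAL_LEN

    # Some RoPE scaling types (e.g. linear or dynamic NTK) extend the usable
    # context by a multiplicative factor.
    rope_scaling = config.get("rope_scaling") or {}
    scaling_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
    if scaling_type in ("linear", "dynamic"):
        base_len = int(base_len * rope_scaling.get("factor", 1.0))

    return int(base_len)
```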
Code Review
This pull request implements automatic derivation of the --max_req_total_len parameter from model configurations, such as max_sequence_length or max_position_embeddings adjusted by RoPE scaling factors. It introduces logic to soft-clamp this value against the actual KV-cache pool capacity and uses shared memory to synchronize the effective limit across different server processes. Additionally, the changes include early S3 model preparation and updates to documentation. Feedback was provided suggesting that a hardcoded margin of 8 used during KV pool clamping should be replaced with a named constant to improve code maintainability.
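As a sketch of the suggested change, the hardcoded margin could become a named constant; the identifiers below are illustrative rather than the PR's actual ones, and the capacity figure would come from the server's KV-pool sizing:

```python
# Illustrative names; the real capacity value comes from the KV-cache pool sizing logic.
KV_POOL_CLAMP_MARGIN = 8  # the hardcoded "8" the review suggests turning into a named constant


def clamp_to_kv_pool(derived_len: int, kv_pool_token_capacity: int) -> int:
    """Soft-clamp the derived limit so it never exceeds the KV-cache pool capacity."""
    return min(derived_len, kv_pool_token_capacity - KV_POOL_CLAMP_MARGIN)
```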
Force-pushed from 127de7b to 698e0b7
Auto-derive `max_req_total_len` from `model_dir/config.json` at API start time. If derivation fails, fall back to the previous default value to keep existing behavior.
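A rough sketch of this start-up flow, assuming the user did not set the flag explicitly; the function name and fallback constant are hypothetical, and `derive_max_req_total_len` refers to the derivation sketch earlier in this thread:

```python
import json
import os

# Placeholder; the real fallback is whatever default the server used before this PR.
FALLBACK_MAX_REQ_TOTAL_LEN = 16384


def resolve_max_req_total_len(model_dir: str) -> int:
    """Read model_dir/config.json and derive the limit, falling back on any failure."""
    try:
        with open(os.path.join(model_dir, "config.json"), "r") as f:
            config = json.load(f)
        return derive_max_req_total_len(config)  # see the derivation sketch above
    except Exception:
        # Any read/parse/derivation error keeps the previous default behavior.
        return FALLBACK_MAX_REQ_TOTAL_LEN
```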