[None][fix] Batch addSequence with pre-claim to fix host offloading MNT overflow #12878

Open
liji-nv wants to merge 2 commits into NVIDIA:feat/bench_y from liji-nv:fix/batch-addsequence-mnt-overflow

Conversation

@liji-nv
Collaborator

liji-nv commented Apr 9, 2026

Batch addSequence with pre-claim to fix host offloading MNT overflow

When host offloading is enabled, onboarding a host block to GPU during addSequence can trigger eviction of other reusable host blocks from the radix tree. This causes actual KV cache reuse to be less than the scheduler estimated, leading to max_num_tokens (MNT) overflow assertions.

Add a new addSequenceBatch API that processes all first-chunk context requests in two phases:

  • Phase 1: Walk the radix tree and claimBlock() for all matching blocks across all requests. No onboarding, no allocation. This protects reusable blocks from eviction.
  • Phase 2: Onboard host blocks and allocate non-matching blocks. Since all reusable blocks are already claimed, evictions during onboarding cannot touch them.
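A minimal Python sketch of the two-phase pre-claim idea described above (all class and function names here are illustrative stand-ins, not the actual TRT-LLM C++ API):

```python
# Hypothetical sketch: claim all reusable blocks first, then onboard/allocate.
from dataclasses import dataclass, field

@dataclass
class Block:
    tokens: tuple
    on_host: bool = False   # resident on host (needs onboarding to GPU)
    ref_count: int = 0      # a claimed block (ref_count > 0) cannot be evicted

    def claim(self):
        self.ref_count += 1

@dataclass
class RadixTree:
    blocks: dict = field(default_factory=dict)  # token prefix -> Block

    def match(self, tokens):
        """Return the blocks whose stored prefix matches the request's tokens."""
        return [b for prefix, b in self.blocks.items()
                if tokens[:len(prefix)] == prefix]

def add_sequence_batch(tree, requests, onboard, allocate):
    # Phase 1: claim every matching block for every request up front,
    # so evictions triggered later (by onboarding) cannot reclaim them.
    matches = {req_id: tree.match(tokens) for req_id, tokens in requests}
    for blocks in matches.values():
        for b in blocks:
            b.claim()
    # Phase 2: onboard host blocks and allocate the rest; any eviction
    # here can only touch unclaimed (non-reusable) blocks.
    for req_id, tokens in requests:
        for b in matches[req_id]:
            if b.on_host:
                onboard(b)
                b.on_host = False
        allocate(req_id, tokens, matches[req_id])
```

In the single-phase version, onboarding for an early request can evict blocks that a later request in the same batch was counting on; claiming everything before onboarding anything removes that ordering hazard.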

On the Python side, replace the TOCTOU-prone revalidation loop (count_reusable_blocks + budget check) with a single batch call.
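The TOCTOU hazard can be illustrated with a toy model (`Manager` and its methods are hypothetical stand-ins; the real interplay lives in the C++ KV cache manager):

```python
# Toy model of the old check-then-act loop vs. the single batch call.
class Manager:
    def __init__(self, reusable):
        self.reusable = reusable   # blocks currently reusable
        self.added = []

    def count_reusable_blocks(self, req):
        return self.reusable

    def add_sequence(self, req):
        # Onboarding inside add_sequence may evict other reusable blocks,
        # silently invalidating earlier count_reusable_blocks() answers.
        self.reusable = max(0, self.reusable - 1)
        self.added.append(req)

    def add_sequence_batch(self, reqs):
        # Pre-claims all matches first, so reuse stays stable for the batch.
        self.added.extend(reqs)

# Old pattern: estimate, budget-check, then add — the estimate can go
# stale between the check and the add (the TOCTOU window).
def schedule_old(mgr, reqs):
    for r in reqs:
        if mgr.count_reusable_blocks(r) > 0:   # check
            mgr.add_sequence(r)                # use (may shrink reuse)

# New pattern: one batch call with no window between check and use.
def schedule_new(mgr, reqs):
    mgr.add_sequence_batch(reqs)
```

In the old pattern, the third request is rejected because the first two consumed the reuse the scheduler had already counted; the batch call sees a consistent snapshot.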


PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains the what and the why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions).

  • Any new dependencies have been scanned for licenses and vulnerabilities.

  • CODEOWNERS is updated if ownership changes.

  • Documentation is updated as needed.

  • The Tava architecture diagram is updated if the PR contains a significant design change.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this item after reviewing the above as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@liji-nv liji-nv requested a review from a team as a code owner April 9, 2026 05:30
@liji-nv liji-nv force-pushed the fix/batch-addsequence-mnt-overflow branch 3 times, most recently from ba22dfe to cecae98 on April 9, 2026 06:43
Batch addSequence with pre-claim to fix host offloading MNT overflow

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
@liji-nv liji-nv force-pushed the fix/batch-addsequence-mnt-overflow branch from cecae98 to 9dc7da9 on April 10, 2026 06:11
…chContent (NVIDIA#12550)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@liji-nv liji-nv requested a review from a team as a code owner April 10, 2026 07:00
