[None][perf] Use +64 batch sizes for padding-enabled CUDA graphs#12895
yijingl-nvidia wants to merge 2 commits into NVIDIA:main from
Conversation
When enable_padding=True, replace the sparse powers-of-2 schedule (256, 512, 1024, 2048) with uniform +64 increments after the initial [1,2,4,8,...,128] base, giving denser coverage (192, 256, 320, …, max_batch_size) and reducing padding waste at intermediate batch sizes. Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>
Move filter/sort outside the if/else so sizes exceeding max_batch_size are dropped in the enable_padding=True branch as well. Add guard for empty list before the max_batch_size append. Add regression tests for edge cases: max_batch_size=64, 129, 320. Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>
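The schedule described in the two commit messages can be sketched as a standalone helper. The function name and signature below are illustrative (not the actual `llm_args.py` code); the logic mirrors the described base list, the +64 extension, and the branch-shared clamping from the second commit:

```python
import math

def cuda_graph_batch_sizes(max_batch_size, enable_padding):
    """Hypothetical sketch of the CUDA graph batch-size schedule."""
    if enable_padding:
        # Base: [1, 2, 4] plus multiples of 8 up to 128
        batch_sizes = [1, 2, 4] + [i * 8 for i in range(1, 17)]
        # Sliding 64: 192, 256, 320, ... while still within max_batch_size
        while batch_sizes[-1] + 64 <= max_batch_size:
            batch_sizes.append(batch_sizes[-1] + 64)
    else:
        batch_sizes = list(range(1, 32)) + [32, 64, 128]
        # Sparse powers of 2: 256, 512, 1024, ...
        batch_sizes += [
            2**i for i in range(8, math.ceil(math.log(max_batch_size, 2)))
        ]
    # Clamp and sort in both branches, then make sure max_batch_size itself
    # is captured (guarding against an empty list after filtering)
    batch_sizes = sorted(s for s in batch_sizes if s <= max_batch_size)
    if not batch_sizes or batch_sizes[-1] != max_batch_size:
        batch_sizes.append(max_batch_size)
    return batch_sizes
```

For example, `cuda_graph_batch_sizes(320, True)` extends the base list with 192, 256, and 320, whereas the old powers-of-2 tail would have jumped straight to 256.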
📝 Walkthrough
Modified the batch size generation logic in `tensorrt_llm/llmapi/llm_args.py`.
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 3 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
169-187: ⚠️ Potential issue | 🟠 Major — Clamp padding-mode batch sizes to `max_batch_size` before returning.
On lines 181-183, filtering/sorting now runs only in the `else` branch.
With `enable_padding=True`, `batch_sizes` can exceed `max_batch_size` (e.g., `max_batch_size=64` still keeps values up to 128), which can lead to unnecessary graph capture and inconsistent config behavior.
🔧 Proposed fix
```diff
 if enable_padding:
     # Start with [1, 2, 4, 8, 16, 24, ..., 128] (multiples of 8)
     batch_sizes = [1, 2, 4] + [i * 8 for i in range(1, 17)]
     # Sliding 64: extend by increments of 64 up to max_batch_size
     while batch_sizes[-1] + 64 <= max_batch_size:
         batch_sizes.append(batch_sizes[-1] + 64)
 else:
     batch_sizes = list(range(1, 32)) + [32, 64, 128]
     # Add powers of 2 up to max_batch_size
     batch_sizes += [
         2**i for i in range(8, math.ceil(math.log(max_batch_size, 2)))
     ]
-# Filter and sort batch sizes
-batch_sizes = sorted(
-    [size for size in batch_sizes if size <= max_batch_size])
+# Filter and sort batch sizes for both branches
+batch_sizes = sorted(size for size in batch_sizes if size <= max_batch_size)
 # Add max_batch_size if not already included
-if max_batch_size != batch_sizes[-1]:
+if not batch_sizes or max_batch_size != batch_sizes[-1]:
     batch_sizes.append(max_batch_size)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/llmapi/llm_args.py` around lines 169 - 187, The enable_padding branch for building batch_sizes can produce values > max_batch_size; update the block that builds batch_sizes when enable_padding is True (the batch_sizes list creation) to clamp/filter values to <= max_batch_size and sort/unique them just like the else branch does, ensuring batch_sizes = sorted([s for s in batch_sizes if s <= max_batch_size]) before the final check that appends max_batch_size if missing; keep the subsequent append-of-max_batch_size logic unchanged.
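A minimal reproduction of the issue this comment flags, assuming the pre-fix branch-local filtering (the helper name is hypothetical): when only the `else` branch clamps, the `enable_padding=True` base list keeps sizes above `max_batch_size`.

```python
def padding_sizes_unclamped(max_batch_size):
    # Hypothetical reproduction of the pre-fix enable_padding branch:
    # the base list is built unconditionally and never filtered.
    sizes = [1, 2, 4] + [i * 8 for i in range(1, 17)]  # ends at 128
    while sizes[-1] + 64 <= max_batch_size:
        sizes.append(sizes[-1] + 64)
    return sizes

# With max_batch_size=64 the unclamped list still reaches 128, so graphs
# would be captured for batch sizes that can never occur at runtime.
oversized = [s for s in padding_sizes_unclamped(64) if s > 64]
```

Here `oversized` contains the multiples of 8 from 72 through 128, each a wasted graph capture.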
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
169-189: Add regression coverage for padding schedule edge cases. Please add tests for `enable_padding=True` with at least `max_batch_size=64`, `129`, and `320` to assert all values are `<= max_batch_size`, sorted, and include `max_batch_size`. This will prevent regressions from branch-local filtering changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/llmapi/llm_args.py` around lines 169 - 189, Add regression tests covering the padding schedule branch where enable_padding=True by calling the batch-size generator (the function that takes enable_padding and max_batch_size and returns batch_sizes) with max_batch_size values 64, 129, and 320; for each case assert that every returned value in batch_sizes is <= max_batch_size, that batch_sizes is sorted (non-decreasing), and that max_batch_size is present in the returned list; target the branch that builds batch_sizes using the enable_padding path (referencing the enable_padding parameter, max_batch_size variable, and the returned batch_sizes) so future changes to the padding schedule are validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5920502e-9591-4b1c-983a-406885dd4e30
📒 Files selected for processing (1)
tensorrt_llm/llmapi/llm_args.py
/bot run
PR_Github #42571 [ run ] triggered by Bot. Commit:
PR_Github #42571 [ run ] completed with state
/bot run
PR_Github #42656 [ run ] triggered by Bot. Commit:
@yijingl-nvidia can you put a summary of the performance validation results in the PR description? Thanks.
Description
When enable_padding=True, replace the sparse powers-of-2 schedule (256, 512, 1024, 2048) with uniform +64 increments after the initial [1,2,4,8,...,128] base, giving denser coverage (192, 256, 320, …, max_batch_size) and reducing padding waste at intermediate batch sizes.
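To make the "reducing padding waste" claim concrete, the two tails can be compared directly. The schedules below are reconstructed from this description for an assumed `max_batch_size` of 2048, and the `padded` helper is illustrative:

```python
import bisect

# Old tail: sparse powers of 2 after the [1, 2, 4, 8, ..., 128] base
old = [1, 2, 4] + [i * 8 for i in range(1, 17)] + [256, 512, 1024, 2048]
# New tail: uniform +64 increments (192, 256, 320, ..., 2048)
new = [1, 2, 4] + [i * 8 for i in range(1, 17)] + list(range(192, 2049, 64))

def padded(schedule, batch):
    # Smallest captured size >= the runtime batch, i.e. the graph actually used
    return schedule[bisect.bisect_left(schedule, batch)]

# A runtime batch of 150 pads to 256 under the old schedule but only to 192
# under the +64 schedule, cutting the padding waste from 106 to 42 slots.
```

The worst-case waste in the tail drops from just under the gap between consecutive powers of 2 to at most 63 slots.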
Test Coverage
CI
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.