[None][fix] Fix constrained decoding for GLM5 #12869
cascade812 wants to merge 4 commits into NVIDIA:main from
Conversation
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
📝 Walkthrough
The changes consolidate tokenizer alias definitions into a shared constant, implement dynamic custom tokenizer loading via configurable aliases, add chat template file support to the GLM Moe DSA tokenizer, and refactor speculative decoding components so they are reset and recreated during KV-cache estimation reinitialization.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)
720-741: Consider documenting side effects of the helper function.
The `create_spec_components` function modifies the outer `resources` dict as a side effect (lines 728-731). While functional, this implicit mutation pattern can be confusing for maintainers. A brief docstring clarifying that the function updates `resources[SPEC_RESOURCE_MANAGER]` would improve readability.
📝 Suggested documentation
```diff
 def create_spec_components(max_seq_len):
+    """Create speculative decoding components.
+
+    Side effect: Updates resources[SPEC_RESOURCE_MANAGER] if spec_resource_manager is created.
+
+    Returns:
+        Tuple of (spec_resource_manager, drafter).
+    """
     with allocation_scope(ExecutorMemoryType.SPEC_RESOURCES):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py` around lines 720 - 741, Add a short docstring to create_spec_components explaining that in addition to returning (spec_resource_manager, drafter) it may mutate the outer resources dict by setting or removing resources[ResourceManagerType.SPEC_RESOURCE_MANAGER]; mention the allocation_scope side effects (ExecutorMemoryType.SPEC_RESOURCES and ExecutorMemoryType.DRAFTER) so callers know the function updates external state as well as its return values.
877-880: Unused variable `spec_resource_manager` after unpacking.
The `spec_resource_manager` returned from `create_spec_components` is not used after assignment; the function already stores it in the `resources` dict internally. Use `_` to indicate the value is intentionally discarded.
♻️ Proposed fix
```diff
 spec_resource_manager = None
 drafter = None
 gc.collect()
-spec_resource_manager, drafter = create_spec_components(max_seq_len)
+_, drafter = create_spec_components(max_seq_len)
```
For consistency, apply the same pattern at line 804:
```diff
-spec_resource_manager, drafter = create_spec_components(max_seq_len)
+_, drafter = create_spec_components(max_seq_len)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py` around lines 877 - 880, The unpacked return from create_spec_components currently assigns spec_resource_manager and drafter but spec_resource_manager is unused because create_spec_components already stores it in the resources dict; change the assignment to use _ for the discarded value (e.g., "_, drafter = create_spec_components(max_seq_len)") mirroring the existing pattern used elsewhere to indicate the first value is intentionally ignored and keep drafter assigned.
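Taken together, the two nitpicks above suggest a shape roughly like the following for the helper and its call site. This is a condensed, hypothetical sketch: the names mirror the review, but the bodies are illustrative stand-ins, not the real code in py_executor_creator.py.

```python
# Module-level dict that the helper mutates, standing in for the real
# `resources` mapping keyed by ResourceManagerType.SPEC_RESOURCE_MANAGER.
resources = {}

def create_spec_components(max_seq_len):
    """Create speculative decoding components.

    Side effect: updates resources["SPEC_RESOURCE_MANAGER"] when a
    spec_resource_manager is created.

    Returns:
        Tuple of (spec_resource_manager, drafter).
    """
    spec_resource_manager = {"max_seq_len": max_seq_len}  # stand-in object
    resources["SPEC_RESOURCE_MANAGER"] = spec_resource_manager
    drafter = "drafter"  # stand-in object
    return spec_resource_manager, drafter

# Call site: the first return value is already stored in `resources`,
# so discard it with `_` as the review proposes.
_, drafter = create_spec_components(4096)
```

The docstring makes the external mutation explicit, and the `_` at the call site signals that callers are expected to read the manager back out of `resources` rather than from the return value.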
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`:
- Around line 288-302: The custom tokenizer import block (checking
llm_args.custom_tokenizer, using TOKENIZER_ALIASES, rsplit to get
module_path/class_name, and calling tokenizer_class.from_pretrained or
TransformersTokenizer.from_pretrained) lacks error handling and duplicates logic
in llm_args.py; wrap the dynamic import/attribute lookup and from_pretrained
call in a try/except that catches ImportError/AttributeError/Exception and
raises or logs a clear message including the provided tokenizer identifier and
underlying exception, or better yet move the resolution/loading logic into a
shared helper (e.g., a new function in tensorrt_llm.tokenizer.__init__ like
load_tokenizer(checkpoint_dir, tokenizer_identifier, trust_remote_code)) and
call that helper from both py_executor_creator.py and llm_args.py to centralize
error handling and avoid duplication.
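The shared helper proposed above could look roughly like this. It is a sketch only: the function name `load_tokenizer`, the alias entry, and the exact class path are assumptions drawn from the prompt, not the actual TensorRT-LLM API. What it demonstrates is the centralized error handling, so both call sites get a clear ValueError naming the identifier the user passed in.

```python
import importlib

# Illustrative alias table; the real mapping is the shared TOKENIZER_ALIASES
# constant mentioned in the walkthrough, and this class path is a guess.
TOKENIZER_ALIASES = {
    "glm_moe_dsa": "tensorrt_llm.tokenizer.glm_moe_dsa.tokenizer.GlmMoeDsaTokenizer",
}

def load_tokenizer(checkpoint_dir, tokenizer_identifier, trust_remote_code=False):
    """Resolve a tokenizer alias or dotted class path and instantiate it.

    Wraps the dynamic import and attribute lookup so failures surface as a
    single ValueError that names the identifier the user provided.
    """
    path = TOKENIZER_ALIASES.get(tokenizer_identifier, tokenizer_identifier)
    if "." not in path:
        raise ValueError(
            f"Unknown tokenizer alias or invalid class path: '{tokenizer_identifier}'")
    module_path, class_name = path.rsplit(".", 1)
    try:
        module = importlib.import_module(module_path)
        tokenizer_class = getattr(module, class_name)
    except (ImportError, AttributeError) as exc:
        raise ValueError(
            f"Failed to load custom tokenizer '{tokenizer_identifier}': {exc}") from exc
    return tokenizer_class.from_pretrained(
        checkpoint_dir, trust_remote_code=trust_remote_code)
```

With this in one place (e.g. tensorrt_llm/tokenizer/__init__.py), both py_executor_creator.py and llm_args.py can call it instead of duplicating the rsplit/import/from_pretrained sequence.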
---
Nitpick comments:
In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`:
- Around line 720-741: Add a short docstring to create_spec_components
explaining that in addition to returning (spec_resource_manager, drafter) it may
mutate the outer resources dict by setting or removing
resources[ResourceManagerType.SPEC_RESOURCE_MANAGER]; mention the
allocation_scope side effects (ExecutorMemoryType.SPEC_RESOURCES and
ExecutorMemoryType.DRAFTER) so callers know the function updates external state
as well as its return values.
- Around line 877-880: The unpacked return from create_spec_components currently
assigns spec_resource_manager and drafter but spec_resource_manager is unused
because create_spec_components already stores it in the resources dict; change
the assignment to use _ for the discarded value (e.g., "_, drafter =
create_spec_components(max_seq_len)") mirroring the existing pattern used
elsewhere to indicate the first value is intentionally ignored and keep drafter
assigned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 512060e0-e087-43de-9b77-1e65887eb721
📒 Files selected for processing (4)
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
- tensorrt_llm/llmapi/llm_args.py
- tensorrt_llm/tokenizer/__init__.py
- tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
/run bot
/bot run
PR_Github #42569 [ run ] triggered by Bot. Commit:
PR_Github #42569 [ run ] completed with state
Description
This PR fixes an error when running GLM5 with constrained decoding (guided decoding).
Users need to specify the config below:
llm_kwargs["custom_tokenizer"] = 'glm_moe_dsa'
Test Coverage
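A minimal sketch of the configuration above; only the `custom_tokenizer` key comes from the PR description, and the LLM construction is hypothetical (shown as a comment since it needs a GLM5 checkpoint and a TensorRT-LLM install).

```python
# Build the kwargs from the PR description.
llm_kwargs = {}
llm_kwargs["custom_tokenizer"] = "glm_moe_dsa"  # select the bundled GLM Moe DSA tokenizer

# Hypothetical call site, assuming the usual LLM-API entry point:
# from tensorrt_llm import LLM
# llm = LLM(model="<glm5-checkpoint-dir>", **llm_kwargs)
```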
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.
Summary by CodeRabbit
New Features
- `deepseek_v32` and `glm_moe_dsa` tokenizers.
- Chat template support via `chat_template.jinja` files.
Improvements