fix transformerblock #8583
yang-ze-kang wants to merge 5 commits into Project-MONAI:dev from yang-ze-kang:dev
Conversation
Walkthrough

The TransformerBlock class in monai/networks/blocks/transformerblock.py now conditionally initializes norm_cross_attn and cross_attn only when with_cross_attention is True. Previously, these attributes were created unconditionally. The forward method continues to gate cross-attention execution on self.with_cross_attention. As a result, the cross_attn and norm_cross_attn attributes are absent when with_cross_attention is False. No other files or imports were modified.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings, 1 inconclusive)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
monai/networks/blocks/transformerblock.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
Review the Python code for quality and correctness. Ensure variable names adhere to the PEP8 style guide and are sensible and informative with regard to their function, though simple names are permitted for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and should describe each variable, return value, and raised exception in the appropriate section of Google-style docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements that improve the code's efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
monai/networks/blocks/transformerblock.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: build-docs
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: flake8-py3 (codeformat)
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: packaging
- GitHub Check: quick-py3 (windows-latest)
```python
if with_cross_attention:
    self.norm_cross_attn = nn.LayerNorm(hidden_size)
    self.cross_attn = CrossAttentionBlock(
        hidden_size=hidden_size,
        num_heads=num_heads,
        dropout_rate=dropout_rate,
        qkv_bias=qkv_bias,
        causal=False,
        use_flash_attention=use_flash_attention,
    )
```
🛠️ Refactor suggestion
Preserve legacy checkpoints when disabling cross-attention.
Old checkpoints created with `with_cross_attention=False` still contain `cross_attn.*` and `norm_cross_attn.*` entries. With these attributes now missing, `load_state_dict(..., strict=True)` raises unexpected-key errors, so this is a breaking change. Please strip those keys during loading (or otherwise ensure they are ignored) before dropping the modules at init.
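The key-stripping idea can be sketched without the framework at all — a plain dict stands in for a PyTorch state_dict, and the function name below is illustrative, not part of MONAI:

```python
def strip_cross_attn_keys(state_dict, prefix=""):
    """Drop cross-attention entries so strict loading ignores them.

    Prefixes follow PyTorch's "<module path>." naming convention.
    """
    drop = (f"{prefix}cross_attn.", f"{prefix}norm_cross_attn.")
    return {k: v for k, v in state_dict.items() if not k.startswith(drop)}


# A legacy checkpoint mixes wanted and stale entries:
old = {
    "norm1.weight": 1,
    "cross_attn.qkv.weight": 2,
    "norm_cross_attn.bias": 3,
}
print(sorted(strip_cross_attn_keys(old)))  # → ['norm1.weight']
```

Only the surviving keys are handed on to the real loading machinery, which is exactly what the suggested `_load_from_state_dict` override does in place.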
```diff
@@
         return x
+
+    def _load_from_state_dict(
+        self,
+        state_dict,
+        prefix,
+        local_metadata,
+        strict,
+        missing_keys,
+        unexpected_keys,
+        error_msgs,
+    ):
+        if not self.with_cross_attention:
+            keys = [
+                key
+                for key in list(state_dict.keys())
+                if key.startswith(f"{prefix}cross_attn.") or key.startswith(f"{prefix}norm_cross_attn.")
+            ]
+            for key in keys:
+                state_dict.pop(key)
+        super()._load_from_state_dict(
+            state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
+        )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if with_cross_attention:
    self.norm_cross_attn = nn.LayerNorm(hidden_size)
    self.cross_attn = CrossAttentionBlock(
        hidden_size=hidden_size,
        num_heads=num_heads,
        dropout_rate=dropout_rate,
        qkv_bias=qkv_bias,
        causal=False,
        use_flash_attention=use_flash_attention,
    )

return x

def _load_from_state_dict(
    self,
    state_dict,
    prefix,
    local_metadata,
    strict,
    missing_keys,
    unexpected_keys,
    error_msgs,
):
    # Strip out any cross-attention params when with_cross_attention=False
    if not self.with_cross_attention:
        keys = [
            key
            for key in list(state_dict.keys())
            if key.startswith(f"{prefix}cross_attn.") or key.startswith(f"{prefix}norm_cross_attn.")
        ]
        for key in keys:
            state_dict.pop(key)
    # Delegate actual loading to the parent implementation
    super()._load_from_state_dict(
        state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
    )
```
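As a sanity check of the hook-based approach, here is a toy stand-in (these class names are hypothetical, not MONAI's actual `TransformerBlock`) showing that a legacy checkpoint loads cleanly under `strict=True` once the override strips the stale keys:

```python
import torch.nn as nn


class Legacy(nn.Module):
    """Mimics the old layout: cross-attention norm always created."""

    def __init__(self):
        super().__init__()
        self.norm1 = nn.LayerNorm(8)
        self.norm_cross_attn = nn.LayerNorm(8)


class Fixed(nn.Module):
    """Mimics the new layout plus the suggested loading hook."""

    def __init__(self, with_cross_attention=False):
        super().__init__()
        self.with_cross_attention = with_cross_attention
        self.norm1 = nn.LayerNorm(8)
        if with_cross_attention:
            self.norm_cross_attn = nn.LayerNorm(8)

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Drop stale cross-attention entries before the parent flags them.
        if not self.with_cross_attention:
            stale = [k for k in list(state_dict)
                     if k.startswith((f"{prefix}cross_attn.",
                                      f"{prefix}norm_cross_attn."))]
            for k in stale:
                state_dict.pop(k)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)


ckpt = Legacy().state_dict()
result = Fixed(False).load_state_dict(ckpt, strict=True)
print(result.unexpected_keys)  # → []
```

Without the override, the same `load_state_dict` call raises a RuntimeError listing `norm_cross_attn.weight` and `norm_cross_attn.bias` as unexpected keys.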
🤖 Prompt for AI Agents
In monai/networks/blocks/transformerblock.py around lines 83-92, legacy
checkpoints include parameters under cross_attn.* and norm_cross_attn.* even
when with_cross_attention is False, causing unexpected-key errors; update the
class to either preserve dummy attributes or strip those keys when loading.
Implement a small fix: if you choose to drop the modules at init (keep them
absent), override load_state_dict to detect when with_cross_attention is False
and remove any keys that start with "cross_attn." or "norm_cross_attn." from the
incoming state_dict (also handle optimizer/state dict nested structures if
applicable) before delegating to the parent load_state_dict; alternatively, when
with_cross_attention is False, assign lightweight placeholders (e.g.,
nn.Identity or empty submodules) for self.cross_attn and self.norm_cross_attn so
the parameter names remain present and strict loading succeeds.
Signed-off-by: Zekang Yang <your_email@example.com>
I, yang-ze-kang <603822317@qq.com>, hereby add my Signed-off-by to this commit: 17b76d1 Signed-off-by: yang-ze-kang <603822317@qq.com>
Fixes # monai.networks.blocks.transformerblock
Description
When `with_cross_attention == False`, there is no need to initialize `CrossAttentionBlock` in `__init__`; otherwise, it introduces unnecessary parameters into the model and may cause errors.
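The "unnecessary parameters" point is easy to see on a toy module (names illustrative; the real block also carries a full `CrossAttentionBlock`, so the waste is much larger than two LayerNorms):

```python
import torch.nn as nn


class Toy(nn.Module):
    """Conditionally creates a second LayerNorm, mirroring the PR's change."""

    def __init__(self, with_cross_attention=False):
        super().__init__()
        self.norm1 = nn.LayerNorm(8)
        if with_cross_attention:
            self.norm_cross_attn = nn.LayerNorm(8)


def n_params(m):
    return sum(p.numel() for p in m.parameters())


# Each LayerNorm(8) holds 8 weight + 8 bias values.
print(n_params(Toy(False)), n_params(Toy(True)))  # → 16 32
```

With unconditional creation, the extra parameters would be allocated, saved in every checkpoint, and reported in parameter counts even though `forward` never touches them.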
Types of changes

- `./runtests.sh -f -u --net --coverage`
- `./runtests.sh --quick --unittests --disttests`
- `make html` command in the `docs/` folder