fix transformerblock #8583

Closed
yang-ze-kang wants to merge 5 commits into Project-MONAI:dev from yang-ze-kang:dev

Conversation

@yang-ze-kang

Fixes # monai.networks.blocks.transformerblock

Description

When "with_cross_attention" is False, there is no need to initialize the "CrossAttentionBlock" in "__init__"; doing so adds unnecessary parameters to the model and can potentially cause errors.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@coderabbitai
Contributor

coderabbitai bot commented Sep 25, 2025

Walkthrough

The TransformerBlock class in monai/networks/blocks/transformerblock.py now conditionally initializes norm_cross_attn and cross_attn only when with_cross_attention is True. Previously, these attributes were created unconditionally. The forward method continues to gate cross-attention execution on self.with_cross_attention. As a result, cross_attn and norm_cross_attn attributes are absent when with_cross_attention is False. No other files or imports were modified.
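As a torch-free sketch of the pattern described above (plain Python stand-ins; the `Block` class and placeholder values are illustrative, not MONAI code), the guarded initialization makes the attributes absent, not merely unused:

```python
# Hypothetical stand-in for the conditional initialization described above:
# submodules are only created when the flag is set, so they are absent
# (not just unused) when with_cross_attention is False.
class Block:
    def __init__(self, with_cross_attention: bool = False):
        self.with_cross_attention = with_cross_attention
        if with_cross_attention:
            # Placeholders standing in for nn.LayerNorm / CrossAttentionBlock
            self.norm_cross_attn = object()
            self.cross_attn = object()

without_ca = Block(with_cross_attention=False)
with_ca = Block(with_cross_attention=True)
# hasattr(without_ca, "cross_attn") is False; hasattr(with_ca, "cross_attn") is True
```

This is why no extra parameters are registered when the flag is off, and also why the attributes no longer appear in the module's state dict.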

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description Check — ⚠️ Warning: The description follows the template sections for “### Description” and “### Types of changes”, but the initial “Fixes #” line references the module path instead of a GitHub issue number, so it does not match the template format. Resolution: replace the “Fixes # monai.networks.blocks.transformerblock” line with a valid issue reference such as “Fixes #1234”, or remove it if no issue is being closed.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check — ❓ Inconclusive: The title “fix transformerblock” is too generic and does not convey that the pull request conditionally initializes the CrossAttentionBlock only when with_cross_attention is True. Resolution: revise the title to highlight the specific fix, for example “Guard CrossAttentionBlock initialization with with_cross_attention flag in TransformerBlock.”
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 53382d8 and 17b76d1.

📒 Files selected for processing (1)
  • monai/networks/blocks/transformerblock.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/networks/blocks/transformerblock.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (windows-latest)

Comment on lines +83 to +92
if with_cross_attention:
    self.norm_cross_attn = nn.LayerNorm(hidden_size)
    self.cross_attn = CrossAttentionBlock(
        hidden_size=hidden_size,
        num_heads=num_heads,
        dropout_rate=dropout_rate,
        qkv_bias=qkv_bias,
        causal=False,
        use_flash_attention=use_flash_attention,
    )
Contributor

🛠️ Refactor suggestion

Preserve legacy checkpoints when disabling cross-attention.
Old checkpoints created with with_cross_attention=False still contain cross_attn.* and norm_cross_attn.* entries. With these attributes now missing, load_state_dict(..., strict=True) throws unexpected-key errors, so this is a breaking change. Please strip those keys during load (or otherwise ensure they’re ignored) before dropping the modules at init.

@@
         return x
+
+    def _load_from_state_dict(
+        self,
+        state_dict,
+        prefix,
+        local_metadata,
+        strict,
+        missing_keys,
+        unexpected_keys,
+        error_msgs,
+    ):
+        if not self.with_cross_attention:
+            keys = [
+                key
+                for key in list(state_dict.keys())
+                if key.startswith(f"{prefix}cross_attn.") or key.startswith(f"{prefix}norm_cross_attn.")
+            ]
+            for key in keys:
+                state_dict.pop(key)
+        super()._load_from_state_dict(
+            state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        if with_cross_attention:
            self.norm_cross_attn = nn.LayerNorm(hidden_size)
            self.cross_attn = CrossAttentionBlock(
                hidden_size=hidden_size,
                num_heads=num_heads,
                dropout_rate=dropout_rate,
                qkv_bias=qkv_bias,
                causal=False,
                use_flash_attention=use_flash_attention,
            )

        return x

    def _load_from_state_dict(
        self,
        state_dict,
        prefix,
        local_metadata,
        strict,
        missing_keys,
        unexpected_keys,
        error_msgs,
    ):
        # Strip out any cross-attention params when with_cross_attention=False
        if not self.with_cross_attention:
            keys = [
                key
                for key in list(state_dict.keys())
                if key.startswith(f"{prefix}cross_attn.") or key.startswith(f"{prefix}norm_cross_attn.")
            ]
            for key in keys:
                state_dict.pop(key)
        # Delegate actual loading to the parent implementation
        super()._load_from_state_dict(
            state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
        )
🤖 Prompt for AI Agents
In monai/networks/blocks/transformerblock.py around lines 83-92, legacy
checkpoints include parameters under cross_attn.* and norm_cross_attn.* even
when with_cross_attention is False, causing unexpected-key errors; update the
class to either preserve dummy attributes or strip those keys when loading.
Implement a small fix: if you choose to drop the modules at init (keep them
absent), override load_state_dict to detect when with_cross_attention is False
and remove any keys that start with "cross_attn." or "norm_cross_attn." from the
incoming state_dict (also handle optimizer/state dict nested structures if
applicable) before delegating to the parent load_state_dict; alternatively, when
with_cross_attention is False, assign lightweight placeholders (e.g.,
nn.Identity or empty submodules) for self.cross_attn and self.norm_cross_attn so
the parameter names remain present and strict loading succeeds.
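The key-stripping behaviour the suggestion relies on can be checked without torch. This is an illustrative sketch over a plain dict — `strip_cross_attn_keys` and the sample key names are hypothetical, not part of the PR — mirroring what the suggested `_load_from_state_dict` override does to the incoming state dict:

```python
# Hypothetical helper mirroring the suggested _load_from_state_dict logic:
# drop cross-attention entries from a checkpoint dict so that strict
# loading succeeds once those submodules are no longer created.
def strip_cross_attn_keys(state_dict: dict, prefix: str = "") -> dict:
    drop = (f"{prefix}cross_attn.", f"{prefix}norm_cross_attn.")
    return {k: v for k, v in state_dict.items() if not k.startswith(drop)}

legacy = {
    "norm1.weight": 1.0,
    "attn.qkv.weight": 2.0,
    "cross_attn.out_proj.weight": 3.0,  # stale entry from an old checkpoint
    "norm_cross_attn.weight": 4.0,      # stale entry from an old checkpoint
}
cleaned = strip_cross_attn_keys(legacy)
# cleaned keeps only "norm1.weight" and "attn.qkv.weight"
```

The `prefix` argument matters because PyTorch prepends the submodule path (e.g. "blocks.0.") when loading nested modules, so matching must happen on the prefixed key names.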

Signed-off-by: Zekang Yang <your_email@example.com>
I, yang-ze-kang <603822317@qq.com>, hereby add my Signed-off-by to this commit: 17b76d1

Signed-off-by: yang-ze-kang <603822317@qq.com>
@yang-ze-kang yang-ze-kang closed this by deleting the head repository Sep 25, 2025