
Add EmbeddingCollapseMetric: detect representational collapse in medical imaging embeddings #8815

Open

ekansh-arora0 wants to merge 1 commit into Project-MONAI:dev from ekansh-arora0:feature/embedding-collapse-metric

Conversation

@ekansh-arora0

Closes #8808


Description


Adds EmbeddingCollapseMetric for detecting representational collapse in embedding spaces — when class centroids converge, effective dimensionality drops, or domains become indistinguishable, even while AUROC/Dice look fine.

Computes five indicators (each in [0, 1]; higher means more collapsed):

centroid_similarity — cosine similarity between L2-normalised class centroids
effective_rank_score — SVD effective rank (Roy & Vetterli, 2007)
per_class_rank — per-class effective rank (detects asymmetric collapse)
domain_shift — linear CKA between domains (Kornblith et al., 2019)
separation — silhouette-based inter-class separation (sklearn optional)

Follows the FIDMetric/MMDMetric pattern: inherits from Metric, tensor-in/tensor-out, torch-only core. Validated on pathology embeddings.
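For readers unfamiliar with the effective-rank indicator, here is a minimal NumPy sketch of the entropy-based Roy & Vetterli (2007) definition that the reviews below converge on. The PR's implementation is torch-only, and this function name and normalization are illustrative assumptions, not the module's API:

```python
import numpy as np

def effective_rank_score(emb: np.ndarray) -> float:
    """Collapse score in [0, 1]: 0 = spread-out spectrum, 1 = fully collapsed."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    sv = np.linalg.svd(centered, compute_uv=False)
    total = sv.sum()
    if total == 0.0:
        return 1.0  # constant embeddings: maximal collapse
    p = sv / total
    p = p[p > 0]  # drop exact zeros to avoid log(0)
    eff_rank = np.exp(-(p * np.log(p)).sum())  # exp of spectral entropy
    max_rank = min(emb.shape[0] - 1, emb.shape[1])  # mean-centering costs one rank
    return float(np.clip(1.0 - eff_rank / max_rank, 0.0, 1.0))
```

A rank-1 embedding matrix scores close to 1 - 1/min(N-1, D), while a well-spread random matrix scores near 0.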

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Copilot AI review requested due to automatic review settings April 9, 2026 23:43
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds monai/metrics/embedding_collapse.py with EmbeddingCollapseMetric, compute_embedding_collapse, and linear_probe_accuracy. The module computes multiple scalar collapse indicators (effective rank via SVD, centroid cosine similarity, per-class ranks, silhouette separation, cross-domain domain_shift) with optional labels and optional target embeddings; each indicator returns a torch scalar, or None when inapplicable. It validates inputs, supports the reductions "max", "mean", and "none", validates include_indicators, and aggregates the available indicators when reduction != "none". Exports EmbeddingCollapseMetric and compute_embedding_collapse from monai/metrics/__init__.py. Adds comprehensive tests in tests/test_embedding_collapse.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check | ❓ Inconclusive | Title is truncated/incomplete, cutting off mid-word ('medi…'), though the full intent appears related to the main change. Resolution: complete the title, e.g., 'Add EmbeddingCollapseMetric: detect representational collapse in medical imaging embeddings'.
✅ Passed checks (3 passed)
  • Description check | ✅ Passed | Description covers the main feature, objectives, and change types. Includes relevant references and follows the template structure.
  • Linked Issues check | ✅ Passed | Code implements all primary objectives: EmbeddingCollapseMetric class, five collapse indicators (centroid_similarity, effective_rank_score, per_class_rank, domain_shift, separation), linear_probe_accuracy helper, and Metric pattern compliance.
  • Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the EmbeddingCollapseMetric feature: new module, metric class, helpers, tests, and exports. No unrelated modifications detected.



Comment @coderabbitai help to get the list of available commands and usage tips.

@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch from 5b04cc4 to c2ff096 on April 9, 2026 23:45
Contributor

Copilot AI left a comment


Pull request overview

Adds a new EmbeddingCollapseMetric (and functional compute_embedding_collapse) to quantify representational collapse in embedding spaces via multiple indicators, plus unit tests and metrics-package exports.

Changes:

  • Introduces monai.metrics.embedding_collapse with centroid similarity, effective-rank collapse, per-class rank collapse, domain shift (linear CKA), and optional silhouette-based separation.
  • Adds a comprehensive unit test suite for expected return types/ranges, reductions, validation, and optional scikit-learn linear probe utility.
  • Re-exports the new metric and functional API from monai.metrics.
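As background on the domain_shift indicator, here is a minimal NumPy sketch of linear CKA (Kornblith et al., 2019), shown for paired samples for brevity. The PR's torch implementation, and in particular how it handles unpaired cross-domain samples and maps the similarity into a shift score, may differ:

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA similarity in [0, 1] between two [N, D] representations."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(x.T @ y, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, which is why it is a common choice for comparing representations across domains: CKA(X, XQ) = 1 for any orthogonal Q.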

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

  • monai/metrics/embedding_collapse.py: Implements the new collapse metric, indicator functions, aggregation, and optional sklearn utilities.
  • tests/test_embedding_collapse.py: Adds unit tests covering indicator behaviors, reductions, and validation/optional deps.
  • monai/metrics/__init__.py: Exposes EmbeddingCollapseMetric and compute_embedding_collapse at the package level.


Comment on lines +273 to +280
centered = emb - emb.mean(dim=0, keepdim=True)
_, sv, _ = torch.linalg.svd(centered, full_matrices=False)
max_sv = sv.max()
if max_sv == 0.0:
    return torch.tensor(0.0, dtype=emb.dtype)
eff_rank = sv.sum() / max_sv
max_rank = float(min(emb.shape[0], emb.shape[1]))
return (1.0 - eff_rank / max_rank).clamp(0.0, 1.0)

Copilot AI Apr 9, 2026


_effective_rank_score returns 0.0 when all singular values are zero (e.g., all embeddings identical after mean-centering). That case represents maximal dimensional collapse; returning 0.0 under-reports collapse and can skew the aggregate when other indicators are missing. Consider defining this degenerate case as a score of 1.0 (and ensure the returned tensor matches emb device/dtype).

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +216
if reduction != "none":
    primary = {"centroid_similarity", "effective_rank_score", "domain_shift", "separation"}
    available = [v for k, v in scores.items() if k in primary and v is not None]
    if not available:
        scores["aggregate"] = torch.tensor(0.0)
    elif reduction == "max":
        scores["aggregate"] = torch.stack(available).max()
    else:  # mean
        scores["aggregate"] = torch.stack(available).mean()

Copilot AI Apr 9, 2026


Several scalars are created with torch.tensor(...) without specifying device (and sometimes dtype). If embeddings are on GPU, this can cause device-mismatch errors when stacking/combining results. Create these tensors on the same device/dtype as the input embeddings (e.g., use emb.new_tensor(...)).

Comment on lines +232 to +236
    Returns:
        Scalar tensor in ``[0, 1]``. ``None`` if fewer than 2 classes.
        1.0 = centroids identical (full collapse).
        0.0 = centroids orthogonal (no collapse).
    """

Copilot AI Apr 9, 2026


The docstring states "0.0 = centroids orthogonal", but the implementation maps cosine similarity from [-1, 1] to [0, 1] via (raw + 1)/2, so an orthogonal cosine (0) produces 0.5. Please update the docstring to match the actual scaling (and clarify the interpretation for negative cosine values).

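To make the scaling issue above concrete, here is a hypothetical NumPy mirror of the centroid indicator using the (raw + 1) / 2 mapping the comment describes (the function name and details are illustrative, not the module's code). Orthogonal centroids land at 0.5, not 0.0:

```python
import numpy as np

def centroid_similarity_score(emb: np.ndarray, labels: np.ndarray) -> float:
    """Mean pairwise centroid cosine, rescaled from [-1, 1] to [0, 1]."""
    classes = np.unique(labels)
    cents = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    cents = cents / np.linalg.norm(cents, axis=1, keepdims=True)
    cos = cents @ cents.T
    iu = np.triu_indices(len(classes), k=1)  # upper triangle: distinct pairs only
    raw = cos[iu].mean()                     # in [-1, 1]
    return (raw + 1.0) / 2.0                 # identical -> 1.0, orthogonal -> 0.5, opposite -> 0.0
```

So the docstring's "0.0 = centroids orthogonal" would only hold if the raw cosine were clamped at 0 rather than affinely rescaled; under this mapping, 0.0 means anti-aligned centroids.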
Comment on lines +126 to +133
def compute_embedding_collapse(
    embeddings: torch.Tensor,
    labels: torch.Tensor | None = None,
    target_embeddings: torch.Tensor | None = None,
    target_labels: torch.Tensor | None = None,
    reduction: str = "max",
    include_indicators: Sequence[str] | None = None,
) -> dict[str, torch.Tensor | None]:

Copilot AI Apr 9, 2026


compute_embedding_collapse documents reduction as one of {"max", "mean", "none"}, but unlike EmbeddingCollapseMetric.__init__ it does not validate this argument. Currently any unknown value falls through to the "mean" branch silently. Consider validating reduction here too (or sharing a validation helper) to avoid surprising behavior in the functional API.

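One way to address this is a small shared validation helper that both the class constructor and the functional API could call. This is a sketch only: the helper name is hypothetical, and the option sets are assumed from the PR description rather than taken from the actual module:

```python
VALID_REDUCTIONS = {"max", "mean", "none"}
VALID_INDICATORS = {
    "centroid_similarity", "effective_rank", "per_class_rank", "domain_shift", "separation",
}

def validate_options(reduction: str, include_indicators=None) -> None:
    """Reject unknown reduction modes and indicator names up front."""
    if reduction not in VALID_REDUCTIONS:
        raise ValueError(f"reduction must be one of {sorted(VALID_REDUCTIONS)}, got {reduction!r}")
    if include_indicators is not None:
        unknown = set(include_indicators) - VALID_INDICATORS
        if unknown:
            raise ValueError(f"unknown indicator(s): {sorted(unknown)}")
```

Failing fast here turns a silent fall-through to the "mean" branch (or a silently ignored indicator typo) into an immediate, descriptive error.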
Comment on lines +126 to +131
def compute_embedding_collapse(
    embeddings: torch.Tensor,
    labels: torch.Tensor | None = None,
    target_embeddings: torch.Tensor | None = None,
    target_labels: torch.Tensor | None = None,
    reduction: str = "max",

Copilot AI Apr 9, 2026


target_labels is accepted and threaded through the public API but is never used in the computation. This makes the API confusing for callers. Either remove it (if not needed) or use it (e.g., class-conditional domain_shift or balanced subsampling) and add corresponding tests/docs.

Comment on lines +177 to +186
# ── Label-dependent indicators ────────────────────────────────────
if labels is not None:
    lbl = labels.long()

    if inc is None or "centroid_similarity" in inc:
        scores["centroid_similarity"] = _centroid_similarity(emb, lbl)

    if inc is None or "per_class_rank" in inc:
        scores.update(_per_class_rank(emb, lbl))


Copilot AI Apr 9, 2026


There is no validation that labels has shape [N] and matches embeddings.shape[0]. If a mismatched shape is passed, boolean masking can broadcast unexpectedly or raise cryptic errors. Consider adding explicit checks (ndim==1 and same length as N) for labels (and similarly for target_labels if kept).

@ekansh-arora0
Author

@copilot apply changes based on the comments in this thread

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/test_embedding_collapse.py (1)

78-100: Add regressions for the current blind spots.

The suite still misses three cases: a constant matrix should drive effective_rank_score to 1.0, compute_embedding_collapse should reject bad reduction / include_indicators values, and the labeled CUDA path should exercise aggregate when separation participates. Those would have caught the current bugs.

As per coding guidelines, "**/*.py: Ensure new or modified definitions will be covered by existing or new unit tests."

Also applies to: 183-223, 308-327, 330-360

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_embedding_collapse.py` around lines 78 - 100, Add three regression
tests: 1) create a constant embedding matrix and assert
effective_rank_score(...) returns 1.0 (use the same tensor shape as other tests
and call effective_rank_score to verify constant-case behavior); 2) add
parameter validation tests that call compute_embedding_collapse(...) with
invalid reduction and invalid include_indicators values and assert it raises a
ValueError (or the module's chosen exception) to ensure bad inputs are rejected;
3) add a CUDA-path labeled test that runs compute_embedding_collapse on GPU
tensors with labels and a configuration where separation participates and assert
that the returned dict includes the "aggregate" key, ensuring the labeled CUDA
code path computes aggregate. Ensure tests reference compute_embedding_collapse
and effective_rank_score so they cover the modified/erroneous code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/metrics/embedding_collapse.py`:
- Around line 171-174: compute_embedding_collapse currently does not validate
the public string options: it should validate the reduction argument and the
include_indicators names like the class constructor does. Update
compute_embedding_collapse to check that reduction is one of the allowed values
(e.g., "mean", "sum", "none" — match the same allowed set used by
EmbeddingCollapseMetric.__init__) and raise a ValueError for unknown reductions
instead of falling through to the mean branch; also resolve include_indicators
into a set and verify each provided name exists in the expected indicator
list/mapping (the same indicator name source used elsewhere), and raise a
ValueError listing any invalid names rather than silently ignoring them (avoid
returning an empty/zero aggregate when a typo is provided).
- Around line 260-280: The _effective_rank_score function currently computes
effective rank as sum(sv)/max(sv) and returns 0.0 for zero-variance inputs,
which is incorrect; replace the effective-rank computation with the Roy &
Vetterli definition: normalize singular values p = sv / sv.sum(); compute
entropy H = -sum(p * log(p + eps)) and eff_rank = exp(H); if sv.sum() == 0
(all-zero/constant embeddings) return 1.0 (full collapse); then compute score =
(1.0 - eff_rank / max_rank).clamp(0.0, 1.0). Use torch functions (torch.log,
torch.exp, torch.sum) with a small eps to avoid log(0), preserve emb.dtype when
creating scalars, and keep existing variables sv, max_rank, centered, and
_effective_rank_score to locate the change.
- Around line 207-216: The aggregate path may mix CPU scalars and CUDA tensors
causing torch.stack to fail; ensure all items in available and any fallback
tensors use the same device before stacking. Determine a target device from
existing tensor entries in available (or default to torch.device("cpu")),
convert CPU-built scalars/fallbacks (e.g., the torch.tensor(0.0) and values
returned by _separation) to that device, and call torch.stack on device-aligned
tensors (or use [v.to(device) for v in available]) so scores["aggregate"] is
created on the correct device; update the code around the available/scores
aggregation and any creation of torch.tensor(...) to respect that target device.

---

Nitpick comments:
In `@tests/test_embedding_collapse.py`:
- Around line 78-100: Add three regression tests: 1) create a constant embedding
matrix and assert effective_rank_score(...) returns 1.0 (use the same tensor
shape as other tests and call effective_rank_score to verify constant-case
behavior); 2) add parameter validation tests that call
compute_embedding_collapse(...) with invalid reduction and invalid
include_indicators values and assert it raises a ValueError (or the module's
chosen exception) to ensure bad inputs are rejected; 3) add a CUDA-path labeled
test that runs compute_embedding_collapse on GPU tensors with labels and a
configuration where separation participates and assert that the returned dict
includes the "aggregate" key, ensuring the labeled CUDA code path computes
aggregate. Ensure tests reference compute_embedding_collapse and
effective_rank_score so they cover the modified/erroneous code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0558161c-bf2b-4b78-b904-39c8336e78e1

📥 Commits

Reviewing files that changed from the base of the PR and between deb3f98 and 5b04cc4.

📒 Files selected for processing (3)
  • monai/metrics/__init__.py
  • monai/metrics/embedding_collapse.py
  • tests/test_embedding_collapse.py

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
monai/metrics/embedding_collapse.py (3)

171-174: ⚠️ Potential issue | 🟠 Major

Validate functional API options (reduction, include_indicators)

Line 171 onward still accepts invalid reduction values and unknown indicator names silently. This can produce incorrect aggregate behavior and hide typos.

Suggested patch
 def compute_embedding_collapse(
@@
 ) -> dict[str, torch.Tensor | None]:
@@
     _validate_embeddings(embeddings)
     emb = embeddings.float()
+    if reduction not in {"max", "mean", "none"}:
+        raise ValueError(f"reduction must be 'max', 'mean', or 'none', got '{reduction}'")
+
+    valid_indicators = {"centroid_similarity", "effective_rank", "per_class_rank", "domain_shift", "separation"}
     inc = set(include_indicators) if include_indicators is not None else None
+    if inc is not None:
+        unknown = inc - valid_indicators
+        if unknown:
+            raise ValueError(f"Unknown indicator(s): {sorted(unknown)}")

As per coding guidelines, "**/*.py: ... Examine code for logical errors or inconsistencies, and suggest what may be changed to address these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 171 - 174, The code
currently calls _validate_embeddings and normalizes embeddings but does not
validate the functional API options; add explicit validation for the reduction
parameter (check that reduction is one of the allowed strings, e.g., "mean",
"sum", "none" or the set used by aggregate) and for include_indicators (if not
None, ensure every name is in the known indicator set), and raise a clear
ValueError listing permitted values on invalid input; update the block around
_validate_embeddings, emb = embeddings.float(), and the handling of
include_indicators so invalid reduction or unknown indicator names are detected
before calling aggregate and producing silent bugs.

210-216: ⚠️ Potential issue | 🔴 Critical

Keep aggregate tensors on a single device

Lines 210-216 can mix CUDA tensors with CPU scalars (e.g., fallback tensors and _separation output), causing torch.stack(available) to fail at runtime.

Suggested patch
     if reduction != "none":
         primary = {"centroid_similarity", "effective_rank_score", "domain_shift", "separation"}
-        available = [v for k, v in scores.items() if k in primary and v is not None]
+        available = [v.to(device=emb.device) for k, v in scores.items() if k in primary and v is not None]
         if not available:
-            scores["aggregate"] = torch.tensor(0.0)
+            scores["aggregate"] = emb.new_tensor(0.0)
         elif reduction == "max":
             scores["aggregate"] = torch.stack(available).max()
         else:  # mean
             scores["aggregate"] = torch.stack(available).mean()
#!/bin/bash
# Read-only check for mixed-device risk in aggregation path.
rg -n -C2 'available = \[v for k, v in scores\.items\(\)|torch\.tensor\(0\.0\)|torch\.stack\(available\)|return torch\.tensor\(' monai/metrics/embedding_collapse.py

As per coding guidelines, "**/*.py: ... Examine code for logical errors or inconsistencies, and suggest what may be changed to address these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 210 - 216, The aggregation
path can mix devices (CPU scalar vs CUDA tensors) causing torch.stack(available)
to fail; fix by creating the fallback tensor and performing stacking/aggregation
on the same device as the tensors in available: determine a target_device from
the first tensor in available (or fall back to next(t for t in scores.values()
if isinstance(t, torch.Tensor), torch.tensor(0.0)).device or
torch.device("cpu")), then use torch.tensor(0.0, device=target_device) for the
empty-case and ensure torch.stack(available).to(target_device) or convert any
CPU scalars to that device before calling .max()/.mean(); update references in
the block handling scores, available, reduction and scores["aggregate"].

260-280: ⚠️ Potential issue | 🔴 Critical

Use Roy–Vetterli effective rank and correct zero-variance behavior

Line 263’s formula (sum(sv)/max(sv)) is not the cited effective rank definition, and Line 277 returns 0.0 for fully collapsed embeddings. That inverts the intended meaning.

Suggested patch
 def _effective_rank_score(emb: torch.Tensor) -> torch.Tensor:
@@
     centered = emb - emb.mean(dim=0, keepdim=True)
     _, sv, _ = torch.linalg.svd(centered, full_matrices=False)
-    max_sv = sv.max()
-    if max_sv == 0.0:
-        return torch.tensor(0.0, dtype=emb.dtype)
-    eff_rank = sv.sum() / max_sv
+    sv_sum = sv.sum()
+    if sv_sum == 0:
+        return emb.new_tensor(1.0)
+
+    probs = sv / sv_sum
+    safe_probs = probs.clamp_min(torch.finfo(probs.dtype).tiny)
+    entropy = -(probs * safe_probs.log()).sum()
+    eff_rank = entropy.exp()
     max_rank = float(min(emb.shape[0], emb.shape[1]))
     return (1.0 - eff_rank / max_rank).clamp(0.0, 1.0)

As per coding guidelines, "**/*.py: ... Examine code for logical errors or inconsistencies, and suggest what may be changed to address these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 260 - 280, The
_effective_rank_score implementation uses the wrong formula and returns inverted
semantics for zero-variance cases; replace the current sum(sv)/max(sv)
computation with Roy–Vetterli's effective rank = exp( - sum_i p_i * log(p_i) )
where p = sv / sv.sum(), compute ent = - (p * torch.log(p)).sum() then eff_rank
= torch.exp(ent), set max_rank = float(min(emb.shape[0], emb.shape[1])), and
compute score = (1.0 - eff_rank / max_rank).clamp(0.0, 1.0); also when all
singular values are zero (sv.sum()==0) return a tensor representing full
collapse (1.0) with the same dtype/device as emb so zero-variance behavior is
correct and tensor creation uses emb.dtype and emb.device.
🧹 Nitpick comments (1)
monai/metrics/embedding_collapse.py (1)

394-402: Add a Google-style docstring to _validate_embeddings

_validate_embeddings lacks a docstring while it enforces contract and raises exceptions. Add Args and Raises sections for consistency and maintainability.

As per coding guidelines, "**/*.py: ... Docstrings should be present for all definitions, describing each variable, return value, and raised exception in the appropriate section of the Google-style docstring."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 394 - 402, Add a
Google-style docstring to the _validate_embeddings function describing the
purpose, the Args and Raises sections: document the embeddings parameter as a
2-D torch.Tensor of shape [N, D], and list the two ValueError cases (when
embeddings.ndim != 2 and when embeddings.shape[0] < 2) with their conditions;
place this docstring immediately above the def _validate_embeddings(embeddings:
torch.Tensor) -> None: signature so it documents the contract and exceptions
raised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@monai/metrics/embedding_collapse.py`:
- Around line 171-174: The code currently calls _validate_embeddings and
normalizes embeddings but does not validate the functional API options; add
explicit validation for the reduction parameter (check that reduction is one of
the allowed strings, e.g., "mean", "sum", "none" or the set used by aggregate)
and for include_indicators (if not None, ensure every name is in the known
indicator set), and raise a clear ValueError listing permitted values on invalid
input; update the block around _validate_embeddings, emb = embeddings.float(),
and the handling of include_indicators so invalid reduction or unknown indicator
names are detected before calling aggregate and producing silent bugs.
- Around line 210-216: The aggregation path can mix devices (CPU scalar vs CUDA
tensors) causing torch.stack(available) to fail; fix by creating the fallback
tensor and performing stacking/aggregation on the same device as the tensors in
available: determine a target_device from the first tensor in available (or fall
back to next(t for t in scores.values() if isinstance(t, torch.Tensor),
torch.tensor(0.0)).device or torch.device("cpu")), then use torch.tensor(0.0,
device=target_device) for the empty-case and ensure
torch.stack(available).to(target_device) or convert any CPU scalars to that
device before calling .max()/.mean(); update references in the block handling
scores, available, reduction and scores["aggregate"].
- Around line 260-280: The _effective_rank_score implementation uses the wrong
formula and returns inverted semantics for zero-variance cases; replace the
current sum(sv)/max(sv) computation with Roy–Vetterli's effective rank = exp( -
sum_i p_i * log(p_i) ) where p = sv / sv.sum(), compute ent = - (p *
torch.log(p)).sum() then eff_rank = torch.exp(ent), set max_rank =
float(min(emb.shape[0], emb.shape[1])), and compute score = (1.0 - eff_rank /
max_rank).clamp(0.0, 1.0); also when all singular values are zero (sv.sum()==0)
return a tensor representing full collapse (1.0) with the same dtype/device as
emb so zero-variance behavior is correct and tensor creation uses emb.dtype and
emb.device.

---

Nitpick comments:
In `@monai/metrics/embedding_collapse.py`:
- Around line 394-402: Add a Google-style docstring to the _validate_embeddings
function describing the purpose, the Args and Raises sections: document the
embeddings parameter as a 2-D torch.Tensor of shape [N, D], and list the two
ValueError cases (when embeddings.ndim != 2 and when embeddings.shape[0] < 2)
with their conditions; place this docstring immediately above the def
_validate_embeddings(embeddings: torch.Tensor) -> None: signature so it
documents the contract and exceptions raised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6749b97a-5d27-4195-8d07-c204ab273211

📥 Commits

Reviewing files that changed from the base of the PR and between 5b04cc4 and c2ff096.

📒 Files selected for processing (3)
  • monai/metrics/__init__.py
  • monai/metrics/embedding_collapse.py
  • tests/test_embedding_collapse.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_embedding_collapse.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • monai/metrics/__init__.py

@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch from c2ff096 to d69aba0 on April 10, 2026 02:51
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
monai/metrics/embedding_collapse.py (1)

277-302: ⚠️ Potential issue | 🟠 Major

_effective_rank_score is still using the wrong definition and ceiling.

Roy–Vetterli effective rank is entropy-based, not sum(sv) / max(sv). Also, after mean-centering the attainable rank is min(N - 1, D), so the current normalization overstates collapse on small batches.

Proposed fix
 def _effective_rank_score(emb: torch.Tensor) -> torch.Tensor:
     """Dimensional collapse score via SVD effective rank.
 
     Uses the entropy-based effective rank from Roy & Vetterli (2007):
-    ``eff_rank = sum(sv) / max(sv)`` where ``sv`` are the singular values
-    of the mean-centred embedding matrix.
+    ``eff_rank = exp(-sum(p * log(p)))`` where ``p = sv / sum(sv)`` and
+    ``sv`` are the singular values of the mean-centred embedding matrix.
@@
     centered = emb - emb.mean(dim=0, keepdim=True)
     _, sv, _ = torch.linalg.svd(centered, full_matrices=False)
     sv_sum = sv.sum()
     if sv_sum == 0:
         return emb.new_tensor(1.0)
-    eff_rank = sv_sum / sv.max()
-    max_rank = float(min(emb.shape[0], emb.shape[1]))
+
+    probs = sv / sv_sum
+    safe_probs = probs.clamp_min(torch.finfo(probs.dtype).tiny)
+    eff_rank = (-(probs * safe_probs.log()).sum()).exp()
+    max_rank = float(min(emb.shape[0] - 1, emb.shape[1]))
     return (1.0 - eff_rank / max_rank).clamp(0.0, 1.0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 277 - 302, The current
_effective_rank_score uses the wrong definition and normalizer; replace its
formula with the entropy-based effective rank: compute singular values sv from
torch.linalg.svd(centered), form p = sv / sv.sum() (handle sv.sum()==0 by
returning emb.new_tensor(1.0)), compute spectral entropy H = -sum_i p_i *
log(p_i) with numerically-safe masking to avoid log(0), then eff_rank = exp(H);
normalize by max_rank = float(min(emb.shape[0] - 1, emb.shape[1])) (since
mean-centering reduces max attainable rank) and return (1.0 - eff_rank /
max_rank).clamp(0.0, 1.0) preserving the original tensor dtype and device.
🧹 Nitpick comments (1)
monai/metrics/embedding_collapse.py (1)

401-416: Finish the helper docstrings.

_validate_embeddings has no docstring, and _hsic still lacks the Args/Returns/Raises structure used elsewhere in this file.

As per coding guidelines, "**/*.py: Docstrings should be present for all definitions, describing each variable, return value, and raised exception in the appropriate section of the Google-style docstring."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 401 - 416, Add Google-style
docstrings to the helper functions: for _hsic(x: torch.Tensor, y: torch.Tensor)
document Args (x and y as 2‑D tensors of shape [N, D] or [N, *], same N),
Returns (torch.Tensor scalar HSIC estimate), and Raises (e.g., ValueError if x
and y have different first-dimension sizes or N<2 if you choose to validate
inside); for _validate_embeddings(embeddings: torch.Tensor) add a docstring
describing Args (embeddings: 2‑D tensor [N, D]), Returns (None), and Raises
(ValueError when embeddings.ndim != 2 or when embeddings.shape[0] < 2), matching
the file’s Google-style docstring format and wording conventions.
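For context on what `_hsic` estimates: the domain-shift indicator is built on linear CKA (Kornblith et al., 2019), which is HSIC with linear kernels, normalised. A generic standalone sketch follows; the function names and the biased-estimator normalisation here are illustrative, not MONAI's private helpers:

```python
import torch


def linear_hsic(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Biased HSIC estimate with linear kernels for [N, D] inputs sharing N."""
    n = x.shape[0]
    k = x @ x.T
    l = y @ y.T
    # centering matrix H = I - 11^T / n
    h = torch.eye(n, dtype=x.dtype) - torch.full((n, n), 1.0 / n, dtype=x.dtype)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2


def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA in [0, 1]: HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y))."""
    return linear_hsic(x, y) / torch.sqrt(linear_hsic(x, x) * linear_hsic(y, y))
```

CKA of a representation with itself is 1 by construction, which makes it a convenient sanity check in tests.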
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/metrics/embedding_collapse.py`:
- Around line 223-231: The current logic sets scores["aggregate"] =
emb.new_tensor(0.0) when no primary indicators are available, which incorrectly
signals a valid aggregate; instead, detect when available is empty and set
scores["aggregate"] to a NaN tensor (e.g., emb.new_tensor(float("nan"))) or omit
the key so callers can detect "no aggregate computed" (adjust callers if
needed). Update the block that inspects reduction / primary / available
(variables and keys: reduction, primary, available, scores["aggregate"],
emb.new_tensor) to assign NaN (or remove the key) rather than 0.0 when available
is empty.
- Around line 389-395: The sklearn bridge calls use tensors that may require
gradients, causing RuntimeError when calling .cpu().numpy(); modify the calls in
functions _separation and linear_probe_accuracy to detach the tensors first
(e.g., use emb.detach().cpu().numpy() and labels.detach().cpu().numpy()) before
passing to sklearn_silhouette or other sklearn helpers, and similarly update the
analogous calls around lines 456-459 to detach both embeddings and label tensors
to ensure compatibility with grad-enabled tensors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec20dd4b-57fd-48eb-8854-30d96e407ae5

📥 Commits

Reviewing files that changed from the base of the PR and between c2ff096 and d69aba0.

📒 Files selected for processing (3)
  • monai/metrics/__init__.py
  • monai/metrics/embedding_collapse.py
  • tests/test_embedding_collapse.py
✅ Files skipped from review due to trivial changes (2)
  • monai/metrics/__init__.py
  • tests/test_embedding_collapse.py

@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch from d69aba0 to 3ef945f on April 10, 2026 03:10
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
monai/metrics/embedding_collapse.py (3)

389-393: ⚠️ Potential issue | 🟠 Major

Detach before sklearn .numpy() conversion.

Lines 391-393 and Lines 457-459 call .cpu().numpy() directly. If embeddings require gradients, this raises RuntimeError; in _separation it gets swallowed and silently returns None.

🛠️ Proposed fix
         sil = sklearn_silhouette(  # type: ignore[operator]
-            emb.cpu().numpy(),
-            labels.cpu().numpy(),
+            emb.detach().cpu().numpy(),
+            labels.detach().cpu().numpy(),
             metric="cosine",
         )
@@
-    clf.fit(train_embeddings.float().cpu().numpy(), train_labels.cpu().numpy())
-    preds = clf.predict(test_embeddings.float().cpu().numpy())
-    acc = (preds == test_labels.cpu().numpy()).mean()
+    clf.fit(train_embeddings.detach().float().cpu().numpy(), train_labels.detach().cpu().numpy())
+    preds = clf.predict(test_embeddings.detach().float().cpu().numpy())
+    acc = (preds == test_labels.detach().cpu().numpy()).mean()

Also applies to: 457-459

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 389 - 393, The calls to
.cpu().numpy() on tensors may raise RuntimeError if tensors require gradients;
update both occurrences (in the silhouette call inside _separation and the later
duplicate at lines ~457-459) to detach the tensors first (e.g., use
emb.detach().cpu().numpy() and labels.detach().cpu().numpy()) so conversion to
numpy never attempts to access the autograd graph; keep the rest of the
sklearn_silhouette invocation and error handling unchanged.
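The failure mode behind this fix is easy to reproduce in isolation, independent of sklearn or MONAI:

```python
import torch

emb = torch.randn(8, 4, requires_grad=True)

# Converting a grad-tracking tensor straight to NumPy raises a RuntimeError
needs_detach = False
try:
    emb.cpu().numpy()
except RuntimeError:
    needs_detach = True

# .detach() drops the autograd graph, so the conversion succeeds
arr = emb.detach().cpu().numpy()
```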

223-227: ⚠️ Potential issue | 🟠 Major

aggregate=0.0 is misleading when nothing was aggregated.

Line 227 returns 0.0 when available is empty. With include_indicators=["per_class_rank"], that reads like “no collapse” although no primary aggregate input existed.

🛠️ Proposed fix
         available = [v.to(device=emb.device) for k, v in scores.items() if k in primary and v is not None]
         if not available:
-            scores["aggregate"] = emb.new_tensor(0.0)
+            scores["aggregate"] = None
         elif reduction == "max":
             scores["aggregate"] = torch.stack(available).max()
         else:
             scores["aggregate"] = torch.stack(available).mean()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 223 - 227, The current
block sets scores["aggregate"] = emb.new_tensor(0.0) when no primary metrics are
available, which is misleading; change the behavior so that when available is
empty the aggregate is set to a NaN tensor (preserving emb's dtype/device) or
omit the "aggregate" key—update the code handling in embedding_collapse.py
around the reduction check and the variables reduction, primary, available,
scores, and emb so that an absent aggregation is represented by NaN (e.g.,
emb.new_tensor(float("nan")) with the same dtype/device) instead of 0.0.

280-301: ⚠️ Potential issue | 🟠 Major

Effective-rank formula does not match the cited definition.

Lines 280-301 claim Roy–Vetterli entropy effective-rank, but compute sum(sv)/max(sv). That is a different metric and changes collapse scoring behavior.

🛠️ Proposed fix
-    Uses the entropy-based effective rank from Roy & Vetterli (2007):
-    ``eff_rank = sum(sv) / max(sv)`` where ``sv`` are the singular values
-    of the mean-centred embedding matrix.
+    Uses the entropy-based effective rank from Roy & Vetterli (2007):
+    ``p = sv / sum(sv)``, ``eff_rank = exp(-sum(p * log(p)))``,
+    where ``sv`` are singular values of the mean-centred embedding matrix.
@@
-    eff_rank = sv_sum / sv.max()
+    probs = sv / sv_sum
+    safe_probs = probs.clamp_min(torch.finfo(probs.dtype).tiny)
+    entropy = -(probs * safe_probs.log()).sum()
+    eff_rank = entropy.exp()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 280 - 301, The current
implementation computes eff_rank = sv_sum / sv.max(), which is not the
Roy–Vetterli entropy effective rank; replace that with the entropy-based
definition: normalize singular values p = sv / sv_sum, compute entropy H =
-sum(p * log(p + eps)) (use a small eps to avoid log(0)), then eff_rank =
exp(H). Finally map that eff_rank (which ranges [1, max_rank]) into the collapse
score in [0,1] (e.g., collapse = (max_rank - eff_rank) / (max_rank - 1) with the
special-case when max_rank == 1) and return that tensor. Update the code
locations referencing centered, sv, sv_sum, eff_rank, and max_rank accordingly.
🧹 Nitpick comments (1)
monai/metrics/embedding_collapse.py (1)

401-417: Complete Google-style docstrings for private helpers.

_hsic and _validate_embeddings are missing full Args/Returns/Raises sections. Add complete docstrings to keep contracts explicit and consistent.

As per coding guidelines, "**/*.py: Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 401 - 417, Both private
helpers _hsic and _validate_embeddings lack full Google-style docstrings; add
complete docstrings for each that include a short description plus Args,
Returns, and Raises sections: for _hsic describe x and y as torch.Tensor of
shape [N, D] (or [N, C]), state that it returns a torch.Tensor scalar HSIC
estimate, and document that it may raise on mismatched batch sizes or invalid
dims; for _validate_embeddings describe embeddings as torch.Tensor shape [N, D],
no return value, and document ValueError raised when embeddings.ndim != 2 or
embeddings.shape[0] < 2. Ensure types, shapes, and raised exceptions are
explicit and follow Google docstring formatting for both functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/metrics/embedding_collapse.py`:
- Around line 216-218: The current code hard-fails by calling
_validate_embeddings(target_embeddings) which raises for
target_embeddings.shape[0] < 2; instead, change the logic around
target_embeddings in the block that sets scores["domain_shift"] so that you only
call _validate_embeddings and _domain_shift when target_embeddings is not None
and has at least 2 rows, otherwise set scores["domain_shift"] = None;
specifically, check target_embeddings is not None and target_embeddings.shape[0]
>= 2 before invoking _validate_embeddings(target_embeddings) and
scores["domain_shift"] = _domain_shift(emb, target_embeddings.float()), and if
that check fails assign None to scores["domain_shift"] to preserve the
documented graceful path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c035fdb0-3d6f-4f52-9c44-00c9d995a0b8

📥 Commits

Reviewing files that changed from the base of the PR and between d69aba0 and 3ef945f.

📒 Files selected for processing (3)
  • monai/metrics/__init__.py
  • monai/metrics/embedding_collapse.py
  • tests/test_embedding_collapse.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_embedding_collapse.py

@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch from 3ef945f to c4013f8 on April 10, 2026 03:19
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (4)
monai/metrics/embedding_collapse.py (4)

389-393: ⚠️ Potential issue | 🟠 Major

Detach tensors before sklearn NumPy conversion.

Tensor.cpu().numpy() fails on grad-tracking tensors. _separation currently masks this as None; linear_probe_accuracy will error.

Proposed fix
         sil = sklearn_silhouette(  # type: ignore[operator]
-            emb.cpu().numpy(),
-            labels.cpu().numpy(),
+            emb.detach().cpu().numpy(),
+            labels.detach().cpu().numpy(),
             metric="cosine",
         )
@@
-    clf.fit(train_embeddings.float().cpu().numpy(), train_labels.cpu().numpy())
-    preds = clf.predict(test_embeddings.float().cpu().numpy())
-    acc = (preds == test_labels.cpu().numpy()).mean()
+    clf.fit(train_embeddings.detach().float().cpu().numpy(), train_labels.detach().cpu().numpy())
+    preds = clf.predict(test_embeddings.detach().float().cpu().numpy())
+    acc = (preds == test_labels.detach().cpu().numpy()).mean()
#!/bin/bash
set -euo pipefail

python - <<'PY'
import torch
x = torch.randn(4, requires_grad=True)
try:
    x.cpu().numpy()
    print("UNEXPECTED: numpy conversion succeeded without detach")
except RuntimeError as e:
    print("EXPECTED RuntimeError:", e)
PY

rg -nP 'cpu\(\)\.numpy\(' monai/metrics/embedding_collapse.py -C2

As per coding guidelines, "**/*.py: Review the Python code for quality and correctness."

Also applies to: 457-459

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 389 - 393, The sklearn
conversion is called on grad-tracking tensors which raises at .cpu().numpy();
update the calls around the sklearn_silhouette invocation (and the similar block
at the other occurrence) to detach the tensors first (e.g., use
emb.detach().cpu().numpy() and labels.detach().cpu().numpy()) so _separation
returns real values and downstream linear_probe_accuracy won't error; ensure
both emb and labels are detached before converting to numpy in the _separation
helper and the duplicate site.

216-220: ⚠️ Potential issue | 🟠 Major

Don’t hard-fail domain_shift for small target_embeddings.

This currently raises for target_embeddings.shape[0] < 2, but _domain_shift already has a graceful None path for non-computable cases.

Proposed fix
     if inc is None or "domain_shift" in inc:
         if target_embeddings is not None:
-            _validate_embeddings(target_embeddings)
+            if target_embeddings.ndim != 2:
+                raise ValueError(
+                    f"target_embeddings must be 2-D [N, D], got shape {tuple(target_embeddings.shape)}"
+                )
             scores["domain_shift"] = _domain_shift(emb, target_embeddings.float())
         else:
             scores["domain_shift"] = None

As per coding guidelines, "**/*.py: ... Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 216 - 220, The code
hard-fails when target_embeddings has fewer than 2 rows but _domain_shift
already returns None for non-computable cases; change the logic so small
target_embeddings do not raise. Specifically, in the block handling
target_embeddings, avoid letting _validate_embeddings raise on tiny inputs
(either relax _validate_embeddings to allow n<2 or check
target_embeddings.shape[0] before calling it), then call scores["domain_shift"]
= _domain_shift(emb, target_embeddings.float()) and let _domain_shift return
None for non-computable cases (or catch validation errors and set
scores["domain_shift"] = None). Reference symbols: target_embeddings,
_validate_embeddings, _domain_shift, scores["domain_shift"], emb.
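The graceful path described above can be sketched as a thin wrapper. All names here are illustrative; `domain_shift_fn` stands in for the private `_domain_shift` helper:

```python
import torch


def guarded_domain_shift(emb, target_embeddings, domain_shift_fn):
    """Compute a domain-shift score only when the target is usable."""
    if target_embeddings is None:
        return None
    if target_embeddings.ndim != 2:
        raise ValueError(
            f"target_embeddings must be 2-D [N, D], got shape {tuple(target_embeddings.shape)}"
        )
    if target_embeddings.shape[0] < 2:
        return None  # too few rows to estimate a shift; skip, don't raise
    return domain_shift_fn(emb, target_embeddings.float())
```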

277-302: ⚠️ Potential issue | 🔴 Critical

Use Roy–Vetterli effective-rank math, not stable-rank math.

Current implementation computes sum(sv)/max(sv), which is not the cited effective rank and skews collapse scores.

Proposed fix
 def _effective_rank_score(emb: torch.Tensor) -> torch.Tensor:
@@
-    Uses the entropy-based effective rank from Roy & Vetterli (2007):
-    ``eff_rank = sum(sv) / max(sv)`` where ``sv`` are the singular values
-    of the mean-centred embedding matrix.
+    Uses the entropy-based effective rank from Roy & Vetterli (2007):
+    ``eff_rank = exp(-sum(p_i * log(p_i)))`` where
+    ``p_i = sv_i / sum(sv)`` and ``sv`` are singular values of the
+    mean-centred embedding matrix.
@@
-    eff_rank = sv_sum / sv.max()
+    probs = sv / sv_sum
+    safe_probs = probs.clamp_min(torch.finfo(probs.dtype).tiny)
+    entropy = -(probs * safe_probs.log()).sum()
+    eff_rank = entropy.exp()

As per coding guidelines, "**/*.py: ... Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 277 - 302, The function
_effective_rank_score currently uses sum(sv)/max(sv) which is wrong; implement
Roy & Vetterli's effective rank: compute p = sv / sv_sum, entropy H = -sum(p *
log(p)) (ignore or mask p==0 to avoid log(0)), then eff_rank = exp(H); keep the
existing handling when sv_sum == 0 (return 1.0), compute max_rank =
float(min(emb.shape[0], emb.shape[1])) and return (1.0 - eff_rank /
max_rank).clamp(0.0, 1.0); use torch operations (torch.log, torch.exp, masking
or torch.where) on sv, and reference variables sv, sv_sum, eff_rank, max_rank
and the function name _effective_rank_score to locate where to change.

223-231: ⚠️ Potential issue | 🟠 Major

Avoid aggregate=0.0 when nothing was aggregated.

Returning 0.0 here looks like “healthy embeddings” instead of “aggregate unavailable”.

Proposed fix
         available = [v.to(device=emb.device) for k, v in scores.items() if k in primary and v is not None]
         if not available:
-            scores["aggregate"] = emb.new_tensor(0.0)
+            scores["aggregate"] = None
         elif reduction == "max":
             scores["aggregate"] = torch.stack(available).max()
         else:
             scores["aggregate"] = torch.stack(available).mean()

As per coding guidelines, "**/*.py: ... Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 223 - 231, The current
branch that sets scores["aggregate"] = emb.new_tensor(0.0) when nothing is
aggregated should instead mark the aggregate as unavailable (not a healthy 0.0);
change that assignment to create a NaN tensor on the same device/dtype as emb
(e.g., emb.new_tensor(float("nan")) or equivalent) so callers can detect
"aggregate unavailable" (refer to variables reduction, primary, scores, emb,
available, and the aggregate key).
🧹 Nitpick comments (1)
monai/metrics/embedding_collapse.py (1)

412-417: Add a Google-style docstring to _validate_embeddings.

This is a definition without a docstring; keep Args/Raises explicit for consistency.

As per coding guidelines, "**/*.py: ... Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/metrics/embedding_collapse.py` around lines 412 - 417, Add a
Google-style docstring to the _validate_embeddings function describing the
parameter and raised exceptions: document the Args section for embeddings
(torch.Tensor) specifying it must be 2-D with shape [N, D], and include a Raises
section listing ValueError for non-2-D input and for fewer than 2 samples; keep
the content concise and place the docstring immediately under the def
_validate_embeddings(...) signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7c151780-73cd-49b4-994a-1a795ab020d8

📥 Commits

Reviewing files that changed from the base of the PR and between 3ef945f and c4013f8.

📒 Files selected for processing (3)
  • monai/metrics/__init__.py
  • monai/metrics/embedding_collapse.py
  • tests/test_embedding_collapse.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_embedding_collapse.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • monai/metrics/__init__.py

@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch 5 times, most recently from 3ccd276 to 91dd667 on April 10, 2026 19:12
…cal imaging embeddings

Closes Project-MONAI#8808

Signed-off-by: ekansh-arora0 <ekansh.arora0@gmail.com>
@ekansh-arora0 ekansh-arora0 force-pushed the feature/embedding-collapse-metric branch from 91dd667 to 10b53f0 on April 10, 2026 20:53


Development

Successfully merging this pull request may close these issues.

Add EmbeddingCollapseMetric: detect representational collapse in medical imaging embeddings

2 participants