Fix: Remove device_map to prevent meta tensor errors in table detection #459

Open

micmarty-deepsense wants to merge 2 commits into main from fix/table-transformer-meta-tensor

Conversation

micmarty-deepsense (Contributor) commented Jan 27, 2026

Summary

Fixes the NotImplementedError: Cannot copy out of meta tensor error raised by table detection during multi-threaded processing.

Root Cause: How device_map Creates Meta Tensors

The Problem

When passing device_map to from_pretrained(), HuggingFace Transformers uses a special loading path:

  1. Forces low_cpu_mem_usage=True automatically (HF #33326)
  2. Creates model on meta device first - placeholder tensors with NO actual data
  3. Attempts to load state_dict onto meta tensors without assign=True (HF #37615)
  4. Tries to move to target device with .to(device)
  5. FAILS because meta tensors cannot be copied/moved (HF #26700)
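
For intuition, a minimal sketch in plain PyTorch (not the PR's code) of why step 5 fails: a meta tensor carries shape and dtype but no storage, so there is nothing to copy when it is moved to a real device.

import torch

# A meta tensor holds only metadata (shape, dtype, device) -- no actual data.
t = torch.empty(3, 3, device="meta")
print(t.shape, t.dtype, t.is_meta)  # torch.Size([3, 3]) torch.float32 True

# Moving it to a real device has no data to copy, so PyTorch raises
# NotImplementedError: Cannot copy out of meta tensor; no data!
try:
    t.to("cpu")
except NotImplementedError as exc:
    print(exc)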

From HF issue #33326:

"Tensors created on the meta device are meaningless empty tensors, which renders initialization code completely ineffective."

From HF issue #26700:

"When using device_map, the transformers library creates a context manager that sets the default device to 'meta'. During initialization, the code attempts to copy weights from the original modules. However, since the backbone was created on the meta device, the weights are not materialized, causing the copy operation to fail."

Two Different Code Paths in Transformers

With device_map (broken path):

from_pretrained(device_map="cpu")
  → Accelerate's distributed loading
    → Initialize on meta device
      → Load state_dict onto meta tensors
        → Try to move to CPU
          → ERROR: Cannot copy from meta tensor

Without device_map (working path):

from_pretrained()
  → Normal PyTorch loading
    → Initialize on CPU with real tensors
      → Load weights directly into memory
        → .to(device) moves real data
          → SUCCESS
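
A small sanity check (a sketch, not part of the PR) can confirm that the working path really materialized every parameter before .to(device) is called:

import torch

def all_materialized(model: torch.nn.Module) -> bool:
    # True only if no parameter or buffer is still a storage-less meta tensor.
    return not any(t.is_meta for t in list(model.parameters()) + list(model.buffers()))

# Usage sketch: load without device_map, verify, then move real tensors.
# model = TableTransformerForObjectDetection.from_pretrained(model_name)
# assert all_materialized(model)
# model.to(device)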

Why device_map Exists

Originally designed for HUGE models (>100B params) that don't fit in memory. Our TableTransformer models (~500MB) don't need this optimization and shouldn't use it.
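
For contrast, a sketch of the case device_map is actually meant for, sharding a model too large for a single device (the model id below is a placeholder, not something we load):

from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate split a very large model across available
# GPUs, CPU RAM, and disk; useful for >100B-parameter models, unnecessary for
# a ~500MB TableTransformer.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-very-large-model",  # placeholder model id
    device_map="auto",
)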

Changes

  • Removed device_map from DetrImageProcessor.from_pretrained() (line 75)
  • Removed device_map from TableTransformerForObjectDetection.from_pretrained() (lines 85-88)
  • Added explicit .to(self.device) after model loading (line 87)

# BEFORE (broken)
self.model = TableTransformerForObjectDetection.from_pretrained(
    model,
    device_map=self.device,  # Forces meta device path
)

# AFTER (working)
self.model = TableTransformerForObjectDetection.from_pretrained(model)  # Normal loading
self.model.to(self.device)  # Move real tensors to device
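
Put together, a sketch of the full fixed loading sequence (class and argument names are illustrative, not the exact tables.py code; the model id is the public TableTransformer checkpoint and is only an assumption about what we load):

from transformers import DetrImageProcessor, TableTransformerForObjectDetection

class TableDetector:  # hypothetical wrapper, stands in for the agent in tables.py
    def __init__(self, model_name="microsoft/table-transformer-detection", device="cpu"):
        self.device = device
        # No device_map here: the image processor holds no model weights, so the
        # parameter was never doing anything useful for it.
        self.feature_extractor = DetrImageProcessor.from_pretrained(model_name)
        # Normal PyTorch loading path: real tensors are created in CPU memory...
        self.model = TableTransformerForObjectDetection.from_pretrained(model_name)
        # ...then moved to the target device as ordinary, materialized tensors.
        self.model.to(self.device)
        self.model.eval()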

Pattern Consistency

This matches the fix pattern used for SentenceTransformer models in core-product:

  • c8b175f7: Added device parameter to model constructor
  • db636932: Made thread-safe with @threadsafe_lazyproperty

Antonio's PR #446 addressed a different issue (thread-safety race condition), while this PR fixes the underlying meta tensor problem.
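
For context, a minimal sketch of the thread-safe lazy-loading pattern referenced above; the real threadsafe_lazyproperty lives in core-product, so this is only an illustration of the idea (double-checked locking around one-time initialization):

import threading

def threadsafe_lazyproperty(fn):
    # Compute the value once, under a lock, and cache it on the instance.
    lock = threading.Lock()
    attr = "_" + fn.__name__

    @property
    def wrapper(self):
        if not hasattr(self, attr):
            with lock:
                if not hasattr(self, attr):  # re-check after acquiring the lock
                    setattr(self, attr, fn(self))
        return getattr(self, attr)

    return wrapper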

Testing

The error was observed in an in-VPC customer deployment with strategy=fast + table detection during OCR processing.

Stacktrace (from customer deployment)

tables_agent_patch.py:297 → patched_load_agent
  → tables.py:85 → initialize
    → TableTransformerForObjectDetection.from_pretrained()
      → model.to(device)  ← META TENSOR ERROR

References

  • HuggingFace Transformers issue #33326
  • HuggingFace Transformers issue #26700
  • HuggingFace Transformers issue #37615

Commits

The device_map parameter with HuggingFace Transformers can cause
NotImplementedError "Cannot copy out of meta tensor" in multi-threaded
contexts when loading TableTransformerForObjectDetection models.

This fix:
- Removes device_map from DetrImageProcessor.from_pretrained()
- Removes device_map from TableTransformerForObjectDetection.from_pretrained()
- Uses explicit .to(device) after model loading instead

This pattern matches the fix applied to SentenceTransformer models in
core-product (commits c8b175f7 and db636932).

Error observed in in-vpc customer deployment when using strategy=fast with table detection.
…rors

Removed manual bitmap.close() and page.close() calls in convert_pdf_to_image()
to prevent pypdfium2 AssertionError during concurrent PDF processing.

Issue: When manually closing child objects (bitmap, page) followed by parent
PDF close, pypdfium2's weakref finalizers can run after parent closes,
triggering assertion failures in cleanup logic.

Solution: Let pypdfium2 finalizers handle resource cleanup automatically.
This prevents double-cleanup race conditions and simplifies code.

Version: Bumped to 1.1.9
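
A sketch of what the revised convert_pdf_to_image flow looks like under that commit (function and parameter names are illustrative, not the exact core-product code):

import pypdfium2 as pdfium

def convert_pdf_to_image(path, scale=2.0):
    pdf = pdfium.PdfDocument(path)
    images = []
    for i in range(len(pdf)):
        page = pdf[i]
        bitmap = page.render(scale=scale)
        images.append(bitmap.to_pil())
        # No manual bitmap.close() / page.close() here: pypdfium2's weakref
        # finalizers release child objects in a safe order, avoiding the
        # double-cleanup race seen under concurrent processing.
    return images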