pdfplumber broken in code_execution sandbox — charset_normalizer mypyc module missing #1340

@michaelkleyn

Environment

  • Tool: code_execution_20250825 sandbox
  • Python: 3.11.12, Linux x86_64

Error

ModuleNotFoundError: No module named '81d243bd2c585b0f4821__mypyc'

Full traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/pdfplumber/__init__.py", line 11, in <module>
    import pdfminer.pdftypes
  File "/usr/local/lib/python3.11/site-packages/pdfminer/pdftypes.py", line 22, in <module>
    from pdfminer.psparser import LIT, PSObject
  File "/usr/local/lib/python3.11/site-packages/pdfminer/psparser.py", line 20, in <module>
    from pdfminer.utils import choplist
  File "/usr/local/lib/python3.11/site-packages/pdfminer/utils.py", line 31, in <module>
    import charset_normalizer
  File "/usr/local/lib/python3.11/site-packages/charset_normalizer/__init__.py", line 24, in <module>
    from .api import from_bytes, from_fp, from_path, is_binary
  File "/usr/local/lib/python3.11/site-packages/charset_normalizer/api.py", line 5, in <module>
    from .cd import (
ModuleNotFoundError: No module named '81d243bd2c585b0f4821__mypyc'

Reproduction

import pdfplumber

This fails immediately on import inside the code execution sandbox.

Root Cause

The charset_normalizer package (dependency chain: pdfplumber → pdfminer → charset_normalizer) ships mypyc-compiled C extensions. The hash-prefixed module name (81d243bd2c585b0f4821__mypyc) indicates a compiled .so binary that is either missing from the sandbox image or was built for a different platform/Python ABI.
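
To tell the two cases apart (binary missing vs. built for another ABI), something like the following could be run in the sandbox. It is only a sketch: the helper name `inspect_compiled_modules` is made up for this issue, and it assumes the mypyc shared library sits next to the package directory in site-packages, which I have not confirmed for this image. It never imports charset_normalizer itself, so it does not trigger the error.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def inspect_compiled_modules(package="charset_normalizer"):
    """List compiled extension files for `package` without importing it.

    Hypothetical diagnostic helper; assumes the mypyc shared library
    (named like '<hash>__mypyc') lives next to the package directory.
    """
    suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)
    report = {"installed": False, "loadable_suffixes": suffixes,
              "extensions": [], "mypyc_libs": []}

    # find_spec on a top-level name locates the package without running it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return report
    report["installed"] = True

    pkg_dir = Path(list(spec.submodule_search_locations)[0])
    site_dir = pkg_dir.parent

    def describe(path):
        # Everything after the first dot is the ABI tag, e.g.
        # '.cpython-311-x86_64-linux-gnu.so'; it must match the interpreter.
        stem = path.name.split(".")[0]
        tag = path.name[len(stem):]
        return {"file": path.name, "tag": tag, "loadable": tag in suffixes}

    for path in sorted(pkg_dir.iterdir()):
        if path.suffix in (".so", ".pyd"):
            report["extensions"].append(describe(path))
    for path in sorted(site_dir.glob("*__mypyc*")):
        report["mypyc_libs"].append(describe(path))
    return report


if __name__ == "__main__":
    info = inspect_compiled_modules()
    print("installed:", info["installed"])
    print("extensions:", info["extensions"])
    print("mypyc libs:", info["mypyc_libs"])
```

An empty "mypyc libs" list would point to the binary being absent from the image; an entry with `loadable: False` would point to an ABI or platform mismatch.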

Impact

pdfplumber is documented as a pre-installed package in the code execution sandbox but is completely unusable. Users who upload PDFs and rely on the agent choosing pdfplumber for table/text extraction hit this error.

Workaround

Use pypdf instead, which works correctly in the sandbox:

from pypdf import PdfReader
reader = PdfReader("file.pdf")
for page in reader.pages:
    print(page.extract_text())
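
If the calling code needs to keep working in both fixed and broken images, the same workaround can be wrapped in a fallback. This is a rough sketch, not something tested beyond the import failure above: `first_importable` and `extract_text` are names invented here, and text output from the two libraries will not necessarily match.

```python
import importlib


def first_importable(candidates=("pdfplumber", "pypdf")):
    """Return (name, module) for the first candidate that imports cleanly."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError (the error in this issue) is a subclass
            # of ImportError, so the broken pdfplumber is skipped here.
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


def extract_text(path):
    """Return one string per page, using whichever backend imports."""
    name, module = first_importable()
    if name == "pdfplumber":
        with module.open(path) as pdf:
            return [page.extract_text() or "" for page in pdf.pages]
    # Fallback: pypdf, as in the workaround above.
    reader = module.PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

Usage would be `pages = extract_text("file.pdf")`. Table extraction is not covered, since pypdf has no direct equivalent of pdfplumber's table helpers.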
