AttributeError: 'NoneType' object has no attribute 'encode' with load_file #390

@umaplehurst

Bug Report

Since v0.12.0 I seem to get this sort of backtrace when loading certain .pdf files:

  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 41, in load_file
    return load(in_file, pdf_file_path=path_to_file, la_params=la_params, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\py_pdf_parser\loaders.py", line 75, in load
    for page in extract_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\high_level.py", line 197, in extract_pages
    for page in PDFPage.get_pages(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfpage.py", line 151, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 744, in __init__
    self._initialize_password(password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 771, in _initialize_password
    handler = factory(docid, param, password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 358, in __init__
    self.init()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 366, in init
    self.init_key()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 379, in init_key
    self.key = self.authenticate(self.password)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\pdfminer\pdfdocument.py", line 428, in authenticate
    password_bytes = password.encode("latin1")
AttributeError: 'NoneType' object has no attribute 'encode'

I'm not sure why it only happens with certain files. My guess is that the failing path is only reached when the if "Encrypt" in trailer: check in pdfdocument.py of pdfminer.six is true, i.e. for PDFs that carry an Encrypt entry in their trailer. Versions before v0.12.0 are fine. The problem seems to be the password: str = None default that was added to load(...) in py_pdf_parser/loaders.py as part of 02f92ce: the None is passed down to pdfminer.six, which then calls password.encode("latin1") on it. I guess this needs to be changed to password: str = "" to match the default pdfminer.six itself uses (see get_pages in pdfpage.py), and then everything should be fine again.
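
To illustrate the suspected cause, here is a minimal, self-contained sketch that does not import py_pdf_parser or pdfminer.six. The functions authenticate, load_with_none_default and load_with_empty_default are hypothetical stand-ins, not the libraries' real code; only the password.encode("latin1") line and the two default values are taken from the traceback and the description above.

```python
from typing import Optional


def authenticate(password: str) -> bytes:
    # Hypothetical stand-in for the failing line in the traceback:
    #     password_bytes = password.encode("latin1")
    return password.encode("latin1")


def load_with_none_default(password: Optional[str] = None) -> bytes:
    # Mirrors the reported default: None is forwarded unchanged,
    # so authenticate() raises AttributeError when no password is given.
    return authenticate(password)


def load_with_empty_default(password: str = "") -> bytes:
    # Mirrors the proposed default: an empty string encodes cleanly.
    return authenticate(password)


if __name__ == "__main__":
    try:
        load_with_none_default()
    except AttributeError as exc:
        print(f"None default fails: {exc}")
    print(f"Empty-string default gives: {load_with_empty_default()!r}")
```

Since load_file forwards **kwargs to load (per the traceback), passing password="" explicitly to load_file might serve as a workaround until the default is changed, though that is untested here.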
