parsing with layout detection fails if region_maxsize is not specified in config #144

@keefir

Description

System Info

WSL Ubuntu 24.04.1 LTS

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Reproduction

  1. Put test.pdf into the root directory
  2. Create config.yaml with the following content (the layout section is copied from the config.yaml example):
pipeline:
  maas:
    enabled: false
  
  ocr_api:
    api_host: localhost
    api_port: 11434
    api_path: /api/generate  # Use Ollama native endpoint
    model: glm-ocr:latest    # Required: specify model name
    api_mode: ollama_generate  # Required: use Ollama native format
  
  enable_layout: true
  # Layout detection settings (used when enable_layout=true)
  layout:
    # PP-DocLayoutV3 model directory
    # Can be a local folder or a Hugging Face model id
    # (Use *_safetensors for Transformers; PaddlePaddle/PP-DocLayoutV3 is a PaddleOCR export)
    model_dir: PaddlePaddle/PP-DocLayoutV3_safetensors

    # Detection threshold
    threshold: 0.3
    # threshold_by_class:           # per-class threshold override
    #   0: 0.5
    #   1: 0.3
    #   text: 0.5
    #   table: 0.2

    # Processing
    # batch_size: max images per model forward pass (reduce to 1 if OOM)
    batch_size: 1
    workers: 1
    cuda_visible_devices: "0"
    # img_size: null                # resize input (optional)

    # Post-processing
    layout_nms: true
    layout_unclip_ratio:
      - 1.0
      - 1.0

    # Merge mode for overlapping bboxes: "large" or "small"
    # Can be a single value or per-class dict
    layout_merge_bboxes_mode:
      0: large # abstract
      1: large # algorithm
      2: large # aside_text
      3: large # chart
      4: large # content
      5: large # display_formula
      6: large # doc_title
      7: large # figure_title
      8: large # footer
      9: large # footer_image
      10: large # footnote
      11: large # formula_number
      12: large # header
      13: large # header_image
      14: large # image
      15: large # inline_formula
      16: large # number
      17: large # paragraph_title
      18: small # reference
      19: large # reference_content
      20: large # seal
      21: large # table
      22: large # text
      23: large # vertical_text
      24: large # vision_footnote

    # Map detected labels to OCR task types
    # - text/table/formula: OCR with corresponding prompt
    # - skip: keep region but don't OCR (e.g., images)
    # - abandon: discard region entirely
    label_task_mapping:
      text:
        - abstract
        - algorithm
        - content
        - doc_title
        - figure_title
        - paragraph_title
        - reference_content
        - text
        - vertical_text
        - vision_footnote
        - seal
        - formula_number
      table:
        - table
      formula:
        - display_formula
        - inline_formula
      skip:
        - chart
        - image
      abandon:
        - header
        - footer
        - number
        - footnote
        - aside_text
        - reference
        - footer_image
        - header_image

    # Map label index to label name
    id2label:
      0: abstract
      1: algorithm
      2: aside_text
      3: chart
      4: content
      5: display_formula
      6: doc_title
      7: figure_title
      8: footer
      9: footer_image
      10: footnote
      11: formula_number
      12: header
      13: header_image
      14: image
      15: inline_formula
      16: number
      17: paragraph_title
      18: reference
      19: reference_content
      20: seal
      21: table
      22: text
      23: vertical_text
      24: vision_footnote
  3. Run glmocr (self-hosted version):
 glmocr parse test.pdf --config config.yaml
  4. Observe the following error:
Exception in thread Thread-3 (layout_detection_thread):
Traceback (most recent call last):
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 357, in layout_detection_thread
    self._stream_process_layout_batch(
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 636, in _stream_process_layout_batch
    region_queue.put(
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/queue.py", line 134, in put
    if self.maxsize > 0:
       ^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 392, in layout_detection_thread
    state.region_queue.put(("error", None, None))
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/queue.py", line 134, in put
    if self.maxsize > 0:
       ^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'
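
The traceback suggests that region_queue was created with maxsize=None, presumably because region_maxsize was missing from the config and nothing filled in a default. A minimal sketch of that failure mode using only the standard library is below. The names reproduce_put_error and make_region_queue are hypothetical and are not taken from glmocr, and falling back to 0 (which queue.Queue treats as unbounded) is an assumption about what a reasonable default could be, not the project's actual fix.

```python
import queue


def reproduce_put_error():
    """Return the TypeError message raised by put() when maxsize is None."""
    # Construction succeeds; the failure is deferred until the first put().
    broken = queue.Queue(maxsize=None)
    try:
        broken.put(("error", None, None))
    except TypeError as exc:
        return str(exc)
    return None


def make_region_queue(region_maxsize=None):
    """Hypothetical helper: treat a missing value as 0 (unbounded in the stdlib)."""
    return queue.Queue(maxsize=0 if region_maxsize is None else int(region_maxsize))


if __name__ == "__main__":
    print(reproduce_put_error())
    safe = make_region_queue(None)
    safe.put(("error", None, None))  # no exception with the coerced default
    print(safe.qsize())
```

This would also explain the second traceback: the error handler in layout_detection_thread puts its ("error", None, None) marker on the same queue, so it fails in the same way.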

Expected behavior

glmocr parse completes successfully even when region_maxsize is not specified in the config.
