pipeline:
  maas:
    enabled: false
  ocr_api:
    api_host: localhost
    api_port: 11434
    api_path: /api/generate  # Use Ollama native endpoint
    model: glm-ocr:latest  # Required: specify model name
    api_mode: ollama_generate  # Required: use Ollama native format
  enable_layout: true
  # Layout detection settings (used when enable_layout=true)
  layout:
    # PP-DocLayoutV3 model directory
    # Can be a local folder or a Hugging Face model id
    # (Use *_safetensors for Transformers; PaddlePaddle/PP-DocLayoutV3 is a PaddleOCR export)
    model_dir: PaddlePaddle/PP-DocLayoutV3_safetensors
    # Detection threshold
    threshold: 0.3
    # threshold_by_class:  # per-class threshold override
    #   0: 0.5
    #   1: 0.3
    #   text: 0.5
    #   table: 0.2
    # Processing
    # batch_size: max images per model forward pass (reduce to 1 if OOM)
    batch_size: 1
    workers: 1
    cuda_visible_devices: "0"
    # img_size: null  # resize input (optional)
    # Post-processing
    layout_nms: true
    layout_unclip_ratio:
      - 1.0
      - 1.0
    # Merge mode for overlapping bboxes: "large" or "small"
    # Can be a single value or per-class dict
    layout_merge_bboxes_mode:
      0: large  # abstract
      1: large  # algorithm
      2: large  # aside_text
      3: large  # chart
      4: large  # content
      5: large  # display_formula
      6: large  # doc_title
      7: large  # figure_title
      8: large  # footer
      9: large  # footer_image
      10: large  # footnote
      11: large  # formula_number
      12: large  # header
      13: large  # header_image
      14: large  # image
      15: large  # inline_formula
      16: large  # number
      17: large  # paragraph_title
      18: small  # reference
      19: large  # reference_content
      20: large  # seal
      21: large  # table
      22: large  # text
      23: large  # vertical_text
      24: large  # vision_footnote
    # Map detected labels to OCR task types
    # - text/table/formula: OCR with corresponding prompt
    # - skip: keep region but don't OCR (e.g., images)
    # - abandon: discard region entirely
    label_task_mapping:
      text:
        - abstract
        - algorithm
        - content
        - doc_title
        - figure_title
        - paragraph_title
        - reference_content
        - text
        - vertical_text
        - vision_footnote
        - seal
        - formula_number
      table:
        - table
      formula:
        - display_formula
        - inline_formula
      skip:
        - chart
        - image
      abandon:
        - header
        - footer
        - number
        - footnote
        - aside_text
        - reference
        - footer_image
        - header_image
    # Map label index to label name
    id2label:
      0: abstract
      1: algorithm
      2: aside_text
      3: chart
      4: content
      5: display_formula
      6: doc_title
      7: figure_title
      8: footer
      9: footer_image
      10: footnote
      11: formula_number
      12: header
      13: header_image
      14: image
      15: inline_formula
      16: number
      17: paragraph_title
      18: reference
      19: reference_content
      20: seal
      21: table
      22: text
      23: vertical_text
      24: vision_footnote
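Since label_task_mapping and id2label have to agree on label names, a quick consistency check can rule out a typo in the copied layout section as the cause. The sketch below is only an illustration: the two dictionaries are copied from the config above, and the helper name check_label_mapping is hypothetical, not part of glmocr.

```python
# Label tables copied from the config above.
ID2LABEL = {
    0: "abstract", 1: "algorithm", 2: "aside_text", 3: "chart", 4: "content",
    5: "display_formula", 6: "doc_title", 7: "figure_title", 8: "footer",
    9: "footer_image", 10: "footnote", 11: "formula_number", 12: "header",
    13: "header_image", 14: "image", 15: "inline_formula", 16: "number",
    17: "paragraph_title", 18: "reference", 19: "reference_content",
    20: "seal", 21: "table", 22: "text", 23: "vertical_text",
    24: "vision_footnote",
}

LABEL_TASK_MAPPING = {
    "text": [
        "abstract", "algorithm", "content", "doc_title", "figure_title",
        "paragraph_title", "reference_content", "text", "vertical_text",
        "vision_footnote", "seal", "formula_number",
    ],
    "table": ["table"],
    "formula": ["display_formula", "inline_formula"],
    "skip": ["chart", "image"],
    "abandon": [
        "header", "footer", "number", "footnote", "aside_text",
        "reference", "footer_image", "header_image",
    ],
}


def check_label_mapping(id2label, label_task_mapping):
    """Hypothetical helper: return (unmapped, unknown, duplicated) label sets."""
    known = set(id2label.values())
    seen = set()
    duplicated = set()
    for labels in label_task_mapping.values():
        for label in labels:
            if label in seen:
                duplicated.add(label)  # label assigned to more than one task
            seen.add(label)
    unmapped = known - seen  # detected labels with no task assigned
    unknown = seen - known   # mapped labels the detector never emits
    return unmapped, unknown, duplicated


unmapped, unknown, duplicated = check_label_mapping(ID2LABEL, LABEL_TASK_MAPPING)
print("unmapped:", sorted(unmapped))
print("unknown:", sorted(unknown))
print("duplicated:", sorted(duplicated))
```

For the config in this report all three sets come back empty, so the label tables themselves look consistent.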
Exception in thread Thread-3 (layout_detection_thread):
Traceback (most recent call last):
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 357, in layout_detection_thread
    self._stream_process_layout_batch(
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 636, in _stream_process_layout_batch
    region_queue.put(
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/queue.py", line 134, in put
    if self.maxsize > 0:
       ^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/glm-ocr/glmocr/pipeline/pipeline.py", line 392, in layout_detection_thread
    state.region_queue.put(("error", None, None))
  File "/home/username/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/queue.py", line 134, in put
    if self.maxsize > 0:
       ^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'
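Both tracebacks fail inside queue.Queue.put on the comparison self.maxsize > 0, which means the region queue holds maxsize=None. The standard-library constructor stores None without complaint, so the error only surfaces on the first put. The sketch below reproduces that behavior in isolation. The helper make_queue is hypothetical and not glmocr code, and the idea that the None comes from a queue-size setting absent from the config above is an assumption, not something the traceback proves.

```python
import queue


def make_queue(maxsize):
    """Hypothetical helper: treat a missing (None) size as 0, i.e. unbounded."""
    return queue.Queue(maxsize=0 if maxsize is None else int(maxsize))


# Reproduce the failure: the constructor accepts None, but put() does not.
broken = queue.Queue(maxsize=None)
try:
    broken.put(("error", None, None))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With the size coerced to an int, the same put() succeeds.
fixed = make_queue(None)
fixed.put(("error", None, None))
print(fixed.qsize())
```

If the assumption holds, setting the queue size explicitly in the config, or defaulting it in code as make_queue does, would likely avoid the crash.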
System Info
WSL Ubuntu 24.04.1 LTS
Who can help?
No response
Information
Reproduction
1. Put test.pdf into the root directory.
2. Create config.yaml with the content shown above (the layout section is copy-pasted from the config.yaml example).
Expected behavior
Normal behavior (successful glmocr parse execution).