Pass merge levels to remote OCR #2050
Greptile Summary
This PR fixes a parity gap between local and remote OCR paths by populating the per-image merge_levels field on remote OCR requests.

| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/ocr/shared.py | Adds _MERGE_LEVEL_BY_LABEL dict and _merge_level_for_ocr_label helper; wires per-image merge_levels into remote NIM calls for both full-image and page-element OCR paths; replaces hard-coded local-jobs dict with a comprehension over the mapping. |
| nemo_retriever/src/nemo_retriever/chart/shared.py | Adds merge_levels=["word"] * len(flat_crop_b64s) to the batched remote OCR call in graphic_elements_ocr_page_elements, matching the local word-level chart-title OCR path. |
| nemo_retriever/tests/test_video_frame_ocr_actor.py | Extends existing test to assert merge_levels is forwarded; adds new test verifying that a non-default merge_level on the actor is correctly broadcast to the remote batch call. |
| nemo_retriever/tests/test_table_structure.py | Adds three new tests: validates _merge_level_for_ocr_label raises on unknown labels; confirms local_jobs follows the mapping under monkeypatching; confirms the remote path sends per-modality merge levels for a mixed-detection page. |
| nemo_retriever/tests/test_chart_graphic_elements.py | Adds test asserting that graphic_elements_ocr_page_elements sends merge_levels=["word"] when routing through remote OCR. |
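
The chart-path change and its test from the table above can be sketched roughly as follows. This is an illustration only, not the repository's code: `FakeOcrClient`, its `invoke` signature, and `ocr_chart_crops` are hypothetical stand-ins. Only the `merge_levels=["word"] * len(flat_crop_b64s)` expression is taken from the summary.

```python
from typing import Any


class FakeOcrClient:
    """Hypothetical stand-in for a remote OCR client; records call kwargs."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def invoke(self, *, image_b64_list: list[str], merge_levels: list[str]) -> list[str]:
        # Assumption: the endpoint expects exactly one merge level per image.
        if len(image_b64_list) != len(merge_levels):
            raise ValueError("merge_levels must have one entry per image")
        self.calls.append({"image_b64_list": image_b64_list, "merge_levels": merge_levels})
        return ["" for _ in image_b64_list]  # placeholder OCR results


def ocr_chart_crops(client: FakeOcrClient, flat_crop_b64s: list[str]) -> list[str]:
    """Send every chart crop with word-level merging (hypothetical wrapper)."""
    return client.invoke(
        image_b64_list=flat_crop_b64s,
        merge_levels=["word"] * len(flat_crop_b64s),
    )


client = FakeOcrClient()
results = ocr_chart_crops(client, ["crop-a", "crop-b", "crop-c"])
sent_levels = client.calls[0]["merge_levels"]
```

A test in this style only needs to inspect the recorded keyword arguments, which is presumably what the new chart test does against the real batching helper.
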
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Page detections] --> B{use_remote?}
    B -- Yes --> C[_crop_all_from_page as_b64=True]
    C --> D[crop_meta: label + bbox per crop]
    D --> E["merge_levels = [_merge_level_for_ocr_label(label) for label in crop_meta]"]
    E --> F[invoke_image_inference_batches\nimage_b64_list + merge_levels]
    F --> G[NIM OCR endpoint]
    B -- No --> H[_crop_all_from_page as ndarray]
    H --> I["{ml: [] for ml in _MERGE_LEVEL_BY_LABEL.values()}"]
    I --> J["_merge_level_for_ocr_label(label) per crop"]
    J --> K[model.invoke merge_level=ml per group]
    subgraph _MERGE_LEVEL_BY_LABEL
        L["table → word"]
        M["chart → paragraph"]
        N["infographic → paragraph"]
        O["text → paragraph"]
        P["title → paragraph"]
        Q["header_footer → paragraph"]
    end
    E --> _MERGE_LEVEL_BY_LABEL
    J --> _MERGE_LEVEL_BY_LABEL
```
Reviews (4): Last reviewed commit: "Linting"
Description
Remote HTTP OCR now passes explicit `merge_levels` into NIM image-inference requests so it matches the local OCR path: `word` for table crops and `paragraph` for charts, infographics, and text/title/header-footer crops. The chart graphic-elements OCR path sends `word`, matching its local OCR behavior.

Root cause:
`NIMClient` already supported per-image `merge_levels`, but several graph paths built remote OCR requests without populating that field, so the endpoint default was used instead of the modality-specific local behavior.

Validation:
- `uv run --extra dev pytest tests/test_video_frame_ocr_actor.py tests/test_table_structure.py tests/test_chart_graphic_elements.py`
- `git diff --check`

Checklist