Pass merge levels to remote OCR #2050

Open
charlesbluca wants to merge 3 commits into NVIDIA:main from charlesbluca:codex/remote-ocr-merge-levels

@charlesbluca charlesbluca commented May 18, 2026

Description

Remote HTTP OCR now passes explicit merge_levels into NIM image-inference requests so it matches the local OCR path:

  • Full-image/video OCR repeats the actor's configured merge level for every valid image in the batch.
  • Page-element OCR sends word for table crops and paragraph for charts, infographics, and text/title/header-footer crops.
  • Graphic-elements chart OCR sends word, matching its local OCR behavior.

Root cause: NIMClient already supported per-image merge_levels, but several graph paths built remote OCR requests without populating that field, so the endpoint default was used instead of modality-specific local behavior.
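For reviewers, the full-image broadcast described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `broadcast_merge_level` and its validity check are hypothetical names chosen for illustration, and the only behavior taken from the description is that one configured merge level is repeated for each valid image.

```python
from typing import List, Optional, Sequence, Tuple


def broadcast_merge_level(
    images_b64: Sequence[Optional[str]], merge_level: str
) -> Tuple[List[str], List[str]]:
    """Drop unusable images, then repeat one merge level per remaining image."""
    # Hypothetical validity check: any non-empty base64 string counts as valid.
    valid = [img for img in images_b64 if img]
    # One entry per valid image keeps both lists aligned index-for-index.
    return valid, [merge_level] * len(valid)


valid_images, merge_levels = broadcast_merge_level(
    ["aGVsbG8=", None, "d29ybGQ="], "paragraph"
)
```

Building the list from the filtered images, rather than from the raw batch, is what keeps `merge_levels` the same length as the image list actually sent.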

Validation:

  • uv run --extra dev pytest tests/test_video_frame_ocr_actor.py tests/test_table_structure.py tests/test_chart_graphic_elements.py
  • git diff --check

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • Not applicable: no docker-compose.yaml environment variables changed.

@charlesbluca charlesbluca changed the title from "[codex] Pass merge levels to remote OCR" to "Pass merge levels to remote OCR" May 18, 2026
greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR fixes a parity gap between local and remote OCR paths by populating the merge_levels field on every NIM HTTP request. Previously, NIMClient supported per-image merge levels but several call sites omitted the field, silently inheriting the endpoint default instead of the modality-specific level used by the local model path.

  • Introduces _MERGE_LEVEL_BY_LABEL and _merge_level_for_ocr_label in ocr/shared.py to provide a single source of truth for label→merge-level mapping; replaces the previous inline "word" if label == "table" else "paragraph" pattern and the hard-coded two-key local_jobs dict with comprehensions over the mapping.
  • Wires merge_levels into the remote call in ocr_page_elements (per-crop, per-modality), ocr_b64_to_text (actor-configured level broadcast to the whole batch), and graphic_elements_ocr_page_elements in chart/shared.py (always "word", matching local chart-title detection).
  • Adds focused unit and integration tests covering the new helper, the local-jobs dynamic keying, and end-to-end per-modality remote call arguments.

Confidence Score: 5/5

Safe to merge — the change is a targeted fix that adds a missing field to existing NIM HTTP calls, with no interface removals, no schema mutations, and comprehensive new tests covering local, remote, and actor paths.

All three remote call sites now pass correctly-sized merge_levels lists. The _merge_level_for_ocr_label helper raises on unknown labels rather than silently defaulting, which is an intentional improvement. The local-jobs dict is derived from the same mapping, so both paths stay in sync automatically. New tests assert the exact merge_levels argument for every affected call site. No existing behaviour is removed or reordered.
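The point about the local-jobs dict staying in sync can be sketched like this. It is an assumption-laden illustration: `group_crops_by_merge_level` is a hypothetical name, the mapping is abbreviated, and crops are plain strings rather than image arrays. Only the comprehension over the mapping's values mirrors what the flowchart shows.

```python
from typing import Dict, List, Tuple

# Abbreviated copy of the mapping, for illustration only.
_MERGE_LEVEL_BY_LABEL = {"table": "word", "chart": "paragraph", "text": "paragraph"}


def group_crops_by_merge_level(crops: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket (label, crop) pairs by the merge level their label maps to."""
    # Keys come from the mapping, so a new merge level needs no second edit here.
    jobs: Dict[str, List[str]] = {ml: [] for ml in _MERGE_LEVEL_BY_LABEL.values()}
    for label, crop in crops:
        if label not in _MERGE_LEVEL_BY_LABEL:
            # Assumption: the exception type in the real code may differ.
            raise ValueError(f"Unsupported OCR label: {label!r}")
        jobs[_MERGE_LEVEL_BY_LABEL[label]].append(crop)
    return jobs


jobs = group_crops_by_merge_level([("table", "t0"), ("chart", "c0"), ("text", "x0")])
```

Each bucket can then be run through the local model once with its own merge level, which is the per-group invocation the flowchart describes.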

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/ocr/shared.py | Adds _MERGE_LEVEL_BY_LABEL dict and _merge_level_for_ocr_label helper; wires per-image merge_levels into remote NIM calls for both full-image and page-element OCR paths; replaces hard-coded local-jobs dict with a comprehension over the mapping. |
| nemo_retriever/src/nemo_retriever/chart/shared.py | Adds merge_levels=["word"] * len(flat_crop_b64s) to the batched remote OCR call in graphic_elements_ocr_page_elements, matching the local word-level chart-title OCR path. |
| nemo_retriever/tests/test_video_frame_ocr_actor.py | Extends existing test to assert merge_levels is forwarded; adds new test verifying that a non-default merge_level on the actor is correctly broadcast to the remote batch call. |
| nemo_retriever/tests/test_table_structure.py | Adds three new tests: validates _merge_level_for_ocr_label raises on unknown labels; confirms local_jobs follows the mapping under monkeypatching; confirms the remote path sends per-modality merge levels for a mixed-detection page. |
| nemo_retriever/tests/test_chart_graphic_elements.py | Adds test asserting that graphic_elements_ocr_page_elements sends merge_levels=["word"] when routing through remote OCR. |
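The style of assertion these tests rely on can be sketched with a recording stand-in for the remote client. Everything here is hypothetical except the `invoke_image_inference_batches` method name, the `image_b64_list` and `merge_levels` argument names, and the `["word"] * len(flat_crop_b64s)` expression, which appear in this summary; the real client's signature and return type are not shown in this thread and may differ.

```python
from typing import Any, Dict, List


class RecordingClient:
    """Hypothetical stand-in for the remote client; records each call's kwargs."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def invoke_image_inference_batches(self, **kwargs: Any) -> List[dict]:
        self.calls.append(kwargs)
        # Return one empty result per image so callers can zip results with crops.
        return [{} for _ in kwargs["image_b64_list"]]


def remote_chart_ocr(client: RecordingClient, flat_crop_b64s: List[str]) -> List[dict]:
    # Illustrative call site, not the repository's function.
    return client.invoke_image_inference_batches(
        image_b64_list=flat_crop_b64s,
        merge_levels=["word"] * len(flat_crop_b64s),
    )


client = RecordingClient()
results = remote_chart_ocr(client, ["crop-a", "crop-b"])
recorded_levels = client.calls[0]["merge_levels"]
```

Asserting on the recorded keyword arguments checks the exact list sent, including its length, without needing a live endpoint.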

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Page detections] --> B{use_remote?}
    B -- Yes --> C[_crop_all_from_page as_b64=True]
    C --> D[crop_meta: label + bbox per crop]
    D --> E["merge_levels = [_merge_level_for_ocr_label(label) for label in crop_meta]"]
    E --> F[invoke_image_inference_batches\nimage_b64_list + merge_levels]
    F --> G[NIM OCR endpoint]
    B -- No --> H[_crop_all_from_page as ndarray]
    H --> I["{ml: [] for ml in _MERGE_LEVEL_BY_LABEL.values()}"]
    I --> J["_merge_level_for_ocr_label(label) per crop"]
    J --> K[model.invoke merge_level=ml per group]
    subgraph _MERGE_LEVEL_BY_LABEL
        L["table → word"]
        M["chart → paragraph"]
        N["infographic → paragraph"]
        O["text → paragraph"]
        P["title → paragraph"]
        Q["header_footer → paragraph"]
    end
    E --> _MERGE_LEVEL_BY_LABEL
    J --> _MERGE_LEVEL_BY_LABEL

Reviews (4): Last reviewed commit: "Linting"

Comment thread nemo_retriever/src/nemo_retriever/ocr/shared.py Outdated
@charlesbluca charlesbluca force-pushed the codex/remote-ocr-merge-levels branch from 18f2477 to e8404ba May 18, 2026 14:25
@charlesbluca charlesbluca force-pushed the codex/remote-ocr-merge-levels branch from e8404ba to 6d80fd9 May 18, 2026 14:40
@charlesbluca charlesbluca marked this pull request as ready for review May 18, 2026 14:49
@charlesbluca charlesbluca requested review from a team as code owners May 18, 2026 14:49
@charlesbluca charlesbluca requested a review from jdye64 May 18, 2026 14:49