Parallelize OCR block processing and add optional CUDA preprocessing#8

Merged
zeetee1235 merged 1 commit into main from codex/improve-ocr-speed-with-gpu-acceleration
Mar 12, 2026

@zeetee1235
Owner

Motivation

  • Reduce wall-clock OCR latency without changing fusion or parser decision logic by running independent per-block Tesseract calls concurrently.
  • Provide an optional fast preprocessing path that can leverage OpenCV CUDA when available to accelerate grayscale/threshold operations.

Description

  • Added a configurable worker pool to ocr/bridge/ocr_bridge.py that processes OCR blocks in parallel with ThreadPoolExecutor. DOCSTRUCT_OCR_WORKERS controls the concurrency, and output ordering stays deterministic.
  • Introduced CUDA-aware preprocessing helpers _to_grayscale and _otsu_threshold that use OpenCV CUDA if DOCSTRUCT_OCR_USE_CUDA=1 is set and CUDA bindings are available, otherwise silently fall back to CPU implementations.
  • Kept the original multi-pass OCR candidate scoring, block classification, post-processing, and LaTeX (pix2tex) handling intact and re-applied them to the ordered results to preserve existing OCR logic.
  • Documented the new runtime environment variables in README.md (DOCSTRUCT_OCR_WORKERS, DOCSTRUCT_OCR_USE_CUDA, and existing DOCSTRUCT_BRIDGE/DOCSTRUCT_PYTHON).
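
The worker-pool pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in ocr/bridge/ocr_bridge.py: the helper names worker_count and process_blocks, the default of 4 workers, and the fallback on invalid input are all assumptions made for the example. Only ThreadPoolExecutor and the DOCSTRUCT_OCR_WORKERS variable come from the description.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def worker_count(default: int = 4) -> int:
    """Read DOCSTRUCT_OCR_WORKERS; use `default` if unset or invalid (assumed policy)."""
    raw = os.environ.get("DOCSTRUCT_OCR_WORKERS", "")
    try:
        return max(1, int(raw))
    except ValueError:
        return default


def process_blocks(
    blocks: List[dict],
    process_block: Callable[[dict], Optional[dict]],
) -> List[dict]:
    """Run `process_block` on every block concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        # Executor.map yields results in the order of `blocks`,
        # regardless of which worker finishes first.
        results = list(pool.map(process_block, blocks))
    # Drop blocks that the per-block function chose to skip (hypothetical convention).
    return [r for r in results if r is not None]
```

Because Executor.map returns results in input order, downstream scoring and post-processing can consume the list exactly as they would after a sequential loop.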

Testing

  • Ran Python syntax check with python3 -m py_compile ocr/bridge/ocr_bridge.py, which completed successfully.
  • Ran the Rust test suite with cargo test -q; all automated tests passed.

Codex Task

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df510233a3

for block in blocks:

def process_block(block: dict) -> dict | None:

P1: Avoid Python 3.10-only union syntax in bridge

The new annotation -> dict | None on process_block in run_ocr uses the PEP 604 union syntax, which works only on Python 3.10+. On Python 3.9 and earlier the expression parses, but it is evaluated when the function is defined and raises a TypeError (unless from __future__ import annotations is in effect), so the bridge fails on runtimes that previously worked. This is a functional regression: OCR cannot run at all in those environments, even though the README still documents Python 3.8+ support.
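
One possible fix, assuming the bridge must keep working on Python 3.8, is to spell the return type with typing.Optional. The function body below is a hypothetical placeholder used only to make the sketch runnable; the real per-block OCR logic is not shown in this review.

```python
from typing import Optional


def process_block(block: dict) -> Optional[dict]:
    # Hypothetical body: the real function would run OCR on the block.
    text = block.get("text")
    if not text:
        return None  # nothing to process for this block
    return {"text": text.strip()}
```

Optional[dict] is equivalent to dict | None for type checkers and is evaluated without error on Python 3.8 and 3.9.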

@zeetee1235 zeetee1235 merged commit 63749b5 into main Mar 12, 2026
1 check passed