fix: update DocIntel default and surface OCR failures #1642

Open
imadreamerboy wants to merge 1 commit into microsoft:main from imadreamerboy:fix-azure-api-endpoint

@imadreamerboy

This PR fixes Azure Document Intelligence handling for image OCR in markitdown.

There were two separate problems:

  1. DocumentIntelligenceConverter still defaulted to api_version="2024-07-31-preview", which can fail on valid Azure resources with 404 Resource not found during begin_analyze_document(...).
  2. That failure could be masked by fallback behavior in MarkItDown._convert(): after the DocIntel converter failed, ImageConverter could return an empty DocumentConverterResult(markdown=""), and markitdown treated that as a successful conversion. The caller then saw result.text_content == "" instead of the real Azure error.
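The masking described in item 2 can be reproduced with a small stand-alone sketch. This is not the markitdown source: `Result`, `docintel_convert`, `image_convert`, and `convert_first_success` are hypothetical stand-ins, and the loop only approximates a "first converter that returns wins" flow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """Stand-in for DocumentConverterResult; only the markdown field is modeled."""
    markdown: str


def docintel_convert(path: str) -> Result:
    # Simulates the Azure call failing with the 404 described above.
    raise RuntimeError("404 Resource not found")


def image_convert(path: str) -> Result:
    # Simulates the image fallback finding no text.
    return Result(markdown="")


def convert_first_success(path: str, converters: List[Callable[[str], Result]]) -> Result:
    """Hypothetical old-style flow: the first converter that returns wins."""
    errors = []
    for convert in converters:
        try:
            return convert(path)  # an empty result still counts as success here
        except Exception as exc:
            errors.append(exc)  # recorded, but dropped once a later converter returns
    raise RuntimeError(f"all converters failed: {errors}")


result = convert_first_success("scan.png", [docintel_convert, image_convert])
print(repr(result.markdown))  # prints ''
```

The caller receives an empty string and never sees the 404, which is the misleading outcome this PR targets.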

Changes

  • Updated the default Azure Document Intelligence API version from 2024-07-31-preview to 2024-11-30
  • Kept explicit docintel_api_version=... override behavior intact
  • Changed conversion flow so an empty fallback result does not count as success if an earlier converter already failed
  • Added regression tests for:
    • new default DocIntel API version
    • explicit API version override
    • empty image fallback no longer masking a prior converter failure
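One way to read the conversion-flow change is sketched below. It is an assumption about the shape of the fix, not the actual diff: `ConversionFailed` and `convert_surfacing_failures` are hypothetical names, and raising at the end of the loop is just one possible way to surface the earlier error.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """Stand-in for DocumentConverterResult; only the markdown field is modeled."""
    markdown: str


class ConversionFailed(Exception):
    """Hypothetical error type that carries the earlier converter failures."""


def convert_surfacing_failures(path: str, converters: List[Callable[[str], Result]]) -> Result:
    errors = []
    for convert in converters:
        try:
            result = convert(path)
        except Exception as exc:
            errors.append(exc)
            continue
        if result.markdown.strip() == "" and errors:
            # Empty output after an earlier failure is not treated as success.
            continue
        return result
    if errors:
        # Surface the real failure instead of an empty result.
        raise ConversionFailed("; ".join(str(e) for e in errors)) from errors[0]
    raise ConversionFailed("no converter accepted the input")


def docintel_convert(path: str) -> Result:
    raise RuntimeError("404 Resource not found")  # simulated Azure failure


def image_convert(path: str) -> Result:
    return Result(markdown="")  # simulated empty fallback


try:
    convert_surfacing_failures("scan.png", [docintel_convert, image_convert])
except ConversionFailed as exc:
    print(f"surfaced: {exc}")  # prints: surfaced: 404 Resource not found
```

In this sketch an empty result is still accepted when no earlier converter failed, so an image that genuinely contains no text does not become an error.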

Why

The new default matches current Azure behavior more reliably for OCR/image analysis. The flow change fixes a misleading failure mode in which real Azure/DocIntel errors were swallowed and surfaced as “no OCR text extracted”.

Validation

Tested with focused pytest coverage for DocIntel and fallback behavior:

  • test_docintel_default_api_version
  • test_docintel_explicit_api_version
  • test_empty_image_fallback_does_not_mask_prior_failure

These pass with the local package under test.
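For readers without the test file at hand, the two API-version checks might look roughly like the sketch below. `FakeDocIntelConverter` is a hypothetical stand-in that only records its configuration; the real `DocumentIntelligenceConverter` needs an Azure endpoint and credentials, and its full signature is not reproduced here.

```python
DEFAULT_API_VERSION = "2024-11-30"  # new default named in this PR


class FakeDocIntelConverter:
    """Hypothetical stand-in that only records the API version it was given."""

    def __init__(self, *, endpoint: str, api_version: str = DEFAULT_API_VERSION):
        self.endpoint = endpoint
        self.api_version = api_version


def test_default_api_version():
    converter = FakeDocIntelConverter(endpoint="https://example.invalid")
    assert converter.api_version == "2024-11-30"


def test_explicit_api_version():
    # An explicit version must win over the default.
    converter = FakeDocIntelConverter(
        endpoint="https://example.invalid",
        api_version="2024-07-31-preview",
    )
    assert converter.api_version == "2024-07-31-preview"


if __name__ == "__main__":
    test_default_api_version()
    test_explicit_api_version()
    print("ok")
```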

@imadreamerboy

@microsoft-github-policy-service agree
