docling_serve: support the async-job endpoint for long-running conversions #3345

@Hansehart

Is your feature request related to a problem? Please describe.

DoclingServeConverter.run() and run_async() both POST to docling-serve's synchronous endpoints (/v1/convert/file, /v1/convert/source) and block until the server returns the conversion result. This pattern hits two real limits for non-trivial documents:

  1. Server-side cap. docling-serve enforces a 120s ceiling on synchronous requests via the max_sync_wait setting (docling-serve/docling_serve/settings.py#L146). The ceiling can be raised via DOCLING_SERVE_MAX_SYNC_WAIT, but the request is still synchronous.

  2. Network-path timeouts. A long-held HTTP request must survive every hop (reverse proxies, load balancers, NAT keepalive windows, corporate firewalls) where defaults vary.

For PDFs with VLM picture description, OCR-heavy scans, or large multi-page documents, both limits get hit routinely in production.

Describe the solution you'd like

Add opt-in support for docling-serve's async-job endpoint triplet:

  • POST /v1/convert/file/async → returns task_id immediately
  • GET /v1/status/poll/{task_id}?wait=N → server-side long-poll, returns on status transition
  • GET /v1/result/{task_id} → fetch result after terminal status

Concretely, add a constructor parameter:

DoclingServeConverter(
    base_url="...",
    mode="sync",          # Literal["sync", "async"]; opt-in, default unchanged
    poll_interval=2.0,    # float
    job_timeout=600.0,    # float
)

When mode="async", run()/run_async() submit, long-poll /v1/status/poll/{task_id}?wait=... until task_status is success or failure, then GET /v1/result/{task_id}. Errors are surfaced explicitly: HTTP errors, task failures (task_status == "failure" with error_message), per-document failures (status in {"failure", "skipped"}), and job timeout.
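To make the proposed flow concrete, here is a minimal sketch of the submit / long-poll / fetch loop. It is not the actual implementation. The names run_async_job and ConversionJobError are hypothetical, and the three callables stand in for the HTTP calls (they are assumed to raise on HTTP errors). Reading the per-document status from a top-level "status" key of the result payload is also an assumption; the real response shape may differ.

```python
import time
from typing import Any, Callable, Dict

JsonDict = Dict[str, Any]


class ConversionJobError(RuntimeError):
    """Hypothetical error for task-level or per-document failures."""


def run_async_job(
    submit: Callable[[], JsonDict],
    poll: Callable[[str, float], JsonDict],
    fetch_result: Callable[[str], JsonDict],
    poll_interval: float = 2.0,
    job_timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> JsonDict:
    """Submit a job, long-poll until a terminal status, then fetch the result."""
    deadline = clock() + job_timeout
    task_id = submit()["task_id"]

    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"task {task_id} not finished after {job_timeout}s")

        # Never ask the server to hold the request past our own deadline.
        status = poll(task_id, min(poll_interval, remaining))
        task_status = status.get("task_status")
        if task_status == "success":
            break
        if task_status == "failure":
            raise ConversionJobError(status.get("error_message") or "task failed")
        # Any other status is treated as "still running": poll again.

    result = fetch_result(task_id)
    # Assumed location of the per-document status in the result payload.
    if result.get("status") in {"failure", "skipped"}:
        raise ConversionJobError(f"document status: {result['status']}")
    return result
```

In a real client, submit would wrap POST /v1/convert/file/async, poll would wrap GET /v1/status/poll/{task_id}?wait=N, and fetch_result would wrap GET /v1/result/{task_id}. Passing them in as callables keeps the loop independent of the HTTP library and lets the same logic back both run() and a test double.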

The async-job pattern is what docling's own reference Python client uses (docling/service_client/client.py) for exactly this case, so the proposal mirrors upstream convention.

Describe alternatives you've considered

  • Bump DOCLING_SERVE_MAX_SYNC_WAIT server-side. Solves the docling-serve cap but leaves every intermediate proxy timeout in place. Brittle and ops-config-dependent; doesn't scale beyond a single known network path.
  • Subclass DoclingServeConverter downstream. Currently doable (the _post_file* helpers are subclass-friendly), but every consumer reinvents the same ~100 lines of polling logic. Better to land it once upstream.

Additional context

  • Backwards compatible: existing users see no change with default mode="sync".
