Is your feature request related to a problem? Please describe.
DoclingServeConverter.run() and run_async() both POST to docling-serve's synchronous endpoints (/v1/convert/file, /v1/convert/source) and block until the server returns the conversion result. This pattern hits two real limits for non-trivial documents:
- Server-side cap. docling-serve enforces a 120s ceiling on synchronous requests via the max_sync_wait setting (docling-serve/docling_serve/settings.py#L146). The ceiling can be raised via DOCLING_SERVE_MAX_SYNC_WAIT, but the request is still synchronous.
- Network-path timeouts. A long-held HTTP request must survive every hop (reverse proxies, load balancers, NAT keepalive windows, corporate firewalls), and timeout defaults vary from hop to hop.
For PDFs with VLM picture description, OCR-heavy scans, or large multi-page documents, both limits get hit routinely in production.
Describe the solution you'd like
Add opt-in support for docling-serve's async-job endpoint triplet:
POST /v1/convert/file/async → returns task_id immediately
GET /v1/status/poll/{task_id}?wait=N → server-side long-poll, returns on status transition
GET /v1/result/{task_id} → fetch result after terminal status
Concretely, add three constructor parameters:
DoclingServeConverter(
    base_url="...",
    mode="sync",        # Literal["sync", "async"]; opt-in, default unchanged
    poll_interval=2.0,  # float
    job_timeout=600.0,  # float
)
When mode="async", run()/run_async() submit, long-poll /v1/status/poll/{task_id}?wait=... until task_status is success or failure, then GET /v1/result/{task_id}. Errors are surfaced explicitly: HTTP errors, task failures (task_status == "failure" with error_message), per-document failures (status in {"failure", "skipped"}), and job timeout.
The async-job pattern is what docling's own reference Python client uses (docling/service_client/client.py) for exactly this case, so the proposal mirrors upstream convention.
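A minimal sketch of the submit / long-poll / fetch flow described above, to make the proposed control flow and error surfacing concrete. It is not the converter's actual implementation. The `http` callable, `convert_async`, `ConversionJobError`, and the `clock` parameter are hypothetical names; the file upload itself is left to the transport; passing `poll_interval` as the `wait` value and reading a top-level `status` field from the result are assumptions, not confirmed docling-serve behavior.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, query params) -> decoded JSON body.
# A real implementation would wrap an HTTP client and attach the file upload.
Http = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]

TERMINAL_STATUSES = {"success", "failure"}


class ConversionJobError(RuntimeError):
    """Raised when the task or the converted document reports a failure."""


def convert_async(
    http: Http,
    poll_interval: float = 2.0,
    job_timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    # 1. Submit the job; the server is expected to return a task_id right away.
    task_id = http("POST", "/v1/convert/file/async", None)["task_id"]

    # 2. Long-poll until the task reaches a terminal status or the deadline passes.
    deadline = clock() + job_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"task {task_id} not finished after {job_timeout}s")
        status = http(
            "GET",
            f"/v1/status/poll/{task_id}",
            # Assumption: poll_interval doubles as the server-side wait value.
            {"wait": min(poll_interval, remaining)},
        )
        if status["task_status"] in TERMINAL_STATUSES:
            break

    # Task-level failure: surface the server's error_message if present.
    if status["task_status"] == "failure":
        raise ConversionJobError(status.get("error_message") or "task failed")

    # 3. Fetch the result only after a terminal success status.
    result = http("GET", f"/v1/result/{task_id}", None)

    # Per-document failure. Assumption: the result carries a top-level "status".
    if result.get("status") in {"failure", "skipped"}:
        raise ConversionJobError(f"document status: {result['status']}")
    return result
```

Injecting the transport and the clock keeps the polling loop testable without a running docling-serve instance; in the converter itself, the same loop would sit behind mode="async".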
Describe alternatives you've considered
- Bump DOCLING_SERVE_MAX_SYNC_WAIT server-side. Solves the docling-serve cap but leaves every intermediate proxy timeout in place. Brittle and ops-config-dependent; doesn't scale beyond a single known network path.
- Subclass DoclingServeConverter downstream. Currently doable (the _post_file* helpers are subclass-friendly), but every consumer reinvents the same ~100 lines of polling logic. Better to land it once upstream.
Additional context
- Backwards compatible: existing users see no change with the default mode="sync".