docling_serve: support the async-job endpoint for long-running conversions

**Is your feature request related to a problem? Please describe.**

`DoclingServeConverter.run()` and `run_async()` both POST to docling-serve's synchronous endpoints (`/v1/convert/file`, `/v1/convert/source`) and block until the server returns the conversion result. This pattern hits two real limits for non-trivial documents:

1. **Server-side cap.** docling-serve enforces a 120s ceiling on synchronous requests via the `max_sync_wait` setting ([`docling-serve/docling_serve/settings.py#L146`](https://github.com/docling-project/docling-serve/blob/main/docling_serve/settings.py#L146)). The cap can be raised via `DOCLING_SERVE_MAX_SYNC_WAIT`, but the request stays synchronous, so the client still holds a single connection open for the whole conversion.

2. **Network-path timeouts.** A long-held HTTP request must survive every hop (reverse proxies, load balancers, NAT keepalive windows, corporate firewalls), and each hop has its own idle-timeout default.

For PDFs with VLM picture description, OCR-heavy scans, or large multi-page documents, both limits get hit routinely in production.

**Describe the solution you'd like**

Add opt-in support for docling-serve's async-job endpoint triplet:

- `POST /v1/convert/file/async` → returns `task_id` immediately
- `GET /v1/status/poll/{task_id}?wait=N` → server-side long-poll, returns on status transition
- `GET /v1/result/{task_id}` → fetch result after terminal status
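A minimal sketch of the URL layout for this triplet, using only the paths listed above. `async_job_urls`, `poll_read_timeout` and `is_terminal` are hypothetical helper names, and the 10-second margin is an arbitrary assumption. The design point it encodes is that the client's HTTP read timeout for the poll request has to exceed the server-side `wait`. Otherwise the client aborts a long-poll that the server is still legitimately holding open.

```python
from urllib.parse import quote, urlencode

# Terminal task_status values named in the proposal.
TERMINAL_STATUSES = frozenset({"success", "failure"})


def async_job_urls(base_url: str, task_id: str, wait: float) -> dict:
    """Build the submit / long-poll / result URLs for one async job (hypothetical helper)."""
    base = base_url.rstrip("/")
    tid = quote(task_id, safe="")  # task IDs are opaque; escape anything path-unsafe
    return {
        "submit": f"{base}/v1/convert/file/async",  # POST, returns task_id immediately
        "poll": f"{base}/v1/status/poll/{tid}?{urlencode({'wait': wait})}",  # GET, long-poll
        "result": f"{base}/v1/result/{tid}",  # GET, only after a terminal status
    }


def poll_read_timeout(wait: float, margin: float = 10.0) -> float:
    """Client read timeout for one long-poll request: must exceed the server-side `wait`."""
    return wait + margin


def is_terminal(task_status: str) -> bool:
    """True once the task has reached `success` or `failure`."""
    return task_status in TERMINAL_STATUSES
```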

Concretely, add three constructor parameters:

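```python
from typing import Literal


class DoclingServeConverter:
    def __init__(
        self,
        base_url: str,
        mode: Literal["sync", "async"] = "sync",  # opt-in, default unchanged
        poll_interval: float = 2.0,  # seconds per status check
        job_timeout: float = 600.0,  # seconds before the job is abandoned
    ) -> None: ...
```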
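Because `Literal` is only a static type hint, the proposed `mode` parameter would still need a runtime check. Below is a minimal sketch, assuming the three parameters are validated together at construction. `AsyncJobSettings` is a hypothetical name, not part of docling. The `job_timeout >= poll_interval` rule is an assumption about sensible bounds, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class AsyncJobSettings:
    """Hypothetical holder for the three proposed constructor parameters."""

    mode: Literal["sync", "async"] = "sync"  # default keeps today's behavior
    poll_interval: float = 2.0  # seconds per status check
    job_timeout: float = 600.0  # seconds before the whole job is abandoned

    def __post_init__(self) -> None:
        # Literal[...] is not enforced at runtime, so reject typos like "asynch" here.
        if self.mode not in ("sync", "async"):
            raise ValueError(f"mode must be 'sync' or 'async', got {self.mode!r}")
        if self.poll_interval <= 0:
            raise ValueError("poll_interval must be positive")
        # A timeout shorter than one poll interval leaves no room for a full status check.
        if self.job_timeout < self.poll_interval:
            raise ValueError("job_timeout must be at least poll_interval")
```

Freezing the dataclass keeps the values immutable after construction, so a converter instance cannot switch modes partway through a job.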

When `mode="async"`, `run()`/`run_async()` submit the document, long-poll `/v1/status/poll/{task_id}?wait=...` until `task_status` is `success` or `failure`, then GET `/v1/result/{task_id}`. Errors are surfaced explicitly: HTTP errors, task failures (`task_status == "failure"` with `error_message`), per-document failures (`status in {"failure", "skipped"}`), and job timeout.
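The control flow above can be sketched with the three HTTP calls injected as plain callables, which keeps the loop testable without a live server. Everything here is hypothetical: `run_async_job`, the exception classes and the callable signatures are illustrative names, not docling or docling-serve APIs. The sketch makes three assumptions about the server. First, `poll_interval` is passed to the server as the long-poll `wait` value. Second, `error_message` is a top-level field of a failed status payload. Third, the result payload carries `status` (and optionally `errors`) at the top level.

```python
import time
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


class DoclingServeJobError(RuntimeError):
    """Base class for async-job failures (hypothetical)."""


class TaskFailedError(DoclingServeJobError):
    """The task reached task_status == "failure"."""


class DocumentFailedError(DoclingServeJobError):
    """The result's document status was "failure" or "skipped"."""


class JobTimeoutError(DoclingServeJobError):
    """No terminal status before job_timeout elapsed."""


TERMINAL = {"success", "failure"}
FAILED_DOC_STATUSES = {"failure", "skipped"}


def run_async_job(
    submit: Callable[[], Payload],  # POST /v1/convert/file/async
    poll: Callable[[str, float], Payload],  # GET /v1/status/poll/{task_id}?wait=...
    fetch_result: Callable[[str], Payload],  # GET /v1/result/{task_id}
    poll_interval: float = 2.0,
    job_timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Payload:
    """Submit, long-poll until terminal, then fetch and validate the result."""
    deadline = clock() + job_timeout
    status = submit()
    task_id = status["task_id"]

    while status.get("task_status") not in TERMINAL:
        remaining = deadline - clock()
        if remaining <= 0:
            raise JobTimeoutError(f"task {task_id} not finished after {job_timeout}s")
        wait = min(poll_interval, remaining)  # never long-poll past the deadline
        started = clock()
        status = poll(task_id, wait)
        # Guard against a server that answers immediately instead of holding the poll.
        if status.get("task_status") not in TERMINAL:
            elapsed = clock() - started
            if elapsed < wait:
                sleep(wait - elapsed)

    if status["task_status"] == "failure":
        raise TaskFailedError(status.get("error_message") or f"task {task_id} failed")

    result = fetch_result(task_id)
    if result.get("status") in FAILED_DOC_STATUSES:
        raise DocumentFailedError(
            f"document status {result['status']!r}: {result.get('errors')}"
        )
    return result
```

The client-side `sleep` only triggers when a poll returns before `wait` has elapsed. Against a server that honors the long-poll, the loop therefore issues one request per status transition or per `wait` window. Against one that ignores `wait`, the loop degrades to plain interval polling instead of a busy loop.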

The async-job pattern is what docling's own reference Python client uses ([`docling/service_client/client.py`](https://github.com/docling-project/docling/blob/main/docling/service_client/client.py)) for exactly this case, so the proposal mirrors upstream convention.

**Describe alternatives you've considered**

- **Bump `DOCLING_SERVE_MAX_SYNC_WAIT` server-side.** Solves the docling-serve cap but leaves every intermediate proxy timeout in place. Brittle and ops-config-dependent; doesn't scale beyond a single known network path.
- **Subclass `DoclingServeConverter` downstream.** Currently doable (the `_post_file*` helpers are subclass-friendly), but every consumer reinvents the same ~100 lines of polling logic. Better to land it once upstream.

**Additional context**

- Backwards compatible: existing users see no change with default `mode="sync"`.

docling_serve: support the async-job endpoint for long-running conversions #3345