Releases: docling-project/docling-graph
v1.5.0
Features
Installation Flexibility
- Add an optional vlm installation mode so users can install VLM-related dependencies only when needed (46f1ebb)
LiteLLM Client
- Add response streaming support to the LiteLLM client path for incremental output handling (311c28e)
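Incremental output handling can be sketched as a loop that accumulates text deltas as they arrive. LiteLLM's completion(..., stream=True) yields OpenAI-style chunks whose new text sits at choices[0].delta.content. This sketch takes plain dicts of that shape so it runs without a provider. The function name collect_stream and the on_delta callback are hypothetical and are not docling-graph's client API.

```python
from typing import Callable, Iterable, Optional


def collect_stream(chunks: Iterable[dict], on_delta: Optional[Callable[[str], None]] = None) -> str:
    """Accumulate OpenAI-style streaming chunks into the final text (hypothetical helper).

    Each chunk is assumed to look like {"choices": [{"delta": {"content": "..."}}]};
    chunks without content (for example the final stop chunk) are skipped.
    """
    parts: list[str] = []
    for chunk in chunks:
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            parts.append(text)
            if on_delta is not None:
                on_delta(text)  # e.g. show progress while tokens are still arriving
    return "".join(parts)
```

With real LiteLLM stream objects you would read chunk.choices[0].delta.content via attribute access instead of dict lookups.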
Visualizer Export
- Embed Cytoscape directly in generated HTML visualizations so exports remain self-contained and usable offline (00c3f27)
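The offline-export idea can be illustrated with a minimal sketch: rather than loading Cytoscape.js from a CDN via a script src attribute, the library source is written inline into the page, so the file works without network access. The helper name render_self_contained_html, the container id "cy", and the page layout are assumptions for illustration, not docling-graph's actual exporter.

```python
import json
import re

# Matches "</script" in any letter case, which would otherwise end an inline <script> early.
_SCRIPT_CLOSE = re.compile(r"</(script)", re.IGNORECASE)


def render_self_contained_html(cytoscape_js: str, elements: list[dict]) -> str:
    """Return one HTML page with the Cytoscape.js source inlined (hypothetical helper)."""
    safe_js = _SCRIPT_CLOSE.sub(r"<\\/\1", cytoscape_js)
    # "<\/" is a valid JSON escape for "</", so the element data stays parseable.
    safe_data = json.dumps(elements).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Graph</title></head>
<body>
<div id="cy" style="width:100%;height:100vh"></div>
<script>{safe_js}</script>
<script>
cytoscape({{container: document.getElementById("cy"), elements: {safe_data}}});
</script>
</body>
</html>"""
```

The trade-off is file size: every export carries the full library, in exchange for working offline and not depending on a third-party CDN.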
Chores
Dependencies
- Refresh runtime and dev dependencies, including updates to aiohttp, litellm, pillow, pygments, requests, tornado, rich, and pytest (216026a, a78f9a1, f4dd5cb, 3361b85, bb35933, 92e68cd, a8d0eeb, 94ff305)
- Refresh GitHub Actions versions in CI workflows, including major-version updates for checkout, artifact handling, Codecov, and GitHub App token generation (7b48a58, 117fe7c)
Test Coverage
- Expand test coverage for untested branches and edge cases in core paths (39d9d5e)
v1.4.4
Staged Extraction
Bug Fixes
- Canonicalize IDs, improve parent lookup and node discovery (a551d38)
Refactoring
- Support nested paths with parent-aware dedup and shared fill reuse (f210be9)
- Apply ruff formatting to staged backend ops test (b83a20c)
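Parent-aware dedup can be sketched as follows: the dedup key combines the parent path with the child's own id, so identical child ids under different parents stay separate while true duplicates under one parent merge. The record fields parent_path and id, the function name, and the gap-filling rule are hypothetical, not docling-graph's internal representation.

```python
from typing import Any

_EMPTY = (None, "", [])


def dedup_with_parent(items: list[dict[str, Any]], id_field: str = "id") -> list[dict[str, Any]]:
    """Merge duplicate child records only when they share the same parent (hypothetical)."""
    merged: dict[tuple[str, str], dict[str, Any]] = {}
    for item in items:
        # Same child id under two different parents yields two different keys.
        key = (item["parent_path"], str(item[id_field]))
        if key not in merged:
            merged[key] = dict(item)
            continue
        # Fill empty fields of the kept record with non-empty values from the duplicate.
        existing = merged[key]
        for field, value in item.items():
            if existing.get(field) in _EMPTY and value not in _EMPTY:
                existing[field] = value
    return list(merged.values())
```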
Tests
- Add coverage for backend_ops and orchestrator quality gates (eba5445)
- Cover list-under-list dedup, merge, and fill reuse semantics (34e59ce)
Documentation
- Document many-to-many behavior for nested list paths (876c943)
v1.4.3
Chores
Continuous Integration
- Ensure GitHub release includes notes and assets in Release workflow (485b875)
- Reset working tree before rebase in semantic-release push step (e31c29b)
- Preserve release tag across rebase in semantic-release (84b4e51)
GitHub Pages
- Refresh install steps and GitHub Pages content (07e1136)
v1.4.0
Features
Core Entity Consistency
- Add entity name normalizer and sentence-aware description merge helper to improve identity stability and merged descriptions (8e5858a)
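A minimal sketch of both helpers, assuming simple rules: names are reduced to a comparison key (Unicode NFKC, casefolding, punctuation removal, whitespace collapsing), and descriptions are merged sentence by sentence so repeated sentences are not duplicated. The function names and normalization rules are hypothetical; docling-graph's actual normalizer may differ.

```python
import re
import unicodedata

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def normalize_entity_name(name: str) -> str:
    """Return a comparison key for an entity name (hypothetical rules)."""
    # NFKC folds compatibility characters (e.g. full-width letters) to a canonical form.
    text = unicodedata.normalize("NFKC", name).casefold()
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def merge_descriptions(first: str, second: str) -> str:
    """Append sentences from `second` that are not already present in `first`."""
    kept = [s.strip() for s in _SENTENCE_SPLIT.split(first.strip()) if s.strip()]
    seen = {normalize_entity_name(s) for s in kept}
    for sentence in _SENTENCE_SPLIT.split(second.strip()):
        sentence = sentence.strip()
        key = normalize_entity_name(sentence)
        if sentence and key not in seen:
            kept.append(sentence)
            seen.add(key)
    return " ".join(kept)
```

For example, "ACME, Inc." and "Acme Inc" both normalize to "acme inc", so they would be treated as the same entity.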
Local Inference
- Support LM Studio as a local inference provider via LiteLLM (lm_studio/<model>, LM_STUDIO_API_BASE) (d317459)
Delta Extraction
- Persist orphan parent_ids on salvage and reattach orphans by parent-id match when multiple parent candidates exist (ead9096)
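The reattachment step can be sketched like this: each salvaged orphan keeps the parent ids recorded for it, and instead of guessing among several candidate parents, it is attached to the parent whose id matches. The Node and Orphan classes and the reattach_orphans function are hypothetical stand-ins, not docling-graph's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class Orphan:
    id: str
    parent_ids: list[str]  # candidate parent ids persisted during salvage


def reattach_orphans(nodes: dict[str, Node], orphans: list[Orphan]) -> list[Orphan]:
    """Attach each orphan to the first persisted parent id that exists (hypothetical).

    Returns the orphans that still have no matching parent.
    """
    unresolved = []
    for orphan in orphans:
        parent = next((nodes[pid] for pid in orphan.parent_ids if pid in nodes), None)
        if parent is None:
            unresolved.append(orphan)
        else:
            parent.children.append(Node(orphan.id))
    return unresolved
```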
Bug Fixes
Delta Extraction Quality Gates
- Allow disabling the parent_lookup_miss gate with -1 and relax adaptive thresholding for large graphs (d25567d)
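The "-1 disables the gate" convention can be illustrated with a small check function. Only the -1 sentinel comes from the note above; the function name and the adaptive rule (allowing up to 1% of nodes to miss on large graphs) are invented for illustration and do not reflect docling-graph's actual thresholds.

```python
def parent_lookup_gate(misses: int, total_nodes: int, max_misses: int) -> bool:
    """Return True when an extraction passes a parent-lookup-miss gate (hypothetical).

    max_misses == -1 disables the gate entirely.
    """
    if max_misses == -1:
        return True  # gate disabled
    # Hypothetical adaptive relaxation: large graphs may miss up to 1% of nodes.
    threshold = max(max_misses, total_nodes // 100)
    return misses <= threshold
```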
Extraction Logging
- Add extraction-phase progress logs (contract-prefixed + “Calling LLM…”) and move raw extracted payloads to trace/debug only (dd32f66)
Schema Validation
- Strengthen schema validation/guidance/dedup patterns and add a domain-agnostic string coercion fallback in the LLM backend (61e7520)
Documentation
Schema-Definition Guides
- Update schema-definition docs with best practices for descriptive IDs, enum synonyms, validators, where-to-look hints, and deduplicating root-level lists under chunked extraction (7db205d)
Repository References
- Update documentation to the new repository URL (f14fcd9)
Tests
Coverage
v1.3.1
Features
Metadata Export
- Export the full effective config (including defaults) and preserve staged benchmark compatibility (6877b38)
Delta Extraction
- Backfill root ids, normalize paths, and repair scalar id fields before validation (6877b38)
- Infer missing list parents and backfill ids to improve attachment quality (6877b38)
Chores
Dependencies
- Bump pillow from 11.3.0 to 12.1.1 (9faa614)
v1.3.0
Features
Delta Extraction (new)
- Add opt-in delta contract with flat graph IR batching, global merge/dedup, and template projection controls (0b19e08)
- Update unit and integration coverage for delta batching/merge/projection and contract routing (8150def)
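The batching-plus-global-merge idea can be sketched with a flat graph representation: each batch yields a list of nodes and a list of edges, and a global pass deduplicates nodes by (type, id) and edges by (source, label, target). The batch layout, key choice, and "first value wins" property rule are assumptions for illustration, not the actual delta IR.

```python
from typing import Any

Node = dict[str, Any]
Edge = tuple[str, str, str]  # (source_id, label, target_id)


def merge_batches(batches: list[dict[str, list]]) -> dict[str, list]:
    """Merge per-batch flat graphs into one graph with global dedup (hypothetical IR).

    Each batch is {"nodes": [{"type": ..., "id": ..., ...}], "edges": [(src, label, dst), ...]}.
    """
    nodes: dict[tuple[str, str], Node] = {}
    edges: dict[Edge, None] = {}  # a dict keeps first-seen order while deduplicating
    for batch in batches:
        for node in batch["nodes"]:
            existing = nodes.setdefault((node["type"], node["id"]), {})
            # Later batches only add properties that earlier batches did not set.
            for prop, value in node.items():
                existing.setdefault(prop, value)
        for edge in batch["edges"]:
            edges[tuple(edge)] = None
    return {"nodes": list(nodes.values()), "edges": list(edges)}
```

Keeping the IR flat means batches can be merged without walking nested template structures; projecting back into the template can then happen once, after the merge.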
LLM Extraction Pipeline
- Harden the LLM pipeline with contract dispatch, staged extraction, deterministic merge, and observability (92a5089)
- Improve catalog definition, flatten ID discovery, and add validation retries (a1aba89)
Structured Output (default ON)
- Enable default schema-enforced structured output via LiteLLM with prompt-schema fallback (6e96f54)
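The fallback pattern can be sketched as: first request schema-enforced output; if the provider rejects it or returns unparseable JSON, embed the schema in the prompt and parse the free-text reply. The two callables stand in for provider calls and their signatures are hypothetical; this is not the LiteLLM or docling-graph API.

```python
import json
from typing import Any, Callable

# Hypothetical signatures: a call that enforces a JSON schema, and a plain text call.
StructuredCall = Callable[[str, dict[str, Any]], str]
TextCall = Callable[[str], str]


def extract_json(prompt: str, schema: dict[str, Any],
                 structured: StructuredCall, plain: TextCall) -> dict[str, Any]:
    """Try schema-enforced output first; fall back to putting the schema in the prompt."""
    try:
        return json.loads(structured(prompt, schema))
    except Exception:  # broad on purpose in this sketch: provider errors vary
        fallback_prompt = (
            f"{prompt}\n\nReturn only JSON matching this schema:\n"
            f"{json.dumps(schema, indent=2)}"
        )
        return json.loads(plain(fallback_prompt))
```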
Custom LLM Endpoints
- Support custom OpenAI-compatible endpoints via env-based auth and init scaffolding (0bebc44)
Refactoring
Input ingestion
- Unify ingestion via Docling conversion with DoclingDocument passthrough (689426b)
Trace & debug
- Revamp debug trace_data into a chronological event log (4ba4b5b)
- Improve stage naming and split serializer into helpers (0378f65)
Documentation
GitHub Pages & traces
- Refresh pages with updated output handling and debug artifacts (07e0cbc)
Delta extraction
- Document delta extraction contract (flat graph IR), config/CLI flags, and migration notes (66aa6be)
Staged extraction
- Update the staged extraction docs and the schema definition and performance tuning guides (bfabcbb)
Bug Fixes
Delta Extraction Quality
- Prevent spurious list-entity nodes by adding identity allowlists and post-merge filtering (f45f790)
- Improve entity ID quality by limiting index-based ID inference and enabling content-based dedup (4767e26)
Continuous Integration
- Remove unused mypy ignores for rapidfuzz and spacy imports (9fa8f75)
v1.2.4
Refactoring
LLM Clients (Gateway)
- Consolidate all local + remote LLM providers behind a single LiteLLM gateway, simplifying provider routing and making LiteLLM the canonical inference entrypoint (94cad99)
Trace & Debug
- Unify TraceData and debug flow: one debug switch consistently enables in-memory TraceData on the pipeline context and preserves existing on-disk debug artifacts (d25ef06)
Documentation
Custom LLM Clients
- Document and exemplify a “bring your own” LLM client via PipelineConfig.llm_client / LLMClientProtocol, enabling custom inference URLs and auth/headers while reusing docling-graph prompts and schema parsing (87e730a)
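The "bring your own client" idea relies on structural typing: the pipeline depends on a protocol, and any class with matching methods can be passed in. The sketch below uses a hypothetical protocol, TextClient, with a single complete(prompt) method; the real LLMClientProtocol defines its own methods, which are not reproduced here. The base URL and header handling are also illustrative.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextClient(Protocol):
    """Hypothetical stand-in for a client protocol; not docling-graph's actual interface."""

    def complete(self, prompt: str) -> str: ...


class MyGatewayClient:
    """Custom client: any class with a matching `complete` method satisfies the protocol."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        # Custom auth headers would be sent with every request to base_url.
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def complete(self, prompt: str) -> str:
        # A real client would POST the prompt to self.base_url; the sketch just echoes it.
        return f"echo: {prompt}"


def run_extraction(client: TextClient, prompt: str) -> str:
    # The caller depends only on the protocol, not on a concrete provider class.
    return client.complete(prompt)
```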
Chores
Compatibility
- Use typing_extensions.Self in Pydantic templates for Python 3.10 compatibility (912e16c)
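typing.Self was added in Python 3.11 (PEP 673), so code that must also run on 3.10 needs the backport from typing_extensions. The version-conditional import below is one common way to do this; the release note suggests docling-graph imports from typing_extensions directly. The Address class is a hypothetical example, not one of the project's templates.

```python
import sys

if sys.version_info >= (3, 11):
    from typing import Self
else:  # Python 3.10: Self is only available from typing_extensions
    from typing_extensions import Self


class Address:
    def __init__(self) -> None:
        self.city: str | None = None

    def with_city(self, city: str) -> Self:
        # Returning Self keeps the precise subclass type for chained calls.
        self.city = city
        return self
```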
Dependencies
v1.2.3
v1.2.2
Bug Fixes
Graph Converters
- Preserve component (non-entity) data during graph pruning to avoid dropping nested structures (addresses, totals, etc.) (8552ea5)
- Tighten error logging and improve node-id collision detection for more reliable graph builds (bafab02)
- Auto-clean empty output directories on pipeline failure when dump_to_disk is enabled (preserves partial results) (9e4c031)
- Add a User-Agent header for URL downloads (HEAD + GET) to avoid HTTP 403 responses; add regression tests (77dbd02)
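A sketch of the User-Agent fix using only the standard library: both the HEAD probe and the GET download carry an explicit User-Agent, because some servers answer 403 to urllib's default "Python-urllib/x.y" agent. The agent string and function name are hypothetical; the note does not say which HTTP client or agent value docling-graph uses.

```python
import urllib.request

# Hypothetical value; docling-graph's actual User-Agent string is not stated here.
USER_AGENT = "docling-graph-example/0.1"


def build_download_requests(url: str) -> tuple[urllib.request.Request, urllib.request.Request]:
    """Build HEAD and GET requests that both carry an explicit User-Agent header."""
    headers = {"User-Agent": USER_AGENT}
    head = urllib.request.Request(url, headers=headers, method="HEAD")
    get = urllib.request.Request(url, headers=headers, method="GET")
    return head, get
```

Each request would then be sent with urllib.request.urlopen(request, timeout=...). Note that urllib stores header names capitalized ("User-agent"), which matters when reading them back with get_header.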
Extractors & Visualization
- Make Pydantic schema validators lenient (coerce instead of reject) and log coercions for data-quality tracking (fb1bb37)
- Render nested node/edge details as formatted JSON (fixes "[object Object]" in the interactive viewer) (014778a)
Refactoring
BillingDocument Template
- Add a comprehensive billing/invoice Pydantic extraction template (
21a2200) - Simplify BillingDocument schema + prompts to improve extraction consistency and reduce unnecessary nesting (
81fdbc9)
Documentation
- Update the problem statement to reflect recent docling-graph improvements and capabilities (872a3a3)
- Refine wording and improve styling (e7b7f0a)
- Update examples and navigation to align with BillingDocument schema references (c254825)
Chores
Dependencies
- Update aiofiles requirement (afc6d52)
Continuous Integration
v1.2.1
Documentation
GitHub Pages
- Improve doc accuracy by aligning API references and updating version strings (ed3317d)
- Reduce redundancy and improve navigation; standardize callouts by converting Note and Important blocks to MkDocs admonitions (ed3317d)
Examples
- Add updated end-to-end example scripts, ordered from beginner to advanced, and update navigation to match the new learning path (f22395b)
- Refresh CLI documentation references to point to the new examples set (92d04c9)