Releases: docling-project/docling-graph

v1.5.0

19 Apr 13:18

Features

Installation Flexibility
LiteLLM Client
  • Add response streaming support to the LiteLLM client path for incremental output handling (311c28e)
Visualizer Export
  • Embed Cytoscape directly in generated HTML visualizations so exports remain self-contained and usable offline (00c3f27)
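
To illustrate what a self-contained export means in practice, here is a minimal sketch, not the project's actual exporter. The function name `build_self_contained_html` and the page layout are hypothetical; the only real API assumed is Cytoscape.js's `cytoscape({container, elements})` initializer. The point is that the library source is inlined in a `<script>` tag rather than loaded from a CDN, so the file opens without network access.

```python
def build_self_contained_html(elements_json: str, cytoscape_js: str) -> str:
    """Return a single HTML document with the Cytoscape.js source inlined.

    Hypothetical helper: `cytoscape_js` is the library source as text and
    `elements_json` is a JSON array of Cytoscape elements.
    """
    # A literal "</script>" inside inlined text would close the tag early,
    # so escape the slash. "<\/" is still valid inside JS and JSON strings.
    safe_lib = cytoscape_js.replace("</script>", "<\\/script>")
    safe_data = elements_json.replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f"<script>{safe_lib}</script>\n"
        "</head><body>\n"
        '<div id="cy" style="width:100%;height:100vh"></div>\n'
        "<script>\n"
        f"const elements = {safe_data};\n"
        "cytoscape({container: document.getElementById('cy'), elements});\n"
        "</script>\n"
        "</body></html>\n"
    )
```

In a real exporter the library text would typically be read from a file vendored with the package; where that file lives is left open here.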

Chores

Dependencies
  • Refresh runtime and dev dependencies, including updates to aiohttp, litellm, pillow, pygments, requests, tornado, rich, and pytest (216026a, a78f9a1, f4dd5cb, 3361b85, bb35933, 92e68cd, a8d0eeb, 94ff305)
  • Refresh GitHub Actions versions in CI workflows, including major-version updates for checkout, artifact handling, Codecov, and GitHub App token generation (7b48a58, 117fe7c)
Test Coverage
  • Expand test coverage for untested branches and edge cases in core paths (39d9d5e)

v1.4.4

18 Feb 15:47

Staged Extraction

Bug Fixes
  • Canonicalize IDs, improve parent lookup and node discovery (a551d38)
Refactoring
  • Support nested paths with parent-aware dedup and shared fill reuse (f210be9)
  • Apply ruff formatting to staged backend ops test (b83a20c)
Tests
  • Add coverage for backend_ops and orchestrator quality gates (eba5445)
  • Cover list-under-list dedup, merge, and fill reuse semantics (34e59ce)
Documentation
  • Document many-to-many behavior for nested list paths (876c943)

v1.4.3

18 Feb 07:03

Chores

Continuous Integration
  • Ensure GitHub release includes notes and assets in Release workflow (485b875)
  • Reset working tree before rebase in semantic‑release push step (e31c29b)
  • Preserve release tag across rebase in semantic‑release (84b4e51)
GitHub Pages
  • Refresh install steps and GitHub Pages content (07e1136)

v1.4.0

18 Feb 01:31

Features

Core Entity Consistency
  • Add entity name normalizer and sentence-aware description merge helper to improve identity stability and merged descriptions (8e5858a)
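
As a rough illustration of the idea behind this entry, the sketch below shows one way a name normalizer and a sentence-aware description merge could work. It is an assumption-laden example, not the code from commit 8e5858a: the function names `normalize_entity_name` and `merge_descriptions`, the NFKC/casefold normalization, and the punctuation-based sentence split are all hypothetical choices.

```python
import re
import unicodedata

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def normalize_entity_name(name: str) -> str:
    """Fold a name into a stable key so spelling variants compare equal."""
    # NFKC folds compatibility forms (e.g. full-width letters) together.
    text = unicodedata.normalize("NFKC", name)
    # Collapse whitespace runs, trim, and casefold for case-insensitive matching.
    return re.sub(r"\s+", " ", text).strip().casefold()


def merge_descriptions(existing: str, incoming: str) -> str:
    """Append only the sentences from `incoming` that `existing` lacks."""
    seen: set[str] = set()
    merged: list[str] = []
    for text in (existing, incoming):
        for sentence in _SENTENCE_END.split(text.strip()):
            # Compare sentences without trailing punctuation or case.
            key = normalize_entity_name(sentence.rstrip(".!?"))
            if key and key not in seen:
                seen.add(key)
                merged.append(sentence.strip())
    return " ".join(merged)
```

Merging at sentence granularity keeps repeated facts from piling up when the same entity is described in several chunks, while still preserving sentences that add new information.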
Local Inference
  • Support LM Studio as a local inference provider via LiteLLM (lm_studio/<model>, LM_STUDIO_API_BASE) (d317459)
Delta Extraction
  • Persist orphan parent_ids on salvage and reattach orphans by parent-id match when multiple parent candidates exist (ead9096)

Bug Fixes

Delta Extraction Quality Gates
  • Allow disabling parent_lookup_miss gate with -1 and relax adaptive thresholding for large graphs (d25567d)
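
The `-1` convention can be pictured with a small sketch. Everything here except the sentinel value is an assumption: the function name `parent_lookup_gate_passes` is hypothetical, the threshold is treated as a maximum miss count (the real gate may use a ratio or an adaptive limit), and the adaptive thresholding mentioned above is not modeled.

```python
def parent_lookup_gate_passes(miss_count: int, max_misses: int) -> bool:
    """Return True when the extraction result may proceed past the gate.

    Hypothetical sketch: `max_misses` is assumed to be an absolute count.
    """
    # -1 is the sentinel that switches the gate off entirely.
    if max_misses == -1:
        return True
    # Otherwise the gate passes only while misses stay within the limit.
    return miss_count <= max_misses
```

Using a sentinel rather than a very large threshold makes the intent explicit in configuration and avoids guessing a value that is "large enough" for big graphs.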
Extraction Logging
  • Add extraction-phase progress logs (contract-prefixed + “Calling LLM…”) and move raw extracted payloads to trace/debug only (dd32f66)
Schema Validation
  • Strengthen schema validation/guidance/dedup patterns and add a domain-agnostic string coercion fallback in the LLM backend (61e7520)

Documentation

Schema-Definition Guides
  • Update schema-definition docs with best practices for descriptive IDs, enum synonyms, validators, where-to-look hints, and deduplicating root-level lists under chunked extraction (7db205d)
Repository References
  • Update documentation to the new repository URL (f14fcd9)

Tests

Coverage
  • Add targeted unit tests to raise Codecov patch coverage for newly changed branches (c39e02c)
  • Further extend test coverage for remaining uncovered paths (5ec1562)

v1.3.1

16 Feb 09:09

Features

Metadata Export
  • Export full effective config (including defaults) and preserve staged benchmark compatibility (6877b38)
Delta Extraction
  • Backfill root ids, normalize paths, and repair scalar id fields before validation (6877b38)
  • Infer missing list parents and backfill ids to improve attachment quality (6877b38)

Chores

Dependencies
  • Bump pillow from 11.3.0 to 12.1.1 (9faa614)

v1.3.0

15 Feb 21:05

Features

Delta Extraction (new)
  • Add opt-in delta contract with flat graph IR batching, global merge/dedup, and template projection controls (0b19e08)
  • Update unit + integration coverage for delta batching/merge/projection and contract routing (8150def)
LLM Extraction Pipeline
  • Harden LLM pipeline with contract dispatch, staged extraction, deterministic merge, and observability (92a5089)
  • Improve catalog definition, flatten ID discovery & add validation retries (a1aba89)
Structured Output (default ON)
  • Enable default schema-enforced structured output via LiteLLM with prompt-schema fallback (6e96f54)
Custom LLM Endpoints
  • Support custom OpenAI-compatible endpoints via env-based auth and init scaffolding (0bebc44)

Refactoring

Input ingestion
  • Unify ingestion via Docling conversion with DoclingDocument passthrough (689426b)
Trace & debug
  • Revamp debug trace_data into a chronological event log (4ba4b5b)
  • Improve stage naming and split serializer into helpers (0378f65)

Documentation

GitHub Pages & traces
  • Refresh pages with updated output handling and debug artifacts (07e0cbc)
Delta extraction
  • Document delta extraction contract (flat graph IR), config/CLI flags, and migration notes (66aa6be)
Staged extraction
  • Update staged extraction docs, schema definition and performance tuning guides (bfabcbb)

Bug Fixes

Delta Extraction Quality
  • Prevent spurious list-entity nodes by adding identity allowlists and post-merge filtering (f45f790)
  • Improve entity ID quality by limiting index-based ID inference and enabling content-based dedup (4767e26)
Continuous Integration
  • Remove unused mypy ignores for rapidfuzz and spacy imports (9fa8f75)

v1.2.4

10 Feb 13:33

Refactoring

LLM Clients (Gateway)
  • Consolidate all local + remote LLM providers behind a single LiteLLM gateway, simplifying provider routing and making LiteLLM the canonical inference entrypoint (94cad99)
Trace & Debug
  • Unify TraceData and debug flow: one debug switch consistently enables in-memory TraceData on the pipeline context and preserves existing on-disk debug artifacts (d25ef06)

Documentation

Custom LLM Clients
  • Document and exemplify “bring your own” LLM client via PipelineConfig.llm_client / LLMClientProtocol, enabling custom inference URLs and auth/headers while reusing docling-graph prompts + schema parsing (87e730a)

Chores

Compatibility
  • Use typing_extensions.Self in Pydantic templates for Python 3.10 compatibility (912e16c)
Dependencies
  • Bump nbconvert from 7.16.6 to 7.17.0 (1485e5d)
  • Update types-setuptools requirement (6e4fc37)

v1.2.3

26 Jan 05:47

Refactoring

Templates
  • Improve slurry-battery rheology Pydantic schema (b737343)

Documentation

GitHub Pages
  • Update references and examples related to the rheology template (35a9243)
  • Fix broken references to the ScholarlyRheologyPaper template (a6d994b)

v1.2.2

26 Jan 03:19

Bug Fixes

Graph Converters
  • Preserve component (non-entity) data during graph pruning to avoid dropping nested structures (addresses, totals, etc.) (8552ea5)
  • Tighten error logging and improve node-id collision detection for more reliable graph builds (bafab02)
  • Auto-clean empty output directories on pipeline failure when dump_to_disk is enabled (preserves partial results) (9e4c031)
  • Add a User-Agent header for URL downloads (HEAD + GET) to avoid HTTP 403 responses; add regression tests (77dbd02)
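
For the User-Agent fix in the last item, a minimal sketch using only the Python standard library might look like the following. This is not the project's downloader: the `USER_AGENT` value and the helper names `build_request` and `download` are hypothetical, and the real code may use a different HTTP client. It shows the same header being attached to both the HEAD probe and the GET download, since some servers answer 403 to requests carrying a default client identifier.

```python
import urllib.request

# Hypothetical identifier; the value the project actually sends is not shown here.
USER_AGENT = "example-downloader/0.1"


def build_request(url: str, method: str) -> urllib.request.Request:
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method=method
    )


def download(url: str, timeout: float = 30.0) -> bytes:
    """Probe the URL with HEAD, then fetch the body with GET."""
    # HEAD first, so an unreachable or forbidden URL fails before any body transfer.
    with urllib.request.urlopen(build_request(url, "HEAD"), timeout=timeout):
        pass
    with urllib.request.urlopen(build_request(url, "GET"), timeout=timeout) as response:
        return response.read()
```

Building both requests through one helper keeps the HEAD and GET paths from drifting apart, which is the kind of mismatch a regression test can pin down without touching the network.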
Extractors & Visualization
  • Make Pydantic schema validators lenient (coerce instead of reject) and log coercions for data-quality tracking (fb1bb37)
  • Render nested node/edge details as formatted JSON (fixes "[object Object]" in the interactive viewer) (014778a)

Refactoring

BillingDocument Template
  • Add a comprehensive billing/invoice Pydantic extraction template (21a2200)
  • Simplify BillingDocument schema + prompts to improve extraction consistency and reduce unnecessary nesting (81fdbc9)

Documentation

  • Update the problem statement to reflect recent docling-graph improvements and capabilities (872a3a3)
  • Refine wording and improve styling (e7b7f0a)
  • Update examples and navigation to align with BillingDocument schema references (c254825)

Chores

Dependencies
  • Update aiofiles requirement (afc6d52)
Continuous Integration
  • Restore default semantic-release templates and regenerate changelog (59b2f43)
  • Treat refactor: commits as patch bumps in semantic-release config (f3debd6)

v1.2.1

25 Jan 17:20

Documentation

GitHub Pages
  • Improve doc accuracy by aligning API references and updating version strings (ed3317d)
  • Reduce redundancy and improve navigation; standardize callouts by converting Note and Important blocks to MkDocs admonitions (ed3317d)
Examples
  • Add updated end-to-end example scripts (beginner to advanced), and update navigation to match the new learning path (f22395b)
  • Refresh CLI documentation references to point to the new examples set (92d04c9)

Bug Fixes

Continuous Integration
  • Remove invalid Dependabot commit-message configuration (include property) to restore valid Dependabot updates config (58f879e)
  • Apply Ruff formatter across the codebase for consistent formatting and cleaner diffs going forward (d58d897)