growgraph · alexander-belikov · May 29, 2026
diff --git a/.env.example b/.env.example
@@ -28,6 +28,10 @@ LLM_PROVIDER=openai
 LLM_MODEL_NAME=gpt-4o-mini
 LLM_TEMPERATURE=0.0
 LLM_API_KEY=your_openai_api_key_here
+# LLM_CACHE_ENABLED=true
+# LLM_CACHE_READ_ONLY=false
+# LLM_MAX_INFLIGHT=16
+# MAX_CONCURRENT_PROCESSES=4
 
 # Ollama Configuration (Alternative)
 # LLM_PROVIDER=ollama

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,10 +8,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **Facts precision/recall/F1** on `POST /match/evaluate` (`fact_precision`, `fact_recall`, `fact_f1` and counts): relational triples only, excluding schema predicates and triples with ontological class/concept nodes in subject or object position.
 - **Anthropic (Claude) and Google (Gemini) LLM providers** via `LLM_PROVIDER=anthropic|google`, with `ClaudeModel` and `GeminiModel` config enums.
 - **Token usage reporting** in `BudgetTracker` when providers return `usage_metadata` on LLM responses (character counts remain the universal fallback).
+- **LLM disk cache controls** on `LLMConfig`: `LLM_CACHE_ENABLED` (default on), `LLM_CACHE_READ_ONLY`, and in-memory plus on-disk stats via `LLMTool.get_cache_stats()`; `GET /info` exposes `llm_cache`.
+- **Global LLM in-flight limit** (`LLM_MAX_INFLIGHT`, default 16) — shared semaphore caps concurrent provider requests across parallel unit workers.
+- **Optional process concurrency cap** (`MAX_CONCURRENT_PROCESSES`) — limits simultaneous `/process` and `/process_unit` handlers (additional requests wait for a slot).
+- **OpenAI Batch API helpers** (`ontocast.tool.llm_batch`) to export chat batch JSONL and import completed results into the LLM disk cache for offline benchmark pre-warming.
+- **`BudgetTracker.cache_hits`** — disk-cache hits count toward character totals but not `calls_count`; included in budget summaries when non-zero.
+- **Structured-document preprocessing** for heading-structured text (papers, reports): optional **Tag Sections** node detects academic-style headings, assigns **section-aligned labels** to semantic chunks via character-span overlap, and `target_sections` filters units before extraction.
+- **Optional chunk summarization** — `summarize_sections` and `summary_max_sentences` on `/process` and CLI (`--summarize-sections`, `--summary-max-sentences`) run a **Summarize Chunks** graph node; ontology/facts render and critic prompts use `ContentUnit.extraction_text` (summary when present, else full chunk text).
 
 ### Changed
+- **LLM caching path** — `complete`, `extract`, `__call__`, and `acall` share one `_invoke_cached` implementation with consistent cache keys (normalized prompt text), optional disable/read-only modes, and provider calls gated by the global in-flight semaphore.
 - **Facts extraction prompts** (`facts_guidelines.py`): clearer two-namespace contract — domain ontology is read-only schema plus optional **reference individuals**; all text-derived occurrences use `cd:` with `lowercase_snake_case` local names. New rules separate **classes** from **instances** (no PascalCase class IRIs in subject/object slots), forbid typing `cd:` entities as `rdfs:Class` / `rdf:Property`, and add a final structural validation checklist before output.
 
 ### Fixed
@@ -20,6 +29,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Documentation
 - User guide: facts two-namespace model (`concepts.md`), facts guidelines vs `facts_user_instruction` (`user_instructions.md`), entity alignment and evaluate semantics (`aggregation.md`, `api.md`, `workflow.md`).
+- User guide: LLM cache configuration, in-flight/process limits, batch pre-warming, and `/info` cache stats (`llm_caching.md`, `configuration.md`, `api.md`, `concepts.md`, `workflow.md`).
+- User guide: structured documents — section tagging, section-aligned chunk labels, `target_sections` / `summarize_sections` (`concepts.md`, `workflow.md`, `api.md`, `configuration.md`).
 
 ## [0.4.0] - 2026-05-26
 

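Reviewer note on the `LLM_MAX_INFLIGHT` changelog entry above: a minimal sketch of how one shared semaphore can cap concurrent provider requests across workers. It assumes an asyncio-based caller. `run_with_limit` and `fake_provider_call` are hypothetical names used only for illustration and are not taken from the OntoCast code.

```python
import asyncio

MAX_INFLIGHT = 2  # stand-in for LLM_MAX_INFLIGHT (the documented default is 16)


async def run_with_limit(n_requests: int, max_inflight: int) -> int:
    """Run n fake provider calls and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_inflight)  # shared by every caller
    in_flight = 0
    peak = 0

    async def fake_provider_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # at most max_inflight callers pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # placeholder for the network round trip
            in_flight -= 1

    await asyncio.gather(*(fake_provider_call() for _ in range(n_requests)))
    return peak


peak_seen = asyncio.run(run_with_limit(n_requests=10, max_inflight=MAX_INFLIGHT))
print(f"peak concurrent calls: {peak_seen}")
```

Callers beyond the limit wait at the semaphore rather than fail, which mirrors the waiting behavior the docs describe for `MAX_CONCURRENT_PROCESSES`.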
diff --git a/docs/assets/graph.lr.png b/docs/assets/graph.lr.png
diff --git a/docs/assets/graph.lr.svg b/docs/assets/graph.lr.svg
diff --git a/docs/assets/graph.png b/docs/assets/graph.png
diff --git a/docs/assets/graph.preview.png b/docs/assets/graph.preview.png
diff --git a/docs/assets/graph.svg b/docs/assets/graph.svg
diff --git a/docs/index.md b/docs/index.md
@@ -28,7 +28,8 @@ OntoCast extracts semantic triples from documents using an agentic, ontology-dri
 - **Triple store integration** — Fuseki, Neo4j (n10s), or filesystem fallback
 - **Tenancy** — partition datasets/collections by tenant and project
 - **REST API** — document processing, ontology catalog management, graph matching
-- **Automatic LLM caching** — built-in response caching
+- **Automatic LLM caching** — disk cache with optional read-only mode, global in-flight limiting, and OpenAI Batch API pre-warming for benchmarks
+- **Structured documents** — optional section tagging, section-aligned chunk labels, section filtering, and LLM summarization before extraction
 
 ---
 
@@ -101,7 +102,7 @@ Document-level pipeline (regenerated via `uv run plot-graph`):
 
 Landscape variant: [graph.lr.png](assets/graph.lr.png). Per-unit render/critic loops are documented in [Workflow](user_guide/workflow.md#per-unit-atomic-loop).
 
-1. Convert → chunk document
+1. Convert → optional tag sections → chunk (semantic) → optional summarize chunks
 2. Parallel ontology render per unit → normalize → optional consolidate → validate
 3. Parallel facts render per unit → merge with disambiguation
 4. Serialize to triple store; return Turtle in API response

diff --git a/docs/user_guide/aggregation.md b/docs/user_guide/aggregation.md
@@ -44,7 +44,9 @@ For evaluation against ground truth, use the match endpoints (see [API Endpoints
 
 - Align entities across multiple graphs globally
 - Derive pairwise predicted↔GT mappings
-- Compute triple and entity precision/recall/F1
+- Compute triple, facts, and entity precision/recall/F1
+
+**Facts vs triple metrics:** triple-level scores count typing and taxonomy (`rdf:type`, `rdfs:subClassOf`, …). **Facts** scores measure only instance-to-instance relations (e.g. book → character via an ontology property), excluding schema predicates and triples that touch class/concept nodes in subject or object position. Relation property IRIs in predicate position still count toward facts.
 
 Entity match payloads accept IRI strings or `URIRef` values; evaluation normalizes to `URIRef` for projection. **Entity false positives/negatives** count unmatched entities in each graph (set difference), so a shared ontology vocabulary IRI matched once is not also counted as an extra false positive on the other side.
 

diff --git a/docs/user_guide/api.md b/docs/user_guide/api.md
@@ -10,15 +10,25 @@ Returns service health. Use for load balancers and readiness probes.
 
 ### `GET /info`
 
-Returns version, configuration summary, and active backend information.
+Returns service metadata, including:
+
+| Field | Description |
+|-------|-------------|
+| `version` | Package version |
+| `llm_cache` | When the LLM tool is initialized: in-memory hit/miss counters plus on-disk cache file stats (`cache_hits`, `cache_misses`, `disk`) |
+| `max_concurrent_processes` | Configured cap on simultaneous `/process` and `/process_unit` handlers, if `MAX_CONCURRENT_PROCESSES` is set |
+
+```bash
+curl http://localhost:8999/info
+```
 
 ---
 
 ## Document Processing
 
 ### `POST /process`
 
-Runs the full document pipeline: convert → chunk → ontology map/reduce → facts map/reduce → serialize.
+Runs the full document pipeline: convert → [tag sections] → chunk → [summarize chunks] → ontology map/reduce → facts map/reduce → serialize. Bracketed stages run only when structured-document parameters are set.
 
 **Content types:**
 
@@ -40,6 +50,18 @@ Runs the full document pipeline: convert → chunk → ontology map/reduce → f
 | `ontology_user_instruction` | Guide ontology extraction |
 | `ontology_selection_user_instruction` | Guide catalog ontology selection |
 | `facts_user_instruction` | Guide facts extraction |
+| `target_sections` | Comma-separated or JSON list; keep only these sections (enables section tagging) |
+| `summarize_sections` | Sections to summarize before extraction; omit to skip. `*` or empty = all chunks |
+| `summary_max_sentences` | Max sentences per summary when summarization runs (default `5`) |
+
+**CLI file processing** (`ontocast --input-path …`) accepts the same structured-document flags:
+
+```bash
+ontocast --input-path ./papers/ \
+  --target-sections results,methods \
+  --summarize-sections results \
+  --summary-max-sentences 5
+```
 
 **Examples:**
 
@@ -60,9 +82,18 @@ curl -X POST "http://localhost:8999/process?strip_provenance=true" \
 # Multi-tenant request
 curl -X POST "http://localhost:8999/process?tenant=acme&project=reports" \
   -F "file=@document.pdf"
+
+# Structured paper: keep Results/Methods, summarize Results only
+curl -X POST "http://localhost:8999/process?target_sections=results,methods&summarize_sections=results&summary_max_sentences=5" \
+  -F "file=@paper.pdf"
+
+# JSON body with section lists
+curl -X POST http://localhost:8999/process \
+  -H "Content-Type: application/json" \
+  -d '{"text": "# Introduction\n...\n## Results\n...", "target_sections": ["results"], "summarize_sections": ["*"], "summary_max_sentences": 5}'
 ```
 
-**Response:** JSON with `data.facts` (Turtle), `data.ontology_artifacts` (list of ontology TTL payloads), and `metadata` (status, chunk counts, budget).
+**Response:** JSON with `data.facts` (Turtle), `data.ontology_artifacts` (list of ontology TTL payloads), and `metadata` (status, chunk counts, budget including `cache_hits` when applicable).
 
 ---
 
@@ -154,9 +185,13 @@ Derive 1:1 predicted↔ground-truth entity matches for one graph pair from align
 
 ### `POST /match/evaluate`
 
-Compute triple and entity precision/recall/F1 given graphs and entity matches. Label triples (`rdfs:label`) are excluded from triple metrics.
+Compute triple, **facts**, and entity precision/recall/F1 given graphs and entity matches.
+
+- **Triple metrics** — all triples except `rdfs:label` (includes `rdf:type` and other schema assertions).
+- **Facts metrics** — relational assertions only: excludes schema predicates (`rdf:type`, `rdfs:subClassOf`, `rdfs:comment`) and any triple whose subject or object is an ontological (class/concept) URIRef. Ontology **relation** IRIs used only as predicates (e.g. `.../relations#P674`) are not treated as ontological entities.
+- **Entity metrics** — true positives = number of accepted entity matches; false positives = predicted entities not in the matched set; false negatives = ground-truth entities not in the matched set (set-based, so correctly matched shared vocabulary IRIs are not double-penalized).
 
-Entity metrics: true positives = number of accepted entity matches; false positives = predicted entities not in the matched set; false negatives = ground-truth entities not in the matched set (set-based, so correctly matched shared vocabulary IRIs are not double-penalized).
+Response fields: `precision` / `recall` / `f1` (triples), `fact_precision` / `fact_recall` / `fact_f1` (facts), `entity_precision` / `entity_recall` / `entity_f1` (entities), plus TP/FP/FN counts for each tier.
 
 **Standalone CLI:**
 
@@ -178,6 +213,9 @@ match-dirs \
 | `400` | Invalid parameters (e.g. missing fixed ontology id) |
 | `409` | Vector store unavailable when vector ontology mode requested |
 | `500` | Processing or store errors |
+| `503` | Health check: LLM not initialized or health probe failure |
+
+When `MAX_CONCURRENT_PROCESSES` is set, additional `/process` and `/process_unit` requests **wait** until a handler slot is free (they are not rejected with 503).
 
 Vector mode unavailable:
 
@@ -193,5 +231,7 @@ Vector mode unavailable:
 ## Related
 
 - [Configuration](configuration.md) — server and tool settings
+- [LLM Caching](llm_caching.md) — disk cache, in-flight limits, batch pre-warming
 - [User Instructions](user_instructions.md) — guiding extraction
 - [Workflow](workflow.md) — what happens inside `/process`
+- [Structured documents](concepts.md#structured-documents-optional) — section tagging and summarization
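Reviewer note on the `POST /match/evaluate` tiers described in this file: a minimal sketch of how the triple tier and the facts tier could be separated. It represents triples as plain string tuples and takes the set of class/concept nodes as an input. How OntoCast actually identifies those nodes is not shown in the diff, so `class_nodes`, `triple_tier`, and `facts_tier` are hypothetical. Dropping `rdfs:label` from the facts tier is also an assumption here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LABEL = RDFS + "label"
# Schema predicates listed in the api.md text for the facts tier.
SCHEMA_PREDICATES = {RDF_TYPE, RDFS + "subClassOf", RDFS + "comment"}


def triple_tier(triples):
    """All triples except rdfs:label (typing and taxonomy still count)."""
    return [t for t in triples if t[1] != LABEL]


def facts_tier(triples, class_nodes):
    """Relational assertions only: no schema predicates, no class nodes
    in subject or object position. Predicates are never checked against
    class_nodes, so relation IRIs in predicate position still count."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p != LABEL  # assumption: labels are not facts either
        and p not in SCHEMA_PREDICATES
        and s not in class_nodes
        and o not in class_nodes
    ]


EX = "http://example.org/"  # illustrative namespace
triples = [
    (EX + "book1", RDF_TYPE, EX + "Book"),
    (EX + "Book", RDFS + "subClassOf", EX + "Work"),
    (EX + "book1", LABEL, "Book 1"),
    (EX + "book1", EX + "hasCharacter", EX + "char1"),
    (EX + "book1", EX + "genre", EX + "Novel"),  # object is a concept node
]
class_nodes = {EX + "Book", EX + "Work", EX + "Novel"}

all_scored = triple_tier(triples)
facts = facts_tier(triples, class_nodes)
```

In this example only the `hasCharacter` triple survives the facts filter, while four of the five triples remain in the triple tier.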
diff --git a/docs/user_guide/concepts.md b/docs/user_guide/concepts.md
@@ -31,6 +31,30 @@ OntoCast uses **pyoxigraph** for RDF 1.2 quoted-triple syntax and separates prov
 
 See [Workflow](workflow.md#4-ontology-reduce-document-level).
 
+## Structured documents (optional)
+
+For papers and other heading-structured Markdown text, `/process` and `ontocast --input-path` accept optional parameters. When both `target_sections` and `summarize_sections` are omitted, the pipeline stays `convert → chunk → extract` with no extra graph nodes.
+
+### Section tagging and section-aligned chunks
+
+1. **Tag Sections** (when `target_sections` or `summarize_sections` is set) scans converted text for academic-style headings (`introduction`, `methods`, `results`, `discussion`, `conclusion`, `future_work`, `limitations`, `related_work`, `background`, and numbered variants).
+2. **Chunk** still uses the semantic `ChunkerTool`; each content unit then gets a `section_label` by **maximum character-span overlap** with detected section ranges (section-aligned labeling, not a separate chunker mode).
+3. **`target_sections`** drops units whose label is not in the allowlist (case-insensitive).
+
+Recognized labels match normalized heading text (underscore form), e.g. `results`, `future_work`.
+
+### Optional summarization
+
+When `summarize_sections` is present (including empty or `*` for all units), the **Summarize Chunks** node runs an LLM pass per selected unit (bounded by `PARALLEL_WORKERS`). Summaries are stored on `ContentUnit.summary`; render and critic agents read `extraction_text`, which prefers the summary over the raw chunk.
+
+| Parameter | Default | Effect |
+|-----------|---------|--------|
+| `target_sections` | omitted | Enable tagging; keep only listed sections (e.g. `results,methods`) |
+| `summarize_sections` | omitted | Enable tagging + summarization node; omit to skip summaries. `*` or empty = all chunks |
+| `summary_max_sentences` | `5` | Max sentences per summary when summarization runs |
+
+Section lists accept comma-separated values or a JSON array in query, form, or JSON body fields.
+
 ## Parallel Map/Reduce
 
 Document processing uses a **parallel map/reduce** architecture:
@@ -96,11 +120,12 @@ Details: [Tenancy](tenancy.md).
 
 ## Budget Tracking
 
-- **LLM Statistics**: API calls, characters sent/received
+- **LLM Statistics**: API calls, characters sent/received; optional token counts when the provider reports usage metadata
+- **Cache hits**: Disk-cache hits increment `cache_hits` and character totals but **not** `calls_count` (a hit consumes no provider tokens)
 - **Triple Metrics**: Ontology and facts triples per operation
 - **Summary Reports**: Logged at end of processing:
   ```
-  LLM: X calls, Y sent, Z received | Triples: A ontology, B facts
+  LLM: X calls, Y sent, Z received, N cache hits | Triples: A ontology, B facts
   ```
 - **BudgetTracker** lives on `AgentState` and per-unit states; merged at reduce stages
 
@@ -114,7 +139,7 @@ Details: [Tenancy](tenancy.md).
 | `UnitOntologyState` / `UnitFactsState` | Per-unit loop state |
 | `ToolBox` | LLM, triple store, chunking, vector store, cache |
 | `GraphUpdate` | Structured SPARQL operations from the LLM |
-| `ContentUnit` | One chunk's ontology/facts outputs |
+| `ContentUnit` | One chunk's text, optional `section_label` / `summary`, and ontology/facts outputs (`extraction_text` for LLM prompts) |
 
 ## Next Steps
 

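Reviewer note on the "Structured documents" hunk above: a minimal sketch of section-aligned labeling by maximum character-span overlap, followed by a case-insensitive `target_sections` filter. Spans are `(start, end)` character offsets with an exclusive end. `overlap`, `label_for_chunk`, and `keep_chunk` are hypothetical helpers, and dropping unlabeled chunks when an allowlist is given is an assumption rather than documented behavior.

```python
from typing import Optional

Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def overlap(a: Span, b: Span) -> int:
    """Number of characters shared by two spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def label_for_chunk(chunk: Span, sections: list[tuple[str, Span]]) -> Optional[str]:
    """Return the label of the section that overlaps the chunk the most."""
    best_label, best_overlap = None, 0
    for label, span in sections:
        size = overlap(chunk, span)
        if size > best_overlap:
            best_label, best_overlap = label, size
    return best_label


def keep_chunk(label: Optional[str], target_sections: list[str]) -> bool:
    """Case-insensitive allowlist check; an empty allowlist keeps everything."""
    if not target_sections:
        return True
    allowed = {name.lower() for name in target_sections}
    return label is not None and label.lower() in allowed


sections = [
    ("introduction", (0, 100)),
    ("methods", (100, 250)),
    ("results", (250, 400)),
]
# This chunk touches all three sections but lies mostly inside "methods".
chunk_label = label_for_chunk((90, 260), sections)
kept = keep_chunk(chunk_label, ["Results", "Methods"])
```

Because labels are assigned after chunking, a chunk that straddles a heading gets a single label instead of being split, which matches the description of labeling as separate from the chunker.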
diff --git a/docs/user_guide/configuration.md b/docs/user_guide/configuration.md
@@ -51,6 +51,14 @@ LLM_BASE_URL=http://localhost:11434     # optional (ollama; anthropic proxy URL)
 
 OntoCast uses `LLM_API_KEY` for all cloud providers (not `ANTHROPIC_API_KEY` / `GOOGLE_API_KEY`).
 
+**Disk cache and provider concurrency** (see [LLM Caching](llm_caching.md)):
+
+```bash
+LLM_CACHE_ENABLED=true          # read/write disk cache (default true)
+LLM_CACHE_READ_ONLY=false       # set true to read from the cache without writing new entries
+LLM_MAX_INFLIGHT=16             # max concurrent provider requests (all documents)
+```
+
 ```bash
 # Anthropic Claude
 LLM_PROVIDER=anthropic
@@ -79,6 +87,7 @@ PARALLEL_WORKERS=4
 PARALLEL_FACTS_RETRIES=3
 PARALLEL_ONTOLOGY_RETRIES=3
 ENABLE_ONTOLOGY_CONSOLIDATION=false
+# MAX_CONCURRENT_PROCESSES=4      # optional cap on simultaneous /process and /process_unit handlers
 ```
 
 ### Chunking
@@ -90,6 +99,27 @@ CHUNK_MIN_SIZE=3000
 CHUNK_MAX_SIZE=12000
 ```
 
+Semantic chunking is configured here. **Section-aligned labels** and filtering are not chunker settings: they run when `/process` or CLI file mode passes `target_sections` and/or `summarize_sections` (see [Structured documents](concepts.md#structured-documents-optional)).
+
+### Structured documents (per request)
+
+No environment variables. Pass these per request on `POST /process` (query string, multipart form, or JSON body) or as flags in CLI file mode:
+
+| Parameter | CLI flag | Description |
+|-----------|----------|-------------|
+| `target_sections` | `--target-sections` | Comma-separated or JSON list; enables tagging and keeps only these sections |
+| `summarize_sections` | `--summarize-sections` | Enables tagging + summarization; `*` or empty = all chunks |
+| `summary_max_sentences` | `--summary-max-sentences` | Max sentences per summary (default `5`) |
+
+```bash
+ontocast --input-path ./papers/ \
+  --target-sections results,methods \
+  --summarize-sections results \
+  --summary-max-sentences 5
+```
+
+Details: [API Endpoints](api.md#post-process), [Workflow](workflow.md#2-chunking-and-optional-structured-preprocessing).
+
 ### Triple Stores
 
 ```bash
@@ -247,6 +277,8 @@ Entity alignment and evaluation endpoints are documented in [API Endpoints](api.
 - `MAX_VISITS` is supported as an alias for `max_visits_per_node`.
 - `RECURSION_LIMIT` was renamed to `BASE_RECURSION_LIMIT`.
 - `WEB_SEARCH_ALLOWED_DOMAINS` and `WEB_SEARCH_BLOCKED_DOMAINS` accept comma-separated values.
+- `LLM_CACHE_ENABLED` and `LLM_CACHE_READ_ONLY` control disk cache read/write behavior.
+- `LLM_MAX_INFLIGHT` must be ≥ 1; `MAX_CONCURRENT_PROCESSES` must be ≥ 1 when set.
 
 ## Recommended Workflow