Architecture

Updated: 2026-04-08

This app has two distinct layers:

semantic interpretation
deterministic PDF writing and release gating

That split is deliberate. Gemini is used where meaning is hard. The PDF writer stays deterministic.

The visible product model is also split cleanly:

release-ready output
manual remediation when trustworthiness is not high enough
optional advanced review only for human-legible visible output

Runtime flow

flowchart TD
    A["Input PDF"] --> B["Classify"]
    B --> C["OCR when needed"]
    C --> D["Structure extraction"]
    D --> E["Canonical document model"]
    E --> F["Semantic unit builder"]
    F --> G["Grounding evidence"]
    G --> G1["page image"]
    G --> G2["crop image"]
    G --> G3["native text"]
    G --> G4["OCR text"]
    G --> G5["nearby context"]
    F --> H["Direct Gemini structured outputs\nFiles API + context cache"]
    H --> I["Resolved semantic decisions"]
    I --> J["Pretag rationalization\n(widgets, figures, structure)"]
    J --> K["Deterministic tagger/remediator"]
    K --> L["veraPDF + fidelity gate"]
    L --> M{"Release-ready?"}
    M -->|Yes| N["Release-ready PDF"]
    M -->|No| O["Manual remediation"]
    N --> P["Optional visible review surface"]

Session boundary

The product uses anonymous browser sessions instead of user accounts.

FastAPI middleware assigns an HTTP-only cookie to each browser
every job row is owned by a hash of that session token
list, detail, download, preview, SSE progress, and review routes are all scoped to the current browser session
jobs and their files are ephemeral and expire after JOB_TTL_HOURS, which defaults to 12

This keeps the app login-free while preventing one browser session from seeing another session's PDFs through the product API. Session scoping only governs access through the product API; semantic adjudication still uses the configured external LLM provider.
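The ownership and expiry rules above can be sketched with the standard library alone. This is a minimal illustration, not the app's middleware: the function names are hypothetical, SHA-256 is an assumed hash choice, and the FastAPI cookie handling itself is left out.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

JOB_TTL_HOURS = 12  # default stated in this document; the constant's location is assumed


def new_session_token() -> str:
    """Random opaque token that would be placed in the HTTP-only cookie."""
    return secrets.token_urlsafe(32)


def owner_hash(session_token: str) -> str:
    """Hash stored on the job row (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(session_token.encode("utf-8")).hexdigest()


def can_access(job_owner_hash: str, session_token: str) -> bool:
    """Scope check: the job is visible only to the session whose hash matches."""
    return secrets.compare_digest(job_owner_hash, owner_hash(session_token))


def is_expired(created_at: datetime, now: datetime, ttl_hours: int = JOB_TTL_HOURS) -> bool:
    """A job is treated as expired once its age reaches the TTL."""
    return now - created_at >= timedelta(hours=ttl_hours)
```

A route handler would call `can_access` before returning any job data and skip jobs for which `is_expired` is true.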

Semantic units

The semantic layer no longer treats text, tables, forms, figures, and TOC candidates as unrelated flows. It normalizes them into local regions with shared evidence:

page number
bounding box
kind candidate
native text candidate
OCR text candidate
image crop
nearby structure context
confidence and provenance

Current semantic-unit families:

suspicious text blocks
reading-order pages
tables
forms
figures
TOC groups
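One way to picture a semantic unit is as a single record type shared by every family. The sketch below is illustrative only: the class and field names are assumptions derived from the evidence list above, and the bounding-box coordinate convention is not specified in this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class UnitFamily(Enum):
    """The six semantic-unit families listed above (identifier names are hypothetical)."""
    SUSPICIOUS_TEXT = "suspicious_text"
    READING_ORDER_PAGE = "reading_order_page"
    TABLE = "table"
    FORM = "form"
    FIGURE = "figure"
    TOC_GROUP = "toc_group"


@dataclass(frozen=True)
class SemanticUnit:
    family: UnitFamily
    page_number: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1); coordinate system assumed
    kind_candidate: str
    native_text: Optional[str] = None
    ocr_text: Optional[str] = None
    crop_image_path: Optional[str] = None
    nearby_context: Tuple[str, ...] = ()
    confidence: float = 0.0
    provenance: str = "unknown"

    def texts_disagree(self) -> bool:
        """True when both text candidates exist and differ after whitespace normalization."""
        if self.native_text is None or self.ocr_text is None:
            return False
        return " ".join(self.native_text.split()) != " ".join(self.ocr_text.split())
```

Because every family carries the same evidence fields, a disagreement check like `texts_disagree` can be written once rather than per flow.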

Gemini's role

Gemini is the primary semantic judge for hard units.

It decides things like:

what assistive tech should hear for a garbled block
which table rows are headers
what a form field should be labeled
whether a figure candidate is actually a figure, a table, or a form region
whether a page region is a TOC group

Gemini is not allowed to write PDF objects directly.
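A simple way to enforce that boundary is to accept model output only as data drawn from a closed set of decision kinds. The sketch below assumes a hypothetical JSON shape (`unit_id`, `decision`, `value`) and hypothetical decision names; the app's real schema is not shown in this document.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical decision kinds, one per bullet in the list above.
ALLOWED_DECISIONS = frozenset(
    {"spoken_text", "table_header_rows", "field_label", "region_kind", "toc_group"}
)
REQUIRED_KEYS = frozenset({"unit_id", "decision", "value"})


@dataclass(frozen=True)
class SemanticDecision:
    unit_id: str
    decision: str
    value: Any


def parse_decision(raw: str) -> SemanticDecision:
    """Turn model output into a typed decision, rejecting anything outside the closed set."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("decision must be a JSON object")
    if set(payload) != REQUIRED_KEYS:
        raise ValueError(f"unexpected or missing keys: {sorted(set(payload) ^ REQUIRED_KEYS)}")
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"unsupported decision kind: {payload['decision']!r}")
    return SemanticDecision(str(payload["unit_id"]), payload["decision"], payload["value"])
```

The deterministic layer then consumes only `SemanticDecision` values, so nothing the model emits can reach the PDF writer as raw PDF syntax.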

Deterministic layer

The deterministic layer is responsible for:

pretag rationalization of suspicious widgets and under-described visual figures
PDF/UA tag tree construction
/ActualText
form /TU
artifacts
bookmarks and TOC structure
font remediation
metadata
final validation and fidelity gating

Main implementation files:

Key services

Canonical model

Generic semantic adjudication

Specialized wrappers over the shared semantic engine

Shared LLM transport

Transport choices

The target transport is direct Gemini for PDF-understanding lanes.

The decision rule is Docling-first:

trust Docling-native title, language, hyperlink/widget metadata, and native TOC when present
escalate to Gemini only when the extracted document evidence is missing, weak, or semantically ambiguous
build Docling-derived ambiguity plans first so Gemini sees only unresolved units, not whole lanes
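The Docling-first rule can be read as a filter over extracted evidence. The following sketch assumes a hypothetical evidence record and an illustrative confidence threshold; neither comes from the codebase.

```python
from dataclasses import dataclass
from typing import List, Optional

WEAK_CONFIDENCE = 0.6  # hypothetical threshold, chosen only for illustration


@dataclass(frozen=True)
class ExtractedEvidence:
    unit_id: str
    value: Optional[str]      # e.g. a Docling-native title or TOC entry
    confidence: float
    ambiguous: bool = False   # set when extraction yields conflicting candidates


def needs_gemini(evidence: ExtractedEvidence) -> bool:
    """Escalate only when the evidence is missing, weak, or semantically ambiguous."""
    missing = evidence.value is None or not evidence.value.strip()
    weak = evidence.confidence < WEAK_CONFIDENCE
    return missing or weak or evidence.ambiguous


def build_ambiguity_plan(evidence: List[ExtractedEvidence]) -> List[str]:
    """Return only the unresolved unit IDs, so the model never sees a whole lane."""
    return [e.unit_id for e in evidence if needs_gemini(e)]
```

Units that pass all three checks keep their Docling-native value and never reach the model.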

Important properties:

Gemini Files API / cached PDF context for reusable document slices
native response_json_schema structured output
candidate-ID adjudication for bookmark and navigation decisions
retry and timeout bounds
audit-grade token and cost tracking
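Candidate-ID adjudication and bounded retries can be sketched without touching the Gemini SDK. The schema shape, field names, and helper names below are assumptions for illustration. The schema dictionary is the kind of value a structured-output request would carry, and the retry helper bounds attempts only, leaving per-call timeouts to the transport.

```python
import time
from typing import Any, Callable, Dict, List


def candidate_choice_schema(candidate_ids: List[str]) -> Dict[str, Any]:
    """JSON Schema that restricts the answer to one of the offered candidate IDs."""
    return {
        "type": "object",
        "properties": {
            "chosen_id": {"type": "string", "enum": list(candidate_ids)},
            "reason": {"type": "string"},
        },
        "required": ["chosen_id"],
    }


def validate_choice(response: Dict[str, Any], candidate_ids: List[str]) -> str:
    """Re-check the answer locally; an ID that was never offered is rejected."""
    chosen = response.get("chosen_id")
    if chosen not in candidate_ids:
        raise ValueError(f"model chose an unknown candidate: {chosen!r}")
    return chosen


def call_with_retries(call: Callable[[], Any], max_attempts: int = 3,
                      backoff_seconds: float = 0.0) -> Any:
    """Run `call` up to `max_attempts` times, then give up with the last error attached."""
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:  # a real transport would catch narrower error types
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Asking the model to pick an ID rather than produce free text means a bookmark or navigation decision can always be traced back to a candidate that the extraction stage actually found.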

Where the chat-completions compatibility endpoint is still used, it should point at Google directly rather than at a proxy.

Release gate

A document is release-ready only when all three are true:

veraPDF says compliant
fidelity says faithful enough
the run ends with status complete, not manual_remediation

Optional visible review items do not block release. Hidden structural blockers still do.
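The gate reduces to a single boolean check. In this sketch the record type and field names are hypothetical, and hidden structural blockers are modeled as an explicit count even though the app may instead fold them into the final status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    verapdf_compliant: bool
    fidelity_passed: bool
    final_status: str              # "complete" or "manual_remediation"
    hidden_blockers: int = 0       # structural problems not visible to a reviewer (assumed field)
    visible_review_items: int = 0  # optional review items; these never block release


def is_release_ready(run: RunResult) -> bool:
    """All three conditions must hold, and no hidden structural blocker may remain."""
    return (
        run.verapdf_compliant
        and run.fidelity_passed
        and run.final_status == "complete"
        and run.hidden_blockers == 0
    )
```

Note that `visible_review_items` is deliberately absent from the check, which mirrors the rule that optional review does not block release.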

Known limits

complex tables still require stronger extraction or manual remediation in some cases
visual WCAG issues such as contrast are not yet a first-class audit layer
math support is conservative formula tagging plus speakable formula text, not rich equation semantics
rich media remains partial
semantic adjudication still depends on good local page/crop evidence
