Release 1.0.0 by devdaviddr · Pull Request #1 · devdaviddr/agentic-rag-app

devdaviddr · 2026-04-26T08:15:39Z

No description provided.

Bootstrap a pnpm-workspace monorepo for a local-first agentic RAG app:

- apps/web: Next.js 15 (App Router) with /api/chat (streaming agent) and /api/ingest endpoints, basic chat UI using ai/react useChat.
- packages/shared: zod schemas for documents/chunks/retrieval + env loader.
- packages/db: Drizzle schema with pgvector (HNSW, cosine) and pg_trgm GIN, postgres-js client singleton, migrate runner.
- packages/rag: recursive chunker, Ollama embedder via ollama-ai-provider, transactional ingest, cosine ANN retrieval.
- packages/agent: Vercel AI SDK streamText loop with search_kb / fetch_doc / list_sources tools and a tool-first system prompt.
- Infra: docker-compose for pgvector/pg16 + Ollama, init SQL enabling vector + pg_trgm extensions, web Dockerfile (standalone output), ollama-pull.sh helper.
- Tooling: TS project references, strict tsconfig, flat ESLint config, Prettier, EditorConfig, .env.example.
- Docs: README quickstart, CONTRIBUTING with gitflow, spec/ design docs (overview, architecture, data model, agent design, API, roadmap), CHANGELOG.
- CI: GitHub Actions workflow (lint + typecheck + build) on main/develop, PR + issue templates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The web service was hidden behind a Compose profile, so `docker compose up` brought up only postgres + ollama. There was also no automation for pulling Ollama models or applying Drizzle migrations, so even running the web container manually would 404 on first chat and crash on first query.

Changes:

- docker-compose.yml: drop the `app` profile from `web`. Add two one-shot services: `ollama-init` (pulls chat + embed models, idempotent) and `migrate` (runs Drizzle migrations against postgres). `web` now waits on both via `service_completed_successfully`.
- apps/web/Dockerfile: add a `migrate` build target that reuses the deps layer and runs `tsx src/migrate.ts` from packages/db. `web` build target is now explicit (`target: runner`).
- packages/db/package.json: add `dotenv` as a runtime dependency (migrate.ts and drizzle.config.ts both import `dotenv/config`).
- README.md: rewrite quickstart to use a single `docker compose up --build` and document the local-dev variant for hot reload.

Result: `docker compose up --build` brings up postgres → ollama → pulls models → applies migrations → starts the web app on :3000, in that order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous Dockerfile only copied /repo/node_modules from the deps stage into the builder and migrate stages. With pnpm workspaces the per-package node_modules (e.g. packages/shared/node_modules) contain symlinks into the hoisted store and must also be present, otherwise `tsc -p` in any workspace package fails with TS2307: Cannot find module 'zod'. Copy the entire /repo tree from deps before overlaying source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Workspace TS packages use NodeNext-style imports (`./env.js` resolving to `./env.ts`), which `tsc --moduleResolution=Bundler` accepts but Next's webpack does not. With `transpilePackages` consuming TS source directly, webpack failed every cross-file import with `Module not found`. Add a `resolve.extensionAlias` mapping `.js` -> `.ts/.tsx/.js` (and `.mjs` -> `.mts/.mjs`) so webpack follows the same convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
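A minimal sketch of the alias mapping described above, written as a helper that could be called from the `webpack` hook in next.config.mjs. The helper name `withExtensionAlias` and the trimmed-down `WebpackLikeConfig` type are assumptions for illustration; only `resolve.extensionAlias` itself comes from the commit.

```typescript
// Only the fields touched here; the real webpack config object is much larger.
type WebpackLikeConfig = {
  resolve?: { extensionAlias?: Record<string, string[]> };
};

// Hypothetical helper: lets webpack resolve NodeNext-style `.js` / `.mjs`
// specifiers to their TypeScript sources, keeping any aliases already set.
function withExtensionAlias(config: WebpackLikeConfig): WebpackLikeConfig {
  const resolve = config.resolve ?? {};
  resolve.extensionAlias = {
    ...resolve.extensionAlias,
    ".js": [".ts", ".tsx", ".js"],
    ".mjs": [".mts", ".mjs"],
  };
  config.resolve = resolve;
  return config;
}

// In next.config.mjs this would be wired roughly as:
//   webpack: (config) => withExtensionAlias(config)
```

Candidates are tried in order, so `./env.js` resolves to `./env.ts` first and falls back to a real `.js` file only when no TypeScript source exists.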

`loadEnv()` ran at module import, which Next evaluates during the "Collecting page data" build phase. Build-time has no DATABASE_URL, so the build crashed before producing an image. Replace top-level singletons with lazy `getEnv()` / `getEmbedder()` helpers that initialize on first call inside the route handlers. Also mark /api/ingest as `force-dynamic` for symmetry with /api/chat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
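The lazy-initialization pattern can be sketched as below. This is an illustration under assumptions: the real `loadEnv()` validates with zod schemas from packages/shared, and `lazy`, `makeGetEnv`, and the single-field `Env` type are hypothetical stand-ins.

```typescript
type Env = { DATABASE_URL: string };
type EnvSource = Record<string, string | undefined>;

// Wraps an initializer so it runs on first call instead of at module import.
// If the initializer throws, nothing is cached and the next call retries.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    cached ??= { value: init() };
    return cached.value;
  };
}

// Stand-in for the real loader (assumption: only DATABASE_URL is checked).
function loadEnv(source: EnvSource): Env {
  const url = source.DATABASE_URL;
  if (!url) throw new Error("DATABASE_URL is not set");
  return { DATABASE_URL: url };
}

// Creating the accessor reads nothing, so a build phase that merely imports
// the module does not need DATABASE_URL to be present.
function makeGetEnv(source: EnvSource): () => Env {
  return lazy(() => loadEnv(source));
}

// In a route handler: const getEnv = makeGetEnv(process.env); getEnv().DATABASE_URL
```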

Next standalone build expects apps/web/public to exist; with no static assets the directory was never created and the runner stage failed at `COPY /repo/apps/web/public`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ming, unmask errors

A multi-agent review found three converging causes of the chat endpoint returning the masked AI SDK error '3:"An error occurred."':

1. Drizzle migrations were never generated. `packages/db/drizzle/` did not exist, so the `migrate` Compose service exited with `Can't find meta/_journal.json file`, postgres had zero tables, and every tool call (search_kb / fetch_doc / list_sources) failed on a missing relation. Generated the initial migration (0000_chilly_union_jack.sql) and committed it. Also commit the root pnpm-lock.yaml so container builds are reproducible.
2. `ollama-ai-provider@1.2.0` + AI SDK v4 `streamText` + tools is broken upstream (vercel/ai#4700); the model factory needs `simulateStreaming: true` so tool rounds are batched and chunked back to the data stream instead of failing silently.
3. Errors were masked into the data stream. Pass `getErrorMessage` to `toDataStreamResponse` and add an `onError` log inside `streamText` so future failures surface their real message instead of '3:"An error occurred."'.

After this commit a fresh `docker compose up --build` should: postgres healthy → ollama-init pulls models → migrate creates schema → web boots and chat works end-to-end (after at least one document is ingested via /api/ingest).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
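For the error-unmasking step, a callback of roughly this shape could be passed as `getErrorMessage` to `toDataStreamResponse`. The body is an assumption, not the committed implementation; it only shows one way to turn an unknown thrown value into a readable string.

```typescript
// Hypothetical body for the getErrorMessage callback: converts whatever was
// thrown into a string that can be written to the data stream.
function getErrorMessage(error: unknown): string {
  if (error == null) return "Unknown error";
  if (typeof error === "string") return error;
  if (error instanceof Error) return error.message;
  try {
    return JSON.stringify(error);
  } catch {
    // Circular structures and similar values fall back to String().
    return String(error);
  }
}
```

Exposing raw messages is convenient for a local-first app; a deployment with untrusted clients would likely want to log the detail server-side and return something less specific.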

ollama-ai-provider's default baseURL is http://localhost:11434/api — the trailing /api is mandatory because the provider concatenates routes like /chat and /embeddings onto it. Passing the bare host (http://ollama:11434) made every request 404, surfacing as `3:"Not Found"` in the chat stream. Add normalizeOllamaBaseURL() in @app/rag and use it from both the agent and the embedder. Accepts bare-host or already-suffixed values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
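A minimal sketch of what `normalizeOllamaBaseURL()` might look like. The function name comes from the commit; the body is an assumption that covers the two accepted inputs (bare host and already-suffixed URL).

```typescript
// Ensures the base URL ends in exactly one "/api" segment, because the
// provider appends routes such as /chat and /embeddings to it.
function normalizeOllamaBaseURL(raw: string): string {
  const trimmed = raw.trim().replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith("/api") ? trimmed : `${trimmed}/api`;
}
```

With this, `http://ollama:11434`, `http://ollama:11434/`, and `http://ollama:11434/api` all normalize to the same value.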

…rvice

Confirmed working end-to-end against an Ollama server on the LAN (http://192.168.4.90:11434) hosting llama3.1:8b + nomic-embed-text: ingested a doc, asked a question, got a tool-grounded cited response.

- docker-compose.yml: remove `ollama` and `ollama-init` services. Web no longer waits on ollama-init. Add OLLAMA_BASE_URL pass-through with the LAN host as the default. Also pass through RAG_TOP_K / RAG_CHUNK_SIZE / RAG_CHUNK_OVERLAP so /api tuning works without code changes.
- .env / .env.example: switch OLLAMA_BASE_URL to the LAN URL and document the alternatives (localhost, host.docker.internal, LAN host).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings the v1.0.0 release branch to a fully working state:

- docker compose up brings the whole stack to a usable state
- Drizzle migrations generated and committed
- AI SDK v4 + ollama-ai-provider workaround (simulateStreaming)
- Ollama base URL normalization (/api suffix)
- Errors unmasked through to the client
- Compose now points at LAN Ollama (http://192.168.4.90:11434), in-stack Ollama service removed
- pnpm-lock.yaml committed for reproducible container builds

Adds a sidebar where users can ingest documents into the knowledge base without leaving the chat UI.

Backend:

- packages/rag: add unpdf dep + extractPdfText() helper. Joins pages with blank lines so the existing chunker treats them as paragraph breaks. Throws on image-only / encrypted PDFs (callers map to 422).
- packages/db: add listDocuments() and deleteDocument() helpers so the web layer doesn't import drizzle-orm directly. The web bundle already failed to build when /api/documents/[id] reached for `eq` from drizzle-orm, so keep ORM use inside @app/db.
- apps/web/api/sources GET: returns documents (id, source, title, chunk count, metadata, created_at) newest-first.
- apps/web/api/ingest/upload POST: accepts multipart with a `file` field. Detects .txt / .md / .markdown / text/* as text; .pdf / application/pdf as PDF. 25 MB cap. 422 on extraction failure.
- apps/web/api/documents/[id] DELETE: cascade-deletes via the helper, 204 on success / 404 on miss.

Frontend:

- New SourcesPanel component (left sidebar): file picker, paste-text form, list of ingested documents with chunk counts and a delete button. Auto-refreshes after each mutation.
- Home page restructured to a two-column grid (sidebar | chat).

Verified end-to-end against the LAN Ollama: ingested .txt / .md / .pdf files, asked questions, agent retrieved + cited each, delete removed the row from the listing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
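The upload route's type detection and size cap could be sketched as a pure function like the one below. The extension and MIME rules and the 25 MB cap follow the commit text; the function name `classifyUpload` and the 413 / 415 status codes for oversize and unsupported files are assumptions, since the commit only specifies 422 for extraction failures.

```typescript
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // 25 MB cap from the commit

type Classified =
  | { ok: true; kind: "text" | "pdf" }
  | { ok: false; status: 413 | 415 }; // assumed status codes

// Hypothetical helper: decides how an uploaded file should be ingested,
// using the file extension first and the declared MIME type as a fallback.
function classifyUpload(name: string, mime: string, bytes: number): Classified {
  if (bytes > MAX_UPLOAD_BYTES) return { ok: false, status: 413 };
  const lower = name.toLowerCase();
  if (lower.endsWith(".pdf") || mime === "application/pdf") {
    return { ok: true, kind: "pdf" };
  }
  if (/\.(txt|md|markdown)$/.test(lower) || mime.startsWith("text/")) {
    return { ok: true, kind: "text" };
  }
  return { ok: false, status: 415 };
}
```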

…onents

- Introduced PostCSS configuration for Tailwind CSS.
- Implemented API routes for fetching document chunks and original content.
- Created a document page to display document details and viewer.
- Added a documents overview page with a sources panel.
- Developed an app shell layout with a sidebar and topbar for navigation.
- Built a document viewer component to display original content and chunks.
- Added icons for UI components.
- Created a sidebar for navigation between chat and documents.
- Implemented a topbar with branding and status indicators.
- Updated database schema to include mime_type, bytes, and original_content fields for documents.

Plain `tsc -p tsconfig.json` on a composite project with `references` trusts the dependent project's `.tsbuildinfo` only when it was produced by build mode. In a fresh container `pnpm --filter @app/web... build` runs each package's `tsc -p` separately; rag then fails with TS6305 because it cannot verify that shared/db outputs are current. Switching every package's build script to `tsc -b` walks the project graph in build mode and refreshes tsbuildinfo files in dependency order, so the Docker builder stage no longer fails on first run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the full design for the upcoming feature: page-rasterized vision OCR via Ollama (default gemma3:4b), figure detection + crop + summary, inline asset:UUID markdown, settings page (DB-overlays-env), retrieval join for imageRefs, find_figure tool, and a three-layer guarantee that chat answers actually render the matching figure (system prompt rule + server-side post-processor + hard-fallback prepend + react-markdown urlTransform). Phased into A/B/C/D milestones with explicit exit criteria, including a gating Vitest test that asserts <img> mounts in the rendered DOM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ase A)

Add the foundation for the vision-OCR pipeline without changing existing behavior.

New schema:

- documents.ingest_status / ingest_error / pages_total / pages_done / extraction_method to track the async ingest job lifecycle
- chunks.image_ids (uuid[], GIN index) + chunks.page so retrieval can join inline figure references back to the source page
- document_images table (page rasters + figure crops) with bytea bytes, vision-generated summary, and HNSW summary_embedding index for retrieve_figures() queries in a later phase
- app_settings singleton (id=1) with optional override columns over the env defaults

Migration 0002 (clammy_jack_flag) adds all of the above plus seeds the single app_settings row.

Settings layer:

- ResolvedSettings shape in @app/shared
- getSettings() in apps/web with 30s in-process cache, invalidated on PATCH /api/settings
- GET/PATCH /api/settings (env-overlaid) and GET /api/ollama/tags (proxies http://OLLAMA/api/tags)
- /settings page: Ollama base URL + Test Connection (per-model ✓/✗ availability with copy `ollama pull` hint), chat/embed/vision model fields, temperature slider, advanced RAG knobs, reset-to-env per field
- New OLLAMA_VISION_MODEL env default (gemma3:4b)
- Sidebar gains Settings nav item; SettingsIcon, CheckIcon, AlertIcon added to the icon set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
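The "DB overlays env" resolution with a 30-second cache can be sketched as follows. This is a simplified, synchronous illustration: the three-field `Settings` type, `resolveSettings`, and `createSettingsCache` are hypothetical names, and the real `getSettings()` presumably reads the `app_settings` row asynchronously.

```typescript
type Settings = { chatModel: string; temperature: number; topK: number };
// A null or missing override column means "use the env default".
type Overrides = { [K in keyof Settings]?: Settings[K] | null };

function resolveSettings(env: Settings, db: Overrides): Settings {
  return {
    chatModel: db.chatModel ?? env.chatModel,
    temperature: db.temperature ?? env.temperature, // ?? keeps an explicit 0
    topK: db.topK ?? env.topK,
  };
}

// In-process cache with a TTL; the clock is injectable so it can be tested.
function createSettingsCache(
  load: () => Settings,
  ttlMs: number = 30000,
  now: () => number = Date.now,
) {
  let entry: { value: Settings; at: number } | undefined;
  return {
    get(): Settings {
      if (!entry || now() - entry.at >= ttlMs) {
        entry = { value: load(), at: now() };
      }
      return entry.value;
    },
    // Called after a successful PATCH so the next read sees the new values.
    invalidate(): void {
      entry = undefined;
    },
  };
}
```

Using `??` rather than `||` matters for the temperature field: an override of `0` is a legitimate value and should not fall through to the env default.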

Replace the text-only PDF path with a vision-LLM pipeline that runs asynchronously after a 202 upload response.

Ingest job (packages/rag/src/ingest-pdf.ts):

- ensurePdfjs() lazy-configures unpdf to use the full pdfjs build (renderPageAsImage requires it)
- per page (concurrency 2 via util/sema.ts):
  - rasterize at scale 2.0, resize longest edge to 1792px
  - vision call 1: ocrPage() returns clean GitHub-flavored markdown
  - vision call 2: detectFigures() returns JSON [{label, bbox, caption}]
  - per figure: cropByBbox(), vision call 3 summarizeFigure(), insert document_images row, splice ![label](asset:UUID) into the page md
  - persist kind='page' raster
- chunkText over the combined markdown, embed, insert chunks with image_ids extracted from each chunk's content
- batch-embed figure summaries into document_images.summary_embedding

Asset endpoints:

- /api/assets/[id] streams document_images.bytes with the right MIME type, immutable Cache-Control, ETag
- /api/documents/[id]/status returns the live ingest_status, pages_done/total, error
- POST /api/ingest/upload now returns 202 and runs runIngestJob via Next 15 after()

Retrieval:

- retrieve.ts joins document_images for any chunk image_ids to attach imageRefs[] to RetrievalResult
- retrieve-figures.ts adds direct HNSW search against document_images.summary_embedding (powers the agent tool)

Agent:

- findFigure tool returns pre-baked assetMarkdown for top-K matches so small chat models can copy the markdown verbatim

UI:

- sources-panel.tsx polls /status for non-terminal docs (single shared 2s interval), shows status pills (queued/rasterizing/ocr/embedding/ready/failed) and a thin progress bar driven by pages_done/total

Deps:

- bump unpdf to ^1.6 for renderPageAsImage; add @napi-rs/canvas for rasterize/crop; declare both as runtime deps of @app/web because Next's server externals resolve them from the apps/web cwd
- next.config.mjs marks @napi-rs/canvas + unpdf as server externals so the native binary and dynamic pdfjs subpath import don't bundle

Verified end-to-end against the user's local Ollama (vision model gemma4:e4b): a 1-page chart PDF transitions queued->ocr->ready in about 21s with one chunk and one figure crop, asset markdown lands inline, and /api/assets/<id> returns image/png bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
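The step "insert chunks with image_ids extracted from each chunk's content" can be illustrated with a small pure function. The name `extractImageIds` and the exact token pattern are assumptions; the sketch only relies on the `![label](asset:UUID)` form named in the commit.

```typescript
// Hypothetical helper: collects the UUIDs of all inline figure references of
// the form ![label](asset:UUID) in a chunk, de-duplicated, in first-seen order.
function extractImageIds(markdown: string): string[] {
  // The regex is created per call so the global flag's lastIndex state is
  // never shared between invocations.
  const token =
    /!\[[^\]]*\]\(asset:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\)/gi;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = token.exec(markdown)) !== null) {
    const id = match[1].toLowerCase();
    if (!ids.includes(id)) ids.push(id);
  }
  return ids;
}
```

The resulting array is what a `uuid[]` column such as `chunks.image_ids` would store, so retrieval can join back to `document_images` without re-parsing chunk text.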

Three-layer enforcement that questions about a PDF figure return the figure inline in the chat reply.

Markdown rendering (apps/web/src/components/markdown.tsx):

- react-markdown@9 + remark-gfm for tables/lists/code
- urlTransform overrides the default safelist to map asset:UUID to /api/assets/UUID after a strict UUID check; non-asset URIs delegate to react-markdown's defaultUrlTransform so javascript: and data: URIs are still dropped
- custom <img> wrapped in <a target=_blank> with onError that swaps the element for a small text-danger fallback span when an asset URL 404s
- chat.tsx renders the assistant message via <ChatMarkdown>; user messages stay plain pre-wrap

Layer 1 - prompt (packages/agent/src/prompts.ts):

- mandatory image-rendering rules referencing the findFigure tool
- few-shot example showing the model the exact verbatim copy-the-markdown pattern

Layer 2 - server-side post-processor (apps/web/src/lib/image-guarantee.ts):

- visual-intent regex matches "show / view / figure / chart / diagram / screenshot / image / photo / picture" against the user's last message
- gathers figure UUIDs from findFigure and search_kb tool results
- if visual intent and the assistant text dropped any figures, append a "Related figure(s)" block (cap 3)

Layer 3 - hard fallback:

- if after the post-processor the text still has zero asset:UUID tokens, prepend the top-ranked figure unconditionally

Wiring (apps/web/src/app/api/chat/route.ts):

- accumulate tool calls via onStepFinish, capture user message, run applyImageGuarantee inside an experimental_transform that buffers text-delta parts and re-emits a single corrected delta on step-finish/finish/flush. simulateStreaming=true makes this lossless
- log [chat] image_inject {reason, count} for telemetry
- agent.ts now accepts onStepFinish and experimental_transform pass-throughs

Document viewer:

- new "Pages" tab fetches /api/documents/[id]/images and renders one card per page with a horizontal rail of figure crops underneath

Render-verification gate (apps/web/src/__tests__):

- chat-render.test.tsx mounts <ChatMarkdown> with an asset:UUID message, stubs /api/assets/* with a 1x1 PNG, asserts an <img> with the rewritten /api/assets src actually mounts in jsdom; also asserts javascript: URIs are dropped and invalid UUIDs render nothing
- image-guarantee.test.ts covers no-op/related-block/hard-fallback/cap-3/search_kb path

vitest + jsdom + @testing-library/react + @vitejs/plugin-react added as devDeps with a test:web script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
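Layers 2 and 3 can be sketched as one pure function. `applyImageGuarantee` is the name used in the commit, but the signature, the `Figure` type, the exact regex, and the decision to skip the fallback when no figures were returned are all assumptions made for this illustration.

```typescript
type Figure = { id: string; label: string };

// Assumed pattern built from the keyword list in the commit message.
const VISUAL_INTENT =
  /\b(?:show|view|figure|chart|diagram|screenshot|image|photo|picture)s?\b/i;

const toMarkdown = (f: Figure): string => `![${f.label}](asset:${f.id})`;

// figures: candidates gathered from tool results, best match first.
function applyImageGuarantee(userMessage: string, text: string, figures: Figure[]): string {
  if (figures.length === 0) return text; // nothing to guarantee
  let out = text;

  // Layer 2: on visual intent, append figures the assistant left out (cap 3).
  const missing = figures.filter((f) => !out.includes(`asset:${f.id}`));
  if (VISUAL_INTENT.test(userMessage) && missing.length > 0) {
    const block = missing.slice(0, 3).map(toMarkdown).join("\n\n");
    out = `${out}\n\n**Related figure(s)**\n\n${block}`;
  }

  // Layer 3: if there is still no asset token at all, prepend the top figure.
  if (!out.includes("asset:")) {
    out = `${toMarkdown(figures[0])}\n\n${out}`;
  }
  return out;
}
```

Keeping this step free of I/O is what makes it practical to cover the no-op, related-block, hard-fallback, and cap cases with plain unit tests.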

Hardening pass over the vision-OCR pipeline.

Asset-token atomicity (packages/rag/src/chunker.ts):

- mergeAssetTokens post-pass walks adjacent chunks and merges any pair that ends with a partial ![alt](asset:... open or starts with the closing-paren tail. Capped iterations prevent runaway merges.
- 6 vitest cases cover: intact link survives, split at (asset: gets merged, split at the closing ) gets merged, two consecutive figures both survive

Per-document mutex (apps/web/src/app/api/ingest/upload/route.ts):

- compute a fingerprint over filename + size + first 1KB; track in-flight fingerprints in globalThis.__rag_inflight; second identical upload returns 409 duplicate_in_flight while the first job is still running. Atomic check-and-add ordered before the DB insert so two parallel requests cannot both pass

Failure-path coverage:

- ingest-pdf detects "model not found" responses from Ollama, sets a user-facing ingest_error suggesting `ollama pull <model>`, and short-circuits to the legacy text-extraction path so the doc still ends 'ready' with extraction_method='text' rather than 'failed'
- 3-in-a-row empty/malformed figure-detect responses on docs longer than 5 pages skip subsequent figure detection (still OCRs text)
- raster.resizeLongestEdge caps input at 8192px; cropByBbox rejects bboxes smaller than 1% of the page
- /api/assets/[id] returns 400 invalid_id on non-UUID params

Settings overlay in chat (Phase D7):

- /api/chat now reads chatModel/temperature/topK via getSettings() rather than env directly, so changing them in /settings actually takes effect without a restart

Telemetry (apps/web/src/lib/telemetry.ts):

- single-line JSON logger used by the ingest job, the upload route, and the chat post-processor's image_inject events

Verification:

- mutex: parallel curl confirms exactly one 202 and one 409
- cascade: deleting a document clears the rows in chunks + document_images via the FK; /api/assets/<deleted-id> 404s, invalid UUID 400s
- @app/web vitest: 10/10, @app/rag vitest: 6/6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
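The per-document mutex can be sketched as below. `globalThis.__rag_inflight` and the fingerprint inputs (filename, size, first 1 KB) come from the commit; the FNV-1a hash is a dependency-free stand-in chosen for this example, and `fingerprint`, `tryAcquire`, and `release` are hypothetical names.

```typescript
type InflightGlobal = typeof globalThis & { __rag_inflight?: Set<string> };

// Stored on globalThis so every importer in the process shares one set.
const inflight = ((globalThis as InflightGlobal).__rag_inflight ??= new Set<string>());

// FNV-1a (32-bit) over filename, size, and at most the first 1 KB of content.
function fingerprint(filename: string, size: number, head: Uint8Array): string {
  let hash = 0x811c9dc5;
  const feed = (byte: number): void => {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  };
  const prefix = `${filename}\0${size}\0`;
  for (let i = 0; i < prefix.length; i++) {
    const code = prefix.charCodeAt(i);
    feed(code & 0xff);
    feed(code >>> 8);
  }
  const limit = Math.min(head.length, 1024);
  for (let i = 0; i < limit; i++) feed(head[i]);
  return hash.toString(16);
}

// The has() check and add() run with no await in between, so two concurrent
// requests in the same process cannot both acquire the same fingerprint.
function tryAcquire(fp: string): boolean {
  if (inflight.has(fp)) return false; // caller would answer 409 duplicate_in_flight
  inflight.add(fp);
  return true;
}

// Call in a finally block once the ingest job settles.
function release(fp: string): void {
  inflight.delete(fp);
}
```

This guard is per process: it would not deduplicate uploads across multiple server instances, which is consistent with the single-container Compose setup described in this PR.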

Two fixes for the /settings page when an Ollama host is configured.

1. Tag-implicit match. Ollama returns model names with explicit tags (e.g. `nomic-embed-text:latest`); users typically store the bare name (`nomic-embed-text`). The availability check did an exact string compare and reported the model as missing even though it was pulled. Added `normalizeTag` so a tagless stored value matches `<name>:latest` in the listing. Used in both the per-model badge row and the new dropdown's "selected but not pulled" warning.
2. Model dropdowns. Replaced free-form text inputs for chat / embed / vision model with `<select>` elements populated from GET /api/ollama/tags. Tags are auto-fetched whenever the effective Base URL changes (debounced 350ms), so the dropdown reflects the host the user is currently typing without requiring a save first. The route now accepts an optional ?baseUrl=<url> override so the form can probe a draft URL before persisting it. A Refresh button next to each dropdown forces a re-fetch. If the stored value is not in the list, it remains selectable with a "not pulled" hint and the `ollama pull <name>` command for copy-paste.

Verified locally against http://192.168.4.90:11434: 14 models returned, `nomic-embed-text` now reads as ✓ Embed (matched by tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
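The tag-implicit match might look like the sketch below. `normalizeTag` is named in the commit; its body and the `isPulled` helper are assumptions for illustration.

```typescript
// Appends ":latest" when the model name carries no explicit tag. Only the
// last path segment is inspected, so a registry port such as
// "registry:5000/model" is not mistaken for a tag.
function normalizeTag(name: string): string {
  const trimmed = name.trim();
  const lastSegment = trimmed.slice(trimmed.lastIndexOf("/") + 1);
  return lastSegment.includes(":") ? trimmed : `${trimmed}:latest`;
}

// Hypothetical availability check: compares a stored model name against the
// names returned by the Ollama tags listing.
function isPulled(stored: string, listed: string[]): boolean {
  const wanted = normalizeTag(stored);
  return listed.some((name) => normalizeTag(name) === wanted);
}
```

A stored `nomic-embed-text` therefore matches a listed `nomic-embed-text:latest`, while a stored `llama3.1` does not match `llama3.1:8b`, because an explicit non-default tag is a different model variant.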

… node:22

unpdf@1.6 ships a bundled pdfjs that calls `Promise.try`, which only landed in Node 22. The compose container ran on `node:20-alpine`, so the very first vision-OCR ingest threw TypeError: Promise.try is not a function before any pdf parsing started.

Three changes:

- Bump the web Dockerfile from `node:20-alpine` to `node:22-alpine` for both the deps/builder/migrate stages and the runtime image.
- Add `apps/web/instrumentation.ts` (Next 15 picks it up automatically) that installs a `Promise.try` polyfill when missing, which keeps any host running Node 20/21 functional without a forced upgrade.
- Bump root engines.node to >=22.0.0 to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Next 15 instrumentation.ts hook didn't apply in the standalone output running in the compose stack: the unhandledRejection "TypeError: Promise.try is not a function" still fired on the first PDF ingest after bumping the container to node:22. Move the polyfill into a tiny `packages/rag/src/polyfills.ts` and side-effect import it as the very first line of every module that loads unpdf (`pdf.ts`, `raster.ts`, `ingest-pdf.ts`). This guarantees the polyfill is installed before unpdf's bundled pdfjs is imported, regardless of the host runtime's Promise.try support. Container is still pinned to node:22-alpine, so this is also belt-and-suspenders for any downstream consumer pulling @app/rag on an older Node.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
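A side-effect polyfill module of the kind described could look roughly like this. It is a simplified sketch, not the contents of `packages/rag/src/polyfills.ts`: the names `TryFn` and `installPromiseTry` are assumptions, and unlike the native method it always constructs the global `Promise` rather than honoring a subclass receiver.

```typescript
type TryFn = (
  fn: (...args: unknown[]) => unknown,
  ...args: unknown[]
) => Promise<unknown>;

// Installs Promise.try only when the runtime lacks it.
function installPromiseTry(): void {
  const ctor = Promise as unknown as { try?: TryFn };
  if (typeof ctor.try === "function") return; // native support: leave it alone

  // Running fn inside the executor turns a synchronous throw into a rejection,
  // and resolve() adopts the result if fn returns a promise.
  const polyfill: TryFn = (fn, ...args) =>
    new Promise((resolve) => resolve(fn(...args)));

  Object.defineProperty(Promise, "try", {
    value: polyfill,
    writable: true,
    configurable: true,
  });
}

// Runs at import time, so `import "./polyfills.js"` as the first line of a
// module is enough to install it before later imports are evaluated.
installPromiseTry();
```

Placing the import first matters because ES module dependencies are evaluated in import order, so the polyfill runs before the module that bundles pdfjs is evaluated.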

…ge, image-render guarantee

Ships the v1.0.0 feature set for the agentic RAG app:

- Document upload UI (txt / md / pdf) with sources panel, viewer, and paste-text path
- /settings page with DB-overlays-env resolver, Ollama base URL test, model dropdowns auto-populated from /api/ollama/tags (matches ":latest" tag implicitly), temperature, RAG knobs
- Vision-OCR PDF pipeline: page rasterize -> Ollama vision OCR (default gemma3:4b) -> figure detect + crop + summary -> embedded chunks + document_images with HNSW summary_embedding index. Async via Next 15 after(); status polling on /api/documents/[id]/status
- Three-layer chat image-render guarantee: system-prompt rule + few-shot example -> server-side post-processor that appends a "Related figure" block when visual intent is detected and tools returned figure UUIDs the assistant dropped -> hard-fallback prepend of the top figure if the text still has no asset token. Vitest gating test confirms an <img src="/api/assets/..."> mounts in the rendered DOM
- Hardening: per-document upload mutex, asset-token atomicity post-pass in the chunker, vision-model-missing -> text-extraction fallback, oversize/tiny-bbox guards, asset 400 on bad UUIDs, JSON-line telemetry
- Build: tsc -b in package scripts for hermetic Docker rebuilds; node:22-alpine container; Promise.try polyfill side-effect-imported in @app/rag for unpdf@1.6 pdfjs compatibility on V8 < 13

Verified end-to-end against a 4-page clinical flowchart PDF on a local Ollama (gemma4:e4b vision, nomic-embed-text embeddings): queued -> ready in 1m37s with 13 chunks, 8 images (4 pages + 4 figures); chat "show me the flow chart" returned the algorithm chart inline. See spec/1.0.0/pdfupload.md for the design contract.

The README was still describing v0.x text-only ingest; rewrite it around the v1.0.0 capabilities (vision-OCR PDF upload, image-aware chat, settings page, modern Tailwind UI) and link out to a new top-level ARCHITECTURE.md.

ARCHITECTURE.md covers:

- An at-a-glance ASCII diagram of every component (browser, Next routes, @app/agent + @app/rag + @app/db, Ollama, Postgres)
- Component-by-component breakdown with the role of each module
- Step-by-step ASCII data flow for PDF ingest (sync text path vs async vision path) and for chat with the three-layer image guarantee
- Selected schema (documents, chunks with image_ids[], document_images with summary_embedding HNSW, app_settings singleton)
- Build/runtime notes (tsc -b project graph, node:22-alpine, Promise.try polyfill, Compose layout)
- Telemetry sample lines
- A "where to look next" pointer table

README links to ARCHITECTURE.md from the masthead and from the documentation section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

devdaviddr and others added 23 commits April 26, 2026 11:55

devdaviddr merged commit 811ab98 into main Apr 26, 2026
1 check failed

devdaviddr merged 23 commits into main from release-1.0.0
