fix: Resolve schema drift and SQL injection in Phase 1 SOTA services#26
fix: Resolve schema drift and SQL injection in Phase 1 SOTA services#26chitcommit wants to merge 3 commits into
Conversation
Remove unresolved conflict markers and Mac-local paths (/Users/nb/...) that don't apply to this environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove phantom D1, KV, R2, and Durable Object bindings that were never provisioned. Add proper Workers module entry at worker/index.ts replacing the Pages Functions stub. Match compatibility_date to deployed (2024-11-01). Secrets (NEON_DATABASE_URL, CHITTYCONNECT_SERVICE_TOKEN) and chittytrack tail consumer are preserved from the live worker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… 1 services Two P0 blockers fixed before wiring Phase 1 SOTA routes: 1. Schema drift: shared/schema.ts declared embedding columns as varchar but migrations/001_add_pgvector.sql creates them as vector(n). Added a customType helper that maps TypeScript number[] to pgvector's vector(n) DDL and handles serialization. No DDL change needed. 2. SQL injection: hybridSearchService.ts semanticSearch() interpolated caseId, entryType, dateFrom, dateTo into raw SQL via string concat + sql.raw(). Rewrote to use drizzle's parameterized sql template with column references and sql.join(). All sql.raw() calls removed. Also updated embeddingService.ts to pass number[] directly instead of manually stringifying — the customType's toDriver handles conversion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR transitions the service architecture from Cloudflare Pages Functions with database-backed API endpoints to a Cloudflare Workers model serving lightweight JSON responses. It simplifies API routing (removing CRUD, export, and proxy logic), refactors embedding storage from string literals to native PostgreSQL vector types with custom Drizzle serialization, replaces string-interpolated SQL with parameterized queries, and updates the deployment configuration accordingly. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
chittychronicle | 13c158b | Commit Preview URL Branch Preview URL |
Apr 13 2026, 10:31 AM |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13c158bce0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| { pattern = "chronicle.chitty.cc/*", zone_name = "chitty.cc" } | ||
| ] |
There was a problem hiding this comment.
Route non-manifest traffic to the app backend
This config now routes chronicle.chitty.cc/* to worker/index.ts, but the worker only serves health/manifest paths and returns 404 for everything else (worker/index.ts fallback at lines 68-71). In production, that means normal application URLs and API routes like /api/cases and /api/timeline/* are no longer reachable through the domain and will fail with 404s.
Useful? React with 👍 / 👎.
| health: '/api/health', | ||
| cases: '/api/cases', | ||
| timeline: '/api/timeline/entries', | ||
| search: '/api/timeline/search', | ||
| contradictions: '/api/timeline/analysis/contradictions', |
There was a problem hiding this comment.
Stop publishing unavailable API routes in manifest
The service manifest advertises /api/cases, /api/timeline/entries, and /api/timeline/search, but this worker has no handlers for those paths and falls through to a 404 response. Discovery clients that trust the manifest will attempt calls that are guaranteed to fail, causing broken integrations and false-positive service registration.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wrangler.toml (1)
15-17:⚠️ Potential issue | 🔴 CriticalDo not mount this Worker on the whole hostname unless it proxies the app.
Workers routes execute the script for every matching request, so
chronicle.chitty.cc/*andpreview.chronicle.chitty.cc/*put this new health/manifest Worker in front of the entire site. Sinceworker/index.tsonly handles health/discovery paths and returns 404 for everything else,/api/cases,/api/timeline/*, and the rest of the app will never reach the origin. Narrow the route patterns to the endpoints this Worker actually owns, or add explicit pass-through/proxying for all other traffic.🛠️ Example route narrowing
[env.production] name = "chittychronicle" routes = [ - { pattern = "chronicle.chitty.cc/*", zone_name = "chitty.cc" } + { pattern = "chronicle.chitty.cc/health*", zone_name = "chitty.cc" }, + { pattern = "chronicle.chitty.cc/api/health*", zone_name = "chitty.cc" }, + { pattern = "chronicle.chitty.cc/.well-known/*", zone_name = "chitty.cc" } ] [env.preview] name = "chittychronicle-preview" routes = [ - { pattern = "preview.chronicle.chitty.cc/*", zone_name = "chitty.cc" } + { pattern = "preview.chronicle.chitty.cc/health*", zone_name = "chitty.cc" }, + { pattern = "preview.chronicle.chitty.cc/api/health*", zone_name = "chitty.cc" }, + { pattern = "preview.chronicle.chitty.cc/.well-known/*", zone_name = "chitty.cc" } ]Also applies to: 33-35
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wrangler.toml` around lines 15 - 17, The routes currently mount the Worker for the entire hostname (pattern "chronicle.chitty.cc/*"), which blocks the app because worker/index.ts only serves health/discovery and returns 404 for other paths; update wrangler.toml to restrict the route patterns to only the exact health/manifest endpoints this Worker owns (e.g., /health, /manifest, any discovery paths) or implement explicit pass-through/proxy logic inside worker/index.ts for all other requests so the origin receives non-owned paths; locate the route list in wrangler.toml and the request-handling logic in worker/index.ts to apply the change.
🧹 Nitpick comments (2)
worker/index.ts (1)
36-66: Use one shared source for discovery manifests.
server/routes.ts:45-70andserver/routes.ts:86-97already serve file-backed discovery payloads, andfunctions/api/[[path]].ts:39-47now defines another manifest shape. Keeping separate inline JSON here will drift quickly. Please move the service and MCP manifests into a shared module/JSON asset so every runtime publishes the same contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@worker/index.ts` around lines 36 - 66, The inline manifest objects returned in worker/index.ts (the two Response.json blocks for '/.well-known/chronicle-manifest.json' and '/.well-known/mcp-manifest.json') must be moved to a single shared asset and referenced from there; extract both payloads into a shared module or JSON file (e.g., export SERVICE_MANIFEST and MCP_MANIFEST from a new manifests module) and replace the inline objects in worker/index.ts with imports of those constants, then update server/routes.ts and functions/api/[[path]].ts to import and reuse the same exported manifests so all runtimes publish the identical contract.server/hybridSearchService.ts (1)
147-151: Verify LIKE pattern parameterization handles special characters.While Drizzle's
like()function parameterizes the pattern value, user input containing%or_metacharacters will affect matching behavior. For example, a query of100%would match any string containing100followed by anything.Consider escaping LIKE metacharacters if strict substring matching is required:
const escapedQuery = query.replace(/[%_]/g, '\\$&'); like(timelineEntries.description, `%${escapedQuery}%`)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/hybridSearchService.ts` around lines 147 - 151, The LIKE pattern currently uses the raw user input `query` in calls to like(timelineEntries.description, `%${query}%`) and like(timelineEntries.detailedNotes, `%${query}%`), so `%` and `_` in `query` act as wildcards; escape those metacharacters before building the pattern (e.g., create an `escapedQuery` by replacing /[%_]/g with an escaped version) and use `%${escapedQuery}%` in the like() calls so the search performs strict substring matching, ensuring you preserve parameterization and the DB escape semantics expected by your driver.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@functions/api/`[[path]].ts:
- Around line 39-50: The manifest currently advertises endpoints (/api/cases,
/api/timeline/*) that this handler always 404s; update the Response.json
manifest block to only list the routes this function actually serves (e.g.,
'manifest' and 'health' and the base '') by editing the endpoints array in the
branch that returns Response.json when path === 'manifest' || path === '' (the
Response.json call that builds the manifest), or alternatively implement
proxying logic for paths like 'cases' and 'timeline' before the final 404—modify
the code around the path variable check and the Response.json manifest return so
the advertised endpoints match actual behavior.
In `@server/embeddingService.ts`:
- Around line 320-328: The excerptEmbedding vector dimension is mismatched:
generateEmbedding(source.excerpt) uses the default EMBEDDING_CONFIG.dimensions
(1536) but timelineSources.excerptEmbedding is defined as vector(768); update
the schema to vector(1536) so it matches EMBEDDING_CONFIG.dimensions (or
alternatively pass a 768-dim model into generateEmbedding), and update any DB
migration and schema declaration in shared/schema.ts plus any related constants
(EMBEDDING_CONFIG.dimensions) to keep them consistent with timelineSources and
the embedding-producing model.
---
Outside diff comments:
In `@wrangler.toml`:
- Around line 15-17: The routes currently mount the Worker for the entire
hostname (pattern "chronicle.chitty.cc/*"), which blocks the app because
worker/index.ts only serves health/discovery and returns 404 for other paths;
update wrangler.toml to restrict the route patterns to only the exact
health/manifest endpoints this Worker owns (e.g., /health, /manifest, any
discovery paths) or implement explicit pass-through/proxy logic inside
worker/index.ts for all other requests so the origin receives non-owned paths;
locate the route list in wrangler.toml and the request-handling logic in
worker/index.ts to apply the change.
---
Nitpick comments:
In `@server/hybridSearchService.ts`:
- Around line 147-151: The LIKE pattern currently uses the raw user input
`query` in calls to like(timelineEntries.description, `%${query}%`) and
like(timelineEntries.detailedNotes, `%${query}%`), so `%` and `_` in `query` act
as wildcards; escape those metacharacters before building the pattern (e.g.,
create an `escapedQuery` by replacing /[%_]/g with an escaped version) and use
`%${escapedQuery}%` in the like() calls so the search performs strict substring
matching, ensuring you preserve parameterization and the DB escape semantics
expected by your driver.
In `@worker/index.ts`:
- Around line 36-66: The inline manifest objects returned in worker/index.ts
(the two Response.json blocks for '/.well-known/chronicle-manifest.json' and
'/.well-known/mcp-manifest.json') must be moved to a single shared asset and
referenced from there; extract both payloads into a shared module or JSON file
(e.g., export SERVICE_MANIFEST and MCP_MANIFEST from a new manifests module) and
replace the inline objects in worker/index.ts with imports of those constants,
then update server/routes.ts and functions/api/[[path]].ts to import and reuse
the same exported manifests so all runtimes publish the identical contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f445e3ca-a377-4bb7-b2f5-087e618dfd0a
📒 Files selected for processing (7)
.claude/settings.local.jsonfunctions/api/[[path]].tsserver/embeddingService.tsserver/hybridSearchService.tsshared/schema.tsworker/index.tswrangler.toml
💤 Files with no reviewable changes (1)
- .claude/settings.local.json
| if (path === 'manifest' || path === '') { | ||
| return Response.json({ | ||
| service: 'chittychronicle', | ||
| version: '1.0.0', | ||
| description: 'Legal timeline management and evidentiary tracking', | ||
| endpoints: ['/api/health', '/api/cases', '/api/timeline/entries', '/api/timeline/search'], | ||
| ecosystem: 'chittyos', | ||
| tier: 5, | ||
| }); | ||
| } | ||
|
|
||
| // Not found | ||
| return new Response(JSON.stringify({ error: 'Endpoint not found' }), { | ||
| status: 404, | ||
| headers: { 'Content-Type': 'application/json' }, | ||
| }); | ||
| return Response.json({ error: 'Endpoint not found', path: `/api/${path}` }, { status: 404 }); |
There was a problem hiding this comment.
Manifest advertises routes this handler immediately 404s.
The only non-404 paths here are health, manifest, and empty, but the manifest still lists /api/cases and /api/timeline/*. Line 50 guarantees those discovered endpoints fail. Either proxy those paths through to the main app, or trim the manifest to only the surface this function actually serves.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@functions/api/`[[path]].ts around lines 39 - 50, The manifest currently
advertises endpoints (/api/cases, /api/timeline/*) that this handler always
404s; update the Response.json manifest block to only list the routes this
function actually serves (e.g., 'manifest' and 'health' and the base '') by
editing the endpoints array in the branch that returns Response.json when path
=== 'manifest' || path === '' (the Response.json call that builds the manifest),
or alternatively implement proxying logic for paths like 'cases' and 'timeline'
before the final 404—modify the code around the path variable check and the
Response.json manifest return so the advertised endpoints match actual behavior.
| const result = await generateEmbedding(source.excerpt); | ||
| const vectorString = `[${result.embedding.join(",")}]`; | ||
|
|
||
| await db | ||
| .update(timelineSources) | ||
| .set({ | ||
| excerptEmbedding: vectorString, | ||
| excerptEmbedding: result.embedding, | ||
| embeddingModel: result.model, | ||
| embeddingGeneratedAt: new Date(), | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the migration file for excerptEmbedding dimension declaration
cat migrations/001_add_pgvector.sql | grep -A2 excerpt_embedding
# Check if there's any dimension configuration for excerpt embeddings
rg -n "excerpt.*768|768.*excerpt" --type tsRepository: chittyos/chittychronicle
Length of output: 713
🏁 Script executed:
cd server && grep -n "function generateEmbedding\|const generateEmbedding\|export.*generateEmbedding" embeddingService.ts | head -20Repository: chittyos/chittychronicle
Length of output: 111
🏁 Script executed:
rg -n "generateEmbedding" server/ --type ts -A 10 -B 2 | head -100Repository: chittyos/chittychronicle
Length of output: 4615
🏁 Script executed:
rg -n "text-embedding-3" server/ --type tsRepository: chittyos/chittychronicle
Length of output: 357
🏁 Script executed:
sed -n '58,120p' server/embeddingService.tsRepository: chittyos/chittychronicle
Length of output: 1683
🏁 Script executed:
rg -n "EMBEDDING|embedding.*model" server/embeddingService.ts -B2 -A2 | head -50Repository: chittyos/chittychronicle
Length of output: 1392
🏁 Script executed:
rg -n "768|1536" server/ --type tsRepository: chittyos/chittychronicle
Length of output: 350
Fix dimension mismatch: excerptEmbedding schema expects 768 dimensions but default embedding model produces 1536.
The excerptEmbedding column is declared as vector(768) in the schema (line 153 of shared/schema.ts), but the default model text-embedding-3-small produces 1536-dimensional vectors. The EMBEDDING_CONFIG.dimensions defaults to 1536, and the code at line 320 calls generateEmbedding() with no model parameter, using the default. This will cause pgvector to reject inserts with a dimension mismatch error.
Either:
- Change the schema to use
vector(1536)forexcerptEmbedding(recommended, as it aligns withEMBEDDING_CONFIG.dimensions) - Or configure a different embedding model that produces 768 dimensions (currently unavailable in the codebase)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/embeddingService.ts` around lines 320 - 328, The excerptEmbedding
vector dimension is mismatched: generateEmbedding(source.excerpt) uses the
default EMBEDDING_CONFIG.dimensions (1536) but timelineSources.excerptEmbedding
is defined as vector(768); update the schema to vector(1536) so it matches
EMBEDDING_CONFIG.dimensions (or alternatively pass a 768-dim model into
generateEmbedding), and update any DB migration and schema declaration in
shared/schema.ts plus any related constants (EMBEDDING_CONFIG.dimensions) to
keep them consistent with timelineSources and the embedding-producing model.
Summary
Schema drift fix: Embedding columns (
description_embedding,content_embedding,excerpt_embedding) were declared asvarcharinshared/schema.tsbut the migration (migrations/001_add_pgvector.sql) creates them asvector(768)/vector(1536). Added acustomTypehelper that correctly maps TypeScriptnumber[]to pgvector'svector(n)DDL with proper serialization. No DDL change — the database columns are already correct.SQL injection fix (P0):
semanticSearch()inhybridSearchService.ts:197-225interpolatedcaseId,entryType,dateFrom,dateTofrom HTTP request params directly into raw SQL via string concatenation +sql.raw(). An attacker could injectcaseId = "' OR '1'='1"to bypass case isolation and access all timeline entries across all cases — a multi-tenant data breach in a legal evidence system. Rewrote to use drizzle's parameterizedsqltemplate with column references andsql.join(). Allsql.raw()calls removed.Embedding service update:
embeddingService.tsnow passesnumber[]directly to drizzle instead of manually stringifying to[1,2,3]— thecustomType'stoDriverhandles serialization.Context
These are P0 blockers that must land before Phase 1 SOTA routes (hybrid search, RAG Q&A) are wired into
routes.ts. The services exist on main but are not yet exposed over HTTP — fixing them now prevents shipping a reachable SQL injection.Part of the broader Neon Auth unification initiative (see chittyfoundation/chittyauth#4 and chittyos/chittyconnect#152 for the charter amendments).
Test plan
npm run checkpasses (TypeScript compilation with newnumber[]types)npm run db:pushproduces no DDL diff (schema now matches migration)semanticSearch()with adversarialcaseIdvalue confirms parameterizationembedTimelineEntry()correctly writes vector to DB via customType🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Chores
Refactor