Skip to content

fix: Resolve schema drift and SQL injection in Phase 1 SOTA services#26

Open
chitcommit wants to merge 3 commits into
mainfrom
fix/phase1-schema-drift-and-sql-injection
Open

fix: Resolve schema drift and SQL injection in Phase 1 SOTA services#26
chitcommit wants to merge 3 commits into
mainfrom
fix/phase1-schema-drift-and-sql-injection

Conversation

@chitcommit
Copy link
Copy Markdown
Contributor

@chitcommit chitcommit commented Apr 13, 2026

Summary

  • Schema drift fix: Embedding columns (description_embedding, content_embedding, excerpt_embedding) were declared as varchar in shared/schema.ts but the migration (migrations/001_add_pgvector.sql) creates them as vector(768) / vector(1536). Added a customType helper that correctly maps TypeScript number[] to pgvector's vector(n) DDL with proper serialization. No DDL change — the database columns are already correct.

  • SQL injection fix (P0): semanticSearch() in hybridSearchService.ts:197-225 interpolated caseId, entryType, dateFrom, dateTo from HTTP request params directly into raw SQL via string concatenation + sql.raw(). An attacker could inject caseId = "' OR '1'='1" to bypass case isolation and access all timeline entries across all cases — a multi-tenant data breach in a legal evidence system. Rewrote to use drizzle's parameterized sql template with column references and sql.join(). All sql.raw() calls removed.

  • Embedding service update: embeddingService.ts now passes number[] directly to drizzle instead of manually stringifying to [1,2,3] — the customType's toDriver handles serialization.

Context

These are P0 blockers that must land before Phase 1 SOTA routes (hybrid search, RAG Q&A) are wired into routes.ts. The services exist on main but are not yet exposed over HTTP — fixing them now prevents shipping a reachable SQL injection.

Part of the broader Neon Auth unification initiative (see chittyfoundation/chittyauth#4 and chittyos/chittyconnect#152 for the charter amendments).

Test plan

  • Verify npm run check passes (TypeScript compilation with new number[] types)
  • Verify npm run db:push produces no DDL diff (schema now matches migration)
  • Manual test: semanticSearch() with adversarial caseId value confirms parameterization
  • Manual test: embedTimelineEntry() correctly writes vector to DB via customType

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Chores

    • Cleaned up local configuration files by removing unused path entries and conflict markers.
  • Refactor

    • Migrated deployment platform from Cloudflare Pages to Cloudflare Workers for improved scalability.
    • Simplified API endpoints to focus on essential health checks and service metadata responses.
    • Updated database connectivity configuration to support Neon integration.
    • Enhanced vector embedding type handling and optimized database query performance.

chitcommit and others added 3 commits April 6, 2026 12:01
Remove unresolved conflict markers and Mac-local paths (/Users/nb/...)
that don't apply to this environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove phantom D1, KV, R2, and Durable Object bindings that were never
provisioned. Add proper Workers module entry at worker/index.ts replacing
the Pages Functions stub. Match compatibility_date to deployed (2024-11-01).
Secrets (NEON_DATABASE_URL, CHITTYCONNECT_SERVICE_TOKEN) and chittytrack
tail consumer are preserved from the live worker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… 1 services

Two P0 blockers fixed before wiring Phase 1 SOTA routes:

1. Schema drift: shared/schema.ts declared embedding columns as varchar
   but migrations/001_add_pgvector.sql creates them as vector(n). Added
   a customType helper that maps TypeScript number[] to pgvector's
   vector(n) DDL and handles serialization. No DDL change needed.

2. SQL injection: hybridSearchService.ts semanticSearch() interpolated
   caseId, entryType, dateFrom, dateTo into raw SQL via string concat
   + sql.raw(). Rewrote to use drizzle's parameterized sql template
   with column references and sql.join(). All sql.raw() calls removed.

Also updated embeddingService.ts to pass number[] directly instead of
manually stringifying — the customType's toDriver handles conversion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

This PR transitions the service architecture from Cloudflare Pages Functions with database-backed API endpoints to a Cloudflare Workers model serving lightweight JSON responses. It simplifies API routing (removing CRUD, export, and proxy logic), refactors embedding storage from string literals to native PostgreSQL vector types with custom Drizzle serialization, replaces string-interpolated SQL with parameterized queries, and updates the deployment configuration accordingly.

Changes

Cohort / File(s) Summary
Deployment & Configuration
wrangler.toml, worker/index.ts, .claude/settings.local.json
Migrated from Cloudflare Pages/Functions to Cloudflare Workers model (main = "worker/index.ts"). Updated environment variables (NODE_ENVENVIRONMENT), bumped compatibility_date to 2024-11-01, enabled smart placement. Removed D1/KV/R2 bindings; added tail_consumers binding. Cleanup of local machine-specific allowlist entries and merge conflict markers.
API Handler Refactoring
functions/api/[[path]].ts
Replaced multi-endpoint Pages Functions router (handling cases, timeline, chittypm, export with database queries and caching) with simplified Workers handler. Removed all helper functions for DB querying, KV caching, R2 usage, and CSV generation. Added lightweight health and manifest endpoints returning static JSON; all other paths return 404.
Vector Storage & Schema Migration
shared/schema.ts, server/embeddingService.ts, server/hybridSearchService.ts
Introduced custom Drizzle vector type mapping PostgreSQL vector(dimensions) with fromDriver/toDriver conversions. Updated embedding columns to use native vector types (768 and 1536 dimensions). Replaced string-based vector construction in embedding service with direct number[] assignment. Parameterized SQL WHERE clauses in search service using sql fragments and sql.join(...) instead of string interpolation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 From Pages we hop to Workers so bright,
Embeddings now dance in vectors of light,
SQL guards with parameters in place,
Simpler endpoints at a much faster pace! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly addresses the two main objectives: schema drift resolution and SQL injection fix in Phase 1 services, clearly summarizing the primary changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/phase1-schema-drift-and-sql-injection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
chittychronicle 13c158b Commit Preview URL

Branch Preview URL
Apr 13 2026, 10:31 AM

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13c158bce0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread wrangler.toml
Comment on lines 16 to 17
{ pattern = "chronicle.chitty.cc/*", zone_name = "chitty.cc" }
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Route non-manifest traffic to the app backend

This config now routes chronicle.chitty.cc/* to worker/index.ts, but the worker only serves health/manifest paths and returns 404 for everything else (worker/index.ts fallback at lines 68-71). In production, that means normal application URLs and API routes like /api/cases and /api/timeline/* are no longer reachable through the domain and will fail with 404s.

Useful? React with 👍 / 👎.

Comment thread worker/index.ts
Comment on lines +45 to +49
health: '/api/health',
cases: '/api/cases',
timeline: '/api/timeline/entries',
search: '/api/timeline/search',
contradictions: '/api/timeline/analysis/contradictions',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop publishing unavailable API routes in manifest

The service manifest advertises /api/cases, /api/timeline/entries, and /api/timeline/search, but this worker has no handlers for those paths and falls through to a 404 response. Discovery clients that trust the manifest will attempt calls that are guaranteed to fail, causing broken integrations and false-positive service registration.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wrangler.toml (1)

15-17: ⚠️ Potential issue | 🔴 Critical

Do not mount this Worker on the whole hostname unless it proxies the app.

Workers routes execute the script for every matching request, so chronicle.chitty.cc/* and preview.chronicle.chitty.cc/* put this new health/manifest Worker in front of the entire site. Since worker/index.ts only handles health/discovery paths and returns 404 for everything else, /api/cases, /api/timeline/*, and the rest of the app will never reach the origin. Narrow the route patterns to the endpoints this Worker actually owns, or add explicit pass-through/proxying for all other traffic.

🛠️ Example route narrowing
 [env.production]
 name = "chittychronicle"
 routes = [
-  { pattern = "chronicle.chitty.cc/*", zone_name = "chitty.cc" }
+  { pattern = "chronicle.chitty.cc/health*", zone_name = "chitty.cc" },
+  { pattern = "chronicle.chitty.cc/api/health*", zone_name = "chitty.cc" },
+  { pattern = "chronicle.chitty.cc/.well-known/*", zone_name = "chitty.cc" }
 ]

 [env.preview]
 name = "chittychronicle-preview"
 routes = [
-  { pattern = "preview.chronicle.chitty.cc/*", zone_name = "chitty.cc" }
+  { pattern = "preview.chronicle.chitty.cc/health*", zone_name = "chitty.cc" },
+  { pattern = "preview.chronicle.chitty.cc/api/health*", zone_name = "chitty.cc" },
+  { pattern = "preview.chronicle.chitty.cc/.well-known/*", zone_name = "chitty.cc" }
 ]

Also applies to: 33-35

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wrangler.toml` around lines 15 - 17, The routes currently mount the Worker
for the entire hostname (pattern "chronicle.chitty.cc/*"), which blocks the app
because worker/index.ts only serves health/discovery and returns 404 for other
paths; update wrangler.toml to restrict the route patterns to only the exact
health/manifest endpoints this Worker owns (e.g., /health, /manifest, any
discovery paths) or implement explicit pass-through/proxy logic inside
worker/index.ts for all other requests so the origin receives non-owned paths;
locate the route list in wrangler.toml and the request-handling logic in
worker/index.ts to apply the change.
🧹 Nitpick comments (2)
worker/index.ts (1)

36-66: Use one shared source for discovery manifests.

server/routes.ts:45-70 and server/routes.ts:86-97 already serve file-backed discovery payloads, and functions/api/[[path]].ts:39-47 now defines another manifest shape. Keeping separate inline JSON here will drift quickly. Please move the service and MCP manifests into a shared module/JSON asset so every runtime publishes the same contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@worker/index.ts` around lines 36 - 66, The inline manifest objects returned
in worker/index.ts (the two Response.json blocks for
'/.well-known/chronicle-manifest.json' and '/.well-known/mcp-manifest.json')
must be moved to a single shared asset and referenced from there; extract both
payloads into a shared module or JSON file (e.g., export SERVICE_MANIFEST and
MCP_MANIFEST from a new manifests module) and replace the inline objects in
worker/index.ts with imports of those constants, then update server/routes.ts
and functions/api/[[path]].ts to import and reuse the same exported manifests so
all runtimes publish the identical contract.
server/hybridSearchService.ts (1)

147-151: Verify LIKE pattern parameterization handles special characters.

While Drizzle's like() function parameterizes the pattern value, user input containing % or _ metacharacters will affect matching behavior. For example, a query of 100% would match any string containing 100 followed by anything.

Consider escaping LIKE metacharacters if strict substring matching is required:

const escapedQuery = query.replace(/[%_]/g, '\\$&');
like(timelineEntries.description, `%${escapedQuery}%`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/hybridSearchService.ts` around lines 147 - 151, The LIKE pattern
currently uses the raw user input `query` in calls to
like(timelineEntries.description, `%${query}%`) and
like(timelineEntries.detailedNotes, `%${query}%`), so `%` and `_` in `query` act
as wildcards; escape those metacharacters before building the pattern (e.g.,
create an `escapedQuery` by replacing /[%_]/g with an escaped version) and use
`%${escapedQuery}%` in the like() calls so the search performs strict substring
matching, ensuring you preserve parameterization and the DB escape semantics
expected by your driver.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@functions/api/`[[path]].ts:
- Around line 39-50: The manifest currently advertises endpoints (/api/cases,
/api/timeline/*) that this handler always 404s; update the Response.json
manifest block to only list the routes this function actually serves (e.g.,
'manifest' and 'health' and the base '') by editing the endpoints array in the
branch that returns Response.json when path === 'manifest' || path === '' (the
Response.json call that builds the manifest), or alternatively implement
proxying logic for paths like 'cases' and 'timeline' before the final 404—modify
the code around the path variable check and the Response.json manifest return so
the advertised endpoints match actual behavior.

In `@server/embeddingService.ts`:
- Around line 320-328: The excerptEmbedding vector dimension is mismatched:
generateEmbedding(source.excerpt) uses the default EMBEDDING_CONFIG.dimensions
(1536) but timelineSources.excerptEmbedding is defined as vector(768); update
the schema to vector(1536) so it matches EMBEDDING_CONFIG.dimensions (or
alternatively pass a 768-dim model into generateEmbedding), and update any DB
migration and schema declaration in shared/schema.ts plus any related constants
(EMBEDDING_CONFIG.dimensions) to keep them consistent with timelineSources and
the embedding-producing model.

---

Outside diff comments:
In `@wrangler.toml`:
- Around line 15-17: The routes currently mount the Worker for the entire
hostname (pattern "chronicle.chitty.cc/*"), which blocks the app because
worker/index.ts only serves health/discovery and returns 404 for other paths;
update wrangler.toml to restrict the route patterns to only the exact
health/manifest endpoints this Worker owns (e.g., /health, /manifest, any
discovery paths) or implement explicit pass-through/proxy logic inside
worker/index.ts for all other requests so the origin receives non-owned paths;
locate the route list in wrangler.toml and the request-handling logic in
worker/index.ts to apply the change.

---

Nitpick comments:
In `@server/hybridSearchService.ts`:
- Around line 147-151: The LIKE pattern currently uses the raw user input
`query` in calls to like(timelineEntries.description, `%${query}%`) and
like(timelineEntries.detailedNotes, `%${query}%`), so `%` and `_` in `query` act
as wildcards; escape those metacharacters before building the pattern (e.g.,
create an `escapedQuery` by replacing /[%_]/g with an escaped version) and use
`%${escapedQuery}%` in the like() calls so the search performs strict substring
matching, ensuring you preserve parameterization and the DB escape semantics
expected by your driver.

In `@worker/index.ts`:
- Around line 36-66: The inline manifest objects returned in worker/index.ts
(the two Response.json blocks for '/.well-known/chronicle-manifest.json' and
'/.well-known/mcp-manifest.json') must be moved to a single shared asset and
referenced from there; extract both payloads into a shared module or JSON file
(e.g., export SERVICE_MANIFEST and MCP_MANIFEST from a new manifests module) and
replace the inline objects in worker/index.ts with imports of those constants,
then update server/routes.ts and functions/api/[[path]].ts to import and reuse
the same exported manifests so all runtimes publish the identical contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f445e3ca-a377-4bb7-b2f5-087e618dfd0a

📥 Commits

Reviewing files that changed from the base of the PR and between f5cf949 and 13c158b.

📒 Files selected for processing (7)
  • .claude/settings.local.json
  • functions/api/[[path]].ts
  • server/embeddingService.ts
  • server/hybridSearchService.ts
  • shared/schema.ts
  • worker/index.ts
  • wrangler.toml
💤 Files with no reviewable changes (1)
  • .claude/settings.local.json

Comment thread functions/api/[[path]].ts
Comment on lines +39 to +50
if (path === 'manifest' || path === '') {
return Response.json({
service: 'chittychronicle',
version: '1.0.0',
description: 'Legal timeline management and evidentiary tracking',
endpoints: ['/api/health', '/api/cases', '/api/timeline/entries', '/api/timeline/search'],
ecosystem: 'chittyos',
tier: 5,
});
}

// Not found
return new Response(JSON.stringify({ error: 'Endpoint not found' }), {
status: 404,
headers: { 'Content-Type': 'application/json' },
});
return Response.json({ error: 'Endpoint not found', path: `/api/${path}` }, { status: 404 });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Manifest advertises routes this handler immediately 404s.

The only non-404 paths here are health, manifest, and empty, but the manifest still lists /api/cases and /api/timeline/*. Line 50 guarantees those discovered endpoints fail. Either proxy those paths through to the main app, or trim the manifest to only the surface this function actually serves.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@functions/api/`[[path]].ts around lines 39 - 50, The manifest currently
advertises endpoints (/api/cases, /api/timeline/*) that this handler always
404s; update the Response.json manifest block to only list the routes this
function actually serves (e.g., 'manifest' and 'health' and the base '') by
editing the endpoints array in the branch that returns Response.json when path
=== 'manifest' || path === '' (the Response.json call that builds the manifest),
or alternatively implement proxying logic for paths like 'cases' and 'timeline'
before the final 404—modify the code around the path variable check and the
Response.json manifest return so the advertised endpoints match actual behavior.

Comment on lines 320 to 328
const result = await generateEmbedding(source.excerpt);
const vectorString = `[${result.embedding.join(",")}]`;

await db
.update(timelineSources)
.set({
excerptEmbedding: vectorString,
excerptEmbedding: result.embedding,
embeddingModel: result.model,
embeddingGeneratedAt: new Date(),
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the migration file for excerptEmbedding dimension declaration
cat migrations/001_add_pgvector.sql | grep -A2 excerpt_embedding

# Check if there's any dimension configuration for excerpt embeddings
rg -n "excerpt.*768|768.*excerpt" --type ts

Repository: chittyos/chittychronicle

Length of output: 713


🏁 Script executed:

cd server && grep -n "function generateEmbedding\|const generateEmbedding\|export.*generateEmbedding" embeddingService.ts | head -20

Repository: chittyos/chittychronicle

Length of output: 111


🏁 Script executed:

rg -n "generateEmbedding" server/ --type ts -A 10 -B 2 | head -100

Repository: chittyos/chittychronicle

Length of output: 4615


🏁 Script executed:

rg -n "text-embedding-3" server/ --type ts

Repository: chittyos/chittychronicle

Length of output: 357


🏁 Script executed:

sed -n '58,120p' server/embeddingService.ts

Repository: chittyos/chittychronicle

Length of output: 1683


🏁 Script executed:

rg -n "EMBEDDING|embedding.*model" server/embeddingService.ts -B2 -A2 | head -50

Repository: chittyos/chittychronicle

Length of output: 1392


🏁 Script executed:

rg -n "768|1536" server/ --type ts

Repository: chittyos/chittychronicle

Length of output: 350


Fix dimension mismatch: excerptEmbedding schema expects 768 dimensions but default embedding model produces 1536.

The excerptEmbedding column is declared as vector(768) in the schema (line 153 of shared/schema.ts), but the default model text-embedding-3-small produces 1536-dimensional vectors. The EMBEDDING_CONFIG.dimensions defaults to 1536, and the code at line 320 calls generateEmbedding() with no model parameter, using the default. This will cause pgvector to reject inserts with a dimension mismatch error.

Either:

  1. Change the schema to use vector(1536) for excerptEmbedding (recommended, as it aligns with EMBEDDING_CONFIG.dimensions)
  2. Or configure a different embedding model that produces 768 dimensions (currently unavailable in the codebase)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/embeddingService.ts` around lines 320 - 328, The excerptEmbedding
vector dimension is mismatched: generateEmbedding(source.excerpt) uses the
default EMBEDDING_CONFIG.dimensions (1536) but timelineSources.excerptEmbedding
is defined as vector(768); update the schema to vector(1536) so it matches
EMBEDDING_CONFIG.dimensions (or alternatively pass a 768-dim model into
generateEmbedding), and update any DB migration and schema declaration in
shared/schema.ts plus any related constants (EMBEDDING_CONFIG.dimensions) to
keep them consistent with timelineSources and the embedding-producing model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant