Skip to content

fix: chunk byline IN clauses to stay within D1 SQL variable limit#223

Merged
ascorbic merged 6 commits intoemdash-cms:mainfrom
baezor:fix/byline-sql-batch-chunking
Apr 11, 2026
Merged

fix: chunk byline IN clauses to stay within D1 SQL variable limit#223
ascorbic merged 6 commits intoemdash-cms:mainfrom
baezor:fix/byline-sql-batch-chunking

Conversation

@baezor
Copy link
Copy Markdown
Contributor

@baezor baezor commented Apr 4, 2026

What does this PR do?

Fixes unbounded IN (?, ?, …) clauses in byline hydration that exceed Cloudflare D1's SQL bound-parameter limit when querying large collections.

Adds a chunks() utility (utils/chunks.ts) and applies it defense-in-depth at the repository level so any caller is protected:

  • BylineRepository.getContentBylinesMany — deduplicates IDs, then chunks content_id IN (…)
  • BylineRepository.findByUserIds — chunks user_id IN (…)
  • getAuthorIds (bylines/index.ts) — chunks id IN (…) raw SQL

Each batch is capped at 50 IDs, well within D1's limit. Content IDs are also deduplicated before chunking to prevent duplicate credits when the same ID spans multiple chunks.

Closes #219

Type of change

  • Bug fix
  • Feature (requires approved Discussion)
  • Refactor (no behavior change)
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm --silent lint:json | jq '.diagnostics | length' returns 0
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

Screenshots / test output

All 26 tests pass across 3 test files (10 new):

 ✓ chunks > returns empty array for empty input
 ✓ chunks > returns single chunk when array fits within size
 ✓ chunks > splits array into even chunks
 ✓ chunks > handles remainder in last chunk
 ✓ chunks > handles chunk size of 1
 ✓ chunks > handles array exactly equal to chunk size
 ✓ SQL_BATCH_SIZE > is 50
 ✓ BylineRepository > getContentBylinesMany handles more IDs than SQL_BATCH_SIZE
 ✓ BylineRepository > getContentBylinesMany does not duplicate credits for repeated content IDs
 ✓ BylineRepository > findByUserIds handles more IDs than SQL_BATCH_SIZE
 ✓ getBylinesForEntries > handles batches larger than SQL_BATCH_SIZE across explicit and inferred bylines

baezor added 3 commits April 4, 2026 01:26
Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses
that exceed Cloudflare D1's bound-parameter limit on large collections.

Adds a chunks() utility and applies it defense-in-depth at the
repository level: getContentBylinesMany, findByUserIds, and
getAuthorIds now batch IDs in groups of 50.
Deduplicates contentIds in getContentBylinesMany to prevent duplicate
credits when the same ID appears across chunk boundaries. Adds tests
for the duplication edge case and an end-to-end getBylinesForEntries
test spanning both explicit and inferred byline paths.
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 4, 2026

🦋 Changeset detected

Latest commit: 11823fd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
emdash Patch
@emdash-cms/cloudflare Patch
@emdash-cms/plugin-ai-moderation Patch
@emdash-cms/plugin-atproto Patch
@emdash-cms/plugin-audit-log Patch
@emdash-cms/plugin-color Patch
@emdash-cms/plugin-embeds Patch
@emdash-cms/plugin-forms Patch
@emdash-cms/plugin-webhook-notifier Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@baezor
Copy link
Copy Markdown
Contributor Author

baezor commented Apr 4, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Apr 4, 2026
@baezor
Copy link
Copy Markdown
Contributor Author

baezor commented Apr 9, 2026

Confirming this addresses a real production bug — we just hit it at prepain.mx with emdash@0.1.1 on Cloudflare D1, with 124 posts in the collection. When the admin bundle loads /admin and issues GET /_emdash/api/content/posts?limit=100, the worker logs:

Content list error: Error: D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR

Minimal standalone repro of the D1 limit itself (just to isolate the constraint from emdash):

wrangler d1 execute <db> --remote --command \
  "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,…×100)"
# → too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]

Per the D1 limits docs, D1 caps bound parameters at 100 per query — far below SQLite upstream's default of 32,766. Any WHERE col = ? AND id IN (?×100) clause produces 101 params and trips the limit. That's why this bug only reproduces on D1 at scale, and not on local SQLite fixtures.

Downstream (React #300 "rendered fewer hooks than expected") is a symptom — the admin bundle throws on the 500 response and the hook count goes inconsistent during the re-render. Fixing the query fixes the crash.

Heads-up: one uncovered site

While tracing the failure in prod we noticed that SeoRepository.getMany (packages/core/src/database/repositories/seo.ts:67-78) has the exact same unbounded IN pattern:

const rows = await this.db
  .selectFrom("_emdash_seo")
  .selectAll()
  .where("collection", "=", collection)
  .where("content_id", "in", contentIds)  // ← 100 ids + 1 collection = 101 params
  .execute();

And it's called from handleContentList before hydrateBylinesMany:

// packages/core/src/api/handlers/content.ts:257-259
const hasSeo = await collectionHasSeo(db, collection);
await hydrateSeoMany(db, collection, result.items, hasSeo);
await hydrateBylinesMany(db, collection, result.items);

So any collection with has_seo = 1 and ≥99 items will still 500 on the content list endpoint even after this PR merges. Our own posts collection has has_seo = 1, and it's the first function to fail in our trace.

Happy to open a follow-up PR that reuses the new chunks() helper + SQL_BATCH_SIZE for SeoRepository.getMany, or you can fold it into this PR if you'd prefer to land both together — whichever you prefer.

(As a short-term unblock we shipped a local patch-package hot-patch against our node_modules/emdash/dist/** that chunks all three functions — SEO, bylines, and findByUserIds — in batches of 90. We'll drop it the moment a release with the upstream fix ships.)

@github-actions
Copy link
Copy Markdown
Contributor

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

Copilot AI review requested due to automatic review settings April 11, 2026 10:35
Copy link
Copy Markdown
Collaborator

@ascorbic ascorbic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and your other PR are great. Thank you. If possible, could you do the same for taxonomies?

@ascorbic ascorbic enabled auto-merge (squash) April 11, 2026 10:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Cloudflare D1 “too many SQL variables” failures during byline hydration by chunking large IN (...) queries (defense-in-depth at repository + runtime SQL helper layers).

Changes:

  • Added chunks() utility and SQL_BATCH_SIZE = 50 for batching SQL IN clauses.
  • Updated byline repository batch queries and getAuthorIds() raw SQL lookup to run in bounded chunks (and dedupe content IDs to avoid duplicate credits).
  • Added unit tests covering chunking behavior and byline hydration behavior over batch limits, plus a patch changeset.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/core/src/utils/chunks.ts Introduces chunking helper + default SQL batch size constant.
packages/core/src/database/repositories/byline.ts Chunks content_id IN (...) and user_id IN (...) queries; dedupes content IDs.
packages/core/src/bylines/index.ts Chunks raw SQL id IN (...) in getAuthorIds() to keep bound parameters under limit.
packages/core/tests/unit/utils/chunks.test.ts Adds unit tests for chunks() and SQL_BATCH_SIZE.
packages/core/tests/unit/database/repositories/byline.test.ts Adds repository-level tests ensuring chunked queries work beyond batch size and avoid duplicate credits.
packages/core/tests/unit/bylines/bylines-query.test.ts Adds integration-style test validating chunking across explicit + inferred byline resolution.
.changeset/cute-eagles-rescue.md Patch release note for the D1 variable-limit fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +7 to +12
export function chunks<T>(arr: T[], size: number): T[][] {
if (arr.length === 0) return [];
const result: T[][] = [];
for (let i = 0; i < arr.length; i += size) {
result.push(arr.slice(i, i + size));
}
Comment on lines 311 to +320
async findByUserIds(userIds: string[]): Promise<Map<string, BylineSummary>> {
const result = new Map<string, BylineSummary>();
if (userIds.length === 0) return result;

const rows = await this.db
.selectFrom("_emdash_bylines")
.selectAll()
.where("user_id", "in", userIds)
.execute();
for (const chunk of chunks(userIds, SQL_BATCH_SIZE)) {
const rows = await this.db
.selectFrom("_emdash_bylines")
.selectAll()
.where("user_id", "in", chunk)
.execute();
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new bot commented Apr 11, 2026

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/admin@223

@emdash-cms/auth

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/auth@223

@emdash-cms/blocks

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/blocks@223

@emdash-cms/cloudflare

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/cloudflare@223

emdash

npm i https://pkg.pr.new/emdash-cms/emdash@223

create-emdash

npm i https://pkg.pr.new/emdash-cms/emdash/create-emdash@223

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/gutenberg-to-portable-text@223

@emdash-cms/x402

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/x402@223

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-ai-moderation@223

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-atproto@223

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-audit-log@223

@emdash-cms/plugin-color

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-color@223

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-embeds@223

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-forms@223

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/emdash-cms/emdash/@emdash-cms/plugin-webhook-notifier@223

commit: 11823fd

@ascorbic ascorbic merged commit fcd8b7b into emdash-cms:main Apr 11, 2026
25 checks passed
@emdashbot emdashbot bot mentioned this pull request Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hydrateEntryBylines exceeds D1 SQL variable limit on large collections

3 participants