splits the job into one per batch #41

Open
techiejd wants to merge 12 commits into main from multiple_jobs_for_multiple_batches
Conversation

@techiejd
Owner

No description provided.

@techiejd techiejd self-assigned this Feb 20, 2026
@techiejd techiejd added the enhancement New feature or request label Feb 20, 2026
@techiejd techiejd linked an issue Feb 20, 2026 that may be closed by this pull request
techiejd and others added 11 commits February 21, 2026 23:06
- Remove 30s waitUntil delay from per-batch task re-queue (was causing
  test timeouts since the original code had no such delay)
- Add failedChunkData JSON field to batch collection so per-batch tasks
  can store chunk-level failure data independently
- Aggregate failedChunkData from batch records in finalizeRunIfComplete()
  instead of relying on in-memory accumulation from the old single-task flow
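The aggregation step described above could look roughly like the sketch below. Only the `failedChunkData` field name comes from the commit message; the `FailedChunk` and `BatchRecord` shapes and the `aggregateFailedChunkData` helper are hypothetical, and the real `finalizeRunIfComplete()` may differ.

```typescript
// Hypothetical shapes: only `failedChunkData` is named in the commit message.
interface FailedChunk {
  collection: string
  documentId: string
  chunkIndex: number
}

interface BatchRecord {
  id: string
  // JSON field written by each per-batch task; may be absent or null
  // on batches that had no chunk-level failures.
  failedChunkData?: FailedChunk[] | null
}

// Collect chunk failures from persisted batch records instead of from
// in-memory state shared across one long-running task.
function aggregateFailedChunkData(batches: BatchRecord[]): FailedChunk[] {
  const all: FailedChunk[] = []
  for (const batch of batches) {
    if (Array.isArray(batch.failedChunkData)) {
      all.push(...batch.failedChunkData)
    }
  }
  return all
}
```

Because each batch record carries its own failures, any task that reads the records can rebuild the full list, regardless of which tasks wrote them.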

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rker architecture

Splits prepare-bulk-embedding into coordinator + per-collection workers.
Each worker processes one page of one collection, queuing a continuation
job before processing to ensure crash safety. Default batchLimit is 1000
when not explicitly set.
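A minimal sketch of the ordering this commit describes: the worker queues its continuation before doing the page's work, so a crash during processing does not lose the remaining pages. Every name here (`runWorker`, `WorkerInput`, `fetchPage`, `queueJob`, `processDocs`) is hypothetical, and the sketch is kept synchronous so the ordering is easy to follow; the real task handlers are presumably asynchronous.

```typescript
// Default taken from the commit message: 1000 when batchLimit is not set.
const DEFAULT_BATCH_LIMIT = 1000

// Hypothetical job input: one page of one collection.
interface WorkerInput {
  collection: string
  page: number
  batchLimit?: number
}

interface Page<T> {
  docs: T[]
  hasNextPage: boolean
}

function runWorker<T>(
  input: WorkerInput,
  fetchPage: (collection: string, page: number, limit: number) => Page<T>,
  queueJob: (next: WorkerInput) => void,
  processDocs: (docs: T[]) => void,
): void {
  const limit = input.batchLimit ?? DEFAULT_BATCH_LIMIT
  const current = fetchPage(input.collection, input.page, limit)

  // Queue the continuation first: if processDocs throws below,
  // the job for the next page already exists.
  if (current.hasNextPage) {
    queueJob({ ...input, page: input.page + 1 })
  }

  processDocs(current.docs)
}
```

The trade-off in this ordering is that a failed page must be retried or recorded on its own, since later pages proceed without waiting for it.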

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The second test was creating a separate Payload instance sharing the same
DB and job queues, causing two crons to compete for jobs. This led to
double-execution and mock state inconsistency (expected 4 to be 2).
Now both tests use the single beforeAll instance with cleanup between.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every test file that creates a Payload instance now calls
payload.destroy() in afterAll (or try/finally for in-test instances).
This stops background cron jobs from accumulating across tests, which
was causing heap exhaustion in CI.
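For the in-test instances, the try/finally pattern might be wrapped in a small helper like the one below. `withInstance` and the `Destroyable` interface are hypothetical; the only thing assumed about the real instance is the `destroy()` call named in the commit message.

```typescript
// Minimal stand-in for the part of a Payload instance this sketch needs.
interface Destroyable {
  destroy(): Promise<void>
}

// Create an instance, run the test body, and always destroy the instance,
// even when the body throws, so background cron jobs do not outlive the test.
async function withInstance<T extends Destroyable, R>(
  create: () => Promise<T>,
  run: (instance: T) => Promise<R>,
): Promise<R> {
  const instance = await create()
  try {
    return await run(instance)
  } finally {
    await instance.destroy()
  }
}
```

For the shared instance created in `beforeAll`, the equivalent is a single `destroy()` call in `afterAll`, as the commit message says.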

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --max-old-space-size=8192 to test:int NODE_OPTIONS (cross-env was
  overriding the CI env var, so the heap limit never took effect)
- Fix polling.spec.ts queueSpy assertions: coordinator/worker adds an
  extra queue call, so poll-or-complete-single-batch is now call 3 and 4
  instead of 2 and 3
- Add extensive [vectorize-debug] console.log throughout task handlers
  (coordinator, worker, poll-single, finalize, streamAndBatchDocs) to
  diagnose any remaining CI hangs
- Remove redundant NODE_OPTIONS from CI workflow (now in the script)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ncrementally

Remove the backward-compatible fan-out task since the per-batch architecture
hasn't been released yet. Refactor finalizeRunIfComplete to aggregate batch
counts incrementally during pagination instead of collecting all batch objects
into memory.
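A rough sketch of incremental aggregation during pagination, under the assumption that batches are read page by page. The `BatchSummary`, `BatchPage`, and `RunTotals` shapes and the `aggregateRunTotals` and `fetchPage` names are hypothetical, not taken from the PR.

```typescript
// Hypothetical per-batch fields used only for this illustration.
interface BatchSummary {
  status: 'pending' | 'succeeded' | 'failed'
  succeededCount: number
  failedCount: number
}

interface BatchPage {
  docs: BatchSummary[]
  hasNextPage: boolean
}

interface RunTotals {
  batches: number
  pending: number
  succeeded: number
  failed: number
}

// Fold each page into running totals as it arrives, so at most one page
// of batch objects is held in memory at a time.
async function aggregateRunTotals(
  fetchPage: (page: number) => Promise<BatchPage>,
): Promise<RunTotals> {
  const totals: RunTotals = { batches: 0, pending: 0, succeeded: 0, failed: 0 }
  let page = 1
  let hasNextPage = true
  while (hasNextPage) {
    const result = await fetchPage(page)
    for (const batch of result.docs) {
      totals.batches += 1
      if (batch.status === 'pending') totals.pending += 1
      totals.succeeded += batch.succeededCount
      totals.failed += batch.failedCount
    }
    hasNextPage = result.hasNextPage
    page += 1
  }
  return totals
}
```

With totals like these, a finalizer could treat the run as complete once `pending` is zero, without ever materializing the full batch list.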

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump version 0.5.4 → 0.5.5
- Add 0.5.5 entry to CHANGELOG.md (coordinator/worker, batchLimit, per-batch polling)
- Document batchLimit in README CollectionVectorizeOption section
- Remove all diagnostic console.log statements from bulkEmbedAll.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Refactor Bulk Embedding Job Creation to Use Per-Batch Jobs

2 participants