feat(jobs): add dataset_cleanup_sweep global job #548
Draft
SgtPooki wants to merge 1 commit into
Scheduled global job that flips Deal.cleaned_up=true for any uncleaned Deal row whose data_set_id shows pdpEndEpoch != 0n on FWSS (terminated) or whose getDataSet returns null (removed). Closes the gap from the operator-must-terminate workflow tracked in #546: after an operator runs a Safe terminateService batch (like #545), synapse-sdk's createContext filters out the now-terminated dataset, so getDataSetProvisioningStatus returns "missing" instead of "terminated" and repairTerminatedDataSet is never invoked for those rows. Empirically, 87% of recent failed retrievals tie to terminated datasets that should have been auto-cleaned. The sweeper eliminates this noise. Cadence default 24h (DATASET_CLEANUP_SWEEP_INTERVAL_SECONDS=86400), batch size 50 (DATASET_CLEANUP_SWEEP_BATCH_SIZE=50). Reuses recordJobExecution for jobs_* metrics. Idempotent UPDATE filters cleaned_up=false. Adds a partial index on deals(data_set_id) WHERE cleaned_up=false AND data_set_id IS NOT NULL to keep the SELECT DISTINCT cheap as the deals table grows. Extracts the existing UPDATE block from repairTerminatedDataSet into DealService.markDealsCleanedUpForDataSets and shares it with the sweeper to avoid divergence. Tracking: #546
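A minimal sketch of the per-dataset decision described above, not the code in this PR. Only pdpEndEpoch, getDataSet, and the two rules (non-zero end epoch means terminated, null means removed) come from the description. The names DataSetInfo, GetDataSet, classifyDataSet, and selectDataSetsToClean are hypothetical, and skipping a dataset when the lookup throws is an assumption about how a failed call might be kept distinct from a null result.

```typescript
// Hypothetical shape of the FWSS lookup result; only pdpEndEpoch is taken from the PR text.
interface DataSetInfo {
  pdpEndEpoch: bigint;
}

type SweepDecision = "terminated" | "removed" | "active";

// Classify one lookup result using the two rules in the description.
function classifyDataSet(info: DataSetInfo | null): SweepDecision {
  if (info === null) return "removed"; // getDataSet returned null
  if (info.pdpEndEpoch !== 0n) return "terminated"; // end epoch has been set
  return "active";
}

// Hypothetical lookup signature standing in for FWSS getDataSet.
type GetDataSet = (dataSetId: bigint) => Promise<DataSetInfo | null>;

// Walk one batch of dataset IDs and return those whose deals should be marked cleaned up.
async function selectDataSetsToClean(
  ids: bigint[],
  getDataSet: GetDataSet,
): Promise<bigint[]> {
  const toClean: bigint[] = [];
  for (const id of ids) {
    try {
      const info = await getDataSet(id);
      if (classifyDataSet(info) !== "active") toClean.push(id);
    } catch {
      // Assumption: a thrown lookup is not a null result, so skip it and retry on the next tick.
    }
  }
  return toClean;
}
```

The returned IDs would then feed the idempotent UPDATE that filters cleaned_up=false, so running the same batch twice changes nothing the second time.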
This was referenced May 20, 2026
What changed
New scheduled global job dataset_cleanup_sweep that periodically reconciles Deal.cleaned_up with FWSS state. For every uncleaned Deal row with a non-null data_set_id, the sweeper probes FWSS getDataSet(dataSetId):
- pdpEndEpoch != 0n → flip cleaned_up=true (terminated)
- getDataSet returns null → flip cleaned_up=true (dataset removed)
UPDATEs filter cleaned_up=false and run inside a single transaction per bucket. Reuses the existing UPDATE block from repairTerminatedDataSet via a new shared method DealService.markDealsCleanedUpForDataSets.
Why
In session-key + multisig payer mode, dealbot cannot auto-terminate datasets (#546). Operators run terminateService via Safe (precedent: #545). After the Safe batch lands, synapse-sdk's createContext filters out the terminated dataset, so getDataSetProvisioningStatus returns "missing" instead of "terminated" and repairTerminatedDataSet is never invoked for those rows. The retrieval candidate selector keeps picking the stale Deal rows and pollutes failure metrics.
Empirically on calibration staging (7-day window):
How to verify
- pnpm test from repo root (364 tests pass).
- Manual: in staging after deploy, watch for dataset_cleanup_sweep_completed log events. First run drains the historical backlog; steady-state ticks should report datasetsTerminated: 0 after the queue is clean.
Notes / risks
- Cadence default 24h (DATASET_CLEANUP_SWEEP_INTERVAL_SECONDS=86400). Operators can dial down via env if a backlog spikes after a Safe batch.
- Adds a partial index: CREATE INDEX idx_deals_unclean_dataset ON deals(data_set_id) WHERE cleaned_up = false AND data_set_id IS NOT NULL. Without it, the SELECT DISTINCT data_set_id WHERE cleaned_up = false triggers a full table scan on every tick.
- pdpEndEpoch != 0n is treated as irreversible (verified against FWSS source: no path resets it to zero once set).
- … result == null so a flaky RPC will not wrongly mark deals cleaned up.
- … data_set_id IS NULL (separate root cause from #511, "Synapse upgrade causes Dealbot to create a new dataset per deal for providers beyond first 100 client datasets", piece_id=0 pattern). Filed as a separate concern.
- … pdpEndEpoch per dataset, so a future iteration could skip the per-dataset multicall entirely.
Refs
- DataSetTerminateRequiresOperatorError (related, not a dependency)
- safeReadTransport workaround (background)