Add dataset refresh cadence scheduling#110
Conversation
📝 WalkthroughWalkthroughThis PR implements a complete scheduled refresh system for datasets. It introduces a typed refresh cadence model (manual, 30m, 6h, 12h, daily, weekly) and updates the database schema with scheduling fields. New Convex mutations enable users to configure refresh settings and handle the backend workflow lifecycle. A local scheduler runs on the backend, periodically querying for due datasets, claiming refresh runs, and triggering background update workflows. The frontend adds UI to edit refresh cadence on dataset detail pages and updates dataset creation to use the new cadence model. Component types are updated to propagate refresh metadata, and a local ProfileUser type decouples UI components from external Clerk types. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
frontend/app/dashboard/page.tsx (1)
17-22: 💤 Low valueConsider extracting
ProfileUserto a shared types file.This type is duplicated identically in
frontend/app/dataset/[id]/page.tsx. Extracting to a shared location (e.g.,@/types/user.ts) would reduce duplication and ensure consistent updates.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/app/dashboard/page.tsx` around lines 17 - 22, The ProfileUser type is duplicated; extract it into a shared types module (e.g., export type ProfileUser in a new user types file) and replace the inline declaration with an import of ProfileUser where used (update references in the ProfileUser declaration locations such as the ProfileUser type in page.tsx and the identical declaration in the dataset page). Ensure the new module exports the same shape (fullName, firstName, primaryEmailAddress, imageUrl) and update imports in components to use that single source of truth.frontend/lib/refresh-cadence.ts (1)
12-23: 💤 Low valueConsider deriving REFRESH_CADENCE_LABELS from REFRESH_CADENCE_OPTIONS to eliminate duplication.
The label strings appear in both
REFRESH_CADENCE_OPTIONS(lines 4-9) andREFRESH_CADENCE_LABELS(lines 13-18). This duplication creates a maintenance burden if labels ever need to change.♻️ Proposed refactor to derive LABELS from OPTIONS
export const REFRESH_CADENCE_OPTIONS: { value: RefreshCadence; label: string }[] = [ { value: "manual", label: "Manual" }, { value: "30m", label: "Every 30 min" }, { value: "6h", label: "Every 6 hours" }, { value: "12h", label: "Every 12 hours" }, { value: "daily", label: "Daily" }, { value: "weekly", label: "Weekly" }, ]; -export const REFRESH_CADENCE_LABELS: Record<RefreshCadence, string> = { - manual: "Manual", - "30m": "Every 30 min", - "6h": "Every 6 hours", - "12h": "Every 12 hours", - daily: "Daily", - weekly: "Weekly", -}; +export const REFRESH_CADENCE_LABELS: Record<RefreshCadence, string> = + Object.fromEntries( + REFRESH_CADENCE_OPTIONS.map(opt => [opt.value, opt.label]) + ) as Record<RefreshCadence, string>; export function refreshCadenceLabel(cadence: RefreshCadence): string { return REFRESH_CADENCE_LABELS[cadence]; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/lib/refresh-cadence.ts` around lines 12 - 23, REFRESH_CADENCE_LABELS duplicates strings already present in REFRESH_CADENCE_OPTIONS; remove the duplication by deriving labels from REFRESH_CADENCE_OPTIONS instead—either add a label property to each option in REFRESH_CADENCE_OPTIONS and compute REFRESH_CADENCE_LABELS by mapping option.key -> option.label, or if REFRESH_CADENCE_OPTIONS is already a key->label map, reuse it directly; update refreshCadenceLabel(cadence: RefreshCadence) to return the label from REFRESH_CADENCE_OPTIONS (or the computed map) so there is a single source of truth for cadence labels.backend/src/index.ts (1)
434-522: Consider adding error recovery and monitoring for the scheduler.The scheduler tick (lines 444-507) will retry indefinitely if Convex is down or mutations fail repeatedly. Consider adding:
- A circuit breaker to pause the scheduler after N consecutive failures
- Metrics/alerts for scheduler health (successful ticks, failed claims, etc.)
- A maximum retry count per dataset to prevent infinite retries
This is not blocking for initial deployment but should be addressed before relying on scheduled refreshes in production.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/index.ts` around lines 434 - 522, The scheduler in startLocalRefreshScheduler/tick needs resilience: add a consecutiveFailures counter and per-dataset retry guard so repeated convex/mutation failures pause the scheduler and avoid infinite retries; increment consecutiveFailures on any catch in tick and when it exceeds a threshold (e.g., N), stop the interval (clearInterval) and start an exponential backoff retry timer or set a paused flag so tick no-ops until a cooldown, and reset consecutiveFailures to 0 on a successful tick; emit metrics (e.g., scheduler_tick_success, scheduler_tick_failure, scheduled_claim_failed, scheduled_claim_started) via your metrics client inside tick, around the convex.query and convex.mutation calls and when claim.outcome !== "started"; add a maxRetryCount to the claim/mutation payload or dataset metadata and refuse to requeue if the run has exceeded that count (or increment a retryCount returned from claim and skip when > MAX_RETRIES) to prevent per-dataset infinite retries; wire all changes into startLocalRefreshScheduler, the tick function, the convex.mutation call to internal.datasets.claimScheduledRefreshInternal, and the runScheduledUpdateWorkflowInBackground invocation so failures are logged, counted, and trigger pause/backoff behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/env.ts`:
- Around line 60-70: The numeric env vars REFRESH_SCHEDULER_POLL_MS,
REFRESH_SCHEDULER_BATCH_SIZE, and REFRESH_SCHEDULER_STALE_AFTER_MS must be
validated after parsing so they never become NaN; change their construction to
parse (e.g., parseInt/Number) then check Number.isFinite(value) (or
!Number.isNaN(value)) and fall back to the existing default if invalid, and keep
REFRESH_SCHEDULER_ENABLED logic as-is — update the definitions for
REFRESH_SCHEDULER_POLL_MS, REFRESH_SCHEDULER_BATCH_SIZE, and
REFRESH_SCHEDULER_STALE_AFTER_MS to perform this validation and fallback to the
provided literals when parsing fails.
In `@backend/src/index.ts`:
- Around line 539-540: The call to backfillDatasetRefreshSettings is not
awaited, causing the server and startLocalRefreshScheduler to run before
migration finishes; update the initialization so you await
backfillDatasetRefreshSettings(fastify.log) (or wrap in an async IIFE/top-level
await) before calling startLocalRefreshScheduler(fastify.log) and before
allowing the server to accept requests, and add error handling (try/catch) to
log failures and stop startup if the backfill fails.
In `@frontend/convex/datasets.ts`:
- Around line 31-53: Extract the duplicated refresh scheduling constants and
function into a shared helper module (e.g., create a new
frontend/convex/lib/refreshScheduling.ts) that exports REFRESH_INTERVAL_MS and
nextRefreshAtFor; then replace the local definitions in datasets.ts and
publicSeed.ts with imports from that module. Ensure the exported types
(RefreshCadence) or its equivalent are also exported/ re-exported so both files
can type against the same union, and update import sites to use the shared
symbols (REFRESH_INTERVAL_MS, nextRefreshAtFor, and RefreshCadence) to remove
duplication.
---
Nitpick comments:
In `@backend/src/index.ts`:
- Around line 434-522: The scheduler in startLocalRefreshScheduler/tick needs
resilience: add a consecutiveFailures counter and per-dataset retry guard so
repeated convex/mutation failures pause the scheduler and avoid infinite
retries; increment consecutiveFailures on any catch in tick and when it exceeds
a threshold (e.g., N), stop the interval (clearInterval) and start an
exponential backoff retry timer or set a paused flag so tick no-ops until a
cooldown, and reset consecutiveFailures to 0 on a successful tick; emit metrics
(e.g., scheduler_tick_success, scheduler_tick_failure, scheduled_claim_failed,
scheduled_claim_started) via your metrics client inside tick, around the
convex.query and convex.mutation calls and when claim.outcome !== "started"; add
a maxRetryCount to the claim/mutation payload or dataset metadata and refuse to
requeue if the run has exceeded that count (or increment a retryCount returned
from claim and skip when > MAX_RETRIES) to prevent per-dataset infinite retries;
wire all changes into startLocalRefreshScheduler, the tick function, the
convex.mutation call to internal.datasets.claimScheduledRefreshInternal, and the
runScheduledUpdateWorkflowInBackground invocation so failures are logged,
counted, and trigger pause/backoff behavior.
In `@frontend/app/dashboard/page.tsx`:
- Around line 17-22: The ProfileUser type is duplicated; extract it into a
shared types module (e.g., export type ProfileUser in a new user types file) and
replace the inline declaration with an import of ProfileUser where used (update
references in the ProfileUser declaration locations such as the ProfileUser type
in page.tsx and the identical declaration in the dataset page). Ensure the new
module exports the same shape (fullName, firstName, primaryEmailAddress,
imageUrl) and update imports in components to use that single source of truth.
In `@frontend/lib/refresh-cadence.ts`:
- Around line 12-23: REFRESH_CADENCE_LABELS duplicates strings already present
in REFRESH_CADENCE_OPTIONS; remove the duplication by deriving labels from
REFRESH_CADENCE_OPTIONS instead—either add a label property to each option in
REFRESH_CADENCE_OPTIONS and compute REFRESH_CADENCE_LABELS by mapping option.key
-> option.label, or if REFRESH_CADENCE_OPTIONS is already a key->label map,
reuse it directly; update refreshCadenceLabel(cadence: RefreshCadence) to return
the label from REFRESH_CADENCE_OPTIONS (or the computed map) so there is a
single source of truth for cadence labels.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3edf4297-965d-46a2-af81-4aed80eb302e
📒 Files selected for processing (13)
backend/src/env.tsbackend/src/index.tsdocker-compose.dev.ymlfrontend/app/dashboard/page.tsxfrontend/app/dataset/[id]/page.tsxfrontend/app/dataset/new/page.tsxfrontend/components/dataset/DatasetCard.tsxfrontend/components/table/types.tsfrontend/convex/datasets.tsfrontend/convex/modelConfig.tsfrontend/convex/publicSeed.tsfrontend/convex/schema.tsfrontend/lib/refresh-cadence.ts
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/convex/publicSeed.ts (1)
20-29:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHeader docstring is now stale relative to the
force=truepath.Lines 24-27 state these fields are "NOT re-synced to the live row" and that in-place patches are "a deliberate future enhancement," and lines 28-29 instruct deleting the dataset in the Convex dashboard to force a content refresh. The new
force=truehandler now patches metadata (includingrefreshCadence) and replaces rows in place, which directly contradicts this guidance. An admin following the header could perform unnecessary destructive deletes. The lower docstring (Lines 342-350) already documents this correctly — reconcile the header with it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/convex/publicSeed.ts` around lines 20 - 29, Update the stale header docstring to reflect the new force=true behavior: change the statements that "name, description, refreshCadence, and other fields are NOT re-synced" and the instruction to delete from the Convex dashboard so they note that when force=true the seeding code will patch metadata (e.g., refreshCadence, name, description) and replace rows in place rather than requiring manual deletion; reference the existing accurate description in the lower docstring (lines ~342-350) and align the header with that wording, removing the misleading deletion instruction and adding a brief note about force=true performing in-place updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@frontend/convex/publicSeed.ts`:
- Around line 20-29: Update the stale header docstring to reflect the new
force=true behavior: change the statements that "name, description,
refreshCadence, and other fields are NOT re-synced" and the instruction to
delete from the Convex dashboard so they note that when force=true the seeding
code will patch metadata (e.g., refreshCadence, name, description) and replace
rows in place rather than requiring manual deletion; reference the existing
accurate description in the lower docstring (lines ~342-350) and align the
header with that wording, removing the misleading deletion instruction and
adding a brief note about force=true performing in-place updates.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7cee1a0b-f9cc-4119-a421-2bb722b0a2bb
📒 Files selected for processing (9)
backend/src/env.tsbackend/src/index.tsfrontend/app/dashboard/page.tsxfrontend/app/dataset/[id]/page.tsxfrontend/convex/datasets.tsfrontend/convex/lib/refreshScheduling.tsfrontend/convex/publicSeed.tsfrontend/lib/profile-user.tsfrontend/lib/refresh-cadence.ts
✅ Files skipped from review due to trivial changes (2)
- frontend/convex/lib/refreshScheduling.ts
- frontend/lib/profile-user.ts
🚧 Files skipped from review as they are similar to previous changes (6)
- frontend/lib/refresh-cadence.ts
- backend/src/env.ts
- frontend/app/dashboard/page.tsx
- frontend/convex/datasets.ts
- frontend/app/dataset/[id]/page.tsx
- backend/src/index.ts
Summary
Verification