Skip to content

Add dataset refresh cadence scheduling#110

Merged
pranavjana merged 2 commits into
mainfrom
pranav/dataset-refresh-cadence
Jun 1, 2026
Merged

Add dataset refresh cadence scheduling#110
pranavjana merged 2 commits into
mainfrom
pranav/dataset-refresh-cadence

Conversation

@pranavjana

Copy link
Copy Markdown
Collaborator

Summary

  • Add per-dataset refresh cadence settings and Settings dropdown controls
  • Add local backend scheduler for due dataset refreshes
  • Add Convex claim/complete/fail mutations plus legacy cadence backfill

Verification

  • npm run build in backend
  • npm run build in frontend
  • make convex-push

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements a complete scheduled refresh system for datasets. It introduces a typed refresh cadence model (manual, 30m, 6h, 12h, daily, weekly) and updates the database schema with scheduling fields. New Convex mutations enable users to configure refresh settings and handle the backend workflow lifecycle. A local scheduler runs on the backend, periodically querying for due datasets, claiming refresh runs, and triggering background update workflows. The frontend adds UI to edit refresh cadence on dataset detail pages and updates dataset creation to use the new cadence model. Component types are updated to propagate refresh metadata, and a local ProfileUser type decouples UI components from external Clerk types.

Possibly related PRs

  • tinyfish-io/bigset#104: The new refresh-scheduler triggers background update workflows and depends on the background update workflow plumbing added in #104.
  • tinyfish-io/bigset#108: Both PRs modify backend workflow authContext wiring to pass per-user modelConfig into background update workflows.
  • tinyfish-io/bigset#15: Related changes to public dataset seeding and datasets schema/indexes that interact with the seeded dataset metadata.

Suggested reviewers

  • simantak-dabhade
  • manav-tf
  • hwennnn
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add dataset refresh cadence scheduling' accurately and concisely summarizes the main change: introducing refresh cadence functionality and scheduling.
Description check ✅ Passed The description clearly relates to the changeset, detailing the three main components: UI controls, backend scheduler, and Convex mutations, all of which are present in the file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pranav/dataset-refresh-cadence

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
frontend/app/dashboard/page.tsx (1)

17-22: 💤 Low value

Consider extracting ProfileUser to a shared types file.

This type is duplicated identically in frontend/app/dataset/[id]/page.tsx. Extracting to a shared location (e.g., @/types/user.ts) would reduce duplication and ensure consistent updates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/app/dashboard/page.tsx` around lines 17 - 22, The ProfileUser type
is duplicated; extract it into a shared types module (e.g., export type
ProfileUser in a new user types file) and replace the inline declaration with an
import of ProfileUser where used (update references in the ProfileUser
declaration locations such as the ProfileUser type in page.tsx and the identical
declaration in the dataset page). Ensure the new module exports the same shape
(fullName, firstName, primaryEmailAddress, imageUrl) and update imports in
components to use that single source of truth.
frontend/lib/refresh-cadence.ts (1)

12-23: 💤 Low value

Consider deriving REFRESH_CADENCE_LABELS from REFRESH_CADENCE_OPTIONS to eliminate duplication.

The label strings appear in both REFRESH_CADENCE_OPTIONS (lines 4-9) and REFRESH_CADENCE_LABELS (lines 13-18). This duplication creates a maintenance burden if labels ever need to change.

♻️ Proposed refactor to derive LABELS from OPTIONS
 export const REFRESH_CADENCE_OPTIONS: { value: RefreshCadence; label: string }[] = [
   { value: "manual", label: "Manual" },
   { value: "30m", label: "Every 30 min" },
   { value: "6h", label: "Every 6 hours" },
   { value: "12h", label: "Every 12 hours" },
   { value: "daily", label: "Daily" },
   { value: "weekly", label: "Weekly" },
 ];

-export const REFRESH_CADENCE_LABELS: Record<RefreshCadence, string> = {
-  manual: "Manual",
-  "30m": "Every 30 min",
-  "6h": "Every 6 hours",
-  "12h": "Every 12 hours",
-  daily: "Daily",
-  weekly: "Weekly",
-};
+export const REFRESH_CADENCE_LABELS: Record<RefreshCadence, string> = 
+  Object.fromEntries(
+    REFRESH_CADENCE_OPTIONS.map(opt => [opt.value, opt.label])
+  ) as Record<RefreshCadence, string>;

 export function refreshCadenceLabel(cadence: RefreshCadence): string {
   return REFRESH_CADENCE_LABELS[cadence];
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/lib/refresh-cadence.ts` around lines 12 - 23, REFRESH_CADENCE_LABELS
duplicates strings already present in REFRESH_CADENCE_OPTIONS; remove the
duplication by deriving labels from REFRESH_CADENCE_OPTIONS instead—either add a
label property to each option in REFRESH_CADENCE_OPTIONS and compute
REFRESH_CADENCE_LABELS by mapping option.key -> option.label, or if
REFRESH_CADENCE_OPTIONS is already a key->label map, reuse it directly; update
refreshCadenceLabel(cadence: RefreshCadence) to return the label from
REFRESH_CADENCE_OPTIONS (or the computed map) so there is a single source of
truth for cadence labels.
backend/src/index.ts (1)

434-522: Consider adding error recovery and monitoring for the scheduler.

The scheduler tick (lines 444-507) will retry indefinitely if Convex is down or mutations fail repeatedly. Consider adding:

  1. A circuit breaker to pause the scheduler after N consecutive failures
  2. Metrics/alerts for scheduler health (successful ticks, failed claims, etc.)
  3. A maximum retry count per dataset to prevent infinite retries

This is not blocking for initial deployment but should be addressed before relying on scheduled refreshes in production.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/index.ts` around lines 434 - 522, The scheduler in
startLocalRefreshScheduler/tick needs resilience: add a consecutiveFailures
counter and per-dataset retry guard so repeated convex/mutation failures pause
the scheduler and avoid infinite retries; increment consecutiveFailures on any
catch in tick and when it exceeds a threshold (e.g., N), stop the interval
(clearInterval) and start an exponential backoff retry timer or set a paused
flag so tick no-ops until a cooldown, and reset consecutiveFailures to 0 on a
successful tick; emit metrics (e.g., scheduler_tick_success,
scheduler_tick_failure, scheduled_claim_failed, scheduled_claim_started) via
your metrics client inside tick, around the convex.query and convex.mutation
calls and when claim.outcome !== "started"; add a maxRetryCount to the
claim/mutation payload or dataset metadata and refuse to requeue if the run has
exceeded that count (or increment a retryCount returned from claim and skip when
> MAX_RETRIES) to prevent per-dataset infinite retries; wire all changes into
startLocalRefreshScheduler, the tick function, the convex.mutation call to
internal.datasets.claimScheduledRefreshInternal, and the
runScheduledUpdateWorkflowInBackground invocation so failures are logged,
counted, and trigger pause/backoff behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/env.ts`:
- Around line 60-70: The numeric env vars REFRESH_SCHEDULER_POLL_MS,
REFRESH_SCHEDULER_BATCH_SIZE, and REFRESH_SCHEDULER_STALE_AFTER_MS must be
validated after parsing so they never become NaN; change their construction to
parse (e.g., parseInt/Number) then check Number.isFinite(value) (or
!Number.isNaN(value)) and fall back to the existing default if invalid, and keep
REFRESH_SCHEDULER_ENABLED logic as-is — update the definitions for
REFRESH_SCHEDULER_POLL_MS, REFRESH_SCHEDULER_BATCH_SIZE, and
REFRESH_SCHEDULER_STALE_AFTER_MS to perform this validation and fallback to the
provided literals when parsing fails.

In `@backend/src/index.ts`:
- Around line 539-540: The call to backfillDatasetRefreshSettings is not
awaited, causing the server and startLocalRefreshScheduler to run before
migration finishes; update the initialization so you await
backfillDatasetRefreshSettings(fastify.log) (or wrap in an async IIFE/top-level
await) before calling startLocalRefreshScheduler(fastify.log) and before
allowing the server to accept requests, and add error handling (try/catch) to
log failures and stop startup if the backfill fails.

In `@frontend/convex/datasets.ts`:
- Around line 31-53: Extract the duplicated refresh scheduling constants and
function into a shared helper module (e.g., create a new
frontend/convex/lib/refreshScheduling.ts) that exports REFRESH_INTERVAL_MS and
nextRefreshAtFor; then replace the local definitions in datasets.ts and
publicSeed.ts with imports from that module. Ensure the exported types
(RefreshCadence) or its equivalent are also exported/ re-exported so both files
can type against the same union, and update import sites to use the shared
symbols (REFRESH_INTERVAL_MS, nextRefreshAtFor, and RefreshCadence) to remove
duplication.

---

Nitpick comments:
In `@backend/src/index.ts`:
- Around line 434-522: The scheduler in startLocalRefreshScheduler/tick needs
resilience: add a consecutiveFailures counter and per-dataset retry guard so
repeated convex/mutation failures pause the scheduler and avoid infinite
retries; increment consecutiveFailures on any catch in tick and when it exceeds
a threshold (e.g., N), stop the interval (clearInterval) and start an
exponential backoff retry timer or set a paused flag so tick no-ops until a
cooldown, and reset consecutiveFailures to 0 on a successful tick; emit metrics
(e.g., scheduler_tick_success, scheduler_tick_failure, scheduled_claim_failed,
scheduled_claim_started) via your metrics client inside tick, around the
convex.query and convex.mutation calls and when claim.outcome !== "started"; add
a maxRetryCount to the claim/mutation payload or dataset metadata and refuse to
requeue if the run has exceeded that count (or increment a retryCount returned
from claim and skip when > MAX_RETRIES) to prevent per-dataset infinite retries;
wire all changes into startLocalRefreshScheduler, the tick function, the
convex.mutation call to internal.datasets.claimScheduledRefreshInternal, and the
runScheduledUpdateWorkflowInBackground invocation so failures are logged,
counted, and trigger pause/backoff behavior.

In `@frontend/app/dashboard/page.tsx`:
- Around line 17-22: The ProfileUser type is duplicated; extract it into a
shared types module (e.g., export type ProfileUser in a new user types file) and
replace the inline declaration with an import of ProfileUser where used (update
references in the ProfileUser declaration locations such as the ProfileUser type
in page.tsx and the identical declaration in the dataset page). Ensure the new
module exports the same shape (fullName, firstName, primaryEmailAddress,
imageUrl) and update imports in components to use that single source of truth.

In `@frontend/lib/refresh-cadence.ts`:
- Around line 12-23: REFRESH_CADENCE_LABELS duplicates strings already present
in REFRESH_CADENCE_OPTIONS; remove the duplication by deriving labels from
REFRESH_CADENCE_OPTIONS instead—either add a label property to each option in
REFRESH_CADENCE_OPTIONS and compute REFRESH_CADENCE_LABELS by mapping option.key
-> option.label, or if REFRESH_CADENCE_OPTIONS is already a key->label map,
reuse it directly; update refreshCadenceLabel(cadence: RefreshCadence) to return
the label from REFRESH_CADENCE_OPTIONS (or the computed map) so there is a
single source of truth for cadence labels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3edf4297-965d-46a2-af81-4aed80eb302e

📥 Commits

Reviewing files that changed from the base of the PR and between 3222058 and 1fe202a.

📒 Files selected for processing (13)
  • backend/src/env.ts
  • backend/src/index.ts
  • docker-compose.dev.yml
  • frontend/app/dashboard/page.tsx
  • frontend/app/dataset/[id]/page.tsx
  • frontend/app/dataset/new/page.tsx
  • frontend/components/dataset/DatasetCard.tsx
  • frontend/components/table/types.ts
  • frontend/convex/datasets.ts
  • frontend/convex/modelConfig.ts
  • frontend/convex/publicSeed.ts
  • frontend/convex/schema.ts
  • frontend/lib/refresh-cadence.ts

Comment thread backend/src/env.ts
Comment thread backend/src/index.ts Outdated
Comment thread frontend/convex/datasets.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/convex/publicSeed.ts (1)

20-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Header docstring is now stale relative to the force=true path.

Lines 24-27 state these fields are "NOT re-synced to the live row" and that in-place patches are "a deliberate future enhancement," and lines 28-29 instruct deleting the dataset in the Convex dashboard to force a content refresh. The new force=true handler now patches metadata (including refreshCadence) and replaces rows in place, which directly contradicts this guidance. An admin following the header could perform unnecessary destructive deletes. The lower docstring (Lines 342-350) already documents this correctly — reconcile the header with it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/convex/publicSeed.ts` around lines 20 - 29, Update the stale header
docstring to reflect the new force=true behavior: change the statements that
"name, description, refreshCadence, and other fields are NOT re-synced" and the
instruction to delete from the Convex dashboard so they note that when
force=true the seeding code will patch metadata (e.g., refreshCadence, name,
description) and replace rows in place rather than requiring manual deletion;
reference the existing accurate description in the lower docstring (lines
~342-350) and align the header with that wording, removing the misleading
deletion instruction and adding a brief note about force=true performing
in-place updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@frontend/convex/publicSeed.ts`:
- Around line 20-29: Update the stale header docstring to reflect the new
force=true behavior: change the statements that "name, description,
refreshCadence, and other fields are NOT re-synced" and the instruction to
delete from the Convex dashboard so they note that when force=true the seeding
code will patch metadata (e.g., refreshCadence, name, description) and replace
rows in place rather than requiring manual deletion; reference the existing
accurate description in the lower docstring (lines ~342-350) and align the
header with that wording, removing the misleading deletion instruction and
adding a brief note about force=true performing in-place updates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7cee1a0b-f9cc-4119-a421-2bb722b0a2bb

📥 Commits

Reviewing files that changed from the base of the PR and between 1fe202a and 12f937f.

📒 Files selected for processing (9)
  • backend/src/env.ts
  • backend/src/index.ts
  • frontend/app/dashboard/page.tsx
  • frontend/app/dataset/[id]/page.tsx
  • frontend/convex/datasets.ts
  • frontend/convex/lib/refreshScheduling.ts
  • frontend/convex/publicSeed.ts
  • frontend/lib/profile-user.ts
  • frontend/lib/refresh-cadence.ts
✅ Files skipped from review due to trivial changes (2)
  • frontend/convex/lib/refreshScheduling.ts
  • frontend/lib/profile-user.ts
🚧 Files skipped from review as they are similar to previous changes (6)
  • frontend/lib/refresh-cadence.ts
  • backend/src/env.ts
  • frontend/app/dashboard/page.tsx
  • frontend/convex/datasets.ts
  • frontend/app/dataset/[id]/page.tsx
  • backend/src/index.ts

@pranavjana pranavjana merged commit c476d11 into main Jun 1, 2026
3 checks passed
@pranavjana pranavjana deleted the pranav/dataset-refresh-cadence branch June 1, 2026 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants