Allow custom dataset max rows#128
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis PR adds a configurable per-dataset maxRowCount (default 100, capped by FREE_TIER_MONTHLY_QUOTA = 2500). It extends schemas and Convex mutations to store/validate maxRowCount, adds UI for selecting/editing it in NewDataset and Dataset settings (quota-aware), includes maxRowCount in frontend populate calls, and threads the value through /populate, scheduled refreshes, Mastra workflows and agents. The run_subagent tool and dataset row-insert logic enforce the cap and return/signal ROW_LIMIT_REACHED when reached; /populate re-queries the dataset after claim and returns 404 if missing. Sequence Diagram(s)sequenceDiagram
participant Client
participant Backend as /populate
participant Workflow as mastra/workflows/populate
participant Agent as buildPopulateAgent
participant Subagent as run_subagent (buildSubagentTool)
participant Convex as Convex DB
Client->>Backend: POST /populate(maxRowCount, columns)
Backend->>Convex: claim populate & re-query dataset
Backend->>Workflow: start workflow with effective maxRowCount
Workflow->>Agent: pass inputData.maxRowCount
Agent->>Subagent: construct/run with maxRowCount
Subagent->>Convex: countByDataset
Convex-->>Subagent: rowCount
alt rowCount >= maxRowCount
Subagent-->>Agent: ROW_LIMIT_REACHED
Agent-->>Workflow: stop generation due to ROW_LIMIT_REACHED
else rowCount < maxRowCount
Subagent->>Convex: insert rows (subject to updated cap)
end
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/src/mastra/tools/investigate-tool.ts (1)
130-130:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate investigate subagent
maxStepsto match the 25-step guideline
backend/src/mastra/tools/investigate-tool.tsspawns the investigate subagent withagent.generate(prompt, { maxSteps: 10 }), but the guidelines requiremaxSteps: 25for fresh investigate subagents. Update this tomaxSteps: 25(or revise the guideline if 10 is an intentional, documented change).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/mastra/tools/investigate-tool.ts` at line 130, The investigate subagent is being spawned with agent.generate(prompt, { maxSteps: 10 }) which violates the guideline requiring fresh investigate subagents use maxSteps: 25; locate the agent.generate call in investigate-tool (the line that assigns const result = await agent.generate(prompt, { maxSteps: 10 })) and change the options to { maxSteps: 25 } (or update any surrounding comments to reflect an intentional alternative if 10 is desired).
🧹 Nitpick comments (2)
backend/src/pipeline/populate.ts (1)
15-15: ⚡ Quick winConsider adding an upper bound for defense-in-depth.
The frontend validates
maxRowCount <= FREE_TIER_MONTHLY_QUOTA, but the backend schema has no upper bound. While data flows through validated frontend mutations in the normal path, adding.max()here would provide defense-in-depth against bugs or direct backend calls.🛡️ Add upper bound validation
- maxRowCount: z.number().int().min(1).default(100), + maxRowCount: z.number().int().min(1).max(10000).default(100),Note: Replace
10000with the actualFREE_TIER_MONTHLY_QUOTAvalue, or import it from a shared constants location if available.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/pipeline/populate.ts` at line 15, The schema property maxRowCount in the populate pipeline (defined as maxRowCount: z.number().int().min(1).default(100)) needs an upper bound for defense-in-depth; update that zod schema to add .max(FREE_TIER_MONTHLY_QUOTA) (or a literal like .max(10000) if the constant isn't available) and, if possible, import FREE_TIER_MONTHLY_QUOTA from the shared constants module so the backend enforces the same cap as the frontend.frontend/convex/datasets.ts (1)
65-65: 💤 Low valueConsider consolidating the default row count constant.
DEFAULT_MAX_ROW_COUNThere (line 65) andDEFAULT_MAX_DATASET_ROWSindatasetRows.ts(line 8) both define the same value (100) with different names. This duplication could lead to drift if one is updated without the other.♻️ Consolidation approach
Define the constant in one location (e.g.,
datasets.ts) and import it intodatasetRows.ts:// In datasetRows.ts import { DEFAULT_MAX_ROW_COUNT } from "./datasets.js"; // Remove: const DEFAULT_MAX_DATASET_ROWS = 100; // Use DEFAULT_MAX_ROW_COUNT throughout🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/convex/datasets.ts` at line 65, Consolidate the duplicate default row constant by keeping a single exported constant (e.g., DEFAULT_MAX_ROW_COUNT) and remove the duplicate DEFAULT_MAX_DATASET_ROWS; update the other module to import and use DEFAULT_MAX_ROW_COUNT instead of defining its own value. Specifically, export DEFAULT_MAX_ROW_COUNT from the module where it currently exists, remove the local declaration of DEFAULT_MAX_DATASET_ROWS, and replace usages of DEFAULT_MAX_DATASET_ROWS with the imported DEFAULT_MAX_ROW_COUNT (ensure import statements are added and no other references remain).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/index.ts`:
- Around line 696-702: The code calls
setDatasetPopulateStatus(parsed.data.datasetId, "failed", ...) even when the
dataset lookup (const dataset = await
convex.query(internal.datasets.getInternal, { id: parsed.data.datasetId }))
returns null, which can throw because the dataset doesn't exist; remove that
status update in the not-found branch and simply return reply.code(404).send({
error: "Dataset not found" }) (i.e., delete or skip the setDatasetPopulateStatus
call in the block that handles !dataset so only the 404 response is sent).
In `@frontend/app/dataset/`[id]/page.tsx:
- Around line 222-227: The catch block in the max-rows mutation only logs errors
(console.error and captureException) but doesn't surface them to the user; add a
local React state (e.g., maxRowCountError) in the component that you set in the
catch of the update handler (the block that currently logs "[max rows] failed"
and calls captureException with operation "dataset_max_row_count_update" and
datasetId dataset._id), and clear that state on successful mutation; then render
maxRowCountError text in the settings dropdown help text area (around the
existing settings UI) so backend validation/quota errors are visible to the
user.
- Line 214: The silent early return on the quota check (the line with "if (usage
&& maxRowCount > usage.remaining) return;") leaves users without feedback when
save is blocked; update the Save button click handler that contains this check
to surface a clear error/toast/inline message (e.g., set an error state or call
the existing showToast/snackbar) explaining the quota limit, and keep the early
return after showing that message so the save is aborted but the user sees why;
reference the same symbols usage, maxRowCount, and usage.remaining so the change
is applied where the check currently occurs.
In `@frontend/convex/datasets.ts`:
- Around line 427-440: The quota check in updateMaxRowCount uses
args.maxRowCount directly but should only require quota for additional rows;
compute additionalRows = Math.max(0, args.maxRowCount - (dataset.rowCount || 0))
after loading the dataset (loadOwnedDataset) and pass additionalRows to
requireQuotaRemaining(ctx, dataset.ownerId, additionalRows) instead of
args.maxRowCount, then proceed to validateMaxRowCount and ctx.db.patch as
before; this ensures lowering the cap requires no quota and raising it requires
only the incremental quota.
---
Outside diff comments:
In `@backend/src/mastra/tools/investigate-tool.ts`:
- Line 130: The investigate subagent is being spawned with
agent.generate(prompt, { maxSteps: 10 }) which violates the guideline requiring
fresh investigate subagents use maxSteps: 25; locate the agent.generate call in
investigate-tool (the line that assigns const result = await
agent.generate(prompt, { maxSteps: 10 })) and change the options to { maxSteps:
25 } (or update any surrounding comments to reflect an intentional alternative
if 10 is desired).
---
Nitpick comments:
In `@backend/src/pipeline/populate.ts`:
- Line 15: The schema property maxRowCount in the populate pipeline (defined as
maxRowCount: z.number().int().min(1).default(100)) needs an upper bound for
defense-in-depth; update that zod schema to add .max(FREE_TIER_MONTHLY_QUOTA)
(or a literal like .max(10000) if the constant isn't available) and, if
possible, import FREE_TIER_MONTHLY_QUOTA from the shared constants module so the
backend enforces the same cap as the frontend.
In `@frontend/convex/datasets.ts`:
- Line 65: Consolidate the duplicate default row constant by keeping a single
exported constant (e.g., DEFAULT_MAX_ROW_COUNT) and remove the duplicate
DEFAULT_MAX_DATASET_ROWS; update the other module to import and use
DEFAULT_MAX_ROW_COUNT instead of defining its own value. Specifically, export
DEFAULT_MAX_ROW_COUNT from the module where it currently exists, remove the
local declaration of DEFAULT_MAX_DATASET_ROWS, and replace usages of
DEFAULT_MAX_DATASET_ROWS with the imported DEFAULT_MAX_ROW_COUNT (ensure import
statements are added and no other references remain).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c411e1c-525e-449c-a1ab-8465c85c0c8d
📒 Files selected for processing (11)
backend/src/index.tsbackend/src/mastra/agents/populate.tsbackend/src/mastra/tools/investigate-tool.tsbackend/src/mastra/workflows/populate.tsbackend/src/pipeline/populate.tsfrontend/app/dataset/[id]/page.tsxfrontend/app/dataset/new/page.tsxfrontend/convex/datasetRows.tsfrontend/convex/datasets.tsfrontend/convex/schema.tsfrontend/lib/backend.ts
# Conflicts: # backend/src/mastra/agents/populate.ts # backend/src/mastra/tools/investigate-tool.ts # backend/src/mastra/workflows/populate.ts # frontend/app/dataset/[id]/page.tsx # frontend/convex/datasetRows.ts
* Cap dataset population at 100 rows * Handle row cap count failures in subagent tool * Mention row limit sentinel in populate prompt * Allow custom dataset max rows * Allow editing dataset max rows * Address max rows review feedback
Summary
Verification
Note: Convex codegen could not run locally without CONVEX_DEPLOYMENT set.