Cap dataset population at 100 rows #111

Merged
simantak-dabhade merged 3 commits into main from pranav/max-rows on Jun 1, 2026

pranavjana (Collaborator) commented Jun 1, 2026

Summary

  • Enforce a hard 100-row cap in the dataset row insert mutation
  • Stop spawning populate subagents once a dataset has reached 100 rows
  • Update populate prompts to treat 100 rows as a hard stop
  • Return a structured tool result if the preflight row-count check fails

Verification

  • npm run build in backend
  • npm run build in frontend
  • make convex-push
  • git diff --check

coderabbitai Bot (Contributor) commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70237cb2-2bd5-479f-afe3-94e7a1c432af

📥 Commits

Reviewing files that changed from the base of the PR and between efb4510 and 4def5e8.

📒 Files selected for processing (1)
  • backend/src/mastra/workflows/populate.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/src/mastra/workflows/populate.ts

📝 Walkthrough

This PR implements a hard 100-row dataset limit across the populate pipeline. The persistence layer (datasetRows.ts) now enforces the cap by rejecting inserts that would exceed it. The investigate tool (investigate-tool.ts) checks the row count before spawning subagents and returns a ROW_LIMIT_REACHED signal once the limit is met. The populate orchestrator (the populate.ts agent and workflow) now carries instructions and a prompt that tell it to stop dispatching subagents and cease tool calls once the dataset reaches 100 rows. Together these changes give coordinated early-stop behavior at all three levels: persistence, tool, and orchestrator.
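
To make the persistence-layer cap concrete, here is a minimal sketch of an insert that rejects once a dataset holds 100 rows. It is not the code from frontend/convex/datasetRows.ts, which this page does not show: `InMemoryDatasetRows`, its `insert` method, and the error text are hypothetical stand-ins, and an in-memory map replaces the Convex table. Only `MAX_DATASET_ROWS` and `countByDataset` are names taken from the review comments.

```typescript
// Illustrative only: an in-memory stand-in for the dataset rows table.
// MAX_DATASET_ROWS and countByDataset are names from the review comments;
// everything else here is a hypothetical name chosen for the sketch.
const MAX_DATASET_ROWS = 100;

type Row = Record<string, unknown>;

class InMemoryDatasetRows {
  private rows = new Map<string, Row[]>();

  // Number of rows currently stored for one dataset.
  countByDataset(datasetId: string): number {
    return this.rows.get(datasetId)?.length ?? 0;
  }

  // Insert one row, rejecting the write if the dataset is already at the cap.
  // Returns the new row count on success.
  insert(datasetId: string, row: Row): number {
    const existing = this.rows.get(datasetId) ?? [];
    if (existing.length >= MAX_DATASET_ROWS) {
      // Assumed error text; the real mutation's message is not shown in the PR.
      throw new Error(
        `ROW_LIMIT_REACHED: dataset ${datasetId} already has ${MAX_DATASET_ROWS} rows`,
      );
    }
    existing.push(row);
    this.rows.set(datasetId, existing);
    return existing.length;
  }
}
```

Enforcing the cap at the write path means the limit holds even if an agent ignores its prompt or a preflight check is skipped; the higher layers then only need to stop early to avoid wasted work.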

Sequence Diagram(s)

sequenceDiagram
  participant PopulateAgent
  participant InvestigateTool
  participant ConvexInternal
  participant DatasetRows
  PopulateAgent->>InvestigateTool: run_subagent(request with authorizedDatasetId)
  InvestigateTool->>ConvexInternal: internal.datasetRows.countByDataset(authorizedDatasetId)
  ConvexInternal->>DatasetRows: count query
  DatasetRows-->>ConvexInternal: rowCount
  ConvexInternal-->>InvestigateTool: rowCount
  alt rowCount >= 100
    InvestigateTool-->>PopulateAgent: return { inserted:false, reason: "ROW_LIMIT_REACHED" }
    PopulateAgent-->>PopulateAgent: stop dispatching subagents and cease tool calls
  else rowCount < 100
    InvestigateTool-->>PopulateAgent: proceed (spawn subagent / normal flow)
  end
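
The alt/else branch in the diagram above can be sketched as a small pure function. This is an illustration under assumptions, not the tool's actual code: `preflightRowCheck` and the `PreflightResult` type are hypothetical names, while the `>= 100` comparison and the `{ inserted: false, reason: "ROW_LIMIT_REACHED" }` payload are taken from the diagram.

```typescript
// Illustrative only: the branch from the sequence diagram as a pure function.
// preflightRowCheck and PreflightResult are hypothetical names.
const MAX_DATASET_ROWS = 100;

type PreflightResult =
  | { proceed: true }
  | { proceed: false; result: { inserted: false; reason: "ROW_LIMIT_REACHED" } };

function preflightRowCheck(rowCount: number): PreflightResult {
  if (rowCount >= MAX_DATASET_ROWS) {
    // Limit met: return the structured stop signal instead of spawning a subagent.
    return {
      proceed: false,
      result: { inserted: false, reason: "ROW_LIMIT_REACHED" },
    };
  }
  // Below the limit: the normal spawn-subagent flow continues.
  return { proceed: true };
}
```

Note that the comparison is `>=`, so a dataset holding exactly 100 rows already stops further dispatch.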

Possibly Related PRs

  • tinyfish-io/bigset#81: Related work on orchestrator → investigate subagent architecture and prior row-limit coordination.
  • tinyfish-io/bigset#100: Modifies populate/investigate execution paths and overlaps on investigate tool changes and run metrics.
  • tinyfish-io/bigset#83: Prior changes introducing the orchestrator + investigate_row subagent structure that this PR builds on.

Suggested Reviewers

  • simantak-dabhade
  • giaphutran12
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing a 100-row cap on dataset population, which matches the primary focus across all modified files. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing specific details about enforcing the 100-row cap across multiple layers (mutations, subagents, prompts) and mentioning verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/mastra/tools/investigate-tool.ts`:
- Around line 88-98: The Convex query that computes rowCount
(convex.query(internal.datasetRows.countByDataset, { datasetId:
authorizedDatasetId })) is executed outside the try/catch in
run_subagent.execute, so move that call inside the existing try block and handle
any rejection by returning the same structured failure object (inserted: false,
reason: ... , row_summary: undefined, clues: undefined) used elsewhere; ensure
you still compare rowCount to MAX_DATASET_ROWS after the query and keep the
query reference to internal.datasetRows.countByDataset and variable name
rowCount unchanged so behavior and messaging remain consistent.
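
A minimal sketch of the shape this comment asks for: the count query runs inside the `try` block, so a rejected query yields a structured result rather than an unhandled error. The names `executeRunSubagent`, `CountRows`, and `SpawnSubagent` are hypothetical, the two callbacks stand in for the Convex query and the subagent call, and the failure `reason` text and the field types of `row_summary` and `clues` are assumptions, since the source elides them.

```typescript
// Illustrative only: the count query moved inside try/catch.
// The callbacks are hypothetical stand-ins for the Convex query and subagent call.
const MAX_DATASET_ROWS = 100;

type ToolResult = {
  inserted: boolean;
  reason: string;
  row_summary?: string; // field type assumed
  clues?: string[]; // field type assumed
};

type CountRows = (datasetId: string) => Promise<number>;
type SpawnSubagent = (datasetId: string) => Promise<ToolResult>;

async function executeRunSubagent(
  authorizedDatasetId: string,
  countRows: CountRows,
  spawnSubagent: SpawnSubagent,
): Promise<ToolResult> {
  try {
    // The preflight query is now covered by the same try block as the subagent run.
    const rowCount = await countRows(authorizedDatasetId);
    if (rowCount >= MAX_DATASET_ROWS) {
      return {
        inserted: false,
        reason: "ROW_LIMIT_REACHED",
        row_summary: undefined,
        clues: undefined,
      };
    }
    return await spawnSubagent(authorizedDatasetId);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Assumed wording; the source does not show the actual failure reason text.
    return {
      inserted: false,
      reason: `run_subagent failed: ${message}`,
      row_summary: undefined,
      clues: undefined,
    };
  }
}
```

With this shape the caller always receives an object it can inspect, whether the row count query fails, the limit is reached, or the subagent itself throws.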

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0a255f0-d3aa-4af2-bc76-42661e82ebc0

📥 Commits

Reviewing files that changed from the base of the PR and between 3222058 and a6b16c9.

📒 Files selected for processing (4)
  • backend/src/mastra/agents/populate.ts
  • backend/src/mastra/tools/investigate-tool.ts
  • backend/src/mastra/workflows/populate.ts
  • frontend/convex/datasetRows.ts

Comment thread backend/src/mastra/tools/investigate-tool.ts Outdated

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/mastra/workflows/populate.ts`:
- Around line 199-200: The prompt in populate.ts instructs the agent to stop at
100 rows but doesn't tell it what to do when run_subagent returns the tool-level
hard-stop signal; update the prompt used by the logic that calls run_subagent
(referencing run_subagent and the investigate tool) to explicitly mention the
sentinel ROW_LIMIT_REACHED and instruct the agent to immediately cease
dispatching any further subagents or actions when it receives that signal from
run_subagent, and to return/propagate that stop status up the workflow so the
populate run halts at 100 rows.
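
The stop behavior this comment asks the prompt to describe can be pictured as a deterministic loop. In the PR the orchestrator is an LLM agent steered by a prompt, so the following is only an analogy under assumptions: `dispatchUntilLimit`, `SubagentResult`, and the synchronous `runSubagent` callback are hypothetical, and only the `ROW_LIMIT_REACHED` sentinel comes from the source.

```typescript
// Illustrative only: a deterministic analogy for the behavior the prompt requests.
// dispatchUntilLimit and SubagentResult are hypothetical names.
type SubagentResult = { inserted: boolean; reason?: string };

function dispatchUntilLimit(
  targets: string[],
  runSubagent: (target: string) => SubagentResult,
): { dispatched: number; stoppedByLimit: boolean } {
  let dispatched = 0;
  for (const target of targets) {
    const result = runSubagent(target);
    dispatched += 1;
    // On the sentinel, stop immediately and propagate the stop status upward.
    if (!result.inserted && result.reason === "ROW_LIMIT_REACHED") {
      return { dispatched, stoppedByLimit: true };
    }
  }
  return { dispatched, stoppedByLimit: false };
}
```

The key property is that no further subagent is dispatched after the first `ROW_LIMIT_REACHED` result, and the caller can tell a limit stop apart from running out of targets.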

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4b38ce5-d03a-4c66-9bb1-9dce948f3071

📥 Commits

Reviewing files that changed from the base of the PR and between a6b16c9 and efb4510.

📒 Files selected for processing (4)
  • backend/src/mastra/agents/populate.ts
  • backend/src/mastra/tools/investigate-tool.ts
  • backend/src/mastra/workflows/populate.ts
  • frontend/convex/datasetRows.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/src/mastra/agents/populate.ts
  • backend/src/mastra/tools/investigate-tool.ts
  • frontend/convex/datasetRows.ts

Comment thread backend/src/mastra/workflows/populate.ts

simantak-dabhade (Contributor) left a comment

LGTM
