Fix zero-row populate retry and refresh model config#117

Open
giaphutran12 wants to merge 3 commits into main from codex/fix-agent-zero-row-model-config

Conversation

@giaphutran12

Collaborator

Summary

  • make refresh agents use the configured investigate model instead of hardcoded Qwen
  • tighten populate instructions so search/fetch-only runs must hand concrete leads to subagents
  • retry populate once with stricter instructions when the first pass inserts zero rows, then fail clearly if still empty

Verification

  • backend: npm ci --cache /private/tmp/bigset-npm-cache
  • backend: npm run build
  • git diff --check origin/main...HEAD
  • public PR gate: git diff --name-status origin/main...HEAD

@giaphutran12 giaphutran12 self-assigned this Jun 2, 2026
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24a54e67-23a3-4f72-bb64-b15378ae695d

📥 Commits

Reviewing files that changed from the base of the PR and between a79cf22 and 3d87f56.

📒 Files selected for processing (2)
  • backend/src/mastra/agents/refresh.ts
  • backend/src/mastra/workflows/populate.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/src/mastra/workflows/populate.ts
  • backend/src/mastra/agents/refresh.ts

Walkthrough

This PR improves the populate orchestration pipeline and aligns agent model configuration. The populate agent's INSTRUCTIONS prompt is restructured into explicit numbered workflow steps with a "CRITICAL" checklist that enforces subagent invocation for concrete leads and clarifies row-limit termination. The populate workflow is enhanced with row-count verification after generation: if zero rows are inserted, it logs a warning and retries with stricter subagent requirements; if still empty after retry, it throws an error; otherwise it returns combined output. The refresh agent now derives its OpenRouter model from configured authContext instead of using a hardcoded value.
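The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from `backend/src/mastra/workflows/populate.ts`: the `OrchestratorAgent` and `RowCounter` types, the `populateWithRetry` name, and the error and warning messages are all hypothetical stand-ins, and the real Mastra agent and dataset APIs are richer than what is shown here.

```typescript
// Hypothetical minimal interfaces (assumptions for this sketch only).
interface GenerateResult {
  text: string;
}

interface OrchestratorAgent {
  generate(prompt: string, options: { maxSteps: number }): Promise<GenerateResult>;
}

// Hypothetical helper that returns how many rows the dataset currently holds.
type RowCounter = () => Promise<number>;

interface PopulateOptions {
  agent: OrchestratorAgent;
  countRows: RowCounter;
  prompt: string;
  retryPrompt: string; // stricter instructions used only on the second pass
  maxSteps: number; // same step budget for both passes
}

async function populateWithRetry(opts: PopulateOptions): Promise<string> {
  const { agent, countRows, prompt, retryPrompt, maxSteps } = opts;

  // First pass: normal instructions.
  const first = await agent.generate(prompt, { maxSteps });
  if ((await countRows()) > 0) {
    return first.text;
  }

  // Zero rows inserted: warn and retry exactly once with the stricter prompt.
  console.warn("populate inserted zero rows; retrying once with stricter instructions");
  const retry = await agent.generate(retryPrompt, { maxSteps });

  // Still empty after the retry: fail loudly instead of reporting success.
  if ((await countRows()) === 0) {
    throw new Error("populate inserted zero rows after retry");
  }

  // Rows appeared on the retry: return the original plus the retry output.
  return `${first.text}\n${retry.text}`;
}
```

Passing the row counter and the agent in as parameters keeps the control flow testable with stubs; whether the actual workflow is structured this way is not stated in the PR.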

Sequence Diagram

sequenceDiagram
  participant PopulateWorkflow as Populate Workflow
  participant PopulateAgent as Populate Agent
  participant Subagent as Subagent
  participant Dataset as Dataset
  PopulateWorkflow->>PopulateAgent: agent.generate(initial prompt) with run_subagent requirement
  PopulateAgent->>Subagent: run_subagent(concrete leads)
  Subagent->>Dataset: insert rows
  PopulateWorkflow->>Dataset: count inserted rows
  alt rows == 0
    PopulateWorkflow->>PopulateAgent: agent.generate(retry prompt) with stricter subagent requirements
    PopulateAgent->>Subagent: run_subagent(3-5 candidates)
    Subagent->>Dataset: insert additional rows
    PopulateWorkflow->>Dataset: count rows again
    alt still 0 rows
      PopulateWorkflow->>PopulateWorkflow: throw error
    else rows > 0
      PopulateWorkflow->>PopulateWorkflow: return original + retry output
    end
  else rows > 0
    PopulateWorkflow->>PopulateWorkflow: return output
  end

Possibly related PRs

  • tinyfish-io/bigset#111: Overlaps on populate orchestrator updates including explicit stopping at ROW_LIMIT_REACHED / 100-row cap and run_subagent dispatch behavior.
  • tinyfish-io/bigset#85: Related changes to the populate workflow and agent execution/error handling.
  • tinyfish-io/bigset#81: Related orchestration and investigate_row subagent architecture that this PR builds on.

Suggested reviewers

  • simantak-dabhade
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: fixing zero-row populate retry logic and refresh model configuration. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear context for all three modified files and explaining the rationale behind each change. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/src/mastra/agents/refresh.ts (1)

59-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate the investigateSubagent model slug before calling openrouter()

In backend/src/mastra/agents/refresh.ts, authContextSchema defines modelConfig and enforces investigateSubagent: z.string().min(1), so the authContext.modelConfig! non-null assertion should be safe. What’s still missing is validation that investigateSubagent is a valid OpenRouter model identifier/format—invalid strings can still fail at runtime when passed to openrouter(modelSlug). Add schema refinement (or a runtime guard) to enforce the expected model-id format/range.
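One way to picture the suggested runtime guard is the sketch below. It is an assumption-laden illustration rather than the project's code: the `resolveInvestigateModel` helper and the simplified `AuthContext` type are hypothetical, and the `<provider>/<model>` pattern with an optional `:<variant>` suffix is a guess at a reasonable slug shape, not OpenRouter's official grammar.

```typescript
// Assumed slug shape: "<provider>/<model>" with an optional ":<variant>" suffix.
// This pattern is illustrative only and is not taken from OpenRouter's documentation.
const MODEL_SLUG_PATTERN =
  /^[a-z0-9][a-z0-9._-]*\/[a-z0-9][a-z0-9._-]*(:[a-z0-9._-]+)?$/i;

// Simplified, hypothetical stand-ins for the types behind authContextSchema.
interface ModelConfig {
  investigateSubagent: string;
}

interface AuthContext {
  modelConfig?: ModelConfig;
}

// Returns a validated slug, or throws a clear error before any model client is built.
function resolveInvestigateModel(authContext: AuthContext): string {
  const slug = authContext.modelConfig?.investigateSubagent.trim();
  if (!slug || !MODEL_SLUG_PATTERN.test(slug)) {
    throw new Error(
      `Invalid investigateSubagent model slug: ${JSON.stringify(slug)}`,
    );
  }
  return slug;
}
```

The validated return value would then be what gets passed to `openrouter(modelSlug)`, so a malformed configuration fails at agent creation with a readable message instead of at request time.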

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/mastra/agents/refresh.ts` around lines 59 - 68, Validate the
investigateSubagent value before passing it to openrouter: add a schema
refinement to authContextSchema (or a runtime guard near refresh agent creation)
that enforces the allowed OpenRouter model-id format/range, then use the
validated value instead of raw authContext.modelConfig!.investigateSubagent;
specifically update the authContextSchema or add a helper that checks
authContext.modelConfig.investigateSubagent (the investigateSubagent string)
against the OpenRouter model-id pattern/range and throw/handle a clear error if
invalid before calling openrouter(modelSlug) in the Agent constructor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/mastra/workflows/populate.ts`:
- Around line 252-285: The retry call to the orchestrator uses the wrong step
budget—change the second agent.generate call that produces retryResult to use
maxSteps: 80 (matching the initial run) instead of 40; update the call site
where retryResult is created (agent.generate(retryPrompt, { maxSteps: 40 })) so
the retry has the full orchestrator budget and then continue to record metrics
and re-check rowCount as currently done.

---

Outside diff comments:
In `@backend/src/mastra/agents/refresh.ts`:
- Around line 59-68: Validate the investigateSubagent value before passing it to
openrouter: add a schema refinement to authContextSchema (or a runtime guard
near refresh agent creation) that enforces the allowed OpenRouter model-id
format/range, then use the validated value instead of raw
authContext.modelConfig!.investigateSubagent; specifically update the
authContextSchema or add a helper that checks
authContext.modelConfig.investigateSubagent (the investigateSubagent string)
against the OpenRouter model-id pattern/range and throw/handle a clear error if
invalid before calling openrouter(modelSlug) in the Agent constructor.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c3eb323-cf1e-4082-8d13-9a84f32e7116

📥 Commits

Reviewing files that changed from the base of the PR and between 07b496c and 6725f4c.

📒 Files selected for processing (3)
  • backend/src/mastra/agents/populate.ts
  • backend/src/mastra/agents/refresh.ts
  • backend/src/mastra/workflows/populate.ts

Comment thread backend/src/mastra/workflows/populate.ts
@giaphutran12

Collaborator Author

Rechecked this head before asking for review.

Current state:

  • CodeRabbit, TruffleHog, and OSV all pass.
  • backend: npm ci && npm run build passes in a clean temp worktree.
  • Earlier CodeRabbit points are addressed in head: retry maxSteps is now 80, and investigateSubagent is validated before openrouter(modelSlug).

@MMeteorL @simantak-dabhade could one of you do the non-author review when you have a minute? Looks good from my side.
