feat: implement per-cell source provenance#130
Conversation
…olve React hook state effect warning
📝 WalkthroughWalkthroughThis PR introduces cell-level provenance tracking across the data extraction and display pipeline. Backend agent instructions now require each extracted field to include source provenance (URL, query, snippet). Backend tools accept this provenance and forward it through Convex mutations. The frontend data model persists provenance per cell, and UI components render source origin details in cell detail panels with verified URL links, search queries, and snippet excerpts. Table cells display a small indicator when provenance is available. Supporting changes include cursor-based pagination for runStats queries and a timing fix for row change flash detection. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
frontend/components/SideSheet.tsx (1)
130-137: 💤 Low valueConsider extracting URL validation to a shared utility.
The
isValidHttpUrlhelper here is similar totoSafeHttpUrlinCellValue.tsx. While the return types differ (boolean vs string|null), consider extracting the common validation logic to a shared utility module to reduce duplication and improve maintainability.♻️ Potential shared utility approach
Create a
lib/url-validation.tsfile:export function isValidHttpUrl(url: string): boolean { try { const { protocol } = new URL(url); return protocol === "http:" || protocol === "https:"; } catch { return false; } } export function toSafeHttpUrl(url: string): string | null { return isValidHttpUrl(url) ? url : null; }Then import in both files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/components/SideSheet.tsx` around lines 130 - 137, Extract the shared HTTP URL validation into a new utility (e.g., lib/url-validation.ts) and replace the inline helper in SideSheet.tsx and the logic in CellValue.tsx with imports; specifically, move the current isValidHttpUrl implementation into a exported isValidHttpUrl(url: string): boolean and add an exported toSafeHttpUrl(url: string): string | null that calls isValidHttpUrl, then update SideSheet.tsx to import and use isValidHttpUrl and update CellValue.tsx to import and use toSafeHttpUrl so both files reuse the same validation logic.frontend/components/table/use-row-change-detection.ts (1)
55-75: ⚡ Quick winConsider starting the flash-off timer inside the flash-on callback for clearer sequencing.
The flash-on timer (line 56) and flash-off timer (line 66) are currently scheduled at the same time. While the event loop guarantees that
setTimeout(0)fires beforesetTimeout(1500), starting the flash-off timer inside the flash-on callback would make the sequencing explicit and ensure the flash duration is exactlyFLASH_DURATION_MSfrom when the flash actually appears.♻️ Proposed refactor for clearer sequencing
if (newFlashes.size > 0) { const updateTimer = setTimeout(() => { setFlashingCells((prev) => { const merged = new Set(prev); for (const key of newFlashes) merged.add(key); return merged; }); flashTimersRef.current.delete(updateTimer); + + const flashOffTimer = setTimeout(() => { + setFlashingCells((prev) => { + const next = new Set(prev); + for (const key of newFlashes) next.delete(key); + return next; + }); + flashTimersRef.current.delete(flashOffTimer); + }, FLASH_DURATION_MS); + flashTimersRef.current.add(flashOffTimer); }, 0); flashTimersRef.current.add(updateTimer); - - const timer = setTimeout(() => { - setFlashingCells((prev) => { - const next = new Set(prev); - for (const key of newFlashes) next.delete(key); - return next; - }); - flashTimersRef.current.delete(timer); - }, FLASH_DURATION_MS); - flashTimersRef.current.add(timer); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/components/table/use-row-change-detection.ts` around lines 55 - 75, The flash-off timeout should be started inside the flash-on callback to make sequencing explicit: inside the setTimeout callback that calls setFlashingCells to add newFlashes (the "flash-on" callback), create the second setTimeout that removes those keys after FLASH_DURATION_MS, add both timers to flashTimersRef.current, and ensure each timer is deleted from flashTimersRef.current when it fires; update the logic around newFlashes, setFlashingCells, flashTimersRef, and FLASH_DURATION_MS accordingly so the off timer is scheduled only after the on callback runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/convex/runStats.ts`:
- Line 110: The current limit computation in runStats.ts (the const limit =
Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE);) doesn't prevent zero
or negative values; update the logic that computes limit (using args.limit,
DEFAULT_PAGE_SIZE and MAX_PAGE_SIZE) to enforce a positive integer (e.g., coerce
to a number and clamp with a lower bound of 1) before using it, and validate or
sanitize args.limit so limit is always >= 1 and <= MAX_PAGE_SIZE.
- Line 84: The limit calculation currently allows non-positive values from
args.limit to pass through; update the code around the variable limit (the line
using args.limit, DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE) to validate and clamp
args.limit to a positive integer before applying Math.min—e.g., coerce
args.limit to a number, ensure it's at least 1 (or fall back to
DEFAULT_PAGE_SIZE) and then cap with MAX_PAGE_SIZE so paginate() always receives
a positive limit; modify the expression that computes limit to perform this
validation.
---
Nitpick comments:
In `@frontend/components/SideSheet.tsx`:
- Around line 130-137: Extract the shared HTTP URL validation into a new utility
(e.g., lib/url-validation.ts) and replace the inline helper in SideSheet.tsx and
the logic in CellValue.tsx with imports; specifically, move the current
isValidHttpUrl implementation into a exported isValidHttpUrl(url: string):
boolean and add an exported toSafeHttpUrl(url: string): string | null that calls
isValidHttpUrl, then update SideSheet.tsx to import and use isValidHttpUrl and
update CellValue.tsx to import and use toSafeHttpUrl so both files reuse the
same validation logic.
In `@frontend/components/table/use-row-change-detection.ts`:
- Around line 55-75: The flash-off timeout should be started inside the flash-on
callback to make sequencing explicit: inside the setTimeout callback that calls
setFlashingCells to add newFlashes (the "flash-on" callback), create the second
setTimeout that removes those keys after FLASH_DURATION_MS, add both timers to
flashTimersRef.current, and ensure each timer is deleted from
flashTimersRef.current when it fires; update the logic around newFlashes,
setFlashingCells, flashTimersRef, and FLASH_DURATION_MS accordingly so the off
timer is scheduled only after the on callback runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99f3fe30-0595-4b1f-820e-debae85223d3
📒 Files selected for processing (11)
backend/src/mastra/agents/investigate.tsbackend/src/mastra/tools/dataset-tools.tsfrontend/app/dataset/[id]/page.tsxfrontend/components/SideSheet.tsxfrontend/components/table/DataRow.tsxfrontend/components/table/types.tsfrontend/components/table/use-row-change-detection.tsfrontend/convex/datasetRows.tsfrontend/convex/runStats.tsfrontend/convex/schema.tsscripts/verify-authz.sh
| }, | ||
| handler: async (ctx, args) => { | ||
| const runs = await ctx.db | ||
| const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE); |
There was a problem hiding this comment.
Add validation to ensure limit is positive.
The current limit calculation allows negative or zero values to pass through if explicitly provided in args.limit. This could cause unexpected pagination behavior or errors in the underlying paginate() call.
🛡️ Suggested fix to enforce positive limit
- const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE);
+ const limit = Math.min(
+ Math.max(args.limit ?? DEFAULT_PAGE_SIZE, 1),
+ MAX_PAGE_SIZE
+ );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE); | |
| const limit = Math.min( | |
| Math.max(args.limit ?? DEFAULT_PAGE_SIZE, 1), | |
| MAX_PAGE_SIZE | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/convex/runStats.ts` at line 84, The limit calculation currently
allows non-positive values from args.limit to pass through; update the code
around the variable limit (the line using args.limit, DEFAULT_PAGE_SIZE,
MAX_PAGE_SIZE) to validate and clamp args.limit to a positive integer before
applying Math.min—e.g., coerce args.limit to a number, ensure it's at least 1
(or fall back to DEFAULT_PAGE_SIZE) and then cap with MAX_PAGE_SIZE so
paginate() always receives a positive limit; modify the expression that computes
limit to perform this validation.
| }, | ||
| handler: async (ctx, args) => { | ||
| const runs = await ctx.db | ||
| const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE); |
There was a problem hiding this comment.
Add validation to ensure limit is positive.
Same issue as in listByDataset: the limit calculation allows negative or zero values to pass through, which could cause unexpected behavior.
🛡️ Suggested fix to enforce positive limit
- const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE);
+ const limit = Math.min(
+ Math.max(args.limit ?? DEFAULT_PAGE_SIZE, 1),
+ MAX_PAGE_SIZE
+ );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE); | |
| const limit = Math.min( | |
| Math.max(args.limit ?? DEFAULT_PAGE_SIZE, 1), | |
| MAX_PAGE_SIZE | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/convex/runStats.ts` at line 110, The current limit computation in
runStats.ts (the const limit = Math.min(args.limit ?? DEFAULT_PAGE_SIZE,
MAX_PAGE_SIZE);) doesn't prevent zero or negative values; update the logic that
computes limit (using args.limit, DEFAULT_PAGE_SIZE and MAX_PAGE_SIZE) to
enforce a positive integer (e.g., coerce to a number and clamp with a lower
bound of 1) before using it, and validate or sanitize args.limit so limit is
always >= 1 and <= MAX_PAGE_SIZE.
This pull request implements per-cell source provenance across the database schema, backend workflow tools, agent prompts, and frontend UI. It allows the web search subagents to record specific URLs, queries, and snippets for individual cell extractions, and renders indicators in the dataset table and detailed origin information in the SideSheet.
Key changes: