Improve Playwright row extraction refresh path #142
Caution: Review failed. The pull request is closed.
Walkthrough
This PR introduces configurable GitHub repository row extraction into the BigSet platform. It adds two user-tunable parameters…
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds configurable “row extractor” settings and introduces a GitHub-focused row extractor that can insert/refresh dataset rows via TinyFish Browser, with workflow-level integration and a local-mode settings UI.
Changes:
- Add row extractor settings (concurrency + browser attempts) to model config types, storage schema, backend settings endpoint, and frontend settings UI.
- Introduce a new GitHub row extractor implementation using TinyFish Browser + Playwright CDP, and wire it into investigate + refresh workflows (with fallback to the existing agent).
- Normalize/validate row extractor numeric settings (bounds + defaults) across backend and frontend.
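
The normalization step in the last bullet could look roughly like the sketch below. It is an illustration only: the bounds, the defaults, and the name `rowExtractorBrowserAttempts` are assumptions, since the thread shows only `rowExtractorConcurrency` and does not quote the actual limits.

```typescript
// Hypothetical bounds and defaults; the PR's real values are not quoted in this thread.
const LIMITS = {
  rowExtractorConcurrency: { min: 1, max: 16, fallback: 4 },
  rowExtractorBrowserAttempts: { min: 1, max: 5, fallback: 2 },
} as const;

type RowExtractorSettings = {
  rowExtractorConcurrency: number;
  rowExtractorBrowserAttempts: number; // assumed field name
};

// Return `fallback` for anything that is not a finite number,
// otherwise truncate to an integer and clamp into [min, max].
function clampInt(value: unknown, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

function normalizeRowExtractorSettings(
  input: Partial<Record<keyof RowExtractorSettings, unknown>>,
): RowExtractorSettings {
  const c = LIMITS.rowExtractorConcurrency;
  const b = LIMITS.rowExtractorBrowserAttempts;
  return {
    rowExtractorConcurrency: clampInt(input.rowExtractorConcurrency, c.min, c.max, c.fallback),
    rowExtractorBrowserAttempts: clampInt(input.rowExtractorBrowserAttempts, b.min, b.max, b.fallback),
  };
}
```

Sharing one helper like this between backend and frontend would keep both sides agreeing on what an out-of-range or missing value turns into.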
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| frontend/lib/backend.ts | Extends model config types and normalizes row extractor settings in getModelConfig. |
| frontend/convex/schema.ts | Persists new row extractor settings fields in Convex schema. |
| frontend/convex/modelConfig.ts | Allows upserting new fields via Convex mutations. |
| frontend/app/dashboard/settings/models/page.tsx | Adds local-mode UI controls to edit/save row extractor settings. |
| backend/src/row-extractors/try-row-extractor.ts | New GitHub row extractor + refresh path using TinyFish Browser and Playwright CDP. |
| backend/src/mastra/workflows/update.ts | Runs row extractor during refresh; uses model config concurrency and browser attempts. |
| backend/src/mastra/workflows/populate.ts | Extends workflow auth context schema with row extractor settings defaults/bounds. |
| backend/src/mastra/tools/investigate-tool.ts | Tries row extractor before spawning subagent (insert shortcut + fallback). |
| backend/src/index.ts | Accepts new settings fields on /settings/models and forwards to persistence layer. |
| backend/src/env.ts | Adds env defaults for row extractor settings. |
| backend/src/config/models.ts | Normalizes row extractor settings and includes them in returned effective model config. |
| backend/package.json | Adds playwright-core dependency for CDP connection. |
Files not reviewed (1)
- backend/package-lock.json: Generated file
Comments suppressed due to low confidence (1)
frontend/lib/backend.ts:1
- The new `SavedModelConfig` fields are typed as `number | null`, but the backend route only accepts them when `typeof ... === "number"` (so `null` is silently ignored), and the Convex validators are `v.optional(v.number())` (which also won't accept `null`). Either (a) change the frontend types to use `number | undefined` and omit keys to preserve existing values, or (b) explicitly support `null` end-to-end with 'clear/reset' semantics (backend parsing + Convex schema + persistence).
export interface InferredSchema {
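
Option (a) from the suppressed comment could be sketched as follows. The patch shape, the form field names, and `rowExtractorBrowserAttempts` are hypothetical. The point is only that unset values are left out of the request body, so the backend's `typeof ... === "number"` check never sees `null`.

```typescript
// Hypothetical request shape: only the two row extractor fields are shown.
interface RowExtractorPatch {
  rowExtractorConcurrency?: number;
  rowExtractorBrowserAttempts?: number; // assumed field name
}

// Hypothetical form state coming from the settings UI.
interface RowExtractorForm {
  concurrency: number | null | undefined;
  browserAttempts: number | null | undefined;
}

// Copy only real numbers into the patch. Missing keys mean
// "keep the stored value"; null never reaches the wire.
function toPatch(form: RowExtractorForm): RowExtractorPatch {
  const patch: RowExtractorPatch = {};
  if (typeof form.concurrency === "number") {
    patch.rowExtractorConcurrency = form.concurrency;
  }
  if (typeof form.browserAttempts === "number") {
    patch.rowExtractorBrowserAttempts = form.browserAttempts;
  }
  return patch;
}
```

With this shape, `JSON.stringify(toPatch({ concurrency: 3, browserAttempts: null }))` yields `{"rowExtractorConcurrency":3}`, which matches validators declared as `v.optional(v.number())`.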

    };

    const pkColumns = columns.filter((c) => c.isPrimaryKey);
    const maxConcurrent = authContext.modelConfig.rowExtractorConcurrency;

      `[refresh-rows] Processing ${rows.length} rows (max ${maxConcurrent} concurrent)`,
      );
    - await processWithConcurrency(rows, processRow, MAX_CONCURRENT);
    + await processWithConcurrency(rows, processRow, maxConcurrent);
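
The body of `processWithConcurrency` is not shown in this thread. As a rough sketch of what a helper with that call shape might do, the version below runs a fixed number of worker lanes that pull from a shared index. The implementation and the guard against non-finite limits are assumptions, not the PR's code.

```typescript
// Hypothetical implementation: at most `maxConcurrent` workers run at once.
async function processWithConcurrency<T>(
  items: readonly T[],
  worker: (item: T) => Promise<void>,
  maxConcurrent: number,
): Promise<void> {
  // Fall back to a single lane if the configured limit is not usable.
  const limit = Number.isFinite(maxConcurrent) ? Math.max(1, Math.floor(maxConcurrent)) : 1;
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index.
  // JavaScript is single-threaded, so `next++` cannot race.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      await worker(items[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
}
```

In this sketch a rejected `worker` call rejects the whole batch, so a caller that wants per-row isolation would catch errors inside `worker`.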

    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      if (/duplicate/i.test(msg)) {
        return {
          status: "miss",
          reason: `${msg} Move on to the next entity.`,
        };
      }
      return { status: "failed", reason: msg };
    }
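
One way to make this branch easy to unit test is to pull the classification into a standalone function. The sketch below does that. The function name and the result type are hypothetical, and only the `"miss"` and `"failed"` variants come from the hunk above.

```typescript
// Only the two variants visible in the hunk; any success variant is omitted.
type ExtractorFailure =
  | { status: "miss"; reason: string }
  | { status: "failed"; reason: string };

// Hypothetical helper: map a thrown value to a result the caller can act on.
function classifyInsertError(err: unknown): ExtractorFailure {
  const msg = err instanceof Error ? err.message : String(err);
  // A duplicate row is treated as a miss rather than a hard failure.
  if (/duplicate/i.test(msg)) {
    return { status: "miss", reason: `${msg} Move on to the next entity.` };
  }
  return { status: "failed", reason: msg };
}
```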

    const response = await page.request.get(
      `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`,
      {
        headers: {
          Accept: "application/vnd.github+json",
        },
        timeout: FETCH_TIMEOUT_MS,
      },
    );
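
For readers who want to try the same request outside a Playwright page, here is a minimal sketch that swaps `page.request.get` for the global `fetch` with `AbortSignal.timeout`. The helper names and the timeout value are assumptions, and this is not the PR's implementation.

```typescript
const FETCH_TIMEOUT_MS = 10_000; // assumed value; the PR's constant is not shown

// Encode each path segment so that spaces or slashes in the
// owner or repo name cannot change the request path.
function githubRepoApiUrl(owner: string, repo: string): string {
  return `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper using the global fetch instead of Playwright's request context.
async function fetchRepoMetadata(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(githubRepoApiUrl(owner, repo), {
    headers: { Accept: "application/vnd.github+json" },
    signal: AbortSignal.timeout(FETCH_TIMEOUT_MS),
  });
  if (!response.ok) {
    throw new Error(`GitHub API responded with ${response.status}`);
  }
  return response.json();
}
```

For example, `githubRepoApiUrl("octo org", "a/b")` returns `https://api.github.com/repos/octo%20org/a%2Fb`.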
No description provided.