Add AI-powered dataset populate with web search and CRUD tools #26
Merged
3 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| import { Agent } from "@mastra/core/agent"; | ||
| import { createOpenRouter } from "@openrouter/ai-sdk-provider"; | ||
| import { | ||
| insertRowTool, | ||
| listRowsTool, | ||
| getRowTool, | ||
| updateRowTool, | ||
| deleteRowTool, | ||
| } from "../tools/dataset-tools.js"; | ||
| import { searchWebTool, fetchPageTool } from "../tools/web-tools.js"; | ||
| | ||
| const openrouter = createOpenRouter({ | ||
| apiKey: process.env.OPENROUTER_API_KEY!, | ||
| }); | ||
| | ||
| export const populateAgent = new Agent({ | ||
| id: "populate-agent", | ||
| name: "Dataset Populate Agent", | ||
| instructions: `You fill datasets with real data. Here's how: | ||
| | ||
| 1. Search the web for data that fits the dataset topic. | ||
| 2. Fetch 1-2 pages to get details. | ||
| 3. Call insert_row for each row using what you found. Don't stop until you've inserted all the rows asked for. | ||
| | ||
| If you can't find enough real data, make up realistic data to fill the rest. Every row must be inserted with insert_row.`, | ||
| model: openrouter("anthropic/claude-sonnet-4-6"), | ||
| tools: { | ||
| insert_row: insertRowTool, | ||
| list_rows: listRowsTool, | ||
| get_row: getRowTool, | ||
| update_row: updateRowTool, | ||
| delete_row: deleteRowTool, | ||
| search_web: searchWebTool, | ||
| fetch_page: fetchPageTool, | ||
| }, | ||
| }); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,9 @@ | ||
| import { Mastra } from "@mastra/core/mastra"; | ||
| import { inferSchemaWorkflow } from "./workflows/infer-schema.js"; | ||
| import { populateWorkflow } from "./workflows/populate.js"; | ||
| import { populateAgent } from "./agents/populate.js"; | ||
| | ||
| export const mastra = new Mastra({ | ||
| agents: { populateAgent }, | ||
| workflows: { inferSchemaWorkflow, populateWorkflow }, | ||
| }); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,161 @@ | ||
| import { createTool } from "@mastra/core/tools"; | ||
| import { z } from "zod"; | ||
| import { convex, api, internal } from "../../convex.js"; | ||
| | ||
| const resultSchema = z.object({ | ||
| success: z.boolean(), | ||
| error: z.string().optional(), | ||
| }); | ||
| | ||
| function cleanDataKeys(data: Record<string, unknown>): Record<string, unknown> { | ||
| const cleaned: Record<string, unknown> = {}; | ||
| for (const [key, value] of Object.entries(data)) { | ||
| cleaned[key.replace(/^["`]+|["`]+$/g, "")] = value; | ||
| } | ||
| return cleaned; | ||
| } | ||
| | ||
| export const insertRowTool = createTool({ | ||
| id: "insert_row", | ||
| description: | ||
| "Insert a single row into the dataset. Call this each time you have a row ready — don't wait to batch them.", | ||
| inputSchema: z.object({ | ||
| datasetId: z.string(), | ||
| data: z.record(z.string(), z.any()), | ||
| }), | ||
| outputSchema: resultSchema, | ||
| execute: async ({ datasetId, data }) => { | ||
| if (!datasetId) return { success: false, error: "datasetId is required." }; | ||
| if (!data || Object.keys(data).length === 0) | ||
| return { success: false, error: "data is required and must have at least one key. Pass an object like { \"Column Name\": value }." }; | ||
| | ||
| const cleanedData = cleanDataKeys(data); | ||
| console.log(`[insert_row] Inserting row into ${datasetId} (${Object.keys(cleanedData).length} columns)`); | ||
| try { | ||
| await convex.mutation(internal.datasetRows.insert, { datasetId, data: cleanedData }); | ||
| console.log(`[insert_row] Row inserted successfully`); | ||
| return { success: true }; | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[insert_row] Failed:`, msg); | ||
| if (msg.includes("not found")) | ||
| return { success: false, error: `Dataset "${datasetId}" not found. Check the datasetId is correct.` }; | ||
| if (msg.includes("validator")) | ||
| return { success: false, error: `Data validation failed: ${msg}. Check that your data keys are plain strings and values match expected types.` }; | ||
| return { success: false, error: `Insert failed: ${msg}` }; | ||
| } | ||
| }, | ||
| }); | ||
| | ||
| export const listRowsTool = createTool({ | ||
| id: "list_rows", | ||
| description: | ||
| "Read all rows in the dataset. Returns an array of row objects, each with _id and data fields.", | ||
| inputSchema: z.object({ | ||
| datasetId: z.string(), | ||
| }), | ||
| outputSchema: z.object({ rows: z.array(z.any()).optional(), error: z.string().optional() }), | ||
| execute: async ({ datasetId }) => { | ||
| if (!datasetId) return { error: "datasetId is required." }; | ||
| | ||
| console.log(`[list_rows] Reading all rows for dataset ${datasetId}`); | ||
| try { | ||
| const rows = await convex.query(api.datasetRows.listByDataset, { datasetId }); | ||
| console.log(`[list_rows] Found ${rows.length} rows`); | ||
| return { rows }; | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[list_rows] Failed:`, msg); | ||
| if (msg.includes("not found")) | ||
| return { error: `Dataset "${datasetId}" not found. Check the datasetId.` }; | ||
| return { error: `List rows failed: ${msg}` }; | ||
| } | ||
| }, | ||
| }); | ||
| | ||
| export const getRowTool = createTool({ | ||
| id: "get_row", | ||
| description: | ||
| "Read a single row by its ID. Returns the row object with _id and data fields, or an error if not found.", | ||
| inputSchema: z.object({ | ||
| rowId: z.string(), | ||
| }), | ||
| outputSchema: z.object({ row: z.any().optional(), error: z.string().optional() }), | ||
| execute: async ({ rowId }) => { | ||
| if (!rowId) return { error: "rowId is required." }; | ||
| | ||
| console.log(`[get_row] Reading row ${rowId}`); | ||
| try { | ||
| const row = await convex.query(internal.datasetRows.get, { id: rowId }); | ||
| if (!row) return { error: `Row "${rowId}" not found. It may have been deleted.` }; | ||
| console.log(`[get_row] Found`); | ||
| return { row }; | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[get_row] Failed:`, msg); | ||
| if (msg.includes("validator") || msg.includes("Invalid")) | ||
| return { error: `Invalid row ID format: "${rowId}". Row IDs look like "jd7..." — they are Convex document IDs.` }; | ||
| return { error: `Get row failed: ${msg}` }; | ||
| } | ||
| }, | ||
| }); | ||
| | ||
| export const updateRowTool = createTool({ | ||
| id: "update_row", | ||
| description: | ||
| "Update an existing row by its ID. Pass the full updated data object. Changes are tracked in history.", | ||
| inputSchema: z.object({ | ||
| rowId: z.string(), | ||
| data: z.record(z.string(), z.any()), | ||
| }), | ||
| outputSchema: resultSchema, | ||
| execute: async ({ rowId, data }) => { | ||
| if (!rowId) return { success: false, error: "rowId is required." }; | ||
| if (!data || Object.keys(data).length === 0) | ||
| return { success: false, error: "data is required. Pass the full updated row data object." }; | ||
| | ||
| const cleanedData = cleanDataKeys(data); | ||
| console.log(`[update_row] Updating row ${rowId} (${Object.keys(cleanedData).length} columns)`); | ||
| try { | ||
| await convex.mutation(internal.datasetRows.update, { id: rowId, data: cleanedData }); | ||
| console.log(`[update_row] Row updated successfully`); | ||
| return { success: true }; | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[update_row] Failed:`, msg); | ||
| if (msg.includes("not found")) | ||
| return { success: false, error: `Row "${rowId}" not found. Use list_rows to see existing row IDs.` }; | ||
| if (msg.includes("validator") || msg.includes("Invalid")) | ||
| return { success: false, error: `Invalid input: ${msg}. Check that rowId is a valid Convex ID and data keys are plain strings.` }; | ||
| return { success: false, error: `Update failed: ${msg}` }; | ||
| } | ||
| }, | ||
| }); | ||
| | ||
| export const deleteRowTool = createTool({ | ||
| id: "delete_row", | ||
| description: | ||
| "Delete a single row by its ID. This is permanent.", | ||
| inputSchema: z.object({ | ||
| rowId: z.string(), | ||
| }), | ||
| outputSchema: resultSchema, | ||
| execute: async ({ rowId }) => { | ||
| if (!rowId) return { success: false, error: "rowId is required." }; | ||
| | ||
| console.log(`[delete_row] Deleting row ${rowId}`); | ||
| try { | ||
| await convex.mutation(internal.datasetRows.remove, { id: rowId }); | ||
| console.log(`[delete_row] Row deleted successfully`); | ||
| return { success: true }; | ||
| } catch (err) { | ||
| const msg = err instanceof Error ? err.message : String(err); | ||
| console.error(`[delete_row] Failed:`, msg); | ||
| if (msg.includes("not found")) | ||
| return { success: false, error: `Row "${rowId}" not found. It may have already been deleted.` }; | ||
| if (msg.includes("validator") || msg.includes("Invalid")) | ||
| return { success: false, error: `Invalid row ID format: "${rowId}". Use list_rows to find valid row IDs.` }; | ||
| return { success: false, error: `Delete failed: ${msg}` }; | ||
| } | ||
| }, | ||
| }); |
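
To illustrate what `cleanDataKeys` in the diff above does to incoming keys, here is a minimal standalone sketch. The function body is copied from the diff; the sample row, its column names, and its values are made up for illustration. It shows that leading and trailing double quotes and backticks are stripped from each key, while single quotes and all values are left untouched.

```typescript
// Copied from the diff above: strips leading/trailing double quotes and
// backticks from every key and leaves the values as they are.
function cleanDataKeys(data: Record<string, unknown>): Record<string, unknown> {
  const cleaned: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(data)) {
    cleaned[key.replace(/^["`]+|["`]+$/g, "")] = value;
  }
  return cleaned;
}

// Hypothetical row with over-quoted keys, as a model might emit them.
const messyRow: Record<string, unknown> = {
  '"City"': "Example City",   // wrapped in double quotes
  "`Population`": 1,          // wrapped in backticks
  "'Country'": "Exampleland", // single quotes are NOT in the character class
};

const cleanedRow = cleanDataKeys(messyRow);
console.log(Object.keys(cleanedRow));
```

Note that two input keys which differ only in their surrounding quotes (for example `"City"` with and without quotes) would collapse to the same cleaned key, and the later entry would overwrite the earlier one.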
Narrow invalid-dataset detection to dataset lookup failures only.
Line 84 matches any error containing `"Invalid"`, so downstream workflow/tool failures can be mislabeled as `400 Invalid datasetId`. That can hide real server-side failures and return the wrong status code.
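
One way the narrowing described in this comment could look, as a minimal sketch: classify an error as a bad `datasetId` only when its message matches a specific lookup-failure marker, and treat everything else as a server-side failure. The function name `statusForError`, the marker strings, and the response shape are all assumptions for illustration. The handler the comment refers to is not shown on this page, so the real error messages may differ.

```typescript
// Hypothetical marker strings for a failed dataset lookup. The real messages
// depend on the backend and are assumptions here.
const DATASET_LOOKUP_MARKERS = ["Dataset not found", "Invalid datasetId"];

interface ErrorResponse {
  status: number;
  message: string;
}

// Map an unknown thrown value to an HTTP-style response. Only errors that
// match a dataset-lookup marker become 400; everything else stays 500.
function statusForError(err: unknown): ErrorResponse {
  const msg = err instanceof Error ? err.message : String(err);
  const isLookupFailure = DATASET_LOOKUP_MARKERS.some((m) => msg.includes(m));
  if (isLookupFailure) {
    return { status: 400, message: "Invalid datasetId" };
  }
  return { status: 500, message: `Internal error: ${msg}` };
}

// A lookup failure is reported as 400, while an unrelated failure whose
// message happens to contain "Invalid" is still reported as 500.
const lookupFailure = statusForError(new Error("Invalid datasetId: abc"));
const toolFailure = statusForError(new Error("Invalid JSON returned by tool"));
console.log(lookupFailure.status, toolFailure.status);
```

Substring matching remains fragile even when narrowed. If the lookup code can throw a dedicated error class or attach an error code, checking that instead would likely be more robust.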