Add local CLI backend support#137
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR implements backend HTTP API routes for local trusted CLI access to dataset operations. It adds Zod-based request validation and a Sequence Diagram(s)sequenceDiagram
participant CLI as Local CLI
participant Guard as requireLocalCli
participant HTTP as HTTP Endpoint (POST /cli/datasets/:id/populate/start)
participant Convex as Convex internal mutation (claim/start populate)
participant Workflow as Populate Workflow Runner
participant AbortReg as AbortRegistry
CLI->>Guard: request to start populate
Guard->>HTTP: allows request
HTTP->>Convex: claim populate + create run record
Convex->>Workflow: start populate workflow (returns runId)
Workflow->>AbortReg: register running dataset -> runId
HTTP->>CLI: HTTP 202 { runId }
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/index.ts`:
- Around line 1016-1022: When handling an orphaned CLI run where aborted is
false, also clear any lingering "updating" row statuses before marking the
dataset failed: call clearAllPendingUpdateStatus(dataset._id) immediately prior
to setDatasetPopulateStatus(dataset._id, "failed", ...). This mirrors the
protected /stop route behavior and prevents stale UI shimmer indicators; ensure
you reference the same dataset._id and maintain the reply.code(200).send({
success: true }) flow.
In `@cli/src/commands/rows.ts`:
- Around line 23-38: The run() method currently lets flags.csv silently override
flags.json; add an explicit conflict check right after parsing (after const {
args, flags } = await this.parse(RowsCommand)) that detects if both flags.csv
and flags.json are true and rejects the invocation by reporting a clear error
and exiting (use the command's error/exit helper or throw an Error) so callers
get deterministic behavior; reference the run() function, the parsed flags.csv
and flags.json, and RowsCommand when locating where to add this validation.
In `@cli/src/csv.ts`:
- Around line 3-9: csvEscape currently protects delimiters but not CSV formula
injection; update csvEscape to detect values whose text begins with =, +, -, or
@ and prepend a single quote (') to neutralize formulas before applying existing
escaping rules; ensure you handle existing leading apostrophes (do not
double-prefix) and keep the current behavior of wrapping and doubling internal
quotes when /[",\n\r]/ matches so quoting and delimiter-escaping still work for
the modified text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bcaa4d2f-d78b-4e86-8a5d-0421bc528d5d
⛔ Files ignored due to path filters (1)
cli/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (22)
.gitignorebackend/src/env.tsbackend/src/index.tsbigset-cli-mvp.excalidrawcli/bin/run.jscli/package.jsoncli/src/client.tscli/src/commands/create.tscli/src/commands/export.tscli/src/commands/list.tscli/src/commands/populate.tscli/src/commands/rows.tscli/src/commands/status.tscli/src/commands/stop.tscli/src/config.tscli/src/csv.tscli/src/index.tscli/tsconfig.jsonfrontend/convex/datasetRows.tsfrontend/convex/datasets.tsmakefiles/Makefilescripts/build-release.mjs
MMeteorL
left a comment
There was a problem hiding this comment.
Great work! I'm super excited about the CLI here! For the current PR, there may be one blocker:
CLI populate can leave datasets stuck in building
Code: backend/src/index.ts:957-992
Reasoning: The CLI populate route calls beginDatasetPopulate before getModelConfig, populateWorkflow.createRun, and registerDataset. If model config lookup, keychain access, or run creation fails, the catch returns 502 but never transitions the dataset out of building. The protected /populate path has explicit rollback for run creation failure; the CLI path should mirror that cleanup and mark the dataset failed or restore the prior terminal state.
The following are not blockers, just some concerns for possible future deployment on server. These security issues need resolving before deployment:
Local mode fails open if PROD is missing
Code: backend/src/env.ts:21-24, backend/src/clerk-auth.ts:88-95, frontend/lib/app-mode.ts:1, frontend/proxy.ts:4
The PR makes PROD !== "1" mean local mode. In local mode, backend requireAuth unconditionally assigns local_user_default, the frontend skips Clerk, and proxy auth is disabled. Any production/preview/staging deployment that forgets PROD=1 becomes unauthenticated by default. Local mode should require an explicit opt-in like BIGSET_LOCAL_MODE=1, with production failing closed when auth env is absent.
Local CLI/setup endpoints are unauthenticated and exposed through host ports
Code: backend/src/index.ts:726-799, backend/src/index.ts:830-1045, backend/src/index.ts:79-89, docker-compose.dev.yml:25-26, docker-compose.dev.yml:132-136
Reasoning: /local-setup/* and /cli/datasets* rely only on env.IS_LOCAL_MODE; requireLocalCli does not verify a caller token. The backend and Convex ports are published without binding to 127.0.0.1, while Fastify listens on 0.0.0.0. A LAN-reachable local instance could allow another machine/process to list datasets/rows, trigger runs, stop runs, or overwrite local credentials with a valid attacker-controlled key. Add a per-launch bearer token or equivalent local secret for these routes, bind release/dev services to loopback, and avoid exposing Convex unauthenticated local identity broadly.
| }); | ||
| if (!dataset) return reply.code(404).send({ error: "Dataset not found" }); | ||
|
|
||
| const populateOutcome = await beginDatasetPopulate(dataset._id, ownerId); |
There was a problem hiding this comment.
If model config lookup, keychain access, or run creation fails, the catch returns 502 but never transitions the dataset out of building.
|
@MMeteorL fixed the blocker in 49a9753. The CLI populate route now tracks when it has successfully claimed the dataset as building. If anything after that claim fails before the background run is registered, like model config lookup, keychain/credential access, or workflow run creation, the catch block now marks the dataset as failed before returning 502. So the route either starts the run or releases the dataset out of building instead of leaving it stuck. I left the local-mode/server hardening notes untouched for this PR since those are broader future deployment concerns and this PR is scoped to the local CLI MVP. |
|
LGTM! Please merge this if you think fit @simantak-dabhade |
|
Thank ya so much for this @pranavjana and thanks for the review @MMeteorL |
Summary
Notes
bigsetcommand implementation lives in Add oclif dataset commands AdamEXu/bigset-cli#1.Verification
npm run buildinbackend/make -f makefiles/Makefile build-release--home