Merged
8 changes: 7 additions & 1 deletion .env.example
@@ -1,8 +1,14 @@
# These are read by docker-compose.dev.yml.
# Copy this file to .env and fill in your Clerk keys.
# Copy this file to .env and fill in your values.

# Clerk — create a free app at https://dashboard.clerk.com
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...

# OpenRouter — required by backend + Mastra for AI model calls.
# Generate at https://openrouter.ai/settings/keys
OPENROUTER_API_KEY=sk-or-...

# Generate once after the first `make dev` with:
# docker compose exec convex ./generate_admin_key.sh
# Used by the backend container to call internal Convex functions.
8 changes: 7 additions & 1 deletion .gitignore
@@ -20,4 +20,10 @@ yarn-debug.log*
tmp/
temp/

backend/output/
.mastra

# Local tarballs
*.tgz

# Internal docs
BigSet Technical Specs & Goals.md
24 changes: 18 additions & 6 deletions CLAUDE.md
@@ -1,8 +1,8 @@
# BigSet

Monorepo: `frontend/` (Next.js 16) + `backend/` (Fastify). Run with `make dev` (Docker).
Monorepo: `frontend/` (Next.js 16) + `backend/` (Fastify + Mastra). Run with `make dev` (Docker).

Frontend on :3500, backend on :3501.
Frontend on :3500, backend on :3501, Mastra Studio on :4111, Convex dashboard on :6791.

## Setup

@@ -12,21 +12,33 @@ Frontend on :3500, backend on :3501.
- `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` — from Clerk API Keys
- `CLERK_SECRET_KEY` — from Clerk API Keys
- `CLERK_JWT_ISSUER_DOMAIN` — your Frontend API URL (e.g. `https://your-app.clerk.accounts.dev`)
4. Run `make dev` — this starts all Docker services AND pushes Convex functions automatically.
5. Generate a Convex admin key (first run only): `docker compose exec convex ./generate_admin_key.sh` and add it as `CONVEX_SELF_HOSTED_ADMIN_KEY` in `frontend/.env.local`, then re-run `make dev`.
4. Add an OpenRouter API key to the root `.env` file: `OPENROUTER_API_KEY=sk-or-...` (get one at https://openrouter.ai/settings/keys). Docker Compose reads the root `.env` and passes it to the backend and Mastra containers.
5. Run `make dev` — this starts all Docker services AND pushes Convex functions automatically.
6. Generate a Convex admin key (first run only): `docker compose exec convex ./generate_admin_key.sh` and add it as `CONVEX_SELF_HOSTED_ADMIN_KEY` in `frontend/.env.local`, then re-run `make dev`.

## Architecture

Auth is Clerk. Frontend uses `@clerk/nextjs` with `ClerkProvider` wrapping the app. Convex validates Clerk JWTs via `convex/auth.config.ts`. Protected routes enforced by Clerk proxy (`frontend/proxy.ts`). No self-hosted auth database.

Dataset storage uses Convex (self-hosted at :3210). Schema in `frontend/convex/schema.ts`, functions in `frontend/convex/datasets.ts` and `frontend/convex/datasetRows.ts`. Convex dashboard at :6791.

Frontend uses Convex React hooks (`useQuery`, `useMutation`) with `ConvexProviderWithClerk` for authenticated realtime queries. Use `useConvexAuth()` (not Clerk's `useAuth()`) to check auth state in components.
Frontend uses Convex React hooks (`useQuery`, `useMutation`) with `ConvexProviderWithClerk` for authenticated realtime queries. Use `useConvexAuth()` (not Clerk's `useAuth()`) to check auth state in components. For backend calls, the frontend uses `useAuth().getToken()` from `@clerk/nextjs` to get a Bearer token and passes it to the API client in `frontend/lib/backend.ts`.
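The backend call described above could look roughly like the following. This is a minimal sketch, not the actual contents of `frontend/lib/backend.ts`: the function name `requestSchema`, the `GetToken` and `FetchLike` types, and the `http://localhost:3501` base URL are assumptions made for illustration. Only the `POST /infer-schema` route, the `{ prompt }` body, and the Bearer token come from this document.

```typescript
// Hypothetical sketch of an API client call; names below are illustrative.
type GetToken = () => Promise<string | null>;
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed base URL: the docs only state that the backend listens on :3501.
const BACKEND_URL = "http://localhost:3501";

async function requestSchema(
  prompt: string,
  getToken: GetToken,
  fetchImpl: FetchLike,
): Promise<unknown> {
  // Ask the auth layer for a session token; null means the user is signed out.
  const token = await getToken();
  if (!token) throw new Error("Not signed in: no session token available");

  const res = await fetchImpl(`${BACKEND_URL}/infer-schema`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Backend returned HTTP ${res.status}`);
  return res.json();
}
```

Injecting `getToken` and `fetchImpl` keeps the sketch testable without a browser; in a component one would presumably pass the `getToken` obtained from `useAuth()` and the global `fetch`.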

Backend is an agent runner — Fastify + `ConvexHttpClient`. It writes to Convex via HTTP mutations (`backend/src/convex.ts`). It does not handle auth.
Backend is Fastify + Mastra. Fastify serves the HTTP API (Clerk JWT auth on protected routes via `backend/src/clerk-auth.ts`). Mastra (`backend/src/mastra/`) is the workflow orchestration layer — it wraps pipelines into inspectable workflows with a Studio UI. Both run as separate Docker services sharing the same source code.

The schema inference pipeline: frontend calls `POST /infer-schema` → Fastify verifies the Clerk JWT → calls `inferSchema()` in `backend/src/pipeline/schema-inference.ts` → Claude Sonnet 4.6 via OpenRouter → returns a Zod-validated `DatasetSchema` → frontend maps it to editable columns in the wizard.

Convex functions use `ctx.auth.getUserIdentity()` to get the authenticated user. The `ownerId` field on datasets stores `identity.subject` (Clerk user ID). Do not pass `ownerId` from the client.

## Environment Variables

Docker Compose interpolates variables from the root `.env` file. Key variables:
- `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY` — shared by frontend and backend
- `OPENROUTER_API_KEY` — used by backend and Mastra for AI model calls
- `CONVEX_SELF_HOSTED_ADMIN_KEY` — used by backend for system-level Convex writes

The backend container maps `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` → `CLERK_PUBLISHABLE_KEY` (see `docker-compose.dev.yml`).

## Convex Deploys

Convex is self-hosted — it does NOT hot-reload when you edit files in `frontend/convex/`. After changing any Convex function, schema, or auth config, you must run `make convex-push` to deploy the updated code to the running instance. `make dev` does this automatically on startup, but subsequent edits require a manual push.
7 changes: 4 additions & 3 deletions backend/.env.example
@@ -6,10 +6,11 @@ PORT=3501
# Generate with: docker compose exec convex ./generate_admin_key.sh
CONVEX_SELF_HOSTED_ADMIN_KEY=

# Required once any user-facing protected route is added.
# Same value as the frontend's CLERK_SECRET_KEY.
# Required for user-facing protected routes (JWT verification).
# Same values as the frontend's Clerk keys.
CLERK_SECRET_KEY=
CLERK_PUBLISHABLE_KEY=

# OpenRouter API key — required by the schema-inference CLI (npm run infer-schema).
# OpenRouter API key — required by schema inference.
# Generate at https://openrouter.ai/settings/keys
OPENROUTER_API_KEY=sk-or-...
36 changes: 35 additions & 1 deletion backend/CLAUDE.md
@@ -2,8 +2,42 @@

Fastify + TypeScript + ESM (`"type": "module"` — use `.js` extensions in imports).

The backend is an agent runner. It does not handle auth — that is Clerk's job on the frontend.
## HTTP API (Fastify)

Fastify serves the backend API on :3501. Protected routes use Clerk JWT verification via the `requireAuth` preHandler in `src/clerk-auth.ts`. The frontend passes a Bearer token (from `@clerk/nextjs` `getToken()`) on every request.

Routes:
- `GET /health` — public health check
- `POST /infer-schema` — protected. Accepts `{ prompt: string }`, returns a `DatasetSchema`. Calls `inferSchema()` from the pipeline.

To add a new protected route, register it inside the scoped plugin in `src/index.ts` that has `requireAuth` as a preHandler. Use `req.auth.userId` for the authenticated user — never trust user-supplied IDs in the body.
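As a rough illustration of the Bearer token format the protected routes expect, the sketch below parses an `Authorization` header. It is not the `requireAuth` implementation in `src/clerk-auth.ts`: `extractBearerToken` is a hypothetical helper, and it only extracts the token string. Verifying the JWT is still left to Clerk.

```typescript
// Hypothetical helper: parses the header only; it does NOT verify the JWT.
function extractBearerToken(authorization: string | undefined): string | null {
  if (!authorization) return null;
  // Accept "Bearer <token>" with a case-insensitive scheme and a single token.
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}
```

A request whose header yields `null` here would presumably be rejected before any verification is attempted. Once verification succeeds, the handler reads the user from `req.auth.userId` rather than from anything in the request body.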

## Schema Inference Pipeline

`src/pipeline/schema-inference.ts` — takes a natural language prompt and returns a structured `DatasetSchema` (Zod-validated, defined in `src/pipeline/types.ts`). Uses Claude Sonnet 4.6 via OpenRouter (`@openrouter/ai-sdk-provider` + Vercel AI SDK). Auto-retries once on validation failure by feeding the error back to the model.

The pipeline is a pure function (`inferSchema(prompt) → DatasetSchema`). It is called by both Fastify (for the HTTP API) and Mastra (for workflow orchestration).
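The retry-once behaviour described above can be sketched generically. This is an assumption-laden illustration, not the code in `src/pipeline/schema-inference.ts`: `generateWithOneRetry`, `callModel`, and `parse` are hypothetical names, `parse` stands in for Zod validation (it throws on invalid input), and the wording of the retry prompt is invented.

```typescript
// Hypothetical sketch of "retry once, feeding the error back to the model".
async function generateWithOneRetry<T>(
  prompt: string,
  callModel: (prompt: string) => Promise<string>, // stand-in for the model call
  parse: (raw: string) => T, // stand-in for schema validation; throws when invalid
): Promise<T> {
  const firstRaw = await callModel(prompt);
  try {
    return parse(firstRaw);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Append the validation error so the model can correct its own output.
    const retryPrompt =
      `${prompt}\n\nYour previous answer failed validation: ${message}\n` +
      "Return a corrected answer.";
    // Exactly one retry: a second validation failure propagates to the caller.
    return parse(await callModel(retryPrompt));
  }
}
```

Only validation failures trigger the retry in this sketch. An error thrown by the first `callModel` itself (for example a network failure) is not caught and surfaces immediately.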

## Mastra (Workflow Orchestration)

`src/mastra/` — wraps pipelines into Mastra workflows. Runs as a separate Docker service on :4111 with `mastra dev`, which provides a Studio UI for inspecting and testing workflows.

- `src/mastra/index.ts` — registers workflows with the `Mastra` instance
- `src/mastra/workflows/infer-schema.ts` — `inferSchemaWorkflow`, a single-step workflow wrapping `inferSchema()`

Mastra uses `HOST` and `PORT` env vars for binding. In Docker, `HOST=0.0.0.0` is required.

## Convex

Writes to Convex via `ConvexHttpClient` in `src/convex.ts`. Import `{ convex, api }` from `./convex.js` to call Convex mutations and queries. The `api` types are re-exported from the frontend's generated Convex code.

The `tsconfig.json` includes `../frontend/convex` so TypeScript can resolve the generated types.

## Environment

Required env vars (see `.env.example`):
- `CONVEX_URL` — Convex instance URL
- `CLERK_SECRET_KEY`, `CLERK_PUBLISHABLE_KEY` — for JWT verification
- `OPENROUTER_API_KEY` — for AI model calls

In Docker, these are interpolated from the root `.env` file via `docker-compose.dev.yml`.
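A startup check for the variables listed above might look like the sketch below. Nothing here is confirmed to exist in the repository: `REQUIRED_ENV_VARS` and `missingEnvVars` are hypothetical names, and treating blank or whitespace-only values as missing is an assumption.

```typescript
// Hypothetical startup check; the variable names come from the list above.
const REQUIRED_ENV_VARS = [
  "CONVEX_URL",
  "CLERK_SECRET_KEY",
  "CLERK_PUBLISHABLE_KEY",
  "OPENROUTER_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}
```

Called with `process.env` before the server starts, a non-empty result could be turned into a single clear error. Placeholder values copied unchanged from `.env.example` (such as `sk-or-...`) would still pass this check.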
13 changes: 13 additions & 0 deletions backend/Dockerfile.mastra
@@ -0,0 +1,13 @@
FROM node:22-alpine

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY tsconfig.json ./
COPY src/ ./src/
RUN chown -R node:node /app
USER node

CMD ["npx", "mastra", "dev"]