Merged
20 changes: 15 additions & 5 deletions .env.example
Original file line number Diff line number Diff line change
@@ -1,23 +1,33 @@
# These are read by docker-compose.dev.yml.
# This is the only local env file BigSet expects.
# Copy this file to .env and fill in your values.

# Local service URLs
CLIENT_ORIGIN=http://localhost:3500
CONVEX_URL=http://localhost:3210
NEXT_PUBLIC_CONVEX_URL=http://127.0.0.1:3210
CONVEX_SELF_HOSTED_URL=http://127.0.0.1:3210
NEXT_PUBLIC_BACKEND_URL=http://localhost:3501
PORT=3501

# Clerk — create a free app at https://dashboard.clerk.com
# Enable the Clerk JWT Templates -> Convex template, then set your issuer URL.
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...
CLERK_JWT_ISSUER_DOMAIN=https://your-app.clerk.accounts.dev

# OpenRouter — required by backend + Mastra for AI model calls.
# Generate at https://openrouter.ai/settings/keys
OPENROUTER_API_KEY=sk-or-...

# TinyFish — used by the backend's populate agent for web search and fetch.
# Generate at https://agent.tinyfish.ai/api-keys
TINYFISH_API_KEY=

# Generate once after the first `make dev` with:
# docker compose exec convex ./generate_admin_key.sh
# Used by the backend container to call internal Convex functions.
CONVEX_SELF_HOSTED_ADMIN_KEY=

# TinyFish — used by the backend's populate agent for web search and fetch.
# Generate at https://agent.tinyfish.ai/api-keys
TINYFISH_API_KEY=

# Resend (optional — transactional emails when a populate workflow finishes).
# Unset → email module logs and no-ops. Generate at https://resend.com/api-keys
RESEND_API_KEY=
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,5 +1,6 @@
.DS_Store
node_modules/
.npm-cache/
.env
.env.local
Project_BigSet_brief.md
@@ -26,4 +27,4 @@ temp/
 *.tgz

 # Internal docs
-BigSet Technical Specs & Goals.md
+BigSet Technical Specs & Goals.md
8 changes: 4 additions & 4 deletions CLAUDE.md
@@ -8,14 +8,14 @@ Frontend on :3500, backend on :3501, Mastra Studio on :4111, Convex dashboard on

 1. Create a free Clerk account at https://clerk.com and create an application.
 2. In the Clerk dashboard, go to **JWT Templates** and enable the **Convex** template.
-3. Copy `frontend/.env.example` to `frontend/.env.local` and fill in your Clerk keys:
+3. Copy `.env.example` to `.env` and fill in your Clerk keys:
 - `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` — from Clerk API Keys
 - `CLERK_SECRET_KEY` — from Clerk API Keys
 - `CLERK_JWT_ISSUER_DOMAIN` — your Frontend API URL (e.g. `https://your-app.clerk.accounts.dev`)
-4. Add an OpenRouter API key to the root `.env` file: `OPENROUTER_API_KEY=sk-or-...` (get one at https://openrouter.ai/settings/keys). Docker Compose reads the root `.env` and passes it to the backend and Mastra containers.
+4. Add an OpenRouter API key to the root `.env` file: `OPENROUTER_API_KEY=sk-or-...` (get one at https://openrouter.ai/settings/keys).
 4b. Add a TinyFish API key to the root `.env` file: `TINYFISH_API_KEY=...` (get one at https://agent.tinyfish.ai/api-keys). This enables the populate agent to search the web and fetch page content.
 5. Run `make dev` — this starts all Docker services AND pushes Convex functions automatically.
-6. Generate a Convex admin key (first run only): `docker compose exec convex ./generate_admin_key.sh` and add it as `CONVEX_SELF_HOSTED_ADMIN_KEY` in `frontend/.env.local`, then re-run `make dev`.
+6. Generate a Convex admin key (first run only): `docker compose exec convex ./generate_admin_key.sh` and add it as `CONVEX_SELF_HOSTED_ADMIN_KEY` in root `.env`, then re-run `make dev`.

 ## Architecture

@@ -35,7 +35,7 @@ Convex functions use `ctx.auth.getUserIdentity()` to get the authenticated user.

 ## Environment Variables

-Docker Compose interpolates variables from the root `.env` file. Key variables:
+Root `.env` is the only local env file. Docker Compose, package scripts, and Convex CLI helper targets all read it. Key variables:
 - `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY` — shared by frontend and backend
 - `OPENROUTER_API_KEY` — used by backend and Mastra for AI model calls
 - `CONVEX_SELF_HOSTED_ADMIN_KEY` — used by backend for system-level Convex writes
27 changes: 12 additions & 15 deletions README.md
@@ -44,16 +44,12 @@ cd bigset

 Create a Clerk application at [dashboard.clerk.com](https://dashboard.clerk.com), then go to **JWT Templates** and enable the **Convex** template.

-### 2. Configure env files
+### 2. Configure env

 ```bash
-# Root .env — used by Docker for the frontend container
 cp .env.example .env
-# Fill in NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY and CLERK_SECRET_KEY
-
-# Frontend .env.local — used by Next.js and Convex CLI
-cp frontend/.env.example frontend/.env.local
-# Fill in all three Clerk keys (publishable, secret, and JWT issuer domain)
+# Fill in NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY, CLERK_SECRET_KEY,
+# CLERK_JWT_ISSUER_DOMAIN, OPENROUTER_API_KEY, and optional service keys.
 ```

 > **Required for the create-dataset wizard:** set `OPENROUTER_API_KEY` (used by the schema-inference pipeline). Get one at [openrouter.ai](https://openrouter.ai). Without it the wizard's "Generate Schema" step will fail.
@@ -66,7 +62,9 @@ cp frontend/.env.example frontend/.env.local
 make dev
 ```

-This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically. Once it's up:
+This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically.
+`make dev` checks that root `.env` contains real Clerk and OpenRouter values before it starts Docker.
+Once it's up:

 - App: http://localhost:3500
 - Convex dashboard: http://localhost:6791
@@ -78,26 +76,26 @@ This starts all Docker services, waits for Convex to be healthy, and deploys Con
 docker compose exec convex ./generate_admin_key.sh
 ```

-Paste the output into `frontend/.env.local` as `CONVEX_SELF_HOSTED_ADMIN_KEY`, then re-run `make dev`.
+Paste the output into `.env` as `CONVEX_SELF_HOSTED_ADMIN_KEY`, then re-run `make dev`.

 ### 5. Load curated public datasets

 The landing page and the dashboard's "Curated" section read from a set of 9 system-owned datasets. Load them with:

 ```bash
-cd frontend
-npx convex run publicSeed:seedPublicDatasets
+make seed-public-datasets
 ```

 The script is **idempotent** — rerunning it skips datasets that already exist (matched by a stable `seedKey`, so renaming a curated dataset never creates a duplicate). To add a 10th curated dataset, append it to `PUBLIC_DATASETS` in [frontend/convex/publicSeed.ts](frontend/convex/publicSeed.ts) with a fresh `seedKey` and rerun the command. To replace existing curated content in place, pass `force: true`:

 ```bash
-npx convex run publicSeed:seedPublicDatasets '{"force":true}'
+cd frontend
+node ../scripts/with-root-env.mjs npx convex run publicSeed:seedPublicDatasets '{"force":true}'
 ```

 Open [localhost:3500](http://localhost:3500) and click **Get started** to sign in.

-> **Note:** Backend env needs no setup — `backend/.env.example` has correct defaults. If you edit Convex functions in `frontend/convex/`, run `make convex-push` to deploy the changes.
+> **Note:** root `.env` is the only local env file. If you edit Convex functions in `frontend/convex/`, run `make convex-push` to deploy the changes.

 > **Free tier:** each signed-in account gets **2,500 row operations per calendar month** (resets on the 1st, UTC). The header shows a live usage badge; system-owned curated datasets bypass the quota.
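
The idempotent seeding described in this README hunk (skip by stable `seedKey`, replace only with `force: true`) can be sketched roughly as follows. This is an illustration only, not the code in `frontend/convex/publicSeed.ts`, which this diff does not show. `CuratedDataset`, `SeedPlan`, `planSeed`, and the in-memory `Map` are hypothetical stand-ins for the real Convex table and mutation.

```typescript
// Hypothetical shape; the real PUBLIC_DATASETS entries are not shown in this diff.
interface CuratedDataset {
  seedKey: string; // stable identity, independent of the display name
  name: string;
}

interface SeedPlan {
  created: string[];
  replaced: string[];
  skipped: string[];
}

// Decide what one seed run does, matching on seedKey rather than on name.
function planSeed(
  existing: Map<string, CuratedDataset>,
  incoming: CuratedDataset[],
  force = false,
): SeedPlan {
  const plan: SeedPlan = { created: [], replaced: [], skipped: [] };
  for (const dataset of incoming) {
    if (!existing.has(dataset.seedKey)) {
      existing.set(dataset.seedKey, dataset); // first run: insert
      plan.created.push(dataset.seedKey);
    } else if (force) {
      existing.set(dataset.seedKey, dataset); // force: overwrite in place
      plan.replaced.push(dataset.seedKey);
    } else {
      plan.skipped.push(dataset.seedKey); // rerun: leave the stored copy alone
    }
  }
  return plan;
}
```

Because the match is on `seedKey`, a renamed dataset is still recognized on a rerun and is skipped instead of inserted a second time.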

@@ -123,14 +121,13 @@ Open [localhost:3500](http://localhost:3500) and click **Get started** to sign i
 bigset/
 ├── frontend/ Next.js 16 — UI + Convex schema & functions
 │ ├── convex/ Convex functions, schema, authz + quota helpers
-│ └── .env.local Clerk + Convex keys (not committed)
 ├── backend/ Fastify + Mastra — schema inference + populate agent
 │ ├── src/pipeline/ Pure pipelines: schema inference + populate context
 │ ├── src/mastra/ Mastra workflows, agents, and tools (Studio at :4111 in dev)
 │ ├── src/email/ Transactional email (Resend) — sends "dataset ready" notifications
 │ └── src/analytics/ Server-side PostHog wrapper for backend-only events
 ├── scripts/ One-off scripts (e.g. verify-authz.sh)
-├── .env Clerk keys for docker-compose (not committed)
+├── .env Local env for frontend, backend, Convex CLI, and Docker (not committed)
 ├── docker-compose.dev.yml
 └── Makefile
 ```
31 changes: 0 additions & 31 deletions backend/.env.example

This file was deleted.

14 changes: 8 additions & 6 deletions backend/README.md
@@ -5,10 +5,11 @@ Fastify server that handles auth, database, and talks to TinyFish APIs.
 ## Running

 ```bash
+# From the repo root:
 cp .env.example .env
-# Set BETTER_AUTH_SECRET (openssl rand -base64 32)
+# Fill in the root .env file.
+cd backend
 npm install
-npx drizzle-kit push
 npm run dev
 ```

@@ -17,14 +18,15 @@ Starts on [localhost:3501](http://localhost:3501).
 ## Key Paths

 - `src/index.ts` — Fastify server + route setup
-- `src/auth.ts` — Better Auth config
-- `src/schema.ts` — Drizzle table definitions
-- `src/db.ts` — Database connection
+- `src/clerk-auth.ts` — Clerk JWT verification
+- `src/convex.ts` — Convex HTTP client
+- `src/env.ts` — root env loader

 ## Scripts

 | Command | What it does |
 |---------|-------------|
 | `npm run dev` | Start with hot reload |
 | `npm run build` | Compile TypeScript |
-| `npm run db:push` | Push schema changes to Postgres |
+
+Local backend scripts load the repo-root `.env` through `../scripts/with-root-env.mjs`.
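
The wrapper script itself is not part of this diff, so the following is only a sketch of the general technique such a loader might use: parse `KEY=VALUE` lines from the root `.env`, then merge them under whatever is already set in the caller's environment. `parseEnvFile` and `mergeEnv` are hypothetical names, and the precedence rule (existing environment wins) is an assumption modeled on the default behavior of `dotenv`.

```typescript
// Hypothetical sketch; the real scripts/with-root-env.mjs is not shown in this diff.
type EnvMap = Record<string, string>;

// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnvFile(text: string): EnvMap {
  const result: EnvMap = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" at all, or an empty key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (
      value.length >= 2 &&
      (first === '"' || first === "'") &&
      value.charAt(value.length - 1) === first
    ) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Assumption: values already present in the environment override the file.
function mergeEnv(
  fileEnv: EnvMap,
  current: Record<string, string | undefined>,
): EnvMap {
  const merged: EnvMap = { ...fileEnv };
  for (const [key, value] of Object.entries(current)) {
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

A wrapper built on these helpers would then start the requested command (for example `tsx watch src/index.ts`) with the merged map as its environment.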
6 changes: 3 additions & 3 deletions backend/package.json
@@ -4,10 +4,10 @@
 "type": "module",
 "private": true,
 "scripts": {
-"dev": "tsx watch src/index.ts",
+"dev": "node ../scripts/with-root-env.mjs tsx watch src/index.ts",
 "build": "tsc",
-"start": "node dist/index.js",
-"mastra:dev": "mastra dev"
+"start": "node ../scripts/with-root-env.mjs node dist/index.js",
+"mastra:dev": "node ../scripts/with-root-env.mjs mastra dev"
 },
 "dependencies": {
 "@clerk/backend": "^3.4.11",
6 changes: 5 additions & 1 deletion backend/src/convex.ts
@@ -3,6 +3,10 @@ import { anyApi } from "convex/server";

 import { env } from "./env.js";

+type ConvexHttpClientWithAdminAuth = ConvexHttpClient & {
+setAdminAuth(token: string): void;
+};
+
 /**
 * Convex client for SYSTEM-LEVEL operations from the backend.
 *
@@ -27,5 +31,5 @@ export const internal = anyApi;
 export const convex = new ConvexHttpClient(env.CONVEX_URL);

 if (env.CONVEX_ADMIN_KEY) {
-convex.setAdminAuth(env.CONVEX_ADMIN_KEY);
+(convex as ConvexHttpClientWithAdminAuth).setAdminAuth(env.CONVEX_ADMIN_KEY);
 }
9 changes: 7 additions & 2 deletions backend/src/env.ts
@@ -1,4 +1,7 @@
-import "dotenv/config";
+import { config as loadDotenv } from "dotenv";
+import { fileURLToPath } from "node:url";
+
+loadDotenv({ path: fileURLToPath(new URL("../../.env", import.meta.url)) });

 function required(name: string): string {
 const value = process.env[name];
@@ -21,7 +24,9 @@ export const env = {
 // Used by ./clerk-auth.ts to verify JWTs on protected routes (e.g.
 // /infer-schema). Required for the backend to function.
 CLERK_SECRET_KEY: process.env.CLERK_SECRET_KEY,
-CLERK_PUBLISHABLE_KEY: process.env.CLERK_PUBLISHABLE_KEY,
+CLERK_PUBLISHABLE_KEY:
+process.env.CLERK_PUBLISHABLE_KEY ??
+process.env.NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY,

 OPENROUTER_API_KEY: process.env.OPENROUTER_API_KEY,
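
The publishable-key fallback in this hunk can be isolated into a small function to make its behavior easier to see. This is a sketch under assumptions: `EnvSource` and `resolvePublishableKey` are illustrative names that do not appear in the PR; only the two variable names come from the diff.

```typescript
// Illustrative stand-in for process.env.
type EnvSource = Record<string, string | undefined>;

// Prefer the backend-specific name, then fall back to the Next.js-style name
// that the single root .env already defines.
function resolvePublishableKey(source: EnvSource): string | undefined {
  return (
    source.CLERK_PUBLISHABLE_KEY ?? source.NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
  );
}
```

One property of `??` worth keeping in mind: it falls back only on `undefined` or `null`. A variable that is present but empty (for example a bare `CLERK_PUBLISHABLE_KEY=` line) yields `""` and does not fall through to the second name.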

18 changes: 14 additions & 4 deletions backend/src/index.ts
@@ -78,6 +83,11 @@ await fastify.register(async (instance) => {
 }

 try {
+const auth = req.auth;
+if (!auth) {
+return reply.code(401).send({ error: "Authentication required" });
+}
+
 // Ownership check uses the INTERNAL (admin-callable, no-authz) getter.
 // We can't use `api.datasets.get` here because that runs through
 // `loadReadableDataset`, which requires either a Clerk-identified
@@ -94,7 +99,7 @@ await fastify.register(async (instance) => {
 if (!dataset) {
 return reply.code(404).send({ error: "Dataset not found" });
 }
-if (dataset.ownerId !== req.auth.userId) {
+if (dataset.ownerId !== auth.userId) {
 return reply.code(403).send({ error: "Not authorized to populate this dataset" });
 }

@@ -108,7 +113,7 @@ await fastify.register(async (instance) => {
 inputData: {
 ...parsed.data,
 authContext: {
-authorizedUserId: req.auth!.userId,
+authorizedUserId: auth.userId,
 workflowRunId: run.runId,
 },
 },
@@ -257,13 +262,18 @@ await fastify.register(async (instance) => {
 }

 try {
+const auth = req.auth;
+if (!auth) {
+return reply.code(401).send({ error: "Authentication required" });
+}
+
 const dataset = await convex.query(internal.datasets.getInternal, {
 id: parsed.data.datasetId,
 });
 if (!dataset) {
 return reply.code(404).send({ error: "Dataset not found" });
 }
-if (dataset.ownerId !== req.auth.userId) {
+if (dataset.ownerId !== auth.userId) {
 return reply.code(403).send({ error: "Not authorized to update this dataset" });
 }

@@ -272,7 +282,7 @@ await fastify.register(async (instance) => {
 inputData: {
 ...parsed.data,
 authContext: {
-authorizedUserId: req.auth!.userId,
+authorizedUserId: auth.userId,
 workflowRunId: run.runId,
 },
 },
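
Both route handlers in this file now follow the same pattern: read `req.auth` into a local, return 401 if it is missing, and use the narrowed local afterwards instead of `req.auth!`. A framework-free sketch of that guard order is below. `RequestLike`, `DatasetLike`, `GuardResult`, and `authorizeOwner` are hypothetical names; the real handlers use Fastify's `reply` and Convex queries, which are omitted here.

```typescript
// Hypothetical minimal shapes; the real Fastify/Clerk types are richer.
interface AuthInfo {
  userId: string;
}
interface RequestLike {
  auth?: AuthInfo;
}
interface DatasetLike {
  ownerId: string;
}

type GuardResult =
  | { status: 200; userId: string }
  | { status: 401 | 403 | 404; error: string };

// Same order as the diff: authentication, then existence, then ownership.
function authorizeOwner(
  req: RequestLike,
  dataset: DatasetLike | null,
): GuardResult {
  const auth = req.auth; // narrow once; no non-null assertion needed later
  if (!auth) {
    return { status: 401, error: "Authentication required" };
  }
  if (!dataset) {
    return { status: 404, error: "Dataset not found" };
  }
  if (dataset.ownerId !== auth.userId) {
    return { status: 403, error: "Not authorized" };
  }
  return { status: 200, userId: auth.userId };
}
```

Copying `req.auth` into a `const` lets the compiler carry the narrowing through the rest of the handler, including across `await` points where a property access on `req` would be widened again.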
2 changes: 1 addition & 1 deletion backend/src/mastra/tools/investigate-tool.ts
@@ -43,7 +43,7 @@ function parseInvestigateResult(
 const reasonMatch = text.match(/REASON:\s*(.+?)$/is);

 return {
-inserted: insertedMatch?.[1]?.toLowerCase() === "true" ?? false,
+inserted: insertedMatch?.[1]?.toLowerCase() === "true",
 row_summary: summaryMatch?.[1]?.trim() || undefined,
 clues: cluesMatch?.[1]?.trim() || undefined,
 reason: reasonMatch?.[1]?.trim() || text.slice(0, 300),
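
The removed `?? false` was dead code: `===` always produces a boolean, so the left side is never `null` or `undefined`, and a failed match already compares as `undefined === "true"`, which is `false`. A small sketch shows the behavior. The `INSERTED:` regex here is an assumption for illustration, since the real pattern sits outside the lines shown in this hunk; `parseInserted` is a hypothetical helper.

```typescript
// Assumed pattern for illustration only; the real regex is not shown in the diff.
function parseInserted(text: string): boolean {
  const insertedMatch = text.match(/INSERTED:\s*(true|false)/i);
  // Optional chaining yields undefined when there is no match, and
  // undefined === "true" is false, so no `?? false` fallback is needed.
  return insertedMatch?.[1]?.toLowerCase() === "true";
}
```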
2 changes: 1 addition & 1 deletion backend/src/pipeline/schema-inference.ts
@@ -54,7 +54,7 @@ async function callOnce(
 model,
 output: Output.object({ schema: datasetSchemaSchema }),
 system: SYSTEM_PROMPT,
-maxTokens: 4096,
+maxOutputTokens: 4096,
 prompt,
 });
 if (!output) throw new Error("Model did not generate a valid schema object");
9 changes: 9 additions & 0 deletions docker-compose.dev.yml
@@ -20,10 +20,13 @@ services:
 build:
 context: ./backend
 dockerfile: Dockerfile.dev
+env_file:
+- .env
 ports:
 - "3501:3501"
 volumes:
 - ./backend/src:/app/src
+- ./scripts:/scripts:ro
 environment:
 CLIENT_ORIGIN: http://localhost:3500
 CONVEX_URL: http://convex:3210
@@ -48,10 +51,13 @@ services:
 build:
 context: ./backend
 dockerfile: Dockerfile.mastra
+env_file:
+- .env
 ports:
 - "4111:4111"
 volumes:
 - ./backend/src:/app/src
+- ./scripts:/scripts:ro
 environment:
 HOST: 0.0.0.0
 PORT: 4111
@@ -67,6 +73,8 @@ services:
 build:
 context: ./frontend
 dockerfile: Dockerfile.dev
+env_file:
+- .env
Comment on lines +76 to +77


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove full root .env injection from the frontend service.

Line 76 introduces `env_file: .env` for the frontend service, which makes server-only keys available inside the frontend container at runtime. Keep the frontend env scoped to explicitly required vars to preserve least privilege.

Suggested fix
   frontend:
     build:
       context: ./frontend
       dockerfile: Dockerfile.dev
-    env_file:
-      - .env
     ports:
       - "3500:3500"

As per coding guidelines, frontend/**: "Frontend uses Next.js 16, React 19, Tailwind 4 for pure UI — no server-side auth logic".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.dev.yml` around lines 76 - 77, the frontend service currently
injects the full root .env via the env_file: .env entry which exposes
server-only secrets; edit the docker-compose service named frontend to remove
the env_file: .env line and replace it with an explicit environment: block
listing only the public frontend variables required at runtime (e.g.,
NEXT_PUBLIC_API_URL, NEXT_PUBLIC_ANALYTICS_KEY or other NEXT_PUBLIC_* keys used
by the Next.js app), ensuring no SERVER_ or secret keys are included; locate the
frontend service block in the compose diff and make this substitution so only
scoped public env vars are passed into the container.
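
The scoping rule the reviewer describes (pass only public variables to the frontend) is enforced in Compose through an explicit `environment:` block, not in application code. Purely to illustrate the rule itself, the sketch below filters an environment map down to `NEXT_PUBLIC_`-prefixed keys. `pickPublicEnv` is a hypothetical helper and is not part of this PR.

```typescript
// Hypothetical helper illustrating the allow-by-prefix idea from the review.
function pickPublicEnv(
  source: Record<string, string | undefined>,
  prefix = "NEXT_PUBLIC_",
): Record<string, string> {
  const scoped: Record<string, string> = {};
  for (const [key, value] of Object.entries(source)) {
    // Keep only keys that are explicitly public and actually set.
    if (key.startsWith(prefix) && value !== undefined) {
      scoped[key] = value;
    }
  }
  return scoped;
}
```

Applied to the variables named in `.env.example`, this would keep `NEXT_PUBLIC_CONVEX_URL` and `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` and drop `CLERK_SECRET_KEY`, `OPENROUTER_API_KEY`, and `CONVEX_SELF_HOSTED_ADMIN_KEY`.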

 ports:
 - "3500:3500"
 volumes:
@@ -80,6 +88,7 @@ services:
 # time and silently ignores local edits.
 - ./frontend/proxy.ts:/app/proxy.ts
 - ./frontend/next.config.ts:/app/next.config.ts
+- ./scripts:/scripts:ro
 environment:
 NEXT_PUBLIC_CONVEX_URL: http://localhost:3210
 NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY: ${NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY}