19 changes: 16 additions & 3 deletions README.md
Expand Up @@ -56,6 +56,8 @@ cp frontend/.env.example frontend/.env.local
# Fill in all three Clerk keys (publishable, secret, and JWT issuer domain)
```

> **Required for the create-dataset wizard:** set `OPENROUTER_API_KEY` (used by the schema-inference pipeline). Get one at [openrouter.ai](https://openrouter.ai). Without it the wizard's "Generate Schema" step will fail.

> **Optional:** to enable [PostHog](https://posthog.com) product analytics + session replay + error tracking, set `NEXT_PUBLIC_POSTHOG_KEY` and `NEXT_PUBLIC_POSTHOG_HOST`. Leave blank to disable cleanly (the app no-ops every event).
Comment on lines +59 to 61

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix blockquote formatting to satisfy markdownlint (MD028).

Line 60 inserts a blank line between blockquote lines, which triggers no-blanks-blockquote. Remove the blank line or make the paragraph continuous within the same blockquote.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 60-60: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 59 - 61, Remove the blank line breaking the two
blockquote lines in README.md so the two sentences about OPENROUTER_API_KEY and
the optional PostHog keys remain in the same blockquote (i.e., make the
paragraph continuous rather than separated by an empty line) to satisfy
markdownlint MD028; update the blockquote that contains "Required for the
create-dataset wizard: set `OPENROUTER_API_KEY`..." and the subsequent
"Optional: to enable PostHog..." line so they are part of the same >-prefixed
block.


### 3. Start everything
Expand All @@ -64,7 +66,11 @@ cp frontend/.env.example frontend/.env.local
make dev
```

This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically.
This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically. Once it's up:

- App: http://localhost:3500
- Convex dashboard: http://localhost:6791
- [Mastra Studio](https://mastra.ai) (workflow inspector): http://localhost:4111

### 4. Generate Convex admin key (first time only)

Expand Down Expand Up @@ -93,6 +99,8 @@ Open [localhost:3500](http://localhost:3500) and click **Get started** to sign i

> **Note:** Backend env needs no setup — `backend/.env.example` has correct defaults. If you edit Convex functions in `frontend/convex/`, run `make convex-push` to deploy the changes.

> **Free tier:** each signed-in account gets **2,500 row operations per calendar month** (resets on the 1st, UTC). The header shows a live usage badge; system-owned curated datasets bypass the quota.
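
Reviewer note: the free-tier rule above (2,500 row operations per calendar month, reset on the 1st in UTC) can be illustrated with a small sketch. This is not the PR's quota implementation, which is not shown in this diff. The helper names and the `UsageRecord` shape are assumptions made for illustration; only the limit and the UTC month boundary come from the README text.

```typescript
// Hypothetical sketch: names and record shape are assumptions, not the PR's code.
// Only the 2,500 limit and the "resets on the 1st, UTC" rule come from the README.
const FREE_TIER_MONTHLY_LIMIT = 2500;

/** Start of the calendar month containing `now`, in UTC (epoch milliseconds). */
function monthStartUtc(now: Date): number {
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1);
}

/** Start of the following month in UTC, i.e. the moment the quota resets. */
function nextResetUtc(now: Date): number {
  // Date.UTC normalizes month 12 to January of the next year.
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
}

/** Assumed storage shape: one usage counter per account per month. */
interface UsageRecord {
  periodStart: number; // epoch ms of the month this counter belongs to
  used: number; // row operations consumed in that month
}

/** Remaining operations; a counter from an earlier month counts as zero usage. */
function remainingOps(record: UsageRecord | null, now: Date): number {
  const currentPeriod = monthStartUtc(now);
  const used =
    record !== null && record.periodStart === currentPeriod ? record.used : 0;
  return Math.max(0, FREE_TIER_MONTHLY_LIMIT - used);
}
```

Keying the counter by the UTC month start means no scheduled reset job is needed in this sketch: a stale record simply stops matching the current period once the month rolls over.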

---

## 🛠 Tech Stack
Expand All @@ -104,17 +112,22 @@ Open [localhost:3500](http://localhost:3500) and click **Get started** to sign i
| Auth | [Clerk](https://clerk.com) |
| Database | [Convex](https://convex.dev) (self-hosted) |
| Data Collection | [TinyFish](https://tinyfish.ai) APIs (Search, Fetch, Browser) |
| Schema inference | [Mastra](https://mastra.ai) workflows + [Vercel AI SDK](https://sdk.vercel.ai) + [OpenRouter](https://openrouter.ai) → Claude Sonnet |
| Table view | [TanStack Table](https://tanstack.com/table) + [react-window](https://github.com/bvaughn/react-window) virtualization |
| Exports | CSV (built-in) + XLSX ([SheetJS](https://sheetjs.com), dynamic-imported) |
| Analytics | [PostHog](https://posthog.com) — events, session replay, error tracking (optional) |

## 📁 Project Structure

```text
bigset/
├── frontend/ Next.js 16 — UI + Convex schema & functions
│ ├── convex/ Convex functions, schema, and auth config
│ ├── convex/ Convex functions, schema, authz + quota helpers
│ └── .env.local Clerk + Convex keys (not committed)
├── backend/ Fastify — agent runner, writes to Convex via HTTP
├── backend/ Fastify + Mastra — schema inference + (future) agents
│ ├── src/pipeline/ Pure schema-inference fn (called by Fastify + Mastra)
│ └── src/mastra/ Mastra workflows (Studio at :4111 in dev)
├── scripts/ One-off scripts (e.g. verify-authz.sh)
├── .env Clerk keys for docker-compose (not committed)
├── docker-compose.dev.yml
└── Makefile
67 changes: 48 additions & 19 deletions frontend/app/dashboard/page.tsx
Expand Up @@ -2,14 +2,15 @@

import { useEffect, useMemo, useRef, useState } from "react";
import Link from "next/link";
import { useQuery, useMutation, useConvexAuth } from "convex/react";
import { useQuery, useConvexAuth } from "convex/react";
import { useUser, useClerk } from "@clerk/nextjs";
import { api } from "@/convex/_generated/api";
import {
DatasetCard,
type DatasetCardData,
} from "@/components/dataset/DatasetCard";
import { ThemeToggle } from "@/components/ThemeToggle";
import { QuotaBadge } from "@/components/QuotaBadge";
import { EVENTS, track } from "@/lib/analytics";

export default function DashboardPage() {
Expand All @@ -25,17 +26,13 @@ export default function DashboardPage() {
// Public datasets are open to anonymous users too, so no `skip` gate.
const curated = useQuery(api.datasets.listPublic, {});

const seedData = useMutation(api.seed.seed);
const hasSeeded = useRef(false);

useEffect(() => {
if (mine && mine.length === 0 && isAuthenticated && !hasSeeded.current) {
hasSeeded.current = true;
void seedData({}).catch(() => {
hasSeeded.current = false;
});
}
}, [mine, isAuthenticated, seedData]);
// Quota state drives the "+ New Dataset" button — disabled when the
// user is at their free-tier limit. `undefined` while loading.
const usage = useQuery(
api.quota.getMy,
isAuthenticated ? {} : "skip",
);
const atLimit = usage !== undefined && usage.remaining === 0;

// Fire dashboard_viewed once per mount when both queries have resolved,
// so we attach accurate counts. `dashboardFired` prevents the effect
Expand Down Expand Up @@ -86,6 +83,8 @@ export default function DashboardPage() {
<img src="/BigSetLogo.png" alt="BigSet" className="h-[30px] dark:hidden" />
<img src="/BigSetLogoDarkBG.png" alt="BigSet" className="h-[30px] hidden dark:block" />
<div className="flex items-center gap-4">
<QuotaBadge />
<div className="w-px h-4 bg-border" />
<ThemeToggle />
<div className="w-px h-4 bg-border" />
{/* PII: mask the email in session replays */}
Expand Down Expand Up @@ -148,12 +147,40 @@ export default function DashboardPage() {
className="w-full rounded-lg border border-border bg-surface py-2.5 pl-10 pr-3 text-sm outline-none placeholder:text-muted/60 focus:border-foreground/30 transition-[border-color] duration-150"
/>
</div>
<Link
href="/dataset/new"
className="rounded-lg border border-accent bg-accent px-5 py-2.5 text-sm font-semibold text-accent-text transition-opacity hover:opacity-90"
>
+ New Dataset
</Link>
{atLimit ? (
<div className="relative group">
<span
role="button"
tabIndex={0}
aria-disabled="true"
aria-describedby="quota-popover"
className="inline-block rounded-lg border border-border bg-surface px-5 py-2.5 text-sm font-semibold text-muted cursor-not-allowed select-none focus:outline-none focus:ring-1 focus:ring-foreground/20"
>
+ New Dataset
</span>
{/*
Custom popover beside the disabled button. Replaces the
native `title=""` tooltip so we can style consistently
with the rest of the UI and use the exact wording requested.
Shown on hover via Tailwind's `group-hover`.
*/}
<div
id="quota-popover"
role="tooltip"
className="pointer-events-none absolute left-full ml-3 top-1/2 -translate-y-1/2 z-20 w-64 rounded-md border border-border bg-surface px-3 py-2 text-xs text-foreground opacity-0 translate-x-[-4px] transition-all duration-150 ease-out shadow-[0_4px_12px_rgba(0,0,0,0.08)] dark:shadow-[0_4px_12px_rgba(0,0,0,0.4)] group-hover:opacity-100 group-hover:translate-x-0 group-focus-within:opacity-100 group-focus-within:translate-x-0"
>
<span className="absolute -left-1.5 top-1/2 -translate-y-1/2 h-3 w-3 rotate-45 border-l border-b border-border bg-surface" />
Free-tier limit reached (2,500 row modifications). Please upgrade.
</div>
</div>
) : (
<Link
href="/dataset/new"
className="rounded-lg border border-accent bg-accent px-5 py-2.5 text-sm font-semibold text-accent-text transition-opacity hover:opacity-90"
>
+ New Dataset
</Link>
)}
</div>

<Section
Expand All @@ -164,7 +191,9 @@ export default function DashboardPage() {
emptyState={
search
? `No datasets of yours match "${search}".`
: "You don't have any datasets yet. Create your first one above."
: atLimit
? "You've used all of this month's free-tier quota. New datasets will be available again when the quota resets at the start of next month."
: "No datasets yet. Click \"+ New Dataset\" above to create your first one."
}
/>
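
Reviewer note: the quota gating and the nested ternary for the empty state in this hunk can be read as two small pure functions. The sketch below restates that logic outside React so it can be checked in isolation. `QuotaUsage` is an assumed minimal shape (the diff only shows that the `api.quota.getMy` result has a `remaining` field), and the function names are hypothetical.

```typescript
// Hypothetical restatement of the dashboard logic; function names are assumptions.
// The diff only reveals that the quota query result exposes `remaining`.
interface QuotaUsage {
  remaining: number;
}

/** `undefined` means the query is still loading (or skipped), so do not block the button. */
function isAtLimit(usage: QuotaUsage | undefined): boolean {
  return usage !== undefined && usage.remaining === 0;
}

/** Same precedence as the nested ternary: search first, then quota, then the default. */
function emptyStateMessage(search: string, atLimit: boolean): string {
  if (search) {
    return `No datasets of yours match "${search}".`;
  }
  if (atLimit) {
    return (
      "You've used all of this month's free-tier quota. New datasets will be " +
      "available again when the quota resets at the start of next month."
    );
  }
  return 'No datasets yet. Click "+ New Dataset" above to create your first one.';
}
```

One thing that may be worth confirming against the server side: if `remaining` can ever go below zero, the strict `=== 0` comparison would leave the button enabled, and `<= 0` would be the safer check.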

85 changes: 70 additions & 15 deletions frontend/app/dataset/[id]/page.tsx
Expand Up @@ -2,12 +2,13 @@

import { useParams } from "next/navigation";
import Link from "next/link";
import { useEffect, useRef, useState } from "react";
import { useEffect, useMemo, useRef, useState } from "react";
import { useQuery, useConvexAuth } from "convex/react";
import { useAuth } from "@clerk/nextjs";
import { api } from "@/convex/_generated/api";
import type { Id } from "@/convex/_generated/dataModel";
import { DatasetTable } from "@/components/table";
import { useSelection } from "@/components/table/use-selection";
import { ThemeToggle } from "@/components/ThemeToggle";
import { StatusBadge } from "@/components/dataset/StatusBadge";
import { downloadCSV, downloadXLSX } from "@/lib/export";
Expand All @@ -16,16 +17,24 @@ import { EVENTS, captureException, track } from "@/lib/analytics";

export default function DatasetPage() {
const params = useParams();
const { isLoading } = useConvexAuth();
const { isLoading: authLoading } = useConvexAuth();
const { userId, getToken } = useAuth();
const [exporting, setExporting] = useState<"csv" | "xlsx" | null>(null);
const [populating, setPopulating] = useState(false);

const datasetId = params.id as Id<"datasets">;
const dataset = useQuery(api.datasets.get, isLoading ? "skip" : { id: datasetId });
const rows = useQuery(api.datasetRows.listByDataset, isLoading ? "skip" : {
datasetId,
});
const dataset = useQuery(
api.datasets.get,
authLoading ? "skip" : { id: datasetId },
);
const rows = useQuery(
api.datasetRows.listByDataset,
authLoading ? "skip" : { datasetId },
);

const rowIds = useMemo(() => (rows ?? []).map((r) => r._id), [rows]);
const selection = useSelection(rowIds);
const selectedCount = selection.selected.size;

// Fire dataset_opened once per dataset visit, after the dataset has
// resolved. The ref keeps it idempotent across re-renders.
Expand All @@ -44,16 +53,28 @@ export default function DatasetPage() {

async function handleExport(format: "csv" | "xlsx") {
if (!dataset || !rows || exporting) return;

// If the user has rows selected, export ONLY those. Otherwise the
// entire dataset. Preserves column ordering (handled by the export
// util — it iterates `dataset.columns` in order).
const exportRows =
selectedCount > 0
? rows.filter((r) => selection.selected.has(r._id))
: rows;
if (exportRows.length === 0) return;

setExporting(format);
try {
if (format === "csv") {
downloadCSV(dataset.name, dataset.columns, rows);
downloadCSV(dataset.name, dataset.columns, exportRows);
} else {
await downloadXLSX(dataset.name, dataset.columns, rows);
await downloadXLSX(dataset.name, dataset.columns, exportRows);
}
track(EVENTS.DATASET_EXPORTED, {
format,
row_count: rows.length,
row_count: exportRows.length,
total_rows: rows.length,
selected_only: selectedCount > 0,
seedKey: dataset.seedKey,
});
} catch (err) {
Expand All @@ -62,7 +83,8 @@ export default function DatasetPage() {
operation: "dataset_export",
format,
datasetId: dataset._id,
row_count: rows.length,
row_count: exportRows.length,
selected_only: selectedCount > 0,
});
} finally {
setExporting(null);
Expand Down Expand Up @@ -98,7 +120,7 @@ export default function DatasetPage() {
}
}

if (isLoading || dataset === undefined || rows === undefined) {
if (authLoading || dataset === undefined || rows === undefined) {
return (
<div className="flex flex-1 items-center justify-center">
<p className="text-muted">Loading...</p>
Expand All @@ -110,6 +132,20 @@ export default function DatasetPage() {
// thrown instead — caught by /dataset/[id]/error.tsx, which renders
// the "Dataset not found" UI.

const exportDisabled = exporting !== null || rows.length === 0;
const csvLabel =
exporting === "csv"
? "Exporting…"
: selectedCount > 0
? `Export CSV (${selectedCount})`
: "Export CSV";
const xlsxLabel =
exporting === "xlsx"
? "Exporting…"
: selectedCount > 0
? `Export XLSX (${selectedCount})`
: "Export XLSX";

return (
<div className="flex flex-1 flex-col h-screen">
<header className="border-b border-border px-5 py-3 flex items-center justify-between bg-surface shrink-0">
Expand All @@ -130,17 +166,27 @@ export default function DatasetPage() {
</span>
<button
onClick={() => handleExport("csv")}
disabled={exporting !== null || rows.length === 0}
disabled={exportDisabled}
title={
selectedCount > 0
? `Export ${selectedCount} selected row${selectedCount === 1 ? "" : "s"} to CSV`
: "Export all rows to CSV"
}
className="border border-border px-3 py-1.5 text-xs font-medium text-foreground hover:bg-foreground/[0.03] transition-colors disabled:opacity-40 disabled:cursor-not-allowed"
>
{exporting === "csv" ? "Exporting…" : "Export CSV"}
{csvLabel}
</button>
<button
onClick={() => handleExport("xlsx")}
disabled={exporting !== null || rows.length === 0}
disabled={exportDisabled}
title={
selectedCount > 0
? `Export ${selectedCount} selected row${selectedCount === 1 ? "" : "s"} to XLSX`
: "Export all rows to XLSX"
}
className="border border-border px-3 py-1.5 text-xs font-medium text-foreground hover:bg-foreground/[0.03] transition-colors disabled:opacity-40 disabled:cursor-not-allowed"
>
{exporting === "xlsx" ? "Exporting…" : "Export XLSX"}
{xlsxLabel}
</button>
<button
onClick={handlePopulate}
Expand All @@ -159,6 +205,14 @@ export default function DatasetPage() {
{dataset.description}
</p>
<div className="ml-auto flex items-center gap-4 text-[11px] text-muted shrink-0">
{selectedCount > 0 && (
<>
<span className="text-foreground/80 font-medium">
{selectedCount} selected
</span>
<span className="text-foreground/10">|</span>
</>
)}
<span>{rows.length} rows</span>
<span className="text-foreground/10">|</span>
<span>{dataset.columns.length} columns</span>
Expand All @@ -169,6 +223,7 @@ export default function DatasetPage() {
dataset={dataset}
rows={rows}
datasetId={datasetId}
selection={selection}
/>
</div>
);
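
Reviewer note: the export changes in this file reduce to two rules: export only the selected rows when a selection exists, and show the selection count in the button label. A minimal sketch of both follows, written as standalone functions under stated assumptions. The `Row` type is simplified to an `_id` string (the real code uses Convex document ids), and the function names are hypothetical.

```typescript
// Hypothetical standalone version of the export logic; names are assumptions.
type ExportFormat = "csv" | "xlsx";

/** Simplified row shape; the real rows carry Convex ids and column data. */
interface Row {
  _id: string;
}

/** Selected rows if any are selected, otherwise every row. Row order is preserved. */
function pickExportRows<T extends Row>(
  rows: T[],
  selected: ReadonlySet<string>,
): T[] {
  return selected.size > 0 ? rows.filter((r) => selected.has(r._id)) : rows;
}

/** Button label: busy text while this format exports, else the base label plus count. */
function exportLabel(
  format: ExportFormat,
  exporting: ExportFormat | null,
  selectedCount: number,
): string {
  if (exporting === format) return "Exporting…";
  const base = `Export ${format.toUpperCase()}`;
  return selectedCount > 0 ? `${base} (${selectedCount})` : base;
}
```

Because the filter runs against the current `rows`, a selection that contains only ids no longer present yields an empty list, which matches the early `exportRows.length === 0` return in `handleExport`. Whether `useSelection` prunes such stale ids is not visible in this diff; if it does not, the label count and the exported row count could briefly disagree.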