feat: detect and connect data warehouse sources #488
Add a config-driven detector registry that scans a project's dependencies and .env key names (never values) for data warehouse sources, and a new `warehouse` program that connects them: in-CLI creation for databases/API-key sources, deep-link to the app for OAuth sources. The main flow also surfaces a soft prompt when a source is detected during a normal run. The agent playbook lives in the context-mill `data-warehouse-source-setup` skill; the wizard handles detection and orchestration only.
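To illustrate scanning `.env` key names without ever touching values, here is a minimal sketch. The helper name `extractEnvKeys` and its parsing rules are assumptions made for this example; they are not taken from the PR's `src/lib/warehouse-sources/detect.ts`.

```typescript
// Hypothetical helper (not from the PR): returns only the key names found in
// .env-style text. Values are never captured, stored, or returned.
function extractEnvKeys(envText: string): string[] {
  const keys: string[] = [];
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    // Allow an optional `export ` prefix, then capture the key before `=`.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match && match[1]) keys.push(match[1]);
  }
  return keys;
}

const sampleEnv = [
  "# database",
  "DATABASE_URL=postgres://user:secret@localhost/db",
  "export STRIPE_SECRET_KEY=sk_test_123",
  "",
].join("\n");

// Logs the two key names only; the secret values never leave the parser.
console.log(extractEnvKeys(sampleEnv));
```

Because the regex captures only the text before `=`, the value portion of each line is discarded as soon as the line is parsed.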
Force-pushed from 0f0ae95 to 6255622.
Pull request overview
Adds a new “data warehouse sources” capability to the wizard: it can detect common warehouse/DB/SaaS sources used by a project and then either guide setup via a new warehouse program or offer a post-run deep-link prompt in the main flow.
Changes:
- Introduces a config-driven detector registry + detection engine (`src/lib/warehouse-sources/`) that scans dependency manifests and `.env` key names to infer source kinds/modes.
- Adds a new `npx @posthog/wizard warehouse` program with a detection intro screen and skill-driven setup flow.
- Adds a post-run “Connect your data warehouse?” soft prompt in the main PostHog integration program (gated on detected sources + dismiss state).
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/utils/open-url.ts | Adds a helper to open URLs in the default browser without blocking the wizard process. |
| src/ui/tui/store.ts | Adds a session setter to persist dismissal of the warehouse offer screen. |
| src/ui/tui/screens/WarehouseOfferScreen.tsx | New post-run TUI screen offering to open the PostHog “new source” setup URL. |
| src/ui/tui/screens/WarehouseIntroScreen.tsx | New intro/detection-result screen for the dedicated warehouse program. |
| src/ui/tui/screen-sequences.ts | Registers new screen IDs for warehouse intro/offer screens. |
| src/ui/tui/screen-registry.tsx | Registers the new WarehouseIntro/WarehouseOffer screens with the TUI screen factory. |
| src/ui/tui/tests/programs.test.ts | Adds predicate tests ensuring the warehouse offer is shown/hidden correctly. |
| src/lib/wizard-session.ts | Adds warehouseOfferDismissed to persisted session state defaults/types. |
| src/lib/warehouse-sources/types.ts | Defines typed detector config surface and detected-source payload shape. |
| src/lib/warehouse-sources/registry.ts | Adds initial detector entries mapping signals to PostHog source kinds + creation modes. |
| src/lib/warehouse-sources/detect.ts | Implements repo scanning and matching against the detector registry. |
| src/lib/warehouse-sources/tests/detect.test.ts | Adds unit tests for detection across npm/python/env/gemfile signals and deduping/ignore rules. |
| src/lib/programs/warehouse-source/steps.ts | Defines step sequence for the new warehouse program (detect → intro → auth → run → outro → skills). |
| src/lib/programs/warehouse-source/index.ts | Registers the new program config and builds a prompt including detected sources for the skill. |
| src/lib/programs/warehouse-source/detect.ts | Program-level adapter writing detect results/errors into frameworkContext + abort cases. |
| src/lib/programs/warehouse-source/content/index.tsx | Re-exports the generic agent-skill learn deck for the warehouse program. |
| src/lib/programs/program-registry.ts | Adds the new warehouse-source program to the registry/enum. |
| src/lib/programs/posthog-integration/steps.ts | Adds the post-outro warehouse offer step gated on detection results and dismiss state. |
| src/lib/programs/posthog-integration/detect.ts | Runs warehouse source detection during the main integration detect step. |
| README.md | Documents the new npx @posthog/wizard warehouse command and behavior. |
Add detectors for Convex, Clerk, Resend, Shopify, Klaviyo, Chargebee, Paddle, Polar, Mailchimp, Customer.io, Typeform (in-CLI) and Intercom, Linear (deep-link), covering most released sources with a reliable codebase footprint. Sentry uses token auth, so move it to in-CLI.
Hey, thank you for adding this! This will help us match up with the onboarding flow even more! What I'll ask is that you ship this as its own command, maybe instead of data-warehouse, call it I'm also gonna ask that we ship this as its own clean skill for now, because the actual task queue is a bit buggy right now, and we're looking to fix that soon. Also excuse our mess in this repo, and props for pushing through some of our slop <3
@gewenyu99 When you say its own command, you mean something other than
Yep that's what I mean. Right now, I don't see edits to
O @Gilbert09 let me know if you'd like to just hand this to me. We have a bunch of stuff related to refactoring the bin. I can roll this into my changes. It's a big PR so it might be problematic, I don't wanna accidentally bulldoze some of this work.
I was gonna set an agent to address your comment - but if you think you can get this over the line quicker, then by all means - otherwise I'll have something ready for review again by morning?
@Gilbert09 hello!! not to keep throwing you to different people, I'm gonna pick this up to get it over the line for you. we are 🪓 the existing architecture for LLMA and Stripe and breaking them into their own wizard programs, so I'll pull your branch and do the same today! will have it ready for you by tomorrow morning with stamps for the rest of the work :)
Brings Tom's PR #488 up to date with latest main (source maps, OAuth paste, 2.15.0 release). Conflicts resolved in program-registry, screen-registry, and screen-sequences by keeping both sides.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
Per review feedback on #488: ship data warehouse setup as its own `warehouse` command only, not nested as a branch flow under the main wizard. Keeps product-specific knowledge out of shared infrastructure, matching the source-maps / revenue program pattern.

Removes the post-run warehouse offer and every leak it introduced: the main detect step's source scan, the warehouse-offer step, warehouseOfferDismissed session state + store setter, the offer screen and its screen wiring, and the open-url util that only the offer used. The standalone `warehouse` program is unaffected.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
Bugs:
- WarehouseIntroScreen: error header now reflects the error kind (bad-directory vs no-sources) instead of always "No data warehouse source detected".
- detect.ts: check filename before reading file contents, so only the ~6 manifest/.env files are read instead of every file in the tree.

Detection:
- MongoDB envKeys: single /^MONGO(DB)?_/ pattern that matches MONGO_* and MONGODB_* prefixes, closing the MONGO_HOST gap and removing redundancy.
- Follow symlinked directories (with realpath loop protection) so monorepos that symlink packages are detected. Added a test.
- Relax warehouse abort-case regexes to tolerate plural / trailing period from the (external) skill.

Cleanup:
- Extract shared walkProjectFiles + safeReadFile into file-utils.ts; warehouse detect.ts uses them instead of a hand-rolled walker.
- Use the canonical PackageJson type; drop the duplicated directory guard.
- Centralize the detectedWarehouseSources context key behind one accessor used by both the prompt builder and the intro screen.
- Extract a SourceGroup component to remove copy-pasted list JSX.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
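The MongoDB envKeys change in the commit message above can be sanity-checked with a short sketch. Only the regex `/^MONGO(DB)?_/` comes from the commit message; the helper `matchesAnyEnvKey` and the sample key names are hypothetical.

```typescript
// Pattern quoted from the commit message above.
const MONGO_ENV_KEY = /^MONGO(DB)?_/;

// Hypothetical helper (not from the PR): true if any key name matches.
function matchesAnyEnvKey(keys: string[], pattern: RegExp): boolean {
  return keys.some((key) => pattern.test(key));
}

console.log(matchesAnyEnvKey(["MONGO_HOST"], MONGO_ENV_KEY)); // true
console.log(matchesAnyEnvKey(["MONGODB_URI"], MONGO_ENV_KEY)); // true
console.log(matchesAnyEnvKey(["MONGOOSE_DEBUG"], MONGO_ENV_KEY)); // false
```

The optional `(DB)?` group is what lets one pattern cover both prefixes, while the required trailing underscore keeps unrelated names such as `MONGOOSE_DEBUG` from matching.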
Part B (the post-run offer nested in the main flow) was removed, so the README should no longer say the main wizard offers warehouse setup as a follow-up.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
@Gilbert09 made a few updates!
if all looks good, we can ship!!
Nice one - thank you! Does this mean people won't see this until you peeps have met up and figured out how to surface it etc?
@Gilbert09 yes for now. we can add it to docs to surface it. we just have to refactor the other two embedded programs and come to a consensus for how we market visibility. Joe is doing some cool stuff with MCP installs where it recognizes the user's PostHog role after auth and surfaces things specific to what they may be interested in. could be useful to reuse that!
@sarahxsanders Okay cool - could be useful to always run this one and only surface stuff when it finds sources it could connect - as opposed to a user explicitly opting into it etc. Do gimme a ping in #team-warehouse-sources when this is figured out - I'm keen to track usage and see how well it's doing etc
@Gilbert09 yes!! I will ping y'all once it's figured out, leaving a note for myself now 🙏
Summary
Adds the ability for the wizard to detect data warehouse sources a project already uses (Postgres, MySQL, MongoDB, Snowflake, BigQuery, Stripe, …) and help connect them to PostHog's data warehouse.
- `src/lib/warehouse-sources/`: scans dependencies (npm/python/ruby) and `.env` key names (never values) and maps signals to a PostHog source `kind` + creation mode. Adding a source is one registry entry. `kind` strings verified against the MCP `external-data-sources-wizard` tool.
- `warehouse` program (`npx @posthog/wizard warehouse`): mirrors `revenue-analytics`. In-CLI creation for databases/API-key sources; deep-link to the app's new-source flow for OAuth sources. The hybrid behaviour lives in the context-mill skill (see below), keeping source knowledge out of the runner.

Respects the codebase discipline: detection is a typed config surface; the creation procedure is a context-mill skill; field schemas come from the MCP at runtime.
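A rough sketch of what "one registry entry per source" could look like as a typed config surface follows. Every type, field name, `kind` string, and env-key pattern here is an illustrative assumption; the real shapes live in `src/lib/warehouse-sources/types.ts` and `registry.ts` and may differ.

```typescript
// Illustrative only: these types and entries are assumptions, not the PR's code.
type CreationMode = "in-cli" | "deep-link";

interface DetectorEntry {
  kind: string; // placeholder kind strings, not verified against the MCP tool
  mode: CreationMode;
  npmPackages?: string[];
  envKeys?: RegExp[];
}

interface DetectedSource {
  kind: string;
  mode: CreationMode;
  signals: string[];
}

const EXAMPLE_REGISTRY: DetectorEntry[] = [
  { kind: "Postgres", mode: "in-cli", npmPackages: ["pg"], envKeys: [/^(PG|POSTGRES)_/] },
  { kind: "Stripe", mode: "in-cli", npmPackages: ["stripe"], envKeys: [/^STRIPE_/] },
  { kind: "Linear", mode: "deep-link", npmPackages: ["@linear/sdk"] },
];

// Matches dependency names and env key names against each entry and returns
// at most one result per entry, listing the signals that triggered it.
function detectSources(
  deps: string[],
  envKeyNames: string[],
  registry: DetectorEntry[] = EXAMPLE_REGISTRY,
): DetectedSource[] {
  const found: DetectedSource[] = [];
  for (const entry of registry) {
    const signals: string[] = [];
    for (const pkg of entry.npmPackages ?? []) {
      if (deps.includes(pkg)) signals.push(`npm:${pkg}`);
    }
    for (const pattern of entry.envKeys ?? []) {
      for (const key of envKeyNames) {
        if (pattern.test(key)) signals.push(`env:${key}`);
      }
    }
    if (signals.length > 0) found.push({ kind: entry.kind, mode: entry.mode, signals });
  }
  return found;
}

console.log(detectSources(["pg", "express"], ["STRIPE_SECRET_KEY"]));
```

Keeping the matching logic generic means a new source needs only another object in the array, which is the property the summary describes.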
Companion PR
The agent playbook ships as the `data-warehouse-source-setup` skill in context-mill; this program references it by `skillId`. That skill must be released before the end-to-end flow works.
Test plan
- `pnpm build` passes
- `pnpm lint`: 0 errors
- `pnpm test`: full suite green (added detector tests + `warehouse-offer` predicate tests)
- `npx @posthog/wizard warehouse` against a Postgres/Stripe project once the context-mill skill is released

🤖 Generated with Claude Code