feat: detect and connect data warehouse sources#488

Open
Gilbert09 wants to merge 7 commits into main from feat/warehouse-source-detection

Conversation

@Gilbert09
Member

Summary

Adds the ability for the wizard to detect data warehouse sources a project already uses (Postgres, MySQL, MongoDB, Snowflake, BigQuery, Stripe, …) and help connect them to PostHog's data warehouse.

  • Config-driven detector registry (src/lib/warehouse-sources/) — scans dependencies (npm/python/ruby) and .env key names (never values) and maps signals to a PostHog source kind + creation mode. Adding a source is one registry entry. kind strings verified against the MCP external-data-sources-wizard tool.
  • New warehouse program (npx @posthog/wizard warehouse) — mirrors revenue-analytics. In-CLI creation for databases/API-key sources; deep-link to the app's new-source flow for OAuth sources. The hybrid behaviour lives in the context-mill skill (see below), keeping source knowledge out of the runner.
  • Main-flow soft prompt — the default wizard now detects sources during its normal run and, after the outro, offers to connect them (opens the pre-filled new-source page in the browser, or skip). Hidden entirely when nothing is detected.

Respects the codebase discipline: detection is a typed config surface; the creation procedure is a context-mill skill; field schemas come from the MCP at runtime.
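
As a rough illustration of the registry idea described above, here is a minimal sketch. The type names (`SourceDetector`, `ProjectSignals`), the `matchSources` helper, the listed packages and env patterns, and the `kind` spellings are all assumptions made for the example; they are not the actual contents of src/lib/warehouse-sources/, and the real kind strings come from the MCP tool.

```typescript
// Hypothetical sketch of a config-driven detector registry.
// Names, kinds, and signals are illustrative, not the PR's real entries.

type CreationMode = "in-cli" | "deep-link";

interface SourceDetector {
  kind: string;       // PostHog source kind (assumed spelling)
  mode: CreationMode; // create in the CLI, or deep-link to the app
  packages: string[]; // dependency names that signal this source
  envKeys: RegExp[];  // patterns tested against .env key NAMES only
}

interface ProjectSignals {
  dependencies: Set<string>; // collected from npm/python/ruby manifests
  envKeyNames: string[];     // key names from .env files, never values
}

interface DetectedSource {
  kind: string;
  mode: CreationMode;
  signals: string[]; // which signals triggered the match
}

// Adding a source means adding one entry here.
const REGISTRY: SourceDetector[] = [
  { kind: "Postgres", mode: "in-cli", packages: ["pg", "psycopg2"], envKeys: [/^POSTGRES_/] },
  { kind: "Stripe", mode: "in-cli", packages: ["stripe"], envKeys: [/^STRIPE_/] },
  { kind: "Linear", mode: "deep-link", packages: ["@linear/sdk"], envKeys: [/^LINEAR_/] },
];

function matchSources(
  signals: ProjectSignals,
  registry: SourceDetector[] = REGISTRY,
): DetectedSource[] {
  const detected: DetectedSource[] = [];
  for (const detector of registry) {
    const hits: string[] = [];
    for (const pkg of detector.packages) {
      if (signals.dependencies.has(pkg)) hits.push(`dependency:${pkg}`);
    }
    for (const key of signals.envKeyNames) {
      if (detector.envKeys.some((re) => re.test(key))) hits.push(`env:${key}`);
    }
    // One result per source kind, however many signals matched.
    if (hits.length > 0) {
      detected.push({ kind: detector.kind, mode: detector.mode, signals: hits });
    }
  }
  return detected;
}

// Usage: a project that depends on "pg" and has a Stripe key name in .env.
const found = matchSources({
  dependencies: new Set<string>(["express", "pg"]),
  envKeyNames: ["STRIPE_SECRET_KEY", "PORT"],
});
```

In this sketch `found` holds a Postgres entry (matched through the `pg` dependency) and a Stripe entry (matched through the env key name), both marked `in-cli`. Keeping the mode on the registry entry is what lets a runner stay free of per-source knowledge.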

Companion PR

The agent playbook ships as the data-warehouse-source-setup skill in context-mill — this program references it by skillId. That skill must be released before the end-to-end flow works.

Test plan

  • pnpm build passes
  • pnpm lint — 0 errors
  • pnpm test — full suite green (added detector tests + warehouse-offer predicate tests)
  • Manual: run npx @posthog/wizard warehouse against a Postgres/Stripe project once the context-mill skill is released
  • Manual: confirm the soft prompt appears after a normal run in a repo with a detectable source, and is hidden otherwise

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 28, 2026 13:53
@github-actions

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

Add a config-driven detector registry that scans a project's
dependencies and .env key names (never values) for data warehouse
sources, and a new `warehouse` program that connects them: in-CLI
creation for databases/API-key sources, deep-link to the app for
OAuth sources. The main flow also surfaces a soft prompt when a
source is detected during a normal run.

The agent playbook lives in the context-mill `data-warehouse-source-setup`
skill; the wizard handles detection and orchestration only.
Gilbert09 force-pushed the feat/warehouse-source-detection branch from 0f0ae95 to 6255622 on May 28, 2026 13:56
Contributor

Copilot AI left a comment

Pull request overview

Adds a new “data warehouse sources” capability to the wizard: it can detect common warehouse/DB/SaaS sources used by a project and then either guide setup via a new warehouse program or offer a post-run deep-link prompt in the main flow.

Changes:

  • Introduces a config-driven detector registry + detection engine (src/lib/warehouse-sources/) that scans dependency manifests and .env key names to infer source kinds/modes.
  • Adds a new npx @posthog/wizard warehouse program with a detection intro screen and skill-driven setup flow.
  • Adds a post-run “Connect your data warehouse?” soft prompt in the main PostHog integration program (gated on detected sources + dismiss state).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/utils/open-url.ts | Adds a helper to open URLs in the default browser without blocking the wizard process. |
| src/ui/tui/store.ts | Adds a session setter to persist dismissal of the warehouse offer screen. |
| src/ui/tui/screens/WarehouseOfferScreen.tsx | New post-run TUI screen offering to open the PostHog “new source” setup URL. |
| src/ui/tui/screens/WarehouseIntroScreen.tsx | New intro/detection-result screen for the dedicated warehouse program. |
| src/ui/tui/screen-sequences.ts | Registers new screen IDs for warehouse intro/offer screens. |
| src/ui/tui/screen-registry.tsx | Registers the new WarehouseIntro/WarehouseOffer screens with the TUI screen factory. |
| src/ui/tui/tests/programs.test.ts | Adds predicate tests ensuring the warehouse offer is shown/hidden correctly. |
| src/lib/wizard-session.ts | Adds warehouseOfferDismissed to persisted session state defaults/types. |
| src/lib/warehouse-sources/types.ts | Defines typed detector config surface and detected-source payload shape. |
| src/lib/warehouse-sources/registry.ts | Adds initial detector entries mapping signals to PostHog source kinds + creation modes. |
| src/lib/warehouse-sources/detect.ts | Implements repo scanning and matching against the detector registry. |
| src/lib/warehouse-sources/tests/detect.test.ts | Adds unit tests for detection across npm/python/env/gemfile signals and deduping/ignore rules. |
| src/lib/programs/warehouse-source/steps.ts | Defines step sequence for the new warehouse program (detect → intro → auth → run → outro → skills). |
| src/lib/programs/warehouse-source/index.ts | Registers the new program config and builds a prompt including detected sources for the skill. |
| src/lib/programs/warehouse-source/detect.ts | Program-level adapter writing detect results/errors into frameworkContext + abort cases. |
| src/lib/programs/warehouse-source/content/index.tsx | Re-exports the generic agent-skill learn deck for the warehouse program. |
| src/lib/programs/program-registry.ts | Adds the new warehouse-source program to the registry/enum. |
| src/lib/programs/posthog-integration/steps.ts | Adds the post-outro warehouse offer step gated on detection results and dismiss state. |
| src/lib/programs/posthog-integration/detect.ts | Runs warehouse source detection during the main integration detect step. |
| README.md | Documents the new npx @posthog/wizard warehouse command and behavior. |

Comment thread src/utils/open-url.ts Outdated
Comment thread src/lib/warehouse-sources/detect.ts
Comment thread src/ui/tui/screens/WarehouseIntroScreen.tsx
Add detectors for Convex, Clerk, Resend, Shopify, Klaviyo, Chargebee,
Paddle, Polar, Mailchimp, Customer.io, Typeform (in-CLI) and Intercom,
Linear (deep-link), covering most released sources with a reliable
codebase footprint. Sentry uses token auth, so move it to in-CLI.
@gewenyu99
Collaborator

Hey, thank you for adding this! this will help us match up with the onboarding flow even more!

What I'll ask is that you ship this as its own command, maybe instead of data-warehouse, call it @posthog/wizard data-source

I'm also gonna ask that we ship this as its own clean skills for now, because the actual task queue is a bit buggy right now, and we're looking to fix that soon.

Also excuse our mess in this repo, and props for pushing through some of our slop <3

@Gilbert09
Member Author

@gewenyu99 When you say its own command, you mean something other than npx @posthog/wizard warehouse (minus the name change)? I thought this was under its own command. Happy to rename it regardless. Would this still run during the usual setup wizard?

@gewenyu99
Collaborator

Yep that's what I mean. npx @posthog/wizard warehouse is a great command name as well. @Gilbert09

Right now, I don't see edits to bin.ts which is where you'd register a new command. It looks like this is nested as a branch flow under the main wizard flow. This is probably not a great place for now, we experimented with this but really don't love the results.

@gewenyu99
Collaborator

O @Gilbert09 let me know if you'd like to just hand this to me. We have a bunch of stuff related to refactoring the bin. I can roll this into my changes. It's a big PR so it might be problematic, I don't wanna accidentally bulldoze some of this work.

@Gilbert09
Member Author

I was gonna set an agent to address your comment - but if you think you can get this over the line quicker, then by all means - otherwise I'll have something ready for review again by morning?

sarahxsanders self-requested a review on June 3, 2026 15:58
@sarahxsanders
Collaborator

@Gilbert09 hello!! not to keep throwing you to different people, I'm gonna pick this up to get it over the line for you. we are 🪓 the existing architecture for LLMA and Stripe and breaking them into their own wizard programs, so I'll pull your branch and do the same today! will have it ready for you by tomorrow morning with stamps for the rest of the work :)

Brings Tom's PR #488 up to date with latest main (source maps,
OAuth paste, 2.15.0 release). Conflicts resolved in program-registry,
screen-registry, and screen-sequences by keeping both sides.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
Per review feedback on #488: ship data warehouse setup as its own
`warehouse` command only, not nested as a branch flow under the main
wizard. Keeps product-specific knowledge out of shared infrastructure,
matching the source-maps / revenue program pattern.

Removes the post-run warehouse offer and every leak it introduced:
the main detect step's source scan, the warehouse-offer step,
warehouseOfferDismissed session state + store setter, the offer screen
and its screen wiring, and the open-url util that only the offer used.

The standalone `warehouse` program is unaffected.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
Bugs:
- WarehouseIntroScreen: error header now reflects the error kind
  (bad-directory vs no-sources) instead of always "No data warehouse
  source detected".
- detect.ts: check filename before reading file contents, so only the
  ~6 manifest/.env files are read instead of every file in the tree.
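
A minimal sketch of the filename-first check from the second bullet above. The manifest list, `baseName`, and `readManifests` are hypothetical names chosen for the example (the real helpers are `walkProjectFiles` and `safeReadFile`, whose signatures are not shown here), and the file reader is injected so the sketch stays free of I/O.

```typescript
// Hypothetical sketch: decide by filename first, read contents second.
// The manifest list below is an assumption, not the PR's actual list.
const MANIFEST_NAMES = new Set<string>([
  "package.json",
  "requirements.txt",
  "pyproject.toml",
  "Gemfile",
  ".env",
  ".env.local",
]);

// Last path segment, assuming "/" separators.
function baseName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

function readManifests(
  paths: string[],
  readFile: (path: string) => string,
): Map<string, string> {
  const contents = new Map<string, string>();
  for (const p of paths) {
    // Cheap string check first: no file is opened for non-manifests.
    if (!MANIFEST_NAMES.has(baseName(p))) continue;
    // Only the handful of matching files are ever read.
    contents.set(p, readFile(p));
  }
  return contents;
}
```

With this ordering, the cost of a scan is proportional to the number of manifest files rather than to the total size of the tree.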

Detection:
- MongoDB envKeys: single /^MONGO(DB)?_/ pattern — matches MONGO_* and
  MONGODB_* prefixes, closing the MONGO_HOST gap and removing redundancy.
- Follow symlinked directories (with realpath loop protection) so
  monorepos that symlink packages are detected. Added a test.
- Relax warehouse abort-case regexes to tolerate plural / trailing period
  from the (external) skill.
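
To make the MongoDB bullet concrete, here is a small sketch that pulls key names out of .env text and tests them against the pattern quoted above. Only `/^MONGO(DB)?_/` comes from the commit message; `envKeyNames`, the line-parsing regex, and the sample keys are assumptions for illustration.

```typescript
// Pattern quoted in the commit message: matches MONGO_* and MONGODB_*.
const MONGO_ENV_KEY = /^MONGO(DB)?_/;

// Hypothetical helper: return key NAMES from .env text, never values.
function envKeyNames(envText: string): string[] {
  const names: string[] = [];
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    const name = match?.[1];
    if (name) names.push(name); // everything after "=" is discarded
  }
  return names;
}

const sample = [
  "# database",
  "MONGO_HOST=localhost",
  "MONGODB_URI=mongodb://user:secret@host/db",
  "export MONGOOSE=1",
  "PORT=3000",
].join("\n");

const mongoKeys = envKeyNames(sample).filter((k) => MONGO_ENV_KEY.test(k));
// mongoKeys is ["MONGO_HOST", "MONGODB_URI"]; MONGOOSE and PORT do not match.
```

The required underscore after the optional `DB` is what keeps unrelated names such as `MONGOOSE` from matching.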

Cleanup:
- Extract shared walkProjectFiles + safeReadFile into file-utils.ts;
  warehouse detect.ts uses them instead of a hand-rolled walker.
- Use the canonical PackageJson type; drop the duplicated directory guard.
- Centralize the detectedWarehouseSources context key behind one accessor
  used by both the prompt builder and the intro screen.
- Extract a SourceGroup component to remove copy-pasted list JSX.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
Part B (the post-run offer nested in the main flow) was removed, so the
README should no longer say the main wizard offers warehouse setup as a
follow-up.

Generated-By: PostHog Code
Task-Id: d2a3b0bb-9932-4e65-9fdc-cf2b3e677ab0
@sarahxsanders
Collaborator

@Gilbert09 made a few updates!

  • made this its own program! this means the standalone npx @posthog/wizard warehouse command is how you run this. we are meeting in person next week + will discuss discoverability for wizard programs that live outside the core installation run to make sure it gets surfaced to users!
  • fixed the intro error header so it works better for bad/missing directories vs. no source detected
  • detection catches the filename before reading a file (it was slurping every file in the tree into memory first, so it's just a little more efficient)
  • pulled the directory walker out into a shared helper since we copy it across programs (this wasn't your doing, just some prep to pull the other programs out on their own)

if all looks good, we can ship!!

@Gilbert09
Member Author

Nice one - thank you! Does this mean people won't see this until you peeps have met up and figured out how to surface it etc?

@sarahxsanders
Collaborator

@Gilbert09 yes for now. we can add it to docs to surface it. we just have to refactor the other two embedded programs and come to a consensus for how we market visibility. Joe is doing some cool stuff with MCP installs where it recognizes the user's PostHog role after auth and surfaces things specific to what they may be interested in. could be useful to reuse that!

@Gilbert09
Member Author

@sarahxsanders Okay cool - could be useful to always run this one and only surface stuff when it finds sources it could connect - as opposed to a user explicitly opting into it etc. Do gimme a ping in #team-warehouse-sources when this is figured out - I'm keen to track usage and see how well it's doing etc

@sarahxsanders
Collaborator

@Gilbert09 yes!! I will ping y'all once it's figured out, leaving a note for myself now 🙏
