This repo contains the complete SQL walkthrough for the blog post churn series (published on tivon.io):
- "We Lost Customers, But Not Revenue: Measuring Churn Three Ways in SQL"
- churn definitions + scoreboard (SQL 00-10, notebook 01)
- "How I Used SQL to Find Early Churn Warning Signs"
- pre-churn signals + tier-level churn + reason codes by tier (SQL 11-19, notebook 02)
It uses a public SaaS-style dataset (Ravenstack) to walk through a practical churn workflow:
- Define churn consistently at the account-month level
- Reconcile logo churn, revenue churn, and activity churn
- Compare pre-churn behavior against retained accounts
- Segment churn patterns by plan tier
- Connect recorded churn reasons back to those tier-level patterns
You will compute three churn lenses from the same underlying account-month snapshot and reconcile them in a single monthly scoreboard:
- Logo churn: paying last month -> not paying this month
- Revenue churn + retention: MRR waterfall components + NRR
- Activity churn: paying customers who go inactive (usage drops to zero)
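As a hedged sketch of how the three lenses share one grain (the view and column names here, such as `v_account_mrr_month`, `mrr_amount`, and `usage_events`, are illustrative assumptions, not the repo's exact definitions), they can be computed side by side from a single monthly snapshot:

```sql
-- Sketch only: three churn lenses from one account-month snapshot.
-- Column and view names are assumptions about the repo's schema.
WITH snapshot AS (
    SELECT account_id,
           month_start,
           mrr_amount,
           usage_events,
           LAG(mrr_amount) OVER (PARTITION BY account_id
                                 ORDER BY month_start) AS prev_mrr
    FROM v_account_mrr_month
)
SELECT month_start,
       -- logo churn: paying last month, not paying this month
       COUNT(*) FILTER (WHERE prev_mrr > 0 AND mrr_amount = 0)     AS logo_churned,
       -- revenue churn: MRR lost from accounts that dropped to zero
       SUM(prev_mrr) FILTER (WHERE prev_mrr > 0 AND mrr_amount = 0) AS churned_mrr,
       -- activity churn: still paying, but usage has gone to zero
       COUNT(*) FILTER (WHERE mrr_amount > 0 AND usage_events = 0)  AS inactive_but_paying
FROM snapshot
GROUP BY month_start
ORDER BY month_start;
```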
You will also reproduce three supporting views used in the post visuals:
- A scoreboard trends chart
- An MRR waterfall for a highlighted month
- MRR mix shift by plan tier over time
You will extend the same project to analyze churn drivers:
- Support burden at the account-month level
- Usage behavior at the account-month level
- Pre-churn vs retained comparisons
- Recent 3 months vs prior 3 months change signals
- Change signals segmented by plan tier
- Recorded churn reasons by plan tier
You will also reproduce three supporting visuals:
- Change signals by tier
- Recorded churn reasons by tier
- Enterprise-focused support signal
These are the Ravenstack CSVs loaded into Postgres:
- `ravenstack_accounts.csv`
- `ravenstack_subscriptions.csv`
- `ravenstack_feature_usage.csv`
- `ravenstack_support_tickets.csv`
- `ravenstack_churn_events.csv`
Run these in order:
- `00_create_tables.sql`: creates the `ravenstack` schema and base tables.
- `01_load_data.sql`: loads CSVs via server-side `COPY` from `/data/raw`.
- `02_profile.sql`: quick profiling and validation checks (row counts, uniqueness, and basic integrity checks).
- `03_account_mrr_month.sql`: `v_account_mrr_month` (one row per account-month; month-end style snapshot).
- `04_logo_churn.sql`: `v_logo_churn_month` (logo churn, new customers, customer counts).
- `05_revenue_waterfall.sql`: `v_revenue_waterfall_month` (MRR waterfall, gross revenue churn, NRR).
- `06_activity_churn.sql`: `v_activity_churn_month` (inactive-but-paying, activity churn).
- `07_scoreboard.sql`: `v_churn_scoreboard_month` (logo + revenue + activity churn reconciled in one view).
- `08_checks.sql`: reconciliation checks (MRR and customer count invariants).
- `09_mix_shift_by_plan.sql`: `v_mix_shift_by_plan_month` (tier mix shift over time).
- `10_extracts_for_python.sql`: plot-ready extracts used by the notebook.
- `11_support_metrics.sql`: `v_support_account_month` (support burden at the account-month grain).
- `12_usage_metrics.sql`: `v_usage_account_month` (usage behavior at the account-month grain).
- `13_prechurn_cohorts.sql`: `v_account_month_cohorts` (retained / churn_month / pre_churn_3m labels).
- `14_churn_driver_table.sql`: `v_churn_driver_table_3m` (first joined comparison of pre-churn vs retained behavior).
- `15_change_signals_3m.sql`: `v_change_signals_3m_account` (recent_3m vs prior_3m change signals per account).
- `16_change_signals_summary.sql`: `v_change_signals_3m_summary` (churned vs retained summary plus coverage counts).
- `17_change_signals_summary_by_tier.sql`: `v_change_signals_3m_summary_by_tier` (change signals segmented by plan tier).
- `18_reason_code_by_tier.sql`: `v_reason_code_by_tier` (recorded churn reasons summarized by plan tier).
- `19_extracts_for_python.sql`: plot-ready extracts for Part 2 visuals.
- `01_visuals.ipynb`: generates the Part 1 figures: `fig_01_scoreboard_trends.png`, `fig_02_mrr_waterfall_2024_09.png`, `fig_03_mrr_mix_shift_stacked.png`
- `02_visuals.ipynb`: generates the Part 2 figures: `fig_04_change_signals_by_tier.png`, `fig_05_reason_code_by_tier.png`, `fig_06_enterprise_support_signal.png`
Churn math is computed on a single consistent grain: one row per account per month, anchored to month close.
Month-end rule: a subscription counts for a month only if it is active as of the last day of the month.
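As a sketch of that rule (the table and column names, such as `ravenstack.subscriptions`, `start_date`, and `end_date`, are assumptions about the schema), the month-end predicate might look like:

```sql
-- Sketch only: keep a subscription for a month iff it is active on the
-- month's last day. :month_end is the last calendar day of that month.
SELECT s.*
FROM ravenstack.subscriptions s
WHERE s.start_date <= :month_end
  AND (s.end_date IS NULL OR s.end_date >= :month_end);
```

A subscription that starts mid-month and ends before month close never counts, which keeps each snapshot anchored to the same point in every month.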
Accounts can have overlapping subscription rows. v_account_mrr_month resolves this by selecting one “winner” subscription per account-month using a deterministic ranking rule:
- latest `start_date`
- then highest `mrr_amount`
- then `subscription_id`
That avoids double-counting and makes month-over-month comparisons stable.
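A minimal sketch of that ranking (assuming an intermediate set of month-active subscriptions, here called `active_subs_for_month`, which is not necessarily how the repo names it):

```sql
-- Sketch only: pick one deterministic "winner" subscription per account.
SELECT *
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.account_id
               ORDER BY s.start_date DESC,    -- latest start_date first
                        s.mrr_amount DESC,    -- then highest MRR
                        s.subscription_id     -- stable final tie-break
           ) AS rn
    FROM active_subs_for_month s
) ranked
WHERE rn = 1;
```

Because `subscription_id` is unique, the ordering is total, so reruns always pick the same row.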
Account-month spine: month-end snapshots omit non-paying months, so downstream churn views build a complete account-month spine (every account × every month) and fill missing months with zeros. This ensures month-to-month comparisons are truly adjacent calendar months instead of “previous paying month”.
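The spine can be sketched like this (the date range, table names, and `v_account_mrr_month` columns are assumptions for illustration):

```sql
-- Sketch only: every account × every month, with missing months zero-filled.
WITH months AS (
    SELECT generate_series('2023-01-01'::date, '2024-12-01'::date,
                           interval '1 month')::date AS month_start
),
spine AS (
    SELECT a.account_id, m.month_start
    FROM ravenstack.accounts a
    CROSS JOIN months m
)
SELECT sp.account_id,
       sp.month_start,
       COALESCE(mm.mrr_amount, 0) AS mrr_amount
FROM spine sp
LEFT JOIN v_account_mrr_month mm USING (account_id, month_start);
```

With the zero-filled spine, `LAG(...)` over `month_start` always compares adjacent calendar months, never the last month the account happened to pay.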
The source usage file can contain duplicate usage_id values. The table uses a surrogate row_id (bigserial) instead of enforcing a primary key on usage_id.
Downstream usage logic aggregates usage to the account-month level.
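A sketch of that rollup (column names such as `usage_date` and `feature_name` are assumptions about the usage table):

```sql
-- Sketch only: collapse raw usage rows, duplicates included, to one row
-- per account-month. Aggregation makes duplicate usage_id values harmless
-- for counts of distinct features, though raw event counts may still
-- reflect the duplicates.
SELECT account_id,
       date_trunc('month', usage_date)::date AS month_start,
       COUNT(*)                     AS usage_events,
       COUNT(DISTINCT feature_name) AS distinct_features
FROM ravenstack.feature_usage
GROUP BY account_id, date_trunc('month', usage_date);
```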
The pre-churn analysis does not create a new churn definition. It reuses the Part 1 logo-churn anchor:
`prev_mrr > 0` and `mrr_amount = 0`
That churn month becomes the anchor for recent 3-month vs prior 3-month comparisons.
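Finding that anchor can be sketched as follows (again assuming the illustrative `v_account_mrr_month` column names):

```sql
-- Sketch only: the first month where a previously paying account
-- drops to zero MRR becomes the churn anchor.
WITH m AS (
    SELECT account_id, month_start, mrr_amount,
           LAG(mrr_amount) OVER (PARTITION BY account_id
                                 ORDER BY month_start) AS prev_mrr
    FROM v_account_mrr_month
)
SELECT account_id,
       MIN(month_start) AS churn_month  -- anchor for recent_3m vs prior_3m
FROM m
WHERE prev_mrr > 0 AND mrr_amount = 0
GROUP BY account_id;
```

Taking `MIN(month_start)` matters for accounts that churn, return, and churn again: the windows are aligned to the first churn.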
08_checks.sql validates two invariants:
- MRR waterfall reconciliation: `starting + new + expansion - contraction - churned = ending`
- Customer reconciliation: `customers_end = customers_start + new_customers - churned_customers`
If diffs are non-zero, stop and debug before interpreting churn.
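A check of this shape (the scoreboard view's column names are assumptions here) surfaces only the months that fail either invariant:

```sql
-- Sketch only: return rows where the waterfall or customer counts
-- do not reconcile. An empty result means both invariants hold.
SELECT month_start,
       starting_mrr + new_mrr + expansion_mrr
         - contraction_mrr - churned_mrr - ending_mrr         AS mrr_diff,
       customers_start + new_customers
         - churned_customers - customers_end                  AS customer_diff
FROM v_churn_scoreboard_month
WHERE starting_mrr + new_mrr + expansion_mrr
        - contraction_mrr - churned_mrr <> ending_mrr
   OR customers_start + new_customers
        - churned_customers <> customers_end;
```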
The Part 2 workflow adds three additional layers of validation:
- support metrics are one row per `(account_id, month_start)`
- usage metrics are one row per `(account_id, month_start)`
- pre-churn windows are aligned to the first logo-churn month and exclude post-churn months
Coverage counts are also carried into the summary views so sparse percent changes can be interpreted correctly.
- Docker Desktop
- A PostgreSQL container (this repo uses Postgres 18)
- `psql` available inside the container (standard for official postgres images)
- Optional: VS Code + PostgreSQL extension (or SQLTools)
- Optional (for visuals): Python + Jupyter (see `requirements.txt`)
cd $HOME
git clone https://github.com/tivonio/sql-churn-analysis.git
cd sql-churn-analysis
docker compose up -d
Confirm the container is running:
docker ps
You should see a container named pg18.
This project uses server-side COPY, which means the CSV path must be visible to the Postgres server.
The included docker-compose.yml mounts:
`./data/raw` (host) → `/data/raw` (container, read-only)
So /data/raw/*.csv exists inside the container.
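For orientation, a server-side load has roughly this shape (the exact table definitions and `COPY` options in `01_load_data.sql` may differ):

```sql
-- Sketch only: server-side COPY resolves the path inside the Postgres
-- server process, which is why /data/raw must be mounted into the container.
COPY ravenstack.accounts
FROM '/data/raw/ravenstack_accounts.csv'
WITH (FORMAT csv, HEADER true);
```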
If you want to run the SQL files from inside the container, the SQL scripts need to be accessible inside the container too.
The simplest approach is to mount the repo into the container (for example: /workspace) by adding this to your compose service volumes:
- ./:/workspace
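In context, the relevant part of the compose file would look something like this (the service name and surrounding keys are assumptions about this repo's `docker-compose.yml`):

```yaml
# Sketch only: volumes for the Postgres service.
services:
  pg18:
    volumes:
      - ./data/raw:/data/raw:ro   # CSVs for server-side COPY
      - ./:/workspace             # repo, so SQL files are runnable in-container
```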
Then run (example):
docker exec -it pg18 psql -U postgres -d lab -f /workspace/sql/00_create_tables.sql
docker exec -it pg18 psql -U postgres -d lab -f /workspace/sql/01_load_data.sql
- Open the repo folder in VS Code.
- Use the PostgreSQL extension to connect to the container database:
  - Host: `localhost`
  - Port: `5434` (from `docker-compose.yml`)
  - Database: `lab`
  - User: `postgres`
- Open a file in `sql/` and run the statements from the editor (in order).
This keeps your workflow inside .sql files, which is ideal for learning and for clean GitHub diffs.
The notebook reads from the SQL extracts and produces the three post figures.
If you are using a virtual environment:
python -m venv .venv
.\.venv\Scripts\activate   (Windows; on macOS/Linux use: source .venv/bin/activate)
pip install -r requirements.txt
Then open:
01_visuals.ipynb for Part 1 figures
02_visuals.ipynb for Part 2 figures
Resetting is useful when you want to reproduce the walkthrough from scratch.
docker compose down -v
docker compose up -d
- If `01_load_data.sql` fails with file path errors, the most common cause is that `/data/raw` is not mounted into the container.
- If you are running `psql` from your host machine (client-side), use `\copy` instead of `COPY` so the file path resolves on the client.
- `.gitignore` excludes typical local artifacts.
- `requirements.txt` pins the Python environment used for the notebook (pandas/matplotlib/psycopg2, plus Jupyter dependencies).
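As a sketch of the client-side variant (this assumes the connection settings above and that the CSV path is relative to where you run `psql`; it requires the container to be up):

```shell
# Sketch only: \copy reads the file on the client, so a host-side path works.
psql -h localhost -p 5434 -U postgres -d lab \
  -c "\copy ravenstack.accounts FROM 'data/raw/ravenstack_accounts.csv' WITH (FORMAT csv, HEADER true)"
```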