
Keelan test project #1

Open
keelansmithers wants to merge 21 commits into source-medium:main from keelansmithers:keelan__test_project


@keelansmithers

Intent

Build a dbt project from scratch that takes raw subscription data and transforms it into a BI-ready, daily subscription metrics model, as instructed in this Notion doc.

The goal is to match the output at the bottom of the Notion doc, but getting close is sufficient for this first-pass PR. In this case, some of the columns match exactly and several others are very close.

Project/DAG structure

The project is made up of three main stages of models:

  • base - raw data, lightly cleaned, plus utility models such as the date spine
  • intermediate - calculations broken out per metric, plus multi-step processes that change model grain one or more times
  • final - the final daily subscription metrics output model

[Figure: DAG (screenshot)]

Most metrics can be calculated with minor transforms (filtering, then aggregating to date). For now I broke each of these out into its own intermediate model to make the logic easier to read and debug during development. The most complex set of models is the cascading logic captured in the date_spined_fanouts and date_spine_derivatives folders:

  • subscriptions_days - full fanout of subscriptions by day from created_date to cancelled_date
  • subscribers_days - an intermediate-step aggregation from subscriptions by day to subscribers by day (customer_id * date)
  • the models in date_spine_derivatives then pre-aggregate to the date level, following the same process as the minor transforms that reference the base model directly
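
To make the cascade above concrete, here is a minimal sketch of the fanout and the two aggregation steps. It is not the project's actual models: it runs the SQL through Python's built-in sqlite3 so it can be executed anywhere. The sample rows, the fixed spine range, and the choice to count the cancelled_date itself as an active day are all assumptions for illustration.

```python
import sqlite3

# Hypothetical rows: (subscription_id, customer_id, created_date, cancelled_date).
# A NULL cancelled_date is assumed to mean the subscription is still active.
SUBSCRIPTIONS = [
    (1, "a", "2022-01-01", "2022-01-03"),
    (2, "a", "2022-01-02", None),
    (3, "b", "2022-01-02", "2022-01-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table subscriptions ("
    "subscription_id integer, customer_id text, "
    "created_date text, cancelled_date text)"
)
conn.executemany("insert into subscriptions values (?, ?, ?, ?)", SUBSCRIPTIONS)

FANOUT_SQL = """
with recursive date_spine(date_day) as (
    -- one row per day; the range is hard-coded for this sketch
    select '2022-01-01'
    union all
    select date(date_day, '+1 day') from date_spine
    where date_day < '2022-01-04'
),
subscriptions_days as (
    -- fan out: one row per subscription per day it was active
    -- (assumption: the cancelled_date itself counts as active)
    select d.date_day, s.subscription_id, s.customer_id
    from subscriptions s
    join date_spine d
      on d.date_day >= s.created_date
     and d.date_day <= coalesce(s.cancelled_date, '2022-01-04')
),
subscribers_days as (
    -- change grain: subscription * date -> customer_id * date
    select date_day, customer_id, count(*) as subscriptions_active
    from subscriptions_days
    group by date_day, customer_id
)
-- change grain again: customer_id * date -> date
select date_day,
       count(*) as subscribers_active,
       sum(subscriptions_active) as subscriptions_active
from subscribers_days
group by date_day
order by date_day
"""

daily = conn.execute(FANOUT_SQL).fetchall()
for row in daily:
    print(row)
```

Whether the end of the range is inclusive or exclusive shifts the active counts by one day per cancelled subscription, so that boundary is worth stating explicitly in the real models.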

Validation of models

My final output is close to the Notion doc output in most cases. The biggest sources of discrepancy are:

  • subscriptions_churned
  • subscribers_active
  • subscribers_churned

[Image: Screen Shot 2022-11-16 at 9 59 16 AM]

[Image: Screen Shot 2022-11-16 at 9 57 41 AM]

I have some theories for where my model could be improved to get closer to the original output, but in the interest of time I will open this PR and save that for discussion.

Checklist

  • dbt run runs successfully
  • models have MVP test coverage (primarily PK tests). I did not write date PK tests for every model, but this could be a to-do if there is a concern
  • dbt test passes on all tests
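
For reference, a PK test amounts to checking that the key column is never NULL and never repeated. The sketch below expresses that check as a query that returns the failing keys, again run through sqlite3. The table name daily_metrics, the column date_day, and the helper pk_failures are hypothetical and are not taken from the project.

```python
import sqlite3


def pk_failures(conn, table, key):
    """Return (key, row_count) for key values that are NULL or duplicated."""
    # Identifiers are interpolated directly: acceptable for a sketch,
    # not for untrusted input.
    query = f"""
        select {key}, count(*) as n
        from {table}
        group by {key}
        having {key} is null or count(*) > 1
        order by {key}
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table daily_metrics (date_day text, subscribers_active integer)")
conn.executemany(
    "insert into daily_metrics values (?, ?)",
    [("2022-01-01", 1), ("2022-01-02", 2)],
)

good = pk_failures(conn, "daily_metrics", "date_day")  # no failing keys

# Introduce a duplicate date to show what a failure looks like.
conn.execute("insert into daily_metrics values ('2022-01-02', 5)")
bad = pk_failures(conn, "daily_metrics", "date_day")

print(good, bad)
```

An empty result means the check passes, which mirrors how a dbt test passes when its query returns zero rows.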
