Keelan test project by keelansmithers · Pull Request #1 · source-medium/analytics_engineer_test_project

keelansmithers · 2022-11-16T15:00:40Z

Intent

Build a dbt project from scratch that takes raw subscription data and transforms it into a BI-ready, daily subscription metrics model, as instructed in this Notion doc.

The goal is to match the output at the bottom of the Notion doc, but getting close is sufficient for this first-pass PR. So far, some of the columns match exactly and several others are very close.

project/DAG structure

The project is made up of three main stages of models:

base - raw data, lightly cleaned, plus utility models like the date spine
intermediate - broken-out calculations for each metric, plus multi-step processes that change model grain one or more times
final - the final daily subscription metrics output model

DAG

The majority of metrics can be calculated as fairly minor transforms (filtering + aggregating to date). For now I chose to break each of these out into its own intermediate model, which makes the logic easier to read and debug during development. The most complex set of models is the cascading logic captured in the date_spined_fanouts and date_spine_derivatives folders:

subscriptions_days - full fanout of subscriptions by day from created_date to cancelled_date
subscribers_days - an intermediate-step aggregation from subscriptions by day to subscribers by day (customer_id * date)
The models in date_spine_derivatives then follow a process similar to the minor transforms that directly reference the base model: they pre-aggregate information to the date level.
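
As a rough illustration of the fanout and re-aggregation described above, here is a minimal Python sketch. It is not the project's dbt SQL. The sample rows, the function names, the `subscriptions_active` column, the inclusive treatment of cancelled_date, and the rule that an open subscription runs to the end of the spine are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample rows, not data from the project.
subscriptions = [
    {"subscription_id": "s1", "customer_id": "c1",
     "created_date": date(2022, 1, 1), "cancelled_date": date(2022, 1, 3)},
    {"subscription_id": "s2", "customer_id": "c1",
     "created_date": date(2022, 1, 2), "cancelled_date": date(2022, 1, 4)},
    {"subscription_id": "s3", "customer_id": "c2",
     "created_date": date(2022, 1, 3), "cancelled_date": None},
]


def fan_out_subscription_days(rows, spine_end):
    """Emit one row per subscription per day (the subscriptions_days grain)."""
    out = []
    for row in rows:
        # Assumption: a subscription with no cancelled_date stays active
        # through the last day of the date spine.
        last_day = row["cancelled_date"] or spine_end
        day = row["created_date"]
        # Assumption: the cancellation day itself is included.
        while day <= last_day:
            out.append({
                "subscription_id": row["subscription_id"],
                "customer_id": row["customer_id"],
                "date": day,
            })
            day += timedelta(days=1)
    return out


def aggregate_subscriber_days(subscription_days):
    """Collapse to one row per customer per day (the subscribers_days grain)."""
    counts = defaultdict(int)
    for row in subscription_days:
        counts[(row["customer_id"], row["date"])] += 1
    return [
        {"customer_id": customer, "date": day, "subscriptions_active": n}
        for (customer, day), n in sorted(counts.items())
    ]


subscription_days = fan_out_subscription_days(subscriptions, date(2022, 1, 5))
subscriber_days = aggregate_subscriber_days(subscription_days)
```

The point of the two-step shape is the grain change: the first function multiplies rows (subscription * date), and the second reduces them to customer_id * date so that subscriber-level metrics do not double count customers with overlapping subscriptions.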

Validation of models

My final output is close to the Notion doc output in most cases. The biggest sources of discrepancy are:

subscriptions_churned
subscribers_active
subscribers_churned

I have some theories for where my model could be improved to get closer to the original output, but in the interest of time I will open this PR and save that for discussion.
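
One way to pin down where the discrepancies occur is a per-metric, per-date comparison against the expected output. The sketch below is a hypothetical helper in Python, not part of this PR. It assumes both outputs have been loaded as dictionaries keyed by date, and the numbers in the usage example are invented for illustration, not taken from the model or the Notion doc.

```python
def metric_discrepancies(actual, expected, metrics):
    """Return {metric: [(date, actual_value, expected_value), ...]} for mismatches."""
    diffs = {metric: [] for metric in metrics}
    for day in sorted(expected):
        for metric in metrics:
            # A date missing from the actual output shows up as None.
            actual_value = actual.get(day, {}).get(metric)
            expected_value = expected[day][metric]
            if actual_value != expected_value:
                diffs[metric].append((day, actual_value, expected_value))
    return diffs


# Invented values, for illustration only.
actual = {
    "2022-01-01": {"subscriptions_churned": 3, "subscribers_active": 10},
    "2022-01-02": {"subscriptions_churned": 4, "subscribers_active": 12},
}
expected = {
    "2022-01-01": {"subscriptions_churned": 3, "subscribers_active": 10},
    "2022-01-02": {"subscriptions_churned": 5, "subscribers_active": 12},
}
diffs = metric_discrepancies(
    actual, expected, ["subscriptions_churned", "subscribers_active"]
)
```

Listing the mismatched dates per metric makes it easier to tell whether a discrepancy is a constant offset, a one-day shift, or confined to a few dates.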

Checklist

dbt run runs successfully
models have MVP test coverage (primarily PK tests). I did not write date PK tests for every model, but that could be a to-do if it is a concern.
dbt test passes on all tests
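
For readers less familiar with PK tests, the check amounts to asserting that the key columns are never null and never repeat. In dbt this is commonly expressed with the built-in unique and not_null tests; the Python sketch below only illustrates the logic, including the composite (customer_id, date) case. The helper name and sample rows are hypothetical.

```python
from collections import Counter


def primary_key_violations(rows, key_columns):
    """Return keys that contain a null or appear more than once."""
    keys = [tuple(row.get(col) for col in key_columns) for row in rows]
    null_keys = [key for key in keys if any(value is None for value in key)]
    duplicate_keys = [key for key, n in Counter(keys).items() if n > 1]
    return {"null_keys": null_keys, "duplicate_keys": duplicate_keys}


# Hypothetical rows at the customer_id * date grain.
rows = [
    {"customer_id": "c1", "date": "2022-01-01"},
    {"customer_id": "c1", "date": "2022-01-02"},
    {"customer_id": "c1", "date": "2022-01-02"},  # duplicate composite key
    {"customer_id": None, "date": "2022-01-03"},  # null in the key
]
violations = primary_key_violations(rows, ["customer_id", "date"])
```

A model passes when both lists come back empty.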

keelansmithers added 21 commits November 11, 2022 12:57

update service_account.json for access to bigquery

cfe5d13

create first pass base table

fc01331

clean up base model, remove test.sql

71135bf

rename base model

550bf3a

establish final model and include easy metrics (subscriptions new and cancelled)

389b233

add row_number to base model to derive subscribers_new

68257a5

first pass subscription date spine model

a8187a6

add tests to base models

a087b0d

edit subscriptions date spine model and create active subs models

dfa398d

quick logic fix

be1f6ee

rearrange final model to all join to date spine, add coalesces

375b235

fix typo

c5356f0

add final model PK test

f02313d

filter out bad records where status is cancelled but not cancelled at

ecadcca

losing the plot, need to capture work now before changing more

402c4a9

rename model to cancelled

9c6df21

reorganize models, clean up dependencies

b51c263

create subscribers_days model

ccf1c14

add subscribers churned metric

029ef96

add test for consistency of metric from two different calcs

e4f37d5

add composite key tests for date spined models

af9b0a5

keelansmithers wants to merge 21 commits into source-medium:main from keelansmithers:keelan__test_project
