Skip to content

Feat(dbt): Add support for selected resources context variable#5177

Merged
themisvaltinos merged 5 commits intomainfrom
themis/selectdbt
Sep 9, 2025
Merged

Feat(dbt): Add support for selected resources context variable#5177
themisvaltinos merged 5 commits intomainfrom
themis/selectdbt

Conversation

@themisvaltinos
Copy link
Contributor

This aims to add support for selected resources in dbt projects. When accessed in jinja this would return they keys which correspond to the dbt nodes objects (example format: model.project_name.model_name), but that have been selected for the current plan or run using the sqlmesh selectors.

@themisvaltinos
Copy link
Contributor Author

address comments let me know if this looks better @erindru


def _dbt_model_id(self, sqlmesh_model_name: str) -> str:
parts = [part.strip('"') for part in sqlmesh_model_name.split(".")]
return f"model.{parts[0]}.{parts[-1]}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to check the dbt docs to understand why the hardcoded model. prefix is necessary, might be worth leaving a comment.

Also, sqlmesh_model_name is coming through as a quoted fqn, right? And this is on the RuntimeAdapter which has the snapshots available on self.snapshots?

Rather than string munging, is it possible to look up the matching snapshot and grab the original model name, something like:
f"model.{self.snapshots[sqlmesh_model_name].node.name}" ?

Copy link
Contributor Author

@themisvaltinos themisvaltinos Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes didn't like the string munging either don't know if there was a cleaner way to do this. we don't keep snapshots in the RuntimeAdapter just use it for the table mapping but we could save it or use it during init similar to table mapping and keep a mapping of snapshot names to dbt ones. but the issue is this {self.snapshots[sqlmesh_model_name].node.name} wouldn't work in the way that it would return main.orders for example rather than jaffle_shop.orders for '"jaffle_shop"."main"."orders"'. I feel you encountered a similar issue with the selectors. the reason I want to have the dbt way and not sqlmesh btw is they might be used in macros as a key to the graph which would contain the dbt way in the nodes dictionary so model - package - model name rather than schema

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't keep snapshots in the RuntimeAdapter just use it for the table mapping

Ah, gotcha, this trips me up all the time. Yes you're right, it appears the snapshots currently limited to whatever is referenced by the model query, not all the snapshots available in the environment.

the reason I want to have the dbt way and not sqlmesh btw is they might be used in macros as a key to the graph

yes, absolutely, this needs to return the dbt model name and not the sqlmesh model name. The thing is, I don't know if we can accurately determine the dbt model name based on munging the SQLMesh fqn.

Although I guess all the dbt models do live in a global namespace and schemas are an implementation detail, so maybe this isn't actually the problem I think it is. In the absence of a clear better solution, let's run with this and see how far it gets us 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, I don't know if we can accurately determine the dbt model name based on munging the SQLMesh fqn. Although I guess all the dbt models do live in a global namespace and schemas are an implementation detail, so maybe this isn't actually the problem I think it is.

actually it is the problem you think it is as I run more examples this solution falls short for example in Postgres where the package might not be the db name.. I'm looking for a better way to map the sqlmesh model names to the dbt ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erindru so I tried different ways but all got quite complicated and without having perfect robustness. so settled in getting it for the snapshot itself (which does simplify things a bunch as all these helper methods are removed) as you originally suggested:

Rather than string munging, is it possible to look up the matching snapshot and grab the original model name

but since we don't have the dbt name in there, had to add it in the node. I remember you also needed this dbt model name somewhere for a mapping, unless I'm mistaken, so if you can take a final look to see how this looks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I didnt realise we already had a property BaseModelConfig.node_name to produce this value.

Yes, adding it to the main _Node abstraction is probably the way to go here because it populates it early on and makes it accessible from everywhere.

I think this will provide a nice way to leverage this value in the selector engine too

@erindru
Copy link
Collaborator

erindru commented Aug 29, 2025

Yep this looks much better, just a couple of small things.

Also I guess the seed. and test. nodes are out of scope for this PR? I was curious about this, so I created a model in jaffle_shop_duckdb like so:

/* {{ selected_resources }} */
select 1;

And it returns the following on dbt compile:

/*
[
'test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a',
'model.jaffle_shop.stg_orders',
'test.jaffle_shop.unique_orders_order_id.fed79b3a6e',
'test.jaffle_shop.unique_stg_customers_customer_id.c7614daada',
'test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59',
'seed.jaffle_shop.raw_payments',
'test.jaffle_shop.not_null_orders_order_id.cf6c17daed',
'test.jaffle_shop.unique_stg_payments_payment_id.3744510712',
'test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075',
'test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa',
'seed.jaffle_shop.raw_customers',
'model.jaffle_shop.stg_payments',
'test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625',
'test.jaffle_shop.not_null_orders_amount.106140f9fd',
'test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3',
'test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad',
'test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64',
'model.jaffle_shop.customers',
'model.jaffle_shop.orders',
'test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a',
'test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278',
'test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1',
'test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49',
'test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2',
'seed.jaffle_shop.raw_orders',
'model.jaffle_shop.erin',
'test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d',
'test.jaffle_shop.not_null_orders_customer_id.c5f02694af',
'model.jaffle_shop.stg_customers'
] */

@themisvaltinos
Copy link
Contributor Author

Also I guess the seed. and test. nodes are out of scope for this PR?

yes good point so mainly the goal of this pr was to match the dbt run for models and there were a few things to discuss on how to tackle this topic.

So dbt if im not mistaken never runs tests as part of the run but as part of dbt test.

Also for dbt seed as well, while for sqlmesh seed models run as part of the sqlmesh plan and run commands and not sqlmesh_dbt run as well.

So since they are models they would appear in the selected resources during the run command but as models rather than seeds:

model.jaffle_shop.customers
model.jaffle_shop.orders
model.jaffle_shop.raw_customers
model.jaffle_shop.raw_orders
model.jaffle_shop.raw_payments
model.jaffle_shop.stg_customers
model.jaffle_shop.stg_orders
model.jaffle_shop.stg_payments

This I feel provides information to the user that these have been selected to be run.
In dbt you would have to first run dbt seed:

07:57:40  seed.jaffle_shop.raw_customers
07:57:40  seed.jaffle_shop.raw_orders
07:57:40  seed.jaffle_shop.raw_payments

and then dbt run:

08:25:42  model.jaffle_shop.stg_payments
08:25:42  model.jaffle_shop.customers
08:25:42  model.jaffle_shop.orders
08:25:42  model.jaffle_shop.stg_customers
08:25:42  model.jaffle_shop.stg_orders

So it is a bit tough to map this in exactly the same 1-1 way with dbt. I guess I could map these instead of model as seed to keep the same keys though that would be a bit inconsistent with sqlmesh.

Now regarding dbt tests, these are our audits which do run as part of the sqlmesh plan, my thinking was for now not include them to match more the dbt output when they use selectors and see the models. But in hindsight and to your point I wonder if it makes sense to include and have in the list of models and tests (mapped back from our audits) that have been run. Does that feel more intuitive to you?

Copy link
Collaborator

@erindru erindru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So dbt if im not mistaken never runs tests as part of the run but as part of dbt test
Also for dbt seed as well

Oh, sorry, of course 🤦 . It didnt occur to me that dbt list would have everything but dbt run, dbt test and dbt seed would just have their own relevant subsets, but it's very obvious when you mention it.

Ok, this LGTM then :)

interval_unit_: t.Optional[IntervalUnit] = Field(alias="interval_unit", default=None)
tags: t.List[str] = []
stamp: t.Optional[str] = None
node_name: t.Optional[str] = None # dbt node name
Copy link
Collaborator

@erindru erindru Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just call this dbt_node_name instead? So it's clear to all internal code that this relates to dbt, is only populated in dbt mode and should not be confused with Snapshot.node.name.

Or even just dbt_name next to name since this is already on _Node

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point this will be much clearer revised it to dbt_name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feel free to merge when ready

@themisvaltinos themisvaltinos merged commit 0583f96 into main Sep 9, 2025
36 checks passed
@themisvaltinos themisvaltinos deleted the themis/selectdbt branch September 9, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants