Feat: Add config flag to infer the state schema per dbt target #5485
sqlmesh/dbt/loader.py
# for the 'dev' target is overriden to something user-specific, rather than making the target name itself user-specific.
# This means that the schema name is the indicator of isolated state, not the target name which may be re-used across multiple schemas.
target_schema = profile.target.schema_
gateway_kwargs["state_schema"] = f"sqlmesh_state_{profile_name}_{target_schema}"
Should we check whether target_schema is empty?
Man, this was a rabbit hole.
So in dbt's Credentials object that defines this field, schema is required. If you don't define it, the base validator fails with something like:
Runtime Error
Credentials in profile "jaffle_shop", target "postgres" invalid: 'schema' is a required property
(and some adapters, like the duckdb adapter, have a hardcoded default to pass this check).
dbt will, however, happily allow you to specify an empty string for the schema. If you do this, you push the failure to runtime:
23:11:33 Encountered an error:
Database Error
zero-length delimited identifier at or near """"
LINE 2: create schema if not exists ""
So I've changed our side to raise an exception if the schema is an empty string. If someone encounters this and their project works fine on dbt core, then hopefully they're willing to work with us to understand the correct behaviour here (and the workaround is, of course, to define the state schema name manually).
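For illustration, the check described above might look roughly like the sketch below. The helper name `infer_state_schema` and the use of `ValueError` are assumptions made for this example; only the `sqlmesh_state_{profile_name}_{target_schema}` format comes from the diff under review, and the actual code in `sqlmesh/dbt/loader.py` may be structured differently.

```python
def infer_state_schema(profile_name: str, target_schema: str) -> str:
    """Build a per-target state schema name (hypothetical helper, for illustration).

    Raises ValueError when the target schema is an empty string, since dbt
    accepts `schema: ""` in a profile and only fails later at runtime.
    """
    if not target_schema:
        # Fail early with a clear message instead of deferring to a database error.
        raise ValueError(
            f"Target schema for profile '{profile_name}' is empty. "
            "Define a schema on the target or set the state schema manually."
        )
    # Same naming format as the line under review in the diff.
    return f"sqlmesh_state_{profile_name}_{target_schema}"


if __name__ == "__main__":
    print(infer_state_schema("jaffle_shop", "dev"))
    try:
        infer_state_schema("jaffle_shop", "")
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Raising at load time keeps the failure close to the misconfigured profile, rather than surfacing as a `create schema if not exists ""` error from the warehouse.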
Currently, when storing state in the warehouse, SQLMesh assumes by default that only a single project accesses the warehouse, and it creates a schema called `sqlmesh` to store state in. This is because it also assumes that you want to use Virtual Data Environments.

However, this creates a problem in dbt projects with existing workflows that are not built around VDEs. A common pattern is to use different targets to point to the same warehouse and just override the default schema.

For example, a `dev` target may populate models into a `dev` schema while a `prod` target may put them in a `prod` schema, but these schemas exist side by side in the same warehouse. Creating models using `--target dev` allows analysts to test things out and then deploy by running the models against `--target prod`.

This creates a problem for SQLMesh because it assumes a single state schema, but the state for these targets should not overlap.

So this PR:
- Adds a `dbt` config with, for now, a single flag `infer_state_schema_name`
- Updates `sqlmesh init -t dbt` so that dbt projects by default have isolated state between targets

Note that users can still override the inferred schema by setting the state schema manually in `sqlmesh.yaml` as documented here. Existing projects will not have this new property set, so they will continue to behave as usual.
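As a rough sketch of the precedence described above (manual override first, then the inferred name when the flag is on, otherwise the existing default), the logic could be modelled as follows. The function `resolve_state_schema` and its parameters are hypothetical names chosen for this example and are not SQLMesh's actual API; the `sqlmesh` default and the naming format are taken from this PR.

```python
from typing import Optional

DEFAULT_STATE_SCHEMA = "sqlmesh"  # default state schema named in the description


def resolve_state_schema(
    manual_state_schema: Optional[str],
    infer_state_schema_name: bool,
    profile_name: str,
    target_schema: str,
) -> str:
    """Pick the state schema for a dbt target (hypothetical helper, for illustration)."""
    # 1. A manually configured state schema always wins.
    if manual_state_schema:
        return manual_state_schema
    # 2. With the flag enabled, derive an isolated schema per profile and target schema.
    if infer_state_schema_name:
        if not target_schema:
            raise ValueError("Cannot infer a state schema from an empty target schema")
        return f"sqlmesh_state_{profile_name}_{target_schema}"
    # 3. Existing projects without the flag keep the shared default.
    return DEFAULT_STATE_SCHEMA


if __name__ == "__main__":
    # Two targets pointing at the same warehouse get separate state schemas.
    print(resolve_state_schema(None, True, "jaffle_shop", "dev"))
    print(resolve_state_schema(None, True, "jaffle_shop", "prod"))
    # Flag unset: behaviour is unchanged.
    print(resolve_state_schema(None, False, "jaffle_shop", "dev"))
```

Under these assumptions, the `dev` and `prod` targets from the example above resolve to different state schemas, so their state no longer overlaps.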