This repository demonstrates, on a small scale, how to replicate MySQL databases to Snowflake. A Debezium connector creates a CDC stream, which Snowflake's Kafka connector consumes to create CDC source tables in Snowflake. dbt then builds the table replicas using incremental models that detect the columns dynamically, with the help of a couple of macros.
This could be scheduled to run regularly with a scheduler or orchestration system to update the replicas and run tests on the models and sources.
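To make the idea of a replica built from a CDC stream concrete, here is a minimal Python sketch that replays Debezium-style change events into the latest state per key. It is an illustration only, not the logic of the dbt macros in this repository. The `op`, `before`, `after` and `ts_ms` fields follow the Debezium event envelope; the `apply_cdc_events` helper, the `pk` parameter and the sample rows are hypothetical.

```python
from typing import Any


def apply_cdc_events(events: list[dict[str, Any]], pk: str) -> dict[Any, dict[str, Any]]:
    """Replay Debezium-style change events into the latest row per primary key.

    Hypothetical helper for illustration; the repository does this with dbt models.
    """
    replica: dict[Any, dict[str, Any]] = {}
    # Apply events in timestamp order (sorted() is stable, so ties keep input order).
    for event in sorted(events, key=lambda e: e["ts_ms"]):
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, or update: the "after" image is the new row state
            row = event["after"]
            replica[row[pk]] = row
        elif op == "d":
            # delete: the "before" image identifies the row to drop
            replica.pop(event["before"][pk], None)
    return replica


# Made-up sample events, not data from this repository.
events = [
    {"op": "c", "ts_ms": 1, "before": None, "after": {"id": 1, "name": "Ann"}},
    {"op": "c", "ts_ms": 2, "before": None, "after": {"id": 2, "name": "Bob"}},
    {"op": "u", "ts_ms": 3, "before": {"id": 1, "name": "Ann"}, "after": {"id": 1, "name": "Anne"}},
    {"op": "d", "ts_ms": 4, "before": {"id": 2, "name": "Bob"}, "after": None},
]
print(apply_cdc_events(events, pk="id"))
```

An incremental model applies the same idea in SQL: only events newer than the last run need to be merged into the replica.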
- Docker
- dbt-snowflake
- Scripts were tested on Linux, but should also run on macOS.
- Set up the environment by running `docker compose up -d` inside the `0-services/` folder.
- After the containers are up, execute `1-debezium/init_cdc.sh` to create the Debezium connector.
- Configure `2-snowflake/connect/snowflake-sink-connector.json` to connect to your Snowflake environment.
- After the containers are up, execute `2-snowflake/init_sink.sh` to create the Snowflake connector.
- Wait for tables to be created on your Snowflake account as per the configuration of the Snowflake connector.
- Once done, you can optionally run `3-database/init_db.sh` and `3-database/mysql_crud.sh` to create a new table and run some inserts and updates on the new table.
- Update the source identifiers in `4-prototype/models/schema.yml` to the correct source tables in your Snowflake environment.
- Change directory into `4-prototype` and execute `dbt run` to create a few models.
- Play around making changes and creating models.
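After running the init scripts, it can help to confirm that both connectors are actually running before waiting on Snowflake. The sketch below uses the Kafka Connect REST API (`GET /connectors/<name>/status`). It assumes the Connect worker is reachable at `http://localhost:8083`, which may not match this repository's compose file; the helper names are hypothetical, and the connector name must be whatever the init scripts registered.

```python
import json
import urllib.request


def fetch_status(name: str, base_url: str = "http://localhost:8083") -> dict:
    """Fetch a connector's status from the Kafka Connect REST API.

    base_url is an assumption; adjust it to the port exposed by your containers.
    """
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/status") as resp:
        return json.load(resp)


def is_running(status: dict) -> bool:
    """Return True only if the connector and all of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    # A connector without tasks is not moving data yet, so treat it as not ready.
    return connector_ok and bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)


# Example of the response shape returned by the status endpoint (values are made up).
sample = {
    "name": "example-connector",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}],
}
print(is_running(sample))
```

Against a live worker, `is_running(fetch_status("<connector-name>"))` would perform the same check, with `<connector-name>` replaced by the registered connector's name.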
- Debezium is set to create topics for all tables, plus schema changes, on the source MySQL database.
- Snowflake's Kafka connector uses the `topics.regex` parameter to detect all topics that start with 'mysqldb.inventory' and creates source tables from them in the `cdc` schema.
- dbt is not meant to be used in micro-batching scenarios, so its performance and possible issues at higher scale are unknown.
- The dbt models are set up to be rebuilt dynamically, which limits flexibility in data types and transformations; these are better handled in another layer on top of the replicas.
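To see which topics a `topics.regex` prefix filter like the one described above would pick up, the following Python sketch may help. Kafka Connect evaluates the setting as a Java regular expression; for a pattern this simple, Python's `re` module behaves the same way. The pattern shown is a hypothetical value, not copied from `snowflake-sink-connector.json`, the topic names are made up, and the sketch assumes the pattern has to match the whole topic name.

```python
import re

# Hypothetical pattern; check snowflake-sink-connector.json for the real value.
TOPICS_REGEX = r"mysqldb\.inventory.*"


def matching_topics(topics: list[str], pattern: str = TOPICS_REGEX) -> list[str]:
    """Return the topics whose full name matches the pattern."""
    compiled = re.compile(pattern)
    # fullmatch mirrors the whole-name matching assumed in the lead-in.
    return [t for t in topics if compiled.fullmatch(t)]


# Made-up topic names for illustration only.
topics = [
    "mysqldb",
    "mysqldb.inventory.customers",
    "mysqldb.inventory.orders",
    "connect-offsets",
]
print(matching_topics(topics))
# prints ['mysqldb.inventory.customers', 'mysqldb.inventory.orders']
```

Under these assumptions, a topic named only `mysqldb` would not be matched, so it is worth checking which topics your connector actually subscribes to.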