|
4 | 4 |
|
5 | 5 | # **data-diff** |
6 | 6 |
|
7 | | -## What is `data-diff`? |
8 | | -data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. |
9 | | - |
10 | | -## Documentation |
11 | | - |
12 | | -[**🗎 Documentation**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing. |
| 7 | +<h2 align="center"> |
| 8 | +Develop dbt models faster by testing as you code. |
| 9 | +</h2> |
| 10 | +<h4 align="center"> |
| 11 | +See how every change to dbt code affects the data produced in the modified model and downstream. |
| 12 | +</h4> |
| 13 | +<br> |
13 | 14 |
|
14 | | -### Databases we support |
| 15 | +## What is `data-diff`? |
15 | 16 |
|
16 | | -- PostgreSQL >=10 |
17 | | -- MySQL |
18 | | -- Snowflake |
19 | | -- BigQuery |
20 | | -- Redshift |
21 | | -- Oracle |
22 | | -- Presto |
23 | | -- Databricks |
24 | | -- Trino |
25 | | -- Clickhouse |
26 | | -- Vertica |
27 | | -- DuckDB >=0.6 |
28 | | -- SQLite (coming soon) |
| 17 | +data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code. |
29 | 18 |
|
30 | | -For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md). |
| 19 | +<div align="center"> |
31 | 20 |
|
32 | | -#### Looking for a database not on the list? |
33 | | -If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list. |
| 21 | + |
34 | 22 |
|
35 | | -## Get started |
| 23 | +</div> |
36 | 24 |
|
37 | | -### Installation |
| 25 | +<br> |
38 | 26 |
|
39 | | -#### First, install `data-diff` using `pip`. |
| 27 | +## Getting Started |
40 | 28 |
|
| 29 | +**Install `data-diff`** |
41 | 30 | ``` |
42 | 31 | pip install data-diff |
43 | 32 | ``` |
44 | 33 |
|
45 | | -#### Then, install one or more driver(s) specific to the database(s) you want to connect to. |
46 | | - |
47 | | -- `pip install 'data-diff[mysql]'` |
48 | | - |
49 | | -- `pip install 'data-diff[postgresql]'` |
50 | | - |
51 | | -- `pip install 'data-diff[snowflake]'` |
52 | | - |
53 | | -- `pip install 'data-diff[presto]'` |
54 | | - |
55 | | -- `pip install 'data-diff[oracle]'` |
56 | | - |
57 | | -- `pip install 'data-diff[trino]'` |
58 | | - |
59 | | -- `pip install 'data-diff[clickhouse]'` |
60 | | - |
61 | | -- `pip install 'data-diff[vertica]'` |
62 | | - |
63 | | -- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/ |
64 | | - |
65 | | -_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._ |
66 | | - |
67 | | -### Run your first diff |
68 | | - |
69 | | -Once you've installed `data-diff`, you can run it from the command line. |
70 | | - |
| 34 | +**Update a few lines in your `dbt_project.yml`** |
71 | 35 | ``` |
72 | | -data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] |
| 36 | +#dbt_project.yml |
| 37 | +vars: |
| 38 | + data_diff: |
| 39 | + prod_database: my_database |
| 40 | + prod_schema: my_default_schema |
73 | 41 | ``` |
74 | 42 |
|
75 | | -Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions how to build one of these commands depending on your database setup. |
76 | | - |
77 | | -#### Code Example: Diff Tables Between Databases |
78 | | -Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres. |
| 43 | +**Run your first data diff!** |
79 | 44 |
|
80 | 45 | ``` |
81 | | -data-diff \ |
82 | | - postgresql://<username>:'<password>'@localhost:5432/<database> \ |
83 | | - <table> \ |
84 | | - "snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \ |
85 | | - <TABLE> \ |
86 | | - -k activity_id \ |
87 | | - -c activity \ |
88 | | - -w "event_timestamp < '2022-10-10'" |
| 46 | +dbt run && data-diff --dbt |
89 | 47 | ``` |
90 | 48 |
|
91 | | -#### Code Example: Diff Tables Within a Database |
| 49 | +We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source), which contain examples and details. |
92 | 50 |
|
93 | | -``` |
94 | | -data-diff \ |
95 | | - "snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \ |
96 | | - <SCHEMA_2>.<TABLE_2> \ |
97 | | - -k org_id \ |
98 | | - -c created_at -c is_internal \ |
99 | | - -w "org_id != 1 and org_id < 2000" \ |
100 | | - -m test_results_%t \ |
101 | | - --materialize-all-rows \ |
102 | | - --table-write-limit 10000 |
103 | | -``` |
104 | | - |
105 | | -In both code examples, I've used `<>` carrots to represent values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like. |
| 51 | +Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started! |
106 | 52 |
|
107 | | -### We're here to help! |
| 53 | +<br><br> |
108 | 54 |
|
109 | | -We're here to help! Please post any questions in [GitHub Discussions](https://github.com/datafold/data-diff/discussions). |
| 55 | +### Diffing between databases |
110 | 56 |
|
111 | | -## How to Use |
| 57 | +Check out our [documentation](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md) if you're looking to compare data across databases (for example, between Postgres and Snowflake). |
112 | 58 |
|
113 | | -* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples) |
114 | | -* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html) |
115 | | -* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file) |
| 59 | +<br> |
116 | 60 |
|
117 | | -## How to Contribute |
118 | | -* Feel free to open an issue or contribute to the project by working on an existing issue. |
119 | | -* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started. |
120 | | -* To add a new database driver, check out [docs](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst). |
| 61 | +## Contributors |
121 | 62 |
|
122 | | -Big thanks to everyone who contributed so far: |
| 63 | +We thank everyone who has contributed so far! |
123 | 64 |
|
124 | 65 | <a href="https://github.com/datafold/data-diff/graphs/contributors"> |
125 | 66 | <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" /> |
126 | 67 | </a> |
127 | 68 |
|
128 | | -## Technical Explanation |
129 | | - |
130 | | -Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works. |
| 69 | +<br> |
131 | 70 |
|
132 | 71 | ## Analytics |
| 72 | + |
133 | 73 | * [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md) |
134 | 74 |
|
| 75 | +<br> |
| 76 | + |
135 | 77 | ## License |
136 | 78 |
|
137 | 79 | This project is licensed under the terms of the [MIT License](https://github.com/datafold/data-diff/blob/master/LICENSE). |
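The `vars.data_diff` block that this change adds to `dbt_project.yml` tells `data-diff --dbt` where the production copies of your models live. The hypothetical Python sketch below is only a rough mental model of that pairing, under stated assumptions. It matches each dev table to a prod table named `<prod_database>.<prod_schema>.<model>`.

The names `DiffPair` and `pair_dev_with_prod` are made up for illustration. This is not data-diff's code, and the sketch ignores dbt features such as model aliases and custom schemas, which can change the real table name.

```python
# Hypothetical sketch, NOT data-diff's implementation: illustrates how the
# vars.data_diff settings in dbt_project.yml could pair dev and prod tables.
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffPair:
    """A dev table and the prod table it would be compared against."""
    dev: str
    prod: str


def pair_dev_with_prod(project: dict, dev_models: dict) -> list:
    """Map each dev model (model name -> fully qualified dev table) to a prod table.

    `project` is dbt_project.yml already parsed into a dict; only
    vars.data_diff.prod_database and vars.data_diff.prod_schema are read.
    Assumes the prod table shares the model's name (no aliases or custom schemas).
    """
    cfg = project["vars"]["data_diff"]
    prod_db = cfg["prod_database"]
    prod_schema = cfg["prod_schema"]
    # Sort by model name so the output order is deterministic.
    return [
        DiffPair(dev=dev_table, prod=f"{prod_db}.{prod_schema}.{model}")
        for model, dev_table in sorted(dev_models.items())
    ]


if __name__ == "__main__":
    # Same values as the dbt_project.yml snippet in the diff.
    project = {"vars": {"data_diff": {"prod_database": "my_database",
                                      "prod_schema": "my_default_schema"}}}
    # Hypothetical dev model built by `dbt run` in a personal dev schema.
    dev_models = {"orders": "my_database.dbt_dev_jane.orders"}
    for pair in pair_dev_with_prod(project, dev_models):
        print(f"{pair.dev} -> {pair.prod}")
```

Running it prints `my_database.dbt_dev_jane.orders -> my_database.my_default_schema.orders`. This is the kind of dev-vs-prod comparison the `dbt run && data-diff --dbt` command is meant to report on.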