This repository was archived by the owner on May 17, 2024. It is now read-only.

Commit c3d0a84

focus readme on the dbt use case
1 parent cfd941f commit c3d0a84

1 file changed: +32 -96 lines changed

README.md

Lines changed: 32 additions & 96 deletions
@@ -4,134 +4,70 @@
 
 # **data-diff**
 
+<h2 align="center">
+Develop dbt models faster by testing as you code.
+</h2>
+<h4 align="center">
+See how every change to dbt code affects the data produced in the modified model and downstream.
+</h4>
+<br>
 ## What is `data-diff`?
-data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables.
+data-diff is an open source package that you can use with dbt to gain visibility into the impact of your code changes on your dbt models.
 
-## Documentation
+[ GIF ]
 
-[**🗎 Documentation**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing.
+<br>
 
-### Databases we support
-
-- PostgreSQL >=10
-- MySQL
-- Snowflake
-- BigQuery
-- Redshift
-- Oracle
-- Presto
-- Databricks
-- Trino
-- Clickhouse
-- Vertica
-- DuckDB >=0.6
-- SQLite (coming soon)
-
-For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md).
-
-#### Looking for a database not on the list?
-If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list.
-
-## Get started
-
-### Installation
-
-#### First, install `data-diff` using `pip`.
+## Getting Started
 
+**Install `data-diff`**
 ```
 pip install data-diff
 ```
 
-#### Then, install one or more driver(s) specific to the database(s) you want to connect to.
-
-- `pip install 'data-diff[mysql]'`
-
-- `pip install 'data-diff[postgresql]'`
-
-- `pip install 'data-diff[snowflake]'`
-
-- `pip install 'data-diff[presto]'`
-
-- `pip install 'data-diff[oracle]'`
-
-- `pip install 'data-diff[trino]'`
-
-- `pip install 'data-diff[clickhouse]'`
-
-- `pip install 'data-diff[vertica]'`
-
-- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/
-
-_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._
-
-### Run your first diff
-
-Once you've installed `data-diff`, you can run it from the command line.
-
+**Update a few lines in your `dbt_project.yml`**
 ```
-data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
+#dbt_project.yml
+vars:
+  data_diff:
+    prod_database: my_database
+    prod_schema: my_default_schema
 ```
 
-Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions how to build one of these commands depending on your database setup.
-
-#### Code Example: Diff Tables Between Databases
-Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres.
+**Run your first data diff!**
 
 ```
-data-diff \
-postgresql://<username>:'<password>'@localhost:5432/<database> \
-<table> \
-"snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
-<TABLE> \
--k activity_id \
--c activity \
--w "event_timestamp < '2022-10-10'"
+dbt run && data-diff --dbt
 ```
 
-#### Code Example: Diff Tables Within a Database
+We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details.
 
-```
-data-diff \
-"snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \
-<SCHEMA_2>.<TABLE_2> \
--k org_id \
--c created_at -c is_internal \
--w "org_id != 1 and org_id < 2000" \
--m test_results_%t \
---materialize-all-rows \
---table-write-limit 10000
-```
-
-In both code examples, I've used `<>` carrots to represent values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like.
+Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started!
 
-### We're here to help!
+<br><br>
 
-We're here to help! Please post any questions in [GitHub Discussions](https://github.com/datafold/data-diff/discussions).
+### Diffing between databases
 
-## How to Use
+Check out our [documentation](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md) if you're looking to compare data across databases (for example, between Postgres and Snowflake).
 
-* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples)
-* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html)
-* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file)
+<br>
 
-## How to Contribute
-* Feel free to open an issue or contribute to the project by working on an existing issue.
-* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started.
-* To add a new database driver, check out [docs](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst).
+## Contributors
 
-Big thanks to everyone who contributed so far:
+We thank everyone who contributed so far!
 
 <a href="https://github.com/datafold/data-diff/graphs/contributors">
 <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" />
 </a>
 
-## Technical Explanation
-
-Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works.
+<br>
 
 ## Analytics
+
 * [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)
 
+<br>
+
 ## License
 
 This project is licensed under the terms of the [MIT License](https://github.com/datafold/data-diff/blob/master/LICENSE).
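
The new README asks users to add a `data_diff` block under `vars` in `dbt_project.yml` before running `dbt run && data-diff --dbt`. As a rough illustration of what that configuration amounts to, here is a minimal Python sketch that checks a project file already parsed into a dict for the two keys shown in the README. The helper name `missing_data_diff_vars` is hypothetical and not part of data-diff or dbt, and treating both keys as required is an assumption based only on the README snippet.

```python
from typing import Any, List, Mapping

# Keys shown under vars.data_diff in the README snippet.
# Assumption: both are treated as required for this illustration.
REQUIRED_KEYS = ("prod_database", "prod_schema")


def missing_data_diff_vars(project: Mapping[str, Any]) -> List[str]:
    """Return the data_diff keys that are absent or empty in a parsed dbt_project.yml."""
    data_diff = (project.get("vars") or {}).get("data_diff") or {}
    return [key for key in REQUIRED_KEYS if not data_diff.get(key)]


# Dict equivalent of the YAML added to the README in this commit.
project = {
    "vars": {
        "data_diff": {
            "prod_database": "my_database",
            "prod_schema": "my_default_schema",
        }
    }
}

print(missing_data_diff_vars(project))
```

An empty result means both keys are present. In practice the dict would come from loading the real `dbt_project.yml` with a YAML parser, which is left out here to keep the sketch free of third-party dependencies.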
