# README.md
## Overview
Intugle's Data-Tools is a GenAI-powered Python library that simplifies and accelerates the journey from raw data to insights. It empowers data and business teams to build an intelligent semantic layer over their data, enabling self-serve analytics and natural language queries. By automating data profiling, link prediction, and SQL generation, Data-Tools helps you build data products faster and more efficiently than traditional methods.
## Who is this for?
This tool is designed for both **data teams** and **business teams**.
* **Automated Data Profiling:** Generate detailed statistics for each column in your dataset, including distinct count, uniqueness, completeness, and more.
* **Datatype Identification:** Automatically identify the data type of each column (e.g., integer, string, datetime).
* **Key Identification:** Identify potential primary keys in your tables.
* **LLM-Powered Link Prediction:** Use GenAI to automatically discover relationships (foreign keys) between tables.
* **Business Glossary Generation:** Generate a business glossary for each column, with support for industry-specific domains.
* **Semantic Layer Generation:** Create YAML files that define your semantic layer, including models (tables) and their relationships.
* **SQL Generation:** Generate SQL queries from the semantic layer, allowing you to query your data using business-friendly terms.
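The profiling metrics listed above (distinct count, uniqueness, completeness) can be made concrete with a small self-contained sketch. This is only an illustration of what the metrics mean, not data_tools' actual implementation:

```python
# Toy column profiler illustrating the metrics described above
# (an illustration only, not data_tools' implementation).

def profile_column(values):
    """Return distinct count, uniqueness, and completeness for one column."""
    non_null = [v for v in values if v is not None]
    distinct_count = len(set(non_null))
    return {
        "count": len(values),
        "distinct_count": distinct_count,
        # uniqueness: share of non-null values that are distinct
        "uniqueness": distinct_count / len(non_null) if non_null else 0.0,
        # completeness: share of rows that are non-null
        "completeness": len(non_null) / len(values) if values else 0.0,
    }

print(profile_column(["a", "b", "b", None]))
```

A column whose uniqueness is close to 1.0 and completeness is 1.0 is a natural primary-key candidate, which is how profiling feeds the key-identification step.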
## Getting Started
### Installation
```bash
pip install data-tools
```
### Configuration
Before running the project, you need to configure an LLM (Large Language Model). This is used for tasks like generating business glossaries and predicting links between tables.
You can configure the LLM by setting the following environment variables:
For a detailed, hands-on introduction to the project, please see the [`quickstart.ipynb`](quickstart.ipynb) notebook.
The core workflow of the project involves the following steps:
1. **Load your data:** Load your data into a `DataSet` object.
2. **Run the analysis pipeline:** Use the `run()` method to profile your data and generate a business glossary.
3. **Predict links:** Use the `LinkPredictor` to discover relationships between your tables.
```python
from data_tools import LinkPredictor

# `datasets` holds the DataSet objects created and profiled in steps 1-2
# Initialize the predictor
predictor = LinkPredictor(datasets)

# Run the prediction and visualize the discovered relationships
results = predictor.predict()
results.show_graph()
```
4. **Generate SQL:** Use the `SqlGenerator` to generate SQL queries from the semantic layer.
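The exact `SqlGenerator` API is not shown here; conceptually, it uses the models and predicted links in the semantic layer to assemble joins on your behalf. A self-contained toy sketch of that idea (the table and column names are invented, and this is not the library's implementation):

```python
# Toy sketch of SQL generation from a semantic layer
# (illustrative only -- not the data_tools implementation or API).

# A minimal "semantic layer": models (tables) and one predicted link.
models = {
    "customers": ["customer_id", "name"],
    "orders": ["order_id", "customer_id", "amount"],
}
links = [("orders", "customer_id", "customers", "customer_id")]

def generate_sql(columns):
    """Build a SELECT over the tables owning `columns`, joining via links."""
    tables = [t for t, cols in models.items() if any(c in cols for c in columns)]
    base = tables[0]
    sql = f"SELECT {', '.join(columns)} FROM {base}"
    for left, lcol, right, rcol in links:
        if {left, right} <= set(tables):
            # Join whichever linked table is not the base table.
            other = left if right == base else right
            sql += f" JOIN {other} ON {left}.{lcol} = {right}.{rcol}"
    return sql

print(generate_sql(["name", "amount"]))
```

The point of the sketch: because the link between `orders` and `customers` was discovered earlier, a user can ask for columns from both tables without writing the join themselves.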
# quickstart.ipynb
**What is a Semantic Layer?**
A semantic layer is a business-friendly representation of your data. It hides the complexity of the underlying data sources and provides a unified view of your data using familiar business terms. This makes it easier for both business users and data teams to understand and query the data, accelerating data-driven insights.
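As a toy illustration of "familiar business terms" (the names below are invented, and this is not data_tools' actual representation), the layer can be thought of as a mapping from friendly terms to physical tables and columns:

```python
# Toy view of a semantic layer: business-friendly terms mapped to
# physical locations (invented names; not data_tools' real format).
semantic_layer = {
    "customer name": ("crm.cust_master", "cust_nm"),
    "order amount": ("sales.ord_fact", "ord_amt_usd"),
}

def resolve(term):
    """Translate a business term into its physical table.column reference."""
    table, column = semantic_layer[term]
    return f"{table}.{column}"

print(resolve("customer name"))  # crm.cust_master.cust_nm
```

Users ask questions in terms of the left-hand names; the layer supplies the cryptic physical names, so nobody needs to memorize the warehouse schema.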

**Who is this for?**

**In this notebook, you will learn how to:**
1. **Profile your data:** Analyze your data sources to understand their structure, data types, and other characteristics.
2. **Generate a business glossary:** Create a business glossary for each column, with support for industry-specific domains.
3. **Automatically predict links:** Use a Large Language Model (LLM) to automatically discover relationships (foreign keys) between tables.
4. **Generate a semantic layer:** Create YAML files that define your semantic layer.
5. **Generate SQL queries:** Use the semantic layer to generate SQL queries and retrieve data.