
Commit c38e50f

JaskaranIntugle authored and narotsit-intugle committed
updated readme and notebook
1 parent 764b267 commit c38e50f

File tree

2 files changed

+49
-28
lines changed


README.md

Lines changed: 46 additions & 18 deletions
@@ -11,7 +11,7 @@
 
 ## Overview
 
-Data-Tools is a Python library that helps you automatically build a semantic layer over your data. It streamlines the process of data profiling, discovering relationships between tables, and generating a business-friendly representation of your data. This makes it easier for both data and business teams to understand and query data without needing to be SQL experts.
+Intugle's Data-Tools is a GenAI-powered Python library that simplifies and accelerates the journey from raw data to insights. It empowers data and business teams to build an intelligent semantic layer over their data, enabling self-serve analytics and natural language queries. By automating data profiling, link prediction, and SQL generation, Data-Tools helps you build data products faster and more efficiently than traditional methods.
 
 ## Who is this for?
 
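The per-column metrics that the profiling feature reports (distinct count, uniqueness, completeness) can be illustrated with a small self-contained sketch. This is illustrative only, not Data-Tools' actual implementation:

```python
# Minimal sketch of column profiling metrics (illustrative, not the
# Data-Tools implementation): distinct count, uniqueness, completeness.

def profile_column(values):
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    total = len(values)
    return {
        "count": total,
        "distinct_count": distinct,
        # share of non-null values that are distinct
        "uniqueness": distinct / len(non_null) if non_null else 0.0,
        # share of values that are non-null
        "completeness": len(non_null) / total if total else 0.0,
    }

stats = profile_column(["a", "b", "b", None])
print(stats)  # count=4, distinct_count=2, completeness=0.75
```

A column whose uniqueness is 1.0 and completeness is 1.0 is a natural primary-key candidate, which is the intuition behind the key-identification feature.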
@@ -25,19 +25,13 @@ This tool is designed for both **data teams** and **business teams**.
 * **Automated Data Profiling:** Generate detailed statistics for each column in your dataset, including distinct count, uniqueness, completeness, and more.
 * **Datatype Identification:** Automatically identify the data type of each column (e.g., integer, string, datetime).
 * **Key Identification:** Identify potential primary keys in your tables.
-* **LLM-Powered Link Prediction:** Use a Large Language Model (LLM) to automatically discover relationships (foreign keys) between tables.
-* **Business Glossary Generation:** Generate a business glossary for each column using an LLM, with support for industry-specific domains.
-* **Semantic Layer Generation:** Create a `manifest.json` file that defines your semantic layer, including models (tables) and their relationships.
+* **LLM-Powered Link Prediction:** Use GenAI to automatically discover relationships (foreign keys) between tables.
+* **Business Glossary Generation:** Generate a business glossary for each column, with support for industry-specific domains.
+* **Semantic Layer Generation:** Create YAML files that define your semantic layer, including models (tables) and their relationships.
 * **SQL Generation:** Generate SQL queries from the semantic layer, allowing you to query your data using business-friendly terms.
-* **Extensible and Configurable:** Configure the tool to work with your specific environment and data sources.
 
 ## Getting Started
 
-### Prerequisites
-
-* Python 3.10+
-* pip
-
 ### Installation
 
 ```bash
@@ -46,7 +40,7 @@ pip install data-tools
 
 ### Configuration
 
-Before running the project, you need to configure a Large Language Model (LLM). This is used for tasks like generating business glossaries and predicting links between tables.
+Before running the project, you need to configure an LLM. This is used for tasks like generating business glossaries and predicting links between tables.
 
 You can configure the LLM by setting the following environment variables:
 
@@ -68,14 +62,48 @@ For a detailed, hands-on introduction to the project, please see the [`quickstart.ipynb`](quickstart.ipynb) notebook.
 
 The core workflow of the project involves the following steps:
 
-1. **Load your data:** Load your data into pandas DataFrames.
-2. **Create `DataSet` objects:** Create a `DataSet` object for each of your tables.
-3. **Run the analysis pipeline:** Use the `run()` method to profile your data and generate a business glossary.
-4. **Predict links:** Use the `LinkPredictor` to discover relationships between your tables.
-5. **Generate the manifest:** Save the profiling and link prediction results to YAML files and then load them to create a `manifest.json` file.
-6. **Generate SQL:** Use the `SqlGenerator` to generate SQL queries from the semantic layer.
+1. **Load your data:** Load your data into a `DataSet` object.
+2. **Run the analysis pipeline:** Use the `run()` method to profile your data and generate a business glossary.
+3. **Predict links:** Use the `LinkPredictor` to discover relationships between your tables.
+
+```python
+from data_tools import LinkPredictor
+
+# Initialize the predictor
+predictor = LinkPredictor(datasets)
+
+# Run the prediction
+results = predictor.predict()
+results.show_graph()
+```
+
+4. **Generate SQL:** Use the `SqlGenerator` to generate SQL queries from the semantic layer.
+
+```python
+from data_tools import SqlGenerator
+
+# Create a SqlGenerator
+sql_generator = SqlGenerator()
+
+# Create an ETL model
+etl = {
+    "name": "test_etl",
+    "fields": [
+        {"id": "patients.first", "name": "first_name"},
+        {"id": "patients.last", "name": "last_name"},
+        {"id": "allergies.start", "name": "start_date"},
+    ],
+    "filter": {
+        "selections": [{"id": "claims.departmentid", "values": ["3", "20"]}],
+    },
+}
+
+# Generate the query
+sql_query = sql_generator.generate_query(etl)
+print(sql_query)
+```
 
-For detailed code examples, please refer to the [`quickstart.ipynb`](quickstart.ipynb) notebook.
+For detailed code examples and a complete walkthrough, please refer to the [`quickstart.ipynb`](quickstart.ipynb) notebook.
 
 ## Contributing
 
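To make the ETL-model-to-SQL idea in the workflow above concrete, here is a self-contained sketch that turns a model of that shape into a SELECT statement. It is not the actual `SqlGenerator` (join resolution from predicted links, among other things, is omitted); field ids are assumed to be `"table.column"` strings as in the README example:

```python
# Illustrative sketch of generating SQL from an ETL model.
# NOT the real SqlGenerator: joins, quoting, and dialects are ignored.

def generate_query(etl):
    # Project each field as "table.column AS alias"
    cols = ", ".join(f'{f["id"]} AS {f["name"]}' for f in etl["fields"])
    # Collect the tables referenced by the selected fields
    tables = sorted({f["id"].split(".")[0] for f in etl["fields"]})
    sql = f"SELECT {cols} FROM {', '.join(tables)}"
    # Translate filter selections into IN-list predicates
    selections = etl.get("filter", {}).get("selections", [])
    if selections:
        conds = " AND ".join(
            f'{s["id"]} IN ({", ".join(repr(v) for v in s["values"])})'
            for s in selections
        )
        sql += f" WHERE {conds}"
    return sql

etl = {
    "name": "test_etl",
    "fields": [{"id": "patients.first", "name": "first_name"}],
    "filter": {"selections": [{"id": "claims.departmentid", "values": ["3"]}]},
}
print(generate_query(etl))
# SELECT patients.first AS first_name FROM patients WHERE claims.departmentid IN ('3')
```

The point of the semantic layer is that users describe *what* they want (fields, filters, business names) and the generator decides *how* to fetch it.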
quickstart.ipynb

Lines changed: 3 additions & 10 deletions
@@ -11,7 +11,7 @@
 "\n",
 "**What is a Semantic Layer?**\n",
 "\n",
-"A semantic layer is a business-friendly representation of your data. It hides the complexity of the underlying data sources and provides a unified view of your data using familiar business terms. This makes it easier for business users to understand and query the data without needing to be SQL experts.\n",
+"A semantic layer is a business-friendly representation of your data. It hides the complexity of the underlying data sources and provides a unified view of your data using familiar business terms. This makes it easier for both business users and data teams to understand and query the data, accelerating data-driven insights.\n",
 "\n",
 "**Who is this for?**\n",
 "\n",
@@ -22,17 +22,10 @@
 "\n",
 "**In this notebook, you will learn how to:**\n",
 "\n",
-"1. **Configure your LLM Provider:** Set up the Large Language Model that will power the automated features.\n",
-"2. **Profile your data:** Analyze your data sources to understand their structure, data types, and other characteristics.\n",
-"3. **Automatically predict links:** Use a Large Language Model (LLM) to automatically discover relationships (foreign keys) between tables.\n",
-"4. **Generate a semantic layer:** Create a `manifest.json` file that defines your semantic layer.\n",
-"5. **Generate SQL queries:** Use the semantic layer to generate SQL queries and retrieve data.\n",
-"\n",
-"**In this notebook, you will learn how to:**\n",
-"\n",
 "1. **Profile your data:** Analyze your data sources to understand their structure, data types, and other characteristics.\n",
-"2. **Automatically predict links:** Use a Large Language Model (LLM) to automatically discover relationships (foreign keys) between tables.\n",
-"3. **Generate a semantic layer:** Create a `manifest.json` file that defines your semantic layer.\n",
-"4. **Generate SQL queries:** Use the semantic layer to generate SQL queries and retrieve data."
+"2. **Generate a business glossary:** Create a business glossary for each column, with support for industry-specific domains.\n",
+"3. **Automatically predict links:** Use a Large Language Model (LLM) to automatically discover relationships (foreign keys) between tables.\n",
+"4. **Generate a semantic layer:** Create YAML files that define your semantic layer.\n",
+"5. **Generate SQL queries:** Use the semantic layer to generate SQL queries and retrieve data."
 ]
 },
