
Commit 82d6dde

Merge pull request #120 from Intugle/feature/streamlit-integration
2 parents: bfc1468 + d4023b1

15 files changed

Lines changed: 1647 additions & 1281 deletions


README.md

Lines changed: 20 additions & 0 deletions
@@ -275,6 +275,26 @@ For detailed instructions on setting up the server and connecting your favorite
 
 <!-- mcp-name: io.github.intugle/intugle-vibe-mcp -->
 
+### Streamlit App
+
+The `intugle` library includes a Streamlit application that provides an interactive web interface for building and visualizing semantic data models.
+
+To use the Streamlit app, install `intugle` with the `streamlit` extra:
+
+```bash
+pip install "intugle[streamlit]"
+```
+
+You can launch the Streamlit application using the `intugle-streamlit` command or `uvx`:
+
+```bash
+intugle-streamlit
+# Or using uvx
+uvx --from intugle intugle-streamlit
+```
+
+Open the URL printed in your terminal (usually `http://localhost:8501`) to access the application. For more details, refer to the [Streamlit App documentation](https://intugle.github.io/data-tools/docs/streamlit-app).
+
 ## Community
 
 Join our community to ask questions, share your projects, and connect with other users.

docsite/docs/streamlit-app.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+---
+sidebar_position: 8
+title: Streamlit App
+---
+
+# Intugle - Streamlit App
+
+This Streamlit application provides an interactive web interface for the `intugle` library. It allows users to upload their tabular data (CSV/Excel), configure a Large Language Model (LLM), and step through the process of building a semantic data model. The app profiles the data, generates a business glossary, identifies relationships between datasets, and visualizes the resulting semantic graph.
+
+## ✨ Features
+
+- **File Upload**: Upload multiple CSV or Excel files directly in the browser.
+- **Interactive Data Prep**: Interactively rename tables and select, rename, or drop columns before processing.
+- **LLM Configuration**: Securely configure and connect to your preferred LLM provider (OpenAI, Azure OpenAI, Gemini).
+- **Automated Data Profiling**: Automatically calculates key metrics like uniqueness, completeness, and data types for every column.
+- **AI-Powered Business Glossary**: Leverages an LLM to generate a business glossary for all tables and columns, adding crucial context.
+- **Automated Link Prediction**: Discovers potential relationships (foreign keys) between your tables.
+- **Interactive Visualization**: Displays the final semantic model as an interactive network graph.
+- **Detailed Results**: Provides a tabular view of all predicted links with detailed metrics.
+- **Export Artifacts**: Download the generated semantic model artifacts (`.yml` files) as a ZIP archive for use in other systems.
+
+## 🚀 Getting Started
+
+Follow these instructions to set up and run the application on your local machine.
+
+### Prerequisites
+
+- Python 3.10+
+- `uv` (optional; needed only for the `uvx` command)
+
+### 1. Installation
+
+To use the Streamlit app, install `intugle` with the `streamlit` extra:
+
+```bash
+pip install "intugle[streamlit]"
+```
+
+### 2. Configuration
+
+The application requires credentials for a Large Language Model to generate the business glossary and perform other AI-powered tasks.
+
+You can configure your LLM provider and API keys directly in the application's sidebar after launching it. The app will guide you on which credentials are required for your chosen provider (e.g., `OPENAI_API_KEY` for OpenAI).
+
+### 3. Running the App
+
+You can launch the Streamlit application using the `intugle-streamlit` command or `uvx`:
+
+```bash
+intugle-streamlit
+# Or using uvx
+uvx --from intugle intugle-streamlit
+```
+
+Open the URL printed in your terminal (usually `http://localhost:8501`) to access the application.
+
+## ⚙️ How It Works
+
+The application guides you through a simple, multi-step process, which is tracked in the sidebar:
+
+1. **Upload Files**: Start by uploading one or more CSV or Excel files. The app will display a summary of the uploaded tables.
+2. **Configure LLM**: In the sidebar, choose your LLM provider (OpenAI, Azure, or Gemini) and enter the necessary API keys and configuration details.
+3. **Prepare Data**: Review the uploaded tables. You can rename tables, and rename or drop (ignore) individual columns. Once you are satisfied, click **"Freeze column names"** to lock in your changes.
+4. **Build Semantic Model**: After preparing your data, click **"Create Semantic Model"**. You will be prompted to provide a "domain" (e.g., *Healthcare*, *Manufacturing*) to give the LLM context. The app will then profile the data and generate a business glossary for each table.
+5. **Predict Links**: Once profiling is complete, click **"Run Link Prediction"** to discover the relationships between your datasets.
+6. **Explore & Download**: View the results as an interactive graph or a detailed table. You can download the underlying YAML configuration files from the sidebar at any time.
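The Features list above names column profiling (uniqueness, completeness) and foreign-key link prediction without defining them. The following is a generic, plain-Python illustration of one common way to compute such signals; it is not `intugle`'s implementation, and the metric definitions, the hypothetical `is_candidate_link` helper, and the 0.95 threshold are all assumptions chosen for illustration.

```python
from collections.abc import Hashable, Sequence
from typing import Optional

Column = Sequence[Optional[Hashable]]  # None stands in for a missing value


def profile_column(values: Column) -> dict[str, float]:
    """Assumed definitions: completeness = non-null / total,
    uniqueness = distinct non-null / non-null."""
    non_null = [v for v in values if v is not None]
    completeness = len(non_null) / len(values) if values else 0.0
    uniqueness = len(set(non_null)) / len(non_null) if non_null else 0.0
    return {"completeness": completeness, "uniqueness": uniqueness}


def inclusion_score(child: Column, parent: Column) -> float:
    """Share of distinct non-null child values that also occur in the parent column."""
    child_vals = {v for v in child if v is not None}
    parent_vals = {v for v in parent if v is not None}
    if not child_vals:
        return 0.0
    return len(child_vals & parent_vals) / len(child_vals)


def is_candidate_link(child: Column, parent: Column, min_inclusion: float = 0.95) -> bool:
    """Hypothetical heuristic: the parent looks like a key (complete and unique)
    and nearly every child value is found in it."""
    stats = profile_column(parent)
    return (
        stats["completeness"] == 1.0
        and stats["uniqueness"] == 1.0
        and inclusion_score(child, parent) >= min_inclusion
    )


# Usage: orders.customer_id (child) vs. customers.id (parent)
orders_customer_id = [1, 2, 2, 3, None]
customers_id = [1, 2, 3, 4]
print(profile_column(orders_customer_id))  # {'completeness': 0.8, 'uniqueness': 0.75}
print(is_candidate_link(orders_customer_id, customers_id))  # True
print(is_candidate_link(customers_id, orders_customer_id))  # False
```

This only checks value containment; a realistic link predictor would also weigh data types, column names, and partial overlaps.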

pyproject.toml

Lines changed: 12 additions & 3 deletions
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "intugle"
-version = "1.0.11"
+version = "1.0.12"
 authors = [
     { name="Intugle", email="hello@intugle.ai" },
 ]
@@ -69,14 +69,23 @@ postgres = [
     "sqlglot>=27.20.0",
 ]
 
+streamlit = [
+    "streamlit==1.50.0",
+    "pyngrok==7.4.0",
+    "python-dotenv==1.1.1",
+    "xlsxwriter==3.2.9",
+    "plotly",
+    "graphviz"
+]
+
 
 [project.urls]
 "Homepage" = "https://github.com/Intugle/data-tools"
 "Bug Tracker" = "https://github.com/Intugle/data-tools/issues"
 
 [project.scripts]
 intugle-mcp = "intugle.mcp.server:main"
-intugle-streamlit = "intugle.cli:export_data"
+intugle-streamlit = "intugle.cli:run_streamlit_app"
 
 [dependency-groups]
 test = [
@@ -111,7 +120,7 @@ src = ["src"]
 where = ["src"]
 
 [tool.setuptools.package-data]
-"intugle" = ["**/*.yaml", "**/*.txt", "**/*.pkl", "mcp/semantic_layer/prompts/*.md"]
+"intugle" = ["**/*.yaml", "**/*.txt", "**/*.pkl", "mcp/semantic_layer/prompts/*.md", "streamlit_app/**/*"]
 
 [tool.pytest.ini_options]
 markers = [
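The `[project.scripts]` change above points `intugle-streamlit` at `intugle.cli:run_streamlit_app` instead of `intugle.cli:export_data`. A console-script value has the form `module:attribute`: the generated wrapper imports the module and calls that attribute, so a value naming a function that `cli.py` does not define would fail to resolve when the command is run. The sketch below uses the standard library's `importlib.metadata.EntryPoint` to show the resolution step; `resolve_script` is a hypothetical helper, and `os.path:join` merely stands in for a real target.

```python
import os
from importlib.metadata import EntryPoint


def resolve_script(value: str):
    """Load the object named by a console-script value such as "pkg.module:func"."""
    ep = EntryPoint(name="sketch", value=value, group="console_scripts")
    return ep.load()  # imports ep.module, then looks up ep.attr on it


ep = EntryPoint(name="sketch", value="os.path:join", group="console_scripts")
print(ep.module, ep.attr)  # os.path join
print(resolve_script("os.path:join") is os.path.join)  # True

try:
    resolve_script("os.path:export_data")  # the module exists, the attribute does not
except AttributeError as exc:
    print("cannot resolve:", exc)
```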

src/intugle/adapters/factory.py

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ def get_dataset_data_type(cls) -> Type[Any]:
             return Any
         if len(cls.config_types) == 1:
             return cls.config_types[0]
-        return Union[tuple(cls.config_types)] # type: ignore
+        return Union[tuple(cls.config_types)] # noqa: UP007
 
     @classmethod
     def create(cls, df: Any) -> Adapter:

src/intugle/cli.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+import importlib.util
+import os
+import subprocess
+
+
+def run_streamlit_app():
+    # Maps each module the Streamlit app imports to the pip package that provides it.
+    # These correspond to the dependencies in the `[project.optional-dependencies].streamlit` section of pyproject.toml.
+    required_modules = {
+        "streamlit": "streamlit",
+        "pyngrok": "pyngrok",
+        "dotenv": "python-dotenv",
+        "xlsxwriter": "xlsxwriter",
+        "plotly": "plotly",
+        "graphviz": "graphviz",
+    }
+
+    missing_modules = []
+    for module_name, package_name in required_modules.items():
+        if not importlib.util.find_spec(module_name):
+            missing_modules.append(package_name)
+
+    if missing_modules:
+        print("Error: The Streamlit app is missing required dependencies.")
+        print("The following packages are not installed:", ", ".join(missing_modules))
+        print("\nTo use the Streamlit app, please install 'intugle' with the 'streamlit' extra:")
+        print(" pip install 'intugle[streamlit]'")
+        return
+
+    # Get the absolute path to the main.py of the Streamlit app
+    app_dir = os.path.join(os.path.dirname(__file__), 'streamlit_app')
+    app_path = os.path.join(app_dir, 'main.py')
+
+    # Ensure the app_path exists
+    if not os.path.exists(app_path):
+        print(f"Error: Streamlit app not found at {app_path}")
+        return
+
+    # Run the Streamlit app using subprocess, setting the working directory
+    print(f"Launching Streamlit app from: {app_path} with working directory {app_dir}")
+    subprocess.run(["streamlit", "run", app_path], cwd=app_dir)
+
+
+if __name__ == "__main__":
+    run_streamlit_app()
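`run_streamlit_app` checks dependencies with `importlib.util.find_spec`, then shells out to whichever `streamlit` executable is first on `PATH`, and returns `None` on every path, so a console-script wrapper that calls `sys.exit(run_streamlit_app())` exits with status 0 even when dependencies are missing. The following is a hypothetical variant, not the shipped code: it runs Streamlit through the current interpreter (`python -m streamlit`) so the launcher and the app share one environment, and it returns an exit status. The helper names and the trimmed `REQUIRED_MODULES` mapping are assumptions.

```python
import importlib.util
import os
import subprocess
import sys

# Hypothetical, trimmed mapping: importable module name -> pip package name.
REQUIRED_MODULES = {"streamlit": "streamlit", "dotenv": "python-dotenv"}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip package names whose top-level modules cannot be found."""
    return [pkg for mod, pkg in required.items() if importlib.util.find_spec(mod) is None]


def build_command(app_path: str) -> list[str]:
    # `python -m streamlit` uses the interpreter running this launcher instead of
    # whichever `streamlit` executable happens to be first on PATH.
    return [sys.executable, "-m", "streamlit", "run", app_path]


def launch(app_path: str) -> int:
    """Return a process exit status; a console-script wrapper passes it to sys.exit."""
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing), file=sys.stderr)
        print("Install them with: pip install 'intugle[streamlit]'", file=sys.stderr)
        return 1
    if not os.path.exists(app_path):
        print(f"Streamlit app not found at {app_path}", file=sys.stderr)
        return 1
    result = subprocess.run(build_command(app_path), cwd=os.path.dirname(app_path))
    return result.returncode
```

A script entry point would then call `sys.exit(launch(path_to_main_py))`, so CI jobs and shells can detect a failed launch.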
File renamed without changes.
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Intugle - Streamlit App
+w# Intugle - Streamlit App
 
 This Streamlit application provides an interactive web interface for the `intugle` library. It allows users to upload their tabular data (CSV/Excel), configure a Large Language Model (LLM), and step through the process of building a semantic data model. The app profiles the data, generates a business glossary, identifies relationships between datasets, and visualizes the resulting semantic graph.
 

Lines changed: 11 additions & 5 deletions
@@ -175,6 +175,7 @@ def safe_filename(name: str, ext: str) -> str:
     str
         A sanitized filename like 'my_table.csv'.
     """
+    name = os.path.basename(name) # Sanitize against path traversal
     base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
     if not base:
         base = "table"
@@ -904,7 +905,8 @@ def plotly_table_graph(
     node_x, node_y, node_text, node_deg, node_labels = [], [], [], [], []
     for n in G.nodes():
         x, y = pos[n]
-        node_x.append(x); node_y.append(y)
+        node_x.append(x)
+        node_y.append(y)
         indeg = G.in_degree(n)
         outdeg = G.out_degree(n)
         node_text.append(f"<b>{n}</b><br>in: {indeg} • out: {outdeg}")
@@ -947,14 +949,16 @@ def edge_hover(u: str, v: str, data: Mapping[str, Any]) -> str:
         # Fast path: one trace with constant width (keeps things interactive for big graphs)
         edge_x, edge_y, edge_hover_texts = [], [], []
         for u, v, data in edges_list:
-            x0, y0 = pos[u]; x1, y1 = pos[v]
+            x0, y0 = pos[u]
+            x1, y1 = pos[v]
             edge_x += [x0, x1, None]
             edge_y += [y0, y1, None]
             edge_hover_texts.append(edge_hover(u, v, data))
 
             # midpoint label text
             mx, my = (x0 + x1) / 2, (y0 + y1) / 2
-            edge_label_x.append(mx); edge_label_y.append(my)
+            edge_label_x.append(mx)
+            edge_label_y.append(my)
             if len(data["labels"]) == 1:
                 edge_label_text.append(data["labels"][0])
             else:
@@ -974,7 +978,8 @@ def edge_hover(u: str, v: str, data: Mapping[str, Any]) -> str:
     else:
         # Accurate path: one trace per edge so we can vary width by mean accuracy
         for u, v, data in edges_list:
-            x0, y0 = pos[u]; x1, y1 = pos[v]
+            x0, y0 = pos[u]
+            x1, y1 = pos[v]
             acc_mean = sum(data["accs"]) / max(1, len(data["accs"]))
             width = max(edge_min_width, acc_mean * edge_width_scale)
 
@@ -992,7 +997,8 @@ def edge_hover(u: str, v: str, data: Mapping[str, Any]) -> str:
 
             # midpoint label
             mx, my = (x0 + x1) / 2, (y0 + y1) / 2
-            edge_label_x.append(mx); edge_label_y.append(my)
+            edge_label_x.append(mx)
+            edge_label_y.append(my)
             if len(data["labels"]) == 1:
                 edge_label_text.append(data["labels"][0])
             else:
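The `safe_filename` hunk adds `os.path.basename` before the regex. The regex already replaces `/` and `\` with `_`, so a name like `../../etc/passwd` could not escape the target directory even before this change (it became `etc_passwd`); the new line mainly drops directory components instead of folding them into the name. This sketch reproduces the visible lines; the diff does not show the final `return`, so the way `ext` is attached here is an assumption, as is the name `safe_filename_sketch`.

```python
import os
import re


def safe_filename_sketch(name: str, ext: str) -> str:
    """Turn an arbitrary table name into a filename like 'my_table.csv'."""
    name = os.path.basename(name)  # keep only the last path component
    # Replace every run of characters outside [A-Za-z0-9_.-] with "_",
    # then trim leading/trailing dots and underscores.
    base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
    if not base:
        base = "table"
    # Assumed completion: the diff does not show how `ext` is attached.
    return f"{base}.{ext.lstrip('.')}"


print(safe_filename_sketch("my table", "csv"))          # my_table.csv
print(safe_filename_sketch("../../etc/passwd", "csv"))  # passwd.csv
print(safe_filename_sketch("..", "csv"))                # table.csv
# Regex alone, without basename:
print(re.sub(r"[^A-Za-z0-9_.-]+", "_", "../../etc/passwd").strip("._"))  # etc_passwd
```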

streamlit_app/intugle_assets/Intugle_main_logo.png renamed to src/intugle/streamlit_app/intugle_assets/Intugle_main_logo.png

File renamed without changes.
