Add SDC Agents community toolsets module #106
Changes from all commits
3e9ce15
48a265e
dda7447
59e39f9
638d111
6fed50c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| # SDC Agents Demo | ||
|
|
||
| A minimal example composing SDC Agents toolsets with an ADK `LlmAgent`. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Python 3.11+ | ||
| - An SDCStudio API key (set as `SDC_API_KEY` environment variable) | ||
|
|
||
| ## Setup | ||
|
|
||
| ```bash | ||
| pip install google-adk-community[sdc-agents] | ||
|
|
||
| export SDC_API_KEY="your-sdcstudio-api-key" | ||
| export GOOGLE_API_KEY="your-google-api-key" | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| ```bash | ||
| # Run with the ADK CLI | ||
| adk run . | ||
|
|
||
| # Or use the ADK web UI | ||
| adk web . | ||
| ``` | ||
|
|
||
| ## What This Demo Does | ||
|
|
||
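| The agent composes five SDC Agents toolsets: | ||
|
|
||
| - **IntrospectToolset**: Analyze a datasource to infer column types, | ||
| constraints, and statistics. | ||
| - **CatalogToolset**: Search published SDC4 schemas, download artifacts | ||
| (XSD, RDF, JSON-LD), and check wallet balance. | ||
| - **MappingToolset**: Match datasource columns to schema components by | ||
| type compatibility and name similarity. | ||
| - **AssemblyToolset**: Compose data models from catalog components. | ||
| - **ValidationToolset**: Validate XML instances against schemas. | ||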
|
|
||
| ## Sample Queries | ||
|
|
||
| ``` | ||
| > Search the catalog for schemas related to lab results | ||
| > Introspect the sample datasource | ||
| > What published schemas match the columns in my datasource? | ||
| ``` | ||
|
|
||
| ## Structure | ||
|
|
||
| ``` | ||
| sdc_agents_demo/ | ||
| ├── agent.py # Agent definition with SDC toolsets | ||
| ├── data/ | ||
| │   └── sample.csv # Sample lab-results CSV used by the demo datasource | ||
| └── README.md # This file | ||
| ``` | ||
|
|
||
| ## Resources | ||
|
|
||
| - [SDC Agents Documentation](https://github.com/SemanticDataCharter/SDC_Agents) | ||
| - [SDC Agents on PyPI](https://pypi.org/project/sdc-agents/) | ||
| - [ADK Integration Guide](https://github.com/SemanticDataCharter/SDC_Agents/blob/main/docs/integrations/ADK_INTEGRATION.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| # Copyright 2025 Google LLC | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """SDC Agents demo -- full data governance pipeline. | ||
|
|
||
| Composes five SDC Agents toolsets into a single LlmAgent that can | ||
| introspect a datasource, discover matching catalog components, map | ||
| columns to schemas, and assemble validated data models. | ||
|
|
||
| Prerequisites: | ||
| pip install google-adk-community[sdc-agents] | ||
| export SDC_API_KEY="your-sdcstudio-api-key" | ||
| export SDC_BASE_URL="https://sdcstudio.axius-sdc.com" # optional; this is the default | ||
|
|
||
| Usage: | ||
| adk run . | ||
| """ | ||
|
|
||
| import os | ||
|
|
||
| from google.adk.agents import LlmAgent | ||
|
|
||
| from google.adk_community.tools.sdc_agents import ( | ||
| CatalogToolset, | ||
| IntrospectToolset, | ||
| MappingToolset, | ||
| AssemblyToolset, | ||
| ValidationToolset, | ||
| SDCAgentsConfig, | ||
| ) | ||
|
|
||
| api_key = os.environ.get("SDC_API_KEY") | ||
| if not api_key: | ||
| raise ValueError( | ||
| "SDC_API_KEY environment variable is required. " | ||
| "Get an API key from https://sdcstudio.axius-sdc.com and set it via: " | ||
| "export SDC_API_KEY=\"your-key\"" | ||
| ) | ||
|
|
||
| config = SDCAgentsConfig( | ||
| sdcstudio={ | ||
| "base_url": os.environ.get("SDC_BASE_URL", "https://sdcstudio.axius-sdc.com"), | ||
| "api_key": api_key, | ||
| }, | ||
| datasources={ | ||
| "sample": { | ||
| "type": "csv", | ||
| "path": "./data/sample.csv", | ||
|
The |
||
| }, | ||
| }, | ||
| cache={"root": ".sdc-cache"}, | ||
| audit={"path": ".sdc-cache/audit.jsonl"}, | ||
| ) | ||
|
|
||
| root_agent = LlmAgent( | ||
| name="sdc_demo_agent", | ||
| model="gemini-2.0-flash", | ||
| description=( | ||
| "Full data governance pipeline: introspect, discover, map," | ||
| " and assemble SDC4 data models." | ||
| ), | ||
| instruction=( | ||
| "You help data engineers govern their data. Follow this workflow:\n" | ||
| "1. Introspect the datasource to discover columns and types\n" | ||
| "2. Search the SDC4 catalog for matching published schemas\n" | ||
| "3. Discover catalog components that match the datasource structure\n" | ||
| "4. Map unmatched columns to schema components by similarity\n" | ||
| "5. Propose a cluster hierarchy for the data model\n" | ||
| "6. Assemble the final data model via the Assembly API\n" | ||
| "7. Validate the generated artifacts" | ||
| ), | ||
| tools=[ | ||
| IntrospectToolset(config=config), | ||
| CatalogToolset(config=config), | ||
| MappingToolset(config=config), | ||
| AssemblyToolset(config=config), | ||
| ValidationToolset(config=config), | ||
| ], | ||
| ) | ||
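
The demo config points at `./data/sample.csv`, a relative path. Assuming the path is resolved against the process's working directory (not confirmed here for sdc-agents), `adk run` started from another directory would not find the file. A minimal sketch of one way to anchor the path to the agent file instead; `resolve_data_path` is a hypothetical helper, not part of sdc-agents or ADK:

```python
from pathlib import Path


def resolve_data_path(relative: str, anchor_file: str) -> str:
    """Resolve `relative` against the directory that contains `anchor_file`.

    Hypothetical helper for illustration only.
    """
    base_dir = Path(anchor_file).resolve().parent
    return str((base_dir / relative).resolve())


# Possible use inside agent.py (assumption: the config accepts absolute paths):
#   "path": resolve_data_path("data/sample.csv", __file__)
```

With this in place the datasource path would no longer depend on where the CLI is launched from.
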
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| patient_id,test_date,test_name,result_value,result_units,reference_low,reference_high | ||
| P001,2026-01-15,Glucose,98,mg/dL,70,99 | ||
| P002,2026-01-15,HbA1c,5.4,%,4.0,5.6 | ||
| P003,2026-01-16,Total Cholesterol,182,mg/dL,0,200 | ||
| P004,2026-01-16,LDL Cholesterol,104,mg/dL,0,130 | ||
| P005,2026-01-17,HDL Cholesterol,58,mg/dL,40,60 | ||
| P006,2026-01-17,Triglycerides,142,mg/dL,0,150 | ||
| P007,2026-01-18,Creatinine,0.9,mg/dL,0.6,1.2 | ||
| P008,2026-01-18,BUN,14,mg/dL,7,20 | ||
| P009,2026-01-19,Sodium,140,mmol/L,135,145 | ||
| P010,2026-01-19,Potassium,4.1,mmol/L,3.5,5.0 | ||
| P011,2026-01-20,Hemoglobin,14.2,g/dL,13.5,17.5 | ||
| P012,2026-01-20,WBC,7.8,K/uL,4.0,11.0 |
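
To make "infer column types" concrete for this sample file, here is a rough standard-library sketch of per-column type inference. It is not IntrospectToolset's implementation; the type labels and the `classify`, `merge`, and `infer_column_types` helpers are assumptions made for illustration, and only three of the sample rows are embedded.

```python
import csv
import io
from datetime import date

# Header and three rows copied from the sample CSV above.
SAMPLE = """patient_id,test_date,test_name,result_value,result_units,reference_low,reference_high
P001,2026-01-15,Glucose,98,mg/dL,70,99
P002,2026-01-15,HbA1c,5.4,%,4.0,5.6
P009,2026-01-19,Sodium,140,mmol/L,135,145
"""


def classify(value: str) -> str:
    """Return the narrowest illustrative type label that parses `value`."""
    for label, parser in (("integer", int), ("decimal", float)):
        try:
            parser(value)
            return label
        except ValueError:
            continue
    try:
        date.fromisoformat(value)
        return "date"
    except ValueError:
        return "string"


def merge(labels: set[str]) -> str:
    """Collapse the labels seen in one column into a single column type."""
    if len(labels) == 1:
        return next(iter(labels))
    if labels and labels <= {"integer", "decimal"}:
        return "decimal"  # mixed 98 / 5.4 widens to decimal
    return "string"


def infer_column_types(text: str) -> dict[str, str]:
    """Infer one type label per column from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    seen: dict[str, set[str]] = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            seen[name].add(classify(value))
    return {name: merge(labels) for name, labels in seen.items()}
```

On these rows, `result_value` and both reference columns widen to `decimal` because they mix whole and fractional numbers, `test_date` is read as a date, and the remaining columns stay strings.
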
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,4 +15,5 @@ | |
| from . import memory | ||
| from . import sessions | ||
| from . import version | ||
|
|
||
|
You've added |
||
| __version__ = version.__version__ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # SDC Agents -- Semantic Data Governance for ADK | ||
|
|
||
| Thin re-export wrapper over the | ||
| [`sdc-agents`](https://pypi.org/project/sdc-agents/) PyPI package. The | ||
| canonical source lives at | ||
| [SemanticDataCharter/SDC_Agents](https://github.com/SemanticDataCharter/SDC_Agents); | ||
| this module provides importability through the `google.adk_community` | ||
| namespace. | ||
|
|
||
| ## Installation | ||
|
|
||
| ```bash | ||
| pip install google-adk-community[sdc-agents] | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| ```python | ||
| from google.adk.agents import LlmAgent | ||
| from google.adk_community.tools.sdc_agents import ( | ||
| load_config, | ||
| CatalogToolset, | ||
| IntrospectToolset, | ||
| MappingToolset, | ||
| ) | ||
|
|
||
| config = load_config("sdc-agents.yaml") | ||
|
|
||
| agent = LlmAgent( | ||
| name="data_governance_agent", | ||
| model="gemini-2.0-flash", | ||
| description="Introspects data sources and maps them to SDC4 schemas.", | ||
| instruction=( | ||
| "You help data engineers govern their data. When given a datasource:\n" | ||
| "1. Introspect the structure to discover columns and types\n" | ||
| "2. Search the SDC4 catalog for matching published schemas\n" | ||
| "3. Map columns to schema components by type and name similarity\n" | ||
| "4. Report the mapping with confidence scores" | ||
| ), | ||
| tools=[ | ||
| IntrospectToolset(config=config), | ||
| CatalogToolset(config=config), | ||
| MappingToolset(config=config), | ||
| ], | ||
| ) | ||
| ``` | ||
|
|
||
| ## Exported Toolsets | ||
|
|
||
| | Toolset | Description | | ||
| |---------|-------------| | ||
| | **CatalogToolset** | Discover published SDC4 schemas, download artifacts (XSD, RDF, JSON-LD) | | ||
| | **IntrospectToolset** | Analyze datasource structure -- infer column types and constraints from SQL, CSV, JSON, MongoDB with sidecar metadata support | | ||
| | **MappingToolset** | Match datasource columns to schema components by type compatibility and name similarity, persist mapping configs with schema and datasource context | | ||
| | **AssemblyToolset** | Compose data models from catalog components -- reuse existing or mint new, with catalog-first discovery and structured unmatched column reporting | | ||
| | **GeneratorToolset** | Generate validated XML instances, batch processing, and preview | | ||
| | **ValidationToolset** | Validate XML instances against schemas, digitally sign via VaaS API | | ||
| | **DistributionToolset** | Deliver RDF triples to Fuseki, Neo4j, GraphDB, or REST endpoints | | ||
| | **KnowledgeToolset** | Index domain documentation (JSON, CSV, TTL, Markdown, PDF, DOCX) for semantic search | | ||
|
|
||
| ## Configuration | ||
|
|
||
| SDC Agents uses a YAML config file with environment variable substitution: | ||
|
|
||
| ```yaml | ||
| sdcstudio: | ||
| base_url: "https://sdcstudio.com" | ||
| api_key: "${SDC_API_KEY}" | ||
|
|
||
| datasources: | ||
| warehouse: | ||
| type: csv | ||
| path: "./data/sample.csv" | ||
|
|
||
| cache: | ||
| root: ".sdc-cache" | ||
|
|
||
| audit: | ||
| path: ".sdc-cache/audit.jsonl" | ||
| ``` | ||
|
|
||
| ## Resources | ||
|
|
||
| - [SDC Agents on PyPI](https://pypi.org/project/sdc-agents/) | ||
| - [SDC Agents GitHub](https://github.com/SemanticDataCharter/SDC_Agents) | ||
| - [ADK Integration Guide](https://github.com/SemanticDataCharter/SDC_Agents/blob/main/docs/integrations/ADK_INTEGRATION.md) | ||
| - [SDCStudio](https://sdcstudio.com) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # Copyright 2025 Google LLC | ||
| # | ||
|
Collaborator
If this is a toolset, shall we create a tools/ folder and move it there?
Contributor
Author
Good question, and I want to make sure I'm reading the convention right rather than guess and move eight modules either way.
If
Collaborator
Yes, in core adk-python we also put toolsets under /tools, like the apihub_tool.
Contributor
Author
Done in 638d111.
Thanks for the apihub_tool pointer. |
||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """SDC Agents -- Purpose-scoped semantic data governance toolsets for ADK. | ||
|
|
||
| Eight BaseToolset implementations (32 tools) that transform SQL, CSV, JSON, | ||
| and MongoDB data into validated, self-describing SDC4 artifacts with | ||
| structured audit trails and enforced agent isolation boundaries. | ||
|
|
||
| Install: pip install google-adk-community[sdc-agents] | ||
| Docs: https://github.com/SemanticDataCharter/SDC_Agents | ||
|
|
||
| Requires sdc-agents >= 4.3.3. | ||
| """ | ||
|
|
||
| from sdc_agents.common.config import load_config | ||
| from sdc_agents.common.config import SDCAgentsConfig | ||
| from sdc_agents.toolsets.assembly import AssemblyToolset | ||
| from sdc_agents.toolsets.catalog import CatalogToolset | ||
| from sdc_agents.toolsets.distribution import DistributionToolset | ||
| from sdc_agents.toolsets.generator import GeneratorToolset | ||
| from sdc_agents.toolsets.introspect import IntrospectToolset | ||
| from sdc_agents.toolsets.knowledge import KnowledgeToolset | ||
| from sdc_agents.toolsets.mapping import MappingToolset | ||
| from sdc_agents.toolsets.validation import ValidationToolset | ||
|
|
||
| __all__ = [ | ||
|
The |
||
| "load_config", | ||
| "SDCAgentsConfig", | ||
| "AssemblyToolset", | ||
| "CatalogToolset", | ||
| "DistributionToolset", | ||
| "GeneratorToolset", | ||
| "IntrospectToolset", | ||
| "KnowledgeToolset", | ||
| "MappingToolset", | ||
| "ValidationToolset", | ||
| ] | ||
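
The module docstring states "Requires sdc-agents >= 4.3.3". As a hedged sketch, a wrapper like this could check that requirement at import time and fail with an actionable message. The helper names below are hypothetical and nothing here is part of the PR; the version parser only handles plain dotted releases such as `4.3.3`, so `packaging.version.Version` would be the more robust choice if that dependency is available.

```python
from importlib import metadata

MINIMUM = "4.3.3"  # taken from the module docstring above


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as "4.3.3" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_sdc_agents() -> None:
    """Raise ImportError with an install hint if sdc-agents is missing or too old."""
    try:
        installed = metadata.version("sdc-agents")
    except metadata.PackageNotFoundError as exc:
        raise ImportError(
            "sdc-agents is not installed. "
            "Try: pip install google-adk-community[sdc-agents]"
        ) from exc
    if not meets_minimum(installed):
        raise ImportError(
            f"sdc-agents {installed} is older than the required {MINIMUM}."
        )
```

Calling `check_sdc_agents()` before the re-export imports would turn a bare `ModuleNotFoundError` into a message that names the extra to install.
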
Hardcoding the `base_url` to `https://sdcstudio.com` might be restrictive if users need to point to a staging or local environment. Consider allowing this to be overridden via an environment variable as well, e.g., `"${SDC_BASE_URL:-https://sdcstudio.com}"`.