Merged
58 changes: 58 additions & 0 deletions contributing/samples/sdc_agents_demo/README.md
@@ -0,0 +1,58 @@
# SDC Agents Demo

A minimal example composing SDC Agents toolsets with an ADK `LlmAgent`.

## Prerequisites

- Python 3.11+
- An SDCStudio API key (set as `SDC_API_KEY` environment variable)

## Setup

```bash
# Quote the requirement so shells such as zsh do not glob the brackets
pip install "google-adk-community[sdc-agents]"

export SDC_API_KEY="your-sdcstudio-api-key"
export GOOGLE_API_KEY="your-google-api-key"
```

## Usage

```bash
# Run with the ADK CLI
adk run .

# Or use the ADK web UI
adk web .
```

## What This Demo Does

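The agent composes five SDC Agents toolsets:

- **IntrospectToolset**: Analyze a datasource to infer column types,
  constraints, and statistics.
- **CatalogToolset**: Search published SDC4 schemas, download artifacts
  (XSD, RDF, JSON-LD), and check wallet balance.
- **MappingToolset**: Match datasource columns to schema components by
  type compatibility and name similarity.
- **AssemblyToolset**: Compose data models from catalog components.
- **ValidationToolset**: Validate XML instances against schemas.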
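
As a rough illustration of what "infer column types" can mean for a CSV datasource, here is a minimal standard-library sketch. It is not the `IntrospectToolset` implementation; the function names `infer_type` and `infer_column_types` and the four type labels are assumptions made for this example only.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return 'integer', 'decimal', 'date', or 'string' for one column."""

    def all_parse(parser):
        # True only if every value in the column parses without error.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    if all_parse(date.fromisoformat):
        return "date"
    return "string"


def infer_column_types(csv_text):
    """Map each CSV header to an inferred type (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


SAMPLE = (
    "patient_id,test_date,test_name,result_value,reference_low\n"
    "P001,2026-01-15,Glucose,98,70\n"
    "P002,2026-01-15,HbA1c,5.4,4.0\n"
)

if __name__ == "__main__":
    print(infer_column_types(SAMPLE))
```

The checks run from the narrowest type to the widest, so a column mixing `98` and `5.4` is reported as decimal rather than string.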

## Sample Queries

```
> Search the catalog for schemas related to lab results
> Introspect the sample datasource
> What published schemas match the columns in my datasource?
```

## Structure

```
sdc_agents_demo/
├── agent.py        # Agent definition with SDC toolsets
├── data/
│   └── sample.csv  # Sample lab-results datasource
└── README.md       # This file
```

## Resources

- [SDC Agents Documentation](https://github.com/SemanticDataCharter/SDC_Agents)
- [SDC Agents on PyPI](https://pypi.org/project/sdc-agents/)
- [ADK Integration Guide](https://github.com/SemanticDataCharter/SDC_Agents/blob/main/docs/integrations/ADK_INTEGRATION.md)
90 changes: 90 additions & 0 deletions contributing/samples/sdc_agents_demo/agent.py
@@ -0,0 +1,90 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""SDC Agents demo -- full data governance pipeline.

Composes five SDC Agents toolsets into a single LlmAgent that can
introspect a datasource, discover matching catalog components, map
columns to schemas, and assemble validated data models.

Prerequisites:
    pip install "google-adk-community[sdc-agents]"
    export SDC_API_KEY="your-sdcstudio-api-key"
    export SDC_BASE_URL="https://sdcstudio.axius-sdc.com"  # optional; this is the default

Usage:
    adk run .
"""

import os

from google.adk.agents import LlmAgent

from google.adk_community.tools.sdc_agents import (
    CatalogToolset,
    IntrospectToolset,
    MappingToolset,
    AssemblyToolset,
    ValidationToolset,
    SDCAgentsConfig,
)

api_key = os.environ.get("SDC_API_KEY")
if not api_key:
    raise ValueError(
        "SDC_API_KEY environment variable is required. "
        "Get an API key from https://sdcstudio.axius-sdc.com and set it via: "
        "export SDC_API_KEY=\"your-key\""
    )

config = SDCAgentsConfig(
    sdcstudio={

Hardcoding the base_url to https://sdcstudio.com might be restrictive if users need to point to a staging or local environment. Consider allowing this to be overridden via an environment variable as well, e.g., "${SDC_BASE_URL:-https://sdcstudio.com}".

        "base_url": os.environ.get("SDC_BASE_URL", "https://sdcstudio.axius-sdc.com"),
        "api_key": api_key,
    },
    datasources={
        "sample": {
            "type": "csv",
            "path": "./data/sample.csv",

The sample.csv file referenced here is not included in the PR. This will cause the demo to fail out-of-the-box. Please include a sample CSV or update the demo to use a dynamically generated one.

        },
    },
    cache={"root": ".sdc-cache"},
    audit={"path": ".sdc-cache/audit.jsonl"},
)

root_agent = LlmAgent(
    name="sdc_demo_agent",
    model="gemini-2.0-flash",
    description=(
        "Full data governance pipeline: introspect, discover, map,"
        " and assemble SDC4 data models."
    ),
    instruction=(
        "You help data engineers govern their data. Follow this workflow:\n"
        "1. Introspect the datasource to discover columns and types\n"
        "2. Search the SDC4 catalog for matching published schemas\n"
        "3. Discover catalog components that match the datasource structure\n"
        "4. Map unmatched columns to schema components by similarity\n"
        "5. Propose a cluster hierarchy for the data model\n"
        "6. Assemble the final data model via the Assembly API\n"
        "7. Validate the generated artifacts"
    ),
    tools=[
        IntrospectToolset(config=config),
        CatalogToolset(config=config),
        MappingToolset(config=config),
        AssemblyToolset(config=config),
        ValidationToolset(config=config),
    ],
)
13 changes: 13 additions & 0 deletions contributing/samples/sdc_agents_demo/data/sample.csv
@@ -0,0 +1,13 @@
patient_id,test_date,test_name,result_value,result_units,reference_low,reference_high
P001,2026-01-15,Glucose,98,mg/dL,70,99
P002,2026-01-15,HbA1c,5.4,%,4.0,5.6
P003,2026-01-16,Total Cholesterol,182,mg/dL,0,200
P004,2026-01-16,LDL Cholesterol,104,mg/dL,0,130
P005,2026-01-17,HDL Cholesterol,58,mg/dL,40,60
P006,2026-01-17,Triglycerides,142,mg/dL,0,150
P007,2026-01-18,Creatinine,0.9,mg/dL,0.6,1.2
P008,2026-01-18,BUN,14,mg/dL,7,20
P009,2026-01-19,Sodium,140,mmol/L,135,145
P010,2026-01-19,Potassium,4.1,mmol/L,3.5,5.0
P011,2026-01-20,Hemoglobin,14.2,g/dL,13.5,17.5
P012,2026-01-20,WBC,7.8,K/uL,4.0,11.0
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -45,6 +45,9 @@ test = [
  "pytest>=8.4.2",
  "pytest-asyncio>=1.2.0",
]
sdc-agents = [
  "sdc-agents>=4.3.3; python_version >= '3.11'",
]
spraay = ["web3>=6.0.0"]


1 change: 1 addition & 0 deletions src/google/adk_community/__init__.py
@@ -15,4 +15,5 @@
from . import memory
from . import sessions
from . import version


You've added `src/google/adk_community/tools/sdc_agents/`, but `src/google/adk_community/__init__.py` doesn't seem to import `tools`. Should `from . import tools` be added here to ensure the `tools` namespace is properly initialized and discoverable?

__version__ = version.__version__
87 changes: 87 additions & 0 deletions src/google/adk_community/tools/sdc_agents/README.md
@@ -0,0 +1,87 @@
# SDC Agents -- Semantic Data Governance for ADK

Thin re-export wrapper over the
[`sdc-agents`](https://pypi.org/project/sdc-agents/) PyPI package. The
canonical source lives at
[SemanticDataCharter/SDC_Agents](https://github.com/SemanticDataCharter/SDC_Agents);
this module provides importability through the `google.adk_community`
namespace.

## Installation

```bash
# Quote the requirement so shells such as zsh do not glob the brackets
pip install "google-adk-community[sdc-agents]"
```

## Usage

```python
from google.adk.agents import LlmAgent
from google.adk_community.tools.sdc_agents import (
    load_config,
    CatalogToolset,
    IntrospectToolset,
    MappingToolset,
)

config = load_config("sdc-agents.yaml")

agent = LlmAgent(
    name="data_governance_agent",
    model="gemini-2.0-flash",
    description="Introspects data sources and maps them to SDC4 schemas.",
    instruction=(
        "You help data engineers govern their data. When given a datasource:\n"
        "1. Introspect the structure to discover columns and types\n"
        "2. Search the SDC4 catalog for matching published schemas\n"
        "3. Map columns to schema components by type and name similarity\n"
        "4. Report the mapping with confidence scores"
    ),
    tools=[
        IntrospectToolset(config=config),
        CatalogToolset(config=config),
        MappingToolset(config=config),
    ],
)
```

## Exported Toolsets

| Toolset | Description |
|---------|-------------|
| **CatalogToolset** | Discover published SDC4 schemas, download artifacts (XSD, RDF, JSON-LD) |
| **IntrospectToolset** | Analyze datasource structure -- infer column types and constraints from SQL, CSV, JSON, MongoDB with sidecar metadata support |
| **MappingToolset** | Match datasource columns to schema components by type compatibility and name similarity, persist mapping configs with schema and datasource context |
| **AssemblyToolset** | Compose data models from catalog components -- reuse existing or mint new, with catalog-first discovery and structured unmatched column reporting |
| **GeneratorToolset** | Generate validated XML instances, batch processing, and preview |
| **ValidationToolset** | Validate XML instances against schemas, digitally sign via VaaS API |
| **DistributionToolset** | Deliver RDF triples to Fuseki, Neo4j, GraphDB, or REST endpoints |
| **KnowledgeToolset** | Index domain documentation (JSON, CSV, TTL, Markdown, PDF, DOCX) for semantic search |
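
To make "name similarity" concrete, the sketch below scores a datasource column name against candidate component names using the standard library's `difflib`. This is an illustration only, not the `MappingToolset` algorithm; the helper names, the component names in the usage, and the `0.6` threshold are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop separators so 'result_value' matches 'ResultValue'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_match(column, component_names, threshold=0.6):
    """Return (component_name, score), or (None, score) below the threshold.

    Hypothetical helper: the threshold is an arbitrary illustrative cutoff.
    """
    scored = [
        (SequenceMatcher(None, normalize(column), normalize(candidate)).ratio(), candidate)
        for candidate in component_names
    ]
    score, name = max(scored, default=(0.0, None))
    return (name, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    # Hypothetical component names, not taken from a real SDC4 catalog.
    components = ["ResultValue", "TestDate", "PatientId"]
    for column in ["result_value", "test_date", "reference_low"]:
        print(column, "->", best_match(column, components))
```

A real mapper would combine a score like this with type compatibility, as the table describes, rather than rely on names alone.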

## Configuration

SDC Agents uses a YAML config file with environment variable substitution:

```yaml
sdcstudio:
  base_url: "https://sdcstudio.com"
  api_key: "${SDC_API_KEY}"

datasources:
  warehouse:
    type: csv
    path: "./data/sample.csv"

cache:
  root: ".sdc-cache"

audit:
  path: ".sdc-cache/audit.jsonl"
```
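
For readers unfamiliar with `${VAR}` substitution, the following sketch shows the general idea using `string.Template` from the standard library. It is an assumption-laden illustration, not how `load_config` works internally: `substitute_env` is a hypothetical helper, and shell-style defaults such as `${VAR:-default}` are not handled by `string.Template`.

```python
import os
from string import Template


def substitute_env(text, env=None):
    """Replace ${NAME} placeholders with values from env (default: os.environ).

    Raises KeyError if a referenced variable is not set, which surfaces a
    missing API key early instead of sending an empty value to the server.
    """
    env = os.environ if env is None else env
    return Template(text).substitute(env)


if __name__ == "__main__":
    raw = 'api_key: "${SDC_API_KEY}"'
    print(substitute_env(raw, {"SDC_API_KEY": "example-key"}))
```

Failing loudly on an unset variable mirrors the `SDC_API_KEY` check in the demo's `agent.py`.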

## Resources

- [SDC Agents on PyPI](https://pypi.org/project/sdc-agents/)
- [SDC Agents GitHub](https://github.com/SemanticDataCharter/SDC_Agents)
- [ADK Integration Guide](https://github.com/SemanticDataCharter/SDC_Agents/blob/main/docs/integrations/ADK_INTEGRATION.md)
- [SDCStudio](https://sdcstudio.com)
49 changes: 49 additions & 0 deletions src/google/adk_community/tools/sdc_agents/__init__.py
@@ -0,0 +1,49 @@
# Copyright 2025 Google LLC
#
Collaborator

If this is a toolset, shall we create a tools/ folder and move it there?

Contributor Author

Good question, and I want to make sure I'm reading the convention right rather than guess and move eight modules either way.

sdc_agents/ exports 8 BaseToolset implementations (Assembly, Catalog, Distribution, Generator, Introspect, Knowledge, Mapping, Validation) plus a config loader, not individual FunctionTool callables. The existing tools/spraay/ precedent looks like the home for standalone tool functions (the README phrases it as "standalone tools that can be used by agents"), which is why I placed sdc_agents/ as a sibling at the adk_community/ level alongside memory/ and sessions/.

If tools/ is the correct home for BaseToolset implementations as well, I'm happy to move it to tools/sdc_agents/. Just want to confirm before refactoring imports across the PR.

Collaborator

Yes in core adk-python we also put toolsets under /tools, like the apihub_tool

Contributor Author

Done in 638d111.

  • Moved to src/google/adk_community/tools/sdc_agents/ via git mv so history follows both files.
  • Public import path is now google.adk_community.tools.sdc_agents across the demo agent, the test module, and the module README.
  • Dropped the try/except ImportError block in src/google/adk_community/__init__.py. With sdc_agents under tools/, a missing optional extra now fails at the
    user's own import site, the same way spraay does, rather than being eagerly imported at package init.
  • All 12 tests in tests/unittests/test_sdc_agents_imports.py pass locally against sdc-agents==4.3.3.

Thanks for the apihub_tool pointer.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""SDC Agents -- Purpose-scoped semantic data governance toolsets for ADK.

Eight BaseToolset implementations (32 tools) that transform SQL, CSV, JSON,
and MongoDB data into validated, self-describing SDC4 artifacts with
structured audit trails and enforced agent isolation boundaries.

Install: pip install google-adk-community[sdc-agents]
Docs: https://github.com/SemanticDataCharter/SDC_Agents

Requires sdc-agents >= 4.3.3.
"""

from sdc_agents.common.config import load_config
from sdc_agents.common.config import SDCAgentsConfig
from sdc_agents.toolsets.assembly import AssemblyToolset
from sdc_agents.toolsets.catalog import CatalogToolset
from sdc_agents.toolsets.distribution import DistributionToolset
from sdc_agents.toolsets.generator import GeneratorToolset
from sdc_agents.toolsets.introspect import IntrospectToolset
from sdc_agents.toolsets.knowledge import KnowledgeToolset
from sdc_agents.toolsets.mapping import MappingToolset
from sdc_agents.toolsets.validation import ValidationToolset

__all__ = [

The __all__ list is comprehensive and correctly matches the imports. Using __all__ is a best practice for public-facing modules to control the exported API.

    "load_config",
    "SDCAgentsConfig",
    "AssemblyToolset",
    "CatalogToolset",
    "DistributionToolset",
    "GeneratorToolset",
    "IntrospectToolset",
    "KnowledgeToolset",
    "MappingToolset",
    "ValidationToolset",
]