Conversation

@fordN fordN commented Nov 7, 2025

This PR adds Python bindings for the Amp Admin API, enabling programmatic dataset registration, deployment, and monitoring.

Key Features

  • Unified Client: Single Client class supports both Flight SQL queries and admin operations
  • Auto-generated Models: Pydantic v2 models generated from OpenAPI spec for type safety
  • Fluent API: Chain from SQL query → manifest generation → registration → deployment
  • Schema Validation: Automatic Arrow schema inference via /schema endpoint
  • Job Monitoring: Poll and wait for deployment completion with typed error handling (see the sketch after the Quick Example)

Quick Example

from amp import Client

client = Client(
    query_url="grpc://localhost:8815",
    admin_url="http://localhost:8080"
)

# One-liner: Query → Register → Deploy
job = (
    client.query("SELECT * FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@1.0.0')
    .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
    .deploy(parallelism=4, wait=True)
)
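
Expanding on the Job Monitoring bullet above, here is a sketch of a non-blocking deployment with explicit polling and typed error handling. The exception name, the wait=False flag, and the jobs.wait() signature are assumptions based on this PR's description, not verbatim library API:

from amp import Client
from amp.exceptions import AmpAdminError  # hypothetical import path for the typed hierarchy

client = Client(
    query_url="grpc://localhost:8815",
    admin_url="http://localhost:8080"
)

try:
    # Deploy without blocking, then poll for completion explicitly
    job = (
        client.query("SELECT * FROM eth.blocks")
        .with_dependency('eth', '_/eth_firehose@1.0.0')
        .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
        .deploy(parallelism=4, wait=False)  # wait=False is assumed here
    )
    client.jobs.wait(job.id, timeout=600)  # polling with a configurable timeout, per the commit notes
except AmpAdminError as exc:
    # API error codes map to typed exception classes in the hierarchy
    print(f"deployment failed: {exc}")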

Documentation

  • User Guide: docs/admin_client_guide.md
  • API Reference: docs/api/admin_api.md

@fordN fordN self-assigned this Nov 7, 2025
@fordN fordN force-pushed the ford/dataset-management branch from 2633560 to 00d08f7 on November 7, 2025 22:32
@fordN fordN force-pushed the ford/snowpipe-and-streaming-state branch from 68bbdb3 to e0e5765 on November 7, 2025 22:36
@fordN fordN force-pushed the ford/dataset-management branch from 00d08f7 to 26372d5 on November 8, 2025 18:24
@fordN fordN added the enhancement label Nov 8, 2025
@fordN fordN force-pushed the ford/dataset-management branch 3 times, most recently from a7d0be3 to 36f0af5 on November 11, 2025 05:29
Base automatically changed from ford/snowpipe-and-streaming-state to main November 12, 2025 18:51
- Add pydantic>=2.0,<2.12 for data models (constrained for PyIceberg)
- Add datamodel-code-generator for model generation from OpenAPI
- Add respx for HTTP mocking in tests
- Add Makefile target for regenerating models
- Generate 838 lines of Pydantic v2 models from OpenAPI spec
- Add test manifest files for dataset registration testing
- Create AdminClient base class with HTTP request handling and error mapping
- Implement DatasetsClient with register/deploy/list/delete operations
- Implement JobsClient with get/list/wait/stop/delete operations
- Implement SchemaClient for SQL validation and schema inference
- Create DeploymentContext for chainable deployment workflows
- Add exception hierarchy with 30+ typed error classes mapped from API codes (sketched after these notes)
- Support automatic job polling with configurable timeout
- Add query_url and admin_url parameters to Client (backward compatible with url)
- Add datasets, jobs, schema properties for admin operations
- Extend QueryBuilder with with_dependency() for manifest dependencies
- Add to_manifest() for generating dataset manifests from SQL queries
- Add register_as() for one-line registration returning DeploymentContext
- Support fluent API: query → with_dependency → register_as → deploy
- Maintain backward compatibility (existing Client(url=...) still works)
- Add 10 unit tests for error mapping and exception hierarchy
- Add 10 unit tests for Pydantic model validation
- Add 10 integration tests for AdminClient HTTP operations
- Add 10 integration tests for DatasetsClient operations
- Add 18 integration tests for JobsClient operations including polling
- All 48 tests use respx for HTTP mocking (no real server required)
- 0.65s execution time on dev machine
- Add admin client to feature list
- Add quick start examples for admin operations
- Add links to admin client guide and API reference
- Update overview to highlight dataset management capabilities
- Add comprehensive admin_client_guide.md with usage patterns and best practices
- Add complete API reference in docs/api/admin_api.md
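
As a rough illustration of the error-mapping approach described in these notes, a minimal sketch follows; the class names, error codes, and helper are invented for illustration and do not mirror the actual module:

class AmpAdminError(Exception):
    """Hypothetical base class for all admin API errors."""

class DatasetNotFoundError(AmpAdminError):
    pass

class JobTimeoutError(AmpAdminError):
    pass

# Map API error codes from response bodies to typed exceptions
ERROR_CODE_MAP = {
    "DATASET_NOT_FOUND": DatasetNotFoundError,
    "JOB_TIMEOUT": JobTimeoutError,
}

def raise_for_error(payload: dict) -> None:
    # Unmapped codes fall back to the base class
    exc_cls = ERROR_CODE_MAP.get(payload.get("code"), AmpAdminError)
    raise exc_cls(payload.get("message", "unknown admin API error"))
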
# Register and deploy a dataset
job = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@1.0.0')

I don't understand what with_dependency is doing here in the context of that query. Does the eth_firehose dataset contain eth.blocks as well as logs and transactions datasets, etc.? Is it possible to automatically detect and populate dependencies based on the SQL, or to construct the SQL query in a way that automatically pulls in the dependency?

fordN (Contributor, Author)

In this case the dataset's fully qualified name (FQN) is _/eth_firehose@1.0.0 (<namespace>/<dataset>@<version>), and it's being aliased as simply eth. blocks is a table in the dataset, along with transactions and logs.

Agreed that this probably exposes more of the internals to the user than necessary. It reflects the structure of the manifest, which requires a section listing all dependencies used in the SQL.

I think the dependencies list could be generated from the SQL, but I'm going to let the structure of these datasets and the recommended way to specify dependencies stabilize before making assumptions on the user's behalf. Right now we have a few ways of setting up these derived dataset dependencies: using the FQN everywhere, specifying an alias like this, or simply using the dataset name without namespace and version (which defaults to latest). The server-side work for validating and deploying datasets with all of these options is still WIP and stabilizing.
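
For illustration only, the three dependency styles described above might look like this; the exact call forms are assumptions based on this thread, not a stable API:

# 1. FQN everywhere: use the fully qualified name as the reference itself
query.with_dependency('_/eth_firehose@1.0.0', '_/eth_firehose@1.0.0')

# 2. Short alias bound to the FQN (the style used in this PR's examples)
query.with_dependency('eth', '_/eth_firehose@1.0.0')

# 3. Bare dataset name without namespace or version (defaults to latest)
query.with_dependency('eth', 'eth_firehose')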

query = query.with_dependency('eth', '_/eth_firehose@1.0.0')

# Generate manifest
manifest = query.to_manifest(

Nice

# Generate manifest
manifest = query.to_manifest(
    table_name='blocks',
    network='mainnet'

Maybe the networks should be typed/enums since there are only a few of them?
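
A minimal sketch of what that could look like; the member names here are illustrative, since the actual network list would come from the server's spec:

from enum import Enum

class Network(str, Enum):
    # Hypothetical members; the str mixin keeps serialization as plain strings
    MAINNET = 'mainnet'
    SEPOLIA = 'sepolia'
    HOLESKY = 'holesky'

manifest = query.to_manifest(table_name='blocks', network=Network.MAINNET)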

@@ -0,0 +1,2586 @@
{

Can this spec be generated by the server to avoid drift?

fordN (Contributor, Author)

This was generated by the server. I should add a process to keep it in sync with the server's spec, since this copy was brought over manually.

Yeah, that would be a nice-to-have.
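
One possible shape for that sync step, building on the Makefile target for regenerating models mentioned in the commit notes; the spec URL and output path are placeholders:

# sync_models.py -- hypothetical helper: fetch the spec, then regenerate models
import subprocess
import urllib.request

SPEC_URL = "http://localhost:8080/openapi.json"  # assumed server endpoint
SPEC_PATH = "openapi.json"

# Fetch the current spec from the running server
with urllib.request.urlopen(SPEC_URL) as resp:
    spec = resp.read()
with open(SPEC_PATH, "wb") as f:
    f.write(spec)

# datamodel-code-generator produces the Pydantic v2 models
subprocess.run(
    [
        "datamodel-codegen",
        "--input", SPEC_PATH,
        "--input-file-type", "openapi",
        "--output", "amp/models.py",  # placeholder path
        "--output-model-type", "pydantic_v2.BaseModel",
    ],
    check=True,
)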

@fordN fordN force-pushed the ford/dataset-management branch from 36f0af5 to 48e4160 on November 17, 2025 03:01
@fordN fordN merged commit f538843 into main Nov 17, 2025
9 checks passed
@fordN fordN deleted the ford/dataset-management branch November 17, 2025 03:36