Add an admin client for dataset management #17
Conversation
- Add pydantic>=2.0,<2.12 for data models (constrained for PyIceberg)
- Add datamodel-code-generator for model generation from OpenAPI
- Add respx for HTTP mocking in tests
- Add Makefile target for regenerating models
- Generate 838 lines of Pydantic v2 models from OpenAPI spec
- Add test manifest files for dataset registration testing
- Create AdminClient base class with HTTP request handling and error mapping
- Implement DatasetsClient with register/deploy/list/delete operations
- Implement JobsClient with get/list/wait/stop/delete operations
- Implement SchemaClient for SQL validation and schema inference
- Create DeploymentContext for chainable deployment workflows
- Add exception hierarchy with 30+ typed error classes mapped from API codes (see the sketch below)
- Support automatic job polling with configurable timeout
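A minimal sketch of how the error-mapping layer might be structured, assuming httpx as the HTTP library; the class names, error codes, and method signatures below are illustrative assumptions rather than the actual implementation:

```python
# Illustrative sketch only: AmpAdminError, DatasetNotFoundError, ERROR_MAP and
# the error-response shape are assumptions, not the real client code.
import httpx


class AmpAdminError(Exception):
    """Base class for errors returned by the admin API."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


class DatasetNotFoundError(AmpAdminError):
    pass


# Hypothetical mapping from API error codes to typed exceptions.
ERROR_MAP = {"DATASET_NOT_FOUND": DatasetNotFoundError}


class AdminClient:
    """Thin HTTP wrapper that raises typed exceptions on API error responses."""

    def __init__(self, admin_url: str):
        self._http = httpx.Client(base_url=admin_url)

    def _request(self, method: str, path: str, **kwargs) -> dict:
        response = self._http.request(method, path, **kwargs)
        if response.is_error:
            payload = response.json()
            exc_cls = ERROR_MAP.get(payload.get("code"), AmpAdminError)
            raise exc_cls(payload.get("code", "UNKNOWN"), payload.get("message", ""))
        return response.json()
```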
- Add query_url and admin_url parameters to Client (backward compatible with url)
- Add datasets, jobs, schema properties for admin operations
- Extend QueryBuilder with with_dependency() for manifest dependencies
- Add to_manifest() for generating dataset manifests from SQL queries
- Add register_as() for one-line registration returning DeploymentContext
- Support fluent API: query → with_dependency → register_as → deploy
- Maintain backward compatibility (existing Client(url=...) still works; see the sketch below)
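A small usage sketch of the constructor compatibility and the new properties described above; the import path and URLs are placeholders, and any method names beyond those listed above are assumptions:

```python
from amp import Client  # assumed import path

# Existing single-URL construction keeps working (backward compatible).
client = Client(url="http://localhost:8080")

# New style: separate query and admin endpoints (placeholder URLs).
client = Client(
    query_url="http://localhost:8080",
    admin_url="http://localhost:8081",
)

# Admin operations hang off the new properties (assumed method names).
datasets = client.datasets.list()
job = client.jobs.get("job-123")        # hypothetical job id
client.schema.validate_sql("SELECT 1")  # assumed SchemaClient method
```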
- Add 10 unit tests for error mapping and exception hierarchy
- Add 10 unit tests for Pydantic model validation
- Add 10 integration tests for AdminClient HTTP operations
- Add 10 integration tests for DatasetsClient operations
- Add 18 integration tests for JobsClient operations including polling
- All 48 tests use respx for HTTP mocking (no real server required; see the example below)
- 0.65s execution time on dev machine
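For illustration, a respx-backed test in this style might look like the following; the route path and response shape are assumptions, and the request is made directly with httpx to keep the snippet self-contained:

```python
# Minimal respx sketch: mock the admin endpoint, then assert on the parsed
# response. The /datasets route and payload shape are illustrative only.
import httpx
import respx


@respx.mock
def test_list_datasets_returns_names():
    respx.get("http://admin.local/datasets").mock(
        return_value=httpx.Response(200, json={"datasets": [{"name": "eth_firehose"}]})
    )
    response = httpx.get("http://admin.local/datasets")
    names = [d["name"] for d in response.json()["datasets"]]
    assert names == ["eth_firehose"]
```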
- Add admin client to feature list
- Add quick start examples for admin operations
- Add links to admin client guide and API reference
- Update overview to highlight dataset management capabilities
- Add comprehensive admin_client_guide.md with usage patterns and best practices
- Add complete API reference in docs/api/admin_api.md
# Register and deploy a dataset
job = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@1.0.0')
I don't understand what with_dependency is doing here in the context of that query. Does the eth_firehose dataset contain eth.blocks as well as logs and transactions datasets, etc.? Is it possible to automatically detect and populate dependencies based on the SQL, or to construct the SQL query in a way that automatically pulls in the dependency?
In this case the dataset's fully qualified name (FQN) is _/eth_firehose@1.0.0 (<namespace>/<dataset>@<version>), and it's being aliased simply as eth; blocks is a table in that dataset, along with transactions and logs.
Agree that this is probably exposing more of the internals to the user than necessary. It reflects the structure of the manifest, which requires a section listing all dependencies used in the SQL.
I think the dependencies list could be generated from the SQL, but I'm going to let the structure of these datasets and the recommended way to specify dependencies stabilize before making assumptions on the user's behalf. Right now we have a few ways of setting up these derived-dataset dependencies: using the FQN everywhere, specifying an alias like this, or simply using the dataset name without namespace and version (which defaults to the latest). The server-side work for validating and deploying datasets with all of these options is still WIP and stabilizing.
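For illustration, the manifest section being discussed might look roughly like this; the field names are assumptions, since the real layout is defined by the server-side manifest schema:

```python
# Hypothetical shape of the manifest's dependency section that
# with_dependency('eth', '_/eth_firehose@1.0.0') would feed.
manifest_fragment = {
    "dependencies": {
        "eth": "_/eth_firehose@1.0.0",  # alias -> fully qualified name
    }
}
```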
query = query.with_dependency('eth', '_/eth_firehose@1.0.0')
# Generate manifest
manifest = query.to_manifest(
Nice
# Generate manifest
manifest = query.to_manifest(
    table_name='blocks',
    network='mainnet'
Maybe the networks should be typed/enums since there are only a few of them?
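Something along these lines is presumably what's being suggested; the member values are examples, not the actual supported set of networks:

```python
# Sketch of a typed network parameter instead of a free-form string.
from enum import Enum


class Network(str, Enum):
    MAINNET = "mainnet"
    SEPOLIA = "sepolia"  # example member, not necessarily supported


# Usage would then look like (query as in the diff above):
#   manifest = query.to_manifest(table_name="blocks", network=Network.MAINNET)
```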
@@ -0,0 +1,2586 @@
{
Can this spec be generated by the server to avoid drift?
This was generated by the server. I should add a process to keep it in sync with the server's spec, since this copy was brought over manually.
Yeah, that would be a nice-to-have.
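A sync step could be as simple as fetching the spec from the server before regenerating models; a sketch, with a placeholder endpoint and output path:

```python
# Hypothetical sync script: pull the server's OpenAPI spec and overwrite the
# checked-in copy so model regeneration always starts from the server's version.
import httpx

SPEC_URL = "http://localhost:8081/openapi.json"  # placeholder admin endpoint
OUT_PATH = "openapi/admin_api.json"              # placeholder repo location

response = httpx.get(SPEC_URL)
response.raise_for_status()
with open(OUT_PATH, "w") as f:
    f.write(response.text)
```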
This PR adds Python bindings for the Amp Admin API, enabling programmatic dataset registration, deployment, and monitoring.
Key Features
- Admin client for dataset registration, deployment, and job management
- Fluent query → dependency → register → deploy workflow on the existing Client
- Typed exception hierarchy mapped from API error codes
- Pydantic v2 models generated from the server's OpenAPI spec
Quick Example
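A quick example in the spirit of the diff excerpts above; the import path, URLs, and the register_as argument are assumptions, while the query → with_dependency → register_as → deploy chain and the to_manifest call follow the fluent API described in this PR:

```python
from amp import Client  # assumed import path

client = Client(
    query_url="http://localhost:8080",  # placeholder URLs
    admin_url="http://localhost:8081",
)

# Register and deploy a dataset in one fluent chain
job = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency("eth", "_/eth_firehose@1.0.0")
    .register_as("blocks_summary")  # illustrative name; returns a DeploymentContext
    .deploy()
)

# Or generate a manifest without deploying
query = client.query("SELECT block_num, hash FROM eth.blocks")
query = query.with_dependency("eth", "_/eth_firehose@1.0.0")
manifest = query.to_manifest(table_name="blocks", network="mainnet")
```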
Documentation
See the admin client guide (admin_client_guide.md) for usage patterns and best practices, and docs/api/admin_api.md for the complete API reference.