12 changes: 11 additions & 1 deletion Makefile
@@ -1,5 +1,5 @@
SHELL := /bin/bash
.PHONY: test test-unit test-integration test-all clean setup lint format
.PHONY: test test-unit test-integration test-all clean setup lint format generate-models

# Use UV for all commands
PYTHON = uv run --env-file .test.env
@@ -59,10 +59,19 @@ lint:
	@echo "🔍 Linting code..."
	$(PYTHON) ruff check .

lint-fix:
	@echo "🔍 Linting and fixing code..."
	$(PYTHON) ruff check . --fix

format:
	@echo "✨ Formatting code..."
	$(PYTHON) ruff format .

# Generate Pydantic models from OpenAPI spec
generate-models:
	@echo "🏗️ Generating Pydantic models from OpenAPI spec..."
	$(PYTHON) python scripts/generate_models.py
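
The generator script itself isn't part of this diff. For orientation, here is a minimal sketch of what `scripts/generate_models.py` might do, assuming it wraps the `datamodel-code-generator` CLI and that the spec lives at `openapi.yaml` (both assumptions — the real script may differ):

```python
# Hypothetical sketch -- the actual scripts/generate_models.py is not shown in this diff.
# Assumes datamodel-code-generator is a dev dependency and openapi.yaml is the spec path.
import subprocess

def main() -> None:
    # Shell out to the datamodel-codegen CLI to emit Pydantic models from the spec.
    subprocess.run(
        [
            "datamodel-codegen",
            "--input", "openapi.yaml",
            "--input-file-type", "openapi",
            "--output", "amp/models.py",
        ],
        check=True,
    )

if __name__ == "__main__":
    main()
```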

# Setup development environment
setup:
	@echo "🚀 Setting up development environment..."
@@ -115,6 +124,7 @@ clean:
help:
	@echo "Available commands:"
	@echo " make setup - Setup development environment"
	@echo " make generate-models - Generate Pydantic models from OpenAPI spec"
	@echo " make test-unit - Run unit tests (fast)"
	@echo " make test-integration - Run integration tests"
	@echo " make test-parallel-streaming - Run parallel streaming integration tests"
71 changes: 66 additions & 5 deletions README.md
@@ -5,9 +5,16 @@
[![Formatting status](https://github.com/edgeandnode/amp-python/actions/workflows/ruff.yml/badge.svg?event=push)](https://github.com/edgeandnode/amp-python/actions/workflows/ruff.yml)


## Overview

Client for issuing queries to an Amp server and working with the returned data.
Python client for Amp, a high-performance data infrastructure for blockchain data.

**Features:**
- **Query Client**: Issue Flight SQL queries to Amp servers
- **Admin Client**: Manage datasets, deployments, and jobs programmatically
- **Data Loaders**: Zero-copy loading into PostgreSQL, Redis, Snowflake, Delta Lake, Iceberg, and more
- **Parallel Streaming**: High-throughput parallel data ingestion with automatic resume
- **Manifest Generation**: Fluent API for creating and deploying datasets from SQL queries

## Installation

@@ -21,7 +28,57 @@ Client for issuing queries to an Amp server and working with the returned data.
uv venv
```

## Useage
## Quick Start

### Querying Data

```python
from amp import Client

# Connect to Amp server
client = Client(url="grpc://localhost:8815")

# Execute query and convert to pandas
df = client.query("SELECT * FROM eth.blocks LIMIT 10").to_pandas()
print(df)
```

### Admin Operations

```python
from amp import Client

# Connect with admin capabilities
client = Client(
    query_url="grpc://localhost:8815",
    admin_url="http://localhost:8080",
    auth_token="your-token"
)

# Register and deploy a dataset
job = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@1.0.0')

> **Review comment:** I don't understand what `with_dependency` is doing here in the context of that query. Does the `eth_firehose` dataset contain `eth.blocks` as well as logs and transactions datasets, etc.? Is it possible to automatically detect and populate dependencies based on the SQL, or to construct the SQL query in a way that automatically pulls in the dependency?
>
> **Contributor Author:** In this case the dataset's fully qualified name (FQN) is `_/eth_firehose@1.0.0` (`<namespace>/<dataset>@<version>`), and it is being aliased as simply `eth`. `blocks` is a table in the dataset, along with `transactions` and `logs`.
>
> Agreed that this probably exposes more of the internals to the user than necessary. It reflects the structure of the manifest, which requires a section listing all dependencies used in the SQL.
>
> I think the dependencies list could be generated from the SQL, but I'm going to let the structure of these datasets and the recommended way to specify dependencies stabilize before making assumptions for the user. Right now we have a few ways of setting up these derived dataset dependencies: using the FQN in all places, specifying an alias like this, or simply using the dataset name without namespace and version (which defaults to latest). The server-side work for validating and deploying datasets employing all of these options has been WIP and is stabilizing.

    .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
    .deploy(parallelism=4, end_block='latest', wait=True)
)

print(f"Deployment completed: {job.status}")
```
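
The review thread above describes three ways a derived dataset's dependencies can currently be specified. Here is a minimal sketch of the three forms, assuming they all flow through the same fluent API — the aliased form is the one from the example above, while the exact syntax of the other two is an assumption, not a confirmed API:

```python
# Illustrative sketch based on the review discussion above. Only the aliased
# form (1) appears in this PR; forms (2) and (3) are assumptions about syntax.
from amp import Client

client = Client(query_url="grpc://localhost:8815", admin_url="http://localhost:8080")

# 1. Alias the FQN: the SQL refers to the dependency by the alias `eth`.
q_alias = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/eth_firehose@1.0.0')
)

# 2. Hypothetical: use the FQN in all places, including the SQL itself.
q_fqn = (
    client.query('SELECT block_num, hash FROM "_/eth_firehose@1.0.0".blocks')
    .with_dependency('_/eth_firehose@1.0.0', '_/eth_firehose@1.0.0')
)

# 3. Hypothetical: dataset name only, no namespace/version (defaults to latest).
q_latest = (
    client.query("SELECT block_num, hash FROM eth_firehose.blocks")
    .with_dependency('eth_firehose', 'eth_firehose')
)
```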

### Loading Data

```python
# Load query results into PostgreSQL
loader = client.query("SELECT * FROM eth.blocks").load(
    loader_type='postgresql',
    connection='my_pg_connection',
    table_name='eth_blocks'
)
print(f"Loaded {loader.rows_written} rows")
```

## Usage

### Marimo

@@ -30,19 +87,23 @@ Start up a marimo workspace editor
uv run marimo edit
```

The Marimo app will open a new browser tab where you can create a new notebook, view helpful resources, and
browse existing notebooks in the workspace.

### Apps

You can execute Python apps and scripts using `uv run <path>`, which will give them access to the dependencies
and the `amp` package. For example, you can run the `execute_query` app with the following command.
```bash
uv run apps/execute_query.py
```
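
For reference, a minimal script in the style `uv run` expects — the real `apps/execute_query.py` isn't shown in this diff, so this sketch reuses only the Client API from the Quick Start, and the URL and query are assumptions:

```python
# Hypothetical sketch -- the actual apps/execute_query.py is not shown in this diff.
from amp import Client

def main() -> None:
    # Connect to a local Amp server (assumed default Flight SQL endpoint).
    client = Client(url="grpc://localhost:8815")
    # Execute a small query and print the result as a pandas DataFrame.
    df = client.query("SELECT * FROM eth.blocks LIMIT 10").to_pandas()
    print(df)

if __name__ == "__main__":
    main()
```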

## Documentation

### Getting Started
- **[Admin Client Guide](docs/admin_client_guide.md)** - Complete guide for dataset management and deployment
- **[Admin API Reference](docs/api/admin_api.md)** - Full API documentation for admin operations

### Features
- **[Parallel Streaming Usage Guide](docs/parallel_streaming_usage.md)** - User guide for high-throughput parallel data loading
- **[Parallel Streaming Design](docs/parallel_streaming.md)** - Technical design documentation for parallel streaming architecture