Contributor

@fordN fordN commented Oct 6, 2025

Reorg Aware Streaming Support for amp-python

Summary

This PR adds comprehensive blockchain reorganization (reorg) handling and streaming support to the amp-python client library. It enables real-time data streaming with automatic handling of blockchain reorganizations across all supported data loaders.

Key Features

1. Streaming Infrastructure

  • New streaming module with iterator, types, and reorg handling
  • ResponseBatchWithReorg type for streaming data and reorg events
  • BlockRange metadata tracking for multi-network support
  • Streaming iterator with automatic reorg detection
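As a rough illustration only, the types listed above might be shaped like the following. The `network`, `start`, and `end` fields mirror the metadata format described in this PR; the inclusive-range semantics, the `data` field, and the `overlaps` helper are assumptions and may not match the actual definitions in src/amp/streaming/.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class BlockRange:
    """Block range on one network (assumed inclusive on both ends)."""
    network: str
    start: int
    end: int

    def overlaps(self, other: "BlockRange") -> bool:
        # Hypothetical helper: ranges on different networks never overlap.
        return (
            self.network == other.network
            and self.start <= other.end
            and other.start <= self.end
        )


@dataclass
class ResponseBatchWithReorg:
    """Either a data batch or a reorg notification (hypothetical shape)."""
    is_reorg: bool
    data: Any = None
    invalidation_ranges: List[BlockRange] = field(default_factory=list)


a = BlockRange("ethereum", 100, 110)
b = BlockRange("ethereum", 110, 120)
c = BlockRange("polygon", 100, 110)
print(a.overlaps(b), a.overlaps(c))
```

Keeping the network name inside each range is what lets a single batch describe blocks from several chains at once.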

2. Enhanced Client API

  • Enhanced load() method in QueryBuilder with streaming support
  • New stream parameter enables continuous data streaming
  • Support for reorg handling via handle_reorgs parameter
  • Backward compatible - existing batch operations unchanged

3. Universal Reorg Support

  • All 6 data loaders now support blockchain reorganizations:
    • PostgreSQL: Efficient JSONB-based deletion
    • Redis: Sorted set index for O(log N) lookups
    • Snowflake: SQL with FLATTEN for JSON operations
    • Iceberg: Scan and overwrite approach
    • DeltaLake: PyArrow compute masks for filtering
    • LMDB: Two-pass key-based deletion
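The two-pass idea mentioned for LMDB can be sketched with a plain dict standing in for the key-value store. This is not the loader's code: `two_pass_delete`, the key names, and the stored metadata shape are all made up for illustration. The point is only that keys are collected first and deleted afterwards, so the store is never mutated while it is being iterated.

```python
def two_pass_delete(store, should_delete):
    """Delete every entry whose value matches; return how many were removed."""
    # Pass 1: scan and collect matching keys without mutating the store.
    doomed = [key for key, value in store.items() if should_delete(value)]
    # Pass 2: delete the collected keys.
    for key in doomed:
        del store[key]
    return len(doomed)


# Hypothetical store: key -> block range metadata for that row.
store = {
    b"block:100": {"network": "ethereum", "start": 100, "end": 110},
    b"block:111": {"network": "ethereum", "start": 111, "end": 120},
    b"block:121": {"network": "ethereum", "start": 121, "end": 130},
}
removed = two_pass_delete(store, lambda meta: meta["end"] >= 115)
print(removed, sorted(store))
```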

4. Metadata Architecture

  • Standardized _meta_block_ranges column across all loaders
  • JSON format: [{"network": "ethereum", "start": 100, "end": 110}]
  • Supports multi-network scenarios (bridges, cross-chain DEX, etc.)
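A minimal sketch of how a value in this format could be checked against a reorg, assuming a row counts as invalidated when any of its ranges on the affected network overlaps the reorged blocks. The function name and the overlap rule are assumptions for illustration; each loader implements its own check.

```python
import json


def row_is_invalidated(meta_json, network, reorg_start, reorg_end):
    """Return True if any stored block range touches the reorged blocks."""
    for block_range in json.loads(meta_json):
        if block_range["network"] != network:
            continue  # ranges on other networks are unaffected
        if block_range["start"] <= reorg_end and block_range["end"] >= reorg_start:
            return True
    return False


meta = '[{"network": "ethereum", "start": 100, "end": 110}]'
print(row_is_invalidated(meta, "ethereum", 105, 120))
print(row_is_invalidated(meta, "ethereum", 111, 120))
print(row_is_invalidated(meta, "polygon", 105, 120))
```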

Changes by Component

Core Library (552 lines added)

  • src/amp/client.py: Enhanced QueryBuilder.load() with streaming support
  • src/amp/loaders/base.py: Added load_stream_continuous() and _handle_reorg()
  • src/amp/streaming/: New module with iterator, types, and reorg handling
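To make the control flow concrete, here is a toy in-memory loader showing how a continuous load loop could dispatch between data batches and reorg events. The method names `load_stream_continuous()` and `_handle_reorg()` and the result fields come from this PR, but their signatures, the `(kind, payload)` batch encoding, and the `InMemoryLoader` class are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class LoadResult:
    rows_loaded: int
    is_reorg: bool = False
    invalidation_ranges: List[Tuple[str, int, int]] = field(default_factory=list)


class InMemoryLoader:
    """Toy loader; each row is a (network, block_number) pair."""

    def __init__(self):
        self.rows = []

    def _handle_reorg(self, ranges):
        # Drop rows whose block falls inside any invalidated range.
        self.rows = [
            (net, blk)
            for net, blk in self.rows
            if not any(net == n and s <= blk <= e for n, s, e in ranges)
        ]

    def load_stream_continuous(self, batches) -> Iterator[LoadResult]:
        for kind, payload in batches:
            if kind == "reorg":
                self._handle_reorg(payload)
                yield LoadResult(0, is_reorg=True, invalidation_ranges=list(payload))
            else:
                self.rows.extend(payload)
                yield LoadResult(len(payload))


loader = InMemoryLoader()
stream = [
    ("data", [("ethereum", 100), ("ethereum", 101)]),
    ("reorg", [("ethereum", 101, 101)]),  # block 101 was reorged out
    ("data", [("ethereum", 101)]),        # replacement block arrives
]
results = list(loader.load_stream_continuous(stream))
print([r.rows_loaded for r in results], loader.rows)
```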

Data Loaders (740 lines added)

  • PostgreSQL: Single SQL DELETE with JSONB operations
  • Redis: Leverages sorted set indexes for efficient deletion
  • Snowflake: Native JSON SQL functions
  • Iceberg: Full scan with filtered overwrite
  • DeltaLake: Arrow compute operations
  • LMDB: Key iteration and batch deletion

Testing (2,063 lines added)

  • 110 new integration tests covering all loaders
  • Comprehensive unit tests for streaming types
  • Test scenarios: empty tables, missing metadata, multi-network, overlapping ranges

Documentation (355 lines added)

  • Detailed reorg handling guide for each loader
  • Implementation details and code examples

Usage Example

# Batch mode (existing)
result = client.query("SELECT * FROM blocks").load(postgres_loader, "blocks")

# Streaming mode (new)
for result in client.query("SELECT * FROM blocks").load(
    postgres_loader, 
    "blocks", 
    stream=True,
    handle_reorgs=True
):
    print(f"Loaded {result.rows_loaded} rows")
    if result.is_reorg:
        print(f"Handled reorg: {result.invalidation_ranges}")

Commits

ef17ef7 streaming: Base streaming implementations
c58b9fc loaders/base: Add streaming concepts to DataLoader class
5a23c12 client: Add streaming support to QueryBuilder and Client
cc5730f tests/unit: Add unit tests for reorg handling
d157e9d postgresql loader: Add reorg aware streaming support
876ade4 tests/performance: Fix config fixture usage in Redis loader tests
f1fea01 redis loader: Add reorg aware streaming support
073ada5 iceberg loader: Add reorg aware streaming support
21fd0ed deltalake loader: Add reorg aware streaming support
ea045c8 lmdb loader: Add reorg aware streaming support
a76fa1b snowflake loader: Add reorg aware streaming support
9c5aec4 docs: Summarizing reorg handling approach for each loader

fordN added 4 commits October 3, 2025 16:30
- Classes: StreamingResultIterator, ReorgAwareStream, BlockRange,
BatchMetadata, ResponseBatch, ResponseBatchType, ResponseBatchWithReorg,
ResumeWatermark
- Not yet wired up with user callable functions
- Add load_stream_continuous() for streaming with reorg handling
- Enhance LoadResult to be prepared to handle reorgs
- Add query_and_load_streaming() to Client
- Use query_and_load_streaming() in QueryBuilder.load() if stream=True
- Add test_reorg_result_string_representation
- Enhance test_all_loaders_implement_required_methods to check
whether each data loader has a real implementation
- Remove now redundant test test_create_table_from_schema_not_just_pass
@fordN fordN self-assigned this Oct 6, 2025
@fordN fordN added the enhancement New feature or request label Oct 6, 2025
@fordN fordN changed the title Ford/streaming Reorg aware streaming support Oct 14, 2025
Comment on lines +536 to +539
    iceberg_table = self._catalog.load_table(table_identifier)
except NoSuchTableError:
    self.logger.warning(f"Table '{table_identifier}' does not exist, skipping reorg handling")
    return
Member

Wouldn't a full table scan be problematic for very large datasets?

Comment on lines +715 to +717
row_mask = pa.array([j == i for j in range(current_table.num_rows)])
keep_mask = pa.compute.and_(keep_mask, pa.compute.invert(row_mask))
break
Member

This is very inefficient: it creates huge arrays unnecessarily. A simple boolean flag list would work here instead.
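A sketch of what the flag-list approach could look like, assuming each row's block range is available as a `(start, end)` pair and that a row is dropped when it overlaps the reorged blocks. The function name and inputs are hypothetical, not taken from the loader. One Python list is filled in a single pass instead of allocating a full-length array per matching row.

```python
def build_keep_flags(row_ranges, reorg_start, reorg_end):
    """Return one flag per row: True to keep, False to delete."""
    keep = [True] * len(row_ranges)
    for i, (start, end) in enumerate(row_ranges):
        # Mark the row for deletion if its range overlaps the reorged blocks.
        if start <= reorg_end and end >= reorg_start:
            keep[i] = False
    return keep


rows = [(100, 110), (111, 120), (121, 130)]
flags = build_keep_flags(rows, reorg_start=115, reorg_end=125)
print(flags)
```

The resulting list could then be converted once with `pa.array(flags, type=pa.bool_())` and passed to `Table.filter`, so only a single mask array is ever built.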

Member

@incrypto32 incrypto32 left a comment

LGTM apart from the full table scans in the Iceberg and DeltaLake loaders and the deletion logic in the DeltaLake loader.

I'll add the commits to fix that.

fordN added 2 commits October 22, 2025 08:21
Signed-off-by: Ford <ford@edgeandnode.com>
Contributor Author

fordN commented Oct 22, 2025

I'm going to go ahead and merge this and consider the specific loader implementations to be in beta (needing optimization and hardening).

@fordN fordN merged commit 9949095 into main Oct 22, 2025
7 checks passed
@fordN fordN deleted the ford/streaming branch October 22, 2025 17:30

3 participants