IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables through REST catalogs, with all processing executed locally.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

Features

DataFrame API: Familiar interface for working with tables
Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
Catalog Support: Works with REST catalogs (such as Dremio and Tabular) and supports credential vending
CRUD Operations: Create, read, update, and delete tables and data
Maintenance: Expire snapshots, remove orphan files, compact data files
Export: Export data to Parquet, CSV, and JSON

Documentation

Getting Started

Data Ingestion

Native File Ingestion (CSV, JSON, Parquet, ORC, Avro)
Optional File Ingestion (Excel, Delta, Google Sheets)
Advanced File Ingestion (SQL, XML, SAS/SPSS)
API Ingestion
HuggingFace Ingestion
HTML Ingestion
Clipboard Ingestion
Folder Ingestion
Bulk Ingestion
Incremental Ingestion

Querying & Processing

Table Management

Maintenance & Quality

Advanced Features

Recipes

Installation

pip install iceframe

For cloud storage support:

pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage

Quick Start

Create a .env file with your catalog credentials (see .env.example):

ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
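
To make the role of these variables concrete, here is a minimal sketch of how they could be turned into a catalog configuration dictionary. It is not the implementation of `load_catalog_config_from_env`; the helper names `parse_env_file` and `build_catalog_config`, and the mapping to the property names `uri`, `token`, `warehouse`, and `type`, are assumptions made for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed mapping from the .env variable names to generic catalog property names.
ENV_TO_PROPERTY = {
    "ICEBERG_CATALOG_URI": "uri",
    "ICEBERG_TOKEN": "token",
    "ICEBERG_WAREHOUSE": "warehouse",
    "ICEBERG_CATALOG_TYPE": "type",
}


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_catalog_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a config dict from environment-style variables (hypothetical helper)."""
    env = os.environ if env is None else env
    if not env.get("ICEBERG_CATALOG_URI"):
        raise KeyError("ICEBERG_CATALOG_URI is required")
    config = {
        prop: env[name] for name, prop in ENV_TO_PROPERTY.items() if env.get(name)
    }
    # Fall back to a REST catalog when no type is given (assumed default).
    config.setdefault("type", "rest")
    return config
```

Calling `build_catalog_config(parse_env_file(open(".env").read()))` would yield a plain dictionary, which keeps credentials out of source code and makes the configuration easy to inspect before connecting.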

Use IceFrame in your code:

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
import polars as pl

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp"
}
ice.create_table("my_table", schema)

# Append data (plain Python datetimes; pl.datetime() builds an expression, not a value)
from datetime import datetime

data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)]
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)

Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding high-level abstractions and features that PyIceberg does not provide natively.

| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
| --- | --- | --- |
| Table CRUD | Low-level API | Simplified `create_table`, `drop_table` |
| Data Writing | Arrow/Pandas integration | Polars integration, Auto-schema inference |
| Branching | Basic support (WIP) | `create_branch`, `fast_forward`, WAP Pattern |
| Compaction | `rewrite_data_files` (limited) | `bin_pack`, `sort` strategies (Polars-based) |
| Views | Catalog-dependent | Unified `ViewManager` abstraction |
| Maintenance | `expire_snapshots` | `GarbageCollector`, Native `remove_orphan_files` |
| SQL Support | None | Fluent Query Builder (`select`, `filter`, `join`) |
| Ingestion | `add_files` | `add_files` wrapper + Incremental Ingestion recipes |
| Rollback | `manage_snapshots` | `rollback_to_snapshot`, `rollback_to_timestamp` |
| Async | None | `AsyncIceFrame` for non-blocking I/O |
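
The `bin_pack` compaction strategy listed above groups small data files so they can be rewritten as fewer, larger ones. The following is a rough sketch of that general idea using first-fit decreasing bin packing. It is not IceFrame's implementation: the function `plan_bin_pack`, its parameters, and the choice to skip files already at or above the target size are all assumptions for illustration.

```python
from typing import Dict, List


def plan_bin_pack(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group small files into bins whose combined size is at most target_bytes.

    Hypothetical planner: returns only the groups that contain more than one
    file, since rewriting a single file on its own would gain nothing.
    """
    bins: List[List[str]] = []
    totals: List[int] = []
    # First-fit decreasing: place the largest files first, ties broken by path.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size >= target_bytes:
            continue  # already at or above the target size; leave it untouched
        for i, total in enumerate(totals):
            if total + size <= target_bytes:
                bins[i].append(path)
                totals[i] += size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([path])
            totals.append(size)
    return [group for group in bins if len(group) > 1]
```

Each returned group would then be read, concatenated, and written back as one file. A `sort` strategy would differ mainly in ordering rows by chosen columns before writing.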
