Read and write Alteryx YXDB files from Python and C++.
Built on Alteryx's official open-source YXDB implementation, released under GPLv3. This fork modernizes the build system, fixes bugs, adds cross-platform support, and wraps the C++ core in a Python package with first-class PyArrow, Pandas, and Polars integration.
- Cross-platform support -- builds and runs on Linux, macOS, and Windows (the original targeted Windows only)
- Python bindings via nanobind, published to PyPI as
openyxdb - PyArrow, Pandas, Polars, and DuckDB read/write integration with automatic type mapping
- Bug fixes -- block index was never written to disk (broke files with more than 65,536 records), reference-counting crash on GCC
- Modern CMake (3.20+, target-based), pixi for dependency management, Catch2 test suite, GitHub Actions CI
YXDB is the native binary format used by Alteryx Designer. Two on-disk layouts are in active use: the original format used by the classic engine and a newer columnar-friendly variant emitted by the AMP engine.
This library reads both layouts automatically -- the file's magic bytes are
sniffed at open time and the appropriate decoder is selected. New files written
through openyxdb.Writer use the original format.
pip install openyxdbimport openyxdb
# Read to PyArrow, Pandas, or Polars
table = openyxdb.to_pyarrow("data.yxdb")
df = openyxdb.to_pandas("data.yxdb")
df = openyxdb.to_polars("data.yxdb")
# Write from any of them
openyxdb.from_polars(df, "output.yxdb")
openyxdb.from_pandas(df, "output.yxdb")
openyxdb.from_pyarrow(table, "output.yxdb")All 17 YXDB field types are supported: Bool, Byte, Int16, Int32, Int64, FixedDecimal, Float, Double, String, WString, V_String, V_WString, Date, Time, DateTime, Blob, and SpatialObj.
Importing openyxdb automatically monkey-patches polars with YXDB support (no-op if polars is not installed). No extra setup is required.
import polars as pl
import openyxdb # registers everything on import
# Eager read — returns a DataFrame
df = pl.read_yxdb("data.yxdb")
# Lazy scan — returns a LazyFrame with projection & predicate pushdown
lf = pl.scan_yxdb("data.yxdb")
df = lf.select("col_a", "col_b").filter(pl.col("col_a") > 10).collect()# Write a DataFrame directly
df.yxdb.write("output.yxdb")
# Stream a LazyFrame through the streaming engine and write in chunks
lf.yxdb.sink("output.yxdb", chunk_size=65_536)lf.yxdb.sink() (and the top-level openyxdb.sink_yxdb(lf, path)) execute the
lazy plan on Polars' streaming engine and write the result to the YXDB file in
chunks, so the writer never buffers more than chunk_size rows in Python at a
time. Polars 1.x does not expose a public plugin API for custom sink_*
formats, so the query result still passes through a single collect() call
before being chunk-written; true per-batch push sinks require upstream Polars
support.
scan_yxdb implements pushdown against the underlying reader:
| Optimization | Supported | Notes |
|---|---|---|
| Projection pushdown | Yes | Only requested columns are decoded from disk. |
| Row-limit pushdown | Yes | head(n) / fetch(n) stops decoding once n rows produced. |
| Batched streaming | Yes | Default 65 536 rows; honours Polars' batch_size hint. |
| Predicate pushdown | Partial | Predicates evaluate per batch after decode. YXDB has no per-block statistics, so genuine file-level skipping isn't possible; combined with head predicates still short-circuit once enough rows are collected. |
If you import openyxdb after polars is already loaded, everything is registered automatically. To re-register or verify:
openyxdb.register_polars()import duckdb, openyxdb
con = duckdb.connect()
# Register a YXDB file as a SQL view
openyxdb.register_duckdb(con, "yx", "data.yxdb")
con.execute("SELECT COUNT(*) FROM yx WHERE score > 90").fetchone()
# Or get a relation directly
rel = openyxdb.to_duckdb("data.yxdb", con)
# Write the result of a DuckDB query to YXDB
openyxdb.from_duckdb(
"SELECT id, score FROM yx WHERE score > 90",
"top_scores.yxdb",
con=con,
)DuckDB support is provided through Arrow interop: YXDB files are loaded into an
Arrow table and registered on the connection. A native DuckDB table function
(SELECT * FROM read_yxdb('file.yxdb')) would require a C++ extension, which is
out of scope for this package.
pixi install
pixi run build
pixi run test- Spatial indexes are skipped on read and not created on write.
- Spatial objects are accessible only as raw blobs (SHP-encoded binary).
- Little-endian architectures only (x86, ARM).
The reader has been validated against 1,012 real-world E1 YXDB files sourced from the community corpus at Sigilweaver/YXDB-Sources, covering a wide range of field types, encodings, record counts (0 to 200k+), and filenames including non-ASCII characters — 100% pass rate.
The 1.3.0 streaming sink (sink_yxdb) and DuckDB round-trip paths are exercised against a sampled 25-file subset of the corpus in addition to the per-feature unit tests. Full Python suite: 1,040 passed.
GPLv3 -- see LICENSE.