## Motivation
Buckaroo currently uses Parquet as the binary transport between Python and JS:
- Python: `fastparquet`/`pyarrow` → Parquet bytes
- JS: `hyparquet` (~10 KB) decodes in the browser
This works, but fastparquet only operates on pandas DataFrames and relies on pandas internal APIs (`pandas._libs.tslibs`, `pandas._libs.json`). This is the #1 blocker for making pandas an optional dependency (#533-adjacent).
Switching the transport format to Arrow IPC with flechette as the JS decoder would:
- Make the serialization layer backend-agnostic (pyarrow works with `pa.Table` directly; no pandas needed)
- Drop `fastparquet` from core dependencies
- Enable zero-copy typed arrays on the JS side
- Potentially improve deserialization performance (7-11x faster row extraction vs apache-arrow JS)
## Research summary
Full writeup: `docs/flechette-arrow-ipc-research.md`
### flechette (`@uwdata/flechette`)
Built by the UW Interactive Data Lab (Heer et al., the D3/Vega/Mosaic group). Used in production by Mosaic, Arquero v7, and vega-loader-arrow.
| | hyparquet (current) | flechette | apache-arrow JS |
|---|---|---|---|
| Size (min+gz) | 10 KB | 14 KB | 43 KB |
| Dependencies | 0 | 0 | Multiple |
| Reads Parquet | Yes | No | No |
| Reads Arrow IPC | No | Yes | Yes |
| Zero-copy | No | Yes | Yes |
| Tree-shaking | Works | Works | Broken |
| Row extraction speed | Baseline | 7-11x vs apache-arrow | Baseline |
apache-arrow JS is not recommended: 3x the bundle size, tree-shaking that has been broken for years (open JIRA issue), and it still can't read Parquet.
## What the Python side looks like
pyarrow is already a core dependency. Both pandas and polars produce pyarrow Tables:
```python
import pyarrow as pa
import pyarrow.ipc as ipc

# From pandas
table = pa.Table.from_pandas(df)

# From polars
table = polars_df.to_arrow()

# Serialize to IPC bytes
sink = pa.BufferOutputStream()
writer = ipc.new_stream(sink, table.schema)
writer.write_table(table)
writer.close()
raw_bytes = sink.getvalue().to_pybytes()
```

This replaces `to_parquet()` (fastparquet, pandas-only) with a backend-agnostic path.
## What the JS side looks like
```js
import { tableFromIPC } from '@uwdata/flechette';

// Infinite scroll (binary buffer from model.send)
const scrollTable = tableFromIPC(buffer);
const scrollRows = scrollTable.toArray(); // [{col1: val, col2: val}, ...]

// Summary stats (base64 payload)
const statsTable = tableFromIPC(b64ToArrayBuffer(payload.data));
const statsRows = statsTable.toArray();
```

### Mixed-type columns (summary stats)
Current approach: pre-JSON-encode each cell into a string column, then `JSON.parse` each value on the JS side. The same strategy works identically with Arrow IPC.
## Proposed plan
### Phase 1 (minimal, unblocks optional pandas)
Swap the Python serialization from fastparquet to pyarrow. Keep hyparquet on the JS side: pyarrow-produced Parquet is readable by hyparquet, so this requires zero JS changes.
### Phase 2 (performance + clean architecture)
Switch transport from Parquet to Arrow IPC. Replace hyparquet with flechette (+4 KB bundle). Enables zero-copy typed arrays and faster deserialization.
## Bundle impact
+4 KB gzipped (Phase 2 only). Phase 1 has zero JS impact.
## Files affected
Python (both phases):
- `buckaroo/serialization_utils.py`: main serialization logic
- `buckaroo/buckaroo_widget.py`: calls `to_parquet()` for infinite scroll
- `pyproject.toml`: move fastparquet to the `[pandas]` extra
JS (Phase 2 only):
- `packages/buckaroo-js-core/src/components/DFViewerParts/resolveDFData.ts`
- `packages/js/widget.tsx`: buffer handling in infinite scroll
- `packages/buckaroo-js-core/package.json`: swap hyparquet → flechette