Summary
Add a new plugin-transform-arrow sub-module implementing aq — a tool that applies jq-style filter expressions to columnar data files (Parquet, Arrow IPC, CSV, NDJSON).
What aq does
aq is "jq for Apache Arrow". Its pipeline:
- Read — auto-detect and read Parquet, Arrow IPC, CSV, or NDJSON into Arrow RecordBatch objects
- Serialize — convert each row to a JSON object (via NDJSON intermediate)
- Filter/transform — apply a jq expression row-by-row (or in slurp mode over all rows)
- Output — write results as NDJSON, JSON, CSV, TSV, or Arrow IPC
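The filter stage above runs in one of two modes: row-by-row (jq's default) or slurp (all rows as a single array, like jq's -s flag). A minimal stdlib-only Java sketch of the two modes, with rows as plain Maps and a Java function standing in for the jq expression (all names here are illustrative, not part of aq):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of aq's two evaluation modes. A real implementation would read
// rows out of Arrow RecordBatches and evaluate genuine jq expressions;
// here a plain Java function stands in for the compiled filter.
public class PipelineSketch {
    // Row-wise: the filter sees one row at a time (jq's default mode).
    static List<Object> applyRowWise(List<Map<String, Object>> rows,
                                     Function<Map<String, Object>, Object> filter) {
        return rows.stream().map(filter).collect(Collectors.toList());
    }

    // Slurp: the filter sees all rows as one array (jq's -s flag).
    static Object applySlurp(List<Map<String, Object>> rows,
                             Function<List<Map<String, Object>>, Object> filter) {
        return filter.apply(rows);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("name", "a", "qty", 1),
                Map.of("name", "b", "qty", 2));

        // Stand-in for the jq expression `.name`, applied per row.
        System.out.println(applyRowWise(rows, r -> r.get("name")));

        // Stand-in for slurp-mode `length` over the whole input.
        System.out.println(applySlurp(rows, List::size));
    }
}
```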
Why plugin-transform, not plugin-serdes
plugin-serdes is purely a format converter (file ↔ Ion records) with no query or filter logic. aq's core value is the jq query engine applied to columnar data, which belongs in plugin-transform alongside the existing plugin-transform-json (JSONata) and plugin-transform-records (SQL-like filter/map/aggregate).
Why plugin-transform-arrow, not plugin-transform-jq
The name should reflect the data format, not the query language — consistent with how plugin-transform-json is named after JSON, not after JSONata. The differentiator here is Arrow/Parquet input support; jq is just the query mechanism. It also leaves room to add non-jq tasks later (schema inspection, format conversion within the Arrow ecosystem, etc.).
Proposed tasks
| Task | Description |
| --- | --- |
| Query | Apply a jq expression to an Arrow/Parquet/CSV/NDJSON file, output as Ion records or file |
Mirrors the Transform / TransformItems pattern from plugin-transform-json.
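A hypothetical configuration for the Query task, mirroring that pattern; every field name below is illustrative, not a settled interface:

```yaml
# Hypothetical Query task configuration — field names are illustrative only.
- task: arrow.Query
  file: data/orders.parquet      # Parquet, Arrow IPC, CSV, or NDJSON (auto-detected)
  expression: 'select(.qty > 10) | {sku, qty}'
  slurp: false                   # true = apply the expression to all rows as one array
  output:
    format: ndjson               # ndjson | json | csv | tsv | arrow-ipc
```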
Java implementation
The Rust-to-Java mapping is straightforward:
| aq dependency (Rust) | Java equivalent |
| --- | --- |
| parquet crate | org.apache.parquet:parquet-arrow |
| arrow (IPC, CSV, JSON reader/writer) | org.apache.arrow:arrow-vector, arrow-dataset, arrow-ipc |
| Row → NDJSON serialization | Arrow Java JSON writer (ArrowToJson) |
| jaq-core (pure-Rust jq engine) | jackson-jq — best pure-Java jq implementation |
| serde_json | Jackson Databind |
Note on jackson-jq: it doesn't implement the full jq spec (missing $ENV, input/inputs, some path builtins). This is the same tradeoff aq itself makes (it uses jaq, not the reference jq), so it's acceptable for the target use cases.
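For reference, a minimal jackson-jq usage sketch in the shape the Query task would need (compile the expression once, then apply it per row). This assumes jackson-jq 1.x on the classpath; the builtin-loader entry point shown here follows its published examples:

```java
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import net.thisptr.jackson.jq.BuiltinFunctionLoader;
import net.thisptr.jackson.jq.JsonQuery;
import net.thisptr.jackson.jq.Scope;
import net.thisptr.jackson.jq.Versions;

public class JqExample {
    /** Compile a jq expression and apply it to one JSON row, returning the first result. */
    static JsonNode applyOne(String expr, String json) throws Exception {
        // Root scope with the jq builtins loaded (jackson-jq 1.x API).
        Scope root = Scope.newEmptyScope();
        BuiltinFunctionLoader.get().loadFunctions(Versions.JQ_1_6, root);

        JsonQuery query = JsonQuery.compile(expr, Versions.JQ_1_6);
        List<JsonNode> out = new ArrayList<>();
        query.apply(root, new ObjectMapper().readTree(json), out::add);
        return out.get(0);
    }

    public static void main(String[] args) throws Exception {
        // Same pattern as the Query task: compile once, apply per row.
        System.out.println(applyOne("{n: .name, total: (.qty * .price)}",
                "{\"name\":\"widget\",\"qty\":3,\"price\":2.5}"));
    }
}
```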
References
- aq repository
- plugin-transform-json — existing sibling sub-module using JSONata
- jackson-jq — Java jq engine