Add ORC and Lance as additional columnar outputs alongside the existing Parquet (canonical) and Vortex (optional) artifacts.
Per format
- New convert stage (or a generalised convert stage that switches on format).
- Extend `sources.json`: per-format `convert.<fmt>` + `convert.<fmt>_skip_reason`, mirroring the Vortex pair.
- Update `validate_manifest` invariants.
- Outputs at `outputs/v{n}/<slug>/<fmt>/<slug>.<ext>`.
- Regen `docs/datasets.md` + `docs/snapshot.json`.
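The path and manifest conventions above could be checked roughly as follows. This is a minimal sketch, not the project's actual `validate_manifest`: the shape of the `convert` entry (a truthy `<fmt>` key plus a string `<fmt>_skip_reason`), the "exactly one of the two" rule, and the file extensions in `EXTENSIONS` are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical extension map; the real extensions are not specified in this issue.
EXTENSIONS = {"orc": "orc", "lance": "lance"}


def output_path(version: int, slug: str, fmt: str) -> PurePosixPath:
    """Build outputs/v{n}/<slug>/<fmt>/<slug>.<ext> for one format."""
    ext = EXTENSIONS[fmt]
    return PurePosixPath("outputs") / f"v{version}" / slug / fmt / f"{slug}.{ext}"


def check_convert_entry(convert: dict, fmt: str) -> list[str]:
    """Return invariant violations for one format in a source's convert entry.

    Assumed invariant: a format is either converted or carries a
    non-empty skip reason, never both and never neither.
    """
    enabled = bool(convert.get(fmt))
    reason = convert.get(f"{fmt}_skip_reason")
    errors = []
    if enabled and reason:
        errors.append(f"convert.{fmt} is set but convert.{fmt}_skip_reason is also given")
    if not enabled and not reason:
        errors.append(f"convert.{fmt} is off but convert.{fmt}_skip_reason is missing")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a validator report every broken source in one pass; whether the real invariants work that way is not stated here.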
Formats in scope
- ORC: Apache ORC. Recent PyArrow releases can write ORC as well as read it (`pyarrow.orc.write_table`), so an external writer (e.g. `pyorc`, or `orc-tools`) should only be needed as a fallback.
- Lance: Lance v2 via `pylance`.
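A generalised convert stage that switches on format might look like the sketch below. It assumes the canonical Parquet file is the input, a PyArrow release that ships `pyarrow.orc.write_table`, and the `pylance` package (imported as `lance`). The function names and the `WRITERS` registry are hypothetical, and the sketch does not pin the Lance file format version.

```python
from pathlib import Path


def _write_orc(parquet_path: Path, out_path: Path) -> None:
    # Imported lazily so this module loads even when pyarrow is not installed.
    import pyarrow.orc as orc
    import pyarrow.parquet as pq

    orc.write_table(pq.read_table(parquet_path), str(out_path))


def _write_lance(parquet_path: Path, out_path: Path) -> None:
    # The pylance package is imported as `lance`; a Lance dataset is a directory.
    import lance
    import pyarrow.parquet as pq

    lance.write_dataset(pq.read_table(parquet_path), str(out_path))


# Hypothetical registry: one writer per additional output format.
WRITERS = {"orc": _write_orc, "lance": _write_lance}


def convert(parquet_path: Path, out_path: Path, fmt: str) -> None:
    """Convert the canonical Parquet artifact into the requested format."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    out_path.parent.mkdir(parents=True, exist_ok=True)
    writer(parquet_path, out_path)
```

Keeping the writers in a dict means adding another format is one function plus one registry entry, and the lazy imports keep the optional dependencies optional.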