Add parquet for scatter plots in CLI #79
|
Hi @Psy-Fer, this is ready for review! Per our discussion in #78, this initial implementation is:
I think parquet support is very nice for any modern data library. As a user, I try to stay in parquet as much as possible because it's 1) typed, 2) faster/more efficient, and 3) more compact. I think it's much more important than Rust Polars support, and is a more effective bridge to other tools, such as DuckDB. It can also serve as a future jumping-off-point to support in-memory Arrow data. Please review the code and share any comments. If you're happy with this implementation and desire this feature, it'd be great to merge this then discuss how to add parquet support for the rest of the CLI. |
|
This is great! I'll go through this in the coming week. Cheers mate, |
|
I'll fix up the clippy noise after the merge. nothing major there. |
|
Great, I'm glad this works! I'm curious to see how you decide to extend it to the rest of the library. |
|
Hey @Psy-Fer , I imagine you're busy w the rest of your life right now -- anything I can do to help out with Kuva more? Would you appreciate a design/suggestion for how to implement parquet support across all plots? Feel free to drop any loose threads of thinking in here if so. |
|
Hey, Yea getting back to kuva this week. Things have been a bit crazy lately. This is high on my list to do. Cheers, |
- Drop dep:arrow (removes ~60 transitive crates)
- Move parquet reading into DataTable::parse() so all subcommands benefit automatically; delete parquet.rs and InputType enum
- Simplify scatter.rs back to single DataTable code path
- Fix docs typo in scatter.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
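For readers following along, here is a minimal sketch of how a single parse entry point could tell parquet input from delimited text, so that every subcommand benefits without its own code path. It is an illustration, not the code in this commit: `Format` and `sniff_format` are hypothetical names, and detecting by content rather than by file extension is an assumption. The one fact it relies on is that parquet files begin (and end) with the 4-byte magic `PAR1`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Input formats this sketch distinguishes (hypothetical; not kuva's own types).
#[derive(Debug, PartialEq, Eq)]
pub enum Format {
    Parquet,
    Delimited,
}

/// Parquet files begin (and end) with the 4-byte magic `PAR1`.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Peek at the first four bytes, then rewind so the caller can hand the
/// same reader to whichever parser is chosen.
pub fn sniff_format<R: Read + Seek>(reader: &mut R) -> io::Result<Format> {
    let mut head = [0u8; 4];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until we have
    // four bytes or hit end of input.
    while filled < head.len() {
        let n = reader.read(&mut head[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    reader.seek(SeekFrom::Start(0))?;

    if filled == head.len() && &head == PARQUET_MAGIC {
        Ok(Format::Parquet)
    } else {
        // Anything else (including inputs shorter than four bytes) is
        // treated as delimited text in this sketch.
        Ok(Format::Delimited)
    }
}
```

A caller would run `sniff_format` once on the opened file and then dispatch to the parquet reader or the existing delimited parser. Note that this needs a seekable reader, so piped stdin would have to be buffered first.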
|
Okay, I have some strong opinions around including arrow. I have come up with a way to just bring in parquet, and use its row API to process the data. Even on 2 rows with 10M datapoints, it's still crazy fast. If this is an issue in the future, we can bring in some of arrow for that specifically, but I'd rather it be a complaint driven request when we can get around most of it with some simple type handling. and moving the parsing to data.rs, so it globally applies to all plots. Your scatter scoped example really did help in thinking about how to handle this properly. a few other clean ups, conflicts, clippy, etc. Anyway, it might be worth you testing this PR on your parquet files before I merge it, in case I made a dumb mistake along the way. Cheers, |
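To make "simple type handling" concrete, here is a rough sketch of the idea as I read it: each typed value from a row-oriented reader is rendered as the string a delimited file would have contained, so the downstream table code stays unchanged. Everything here is hypothetical. `Cell` is a local stand-in for the reader's typed field, and string-valued table cells are an assumption about the table, not something taken from the PR.

```rust
/// Stand-in for one typed value read from a parquet row (hypothetical;
/// a real reader would expose its own field type with more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

/// Render a typed cell as the text a delimited file would have held.
/// Nulls become the empty string, mirroring an empty CSV field.
pub fn cell_to_string(cell: &Cell) -> String {
    match cell {
        Cell::Null => String::new(),
        Cell::Bool(b) => b.to_string(),
        Cell::Int(i) => i.to_string(),
        Cell::Float(f) => f.to_string(),
        Cell::Text(s) => s.clone(),
    }
}

/// Convert typed rows into the string rows a CSV-style table expects.
pub fn rows_to_table(rows: &[Vec<Cell>]) -> Vec<Vec<String>> {
    rows.iter()
        .map(|row| row.iter().map(cell_to_string).collect())
        .collect()
}
```

The trade-off is that numbers make a round trip through strings. That keeps a single code path for every plot, at some cost in speed compared with handing typed columns straight to the plotting code.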
|
I love raw numbers. thanks for doing this benchmark. I'll have a think about it and have another crack at it tomorrow if I get some time, otherwise, next week :) Cheers, |

Description
What?
Adds support for generating scatter plots from CLI from parquet source data, at feature parity with CSV/TSV/{D}SV. Gated behind --feature parquet.
Why?
Parquet is the modern standard for data storage, especially large data. It is typed, stored in columns, and supports predicate pushdown + column selection, which is much more efficient than string-stored/string-delimited file formats.
CLI users may expect, or at least appreciate, the ability to generate charts directly from parquet, without having to downcast to a weaker format like CSV first.
Type of change
New feature / API addition (CLI only)
Checklist
Library (new plot type)
N/A
Tests
- tests/ with ≥ basic render + SVG content + legend tests (added to tests/cli_basic.rs)
- cargo test --features cli,full: all existing tests still pass
CLI (if applicable)
- src/bin/kuva/<name>.rs: Args struct (with /// doc comment) + run() = N/A no new command
- src/bin/kuva/main.rs: module, Commands variant, match arm = N/A no new command
- scripts/smoke_tests.sh: at least one invocation
- tests/cli_basic.rs: SVG output test + content verification test
- docs/src/cli/index.md: subcommand entry = N/A no new command
- man/kuva.1: regenerated (./target/debug/kuva man > man/kuva.1) = N/A no new command
Documentation
- examples/<name>.rs: Rust example for doc asset generation = N/A no new API
- scripts/gen_docs.sh: invocations added; bash scripts/gen_docs.sh runs clean
- docs/src/plots/<name>.md: documentation page with embedded SVGs
- docs/src/SUMMARY.md: link added = N/A. Given the 'it just works' philosophy in Add Polars support #78 discussion, I chose not to add any indication that parquet works for scatter.
- docs/src/gallery.md: gallery card added = N/A no new plot
- README.md: plot types table updated = N/A, no new plot
Visual inspection
- test_outputs/: new plot SVGs look correct
- test_outputs/ for layout regressions
- bash scripts/smoke_tests.sh: all existing smoke test outputs still look correct
Housekeeping
- CHANGELOG.md: entry added under ## [Unreleased]
- README.md: item marked done in TODO section if applicable = N/A
Adding a new feature (non-plot-type)
- src/ file(s).
- docs/src/ page(s) if the feature is user-visible.
- CHANGELOG.md: add an entry under ## [Unreleased].