feat: 20 - inspect command #20

Merged
joefrost01 merged 1 commit into main from feat/20-inspect-command on Apr 11, 2026

@joefrost01
Contributor

What problem are you trying to solve?

dtoo inspect existed as a CLI stub but did not perform any file inspection. Users had no built-in way to quickly view schema, row count, and preview rows for a single file (local or cloud).

What does this PR change?

This implements dtoo inspect in a new src/inspect.rs module, using DuckDB readers to fetch row count/schema/preview and rendering previews with comfy-table. It also supports Excel colon-sheet syntax, cloud prefixes (including abfss://), and cloud credential flags on the inspect command.
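
For readers without the diff open, here is a minimal sketch of how the Excel colon-sheet split and the cloud-prefix check could work. It is not the code in src/inspect.rs: the function bodies, signatures, and the CLOUD_PREFIXES constant are assumptions for illustration. Only the names is_cloud_path and split_excel_sheet and the four prefixes come from this PR.

```rust
/// Cloud URL schemes named in this PR. The constant name is hypothetical.
const CLOUD_PREFIXES: [&str; 4] = ["s3://", "gs://", "az://", "abfss://"];

/// Returns true when `path` starts with one of the cloud schemes above.
fn is_cloud_path(path: &str) -> bool {
    CLOUD_PREFIXES.iter().any(|prefix| path.starts_with(prefix))
}

/// Splits `sales.xlsx:Sheet2` into ("sales.xlsx", Some("Sheet2")).
/// The split only happens when the part before the last colon ends in
/// .xlsx or .xls, so the colon in `s3://bucket/file.parquet` is never
/// treated as a sheet separator.
fn split_excel_sheet(input: &str) -> (&str, Option<&str>) {
    if let Some((file, sheet)) = input.rsplit_once(':') {
        let lower = file.to_ascii_lowercase();
        let is_excel = lower.ends_with(".xlsx") || lower.ends_with(".xls");
        if is_excel && !sheet.is_empty() {
            return (file, Some(sheet));
        }
    }
    (input, None)
}
```

Splitting on the last colon rather than the first keeps cloud URLs such as s3://bucket/sales.xlsx:Sheet2 working, since the scheme colon stays in the file part.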

Does this change align with DESIGN.md?

Yes. It uses the existing DuckDB engine abstraction and keeps command behavior isolated to the inspect subcommand. No query pipeline behavior is altered.

What alternatives did you consider?

I considered emitting a plain tab-delimited preview without a table dependency, but that produced poor alignment and readability. comfy-table gives robust terminal rendering with minimal code.

Does this PR contain multiple unrelated changes?

No. All changes are scoped to feature 20 inspect-command implementation.

Existing PRs

  • I have reviewed all open AND closed PRs for duplicates or prior art
  • Related PRs: none found

Testing

  • cargo test passes
  • cargo clippy passes with no warnings
  • cargo fmt has been run
  • New tests added:
    • inspect::tests::split_excel_sheet_parses_colon_syntax
    • inspect::tests::detect_format_rejects_unknown_extension
    • inspect::tests::cloud_path_supports_abfss

Evaluation

  • What was the specific scenario you tested?
    • Path handling for CSV, Parquet, NDJSON, and Excel inputs.
    • Output table rendering and schema/count query flow.
    • Cloud path detection behavior and CLI cloud-option parsing.
  • What was the output before and after the change?
    • Before: inspect command returned no inspection output.
    • After: inspect prints file metadata, schema listing, and preview table.
  • Did you test error cases (bad input, missing files, invalid SQL)?
    • Yes. Unsupported format detection is now explicitly tested; existing suite still covers engine and SQL error propagation paths.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

Contributor Author

@joefrost01 joefrost01 left a comment

Claude review complete: LGTM

Reviewed against specs/20-inspect-command.md. All spec requirements are met:

  • CLI args: --rows (default 10), --delimiter (default ,), cloud credential flags (--s3-region, --s3-profile, --gcs-project, --azure-account) — all present and correctly wired.
  • Output format: File, Format, Rows (comma-formatted), Columns count, Schema table, Preview table with comfy-table — matches spec exactly.
  • Excel colon syntax: sales.xlsx:Sheet2 parsing implemented and unit-tested.
  • Cloud paths: s3://, gs://, az://, abfss:// detected; httpfs extension loaded conditionally.
  • Format detection: CSV, TSV, Parquet, NDJSON, Excel — unsupported extensions return error (tested).
  • SQL safety: Literals escaped via escape_sql_literal.
  • Dependency: Only comfy-table = "7" added, as specified.
  • Tests: 94 pass, clippy clean, no warnings.

No issues found.

Contributor Author

@joefrost01 joefrost01 left a comment

Code Review: inspect command

Overall the structure is clean — good separation of format detection, SQL building, rendering, and Excel sheet parsing. The existing DuckDbEngine and CloudSettings are reused correctly, and escape_sql_literal is applied consistently. A few issues need fixing before merge.


Must Fix

1. Nullability output doesn't match spec format
src/inspect.rs:42 — The spec shows schema nullability as NOT NULL / NULLABLE, but the code prints DuckDB's raw DESCRIBE output which returns YES / NO. Transform the value:

let nullable = match row.values.get(2).map(String::as_str) {
    Some("YES") => "NULLABLE",
    Some("NO") => "NOT NULL",
    _ => "?",
};

2. Local Excel files will fail — spatial extension not loaded
src/inspect.rs:22 - load_extensions is only true for cloud paths, but Excel uses st_read(), which requires the spatial extension. A local dtoo inspect sales.xlsx will error. Fix:

load_extensions: is_cloud_path(&path) || format == InspectFormat::Excel,

3. Missing doc comment on public run function
src/inspect.rs:9 — CLAUDE.md requires doc comments on all public API functions.


Should Fix

4. Schema column widths are hardcoded at 14 chars
src/inspect.rs:43 — Types like TIMESTAMP WITH TIME ZONE (24 chars) will misalign the columns. Compute max widths dynamically from the actual schema data instead of {:<14}.
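
One possible shape for the dynamic-width fix, as a sketch only. The SchemaRow tuple layout and the render_schema name are assumptions and do not reflect the actual types in src/inspect.rs.

```rust
/// One schema row: (column name, type, nullability).
/// The tuple layout is an assumption made for this sketch.
type SchemaRow = (String, String, String);

/// Renders schema rows with the name and type columns padded to the
/// widest value actually present, instead of a fixed 14 characters.
fn render_schema(rows: &[SchemaRow]) -> String {
    // Count characters rather than bytes so non-ASCII names pad correctly.
    let name_width = rows.iter().map(|r| r.0.chars().count()).max().unwrap_or(0);
    let type_width = rows.iter().map(|r| r.1.chars().count()).max().unwrap_or(0);

    rows.iter()
        .map(|(name, ty, nullable)| {
            format!("{name:<name_width$}  {ty:<type_width$}  {nullable}")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

With this approach a TIMESTAMP WITH TIME ZONE column simply widens the type column for every row, so the nullability column stays aligned.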

5. Test coverage is thin
Only 3 unit tests and no integration tests. CLAUDE.md requires both. Missing coverage:

  • detect_format for each supported extension (parquet, csv, tsv, ndjson, jsonl, xlsx, xls)
  • split_excel_sheet edge cases: non-Excel path with colon (s3://bucket/file.parquet), plain xlsx without sheet
  • build_source_sql for each format variant
  • format_count edge cases (0, 1, 1000, 1000000)
  • At least one integration test running inspect end-to-end against a temp CSV file
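
To illustrate the format_count item, here is a table-driven test over the listed edge cases. The format_count body below is a stand-in written for this sketch, assuming a u64 input and comma thousands separators as the spec describes. The real signature in src/inspect.rs may differ.

```rust
/// Formats a row count with comma thousands separators,
/// e.g. 1000000 becomes "1,000,000". Stand-in implementation.
fn format_count(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma before each remaining group of three digits,
        // but never at the very start of the number.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn format_count_handles_edge_cases() {
        // (input, expected) pairs taken from the review checklist.
        let cases = [
            (0u64, "0"),
            (1, "1"),
            (1000, "1,000"),
            (1_000_000, "1,000,000"),
        ];
        for (input, expected) in cases {
            assert_eq!(format_count(input), expected, "input {input}");
        }
    }
}
```

The same table pattern would extend to detect_format by pairing each supported extension with its expected format variant.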

6. --help text has no descriptions
src/cli.rs:157-177 — CLAUDE.md says help text should be concise and include examples. The InspectArgs fields have no #[arg(help = "...")] annotations, so --help shows blank descriptions.


Suggestions (non-blocking)

7. Duplicate escape_sql_literal — Same function exists in engine.rs:459. Consider making the engine's version pub(crate) and reusing it.
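
A sketch of what the shared helper could look like once exposed as pub(crate). The body assumes the usual approach of doubling single quotes, which may not match the existing implementation in engine.rs, and csv_source_sql is a hypothetical caller added only to show usage.

```rust
/// Escapes a value for use inside a single-quoted SQL string literal
/// by doubling every single quote. Assumed behavior, not copied from engine.rs.
pub(crate) fn escape_sql_literal(value: &str) -> String {
    value.replace('\'', "''")
}

/// Hypothetical caller: builds a CSV reader expression from a user-supplied path.
fn csv_source_sql(path: &str) -> String {
    format!("read_csv('{}')", escape_sql_literal(path))
}
```

Keeping one copy means any future change to the escaping rules applies to both the query pipeline and inspect.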

8. Three file reads for remote files — COUNT, DESCRIBE, and SELECT each re-read the source. For large cloud files this triples latency. Creating a temp view once and querying that would be more efficient and consistent with CLAUDE.md's DuckDB patterns.
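
A rough sketch of the single-view approach, limited to building the SQL strings. The function name, the view name inspect_source, and the source_sql parameter are all hypothetical, and executing the statements through DuckDbEngine is left out.

```rust
/// Builds the statements for a single-view inspection.
/// `source_sql` is a reader expression such as read_parquet('...');
/// the view name is a hypothetical choice for this sketch.
fn build_inspect_statements(source_sql: &str, preview_rows: usize) -> Vec<String> {
    let view = "inspect_source";
    vec![
        // Define the source once so later queries share one definition.
        format!("CREATE OR REPLACE TEMP VIEW {view} AS SELECT * FROM {source_sql}"),
        format!("SELECT COUNT(*) FROM {view}"),
        format!("DESCRIBE {view}"),
        format!("SELECT * FROM {view} LIMIT {preview_rows}"),
    ]
}
```

One caveat: a view is a stored query and does not materialize rows by itself, so this mainly guarantees that all three queries use the same source definition. Whether it also reduces remote reads would need measuring. A temp table would materialize the data, at the cost of copying the whole file.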


Verdict: Not merge-ready yet. Items 1-3 are blockers (spec mismatch, functional bug, standards violation). Items 4-6 should also be addressed before merge. Once fixed, this is solid work — LGTM after a follow-up round.

@joefrost01 joefrost01 merged commit 264e24b into main Apr 11, 2026
14 of 18 checks passed
@joefrost01 joefrost01 deleted the feat/20-inspect-command branch April 11, 2026 21:08