feat: add custom walker support#698
Conversation
Add documentation for selecting a custom walker with the global walk option and defining its command and options under [walker.<name>]. Document that custom walker commands run from the tree root, emit one relative path per line, and do not need to handle path arguments.
jfly
left a comment
There was a problem hiding this comment.
Thanks! This looks like a solid approach.
Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to to think of an abstraction. Did you think about that at all?
Accept documented camel-case walker names even when Viper normalizes table keys
What is camel-case? That's kebab-case. This is camelCase (or CamelCase).
Fix custom walker pipe handling
I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF").
| When you pass directory paths to `treefmt`, the walker command still runs for the tree root. | ||
| `treefmt` filters the command output to the requested directories. | ||
| The walker command doesn't need to implement path argument handling. |
There was a problem hiding this comment.
While I appreciate the simplicity of this decision, I'm not sure it's the right decision. I imagine people are most inclined to use the "format a directory" feature in very large repos, where the time spent for some VCS to determine which files are in the entire repo might be a lot slower than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.
How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.
There was a problem hiding this comment.
I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.
For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.
Maybe we'll add a toggle so the user can switch between both methods?
| # You can also set this to the name of a configured custom walker | ||
| # Env $TREEFMT_WALK | ||
| # walk = "filesystem" | ||
| # walk = "myWalker" |
There was a problem hiding this comment.
I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.
| reader, err = NewReader(Git, root, path, db, statz) | ||
| if err != nil { | ||
| reader, err = NewReader(Jujutsu, root, path, db, statz) | ||
| if selector.Custom != nil { |
There was a problem hiding this comment.
This diff would be a lot simpler, and the if selector.Custom == nil && selector.Type == Stdin change below wouldn't have had to happen if we introduced a new selector.Type == Custom. Did you consider that?
|
Thank you for the review. I hope to be able to address the individual comments in the next few days.
I didn't really consider that because I wanted to keep the new path somewhat isolated.
I'll have to look into that. |
This sounds reasonable to me |
Parse [walker.<name>] tables into the configuration model and validate walker names, commands, and walk values. Accept documented camelCase walker names even when Viper normalizes table keys.
Introduce a command-backed walker reader that runs from the tree root, reads one path per stdout line, filters configured subpaths inside treefmt, and converts accepted paths into walk.File values. Route custom walk names through the format command without changing built-in walkers.
Cover selecting a custom walker from treefmt.toml, passing configured walker options, and filtering walker output when treefmt is invoked with a directory path argument.
Use bash -c for custom walker test commands instead of temporary scripts with /usr/bin/env shebangs, so the tests also run inside the Nix build sandbox.
Make the custom walker own its stdout and stderr pipes instead of combining StdoutPipe with a concurrent Wait call. This avoids surfacing a spurious file-closed read error at process exit and preserves command failures for Close().
Refactor custom walker configuration lookup to satisfy nesting limits, remove an unused gosec suppression, and adjust test spacing to satisfy golangci-lint in the Nix check derivation.
Remove the case-insensitive custom walker lookup and rely on exact configured names. Move the default walk value into NewViper so FromViper no longer silently rewrites an empty walk setting. Use lowercase custom walker names in examples and tests, and use --clear-cache in the custom walker CLI test.
Represent selector variants separately from the built-in walker enum. Keep selector fields private and expose IsBuiltin for callers that need to test for stdin.
Build the uncached reader in a helper and apply the cache wrapper once in NewReader. Keep auto fallback inside uncached construction so trying built-in walkers does not recurse through cache setup.
Use PathStreamConfig.PathFilters directly when post-filtering emitted paths. Remove the redundant filters field from PathStreamReader.
Store filesystem path filters as a slice and walk them sequentially in the filesystem reader. Remove the generic CompositeReader that was only used to fan out multiple filesystem paths. Add coverage for multi-path filesystem walking.
|
So I did somewhat of a refactor of the walker component now. Now there is a generic |
Record whether the walker command returned after its context was canceled. Treat those wait errors as expected during Close so platforms that report cancellation as signal termination do not fail the unblock test.
As the title says.
Example: Format a git repo + a specific submodule:
Example: Format a pijul repo: