
feat: add custom walker support#698

Open
tobim wants to merge 18 commits into numtide:main from tobim:push-spzzmzrpkzzq

Conversation

tobim commented May 15, 2026

As the title says.

Example: Format a git repo + a specific submodule:

```toml
walk = "repoPlusSubmodule"

[walker.repoPlusSubmodule]
command = "bash"
options = [
  "-c",
  '''
    submodule="path/to/submodule"
    {
      git ls-files --cached --others --exclude-standard --full-name
      git -C "$submodule" ls-files --cached --others --exclude-standard --full-name \
        | sed "s#^#$submodule/#"
    }
  ''',
]
```

Example: Format a pijul repo:

```toml
walk = "pijul"

[walker.pijul]
command = "pijul"
options = ["ls"]
```

Add documentation for selecting a custom walker with the global
walk option and defining its command and options under [walker.<name>].
Document that custom walker commands run from the tree root, emit one
relative path per line, and do not need to handle path arguments.
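The walker contract described above (run from the tree root, print one relative path per line) can be sketched in Go, the language treefmt is written in. This is a minimal sketch, not the PR's code: `runWalker` and `parsePaths` are hypothetical helpers, and skipping blank lines and rejecting absolute paths or paths that escape the root are assumptions about how such output might be validated.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// runWalker is a hypothetical helper: it runs a custom walker command from
// the tree root and parses its stdout. Output is buffered here for brevity;
// a real reader would stream it.
func runWalker(root, command string, options ...string) ([]string, error) {
	cmd := exec.Command(command, options...)
	cmd.Dir = root // custom walkers run from the tree root
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("walker %q: %w", command, err)
	}
	return parsePaths(out)
}

// parsePaths turns "one relative path per line" into cleaned,
// slash-separated paths. bufio.ScanLines already drops a trailing "\r".
// Skipping blank lines and rejecting absolute or escaping paths are
// assumptions of this sketch.
func parsePaths(out []byte) ([]string, error) {
	var paths []string
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" {
			continue
		}
		if filepath.IsAbs(line) {
			return nil, fmt.Errorf("walker emitted absolute path %q", line)
		}
		clean := filepath.ToSlash(filepath.Clean(line))
		if clean == ".." || strings.HasPrefix(clean, "../") {
			return nil, fmt.Errorf("walker emitted path outside the tree root: %q", line)
		}
		paths = append(paths, clean)
	}
	return paths, scanner.Err()
}

func main() {
	paths, err := parsePaths([]byte("./src/main.go\n\nREADME.md\n"))
	fmt.Println(paths, err) // [src/main.go README.md] <nil>
}
```

With the pijul example, the call would be `runWalker(root, "pijul", "ls")`; the command itself never sees the paths passed to `treefmt`.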

jfly (Collaborator) left a comment

@tobim, thanks for the contribution! I am supportive of this change, but haven't had a chance to read the code yet.

I wanted to warn you that #694 touches some of the same code you've touched here. It may land before this PR; apologies if the conflicts are annoying to deal with.

jfly (Collaborator) left a comment

Thanks! This looks like a solid approach.

Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to think of an abstraction. Did you think about that at all?

> Accept documented camel-case walker names even when Viper normalizes table keys

What is camel-case? That's kebab-case. This is camelCase (or CamelCase).

> Fix custom walker pipe handling

I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF")?

Comment thread docs/site/getting-started/configure.md Outdated
Comment on lines +466 to +468
When you pass directory paths to `treefmt`, the walker command still runs for the tree root.
`treefmt` filters the command output to the requested directories.
The walker command doesn't need to implement path argument handling.

Collaborator

While I appreciate the simplicity of this decision, I'm not sure it's the right one. I imagine people are most inclined to use the "format a directory" feature in very large repos, where having some VCS determine which files are in the entire repo might take much longer than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.

How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.

Author

I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.

For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.

Maybe we'll add a toggle so the user can switch between both methods?
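To make the trade-off in this thread concrete: the documented behavior filters the walker's output inside `treefmt`, while the toggle floated above would pass the requested directories to the command instead. A minimal sketch with hypothetical helpers `filterPaths` and `walkerArgs`; the `passPaths` switch is not an option in the PR. Pathspecs after `--` suit `git ls-files`, whereas a script like the submodule example would have to interpret them itself, which is the error-prone part the author describes.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// filterPaths keeps walker output that lies inside one of the requested
// directories (slash-separated, relative to the tree root). "." or an
// empty filter list keeps everything. The "/" boundary check stops "src"
// from matching "src-old/...".
func filterPaths(paths, dirs []string) []string {
	if len(dirs) == 0 {
		return paths
	}
	cleaned := make([]string, len(dirs))
	for i, d := range dirs {
		cleaned[i] = path.Clean(d)
	}
	var kept []string
	for _, p := range paths {
		for _, d := range cleaned {
			if d == "." || p == d || strings.HasPrefix(p, d+"/") {
				kept = append(kept, p)
				break
			}
		}
	}
	return kept
}

// walkerArgs sketches the opt-in alternative: with a hypothetical passPaths
// setting, requested directories are appended after "--" (the pathspec
// separator git ls-files understands). filterPaths can still run afterwards.
func walkerArgs(options, dirs []string, passPaths bool) []string {
	args := append([]string(nil), options...) // copy; never mutate options
	if passPaths && len(dirs) > 0 {
		args = append(args, "--")
		args = append(args, dirs...)
	}
	return args
}

func main() {
	paths := []string{"src/a.go", "src-old/b.go", "docs/c.md", "README.md"}
	fmt.Println(filterPaths(paths, []string{"src"})) // [src/a.go]
	fmt.Println(walkerArgs([]string{"ls-files", "--full-name"}, []string{"src"}, true))
	// [ls-files --full-name -- src]
}
```

Keeping the filter in `treefmt` even when paths are passed through makes the result correct regardless of how a given command interprets its arguments.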

Comment thread cmd/init/init.toml Outdated
# You can also set this to the name of a configured custom walker
# Env $TREEFMT_WALK
# walk = "filesystem"
# walk = "myWalker"

Collaborator

I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.
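The difference between the two naming schemes can be sketched as follows. With bare names, loading the config has to reject custom walkers that shadow a built-in; with a namespaced walk value such as the suggested `walker.<name>`, the prefix alone decides which table to consult. `WalkerConfig`, both functions, and the set of built-in names are assumptions for illustration, not treefmt's actual types. The example uses lowercase names because Viper lowercases table keys.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// WalkerConfig mirrors one [walker.<name>] table (hypothetical type).
type WalkerConfig struct {
	Command string
	Options []string
}

// builtinWalkers is an assumed set of built-in walk values.
var builtinWalkers = map[string]bool{
	"auto": true, "filesystem": true, "git": true, "jujutsu": true,
}

// validateBareNames: with bare names (walk = "mywalker"), every custom
// walker must be checked against the built-ins when the config is loaded.
func validateBareNames(custom map[string]WalkerConfig) error {
	names := make([]string, 0, len(custom))
	for name := range custom {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic error reporting
	for _, name := range names {
		if builtinWalkers[name] {
			return fmt.Errorf("custom walker %q shadows a built-in walker", name)
		}
		if custom[name].Command == "" {
			return fmt.Errorf("custom walker %q has no command", name)
		}
	}
	return nil
}

// resolveNamespaced: with a prefix (walk = "walker.mywalker") the namespaces
// cannot collide. The bool result reports whether walk names a built-in.
func resolveNamespaced(walk string, custom map[string]WalkerConfig) (WalkerConfig, bool, error) {
	name, isCustom := strings.CutPrefix(walk, "walker.")
	if !isCustom {
		if builtinWalkers[walk] {
			return WalkerConfig{}, true, nil
		}
		return WalkerConfig{}, false, fmt.Errorf("unknown built-in walker %q", walk)
	}
	cfg, ok := custom[name]
	if !ok {
		return WalkerConfig{}, false, fmt.Errorf("no [walker.%s] table configured", name)
	}
	return cfg, false, nil
}

func main() {
	custom := map[string]WalkerConfig{"pijul": {Command: "pijul", Options: []string{"ls"}}}
	fmt.Println(validateBareNames(custom)) // <nil>
	cfg, builtin, err := resolveNamespaced("walker.pijul", custom)
	fmt.Println(cfg.Command, builtin, err) // pijul false <nil>
}
```

Under the namespaced scheme a table named `[walker.git]` is harmless, since `walk = "git"` and `walk = "walker.git"` can never be confused.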

Comment thread config/config.go Outdated
Comment thread config/config.go Outdated
Comment thread walk/custom.go Outdated
Comment thread walk/walk.go Outdated
reader, err = NewReader(Git, root, path, db, statz)
if err != nil {
reader, err = NewReader(Jujutsu, root, path, db, statz)
if selector.Custom != nil {

Collaborator

This diff would be a lot simpler, and the `if selector.Custom == nil && selector.Type == Stdin` change below wouldn't have been needed, if we had introduced a new `selector.Type == Custom`. Did you consider that?
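A sketch of the suggestion: if `Custom` is just another `Type`, one switch covers every case and the stdin check no longer needs a separate `selector.Custom == nil` guard. The constants, `Selector`, and `describeReader` are hypothetical stand-ins that return a description instead of building a reader; the git-then-Jujutsu order mirrors the excerpt above, and the final filesystem fallback is an assumption.

```go
package main

import "fmt"

// Type enumerates walker kinds. Custom is the variant suggested in review;
// the other names are assumed for illustration.
type Type int

const (
	Auto Type = iota
	Stdin
	Filesystem
	Git
	Jujutsu
	Custom
)

func (t Type) String() string {
	switch t {
	case Auto:
		return "auto"
	case Stdin:
		return "stdin"
	case Filesystem:
		return "filesystem"
	case Git:
		return "git"
	case Jujutsu:
		return "jujutsu"
	case Custom:
		return "custom"
	}
	return fmt.Sprintf("Type(%d)", int(t))
}

// Selector picks a walker; Name is only used when Type == Custom.
type Selector struct {
	Type Type
	Name string
}

// describeReader stands in for reader construction: one switch on Type,
// with no separate nil check for a custom walker. available marks which
// VCS walkers would succeed in the current tree.
func describeReader(sel Selector, available map[Type]bool) (string, error) {
	switch sel.Type {
	case Stdin:
		return "stdin", nil
	case Custom:
		if sel.Name == "" {
			return "", fmt.Errorf("custom selector without a walker name")
		}
		return "custom:" + sel.Name, nil
	case Auto:
		for _, t := range []Type{Git, Jujutsu} {
			if available[t] {
				return t.String(), nil
			}
		}
		return Filesystem.String(), nil
	default:
		return sel.Type.String(), nil
	}
}

func main() {
	fmt.Println(describeReader(Selector{Type: Custom, Name: "pijul"}, nil))
	// custom:pijul <nil>
}
```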

Comment thread cmd/root_test.go Outdated

tobim (Author) commented May 27, 2026

Thank you for the review. I hope to be able to address the individual comments in the next few days.

> Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to think of an abstraction. Did you think about that at all?

I didn't really consider that because I wanted to keep the new path somewhat isolated.
We could also replace git and jj with built-in custom walkers now that the machinery is there.

> Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF")?

I'll have to look into that.

jfly (Collaborator) commented May 27, 2026

> I didn't really consider that because I wanted to keep the new path somewhat isolated.
> We could also replace git and jj with built-in custom walkers now that the machinery is there.

This sounds reasonable to me.

tobim added 16 commits May 31, 2026 19:30
Parse [walker.<name>] tables into the configuration model and validate
walker names, commands, and walk values. Accept documented camelCase
walker names even when Viper normalizes table keys.

Introduce a command-backed walker reader that runs from the tree root,
reads one path per stdout line, filters configured subpaths inside
treefmt, and converts accepted paths into walk.File values. Route custom
walk names through the format command without changing built-in walkers.

Cover selecting a custom walker from treefmt.toml, passing configured
walker options, and filtering walker output when treefmt is invoked with
a directory path argument.

Use bash -c for custom walker test commands instead of temporary scripts
with /usr/bin/env shebangs, so the tests also run inside the Nix build
sandbox.

Make the custom walker own its stdout and stderr pipes instead of
combining StdoutPipe with a concurrent Wait call. This avoids surfacing
a spurious file-closed read error at process exit and preserves command
failures for Close().
Refactor custom walker configuration lookup to satisfy nesting limits,
remove an unused gosec suppression, and adjust test spacing to satisfy
golangci-lint in the Nix check derivation.

Remove the case-insensitive custom walker lookup and rely on exact
configured names.

Move the default walk value into NewViper so FromViper no longer
silently rewrites an empty walk setting.

Use lowercase custom walker names in examples and tests, and use
--clear-cache in the custom walker CLI test.

Represent selector variants separately from the built-in walker enum.

Keep selector fields private and expose IsBuiltin for callers that need
to test for stdin.

Build the uncached reader in a helper and apply the cache wrapper once
in NewReader.

Keep auto fallback inside uncached construction so trying built-in
walkers does not recurse through cache setup.
Use PathStreamConfig.PathFilters directly when post-filtering emitted
paths.

Remove the redundant filters field from PathStreamReader.

Store filesystem path filters as a slice and walk them sequentially in
the filesystem reader.

Remove the generic CompositeReader that was only used to fan out
multiple filesystem paths.

Add coverage for multi-path filesystem walking.
tobim force-pushed the push-spzzmzrpkzzq branch from 7e5f3af to 69c5a22 on June 1, 2026 07:53

tobim (Author) commented Jun 1, 2026

So I did somewhat of a refactor of the walker component. There is now a generic PathStreamReader abstraction that consumes a stream of paths separated by either a newline or a NUL character. The jj and git walkers are now constructed in terms of that PathStreamReader.
After that come a couple of clean-up commits that simplify the state representation and related control flow.

Record whether the walker command returned after its context was
canceled.

Treat those wait errors as expected during Close so platforms that
report cancellation as signal termination do not fail the unblock test.
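The idea in this last commit (remember that the walker was canceled on purpose, and treat the resulting wait error as expected) can be sketched with `exec.CommandContext`. When the context is canceled, the process is killed by default and `Wait` reports that termination: on Unix as `signal: killed`, on other platforms typically as an exit status, so matching on the error text would not be portable. `walkerProc`, `start`, and `markEOF` are hypothetical names, not the PR's code.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// walkerProc is a hypothetical wrapper around a running walker command.
type walkerProc struct {
	cmd      *exec.Cmd
	cancel   context.CancelFunc
	sawEOF   bool // the reader consumed the whole output stream
	canceled bool // Close canceled the command before it finished
}

func start(name string, args ...string) (*walkerProc, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cmd := exec.CommandContext(ctx, name, args...)
	if err := cmd.Start(); err != nil {
		cancel()
		return nil, err
	}
	return &walkerProc{cmd: cmd, cancel: cancel}, nil
}

// markEOF would be called by the reader once the output stream ends.
func (w *walkerProc) markEOF() { w.sawEOF = true }

// Close stops a producer that was abandoned before EOF, then waits.
// Errors caused by our own cancellation are expected and suppressed;
// failures of a command that ran to completion are still reported.
func (w *walkerProc) Close() error {
	if !w.sawEOF {
		w.canceled = true
		w.cancel() // CommandContext kills the process by default
	}
	err := w.cmd.Wait()
	w.cancel() // always release the context
	if err != nil && w.canceled {
		return nil
	}
	return err
}

func main() {
	w, err := start("sleep", "10")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("early close:", w.Close()) // early close: <nil>

	w2, err := start("sh", "-c", "exit 3")
	if err != nil {
		fmt.Println(err)
		return
	}
	w2.markEOF()
	fmt.Println("after EOF:", w2.Close()) // after EOF: exit status 3
}
```

Recording the cancellation as a flag, rather than inspecting the error, is what keeps the check platform-independent.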

Labels

None yet

Projects

None yet

2 participants