gitwalk clones local repositories #3

@aspiers

The README says:

You can match repositories on your file system using glob. Note that gitwalk will clone the repository
even if it's stored locally on your computer. One can never be too careful.

Sorry, but I strongly disagree ;-) Firstly, I'd suggest that this is too careful:

  • For gitwalk processors which only read repos and don't write to them, there is zero risk.
  • The "original" repos (i.e. the ones which already existed on the local file system) are still git repos, so even if the changes made by gitwalk were wrong, you could simply use normal git commands to revert them. That safety net is the whole point of git :-) And anyway, if the user is deliberately invoking changes on the repo, then they are surely already aware of the risks. They could already do the same thing manually, one repo at a time.

If there were no downsides to being careful, I'd concede that OK, no harm done. But unfortunately this approach has several big disadvantages to the user experience:

  • For gitwalk processors which write to repos, this makes those processors highly inconvenient: the writes happen on the gitwalk-owned clones, so if you wanted the changes in your original repos, you'd then have to manually propagate the changes across. It would actually be less work to construct a shell for loop to work directly on the original repos, which defeats the point of gitwalk.
  • It's a needless waste of disk space, especially since gitwalk builds one clone per branch of each original repo.
  • The first time gitwalk runs on these local repos, it will be annoyingly slow due to this cloning.
  • On subsequent runs, gitwalk has to fetch updates from the original repos, which costs further performance and has another unfortunate side effect:
  • If you run a processor which writes to the repos (e.g. via sed -i) and then another one which reads them (e.g. grep), the changes you just made vanish. This violates the Principle of Least Astonishment.
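
To illustrate the "shell for loop" alternative mentioned in the first point, here is a minimal sketch. The name `each_repo` is hypothetical and not part of gitwalk, and the sketch assumes the repos all sit directly under a single directory:

```shell
# Hypothetical helper (not part of gitwalk): run a command inside every
# git repo found directly under the given directory.
each_repo() {
    root=$1
    shift
    for repo in "$root"/*/; do
        # Skip anything that is not a git working tree.
        [ -e "${repo}.git" ] || continue
        # The subshell keeps the cd from leaking into the next iteration.
        ( cd "$repo" && "$@" )
    done
}
```

For example, `each_repo ~/src git status --short` would run directly against the original repos, with no clone, no fetch, and no changes to propagate back afterwards.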

If I've misunderstood, please correct me. But otherwise please consider changing the design of this. Thanks!
