Feature: Enable Just-In-Time (JIT) Git LFS Conversion #347
| ### Dependencies |
| This plugin requires the `pathspec` package: |
Should I add it to the requirements-*.txt files in the .github folder?
If so, what version should I put?
You're the author and fast-export didn't use pathspec before, so pick a version/versions that are compatible with the earliest/latest versions we support for fast-export.
I have approved the workflow, so we'll soon see if/what breaks if it's missing.
I put `pathspec==11.2` in the earliest requirements file, as this is the last version that supports Python 3.7.
Solid work with test cases and well-written documentation. I just have some formatting nits as review comments.
Would you consider adding a pointer to the new plugin to the "Mercurial Largefiles Extension" section of the top-level README?
t/first_commit_hash_option.t
Outdated
| git config core.ignoreCase false && |
| git lfs install --local && |
| git switch --create master && |
| |
Empty trailing whitespace on this line.
Should be fixed.
t/first_commit_hash_option.t
Outdated
| cat > .gitattributes <<-EOF && |
| * -text |
| EOF |
| |
Empty trailing whitespace on this line.
Should be fixed.
t/first_commit_hash_option.t
Outdated
| git config core.ignoreCase false && |
| git lfs install --local && |
| git switch --create master && |
| |
Empty trailing whitespace on this line.
Should be fixed.
| git add readme.txt && |
| git commit -q -m "Initialize Git readme file" |
| ) && |
| |
Empty trailing whitespace on this line.
Should be fixed.
t/first_commit_hash_option.t
Outdated
| FIRST_HASH=$(git -C gitrepo rev-parse HEAD) && |
| # 3. Run hg-fast-export with git_lfs_importer plugin |
I can't see that you're using the plugin. Did you copy this file from the plugin test?
Oh, sorry, I did copy it and didn't fix the comment. Will do.
plugins/git_lfs_importer/README.md
Outdated
| ### Large Files and Largefiles |
| If the Mercurial repository uses Mercurial's largefiles extension, those files are already converted to their original content before reaching this plugin, allowing the plugin to apply LFS conversion if they match the patterns. |
> 80 cols, please line break.
plugins/git_lfs_importer/README.md
Outdated
| ### Force Requirement |
| You only need to pass the `--force` option when converting the *first* Mercurial commit into a non-empty Git repository. By default, `hg-fast-export` prevents importing Mercurial commits onto a non-empty Git repo to avoid creating conflicting histories. Passing `--force` overrides that safety check and allows the exporter to write the LFS pointer objects and integrate the converted data with the existing Git history. |
> 80 cols, please line break.
plugins/git_lfs_importer/README.md
Outdated
| You only need to pass the `--force` option when converting the *first* Mercurial commit into a non-empty Git repository. By default, `hg-fast-export` prevents importing Mercurial commits onto a non-empty Git repo to avoid creating conflicting histories. Passing `--force` overrides that safety check and allows the exporter to write the LFS pointer objects and integrate the converted data with the existing Git history. |
| If you are doing an incremental conversion (i.e., running the script a second time to import new changesets into an already converted repository), the --force flag is not required. |
> 80 cols, please line break.
plugins/git_lfs_importer/README.md
Outdated
| If you are doing an incremental conversion (i.e., running the script a second time to import new changesets into an already converted repository), the --force flag is not required. |
| Omitting `--force` when attempting to import the first Mercurial commit into a non-empty repository will cause the importer to refuse the operation. |
> 80 cols, please line break.
The current conversion process requires an empty target repository, which is a barrier to performance optimization strategies. This change introduces the ability to pass a repository root commit hash. It is needed by the immediate next commit (incremental LFS conversion), which uses a `.gitattributes` file and LFS pointers to bypass the slow full-history rewrite often required on large non-empty monorepos (100GiB+, 1M+ files).
The immediate benefit is that conversion can start when a non-empty repo already contains an orphan commit, which lays the groundwork for the optimized LFS conversion feature.
065d026 to 237e25c
Converts large Mercurial repositories to Git/LFS significantly faster by integrating the LFS conversion into the history export process.
Currently, converting large repositories requires two sequential, long-running steps:
1. Full history conversion (`hg` to `git`).
2. Full history rewrite/import (`git lfs migrate import`).
For huge monorepos (100GiB+, 1M+ files), this sequence can take hours or days.
This commit introduces a new plugin that allows the repository to be converted *incrementally* (JIT: Just-In-Time). The plugin identifies large files during the initial `hg` to `git` conversion and immediately writes LFS pointers, eliminating the need for the second, time-consuming history rewrite step.
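To make the "immediately writes LFS pointers" step concrete, here is a minimal sketch of what such a just-in-time substitution could look like. It is not the plugin's actual code: the function names (`make_lfs_pointer`, `matches_any`, `convert_if_matching`) are hypothetical, and the standard library's `fnmatch` is used only as a stand-in for `pathspec`, so the matching does not follow gitignore-style semantics. The pointer layout (version line, `oid sha256:`, `size`) follows the Git LFS pointer file format.

```python
import fnmatch
import hashlib
from typing import Sequence

# Version line used by Git LFS pointer files.
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> bytes:
    """Return the pointer text that replaces `content` in the Git history."""
    oid = hashlib.sha256(content).hexdigest()
    pointer = f"{LFS_VERSION_LINE}\noid sha256:{oid}\nsize {len(content)}\n"
    return pointer.encode("utf-8")


def matches_any(path: str, patterns: Sequence[str]) -> bool:
    """Hypothetical matcher: fnmatch stands in for pathspec here."""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)


def convert_if_matching(path: str, content: bytes, patterns: Sequence[str]) -> bytes:
    """Swap the file content for an LFS pointer when the path matches a pattern."""
    if matches_any(path, patterns):
        return make_lfs_pointer(content)
    return content


if __name__ == "__main__":
    # A matching file is exported as a pointer; anything else is left untouched.
    print(convert_if_matching("assets/model.bin", b"hello", ["*.bin"]).decode())
```

The sketch covers only the pointer text. A real conversion would also have to store the original content in the local LFS object store so that it can be checked out and pushed later.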
The data example was previously a single line, making it difficult to read. Replaced the one-liner with a multi-line format to improve clarity.
237e25c to 255d49c
I updated the largefile documentation in the top-level README.
This PR introduces core support and a new plugin (`git_lfs_importer`) to perform Git LFS conversion concurrently with the history export.
This approach significantly speeds up migration for large Mercurial repositories (100GiB+) by eliminating the need for the second, time-consuming full history rewrite step (`git lfs migrate import --everything`).
Key Changes:
- New plugin (`git_lfs_importer`): Identifies matching files and immediately writes LFS pointers during the initial export, enabling efficient incremental conversion.
  - Requires the `pathspec` dependency.
  - See `plugins/git_lfs_importer/README.md` for full setup and usage details.
- New `--first-commit-hash` option to allow exporting history onto a pre-existing Git commit. This is crucial for configuring LFS (`.gitattributes`) before the import begins.
Usage Example: