Lesvek commented Dec 12, 2025

This PR introduces core support and a new plugin (git_lfs_importer) to perform Git LFS conversion concurrently with the history export.

This approach significantly speeds up migration for large Mercurial repositories (100GiB+) by eliminating the need for the second, time-consuming full history rewrite step (git lfs migrate import --everything).

Key Changes:

  • New Plugin (git_lfs_importer): Identifies matching files and immediately writes LFS pointers during the initial export, enabling efficient incremental conversion.
    • Requires the Python pathspec package.
    • See plugins/git_lfs_importer/README.md for full setup and usage details.
  • Core Feature: Introduces the --first-commit-hash option to allow exporting history onto a pre-existing Git commit. This is crucial for configuring LFS (.gitattributes) before the import begins.
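
To illustrate the "identifies matching files" step, here is a minimal sketch of matching a path against a patterns file. It is not the plugin's code: the plugin uses `pathspec` for gitignore-style matching, while this sketch swaps in the standard library's `fnmatch` as a simplified stand-in. The function names `load_patterns` and `should_use_lfs`, and the assumption that the patterns file holds one glob per line with `#` comments, are hypothetical.

```python
from fnmatch import fnmatchcase


def load_patterns(text):
    """Parse patterns text: one glob per line; blank lines and '#' comments are skipped.

    The file format is an assumption made for this sketch.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def should_use_lfs(path, patterns):
    """Return True if the path matches any pattern.

    fnmatchcase is a simplified stand-in for pathspec's gitignore-style
    matching; here '*' also matches '/' characters.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    patterns = load_patterns("# binary assets\n*.bin\n\nassets/*.psd\n")
    for path in ("data/big.bin", "src/main.py", "assets/logo.psd"):
        print(path, should_use_lfs(path, patterns))
```

A file for which `should_use_lfs` returns True would be replaced by an LFS pointer during export; everything else would be exported unchanged.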

Usage Example:

# Export onto an existing commit (e.g., one with .gitattributes)
hg-fast-export.sh -r /path/to/hgrepo -M master \
    --plugin "git_lfs_importer=/path/to/lfs-patterns.txt" \
    --first-commit-hash "$FIRST_GIT_HASH"
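
For context on what "writing an LFS pointer" involves, the sketch below builds the small text file that Git LFS stores in place of the real content, following the public Git LFS pointer format (version line, SHA-256 object ID, size in bytes). The helper name `make_lfs_pointer` is hypothetical and this is not the plugin's implementation; storing the original content in the local LFS object store is not shown.

```python
import hashlib

# Version line defined by the Git LFS pointer file format.
LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(data: bytes) -> bytes:
    """Build the pointer file content for the given file data.

    Hypothetical helper for illustration only.
    """
    oid = hashlib.sha256(data).hexdigest()
    pointer = (
        f"version {LFS_SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
    return pointer.encode("utf-8")


if __name__ == "__main__":
    print(make_lfs_pointer(b"example file content").decode("utf-8"), end="")
```

The pointer is what ends up in Git history for a matching file, which is why no later history rewrite is needed.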


### Dependencies

This plugin requires the `pathspec` package:
Author

Should I add it to the requirements-*.txt files in the .github folder?

If so, what version should I put?

Owner

You're the author and fast-export didn't use pathspec before, so pick a version/versions that are compatible with the earliest/latest versions we support for fast-export.

I have approved the workflow, so we'll soon see if/what breaks if it's missing.

Author

I put pathspec==0.11.2 in the earliest requirements file, as this is the last version that supports Python 3.7.

Owner

frej left a comment

Solid work with testcases and well-written documentation. I just have some formatting nits as review comments.

Would you consider adding a pointer to the new plugin to the "Mercurial Largefiles Extension" section of the top level README?

git config core.ignoreCase false &&
git lfs install --local &&
git switch --create master &&

Owner

Empty trailing whitespace on this line.

Author

Should be fixed.

cat > .gitattributes <<-EOF &&
* -text
EOF

Owner

Empty trailing whitespace on this line.

Author

Should be fixed.

git config core.ignoreCase false &&
git lfs install --local &&
git switch --create master &&

Owner

Empty trailing whitespace on this line.

Author

Should be fixed.

git add readme.txt &&
git commit -q -m "Initialize Git readme file"
) &&

Owner

Empty trailing whitespace on this line.

Author

Should be fixed.


FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&

# 3. Run hg-fast-export with git_lfs_importer plugin
Owner

I can't see that you're using the plugin. Did you copy this file from the plugin test?

Author

Oh, sorry, I did copy it and didn't fix the comment. Will do.


### Large Files and Largefiles

If the Mercurial repository uses Mercurial's largefiles extension, those files are already converted to their original content before reaching this plugin, allowing the plugin to apply LFS conversion if they match the patterns.
Owner

More than 80 columns; please add line breaks.


### Force Requirement

You only need to pass the `--force` option when converting the *first* Mercurial commit into a non-empty Git repository. By default, `hg-fast-export` prevents importing Mercurial commits onto a non-empty Git repo to avoid creating conflicting histories. Passing `--force` overrides that safety check and allows the exporter to write the LFS pointer objects and integrate the converted data with the existing Git history.
Owner

More than 80 columns; please add line breaks.


If you are doing an incremental conversion (i.e., running the script a second time to import new changesets into an already converted repository), the --force flag is not required.
Owner

More than 80 columns; please add line breaks.


Omitting `--force` when attempting to import the first Mercurial commit into a non-empty repository will cause the importer to refuse the operation.
Owner

More than 80 columns; please add line breaks.


The current conversion process mandates an empty repository for a clean start.
This presents a barrier to performance optimization strategies.

This change introduces the ability to pass a repository root commit hash.

This is necessary to support the immediate next commit (Incremental LFS conversion),
which uses a `.gitattributes` file and LFS pointers to bypass the slow, full-history
rewriting often required on large non-empty monorepos (100GiB+, 1M+ files).

The immediate benefit is allowing conversion to start when a non-empty repo
already contains an orphan commit, laying the groundwork for the optimized LFS
conversion feature.

Lesvek force-pushed the task/non-empty-repo branch from 065d026 to 237e25c on December 22, 2025 19:00

Converts large Mercurial repositories to Git/LFS significantly faster by integrating
the LFS conversion into the history export process.

Currently, converting large repositories requires two sequential, long-running steps:
1. Full history conversion (`hg` to `git`).
2. Full history rewrite/import (`git lfs migrate import`).

For huge monorepos (100GiB+, 1M+ files), this sequence can take hours or days.

This commit introduces a new plugin that allows the repository to be converted *incrementally*
(JIT: Just-In-Time). The plugin identifies large files during the initial `hg` to `git`
conversion and immediately writes LFS pointers, eliminating the need for the second,
time-consuming history rewrite step.
The data example was previously a single line, making it difficult to read.
Replaced the one-liner with a multi-line format to improve clarity.

Lesvek force-pushed the task/non-empty-repo branch from 237e25c to 255d49c on December 22, 2025 19:07

Author

Lesvek commented Dec 22, 2025

> Solid work with testcases and well-written documentation. I just have some formatting nits as review comments.
>
> Would you consider adding a pointer to the new plugin to the "Mercurial Largefiles Extension" section of the top level README?

I updated the largefile documentation in the top level README.
