Skip to content

Conversation

@f
Copy link
Owner

@f f commented Nov 27, 2025

feat: add commit splitting feature for cleaner git history

Summary

This PR adds a new --split option that allows users to split large commits into smaller, more focused commits for cleaner git history and easier rollbacks.

The feature uses AI to analyze commit diffs and suggest logical splits based on:

  • Different files serving different purposes (source vs tests vs config)
  • Different types of changes (feature + refactor + docs)
  • Unrelated functionality grouped together

Key additions:

  • New --split flag to enable commit splitting mode
  • New --max-splits option to limit splits per commit (default: 5)
  • AI-powered analysis to determine if/how commits should be split
  • Automatic backup branch creation before splitting
  • Dry-run support to preview splits without applying
  • Documentation in README with usage examples

Review & Testing Checklist for Human

  • Test the performCommitSplit method with real commits - This method performs destructive git operations (git reset --soft, git checkout, git cherry-pick). Verify the error recovery works if something fails mid-split.

  • Test non-HEAD commit splitting - The cherry-pick logic for commits after the split target could fail with merge conflicts. Verify the warning message is appropriate and the repo isn't left in a broken state.

  • Verify AI JSON parsing - The code expects specific JSON structure from the AI. Test with both OpenAI and Ollama to ensure responses parse correctly. Edge cases: malformed JSON, missing fields, markdown-wrapped responses.

  • Test with special characters in filenames - The git add "${file}" command may have issues with unusual filenames.

Recommended test plan:

  1. Create a test repo with a few multi-file commits
  2. Run npx git-rewrite-commits --split --dry-run to preview
  3. Run npx git-rewrite-commits --split on a feature branch
  4. Verify the backup branch was created
  5. Check git log --oneline to confirm splits look correct
  6. Test recovery by resetting to backup branch

Notes

  • This feature rewrites git history, so all the existing safety warnings apply
  • The implementation processes commits from newest to oldest to minimize rebase complications
  • Sensitive data redaction is applied before sending diffs to AI providers

Link to Devin run: https://app.devin.ai/sessions/cdf71c5996d741918aff6b5f70f2bf62
Requested by: Fatih Kadir Akın (fatihkadirakin@gmail.com) / @f

- Add --split option to analyze and split commits into smaller, focused commits
- Add --max-splits option to limit maximum splits per commit (default: 5)
- Use AI to analyze commit diffs and suggest logical splits based on:
  - Different files serving different purposes
  - Different types of changes (feat, fix, refactor, etc.)
  - Unrelated functionality grouped together
- Generate appropriate commit messages for each split
- Support dry-run mode to preview splits without applying
- Create backup branch before splitting for safety
- Update README with documentation and examples

Co-Authored-By: Fatih Kadir Akın <fatihkadirakin@gmail.com>
Copilot AI review requested due to automatic review settings November 27, 2025 18:48
@devin-ai-integration
Copy link

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a commit splitting feature that uses AI to analyze commits and split them into smaller, more focused commits for cleaner git history. The feature integrates with existing OpenAI/Ollama providers and follows the tool's pattern of AI-powered git operations with safety measures like dry-run mode and backup branches.

Key changes:

  • Added --split mode to analyze and split commits using AI-based analysis
  • Introduced two new interfaces (SplitSuggestion and CommitSplitAnalysis) to structure split proposals
  • Implemented commit splitting logic that handles both HEAD and non-HEAD commits with history rewriting

Critical Issues Found:
The implementation has several critical bugs that will prevent it from working correctly:

  1. Stale commit hashes: After splitting the first commit, all subsequent commit hashes become invalid, causing splits to fail
  2. Invalid git rev-list reference: Using a commit hash that was reset away causes git commands to fail
  3. Shell operator issues: Redirection operators won't work with execSync
  4. Insufficient input validation: AI responses are not validated, and shell commands don't properly escape special characters

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

File Description
src/index.ts Adds core splitting functionality including commit analysis, diff extraction, and history rewriting logic; introduces new interfaces for split suggestions and analysis results
src/cli.ts Adds CLI options --split and --max-splits to enable the commit splitting feature from command line
README.md Documents the new commit splitting feature with usage examples and explanation of when commits should be split

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// Stage and commit remaining changes
this.execCommand('git add -A');
const remainingMessage = `chore: remaining changes from split of "${commitInfo.message}"`;
this.execCommand(`git commit -m "${remainingMessage}"`);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same shell escaping issue as line 784-785. The remainingMessage variable includes commitInfo.message which could contain shell metacharacters. This needs proper escaping or use of git commit -F <file>.

Suggested change
this.execCommand(`git commit -m "${remainingMessage}"`);
// Write commit message to a temp file and use git commit -F <file>
const os = await import('os');
const tmpDir = os.tmpdir();
const tmpFile = path.join(tmpDir, `split_commit_msg_${Date.now()}_${Math.random().toString(36).slice(2)}.txt`);
fs.writeFileSync(tmpFile, remainingMessage, { encoding: 'utf8' });
try {
this.execCommand(`git commit -F "${tmpFile}"`);
} finally {
fs.unlinkSync(tmpFile);
}

Copilot uses AI. Check for mistakes.
// Stage only the files for this split
for (const file of suggestion.files) {
try {
this.execCommand(`git add "${file}"`);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File paths with spaces, quotes, or other special characters may not be properly handled. While the double quotes provide some protection, consider using proper shell escaping or a more robust method. Also, this command uses the filename from AI output which could potentially be manipulated.

Copilot uses AI. Check for mistakes.
Comment on lines +807 to +810
// Get the commits that were after the split commit
const commitsAfter = this.execCommand(`git rev-list ${commitInfo.hash}..${currentBranch}`).trim().split('\n').filter(Boolean);

if (commitsAfter.length > 0) {
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git rev-list ${commitInfo.hash}..${currentBranch} command will fail because commitInfo.hash no longer exists after the reset on line 761. This commit was reset away, so git can't find it in the history. You need to store the parent commit hash before the reset (e.g., git rev-parse ${commitInfo.hash}^) and use that as the starting point for the rev-list.

Suggested change
// Get the commits that were after the split commit
const commitsAfter = this.execCommand(`git rev-list ${commitInfo.hash}..${currentBranch}`).trim().split('\n').filter(Boolean);
if (commitsAfter.length > 0) {
// Store the parent commit hash before the reset, so we can use it for rev-list
const parentHash = this.execCommand(`git rev-parse ${commitInfo.hash}^`).trim();
// Get the commits that were after the split commit
const commitsAfter = this.execCommand(`git rev-list ${parentHash}..${currentBranch}`).trim().split('\n').filter(Boolean);

Copilot uses AI. Check for mistakes.
Comment on lines +718 to +719
for (const file of files) {
if (section.includes(`a/${file}`) || section.includes(`b/${file}`)) {
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file matching logic is fragile. It checks if the section contains a/${file} or b/${file}, but this could lead to false positives. For example, a file named "test.ts" would match a diff section for "my-test.ts" or even content that contains "a/test.ts" in a diff hunk. Use more precise matching by checking the diff header line specifically: diff --git a/${file} b/${file}.

Suggested change
for (const file of files) {
if (section.includes(`a/${file}`) || section.includes(`b/${file}`)) {
// Get the first line of the section (the diff header)
const headerLine = section.split('\n', 1)[0].trim();
for (const file of files) {
const expectedHeader = `diff --git a/${file} b/${file}`;
if (headerLine === expectedHeader) {

Copilot uses AI. Check for mistakes.
Comment on lines +681 to +689
const suggestions: SplitSuggestion[] = (analysis.suggestions || []).map((s: { files: string[]; description: string; commitMessage: string }) => {
const diffContent = this.extractDiffForFiles(commitInfo.diff, s.files);
return {
files: s.files,
description: s.description,
commitMessage: s.commitMessage,
diffContent
};
});
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code doesn't enforce the maxSplits limit on the AI response. The AI is told via the prompt to limit splits to maxSplits, but there's no validation to ensure it actually does. If the AI returns more suggestions than maxSplits, they will all be processed. Add validation: suggestions.slice(0, maxSplits) or check and log a warning if the limit is exceeded.

Copilot uses AI. Check for mistakes.
Comment on lines +759 to +761
this.execCommand(`git checkout ${commitInfo.hash}`);
this.execCommand(`git checkout -b ${tempBranch}`);
this.execCommand('git reset --soft HEAD~1');
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When checking out to a specific commit hash, git enters detached HEAD state. The subsequent git checkout -b ${tempBranch} creates a branch from the commit, but then git reset --soft HEAD~1 unstages the commit. However, for non-HEAD commits, this logic is problematic because commitInfo.hash in the rev-list command on line 808 will no longer be valid after the reset. The rev-list will fail because that commit hash doesn't exist in the new branch history anymore.

Copilot uses AI. Check for mistakes.
Comment on lines +784 to +785
const escapedMessage = suggestion.commitMessage.replace(/"/g, '\\"');
this.execCommand(`git commit -m "${escapedMessage}"`);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping double quotes alone is insufficient for shell injection protection. Commit messages could contain other problematic characters like backticks, $(), newlines, or backslashes that could break the command or enable injection. Consider using a safer approach like writing the message to a temporary file and using git commit -F <file>, or use a library that properly escapes shell arguments.

Copilot uses AI. Check for mistakes.
Comment on lines +634 to +636
const prompt = `You are a git commit analyzer. Analyze the following commit and determine if it should be split into multiple smaller, more focused commits.
Original commit message: "${commitInfo.message}"
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message is directly interpolated into the prompt without escaping. If the commit message contains double quotes, it could break out of the string and potentially manipulate the AI prompt. Consider escaping double quotes in commitInfo.message before including it in the prompt.

Suggested change
const prompt = `You are a git commit analyzer. Analyze the following commit and determine if it should be split into multiple smaller, more focused commits.
Original commit message: "${commitInfo.message}"
const escapedMessage = commitInfo.message.replace(/"/g, '\\"');
const prompt = `You are a git commit analyzer. Analyze the following commit and determine if it should be split into multiple smaller, more focused commits.
Original commit message: "${escapedMessage}"

Copilot uses AI. Check for mistakes.
Comment on lines +681 to +689
const suggestions: SplitSuggestion[] = (analysis.suggestions || []).map((s: { files: string[]; description: string; commitMessage: string }) => {
const diffContent = this.extractDiffForFiles(commitInfo.diff, s.files);
return {
files: s.files,
description: s.description,
commitMessage: s.commitMessage,
diffContent
};
});
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AI response is parsed without validation. The code doesn't verify that s.files is actually an array, or that s.description and s.commitMessage are strings. If the AI returns malformed data (e.g., files as a string instead of an array), this will cause runtime errors. Add validation to ensure the response structure matches expectations before using it.

Suggested change
const suggestions: SplitSuggestion[] = (analysis.suggestions || []).map((s: { files: string[]; description: string; commitMessage: string }) => {
const diffContent = this.extractDiffForFiles(commitInfo.diff, s.files);
return {
files: s.files,
description: s.description,
commitMessage: s.commitMessage,
diffContent
};
});
// Validate each suggestion before mapping
function isValidSuggestion(s: any): s is { files: string[]; description: string; commitMessage: string } {
return (
s &&
Array.isArray(s.files) &&
s.files.every((f: any) => typeof f === 'string') &&
typeof s.description === 'string' &&
typeof s.commitMessage === 'string'
);
}
const suggestions: SplitSuggestion[] = (Array.isArray(analysis.suggestions) ? analysis.suggestions : [])
.filter(isValidSuggestion)
.map((s) => {
const diffContent = this.extractDiffForFiles(commitInfo.diff, s.files);
return {
files: s.files,
description: s.description,
commitMessage: s.commitMessage,
diffContent
};
});

Copilot uses AI. Check for mistakes.
// Try to recover
try {
this.execCommand(`git checkout ${currentBranch}`);
this.execCommand(`git branch -D ${tempBranch} 2>/dev/null || true`);
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shell redirection operators (2>/dev/null || true) will not work with execCommand as it uses execSync which doesn't process shell operators. This command will fail. Use Node.js error handling instead, or wrap the command in a shell: sh -c "git branch -D ${tempBranch} 2>/dev/null || true". However, since the catch block already ignores errors, you can simply remove the shell operators and just call git branch -D ${tempBranch}.

Suggested change
this.execCommand(`git branch -D ${tempBranch} 2>/dev/null || true`);
this.execCommand(`git branch -D ${tempBranch}`);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants