perf: optimize stop hook - transcript parsing and tree updates#407
perf: optimize stop hook - transcript parsing and tree updates#407yasunogithub wants to merge 3 commits intoentireio:mainfrom
Conversation
aa9a6dd to
30b8bc0
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes Claude Code stop-hook performance in Entire CLI by avoiding redundant transcript parsing and by updating checkpoint tree construction to reuse unchanged git subtrees (incremental tree updates). It also tweaks the repo’s Claude hook configuration to prefer a prebuilt entire binary over go run.
Changes:
- Add
FromLinesAPIs to reuse pre-parsed transcript lines for modified-file detection and token usage. - Replace flatten/rebuild checkpoint tree logic with an incremental tree update implementation.
- Update
.claude/settings.jsonhook commands to run a prebuilt./entirebinary with a fallback message.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
cmd/entire/cli/hooks_claudecode_handlers.go |
Reuses already-parsed transcript lines for modified-file extraction and (sometimes) token usage. |
cmd/entire/cli/checkpoint/tree_incremental.go |
New incremental git tree update implementation (changeTree, applyChangesToTree). |
cmd/entire/cli/checkpoint/temporary.go |
Switches checkpoint tree creation / metadata injection to incremental updates. |
cmd/entire/cli/agent/claudecode/transcript.go |
Adds CalculateTotalTokenUsageFromLines and ExtractAllModifiedFilesFromLines. |
.claude/settings.json |
Uses prebuilt binary instead of go run for Claude hooks. |
| for _, entry := range baseTree.Entries { | ||
| if entry.Mode == filemode.Dir { | ||
| if dirChanges, ok := changes.dirs[entry.Name]; ok { | ||
| // Recurse into subdirectory with changes | ||
| newSubHash, subErr := applyChangesToTree(repo, entry.Hash, dirChanges) |
There was a problem hiding this comment.
applyChangesToTree() doesn’t handle file/dir type collisions (e.g., base has a directory named X but changes adds a file X, or vice versa). In that case the existing entry is kept and the new one is added later, producing duplicate tree entries for the same name (invalid tree). Consider explicitly detecting name conflicts between changes.files and changes.dirs while iterating baseTree.Entries and either replacing the existing entry (drop the old one) or returning an error when both a file and directory change exist for the same name.
There was a problem hiding this comment.
Fixed. Added type conflict handling: when base has a dir but changes has a file (or vice versa), the entry is now replaced instead of creating duplicates.
| // Add new directories (not in base tree) | ||
| for name, dirChanges := range changes.dirs { | ||
| if !processedDirs[name] { | ||
| newSubHash, subErr := createTreeFromChanges(repo, dirChanges) | ||
| if subErr != nil { | ||
| return plumbing.ZeroHash, subErr | ||
| } | ||
| newEntries = append(newEntries, object.TreeEntry{ | ||
| Name: name, | ||
| Mode: filemode.Dir, | ||
| Hash: newSubHash, | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
When a path is marked for deletion but the directory doesn’t exist in the base tree, the “Add new directories” block will still create the directory via createTreeFromChanges(), even if that subtree contains only deletions and ends up empty. This can introduce empty directory entries into commits, which Git normally doesn’t track. Recommend skipping creation of new directory entries when the resulting subtree has no entries (and/or avoid recording delete-only paths for non-existent base paths).
There was a problem hiding this comment.
Fixed. createTreeFromChanges now returns ZeroHash when all entries are deletions, and callers skip adding empty directory entries.
| // Use forward slashes for git tree paths | ||
| treePath := dirPathRel + "/" + filepath.ToSlash(relWithinDir) | ||
| ct.addFile(treePath, blobHash, mode) |
There was a problem hiding this comment.
addDirectoryToChangeTree() documents that ct.addFile() paths must use forward slashes, but treePath is built from dirPathRel without normalizing it. Callers sometimes build MetadataDir with filepath.Join (which uses OS separators), so on Windows this can create directory names containing backslashes and break tree layout. Normalize dirPathRel (and/or the full path in ct.addFile/deleteFile) with filepath.ToSlash before splitting/recording paths.
There was a problem hiding this comment.
Fixed. dirPathRel is now normalized with filepath.ToSlash() before use.
Soph
left a comment
There was a problem hiding this comment.
@yasunogithub thanks a lot for this! The changes look good, I'd love to have more tests for tree_incremental.go. Would you be up to add them? Or would you be ok me taking over this PR and add more tests (and run it against our e2e suite).
I’m totally fine with you taking over the PR to add more tests and run the e2e suite. Thanks a lot! |
…ree updates Two main optimizations to reduce stop hook latency: 1. Transcript parsing 3x → 1x: commitWithMetadata() was parsing the transcript file three separate times (prompt extraction, modified files, token usage). Added FromLines variants (ExtractAllModifiedFilesFromLines, CalculateTotalTokenUsageFromLines) that accept pre-parsed lines, eliminating redundant file I/O and JSON parsing. 2. Incremental git tree updates: buildTreeWithChanges() and addTaskMetadataToTree() previously flattened the entire git tree into a map (O(N) where N=total files), modified entries, then rebuilt from scratch. Replaced with applyChangesToTree() which only traverses directories containing changes - unchanged subtrees are reused by hash reference. Complexity is now O(M*D) where M=changed files and D=tree depth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: ce934f044ff2
- Remove waitForTranscriptFlush (0-3s savings per stop hook call) - Use PrePromptState.UserPrompt instead of parsing transcript for commit messages - Use git status (DetectFileChanges) instead of transcript for modified file detection - Remove extractLastAssistantMessage, extractKeyActions, createContextFileMinimal (dead code) - Add transcript.CountLines for fast line counting without JSON parsing - Keep lightweight transcript file copy for rewind support - Fallback to transcript parsing only when UserPrompt unavailable (backward compat) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 54b3d488ad4d
- Handle file/dir type conflicts in applyChangesToTree (dir→file and file→dir) - Skip empty trees from delete-only changes in createTreeFromChanges - Normalize dirPathRel with filepath.ToSlash for Windows compatibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 6dc290b19fdd
015fb69 to
91e941d
Compare
Summary
Transcript parsing 3x → 1x:
commitWithMetadata()was parsing the same transcript file three times (for prompt extraction, modified file detection, and token usage calculation). AddedFromLinesvariants that accept pre-parsed[]TranscriptLine, eliminating redundant file I/O and JSON parsing.Incremental git tree updates:
buildTreeWithChanges()andaddTaskMetadataToTree()previously flattened the entire git tree (O(N) total files) then rebuilt from scratch. Replaced withapplyChangesToTree()which traverses only directories containing changes — unchanged subtrees are reused by hash. Complexity: O(changed_files × tree_depth).Hook binary fallback: Switched
.claude/settings.jsonfromgo run(recompiles every invocation ~1s) to pre-built binary with helpful error if missing.Test plan
mise run fmt— no formatting issuesmise run lint— 0 issuesmise run test:ci— all unit + integration tests passtime ./entire hooks claude-code stopto measure binary execution time🤖 Generated with Claude Code