perf(workspace): route already-indexed repos through the incremental update path #384
Merged
…update path

Workspace updates re-ran the full init pipeline for every stale repo -- full parse, full git walk, full analysis, delete-then-insert persistence -- even when only a handful of files changed since the last index.

Extract the single-repo incremental orchestration (graph rebuild with parse-cache, changed-files git re-index, partial health/dead-code analysis, upsert persistence) out of the update command into core/pipeline/incremental.py; the CLI keeps thin delegating wrappers with unchanged signatures.

update_single_repo_index now takes the incremental path when the repo was already indexed: state.json has a last_sync_commit that still resolves and wiki.db exists. The persisted git_tier / include_submodules / include_nested_repos flags are threaded through both the incremental rebuild and the full-pipeline fallback.

Guard rails:
- ChangeDetector returns an empty diff for unresolvable refs, so the base commit is verified up front (rebase/gc would otherwise masquerade as "no changes" and bump state past unindexed commits).
- Diffs containing deletions or renames hand off to the full pipeline: the incremental persistence is upsert-only and cannot prune rows for removed paths.
- Any incremental failure falls back to the full pipeline, whose delete-then-insert persistence overwrites a partial incremental write.

Never-indexed repos run the full pipeline as before.
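The up-front base-commit check from the first guard rail could look roughly like the sketch below. It is an illustration only, not the project's `commit_exists`: the name `base_commit_resolves` and its signature are hypothetical, and it assumes the `git` executable is on `PATH`. It relies on `git cat-file -e`, which exits non-zero when the object is missing or cannot be peeled to a commit.

```python
import subprocess
from pathlib import Path


def base_commit_resolves(repo_path: Path, sha: str) -> bool:
    """Return True if `sha` still names a commit in the repo at `repo_path`.

    Hypothetical helper for illustration; not the project's commit_exists.
    """
    # Reject empty values and anything git could mistake for an option.
    if not sha or sha.startswith("-"):
        return False
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_path), "cat-file", "-e", f"{sha}^{{commit}}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # git is not installed or cannot be launched: treat the commit as
        # unresolvable so the caller does not trust an empty diff.
        return False
    return result.returncode == 0
```

Returning `False` on every doubtful case is deliberate in this sketch: a false negative only costs a full re-index, while a false positive could let an empty diff pass for "no changes".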
✅ Health: 7.0 (unchanged) 🚨 Change risk: high (riskier than 80% of this repo's commits · raw 9.6/10)
🩹 Review priority (files here with the most recent bug-fix history — defects cluster, so review these first)
🔥 Hotspots touched (2)
🔗 Hidden coupling (1 file)
Last updated 2026-06-05 11:32 UTC
swati510
approved these changes
Jun 5, 2026
Problem
Workspace updates ran the full init pipeline for every stale member repo (`update_single_repo_index` → `run_pipeline`): full parse, full git walk, full analysis, delete-then-insert persistence, even when only one file changed since the last index. Single-repo `repowise update` has had a much cheaper incremental path for a long time; workspace repos couldn't use it because that orchestration lived in the CLI command.
Change
1. Extract the incremental orchestration into core. New `packages/core/src/repowise/core/pipeline/incremental.py`: `build_repo_graph` / `rebuild_graph_and_git` / `run_partial_analysis` / `persist_incremental_index` (+ `persist_partial_health`, `persist_incremental_commits`, `build_filtered_changed_paths`), moved from `update_cmd.py` with behavior preserved:
- `console.print` → optional `log` callback (CLI passes `console.print`; workspace passes the module logger)
- `run_async(...)` → native `await` (the CLI wrapper wraps once at the boundary)
- `load_config` / `get_db_url_for_repo` → core `load_repo_config` / `resolve_db_url` (verified equivalent: the CLI helpers were thin delegates)
- `persist_incremental_index` now disposes its engine in a `finally` (the old CLI code leaked it; engines are created fresh per call, nothing shared)

`update_cmd.py` keeps thin delegating wrappers with unchanged signatures, so every existing caller and test is untouched.

2. Route workspace updates through it. `update_single_repo_index` takes the incremental path when the repo was already indexed: `state.json` has a `last_sync_commit` whose commit still resolves (`commit_exists`) and `wiki.db` exists. The resolve check matters: `ChangeDetector` returns an empty diff for unresolvable refs, which would masquerade as "no changes" and bump state past commits that were never indexed (rebase / aggressive gc).

The persisted `git_tier` / `include_submodules` / `include_nested_repos` flags are threaded through the incremental rebuild, and through the full-pipeline call too.

Fallbacks to the full pipeline (whose delete-then-insert persistence is authoritative):
- The diff contains deletions or renames: the incremental persistence is upsert-only and cannot prune rows for removed paths.
- Any incremental failure: the full pipeline's delete-then-insert persistence overwrites a partial incremental write.
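The routing rules can be condensed into a small decision function. The sketch below is an assumption-laden illustration, not the code in `update_single_repo_index`: `RepoSnapshot`, `DiffSummary`, `choose_update_path`, and the three return labels are hypothetical names introduced here, and the real implementation computes the diff only after the already-indexed check has passed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RepoSnapshot:
    """Hypothetical summary of what the updater knows about one member repo."""

    last_sync_commit: Optional[str]  # value stored in state.json, if any
    base_commit_resolves: bool       # does last_sync_commit still exist in git?
    wiki_db_exists: bool             # is there a wiki.db from an earlier index?


@dataclass(frozen=True)
class DiffSummary:
    """Hypothetical summary of the diff from last_sync_commit to HEAD."""

    changed: Tuple[str, ...] = ()    # added or modified paths
    deleted: Tuple[str, ...] = ()
    renamed: Tuple[str, ...] = ()


def choose_update_path(repo: RepoSnapshot, diff: DiffSummary) -> str:
    """Return "full", "incremental", or "noop" for one member repo."""
    already_indexed = (
        bool(repo.last_sync_commit)
        and repo.base_commit_resolves
        and repo.wiki_db_exists
    )
    if not already_indexed:
        # Never indexed, or the base commit vanished (rebase / gc).
        return "full"
    if diff.deleted or diff.renamed:
        # Upsert-only persistence cannot prune rows for removed paths.
        return "full"
    if not diff.changed:
        # Empty diff: report success so the caller can bump last_sync_commit.
        return "noop"
    return "incremental"
```

The exception handling that re-runs the full pipeline after an incremental failure is left out; it would wrap whatever is executed for the `"incremental"` result.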
Empty diffs (merge/empty commits, all changes excluded) report success so the caller bumps `last_sync_commit` instead of re-diffing forever.
Tests
`tests/unit/workspace/test_incremental_update.py`:
- `wiki.db`
- `git_tier` + submodule flags reach the incremental rebuild
- `commit_exists` / `read_repo_state` units

Full `tests/unit`: 3783+ passed. `ruff check` clean on new files; pre-existing violations in untouched regions of `update_cmd.py` / `workspace/update.py` left alone.
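Two of the behavior-preserving changes listed under Change, the optional `log` callback and disposing the engine in a `finally`, can be illustrated together. This is a minimal sketch under stated assumptions, not the body of `persist_incremental_index`: `FakeEngine` is a stand-in for whatever async engine the project creates, and `persist_with_cleanup` and its parameters are hypothetical names.

```python
from typing import Awaitable, Callable, Optional


class FakeEngine:
    """Stand-in for an async database engine; it only tracks disposal."""

    def __init__(self) -> None:
        self.disposed = False

    async def dispose(self) -> None:
        self.disposed = True


async def persist_with_cleanup(
    engine: FakeEngine,
    write: Callable[[FakeEngine], Awaitable[None]],
    log: Optional[Callable[[str], None]] = None,
) -> None:
    """Run `write` against `engine`, always disposing the engine afterwards."""
    # Logging is optional: a CLI could pass console.print, a library caller
    # could pass logger.info, and None silences output entirely.
    emit = log or (lambda message: None)
    try:
        await write(engine)
        emit("incremental persistence complete")
    finally:
        # Runs on success and on failure, so a per-call engine never leaks.
        await engine.dispose()
```

Because the engine is created fresh per call and shared with nothing else, disposing it unconditionally is safe; the exception from a failed write still propagates to the caller, which can then fall back to the full pipeline.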