perf(workspace): route already-indexed repos through the incremental update path#384

Merged
RaghavChamadiya merged 1 commit into main from perf/workspace-incremental-update on Jun 5, 2026
@RaghavChamadiya
Member

Problem

Workspace updates ran the full init pipeline for every stale member repo (update_single_repo_index → run_pipeline): full parse, full git walk, full analysis, and delete-then-insert persistence, even when only one file had changed since the last index. Single-repo repowise update has long had a much cheaper incremental path; workspace repos couldn't use it because that orchestration lived in the CLI command.

Change

1. Extract the incremental orchestration into core — new packages/core/src/repowise/core/pipeline/incremental.py:

  • build_repo_graph / rebuild_graph_and_git / run_partial_analysis / persist_incremental_index (+ persist_partial_health, persist_incremental_commits, build_filtered_changed_paths), moved from update_cmd.py with behavior preserved:
    • console.print → optional log callback (CLI passes console.print; workspace passes the module logger)
    • run_async(...) → native await (the CLI wrapper wraps once at the boundary)
    • CLI load_config / get_db_url_for_repo → core load_repo_config / resolve_db_url (verified equivalent — the CLI helpers were thin delegates)
    • persist_incremental_index now disposes its engine in a finally (the old CLI code leaked it; engines are created fresh per call, nothing shared)
  • update_cmd.py keeps thin delegating wrappers with unchanged signatures, so every existing caller and test is untouched.
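
The extraction pattern in item 1 (optional log callback, native await in core, one sync wrapper at the CLI boundary, engine disposed in a finally) can be sketched roughly as below. This is an illustration under assumptions, not the code in incremental.py: FakeEngine, engine_factory, the row handling, and persist_incremental_index_cli are hypothetical; only the name persist_incremental_index comes from this PR.

```python
import asyncio
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)
LogFn = Optional[Callable[[str], None]]


class FakeEngine:
    """Hypothetical stand-in for a database engine created per call."""

    def __init__(self) -> None:
        self.disposed = False

    async def dispose(self) -> None:
        self.disposed = True


async def persist_incremental_index(
    rows: list,
    log: LogFn = None,
    engine_factory: Callable[[], FakeEngine] = FakeEngine,
) -> int:
    """Core-side coroutine: awaits natively and always releases its engine."""
    emit = log or logger.info  # callers without a console fall back to the logger
    engine = engine_factory()  # fresh per call, nothing shared
    try:
        if any(not row for row in rows):
            raise ValueError("empty row")
        emit(f"upserted {len(rows)} rows")
        return len(rows)
    finally:
        # Runs on success and on failure, so the engine is never leaked.
        await engine.dispose()


def persist_incremental_index_cli(
    rows: list,
    log: LogFn = None,
    engine_factory: Callable[[], FakeEngine] = FakeEngine,
) -> int:
    """CLI-side thin wrapper: the only place that enters the event loop."""
    return asyncio.run(persist_incremental_index(rows, log, engine_factory))
```

In this sketch a CLI caller would pass its console printer as log, while a workspace caller would omit it and get the module logger.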

2. Route workspace updates through it. update_single_repo_index takes the incremental path when the repo was already indexed:

  • eligibility: state.json has a last_sync_commit whose commit still resolves (commit_exists) and wiki.db exists. The resolve check matters: ChangeDetector returns an empty diff for unresolvable refs, which would masquerade as "no changes" and bump state past commits that were never indexed (rebase / aggressive gc).
  • the persisted git_tier / include_submodules / include_nested_repos flags are threaded through the incremental rebuild — and through the full-pipeline call too.
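
The eligibility rule above can be sketched as a small pure function. This is a minimal sketch under assumptions: the function names, the explicit state_path / db_path parameters, and the injected commit_exists callable are hypothetical and are not the signatures used in the PR.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def read_last_sync_commit(state_path: Path) -> Optional[str]:
    """Return last_sync_commit from state.json, or None if absent or unreadable."""
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(state, dict):
        return None
    commit = state.get("last_sync_commit")
    return commit if isinstance(commit, str) and commit else None


def incremental_base(
    state_path: Path,
    db_path: Path,
    commit_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the base commit for an incremental update, or None to force a full run."""
    commit = read_last_sync_commit(state_path)
    if commit is None:
        return None  # never indexed
    if not db_path.is_file():
        return None  # no wiki.db to upsert into
    if not commit_exists(commit):
        # Rebased or gc'd base: an empty diff here would look like "no changes".
        return None
    return commit
```

The PR does not say how commit_exists is implemented. In a git checkout, one option is to run `git cat-file -e <sha>^{commit}`, which exits non-zero when the ref does not resolve to a commit.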

Fallbacks to the full pipeline (whose delete-then-insert persistence is authoritative):

  • never-indexed repos (first-time index, unchanged behavior)
  • diffs containing deleted or renamed files — the incremental persistence is upsert-only and can't prune rows for removed paths; the full pipeline's prune pass cleans them up
  • any incremental failure (logged with traceback); a partial incremental write is overwritten by the full run

Empty diffs (merge/empty commits, all changes excluded) report success so the caller bumps last_sync_commit instead of re-diffing forever.
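
The routing rules above (empty diff, deletions or renames, incremental failure) could be sketched as follows. The Route enum, the Change record with its status strings, and the run_incremental / run_full callables are illustrative assumptions, not the PR's actual types.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


class Route(Enum):
    NOOP_SUCCESS = "noop_success"  # nothing to index; caller still bumps last_sync_commit
    INCREMENTAL = "incremental"
    FULL = "full"


@dataclass(frozen=True)
class Change:
    path: str
    status: str  # hypothetical values: "added", "modified", "deleted", "renamed"


def choose_route(changes: Sequence[Change]) -> Route:
    """Pick a path from the diff alone."""
    if not changes:
        return Route.NOOP_SUCCESS
    if any(c.status in {"deleted", "renamed"} for c in changes):
        # Upsert-only persistence cannot prune rows for removed paths.
        return Route.FULL
    return Route.INCREMENTAL


def update_repo(
    changes: Sequence[Change],
    run_incremental: Callable[[Sequence[Change]], None],
    run_full: Callable[[], None],
) -> Route:
    """Run the chosen path and return the route that actually completed."""
    route = choose_route(changes)
    if route is Route.NOOP_SUCCESS:
        return route
    if route is Route.INCREMENTAL:
        try:
            run_incremental(changes)
            return route
        except Exception:
            # Log with traceback, then let the full run overwrite any partial write.
            logger.exception("incremental update failed; falling back to full pipeline")
    run_full()
    return Route.FULL
```

Returning the route that actually ran keeps the fallback visible to the caller, which is convenient when asserting on it in tests.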

Tests

tests/unit/workspace/test_incremental_update.py:

  • indexed repo → incremental path (full pipeline monkeypatched to fail if called), schema initialized in the pre-existing wiki.db
  • never-indexed / unresolvable base commit / deleted file / renamed file / incremental exception → full pipeline
  • persisted git_tier + submodule flags reach the incremental rebuild
  • commit_exists / read_repo_state units

Full tests/unit: 3783+ passed. ruff check clean on new files; pre-existing violations in untouched regions of update_cmd.py/workspace/update.py left alone.

…update path

Workspace updates re-ran the full init pipeline for every stale repo --
full parse, full git walk, full analysis, delete-then-insert persistence --
even when only a handful of files changed since the last index.

Extract the single-repo incremental orchestration (graph rebuild with
parse-cache, changed-files git re-index, partial health/dead-code
analysis, upsert persistence) out of the update command into
core/pipeline/incremental.py; the CLI keeps thin delegating wrappers with
unchanged signatures. update_single_repo_index now takes the incremental
path when the repo was already indexed: state.json has a last_sync_commit
that still resolves and wiki.db exists. The persisted git_tier /
include_submodules / include_nested_repos flags are threaded through both
the incremental rebuild and the full-pipeline fallback.

Guard rails:
- ChangeDetector returns an empty diff for unresolvable refs, so the base
  commit is verified up front (rebase/gc would otherwise masquerade as
  "no changes" and bump state past unindexed commits).
- Diffs containing deletions or renames hand off to the full pipeline:
  the incremental persistence is upsert-only and cannot prune rows for
  removed paths.
- Any incremental failure falls back to the full pipeline, whose
  delete-then-insert persistence overwrites a partial incremental write.

Never-indexed repos run the full pipeline as before.
@RaghavChamadiya RaghavChamadiya requested a review from swati510 as a code owner June 5, 2026 11:31
@repowise-bot
repowise-bot Bot commented Jun 5, 2026

✅ Health: 7.0 (unchanged)
1 file moved · 2 hotspots · 5 hidden couplings · 2 with fix history

🚨 Change risk: high (riskier than 80% of this repo's commits · raw 9.6/10)
This change's risk is driven by:

  • large diff (many lines added)
  • scattered, high-entropy change

🩹 Review priority (files here with the most recent bug-fix history — defects cluster, so review these first)

| File | Score | Δ | Why |
| --- | --- | --- | --- |
| .../workspace/update.py | 6.2 → 6.3 | ▲ +0.1 | 🔻 introduced large method, primitive obsession · ✅ resolved brain method, dry violation |

🔥 Hotspots touched (2)
  • .../commands/update_cmd.py — 38 commits/90d, 12 dependents · primary owner: Raghav Chamadiya (67%)
  • .../workspace/update.py — 3 commits/90d, 7 dependents · primary owner: Raghav Chamadiya (100%)
🔗 Hidden coupling (1 file)
  • .../commands/update_cmd.py co-changes with these files (not in this PR):
    • .../pipeline/orchestrator.py (8× — 🟢 routine)
    • .../cli/helpers.py (8× — 🟢 routine)
    • docs/CLI_REFERENCE.md (6× — 🟢 routine)
    • .../persistence/models.py (6× — 🟢 routine)
    • .../cli/test_commands.py (6× — 🟢 routine)

Last updated 2026-06-05 11:32 UTC

@RaghavChamadiya RaghavChamadiya merged commit 9382719 into main Jun 5, 2026
5 checks passed
@RaghavChamadiya RaghavChamadiya deleted the perf/workspace-incremental-update branch June 5, 2026 11:40