Description
Story: Debounced cidx-meta Refresh on Batch Repository Registration
As a CIDX server administrator
I want all batch-registered repositories to become searchable via cidx-meta after auto-discovery completes
So that newly discovered repos are reliably indexed without manual intervention, even when many repos are registered in rapid succession
Problem Statement
When multiple repositories are registered in rapid succession (e.g., via the auto-discovery feature), each registration triggers on_repo_added() in meta_description_hook.py (line 178), which calls _refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global"). The first call succeeds, but all subsequent calls during the same refresh cycle raise DuplicateJobError from background_jobs.py (line 268) because a global_repo_refresh job is already running for cidx-meta-global.
The current code catches this exception and logs a warning (line 183) but takes no further action. The .md description files are created on disk for all repos, but the cidx-meta index only reflects the state at the time of the first (successful) refresh. There is no retry or catch-up mechanism, so the descriptions of the rejected repos stay unindexed until some unrelated event triggers a reindex.
Conversation reference: User observed the warning "Failed to trigger cidx-meta refresh for [repo]: A 'global_repo_refresh' job is already running for repository 'cidx-meta-global'" after registering multiple repos via auto-discovery.
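The failure mode can be reproduced in isolation. The sketch below is a minimal illustration, not the real code: `FakeRefreshScheduler` and `register_batch` are hypothetical stand-ins, and `DuplicateJobError` is redefined locally instead of being imported from background_jobs.py. It assumes the first refresh stays busy for the whole batch, which is the situation described above.

```python
class DuplicateJobError(Exception):
    """Local stand-in for the exception raised by background_jobs.py."""


class FakeRefreshScheduler:
    """Hypothetical scheduler that allows one cidx-meta refresh at a time."""

    def __init__(self):
        self.job_running = False
        self.accepted = 0

    def trigger_refresh_for_repo(self, alias):
        if self.job_running:
            raise DuplicateJobError(
                f"A 'global_repo_refresh' job is already running "
                f"for repository '{alias}'"
            )
        self.job_running = True  # assumed to stay busy for the whole batch
        self.accepted += 1


def register_batch(scheduler, repo_names):
    """Mimic the current hook: warn on DuplicateJobError and move on."""
    rejected = []
    for name in repo_names:
        try:
            scheduler.trigger_refresh_for_repo("cidx-meta-global")
        except DuplicateJobError as exc:
            print(f"Failed to trigger cidx-meta refresh for {name}: {exc}")
            rejected.append(name)  # nothing schedules a catch-up refresh
    return rejected
```

With three repos in the batch, only the first trigger is accepted; the other two are logged and forgotten, which is the gap the debouncer is meant to close.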
Implementation Status
- CidxMetaRefreshDebouncer class with dirty flag and timer thread (meta_description_hook.py)
- Debounce timer logic: set dirty, reset timer on each signal, fire single refresh on expiry
- Integration into on_repo_added(): catch DuplicateJobError and signal debouncer instead of just warning
- Integration into on_repo_removed(): signal debouncer on DuplicateJobError (same pattern)
- Module-level lifecycle: set_refresh_scheduler() initializes debouncer, shutdown method for clean stop
- Thread safety: lock protecting dirty flag and timer reset operations
- Unit tests for debouncer logic (timer reset, coalescing, single-fire behavior)
- Unit tests for on_repo_added()/on_repo_removed() integration with debouncer
- Integration test simulating batch registration with DuplicateJobError scenario
- E2E manual testing: batch register repos, verify all descriptions become searchable
Completion: 0/10 tasks complete (0%)
Algorithm
    CidxMetaRefreshDebouncer:
        _dirty = False
        _timer = None
        _lock = threading.Lock()
        _debounce_seconds = 30  (configurable, default 30s)
        _refresh_scheduler = RefreshScheduler reference
        _shutdown = False

        signal_dirty():
            WITH _lock:
                _dirty = True
                IF _timer is not None:
                    _timer.cancel()
                IF NOT _shutdown:
                    _timer = threading.Timer(_debounce_seconds, _on_timer_expired)
                    _timer.daemon = True
                    _timer.start()
            LOG debug "cidx-meta marked dirty, debounce timer (re)started"

        _on_timer_expired():
            WITH _lock:
                IF NOT _dirty OR _shutdown:
                    RETURN
                _dirty = False
                _timer = None
            # Outside lock: trigger the actual refresh
            TRY:
                _refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global")
                LOG info "Debounced cidx-meta refresh triggered successfully"
            EXCEPT DuplicateJobError:
                # Still running from a previous cycle: re-mark dirty and retry later
                WITH _lock:
                    _dirty = True
                    IF NOT _shutdown:
                        _timer = threading.Timer(_debounce_seconds, _on_timer_expired)
                        _timer.daemon = True
                        _timer.start()
                LOG info "cidx-meta refresh still running, will retry after debounce"
            EXCEPT Exception as e:
                LOG warning "Debounced cidx-meta refresh failed: {e}"

        shutdown():
            WITH _lock:
                _shutdown = True
                IF _timer is not None:
                    _timer.cancel()
                    _timer = None
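The debouncer pseudocode translates fairly directly into Python. The following is a sketch under stated assumptions, not the final implementation: `DuplicateJobError` is a local stand-in for the class in background_jobs.py, the scheduler is any object exposing `trigger_refresh_for_repo(alias)`, and `_start_timer_locked` is a hypothetical helper. One deliberate deviation: the retry path only starts a timer when none is pending, so a `signal_dirty()` that arrived in the meantime is not overwritten.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class DuplicateJobError(Exception):
    """Local stand-in for the exception defined in background_jobs.py."""


class CidxMetaRefreshDebouncer:
    """Coalesce rejected cidx-meta refresh requests into one deferred refresh."""

    def __init__(self, refresh_scheduler, debounce_seconds=30.0):
        self._refresh_scheduler = refresh_scheduler
        self._debounce_seconds = debounce_seconds
        self._lock = threading.Lock()
        self._dirty = False
        self._timer = None
        self._shutdown = False

    def _start_timer_locked(self):
        # Hypothetical helper; the caller must already hold self._lock.
        self._timer = threading.Timer(self._debounce_seconds, self._on_timer_expired)
        self._timer.daemon = True
        self._timer.start()

    def signal_dirty(self):
        with self._lock:
            self._dirty = True
            if self._timer is not None:
                self._timer.cancel()  # reset: only the newest timer may fire
            if not self._shutdown:
                self._start_timer_locked()
        logger.debug("cidx-meta marked dirty, debounce timer (re)started")

    def _on_timer_expired(self):
        with self._lock:
            if not self._dirty or self._shutdown:
                return
            self._dirty = False
            self._timer = None
        # Outside the lock: trigger the actual refresh.
        try:
            self._refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global")
            logger.info("Debounced cidx-meta refresh triggered successfully")
        except DuplicateJobError:
            with self._lock:
                self._dirty = True
                # Deviation from the pseudocode: keep a timer that a
                # concurrent signal_dirty() may already have started.
                if not self._shutdown and self._timer is None:
                    self._start_timer_locked()
            logger.info("cidx-meta refresh still running, will retry after debounce")
        except Exception as exc:
            logger.warning("Debounced cidx-meta refresh failed: %s", exc)

    def shutdown(self):
        with self._lock:
            self._shutdown = True
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Triggering the refresh outside the lock keeps `signal_dirty()` from blocking on a slow scheduler call. A short `debounce_seconds` (for example 0.05) makes the class practical to exercise in unit tests without waiting for the 30-second default.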
    on_repo_added (modified flow, lines 177-183):
        # After creating .md file...
        IF _refresh_scheduler is not None:
            TRY:
                _refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global")
                LOG info "Triggered cidx-meta refresh after adding {repo_name}"
            EXCEPT DuplicateJobError:
                # Refresh already running: signal debouncer for deferred retry
                IF _debouncer is not None:
                    _debouncer.signal_dirty()
                    LOG info "cidx-meta refresh deferred (debounced) for {repo_name}"
                ELSE:
                    LOG warning "cidx-meta refresh skipped for {repo_name}: no debouncer"
            EXCEPT Exception as e:
                LOG warning "Failed to trigger cidx-meta refresh for {repo_name}: {e}"

    on_repo_removed (modified flow, lines 211-221):
        # Same pattern as on_repo_added: catch DuplicateJobError, signal debouncer
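Because on_repo_added and on_repo_removed share the same fallback, the refresh step could plausibly live in one helper. The sketch below assumes that shape; `trigger_or_defer` and its string return values are hypothetical and exist only to make the branches easy to assert on, and `DuplicateJobError` is again a local stand-in.

```python
import logging

logger = logging.getLogger(__name__)


class DuplicateJobError(Exception):
    """Local stand-in for the exception defined in background_jobs.py."""


def trigger_or_defer(refresh_scheduler, debouncer, repo_name):
    """Try an immediate cidx-meta refresh; defer to the debouncer if one is running.

    Returns "triggered", "deferred", "skipped", or "failed" (hypothetical
    status values, not part of the existing hook API).
    """
    if refresh_scheduler is None:
        return "skipped"
    try:
        refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global")
        logger.info("Triggered cidx-meta refresh after adding %s", repo_name)
        return "triggered"
    except DuplicateJobError:
        # A refresh is already running: hand off to the debouncer.
        if debouncer is not None:
            debouncer.signal_dirty()
            logger.info("cidx-meta refresh deferred (debounced) for %s", repo_name)
            return "deferred"
        logger.warning("cidx-meta refresh skipped for %s: no debouncer", repo_name)
        return "skipped"
    except Exception as exc:
        logger.warning("Failed to trigger cidx-meta refresh for %s: %s", repo_name, exc)
        return "failed"
```

Catching `DuplicateJobError` before the generic `Exception` clause matters here: with the order reversed, the deferred path would never run.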
Acceptance Criteria
Scenario: Single repo registration triggers immediate refresh
Given the CIDX server is running with no active cidx-meta refresh jobs
When a single golden repository is registered via auto-discovery
Then on_repo_added creates the .md description file in cidx-meta
And trigger_refresh_for_repo succeeds immediately (no DuplicateJobError)
And the new repo description becomes searchable via cidx-meta query
Scenario: Batch registration coalesces into one deferred refresh
Given the CIDX server is running
And a cidx-meta refresh job is already running from the first registration
When 5 additional repositories are registered in rapid succession
Then each on_repo_added creates its .md description file on disk
And each DuplicateJobError is caught and signals the debouncer
And the debouncer resets its timer on each signal (coalescing)
And exactly one deferred refresh fires after the debounce interval expires
And all 6 repo descriptions (first + 5 deferred) are searchable after the deferred refresh completes
Scenario: Debounce timer resets on each new registration
Given the debouncer has been signaled and a timer is running
When another repository is registered before the timer expires
Then the timer is cancelled and restarted with the full debounce interval
And only one refresh fires after the final registration plus debounce interval
Scenario: Deferred refresh retries if still blocked
Given the debounce timer has expired and the debouncer attempts a refresh
When the refresh attempt also raises DuplicateJobError (previous refresh still running)
Then the debouncer re-marks itself dirty
And starts another debounce timer for a subsequent retry
And eventually succeeds when the running refresh completes
Scenario: Server shutdown cancels debounce timer cleanly
Given the debouncer has a pending timer
When the server is shutting down
Then the debounce timer is cancelled
And no refresh is attempted after shutdown begins
And no threads are left running
Testing Requirements
Unit Tests
- CidxMetaRefreshDebouncer.signal_dirty(): verify dirty flag set, timer started
- CidxMetaRefreshDebouncer.signal_dirty() called multiple times: verify timer resets (only one timer active)
- CidxMetaRefreshDebouncer._on_timer_expired(): verify dirty flag cleared, trigger_refresh_for_repo called exactly once
- CidxMetaRefreshDebouncer._on_timer_expired() with DuplicateJobError: verify re-marks dirty and schedules retry
- CidxMetaRefreshDebouncer.shutdown(): verify timer cancelled, no further signals processed
- on_repo_added() with DuplicateJobError: verify debouncer.signal_dirty() called instead of just warning
- on_repo_removed() with DuplicateJobError: verify debouncer.signal_dirty() called
- Thread safety: concurrent signal_dirty() calls do not corrupt state
Integration Tests
- Simulate batch of 5 on_repo_added() calls where first succeeds and rest get DuplicateJobError
- Verify exactly one deferred refresh is triggered after debounce interval
- Verify all .md files are present on disk after the batch
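One way to simulate "first succeeds, the rest get DuplicateJobError" without a real scheduler is `unittest.mock.Mock` with a `side_effect` list. This is only a sketch of the test setup: `refresh_step` is a hypothetical stand-in for the refresh part of on_repo_added(), and `DuplicateJobError` is redefined locally, so a real test would import both from the project instead.

```python
from unittest.mock import Mock


class DuplicateJobError(Exception):
    """Local stand-in for the exception defined in background_jobs.py."""


def refresh_step(refresh_scheduler, debouncer):
    """Hypothetical stand-in for the refresh part of on_repo_added()."""
    try:
        refresh_scheduler.trigger_refresh_for_repo("cidx-meta-global")
    except DuplicateJobError:
        debouncer.signal_dirty()


def simulate_batch(batch_size=5):
    """Run a batch in which only the first trigger is accepted."""
    scheduler = Mock()
    # First call returns None (accepted); every later call raises.
    scheduler.trigger_refresh_for_repo.side_effect = [None] + [
        DuplicateJobError("already running") for _ in range(batch_size - 1)
    ]
    debouncer = Mock()
    for _ in range(batch_size):
        refresh_step(scheduler, debouncer)
    return scheduler, debouncer
```

For a batch of 5, the mocked debouncer should record four `signal_dirty()` calls. Checking that those four signals collapse into exactly one deferred refresh still needs the real debouncer with a short debounce interval.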
E2E Manual Testing
- Start local CIDX server at localhost:8000
- Register multiple repos using auto-discovery or sequential add_golden_repo calls
- Observe logs: first refresh succeeds, subsequent ones are debounced (not just warned)
- Wait for debounce interval to expire
- Query cidx-meta for descriptions of all registered repos
- Verify all repos are discoverable
Key Files
| File | Relevance |
|---|---|
| src/code_indexer/global_repos/meta_description_hook.py | Primary change: add CidxMetaRefreshDebouncer, modify on_repo_added/on_repo_removed |
| src/code_indexer/server/repositories/background_jobs.py | Read-only: DuplicateJobError (lines 40-50), conflict detection (lines 164-189) |
| src/code_indexer/global_repos/refresh_scheduler.py | Read-only: trigger_refresh_for_repo (lines 592-631), _submit_refresh_job (lines 803-838) |
| tests/unit/global_repos/test_meta_description_hook.py | New/modified: unit tests for debouncer and integration |
Definition of Done
- All acceptance criteria satisfied
- >90% unit test coverage for new debouncer class and modified hook functions
- Integration tests passing (batch registration scenario)
- E2E manual testing completed by Claude Code (batch register + verify searchability)
- Code review approved (tdd-engineer + code-reviewer workflow)
- No lint/type errors (ruff, black, mypy clean)
- fast-automation.sh passes with zero failures
- No regressions to existing single-repo registration flow
- Working software deployable to staging/production