Description
Story: Signal-Based Server Restart via Auto-Updater
As a CIDX server administrator
I want to restart the server from the Diagnostics web UI via a signal file that the auto-updater monitors
So that server restarts work reliably without requiring the server process to have sudo/privilege escalation capabilities
Context
The current restart mechanism in routes.py:_delayed_restart() uses sudo systemctl restart cidx-server when running under systemd. This fails when the systemd service unit has NoNewPrivileges=true security hardening enabled, because the kernel blocks sudo from escalating privileges.
The fix leverages the existing auto-updater infrastructure: the server writes a restart signal file to ~/.cidx-server/restart.signal, and the auto-updater (which already has restart capabilities and runs as the same user) picks up the signal, deletes it immediately, and executes the restart from the outside.
This follows the established PENDING_REDEPLOY_MARKER pattern already used in deployment_executor.py for the same shared directory.
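The server-side change described above amounts to a branch on whether the process runs under systemd. A minimal sketch, assuming the signal write and the existing dev-mode restart are passed in as callables; the function name `delayed_restart`, its parameters, and its return values are hypothetical and chosen only to keep the example testable:

```python
import os
import time
from typing import Callable, Mapping


def delayed_restart(
    delay: float,
    write_signal: Callable[[], None],
    exec_dev_restart: Callable[[], None],
    environ: Mapping[str, str] = os.environ,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Illustrative stand-in for _delayed_restart(); not the real function."""
    # Give the HTTP response time to reach the client first.
    sleep(delay)

    # systemd sets INVOCATION_ID for the services it starts.
    if environ.get("INVOCATION_ID"):
        # No sudo here: the auto-updater performs the restart from outside.
        write_signal()
        return "signal_written"

    # Dev mode: keep the existing os.execv-based restart (injected here).
    exec_dev_restart()
    return "dev_restart"
```

Injecting `environ` and `sleep` is a design choice for this sketch only: it lets a unit test exercise both branches without real delays or a real process replacement.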
Implementation Status
- Signal file constant and format definition (~/.cidx-server/restart.signal, JSON with timestamp + reason)
- Server: _delayed_restart() writes signal file in systemd mode instead of sudo systemctl restart
- Auto-updater: restart signal detection in poll_once() (before existing redeploy marker check)
- Auto-updater: delete-then-restart execution (delete file immediately, then systemctl restart)
- Edge case: stale signal file cleanup (file exists but no restart needed, e.g., after crash)
- Remove sudoers dependency from server restart path (server no longer needs sudo)
- Unit tests (0/0 passing)
- Integration tests (0/0 passing)
- E2E manual testing on staging server (.20)
Completion: 0/9 tasks complete (0%)
Algorithm
Signal File Protocol:
    SIGNAL_PATH = Path.home() / ".cidx-server" / "restart.signal"
    SIGNAL_CONTENT = { "timestamp": ISO8601, "reason": string }
Server._delayed_restart(delay):
    SLEEP delay seconds (allow HTTP response to complete)
    IF running under systemd (INVOCATION_ID env var set):
        WRITE SIGNAL_PATH with JSON { timestamp, reason="diagnostics_restart" }
        LOG "Restart signal written, waiting for auto-updater"
        RESET _restart_in_progress flag
    ELSE:
        os.execv (existing dev mode logic, unchanged)
AutoUpdater.poll_once() - restart signal check (before existing redeploy marker check):
    STALENESS_THRESHOLD = 120 seconds (2x poll interval)
    IF file_exists(SIGNAL_PATH):
        READ signal file JSON content
        signal_age = now() - signal.timestamp
        IF signal_age > STALENESS_THRESHOLD:
            LOG WARNING "Stale restart signal detected (age: {signal_age}s), deleting without restart"
            DELETE SIGNAL_PATH
            RETURN (skip normal update check for this cycle, no restart)
        DELETE SIGNAL_PATH immediately (before any restart attempt)
        LOG "Restart signal detected, file deleted, executing restart"
        EXECUTE deployment_executor.restart_server()
        IF restart fails:
            LOG error (signal already deleted, no retry loop)
        RETURN (skip normal update check for this cycle)
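The auto-updater side of the algorithm can be sketched in Python as follows. This is an illustration under assumptions, not the actual poll_once() code: the function name `check_restart_signal`, its string return values, and the convention that `restart_server` returns a truthy value on success are all hypothetical. A malformed or unreadable file is treated as stale, which the story does not specify.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)

STALENESS_THRESHOLD_SECONDS = 120  # 2x the poll interval, per the story


def check_restart_signal(signal_path: Path, restart_server: Callable[[], bool]) -> str:
    """Return "none", "stale", "restarted", or "restart_failed" (hypothetical)."""
    if not signal_path.exists():
        return "none"

    age: Optional[float]
    try:
        signal = json.loads(signal_path.read_text(encoding="utf-8"))
        written_at = datetime.fromisoformat(signal["timestamp"])
        if written_at.tzinfo is None:
            # Assumption: naive timestamps are interpreted as UTC.
            written_at = written_at.replace(tzinfo=timezone.utc)
        age = (datetime.now(timezone.utc) - written_at).total_seconds()
    except (OSError, ValueError, KeyError, TypeError):
        age = None  # Assumption: unreadable or malformed counts as stale.

    # Delete before deciding anything else, so no code path can leave the
    # file behind and cause a restart loop.
    signal_path.unlink(missing_ok=True)

    if age is None or age > STALENESS_THRESHOLD_SECONDS:
        logger.warning(
            "Stale restart signal detected (age: %s s), deleting without restart", age
        )
        return "stale"

    logger.info("Restart signal detected, file deleted, executing restart")
    try:
        succeeded = bool(restart_server())
    except Exception:
        logger.exception("restart_server raised")
        succeeded = False

    if not succeeded:
        logger.error("Restart failed; signal already deleted, no retry")
        return "restart_failed"
    return "restarted"
```

The pseudocode deletes the file in two places (stale branch and restart branch); the sketch performs a single delete before branching, which gives the same ordering guarantee. A caller would skip the normal update check for the cycle whenever the result is anything other than "none".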
Key Design Decisions
- Delete-first: Prevents restart loops if systemctl restart also restarts the auto-updater, because the signal is already gone by the time the auto-updater polls again
- Stale file cleanup: A signal whose timestamp is older than STALENESS_THRESHOLD (120 seconds) is treated as left over from a crash or power loss. Delete it and log a warning; do NOT restart.
- No retry: If restart after signal pickup fails, log and move on. Admin can click again.
- Signal file is JSON: For debuggability (contains timestamp and reason)
- Reuses existing patterns: Same directory as PENDING_REDEPLOY_MARKER, same restart_server() call
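To illustrate the JSON signal format, here is a minimal sketch of a writer. The helper name `write_restart_signal` is hypothetical. Writing to a temporary file and renaming it into place is an addition not mentioned in the story; it is included on the assumption that the auto-updater should never observe a half-written file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Path taken from the story; the rest of this module is illustrative.
SIGNAL_PATH = Path.home() / ".cidx-server" / "restart.signal"


def write_restart_signal(
    path: Path = SIGNAL_PATH, reason: str = "diagnostics_restart"
) -> dict:
    """Write {"timestamp", "reason"} as JSON and return the payload."""
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
    }
    path.parent.mkdir(parents=True, exist_ok=True)

    # Temp file in the same directory, so os.replace stays on one filesystem
    # and the final rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return payload
```

A timezone-aware UTC timestamp keeps the age calculation on the reading side unambiguous, and the file stays readable with `cat` when debugging on the server.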
Acceptance Criteria
Scenario 1: Server writes restart signal file when restart requested
Given the CIDX server is running under systemd (INVOCATION_ID is set)
When an admin triggers a restart from the Diagnostics web UI
Then a signal file is created at ~/.cidx-server/restart.signal
And the file contains JSON with "timestamp" and "reason" fields
And the HTTP response returns success before the file is written
Scenario 2: Auto-updater detects signal and restarts server
Given the auto-updater polling loop is running
And a restart.signal file exists at ~/.cidx-server/
When the auto-updater executes its next poll cycle
Then the signal file is deleted immediately (before restart attempt)
And systemctl restart cidx-server is executed
Scenario 3: Signal file deleted even if restart fails
Given a restart.signal file exists at ~/.cidx-server/
When the auto-updater detects it and systemctl restart fails
Then the signal file is still deleted (no retry loop)
And the failure is logged
Scenario 4: Stale signal file cleaned up at startup
Given a restart.signal file exists from a previous crash/power-loss
When the auto-updater starts a new poll cycle
Then the stale signal file is deleted
And a warning is logged
And no restart is triggered (server is already starting fresh)
Scenario 5: Dev mode restart unchanged
Given the CIDX server is running in dev mode (no INVOCATION_ID)
When an admin triggers a restart from the Diagnostics web UI
Then the existing os.execv restart mechanism is used (no signal file)
Key Files
- src/code_indexer/server/web/routes.py - _delayed_restart() function (line ~8743)
- src/code_indexer/server/auto_update/service.py - AutoUpdateService.poll_once() method
- src/code_indexer/server/auto_update/deployment_executor.py - PENDING_REDEPLOY_MARKER constant, restart_server() method
- tests/unit/server/web/test_restart_endpoint.py - Existing restart tests
- tests/unit/server/auto_update/ - Existing auto-update tests
Testing Requirements
- Unit tests covering signal file write (JSON format, path, permissions)
- Unit tests covering signal file detection and delete-before-restart ordering
- Unit tests covering stale signal file cleanup
- Unit tests covering dev mode unchanged behavior
- Integration tests for signal file write/read/delete protocol
- E2E manual testing on staging server (192.168.60.20)
Manual Testing Strategy (Staging .20)
- Deploy to staging via development -> staging merge
- SSH to staging, verify updated code deployed
- Trigger restart from Diagnostics web UI
- Verify signal file appears: ls -la ~/.cidx-server/restart.signal
- Verify auto-updater picks it up within 60s: journalctl -u cidx-auto-update --since "1 min ago"
- Verify signal file deleted: ls -la ~/.cidx-server/restart.signal (should be gone)
- Verify server back online: systemctl status cidx-server
- Verify Diagnostics page loads after restart
Definition of Done
- All acceptance criteria satisfied
- 90% unit test coverage achieved
- Integration tests passing
- E2E tests with zero mocking passing
- Code review approved
- Manual end-to-end testing on staging completed
- No lint/type errors
- Working software deployable to production