[STORY] Signal-Based Server Restart via Auto-Updater #355

# Story: Signal-Based Server Restart via Auto-Updater

**As a** CIDX server administrator
**I want to** restart the server from the Diagnostics web UI via a signal file that the auto-updater monitors
**So that** server restarts work reliably without requiring the server process to have sudo/privilege escalation capabilities

---

## Context

The current restart mechanism in `routes.py:_delayed_restart()` runs `sudo systemctl restart cidx-server` when the server is running under systemd. This fails when the systemd service unit has `NoNewPrivileges=true` security hardening enabled: with that flag set, the kernel ignores the setuid bit on `sudo`, so the server process cannot escalate privileges.
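
To confirm on a given host that the hardening flag is what blocks `sudo`, the kernel exposes it per process as the `NoNewPrivs` field in `/proc/<pid>/status` (Linux 4.10 and later). The following is a minimal diagnostic sketch, not part of the proposed change; both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_no_new_privs(status_text: str) -> Optional[bool]:
    """Return the NoNewPrivs flag from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "NoNewPrivs":
            return value.strip() == "1"
    return None


def current_process_no_new_privs() -> Optional[bool]:
    """Check the calling process; returns None where /proc is unavailable."""
    try:
        return parse_no_new_privs(Path("/proc/self/status").read_text())
    except OSError:
        return None
```

Calling `current_process_no_new_privs()` from inside the server process should return `True` when the unit runs with `NoNewPrivileges=true`.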

The fix leverages the existing auto-updater infrastructure: the server writes a restart signal file to `~/.cidx-server/restart.signal`, and the auto-updater (which already has restart capabilities and runs as the same user) picks up the signal, deletes it immediately, and executes the restart from the outside.

This follows the established `PENDING_REDEPLOY_MARKER` pattern that `deployment_executor.py` already uses in the same shared directory.

---

## Implementation Status

- [ ] Signal file constant and format definition (`~/.cidx-server/restart.signal`, JSON with timestamp + reason)
- [ ] Server: `_delayed_restart()` writes signal file in systemd mode instead of `sudo systemctl restart`
- [ ] Auto-updater: restart signal detection in `poll_once()` (before existing redeploy marker check)
- [ ] Auto-updater: delete-then-restart execution (delete file immediately, then `systemctl restart`)
- [ ] Edge case: stale signal file cleanup (file exists but no restart needed, e.g., after crash)
- [ ] Remove sudoers dependency from server restart path (server no longer needs `sudo`)
- [ ] Unit tests (0/0 passing)
- [ ] Integration tests (0/0 passing)
- [ ] E2E manual testing on staging server (.20)

**Completion:** 0/9 tasks complete (0%)

---

## Algorithm

```
Signal File Protocol:
  SIGNAL_PATH = Path.home() / ".cidx-server" / "restart.signal"
  SIGNAL_CONTENT = { "timestamp": ISO8601, "reason": string }

Server._delayed_restart(delay):
  SLEEP delay seconds (allow HTTP response to complete)
  IF running under systemd (INVOCATION_ID env var set):
    WRITE SIGNAL_PATH with JSON { timestamp, reason="diagnostics_restart" }
    LOG "Restart signal written, waiting for auto-updater"
    RESET _restart_in_progress flag
  ELSE:
    os.execv (existing dev mode logic, unchanged)

AutoUpdater.poll_once() - restart signal check (before existing redeploy marker check):
  STALENESS_THRESHOLD = 120 seconds (2x poll interval)

  IF file_exists(SIGNAL_PATH):
    READ signal file JSON content
    signal_age = now() - signal.timestamp

    IF signal_age > STALENESS_THRESHOLD:
      LOG WARNING "Stale restart signal detected (age: {signal_age}s), deleting without restart"
      DELETE SIGNAL_PATH
      RETURN (skip normal update check for this cycle, no restart)

    DELETE SIGNAL_PATH immediately (before any restart attempt)
    LOG "Restart signal detected, file deleted, executing restart"
    EXECUTE deployment_executor.restart_server()
    IF restart fails:
      LOG error (signal already deleted, no retry loop)
    RETURN (skip normal update check for this cycle)
```
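
The server side of the protocol could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual `routes.py` code: the helper names `running_under_systemd` and `write_restart_signal` are hypothetical, and the write-to-temp-then-rename step is an addition not required by the story, included so the auto-updater never reads a half-written file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Path taken from the story; the helper names below are hypothetical.
SIGNAL_PATH = Path.home() / ".cidx-server" / "restart.signal"


def running_under_systemd() -> bool:
    """systemd sets INVOCATION_ID for the services it starts."""
    return bool(os.environ.get("INVOCATION_ID"))


def write_restart_signal(reason: str = "diagnostics_restart",
                         path: Path = SIGNAL_PATH) -> Path:
    """Write the restart signal as JSON, atomically replacing any existing file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
    }
    # Assumption: write to a temp file in the same directory, then rename,
    # so a concurrent reader sees either no file or a complete one.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".restart.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path
```

In `_delayed_restart()`, the systemd branch would call `write_restart_signal()` after the delay, while the dev-mode branch keeps the existing `os.execv` path.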

### Key Design Decisions

- **Delete-first**: The signal file is removed before the restart is attempted, which prevents restart loops if `systemctl restart` also restarts the auto-updater.
- **Stale file cleanup**: A signal whose timestamp is older than `STALENESS_THRESHOLD` (120 seconds, 2x the poll interval) is treated as left over from a crash or power loss. Delete it and log a warning; do NOT restart.
- **No retry**: If restart after signal pickup fails, log and move on. Admin can click again.
- **Signal file is JSON**: For debuggability (contains timestamp and reason)
- **Reuses existing patterns**: Same directory as PENDING_REDEPLOY_MARKER, same restart_server() call
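
The decisions above can be combined into one auto-updater check, sketched here. The function name, the return labels, and the handling of unparseable JSON (treated like a stale signal) are assumptions for illustration; the story does not specify them. The sketch also assumes `restart_server` is a callable that returns `True` on success, which may not match the real `deployment_executor.restart_server()` signature.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)

STALENESS_THRESHOLD_SECONDS = 120  # 2x the poll interval, per the story


def handle_restart_signal(signal_path: Path,
                          restart_server: Callable[[], bool]) -> str:
    """Return "none", "stale", "restarted", or "failed" (hypothetical labels)."""
    try:
        raw = signal_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return "none"

    # Delete first: whatever happens next, this signal is consumed only once.
    signal_path.unlink(missing_ok=True)

    try:
        written_at = datetime.fromisoformat(json.loads(raw)["timestamp"])
        if written_at.tzinfo is None:
            written_at = written_at.replace(tzinfo=timezone.utc)
        age = (datetime.now(timezone.utc) - written_at).total_seconds()
    except (ValueError, KeyError, TypeError):
        # Assumption: an unreadable signal is handled like a stale one.
        logger.warning("Unparseable restart signal, deleted without restart")
        return "stale"

    if age > STALENESS_THRESHOLD_SECONDS:
        logger.warning("Stale restart signal detected (age: %.0fs), deleted without restart", age)
        return "stale"

    logger.info("Restart signal detected, file deleted, executing restart")
    try:
        ok = restart_server()
    except Exception:
        logger.exception("Restart raised after signal pickup")
        return "failed"
    if not ok:
        logger.error("Restart failed after signal pickup; signal already deleted, no retry")
        return "failed"
    return "restarted"
```

Unlike the pseudocode, this sketch deletes the file before checking its age; the observable outcome is the same, and there is a single deletion point. Any result other than `"none"` would cause `poll_once()` to skip the normal update check for that cycle.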

---

## Acceptance Criteria

```gherkin
Scenario 1: Server writes restart signal file when restart requested
  Given the CIDX server is running under systemd (INVOCATION_ID is set)
  When an admin triggers a restart from the Diagnostics web UI
  Then a signal file is created at ~/.cidx-server/restart.signal
  And the file contains JSON with "timestamp" and "reason" fields
  And the HTTP response returns success before the file is written

Scenario 2: Auto-updater detects signal and restarts server
  Given the auto-updater polling loop is running
  And a restart.signal file exists at ~/.cidx-server/
  When the auto-updater executes its next poll cycle
  Then the signal file is deleted immediately (before restart attempt)
  And systemctl restart cidx-server is executed

Scenario 3: Signal file deleted even if restart fails
  Given a restart.signal file exists at ~/.cidx-server/
  When the auto-updater detects it and systemctl restart fails
  Then the signal file is still deleted (no retry loop)
  And the failure is logged

Scenario 4: Stale signal file cleaned up without restart
  Given a restart.signal file exists from a previous crash/power-loss
  And its timestamp is older than the 120-second staleness threshold
  When the auto-updater starts a new poll cycle
  Then the stale signal file is deleted
  And a warning is logged
  And no restart is triggered (server is already starting fresh)

Scenario 5: Dev mode restart unchanged
  Given the CIDX server is running in dev mode (no INVOCATION_ID)
  When an admin triggers a restart from the Diagnostics web UI
  Then the existing os.execv restart mechanism is used (no signal file)
```

---

## Key Files

- `src/code_indexer/server/web/routes.py` - `_delayed_restart()` function (line ~8743)
- `src/code_indexer/server/auto_update/service.py` - `AutoUpdateService.poll_once()` method
- `src/code_indexer/server/auto_update/deployment_executor.py` - `PENDING_REDEPLOY_MARKER` constant, `restart_server()` method
- `tests/unit/server/web/test_restart_endpoint.py` - Existing restart tests
- `tests/unit/server/auto_update/` - Existing auto-update tests

---

## Testing Requirements

- Unit tests covering signal file write (JSON format, path, permissions)
- Unit tests covering signal file detection and delete-before-restart ordering
- Unit tests covering stale signal file cleanup
- Unit tests covering dev mode unchanged behavior
- Integration tests for signal file write/read/delete protocol
- E2E manual testing on staging server (192.168.60.20)
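
One way to assert the delete-before-restart ordering is to record, from inside a mocked restart call, whether the signal file still exists. The pytest-style sketch below uses a tiny stand-in, `consume_signal`, in place of the real `poll_once()` logic; that name and the test name are hypothetical, and a real test would call the auto-updater code instead.

```python
from pathlib import Path
from unittest.mock import Mock


def consume_signal(signal_path: Path, restart) -> None:
    """Stand-in for the code under test: delete the signal, then restart."""
    if signal_path.exists():
        signal_path.unlink()
        restart()


def test_signal_deleted_before_restart(tmp_path: Path) -> None:
    signal_path = tmp_path / "restart.signal"
    signal_path.write_text('{"timestamp": "2024-01-01T00:00:00+00:00", "reason": "test"}')

    # Record whether the file still exists at the moment restart is invoked.
    seen_at_restart = []
    restart = Mock(side_effect=lambda: seen_at_restart.append(signal_path.exists()))

    consume_signal(signal_path, restart)

    restart.assert_called_once_with()
    assert seen_at_restart == [False]
```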

### Manual Testing Strategy (Staging .20)

1. Deploy to staging via development -> staging merge
2. SSH to staging, verify updated code deployed
3. Trigger restart from Diagnostics web UI
4. Verify signal file appears: `ls -la ~/.cidx-server/restart.signal`
5. Verify auto-updater picks it up within 60s: `journalctl -u cidx-auto-update --since "1 min ago"`
6. Verify signal file deleted: `ls -la ~/.cidx-server/restart.signal` (should be gone)
7. Verify server back online: `systemctl status cidx-server`
8. Verify Diagnostics page loads after restart

---

## Definition of Done

- All acceptance criteria satisfied
- >90% unit test coverage achieved
- Integration tests passing
- E2E tests with zero mocking passing
- Code review approved
- Manual end-to-end testing on staging completed
- No lint/type errors
- Working software deployable to production

