The `monitor` service doesn't always recover from failures automatically.

ci0.servo.org and ci1.servo.org were returning HTTP 503 today. The `monitor` service on both hosts was in a failed state.

Here is the log from ci1:

```
[root@ci1:~]# journalctl -fu monitor
Feb 09 18:52:20 ci1 systemd[1]: Started monitor.service.
Feb 09 18:52:21 ci1 monitor-start[43540]: 2026-02-09T18:52:21.730729Z  INFO cmd_lib::child: gh: Requires authentication (HTTP 401)
Feb 09 18:52:21 ci1 monitor-start[43540]: 2026-02-09T18:52:21.863188Z ERROR monitor: Monitor thread error report=Running ["gh" "api" "-H" "Accept: application/vnd.github+json" "-H" "X-GitHub-Api-Version: 2022-11-28" "/orgs/servo/actions/runners" "--paginate" "-q" ".runners[]" | "jq" "-s" "."] exited with error; status code: 1 at src/github.rs:111
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Main process exited, code=exited, status=1/FAILURE
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Failed with result 'exit-code'.
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Consumed 91ms CPU time, 25.7M memory peak, 4.7K incoming IP traffic, 1.3K outgoing IP traffic.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Scheduled restart job, restart counter is at 67.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Start request repeated too quickly.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Failed with result 'exit-code'.
Feb 09 18:52:22 ci1 systemd[1]: Failed to start monitor.service.
```

I restarted both services manually, which fixed the issue. It might be worth tweaking the service's systemd configuration so that it doesn't restart quickly enough to exhaust systemd's default start rate limit (the "Start request repeated too quickly" line above), or even disabling that limit altogether.
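For example, a drop-in along these lines might work. It's only a sketch: the file path is hypothetical, the delay values are guesses rather than tuned numbers, and `RestartSteps=`/`RestartMaxDelaySec=` need systemd 254 or newer. If the unit is generated by configuration management rather than written by hand, the same directives would need to be set there instead.

```ini
# Hypothetical drop-in: /etc/systemd/system/monitor.service.d/restart.conf
[Unit]
# 0 disables the start rate limit entirely, so systemd never gives up with
# "Start request repeated too quickly".
StartLimitIntervalSec=0

[Service]
# Wait 5s before restarting (systemd's default RestartSec is 100ms).
RestartSec=5s
# Back off gradually from RestartSec up to 5 minutes between attempts
# (only takes effect on systemd >= 254, and only when RestartSteps is set).
RestartSteps=10
RestartMaxDelaySec=5min
```

Either half may be enough on its own: with a 5 s delay, systemd's default limit of 5 starts per 10 s can't be reached, while `StartLimitIntervalSec=0` alone would keep retrying indefinitely. After `systemctl daemon-reload`, the effective values can be checked with `systemctl show monitor -p StartLimitIntervalUSec -p RestartUSec`.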

The `monitor` service doesn't always recover from failures automatically. #112

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The monitor service doesn't always recover from failures automatically. #112

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

The `monitor` service doesn't always recover from failures automatically. #112