The monitor service doesn't always recover from failures automatically. #112

@mukilan

ci0.servo.org and ci1.servo.org were returning 503 today. The monitor service on both of these hosts was in a failed state.

Here is the log from ci1:

[root@ci1:~]# journalctl -fu monitor
Feb 09 18:52:20 ci1 systemd[1]: Started monitor.service.
Feb 09 18:52:21 ci1 monitor-start[43540]: 2026-02-09T18:52:21.730729Z  INFO cmd_lib::child: gh: Requires authentication (HTTP 401)
Feb 09 18:52:21 ci1 monitor-start[43540]: 2026-02-09T18:52:21.863188Z ERROR monitor: Monitor thread error report=Running ["gh" "api" "-H" "Accept: application/vnd.github+json" "-H" "X-GitHub-Api-Version: 2022-11-28" "/orgs/servo/actions/runners" "--paginate" "-q" ".runners[]" | "jq" "-s" "."] exited with error; status code: 1 at src/github.rs:111
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Main process exited, code=exited, status=1/FAILURE
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Failed with result 'exit-code'.
Feb 09 18:52:21 ci1 systemd[1]: monitor.service: Consumed 91ms CPU time, 25.7M memory peak, 4.7K incoming IP traffic, 1.3K outgoing IP traffic.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Scheduled restart job, restart counter is at 67.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Start request repeated too quickly.
Feb 09 18:52:22 ci1 systemd[1]: monitor.service: Failed with result 'exit-code'.
Feb 09 18:52:22 ci1 systemd[1]: Failed to start monitor.service.

I restarted both services manually, which fixed the issue. However, it may be worth tweaking the service's systemd configuration so that restarts are spaced far enough apart not to exhaust systemd's default start rate limit, or disabling the limit altogether.
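
As a rough sketch of what such a tweak could look like, the fragment below uses standard systemd directives (`StartLimitIntervalSec=`, `Restart=`, `RestartSec=`). The drop-in file name and all values are assumptions for illustration, not the project's actual settings, and how the fragment is applied depends on how these hosts are provisioned.

```ini
# Hypothetical drop-in for monitor.service (e.g. monitor.service.d/override.conf).
# All values are illustrative assumptions, not the project's actual settings.

[Unit]
# Disable start rate limiting so systemd never gives up with
# "Start request repeated too quickly".
StartLimitIntervalSec=0

[Service]
# Restart when the main process exits with an error, as in the log above.
# The existing unit presumably already sets some Restart= policy.
Restart=on-failure
# Wait between restart attempts so a transient failure (such as the
# HTTP 401 from gh) is not retried in a tight loop.
RestartSec=30
```

If the stock defaults apply (5 starts within 10 seconds), either setting should be enough on its own: with `RestartSec=30` the service cannot start five times in ten seconds, and with `StartLimitIntervalSec=0` the limit is not enforced at all. Keeping a delay is probably still worthwhile so that a persistent authentication failure does not hammer the GitHub API.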
