Skip to content

Ubuntu CI jobs hang for the full 6-hour timeout window #167

@thedavidmeister

Description

@thedavidmeister

Ubuntu runners on this repo pick up jobs and go silent — they never report progress, never complete, and the cancel API has no effect. The 6h job timeout is the only thing that clears them.

Observed pattern (2026-05-13)

Run 25785724500 (Rainix CI, branch 2026-05-12-bump-nixpkgs-and-add-cargo-expand, started 07:47:14):

Job OS Status
Cargo test job macos-latest success @ 07:59:08 (12 min)
Cargo build release macos-latest success @ 07:59:16 (12 min)
Slither static analysis ubuntu-latest in_progress, no update since 07:47:23
Cargo build release ubuntu-latest in_progress, no update since 07:47:17
Forge fmt check ubuntu-latest in_progress, no update since 07:47:18
Rust static checks ubuntu-latest in_progress, no update since 07:47:23
Cargo test ubuntu-latest in_progress, no update since 07:47:18
Reuse lint ubuntu-latest in_progress, no update since 07:47:17
Solidity artifacts ubuntu-latest in_progress, no update since 07:47:23
Solidity verbose test ubuntu-latest in_progress, no update since 07:47:22

Run 25785724502 (Rainix CI check shell, same branch, same time):

Job OS Status
rainix-check-shell macos-latest success @ 07:59:40 (12 min)
rainix-check-shell ubuntu-latest in_progress, no update since 07:47:17

macOS jobs from the same matrix complete in 12 min; the Ubuntu jobs run until they hit the 6h timeout.

A second hang on the same branch started 12:36 (runs 25799505814, 25799505847) — same pattern.

gh api -X POST repos/rainlanguage/rainix/actions/runs/<id>/cancel was accepted (HTTP 202, empty body) but the runs stayed in_progress. The cancel signal needs an alive runner to honor it.

What to investigate

  1. Are these jobs landing on a specific self-hosted Ubuntu runner that's wedged? Check the runner registration list and recent runner heartbeats.
  2. If self-hosted: the runner agent is alive enough to claim jobs but not run them — restart or replace the host.
  3. If GitHub-hosted: file with GitHub support; this is not a configuration issue.
  4. The Rainix CI workflow has no timeout-minutes per job. Add one (e.g., 30 min) so a wedged job clears quickly instead of holding a slot for 6 hours.
  5. The workflow has no concurrency group on the PR branch. A re-push spawns a parallel hang instead of cancelling the previous one. Adding concurrency: group: rainix-${{ github.ref }}, cancel-in-progress: true would prevent stacking.

Metadata

Metadata

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions