
fix: k_timer_status_sync blocks on an embedded sem, not a poll loop (#28) #49

Merged: swoisz merged 3 commits into main from fix/k-timer-status-sync-blocking on Jun 8, 2026

@swoisz swoisz commented Jun 7, 2026

Collaborator

Summary

Rewrites k_timer_status_sync (both the esp_timer and linux backends) from a k_msleep(1) busy-poll to a true blocking wait on a binary struct k_sem embedded in struct k_timer. This was the original design intent. It is finally safe for two reasons: the corruption that killed the earlier attempt was root-caused to the pre-#18 k_thread zombies (#21), and those exact crash shapes are now green regression tests; and the semaphore itself is notification-backed (#41) with synchronous reference severance.

Wakes are now immediate instead of quantized to the FreeRTOS tick period, and the busy-poll divergence note on the declaration is removed.

Upstream-verified semantics (kernel/timer.c)

  • Single-waiter model — upstream's wait_q holds "the (single) thread waiting on this timer", and expiry/stop wake exactly one. A binary sem latch matches this exactly (not an approximation).
  • Woken by expiry OR stop; the wake fires after the expiry callback / stop function (upstream order).
  • status is re-read after the wake and reset to 0 on every return path; returns the count immediately if already non-zero, or 0 immediately if the timer is stopped (no block).
  • The sem is a wake latch, not a counter — the expiry count stays in timer->status. A give latched while no waiter was blocked would satisfy a later take early, so status_sync re-checks and re-blocks in a loop.
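The re-check-and-re-block loop described in the bullets above can be sketched as a single-threaded model. This is an illustration only, not the Boreas implementation: `struct sketch_timer`, every `sketch_*` function, and the `on_block` hook (which stands in for actually sleeping on the semaphore) are hypothetical names, and the binary sem is modelled as a plain atomic flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not the real struct k_timer. */
struct sketch_timer {
    atomic_uint status;  /* expiries since the last read (the real count) */
    atomic_bool running; /* true while the timer is armed */
    atomic_bool latch;   /* stand-in for the binary sem: only 0 or 1 */
    /* Stand-in for blocking: called when a take finds the latch empty and
     * expected to deliver the next expiry (or stop) before returning. */
    void (*on_block)(struct sketch_timer *t);
    unsigned blocks;     /* how often the waiter really had to block */
};

static void sketch_init(struct sketch_timer *t,
                        void (*on_block)(struct sketch_timer *))
{
    atomic_init(&t->status, 0u);
    atomic_init(&t->running, true);
    atomic_init(&t->latch, false);
    t->on_block = on_block;
    t->blocks = 0;
}

/* Expiry path: count first, then give. A second give still leaves 1. */
static void sketch_expire(struct sketch_timer *t)
{
    atomic_fetch_add_explicit(&t->status, 1u, memory_order_release);
    atomic_store_explicit(&t->latch, true, memory_order_release);
}

/* Non-blocking read-and-reset; it does not touch the latch. */
static unsigned sketch_status_get(struct sketch_timer *t)
{
    return atomic_exchange(&t->status, 0u);
}

static void sketch_latch_take(struct sketch_timer *t)
{
    if (atomic_exchange(&t->latch, false)) {
        return; /* latch was already set, possibly by a stale give */
    }
    t->blocks++;
    t->on_block(t); /* "sleep" until the next give */
    bool woke = atomic_exchange(&t->latch, false);
    assert(woke);
    (void)woke;
}

/* Loop: consume the count if present, return if stopped, else wait. */
static unsigned sketch_status_sync(struct sketch_timer *t)
{
    for (;;) {
        unsigned n = atomic_exchange(&t->status, 0u);
        if (n != 0u) {
            return n;
        }
        if (!atomic_load_explicit(&t->running, memory_order_acquire)) {
            return atomic_exchange(&t->status, 0u); /* stopped: no block */
        }
        sketch_latch_take(t); /* a stale latch just costs one extra pass */
    }
}

/* Test hook: the "next event" while blocked is one expiry. */
static void sketch_next_expiry(struct sketch_timer *t)
{
    sketch_expire(t);
}
```

In this model, an expiry followed by `sketch_status_get` leaves the latch set with a zero count. The next `sketch_status_sync` consumes that stale latch, finds nothing to report, and blocks once for the following expiry instead of returning early.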

k_sem_give is ISR-safe (IRAM), so the expiry-path give works under CONFIG_K_TIMER_DISPATCH_ISR=y, the ESP_TIMER_TASK fallback, and the linux dispatcher. K_TIMER_DEFINE gains a compile-time sem initializer; k_work_init_delayable inherits the sem init through k_timer_init. struct k_timer grows by sizeof(struct k_sem) (~24 B), including inside k_work_delayable.

k_timer_remaining_get needed no change — it already returns uint32_t milliseconds, matching upstream's k_ticks_to_ms_floor32 shape.

Review fan-out (boreas-review + adversarial tracer)

One real semantic gap, fixed. Between status_sync's status exchange and its running load, a one-shot expiry could complete entirely (status=1, running=false, give latched). The waiter then returned 0 where upstream's spinlocked read returns 1, and for a one-shot in a while (k_timer_status_sync()) drain loop that expiry was lost for good. The !running path now re-reads status instead of returning 0. The callback orders status++ (RELEASE) before running=false (RELEASE), so an ACQUIRE load that observes running==false is guaranteed to observe the increment.
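The window can be replayed deterministically with C11 atomics. The sketch below is a model under assumptions, not the Boreas code: `struct sketch_state`, `sketch_waiter`, the `in_window` hook that injects the expiry at the critical point, and the `SKETCH_WOULD_BLOCK` sentinel are all hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: the real code would block at this point. */
#define SKETCH_WOULD_BLOCK UINT_MAX

struct sketch_state {
    atomic_uint status;  /* expiry count */
    atomic_bool running; /* cleared when a one-shot has fired */
};

static void sketch_state_init(struct sketch_state *s)
{
    atomic_init(&s->status, 0u);
    atomic_init(&s->running, true);
}

/* One-shot expiry: publish the count BEFORE clearing running. */
static void sketch_oneshot_expire(struct sketch_state *s)
{
    atomic_fetch_add_explicit(&s->status, 1u, memory_order_release);
    atomic_store_explicit(&s->running, false, memory_order_release);
}

/* Waiter entry path. in_window (may be NULL) runs between the status
 * exchange and the running load, which is where the race lived.
 * reread selects the fixed behaviour; false reproduces the old bug. */
static unsigned sketch_waiter(struct sketch_state *s, bool reread,
                              void (*in_window)(struct sketch_state *))
{
    unsigned n = atomic_exchange_explicit(&s->status, 0u,
                                          memory_order_acq_rel);
    if (n != 0u) {
        return n;
    }
    if (in_window != NULL) {
        in_window(s); /* the whole expiry lands exactly here */
    }
    if (!atomic_load_explicit(&s->running, memory_order_acquire)) {
        /* The acquire load saw running==false, so the earlier
         * release-ordered increment is visible to a re-read. */
        return reread ? atomic_exchange_explicit(&s->status, 0u,
                                                 memory_order_acq_rel)
                      : 0u;
    }
    return SKETCH_WOULD_BLOCK;
}
```

With `reread` false the waiter reports 0 and the count of 1 stays stranded in `status`; with `reread` true it reports 1 and leaves `status` at 0.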

Adversarial probes refuted (no change needed):

  • Zeroed-sem gives before k_timer_init are structurally unreachable (the running guard + handle==NULL guard prevent every give/expiry path).
  • Stale latches cost at most one extra loop iteration — the binary sem caps at 1, no busy-spin.
  • SMP ordering is sound (RELEASE-before-give, portMUX barriers in k_sem, aligned 32-bit status).

Doc + hygiene:

  • @note on k_timer_start: a restart does not wake a blocked status_sync waiter (it waits for the restarted timer's first expiry — upstream parity).
  • @note on k_timer_init: re-init while a status_sync waiter is blocked is caller error (clobbers the embedded sem's waiter list).
  • Z_SEM_INITIALIZER single-sources the compile-time k_sem initializer for K_SEM_DEFINE and K_TIMER_DEFINE.
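The single-sourcing pattern behind the last bullet can be sketched as follows. The macro and struct names carry a `SKETCH_`/`sketch_` prefix because the field layout and macro parameters shown here are assumptions; the PR does not spell out the real `struct k_sem` fields or the actual `Z_SEM_INITIALIZER` signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical semaphore layout, for illustration only. */
struct sketch_sem {
    unsigned count;
    unsigned limit;
    void *waiter;
};

/* The one place that knows how to brace-initialize a sem. */
#define SKETCH_SEM_INITIALIZER(initial, max) \
    { .count = (initial), .limit = (max), .waiter = NULL }

/* Standalone sem definition reuses the initializer body. */
#define SKETCH_SEM_DEFINE(name, initial, max) \
    struct sketch_sem name = SKETCH_SEM_INITIALIZER(initial, max)

typedef void (*sketch_timer_fn)(void *timer);

/* Hypothetical timer layout with the embedded wake latch. */
struct sketch_timer_obj {
    sketch_timer_fn expiry_fn;
    sketch_timer_fn stop_fn;
    unsigned status;
    struct sketch_sem sync_sem;
};

/* Timer definition reuses the same body: binary sem, initially empty. */
#define SKETCH_TIMER_DEFINE(name, expiry, stop)        \
    struct sketch_timer_obj name = {                   \
        .expiry_fn = (expiry),                         \
        .stop_fn = (stop),                             \
        .status = 0,                                   \
        .sync_sem = SKETCH_SEM_INITIALIZER(0, 1),      \
    }

/* Both objects are fully initialized at compile time. */
SKETCH_SEM_DEFINE(demo_sem, 2, 5);
SKETCH_TIMER_DEFINE(demo_timer, NULL, NULL);
```

Because both definition macros expand the same initializer, a later change to the sem layout only has to be made in one macro body.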

Test plan

  • linux: 206/0 ×3 (204 + 2 new tests)
  • ESP32-S3: 235/0 on hardware (233 + 2), CONFIG_K_TIMER_DISPATCH_ISR=y verified in the regenerated sdkconfig
  • New tests: test_timer_status_sync_after_status_get_blocks (stale-latch regression, made deterministic — one-shot + restart with t0 captured before the restart, so the elapsed bound holds under arbitrary host stall), test_timer_status_sync_stopped_returns_immediately
  • clang-format 21.1.8 clean
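Why capturing t0 before the restart makes the elapsed lower bound stall-proof can be shown with a fake clock. This is arithmetic on a simulated timeline, not the actual test code; `sketch_elapsed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the elapsed time a test would measure around a blocking wait
 * on a one-shot of period_ms, when the host stalls for stall_ms between
 * the "capture t0" statement and the "restart timer" statement. */
static int64_t sketch_elapsed(int64_t period_ms, int64_t stall_ms,
                              bool t0_before_restart)
{
    int64_t now = 1000; /* arbitrary fake uptime in ms */
    int64_t t0;
    int64_t deadline;

    if (t0_before_restart) {
        t0 = now;                   /* capture first */
        now += stall_ms;            /* host stall */
        deadline = now + period_ms; /* then restart */
    } else {
        deadline = now + period_ms; /* restart first */
        now += stall_ms;            /* host stall */
        t0 = now;                   /* capture late */
    }

    /* The wait returns at the deadline, or at once if it already passed. */
    int64_t wake = (deadline > now) ? deadline : now;
    return wake - t0;
}
```

With a 50 ms period and a 30 ms stall, the capture-first order measures 80 ms, which still satisfies an "at least one period" assertion. The capture-late order measures only 20 ms, so the same assertion would fail even though the wait blocked correctly.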

Closes #28

🤖 Generated with Claude Code

swoisz and others added 2 commits June 7, 2026 12:11
Rewrites k_timer_status_sync (both backends) from the k_msleep(1)
busy-poll to a true blocking wait on a binary k_sem embedded in
struct k_timer -- the original PR-4 design, now safe: the April
crash shapes that killed it were root-caused to the pre-#18 k_thread
zombies (#21) and are green regression tests, and the sem itself is
notification-backed (#41) with synchronous severance.

Upstream-verified semantics (kernel/timer.c):
- single-waiter model (upstream's wait_q holds "the (single) thread
  waiting on this timer"; expiry/stop wake exactly one) -> a binary
  sem latch matches exactly
- woken by expiry OR stop; status re-read after wake; status reset
  to 0 on every path; returns 0 immediately when stopped
- the wake fires AFTER the expiry callback / stop_fn (upstream order)

The sem is a wake LATCH, not a counter -- the expiry count stays in
timer->status. A give latched while nobody waited would satisfy a
later take early, so status_sync re-checks and re-blocks in a loop;
pinned by the new test_timer_status_sync_after_status_get_blocks
regression. Wakes are now immediate instead of quantized to the
FreeRTOS tick; the busy-poll divergence note on the declaration is
replaced.

k_sem_give is ISR-safe (IRAM), so the expiry-path give works under
CONFIG_K_TIMER_DISPATCH_ISR=y, the ESP_TIMER_TASK fallback, and the
linux dispatcher. K_TIMER_DEFINE gains the compile-time sem
initializer; k_work_init_delayable inherits the sem init via
k_timer_init. struct k_timer grows by sizeof(struct k_sem) (~24 B),
including inside k_work_delayable.

k_timer_remaining_get needed no change: it already returns uint32_t
milliseconds like upstream's k_ticks_to_ms_floor32 shape.

linux suite: 206/0 x3 (204 + 2).

Closes #28

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Review fan-out findings folded:

- One-shot exchange-vs-running window closed: between status_sync's
  status exchange and its running load, a one-shot expiry could
  complete entirely (status=1, running=false, give latched) and the
  waiter returned 0 where upstream's spinlocked read returns 1 --
  terminally lost for a one-shot in a while(status_sync()) drain
  loop. The !running path now re-reads status instead of returning
  0; the callback orders status++ (RELEASE) before running=false
  (RELEASE), so an ACQUIRE load of running==false is guaranteed to
  observe the increment.
- test_timer_status_sync_after_status_get_blocks made deterministic:
  one-shot expiries only and t0 captured BEFORE the restart, so the
  elapsed lower bound holds under arbitrary host stall (the periodic
  re-arm racing the entry was CI-flaky both directions).
- @note on k_timer_start: restart does not wake a blocked
  status_sync waiter (upstream parity).
- @note on k_timer_init: re-init while a status_sync waiter is
  blocked is caller error (clobbers the embedded sem's waiter list).
- Z_SEM_INITIALIZER single-sources the compile-time k_sem
  initializer body for K_SEM_DEFINE and K_TIMER_DEFINE.

Adversarial review refuted (no change needed): zeroed-sem gives are
structurally unreachable before k_timer_init (running-guard +
handle-guard); stale latches cost at most one extra loop iteration
(binary sem caps at 1, no busy-spin); SMP ordering is sound
(RELEASE-before-give, portMUX barriers, aligned 32-bit status).

linux suite: 206/0 x3.

Refs #28

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR rewrites k_timer_status_sync() to block on an embedded binary k_sem latch (instead of k_msleep(1) polling), aligning Boreas’ timer wait behavior more closely with upstream Zephyr semantics while improving wake latency and avoiding tick-quantized busy polling.

Changes:

  • Implement sem-backed k_timer_status_sync() in both ESP and Linux timer backends and wake waiters on expiry/stop.
  • Extend struct k_timer with an embedded sync_sem and add Z_SEM_INITIALIZER to support compile-time initialization (including K_TIMER_DEFINE).
  • Add regression tests covering stale-latch behavior and “stopped timer returns immediately” behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/main/test_k_timer.c | Adds regression tests for stale latch handling and immediate return when stopped. |
| components/zkernel/src/k_timer.c | ESP backend: gives sync_sem on expiry/stop and replaces poll loop with sem-backed wait loop. |
| components/zkernel/src/k_timer_linux.c | Linux backend: mirrors sem-backed status_sync and waiter wakes on expiry/stop. |
| components/zkernel/include/boreas/zephyr/kernel.h | Adds sync_sem to k_timer, introduces Z_SEM_INITIALIZER, updates docs for k_timer_status_sync. |


Comment thread components/zkernel/include/boreas/zephyr/kernel.h Outdated
Upstream unpends aborted threads from the timer wait queue; Boreas
cannot, so this restriction is a divergence, not parity -- matching
the wording already on k_sem_take.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@swoisz swoisz merged commit 0d1045f into main Jun 8, 2026
5 checks passed
@swoisz swoisz deleted the fix/k-timer-status-sync-blocking branch June 8, 2026 01:41
Development

Successfully merging this pull request may close these issues.

k_timer: remaining semantic divergences from upstream (remaining_get type, status_sync busy-poll)

2 participants