fix: k_timer_status_sync blocks on an embedded sem, not a poll loop (#28)#49
Merged
Conversation
Rewrites k_timer_status_sync (both backends) from the k_msleep(1) busy-poll to a true blocking wait on a binary k_sem embedded in struct k_timer -- the original PR-4 design, now safe: the April crash shapes that killed it were root-caused to the pre-#18 k_thread zombies (#21) and are green regression tests, and the sem itself is notification-backed (#41) with synchronous severance. Upstream-verified semantics (kernel/timer.c): - single-waiter model (upstream's wait_q holds "the (single) thread waiting on this timer"; expiry/stop wake exactly one) -> a binary sem latch matches exactly - woken by expiry OR stop; status re-read after wake; status reset to 0 on every path; returns 0 immediately when stopped - the wake fires AFTER the expiry callback / stop_fn (upstream order) The sem is a wake LATCH, not a counter -- the expiry count stays in timer->status. A give latched while nobody waited would satisfy a later take early, so status_sync re-checks and re-blocks in a loop; pinned by the new test_timer_status_sync_after_status_get_blocks regression. Wakes are now immediate instead of quantized to the FreeRTOS tick; the busy-poll divergence note on the declaration is replaced. k_sem_give is ISR-safe (IRAM), so the expiry-path give works under CONFIG_K_TIMER_DISPATCH_ISR=y, the ESP_TIMER_TASK fallback, and the linux dispatcher. K_TIMER_DEFINE gains the compile-time sem initializer; k_work_init_delayable inherits the sem init via k_timer_init. struct k_timer grows by sizeof(struct k_sem) (~24 B), including inside k_work_delayable. k_timer_remaining_get needed no change: it already returns uint32_t milliseconds like upstream's k_ticks_to_ms_floor32 shape. linux suite: 206/0 x3 (204 + 2). Closes #28 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fan-out findings folded: - One-shot exchange-vs-running window closed: between status_sync's status exchange and its running load, a one-shot expiry could complete entirely (status=1, running=false, give latched) and the waiter returned 0 where upstream's spinlocked read returns 1 -- terminally lost for a one-shot in a while(status_sync()) drain loop. The !running path now re-reads status instead of returning 0; the callback orders status++ (RELEASE) before running=false (RELEASE), so an ACQUIRE load of running==false is guaranteed to observe the increment. - test_timer_status_sync_after_status_get_blocks made deterministic: one-shot expiries only and t0 captured BEFORE the restart, so the elapsed lower bound holds under arbitrary host stall (the periodic re-arm racing the entry was CI-flaky both directions). - @note on k_timer_start: restart does not wake a blocked status_sync waiter (upstream parity). - @note on k_timer_init: re-init while a status_sync waiter is blocked is caller error (clobbers the embedded sem's waiter list). - Z_SEM_INITIALIZER single-sources the compile-time k_sem initializer body for K_SEM_DEFINE and K_TIMER_DEFINE. Adversarial review refuted (no change needed): zeroed-sem gives are structurally unreachable before k_timer_init (running-guard + handle-guard); stale latches cost at most one extra loop iteration (binary sem caps at 1, no busy-spin); SMP ordering is sound (RELEASE-before-give, portMUX barriers, aligned 32-bit status). linux suite: 206/0 x3. Refs #28 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR rewrites k_timer_status_sync() to block on an embedded binary k_sem latch (instead of k_msleep(1) polling), aligning Boreas’ timer wait behavior more closely with upstream Zephyr semantics while improving wake latency and avoiding tick-quantized busy polling.
Changes:
- Implement sem-backed
k_timer_status_sync()in both ESP and Linux timer backends and wake waiters on expiry/stop. - Extend
struct k_timerwith an embeddedsync_semand addZ_SEM_INITIALIZERto support compile-time initialization (includingK_TIMER_DEFINE). - Add regression tests covering stale-latch behavior and “stopped timer returns immediately” behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
test/main/test_k_timer.c |
Adds regression tests for stale latch handling and immediate return when stopped. |
components/zkernel/src/k_timer.c |
ESP backend: gives sync_sem on expiry/stop and replaces poll loop with sem-backed wait loop. |
components/zkernel/src/k_timer_linux.c |
Linux backend: mirrors sem-backed status_sync and waiter wakes on expiry/stop. |
components/zkernel/include/boreas/zephyr/kernel.h |
Adds sync_sem to k_timer, introduces Z_SEM_INITIALIZER, updates docs for k_timer_status_sync. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Upstream unpends aborted threads from the timer wait queue; Boreas cannot, so this restriction is a divergence, not parity -- matching the wording already on k_sem_take. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rewrites
k_timer_status_sync(both the esp_timer and linux backends) from ak_msleep(1)busy-poll to a true blocking wait on a binarystruct k_semembedded instruct k_timer. This was the original design intent; it is finally safe because the corruption that killed the earlier attempt was root-caused to the pre-#18 k_thread zombies (#21) — those exact crash shapes are now green regression tests — and the semaphore itself is notification-backed (#41) with synchronous reference severance.Wakes are now immediate instead of quantized to the FreeRTOS tick period, and the busy-poll divergence note on the declaration is removed.
Upstream-verified semantics (kernel/timer.c)
wait_qholds "the (single) thread waiting on this timer", and expiry/stop wake exactly one. A binary sem latch matches this exactly (not an approximation).statusis re-read after the wake and reset to 0 on every return path; returns the count immediately if already non-zero, or 0 immediately if the timer is stopped (no block).timer->status. A give latched while no waiter was blocked would satisfy a later take early, sostatus_syncre-checks and re-blocks in a loop.k_sem_giveis ISR-safe (IRAM), so the expiry-path give works underCONFIG_K_TIMER_DISPATCH_ISR=y, theESP_TIMER_TASKfallback, and the linux dispatcher.K_TIMER_DEFINEgains a compile-time sem initializer;k_work_init_delayableinherits the sem init throughk_timer_init.struct k_timergrows bysizeof(struct k_sem)(~24 B), including insidek_work_delayable.k_timer_remaining_getneeded no change — it already returnsuint32_tmilliseconds, matching upstream'sk_ticks_to_ms_floor32shape.Review fan-out (boreas-review + adversarial tracer)
One real semantic gap, fixed: between
status_sync's status exchange and its running load, a one-shot expiry could complete entirely (status=1, running=false, give latched) and the waiter returned 0 where upstream's spinlocked read returns 1 — terminally lost for a one-shot in awhile (k_timer_status_sync())drain loop. The!runningpath now re-readsstatusinstead of returning 0; the callback ordersstatus++(RELEASE) beforerunning=false(RELEASE), so an ACQUIRE load observingrunning==falseis guaranteed to observe the increment.Adversarial probes refuted (no change needed):
k_timer_initare structurally unreachable (therunningguard +handle==NULLguard prevent every give/expiry path).status).Doc + hygiene:
@noteonk_timer_start: a restart does not wake a blockedstatus_syncwaiter (it waits for the restarted timer's first expiry — upstream parity).@noteonk_timer_init: re-init while astatus_syncwaiter is blocked is caller error (clobbers the embedded sem's waiter list).Z_SEM_INITIALIZERsingle-sources the compile-time k_sem initializer forK_SEM_DEFINEandK_TIMER_DEFINE.Test plan
CONFIG_K_TIMER_DISPATCH_ISR=yverified in the regenerated sdkconfigtest_timer_status_sync_after_status_get_blocks(stale-latch regression, made deterministic — one-shot + restart with t0 captured before the restart, so the elapsed bound holds under arbitrary host stall),test_timer_status_sync_stopped_returns_immediatelyCloses #28
🤖 Generated with Claude Code