test: stress coverage for k_sem/k_event race windows (#43)#48
Merged
Conversation
Closes the coverage gaps from the #41/#42 design review (issue #43): - timeout-vs-give stress (k_sem) and timeout-vs-post stress (k_event): sweep the giver/poster firing time across the taker's 20 ms timeout (early / same-tick / late) for 100 iterations, checking conservation invariants on every outcome and probing the reserved notification index for stranded notifications after each cycle. The giver runs as a raw FreeRTOS task pinned to the other core on multicore targets (k_thread_create pins to core 0). - give-racing-the-park: 1000 zero-delay handshake iterations hammer the unlock-sample-relock recheck window and the enqueue->park window inside k_sem_take. - multi-waiter beyond two: 4-waiter conservation, FIFO order among equal-priority waiters (pins the strict '>' in z_sem_pop_waiter), k_sem_reset waking 3 waiters, single k_event_post waking 3 waiters. - FromISR coverage (HW-gated on CONFIG_K_TIMER_DISPATCH_ISR): k_sem_give from real ISR context with a prompt-wake latency bound proving the portYIELD_FROM_ISR path (mid-tick expiry via K_USEC); k_sem_take(K_NO_WAIT) from ISR (upstream isr-ok contract); k_event_post from ISR waking 3 waiters (accumulate-yield-once path). Doc notes on the declarations (inline-divergence convention): - k_sem_take: ISR-legal only with K_NO_WAIT (upstream contract); waiter priority is cached at enqueue (k_thread_priority_set on a blocked waiter does not re-sort the wake order -- upstream re-sorts). - k_sem_give: documented wake order (highest priority, FIFO among equals). - k_sem_reset: task context only. Upstream verification showed upstream's k_sem_reset is NOT isr-ok either (no @isr_ok; calls z_reschedule), so this is parity, not a divergence -- the #43 item proposing an ISR-safe reset was based on a wrong premise. The stale-priority note is NOT added to k_event_wait: k_event wakes ALL satisfied waiters with no priority ordering, so there is no stale value to document. linux suite: 204/0 (197 + 7; the 3 ISR tests compile out on linux). Refs #43 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fan-out findings folded: - k_sem_take, z_event_wait_internal, and the four k_event_wait* wrappers are now K_ISR_SAFE (IRAM-resident). The documented isr-ok-with-K_NO_WAIT contract was unsound for flash-resident code: the esp_timer ISR is allocated ESP_INTR_FLAG_IRAM, so the take/wait fast paths would fault if the ISR fired during a concurrent flash operation. New HW-gated tests pin the IRAM residency (prior art: the iram_attr tests in test_k_timer.c). - Prompt-wake test retuned: expiry at 10.1 ms (was 10.5) makes a missing portYIELD_FROM_ISR cost ~900 us instead of ~500, so the 700 us bound has wide margin against both false pass (deferred wake) and false fail (cache-cold first run, tick coincidence). - The both-outcomes sweep asserts are now HW-only: on the linux target a loaded CI host can stall the tick clock and collapse the sweep onto one outcome. The per-iteration conservation invariants (which hold under every interleaving) still run everywhere. - BUILD_ASSERT(CONFIG_FREERTOS_HZ >= 1000): the 18..22 ms sweep separation quantizes away at coarser ticks. - Multi-waiter state hygiene: mw_result/mw_order sentinel-reset per test (stale values from a prior test can no longer satisfy assertions), mw_next is a Zephyr atomic_t. - kernel.h: precise wording on the K_NO_WAIT-from-ISR note (spinlock only, no task-notify/blocking calls, IRAM-resident). - test_k_event_mt.c: pre-existing sizeof(stack) call sites migrated to K_THREAD_STACK_SIZEOF; FreeRTOS-priority-convention reminder on the priority-wake test. linux suite: 204/0 x3. Refs #43 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR expands test coverage for the notification-backed k_sem / k_event implementations, specifically targeting previously untested race windows (timeout-vs-give/post, give racing internal park windows, multi-waiter behaviors) and adds ISR-context validation for key paths. It also closes a contract gap by making k_sem_take and the k_event_wait* family IRAM-resident (K_ISR_SAFE) to satisfy the documented “ISR-ok with K_NO_WAIT” requirement in IRAM-only ISR contexts.
Changes:
- Add stress + multi-waiter tests for
k_semandk_event, including reserved-notification “no stranded notify” invariants. - Add HW-gated FromISR tests (prompt-yield path, ISR
K_NO_WAITpaths, IRAM residency asserts). - Mark
k_sem_takeandk_event_wait*pathsK_ISR_SAFE(IRAM) and update API docs with ISR/divergence notes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/main/test_k_sem.c | Adds k_sem stress, park-race, multi-waiter, and ISR/IRAM-residency tests. |
| test/main/test_k_event_mt.c | Adds k_event stress, multi-waiter wake-all, and ISR/IRAM-residency tests; minor stack sizing cleanup. |
| components/zkernel/src/k_sem.c | Makes k_sem_take IRAM-resident (K_ISR_SAFE) to uphold ISR K_NO_WAIT contract under cache-disabled windows. |
| components/zkernel/src/k_event.c | Makes k_event_wait* and internal wait path IRAM-resident (K_ISR_SAFE) for the same contract reason. |
| components/zkernel/include/boreas/zephyr/kernel.h | Documents ISR limitations/divergences for k_sem_take and task-context-only restriction for k_sem_reset. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the coverage gaps from the #41/#42 design review (#43): the handled race windows in the notification-backed k_sem/k_event had little or no test coverage. Adds 12 tests (7 on both targets, 5 HW-gated) plus divergence
@notes, and fixes one real contract gap the review fan-out surfaced.Stress tests
ret == 0 → count == 0;ret == -EAGAIN → count == 1) hold under every interleaving; after each cycle the reserved notification index is probed for stranded notifications (immediate take must fail-EBUSYAND a short blocking take must time out cleanly). The giver is a raw FreeRTOS task pinned to core 1 on multicore targets (k_thread_createpins to core 0), so the race windows see true concurrency on the S3. The both-outcomes meta-asserts are HW-only: a loaded CI host can stall the linux port's tick clock and collapse the sweep onto one outcome.k_sem_take.>inz_sem_pop_waiter),k_sem_resetwaking 3 waiters, a singlek_event_postwaking 3 waiters.CONFIG_K_TIMER_DISPATCH_ISR):k_sem_givefrom real ISR context with a prompt-wake latency bound proving theportYIELD_FROM_ISRpath (early-in-tick expiry at 10.1 ms makes a missing yield cost ~900 µs vs the 700 µs bound);k_sem_take(K_NO_WAIT)from ISR;k_event_postfrom ISR waking 3 waiters (the accumulate-then-yield-once path); IRAM-residency asserts fork_sem_take/k_event_wait.Contract fix (found in review)
k_sem_takeand thek_event_waitfamily are documented isr-ok withK_NO_WAIT(upstream contract), but were flash-resident — and the esp_timer ISR is allocatedESP_INTR_FLAG_IRAM, so the documented contract would fault if the ISR fired during a concurrent flash operation. They are nowK_ISR_SAFE(IRAM-resident), matching the existing precedent fork_sem_give/k_event_post/k_work_submit, with HW tests pinning the residency.Doc notes (inline-divergence convention)
k_sem_take: ISR-legal only withK_NO_WAIT; waiter priority is cached at enqueue (k_thread_priority_seton a blocked waiter does not re-sort the wake order — upstream re-sorts the pend queue).k_sem_give: wake order documented (highest priority, FIFO among equals).k_sem_reset: task context only. Upstream verification showed upstream's reset is not isr-ok either (no@isr_ok; callsz_reschedule), so this is parity — the k_sem/k_event: stress coverage for the handled races + two doc notes #43 item proposing an ISR-safe reset was based on a wrong premise.k_event_wait: k_event wakes ALL satisfied waiters with no priority ordering, so there is no stale value to document.Test plan
CONFIG_K_TIMER_DISPATCH_ISR=yverified in the regenerated sdkconfignmartifact probe:k_sem_take/k_event_waitinside[_iram_text_start, _iram_text_end)Closes #43
🤖 Generated with Claude Code