Skip to content

test: stress coverage for k_sem/k_event race windows (#43)#48

Merged
swoisz merged 3 commits into
mainfrom
test/k-sem-event-stress-hardening
Jun 7, 2026
Merged

test: stress coverage for k_sem/k_event race windows (#43)#48
swoisz merged 3 commits into
mainfrom
test/k-sem-event-stress-hardening

Conversation

@swoisz

@swoisz swoisz commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Summary

Closes the coverage gaps from the #41/#42 design review (#43): the handled race windows in the notification-backed k_sem/k_event had little or no test coverage. Adds 12 tests (7 on both targets, 5 HW-gated) plus divergence @notes, and fixes one real contract gap the review fan-out surfaced.

Stress tests

  • Timeout-vs-give / timeout-vs-post stress (k_sem + k_event): 100 iterations sweeping the giver's firing time across the taker's 20 ms timeout (18/19 wake, 20 races the timeout tick, 21/22 latch). Per-iteration conservation invariants (ret == 0 → count == 0; ret == -EAGAIN → count == 1) hold under every interleaving; after each cycle the reserved notification index is probed for stranded notifications (immediate take must fail -EBUSY AND a short blocking take must time out cleanly). The giver is a raw FreeRTOS task pinned to core 1 on multicore targets (k_thread_create pins to core 0), so the race windows see true concurrency on the S3. The both-outcomes meta-asserts are HW-only: a loaded CI host can stall the linux port's tick clock and collapse the sweep onto one outcome.
  • Give-racing-the-park: 1000 zero-delay handshake iterations against the unlock-sample-relock recheck window and the enqueue→notification-wait window inside k_sem_take.
  • Multi-waiter beyond two: 4-waiter conservation, FIFO order among equal-priority waiters (pins the strict > in z_sem_pop_waiter), k_sem_reset waking 3 waiters, a single k_event_post waking 3 waiters.
  • FromISR (gated on CONFIG_K_TIMER_DISPATCH_ISR): k_sem_give from real ISR context with a prompt-wake latency bound proving the portYIELD_FROM_ISR path (early-in-tick expiry at 10.1 ms makes a missing yield cost ~900 µs vs the 700 µs bound); k_sem_take(K_NO_WAIT) from ISR; k_event_post from ISR waking 3 waiters (the accumulate-then-yield-once path); IRAM-residency asserts for k_sem_take/k_event_wait.

Contract fix (found in review)

k_sem_take and the k_event_wait family are documented isr-ok with K_NO_WAIT (upstream contract), but were flash-resident — and the esp_timer ISR is allocated ESP_INTR_FLAG_IRAM, so the documented contract would fault if the ISR fired during a concurrent flash operation. They are now K_ISR_SAFE (IRAM-resident), matching the existing precedent for k_sem_give/k_event_post/k_work_submit, with HW tests pinning the residency.

Doc notes (inline-divergence convention)

  • k_sem_take: ISR-legal only with K_NO_WAIT; waiter priority is cached at enqueue (k_thread_priority_set on a blocked waiter does not re-sort the wake order — upstream re-sorts the pend queue).
  • k_sem_give: wake order documented (highest priority, FIFO among equals).
  • k_sem_reset: task context only. Upstream verification showed upstream's reset is not isr-ok either (no @isr_ok; calls z_reschedule), so this is parity — the k_sem/k_event: stress coverage for the handled races + two doc notes #43 item proposing an ISR-safe reset was based on a wrong premise.
  • The stale-priority note is deliberately NOT on k_event_wait: k_event wakes ALL satisfied waiters with no priority ordering, so there is no stale value to document.

Test plan

  • linux: 204/0 ×3 (197 + 7; ISR tests compile out)
  • ESP32-S3: 233/0 on hardware (221 + 12), CONFIG_K_TIMER_DISPATCH_ISR=y verified in the regenerated sdkconfig
  • nm artifact probe: k_sem_take/k_event_wait inside [_iram_text_start, _iram_text_end)
  • clang-format 21.1.8 clean

Closes #43

🤖 Generated with Claude Code

swoisz and others added 2 commits June 7, 2026 08:38
Closes the coverage gaps from the #41/#42 design review (issue #43):

- timeout-vs-give stress (k_sem) and timeout-vs-post stress (k_event):
  sweep the giver/poster firing time across the taker's 20 ms timeout
  (early / same-tick / late) for 100 iterations, checking conservation
  invariants on every outcome and probing the reserved notification
  index for stranded notifications after each cycle. The giver runs as
  a raw FreeRTOS task pinned to the other core on multicore targets
  (k_thread_create pins to core 0).
- give-racing-the-park: 1000 zero-delay handshake iterations hammer
  the unlock-sample-relock recheck window and the enqueue->park window
  inside k_sem_take.
- multi-waiter beyond two: 4-waiter conservation, FIFO order among
  equal-priority waiters (pins the strict '>' in z_sem_pop_waiter),
  k_sem_reset waking 3 waiters, single k_event_post waking 3 waiters.
- FromISR coverage (HW-gated on CONFIG_K_TIMER_DISPATCH_ISR):
  k_sem_give from real ISR context with a prompt-wake latency bound
  proving the portYIELD_FROM_ISR path (mid-tick expiry via K_USEC);
  k_sem_take(K_NO_WAIT) from ISR (upstream isr-ok contract);
  k_event_post from ISR waking 3 waiters (accumulate-yield-once path).

Doc notes on the declarations (inline-divergence convention):
- k_sem_take: ISR-legal only with K_NO_WAIT (upstream contract);
  waiter priority is cached at enqueue (k_thread_priority_set on a
  blocked waiter does not re-sort the wake order -- upstream re-sorts).
- k_sem_give: documented wake order (highest priority, FIFO among
  equals).
- k_sem_reset: task context only. Upstream verification showed
  upstream's k_sem_reset is NOT isr-ok either (no @isr_ok; calls
  z_reschedule), so this is parity, not a divergence -- the #43 item
  proposing an ISR-safe reset was based on a wrong premise.

The stale-priority note is NOT added to k_event_wait: k_event wakes
ALL satisfied waiters with no priority ordering, so there is no stale
value to document.

linux suite: 204/0 (197 + 7; the 3 ISR tests compile out on linux).

Refs #43

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fan-out findings folded:

- k_sem_take, z_event_wait_internal, and the four k_event_wait*
  wrappers are now K_ISR_SAFE (IRAM-resident). The documented
  isr-ok-with-K_NO_WAIT contract was unsound for flash-resident code:
  the esp_timer ISR is allocated ESP_INTR_FLAG_IRAM, so the take/wait
  fast paths would fault if the ISR fired during a concurrent flash
  operation. New HW-gated tests pin the IRAM residency (prior art:
  the iram_attr tests in test_k_timer.c).
- Prompt-wake test retuned: expiry at 10.1 ms (was 10.5) makes a
  missing portYIELD_FROM_ISR cost ~900 us instead of ~500, so the
  700 us bound has wide margin against both false pass (deferred
  wake) and false fail (cache-cold first run, tick coincidence).
- The both-outcomes sweep asserts are now HW-only: on the linux
  target a loaded CI host can stall the tick clock and collapse the
  sweep onto one outcome. The per-iteration conservation invariants
  (which hold under every interleaving) still run everywhere.
- BUILD_ASSERT(CONFIG_FREERTOS_HZ >= 1000): the 18..22 ms sweep
  separation quantizes away at coarser ticks.
- Multi-waiter state hygiene: mw_result/mw_order sentinel-reset per
  test (stale values from a prior test can no longer satisfy
  assertions), mw_next is a Zephyr atomic_t.
- kernel.h: precise wording on the K_NO_WAIT-from-ISR note (spinlock
  only, no task-notify/blocking calls, IRAM-resident).
- test_k_event_mt.c: pre-existing sizeof(stack) call sites migrated
  to K_THREAD_STACK_SIZEOF; FreeRTOS-priority-convention reminder on
  the priority-wake test.

linux suite: 204/0 x3.

Refs #43

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands test coverage for the notification-backed k_sem / k_event implementations, specifically targeting previously untested race windows (timeout-vs-give/post, give racing internal park windows, multi-waiter behaviors) and adds ISR-context validation for key paths. It also closes a contract gap by making k_sem_take and the k_event_wait* family IRAM-resident (K_ISR_SAFE) to satisfy the documented “ISR-ok with K_NO_WAIT” requirement in IRAM-only ISR contexts.

Changes:

  • Add stress + multi-waiter tests for k_sem and k_event, including reserved-notification “no stranded notify” invariants.
  • Add HW-gated FromISR tests (prompt-yield path, ISR K_NO_WAIT paths, IRAM residency asserts).
  • Mark k_sem_take and k_event_wait* paths K_ISR_SAFE (IRAM) and update API docs with ISR/divergence notes.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/main/test_k_sem.c Adds k_sem stress, park-race, multi-waiter, and ISR/IRAM-residency tests.
test/main/test_k_event_mt.c Adds k_event stress, multi-waiter wake-all, and ISR/IRAM-residency tests; minor stack sizing cleanup.
components/zkernel/src/k_sem.c Makes k_sem_take IRAM-resident (K_ISR_SAFE) to uphold ISR K_NO_WAIT contract under cache-disabled windows.
components/zkernel/src/k_event.c Makes k_event_wait* and internal wait path IRAM-resident (K_ISR_SAFE) for the same contract reason.
components/zkernel/include/boreas/zephyr/kernel.h Documents ISR limitations/divergences for k_sem_take and task-context-only restriction for k_sem_reset.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/main/test_k_sem.c
Comment thread test/main/test_k_sem.c
Comment thread test/main/test_k_sem.c Outdated
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@swoisz swoisz merged commit 5590fce into main Jun 7, 2026
9 of 10 checks passed
@swoisz swoisz deleted the test/k-sem-event-stress-hardening branch June 7, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

k_sem/k_event: stress coverage for the handled races + two doc notes

2 participants