feat!: notification-backed k_sem (own state, no FreeRTOS control block)#41
Conversation
First primitive converted under the #40 design direction: wrap the scheduler, own the synchronization state. The count, limit, and waiter list now live in the caller-owned struct k_sem, guarded by a portMUX; blocking and wakeup ride FreeRTOS direct-to-task notifications on reserved index 1, whose state is kernel-owned (inside the TCB). Nothing of the caller's memory ever enters a kernel list -- the object-lifetime principle (#21/#22) holds by construction: when k_sem_take returns, the kernel has zero references into the caller's storage. Waiter nodes are stack-resident in k_sem_take and unlinked under the sem lock before return, by the giver (which copies only the task handle out of the node) or by the timeout path. The timeout-vs-give race uses the consume-the-in-flight-give pattern from the k_work sync slot: a waiter that times out after being targeted drains the notification before returning so it cannot poison a later take. Upstream-parity wins that the FreeRTOS-backed sem could not express: - K_SEM_DEFINE is a true compile-time initializer -- usable from any constructor; the per-instance ctor machinery and its Xtensa .init_array ordering caveat are deleted (closes the K_SEM_DEFINE half of #26) - k_sem_reset wakes all waiters with -EAGAIN (previously could only drain the count) - k_sem_give wakes the highest-priority waiter (FIFO among equals); priority cached at enqueue so no FreeRTOS calls happen under the lock - k_sem_init returns -EINVAL for limit == 0 or initial > limit BREAKING (config): requires CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES >= 2 -- zkernel reserves notification index 1; index 0 stays free for ESP-IDF internals (esp_ipc, pthread, eth, usb). Enforced by a compile-time #error with the sdkconfig line in the message; set in the test app and all example sdkconfig.defaults; documented in the README configuration table. Tests: 186 -> 190 (constructor-time K_SEM_DEFINE use, -EINVAL args, reset wakes a blocked waiter with -EAGAIN, give wakes the higher-priority of two waiters enqueued FIFO-adversarially). Full suite green x4 consecutive runs on linux -- every k_work queue and the #21 corruption regression shapes now run on the new sem. Part of #40 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR migrates k_sem to a notification-backed implementation where semaphore state (count/limit/waiter list) is fully owned by struct k_sem, and blocking/wakeup uses FreeRTOS direct-to-task notifications on reserved index 1. This advances Boreas’ “wrap the scheduler, own the synchronization state” direction and restores key Zephyr parity behaviors while enforcing the object-lifetime principle by construction.
Changes:
- Replaces the FreeRTOS counting-semaphore-backed
k_semwith a notification-backed implementation, adds a compile-time config guard forconfigTASK_NOTIFICATION_ARRAY_ENTRIES >= 2, and updatesK_SEM_DEFINEto be a true compile-time initializer. - Deduplicates ISR-aware critical section helpers into a private
zkernel_internal.hand updatesk_work.cto use them. - Updates docs/config defaults and extends tests for constructor usability, invalid args, reset-wakes semantics, and priority-ordered wakeup.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
components/zkernel/src/k_sem.c |
Implements notification-backed semaphore with reserved notification index and Zephyr-parity behaviors. |
components/zkernel/include/boreas/zephyr/kernel.h |
Updates struct k_sem, switches K_SEM_DEFINE to compile-time init, and documents new semantics/divergences. |
components/zkernel/src/zkernel_internal.h |
Adds shared ISR-aware z_kernel_lock()/z_kernel_unlock() helpers for portMUX critical sections. |
components/zkernel/src/k_work.c |
Replaces local queue lock helpers with shared z_kernel_lock()/unlock() usage. |
components/zkernel/README.md |
Documents notification-backed k_sem, new return codes, and required FreeRTOS notification-array config. |
test/main/test_k_sem.c |
Adds new tests for constructor use, invalid init args, reset waking with -EAGAIN, and priority wakeup. |
test/sdkconfig.defaults |
Sets CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2 for tests. |
examples/work_queue_demo/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
examples/uart_rs485_echo/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
examples/uart_loopback/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
examples/shell_demo/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
examples/log_demo/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
examples/blink_device_model/sdkconfig.defaults |
Sets required FreeRTOS notification-array entries for example builds. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
k_sem_take initialized the waiter struct (xTaskGetCurrentTaskHandle + uxTaskPriorityGet) inside the critical section, contradicting the design intent documented on z_sem_waiter.prio: uxTaskPriorityGet enters FreeRTOS's own critical section, nesting spinlocks under the sem lock on SMP. Sample before locking; the fast path pays two cheap getters, and no FreeRTOS call happens under the sem lock anywhere. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous fix moved xTaskGetCurrentTaskHandle/uxTaskPriorityGet to the top of k_sem_take, putting them on every path -- including the pre-scheduler constructor path that K_SEM_DEFINE's static initializer exists to support, where pxCurrentTCB is NULL and uxTaskPriorityGet dereferences it. CI's Linux-proper runner caught it as an instant pre-main SIGSEGV (macOS tolerated the access, so local runs passed -- exactly why the CI matrix runs Linux-proper). Restore the invariant both ways: fast paths (count hit, K_NO_WAIT) never sample and stay pre-scheduler-safe; the must-block path samples OUTSIDE the lock via unlock -> sample -> relock -> re-check (the re-check loop), so no FreeRTOS call ever happens under the sem lock. Blocking before the scheduler starts is invalid regardless. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Design note — why the notification-array requirement isn't auto-selected in Kconfig: investigated and empirically tested. |
Raw CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2 lines in every
consumer's sdkconfig.defaults were correct but unreadable -- setting an
unfamiliar FreeRTOS int gives the user no idea why. Kconfig cannot make
it automatic (select is bool-only; a component-supplied default for an
int it does not own loses the parse-order race to the owning component
-- verified empirically), so the requirement moves one level up:
- sdkconfig.boreas at the repo root carries all Boreas-required
settings with the rationale inline; future requirements centralize
there instead of accreting raw lines downstream.
- Consumers reference it by name in SDKCONFIG_DEFAULTS:
set(SDKCONFIG_DEFAULTS "sdkconfig.defaults;<boreas>/sdkconfig.boreas")
The test app and all examples are wired this way and their raw
config lines removed.
- The compile-time #error remains as the backstop, and the README
configuration section documents the pattern.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sdkconfig.boreas requirement lived only in the zkernel component README -- the top-level Usage section (the first thing an integrator reads) now wires it into the submodule instructions, including the one-time config regeneration step. CHANGELOG.md captures the 2026-06 hardening series as a downstream migration checklist: the defaults-list addition (#41), the k_work return-code audit (#37), the k_work_cancel polarity inversion (#38), the cancel_delayable_sync collapse of triple-cancel workarounds, the reserved notification index, and the k_sem_take abort restriction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Part of #40 — the first primitive converted under the "wrap the scheduler, own the synchronization state" direction. The count, limit, and waiter list now live in the caller-owned
struct k_semunder a portMUX; blocking and wakeup ride FreeRTOS direct-to-task notifications on reserved index 1, whose state is kernel-owned (inside the TCB). Nothing of the caller's memory ever enters a kernel list — the object-lifetime principle (#21/#22, zkernel README) holds by construction: whenk_sem_takereturns, the kernel has zero references into the caller's storage.Waiter nodes are stack-resident in
k_sem_takeand unlinked under the sem lock before return — by the giver (which copies only the task handle out of the node) or by the timeout path. The timeout-vs-give race uses the consume-the-in-flight-give pattern proven in the k_work sync slot (#38).CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2(or higher) — zkernel reserves notification index 1; index 0 stays free for ESP-IDF internals (esp_ipc,pthread, eth, usb — all verified index-0 users). Enforced by a compile-time#errorwhose message contains the sdkconfig line; set in the test app and all examplesdkconfig.defaults; documented in the README configuration table.Upstream-parity wins the FreeRTOS-backed sem could not express
K_SEM_DEFINEis a true compile-time initializer — usable from any constructor; the per-instance ctor machinery and its Xtensa.init_arrayordering caveat are deleted (closes theK_SEM_DEFINEhalf of K_SEM_DEFINE is not a static initializer (requires runtime k_sem_init) #26), with upstream'sBUILD_ASSERTon the limitsk_sem_resetwakes all waiters with-EAGAIN(previously could only drain the count)k_sem_givewakes the highest-priority waiter (FIFO among equals; priority cached at enqueue so no FreeRTOS calls occur under the lock)k_sem_initreturns-EINVALforlimit == 0orinitial > limitDocumented divergence (from review)
A thread blocked in
k_sem_takemust not be aborted: upstream unpends aborted threads from wait queues, while Boreas cannot reach into the sem's waiter list fromk_thread_abort— the old FreeRTOS-backed sem tolerated this viavTaskDelete's own event-list cleanup.@notes on bothk_sem_takeandk_thread_abort; the structural fix (per-thread pend registry) is future #40 work.Internals cleanup
ISR-aware critical-section helpers deduplicated into
src/zkernel_internal.h(private);k_work.cmigrated to the sharedz_kernel_lock/z_kernel_unlock.Tests (190 on linux / 214 on S3)
Four new:
K_SEM_DEFINEconsumed from a constructor beforemain();-EINVALargs; reset wakes a blocked waiter with-EAGAIN; give wakes the higher-priority of two waiters enqueued FIFO-adversarially. The whole suite is itself the deep test — every k_work queue and the #21 corruption regression shapes now run on the new sem.Validation
Both via clean
rm -rf build sdkconfigregeneration with the Kconfig flip verified insdkconfig.🤖 Generated with Claude Code