Skip to content

feat!: notification-backed k_sem (own state, no FreeRTOS control block)#41

Merged
swoisz merged 5 commits into
mainfrom
feature/k-sem-native
Jun 7, 2026
Merged

feat!: notification-backed k_sem (own state, no FreeRTOS control block)#41
swoisz merged 5 commits into
mainfrom
feature/k-sem-native

Conversation

@swoisz

@swoisz swoisz commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Summary

Part of #40 — the first primitive converted under the "wrap the scheduler, own the synchronization state" direction. The count, limit, and waiter list now live in the caller-owned struct k_sem under a portMUX; blocking and wakeup ride FreeRTOS direct-to-task notifications on reserved index 1, whose state is kernel-owned (inside the TCB). Nothing of the caller's memory ever enters a kernel list — the object-lifetime principle (#21/#22, zkernel README) holds by construction: when k_sem_take returns, the kernel has zero references into the caller's storage.

Waiter nodes are stack-resident in k_sem_take and unlinked under the sem lock before return — by the giver (which copies only the task handle out of the node) or by the timeout path. The timeout-vs-give race uses the consume-the-in-flight-give pattern proven in the k_work sync slot (#38).

⚠️ BREAKING (config): requires CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2 (or higher) — zkernel reserves notification index 1; index 0 stays free for ESP-IDF internals (esp_ipc, pthread, eth, usb — all verified index-0 users). Enforced by a compile-time #error whose message contains the sdkconfig line; set in the test app and all example sdkconfig.defaults; documented in the README configuration table.

Upstream-parity wins the FreeRTOS-backed sem could not express

  • K_SEM_DEFINE is a true compile-time initializer — usable from any constructor; the per-instance ctor machinery and its Xtensa .init_array ordering caveat are deleted (closes the K_SEM_DEFINE half of K_SEM_DEFINE is not a static initializer (requires runtime k_sem_init) #26), with upstream's BUILD_ASSERT on the limits
  • k_sem_reset wakes all waiters with -EAGAIN (previously could only drain the count)
  • k_sem_give wakes the highest-priority waiter (FIFO among equals; priority cached at enqueue so no FreeRTOS calls occur under the lock)
  • k_sem_init returns -EINVAL for limit == 0 or initial > limit

Documented divergence (from review)

A thread blocked in k_sem_take must not be aborted: upstream unpends aborted threads from wait queues, while Boreas cannot reach into the sem's waiter list from k_thread_abort — the old FreeRTOS-backed sem tolerated this via vTaskDelete's own event-list cleanup. @notes on both k_sem_take and k_thread_abort; the structural fix (per-thread pend registry) is future #40 work.

Internals cleanup

ISR-aware critical-section helpers deduplicated into src/zkernel_internal.h (private); k_work.c migrated to the shared z_kernel_lock/z_kernel_unlock.

Tests (190 on linux / 214 on S3)

Four new: K_SEM_DEFINE consumed from a constructor before main(); -EINVAL args; reset wakes a blocked waiter with -EAGAIN; give wakes the higher-priority of two waiters enqueued FIFO-adversarially. The whole suite is itself the deep test — every k_work queue and the #21 corruption regression shapes now run on the new sem.

Validation

Target Result
linux (host) 190 Tests 0 Failures × 7 consecutive runs
esp32s3 (on hardware) 214 Tests 0 Failures

Both via clean rm -rf build sdkconfig regeneration with the Kconfig flip verified in sdkconfig.

🤖 Generated with Claude Code

First primitive converted under the #40 design direction: wrap the
scheduler, own the synchronization state. The count, limit, and waiter
list now live in the caller-owned struct k_sem, guarded by a portMUX;
blocking and wakeup ride FreeRTOS direct-to-task notifications on
reserved index 1, whose state is kernel-owned (inside the TCB). Nothing
of the caller's memory ever enters a kernel list -- the object-lifetime
principle (#21/#22) holds by construction: when k_sem_take returns, the
kernel has zero references into the caller's storage.

Waiter nodes are stack-resident in k_sem_take and unlinked under the
sem lock before return, by the giver (which copies only the task handle
out of the node) or by the timeout path. The timeout-vs-give race uses
the consume-the-in-flight-give pattern from the k_work sync slot: a
waiter that times out after being targeted drains the notification
before returning so it cannot poison a later take.

Upstream-parity wins that the FreeRTOS-backed sem could not express:
- K_SEM_DEFINE is a true compile-time initializer -- usable from any
  constructor; the per-instance ctor machinery and its Xtensa
  .init_array ordering caveat are deleted (closes the K_SEM_DEFINE
  half of #26)
- k_sem_reset wakes all waiters with -EAGAIN (previously could only
  drain the count)
- k_sem_give wakes the highest-priority waiter (FIFO among equals);
  priority cached at enqueue so no FreeRTOS calls happen under the lock
- k_sem_init returns -EINVAL for limit == 0 or initial > limit

BREAKING (config): requires
CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES >= 2 -- zkernel
reserves notification index 1; index 0 stays free for ESP-IDF
internals (esp_ipc, pthread, eth, usb). Enforced by a compile-time
#error with the sdkconfig line in the message; set in the test app and
all example sdkconfig.defaults; documented in the README configuration
table.

Tests: 186 -> 190 (constructor-time K_SEM_DEFINE use, -EINVAL args,
reset wakes a blocked waiter with -EAGAIN, give wakes the
higher-priority of two waiters enqueued FIFO-adversarially). Full
suite green x4 consecutive runs on linux -- every k_work queue and the
#21 corruption regression shapes now run on the new sem.

Part of #40

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates k_sem to a notification-backed implementation where semaphore state (count/limit/waiter list) is fully owned by struct k_sem, and blocking/wakeup uses FreeRTOS direct-to-task notifications on reserved index 1. This advances Boreas’ “wrap the scheduler, own the synchronization state” direction and restores key Zephyr parity behaviors while enforcing the object-lifetime principle by construction.

Changes:

  • Replaces the FreeRTOS counting-semaphore-backed k_sem with a notification-backed implementation, adds a compile-time config guard for configTASK_NOTIFICATION_ARRAY_ENTRIES >= 2, and updates K_SEM_DEFINE to be a true compile-time initializer.
  • Deduplicates ISR-aware critical section helpers into a private zkernel_internal.h and updates k_work.c to use them.
  • Updates docs/config defaults and extends tests for constructor usability, invalid args, reset-wakes semantics, and priority-ordered wakeup.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
components/zkernel/src/k_sem.c Implements notification-backed semaphore with reserved notification index and Zephyr-parity behaviors.
components/zkernel/include/boreas/zephyr/kernel.h Updates struct k_sem, switches K_SEM_DEFINE to compile-time init, and documents new semantics/divergences.
components/zkernel/src/zkernel_internal.h Adds shared ISR-aware z_kernel_lock()/z_kernel_unlock() helpers for portMUX critical sections.
components/zkernel/src/k_work.c Replaces local queue lock helpers with shared z_kernel_lock()/unlock() usage.
components/zkernel/README.md Documents notification-backed k_sem, new return codes, and required FreeRTOS notification-array config.
test/main/test_k_sem.c Adds new tests for constructor use, invalid init args, reset waking with -EAGAIN, and priority wakeup.
test/sdkconfig.defaults Sets CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2 for tests.
examples/work_queue_demo/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.
examples/uart_rs485_echo/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.
examples/uart_loopback/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.
examples/shell_demo/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.
examples/log_demo/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.
examples/blink_device_model/sdkconfig.defaults Sets required FreeRTOS notification-array entries for example builds.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread components/zkernel/src/k_sem.c Outdated
swoisz and others added 2 commits June 6, 2026 20:23
k_sem_take initialized the waiter struct (xTaskGetCurrentTaskHandle +
uxTaskPriorityGet) inside the critical section, contradicting the
design intent documented on z_sem_waiter.prio: uxTaskPriorityGet
enters FreeRTOS's own critical section, nesting spinlocks under the
sem lock on SMP. Sample before locking; the fast path pays two cheap
getters, and no FreeRTOS call happens under the sem lock anywhere.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous fix moved xTaskGetCurrentTaskHandle/uxTaskPriorityGet to
the top of k_sem_take, putting them on every path -- including the
pre-scheduler constructor path that K_SEM_DEFINE's static initializer
exists to support, where pxCurrentTCB is NULL and uxTaskPriorityGet
dereferences it. CI's Linux-proper runner caught it as an instant
pre-main SIGSEGV (macOS tolerated the access, so local runs passed --
exactly why the CI matrix runs Linux-proper).

Restore the invariant both ways: fast paths (count hit, K_NO_WAIT)
never sample and stay pre-scheduler-safe; the must-block path samples
OUTSIDE the lock via unlock -> sample -> relock -> re-check (the
re-check loop), so no FreeRTOS call ever happens under the sem lock.
Blocking before the scheduler starts is invalid regardless.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@swoisz

swoisz commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator Author

Design note — why the notification-array requirement isn't auto-selected in Kconfig: investigated and empirically tested. select/imply are bool-only, so an int symbol like FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES cannot be forced by another symbol. The only cross-component mechanism — redefining the symbol with an added default 2 in zkernel's Kconfig — loses the parse-order race deterministically (core IDF components parse first; tested: the regenerated sdkconfig came out =1, with the redefinition flagged by IDF's kconfig tooling). A relational depends on ... >= 2 would silently hide zkernel in menuconfig — worse discoverability than the compile-time #error, which names the exact sdkconfig line to add. Project sdkconfig.defaults + loud #error is the IDF-idiomatic pattern for int-config requirements across components, so that's what ships.

swoisz and others added 2 commits June 6, 2026 21:04
Raw CONFIG_FREERTOS_TASK_NOTIFICATION_ARRAY_ENTRIES=2 lines in every
consumer's sdkconfig.defaults were correct but unreadable -- setting an
unfamiliar FreeRTOS int gives the user no idea why. Kconfig cannot make
it automatic (select is bool-only; a component-supplied default for an
int it does not own loses the parse-order race to the owning component
-- verified empirically), so the requirement moves one level up:

- sdkconfig.boreas at the repo root carries all Boreas-required
  settings with the rationale inline; future requirements centralize
  there instead of accreting raw lines downstream.
- Consumers reference it by name in SDKCONFIG_DEFAULTS:
      set(SDKCONFIG_DEFAULTS "sdkconfig.defaults;<boreas>/sdkconfig.boreas")
  The test app and all examples are wired this way and their raw
  config lines removed.
- The compile-time #error remains as the backstop, and the README
  configuration section documents the pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sdkconfig.boreas requirement lived only in the zkernel component
README -- the top-level Usage section (the first thing an integrator
reads) now wires it into the submodule instructions, including the
one-time config regeneration step.

CHANGELOG.md captures the 2026-06 hardening series as a downstream
migration checklist: the defaults-list addition (#41), the k_work
return-code audit (#37), the k_work_cancel polarity inversion (#38),
the cancel_delayable_sync collapse of triple-cancel workarounds, the
reserved notification index, and the k_sem_take abort restriction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants