Design: own the synchronization object state; wrap only the scheduler (notification-backed k_sem first) #40

@swoisz

Strategic direction distilled from the corruption-family investigations (#18, #21, #22, #38): every place Boreas owns the object state machine (k_work's dlist+sem+lock, the k_timer linux dispatcher), Zephyr semantics came cheap and the caller-owned-memory contract was enforceable. Every place we wrap a FreeRTOS object whole, we inherit the impedance mismatch — kernel wait-lists threaded through caller-owned control blocks, no severance signal, semantics ceilings — and pay for it in protocol PRs.

Principle: wrap the scheduler, own the synchronization state. Tasks/blocking/tick must come from FreeRTOS (that is ESP-IDF's ground truth). Synchronization object state (counts, bits, waiter queues) is cheap to own, and direct-to-task notifications are the clean seam: notification state lives inside the TCB, kernel-owned — nothing of ours ever enters a kernel list from caller memory, structurally eliminating the dead-frame/dangling-node bug family for that object.

Per-primitive verdict

| Primitive | Verdict | Rationale |
| --- | --- | --- |
| k_thread | wrap forever | tasks ARE the scheduler; the #18 lifecycle protocol is the right shape |
| k_mutex | keep wrapping | FreeRTOS priority inheritance is the whole value; reimplementing PI correctly is the hardest primitive in the catalog |
| k_sem | own it (notification-backed) | see below; three wins attached |
| k_event | likely own it second | currently on xEventGroupCreateStatic (k_event.c); EventBits_t reserves the top 8 bits, capping usable events at 24 vs Zephyr's 32, which is a hard semantics ceiling, and the own-impl is structurally identical to the sem (state word + waiter list + notifications) |
| k_msgq | defer | FreeRTOS queues do the copy-ring + dual-wait-queue job well; revisit only if purge/peek parity or another lifetime bite forces it |
| k_timer, k_work | already own state | the existing proof this approach pays |
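To make the k_event row concrete, here is a minimal host-side sketch of an owned event object: a full 32-bit state word plus a waiter list. It is a model under stated assumptions, not the Boreas implementation. All names (`model_event`, `ev_waiter`, `model_event_wait`, `model_event_post`) are hypothetical, the `notified` flag stands in for a direct-to-task notification, and the lock is only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter record. On target, `notified` would be a task
 * notification delivered to the waiting task's handle. */
struct ev_waiter {
    uint32_t mask;      /* bits this waiter is interested in */
    bool wait_all;      /* true: every bit in mask; false: any bit */
    uint32_t matched;   /* bits that satisfied the wait, set on wake */
    bool notified;      /* stand-in for the notification */
    struct ev_waiter *next;
};

/* Hypothetical owned event object: all 32 bits are usable because the
 * state word is ours rather than an EventBits_t. */
struct model_event {
    uint32_t bits;
    struct ev_waiter *waiters;
};

static bool ev_match(uint32_t bits, const struct ev_waiter *w)
{
    uint32_t hit = bits & w->mask;
    return w->wait_all ? (hit == w->mask) : (hit != 0);
}

/* Returns the matching bits if the condition already holds; otherwise
 * queues the waiter and returns 0 (on target the task would then block). */
static uint32_t model_event_wait(struct model_event *e, struct ev_waiter *w)
{
    /* lock would be taken here on target */
    if (ev_match(e->bits, w)) {
        return e->bits & w->mask;
    }
    w->notified = false;
    w->matched = 0;
    w->next = e->waiters;
    e->waiters = w;
    /* unlock, then block on the notification */
    return 0;
}

/* ORs new bits into the state word and wakes every waiter whose
 * condition now holds, unlinking it from the list. */
static void model_event_post(struct model_event *e, uint32_t bits)
{
    struct ev_waiter **pp = &e->waiters;

    e->bits |= bits;
    while (*pp != NULL) {
        struct ev_waiter *w = *pp;
        if (ev_match(e->bits, w)) {
            *pp = w->next;
            w->next = NULL;
            w->matched = e->bits & w->mask;
            w->notified = true;
        } else {
            pp = &w->next;
        }
    }
}
```

Posting bit 31 (`0x80000000u`) works in this model, which is exactly the range the 24-bit cap rules out. Timeouts and clearing of bits are left out of the sketch.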

k_sem, notification-backed (first work item)

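{count, limit, waiter sys_dlist, portMUX} live in the Boreas struct. A taker blocks via xTaskNotifyWait on a dedicated index (in practice the indexed variant, xTaskNotifyWaitIndexed, since the plain call operates on index 0); give pops a waiter under the lock and calls xTaskNotifyGive on its handle (xTaskNotifyGiveIndexed for a non-zero index). Buys in one move: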

Spike first (blocking question): task-notification index allocation. Some ESP-IDF internals use notifications (index 0); a dedicated index needs configTASK_NOTIFICATION_ARRAY_ENTRIES > 1 and a project-wide convention. Verify availability/configurability on both silicon and the linux port before committing to the design.
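The k_sem state machine described in this section can be modelled on the host before the index question is settled. The sketch below is a rough illustration under assumptions, not the proposed code: `model_sem`, `model_task`, and the function names are hypothetical, a plain FIFO stands in for the sys_dlist, the `notified` flag stands in for a task notification on the dedicated index, and the portMUX critical sections appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a waiting task. On target this would be a
 * TaskHandle_t, and `notified` would be notification state held in the TCB. */
struct model_task {
    bool notified;
    struct model_task *next;
};

/* Hypothetical owned semaphore: count, limit, FIFO of waiters. */
struct model_sem {
    unsigned count;
    unsigned limit;
    struct model_task *head;
    struct model_task *tail;
};

static void model_sem_init(struct model_sem *s, unsigned initial, unsigned limit)
{
    s->count = initial;
    s->limit = limit;
    s->head = NULL;
    s->tail = NULL;
}

/* Returns true if a unit was consumed immediately. Otherwise the caller
 * is appended to the waiter FIFO and false is returned; on target the
 * task would then block in a notification wait. */
static bool model_sem_take(struct model_sem *s, struct model_task *self)
{
    /* enter critical section (portMUX on target) */
    if (s->count > 0) {
        s->count--;
        return true;
    }
    self->notified = false;
    self->next = NULL;
    if (s->tail != NULL) {
        s->tail->next = self;
    } else {
        s->head = self;
    }
    s->tail = self;
    /* exit critical section, then block */
    return false;
}

/* Hands the unit directly to the oldest waiter if there is one;
 * otherwise increments count, saturating at limit. */
static void model_sem_give(struct model_sem *s)
{
    /* enter critical section */
    struct model_task *w = s->head;
    if (w != NULL) {
        s->head = w->next;
        if (s->head == NULL) {
            s->tail = NULL;
        }
        w->next = NULL;
        w->notified = true; /* on target: notify w's handle */
        return;
    }
    if (s->count < s->limit) {
        s->count++;
    }
    /* exit critical section */
}
```

A real implementation would also need a timeout path in which the taker unlinks itself from the waiter list under the lock; that is omitted here.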

Sequencing

One primitive per PR via the established loop (linux suite + S3 + both-IDF CI). The #21 corruption regression harness exists precisely to validate this class of change. k_sem first; evaluate k_event after it lands clean; stop there unless something bites.

Labels: design (Architecture / design-principle work), zephyr-parity (Divergence from upstream Zephyr API/semantics)
