motion: make emcmotError fifo lockfree MPSC (fixes #3951) by grandixximo · Pull Request #3993 · LinuxCNC/linuxcnc

grandixximo · 2026-05-02T11:24:28Z

cc @BsAtHome @hdiethelm (from the discussion in #3948 / #3951).

emc_message_handler(), installed by motmod at the end of rtapi_app_main, can be invoked from any RT thread. emcmotErrorPutfv() and emcmotErrorGet() share plain int counters with no synchronisation: concurrent RT producers race on num / start / end, and the RT producer also races num++ against the userspace consumer's num-- across processes.

rtapi_mutex_get is unsuitable here: it spins with sched_yield, which causes priority inversion when an RT producer waits on the non-RT consumer (the header explicitly says it should not be used in realtime code). rtapi_mutex_try would convert the race into a silent message drop on contention.

This patch replaces the counters with a lockfree multi-producer single-consumer ring using GCC __atomic_* builtins on plain unsigned long long fields, so the layout is identical in the C producer (motmod) and the C++ consumer (milltask) sharing the shmem region.

Producers CAS-claim a sequence number on write_reserve, write the payload to slot S % EMCMOT_ERROR_NUM, then spin until write_commit == S and release-store S + 1. The in-order publish step ensures the consumer never sees a committed slot whose payload from a lower seq is still being written. The consumer compares read_seq to acquire-loaded write_commit, copies the slot, and release-stores read_seq + 1. Buffer-full returns -1, same bound as before.

Properties: RT-safe, no sched_yield, no priority inversion, MPSC-correct, no silent drops outside the buffer-full case, no API change.

Files touched:

- src/emc/motion/motion.h: struct fields replaced (head / tail / start / end / num removed; write_reserve / write_commit / read_seq added).
- src/emc/motion/emcmotutil.c: emcmotErrorInit / emcmotErrorPutfv / emcmotErrorGet rewritten.

This does not change when emc_message_handler is installed (#3948); that symptom is orthogonal and intentionally left for the discussion in #3948 to converge. The pushmsg component proposed by @BsAtHome in #3948 would be a useful test bed once available.

Tested: clean RIP build, no warnings.

BsAtHome · 2026-05-02T11:59:00Z

I fixed the rtapi/rtapi_atomic.h some time ago to actually be useful and compile successfully when I fixed the hal streamer code.
I just never tested the kernel mode side. However, it seems that the RTAI code just compiles fine and it uses rtapi_atomic.h. Therefore, the default wrappers should be available in kernel mode and should be used (just include <rtapi_atomic.h>).

grandixximo · 2026-05-02T12:06:54Z

Thanks @BsAtHome. Switched to <stdatomic.h> via <rtapi_atomic.h>, force-pushed the squash.

BsAtHome · 2026-05-02T12:26:12Z

There does seem to be a problem with atomics in kernel compile mode...
Check the rip-rtai check.

grandixximo · 2026-05-02T12:28:32Z

> There does seem to be a problem with atomics in kernel compile mode... Check the rip-rtai check.

I think it was because of the include order; it has to be last.

Edit:

It was the order, passes now.

BsAtHome · 2026-05-02T12:40:06Z

You resolved the type declaration comment without saying why... So, why did you?

grandixximo · 2026-05-02T12:45:44Z

> You resolved the type declaration comment without saying why... So, why did you?

I must have done it by mistake, wasn't intentional, I was wondering why it was compacted lol
I see now, it says I did mark it as resolved, must have clicked while scrolling.

hdiethelm · 2026-05-02T13:04:23Z

Some quick comments from my side, though I still have to look into it in detail. This atomic stuff is always complicated and can cause delays or deadlocks if not done properly.

An RT thread should never spin on a lock, and even yield() can lock up, because the CPU always runs the highest-priority thread unless priority inheritance is in place; the way it is done now, that does not look like the case.

grandixximo · 2026-05-02T13:10:35Z

Reworked. Dropped the centralized write_commit and the in-order publish spin. Each slot now has its own slot_pub[i] marker; producer release-stores slot_pub[S % N] = S + 1 after writing payload, no producer-producer wait. Consumer at seq R checks slot_pub[R % N] == R + 1. Force-pushed for review.
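
For readers following the redesign, here is a minimal sketch of the per-slot publish marker idea described above. It is not the code from the PR: ring_t, ring_put, ring_get, RING_SLOTS and MSG_LEN are hypothetical names, RING_SLOTS merely stands in for EMCMOT_ERROR_NUM, and the sketch uses plain C11 <stdatomic.h> types rather than the rtapi_atomic.h wrappers and plain unsigned long long fields discussed in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define RING_SLOTS 8u   /* hypothetical; stands in for EMCMOT_ERROR_NUM */
#define MSG_LEN    64u  /* hypothetical payload size */

static_assert(RING_SLOTS > 0, "ring needs at least one slot");

typedef struct {
    atomic_ullong write_reserve;        /* next sequence number to claim */
    atomic_ullong read_seq;             /* next sequence number to consume */
    atomic_ullong slot_pub[RING_SLOTS]; /* S + 1 once the slot for seq S is published */
    char msg[RING_SLOTS][MSG_LEN];
} ring_t;

static void ring_init(ring_t *r)
{
    atomic_init(&r->write_reserve, 0);
    atomic_init(&r->read_seq, 0);
    for (unsigned i = 0; i < RING_SLOTS; i++)
        atomic_init(&r->slot_pub[i], 0);
    memset(r->msg, 0, sizeof(r->msg));
}

/* Producer: returns 0 on success, -1 if the ring is full. */
static int ring_put(ring_t *r, const char *text)
{
    unsigned long long s;
    for (;;) {
        /* Load read_seq first so the reserve value loaded next is never behind it. */
        unsigned long long rd =
            atomic_load_explicit(&r->read_seq, memory_order_acquire);
        s = atomic_load_explicit(&r->write_reserve, memory_order_relaxed);
        if (s - rd >= RING_SLOTS)
            return -1;                  /* buffer full */
        if (atomic_compare_exchange_weak_explicit(&r->write_reserve, &s, s + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            break;                      /* sequence number s is ours */
    }
    char *dst = r->msg[s % RING_SLOTS];
    strncpy(dst, text, MSG_LEN - 1);
    dst[MSG_LEN - 1] = '\0';
    /* Publish this slot only; no waiting on other producers. */
    atomic_store_explicit(&r->slot_pub[s % RING_SLOTS], s + 1,
                          memory_order_release);
    return 0;
}

/* Single consumer: returns 0 and fills out[MSG_LEN], or -1 if nothing is ready. */
static int ring_get(ring_t *r, char *out)
{
    unsigned long long rd =
        atomic_load_explicit(&r->read_seq, memory_order_relaxed);
    unsigned long long pub =
        atomic_load_explicit(&r->slot_pub[rd % RING_SLOTS], memory_order_acquire);
    if (pub != rd + 1)
        return -1;                      /* next slot not published yet */
    memcpy(out, r->msg[rd % RING_SLOTS], MSG_LEN);
    atomic_store_explicit(&r->read_seq, rd + 1, memory_order_release);
    return 0;
}
```

The marker value S + 1 (rather than a boolean) lets the consumer tell a fresh publish apart from a stale one left over from a previous lap around the ring.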

hdiethelm · 2026-05-02T14:14:04Z

There is one thing we can assume for this queue which makes the algorithm simpler:
The queue is not getting too many messages, so the reader can poll until everything has been written and all writers are finished.

So maybe something like this would work, quick and dirty pseudocode:

push(in):
  slot = atomic(write_reserve++) % SIZE    // claim a slot
  data[slot] = in
  atomic(write_commit++)                   // this writer is finished

pop(out):
  w = atomic(write_commit)
  if(w == atomic(write_reserve)){ // no writer is mid-write; I don't know if there is a single atomic to do this
      if(read_ptr < w){
         out = data[read_ptr % SIZE]
         read_ptr++ // No atomic needed, single consumer
      }
  }

grandixximo · 2026-05-02T14:32:17Z

@hdiethelm thanks for the alternative design. Question: in your version the consumer drains only when write_reserve == write_commit, so if any one producer is preempted mid-write, the consumer stalls until that producer resumes and bumps write_commit. Other completed producers' messages also wait, even though their slots are ready. For an ERR-rate queue this should be fine, but I want to confirm: is that consumer stall acceptable to you, or should I keep the per-slot publish marker (current d4c4d786b2) which lets the consumer make progress as soon as the next contiguous slot is published?

hdiethelm · 2026-05-02T14:55:18Z

I think the simpler the code the better as long as it fulfills the purpose.

The consumer doesn't really stall; it only skips the read until the next cycle. As long as there is no message storm, this should be no issue and happen about once a year.

All threads are producers; the consumer is emctaskmain, which consumes only one message per cycle.

It can become an issue if messages stop being printed because, for example, the stepper thread writes a message at 25 kHz. But as long as the ring buffer is not overwritten when it is full, a message will get through again as soon as it is full. That full-buffer check is not in my pseudocode, though.

hdiethelm · 2026-05-02T14:57:03Z

Ah but there is an edge case: If a producer task freezes or crashes mid write, the ring buffer is locked forever.

hdiethelm · 2026-05-02T15:02:33Z

There is a reason why lock-free queue algorithms have papers written about them: they are hard to get right.
https://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf

grandixximo · 2026-05-02T15:04:51Z

Thanks both. Pushed be3497e196: rtapi_atomic_* typedefs added per @BsAtHome, simpler counter design per @hdiethelm (CAS-claim + fetch_add(write_commit), consumer gates on reserve == commit).

grandixximo · 2026-05-02T15:09:54Z

Agreed, lock-free is treacherous. Walked through the race cases on the current design before pushing: producer-producer concurrency, mid-write skip, buffer wraparound, and the memory-ordering on fetch_add(write_commit) release vs consumer acquire. The quiescence gate is what makes it tractable: out-of-order commits are tolerated because the consumer only reads when reserve == commit, at which point all reserved slots are valid in seq order. Happy to dig into any specific scenario you want me to verify.
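
To make the quiescence gate concrete, here is a minimal sketch under stated assumptions; it is not the code pushed in be3497e196. qring_t and the qring_* functions are hypothetical names, the sizes are made up, and the producer is split into claim and commit steps only so that a writer caught mid-write can be simulated. The order of the two loads in the consumer (commit first, then reserve) is this sketch's own choice, made so that equality implies every committed slot is one of the reserved ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define QRING_SLOTS 8u  /* hypothetical ring size */
#define QMSG_LEN    64u /* hypothetical payload size */

static_assert(QRING_SLOTS > 0, "ring needs at least one slot");

typedef struct {
    atomic_ullong write_reserve; /* slots claimed so far */
    atomic_ullong write_commit;  /* slots fully written so far */
    atomic_ullong read_seq;      /* next sequence number to consume */
    char msg[QRING_SLOTS][QMSG_LEN];
} qring_t;

static void qring_init(qring_t *q)
{
    atomic_init(&q->write_reserve, 0);
    atomic_init(&q->write_commit, 0);
    atomic_init(&q->read_seq, 0);
    memset(q->msg, 0, sizeof(q->msg));
}

/* Producer step 1: CAS-claim a sequence number; -1 if the ring is full. */
static int qring_claim(qring_t *q, unsigned long long *seq)
{
    for (;;) {
        unsigned long long rd =
            atomic_load_explicit(&q->read_seq, memory_order_acquire);
        unsigned long long s =
            atomic_load_explicit(&q->write_reserve, memory_order_relaxed);
        if (s - rd >= QRING_SLOTS)
            return -1;
        if (atomic_compare_exchange_weak_explicit(&q->write_reserve, &s, s + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            *seq = s;
            return 0;
        }
    }
}

/* Producer step 2: once the payload is in place, count the write as done. */
static void qring_commit(qring_t *q)
{
    atomic_fetch_add_explicit(&q->write_commit, 1, memory_order_release);
}

static int qring_put(qring_t *q, const char *text)
{
    unsigned long long s;
    if (qring_claim(q, &s) != 0)
        return -1;
    char *dst = q->msg[s % QRING_SLOTS];
    strncpy(dst, text, QMSG_LEN - 1);
    dst[QMSG_LEN - 1] = '\0';
    qring_commit(q);
    return 0;
}

/* Single consumer: reads only while no producer is mid-write. */
static int qring_get(qring_t *q, char *out)
{
    unsigned long long rd =
        atomic_load_explicit(&q->read_seq, memory_order_relaxed);
    /* Commit is loaded before reserve; see the note above the block. */
    unsigned long long c =
        atomic_load_explicit(&q->write_commit, memory_order_acquire);
    unsigned long long w =
        atomic_load_explicit(&q->write_reserve, memory_order_relaxed);
    if (w != c || rd == c)
        return -1;               /* a writer is mid-write, or the ring is empty */
    memcpy(out, q->msg[rd % QRING_SLOTS], QMSG_LEN);
    atomic_store_explicit(&q->read_seq, rd + 1, memory_order_release);
    return 0;
}
```

Calling qring_claim without qring_commit reproduces the trade-off raised earlier in the thread: qring_get returns -1 for every slot, including already completed ones, until the outstanding writer commits.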

hdiethelm · 2026-05-02T15:15:16Z

Thanks, looks good at first glance. Does read_seq need to be atomic? emcmotErrorGet() is not thread-safe anyway, and emcmotErrorPutfv() only reads it.

BsAtHome · 2026-05-02T15:15:27Z

You have again that header ordering problem. Where does that happen? What is it that needs to be included before the rtapi_atomic.h header?

emc_message_handler() can be invoked from any RT thread, making emcmotErrorPutfv() a multi-producer path that races on num/start/end against itself and against emcmotErrorGet() in the userspace task.

rtapi_mutex_get spins with sched_yield, unsafe in RT context (priority inversion vs the non-RT consumer). rtapi_mutex_try would trade the race for silent message drops on contention.

Replace the int counters with a lockfree MPSC ring using C11 <stdatomic.h> primitives via rtapi_atomic.h, matching the convention used by hal_lib.c. Struct fields stay plain unsigned long long for shmem layout compatibility with C++ TUs; access casts to _Atomic at the use site.

- producers CAS-claim a slot on write_reserve, write payload, then publish in seq order via release-store on write_commit;
- consumer compares read_seq against acquire-loaded write_commit.

Buffer-full path returns -1 (same bound as before). No sched_yield in producer, no priority inversion, no silent drops outside the full case.

grandixximo · 2026-05-02T15:27:32Z

@hdiethelm yes, read_seq is single-writer (consumer) but read concurrently by RT producers for the buffer-full check w_reserve - read_seq < N; the producer's acquire-load pairs with the consumer's release-store so the slot copy happens-before reuse. C11 calls non-atomic concurrent access UB even when natural alignment would make it work in practice.

@BsAtHome the conflict is <stdatomic.h> (macro atomic_fetch_add(PTR, VAL)) vs <linux/atomic.h> via <asm-generic/atomic-instrumented.h> (function atomic_fetch_add(int, atomic_t*)). Macro defined first text-substitutes the function declaration. CI chain: motion.h => rtapi_atomic.h => stdatomic.h defines macros, then dbuf.h => rtapi_string.h => rtapi.h => linux/atomic.h tries to declare the functions and breaks. hal_lib.c already orders rtapi.h before rtapi_atomic.h manually; the motion.h change makes that order transitive for consumers.

Heading to sleep, will pick up tomorrow.

hdiethelm · 2026-05-02T19:16:15Z

> @hdiethelm yes, read_seq is single-writer (consumer) but read concurrently by RT producers for the buffer-full check w_reserve - read_seq < N; the producer's acquire-load pairs with the consumer's release-store so the slot copy happens-before reuse. C11 calls non-atomic concurrent access UB even when natural alignment would make it work in practice.

Yes, that's true. It normally works, assuming the CPU can write the value in one operation. But relying on that assumption probably makes it UB from a standards viewpoint.

For testing: Do you have a machine with good real-time behavior? I got mine down to ~1us jitter as long as the GPU does not have a lot to do.

Testing two scenarios would make sense:

- Latency impact of the atomics when you trigger a log message in an RT task. An atomic can trigger a cache line flush and stall the CPU a bit. It should not be an issue, but better safe than sorry. I remember that there was a discussion somewhere about a hal module to trigger logs using signals, but I don't remember where.
- Concurrent producers and a single consumer: as many threads as you have cores, let it run for a while, and check all received messages. Is there a framework for unit tests in linuxcnc?

grandixximo · 2026-05-03T02:20:03Z

Latency A/B on a non-RT N100 (no_isolcpus), 5 min each, latency-histogram +/- --pushmsg:

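| Metric | A base | A stress | Δ A | B base | B stress | Δ B |
|---|---|---|---|---|---|---|
| servo sdev (µs) | 2.1 | 3.6 | +1.5 | 2.0 | 3.5 | +1.5 |
| servo max (µs) | 32.7 | 50.4 | +17.7 | 26.8 | 38.2 | +11.4 |
| base sdev (µs) | 0.5 | 0.7 | +0.2 | 0.5 | 0.9 | +0.4 |
| base max (µs) | 35.8 | 54.3 | +18.5 | 48.0 | 47.5 | -0.5 |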

Same delta on servo sdev. Servo max delta is smaller on B (+11.4) than A (+17.7), so the atomic fifo is at worst neutral and shows tighter servo-max under stress. Box is non-isolated so absolute numbers are inflated; the relevant signal is the delta within each package. No regression.

Concurrency test was added as a separate commit (7a8f52f966): pthread driver under tests/lowlevel/emcmot-error-mpsc/, 8 producers x 100k messages, verifies exactly-once delivery and per-producer order. Run with scripts/runtests -p tests/lowlevel/emcmot-error-mpsc.
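
The consumer-side check for "exactly-once delivery and per-producer order" can be illustrated with a small sketch. This is not the test.c from the PR: the checker_* names and the "p<id>:<seq>" message format are assumptions made for illustration. The idea is that each producer numbers its own messages from 0, so the consumer only needs one counter per producer.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_PRODUCERS 8u /* matches the 8 producers mentioned above */

static_assert(NUM_PRODUCERS > 0, "need at least one producer");

typedef struct {
    unsigned long long next_seq[NUM_PRODUCERS]; /* next expected seq per producer */
} checker_t;

static void checker_init(checker_t *c)
{
    for (unsigned i = 0; i < NUM_PRODUCERS; i++)
        c->next_seq[i] = 0;
}

/*
 * Feed one received message, assumed (hypothetically) to look like "p3:17".
 * Returns 0 if it is exactly the next message expected from that producer,
 * -1 on a malformed message, duplicate, gap, or reordering.
 */
static int checker_accept(checker_t *c, const char *msg)
{
    unsigned id;
    unsigned long long seq;
    if (sscanf(msg, "p%u:%llu", &id, &seq) != 2)
        return -1;
    if (id >= NUM_PRODUCERS)
        return -1;
    if (seq != c->next_seq[id])
        return -1;
    c->next_seq[id]++;
    return 0;
}

/* Returns 1 once every producer has delivered exactly `expected` messages. */
static int checker_complete(const checker_t *c, unsigned long long expected)
{
    for (unsigned i = 0; i < NUM_PRODUCERS; i++)
        if (c->next_seq[i] != expected)
            return 0;
    return 1;
}
```

Because a lost message shows up as a gap and a duplicate as a repeated sequence number, both failures are caught at the first message that deviates, without storing the full history.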

hdiethelm · 2026-05-03T18:57:26Z

Nice, even with a test! :-) Just checked, the test fails on master, so that's good.

Just a tiny thing: tests/lowlevel/emcmot-error-mpsc/test.sh should probably have the executable flag set.

How do you do --pushmsg?

grandixximo · 2026-05-04T00:11:34Z

> How do you do --pushmsg?

I made a modified version of latency-histogram with the component proposed by Bertho in #3948.

- pushmsg.zip: put it in /src/hal/components/pushmsg.comp
- latency-histogram.zip: put it in /scripts

Then rebuild, and you should be able to spam messages with the --pushmsg flag.

> Just a tiny thing: tests/lowlevel/emcmot-error-mpsc/test.sh should probably have the executable flag set.

I believe I fixed it

8 producer pthreads pump 100k messages each; one consumer verifies exactly-once delivery and per-producer sequence ordering. Builds against the in-tree motion sources (emcmotutil.c, dbuf.c, stashf.c). Run with: scripts/runtests -p tests/lowlevel/emcmot-error-mpsc

BsAtHome · 2026-05-04T08:28:08Z

My pushmsg was a quick-and-dirty hack. Your version is pushing messages at the rate of the thread because it is level triggered. That may be very inadequate.

I suggest making a version that can be used generically, with load-time settable messages and edge triggering. Maybe something like:

loadrt pushmsg emsg="bla","more bla","ping pong" wmsg="oops","what about that..." imsg="told you so","telling you again"
addf pushmsg.0
...
# normally done externally...
setp pushmsg.0.info.0 1

dmsg - debug
emsg - error
wmsg - warning
imsg - info

It could, for example, use personality to set the number of messages per level (to limit the loop).
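
The difference between level and edge triggering mentioned above can be sketched in plain C, outside of HAL. edge_slot_t and edge_slot_poll are hypothetical names, and nothing here uses the real HAL or rtapi API; in a real component the returned string would presumably be handed to the message call once per rising edge instead of once per thread period.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *text; /* NULL marks a slot with no configured message */
    bool prev;        /* input level seen on the previous poll */
} edge_slot_t;

/*
 * Call once per thread period with the current input level.
 * Returns the message text on a rising edge (0 -> 1), otherwise NULL,
 * so a pin held high emits one message instead of one per period.
 */
static const char *edge_slot_poll(edge_slot_t *slot, bool level)
{
    bool rising = level && !slot->prev;
    slot->prev = level;
    return (rising && slot->text != NULL) ? slot->text : NULL;
}
```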

grandixximo · 2026-05-04T09:21:48Z

It was only used as a quick and dirty test to see if performance regressed, are you asking for a component to put in tree? Or you want me to run the test again with the more general component, but still keep it out of tree??

BsAtHome · 2026-05-04T09:33:27Z

> It was only used as a quick and dirty test to see if performance regressed, are you asking for a component to put in tree? Or you want me to run the test again with the more general component, but still keep it out of tree??

This pushmsg was just a dirty hack for testing. But it may be interesting to be able to push custom messages. There is one component, message, but that one is rather inflexible afaics.

Need to think a bit about a generic message generator that is flexible. Most constructs are difficult, compute heavy or have problematic interfacing. Here a genuine KISS approach is sought after ;-)

grandixximo · 2026-05-04T10:56:33Z

Got it, will leave it as a future separate PR. The dirty pushmsg stays a local test rig for this PR; the fifo race fix and concurrency test here don't depend on it. We should be good here, right?

We can work on pushmsg over at #4003

@BsAtHome

A KISS RT-to-non-RT message generator. Each rising edge on pushmsg.error-N, pushmsg.warning-N, pushmsg.info-N, or pushmsg.debug-N emits the corresponding string from the load-time emsg/wmsg/imsg/dmsg arrays through rtapi_print_msg(). Slots without a configured message are inert. 16 slots per level, singleton instance. Discussed with @BsAtHome in LinuxCNC#3993; replaces the inflexible existing message HAL component for diagnostic / test-rig use.

grandixximo mentioned this pull request May 2, 2026

motion.c emc_message_handler() probably not thread save #3951

Open

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from d40f329 to f823e40 Compare May 2, 2026 12:06

BsAtHome reviewed May 2, 2026

Comment thread src/emc/motion/emcmotutil.c Outdated

Comment thread src/emc/motion/motion.h Outdated

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch 2 times, most recently from c803c75 to 4d0fe3c Compare May 2, 2026 12:22

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from 4d0fe3c to 10b2d30 Compare May 2, 2026 12:41

BsAtHome reviewed May 2, 2026

Comment thread src/emc/motion/motion.h Outdated

hdiethelm reviewed May 2, 2026

Comment thread src/emc/motion/emcmotutil.c

hdiethelm reviewed May 2, 2026

Comment thread src/emc/motion/emcmotutil.c Outdated

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from 10b2d30 to bbcb60a Compare May 2, 2026 13:07

BsAtHome reviewed May 2, 2026

Comment thread src/emc/motion/motion.h

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from bbcb60a to d4c4d78 Compare May 2, 2026 14:32

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from d4c4d78 to be3497e Compare May 2, 2026 15:03

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from be3497e to 2db72fe Compare May 2, 2026 15:18

BsAtHome reviewed May 3, 2026

Comment thread tests/lowlevel/emcmot-error-mpsc/test.c Outdated

Comment thread tests/lowlevel/emcmot-error-mpsc/test.c Outdated

Comment thread tests/lowlevel/emcmot-error-mpsc/test.sh Outdated

Comment thread tests/lowlevel/emcmot-error-mpsc/test.sh Outdated

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from 7a8f52f to 4330810 Compare May 4, 2026 00:24

grandixximo force-pushed the fix/3951-emcmot-error-mpsc branch from 4330810 to ff3c254 Compare May 4, 2026 00:46

grandixximo mentioned this pull request May 4, 2026

hal: add pushmsg component for RT message generation #4003

Draft
