OpenThread stability issues with esp32c6/h2 and nrf52840 #53

@ivmarkov

EDIT: Items addressed or partially addressed by the updated ESP 802.15.4 driver explicitly marked as such.

This is an umbrella issue for tracking the stability of openthread when operating with the esp-hal 802.15.4 radio. To be extended in the future for the nRF52.

The purpose of this issue is to:

======

Missing incoming UDP packets

The problem is that with MTD devices (we build openthread only in MTD mode for now), the radio enters a sleep mode after a transmission rather than going into RX mode.
As a result, incoming packets might get missed.

  • Proposed solution: enable rx_when_idle = true.
  • Implemented by the updated 802.15.4 driver: Yes

UPDATE: Confirmed partially.

ESP-IDF actually does the same, unconditionally.

UPDATE: Note that this is some sort of "logical" rx_on_when_idle = true operation!
The rx_on_when_idle = true is NEVER set into the PIB register of the radio.
It is just a software flag, which - when set to true - causes the software around the radio to "manually" switch it to RX mode, in the ISR handler itself. There is this NEXT_OPERATION(bool) macro, which is used only within the ISR and which - at the end of the ISR execution - causes the radio either to enter RX mode, or not.
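To make the "logical" flag concrete, here is a minimal sketch of the idea in Rust. It is not the ESP-IDF code and not the esp-radio code; the type and method names (Driver, RadioState, IsrEvent, on_isr) are made up for illustration. The only point it shows is that the flag lives in software and is consulted at the end of the interrupt handler to pick the next radio state.

```rust
/// Radio states relevant to this sketch (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RadioState {
    Idle,
    Rx,
    Tx,
}

/// Interrupt causes handled by the sketch (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsrEvent {
    TxDone,
    RxDone,
}

struct Driver {
    /// Pure software flag: in this sketch it is never written to a radio register.
    rx_when_idle: bool,
    state: RadioState,
}

impl Driver {
    fn new(rx_when_idle: bool) -> Self {
        Self { rx_when_idle, state: RadioState::Idle }
    }

    fn start_tx(&mut self) {
        self.state = RadioState::Tx;
    }

    /// Stand-in for the interrupt handler.
    fn on_isr(&mut self, event: IsrEvent) {
        match event {
            // Event-specific work (reporting TX status, queueing the RX frame)
            // would go here; it is omitted in this sketch.
            IsrEvent::TxDone => {}
            IsrEvent::RxDone => {}
        }
        // Equivalent of the "next operation" step: at the very end of the ISR,
        // software decides whether the radio goes back to RX or stays idle.
        self.state = if self.rx_when_idle {
            RadioState::Rx
        } else {
            RadioState::Idle
        };
    }
}
```

With rx_when_idle = false the radio is left idle after TxDone, which is the window in which incoming frames can be missed.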

More findings:

  • Implemented by the updated 802.15.4 driver: Yes - for timer0; timer1 not urgent as it is only necessary for tx/rx "at that time" operations

The ESP-IDF uses two timers - timer0 and timer1

  • timer1 is used for "receive at" commands, so for now maybe less interesting
  • timer0, however, is set to fire ~200 ms after a TX frame is sent. If it does fire, this is treated as a "TX ACK timeout event", i.e. we have not received the ACK for the transmitted frame. Moreover, while waiting for the ACK frame, the driver enters an "RX_ACK" state

Moreover, auto tx-ack (i.e. automatically sending an ACK for a received frame) works only for frame versions 0b00 and 0b01; for newer frames, the ESP-IDF radio driver "manually" sends an ACK frame:

// auto tx ack only works for the frame with version 0b00 and 0b01
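For reference, the version check behind that comment can be sketched as below. The bit positions come from the IEEE 802.15.4 Frame Control Field (16 bits, little-endian on the wire: Ack Request is bit 5, Frame Version is bits 12-13). The function and enum names are hypothetical, and the input is assumed to be the MAC header starting at the FCF, i.e. without the leading PHY length byte.

```rust
/// Which path would have to produce the ACK for a received frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AckPath {
    /// The frame does not request an ACK.
    NotRequested,
    /// Frame version 0b00 or 0b01: the hardware auto-ACK can handle it.
    Hardware,
    /// Newer frame version: software has to build and send the ACK.
    Software,
}

/// Read the 16-bit Frame Control Field from the start of the MAC header.
/// Returns None if the slice is shorter than two bytes.
fn frame_control(mhr: &[u8]) -> Option<u16> {
    Some(u16::from_le_bytes([*mhr.first()?, *mhr.get(1)?]))
}

/// Frame Version subfield: bits 12-13 of the FCF.
fn frame_version(fcf: u16) -> u8 {
    ((fcf >> 12) & 0b11) as u8
}

/// Ack Request subfield: bit 5 of the FCF.
fn ack_requested(fcf: u16) -> bool {
    fcf & (1 << 5) != 0
}

/// Decide how an ACK for this frame would have to be generated.
fn ack_path(mhr: &[u8]) -> Option<AckPath> {
    let fcf = frame_control(mhr)?;
    if !ack_requested(fcf) {
        return Some(AckPath::NotRequested);
    }
    Some(match frame_version(fcf) {
        0b00 | 0b01 => AckPath::Hardware,
        _ => AckPath::Software,
    })
}
```

For example, a data frame with FCF 0x1021 (version 0b01, Ack Request set) maps to AckPath::Hardware, while 0x2021 (version 0b10) maps to AckPath::Software.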

Even for the "automatic" sending of ACK frames, this is done explicitly by putting the radio driver into the TX_ACK state with ieee802154_set_state(IEEE802154_STATE_TX_ACK); - check if we are doing this.

Ditto for getting ACKs for sent TX frames: the radio driver explicitly sets the state to ieee802154_set_state(IEEE802154_STATE_RX_ACK); - check if we are doing this. This is the moment when timer0 is set to 200ms.
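The TX -> RX_ACK -> (ACK or timeout) flow described above can be sketched as a small state machine. This is only an illustration under assumptions: the names (AckTracker, TxOutcome, on_tx_done, on_timer) are hypothetical, time is passed in as plain microsecond timestamps instead of a hardware timer, and the 200 ms value is simply the figure quoted in this issue.

```rust
/// ~200 ms, the figure quoted in this issue for timer0.
const ACK_TIMEOUT_US: u64 = 200_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Tx,
    /// Waiting for the ACK of the frame we just sent.
    RxAck { deadline_us: u64 },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxOutcome {
    Acked,
    AckTimeout,
}

struct AckTracker {
    state: State,
}

impl AckTracker {
    fn new() -> Self {
        Self { state: State::Idle }
    }

    fn start_tx(&mut self) {
        self.state = State::Tx;
    }

    /// TX finished: enter RX_ACK and arm the timeout.
    fn on_tx_done(&mut self, now_us: u64) {
        self.state = State::RxAck { deadline_us: now_us + ACK_TIMEOUT_US };
    }

    /// An ACK frame arrived; it only counts while we are in RX_ACK.
    fn on_ack_received(&mut self) -> Option<TxOutcome> {
        match self.state {
            State::RxAck { .. } => {
                self.state = State::Idle;
                Some(TxOutcome::Acked)
            }
            _ => None,
        }
    }

    /// Timer tick: report a timeout once the deadline has passed.
    fn on_timer(&mut self, now_us: u64) -> Option<TxOutcome> {
        match self.state {
            State::RxAck { deadline_us } if now_us >= deadline_us => {
                self.state = State::Idle;
                Some(TxOutcome::AckTimeout)
            }
            _ => None,
        }
    }
}
```

If the driver never enters the RX_ACK state explicitly, neither outcome can be reported back to the caller, which is the thing to check for in our driver.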

Sub-proposal
  • REJECTED - see above; all traces of this now removed from openthread

Only enable this AFTER the device has joined the Thread network. Justification here.

While I indeed do observe that the device cannot join the Thread network if rx_when_idle is - at that time - already set to true, I can't explain to myself yet how ESP-IDF does it, given that it unconditionally enables the flag already at the very beginning?

Moreover, just setting the log level to "Debug" in the light_thread rs-matter-embassy example makes the problem disappear (or become less of an issue), as the device then connects after a few retries. But the question of how ESP-IDF does it with always-enabled rx_when_idle remains. So indeed, maybe there is a deeper problem here, related to the ones below.

I suspect we have a deeper issue which is not diagnosed yet and which is exposed while the device is joining the Thread network with rx_when_idle = true?

UPDATE: See above - the ESP-IDF radio driver never sets rx_on_when_idle = true in the PIB register!!

openthread is configured with too few buffers ("NoBufs" error)

  • Implemented by the updated 802.15.4 driver: Yes

This manifests itself with

INFO - [OpenThread] [N] 0MeshForwarder-: Dropping rx lowpan HC frame, error:NoBufs, len:52, src:2e9acaa65349e639, dst:0xffff, sec:no

Confirmed

The solution is to re-compile OpenThread C with this (which is already merged).

Perhaps, going with 128 is a bit too much though?
ESP-IDF uses 65 by default.

openthread cannot reassemble the 802.15.4 RX frames into an IPv6 frame

  • Implemented by the updated 802.15.4 driver: Yes

TBD: Put an openthread log entry of the error.

Confirmed

Also not observed with ESP-IDF.

The solution seems to be to:

  • Enable rx_when_idle = true
  • Increase the queue of the baremetal 802.15.4 driver from 10 to something larger (I use 50 ATM, but that might be too much). Since this is a conf parameter, no changes to esp-radio necessary
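One plausible reading of why the larger queue helps (an assumption, not something verified against the driver): a bounded RX queue that is drained more slowly than a burst of 802.15.4 frames arrives has to drop some of them, and a single dropped fragment is enough to make reassembly of the whole IPv6 packet fail. The sketch below uses a std VecDeque and hypothetical names (RxQueue, push_from_isr, simulate_burst); it does not mirror the actual esp-radio queue implementation.

```rust
use std::collections::VecDeque;

/// Bounded queue between the radio ISR and the consumer (illustrative).
struct RxQueue {
    frames: VecDeque<Vec<u8>>,
    capacity: usize,
    /// Number of frames dropped because the queue was full.
    dropped: usize,
}

impl RxQueue {
    fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Called when a frame arrives; returns false if the frame was dropped.
    fn push_from_isr(&mut self, frame: &[u8]) -> bool {
        if self.frames.len() >= self.capacity {
            self.dropped += 1;
            return false;
        }
        self.frames.push_back(frame.to_vec());
        true
    }

    /// Called by the consumer (e.g. the task feeding openthread).
    fn pop(&mut self) -> Option<Vec<u8>> {
        self.frames.pop_front()
    }
}

/// Push `burst_len` frames with no consumer running in between and
/// return how many of them were dropped.
fn simulate_burst(capacity: usize, burst_len: usize) -> usize {
    let mut queue = RxQueue::new(capacity);
    for i in 0..burst_len {
        queue.push_from_isr(&[i as u8]);
    }
    queue.dropped
}
```

In this model a burst of 14 frames loses 4 of them with a capacity of 10 and none with a capacity of 50; the real threshold depends on how quickly the consumer drains the queue.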

The esp-radio 802.15.4 driver seems to run in low-power mode?

  • Implemented by the updated 802.15.4 driver: Yes (though only some bugfixes were done in the driver); the default (in openthread) was 8 dBm or less, while ESP-IDF was using 20 dBm (max) by default

Confirmed

What I'm observing is that ESP-IDF can connect from quite far, while esp-radio 802.15.4 seems to struggle with distance. I literally have to have the Matter controller at ~ 2-3m from the device, when using the baremetal driver.

In any case, it seems we are too simplistic in esp-radio as to how we set the power which is in dBm. We just set it in the register. ESP-IDF - in contrast - normalizes it to an "index" first.

Lots of re-send retries observed at the Matter UDP packet level.

  • Implemented by the updated 802.15.4 driver: No

The larger the packet, the more difficult to send:

WARN - 
<<SND Re-sending
      => UDP [fdac:bcdf:a3c9:1:cd4b:f2ff:8956:c860]:5540 [SID:72d9,CTR:58c96b1][I|A|R,EID:1,PROTO:1,OP:5,ACTR:82a3c88]
      IM::ReportData

Cause - unknown.

Some speculation:

The esp driver does not "give back" to openthread the TX ACK frame it got

  • Implemented by the updated 802.15.4 driver: Yes

Basically here we should return to openthread the TX ACK frame. Might be the root cause for the re-send problem.

Need to figure out where exactly the received TX ACK frame is stored. Seems to be in the RX queue?! - https://github.com/esp-rs/esp-hal/blob/327071c262afaa0682ffd9ac55e0a747966e5214/esp-radio/src/ieee802154/raw.rs#L185
