[CMSIS-NN] Fix stateful execution and batch-major striding for CMSIS-NN LSTM by veblush · Pull Request #3564 · tensorflow/tflite-micro

veblush · 2026-05-21T18:24:41Z

Problem

The current CMSIS-NN LSTM wrapper uses arm_lstm_unidirectional_s8 and arm_lstm_unidirectional_s16. These CMSIS-NN functions are designed for stateless sequence evaluation: they explicitly wipe the cell state at t=0 and ignore any initial hidden state, returning only the sequence outputs.

This breaks TFLM's streaming and embedded ML workloads that rely on stateful LSTMs, where the CellStateTensor and HiddenStateTensor persist as variable tensors across Invoke() calls.
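The following minimal C sketch contrasts the two behaviors on a toy recurrence. Everything in it is hypothetical. toy_step stands in for one recurrent step and is neither an LSTM cell nor a CMSIS-NN API. eval_stateless zeroes its state at the start of every call, as described above. eval_stateful keeps its state in a caller-owned buffer that plays the role of a TFLM variable tensor.

```c
#include <assert.h>

/* Hypothetical stand-in for one recurrent step (NOT a real LSTM cell):
   new_state = 0.5 * state + x. */
static float toy_step(float state, float x) { return 0.5f * state + x; }

/* Stateless: the state is zeroed at t = 0 on every call, mirroring the
   behavior described for arm_lstm_unidirectional_s8/s16. */
void eval_stateless(const float *x, int steps, float *out) {
  assert(steps >= 0);
  float state = 0.0f; /* wiped on every call */
  for (int t = 0; t < steps; ++t) {
    state = toy_step(state, x[t]);
    out[t] = state;
  }
}

/* Stateful: the state lives in a caller-owned buffer (like a TFLM variable
   tensor), so it carries over from one call to the next. */
void eval_stateful(const float *x, int steps, float *state, float *out) {
  assert(steps >= 0);
  for (int t = 0; t < steps; ++t) {
    *state = toy_step(*state, x[t]);
    out[t] = *state;
  }
}
```

Feeding the stream {1, 2, 3, 4} to eval_stateless in a single call produces 1, 2.5, 4.25, 6.125. Splitting the same stream into two 2-step calls produces 1, 2.5, 3, 5.5 with eval_stateless, because the state is wiped at the start of the second call. eval_stateful reproduces the single-call result.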

Furthermore, CMSIS-NN's internal implementation for batch-major tensors (time_major=false with batch_size > 1) incorrectly advances through memory by time_steps when moving between batches, causing an out-of-bounds read of the contiguous hidden_state buffer.
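The C sketch below illustrates how a batch-major stride applied to the hidden-state buffer can overrun it. It rests on an assumption. The PR text does not show the CMSIS-NN code, so hidden_offset_buggy is a hypothetical model in which the batch index is scaled by time_steps (the batch-major input stride) when locating a batch's hidden state. The assumed layouts are input [batch_size][time_steps][input_size] (see batch_major_offset) and hidden_state [batch_size][hidden_size].

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (b, t, 0) in a batch-major [batch][time][row] buffer. */
size_t batch_major_offset(size_t b, size_t t, size_t time_steps, size_t row) {
  return (b * time_steps + t) * row;
}

/* Correct offset of batch b's vector in a contiguous [batch][hidden] buffer. */
size_t hidden_offset_correct(size_t b, size_t hidden_size) {
  return b * hidden_size;
}

/* HYPOTHETICAL model of the reported bug: the batch index is also scaled by
   time_steps, as if hidden_state had the input's [batch][time][...] layout. */
size_t hidden_offset_buggy(size_t b, size_t time_steps, size_t hidden_size) {
  return b * time_steps * hidden_size;
}

/* Nonzero if reading hidden_size elements starting at offset stays inside a
   buffer of batch_size * hidden_size elements. */
int hidden_read_in_bounds(size_t offset, size_t batch_size,
                          size_t hidden_size) {
  assert(hidden_size > 0);
  return offset + hidden_size <= batch_size * hidden_size;
}
```

With batch_size = 2, time_steps = 3 and hidden_size = 4, hidden_offset_correct(1, 4) is 4, inside the 8-element buffer, while hidden_offset_buggy(1, 3, 4) is 12, past its end. Under the same assumed model, passing batch_size = 1 keeps b at 0, so both offsets are 0. That is consistent with evaluating one batch at a time sidestepping the bug.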

Solution

Fallback to explicit looping: Implemented a manual time/batch loop within CMSIS_NN_EvalInteger8x8_16Lstm and CMSIS_NN_EvalInteger16x8_16Lstm that bypasses the stateless sequence evaluator and instead iteratively calls the single-step CMSIS-NN kernels (arm_nn_lstm_step_s8 and arm_nn_lstm_step_s16).
State Persistence: The fallback loop preserves the contents of the CellStateTensor and HiddenStateTensor across timesteps and across Invoke() calls.
Stride Bug Bypass: For time_major=false, the loop evaluates one batch at a time (passing batch_size=1 to the kernel), so each call reads contiguous memory and CMSIS-NN's batch-striding bug is never triggered.
Future-proofing: Added an #ifdef CMSIS_NN_STATEFUL_LSTM guard. Once ARM merges an upstream fix that supports the optional hidden_state context pointer, this flag will switch the kernel back to the native CMSIS-NN sequence evaluator.
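The explicit loop described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the PR's code. It uses float data instead of int8/int16. toy_step_one_batch is a hypothetical stand-in for arm_nn_lstm_step_s8/arm_nn_lstm_step_s16, whose real signatures are not reproduced here. Input and hidden sizes are assumed equal (n), and the CMSIS_NN_STATEFUL_LSTM branch is omitted. The hidden and cell buffers stand in for HiddenStateTensor and CellStateTensor: they are read and updated in place, so state carries over between timesteps and between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-step kernel for ONE batch row. This is a toy
   recurrence, not a real LSTM cell and not the CMSIS-NN step API.
   It updates cell[] and hidden[] in place and copies hidden[] to out[]. */
static void toy_step_one_batch(const float *x, float *hidden, float *cell,
                               float *out, size_t n) {
  for (size_t j = 0; j < n; ++j) {
    cell[j] = 0.5f * cell[j] + x[j] + 0.25f * hidden[j];
    hidden[j] = 0.5f * cell[j];
    out[j] = hidden[j];
  }
}

/* Explicit batch/time loop. input/output are [batch][time][n] when
   time_major == 0 and [time][batch][n] otherwise; hidden/cell are
   [batch][n] and persist across calls like TFLM variable tensors. */
void eval_lstm_sequence(const float *input, float *output, float *hidden,
                        float *cell, size_t batch_size, size_t time_steps,
                        size_t n, int time_major) {
  assert(n > 0);
  for (size_t b = 0; b < batch_size; ++b) {
    /* Each batch's state is one contiguous row of n elements. */
    float *h = hidden + b * n;
    float *c = cell + b * n;
    for (size_t t = 0; t < time_steps; ++t) {
      size_t off = time_major ? (t * batch_size + b) * n
                              : (b * time_steps + t) * n;
      /* One call per (batch, timestep), i.e. batch_size = 1 per call. */
      toy_step_one_batch(input + off, h, c, output + off, n);
    }
  }
}
```

Because the state buffers are owned by the caller, running a 2-step batch-major sequence in one call, or as two 1-step calls that reuse the same hidden and cell buffers, yields identical outputs. This is the stateful behavior that streaming models need. Iterating batch by batch in the outer loop also means each step call touches only one contiguous row of hidden and cell state.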

This resolves the output mismatch between the TFLM reference kernel and the CMSIS-NN implementation.

BUG=N/A

veblush requested a review from a team as a code owner May 21, 2026 18:24

veblush added the ci:full label (triggers the comprehensive cross-platform test suite) May 21, 2026

veblush mentioned this pull request May 21, 2026 in ARM-software/CMSIS-NN#219, "Fixed LSTM" (open)

Fixed unidirectional_sequence_lstm (d2fd6ae)

veblush force-pushed the cm-lstm branch from fef2465 to d2fd6ae May 21, 2026 22:19


veblush wants to merge 1 commit into tensorflow:main from veblush:cm-lstm
