feat: add SpeechNet (SilentWear) training test #31

Draft
runwangdl wants to merge 7 commits into devel from feat/speechnet-training

@runwangdl
Owner

Summary

  • Add SpeechNet EMG silent speech recognition model (14ch × 700 samples, 9 classes, ~15K params) to training test suite
  • Fix ConvLayer.computeShapes bias shape bug: inputShapes[1][0] → (inputShapes[1][0],); this prevents a graphsurgeon export crash on Conv layers with bias
  • Register SpeechNet in L2 singlebuffer training config (l1=128000, l2=2000000)
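
The bias-shape fix in the second bullet can be illustrated with a minimal sketch. The helpers below are hypothetical stand-ins, not Deeploy's actual `ConvLayer.computeShapes`; they assume `inputShapes[1]` is the weight shape, so its first entry is the number of output channels. The example shapes are also assumed. The sketch shows one way a bare int can break code that expects a shape; the real graphsurgeon crash may differ in detail.

```python
import math

def bias_shape_before(input_shapes):
    # Hypothetical pre-fix behaviour: returns a bare int, not a shape.
    return input_shapes[1][0]

def bias_shape_after(input_shapes):
    # Hypothetical post-fix behaviour: a one-element tuple, i.e. a 1-D shape.
    return (input_shapes[1][0],)

def num_elements(shape):
    # Typical shape consumer: iterates over the dimensions.
    return math.prod(shape)

# Assumed example shapes: input (N, C, L) and weight (C_out, C_in, K).
input_shapes = [(1, 14, 700), (16, 14, 3)]

print(num_elements(bias_shape_after(input_shapes)))  # 16

try:
    num_elements(bias_shape_before(input_shapes))
except TypeError as err:
    # An int is not iterable, so shape-consuming code fails here.
    print("bare int is not a shape:", err)
```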

Test results

Untiled (verified):

[loss 0] computed=2.267950  ref=2.267950  diff=0.000000
[loss 1] computed=2.498553  ref=2.498553  diff=0.000000
[loss 2] computed=2.083153  ref=2.083153  diff=0.000000
[loss 3] computed=1.905963  ref=1.905963  diff=0.000000
Errors: 0 out of 4
BENCH train_cycles=285250543 opt_cycles=429083 weight_sram=61956
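
For readers who want to re-check a log like the one above, here is a small sketch that parses the `[loss i]` lines and recounts mismatches. The helper name `count_loss_errors` and the tolerance `tol=1e-6` are assumptions for illustration; the test harness's real threshold is not stated in this PR.

```python
import re

LOG = """\
[loss 0] computed=2.267950  ref=2.267950  diff=0.000000
[loss 1] computed=2.498553  ref=2.498553  diff=0.000000
[loss 2] computed=2.083153  ref=2.083153  diff=0.000000
[loss 3] computed=1.905963  ref=1.905963  diff=0.000000
"""

# Matches lines of the form "[loss 0] computed=... ref=... diff=...".
LOSS_LINE = re.compile(
    r"\[loss (\d+)\] computed=([\d.]+)\s+ref=([\d.]+)\s+diff=([\d.]+)"
)

def count_loss_errors(log, tol=1e-6):
    """Return (errors, total) where an error is |computed - ref| > tol."""
    errors = 0
    total = 0
    for match in LOSS_LINE.finditer(log):
        computed = float(match.group(2))
        ref = float(match.group(3))
        total += 1
        if abs(computed - ref) > tol:  # tol is an assumed threshold
            errors += 1
    return errors, total

errors, total = count_loss_errors(LOG)
print(f"Errors: {errors} out of {total}")  # Errors: 0 out of 4
```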

Test plan

  • Untiled Siracusa training: PASS (4/4 loss exact match)
  • Tiled Siracusa training (L2 singlebuffer, l1=128000)
  • CI regression on existing models

🤖 Generated with Claude Code

Add SpeechNet EMG silent speech recognition model (14 channels, 700
time samples, 9 classes, ~15K params) to the training test suite.

Changes:
- Add SpeechNet training ONNX artifacts (network.onnx, inputs.npz,
  outputs.npz, optimizer network) exported from Onnx4Deeploy with
  static reshape (no dynamic Shape/Flatten ops).
- Fix ConvLayer.computeShapes bias shape: wrap scalar int in tuple
  to prevent graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in L2 singlebuffer training test config
  (l1=128000, l2=2000000).

Untiled test verified: 4/4 loss diff=0.000000, 285M train cycles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@runwangdl runwangdl marked this pull request as draft May 18, 2026 08:55
runwangdl and others added 4 commits May 18, 2026 12:49
Freeze Block0 (first conv layer with large 14×701 activations) to
avoid tiling issues with its backward pass. Train Block1-4 + FC
(18 trainable parameter tensors, 4 ConvGrad + 4 BatchNormGrad).
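
The count of 18 can be sanity-checked with a short sketch. The tensor layout is an assumption inferred from the reported counts (18 here, 22 for the full model in a later commit): each block is taken to hold a conv weight and bias plus a BatchNorm scale and bias, and the FC layer a weight and bias. All names are hypothetical.

```python
# Assumed layout: 5 blocks, each with 4 trainable tensors, plus an FC layer.
BLOCKS = [f"block{i}" for i in range(5)]
PER_BLOCK = ["conv.weight", "conv.bias", "bn.weight", "bn.bias"]
FC = ["fc.weight", "fc.bias"]

def trainable_tensors(frozen_blocks=()):
    """List trainable tensor names, skipping any frozen blocks."""
    names = [
        f"{block}.{param}"
        for block in BLOCKS
        if block not in frozen_blocks
        for param in PER_BLOCK
    ]
    return names + FC

print(len(trainable_tensors()))                          # 22
print(len(trainable_tensors(frozen_blocks={"block0"})))  # 18
```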

Tiled test verified: 4/4 loss diff < 0.001, 96M train cycles.

Block0 backward tiling hang is tracked separately — the 314 KB
activation tensor requires heavy L1 tiling that triggers a
simulation hang in the ConvGrad/AveragePoolGrad backward path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Im2Col ConvGradX kernel passes ctxtBufferSize computed from
full-op dimensions to the kernel, but the actual L1 allocation is
much smaller. The kernel's co_block auto-tuning reads ctxtBufferSize
to decide how much to write into the im2col buffer — with the
inflated value it writes past L1 bounds, causing a hang.

Switch ConvGradX to the naive (non-im2col) kernel which does not
require a transient im2col buffer. This fixes the SpeechNet tiled
training hang on Block0+Block1 backward pass where the ConvGradX
im2col buffer would need 1.2 MB (full-op) vs 128 KB L1.
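
The failure mode described above can be modelled in a few lines. This is a hypothetical sketch, not the kernel's real auto-tuning code: `pick_co_block`, `BYTES_PER_CO`, and `CO_TOTAL` are invented for illustration, while the 128000-byte L1 budget and the roughly 1.2 MB full-op size come from this PR. It only illustrates why an inflated size leads to out-of-bounds writes; the fix in this commit is to switch kernels, not to change the reported size.

```python
L1_ALLOC = 128_000      # actual L1 budget (l1=128000 in the test config)
FULL_OP = 1_200_000     # approximate full-op im2col size (1.2 MB)
BYTES_PER_CO = 40_000   # hypothetical scratch bytes per output channel
CO_TOTAL = 30           # hypothetical number of output channels

def pick_co_block(ctxt_buffer_size, bytes_per_co, co_total):
    # Hypothetical tuning rule: process as many output channels per
    # step as the reported buffer size appears to allow.
    return max(1, min(co_total, ctxt_buffer_size // bytes_per_co))

def overflows(reported_size):
    """True if the kernel would write past the real L1 allocation."""
    co_block = pick_co_block(reported_size, BYTES_PER_CO, CO_TOTAL)
    bytes_written = co_block * BYTES_PER_CO
    return bytes_written > L1_ALLOC

print(overflows(FULL_OP))   # True: inflated size, writes exceed L1
print(overflows(L1_ALLOC))  # False: size matches the real allocation
```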

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that ConvGradX uses the naive kernel, full SpeechNet training
(all 5 blocks + FC, 22 trainable parameter tensors) passes tiled
simulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@runwangdl runwangdl force-pushed the feat/speechnet-training branch from f034897 to a74bfac Compare May 18, 2026 13:30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@runwangdl runwangdl force-pushed the feat/speechnet-training branch from a74bfac to 95fef65 Compare May 18, 2026 13:30
Step-by-step tutorial covering PyTorch model design, Onnx4Deeploy
export, untiled/tiled Deeploy deployment, tiling pipeline overview,
common pitfalls, and GVSoC trace debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>