feat: add SpeechNet (SilentWear) training test by runwangdl · Pull Request #31 · runwangdl/TrainDeeploy

runwangdl · 2026-05-18T08:55:00Z

Summary

- Add the SpeechNet EMG silent speech recognition model (14 channels × 700 samples, 9 classes, ~15K params) to the training test suite.
- Fix the ConvLayer.computeShapes bias shape bug: inputShapes[1][0] → (inputShapes[1][0],). This prevents a graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in the L2 singlebuffer training config (l1=128000, l2=2000000).
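The bias-shape fix can be illustrated with a minimal sketch. This is not the Deeploy implementation of `ConvLayer.computeShapes`: the function names, the example shapes, and the `num_elements` consumer are all hypothetical. It only assumes that `inputShapes[1]` is the weight shape and that downstream export code treats every shape as a sequence of dimensions.

```python
from typing import Sequence, Tuple


def bias_shape_before(input_shapes: Sequence[Sequence[int]]) -> int:
    # Hypothetical "before" behaviour: returns a bare int (e.g. 16), not a shape.
    return input_shapes[1][0]


def bias_shape_after(input_shapes: Sequence[Sequence[int]]) -> Tuple[int]:
    # Hypothetical "after" behaviour: the same value wrapped in a 1-tuple.
    return (input_shapes[1][0],)


def num_elements(shape: Sequence[int]) -> int:
    # Stand-in for any consumer that iterates over the dimensions of a shape.
    total = 1
    for dim in shape:
        total *= dim
    return total


# Hypothetical input and weight shapes, chosen only for illustration.
shapes = [(1, 14, 700), (16, 14, 3)]

print(num_elements(bias_shape_after(shapes)))

try:
    num_elements(bias_shape_before(shapes))
except TypeError as exc:
    # Iterating over an int raises TypeError, which is the kind of failure
    # a shape consumer would hit when handed a scalar.
    print("scalar shape rejected:", exc)
```

The 1-tuple keeps the bias consistent with every other shape entry, so consumers do not need a special case for one-dimensional tensors.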

Test results

Untiled (verified):

[loss 0] computed=2.267950  ref=2.267950  diff=0.000000
[loss 1] computed=2.498553  ref=2.498553  diff=0.000000
[loss 2] computed=2.083153  ref=2.083153  diff=0.000000
[loss 3] computed=1.905963  ref=1.905963  diff=0.000000
Errors: 0 out of 4
BENCH train_cycles=285250543 opt_cycles=429083 weight_sram=61956
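The pass/fail summary above can be reproduced from the log text with a short script. This is a sketch under stated assumptions: the regular expression is inferred from the four log lines shown here, and the tolerance `tol` is a hypothetical parameter, since the threshold used by the actual test harness is not stated.

```python
import re
from typing import Tuple

# Pattern inferred from the "[loss N] computed=... ref=... diff=..." lines above.
LOSS_LINE = re.compile(
    r"\[loss (\d+)\]\s+computed=(\S+)\s+ref=(\S+)\s+diff=(\S+)"
)


def count_loss_errors(log: str, tol: float = 1e-6) -> Tuple[int, int]:
    """Return (errors, total), counting steps where |computed - ref| > tol."""
    errors = 0
    total = 0
    for match in LOSS_LINE.finditer(log):
        computed = float(match.group(2))
        ref = float(match.group(3))
        total += 1
        if abs(computed - ref) > tol:
            errors += 1
    return errors, total


SAMPLE_LOG = """\
[loss 0] computed=2.267950  ref=2.267950  diff=0.000000
[loss 1] computed=2.498553  ref=2.498553  diff=0.000000
[loss 2] computed=2.083153  ref=2.083153  diff=0.000000
[loss 3] computed=1.905963  ref=1.905963  diff=0.000000
"""

errors, total = count_loss_errors(SAMPLE_LOG)
print(f"Errors: {errors} out of {total}")  # prints "Errors: 0 out of 4"
```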

Test plan

Untiled Siracusa training: PASS (4/4 loss exact match)
Tiled Siracusa training (L2 singlebuffer, l1=128000)
CI regression on existing models

🤖 Generated with Claude Code

Add SpeechNet EMG silent speech recognition model (14 channels, 700 time samples, 9 classes, ~15K params) to the training test suite.

Changes:
- Add SpeechNet training ONNX artifacts (network.onnx, inputs.npz, outputs.npz, optimizer network) exported from Onnx4Deeploy with static reshape (no dynamic Shape/Flatten ops).
- Fix ConvLayer.computeShapes bias shape: wrap scalar int in tuple to prevent graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in L2 singlebuffer training test config (l1=128000, l2=2000000).

Untiled test verified: 4/4 loss diff=0.000000, 285M train cycles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Freeze Block0 (first conv layer with large 14×701 activations) to avoid tiling issues with its backward pass. Train Block1-4 + FC (18 trainable params, 4 ConvGrad + 4 BatchNormGrad).

Tiled test verified: 4/4 loss diff < 0.001, 96M train cycles.

Block0 backward tiling hang is tracked separately: the 314 KB activation tensor requires heavy L1 tiling that triggers a simulation hang in the ConvGrad/AveragePoolGrad backward path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
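The scale of the Block0 tiling problem can be estimated with simple arithmetic. The sketch below is an illustration only: the source gives the 14×701 activation size, the 314 KB figure, and l1=128000, but the number of feature maps (8) and the element size (4 bytes, float32) are assumptions chosen because their product happens to match the quoted tensor size.

```python
import math

L1_BYTES = 128_000        # l1=128000 from the training test config
HEIGHT, WIDTH = 14, 701   # Block0 activation size quoted in the commit message

FEATURE_MAPS = 8          # ASSUMPTION: not stated in the source
BYTES_PER_ELEMENT = 4     # ASSUMPTION: float32 activations

# Size of the activation tensor under the assumptions above.
activation_bytes = FEATURE_MAPS * HEIGHT * WIDTH * BYTES_PER_ELEMENT

# Lower bound on the tile count if L1 held nothing but this one tensor.
min_tiles = math.ceil(activation_bytes / L1_BYTES)

print(activation_bytes)  # 314048
print(min_tiles)         # 3
```

In practice L1 must also hold weights, gradients, and output tiles at the same time, so the real tile count would be higher than this bound.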

The Im2Col ConvGradX kernel is passed a ctxtBufferSize computed from full-op dimensions, but the actual L1 allocation is much smaller. The kernel's co_block auto-tuning reads ctxtBufferSize to decide how much to write into the im2col buffer. With the inflated value it writes past L1 bounds, causing a hang.

Switch ConvGradX to the naive (non-im2col) kernel, which does not require a transient im2col buffer.

This fixes the SpeechNet tiled training hang on the Block0+Block1 backward pass, where the ConvGradX im2col buffer would need 1.2 MB (full-op) vs 128 KB L1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
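The overflow described in this commit message can be sketched numerically. This is a simplified model, not the kernel's code: `co_blocks_that_fit` and `BYTES_PER_CO_BLOCK` are hypothetical, the 1.2 MB and 128 KB figures are read as decimal byte counts, and the whole 128 KB of L1 is used as an upper bound on the real im2col allocation.

```python
def co_blocks_that_fit(ctxt_buffer_size: int, bytes_per_co_block: int) -> int:
    # Hypothetical stand-in for co_block auto-tuning: use as many
    # output-channel blocks as the reported buffer size appears to allow.
    return max(1, ctxt_buffer_size // bytes_per_co_block)


L1_ALLOC_BYTES = 128_000      # upper bound on what is really allocated in L1
FULL_OP_BYTES = 1_200_000     # buffer size computed from full-op dimensions
BYTES_PER_CO_BLOCK = 16_000   # ASSUMPTION: illustrative per-block footprint

# What the kernel believes it may use versus what is safe to use.
blocks_claimed = co_blocks_that_fit(FULL_OP_BYTES, BYTES_PER_CO_BLOCK)
blocks_safe = co_blocks_that_fit(L1_ALLOC_BYTES, BYTES_PER_CO_BLOCK)

bytes_written = blocks_claimed * BYTES_PER_CO_BLOCK
overflow_bytes = max(0, bytes_written - L1_ALLOC_BYTES)

print(blocks_claimed, blocks_safe)  # 75 8
print(overflow_bytes)               # 1072000
```

Under these assumptions the kernel sizes its writes for roughly nine times more buffer than exists. Passing the per-tile allocation instead would be one alternative fix; the commit avoids the transient buffer entirely by using the naive kernel.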

Now that ConvGradX uses the naive kernel, full SpeechNet training (all 5 blocks + FC, 22 trainable params) passes tiled simulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Step-by-step tutorial covering PyTorch model design, Onnx4Deeploy export, untiled/tiled Deeploy deployment, tiling pipeline overview, common pitfalls, and GVSoC trace debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

runwangdl marked this pull request as draft May 18, 2026 08:55

runwangdl and others added 4 commits May 18, 2026 12:49

update(speechnet): full 22-trainable test data (all blocks)

d8a627f


style: fix yapf/isort formatting

3cd9e56

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

runwangdl force-pushed the feat/speechnet-training branch from f034897 to a74bfac Compare May 18, 2026 13:30

style: fix yapf formatting in codeGenerateTraining.py

95fef65

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

runwangdl force-pushed the feat/speechnet-training branch from a74bfac to 95fef65 Compare May 18, 2026 13:30

runwangdl wants to merge 7 commits into devel from feat/speechnet-training
