feat: add SpeechNet (SilentWear) training test #31
Draft
runwangdl wants to merge 7 commits into
Add SpeechNet EMG silent speech recognition model (14 channels, 700 time samples, 9 classes, ~15K params) to the training test suite.

Changes:
- Add SpeechNet training ONNX artifacts (network.onnx, inputs.npz, outputs.npz, optimizer network) exported from Onnx4Deeploy with static reshape (no dynamic Shape/Flatten ops).
- Fix ConvLayer.computeShapes bias shape: wrap scalar int in tuple to prevent graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in L2 singlebuffer training test config (l1=128000, l2=2000000).

Untiled test verified: 4/4 loss diff=0.000000, 285M train cycles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
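The bias-shape fix described above can be illustrated with a minimal sketch. The helper names (`bias_shape_buggy`, `bias_shape_fixed`, `rank`) and the example shapes are hypothetical and are not Deeploy's actual code; the sketch only assumes that downstream export code treats every shape as a sequence, which a bare int is not.

```python
from typing import Sequence, Tuple


def bias_shape_buggy(input_shapes: Sequence[Sequence[int]]) -> int:
    # input_shapes[1] is the weight shape (C_out, C_in, kH, kW);
    # indexing [0] alone yields a bare int, not a shape.
    return input_shapes[1][0]


def bias_shape_fixed(input_shapes: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    # Wrapping the int in a one-element tuple gives a valid rank-1 shape.
    return (input_shapes[1][0],)


def rank(shape) -> int:
    # Hypothetical stand-in for any consumer that expects a sequence.
    return len(shape)


# Illustrative shapes only: input (N, C, H, W) and weight (C_out, C_in, kH, kW).
shapes = [(1, 1, 14, 700), (8, 1, 1, 4)]

print(rank(bias_shape_fixed(shapes)))
try:
    rank(bias_shape_buggy(shapes))
except TypeError as err:
    print("scalar shape rejected:", err)
```

Any code that calls `len()` on, or iterates over, the scalar form fails in the same way, which is consistent with the export crash the commit describes.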
Freeze Block0 (first conv layer with large 14×701 activations) to avoid tiling issues with its backward pass. Train Block1-4 + FC (18 trainable params, 4 ConvGrad + 4 BatchNormGrad).

Tiled test verified: 4/4 loss diff < 0.001, 96M train cycles.

Block0 backward tiling hang is tracked separately: the 314 KB activation tensor requires heavy L1 tiling that triggers a simulation hang in the ConvGrad/AveragePoolGrad backward path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
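A rough size check shows why this activation needs tiling. The commit message gives the 14×701 extent, the 314 KB figure, and the 128000-byte L1 budget; the element size (float32) and the channel count of 8 are assumptions, chosen here only because together they reproduce the quoted size.

```python
import math

BYTES_PER_ELEM = 4        # assumption: float32 activations
CHANNELS = 8              # assumption: not stated in the commit message
HEIGHT, WIDTH = 14, 701   # activation extent quoted in the commit message
L1_BYTES = 128_000        # l1 budget from the test config

# Total size of the activation tensor under the assumptions above.
activation_bytes = CHANNELS * HEIGHT * WIDTH * BYTES_PER_ELEM

# Lower bound on the number of tiles if L1 held nothing but this tensor.
min_tiles = math.ceil(activation_bytes / L1_BYTES)

print(activation_bytes)
print(min_tiles)
```

The tile count is only a lower bound: in practice L1 must also hold weights, gradients, and any transient buffers, so the real tiling is likely finer.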
For the Im2Col ConvGradX kernel, ctxtBufferSize is computed from full-op dimensions and passed to the kernel, but the actual L1 allocation is much smaller. The kernel's co_block auto-tuning reads ctxtBufferSize to decide how much to write into the im2col buffer; with the inflated value it writes past L1 bounds, causing a hang.

Switch ConvGradX to the naive (non-im2col) kernel, which does not require a transient im2col buffer.

This fixes the SpeechNet tiled training hang on the Block0+Block1 backward pass, where the ConvGradX im2col buffer would need 1.2 MB (full-op) vs 128 KB L1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
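The failure mode described above can be modelled with a small sketch. Everything here is hypothetical: `pick_co_block` is not the real kernel's auto-tuner, and the per-channel byte count and channel count are illustrative. Only the 1.2 MB and 128 KB figures come from the commit message.

```python
def pick_co_block(ctxt_buffer_size: int, bytes_per_out_channel: int, c_out: int) -> int:
    """Hypothetical auto-tuner: largest channel block that fits the reported size."""
    return max(1, min(c_out, ctxt_buffer_size // bytes_per_out_channel))


def bytes_written(co_block: int, bytes_per_out_channel: int) -> int:
    """Bytes the kernel would write into the im2col buffer for one block."""
    return co_block * bytes_per_out_channel


FULL_OP_SIZE = 1_200_000        # size derived from full-op dimensions (commit message)
ACTUAL_ALLOCATION = 128_000     # upper bound: the whole L1 budget (commit message)
BYTES_PER_OUT_CHANNEL = 40_000  # illustrative value
C_OUT = 32                      # illustrative value

# Tuning against the inflated size picks a block far larger than L1 can hold.
inflated_block = pick_co_block(FULL_OP_SIZE, BYTES_PER_OUT_CHANNEL, C_OUT)
# Tuning against the real allocation stays in bounds.
bounded_block = pick_co_block(ACTUAL_ALLOCATION, BYTES_PER_OUT_CHANNEL, C_OUT)

overflows = bytes_written(inflated_block, BYTES_PER_OUT_CHANNEL) > ACTUAL_ALLOCATION
fits = bytes_written(bounded_block, BYTES_PER_OUT_CHANNEL) <= ACTUAL_ALLOCATION

print(inflated_block, bounded_block, overflows, fits)
```

The commit does not correct the reported size; it sidesteps the problem by using the naive kernel, which needs no transient buffer at all.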
Now that ConvGradX uses the naive kernel, full SpeechNet training (all 5 blocks + FC, 22 trainable params) passes tiled simulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f034897 to a74bfac
a74bfac to 95fef65
Step-by-step tutorial covering PyTorch model design, Onnx4Deeploy export, untiled/tiled Deeploy deployment, tiling pipeline overview, common pitfalls, and GVSoC trace debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Fix ConvLayer.computeShapes bias shape bug: inputShapes[1][0] → (inputShapes[1][0],). This prevents a graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in the L2 singlebuffer training test config (l1=128000, l2=2000000)

Test results
Untiled (verified):
Test plan
🤖 Generated with Claude Code