The Japanese version of README.md is here.
This repository contains scripts for data collection, preprocessing, feature extraction, and model training for wake word detection.
The following procedure can be used to collect data and train the model.
```bash
python app.py
```

This starts the Web UI, which lets you record and save audio into folders from your browser.
- When recording audio that includes the wake word, ambient noise is fine; in fact, it is preferable.
- Record audio that does not include the wake word.
- It is okay to use separate wake words for each person.
```bash
python split_long_audio.py
```

Running split_long_audio.py automatically splits long audio recordings into 2-second segments, so ambient-sound recordings can be of any length.
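The splitting step can be sketched roughly as follows (a minimal sketch assuming 1-D sample arrays; the actual logic in split_long_audio.py may differ):

```python
import numpy as np

def split_into_chunks(samples: np.ndarray, sr: int, chunk_seconds: float = 2.0):
    """Split a 1-D array of audio samples into fixed-length chunks.

    A trailing segment shorter than chunk_seconds is dropped so that
    every chunk has exactly the same number of samples.
    """
    chunk_len = int(sr * chunk_seconds)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# Example: a 5-second recording at 16 kHz yields two 2-second chunks.
recording = np.zeros(16000 * 5)
chunks = split_into_chunks(recording, sr=16000)
```

Fixed-length chunks matter because the downstream feature arrays must all have the same shape.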
```bash
python extract_features.py
```

Run extract_features.py to extract features from the audio data and save the training data as an easy-to-use .pkl file.
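The packaging step might look like this (a sketch with dummy fixed-shape features; the feature type and .pkl layout used by extract_features.py are assumptions). The key point is that every clip must yield a feature array of the same shape so they stack into one homogeneous array before pickling:

```python
import pickle
import numpy as np

def save_dataset(features, labels, path):
    """Stack per-clip feature arrays and pickle them as one dataset.

    np.stack raises if clips have different shapes -- the same condition
    that produces the "inhomogeneous shape" error noted further below.
    """
    X = np.stack(features)  # shape: (n_clips, frames, n_coeffs)
    y = np.array(labels)
    with open(path, "wb") as f:
        pickle.dump({"X": X, "y": y}, f)

# Example with dummy fixed-shape features (40 frames x 13 coefficients).
feats = [np.random.rand(40, 13) for _ in range(4)]
save_dataset(feats, [0, 1, 0, 1], "dataset.pkl")
with open("dataset.pkl", "rb") as f:
    data = pickle.load(f)
```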
```bash
python prepare_dataset.py
```

Run prepare_dataset.py to split the .pkl file into training data and validation data.
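A train/validation split of this kind can be sketched with plain NumPy (a minimal sketch; the split ratio and shuffling used by prepare_dataset.py are assumptions):

```python
import numpy as np

def train_val_split(X, y, val_ratio=0.2, seed=0):
    """Shuffle samples and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

X = np.random.rand(10, 40, 13)  # 10 dummy feature arrays
y = np.arange(10)
X_train, y_train, X_val, y_val = train_val_split(X, y)
```

Shuffling before splitting keeps the validation set from being biased toward the most recently recorded clips.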
Note: If the following error is displayed when running the script:

```
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (14, 40) + inhomogeneous part.
```

you may not have run split_long_audio.py beforehand. Please run it first, then try again.
```bash
python train_cnn.py
```

train_cnn.py defines a lightweight CNN model and trains it. The output model file is date-stamped, which makes version control easy.
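The date-stamped output path mentioned above might be produced like this (a sketch; the exact timestamp format used by train_cnn.py is an assumption):

```python
from datetime import datetime
from pathlib import Path

def model_output_path(models_dir="models"):
    """Build a date-stamped .pth path, e.g. models/wakeword_cnn_20240101_120000.pth."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(models_dir) / f"wakeword_cnn_{stamp}.pth"

path = model_output_path()
```

Embedding the timestamp in the filename means each training run produces a distinct artifact, so older models are never overwritten.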
```bash
python predict.py --model models/wakeword_cnn_<timestamp>.pth --audio path/to/audio.wav
```

Run predict.py to perform inference on a WAV file with a trained model; it outputs the predicted label and confidence.
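A confidence value like the one predict.py reports is typically obtained by applying softmax to the model's raw outputs; a minimal NumPy sketch (the label names and the exact implementation in predict.py are assumptions):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

LABELS = ["not_wakeword", "wakeword"]  # hypothetical label names

logits = np.array([0.5, 2.5])  # raw model outputs for one clip
probs = softmax(logits)
label = LABELS[int(np.argmax(probs))]
confidence = float(probs.max())
```

The softmax output sums to 1, so the maximum entry can be read directly as a confidence for the predicted label.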
```bash
python app.py
```
Open your browser at http://localhost:5000 to access the UI.
- Model training (モデル学習): Click the "モデル学習" button to start training. Logs will stream live in the UI.
- Inference (推論実行): Select a trained model, upload a WAV or WebM audio file, and click the "推論実行" button to perform inference. The predicted label and confidence will be displayed.