The Japanese version of README.md is here.
This repository contains scripts for data collection, preprocessing, feature extraction, and model training for wake word detection.
The following procedure can be used to collect data and train the model.
```bash
python app.py
```

This starts the Web UI, which lets you record and save audio into folders from your browser.
- When recording audio that includes the wake word, ambient noise is fine; in fact, it is preferable.
- Record audio that does not include the wake word.
- It is okay to use separate wake words for each person.
```bash
python split_long_audio.py
```

Running split_long_audio.py automatically splits long audio recordings into 2-second segments, so ambient-sound recordings can be of any length.
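The splitting step can be sketched roughly as follows (a minimal sketch assuming 1-D sample arrays; the actual logic in split_long_audio.py may differ):

```python
import numpy as np

def split_into_chunks(samples: np.ndarray, sr: int, chunk_seconds: float = 2.0):
    """Split a 1-D array of audio samples into fixed-length chunks.

    A trailing segment shorter than chunk_seconds is dropped so that
    every chunk has exactly the same number of samples.
    """
    chunk_len = int(sr * chunk_seconds)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# Example: a 5-second recording at 16 kHz yields two 2-second chunks.
recording = np.zeros(16000 * 5)
chunks = split_into_chunks(recording, sr=16000)
```

Fixed-length chunks matter because the downstream feature arrays must all have the same shape.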
```bash
python extract_features.py
```

Run extract_features.py to extract features from the audio data and save the training data as an easy-to-use .pkl file.
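The packaging step might look like this (a sketch with dummy fixed-shape features; the feature type and .pkl layout used by extract_features.py are assumptions). The key point is that every clip must yield a feature array of the same shape so they stack into one homogeneous array before pickling:

```python
import pickle
import numpy as np

def save_dataset(features, labels, path):
    """Stack per-clip feature arrays and pickle them as one dataset.

    np.stack raises if clips have different shapes -- the same condition
    that produces the "inhomogeneous shape" error noted further below.
    """
    X = np.stack(features)  # shape: (n_clips, frames, n_coeffs)
    y = np.array(labels)
    with open(path, "wb") as f:
        pickle.dump({"X": X, "y": y}, f)

# Example with dummy fixed-shape features (40 frames x 13 coefficients).
feats = [np.random.rand(40, 13) for _ in range(4)]
save_dataset(feats, [0, 1, 0, 1], "dataset.pkl")
with open("dataset.pkl", "rb") as f:
    data = pickle.load(f)
```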
```bash
python prepare_dataset.py
```

Run prepare_dataset.py to split the .pkl file into training data and validation data.
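A train/validation split of this kind can be sketched with plain NumPy (a minimal sketch; the split ratio and shuffling used by prepare_dataset.py are assumptions):

```python
import numpy as np

def train_val_split(X, y, val_ratio=0.2, seed=0):
    """Shuffle samples and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

X = np.random.rand(10, 40, 13)  # 10 dummy feature arrays
y = np.arange(10)
X_train, y_train, X_val, y_val = train_val_split(X, y)
```

Shuffling before splitting keeps the validation set from being biased toward the most recently recorded clips.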
Note: If the following error is displayed when running the script:

```
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (14, 40) + inhomogeneous part.
```

you may not have run split_long_audio.py beforehand. Please run it first, then try again.
```bash
python train_cnn.py
```

train_cnn.py defines a lightweight CNN model and trains it. The output model file is date-stamped, which makes version control easy.
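The date-stamped output path mentioned above might be produced like this (a sketch; the exact timestamp format used by train_cnn.py is an assumption):

```python
from datetime import datetime
from pathlib import Path

def model_output_path(models_dir="models"):
    """Build a date-stamped .pth path, e.g. models/wakeword_cnn_20240101_120000.pth."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(models_dir) / f"wakeword_cnn_{stamp}.pth"

path = model_output_path()
```

Embedding the timestamp in the filename means each training run produces a distinct artifact, so older models are never overwritten.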
```bash
python predict.py --model models/wakeword_cnn_<timestamp>.pth --audio path/to/audio.wav
```

Run predict.py to perform inference on a WAV file with a trained model; it outputs the predicted label and confidence.
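A confidence value like the one predict.py reports is typically obtained by applying softmax to the model's raw outputs; a minimal NumPy sketch (the label names and the exact implementation in predict.py are assumptions):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

LABELS = ["not_wakeword", "wakeword"]  # hypothetical label names

logits = np.array([0.5, 2.5])  # raw model outputs for one clip
probs = softmax(logits)
label = LABELS[int(np.argmax(probs))]
confidence = float(probs.max())
```

The softmax output sums to 1, so the maximum entry can be read directly as a confidence for the predicted label.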
```bash
python app.py
```
Open your browser at http://localhost:5000 to access the UI.
- Model training (モデル学習): Click the "モデル学習" button to start training. Logs will stream live in the UI.
- Inference (推論実行): Select a trained model, upload a WAV or WebM audio file, and click the "推論実行" button to perform inference. The predicted label and confidence will be displayed.