This repository was archived by the owner on Aug 28, 2024. It is now read-only.

Commit 772c64e

Merge pull request #49 from jeffxtang/wav2vec2
Speech Recognition on iOS with Wav2Vec2

2 parents f9e92f7 + 885bf37

30 files changed: +1080 -7 lines

ImageSegmentation/README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ To Test Run the Image Segmentation iOS App, follow the steps below:

### 1. Prepare the Model

-If you don't have the PyTorch environment set up to run the script below to generate the model file, you can download it to the `ios-demo-app/ImageSegmentation` folder using the link [here](https://drive.google.com/file/d/17KeE6mKo67l14XxTl8a-NbtqwAvduVZG/view?usp=sharing).
+If you don't have the PyTorch environment set up to run the script below to generate the model file, you can download it to the `ios-demo-app/ImageSegmentation` folder using the link [here](https://drive.google.com/file/d/1FHV9tN6-e3EWUgM_K3YvDoRLPBj7NHXO/view?usp=sharing).

Be aware that the downloadable model file was created with PyTorch 1.7.0, matching the iOS LibTorch library 1.7.0 specified in the `Podfile`. If you use a different version of PyTorch to create your model by following the instructions below, make sure you specify the same iOS LibTorch version in the `Podfile` to avoid possible errors caused by the version mismatch. Furthermore, if you want to use the latest prototype features in the PyTorch master branch to create the model, follow the steps at [Building PyTorch iOS Libraries from Source](https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source) on how to use the model in iOS.

ObjectDetection/README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ To Test Run the Object Detection iOS App, follow the steps below:

### 1. Prepare the Model

-If you don't have the PyTorch environment set up to run the script, you can download the model file [here](https://drive.google.com/file/d/15FFbi1ajWh02Dqc4W4HdGI45Th3VhbkR/view?usp=sharing) to the `ios-demo-app/ObjectDetection/ObjectDetection` folder, then skip the rest of this step and go to step 2 directly.
+If you don't have the PyTorch environment set up to run the script, you can download the model file [here](https://drive.google.com/file/d/1QOxNfpy_j_1KbuhN8INw2AgAC82nEw0F/view?usp=sharing) to the `ios-demo-app/ObjectDetection/ObjectDetection` folder, then skip the rest of this step and go to step 2 directly.

Be aware that the downloadable model file was created with PyTorch 1.7.0, matching the iOS LibTorch library 1.7.0 specified in the `Podfile`. If you use a different version of PyTorch to create your model by following the instructions below, make sure you specify the same iOS LibTorch version in the `Podfile` to avoid possible errors caused by the version mismatch. Furthermore, if you want to use the latest prototype features in the PyTorch master branch to create the model, follow the steps at [Building PyTorch iOS Libraries from Source](https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source) on how to use the model in iOS.

README.md

Lines changed: 7 additions & 0 deletions
@@ -30,6 +30,13 @@ The [PyTorch demo app](https://github.com/pytorch/ios-demo-app/tree/master/PyTor

[Vision Transformer](https://github.com/pytorch/ios-demo-app/tree/master/ViT4MNIST) demonstrates how to use Facebook's latest Vision Transformer [DeiT](https://github.com/facebookresearch/deit) model to do image classification, and how to convert another Vision Transformer model and use it in an iOS app to perform handwritten digit recognition.

+### Speech Recognition
+
+[Speech Recognition](https://github.com/pytorch/ios-demo-app/tree/master/SpeechRecognition) demonstrates how to convert Facebook AI's wav2vec 2.0, one of the leading models in speech recognition, to TorchScript, and how to use the scripted model in an iOS app to perform speech recognition.
+
+### Video Classification
+
+[TorchVideo](https://github.com/pytorch/ios-demo-app/tree/master/TorchVideo) demonstrates how to use a pre-trained video classification model from the newly released [PyTorchVideo](https://github.com/facebookresearch/pytorchvideo) library on iOS, with classification results updated every second while the video plays, on bundled test videos, videos from the Photos library, or even real-time video.

## LICENSE

SpeechRecognition/Podfile

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Uncomment the next line to define a global platform for your project
+# platform :ios, '9.0'
+
+target 'SpeechRecognition' do
+  # Comment the next line if you don't want to use dynamic frameworks
+  use_frameworks!
+
+  # Pods for SpeechRecognition
+  pod 'LibTorch', '~>1.8.0'
+end

SpeechRecognition/README.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
# Speech Recognition on iOS with Wav2Vec2

## Introduction

Facebook AI's [wav2vec 2.0](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) is one of the leading models in speech recognition. It is also available in the [Huggingface Transformers](https://github.com/huggingface/transformers) library, which likewise powers another PyTorch iOS demo app, [Question Answering](https://github.com/pytorch/ios-demo-app/tree/master/QuestionAnswering).

In this demo app, we'll show how to quantize, trace, and optimize the wav2vec2 model for mobile, and how to use the converted model in an iOS app to perform speech recognition.
## Prerequisites

* PyTorch 1.8.0/1.8.1 (Optional)
* Python 3.8 or above (Optional)
* iOS PyTorch pod library 1.8.0
* Xcode 12 or later
## Quick Start

### 1. Prepare the Model

First, run the following commands in a Terminal:

```
git clone https://github.com/pytorch/ios-demo-app
cd ios-demo-app/SpeechRecognition
```

If you don't have PyTorch 1.8.1 installed, or just want to quickly try the demo app, you can download the quantized scripted wav2vec2 model file [here](https://drive.google.com/file/d/1RcCy3K3gDVN2Nun5IIdDbpIDbrKD-XVw/view?usp=sharing), drag and drop it into the project, and continue to Step 2.
Be aware that the downloadable model file was created with PyTorch 1.8, matching the iOS LibTorch library 1.8.0 specified in the `Podfile`. If you use a different version of PyTorch to create your model by following the instructions below, make sure you specify the same iOS LibTorch version in the `Podfile` to avoid possible errors caused by the version mismatch. Furthermore, if you want to use the latest prototype features in the PyTorch master branch to create the model, follow the steps at [Building PyTorch iOS Libraries from Source](https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source) on how to use the model in iOS.
With PyTorch 1.8.1 installed, first install the `soundfile` package by running `pip install pysoundfile`, then install Huggingface `transformers` by running `pip install transformers` (version 4.4.2 has been tested). Finally, run `python create_wav2vec2.py`, which creates the model file `wav2vec2.pt` in the project folder. [Dynamic quantization](https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html) is used to reduce the model size.
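The script itself is not reproduced in this diff, but the flow it describes (load, dynamically quantize, trace with a sample clip, optimize for mobile) can be sketched roughly as follows. This is a minimal sketch, not the repo's actual code: the `facebook/wav2vec2-base-960h` checkpoint, the `SpeechRecognizer` wrapper, and the 16 kHz mono clip are all our assumptions here.

```python
# Hedged sketch of a create_wav2vec2.py-style script; the actual script in the
# repo may differ. Assumes a ~6-second, 16 kHz mono sample clip.
import soundfile as sf
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from transformers import Wav2Vec2ForCTC


class SpeechRecognizer(torch.nn.Module):
    """Wrapper so tracing sees a plain tensor-in, tensor-out forward."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, waveform):
        # (batch, samples) float32 waveform in; (batch, frames, vocab) logits out
        return self.model(waveform).logits


model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").eval()

# Dynamic quantization: Linear-layer weights are stored as int8 and activations
# are quantized on the fly at inference, shrinking the saved model considerably.
quantized = torch.quantization.quantize_dynamic(
    SpeechRecognizer(model), {torch.nn.Linear}, dtype=torch.qint8)

# Trace with the ~6-second sample clip, optimize the graph for mobile, and save.
audio, _rate = sf.read("scent_of_a_woman_future.wav")
example = torch.tensor([audio], dtype=torch.float32)
traced = torch.jit.trace(quantized, example)
optimize_for_mobile(traced).save("wav2vec2.pt")
```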
Note that the sample `scent_of_a_woman_future.wav` file used to trace the model is about 6 seconds long, so 6 seconds is the limit on recorded audio for speech recognition in the demo app. If your speech is shorter than 6 seconds, the iOS code applies padding to make the model work correctly.
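Concretely, because the model was traced with a fixed-length input, a shorter recording has to be zero-padded out to the traced length before inference. The app does this in its Swift code; the following is only an illustrative Python sketch of the same idea, assuming the model's 16 kHz sampling rate (the helper name is hypothetical):

```python
import torch

def pad_to_six_seconds(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Zero-pad a 1-D waveform so it is exactly 6 seconds long."""
    target_len = 6 * sample_rate  # 96000 samples at 16 kHz (assumed rate)
    shortfall = target_len - waveform.numel()
    if shortfall > 0:
        waveform = torch.nn.functional.pad(waveform, (0, shortfall))
    return waveform
```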
### 2. Use LibTorch

Run the commands below:

```
cd SpeechRecognition
pod install
open SpeechRecognition.xcworkspace/
```
### 3. Build and run with Xcode

After the app runs, tap the Start button and start saying something; after 6 seconds, the model runs inference to recognize your speech. Only basic decoding of the recognition result, from an array of floating-point logits to a list of tokens, is provided in this demo app, but even without further post-processing it is easy to see whether the model can recognize your utterances.
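That basic decoding amounts to a greedy argmax over the per-frame logits followed by CTC-style collapsing. The app implements it in Swift; below is only a Python sketch of the equivalent logic, assuming the standard wav2vec2 character vocabulary, in which `<pad>` doubles as the CTC blank and `|` marks word boundaries.

```python
import torch

def greedy_decode(logits: torch.Tensor, vocab: list) -> str:
    """logits: (frames, vocab_size) tensor emitted by the model."""
    ids = torch.argmax(logits, dim=-1).tolist()
    chars, prev = [], None
    for i in ids:
        # Collapse repeated frames, then drop the blank token.
        if i != prev and vocab[i] != "<pad>":
            chars.append(vocab[i])
        prev = i
    return "".join(chars).replace("|", " ")
```

Some example results are as follows: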
![](screenshot1.png)
![](screenshot2.png)
![](screenshot3.png)
