- Author: Jessica Hwang
- Course: CS 5001 & 5003 - Intensive Foundations of Computer Science
- Program: Align MS in CS, Khoury College of Computer Sciences at Northeastern University
- Instructors: Dr. Albert Lionelle, Dr. Mark Miller
- Semester: Spring 2023
This is a Python 3 program that transforms an audio recording of a grocery list into a simple text version of that list. The input is a .wav audio file. The output is a .txt file.
- Learn a new Python library
- Begin exploring speech recognition and ML
- Use programming concepts learned in the course, such as:
- Modular design
- Divide-and-conquer
- Defensive programming: error handling
- OOP basics: abstraction, encapsulation
- Functions, dictionaries, classes
- speech_recognition library
- OS library
- Google Web Speech API
- Visual Studio Code
Run the program from app.py, which transcribes the sample file grocery_final.wav and writes the output to "New List.txt". Unless the code is changed, every run overwrites the existing content of "New List.txt".
The execution of this app depends on an Internet connection and the availability of the Google Web Speech API. As of April 2023, Google does not require any authentication with an API key or a username/password combination.
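The API call described above can be sketched as follows. This is a minimal illustration and not the code in app.py: the function name `transcribe_wav` and its error messages are hypothetical. It assumes the speech_recognition package is installed and uses its `Recognizer`, `AudioFile`, and `recognize_google` calls without an API key.

```python
import os


def transcribe_wav(path):
    """Return the transcription of a .wav file, or None if recognition fails."""
    # Defensive checks: reject bad input before touching the network.
    if not path.lower().endswith(".wav"):
        raise ValueError("audio file must be in .wav format")
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported here so the checks above run even if the library is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        # Sends the audio to the Google Web Speech API; needs Internet access.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("The speech could not be understood.")
    except sr.RequestError as error:
        print(f"The Google Web Speech API could not be reached: {error}")
    return None
```

Returning None on failure is one possible design; a caller could also let the exceptions propagate and handle them in main().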
- Audio files must be in .wav format. Replace "grocery_final.wav" with the name of your new file in main() of app.py.
- Audio must contain the following keywords for the program to run properly:
- "start" indicates the start of the list. All audio before this word will be disregarded in the final output.
- "stop" indicates the end of the list. All audio after this word will be disregarded in the final output.
- "comma" separates each list item from another
- Every list item in the audio MUST contain the following details, in this order:
- quantity, such as "one"
- unit, such as "package of". Including a unit is optional, but every unit must be followed by "of".
- item name, such as "chicken legs" or "apple"
- The program does not have language capabilities beyond English.
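To illustrate how the keywords and the quantity/unit/item order could drive parsing, here is a rough sketch. It is not the logic in shopping.py; the names `extract_items` and `split_item` are hypothetical, and it assumes the transcription arrives as one string of space-separated words.

```python
def split_item(words):
    """Split one list item into (quantity, unit, name)."""
    quantity = words[0]
    rest = words[1:]
    if "of" in rest:
        # Everything before "of" is the unit, everything after is the name.
        split = rest.index("of")
        unit = " ".join(rest[:split])
        name = " ".join(rest[split + 1:])
    else:
        # The unit is optional, so the remainder is the item name.
        unit, name = "", " ".join(rest)
    return quantity, unit, name


def extract_items(transcript):
    """Return the items spoken between "start" and "stop"."""
    words = transcript.lower().split()
    try:
        begin = words.index("start") + 1
        end = words.index("stop", begin)
    except ValueError:
        raise ValueError('transcript needs "start" followed by "stop"') from None

    chunks, current = [], []
    for word in words[begin:end]:
        if word == "comma":
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return [split_item(chunk) for chunk in chunks]
```

For example, `extract_items("okay start one package of chicken legs comma two apple stop thanks")` returns `[("one", "package", "chicken legs"), ("two", "", "apple")]`.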
The main code is divided into:
- app.py: contains the function that calls the API, and main().
- shopping.py: contains helper functions that transform the raw transcription into text, and then load text into a file
- shopping_classes.py: contains ListItem and ShoppingList classes, and their individual attributes.
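As a rough picture of how two such classes might fit together, consider the sketch below. The attribute and method names (`quantity`, `unit`, `name`, `add`, `save`) are assumptions made for illustration and may differ from those in shopping_classes.py.

```python
class ListItem:
    """One grocery entry: a quantity, an optional unit, and an item name."""

    def __init__(self, quantity, name, unit=""):
        self.quantity = quantity
        self.unit = unit
        self.name = name

    def __str__(self):
        parts = [str(self.quantity)]
        if self.unit:
            parts.append(f"{self.unit} of")
        parts.append(self.name)
        return " ".join(parts)


class ShoppingList:
    """A collection of ListItem objects that can be written to a text file."""

    def __init__(self):
        self._items = []  # kept private; callers go through add()

    def add(self, item):
        self._items.append(item)

    def save(self, path):
        # Mode "w" truncates the file, so each run replaces the previous list.
        with open(path, "w", encoding="utf-8") as handle:
            for item in self._items:
                handle.write(f"{item}\n")
```

Opening the file in "w" mode is what makes each run overwrite the existing output rather than append to it.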
Three data dictionaries are used to correct misspellings in the transcription returned by speech_recognition and the Google Web Speech API.
- item.dat, which corrects commonly misspelled grocery item names
- unit.dat, which corrects commonly misspelled unit names, such as package, box, carton, etc.
- quantity.dat, which converts commonly misspelled number words into their integer form
These data dictionaries are sufficient for transcribing grocery_final.wav and most common American grocery items. They are meant to be expanded whenever future audio recordings introduce item names, quantities, or units that are transcribed incorrectly.
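A dictionary-based correction step could look like the sketch below. The layout of the .dat files is not described here, so this example assumes one `heard: corrected` pair per line; the helper names `load_corrections` and `correct` are hypothetical.

```python
def load_corrections(lines):
    """Build a lookup table from lines shaped like "heard: corrected".

    `lines` can be any iterable of strings, including an open file.
    The "heard: corrected" layout is an assumption, not the real .dat format.
    """
    corrections = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        heard, corrected = line.split(":", 1)
        corrections[heard.strip().lower()] = corrected.strip()
    return corrections


def correct(word, corrections):
    """Return the corrected form of a word, or the word itself if unknown."""
    return corrections.get(word.lower(), word)
```

With `quantities = load_corrections(["one: 1", "won: 1", "to: 2", "too: 2"])`, calling `correct("won", quantities)` returns `"1"`, while a word that is not in the dictionary is returned unchanged. Expanding a dictionary then only requires adding a line to the file.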
All .py files used for test code are named "test_" followed by the name of the file they test. The .dat and .txt files prefixed with "test_" are used by these .py files to run tests.
- CS5001 Lectures, Khoury College of Computer Sciences at Northeastern University
- Amos, David. "The Ultimate Guide To Speech Recognition With Python", accessed April 15, 2023.