VoCopilot

Using Large Language Models for the Voice Activated Tracking of Everyday Interactions

Prior Publication

Poster: VoCopilot: Enabling Voice-Activated Tracking for Everyday Interactions

Authors:

  • Goh Sheen An
  • Ambuj Varshney

Publication Details:

Getting Started

This repository contains the code for both the embedded device and the backend needed to run the end-to-end VoCopilot system.

Embedded Device

  1. To get started with the embedded device, train a TinyML keyword-spotting (KWS) model using Edge Impulse and deploy it to the device.

    • For an example of a trained Edge Impulse project, refer to []
    • Remember to run the .sh script to deploy the TinyML model onto the Nicla Voice.
  2. Ensure the following prerequisites are met before running step 3.

  3. After the firmware and model have been deployed to the Nicla Voice, upload the sketch in ./embedded_device/nicla_voice/record_to_sd.ino to the board using the Arduino IDE. A snippet for sanity-checking the deployment follows this list.
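
To confirm the board is running after flashing, one option is to watch its serial output from your computer. The snippet below is a minimal sketch, not part of this repository; it assumes pyserial is installed, and the port name and baud rate are placeholders you should adjust to your machine and to whatever record_to_sd.ino actually prints.

```python
# Minimal sketch (not part of this repo): print the Nicla Voice serial output
# to confirm the firmware and model are running after flashing.
# Assumes `pip install pyserial`; PORT and BAUD are placeholder values.
import serial

PORT = "/dev/ttyACM0"  # hypothetical port; e.g. /dev/cu.usbmodem* on macOS
BAUD = 115200          # assumed rate; match the one set in record_to_sd.ino

with serial.Serial(PORT, BAUD, timeout=5) as ser:
    while True:
        line = ser.readline().decode(errors="replace").strip()
        if line:
            print(line)  # e.g. keyword-spotting hits or SD-card write logs
```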

Backend

  1. cd into the backend folder.
  2. Create a .env file with parameters similar to those in .env.example.
  3. Start the pipenv shell with pipenv shell (make sure you have pipenv installed).
  4. Install the dependencies with pipenv install.
  5. Ensure ffmpeg is installed (e.g. with brew install ffmpeg on macOS). If you hit errors with whisper or ffmpeg, try running brew reinstall tesseract.
  6. Install Llama 2 via ollama (e.g. ollama pull llama2).
  7. Start the application via python3 app/main.py.
  8. Drop a .wav or .g722 file into WATCH_FILES_PATH and let the server pick up the file, transcribe it, and summarize it; a sketch of this watch-and-process loop follows the list.
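
For orientation, the following is a minimal sketch of the watch-and-process loop that steps 7-8 describe; it is not the repository's app/main.py. It assumes the openai-whisper, python-dotenv, and requests packages, ffmpeg on the PATH, and a local ollama server (default port 11434) with llama2 pulled.

```python
# Minimal sketch (not the repo's app/main.py): poll WATCH_FILES_PATH for new
# audio files, transcribe them with whisper, then summarize via ollama.
import os
import time

import requests
import whisper
from dotenv import load_dotenv

load_dotenv()  # reads WATCH_FILES_PATH from the .env file
WATCH_DIR = os.environ["WATCH_FILES_PATH"]

model = whisper.load_model("base")  # model size is an arbitrary choice here
seen = set()

while True:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if path in seen or not name.endswith((".wav", ".g722")):
            continue
        seen.add(path)

        transcript = model.transcribe(path)["text"]  # whisper decodes via ffmpeg

        resp = requests.post(
            "http://localhost:11434/api/generate",  # default ollama endpoint
            json={
                "model": "llama2",
                "prompt": f"Summarize this conversation:\n{transcript}",
                "stream": False,
            },
        )
        print(name, "->", resp.json()["response"])
    time.sleep(2)  # simple polling interval
```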

Benchmark

  1. Run the benchmark with python3 app/benchmark.py.
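
The README does not say what app/benchmark.py measures. As one illustration, a timing harness over the transcription step could look like the hypothetical sketch below; the samples/ directory and the choice of whisper model are assumptions, not the repository's actual benchmark.

```python
# Hypothetical timing harness, in the spirit of (but not identical to)
# app/benchmark.py: time whisper transcription over a directory of .wav files.
import glob
import time

import whisper

model = whisper.load_model("base")  # assumed model size

for path in glob.glob("samples/*.wav"):  # "samples/" is a made-up directory
    start = time.perf_counter()
    text = model.transcribe(path)["text"]
    elapsed = time.perf_counter() - start
    print(f"{path}: {elapsed:.2f}s, {len(text.split())} words")
```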