VoCopilot

Using Large Language Models for the Voice Activated Tracking of Everyday Interactions

Prior Publication

Poster: VoCopilot: Enabling Voice-Activated Tracking for Everyday Interactions

Authors:

  • Goh Sheen An
  • Ambuj Varshney

Publication Details:

Getting Started

This repository contains the code for both the embedded device and the backend needed to run the end-to-end VoCopilot system.

Embedded Device

  1. To get started with the embedded device, train a TinyML keyword-spotting (KWS) model using Edge Impulse and deploy it to the device.

    • For an example of a trained Edge Impulse project, refer to []
    • Remember to run the .sh script to deploy the TinyML model onto the Nicla Voice.
  2. Ensure the following prerequisites are met before running step 3.

  3. After the firmware and model have been deployed to the Nicla Voice, upload the sketch in ./embedded_device/nicla_voice/record_to_sd.ino to the board using the Arduino IDE. A snippet for sanity-checking the deployment follows this list.
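
To confirm the board is running after flashing, one option is to watch its serial output from your computer. The snippet below is a minimal sketch, not part of this repository; it assumes pyserial is installed, and the port name and baud rate are placeholders you should adjust to your machine and to whatever record_to_sd.ino actually prints.

```python
# Minimal sketch (not part of this repo): print the Nicla Voice serial output
# to confirm the firmware and model are running after flashing.
# Assumes `pip install pyserial`; PORT and BAUD are placeholder values.
import serial

PORT = "/dev/ttyACM0"  # hypothetical port; e.g. /dev/cu.usbmodem* on macOS
BAUD = 115200          # assumed rate; match the one set in record_to_sd.ino

with serial.Serial(PORT, BAUD, timeout=5) as ser:
    while True:
        line = ser.readline().decode(errors="replace").strip()
        if line:
            print(line)  # e.g. keyword-spotting hits or SD-card write logs
```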

Backend

  1. cd into the backend folder.
  2. Create a .env file with parameters similar to those in .env.example.
  3. Start the pipenv shell with pipenv shell (make sure you have pipenv installed).
  4. Install the dependencies with pipenv install.
  5. Ensure ffmpeg is installed (e.g. with brew install ffmpeg on macOS). If you hit errors with whisper or ffmpeg, try running brew reinstall tesseract.
  6. Install Llama 2 via ollama (e.g. ollama pull llama2).
  7. Start the application via python3 app/main.py.
  8. Drop a .wav or .g722 file into WATCH_FILES_PATH and let the server pick up the file, transcribe it, and summarize it; a sketch of this watch-and-process loop follows the list.
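
For orientation, the following is a minimal sketch of the watch-and-process loop that steps 7-8 describe; it is not the repository's app/main.py. It assumes the openai-whisper, python-dotenv, and requests packages, ffmpeg on the PATH, and a local ollama server (default port 11434) with llama2 pulled.

```python
# Minimal sketch (not the repo's app/main.py): poll WATCH_FILES_PATH for new
# audio files, transcribe them with whisper, then summarize via ollama.
import os
import time

import requests
import whisper
from dotenv import load_dotenv

load_dotenv()  # reads WATCH_FILES_PATH from the .env file
WATCH_DIR = os.environ["WATCH_FILES_PATH"]

model = whisper.load_model("base")  # model size is an arbitrary choice here
seen = set()

while True:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if path in seen or not name.endswith((".wav", ".g722")):
            continue
        seen.add(path)

        transcript = model.transcribe(path)["text"]  # whisper decodes via ffmpeg

        resp = requests.post(
            "http://localhost:11434/api/generate",  # default ollama endpoint
            json={
                "model": "llama2",
                "prompt": f"Summarize this conversation:\n{transcript}",
                "stream": False,
            },
        )
        print(name, "->", resp.json()["response"])
    time.sleep(2)  # simple polling interval
```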

Benchmark

  1. Run the benchmark with python3 app/benchmark.py.
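
The README does not say what app/benchmark.py measures. As one illustration, a timing harness over the transcription step could look like the hypothetical sketch below; the samples/ directory and the choice of whisper model are assumptions, not the repository's actual benchmark.

```python
# Hypothetical timing harness, in the spirit of (but not identical to)
# app/benchmark.py: time whisper transcription over a directory of .wav files.
import glob
import time

import whisper

model = whisper.load_model("base")  # assumed model size

for path in glob.glob("samples/*.wav"):  # "samples/" is a made-up directory
    start = time.perf_counter()
    text = model.transcribe(path)["text"]
    elapsed = time.perf_counter() - start
    print(f"{path}: {elapsed:.2f}s, {len(text.split())} words")
```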