colonelpanic01/conversational_aid: Real-time conversation parser and summarizer. Uses facial recognition and speaker diarization, with more AR functionality to be added!

ConvoFlow - A Real-Time Conversational Aid

Parse conversations in real time, displaying a transcription and summary of what the person in front of the camera is saying beside them.

ar_webapp_stats is a Vite + React project that uses speech recognition and face detection. It uses the Cohere API for name detection and conversation summarization, and face-api.js for face detection and for determining speaker confidence. Create your own Cohere API key and add it to a .env file in the root directory.
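The README doesn't say how face-api.js detections are turned into speaker confidence. One plausible heuristic, sketched here under that assumption, is to watch how much the mouth opens and closes across recent frames: a talking face shows a fluctuating inner-lip gap, a silent one does not. `mouth_opening`, `SpeakingScore`, the window size, and the saturation constant are hypothetical; the landmark indices follow the standard 0-indexed 68-point layout returned by face-api.js's 68-point landmark model. It is written in Python for brevity, and the same arithmetic ports directly to landmark positions in the browser.

```python
from collections import deque
from math import dist
from statistics import pstdev

# 0-indexed positions in the standard 68-point facial landmark layout.
MOUTH_LEFT, MOUTH_RIGHT = 48, 54          # outer mouth corners
INNER_LIP_TOP, INNER_LIP_BOTTOM = 62, 66  # centre of the upper/lower inner lip


def mouth_opening(landmarks):
    """Inner-lip gap divided by mouth width, so the value ignores face size."""
    width = dist(landmarks[MOUTH_LEFT], landmarks[MOUTH_RIGHT])
    if width == 0:
        return 0.0
    return dist(landmarks[INNER_LIP_TOP], landmarks[INNER_LIP_BOTTOM]) / width


class SpeakingScore:
    """Hypothetical per-face tracker: more mouth movement -> higher confidence."""

    def __init__(self, window=15, saturation=0.08):
        self.history = deque(maxlen=window)  # openings from the last `window` frames
        self.saturation = saturation         # std-dev treated as "certainly talking"

    def update(self, landmarks):
        """Add one frame's landmarks and return a confidence in [0, 1]."""
        self.history.append(mouth_opening(landmarks))
        if len(self.history) < 2:
            return 0.0
        return min(pstdev(list(self.history)) / self.saturation, 1.0)
```

Using the spread of the opening rather than its size means a face that simply holds its mouth open (smiling, breathing) does not score as speaking; the trade-off is a short warm-up of a few frames before the score rises.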

Start AR webapp
- cd ar_webapp_stats
- npm install
- npm run dev

In the other implementation, audio_processing (a Flask and FastAPI backend) performs speaker diarization to distinguish between speakers and transcribes speech using Whisper (summarizing key details about the person you're talking to is not implemented yet). The transcription_client sends audio chunks to the backend for processing and shows speaker and transcription details.
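Diarization answers "who spoke when" and Whisper answers "what was said when", so the two outputs have to be joined on time. The README doesn't describe how audio_processing does this; a common approach, sketched here as an assumption, labels each transcript segment with the speaker whose turn overlaps it the most. `Turn`, `Segment`, `overlap`, and `label_segments` are hypothetical names, and all times are in seconds.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization result: who spoke, and when (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """One transcription result: what was said, and when (seconds)."""
    text: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="unknown"):
    """Pair each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled
```

A segment that straddles a speaker change goes wholly to the majority speaker; splitting it would need word-level timestamps from the transcriber.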

Start Audio Processing Server
- cd audio_processing
- pip install -r requirements.txt
- uvicorn main:app --reload --port 8000

Start Transcription Client Frontend (in another terminal)
- cd transcription_client
- npm install
- npm run dev

TO-DO

- Add prompt generation so the user can select a key point about the person they're talking to and get a question or conversational "tips" about something specific speaker_2 mentioned.
- Add a feature to view historical conversations and key points. Store conversation data for each person, and use facial recognition to display info from previous conversations beside them whenever you see them again.
- Build Raspberry Pi Zero 2 AR glasses to display speaker info, and add other functionality (web search, live translation, etc.).
- (Not quite sure if this is ethically sound.) If the speaker permits, use the LinkedIn API to search for the speaker's name as it's detected and display key points from their profile beside their face. Alternatively, do something like the Harvard AR glasses project, where we reverse-image-search the speaker's face and filter by geographic location to display their online persona. It sounds super invasive and icky, but I think it's just something cool to build.
- Restructure the project for a more cohesive execution.
- Fix the latency issue in the Flask backend (Whisper transcription just takes too darn long; React speech recognition, however, is perfect).
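For the historical-conversations item, the recall half could work roughly as sketched here, under stated assumptions: each face is reduced to a descriptor vector (face-api.js computes 128-value descriptors), and a returning face is matched to the nearest stored descriptor by Euclidean distance. `Person`, `PersonMemory`, and the in-memory storage are hypothetical; the 0.6 threshold is face-api.js's default `FaceMatcher` distance threshold, used here only as a starting point.

```python
from dataclasses import dataclass, field
from math import dist

# Assumed cut-off: face-api.js's default FaceMatcher distance threshold.
# Tune it for your camera and lighting.
MATCH_THRESHOLD = 0.6


@dataclass
class Person:
    name: str
    descriptors: list = field(default_factory=list)  # face descriptors seen so far
    notes: list = field(default_factory=list)        # key points from past conversations


class PersonMemory:
    """Hypothetical in-memory store; a real version would persist this data."""

    def __init__(self, threshold=MATCH_THRESHOLD):
        self.threshold = threshold
        self.people = {}  # name -> Person

    def remember(self, name, descriptor, note):
        """Store another descriptor and key point for a (possibly new) person."""
        person = self.people.setdefault(name, Person(name))
        person.descriptors.append(list(descriptor))
        person.notes.append(note)

    def recall(self, descriptor):
        """Return (name, notes) for the nearest known face, or None if nobody is close enough."""
        best, best_distance = None, self.threshold
        for person in self.people.values():
            distance = min(dist(descriptor, known) for known in person.descriptors)
            if distance < best_distance:
                best, best_distance = person, distance
        return (best.name, best.notes) if best is not None else None
```

Keeping several descriptors per person (different angles, lighting) and taking the minimum distance makes recognition more forgiving than storing a single reference image.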
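For the latency item, one common way to make Whisper feel more responsive is to transcribe short, slightly overlapping windows as audio arrives instead of waiting for long recordings. The generator here is a minimal sketch of that windowing step, assuming the backend holds raw mono samples; `overlapping_windows` and its 3 s / 0.5 s defaults are illustrative assumptions, not settings from this repo, and very short windows can cost some accuracy because the model sees less context.

```python
def overlapping_windows(samples, sample_rate, window_s=3.0, overlap_s=0.5):
    """Yield (start_time_s, chunk) windows of `window_s` seconds that overlap by
    `overlap_s`, so a word cut at one boundary appears whole in the next chunk."""
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if window <= 0 or step <= 0:
        raise ValueError("need window_s > overlap_s >= 0")
    start = 0
    while start < len(samples):
        yield start / sample_rate, samples[start:start + window]
        if start + window >= len(samples):
            break  # this chunk already reached the end of the buffer
        start += step
```

At Whisper's 16 kHz input rate, 7 seconds of audio yields windows starting at 0, 2.5, and 5 seconds. Because neighbouring windows share `overlap_s` seconds, text from the shared region can be transcribed twice, so the caller still has to de-duplicate it, for example using segment timestamps.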

Folders and files (45 commits)
- LLM_backend_server
- ar_webapp_stats
- audio_processing
- firbaseDB
- transcription_client
- .DS_Store
- .gitignore
- README.md
