This project uses speech recognition and large language models (LLMs) to transform real-time audio into searchable transcripts, reports, and Q&A experiences.
- Goal A (primary): Build a working pipeline for English audio ingestion → transcription → reports → Q&A.
- Goal B (stretch): Extend the pipeline for Hawaiian audio, including transcription, translation, and bilingual analysis.
- ASR (Automatic Speech Recognition): Converts speech to text (e.g., Whisper).
- ACR (Audio Content Recognition): Broader concept that includes identifying speakers, topics, or content type.
- LLM Q&A / RAG (Retrieval-Augmented Generation): Uses an LLM to answer questions by pulling in relevant transcript passages.
- Evaluation Metrics:
  - WER (Word Error Rate) – measures transcription accuracy in English.
  - CER (Character Error Rate) – important for Hawaiian because of the ʻokina and kahakō diacritics.
  - MT Metrics (COMET/BLEU) – evaluate translation quality.
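Both WER and CER reduce to an edit-distance computation; libraries like `jiwer` provide this, but a minimal pure-Python sketch makes the metrics concrete (the `wer`/`cer` helper names are ours, not project code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (insertion, deletion, and substitution each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length.
    Scoring per character keeps a single dropped diacritic (e.g. a missing
    kahakō) from counting as an entire wrong word."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, dropping the kahakō in "kahakō" is one character error out of six (CER ≈ 0.17), whereas WER would score the whole word as wrong.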
- Data Ingestion & Processing
- Transcription Pipeline
- Transcript Storage & Indexing
- Reporting Features
  - Generate an auto-summary of each run.
  - Produce chapter markers (topics/themes with timestamps).
  - Extract highlighted quotes and keywords.
  - Deliver example reports for at least 2 English podcasts.
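As one concrete reporting step, keyword extraction can start from simple frequency counts over the transcript before any LLM is involved; a minimal sketch, assuming a small illustrative stopword list (the `top_keywords` helper is hypothetical, not project code):

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real report would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "was", "we", "you", "i"}

def top_keywords(transcript: str, n: int = 5) -> list[str]:
    """Return the n most frequent non-stopword tokens as candidate keywords."""
    tokens = re.findall(r"[a-zA-Z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]
```

The same candidate keywords can seed chapter-marker prompts, with an LLM pass refining them into topic labels.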
- Q&A System (Goal A)
  - Implement a retrieval-augmented Q&A prototype:
    - Users ask questions.
    - Relevant transcript chunks are retrieved.
    - The LLM answers, citing transcript passages and timestamps.
  - Demo with at least 3 English podcast episodes.
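The retrieval step above can be sketched with timestamped transcript chunks scored by word overlap with the question; a production system would swap in embedding-based similarity, and the `Chunk` type and scoring here are our assumptions, not project code:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    start: float  # timestamp in seconds, used for citations
    text: str

def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Rank chunks by word overlap with the question and return the top k.
    An embedding retriever would replace this overlap score in production."""
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.text.lower().split())),
                    reverse=True)
    return scored[:k]
```

The retrieved chunks, with their `start` timestamps, are what the LLM prompt would quote and cite in its answer.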
- Evaluation