[FEAT]: Offline Speaker Diarization for Multi-Speaker Environments #479

@Cubix33

📝 Description

Implement a 100% offline speaker diarization pipeline using pyannote.audio to distinguish between multiple voices in a single audio file. This will map transcriptions to specific speakers (e.g., SPEAKER_01, SPEAKER_02), allowing the LLM to differentiate between official First Responders and bystanders/patients.

💡 Rationale

Emergency scenes are chaotic. When a responder interviews a patient, both voices end up in the same .wav file. Currently, the LLM processes this as a single block of text, which creates a high risk of "hallucinating" data by confusing a panicked bystander's guess with a responder's official medical assessment. We need this to ensure factual accuracy while maintaining our strict zero-cloud privacy mandate.

🛠️ Proposed Solution

Implement Voice Activity Detection (VAD) to trim silence, then run pyannote.audio locally to segment the audio and cluster speech by speaker embedding. We will align the Whisper transcript with these speaker turns to produce a "movie script" format, then update the LLM prompt to explicitly filter facts based on the speaker.

  • Logic change in src/ (New module: src/diarization.py for pipeline execution and memory management)
  • Update to requirements.txt (Add pyannote.audio and torchaudio)
  • New prompt for Mistral/Ollama (Instruct LLM to identify the responder and only extract their confirmed facts)
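The alignment step above could look roughly like the sketch below: assign each Whisper segment the diarization speaker whose turn overlaps it most. The data shapes are assumptions, not the final interface — Whisper-style segments carrying `start`/`end`/`text`, and pyannote-style turns carrying `start`/`end`/`speaker`.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose diarization
    turn overlaps it most, producing a "movie script" transcript."""
    script = []
    for seg in segments:
        speaker = "UNKNOWN"  # fallback when no turn overlaps the segment
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_overlap, speaker = ov, turn["speaker"]
        script.append(f"[{speaker}] {seg['text']}")
    return "\n".join(script)

# Illustrative data only (not real pipeline output):
segments = [{"start": 0.0, "end": 2.5, "text": "What happened here?"},
            {"start": 2.7, "end": 5.0, "text": "He just collapsed!"}]
turns = [{"start": 0.0, "end": 2.6, "speaker": "SPEAKER_01"},
         {"start": 2.6, "end": 5.2, "speaker": "SPEAKER_02"}]

print(assign_speakers(segments, turns))
# [SPEAKER_01] What happened here?
# [SPEAKER_02] He just collapsed!
```

A greedy max-overlap assignment keeps the logic simple; if testing shows frequent mid-segment speaker changes, we may need to split segments at turn boundaries instead.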

✅ Acceptance Criteria

How will we know this is finished?

  • Feature works in Docker container.
  • Documentation updated in docs/ (Specifically regarding local model caching for pyannote weights).
  • JSON output validates against the schema.
  • Pipeline successfully runs on a local machine without Out-Of-Memory (OOM) crashes by explicitly unloading the diarization model before Ollama inference.

📌 Additional Context

  • Hardware Constraints: Running Pyannote, Whisper, and Ollama simultaneously will crash most standard edge devices. The implementation must include sequential loading/unloading of models (e.g., del pipeline, torch.cuda.empty_cache()) to manage VRAM effectively.
  • Fallback: If pyannote proves too heavy for the target hardware during testing, we may need to pivot to an app-level "Push-to-Talk" segregation as a lighter alternative.
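The sequential loading/unloading described above could follow a generic "load, run, release" pattern like this sketch. The stage callables are hypothetical placeholders; in practice `load_model` would be, e.g., a pyannote `Pipeline.from_pretrained(...)` call or a Whisper model load, with Ollama inference running last.

```python
import gc

def run_stage(load_model, run, payload):
    """Load one model, apply it to payload, then release it before
    returning, so only one heavyweight model lives in memory at a time."""
    model = load_model()
    try:
        result = run(model, payload)
    finally:
        del model          # drop the reference so gc can reclaim it
        gc.collect()
        try:
            import torch   # optional: only frees GPU cache if torch is installed
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass
    return result

# Hypothetical pipeline order, each stage releasing its model before
# the next loads:
#   turns      = run_stage(load_diarizer, diarize, wav_path)
#   transcript = run_stage(load_whisper, transcribe, wav_path)
#   summary    = query_ollama(aligned_script)   # Ollama holds its own memory
```

Wrapping the release in `finally` ensures the model is freed even if a stage raises, which matters for the OOM acceptance criterion.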
