server: Return speaker information in JSON #3782

Open
alubbe wants to merge 2 commits into ggml-org:master from alubbe:server-speaker-information

Conversation

Contributor

alubbe commented Apr 29, 2026

This PR includes speaker information in the server's JSON response and also makes the WAV audio stereo in diarization mode.

Contributor Author

alubbe commented May 7, 2026

I just watched a talk at an AI conference about 'clankers', so I want to note that this PR was made by a real human with a real issue he's trying to solve. I thought opening a PR would be more useful than opening an issue, because we can discuss whether the PR has merit and also see and discuss the code changes.

For this PR specifically, my issue is that I'm migrating an application from the CLI to one central server, and with the server a) there was no way to get speaker information from the JSON and b) all audio was forced into mono.

Contributor Author

alubbe commented May 13, 2026

For comparison, OpenAI also uses the speaker field for diarization: https://developers.openai.com/api/docs/guides/speech-to-text#speaker-diarization

Labels

None yet

Projects

None yet

1 participant