server: Return speaker information in JSON by alubbe · Pull Request #3782 · ggml-org/whisper.cpp

alubbe · 2026-04-29T11:15:34Z

This PR includes speaker information in the server's JSON response, and also makes the WAV audio stereo in diarization mode.

alubbe · 2026-05-07T07:42:08Z

I just watched a talk at an AI conference about 'clankers', so I wanted to leave a note that this PR was made by a real human with a real issue he's trying to solve. I thought opening a PR would be more useful than an issue, because we can discuss whether this PR has merit and also see and discuss the code changes.

For this PR specifically, my issue is that I'm migrating an application from using the CLI to using one central server, and a) there was no way to get speaker information from the JSON and b) all audio was forced into mono.

alubbe · 2026-05-13T09:29:13Z

For comparison, OpenAI also uses the speaker field for diarization: https://developers.openai.com/api/docs/guides/speech-to-text#speaker-diarization
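To illustrate how a client could consume a per-segment speaker field, here is a minimal sketch. The response shape is an assumption made for illustration only: the `segments` list and the `start`, `end`, `speaker`, and `text` keys are hypothetical and may not match the exact JSON this PR produces, and `group_by_speaker` is a made-up helper name.

```python
import json
from collections import defaultdict

# Hypothetical response body. The "segments" list and its keys are
# assumptions for illustration, not the server's confirmed schema.
SAMPLE_RESPONSE = """
{
  "segments": [
    {"start": 0.0, "end": 2.1, "speaker": "0", "text": " Hello there."},
    {"start": 2.1, "end": 4.0, "speaker": "1", "text": " Hi, how are you?"},
    {"start": 4.0, "end": 5.5, "speaker": "0", "text": " Fine, thanks."}
  ]
}
"""


def group_by_speaker(response_text):
    """Collect segment texts per speaker label, preserving segment order."""
    data = json.loads(response_text)
    grouped = defaultdict(list)
    for segment in data.get("segments", []):
        # Segments without a speaker label are filed under "unknown".
        speaker = segment.get("speaker", "unknown")
        grouped[speaker].append(segment.get("text", "").strip())
    return dict(grouped)


turns = group_by_speaker(SAMPLE_RESPONSE)
for speaker, texts in turns.items():
    print(f"speaker {speaker}: {' '.join(texts)}")
```

Falling back to "unknown" keeps the same client code usable against a response that carries no speaker labels, for example when diarization is not enabled.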

server: Return speaker information in JSON

d2a766a

Merge branch 'master' into server-speaker-information

e342e5b

alubbe wants to merge 2 commits into ggml-org:master from alubbe:server-speaker-information
