SpeakerManager improvements #180
Merged
30 commits
3b7fb76 Vectorized embedding quality calculation (SGD2718)
80aa250 Added SpeakerInitializationMode enum to dictate behavior when initial… (SGD2718)
8978171 Implemented a lot more features for finer control (SGD2718)
aa0f95e Added speaker permanence and merge detection (SGD2718)
82c7239 Merge branch 'FluidInference:main' into main (SGD2718)
ac05a63 Updated Documentation (SGD2718)
a976c9c Improved Concurrency Safety & fixed typos in documentation (SGD2718)
c00d1e9 Fixed capitalization in docstrings (SGD2718)
44d5b55 Fixed issue where merging did not respect permanent speakers in initi… (SGD2718)
f9a48a0 fixed typos (SGD2718)
bacd0b1 Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift (SGD2718)
35abfb2 Merge branch 'main' into main (SGD2718)
545d561 fixed a typo (SGD2718)
4c822e6 Merge branch 'main' into main (SGD2718)
afdb6d8 fixed formatting for a logger warning (SGD2718)
fe1e3c2 Merge remote-tracking branch 'refs/remotes/origin/main' (SGD2718)
189e38b removed extra newline from end of file (SGD2718)
f57b86d I think i fixed the formatting issue (SGD2718)
a492c82 Added Test Cases with Codex (SGD2718)
ba68030 fixed the test that failed (SGD2718)
432ee7c Update Documentation/SpeakerManager.md (SGD2718)
de77a38 Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift (SGD2718)
7204b4d Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift (SGD2718)
e9ea4d2 Update Documentation/SpeakerManager.md (SGD2718)
4b60a44 Update Documentation/SpeakerManager.md (SGD2718)
125c9af Fixed Minor bugs (SGD2718)
da57fc4 Merge remote-tracking branch 'refs/remotes/origin/main' (SGD2718)
4a07829 Update Documentation/SpeakerManager.md (SGD2718)
a6b99fd Fixed Formatting Issues (I think) (SGD2718)
e443a66 Merge remote-tracking branch 'refs/remotes/origin/main' (SGD2718)
@@ -1,3 +1,4 @@

# SpeakerManager API

Tracks and manages speaker identities across audio chunks for streaming diarization.

@@ -73,6 +74,21 @@ let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)
```swift
speakerManager.initializeKnownSpeakers([alice, bob])
```

The database may already contain speakers with the same IDs as the ones being added. Pass `mode` (and optionally `preserveIfPermanent`) to control how those conflicts are resolved:
```swift
let alice = Speaker(id: "alice", name: "Alice", currentEmbedding: aliceEmbedding)
let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)
// Replace any speakers with ID "alice" or "bob" with the new speakers,
// even if the old ones were marked as permanent.
speakerManager.initializeKnownSpeakers([alice, bob], mode: .overwrite, preserveIfPermanent: false)
```

> The `mode` argument dictates how to handle new speakers whose IDs already exist in the database. It is of type `SpeakerInitializationMode` and takes one of four values:
> - `.reset`: reset the speaker database and add the new speakers
> - `.merge`: merge new speakers into existing speakers with matching IDs
> - `.overwrite`: overwrite existing speakers that have the same IDs as the new ones
> - `.skip`: skip new speakers whose IDs match existing ones
>
> The `preserveIfPermanent` argument determines whether existing speakers marked as permanent are preserved (i.e., not overwritten or merged). It is `true` by default.

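The four modes can be illustrated with a small self-contained model. This is only a sketch, not the FluidAudio implementation: `ToySpeaker`, `ToyInitializationMode`, and the free function below are hypothetical stand-ins. How `.merge` combines two records and whether `.reset` keeps permanent speakers are assumptions made for illustration.

```swift
// Hypothetical stand-in for a speaker record (not the library's `Speaker` class).
struct ToySpeaker: Equatable {
    let id: String
    var name: String
    var updateCount: Int = 1
    var isPermanent: Bool = false
}

// Hypothetical mirror of the four documented modes.
enum ToyInitializationMode { case reset, merge, overwrite, skip }

func initializeKnownSpeakers(
    _ newSpeakers: [ToySpeaker],
    in database: inout [String: ToySpeaker],
    mode: ToyInitializationMode,
    preserveIfPermanent: Bool = true
) {
    if mode == .reset {
        // Assumption: a reset keeps permanent speakers only when preserveIfPermanent is true.
        database = database.filter { preserveIfPermanent && $0.value.isPermanent }
    }
    for speaker in newSpeakers {
        guard let existing = database[speaker.id] else {
            database[speaker.id] = speaker   // no ID conflict: just add
            continue
        }
        // A protected permanent speaker is left untouched in every mode.
        if preserveIfPermanent && existing.isPermanent { continue }
        switch mode {
        case .reset, .overwrite:
            database[speaker.id] = speaker
        case .merge:
            // Assumption: merging keeps the existing record and adds the update counts.
            var merged = existing
            merged.updateCount += speaker.updateCount
            database[speaker.id] = merged
        case .skip:
            continue
        }
    }
}
```

In this model, `.overwrite` with the default `preserveIfPermanent: true` replaces a non-permanent "bob" but leaves a permanent "alice" as it was.
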
**Use case:** When you have pre-recorded voice samples of known speakers and want to recognize them by name instead of by numeric ID.

#### upsertSpeaker

@@ -91,16 +107,145 @@ speakerManager.upsertSpeaker(
```swift
    updateCount: 5,        // optional
    createdAt: Date(),     // optional
    updatedAt: Date(),     // optional
    isPermanent: false     // optional
)
```

**Behavior:**
- If the speaker ID exists: updates the existing speaker's data
- If the speaker ID is new: inserts a new speaker
- Maintains ID uniqueness and tracks numeric IDs for auto-increment
- If `isPermanent` is `true`, the inserted or updated speaker becomes permanent. A permanent speaker is not merged or removed unless the caller explicitly overrides this (for example, with `keepIfPermanent: false`).

#### mergeSpeaker
```swift
// merge speaker 1 into "alice"
speakerManager.mergeSpeaker("1", into: "alice")

// merge speaker 2 into speaker 3 under the name "Bob", regardless of whether speaker 2 is permanent
speakerManager.mergeSpeaker("2", into: "3", mergedName: "Bob", stopIfPermanent: false)
```

**Behavior:**
- If the first speaker is permanent and `stopIfPermanent` is `true`, the merge is not performed.
- Otherwise, the first speaker is merged into the destination speaker and removed from the known speaker database.
- If `mergedName` is provided, the destination speaker is renamed. Otherwise, its name is preserved.

> Note: the `mergedName` argument is optional.
> Note: `stopIfPermanent` is `true` by default.

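The merge rules above can be sketched with a toy database. This is not the library's code: `ToySpeaker` and the free function `mergeSpeaker(_:into:in:...)` are hypothetical, and summing the two speakers' durations is an assumption about what "merging" combines.

```swift
// Hypothetical stand-in for a speaker record (not the library's `Speaker` class).
struct ToySpeaker {
    let id: String
    var name: String
    var duration: Float
    var isPermanent: Bool = false
}

/// Merges `sourceID` into `destinationID`. Returns false when nothing was merged.
@discardableResult
func mergeSpeaker(
    _ sourceID: String,
    into destinationID: String,
    in database: inout [String: ToySpeaker],
    mergedName: String? = nil,
    stopIfPermanent: Bool = true
) -> Bool {
    guard sourceID != destinationID,
          let source = database[sourceID],
          var destination = database[destinationID] else { return false }

    // A permanent source speaker blocks the merge unless the caller opts out.
    if stopIfPermanent && source.isPermanent { return false }

    // Assumption: the merged speaker accumulates both speaking durations.
    destination.duration += source.duration
    if let mergedName = mergedName {
        destination.name = mergedName   // rename only when a name is given
    }
    database[destinationID] = destination
    database.removeValue(forKey: sourceID)   // the source disappears from the database
    return true
}
```
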
#### removeSpeaker
Remove a speaker from the database.

```swift
// remove speaker 1
speakerManager.removeSpeaker("1")

// remove "alice" from the known speaker database, even if they are marked as permanent
speakerManager.removeSpeaker("alice", keepIfPermanent: false)
```
> Note: `keepIfPermanent` is `true` by default.

#### removeSpeakersInactive
Remove speakers that have been inactive since a given date or for a given duration.

```swift
// remove speakers that have been inactive since `date`
speakerManager.removeSpeakersInactive(since: date)

// remove speakers that have been inactive for 10 seconds, even if they were marked as permanent
speakerManager.removeSpeakersInactive(for: 10.0, keepIfPermanent: false)
```

> Note: Both versions of the method have an optional `keepIfPermanent` argument that defaults to `true`.

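One way to relate the two variants is to turn a duration into a cutoff date and reuse the date-based rule. The sketch below assumes that "inactive" means the speaker's last update is older than the cutoff. `ToySpeaker`, `removeInactive`, and the injectable `now` parameter are hypothetical names, not part of the library.

```swift
import Foundation

// Hypothetical stand-in for a speaker record (not the library's `Speaker` class).
struct ToySpeaker {
    let id: String
    var updatedAt: Date
    var isPermanent: Bool = false
}

/// Drops speakers whose last update is older than `cutoff`.
func removeInactive(
    from database: inout [String: ToySpeaker],
    since cutoff: Date,
    keepIfPermanent: Bool = true
) {
    database = database.filter {
        let isActive = $0.value.updatedAt >= cutoff
        return isActive || (keepIfPermanent && $0.value.isPermanent)
    }
}

/// Duration-based variant: "inactive for N seconds" becomes "inactive since now - N".
func removeInactive(
    from database: inout [String: ToySpeaker],
    for duration: TimeInterval,
    now: Date = Date(),
    keepIfPermanent: Bool = true
) {
    removeInactive(
        from: &database,
        since: now.addingTimeInterval(-duration),
        keepIfPermanent: keepIfPermanent
    )
}
```
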
#### removeSpeakers
Remove all speakers that match a given predicate.

```swift
// remove all speakers with less than 5 seconds of speaking time
speakerManager.removeSpeakers(
    where: { $0.duration < 5.0 },
    keepIfPermanent: false // also remove permanent speakers (optional)
)

// Alternate syntax (does NOT remove permanent speakers)
speakerManager.removeSpeakers {
    $0.duration < 5.0
}
```

> Note: the predicate should take in a `Speaker` object and return a `Bool`.

#### makeSpeakerPermanent
Mark a speaker as permanent.

```swift
speakerManager.makeSpeakerPermanent("alice") // mark "alice" as permanent
```

#### revokePermanence
Mark a speaker as not permanent.

```swift
speakerManager.revokePermanence(from: "alice") // mark "alice" as not permanent
```

#### resetPermanentFlags
Mark all speakers as not permanent.

```swift
speakerManager.resetPermanentFlags()
```

### Speaker Retrieval

#### findSpeaker
Find the speaker that best matches an embedding vector, along with the cosine distance to that speaker, if a match is found.

```swift
let (id, distance) = speakerManager.findSpeaker(with: embedding)
```
> Note: there is an optional `speakerThreshold` argument to use a threshold other than the default.

#### findMatchingSpeakers
Find all speakers whose distance to an embedding vector is within `speakerThreshold`.

```swift
for speaker in speakerManager.findMatchingSpeakers(with: embedding) {
    print("ID: \(speaker.id), Distance: \(speaker.distance)")
}
```

> Note: there is an optional `speakerThreshold` argument to use a threshold other than the default.

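For readers unfamiliar with the metric, cosine distance is one minus the cosine similarity of two vectors. The sketch below shows threshold-based matching over a plain dictionary of embeddings. The function names are hypothetical, the strict `<` comparison is an assumption, and this is not how the library itself stores or searches speakers.

```swift
/// Cosine distance: 1 - (a · b) / (‖a‖ ‖b‖). 0 means same direction, 2 means opposite.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = (normA * normB).squareRoot()
    // The distance is undefined for a zero vector; treat it as "never matches".
    guard denominator > 0 else { return .infinity }
    return 1 - dot / denominator
}

/// All speakers closer than `threshold`, nearest first.
func matchingSpeakers(
    for embedding: [Float],
    in database: [String: [Float]],
    threshold: Float
) -> [(id: String, distance: Float)] {
    return database
        .map { (id: $0.key, distance: cosineDistance(embedding, $0.value)) }
        .filter { $0.distance < threshold }
        .sorted { $0.distance < $1.distance }
}

/// The single best match, or nil when no speaker is within the threshold.
func bestSpeaker(
    for embedding: [Float],
    in database: [String: [Float]],
    threshold: Float
) -> (id: String, distance: Float)? {
    return matchingSpeakers(for: embedding, in: database, threshold: threshold).first
}
```
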
#### findSpeakers
Find all speakers that satisfy a given predicate.
```swift
// two ways to find all speakers with more than 5.0 s of speaking time
speakerManager.findSpeakers(where: { $0.duration > 5.0 })
speakerManager.findSpeakers { $0.duration > 5.0 }
// Returns an array of IDs corresponding to speakers that meet the predicate.
```

> Note: the predicate should take in a `Speaker` object and return a `Bool`.

#### findMergeablePairs
Find all pairs of speakers that might be the same person. Specifically, find the pairs of speakers whose cosine distance is less than `speakerThreshold`.

Returns a list of pairs of speaker IDs.

```swift
let pairs = speakerManager.findMergeablePairs(
    speakerThreshold: 0.6,        // optional
    excludeIfBothPermanent: true  // optional
)

for pair in pairs {
    print("Merge Speaker \(pair.speakerToMerge) into Speaker \(pair.destination)")
}
```

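A brute-force version of this search compares every pair of speakers once. The sketch below is illustrative only: `ToySpeaker` and `mergeablePairs` are hypothetical, and the rule for choosing the destination of each pair (prefer the permanent speaker, otherwise the earlier one in the list) is an assumption rather than documented behavior.

```swift
// Hypothetical stand-in for a speaker record (not the library's `Speaker` class).
struct ToySpeaker {
    let id: String
    var embedding: [Float]
    var isPermanent: Bool = false
}

/// Cosine distance between two equal-length vectors (infinity for a zero vector).
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(Float(0)) { $0 + $1 * $1 }
    let normB = b.reduce(Float(0)) { $0 + $1 * $1 }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? 1 - dot / denominator : .infinity
}

func mergeablePairs(
    among speakers: [ToySpeaker],
    threshold: Float,
    excludeIfBothPermanent: Bool = true
) -> [(speakerToMerge: String, destination: String)] {
    var pairs: [(speakerToMerge: String, destination: String)] = []
    for i in speakers.indices {
        for j in speakers.indices where j > i {   // visit each unordered pair once
            let a = speakers[i]
            let b = speakers[j]
            if excludeIfBothPermanent && a.isPermanent && b.isPermanent { continue }
            guard cosineDistance(a.embedding, b.embedding) < threshold else { continue }
            // Assumption: a permanent speaker is kept as the destination.
            if b.isPermanent && !a.isPermanent {
                pairs.append((speakerToMerge: a.id, destination: b.id))
            } else {
                pairs.append((speakerToMerge: b.id, destination: a.id))
            }
        }
    }
    return pairs
}
```

The pairwise loop is quadratic in the number of speakers, which is usually acceptable for the small speaker counts typical of a single recording.
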
#### getSpeaker
Get a specific speaker by ID.

@@ -118,6 +263,22 @@ let allSpeakers = speakerManager.getAllSpeakers()
```swift
// Returns: [String: Speaker] - dictionary keyed by speaker ID
```

#### getSpeakerList
Get all speakers in the database as an array (for testing/debugging).
```swift
let allSpeakers = speakerManager.getSpeakerList()
// Returns: [Speaker] - Array of speakers
```

#### hasSpeaker
Check whether the speaker database has a speaker with a given ID.

```swift
if speakerManager.hasSpeaker("alice") {
    print("Alice was found in the database")
}
```

#### speakerCount
Get the total number of tracked speakers.

@@ -140,13 +301,16 @@ Clear all speakers from the database.

```swift
speakerManager.reset()
speakerManager.reset(keepIfPermanent: true) // remove all non-permanent speakers from the database
```

Useful for:
- Starting a new session
- Freeing memory between recordings
- Resetting speaker tracking

## Speaker Enrollment

The `Speaker` class includes a `name` field for speaker enrollment workflows:

@@ -237,6 +401,7 @@ public final class Speaker: Identifiable, Codable {
```swift
    public var updatedAt: Date                // Last update timestamp
    public var updateCount: Int               // Number of updates
    public var rawEmbeddings: [RawEmbedding]  // Historical embeddings (max 50)
    public var isPermanent: Bool              // Permanence flag
}
```

@@ -547,13 +712,25 @@ class RealtimeDiarizer {
| Method | Returns | Description |
|---|---|---|
| `findMergeablePairs(speakerThreshold:excludeIfBothPermanent:)` | `[(speakerToMerge: String, destination: String)]` | Find all pairs of very similar speakers |

Alex-Wengg marked this conversation as resolved.
Incorrect Swift syntax: parameter labels should use colons (`:`) not equals signs (`=`). The correct syntax is: