SpeakerManager improvements#180
Conversation
…izing known speakers
|
hi @SGD2718 , your pr details make sense here its fine. |
Alex-Wengg
left a comment
There was a problem hiding this comment.
so far so good its a good start but i have left some comments that need to be addressed
There was a problem hiding this comment.
we could use some unit tests for these new functions such as these cases
// Returns false (doesn't merge) if:
guard sourceId != destinationId else { return false } // Same ID
guard let speakerToMerge = ... else { return false } // Speaker doesn't exist
guard !(stopIfPermanent && ...) else { return false } // Is permanent
There was a problem hiding this comment.
Where do i write them?
There was a problem hiding this comment.
Documentation/SpeakerManager.md
There was a problem hiding this comment.
If I removed the return type from mergeSpeaker, do these examples still need to be added to the docs?
There was a problem hiding this comment.
@SGD2718 apologies i misunderstood the context . sorry the unit tests should be in this path , it shouldn't be too difficult for an coding agent to just write them to cover all the cases
https://github.com/FluidInference/FluidAudio/tree/main/Tests/FluidAudioTests
There was a problem hiding this comment.
Pull Request Overview
This pull request adds a comprehensive speaker permanence feature to the diarization system, allowing certain speakers to be protected from automatic merging and removal operations. The update also improves performance in the embedding quality calculation and adds extensive new APIs for speaker management.
- Adds
isPermanentflag toSpeakerclass to protect speakers from automatic operations - Introduces
SpeakerInitializationModeenum with reset/merge/overwrite/skip options for handling duplicate speaker IDs - Expands
SpeakerManagerAPI with 15+ new methods for finding, merging, and removing speakers with permanence controls - Optimizes embedding quality calculation using
vDSP.sumOfSquaresinstead of manual map-reduce
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| Sources/FluidAudio/Diarizer/Core/DiarizerManager.swift | Optimized calculateEmbeddingQuality to use Accelerate's vDSP.sumOfSquares for better performance |
| Sources/FluidAudio/Diarizer/Clustering/SpeakerTypes.swift | Added isPermanent flag to Speaker class and introduced SpeakerInitializationMode enum for controlling speaker initialization behavior |
| Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift | Extensive API expansion with methods for speaker permanence management, finding/merging/removing speakers, and enhanced initialization with conflict resolution modes |
| Documentation/SpeakerManager.md | Comprehensive documentation updates covering all new APIs, initialization modes, permanence features, and usage examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| /// Remove a speaker's permanent marker | ||
| /// - Parameter speakerId: the ID of the speaker to mark as permanent |
There was a problem hiding this comment.
Incorrect parameter description: the documentation says "the ID of the speaker to mark as permanent" but this method removes the permanent marker, not adds it. Should say: "the ID of the speaker to remove the permanent marker from"
| /// - Parameter speakerId: the ID of the speaker to mark as permanent | |
| /// - Parameter speakerId: the ID of the speaker to remove the permanent marker from |
| /// - excludeIfBothPermanent: whether to exclude speaker pairs where both speakers are permanent | ||
| /// - Returns: a list of speaker ID pairs that belong to speakers that are similar enough to be merged | ||
| public func findMergeablePairs(speakerThreshold: Float? = nil, excludeIfBothPermanent: Bool = true) -> [(speakerToMerge: String, destination: String)] { | ||
| queue.sync(flags: .barrier) { |
There was a problem hiding this comment.
Using .barrier flag with queue.sync for a read-only operation. The findMergeablePairs method only reads from speakerDatabase and doesn't modify it, so it should use queue.sync without the .barrier flag for better concurrency. The .barrier flag is meant for write operations that need exclusive access.
| queue.sync(flags: .barrier) { | |
| queue.sync { |
| } | ||
| } | ||
|
|
||
| /// remove non-permanent speakers that meet a certain predicate |
There was a problem hiding this comment.
Inconsistent capitalization: method name "remove non-permanent speakers" should start with a capital letter to be consistent with other method documentation. Should be: "Remove non-permanent speakers that meet a certain predicate"
| /// remove non-permanent speakers that meet a certain predicate | |
| /// Remove non-permanent speakers that meet a certain predicate |
| /// Update main embedding with new segment data using exponential moving average | ||
| /// - Parameters: | ||
| /// - duration: segment duration | ||
| /// - |
There was a problem hiding this comment.
Incomplete parameter documentation. The - is present but the parameter description is missing. This should describe what the embedding parameter is.
| /// - | |
| /// - embedding: The new embedding vector for the speaker segment |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift:154
- The
assignSpeakerfunction acceptsnewNameparameter (line 121) but doesn't pass it tocreateNewSpeaker. This means the custom name provided by callers will be ignored when creating new speakers.
let newSpeakerId = createNewSpeaker(
embedding: normalizedEmbedding,
duration: speechDuration,
distanceToClosest: distance
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Looks like the tests finally passed haha. Just that annoying format error |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
There we go |
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.warning( | ||
| "Failed to overwrite Speaker \(speaker.id) because it is permanent. Skipping") | ||
| } |
There was a problem hiding this comment.
When the overwrite is skipped due to permanence, the code should use continue to skip the rest of the loop iteration. Without it, the code proceeds to line 89-96, which incorrectly tracks the numeric ID and logs "Initialized known speaker" even though the speaker was not actually added or updated. Add continue after line 70.
| existingSpeaker.rawEmbeddings = rawEmbeddings | ||
| existingSpeaker.updateCount = updateCount | ||
| existingSpeaker.updatedAt = updatedAt ?? now | ||
| existingSpeaker.isPermanent = isPermanent |
There was a problem hiding this comment.
When updating an existing speaker, unconditionally setting isPermanent to the parameter value (which defaults to false) can inadvertently remove the permanent flag from speakers that were previously marked as permanent. Consider preserving the existing isPermanent value unless explicitly provided, or document this behavior clearly to prevent accidental removal of permanence.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift:156
- The
newNameparameter accepted byassignSpeaker(line 123) is not passed tocreateNewSpeaker, which has anameparameter (line 481). This means the caller cannot specify a custom name for newly created speakers, making the parameter non-functional.
let newSpeakerId = createNewSpeaker(
embedding: normalizedEmbedding,
duration: speechDuration,
distanceToClosest: distance
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Alright I think that that should be all the formatting issues |
|
Finally |
|
@SGD2718 looks good to me great job on your first PR ! |
|
Do i close the PR or do you do that? |
|
@SGD2718 you can merge it, lmk if you are unable to merge the pr |
|
I don't think I can merge it. My earlier question was due to my getting confused about what the close with comment button did. |
|
I will merge it , the close with comment would cause the pr to be cancelled |
I added some new features to the SpeakerManager class:
I also updated the documentation in SpeakerManager.md to reflect these new features.
This is the first time I've made a PR to a public repository, so I'm not sure what else I need to do.