SpeakerManager improvements#180

Merged
Alex-Wengg merged 30 commits into FluidInference:main from SGD2718:main
Nov 9, 2025

@SGD2718
Collaborator

@SGD2718 SGD2718 commented Nov 6, 2025

I added some new features to the SpeakerManager class:

  • Merging within the database
  • Speaker removal
  • Permanent speakers (they cannot be removed without an override in a function parameter)
  • Known-speaker initialization for a database that already contains speakers that might overlap
  • Speaker filtering
  • Similar speaker detection

I also updated the documentation in SpeakerManager.md to reflect these new features.

This is the first time I've made a PR to a public repository, so I'm not sure what else I need to do.

Copilot AI review requested due to automatic review settings November 6, 2025 02:24
@Alex-Wengg
Member

Hi @SGD2718, your PR details make sense here; it's fine.

@Alex-Wengg Alex-Wengg requested review from Alex-Wengg and removed request for Copilot November 6, 2025 02:25
Member

@Alex-Wengg Alex-Wengg left a comment

So far so good; it's a good start, but I have left some comments that need to be addressed.

Member

We could use some unit tests for these new functions, covering cases such as these:


// Returns false (doesn't merge) if:
guard sourceId != destinationId else { return false }  // Same ID
guard let speakerToMerge = ... else { return false }   // Speaker doesn't exist
guard !(stopIfPermanent && ...) else { return false }  // Is permanent
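
The three guard cases above could be exercised with assertions along the following lines. This is only a sketch: `ToySpeakerStore` is a hypothetical stand-in written for illustration, not the real `SpeakerManager`, and the method name, argument labels, and `Bool` return value are assumptions modeled on the snippet above.

```swift
// Hypothetical stand-in for SpeakerManager; names and signatures are assumptions.
final class ToySpeakerStore {
    struct Entry {
        var isPermanent: Bool
        var segmentCount: Int
    }

    private(set) var speakers: [String: Entry] = [:]

    func add(_ id: String, isPermanent: Bool = false, segmentCount: Int = 1) {
        speakers[id] = Entry(isPermanent: isPermanent, segmentCount: segmentCount)
    }

    /// Merge `sourceId` into `destinationId`; returns false if nothing was merged.
    @discardableResult
    func mergeSpeaker(_ sourceId: String, into destinationId: String, stopIfPermanent: Bool = true) -> Bool {
        // Case 1: merging a speaker into itself is a no-op.
        guard sourceId != destinationId else { return false }
        // Case 2: both speakers must exist.
        guard let source = speakers[sourceId],
              var destination = speakers[destinationId] else { return false }
        // Case 3: a permanent source is protected unless the caller overrides.
        guard !(stopIfPermanent && source.isPermanent) else { return false }

        destination.segmentCount += source.segmentCount
        speakers[destinationId] = destination
        speakers[sourceId] = nil
        return true
    }
}

let store = ToySpeakerStore()
store.add("A")
store.add("B", isPermanent: true)

let mergedSameId = store.mergeSpeaker("A", into: "A")        // same ID
let mergedMissing = store.mergeSpeaker("ghost", into: "A")   // speaker doesn't exist
let mergedPermanent = store.mergeSpeaker("B", into: "A")     // source is permanent
let mergedWithOverride = store.mergeSpeaker("B", into: "A", stopIfPermanent: false)
```

In the real test target these checks would presumably live in an XCTest case and call the actual `SpeakerManager` API instead of the toy class.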

Collaborator Author

Where do I write them?

Member

Documentation/SpeakerManager.md

Collaborator Author

If I removed the return type from mergeSpeaker, do these examples still need to be added to the docs?

Member

@SGD2718 Apologies, I misunderstood the context. The unit tests should go in this path; it shouldn't be too difficult for a coding agent to write them to cover all the cases:

https://github.com/FluidInference/FluidAudio/tree/main/Tests/FluidAudioTests

@BrandonWeng BrandonWeng added the enhancement and speaker-diarization labels Nov 6, 2025
Copilot AI review requested due to automatic review settings November 6, 2025 05:20
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request adds a comprehensive speaker permanence feature to the diarization system, allowing certain speakers to be protected from automatic merging and removal operations. The update also improves performance in the embedding quality calculation and adds extensive new APIs for speaker management.

  • Adds isPermanent flag to Speaker class to protect speakers from automatic operations
  • Introduces SpeakerInitializationMode enum with reset/merge/overwrite/skip options for handling duplicate speaker IDs
  • Expands SpeakerManager API with 15+ new methods for finding, merging, and removing speakers with permanence controls
  • Optimizes embedding quality calculation using vDSP.sumOfSquares instead of manual map-reduce
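
To illustrate the last bullet, here is a minimal sketch comparing a manual map-reduce sum of squares with `vDSP.sumOfSquares` from Accelerate. How `calculateEmbeddingQuality` uses the result is not shown in this conversation, so the function names below are assumptions; the `#else` branch is a fallback added only so the sketch also builds where Accelerate is unavailable.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

/// Sum of squared components, i.e. the squared L2 norm of `embedding`.
func sumOfSquares(_ embedding: [Float]) -> Float {
    #if canImport(Accelerate)
    return vDSP.sumOfSquares(embedding)            // vectorized path on Apple platforms
    #else
    return embedding.reduce(0) { $0 + $1 * $1 }    // portable fallback
    #endif
}

/// The manual map-reduce form: allocates an intermediate array of squares.
func sumOfSquaresManual(_ embedding: [Float]) -> Float {
    embedding.map { $0 * $0 }.reduce(0, +)
}

let sample: [Float] = [3, 4]
let fast = sumOfSquares(sample)
let manual = sumOfSquaresManual(sample)
```

Both forms compute the same quantity; the vDSP call avoids the temporary array created by `map`.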

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 21 comments.

| File | Description |
| --- | --- |
| Sources/FluidAudio/Diarizer/Core/DiarizerManager.swift | Optimized calculateEmbeddingQuality to use Accelerate's vDSP.sumOfSquares for better performance |
| Sources/FluidAudio/Diarizer/Clustering/SpeakerTypes.swift | Added isPermanent flag to Speaker class and introduced SpeakerInitializationMode enum for controlling speaker initialization behavior |
| Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift | Extensive API expansion with methods for speaker permanence management, finding/merging/removing speakers, and enhanced initialization with conflict resolution modes |
| Documentation/SpeakerManager.md | Comprehensive documentation updates covering all new APIs, initialization modes, permanence features, and usage examples |
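
As a rough illustration of the conflict resolution modes mentioned above, the sketch below applies four modes to a toy database. The enum `InitMode`, the function `initialize`, and the behavior of each case are guesses based only on the case names in the summary (reset/merge/overwrite/skip); the real `SpeakerInitializationMode` may differ in names and semantics.

```swift
// Hypothetical mirror of the enum named in the summary; the real cases may
// carry different names, payloads, or semantics.
enum InitMode { case reset, merge, overwrite, skip }

/// Toy database: speaker ID -> number of stored embeddings.
func initialize(_ known: [String: Int], into database: inout [String: Int], mode: InitMode) {
    if mode == .reset { database.removeAll() }    // start from an empty database
    for (id, count) in known {
        guard let existing = database[id] else {
            database[id] = count                  // no conflict: just add
            continue
        }
        switch mode {
        case .reset, .overwrite:
            database[id] = count                  // replace the existing entry
        case .merge:
            database[id] = existing + count       // combine the two entries
        case .skip:
            break                                 // keep the existing entry
        }
    }
}

var mergedDb = ["alice": 2, "bob": 1]
initialize(["alice": 3, "carol": 1], into: &mergedDb, mode: .merge)

var skippedDb = ["alice": 2, "bob": 1]
initialize(["alice": 3, "carol": 1], into: &skippedDb, mode: .skip)
```

With `.merge`, the duplicate ID `alice` ends up with the combined count; with `.skip`, the existing `alice` entry is left untouched and only `carol` is added.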

}

/// Remove a speaker's permanent marker
/// - Parameter speakerId: the ID of the speaker to mark as permanent
Copilot AI Nov 6, 2025

Incorrect parameter description: the documentation says "the ID of the speaker to mark as permanent" but this method removes the permanent marker, not adds it. Should say: "the ID of the speaker to remove the permanent marker from"

Suggested change
- /// - Parameter speakerId: the ID of the speaker to mark as permanent
+ /// - Parameter speakerId: the ID of the speaker to remove the permanent marker from

/// - excludeIfBothPermanent: whether to exclude speaker pairs where both speakers are permanent
/// - Returns: a list of speaker ID pairs that belong to speakers that are similar enough to be merged
public func findMergeablePairs(speakerThreshold: Float? = nil, excludeIfBothPermanent: Bool = true) -> [(speakerToMerge: String, destination: String)] {
queue.sync(flags: .barrier) {
Copilot AI Nov 6, 2025

Using .barrier flag with queue.sync for a read-only operation. The findMergeablePairs method only reads from speakerDatabase and doesn't modify it, so it should use queue.sync without the .barrier flag for better concurrency. The .barrier flag is meant for write operations that need exclusive access.

Suggested change
- queue.sync(flags: .barrier) {
+ queue.sync {
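
The distinction the comment draws can be sketched with a small reader/writer wrapper. This assumes the manager's `queue` is a concurrent `DispatchQueue`, which the conversation does not confirm; `SpeakerNames` and its members are hypothetical names used only for illustration.

```swift
import Dispatch

/// Minimal reader/writer wrapper around a dictionary (illustrative only).
final class SpeakerNames {
    private let queue = DispatchQueue(label: "example.speakers", attributes: .concurrent)
    private var names: [String: String] = [:]

    /// Read: plain `sync`, so several readers may run at the same time.
    func name(for id: String) -> String? {
        queue.sync { names[id] }
    }

    /// Write: `.barrier` waits for in-flight reads and then runs exclusively.
    func setName(_ name: String, for id: String) {
        queue.sync(flags: .barrier) { names[id] = name }
    }
}

let registry = SpeakerNames()
registry.setName("Alice", for: "1")
let lookedUp = registry.name(for: "1")
let missing = registry.name(for: "2")
```

On a serial queue the `.barrier` flag makes no practical difference, so the suggestion only affects concurrency if the queue is concurrent.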

}
}

/// remove non-permanent speakers that meet a certain predicate
Copilot AI Nov 6, 2025

Inconsistent capitalization: method name "remove non-permanent speakers" should start with a capital letter to be consistent with other method documentation. Should be: "Remove non-permanent speakers that meet a certain predicate"

Suggested change
- /// remove non-permanent speakers that meet a certain predicate
+ /// Remove non-permanent speakers that meet a certain predicate

/// Update main embedding with new segment data using exponential moving average
/// - Parameters:
/// - duration: segment duration
/// -
Copilot AI Nov 6, 2025

Incomplete parameter documentation. The - is present but the parameter description is missing. This should describe what the embedding parameter is.

Suggested change
- /// -
+ /// - embedding: The new embedding vector for the speaker segment

@SGD2718 SGD2718 requested a review from Alex-Wengg November 8, 2025 04:06
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift:154

  • The assignSpeaker function accepts newName parameter (line 121) but doesn't pass it to createNewSpeaker. This means the custom name provided by callers will be ignored when creating new speakers.
                let newSpeakerId = createNewSpeaker(
                    embedding: normalizedEmbedding,
                    duration: speechDuration,
                    distanceToClosest: distance
                )

@Alex-Wengg
Member

Looks like the tests finally passed haha. Just that annoying format error

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@SGD2718
Collaborator Author

SGD2718 commented Nov 8, 2025

There we go

@SGD2718 SGD2718 requested a review from Copilot November 8, 2025 05:18
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment on lines +69 to +71
logger.warning(
"Failed to overwrite Speaker \(speaker.id) because it is permanent. Skipping")
}
Copilot AI Nov 8, 2025

When the overwrite is skipped due to permanence, the code should use continue to skip the rest of the loop iteration. Without it, the code proceeds to lines 89-96, which incorrectly track the numeric ID and log "Initialized known speaker" even though the speaker was not actually added or updated. Add continue after line 70.
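
A minimal sketch of the control flow being requested, using hypothetical types (`KnownSpeaker`, `overwriteSpeakers`) rather than the real initialization loop, whose surrounding code is not shown here:

```swift
// Hypothetical types; the real initialization loop has more state.
struct KnownSpeaker {
    let id: String
    let isPermanent: Bool
}

/// Returns the IDs that were actually written to `database`.
func overwriteSpeakers(_ incoming: [KnownSpeaker], in database: inout [String: KnownSpeaker]) -> [String] {
    var initialized: [String] = []
    for speaker in incoming {
        if let existing = database[speaker.id], existing.isPermanent {
            print("Failed to overwrite Speaker \(speaker.id) because it is permanent. Skipping")
            continue   // without this, the bookkeeping below would still run
        }
        database[speaker.id] = speaker
        initialized.append(speaker.id)   // stands in for ID tracking and the success log
    }
    return initialized
}

var database = ["1": KnownSpeaker(id: "1", isPermanent: true)]
let written = overwriteSpeakers(
    [KnownSpeaker(id: "1", isPermanent: false), KnownSpeaker(id: "2", isPermanent: false)],
    in: &database
)
```

Here the permanent speaker `1` is skipped entirely, so only `2` is recorded as initialized.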

existingSpeaker.rawEmbeddings = rawEmbeddings
existingSpeaker.updateCount = updateCount
existingSpeaker.updatedAt = updatedAt ?? now
existingSpeaker.isPermanent = isPermanent
Copilot AI Nov 8, 2025

When updating an existing speaker, unconditionally setting isPermanent to the parameter value (which defaults to false) can inadvertently remove the permanent flag from speakers that were previously marked as permanent. Consider preserving the existing isPermanent value unless explicitly provided, or document this behavior clearly to prevent accidental removal of permanence.
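
One way to preserve the existing flag, sketched under the assumption that the parameter can be made optional: `nil` then means "leave the flag alone". `ToySpeaker` and `upsert` are hypothetical names; the real method has more parameters and may have been resolved differently in the merged code.

```swift
// Hypothetical, trimmed-down Speaker; only the fields needed for the sketch.
final class ToySpeaker {
    var updateCount = 0
    var isPermanent: Bool
    init(isPermanent: Bool) { self.isPermanent = isPermanent }
}

/// `isPermanent: nil` means "leave the existing flag alone".
func upsert(_ speaker: ToySpeaker, updateCount: Int, isPermanent: Bool? = nil) {
    speaker.updateCount = updateCount
    if let flag = isPermanent {
        speaker.isPermanent = flag   // only change permanence when asked to
    }
}

let pinned = ToySpeaker(isPermanent: true)
upsert(pinned, updateCount: 3)                       // flag untouched
let stillPermanent = pinned.isPermanent
upsert(pinned, updateCount: 4, isPermanent: false)   // explicit opt-out
let afterExplicitChange = pinned.isPermanent
```

The alternative mentioned in the comment, keeping the non-optional parameter and documenting the behavior, avoids an API change but leaves the accidental-reset risk with the caller.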

SGD2718 and others added 6 commits November 7, 2025 21:35
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@SGD2718 SGD2718 requested a review from Copilot November 8, 2025 05:48
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift:156

  • The newName parameter accepted by assignSpeaker (line 123) is not passed to createNewSpeaker, which has a name parameter (line 481). This means the caller cannot specify a custom name for newly created speakers, making the parameter non-functional.
                let newSpeakerId = createNewSpeaker(
                    embedding: normalizedEmbedding,
                    duration: speechDuration,
                    distanceToClosest: distance
                )
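
The fix implied by this comment is to forward the name. The sketch below uses a hypothetical `ToyManager`; the real `assignSpeaker` first searches for a matching speaker and `createNewSpeaker` takes further arguments (such as `distanceToClosest`), which are omitted here.

```swift
// Hypothetical stand-in; real signatures in SpeakerManager.swift may differ.
final class ToyManager {
    private(set) var names: [String: String] = [:]   // speaker ID -> display name
    private var nextId = 1

    /// Creates a speaker, falling back to a generated name when none is given.
    func createNewSpeaker(embedding: [Float], duration: Float, name: String? = nil) -> String {
        let id = String(nextId)
        nextId += 1
        names[id] = name ?? "Speaker \(id)"
        return id
    }

    /// Simplified: always creates a speaker, forwarding `newName` to the creator.
    func assignSpeaker(_ embedding: [Float], speechDuration: Float, newName: String? = nil) -> String {
        createNewSpeaker(embedding: embedding, duration: speechDuration, name: newName)
    }
}

let manager = ToyManager()
let namedId = manager.assignSpeaker([0.1, 0.2], speechDuration: 2.5, newName: "Alice")
let defaultId = manager.assignSpeaker([0.3, 0.4], speechDuration: 1.0)
```

Without the `name: newName` argument, both calls would end up with generated names and the caller's choice would be silently dropped.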

@SGD2718
Collaborator Author

SGD2718 commented Nov 8, 2025

Alright, I think that should be all the formatting issues.

@SGD2718
Collaborator Author

SGD2718 commented Nov 8, 2025

Finally

@Alex-Wengg
Member

@SGD2718 Looks good to me. Great job on your first PR!

@SGD2718
Collaborator Author

SGD2718 commented Nov 8, 2025

Do I close the PR or do you do that?

@Alex-Wengg
Member

@SGD2718 You can merge it; let me know if you are unable to merge the PR.

@SGD2718
Collaborator Author

SGD2718 commented Nov 9, 2025

I don't think I can merge it. My earlier question was due to my getting confused about what the close with comment button did.

@Alex-Wengg
Member

I will merge it. The close with comment button would cause the PR to be cancelled.

@Alex-Wengg Alex-Wengg merged commit 3985a4e into FluidInference:main Nov 9, 2025
10 checks passed
Alex-Wengg pushed a commit that referenced this pull request Jan 1, 2026
SGD2718 added a commit that referenced this pull request Jan 4, 2026
Alex-Wengg pushed a commit that referenced this pull request Jan 5, 2026