
Commit d6402bd

SpeakerManager improvements (#180)
1 parent 4350b16 commit d6402bd

5 files changed

Lines changed: 832 additions & 22 deletions


Documentation/SpeakerManager.md

Lines changed: 183 additions & 3 deletions
@@ -1,3 +1,4 @@
# SpeakerManager API

Tracks and manages speaker identities across audio chunks for streaming diarization.

@@ -73,6 +74,21 @@ let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)
speakerManager.initializeKnownSpeakers([alice, bob])
```

The database may already contain speakers with the same IDs as the ones you are loading. Use the `mode` argument to control how such conflicts are handled:

```swift
let alice = Speaker(id: "alice", name: "Alice", currentEmbedding: aliceEmbedding)
let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)

// Replace any existing speakers with ID "alice" or "bob" with the new ones,
// even if the existing speakers are marked as permanent
speakerManager.initializeKnownSpeakers([alice, bob], mode: .overwrite, preserveIfPermanent: false)
```

> The `mode` argument, of type `SpeakerInitializationMode`, determines how new speakers whose IDs already exist are handled. It takes one of four values:
> - `.reset`: clear the speaker database, then add the new speakers
> - `.merge`: merge each new speaker with the existing speaker that has the same ID
> - `.overwrite`: replace each existing speaker with the new speaker that has the same ID
> - `.skip`: keep the existing speaker and skip any new speaker whose ID is already taken
>
> The `preserveIfPermanent` argument determines whether existing speakers marked as permanent are protected from being overwritten or merged. It defaults to `true`.
**Use case:** When you have pre-recorded voice samples of known speakers and want to recognize them by name instead of numeric IDs.

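To make the four `SpeakerInitializationMode` behaviors concrete, here is a minimal, self-contained sketch of how conflicting IDs could be resolved when loading known speakers. It is not the library's implementation: `ToySpeaker`, `ToyInitMode`, and `toyInitialize` are hypothetical stand-ins, and two details are assumptions: `.reset` clears every speaker (permanent or not) before loading, and `.merge` simply adopts the new name and sums speaking durations instead of combining embeddings.

```swift
// Hypothetical stand-ins for illustration only; not the library's types.
struct ToySpeaker {
    let id: String
    var name: String
    var duration: Float        // speaking time in seconds
    var isPermanent = false
}

enum ToyInitMode { case reset, merge, overwrite, skip }

/// Sketch of conflict handling when pre-loading known speakers.
func toyInitialize(
    _ db: inout [String: ToySpeaker],
    with newSpeakers: [ToySpeaker],
    mode: ToyInitMode,
    preserveIfPermanent: Bool = true
) {
    // Assumption: `.reset` empties the whole database first.
    if mode == .reset { db.removeAll() }

    for speaker in newSpeakers {
        guard let existing = db[speaker.id] else {
            db[speaker.id] = speaker   // no ID conflict: always add
            continue
        }
        // A permanent existing speaker is left untouched when protected.
        if preserveIfPermanent && existing.isPermanent { continue }

        switch mode {
        case .reset, .overwrite:
            db[speaker.id] = speaker   // the new speaker replaces the old one
        case .merge:
            var merged = existing      // assumption: keep history, adopt the new name
            merged.name = speaker.name
            merged.duration += speaker.duration
            db[speaker.id] = merged
        case .skip:
            break                      // keep the existing speaker as-is
        }
    }
}

var db: [String: ToySpeaker] = [
    "alice": ToySpeaker(id: "alice", name: "Alice (old)", duration: 3, isPermanent: true)
]
toyInitialize(&db, with: [ToySpeaker(id: "alice", name: "Alice", duration: 1)], mode: .overwrite)
// The permanent "alice" survives because preserveIfPermanent defaults to true.
```

Passing `preserveIfPermanent: false` to the same call would replace the old "alice", which mirrors the `.overwrite` example above.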
#### upsertSpeaker
@@ -91,16 +107,145 @@ speakerManager.upsertSpeaker(
    updateCount: 5,       // optional
    createdAt: Date(),    // optional
    updatedAt: Date(),    // optional
    isPermanent: false    // optional
)
```

**Behavior:**
- If the speaker ID exists: updates the existing speaker's data
- If the speaker ID is new: inserts it as a new speaker
- Maintains ID uniqueness and tracks numeric IDs for auto-increment
- If `isPermanent` is `true`, the new or existing speaker becomes permanent: it will not be merged or removed unless that protection is explicitly overridden (for example, with `stopIfPermanent: false` or `keepIfPermanent: false`).

#### mergeSpeaker
Merge one speaker into another.

```swift
// Merge speaker "1" into "alice"
speakerManager.mergeSpeaker("1", into: "alice")

// Merge speaker "2" into speaker "3" and rename the result "Bob",
// even if speaker "2" is marked as permanent
speakerManager.mergeSpeaker("2", into: "3", mergedName: "Bob", stopIfPermanent: false)
```

**Behavior:**
- If `stopIfPermanent` is `true` and the first speaker is permanent, the merge is not performed.
- Otherwise, the first speaker is merged into the destination speaker and removed from the known speaker database.
- If `mergedName` is provided, the destination speaker is renamed; otherwise, its name is preserved.

> Note: `mergedName` is optional, and `stopIfPermanent` defaults to `true`.

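The merge rules above can be illustrated with a small, self-contained sketch. This is not how the library combines speakers internally: `ToySpeaker` and `toyMerge` are hypothetical, and the duration-weighted averaging of embeddings is an assumption chosen for illustration. The parts that follow from the text are the permanence guard, the optional rename, and the removal of the source speaker.

```swift
// Hypothetical stand-in for illustration only; not the library's `Speaker`.
struct ToySpeaker {
    let id: String
    var name: String
    var embedding: [Float]
    var duration: Float        // speaking time in seconds
    var isPermanent = false
}

/// Sketch of merge semantics. Returns `true` if a merge happened.
@discardableResult
func toyMerge(
    _ sourceId: String,
    into destinationId: String,
    mergedName: String? = nil,
    stopIfPermanent: Bool = true,
    db: inout [String: ToySpeaker]
) -> Bool {
    guard sourceId != destinationId,
          let source = db[sourceId],
          var destination = db[destinationId] else { return false }

    // Permanence guard: a permanent source is only merged when explicitly allowed.
    if stopIfPermanent && source.isPermanent { return false }

    // Assumption: embeddings are combined by a duration-weighted average.
    let ws = source.duration
    let wd = destination.duration
    let total = ws + wd
    if total > 0 {
        destination.embedding = zip(source.embedding, destination.embedding).map { s, d in
            (s * ws + d * wd) / total
        }
    }
    destination.duration = total
    if let name = mergedName { destination.name = name }   // otherwise keep the old name

    db[destinationId] = destination
    db.removeValue(forKey: sourceId)   // the source speaker disappears
    return true
}
```

Returning a `Bool` here is only a convenience for the sketch; the table lists the real `mergeSpeaker` as returning `Void`.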
#### removeSpeaker
Remove a speaker from the database.

```swift
// Remove speaker "1"
speakerManager.removeSpeaker("1")

// Remove "alice" from the known speaker database, even if they are marked as permanent
speakerManager.removeSpeaker("alice", keepIfPermanent: false)
```

> Note: `keepIfPermanent` is `true` by default.

#### removeSpeakersInactive
Remove speakers that have been inactive since a given date or for a given duration.

```swift
// Remove speakers that have been inactive since `date`
speakerManager.removeSpeakersInactive(since: date)

// Remove speakers that have been inactive for 10 seconds, even if they are marked as permanent
speakerManager.removeSpeakersInactive(for: 10.0, keepIfPermanent: false)
```

> Note: Both versions of the method take an optional `keepIfPermanent` argument that defaults to `true`.

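A minimal sketch of the two inactivity filters, assuming (this is not stated above) that a speaker's `updatedAt` timestamp serves as its last-activity time and that the duration is measured in seconds back from the current time. `ToySpeaker` and both `removeInactive` functions are hypothetical; the point is that the duration form reduces to the date form.

```swift
import Foundation

// Hypothetical stand-in; assumes `updatedAt` marks the last activity.
struct ToySpeaker {
    let id: String
    var updatedAt: Date
    var isPermanent = false
}

/// Removes speakers whose last activity is earlier than `date`.
func removeInactive(since date: Date, keepIfPermanent: Bool = true,
                    from db: inout [String: ToySpeaker]) {
    db = db.filter { entry in
        let speaker = entry.value
        return speaker.updatedAt >= date || (keepIfPermanent && speaker.isPermanent)
    }
}

/// Duration variant: "inactive for N seconds" means "inactive since now - N".
func removeInactive(for duration: TimeInterval, keepIfPermanent: Bool = true,
                    now: Date = Date(), from db: inout [String: ToySpeaker]) {
    removeInactive(since: now.addingTimeInterval(-duration),
                   keepIfPermanent: keepIfPermanent, from: &db)
}
```

Injecting `now` as a parameter is a design choice for the sketch: it keeps the duration variant deterministic and easy to test.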
#### removeSpeakers
Remove all speakers that match a given predicate.

```swift
// Remove all speakers with less than 5 seconds of speaking time,
// including permanent ones (`keepIfPermanent` is optional)
speakerManager.removeSpeakers(
    where: { $0.duration < 5.0 },
    keepIfPermanent: false
)

// Trailing-closure syntax (does NOT remove permanent speakers)
speakerManager.removeSpeakers {
    $0.duration < 5.0
}
```

> Note: the predicate takes a `Speaker` and returns a `Bool`.

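Predicate-based removal with a permanence override can be sketched as follows. `ToySpeaker` and `ToyStore` are hypothetical stand-ins rather than the library's types; the sketch mirrors the documented behavior that the trailing-closure form keeps permanent speakers, while `keepIfPermanent: false` removes them too.

```swift
// Hypothetical stand-ins for illustration only.
struct ToySpeaker {
    let id: String
    var duration: Float        // speaking time in seconds
    var isPermanent = false
}

final class ToyStore {
    private(set) var speakers: [String: ToySpeaker] = [:]

    func add(_ speaker: ToySpeaker) { speakers[speaker.id] = speaker }

    /// Removes every speaker matching `predicate`. Permanent speakers are
    /// kept unless `keepIfPermanent` is false.
    func removeSpeakers(where predicate: (ToySpeaker) -> Bool,
                        keepIfPermanent: Bool = true) {
        // The loop iterates over a copy, so removing entries inside it is safe.
        for (id, speaker) in speakers where predicate(speaker) {
            if keepIfPermanent && speaker.isPermanent { continue }
            speakers.removeValue(forKey: id)
        }
    }
}

let store = ToyStore()
store.add(ToySpeaker(id: "1", duration: 2))
store.add(ToySpeaker(id: "2", duration: 8))
store.add(ToySpeaker(id: "alice", duration: 1, isPermanent: true))
store.removeSpeakers { $0.duration < 5.0 }   // removes "1"; "alice" is protected
```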
#### makeSpeakerPermanent
Mark a speaker as permanent.

```swift
speakerManager.makeSpeakerPermanent("alice") // mark "alice" as permanent
```

#### revokePermanence
Remove a speaker's permanent flag.

```swift
speakerManager.revokePermanence(from: "alice") // mark "alice" as not permanent
```

#### resetPermanentFlags
Mark all speakers as not permanent.

```swift
speakerManager.resetPermanentFlags()
```

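As a rough illustration of the three permanence helpers and how they relate to the `permanentSpeakerIds` property listed in the properties table, here is a self-contained sketch. `ToySpeaker` and `ToyPermanenceStore` are hypothetical; the real methods may differ, for example in how they treat unknown IDs (this sketch silently ignores them).

```swift
// Hypothetical stand-ins for illustration only.
struct ToySpeaker {
    let id: String
    var isPermanent = false
}

final class ToyPermanenceStore {
    private(set) var speakers: [String: ToySpeaker] = [:]

    func add(_ speaker: ToySpeaker) { speakers[speaker.id] = speaker }

    // Optional chaining makes unknown IDs a no-op in this sketch.
    func makeSpeakerPermanent(_ id: String) { speakers[id]?.isPermanent = true }
    func revokePermanence(from id: String) { speakers[id]?.isPermanent = false }

    func resetPermanentFlags() {
        speakers = speakers.mapValues { speaker in
            var copy = speaker
            copy.isPermanent = false
            return copy
        }
    }

    /// Sorted IDs of permanent speakers.
    var permanentSpeakerIds: [String] {
        speakers.values.filter { $0.isPermanent }.map { $0.id }.sorted()
    }
}
```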

### Speaker Retrieval

#### findSpeaker
Find the speaker whose embedding best matches a given embedding vector, along with the cosine distance to that speaker. If no speaker matches within the threshold, the returned `id` is `nil`.

```swift
let (id, distance) = speakerManager.findSpeaker(with: embedding)
```

> Note: pass the optional `speakerThreshold` argument to use a threshold other than the default.

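The lookup can be sketched as a nearest-neighbor search under cosine distance (1 minus cosine similarity). This is a minimal sketch, not the library's code: `cosineDistance` and `toyFindSpeaker` are hypothetical helpers, and three details are assumptions: a match must be strictly below the threshold, a zero vector counts as maximally distant, and when nobody qualifies the nearest distance is still reported alongside a `nil` ID.

```swift
/// Cosine distance = 1 - cosine similarity. Hypothetical helper.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // Assumption: a zero vector is treated as maximally distant.
    guard normA > 0, normB > 0 else { return 2 }
    return 1 - dot / (normA.squareRoot() * normB.squareRoot())
}

/// Nearest known speaker; `id` is nil when nobody is below the threshold.
func toyFindSpeaker(with embedding: [Float], in known: [String: [Float]],
                    speakerThreshold: Float) -> (id: String?, distance: Float) {
    var bestId: String?
    var bestDistance = Float.infinity
    for (id, candidate) in known {
        let d = cosineDistance(embedding, candidate)
        if d < bestDistance {
            bestId = id
            bestDistance = d
        }
    }
    if bestDistance < speakerThreshold { return (bestId, bestDistance) }
    return (nil, bestDistance)
}

let known: [String: [Float]] = ["alice": [1, 0], "bob": [0, 1]]
let match = toyFindSpeaker(with: [0.9, 0.1], in: known, speakerThreshold: 0.3)
print(match.id ?? "no match", match.distance)   // "alice" with a small distance
```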
#### findMatchingSpeakers
Find all speakers whose cosine distance to an embedding vector is within `speakerThreshold`.

```swift
for speaker in speakerManager.findMatchingSpeakers(with: embedding) {
    print("ID: \(speaker.id), Distance: \(speaker.distance)")
}
```

> Note: pass the optional `speakerThreshold` argument to use a threshold other than the default.

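Collecting every match rather than only the nearest one is a filter over the same distance. The sketch below repeats a hypothetical `cosineDistance` helper so it stands alone; `toyFindMatchingSpeakers` is likewise hypothetical, and sorting the results closest-first is an assumption, not documented behavior.

```swift
/// Cosine distance = 1 - cosine similarity. Hypothetical helper.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 2 }   // assumption: zero vector is maximally distant
    return 1 - dot / (normA.squareRoot() * normB.squareRoot())
}

/// Every known speaker below the threshold, closest first (ordering is an assumption).
func toyFindMatchingSpeakers(with embedding: [Float], in known: [String: [Float]],
                             speakerThreshold: Float) -> [(id: String, distance: Float)] {
    known
        .map { (id: $0.key, distance: cosineDistance(embedding, $0.value)) }
        .filter { $0.distance < speakerThreshold }
        .sorted { $0.distance < $1.distance }
}

let known: [String: [Float]] = ["alice": [1, 0], "bob": [0, 1], "carol": [1, 0.2]]
for speaker in toyFindMatchingSpeakers(with: [1, 0.1], in: known, speakerThreshold: 0.1) {
    print("ID: \(speaker.id), Distance: \(speaker.distance)")
}
```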
#### findSpeakers
Find all speakers that satisfy a given predicate. Returns an array of the IDs of the matching speakers.

```swift
// Two equivalent ways to find all speakers with more than 5.0 s of speaking time
let ids = speakerManager.findSpeakers(where: { $0.duration > 5.0 })
let sameIds = speakerManager.findSpeakers { $0.duration > 5.0 }
```

> Note: the predicate takes a `Speaker` and returns a `Bool`.

#### findMergeablePairs
Find all pairs of speakers that might be the same person, that is, pairs whose cosine distance is less than `speakerThreshold`. Returns an array of pairs of speaker IDs.

```swift
let pairs = speakerManager.findMergeablePairs(
    speakerThreshold: 0.6,        // optional
    excludeIfBothPermanent: true  // optional
)

for pair in pairs {
    print("Merge speaker \(pair.speakerToMerge) into speaker \(pair.destination)")
}
```

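A brute-force sketch of pair detection compares every unordered pair once, which costs O(n²) distance computations. `ToySpeaker`, `cosineDistance`, and `toyFindMergeablePairs` are hypothetical. How the real method decides which speaker is `speakerToMerge` and which is `destination` is not documented; this sketch assumes a non-permanent speaker is merged into a permanent one, and otherwise the speaker with less speaking time is merged into the one with more. It also reads `excludeIfBothPermanent` as skipping pairs in which both speakers are permanent.

```swift
// Hypothetical stand-in for illustration only.
struct ToySpeaker {
    let id: String
    var embedding: [Float]
    var duration: Float        // speaking time in seconds
    var isPermanent = false
}

/// Cosine distance = 1 - cosine similarity. Hypothetical helper.
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 2 }   // assumption: zero vector is maximally distant
    return 1 - dot / (normA.squareRoot() * normB.squareRoot())
}

func toyFindMergeablePairs(
    _ speakers: [ToySpeaker],
    speakerThreshold: Float,
    excludeIfBothPermanent: Bool = true
) -> [(speakerToMerge: String, destination: String)] {
    var pairs: [(speakerToMerge: String, destination: String)] = []
    for i in speakers.indices {
        for j in speakers.indices where j > i {   // each unordered pair once
            let a = speakers[i], b = speakers[j]
            if excludeIfBothPermanent && a.isPermanent && b.isPermanent { continue }
            guard cosineDistance(a.embedding, b.embedding) < speakerThreshold else { continue }
            // Assumption: merge non-permanent into permanent; otherwise shorter into longer.
            let aIsSource = (a.isPermanent == b.isPermanent) ? (a.duration <= b.duration) : !a.isPermanent
            let (source, destination) = aIsSource ? (a, b) : (b, a)
            pairs.append((speakerToMerge: source.id, destination: destination.id))
        }
    }
    return pairs
}
```

Preferring a permanent speaker as the destination fits the documented `mergeSpeaker` guard, which by default refuses to merge away a permanent source.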
#### getSpeaker
Get a specific speaker by ID.

@@ -118,6 +263,22 @@ let allSpeakers = speakerManager.getAllSpeakers()
// Returns: [String: Speaker] - dictionary keyed by speaker ID
```

#### getSpeakerList
Get all speakers in the database as an array (for testing/debugging).

```swift
let allSpeakers = speakerManager.getSpeakerList()
// Returns: [Speaker] - array of speakers
```

#### hasSpeaker
Check whether the speaker database contains a speaker with a given ID.

```swift
if speakerManager.hasSpeaker("alice") {
    print("Alice was found in the database")
}
```

#### speakerCount
Get the total number of tracked speakers.

@@ -140,13 +301,16 @@ Clear all speakers from the database.

```swift
speakerManager.reset()                       // remove all speakers
speakerManager.reset(keepIfPermanent: true)  // remove only non-permanent speakers
```

Useful for:
- Starting a new session
- Freeing memory between recordings
- Resetting speaker tracking

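The two `reset` forms differ only in whether permanent speakers survive. Here is a minimal sketch, assuming from the examples above that plain `reset()` clears everything, so in this sketch `keepIfPermanent` defaults to `false` (unlike the removal methods, where it defaults to `true`). `ToySpeaker` and `toyReset` are hypothetical.

```swift
// Hypothetical stand-in for illustration only.
struct ToySpeaker {
    let id: String
    var isPermanent = false
}

/// Clears the database; optionally keeps permanent speakers.
/// Assumption: the default clears everything, matching `reset()`.
func toyReset(_ db: inout [String: ToySpeaker], keepIfPermanent: Bool = false) {
    if keepIfPermanent {
        db = db.filter { $0.value.isPermanent }
    } else {
        db.removeAll()
    }
}
```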
## Speaker Enrollment

The `Speaker` class includes a `name` field for speaker enrollment workflows:

@@ -237,6 +401,7 @@ public final class Speaker: Identifiable, Codable {
    public var updatedAt: Date                // Last update timestamp
    public var updateCount: Int               // Number of updates
    public var rawEmbeddings: [RawEmbedding]  // Historical embeddings (max 50)
    public var isPermanent: Bool              // Permanence flag
}
```

@@ -547,13 +712,25 @@ class RealtimeDiarizer {
| Method | Returns | Description |
|--------|---------|-------------|
| `assignSpeaker(_:speechDuration:confidence:)` | `Speaker?` | Assign/create speaker from embedding |
| `initializeKnownSpeakers(_:mode:preserveIfPermanent:)` | `Void` | Pre-load known speaker profiles |
| `findSpeaker(with:speakerThreshold:)` | `(id: String?, distance: Float)` | Find the speaker that best matches an embedding |
| `findMatchingSpeakers(with:speakerThreshold:)` | `[(id: String, distance: Float)]` | Find all speakers that match an embedding |
| `findSpeakers(where:)` | `[String]` | Find all speakers that satisfy a predicate |
| `findMergeablePairs(speakerThreshold:excludeIfBothPermanent:)` | `[(speakerToMerge: String, destination: String)]` | Find all pairs of very similar speakers |
| `removeSpeaker(_:keepIfPermanent:)` | `Void` | Remove a speaker from the database |
| `removeSpeakersInactive(since:keepIfPermanent:)` | `Void` | Remove speakers inactive since a given date |
| `removeSpeakersInactive(for:keepIfPermanent:)` | `Void` | Remove speakers inactive for a given duration |
| `removeSpeakers(where:)` | `Void` | Remove non-permanent speakers that satisfy a predicate |
| `removeSpeakers(where:keepIfPermanent:)` | `Void` | Remove speakers that satisfy a predicate |
| `mergeSpeaker(_:into:mergedName:stopIfPermanent:)` | `Void` | Merge a speaker into another one |
| `upsertSpeaker(_:)` | `Void` | Update or insert speaker (from object) |
| `upsertSpeaker(id:currentEmbedding:duration:...)` | `Void` | Update or insert speaker (from params) |
| `getSpeaker(for:)` | `Speaker?` | Get speaker by ID |
| `getAllSpeakers()` | `[String: Speaker]` | Get all speakers (debugging) |
| `getSpeakerList()` | `[Speaker]` | Get array of all speakers (debugging) |
| `hasSpeaker(_:)` | `Bool` | Check if database has a speaker with a given ID |
| `reset(keepIfPermanent:)` | `Void` | Clear speaker database (optionally keeping permanent speakers) |
| `resetPermanentFlags()` | `Void` | Mark all speakers as not permanent |
| `getCurrentSpeakerNames()` | `[String]` | Get sorted speaker IDs |
| `getGlobalSpeakerStats()` | `(Int, Float, Float, Int)` | Aggregate statistics |

@@ -567,6 +744,7 @@ class RealtimeDiarizer {
| `minEmbeddingUpdateDuration` | `Float` | Min duration to update embeddings (seconds) |
| `speakerCount` | `Int` | Number of tracked speakers |
| `speakerIds` | `[String]` | Sorted array of speaker IDs |
| `permanentSpeakerIds` | `[String]` | Sorted array of speaker IDs of permanent speakers |

### Speaker Properties

@@ -580,6 +758,7 @@ class RealtimeDiarizer {
| `updatedAt` | `Date` | Last update timestamp |
| `updateCount` | `Int` | Number of embedding updates |
| `rawEmbeddings` | `[RawEmbedding]` | Historical embeddings (max 50) |
| `isPermanent` | `Bool` | Permanence flag |

### Speaker Methods

@@ -602,6 +781,7 @@ class RealtimeDiarizer {
| `averageEmbeddings(_:)` | `[Float]?` | Average multiple embeddings |
| `createSpeaker(id:name:duration:embedding:config:)` | `Speaker?` | Create validated speaker |
| `updateEmbedding(current:new:alpha:)` | `[Float]?` | EMA update (pure function) |
| `reassignSegment(segmentId:from:to:)` | `Bool` | Move segment between speakers |

## See Also
