Transcript segments mapped to wrong version when user switches versions mid-utterance (decision made at confirmation time, not speech time) #140

@DmitriyG228

Summary

Transcript segments are assigned to the wrong version when the user switches versions while speaking. Vexa's pending-to-confirmed latency is long (10–60 s observed), so a segment finalized now often represents speech that began while a different version was in_review. We currently tag the segment with whichever version is in review at confirmation time, not at speech time.

Downstream: LLM note generation for version A misses feedback that was spoken while A was the in-review version (because the segment landed on B); notes for B include feedback that was actually about A.

Current behaviour (the bug)

backend/src/dna/transcription_service.py::on_transcription_updated — line 256:

metadata = await self.storage_provider.get_playlist_metadata(playlist_id)
...
version_id = metadata.in_review        # ← read ONCE per tick
...
for seg in confirmed:
    ...
    await self.storage_provider.upsert_segment(
        playlist_id=playlist_id,
        version_id=version_id,          # ← same version_id for every seg in the tick
        segment_id=segment_id,
        data=segment_create,
    )

Every segment in a single Vexa tick gets the same version_id, chosen by reading metadata.in_review exactly once at the top of the handler. The segment's own absolute_start_time (when the speech happened) is ignored for routing.

Why this drifts

Vexa's pending-to-confirmed latency is not small. Sampled pair from the live Vexa Cloud feed (Google Meet, vexa_meeting_id=10459, speaker "Dmitriy Grankin"):

// final pending draft before confirmation
{ "absolute_start_time": "2026-04-20T19:35:51.013Z",
  "absolute_end_time":   "2026-04-20T19:35:54.373Z" }   // 3.4 s window

// confirmed (same utterance)
{ "absolute_start_time": "2026-04-20T19:35:51.013Z",
  "absolute_end_time":   "2026-04-20T19:36:51.689Z" }   // 60.7 s window

So a single confirmed segment can span 60+ seconds of speech, finalizing long after the speech began. If the user switched versions during that 60-second window, the whole segment lands on the new version.

Reproduction (manual)

t=0   view version A → in_review = A
t=2   speak "mk020_0020 looks great"             [Vexa: pending draft]
t=15  click version B → in_review = B           (PUT /playlists/<id>/metadata)
t=17  speak "but mk020_0250 is off"              [Vexa: new pending]
t=62  (no user action)                           [Vexa: confirmed segment,
                                                  absolute_start_time at t=2,
                                                  absolute_end_time at t=62,
                                                  text spans both utterances]
      → DNA reads metadata.in_review = B         → stored as version_id=B

Expected: feedback about version A is stored under A. Actual: all 60 s of speech attributed to B.

Expected behaviour

A confirmed segment should be stored against the version that was in review during the speech, not during the confirmation.

If the speech straddles a version switch (e.g., 10 s under A and 50 s under B), either:

  • (a) assign to whichever version covered the majority of [absolute_start_time, absolute_end_time], OR
  • (b) split the segment at the boundary (requires Vexa word-level timestamps — out of scope for a first pass).

(a) is sufficient for correct LLM attribution in practice; (b) is a future refinement.
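As a rough illustration of option (a), the majority rule can be computed by summing how long each version's in-review spans overlap the utterance. This is only a sketch: the `Span` tuple shape, the `majority_version` name, and the placeholder version ids are assumptions for the example, not part of the existing code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical span shape: (version_id, started_at, ended_at).
# ended_at is None while the span is still open.
Span = tuple[int, datetime, Optional[datetime]]


def majority_version(spans: list[Span], start: datetime, end: datetime) -> Optional[int]:
    """Return the version whose in-review time overlaps [start, end] the longest."""
    overlap: dict[int, timedelta] = {}
    for version_id, span_start, span_end in spans:
        lo = max(start, span_start)
        hi = end if span_end is None else min(end, span_end)
        if hi > lo:  # ignore spans that do not intersect the utterance
            overlap[version_id] = overlap.get(version_id, timedelta()) + (hi - lo)
    if not overlap:
        return None  # no history covers this utterance
    return max(overlap, key=lambda v: overlap[v])


# The straddling example above: 10 s under A, then 50 s under B.
VERSION_A, VERSION_B = 1, 2  # placeholder ids
t0 = datetime(2026, 4, 20, 19, 35, tzinfo=timezone.utc)
switch = t0 + timedelta(seconds=10)
spans = [(VERSION_A, t0 - timedelta(minutes=5), switch), (VERSION_B, switch, None)]
winner = majority_version(spans, t0, t0 + timedelta(seconds=60))
print(winner)  # 2 (VERSION_B covers 50 of the 60 seconds)
```

With exactly one switch inside the utterance, looking up the version at the utterance midpoint gives the same answer as this overlap sum; the two only diverge when several switches fall inside one segment.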

Proposed fix

Track in_review history and look up the historical value by segment timestamp at save time.

1. New collection playlist_metadata_history

Append-only log of in_review transitions per playlist.

{
  "_id": ObjectId(...),
  "playlist_id": 45,
  "version_id": 6991,                   // the in_review value during this span
  "started_at": "2026-04-20T19:35:00Z",
  "ended_at":   "2026-04-20T19:50:12Z"  // null if still active
}

Compound index: {playlist_id: 1, started_at: 1} (also supports range queries).

2. Storage provider additions

from datetime import datetime
from typing import Optional

async def append_in_review_history(self, playlist_id: int, version_id: int, at: datetime) -> None:
    """Close the open row for this playlist (ended_at=at) and insert a new
    row with started_at=at, version_id=new value."""

async def get_in_review_at(self, playlist_id: int, at: datetime) -> Optional[int]:
    """Return the version_id that was in_review at the given instant, or
    None if unknown (no history yet / pre-history segment)."""
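To make the intended semantics of these two methods concrete, here is a minimal in-memory sketch. The `InMemoryInReviewHistory` and `HistoryRow` names are hypothetical stand-ins for the real storage provider, the version id 6992 is made up for the demo, and treating spans as half-open (`started_at <= at < ended_at`) is an assumption the issue does not settle.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HistoryRow:
    playlist_id: int
    version_id: int
    started_at: datetime
    ended_at: Optional[datetime] = None  # None while this span is still active


class InMemoryInReviewHistory:
    """Hypothetical stand-in for the storage provider (no database involved)."""

    def __init__(self) -> None:
        self._rows: list[HistoryRow] = []

    async def append_in_review_history(self, playlist_id: int, version_id: int, at: datetime) -> None:
        # Close the currently open row for this playlist, if any.
        for row in self._rows:
            if row.playlist_id == playlist_id and row.ended_at is None:
                row.ended_at = at
        self._rows.append(HistoryRow(playlist_id, version_id, at))

    async def get_in_review_at(self, playlist_id: int, at: datetime) -> Optional[int]:
        # Assumed half-open spans: the switch instant belongs to the new version.
        for row in self._rows:
            if row.playlist_id != playlist_id:
                continue
            if row.started_at <= at and (row.ended_at is None or at < row.ended_at):
                return row.version_id
        return None  # pre-history instant or unknown playlist


async def demo() -> tuple:
    history = InMemoryInReviewHistory()
    t0 = datetime(2026, 4, 20, 19, 35, tzinfo=timezone.utc)
    await history.append_in_review_history(45, 6991, t0)
    await history.append_in_review_history(45, 6992, t0 + timedelta(seconds=15))
    return (
        await history.get_in_review_at(45, t0 + timedelta(seconds=5)),   # inside first span
        await history.get_in_review_at(45, t0 + timedelta(seconds=30)),  # inside open span
        await history.get_in_review_at(45, t0 - timedelta(seconds=1)),   # before any history
    )


results = asyncio.run(demo())
print(results)  # (6991, 6992, None)
```

A real implementation would presumably replace the linear scan with a query served by the compound index described in step 1.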

3. Wire-up on in_review change

In upsert_playlist_metadata (or at the endpoint level), when the new in_review differs from the existing one, call append_in_review_history(playlist_id, new_in_review, now).
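The change check could look roughly like the sketch below. `record_in_review_change` and the `append_history` callback are hypothetical names introduced for illustration, and skipping a cleared `in_review` (new value `None`) is an assumption; the issue does not say how that case should close the open span.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, Optional

AppendFn = Callable[[int, int, datetime], Awaitable[None]]


async def record_in_review_change(
    playlist_id: int,
    old_in_review: Optional[int],
    new_in_review: Optional[int],
    append_history: AppendFn,
    at: Optional[datetime] = None,
) -> bool:
    """Append a history row only when in_review actually changes.

    Returns True if a row was written. A cleared in_review (None) is
    skipped here; handling that case is left open in this sketch.
    """
    if new_in_review is None or new_in_review == old_in_review:
        return False
    await append_history(playlist_id, new_in_review, at or datetime.now(timezone.utc))
    return True


async def demo() -> list:
    calls = []

    async def fake_append(playlist_id: int, version_id: int, at: datetime) -> None:
        calls.append((playlist_id, version_id))

    await record_in_review_change(45, None, 6991, fake_append)  # first value: written
    await record_in_review_change(45, 6991, 6991, fake_append)  # idempotent PUT: skipped
    await record_in_review_change(45, 6991, 6992, fake_append)  # real switch: written
    return calls


calls = asyncio.run(demo())
print(calls)  # [(45, 6991), (45, 6992)]
```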

4. Route segments at save time by speech timestamp

In on_transcription_updated:

for seg in confirmed:
    ...
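    # Look up the version at the utterance midpoint. With a single switch
    # inside the segment this matches option (a): the midpoint falls in the
    # version that covered the majority of the utterance. Falls back to the
    # current metadata.in_review if the history lookup returns nothing
    # (first run, clock skew, pre-history).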
    start = datetime.fromisoformat(absolute_start_time.replace("Z", "+00:00"))
    end   = datetime.fromisoformat(absolute_end_time.replace("Z", "+00:00"))
    midpoint = start + (end - start) / 2
    seg_version = await self.storage_provider.get_in_review_at(playlist_id, midpoint)
    if seg_version is None:
        seg_version = metadata.in_review            # fallback
    await self.storage_provider.upsert_segment(
        playlist_id=playlist_id,
        version_id=seg_version,
        segment_id=segment_id,
        data=segment_create,
    )
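As a quick sanity check of the timestamp handling, the midpoint calculation can be run on the confirmed sample from "Why this drifts". The helper names `parse_vexa_timestamp` and `utterance_midpoint` are made up for this sketch; only the `replace("Z", "+00:00")` approach comes from the snippet above.

```python
from datetime import datetime, timezone


def parse_vexa_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing "Z" into an aware datetime."""
    # Before Python 3.11, datetime.fromisoformat rejects the "Z" suffix,
    # so it is rewritten to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def utterance_midpoint(start_iso: str, end_iso: str) -> datetime:
    """Return the instant halfway between the segment's start and end."""
    start = parse_vexa_timestamp(start_iso)
    end = parse_vexa_timestamp(end_iso)
    return start + (end - start) / 2


mid = utterance_midpoint("2026-04-20T19:35:51.013Z", "2026-04-20T19:36:51.689Z")
print(mid.isoformat())  # 2026-04-20T19:36:21.351000+00:00
```

For that 60.7 s segment the lookup instant is about 30 s after speech began, so a switch made in the first half of the utterance still routes the whole segment to the newer version.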

5. Migration

No data migration required — existing segments keep their current version_id. History is only consulted for future saves. On first bot dispatch after deploy, seed an initial history row from current metadata.in_review (or let the fallback handle it).

Acceptance

  • Speaking a sentence while viewing A, then switching to B after the utterance midpoint, then pausing: the confirmed segment (which Vexa finalises after the switch) is stored with version_id = A because the utterance midpoint falls in A's span.
  • Speaking across a switch: the segment is assigned to whichever version covered the majority of the utterance duration.
  • No regression on playlists that haven't had a version switch during the transcription.
  • Unit tests covering: (a) single-version happy path, (b) mid-utterance switch, (c) fallback when history is missing.
  • playlist_metadata_history row written on each actual in_review change (not on idempotent PUTs that don't change the value).

Out of scope

  • Splitting segments at a boundary (requires Vexa word-level timestamps — separate issue).
  • Retroactive re-attribution of existing segments (cost/benefit unclear; probably a one-off script later if users want it).

Labels

bug · Backend
