Transcript segments mapped to wrong version when user switches versions mid-utterance (decision made at confirmation time, not speech time)

## Summary

Transcript segments are assigned to the wrong version when the user switches versions while speaking. Vexa's pending-to-confirmed latency is long (10–60 s observed), so a segment finalized **now** often represents speech that began while a **previous** version was `in_review`. We currently tag the segment with whichever version is in review at **confirmation** time, not at **speech** time.

Downstream: LLM note generation for version A misses feedback that was spoken while A was the in-review version (because the segment landed on B); notes for B include feedback that was actually about A.

## Current behaviour (the bug)

`backend/src/dna/transcription_service.py::on_transcription_updated` — line 256:

```python
metadata = await self.storage_provider.get_playlist_metadata(playlist_id)
...
version_id = metadata.in_review        # ← read ONCE per tick
...
for seg in confirmed:
    ...
    await self.storage_provider.upsert_segment(
        playlist_id=playlist_id,
        version_id=version_id,          # ← same version_id for every seg in the tick
        segment_id=segment_id,
        data=segment_create,
    )
```

Every segment in a single Vexa tick gets the same `version_id`, chosen by reading `metadata.in_review` exactly once at the top of the handler. The segment's own `absolute_start_time` (when the speech happened) is ignored for routing.

## Why this drifts

Vexa's pending-to-confirmed latency is **not small**. Sampled pair from the live Vexa Cloud feed (Google Meet, vexa_meeting_id=10459, speaker "Dmitriy Grankin"):

```json
// final pending draft before confirmation
{ "absolute_start_time": "2026-04-20T19:35:51.013Z",
  "absolute_end_time":   "2026-04-20T19:35:54.373Z" }   // 3.4 s window

// confirmed (same utterance)
{ "absolute_start_time": "2026-04-20T19:35:51.013Z",
  "absolute_end_time":   "2026-04-20T19:36:51.689Z" }   // 60.7 s window
```

So a single confirmed segment can span 60+ seconds of speech, finalizing long after the speech began. If the user switched versions during that 60-second window, the whole segment lands on the **new** version.
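As a quick check of the window sizes quoted above, the durations can be recomputed from the sampled timestamps. This is only an illustrative sketch; `parse_utc` and `window_seconds` are hypothetical helper names, not functions from the DNA codebase.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def window_seconds(start: str, end: str) -> float:
    """Length of the [start, end] window in seconds."""
    return (parse_utc(end) - parse_utc(start)).total_seconds()


# Timestamps copied from the sampled pending/confirmed pair above.
pending = window_seconds("2026-04-20T19:35:51.013Z", "2026-04-20T19:35:54.373Z")
confirmed = window_seconds("2026-04-20T19:35:51.013Z", "2026-04-20T19:36:51.689Z")
print(f"pending={pending:.2f}s confirmed={confirmed:.2f}s")
```

The same start time with a much later end time is what makes confirmation-time routing unsafe: the confirmed window grows while `in_review` may have changed underneath it.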

### Reproduction (manual)

```
t=0   view version A → in_review = A
t=2   speak "mk020_0020 looks great"             [Vexa: pending draft]
t=15  click version B → in_review = B           (PUT /playlists/<id>/metadata)
t=17  speak "but mk020_0250 is off"              [Vexa: new pending]
t=62  (no user action)                           [Vexa: confirmed segment,
                                                  absolute_start_time at t=2,
                                                  absolute_end_time at t=62,
                                                  text spans both utterances]
      → DNA reads metadata.in_review = B         → stored as version_id=B
```

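Expected: feedback about version A is stored under A. Actual: all 60 s of speech is attributed to B. Note that with these timings A covers only 13 s of the 60 s segment (t=2 to t=15), so the majority rule in option (a) below would still assign the segment to B; recovering the A feedback in this exact case needs the split described in option (b).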

## Expected behaviour

A confirmed segment should be stored against the version that was in review **during the speech**, not during the confirmation.

If the speech straddles a version switch (e.g., 10 s under A and 50 s under B), either:

- (a) assign to whichever version covered the **majority** of `[absolute_start_time, absolute_end_time]`, OR
- (b) split the segment at the boundary (requires Vexa word-level timestamps — out of scope for a first pass).

(a) is sufficient for correct LLM attribution in practice; (b) is a future refinement.
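Option (a) can be sketched as an overlap computation over the `in_review` spans. This is a minimal sketch under stated assumptions: the `Span` tuple shape and the `majority_version` name are hypothetical, the version ids are illustrative, and an open span (no `ended_at`) is assumed to extend to the end of the segment.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# (version_id, started_at, ended_at); ended_at is None while the span is open.
Span = Tuple[int, datetime, Optional[datetime]]


def majority_version(spans: List[Span], start: datetime, end: datetime) -> Optional[int]:
    """Return the version whose in-review time overlaps [start, end] the most."""
    totals: Dict[int, float] = {}
    for version_id, span_start, span_end in spans:
        # Assumption: an open span is treated as lasting until the segment ends.
        effective_end = end if span_end is None else min(span_end, end)
        overlap = (effective_end - max(span_start, start)).total_seconds()
        if overlap > 0:
            totals[version_id] = totals.get(version_id, 0.0) + overlap
    if not totals:
        return None  # no history covers this segment
    return max(totals, key=totals.get)


t0 = datetime(2026, 4, 20, 19, 35, 0, tzinfo=timezone.utc)
switch = t0 + timedelta(seconds=10)
spans = [
    (6991, t0 - timedelta(minutes=5), switch),  # version A
    (6992, switch, None),                       # version B, still active
]
# 10 s under A and 50 s under B: B holds the majority.
straddling = majority_version(spans, t0, t0 + timedelta(seconds=60))
# 10 s under A and 5 s under B: A holds the majority.
mostly_a = majority_version(spans, t0, t0 + timedelta(seconds=15))
print(straddling, mostly_a)
```

On an exact tie this sketch returns whichever version was seen first; a real implementation would need to pick a tie-breaking rule deliberately.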

## Proposed fix

Track `in_review` **history** and look up the historical value by segment timestamp at save time.

### 1. New collection `playlist_metadata_history`

Append-only log of `in_review` transitions per playlist.

```jsonc
{
  "_id": ObjectId(...),
  "playlist_id": 45,
  "version_id": 6991,                   // the in_review value during this span
  "started_at": "2026-04-20T19:35:00Z",
  "ended_at":   "2026-04-20T19:50:12Z"  // null if still active
}
```

Compound index: `{playlist_id: 1, started_at: 1}` (also supports range queries).

### 2. Storage provider additions

```python
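from datetime import datetime
from typing import Optional

# Methods to add to the storage provider class.
async def append_in_review_history(self, playlist_id: int, version_id: int, at: datetime) -> None: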
    """Close the open row for this playlist (ended_at=at) and insert a new
    row with started_at=at, version_id=new value."""

async def get_in_review_at(self, playlist_id: int, at: datetime) -> Optional[int]:
    """Return the version_id that was in_review at the given instant, or
    None if unknown (no history yet / pre-history segment)."""
```
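To make the intended semantics of the two methods concrete, here is an in-memory sketch. It is not the real storage provider: `InMemoryInReviewHistory` and `HistoryRow` are hypothetical names, the half-open interval `[started_at, ended_at)` is an assumption the issue does not spell out, and a database-backed version would query the collection instead of scanning a list.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class HistoryRow:
    version_id: int
    started_at: datetime
    ended_at: Optional[datetime] = None  # None while this span is still active


class InMemoryInReviewHistory:
    """Hypothetical stand-in for the storage provider's history methods."""

    def __init__(self) -> None:
        self._rows: Dict[int, List[HistoryRow]] = {}

    async def append_in_review_history(self, playlist_id: int, version_id: int, at: datetime) -> None:
        rows = self._rows.setdefault(playlist_id, [])
        if rows and rows[-1].ended_at is None:
            rows[-1].ended_at = at  # close the currently open row
        rows.append(HistoryRow(version_id=version_id, started_at=at))

    async def get_in_review_at(self, playlist_id: int, at: datetime) -> Optional[int]:
        for row in self._rows.get(playlist_id, []):
            # Assumed half-open interval: the switch instant belongs to the new version.
            if row.started_at <= at and (row.ended_at is None or at < row.ended_at):
                return row.version_id
        return None  # before the first recorded row, or no history at all


async def _demo() -> list:
    store = InMemoryInReviewHistory()
    t0 = datetime(2026, 4, 20, 19, 35, 0, tzinfo=timezone.utc)
    await store.append_in_review_history(45, 6991, t0)
    await store.append_in_review_history(45, 6992, t0 + timedelta(seconds=15))
    return [
        await store.get_in_review_at(45, t0 - timedelta(seconds=1)),   # pre-history
        await store.get_in_review_at(45, t0 + timedelta(seconds=5)),   # first span
        await store.get_in_review_at(45, t0 + timedelta(seconds=15)),  # switch instant
    ]


results = asyncio.run(_demo())
print(results)
```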

### 3. Wire-up on in_review change

In `upsert_playlist_metadata` (or at the endpoint level), when the new `in_review` differs from the existing one, call `append_in_review_history(playlist_id, new_in_review, now)`.
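A possible shape for that guard is sketched below. `record_in_review_change` is a hypothetical helper, the `append` callable stands in for `append_in_review_history`, and skipping a cleared (`None`) `in_review` is an assumption, since the issue does not say how clearing should be recorded.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable, List, Optional, Tuple

AppendFn = Callable[[int, int, datetime], Awaitable[None]]


async def record_in_review_change(
    playlist_id: int,
    old_in_review: Optional[int],
    new_in_review: Optional[int],
    append: AppendFn,
    at: Optional[datetime] = None,
) -> bool:
    """Append a history row only when in_review actually changes.

    Returns True if a row was written.
    """
    # Assumption: a cleared in_review (None) is not recorded in this sketch.
    if new_in_review is None or new_in_review == old_in_review:
        return False
    when = at if at is not None else datetime.now(timezone.utc)
    await append(playlist_id, new_in_review, when)
    return True


async def _demo() -> Tuple[List[bool], List[tuple]]:
    rows: List[tuple] = []

    async def fake_append(playlist_id: int, version_id: int, at: datetime) -> None:
        rows.append((playlist_id, version_id, at))

    outcomes = [
        await record_in_review_change(45, None, 6991, fake_append),  # first value
        await record_in_review_change(45, 6991, 6991, fake_append),  # idempotent PUT
        await record_in_review_change(45, 6991, 6992, fake_append),  # real switch
    ]
    return outcomes, rows


outcomes, rows = asyncio.run(_demo())
print(outcomes, [version for _, version, _ in rows])
```

Taking the timestamp on the server in UTC keeps it comparable with the UTC `absolute_start_time` / `absolute_end_time` values on the segments.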

### 4. Route segments at save time by speech timestamp

In `on_transcription_updated`:

```python
for seg in confirmed:
    ...
    # Route by the utterance midpoint. With at most one switch inside the
    # segment this matches the majority rule (option (a) above). Falls back to
    # the current metadata.in_review if the history lookup returns nothing
    # (first run, clock skew, pre-history).
    start = datetime.fromisoformat(absolute_start_time.replace("Z", "+00:00"))
    end   = datetime.fromisoformat(absolute_end_time.replace("Z", "+00:00"))
    midpoint = start + (end - start) / 2
    seg_version = await self.storage_provider.get_in_review_at(playlist_id, midpoint)
    if seg_version is None:
        seg_version = metadata.in_review            # fallback
    await self.storage_provider.upsert_segment(
        playlist_id=playlist_id,
        version_id=seg_version,
        segment_id=segment_id,
        data=segment_create,
    )
```
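The routing decision can also be isolated into a small function that is easy to unit test. The sketch below is illustrative only: `resolve_segment_version`, `utterance_midpoint` and `parse_utc` are hypothetical helpers, the `lookup` callable stands in for `get_in_review_at`, and the switch instant and version ids are made up for the demo.

```python
import asyncio
from datetime import datetime
from typing import Awaitable, Callable, Optional

Lookup = Callable[[int, datetime], Awaitable[Optional[int]]]


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def utterance_midpoint(start_iso: str, end_iso: str) -> datetime:
    start, end = parse_utc(start_iso), parse_utc(end_iso)
    return start + (end - start) / 2


async def resolve_segment_version(
    playlist_id: int,
    start_iso: str,
    end_iso: str,
    lookup: Lookup,
    fallback_version: Optional[int],
) -> Optional[int]:
    """Pick the version in review at the utterance midpoint, else the fallback."""
    version = await lookup(playlist_id, utterance_midpoint(start_iso, end_iso))
    return fallback_version if version is None else version


SWITCH_AT = parse_utc("2026-04-20T19:36:00Z")  # hypothetical switch instant


async def fake_lookup(playlist_id: int, at: datetime) -> Optional[int]:
    return 6991 if at < SWITCH_AT else 6992


async def empty_lookup(playlist_id: int, at: datetime) -> Optional[int]:
    return None  # simulates missing history


async def _demo() -> tuple:
    start = "2026-04-20T19:35:51.013Z"
    short = await resolve_segment_version(45, start, "2026-04-20T19:35:54.373Z", fake_lookup, 7000)
    long_ = await resolve_segment_version(45, start, "2026-04-20T19:36:51.689Z", fake_lookup, 7000)
    missing = await resolve_segment_version(45, start, "2026-04-20T19:36:51.689Z", empty_lookup, 7000)
    return short, long_, missing


short, long_, missing = asyncio.run(_demo())
print(short, long_, missing)
```

Keeping the midpoint and fallback logic out of the handler loop means the three unit-test cases in the acceptance list can be exercised without a Vexa feed or a database.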

### 5. Migration

No data migration required — existing segments keep their current `version_id`. History is only consulted for future saves. On first bot dispatch after deploy, seed an initial history row from current `metadata.in_review` (or let the fallback handle it).

## Acceptance

- [ ] Speaking a sentence while viewing A, then switching to B mid-sentence, then pausing: the confirmed segment (which Vexa finalises after the switch) is stored with `version_id = A` because the utterance midpoint falls in A's span.
- [ ] Speaking across a switch: the segment is assigned to whichever version covered the majority of the utterance duration.
- [ ] No regression on playlists that haven't had a version switch during the transcription.
- [ ] Unit tests covering: (a) single-version happy path, (b) mid-utterance switch, (c) fallback when history is missing.
- [ ] `playlist_metadata_history` row written on each actual `in_review` change (not on idempotent PUTs that don't change the value).

## Out of scope

- Splitting segments at a boundary (requires Vexa word-level timestamps — separate issue).
- Retroactive re-attribution of existing segments (cost/benefit unclear; probably a one-off script later if users want it).

## Related

- #135 / PR #139 — transcript pipeline passthrough refactor (prerequisite; this issue is layered on top).
- `backend/docs/TRANSCRIPT_MESSAGE_FLOW.md` — describes the current flow this issue modifies.

## Labels

`bug` · `Backend`

