Changes from all commits — 19 commits
7ac6e2f
test(transcription): failing cases for build_transcript_payload
thc1006 Apr 18, 2026
13ba7c2
feat(transcription): add build_transcript_payload helper
thc1006 Apr 18, 2026
b2b675c
test(storage): failing cases for published_transcripts persistence
thc1006 Apr 18, 2026
69c9b9e
feat(storage): persist published-transcript bookkeeping in mongo
thc1006 Apr 18, 2026
26e1aa3
test(prodtrack): failing cases for publish_transcript / update_transc…
thc1006 Apr 18, 2026
c6eeabf
feat(prodtrack): publish_transcript / update_transcript on contract + SG
thc1006 Apr 18, 2026
dc5ee7b
test(api): failing cases for POST /playlists/{id}/publish-transcript
thc1006 Apr 18, 2026
abd142a
feat(api): POST /playlists/{id}/publish-transcript, flag-gated
thc1006 Apr 18, 2026
04a7f27
test(core): failing case for publishTranscript on ApiHandler
thc1006 Apr 18, 2026
6500a41
feat(core): PublishTranscript types + ApiHandler.publishTranscript
thc1006 Apr 18, 2026
6cba7c2
test(hooks): failing cases for usePublishTranscript
thc1006 Apr 18, 2026
737c5ec
test(ui): failing cases for PublishTranscriptDialog
thc1006 Apr 18, 2026
39d8247
feat(app): PublishTranscriptDialog + trigger in TranscriptPanel
thc1006 Apr 18, 2026
44a3ba3
docs(transcript): QUICKSTART, DEPLOYMENT, pipeline + ADR-005/006/007
thc1006 Apr 18, 2026
3407ee2
fix(transcript-publish): review-pass corrections
thc1006 Apr 18, 2026
3708264
fix(transcript-publish): second-pass corrections
thc1006 Apr 18, 2026
4c416da
fix(transcript-publish): third-pass corrections
thc1006 Apr 18, 2026
a366612
fix(transcript-publish): a11y + doc drift
thc1006 Apr 18, 2026
6faac71
style(tests): sort from-main import into third-party block
thc1006 Apr 19, 2026
46 changes: 46 additions & 0 deletions DEPLOYMENT.md
@@ -230,6 +230,52 @@ echo -n "new-value" | gcloud secrets versions add SECRET_NAME --data-file=-

---

## Transcript Publishing Setup (optional, issue #120)

`POST /playlists/{id}/publish-transcript` is feature-flagged off by default.
Turn it on only after the ShotGrid site is prepared.

### ShotGrid site-side checklist

1. In **Site Preferences -> Entities**, enable one of the `CustomEntityNN`
slots and set its display name (e.g. "DNA Note"). Note the slot
number — the API still addresses it as `CustomEntityNN`, not the
display name.
2. On that custom entity, add the following fields:
- `code` (text, built-in)
- `project` (entity link -> Project, built-in)
- `sg_playlist` (entity link -> Playlist)
- `sg_versions` (multi-entity link -> Version)
- `sg_meeting_id` (text)
- `sg_meeting_date` (date)
- `sg_platform` (list: `google_meet`, `teams`)
   - `sg_summary` (text, long; left blank in V1, users fill it in manually)
- `sg_transcript_body` (text, long)
3. Grant the DNA script user read/create/update on the new entity.

### DNA side

Set both variables. The endpoint stays 404 without the flag.

```
DNA_ENABLE_TRANSCRIPT_PUBLISH=true
SHOTGRID_TRANSCRIPT_ENTITY=CustomEntity05 # whichever slot you enabled
```

For the frontend build, also set the Vite flag so the Publish button
renders:

```
VITE_ENABLE_TRANSCRIPT_PUBLISH=true
```

If the flag is off or the custom entity has not been provisioned, the
backend returns 404 on that route; the frontend does not show the
Publish button. Dropping the flag reverts behaviour with no data
migration.
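
The flag and entity-slot variables above can be read with a small sketch like the following. This is a hypothetical helper, not the PR's actual code; the function names are assumptions, but the defaults (`false` flag, `CustomEntity01` slot) match the documented behaviour.

```python
import os


def transcript_publish_enabled() -> bool:
    # The endpoint behaves as absent (404) unless the flag is
    # explicitly the string "true".
    return os.getenv("DNA_ENABLE_TRANSCRIPT_PUBLISH", "false").lower() == "true"


def transcript_entity() -> str:
    # Falls back to the documented default slot when unset.
    return os.getenv("SHOTGRID_TRANSCRIPT_ENTITY", "CustomEntity01")
```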

---

## Authentication Setup

DNA uses Google OAuth for authentication. Users sign in with their Google accounts, and the backend validates Google tokens.
2 changes: 2 additions & 0 deletions QUICKSTART.md
@@ -152,6 +152,8 @@ The React app will be available at `http://localhost:5173`.
| `GEMINI_MODEL` | No | `gemini-2.5-flash` | Gemini model to use when `LLM_PROVIDER=gemini` |
| `GEMINI_TIMEOUT` | No | `30.0` | Request timeout in seconds when `LLM_PROVIDER=gemini` |
| `GEMINI_URL` | No | `https://generativelanguage.googleapis.com/v1beta/openai/` | Override the Gemini OpenAI-compatible base URL |
| `DNA_ENABLE_TRANSCRIPT_PUBLISH` | No | `false` | Set to `true` to enable `POST /playlists/{id}/publish-transcript`. When off, the endpoint returns 404. |
| `SHOTGRID_TRANSCRIPT_ENTITY` | No | `CustomEntity01` | ShotGrid custom entity slot used when publishing transcripts. Match whichever `CustomEntityNN` the site admin has enabled. |
| `PYTHONUNBUFFERED` | No | `1` | Disable Python output buffering |

### Vexa Service (`vexa` service)
92 changes: 92 additions & 0 deletions backend/docs/TRANSCRIPTION_PIPELINE.md
@@ -1340,3 +1340,95 @@ logging.getLogger("dna.events.event_publisher").setLevel(logging.DEBUG)
- The bot remains in the meeting during pause, ready to resume instantly
- `transcription_resumed_at` prevents replay of stale segments
- Minimal state changes: only a boolean flag and an optional timestamp

---

## Publishing to the Production Tracking System

Tracked by issue #120. Off by default behind `DNA_ENABLE_TRANSCRIPT_PUBLISH=true`.

### Pipeline

```
POST /playlists/{playlist_id}/publish-transcript {version_id}
-> storage.get_playlist_metadata(playlist_id) # meeting_id, platform
-> storage.get_segments_for_version(...) # existing call
-> build_transcript_payload(segments) # pure, dedupe + collapse
-> storage.get_published_transcript(...) # bookkeeping lookup
-> prodtrack.publish_transcript(entity_type from env, ...)
# create path: reads SHOTGRID_TRANSCRIPT_ENTITY
/ prodtrack.update_transcript(entity_type=existing.sg_entity_type, ...)
# update path: honours the bookkeeping row, not the current env
-> storage.upsert_published_transcript(...)
-> { transcript_entity_id, outcome: created | updated | skipped }
```
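
The created / updated / skipped branch in the flow above can be condensed into one pure function. This is a sketch under the assumption that the decision depends only on the stored `body_hash` from the bookkeeping row; the function name `decide_outcome` is hypothetical, not the PR's API.

```python
import hashlib
from typing import Optional


def decide_outcome(body: str, existing_hash: Optional[str]) -> tuple[str, str]:
    """Return (outcome, body_hash) for a publish attempt.

    'created' when no bookkeeping row exists, 'skipped' when the stored
    hash matches the new body, 'updated' otherwise.
    """
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if existing_hash is None:
        return ("created", body_hash)
    if existing_hash == body_hash:
        return ("skipped", body_hash)
    return ("updated", body_hash)
```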

### Collections touched

| Collection | Used for |
|------------|----------|
| `segments` | Source of the transcript body (read-only here) |
| `playlist_metadata` | Pulls `meeting_id` + `platform` |
| `published_transcripts` | Stores the SG entity ID and body_hash per `(playlist_id, version_id, meeting_id)` |

### ShotGrid side

Publishes a row into `SHOTGRID_TRANSCRIPT_ENTITY` (default `CustomEntity01`).
Payload mapping:

| DNA field | ShotGrid field |
|-----------|----------------|
| `code` (auto) | `code` |
| `project_id` | `project` |
| `playlist_id` | `sg_playlist` |
| `[version_id]` | `sg_versions` |
| `meeting_id` | `sg_meeting_id` |
| `meeting_date` | `sg_meeting_date` |
| `platform` | `sg_platform` |
| `body` | `sg_transcript_body` |

`sg_summary` is intentionally left blank in V1 so studio staff can fill it
on the ShotGrid side without the publisher overwriting it.
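
The mapping table above can be sketched as a payload builder. The field keys come from the site-setup checklist; the entity-link dict shape matches the ShotGrid API's usual `{"type", "id"}` convention. The `code` format and the function name are assumptions for illustration only.

```python
from datetime import date


def to_sg_payload(
    project_id: int,
    playlist_id: int,
    version_id: int,
    meeting_id: str,
    meeting_date: date,
    platform: str,
    body: str,
) -> dict:
    # sg_summary is deliberately omitted so manual edits on the
    # ShotGrid side survive a re-publish.
    return {
        "code": f"Transcript {meeting_id} v{version_id}",  # assumed format
        "project": {"type": "Project", "id": project_id},
        "sg_playlist": {"type": "Playlist", "id": playlist_id},
        "sg_versions": [{"type": "Version", "id": version_id}],
        "sg_meeting_id": meeting_id,
        "sg_meeting_date": meeting_date.isoformat(),
        "sg_platform": platform,
        "sg_transcript_body": body,
    }
```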

### ADR-005: Custom entity, not a ShotGrid Note

**Decision:** Transcripts live in a custom entity (configurable via
`SHOTGRID_TRANSCRIPT_ENTITY`), not as ShotGrid `Note` rows.

**Rationale:**
- Notes are tied to review addressings and read state; transcripts are
reference material with different fields.
- Admins can restrict the custom-entity page per the mockup on #120
without affecting Notes.
- The field shape (playlist link + multi-version link + `sg_platform`
list + long `sg_transcript_body`) does not fit Note's schema.

### ADR-006: Idempotence via body_hash in MongoDB, not SG lookup

**Decision:** Track which `(playlist, version, meeting)` tuples have
been published in a local Mongo collection. Skip re-publish when the
new body_hash matches the stored one. The bookkeeping row also stores
`sg_entity_type`; the update path uses that value instead of the
current `SHOTGRID_TRANSCRIPT_ENTITY` env so studios can migrate to a
new custom-entity slot without breaking updates on already-published
rows.

**Rationale:**
- SG is not efficiently queryable for "has this been published before".
- The existing DraftNote publish path uses the same pattern
(`published_note_id` on the draft).
- Loss of the Mongo row is a known edge-case; duplicate SG rows in that
scenario are an acceptable V1 trade-off documented on issue #120.
- Pinning the entity_type to the bookkeeping row (not env) prevents
misdirected updates after a slot migration.

### ADR-007: Build publishable body at publish time, not ingest time

**Decision:** `build_transcript_payload` is called inside the publish
endpoint, not in the ingest pipeline.

**Rationale:**
- Dedup rules may change once issue #135 lands (Vexa-side segment IDs
become authoritative). Keeping the builder isolated means that change
is one file here rather than a re-ingest.
- The builder is pure and trivially testable, unlike the ingest loop.
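
A pure builder of the kind ADR-007 describes might look like the sketch below. The dedupe and collapse rules here (skip empty or immediately repeated texts, merge consecutive same-speaker lines) are illustrative assumptions, not the PR's actual `build_transcript_payload` logic, which may change once issue #135 lands.

```python
def build_transcript_body(segments: list[dict]) -> str:
    """Sketch of a pure segments -> body builder (hypothetical rules)."""
    lines: list[str] = []
    last_speaker = None
    last_text = None
    for seg in segments:
        speaker, text = seg.get("speaker", "?"), seg["text"].strip()
        if not text or text == last_text:
            continue  # skip empties and back-to-back duplicates
        if speaker == last_speaker:
            lines[-1] += " " + text  # collapse same-speaker runs
        else:
            lines.append(f"{speaker}: {text}")
        last_speaker, last_text = speaker, text
    return "\n".join(lines)
```

Because the builder takes plain dicts and returns a string, it tests without any storage or ShotGrid fixture, which is the point of keeping it out of the ingest loop.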
5 changes: 5 additions & 0 deletions backend/example.docker-compose.local.yml
@@ -14,3 +14,8 @@ services:
- VEXA_API_URL=http://vexa:8056
- OPENAI_API_KEY=your-openai-api-key
- AUTH_PROVIDER=none
# Transcript publishing (V1, disabled by default). Set to "true" to
# expose POST /playlists/{id}/publish-transcript. See DEPLOYMENT.md
# for the ShotGrid site-setup checklist the custom entity depends on.
- DNA_ENABLE_TRANSCRIPT_PUBLISH=false
- SHOTGRID_TRANSCRIPT_ENTITY=CustomEntity01
12 changes: 11 additions & 1 deletion backend/src/dna/models/__init__.py
@@ -27,6 +27,10 @@
PlaylistMetadata,
PlaylistMetadataUpdate,
)
from dna.models.published_transcript import (
PublishedTranscript,
PublishedTranscriptUpdate,
)
from dna.models.requests import (
CreateNoteRequest,
EntityLink,
@@ -36,6 +40,8 @@
GenerateNoteResponse,
PublishNotesRequest,
PublishNotesResponse,
PublishTranscriptRequest,
PublishTranscriptResponse,
SearchRequest,
SearchResult,
StatusOption,
@@ -69,6 +75,7 @@
"Version",
"Playlist",
"User",
"Transcript",
"DNAEntity",
"ENTITY_MODELS",
"EntityLink",
@@ -82,13 +89,17 @@
"StatusOption",
"PublishNotesRequest",
"PublishNotesResponse",
"PublishTranscriptRequest",
"PublishTranscriptResponse",
"DraftNote",
"DraftNoteBase",
"DraftNoteCreate",
"DraftNoteLink",
"DraftNoteUpdate",
"PlaylistMetadata",
"PlaylistMetadataUpdate",
"PublishedTranscript",
"PublishedTranscriptUpdate",
"StoredSegment",
"StoredSegmentCreate",
"generate_segment_id",
@@ -97,7 +108,6 @@
"BotStatusEnum",
"DispatchBotRequest",
"Platform",
"Transcript",
"TranscriptSegment",
"UserSettings",
"UserSettingsUpdate",
45 changes: 45 additions & 0 deletions backend/src/dna/models/published_transcript.py
@@ -0,0 +1,45 @@
"""Published transcript bookkeeping model.

Tracks which (playlist, version, meeting) has already been pushed to the
production tracking system so re-publishing can be idempotent. The actual
transcript content lives in SG; here we only keep the reference plus a
body_hash used to skip no-op re-publishes.
"""

from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class PublishedTranscriptUpdate(BaseModel):
"""Upsert payload for the published_transcripts collection."""

playlist_id: int
version_id: int
meeting_id: str
sg_entity_type: str = Field(
description="Custom entity type in the tracking system (e.g. CustomEntity01)"
)
sg_entity_id: int = Field(description="ID of the row created in tracking system")
author_email: str
body_hash: str = Field(description="sha256 of the published body for idempotence")
segments_count: int


class PublishedTranscript(BaseModel):
"""Full record for a row we have pushed to the tracking system."""

model_config = ConfigDict(populate_by_name=True)

id: str = Field(alias="_id")
playlist_id: int
version_id: int
meeting_id: str
sg_entity_type: str
sg_entity_id: int
author_email: str
body_hash: str
segments_count: int
created_at: datetime
updated_at: datetime
17 changes: 17 additions & 0 deletions backend/src/dna/models/requests.py
@@ -117,3 +117,20 @@ class PublishNotesResponse(BaseModel):
skipped_count: int
failed_count: int
total: int


class PublishTranscriptRequest(BaseModel):
"""Request to publish a version's captured transcript."""

version_id: int = Field(description="Version whose segments to publish")


class PublishTranscriptResponse(BaseModel):
"""Response from the publish-transcript endpoint."""

transcript_entity_id: int = Field(
description="Entity ID of the row in the tracking system"
)
outcome: str = Field(description="created | updated | skipped")
skipped_reason: Optional[str] = None
segments_count: int
12 changes: 12 additions & 0 deletions backend/src/dna/prodtrack_providers/mock_provider.py
@@ -596,3 +596,15 @@ def attach_file_to_note(
self, note_id: int, file_path: str, display_name: str
) -> bool:
return True

def publish_transcript(self, **_: object) -> int:
raise NotImplementedError(
"Transcript publishing requires a live ShotGrid connection. "
"Set PRODTRACK_PROVIDER=shotgrid to use it."
)

def update_transcript(self, **_: object) -> bool:
raise NotImplementedError(
"Transcript publishing requires a live ShotGrid connection. "
"Set PRODTRACK_PROVIDER=shotgrid to use it."
)
38 changes: 38 additions & 0 deletions backend/src/dna/prodtrack_providers/prodtrack_provider_base.py
@@ -1,4 +1,5 @@
import os
from datetime import date
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
@@ -190,6 +191,43 @@ def attach_file_to_note(
"""
raise NotImplementedError("Subclasses must implement this method.")

def publish_transcript(
self,
*,
project_id: int,
playlist_id: int,
version_id: int,
meeting_id: str,
meeting_date: date,
platform: str,
body: str,
) -> int:
"""Create a transcript row in the production tracking system.

Returns the entity ID of the newly-created row.
"""
raise NotImplementedError("Subclasses must implement this method.")

def update_transcript(
self,
*,
entity_type: str,
entity_id: int,
body: str,
meeting_date: date,
) -> bool:
"""Update body + meeting_date on an existing transcript entity.

`entity_type` must come from the caller's bookkeeping (whichever
custom-entity slot the row was originally created in). Reading the
current env var here would misfire if studios migrate between slots.

Only body and meeting_date are touched on purpose; summary and other
fields are left alone so manual edits on the tracking-system side
survive a re-publish.
"""
raise NotImplementedError("Subclasses must implement this method.")


def get_prodtrack_provider() -> ProdtrackProviderBase:
"""Get the production tracking provider."""