feat: transcript publishing workflow (#120) #138

Open
thc1006 wants to merge 19 commits into AcademySoftwareFoundation:main from thc1006:issue-120-transcript-publishing

@thc1006 thc1006 commented Apr 18, 2026

Summary

Adds a feature-flagged workflow that publishes a version's captured transcript into the production tracking system as a single custom-entity row per (playlist, version, meeting). Ships dark: endpoint returns 404 and the UI button is hidden unless DNA_ENABLE_TRANSCRIPT_PUBLISH=true (backend) and VITE_ENABLE_TRANSCRIPT_PUBLISH=true (frontend build) are set.

Closes #120.

What changed

Backend:

  • New pure helper build_transcript_payload in dna/transcription_publish.py (dedupe, sort, same-speaker collapse, stable body_hash).
  • New Mongo collection published_transcripts for bookkeeping, keyed by (playlist_id, version_id, meeting_id); stores the SG entity id, the entity type at create time, and the body_hash used for idempotence.
  • Two methods on ProdtrackProviderBase: publish_transcript (create) and update_transcript (patch body + meeting_date only). ShotGrid implementation targets the slot configured via SHOTGRID_TRANSCRIPT_ENTITY (default CustomEntity01); the mock provider raises NotImplementedError with a user-facing message.
  • New REST endpoint POST /playlists/{playlist_id}/publish-transcript that wires the above together. Returns created / updated / skipped (when body_hash is unchanged). Feature-flagged off by default.
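
The builder steps listed above (dedupe, sort, same-speaker collapse, stable hash) could be sketched roughly as follows. This is an illustration only: the `Segment` dataclass, its field names, the dedupe key, the `"Unknown"` speaker fallback, and the choice of SHA-256 are all assumptions, and the real `StoredSegment` model and `build_transcript_payload` in `dna/transcription_publish.py` may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for StoredSegment; field names are assumed."""

    start: float  # seconds from meeting start
    speaker: str
    text: str


def build_payload_sketch(segments: list[Segment]) -> dict:
    """Dedupe, sort, collapse same-speaker runs, and hash the body."""
    seen: set[tuple[float, str, str]] = set()
    unique: list[Segment] = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # whitespace-only segments carry nothing to publish
        key = (seg.start, seg.speaker, text)
        if key in seen:
            continue  # exact duplicate of a segment already kept
        seen.add(key)
        unique.append(Segment(seg.start, seg.speaker, text))

    # list.sort is stable, so segments with equal start keep their input order.
    unique.sort(key=lambda s: s.start)

    # Merge consecutive segments from the same speaker into one line.
    lines: list[tuple[str, str]] = []
    for seg in unique:
        speaker = seg.speaker or "Unknown"  # assumed fallback for a missing speaker
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + seg.text)
        else:
            lines.append((speaker, seg.text))

    body = "\n".join(f"{speaker}: {text}" for speaker, text in lines)
    # Hashing the final body means equal bodies always yield equal hashes.
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"body": body, "body_hash": body_hash}
```

Because the hash is taken over the normalised body rather than the raw segment list, reordered or duplicated input that produces the same text also produces the same `body_hash`, which is what lets a re-publish be skipped.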

Frontend:

  • New types PublishTranscriptRequest / PublishTranscriptResponse and ApiHandler.publishTranscript in @dna/core.
  • New usePublishTranscript hook and PublishTranscriptDialog Radix-Themes dialog in @dna/app.
  • Trigger button wired into the header of TranscriptPanel, visible only when VITE_ENABLE_TRANSCRIPT_PUBLISH=true and segments exist.

Docs:

  • QUICKSTART.md — new env-var rows.
  • DEPLOYMENT.md — ShotGrid site-setup checklist (enable custom entity, fields, script-user perms).
  • backend/docs/TRANSCRIPTION_PIPELINE.md — new "Publishing to the Production Tracking System" section and ADR-005/006/007 (custom entity, body_hash idempotence pinned to bookkeeping, build-at-publish-time).
  • backend/example.docker-compose.local.yml — the two new vars default off.

Design decisions worth flagging

  • update_transcript uses existing.sg_entity_type from the bookkeeping row, not the current env. This lets studios migrate to a new CustomEntityNN slot without breaking updates to rows created on the previous one.
  • If the SG create succeeds but the local bookkeeping upsert then fails, the endpoint surfaces a 500 whose message includes the SG entity id and logs at exception level, so operators can reconcile manually instead of blindly retrying (which would duplicate the SG row).
  • Coupling with #135 (Refactor transcript pipeline: direct Vexa WS passthrough + frontend TranscriptManager): segment reading is isolated in build_transcript_payload, so when StoredSegment changes shape the publisher is a one-file diff.
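
To make the first decision concrete, here is a rough sketch of how a publish step could choose between created, updated and skipped while routing updates through the entity type stored in the bookkeeping row. `Bookkeeping`, `RecordingProvider` and `publish` are hypothetical names for illustration; the provider signatures are assumed and are not the actual `ProdtrackProviderBase` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookkeeping:
    """Hypothetical shape of a published_transcripts row."""

    sg_entity_id: int
    sg_entity_type: str
    body_hash: str


class RecordingProvider:
    """Stand-in provider that records calls instead of talking to ShotGrid."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def publish_transcript(self, entity_type: str, body: str) -> int:
        self.calls.append(("create", entity_type))
        return 9001  # pretend the tracking system assigned this id

    def update_transcript(self, entity_id: int, *, entity_type: str, body: str) -> bool:
        self.calls.append(("update", entity_type, entity_id))
        return True


def publish(
    existing: Optional[Bookkeeping],
    body: str,
    body_hash: str,
    env_entity_type: str,
    provider: RecordingProvider,
) -> tuple[str, Bookkeeping]:
    if existing is not None and existing.body_hash == body_hash:
        return "skipped", existing  # nothing changed, no provider call
    if existing is not None:
        # Route by the type recorded at create time, not the current env value.
        ok = provider.update_transcript(
            existing.sg_entity_id, entity_type=existing.sg_entity_type, body=body
        )
        if not ok:
            raise RuntimeError("update failed; bookkeeping left untouched")
        return "updated", Bookkeeping(
            existing.sg_entity_id, existing.sg_entity_type, body_hash
        )
    new_id = provider.publish_transcript(env_entity_type, body)
    return "created", Bookkeeping(new_id, env_entity_type, body_hash)
```

In this sketch a row created on `CustomEntity01` keeps being updated on `CustomEntity01` even after the environment is switched to `CustomEntity05`; only new rows pick up the new slot.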

Testing

  • I have tested these changes locally
  • I have run all relevant automated tests
  • I have verified this does not break existing workflows
  • For changes that can be tested in UI, I have included screenshots or gif animations of the changes.
    (Deferred: the Publish button / dialog only render against a real ShotGrid site that has the custom entity provisioned. Screenshots to be added when a test site is available; see DEPLOYMENT.md for the site-setup checklist.)

How I Tested

Backend:

  • make test in the backend docker stack: 552 passed, coverage 91% (floor 90%).
  • New file coverage: transcription_publish.py 100%; published_transcript.py 100%.
  • Brought the stack up with make start-local and PRODTRACK_PROVIDER=mock:
    • curl /health returned 200.
    • POST /playlists/{id}/publish-transcript with flag off returned 404.
    • With flag on and no metadata, returned 422 with the documented reason.
    • With flag on, no segments, returned 422.
    • With flag on and the mock provider, surfaced 501 with the "live ShotGrid connection required" message.
  • Verified naive-timestamp handling under TZ=America/New_York (the failing-first test only passes after the UTC-fallback fix).
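
The naive-timestamp case can be illustrated with a small sketch. `meeting_date_from_iso` is a hypothetical helper, not the project's `_first_segment_date`; it only shows the general technique of attaching UTC to a naive datetime before converting, on the assumption that stored timestamps without an offset are meant as UTC.

```python
from datetime import date, datetime, timezone


def meeting_date_from_iso(timestamp: str) -> date:
    """Return the UTC calendar date for an ISO-8601 timestamp string."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Calling astimezone() on a naive datetime would interpret it as
        # host-local time, so attach UTC explicitly first.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).date()
```

Without the `replace` step, a naive late-evening timestamp on a host west of UTC would be converted forward and could land on the next calendar day.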

Frontend:

  • @dna/core: tsc --noEmit clean, 58 vitest tests pass.
  • @dna/app: 8 new vitest cases for the hook and dialog pass; Radix no longer warns about a missing Dialog.Description.

Formatting:

  • make format-python clean.
  • prettier --check clean on touched frontend files.

Rollout

Flag off by default. Studios opt in by:

  1. Enabling a CustomEntityNN slot on their ShotGrid site (see DEPLOYMENT.md for fields and permissions).
  2. Setting DNA_ENABLE_TRANSCRIPT_PUBLISH=true and SHOTGRID_TRANSCRIPT_ENTITY=CustomEntityNN on the backend.
  3. Setting VITE_ENABLE_TRANSCRIPT_PUBLISH=true on the frontend build.

Rollback is dropping the flag; no schema migration required.
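
As a rough sketch of the backend side of this rollout, the two settings could be read like this. The helper names are hypothetical, and tolerating case and surrounding whitespace in the flag value is an assumption; the PR only states that the flag must be "true" and that the entity defaults to CustomEntity01.

```python
import os
from collections.abc import Mapping


def transcript_publish_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the flag is explicitly set to "true" (off by default)."""
    # Case and whitespace tolerance is an assumption of this sketch.
    return env.get("DNA_ENABLE_TRANSCRIPT_PUBLISH", "").strip().lower() == "true"


def transcript_entity_type(env: Mapping[str, str] = os.environ) -> str:
    """Custom-entity slot used for new rows; read at call time."""
    return env.get("SHOTGRID_TRANSCRIPT_ENTITY", "CustomEntity01")
```

Reading the environment on each call rather than at import time means dropping the flag takes effect without touching stored data, which fits the rollback described above.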

Known V1 limitations (documented in ADRs)

  • No unique index on published_transcripts in this PR; two concurrent publishes against the same (playlist, version, meeting) can create two SG rows before either writes the bookkeeping row. Low-probability; acceptable for V1.
  • No automated summary generation; sg_summary is left blank for users to fill on the SG side.
  • created_by on the SG row is the script user, not the DNA user who clicked publish (no sudo in V1).

linux-foundation-easycla Bot commented Apr 18, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@jspada200
Collaborator

Thanks for your contrib, @thc1006! I will give this a go with our SG instance as soon as I can. If you would like access, please hit me up on Slack and I can get you set up in there.

thc1006 added a commit to thc1006/dna that referenced this pull request Apr 19, 2026
isort 8.0.1 (the version CI runs) groups `from main import ...` with
third-party imports because `main` is not in `known_first_party`. This
matches tests/test_publish_endpoint.py, which already does it this way.

No behaviour change; unblocks format-check on PR AcademySoftwareFoundation#138.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
@thc1006
Author

thc1006 commented Apr 19, 2026

Hi @jspada200, thanks again for taking a look at this PR and for offering
to try it against your SG instance. I really appreciate it.

I just pushed d241ddd to fix the format-check failure that appeared on
the previous commit. The issue was a single import-grouping mistake in
tests/test_publish_transcript_endpoint.py: from main import ... needed
to sit in the third-party block (matching the existing layout in
tests/test_publish_endpoint.py) because main is not listed in
known_first_party. Before pushing I re-ran the CI-pinned versions
locally (black==26.3.1, isort==8.0.1) together with the full pytest
suite (552 passed, coverage 91%), so the format-check should come back
green this time.

The workflow runs on the new commit are currently in action_required.
Whenever it is convenient for you, could I trouble you to re-approve
them? No rush at all. I know the SG validation is the bigger piece and
that is on your own schedule. Happy to adjust anything you find useful.

Thanks again for your time and for the warm welcome.

thc1006 added 19 commits April 24, 2026 00:46
Covers empty input, single line, duplicate collapse, sort stability,
consecutive same-speaker merge, body_hash stability, missing speaker,
and whitespace-only text skipping.

Implementation lands in the next commit.

Contributes to AcademySoftwareFoundation#120 (slice 1/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Pure function that turns a list of StoredSegment rows into a payload
ready to be pushed to the production tracking system. Handles the
duplicate segments described in AcademySoftwareFoundation#100 defensively so the bug does not
leak into published rows.

Contributes to AcademySoftwareFoundation#120 (slice 1/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds PublishedTranscript / PublishedTranscriptUpdate models and six red
tests: two on the abstract base plus four on the Mongo impl covering the
collection property, find-by-composite-key, missing-row path, and upsert
query shape.

Contributes to AcademySoftwareFoundation#120 (slice 2/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds two methods on the storage contract and the MongoDB implementation:
  - get_published_transcript(playlist_id, version_id, meeting_id)
  - upsert_published_transcript(data)

Keyed by (playlist, version, meeting). The row stores the SG entity ID
and a body_hash so re-publish can skip when nothing changed.

Contributes to AcademySoftwareFoundation#120 (slice 2/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
…ript

Nine red tests across the base class, the ShotGrid provider (default
entity type, env override, create payload shape, disconnect guard,
update-body-only, error swallowing) and the mock provider (must refuse
with a user-facing message).

Contributes to AcademySoftwareFoundation#120 (slice 3/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds two methods on ProdtrackProviderBase and the ShotGrid implementation
so the REST endpoint in the next slice has a provider surface to call.
The SG custom entity name is read from SHOTGRID_TRANSCRIPT_ENTITY at
call time (default CustomEntity01) so studios can point DNA at whichever
slot they enabled.

MockProdtrackProvider raises NotImplementedError with a user-facing
message so the mock stack still boots but callers get a clear error
instead of silent success.

Contributes to AcademySoftwareFoundation#120 (slice 3/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Eight tests covering the flag gate, happy create, skip-when-body-hash-
unchanged, update path, missing-metadata, no-segments, mock 501, and the
version-without-project guard. Request/response models land with the
tests so the endpoint in the next commit has something to return.

Contributes to AcademySoftwareFoundation#120 (slice 4/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Wires the builder, the storage bookkeeping and the prodtrack provider.
Behaviour:
  - DNA_ENABLE_TRANSCRIPT_PUBLISH must be "true" or endpoint returns 404
  - no metadata / no segments -> 422 with clear detail
  - body_hash unchanged       -> skipped, no provider call
  - existing row + changes    -> update, sg_entity_id reused
  - no existing row           -> create
  - mock provider raises      -> 501 surfaces the user-facing message
  - version missing project   -> 404

Contributes to AcademySoftwareFoundation#120 (slice 4/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
One red case asserting the method posts to
/playlists/{id}/publish-transcript with the typed body and returns the
response envelope.

Contributes to AcademySoftwareFoundation#120 (slice 5/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Additive only. Existing exports untouched.

Also tightens one adjacent mock cast in apiHandler.test.ts so the core
typecheck passes (Boy Scout; the broken cast was in the same fixture
block we just extended).

Contributes to AcademySoftwareFoundation#120 (slice 5/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Two red tests: success path (posts + resolves) and error path (mutateAsync
rejects with the axios error). Hook lands next.

Contributes to AcademySoftwareFoundation#120 (slice 6/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Six red cases: dialog hidden when closed, summary shown when open,
publish disabled when segments=0, happy created path, skipped callout
path, and server-error callout. Implementation and the hook for the
dialog also land here (the hook is small enough that separating the
commit adds noise).

Contributes to AcademySoftwareFoundation#120 (slice 6/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds the hook, the Radix-Themes dialog, and a soft button at the top of
the transcript panel. Visibility is gated by
VITE_ENABLE_TRANSCRIPT_PUBLISH so the button only appears on builds that
opted in.

Contributes to AcademySoftwareFoundation#120 (slice 6/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
- QUICKSTART: two new env-var rows for the feature flag and the SG
  custom-entity slot.
- DEPLOYMENT: new "Transcript Publishing Setup" section with the SG
  site-side checklist (enable custom entity, fields, perms).
- TRANSCRIPTION_PIPELINE: new section describing the publish data flow,
  the three Mongo collections touched, the SG field mapping, and three
  ADRs (custom entity choice, Mongo bookkeeping, publish-time builder).
- example.docker-compose.local.yml: the two new vars default off.

Contributes to AcademySoftwareFoundation#120 (slice 7/7)

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Self-review after integration surfaced five defects that all failed
against tightened tests:

1. main.py:1033 used version.project.id, but Version.project is dict[str,
   Any] in the DNA model. Now reads project.get("id"). The prior test
   mocked version.project as an object which hid this bug; the mock is
   now a dict matching reality.
2. main.py called prodtrack.get_entity() without catching ValueError.
   Both ShotgridProvider and MockProdtrackProvider raise ValueError on
   missing entities, so a stale version_id produced 500 instead of the
   intended 404. Now wrapped in try/except.
3. main.py ignored the boolean returned by update_transcript. A silent
   SG-side failure advanced body_hash in Mongo, which made the next
   publish return "skipped" while SG stayed stale. Now surfaces 502 and
   skips the bookkeeping upsert.
4. transcription_publish._first_segment_date mishandled naive ISO
   timestamps: .astimezone(UTC) on a naive datetime treats it as local
   time, so non-UTC hosts could shift meeting_date by a day. Now attaches
   UTC when tzinfo is missing. Verified red under TZ=America/New_York.
5. mongodb.upsert_published_transcript placed the composite key in $set.
   Functionally harmless but inconsistent with upsert_draft_note. Moved
   playlist_id/version_id/meeting_id to $setOnInsert.

Three new tests (missing-version, update-failure, naive-tz) guard the
fixed behaviour so these do not regress.

547 passed, coverage 91%.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Two more defects caught on a second self-review round:

1. platform=metadata.platform or "" sent an empty string to SG's
   sg_platform list field when metadata was created without a platform.
   SG schema rejects it with an opaque Fault. The endpoint now checks
   metadata.platform up-front and returns 422 with a clear reason, and
   no longer coerces None to "" on the way out.
2. SG create succeeded but upsert_published_transcript could raise
   (mongo hiccup, unique-index race, etc). Previously the 500 had no
   context, so a client retry would create a second SG row. The upsert
   is now wrapped: the error message includes the sg_entity_id that is
   already on the tracking system and is logged at exception level for
   operators. The error wording also tells clients not to blind-retry.

Two red-first tests guard the new behaviour:
- test_metadata_without_platform_is_422
- test_bookkeeping_failure_after_sg_create_is_surfaced

549 passed, coverage 91%.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Three more defects caught by running the distributed-invariants lens
over the flow:

1. update_transcript used the *current* SHOTGRID_TRANSCRIPT_ENTITY env
   to route the SG call. If a studio migrates the custom-entity slot
   (enabling CustomEntity05 after starting on CustomEntity01), the
   bookkeeping still points at the original entity_id, and SG rejects
   the update because that id (say, 9001) is not a row on CustomEntity05. The
   provider now takes entity_type as a required kwarg and the endpoint
   passes existing.sg_entity_type from the bookkeeping row.
2. The 422 "no segments" check runs against the raw list. When every
   segment contains only whitespace, build_transcript_payload filters
   them all out and we would happily push an empty body to SG. Added a
   second 422 after build with a clear "nothing to publish" detail.
3. test_empty_list_returns_empty_body compared the results of two separate
   datetime.now().date() calls (one in prod code, one in the test).
   Theoretically flaky on UTC midnight rollover. Loosened to an
   isinstance(date) check.

552 passed, coverage 91%.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
1. PublishTranscriptDialog was missing a Dialog.Description, so Radix
   emitted a dev warning on every render and screen-reader users got no
   context for the content. The copy that was already inside the dialog
   body moved to Dialog.Description.
2. TRANSCRIPTION_PIPELINE.md still described the pre-round-3 signature
   of update_transcript. The data-flow block now shows entity_type
   sourced from env on create and from the bookkeeping row on update;
   ADR-006 was expanded to make the "pin to bookkeeping, not env" rule
   explicit so a later refactor does not silently reintroduce the env-
   drift bug.

552 passed, coverage 91%.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
isort 8.0.1 (the version CI runs) groups `from main import ...` with
third-party imports because `main` is not in `known_first_party`. This
matches tests/test_publish_endpoint.py, which already does it this way.

No behaviour change; unblocks format-check on PR AcademySoftwareFoundation#138.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
thc1006 force-pushed the issue-120-transcript-publishing branch from d241ddd to 6faac71 April 23, 2026 17:13
@jspada200
Collaborator

Hey @thc1006, sorry I have not had a chance to test and review this. Been really busy at work. Will get to it, hopefully next week. Thanks!

Development

Successfully merging this pull request may close these issues.

Add transcript publishing workflow

2 participants