feat: transcript publishing workflow (#120) #138
thc1006 wants to merge 19 commits into AcademySoftwareFoundation:main
Thanks for your contrib, @thc1006! I will give this a go with our SG instance as soon as I can. If you would like access, please hit me up on Slack and I can get you set up in there.
isort 8.0.1 (the version CI runs) groups `from main import ...` with third-party imports because `main` is not in `known_first_party`. This matches tests/test_publish_endpoint.py, which already does it this way. No behaviour change; unblocks format-check on PR AcademySoftwareFoundation#138. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Hi @jspada200, thanks again for taking a look at this PR and for offering [...]. I just pushed d241ddd to fix the format-check failure that appeared on [...]. The workflow runs on the new commit are currently in [...]. Thanks again for your time and for the warm welcome.
Covers empty input, single line, duplicate collapse, sort stability, consecutive same-speaker merge, body_hash stability, missing speaker, and whitespace-only text skipping. Implementation lands in the next commit. Contributes to AcademySoftwareFoundation#120 (slice 1/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Pure function that turns a list of StoredSegment rows into a payload ready to be pushed to the production tracking system. Handles the duplicate segments described in AcademySoftwareFoundation#100 defensively so the bug does not leak into published rows. Contributes to AcademySoftwareFoundation#120 (slice 1/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
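The builder steps named in the two commits above (duplicate collapse, stable sort, consecutive same-speaker merge, whitespace skipping, stable `body_hash`) can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: the `Segment` dataclass, its field names, the `"Unknown"` speaker label, the body layout, and the choice of SHA-256 are all hypothetical stand-ins for `StoredSegment` and `build_transcript_payload`, whose real shapes are not shown here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for StoredSegment; field names are assumptions.
    start: float
    speaker: Optional[str]
    text: str


def build_payload(segments: list[Segment]) -> dict:
    seen = set()
    unique = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip whitespace-only text
        key = (seg.start, seg.speaker, text)
        if key in seen:
            continue  # collapse exact duplicates
        seen.add(key)
        unique.append(Segment(seg.start, seg.speaker, text))

    # list.sort() is stable: segments with equal start keep their arrival order.
    unique.sort(key=lambda s: s.start)

    # Merge consecutive segments from the same speaker into one line.
    lines: list[tuple[str, str]] = []
    for seg in unique:
        speaker = seg.speaker or "Unknown"  # assumed label for a missing speaker
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + seg.text)
        else:
            lines.append((speaker, seg.text))

    body = "\n".join(f"{speaker}: {text}" for speaker, text in lines)
    # Hash only the rendered body so identical transcripts hash identically.
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"body": body, "body_hash": body_hash}
```

Hashing the rendered body rather than the raw rows means duplicate or reordered input rows do not change the hash, which is what makes a later "skip when unchanged" check possible.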
Adds PublishedTranscript / PublishedTranscriptUpdate models and six red tests: two on the abstract base plus four on the Mongo impl covering the collection property, find-by-composite-key, missing-row path, and upsert query shape. Contributes to AcademySoftwareFoundation#120 (slice 2/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds two methods on the storage contract and the MongoDB implementation:
- get_published_transcript(playlist_id, version_id, meeting_id)
- upsert_published_transcript(data)
Keyed by (playlist, version, meeting). The row stores the SG entity ID and a body_hash so re-publish can skip when nothing changed.
Contributes to AcademySoftwareFoundation#120 (slice 2/7)
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
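A rough sketch of the query and update documents such a composite-key upsert might build. The helper name `build_upsert` and the exact field set are assumptions for illustration; the split between `$set` and `$setOnInsert` follows the shape a later self-review commit in this PR describes, not necessarily the code as first landed.

```python
from typing import Any

KEY_FIELDS = ("playlist_id", "version_id", "meeting_id")


def build_upsert(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    # The composite key selects the row.
    query = {field: data[field] for field in KEY_FIELDS}
    # Everything else is mutable bookkeeping (entity id, body_hash, ...).
    mutable = {k: v for k, v in data.items() if k not in KEY_FIELDS}
    # Key fields are written only when the row is first inserted.
    update = {"$set": mutable, "$setOnInsert": dict(query)}
    return query, update
```

With a PyMongo-style collection, the pair would be passed as `collection.update_one(query, update, upsert=True)`.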
…ript Nine red tests across the base class, the ShotGrid provider (default entity type, env override, create payload shape, disconnect guard, update-body-only, error swallowing) and the mock provider (must refuse with a user-facing message). Contributes to AcademySoftwareFoundation#120 (slice 3/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds two methods on ProdtrackProviderBase and the ShotGrid implementation so the REST endpoint in the next slice has a provider surface to call. The SG custom entity name is read from SHOTGRID_TRANSCRIPT_ENTITY at call time (default CustomEntity01) so studios can point DNA at whichever slot they enabled. MockProdtrackProvider raises NotImplementedError with a user-facing message so the mock stack still boots but callers get a clear error instead of silent success. Contributes to AcademySoftwareFoundation#120 (slice 3/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
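A minimal sketch of the two behaviours this commit describes: reading the entity slot from the environment at call time, and a mock provider that refuses loudly. The function name, the `MockProvider` class, and the error wording are hypothetical; only the env-var name, the default, and the exception type come from the commit message.

```python
import os

DEFAULT_TRANSCRIPT_ENTITY = "CustomEntity01"


def transcript_entity_type() -> str:
    # Looked up on every call, so a changed environment is picked up
    # without re-importing the module.
    return os.environ.get("SHOTGRID_TRANSCRIPT_ENTITY", DEFAULT_TRANSCRIPT_ENTITY)


class MockProvider:
    # Hypothetical stand-in for MockProdtrackProvider.
    def publish_transcript(self, payload: dict) -> dict:
        # Refuse instead of pretending the publish succeeded.
        raise NotImplementedError(
            "Transcript publishing is not available with the mock provider."
        )
```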
Eight tests covering the flag gate, happy create, skip-when-body-hash-unchanged, update path, missing-metadata, no-segments, mock 501, and the version-without-project guard. Request/response models land with the tests so the endpoint in the next commit has something to return. Contributes to AcademySoftwareFoundation#120 (slice 4/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Wires the builder, the storage bookkeeping and the prodtrack provider.
Behaviour:
- DNA_ENABLE_TRANSCRIPT_PUBLISH must be "true" or endpoint returns 404
- no metadata / no segments -> 422 with clear detail
- body_hash unchanged -> skipped, no provider call
- existing row + changes -> update, sg_entity_id reused
- no existing row -> create
- mock provider raises -> 501 surfaces the user-facing message
- version missing project -> 404
Contributes to AcademySoftwareFoundation#120 (slice 4/7)
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
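The create / update / skip branch in the behaviour list above reduces to a small decision on the bookkeeping row. A sketch, assuming a hypothetical `Bookkeeping` record and `decide_action` helper (neither name appears in the PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookkeeping:
    # Hypothetical view of the stored row; field names follow the commit text.
    sg_entity_id: int
    body_hash: str


def decide_action(existing: Optional[Bookkeeping], new_hash: str) -> str:
    if existing is None:
        return "created"  # no row yet: create on the tracking system
    if existing.body_hash == new_hash:
        return "skipped"  # nothing changed: no provider call
    return "updated"      # row exists and body changed: reuse sg_entity_id
```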
One red case asserting the method posts to
/playlists/{id}/publish-transcript with the typed body and returns the
response envelope.
Contributes to AcademySoftwareFoundation#120 (slice 5/7)
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Additive only. Existing exports untouched. Also tightens one adjacent mock cast in apiHandler.test.ts so the core typecheck passes (Boy Scout; the broken cast was in the same fixture block we just extended). Contributes to AcademySoftwareFoundation#120 (slice 5/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Two red tests: success path (posts + resolves) and error path (mutateAsync rejects with the axios error). Hook lands next. Contributes to AcademySoftwareFoundation#120 (slice 6/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Six red cases: dialog hidden when closed, summary shown when open, publish disabled when segments=0, happy created path, skipped callout path, and server-error callout. Implementation and the hook for the dialog also land here (the hook is small enough that separating the commit adds noise). Contributes to AcademySoftwareFoundation#120 (slice 6/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Adds the hook, the Radix-Themes dialog, and a soft button at the top of the transcript panel. Visibility is gated by VITE_ENABLE_TRANSCRIPT_PUBLISH so the button only appears on builds that opted in. Contributes to AcademySoftwareFoundation#120 (slice 6/7) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
- QUICKSTART: two new env-var rows for the feature flag and the SG custom-entity slot.
- DEPLOYMENT: new "Transcript Publishing Setup" section with the SG site-side checklist (enable custom entity, fields, perms).
- TRANSCRIPTION_PIPELINE: new section describing the publish data flow, the three Mongo collections touched, the SG field mapping, and three ADRs (custom entity choice, Mongo bookkeeping, publish-time builder).
- example.docker-compose.local.yml: the two new vars default off.
Contributes to AcademySoftwareFoundation#120 (slice 7/7)
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Self-review after integration surfaced five defects that all failed
against tightened tests:
1. main.py:1033 used version.project.id, but Version.project is dict[str,
Any] in the DNA model. Now reads project.get("id"). The prior test
mocked version.project as an object which hid this bug; the mock is
now a dict matching reality.
2. main.py called prodtrack.get_entity() without catching ValueError.
Both ShotgridProvider and MockProdtrackProvider raise ValueError on
missing entities, so a stale version_id produced 500 instead of the
intended 404. Now wrapped in try/except.
3. main.py ignored the boolean returned by update_transcript. A silent
SG-side failure advanced body_hash in Mongo, which made the next
publish return "skipped" while SG stayed stale. Now surfaces 502 and
skips the bookkeeping upsert.
4. transcription_publish._first_segment_date mishandled naive ISO
timestamps: .astimezone(UTC) on a naive datetime treats it as local
time, so non-UTC hosts could shift meeting_date by a day. Now attaches
UTC when tzinfo is missing. Verified red under TZ=America/New_York.
5. mongodb.upsert_published_transcript placed the composite key in $set.
Functionally harmless but inconsistent with upsert_draft_note. Moved
playlist_id/version_id/meeting_id to $setOnInsert.
Three new tests (missing-version, update-failure, naive-tz) guard the
fixed behaviour so these do not regress.
547 passed, coverage 91%.
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
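Defect 4 above is easy to reproduce in isolation: calling `.astimezone()` on a naive `datetime` makes Python assume local time. A minimal sketch of the described fix, using a hypothetical helper name (`meeting_date_from_iso`) rather than the PR's `_first_segment_date`:

```python
from datetime import date, datetime, timezone


def meeting_date_from_iso(timestamp: str) -> date:
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Naive input: attach UTC explicitly instead of letting
        # astimezone() interpret it as host-local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).date()
```

With this shape, a naive `2024-03-01T23:30:00` yields 1 March on any host, while an aware `2024-03-01T23:30:00-05:00` is converted and yields 2 March.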
Two more defects caught on a second self-review round:
1. platform=metadata.platform or "" sent an empty string to SG's sg_platform list field when metadata was created without a platform. SG schema rejects it with an opaque Fault. The endpoint now checks metadata.platform up-front and returns 422 with a clear reason, and no longer coerces None to "" on the way out.
2. SG create succeeded but upsert_published_transcript could raise (mongo hiccup, unique-index race, etc). Previously the 500 had no context, so a client retry would create a second SG row. The upsert is now wrapped: the error message includes the sg_entity_id that is already on the tracking system and is logged at exception level for operators. The error wording also tells clients not to blind-retry.
Two red-first tests guard the new behaviour:
- test_metadata_without_platform_is_422
- test_bookkeeping_failure_after_sg_create_is_surfaced
549 passed, coverage 91%.
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
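The wrapped bookkeeping upsert from this commit might look something like the sketch below. The `BookkeepingError` class, the `record_publish` helper, and the message wording are assumptions; the real endpoint presumably maps the failure to an HTTP error rather than raising a custom exception.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class BookkeepingError(RuntimeError):
    """Raised when the tracking-system row exists but bookkeeping failed."""


def record_publish(upsert: Callable[[dict[str, Any]], None], row: dict[str, Any]) -> None:
    try:
        upsert(row)
    except Exception as exc:
        # Log with traceback so operators can reconcile the orphaned row.
        logger.exception(
            "bookkeeping upsert failed after create; sg_entity_id=%s",
            row["sg_entity_id"],
        )
        raise BookkeepingError(
            f"Transcript already exists as entity {row['sg_entity_id']} "
            "but bookkeeping failed; do not retry blindly."
        ) from exc
```

Carrying the entity id in the error gives the client enough context to avoid creating a second row on retry.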
Three more defects caught by running the distributed-invariants lens over the flow:
1. update_transcript used the *current* SHOTGRID_TRANSCRIPT_ENTITY env to route the SG call. If a studio migrates the custom-entity slot (enabling CustomEntity05 after starting on CustomEntity01), the bookkeeping still points at the original entity_id, and SG rejects the update because 9001 is not a row on CustomEntity05. The provider now takes entity_type as a required kwarg and the endpoint passes existing.sg_entity_type from the bookkeeping row.
2. The 422 "no segments" check runs against the raw list. When every segment contains only whitespace, build_transcript_payload filters them all out and we would happily push an empty body to SG. Added a second 422 after build with a clear "nothing to publish" detail.
3. test_empty_list_returns_empty_body asserted equality against datetime.now().date() twice (once in prod code, once in the test). Theoretically flaky on UTC midnight rollover. Loosened to an isinstance(date) check.
552 passed, coverage 91%.
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
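The "pin to bookkeeping, not env" rule from defect 1 can be illustrated with a small resolver. `PublishedRow` and `entity_type_for` are hypothetical names; only `sg_entity_type`, the env-var name, and the default slot come from the commit messages.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublishedRow:
    # Hypothetical view of the bookkeeping row.
    sg_entity_id: int
    sg_entity_type: str


def entity_type_for(existing: Optional[PublishedRow]) -> str:
    if existing is not None:
        # Update path: route to the slot the row was created on,
        # even if the environment has since moved to another slot.
        return existing.sg_entity_type
    # Create path: use whatever slot is configured right now.
    return os.environ.get("SHOTGRID_TRANSCRIPT_ENTITY", "CustomEntity01")
```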
1. PublishTranscriptDialog was missing a Dialog.Description, so Radix emitted a dev warning on every render and screen-reader users got no context for the content. The copy that was already inside the dialog body moved to Dialog.Description.
2. TRANSCRIPTION_PIPELINE.md still described the pre-round-3 signature of update_transcript. The data-flow block now shows entity_type sourced from env on create and from the bookkeeping row on update; ADR-006 was expanded to make the "pin to bookkeeping, not env" rule explicit so a later refactor does not silently reintroduce the env-drift bug.
552 passed, coverage 91%.
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Force-pushed: d241ddd to 6faac71
Hey @thc1006, sorry I have not had a chance to test and review this. Been really busy at work. Will get to it hopefully next week. Thanks!
Summary
Adds a feature-flagged workflow that publishes a version's captured transcript into the production tracking system as a single custom-entity row per (playlist, version, meeting). Ships dark: endpoint returns 404 and the UI button is hidden unless `DNA_ENABLE_TRANSCRIPT_PUBLISH=true` (backend) and `VITE_ENABLE_TRANSCRIPT_PUBLISH=true` (frontend build) are set.
Closes #120.
What changed
Backend:
- `build_transcript_payload` in `dna/transcription_publish.py` (dedupe, sort, same-speaker collapse, stable `body_hash`).
- `published_transcripts` for bookkeeping, keyed by `(playlist_id, version_id, meeting_id)`; stores the SG entity id, the entity type at create time, and the `body_hash` used for idempotence.
- `ProdtrackProviderBase`: `publish_transcript` (create) and `update_transcript` (patch body + meeting_date only). ShotGrid implementation targets the slot configured via `SHOTGRID_TRANSCRIPT_ENTITY` (default `CustomEntity01`); the mock provider raises `NotImplementedError` with a user-facing message.
- `POST /playlists/{playlist_id}/publish-transcript` that wires the above together. Returns `created` / `updated` / `skipped` (when `body_hash` is unchanged). Feature-flagged off by default.
Frontend:
- `PublishTranscriptRequest` / `PublishTranscriptResponse` and `ApiHandler.publishTranscript` in `@dna/core`.
- `usePublishTranscript` hook and `PublishTranscriptDialog` Radix-Themes dialog in `@dna/app`.
- `TranscriptPanel`, visible only when `VITE_ENABLE_TRANSCRIPT_PUBLISH=true` and segments exist.
Docs:
- `QUICKSTART.md`: new env-var rows.
- `DEPLOYMENT.md`: ShotGrid site-setup checklist (enable custom entity, fields, script-user perms).
- `backend/docs/TRANSCRIPTION_PIPELINE.md`: new "Publishing to the Production Tracking System" section and ADR-005/006/007 (custom entity, body_hash idempotence pinned to bookkeeping, build-at-publish-time).
- `backend/example.docker-compose.local.yml`: the two new vars default off.
Design decisions worth flagging
- `update_transcript` uses `existing.sg_entity_type` from the bookkeeping row, not the current env. This lets studios migrate to a new `CustomEntityNN` slot without breaking updates to rows created on the previous one.
- `build_transcript_payload`, so when `StoredSegment` changes shape the publisher is a one-file diff.
Testing
(Deferred: the Publish button / dialog only render against a real ShotGrid site that has the custom entity provisioned. Screenshots to be added when a test site is available; see DEPLOYMENT.md for the site-setup checklist.)
How I Tested
Backend:
- `make test` in the backend docker stack: 552 passed, coverage 91% (floor 90%). `transcription_publish.py` 100%; `published_transcript.py` 100%.
- `make start-local` and `PRODTRACK_PROVIDER=mock`: `curl /health` returned 200. `POST /playlists/{id}/publish-transcript` with flag off returned 404.
- `TZ=America/New_York` (the failing-first test only passes after the UTC-fallback fix).
Frontend:
- `@dna/core`: `tsc --noEmit` clean, 58 vitest tests pass.
- `@dna/app`: 8 new vitest cases for the hook and dialog pass; Radix no longer warns about a missing `Dialog.Description`.
Formatting:
- `make format-python` clean.
- `prettier --check` clean on touched frontend files.
Rollout
Flag off by default. Studios opt in by:
1. `CustomEntityNN` slot on their ShotGrid site (see `DEPLOYMENT.md` for fields and permissions).
2. `DNA_ENABLE_TRANSCRIPT_PUBLISH=true` and `SHOTGRID_TRANSCRIPT_ENTITY=CustomEntityNN` on the backend.
3. `VITE_ENABLE_TRANSCRIPT_PUBLISH=true` on the frontend build.
Rollback is dropping the flag; no schema migration required.
Known V1 limitations (documented in ADRs)
- `published_transcripts` in this PR; two concurrent publishes against the same `(playlist, version, meeting)` can create two SG rows before either writes the bookkeeping row. Low-probability; acceptable for V1.
- `sg_summary` is left blank for users to fill on the SG side.
- `created_by` on the SG row is the script user, not the DNA user who clicked publish (no sudo in V1).