Problem
Two problems, one root cause. Hooks are embedded inline and immutably inside a convention (Convention.hooks: list[HookDefinition], each carrying an OCI image, digest, and feature-table column spec; see osa/domain/shared/model/hook.py), and conventions are themselves immutable.
No way to ship a hook bug-fix except creating a whole new convention. No history, no rollback, no machine-to-machine deploy path. This blocks automated deploy tooling that builds hook images out of band and must register the built image with a running instance.
No usable provenance. Records carry a convention reference, but nothing records which exact hook image computed which feature row. ingest_run stores no per-hook digest.
The provenance requirement (why the model is shaped this way)
A downstream consumer that builds artifacts from exported features must later demonstrate exactly what produced them:
What exactly was used? → the precise set of feature rows.
What computed them, and was it correct? → if a hook had a bug in v2 fixed in v3, which rows came from v2?
Recall on a bad input → if a source is later withdrawn, which rows (hence which downstream artifacts) touched it? A recall query, run on provenance.
Reproducibility → same records + same hook code + same config → same features.
These force the spine: per-feature-row provenance, stored on the row, anchored to an immutable record of what ran (so it can't drift and survives reconciliation), with build source and config captured.
Data model
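hooks(name PK, feature_spec, live_release_id FK, created_at): identity, contract, and live pointer. Owns the feature table. The feature spec is fixed across releases (releases are image-only).
hook_releases(id PK, hook_name FK, version int, image, digest, config, source_ref, built_by, built_at): immutable. version is monotonic per hook (integers, not SemVer, because a hook has no compat contract beyond its fixed feature spec). source_ref is the git SHA or build id from the build step (the reproducibility anchor). UNIQUE(hook_name, version); optional UNIQUE(hook_name, digest) for idempotent re-push.
hook_runs(id PK, release_id FK, ingest_run_id|deposition_id, batch_index, status, started_at, finished_at, duration_s, oom_retries, log_ref): append-only. The execution record; logs, timing, and status attach here. (Eng review: extend or replace the transient HookResult and the existing validation_runs rather than duplicating them.)
Feature tables (features.*): add run_id FK → hook_runs. Per-row provenance: feature row → run → release → (version, digest, config, source_ref).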
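To make the row → run → release chain concrete, here is a minimal sketch of the tables in SQLite, followed by the "which rows came from v2?" query. The column subset, the features_pocket_detect table, its columns, and all sample values are assumptions for illustration only; the real feature tables live under features.* and their layout is not specified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hooks (
    name            TEXT PRIMARY KEY,
    feature_spec    TEXT NOT NULL,
    live_release_id INTEGER              -- points at hook_releases.id
);
CREATE TABLE hook_releases (
    id         INTEGER PRIMARY KEY,
    hook_name  TEXT NOT NULL REFERENCES hooks(name),
    version    INTEGER NOT NULL,
    image      TEXT NOT NULL,
    digest     TEXT NOT NULL,
    source_ref TEXT NOT NULL,
    UNIQUE (hook_name, version),
    UNIQUE (hook_name, digest)
);
CREATE TABLE hook_runs (
    id            INTEGER PRIMARY KEY,
    release_id    INTEGER NOT NULL REFERENCES hook_releases(id),
    ingest_run_id TEXT,
    status        TEXT NOT NULL
);
-- Hypothetical feature table; only run_id comes from the proposal.
CREATE TABLE features_pocket_detect (
    record_id    TEXT NOT NULL,
    pocket_count INTEGER,
    run_id       INTEGER NOT NULL REFERENCES hook_runs(id)
);
""")

conn.execute("INSERT INTO hooks (name, feature_spec) VALUES ('pocket_detect', '{}')")
conn.executemany(
    "INSERT INTO hook_releases (id, hook_name, version, image, digest, source_ref) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "pocket_detect", 2, "img:v2", "sha256:aaa", "commit-a"),
        (2, "pocket_detect", 3, "img:v3", "sha256:bbb", "commit-b"),
    ],
)
conn.executemany(
    "INSERT INTO hook_runs (id, release_id, ingest_run_id, status) VALUES (?, ?, ?, ?)",
    [(10, 1, "ingest-1", "ok"), (11, 2, "ingest-2", "ok")],
)
conn.executemany(
    "INSERT INTO features_pocket_detect (record_id, pocket_count, run_id) VALUES (?, ?, ?)",
    [("rec-1", 4, 10), ("rec-2", 7, 10), ("rec-3", 5, 11)],
)


def rows_from_version(conn, hook_name, version):
    """Return (record_id, digest, source_ref) for rows computed by one release."""
    cur = conn.execute(
        """
        SELECT f.record_id, r.digest, r.source_ref
        FROM features_pocket_detect AS f
        JOIN hook_runs     AS hr ON hr.id = f.run_id
        JOIN hook_releases AS r  ON r.id = hr.release_id
        WHERE r.hook_name = ? AND r.version = ?
        ORDER BY f.record_id
        """,
        (hook_name, version),
    )
    return cur.fetchall()


v2_rows = rows_from_version(conn, "pocket_detect", 2)
```

Because the run row is the only link between a feature row and a release, repointing the live release later does not change what this query returns for existing rows.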
Versioning & resolution
Conventions reference hooks by name. A live pointer on the hooks table selects the active release. Deploy = create release + advance pointer; rollback = repoint. The convention is not touched on a hook version change.
The live release is resolved once at ingest-run start and snapshotted for that run, so a mid-run deploy can't split one run across two versions.
Resolution cost is one indexed lookup per ingest run (single WHERE hook_name IN (...)), amortized over thousands of records — negligible. The live pointer is also the concept reconciliation (future) will reuse.
(Version-pinning in the convention was considered and rejected: the run/release ledger makes provenance independent of how the convention points, removing pinning's only advantage, while per-row provenance removes the live pointer's only cost. Live pointer wins on deploy ergonomics, rollback, stable convention versions, and reconciliation-fit.)
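The pointer mechanics described above can be sketched in memory as follows. HookRegistry, Release, and the method names are hypothetical, not the project's classes; the point is only that a run resolves the live release once and keeps that snapshot, so a deploy during the run does not affect it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    hook_name: str
    version: int
    digest: str


@dataclass
class HookRegistry:
    releases: dict = field(default_factory=dict)  # hook name -> list of Release
    live: dict = field(default_factory=dict)      # hook name -> live version

    def deploy(self, hook_name, digest):
        """Create release vN+1 and advance the live pointer."""
        history = self.releases.setdefault(hook_name, [])
        release = Release(hook_name, len(history) + 1, digest)
        history.append(release)
        self.live[hook_name] = release.version
        return release

    def repoint(self, hook_name, version):
        """Rollback (or pin): move the live pointer to an existing release."""
        history = self.releases.get(hook_name, [])
        if not 1 <= version <= len(history):
            raise KeyError(f"{hook_name} has no release v{version}")
        self.live[hook_name] = version

    def resolve_live(self, hook_names):
        """Resolve every hook once; the returned dict is the run's snapshot."""
        return {
            name: self.releases[name][self.live[name] - 1] for name in hook_names
        }


registry = HookRegistry()
registry.deploy("pocket_detect", "sha256:aaa")

# Ingest run starts: resolve once and keep the result for the whole run.
run_snapshot = registry.resolve_live(["pocket_detect"])

# A deploy lands mid-run; the running ingest keeps using its snapshot.
registry.deploy("pocket_detect", "sha256:bbb")
next_run_snapshot = registry.resolve_live(["pocket_detect"])
```

In a real database the pointer update would happen under a row lock, which this single-threaded sketch does not model.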
Identity
Convention gets an SDK-supplied human-readable slug as its id: ConventionId = "<slug>@<version>" (mirrors the existing SchemaId pattern). No server-generated opaque SRN as the primary handle.
Schema keeps its SDK-supplied slug; SchemaId = "<slug>@<version>".
Records reference conventions by internal ConventionId, not SRN. SRN is reserved for federation-edge surfaces. (In record/model/aggregate.py, convention_srn becomes convention_id.)
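A small sketch of the "<slug>@<version>" shape shared by ConventionId and SchemaId. The VersionedId class is hypothetical, and the slug pattern is an assumption borrowed from the hook-name rule; the proposal does not state the allowed characters for convention or schema slugs.

```python
import re
from dataclasses import dataclass

# Assumption: slugs follow the same pattern as hook names.
_SLUG = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class VersionedId:
    slug: str
    version: str

    @classmethod
    def parse(cls, raw):
        """Split '<slug>@<version>' and reject anything else."""
        slug, sep, version = raw.partition("@")
        if not sep or not version or "@" in version or not _SLUG.match(slug):
            raise ValueError(f"expected '<slug>@<version>', got {raw!r}")
        return cls(slug, version)

    def __str__(self):
        return f"{self.slug}@{self.version}"


convention_id = VersionedId.parse("proteins@1.0.0")
```

Keeping the id a plain, caller-supplied string means the same value can be written by the SDK, stored on records, and read back without a lookup.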
API surface
Principle: a deploy is a single POST /conventions whose body is a composition of the same sub-structures the standalone endpoints accept (schema, each hook's release). One call instruments an instance; the server fans it out into the schema registry, hook registry, and convention in one transaction. Standalone endpoints exist only for incremental updates, rollback, and reads.
POST /api/v1/conventions # bundled deploy (schema + hooks + convention) [conventions:write | ADMIN]
GET /api/v1/conventions/{id} # detail
POST /api/v1/schemas # create/version a schema (== deploy "schema" block) [ADMIN]
POST /api/v1/hooks/{name}/releases # create release vN+1; advances live pointer [hooks:write | ADMIN]
PUT /api/v1/hooks/{name}/live # repoint live (rollback / pin) [hooks:write | ADMIN]
GET /api/v1/hooks # catalog: hooks + live release
GET /api/v1/hooks/{name}/releases # release history
GET /api/v1/hooks/{name}/releases/{v} # inspect a release
Deploy body
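The schema block and each hook's release block are byte-identical to the standalone POST /schemas / POST /hooks/{name}/releases bodies.
{
  "id": "proteins",            // convention slug (caller-supplied) → ConventionId = "proteins@1.0.0"
  "version": "1.0.0",
  "title": "Protein structures",
  "file_requirements": { /* ... */ },
  "schema": {                  // fully inline, nested; == POST /schemas body
    "id": "protein_fields",    // schema slug (caller-supplied)
    "version": "1.0.0",
    "fields": [ /* field definitions */ ]
  },
  "hooks": [
    {
      "name": "pocket_detect", // hook identity (<=40 chars, [a-z][a-z0-9_]*)
      "feature": { /* TableFeatureSpec: columns, cardinality; set once, fixed forever */ },
      "release": {             // == POST /hooks/{name}/releases body
        "image": "registry/.../pocket_detect:abc",
        "digest": "sha256:...",
        "config": {},
        "limits": {},
        "source_ref": "git-sha-or-build-id"  // REQUIRED: reproducibility anchor
      }
    }
  ],
  "ingester": null
}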
Server fan-out: upsert schema (id@version); for each hook upsert identity (name + feature, creating the feature table on first sight) + create the release (advances live pointer); create the convention referencing hooks by name.
Idempotent on (hook_name, digest): re-sending an unchanged release is a no-op; a changed digest creates a new version + advances live.
Rejected: a different feature for an existing hook (the column contract is fixed).
Re-deploy: same convention id + new version = a new version of the same convention; same id+version = conflict unless byte-identical.
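The per-hook rules above (idempotent on digest, new digest creates a new version, differing feature spec is rejected) can be sketched as one function. Hook, FeatureSpecConflict, and apply_hook_release are hypothetical names. One behavior here is an assumption the proposal does not settle: when a known digest is re-sent, the sketch returns the existing version and leaves the live pointer where it is.

```python
from dataclasses import dataclass, field
from typing import Optional


class FeatureSpecConflict(Exception):
    """Raised when a deploy tries to change an existing hook's columns."""


@dataclass
class Hook:
    name: str
    feature: dict
    digests: list = field(default_factory=list)  # position + 1 == version
    live_version: Optional[int] = None


def apply_hook_release(hooks, name, feature, digest):
    """Apply one hook entry of a deploy body; return (version, created)."""
    hook = hooks.get(name)
    if hook is None:
        # First sight: this is where the feature table would be created.
        hook = hooks[name] = Hook(name, feature)
    elif hook.feature != feature:
        raise FeatureSpecConflict(f"feature spec of {name!r} is fixed")

    if digest in hook.digests:
        # Unchanged release re-sent: no new version (assumed: pointer untouched).
        return hook.digests.index(digest) + 1, False

    hook.digests.append(digest)
    hook.live_version = len(hook.digests)
    return hook.live_version, True


hooks = {}
spec = {"columns": ["pocket_count"]}
first = apply_hook_release(hooks, "pocket_detect", spec, "sha256:aaa")
resend = apply_hook_release(hooks, "pocket_detect", spec, "sha256:aaa")
bumped = apply_hook_release(hooks, "pocket_detect", spec, "sha256:bbb")
```

In the real service all entries of one deploy body would be applied inside a single transaction, so a conflict on any hook rolls back the schema and convention writes too.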
Incremental hook release (convention untouched)
The release body carries no feature block: columns are fixed at the hook's first release. The call creates release vN+1 and advances the live pointer; future ingest runs pick it up.
Auth: additional M2M token issuer (ships in this issue)
Deploys are driven by external automation, not interactive sessions. Today validation supports only a single symmetric secret (HS256, config.auth.jwt.secret).
New config: an optional additional trusted issuer (asymmetric — RS256/EdDSA — via static public key for v1; JWKS only if out-of-band rotation is actually needed).
validate_access_token routes on the iss claim: extra-issuer tokens verify against its key; existing user tokens are unaffected.
Extra-issuer tokens resolve to a scope-limited principal: Principal gains scopes (today only roles). The bundled deploy (POST /conventions) requires conventions:write; release/live endpoints require hooks:write; both also accept ADMIN. Keeps long-lived broad credentials out of automation.
Existing single-secret behavior unchanged when no additional issuer is configured.
Deploy flow this enables
External deploy tooling holds the convention metadata + hook source, builds images out of band, then makes a single POST /api/v1/conventions carrying the inline schema + each hook's identity and release block (real digests + source commit). The server fans it out. A later hook bugfix is one small POST /hooks/{name}/releases: convention untouched, live pointer moves, future runs pick it up, provenance records exactly which version ran. Instrumenting an instance is one call; maintaining a hook is one call.
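From the deploy tool's side, the single call might be assembled like this. The sketch only builds the request and does not send it. The function name, the base URL, the Bearer authorization header, and the sample payload values are assumptions; the endpoint path and the required source_ref come from the proposal.

```python
import json
import urllib.request


def build_deploy_request(base_url, token, convention):
    """Build the bundled-deploy request for POST /api/v1/conventions."""
    for hook in convention.get("hooks", []):
        # The build step is the only place that knows the source commit,
        # so refuse to send a release block without it.
        if not hook.get("release", {}).get("source_ref"):
            raise ValueError(f"hook {hook.get('name')!r} is missing source_ref")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/conventions",
        data=json.dumps(convention).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed header scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "id": "proteins",
    "version": "1.0.0",
    "schema": {"id": "protein_fields", "version": "1.0.0", "fields": []},
    "hooks": [
        {
            "name": "pocket_detect",
            "feature": {},
            "release": {
                "image": "example-registry/pocket_detect:abc",
                "digest": "sha256:aaa",
                "config": {},
                "source_ref": "commit-a",
            },
        }
    ],
}
request = build_deploy_request("https://osa.example", "m2m-token", payload)
# Sending would be: urllib.request.urlopen(request)
```

A later bug-fix would reuse the same release block unchanged as the body of POST /hooks/{name}/releases.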
Acceptance criteria
Each feature row carries a run_id; row → run → release yields version, digest, config, and build source for that row.
A single POST /conventions deploy creates the schema, hooks (+ first releases + feature tables), and the convention in one transaction; sub-structures match the standalone endpoints.
POST /hooks/{name}/releases creates an immutable, integer-versioned release and advances the live pointer; the convention is untouched.
Rollback repoints the live pointer to a prior release; release history is listable per hook.
Within an ingest run, all rows for a hook share one resolved release (resolve-at-run-start snapshot; no mid-run split).
Conventions reference hooks by name; ingest resolves the live release; concurrent deploys advance the pointer atomically (row lock).
Convention and schema use SDK-supplied human-readable slugs; records reference conventions by internal ConventionId, not SRN.
Deploys are idempotent on (hook_name, digest); re-sending an unchanged release is a no-op; a different feature for an existing hook is rejected.
Optional second issuer configurable; iss-routed; scope-limited principal enforced (conventions:write for deploy, hooks:write for release/live); existing single-secret auth unchanged when not configured.
Migration backfills existing inline hooks as release 1, reuses existing feature tables, and hard-fails with a report on any hook-name collision across conventions with divergent specs.
Scope notes
The issuer support and the registry ship together (this issue): the registry is unusable for automated deploys without a scoped M2M credential.
Build source commit must be passed in the release payload by the build step — it is the reproducibility anchor and only the build step has it.
Releases are image-only; a hook's feature spec (columns) is fixed at first release. Changing columns would need a feature-table migration — out of scope.
Deferred (designed-for, not built here)
Reconciliation — running a newer hook version against existing records to backfill/update features. The run/release/live-pointer model fits it directly (a reconciliation pass is just another run).
Convention hook-list mutability — adding a new hook to an existing convention post-creation; depends on reconciliation.
Reproducible export/snapshot identity — a stable handle for "the exact set of rows that were exported" (changefeed cursor / snapshot); completes the downstream audit story.
Tamper-evidence / signing — signing provenance manifests via the Node Document keys so the chain is verifiable, not just present.
Migration
Backfill each convention's inline hooks → hooks row (with feature spec) + hook_releases v1 (embedded image/digest) + set live; rewrite convention to name refs; switch records convention_srn → convention_id.
Feature tables already exist (old ConventionRegistered path) — point new rows at them; do not re-fire feature-table DDL for backfilled v1s.
Audit for hook-name collisions across conventions before migrating; hard-fail with a report if found.
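The pre-migration audit could look roughly like the following. The input shape (a mapping from convention id to its inline hook dicts) and the function names are hypothetical. Following the acceptance criteria, the sketch flags a name only when its feature specs diverge; whether identical specs with different images should also be reported is left open here.

```python
from collections import defaultdict


def find_hook_collisions(conventions):
    """Map each colliding hook name to the conventions that use it.

    `conventions` maps a convention id to its list of inline hooks, each a
    dict with at least "name" and "feature".
    """
    uses = defaultdict(list)  # hook name -> [(convention id, feature spec)]
    for convention_id, hooks in conventions.items():
        for hook in hooks:
            uses[hook["name"]].append((convention_id, hook["feature"]))

    collisions = {}
    for name, entries in uses.items():
        first_spec = entries[0][1]
        if any(spec != first_spec for _, spec in entries[1:]):
            collisions[name] = sorted(conv for conv, _ in entries)
    return collisions


def assert_migratable(conventions):
    """Hard-fail with a report before any rows are rewritten."""
    collisions = find_hook_collisions(conventions)
    if collisions:
        report = "; ".join(f"{n}: {', '.join(c)}" for n, c in sorted(collisions.items()))
        raise RuntimeError(f"hook-name collisions with divergent specs: {report}")


existing = {
    "proteins@1.0.0": [{"name": "pocket_detect", "feature": {"columns": ["pocket_count"]}}],
    "ligands@1.0.0": [{"name": "pocket_detect", "feature": {"columns": ["volume"]}}],
    "maps@1.0.0": [{"name": "resolution", "feature": {"columns": ["angstrom"]}}],
}
collisions = find_hook_collisions(existing)
```

Running the audit as a separate read-only step means the report can be reviewed and the names fixed before the backfill touches any table.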