Skip to content

feat: hook versioning, per-row provenance, and M2M deploy (#145)#146

Open
rorybyrne wants to merge 11 commits into
mainfrom
145-feat-hook-versioning
Open

feat: hook versioning, per-row provenance, and M2M deploy (#145)#146
rorybyrne wants to merge 11 commits into
mainfrom
145-feat-hook-versioning

Conversation

@rorybyrne

Copy link
Copy Markdown
Contributor

Implements GitHub issue #145 end-to-end: a versioned hook registry, exact per-row provenance, and scoped machine (M2M) deploy credentials. Built on the existing DDD/CQRS layering (API → Command/Query → Service → Repo), following the feature's architecture.md and design-revisions.md.

What's in it

Registry spine (foundational)

  • Hook identity (name + fixed output contract) · immutable, integer-versioned HookRelease · live pointer · append-only HookRun.
  • Postgres HookRegistry adapter: row-locked, gap-free monotonic versions; live-pointer advance/repoint; digest idempotency; live-release snapshot.
  • Single born-correct initial_schema migration (zero-drift alembic check); records.convention_id typed ConventionSlug; conventions are unversioned, slug-keyed, mutable.

US1 — single-call deploy (POST /conventions): one atomic call creates/versions the schema, upserts hooks + first releases (+ feature tables), and upserts the convention by slug (declarative upsert, no 409).

US2 — provenance: every feature row carries run_id → hook_runs → hook_releases, yielding the exact version/digest/config/source_ref that produced it. run_id is carried via a run.json in each hook's output dir (no DB-lookup reconstruction). The live release is resolved once at run start and snapshotted, so in-flight runs are unaffected by a mid-run pointer move.

US3 — incremental release (POST /hooks/{name}/releases): mints vN+1 (201) or returns the existing release on a duplicate digest (200, idempotent), advancing live without touching the convention. Catalog/history/detail reads: GET /hooks, GET /hooks/{name}/releases[/{version}].

US4 — rollback (PUT /hooks/{name}/live): repoints live to a prior release; history and already-produced rows' provenance are preserved.

US5 — scoped M2M credentials: optional second JWT issuer (auth.extra_issuer) with iss-routed verification (RS256/EdDSA). Principal gains scopes + has_scope; new RequiresScope gate (scope OR ADMIN) wired into the command/query auth metaclasses, startup validation, and identity resolution. Deploy → conventions:write; release/live → hooks:write (ADMIN still accepted). The primary HS256 path is byte-identical when no second issuer is configured.

Testing

TDD throughout. 1155 unit + 7 contract + 120 integration tests pass; ruff and ty check osa clean. New this PR: handler/gate/token unit tests, DB-free hook contract tests, and centralized error-mapping unit tests. The integration suite exercises the registry adapter, migration, and provenance chain against real Postgres.

Notes

  • Pre-launch: forward-only schema, no data backfill.
  • Out of scope (deferred, per spec): reconciliation, reproducible export/snapshot identity, tamper-evidence/signing.

rorybyrne and others added 8 commits June 13, 2026 18:24
WIP checkpoint. Foundational + US1 (bundled deploy) + US2 (per-row provenance)
production layer for hook versioning. Production type-checks clean; new unit
tests pass. Existing test fixtures not yet migrated (suite red); Alembic
migration and US3/US4/US5 endpoints pending.

- Split HookDefinition -> HookIdentity (name+feature); runtime+source_ref move
  to the immutable, integer-versioned HookRelease entity.
- New validation-domain registry: Hook/HookRelease/HookRun; HookRegistry port +
  Postgres adapter (row-locked gap-free versions, idempotent-on-digest releases,
  live-pointer advance/rollback, resolve_live snapshot, append-only hook_runs).
- Tables hooks/hook_releases/hook_runs; features.* gain run_id; conventions PK
  srn->id; records convention_srn->convention_id.
- ConventionSRN -> ConventionId across deposition/record/ingest/curation/
  validation; conventions reference hooks by name; transactional bundled
  POST /conventions deploy fan-out.
- Provenance: resolve-at-run-start snapshot; one hook_run per hook per batch;
  run_id threaded through ingest and deposition feature-insert paths; runners
  take (identity, release).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Drop hook_run_ids pass-through from ValidationCompleted, DepositionApproved,
RecordDraft, RecordPublished, HookBatchCompleted, IngestBatchPublished — it was
tramp data riding through curation/record handlers that don't use it.

Instead reconstruct {hook_name: run_id} at feature-insert time from the keys the
consumer already holds, via a feature-domain HookRunReader port (Postgres
adapter: one indexed hook_runs ⋈ hook_releases join per batch / per deposition,
not per row). hook_runs are still written at execution time; provenance is
unchanged. Keeps provenance ownership in the validation tables and avoids a new
feature→validation domain-code edge.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The HookRunReader port I'd put in the feature domain was wrong: it inverted
ownership (feature dictating a validation-aggregate query) and laundered a real
feature->validation dependency through an infra adapter that read validation's
tables.

hook_runs/hook_releases are validation-domain data, so the read capability
belongs there: run_ids_for_batch / run_ids_for_deposition now live on the
HookRegistry port + Postgres adapter (SQL) + HookRegistryService. The feature
insert handlers call the validation service directly — an honest, declared
service-to-service cross-domain read. Deleted the feature-domain reader port,
its adapter, and the DI binding.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… types (#145)

Reconcile the committed #145 spine to the post-spec design-revisions doc, and
tighten loose string types into nominal value objects.

Identity & conventions
- ConventionId (slug@version) -> ConventionSlug: a frozen RootModel[str], bare
  unversioned slug. Conventions are now mutable; deploy is a declarative upsert
  by slug (ON CONFLICT DO UPDATE) with no caller version and no 409 path.
- DeployConvention command DTOs renamed to DeployConvention{Schema,Hook,Release};
  the shared HookDeploySpec is removed (release input is the command DTO; the
  internal deploy unit is deposition-domain HookDeploy).

Typed names (no bare str across boundaries)
- HookName promoted to a frozen RootModel[str]; .root only at PG/k8s string
  boundaries. HookResult.hook_name is now HookName.
- New FeatureName value object for the read/feature surface: expected_features
  is list[FeatureName]; /data/{schema}/{feature} and the feature store/service
  speak features, not hooks. HookName -> FeatureName is converted once at the
  hook->feature boundary (validate_deposition, publish_batch).

Provenance (design-revisions §6)
- HookRun slimmed to a pure execution record (drop ingest/deposition/batch
  context + XOR check; finished_at/duration_s/oom_retries required). Add
  HookRunStatus.from_hook_status; delete the duplicated _run_status helpers.
- run_id is carried via a run.json written into each hook output dir and read by
  the feature-insert handlers (read_run_ref/write_run_ref on the storage ports;
  RunRef value object). Delete run_ids_for_batch/run_ids_for_deposition and drop
  the HookRegistryService dependency from the insert handlers.

hook_runs table slimmed accordingly; records keep a slug convention ref. All
1126 unit tests pass; ruff + ty(osa) clean. Alembic migration and
integration-test reconciliation follow.
…ion slug (#145)

Create hooks / hook_releases / hook_runs (slim, no execution-context columns);
swap conventions PK from the opaque srn to the slug `id`; rename
convention_srn -> convention_id on depositions / ingest_runs / records with
their indexes. Pre-launch: forward-only, no data backfill. Dynamic feature
tables gain `run_id` via the build_feature_table helper, not a static migration.

Excludes a pre-existing, unrelated depositions.owner_id NOT NULL drift.
…ation tests (#145)

Two production bugs surfaced by the integration suite once feature tables gained
a NOT NULL run_id FK and conventions stored hook names as bare strings:

- schema_feature_reader._hook_names: conventions.hooks is now a JSON list of
  plain hook-name strings, not {"name": ...} dicts — read the element directly.
- DataCatalogService.resolve_table: compare the manifest's str resource name
  against feature_name.root (a FeatureName), not the FeatureName object, so
  /data/{schema}/{feature} resolves instead of 404ing.

Integration tests reconciled to the new API: ConventionSlug, mutable upsert-by-
slug convention repo, bundled deploy(), and a seed_hook_run conftest helper that
builds the hooks->releases->runs provenance chain so feature rows satisfy the
run_id FK. 1126 unit + 120 integration tests pass; ruff + ty(osa) clean.
…#145)

Pre-launch with no deployed DB, so collapse all 16 incremental migrations into
one `initial_schema` migration generated from tables.py. The schema is now
born in its final shape — no convention_srn->convention_id rename churn, no PK
swaps, and the stale depositions.owner_id NOT NULL drift is gone.

Also fixes a latent index/runtime mismatch the squash surfaced: the
uq_records_source unique index used .as_string() (a redundant CAST) which the
bulk-publish ON CONFLICT ((source->>'type'),(source->>'id')) could not match.
Switched to .astext so the index expression matches the runtime exactly.

Verified: `alembic upgrade head` applies clean, `alembic check` reports zero
drift (migration == tables.py), and all 120 integration + 1126 unit tests pass.
Completes US3–US5 of the hook-versioning feature on top of the existing
registry spine (deploy + provenance).

US3 — incremental release: CreateRelease mints vN+1 (201) or returns the
existing release on a duplicate digest (200, idempotent), advancing the live
pointer without touching the referencing convention; ListHooks/ListReleases/
GetRelease serve the catalog, history, and detail reads. New hooks router
(POST releases, GET catalog/history/detail) wired in ValidationProvider.

US4 — rollback: SetLive + PUT /hooks/{name}/live repoints the live pointer to
a prior release; history and already-produced rows' provenance are unchanged.

US5 — scoped machine credentials: optional second JWT issuer (extra_issuer
config) with iss-routed verification (RS256/EdDSA); Principal gains scopes +
has_scope; new RequiresScope gate (scope OR ADMIN) wired into the command/query
auth metaclasses, startup validation, and identity resolution. Deploy now
requires conventions:write; release/live require hooks:write (ADMIN still
accepted). The primary HS256 path is byte-identical when no second issuer is
configured.

Tests: handler/gate/token unit tests + DB-free hook contract tests + error-map
unit tests. 1155 unit + 7 contract + 120 integration pass; ruff + ty(osa) clean.
@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown

Code Coverage

Package Line Rate Complexity Health
. 82% 0
application 100% 0
application.api 100% 0
application.api.rest 87% 0
application.api.v1 88% 0
application.api.v1.routes 64% 0
application.api.v1.routes.data 94% 0
application.api.v1.routes.data.serializers 99% 0
application.event 100% 0
domain 100% 0
domain.auth 100% 0
domain.auth.command 90% 0
domain.auth.event 100% 0
domain.auth.model 93% 0
domain.auth.port 99% 0
domain.auth.query 93% 0
domain.auth.service 91% 0
domain.auth.util 100% 0
domain.auth.util.di 79% 0
domain.curation 100% 0
domain.curation.adapter 100% 0
domain.curation.command 100% 0
domain.curation.event 100% 0
domain.curation.handler 92% 0
domain.curation.model 100% 0
domain.curation.port 100% 0
domain.curation.query 100% 0
domain.curation.service 100% 0
domain.data 100% 0
domain.data.model 92% 0
domain.data.port 100% 0
domain.data.query 95% 0
domain.data.service 78% 0
domain.data.util 100% 0
domain.data.util.di 91% 0
domain.deposition 100% 0
domain.deposition.adapter 100% 0
domain.deposition.command 87% 0
domain.deposition.event 100% 0
domain.deposition.handler 100% 0
domain.deposition.model 98% 0
domain.deposition.port 100% 0
domain.deposition.query 86% 0
domain.deposition.service 97% 0
domain.deposition.util.di 94% 0
domain.feature 100% 0
domain.feature.event 0% 0
domain.feature.handler 63% 0
domain.feature.model 0% 0
domain.feature.port 100% 0
domain.feature.service 97% 0
domain.feature.util 100% 0
domain.feature.util.di 100% 0
domain.ingest 100% 0
domain.ingest.command 79% 0
domain.ingest.event 100% 0
domain.ingest.handler 57% 0
domain.ingest.model 100% 0
domain.ingest.port 100% 0
domain.ingest.service 76% 0
domain.metadata 100% 0
domain.metadata.event 100% 0
domain.metadata.handler 100% 0
domain.metadata.model 0% 0
domain.metadata.port 100% 0
domain.metadata.service 93% 0
domain.metadata.util 100% 0
domain.metadata.util.di 100% 0
domain.record 100% 0
domain.record.adapter 100% 0
domain.record.command 100% 0
domain.record.event 100% 0
domain.record.handler 100% 0
domain.record.model 100% 0
domain.record.port 100% 0
domain.record.query 100% 0
domain.record.service 65% 0
domain.semantics 100% 0
domain.semantics.command 94% 0
domain.semantics.event 100% 0
domain.semantics.handler 100% 0
domain.semantics.model 100% 0
domain.semantics.port 100% 0
domain.semantics.query 90% 0
domain.semantics.service 100% 0
domain.semantics.util 100% 0
domain.semantics.util.di 93% 0
domain.shared 92% 0
domain.shared.authorization 89% 0
domain.shared.model 92% 0
domain.shared.port 100% 0
domain.validation 100% 0
domain.validation.adapter 100% 0
domain.validation.command 98% 0
domain.validation.event 100% 0
domain.validation.handler 91% 0
domain.validation.model 96% 0
domain.validation.port 100% 0
domain.validation.query 100% 0
domain.validation.service 88% 0
domain.validation.util.di 96% 0
infrastructure 80% 0
infrastructure.auth 56% 0
infrastructure.data 37% 0
infrastructure.event 75% 0
infrastructure.http 92% 0
infrastructure.ingest 83% 0
infrastructure.k8s 76% 0
infrastructure.messaging 100% 0
infrastructure.oci 56% 0
infrastructure.persistence 69% 0
infrastructure.persistence.adapter 77% 0
infrastructure.persistence.mappers 62% 0
infrastructure.persistence.repository 31% 0
infrastructure.s3 25% 0
infrastructure.storage 100% 0
sdk 100% 0
util 100% 0
util.di 71% 0
Summary 76% (7720 / 10112) 0

@rorybyrne

Copy link
Copy Markdown
Contributor Author

@greptile

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR delivers three tightly coupled features on top of the existing DDD/CQRS skeleton: a versioned hook registry (identity + immutable releases + live pointer), per-row provenance via run.json/hook_runs, and scoped M2M credentials routed through a second JWT issuer.

  • Hook registry: hooks, hook_releases, and hook_runs tables added in one consolidated initial_schema migration; PostgresHookRegistry uses a SELECT … FOR UPDATE row lock to ensure gap-free monotonic versions and race-free created/idempotency detection, fixing both previously flagged concurrency issues.
  • Provenance chain: run_id is pre-allocated before hook execution, written into each hook's output dir (run.json), and separately persisted as an append-only hook_run row; oom_retries is now correctly propagated from HookResult (incrementing in the OOM retry loop in HookService).
  • M2M credentials: ExtraIssuerConfig adds an optional second RS256/EdDSA issuer; validate_access_token routes on the unverified iss claim (safe — full signature verification follows immediately in the chosen branch); RequiresScope gate allows scope OR ADMIN and is wired into both command and query metaclasses with startup validation.

Confidence Score: 5/5

All three previously flagged blocking issues are resolved; the remaining findings are cosmetic response-field staleness and an edge case in M2M provenance when a token omits its sub claim.

The two prior concurrency problems (TOCTOU created flag, missing hooks.live_release_id FK) are correctly fixed: ReleaseOutcome.created is now determined under the row lock, and the deferred FK is emitted in op.create_foreign_key after both tables exist. oom_retries is properly incremented in HookService and flows through HookResult into hook_runs. The M2M issuer routing is safe: the unverified decode is used only to select the verification path, and the real signature+audience check immediately follows. The two new findings are non-blocking: is_live in the 201 response is a cosmetic field that can read stale after a concurrent set_live, and the empty-sub edge case only affects provenance attribution, not authorization.

server/osa/util/di/fastapi.py (M2M identity resolution — missing sub guard) and server/osa/domain/validation/command/create_release.py (is_live staleness)

Important Files Changed

Filename Overview
server/migrations/versions/c6d9f4c0c3ab_initial_schema.py Consolidated initial schema migration: creates all tables including hooks/hook_releases/hook_runs with the deferred FK fk_hooks_live_release_id, resolving the previously flagged missing-FK issue.
server/osa/infrastructure/persistence/repository/hook_registry.py Postgres adapter for the hook registry: row-locked version assignment, digest idempotency, live-pointer advance, and resolve_live join — concurrency semantics match the design spec.
server/osa/domain/auth/service/token.py Adds iss-routed JWT validation: reads iss without verification for routing only, then verifies against the matched issuer's key; primary HS256 path is byte-identical when no extra issuer is configured.
server/osa/util/di/fastapi.py Adds M2M identity resolution: parses scopes from extra-issuer tokens and derives a stable synthetic UserId via uuid5(issuer+sub); a missing sub claim silently yields an issuer-level UUID shared across all clients from that issuer.
server/osa/domain/validation/command/create_release.py Implements CreateReleaseHandler; created flag is now correctly determined inside the registry row lock (ReleaseOutcome), fixing the prior TOCTOU. The is_live response field is derived from a separate get_hook call, which is cosmetically stale.
server/osa/domain/ingest/handler/run_hooks.py Adds per-batch provenance: resolves live releases once (snapshot), pre-allocates run IDs, writes run.json to each hook's output dir, records append-only hook_run rows; oom_retries now propagated from HookResult.
server/osa/domain/shared/command.py Adds RequiresScope gate enforcement: checks principal.has_scope OR principal.has_role(ADMIN), consistent with query.py.
server/osa/domain/deposition/service/convention.py Replaces create_convention with deploy: fans out to schema, hook registry, and convention upsert in one call; schema reuse is idempotent on id@version; hook identity upsert rejects contract changes.
server/osa/domain/validation/service/validation.py Validation service now resolves live releases, records hook_run rows with oom_retries from HookResult, and writes run.json; oom_retries on the exception-catch path correctly stays 0.
server/osa/application/api/v1/routes/hooks.py New hooks REST router: POST /{name}/releases, PUT /{name}/live, GET catalog/list/detail; sets 201/200 from outcome.created; thin DTO coercion only.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API
    participant TokenService
    participant ResolveIdentity
    participant Handler
    participant ConventionService
    participant HookRegistry
    participant DB

    Client->>API: POST /conventions (DeployConvention)
    API->>TokenService: validate_access_token(token)
    Note over TokenService: Read iss (unverified) for routing
    alt "iss == extra_issuer"
        TokenService-->>TokenService: verify RS256 signature
    else primary path
        TokenService-->>TokenService: verify HS256
    end
    TokenService-->>API: payload
    API->>ResolveIdentity: build Principal (scopes or roles)
    ResolveIdentity-->>Handler: Principal
    Handler->>Handler: RequiresScope(conventions:write) OR ADMIN
    Handler->>ConventionService: deploy(slug, schema, hooks)
    ConventionService->>DB: create/reuse schema
    loop each hook
        ConventionService->>HookRegistry: upsert_identity(name, feature)
        HookRegistry->>DB: SELECT FOR UPDATE hooks row
        ConventionService->>HookRegistry: create_release(name, runtime)
        HookRegistry->>DB: INSERT hook_releases + UPDATE live_release_id
    end
    ConventionService->>DB: pg_insert(conventions).on_conflict_do_update
    ConventionService->>DB: outbox ConventionRegistered
    API-->>Client: 201 ConventionCreated

    Client->>API: POST /hooks/name/releases
    API->>Handler: CreateReleaseHandler
    Handler->>HookRegistry: "create_release -> ReleaseOutcome(created)"
    Note over HookRegistry: Digest idempotency inside row lock
    API-->>Client: 201 new or 200 duplicate digest
Loading

Reviews (2): Last reviewed commit: "feat: surface OOM retry count into hook-..." | Re-trigger Greptile

Comment on lines +79 to +86
"status", sa.String(length=32), server_default=sa.text("'pending'"), nullable=False
),
sa.Column(
"ingestion_finished", sa.Boolean(), server_default=sa.text("false"), nullable=False
),
sa.Column("batches_ingested", sa.Integer(), server_default=sa.text("0"), nullable=False),
sa.Column("batches_completed", sa.Integer(), server_default=sa.text("0"), nullable=False),
sa.Column("published_count", sa.Integer(), server_default=sa.text("0"), nullable=False),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing FK: hooks.live_release_id → hook_releases.id

tables.py has an explicit comment: # FK added in migration (deferrable), but this migration never adds it. The hooks table is created with live_release_id UUID nullable and the hook_releases table is created right afterwards with a back-FK to hooks.name, but no op.create_foreign_key(...) call from hooks.live_release_id to hook_releases.id appears anywhere in the migration. The database therefore has no referential-integrity guard on the live pointer — a buggy or direct-SQL write could set live_release_id to a non-existent UUID and the adapter's _to_hook / resolve_live would silently return a Hook with an orphaned pointer. The fix is an op.create_foreign_key after both tables exist, using use_alter=True or a deferred constraint to handle the circular dependency.

Comment on lines +65 to +80

async def run(self, cmd: CreateRelease) -> ReleaseCreated:
built_by = str(self.principal.user_id) if self.principal.user_id else None

# Whether this is a brand-new version (201) or an idempotent no-op (200)
# for a digest already present. Read before minting; the adapter's row
# lock still guarantees gap-free versions regardless of this hint.
seen_digests = {r.runtime.digest for r in await self.service.list_releases(cmd.name)}
release = await self.service.create_release(
cmd.name, cmd.to_runtime(), cmd.source_ref, built_by
)
hook = await self.service.get_hook(cmd.name)
is_live = hook is not None and hook.live_release_id == release.id

return ReleaseCreated(
hook_name=release.hook_name,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 created flag is TOCTOU-racy under concurrent identical submissions

seen_digests is collected via list_releases before the adapter's SELECT … FOR UPDATE row lock is acquired inside create_release. Two concurrent requests carrying the same digest can both read an empty seen_digests, call create_release in series (the lock serializes them correctly for data integrity), and both compute created=True — returning 201 for what should be the second caller's idempotent 200. The simplest fix is to derive created from a flag the adapter already knows: return a (HookRelease, bool) tuple from create_release (or compare the release's built_at to now()), so the determination happens inside the lock.

started_at=started_at,
finished_at=finished_at,
duration_s=duration_by_hook.get(hook.name, 0.0),
oom_retries=0,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 oom_retries always recorded as 0 in provenance

HookRun.oom_retries is hardcoded to 0 for every run, even when the hook actually retried after an OOM eviction. The OOM retry loop (which calls with_doubled_memory) lives inside the hook runner infrastructure and the retry count isn't surfaced to HookResult, so this information is currently lost at the provenance layer. At minimum the field should reflect actual retries; if the runner doesn't expose the count yet, a follow-up ticket should be opened to propagate it so the provenance record is accurate.

Suggested change
oom_retries=0,
oom_retries=0, # TODO: surface actual retry count from hook runner (FR-provenance)

Addresses two PR review findings on the hook registry.

1. Missing referential integrity on the live pointer: the initial_schema
   migration created hooks.live_release_id without the FK to hook_releases.id
   (the tables.py comment promised one). Add it as a DEFERRABLE, use_alter FK in
   both the SQLAlchemy metadata and the migration (created after both tables
   exist, dropped first on downgrade) so the circular dependency is handled and
   `alembic check` stays zero-drift. Verified against Postgres: FK present,
   DEFERRABLE INITIALLY DEFERRED, no drift.

2. TOCTOU on the release 201-vs-200 hint: the handler read existing digests
   before the adapter's row lock, so concurrent identical submissions could both
   report "created". Decide `created` inside the lock instead — create_release
   now returns a ReleaseOutcome{release, created} (named result, not a tuple),
   and the handler maps it straight to the status code with no pre-read.

Tests + integration conftest updated for the new return type. 1155 unit + 120
integration pass; ruff + ty(osa) clean.
Addresses the third PR review finding: HookRun.oom_retries was hardcoded to 0
at every record site, so provenance always claimed zero retries even when a hook
OOM-retried with doubled memory.

- HookResult gains oom_retries; HookService.run_hook counts each doubled-memory
  retry inside its loop and sets it on the returned result (success + rejected
  paths).
- Layer-2 cleanup: the ingest handler's parallel status_by_hook/duration_by_hook
  dicts collapse into one result_by_hook, from which status, duration, and
  oom_retries are read.
- Both record sites (ingest run_hooks, deposition ValidationService) stamp the
  real count. The two hard-failure paths (OOM-exhausted / other exceptions) keep
  0 with a comment — the count isn't surfaced through the exception and those
  runs produce no feature rows (provenance is best-effort by design).

Tests: run_hook counting (0 / 1 / 2 retries with memory doubling) and end-to-end
that the recorded HookRun carries the count. 1156 unit + 120 integration pass;
ruff + ty(osa) clean.
@rorybyrne

Copy link
Copy Markdown
Contributor Author

@greptile

The optional second (machine) JWT issuer now accepts EdDSA/Ed25519 exclusively
— the algorithm is no longer configurable. ExtraIssuerConfig drops its
`algorithm` field and the verify path pins `algorithms=["EdDSA"]`, so the token
header's `alg` is never trusted and can't downgrade to `none` or to HS256
treating the configured PEM as a shared secret. Operators only supply the
issuer, Ed25519 public key, audience, and scope claim.

Tests cover the EdDSA happy path, wrong-key/wrong-aud rejection, and that an
RS256-signed token for the trusted issuer is rejected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant