
feat: spec-coverage-analyzer spike (#277) #300

Draft: esraagamal6 wants to merge 2 commits into main from spike/spec-coverage-analyzer-277

esraagamal6 commented May 19, 2026

Status

Draft / spike for #277. Not for merge as-is — the goal of this PR is to give @jwulf and the team something concrete to react to before deciding on the next step (ABox slice priorities, rule-table refinement, integration design).

Summary

Adds spec-coverage-analyzer/ — a Python analyzer that reads an OpenAPI spec and emits a per-endpoint test plan, tagging each plan item as either computable from the spec alone or needs an ABox fact (with the missing fact named in the output).

Runs by default against spec/camunda-oca/bundled/rest-api.bundle.json; designed to be re-run on Camunda Hub when that spec lands.
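To make the tagging concrete, here is a minimal sketch of a plan item and three of the simplest rules. The `PlanItem` shape, the `plan_for_operation` helper, the tag strings, and the `getTenant` operation are assumptions for illustration and are not taken from build_plan.py.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    operation_id: str
    rule: str    # e.g. "happy-path", "404-not-found:tenantId"
    status: str  # "computable" or "needs-abox:<missing fact>"


def plan_for_operation(path: str, method: str, op: dict) -> list[PlanItem]:
    """Hypothetical sketch: emit plan items for one OpenAPI operation."""
    op_id = op.get("operationId", f"{method.upper()} {path}")
    items = [PlanItem(op_id, "happy-path", "computable")]
    # One 404 item per path parameter: derivable from the spec alone.
    for param in re.findall(r"\{([^}]+)\}", path):
        items.append(PlanItem(op_id, f"404-not-found:{param}", "computable"))
    # Authorization needs domain knowledge the spec does not carry.
    items.append(PlanItem(op_id, "403-forbidden", "needs-abox:RBAC"))
    return items


items = plan_for_operation("/tenants/{tenantId}", "get", {"operationId": "getTenant"})
for item in items:
    print(item.rule, "->", item.status)
```

The point of the sketch is that every item carries its own status, so a needs-abox item stays in the plan with the missing fact named rather than being dropped.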

How is this different from the generator?

Both tools read the spec. The generator also reads the ABox: it already integrates the ontology under configs/<config>/ontology/ (that's how it knows, for example, that process-instance needs a deployed process first). The analyzer as shipped in this spike doesn't read the ABox yet; that's deferred to step 2 of "Next steps" below.

The real difference is what each tool produces:

| | Generator | Analyzer (this PR) |
|---|---|---|
| Output | Real .spec.ts test files (executable code) | A checklist describing what tests should exist (no code) |
| Categories covered | Only the categories the emitter plugins know how to build (happy-path, request-validation negatives, edge + entity lifecycles, …) | Every category that should exist, including ones no emitter knows how to build yet |
| When a category isn't covered | Silent: those tests just don't exist | Listed in the plan, tagged with what's missing (an ABox slice, a new emitter plugin, or both) |
| ABox integration | Already wired | Not in this spike; deferred to step 2 of "Next steps" |
| Audience | The test runner (CI / dev machine) | Humans deciding what to build next |

Concrete example for POST /tenants

What the generator currently produces (real .spec.ts files):

  • ✅ A happy-path test
  • ✅ ~15 schema-validation negative tests via the request-validation emitter
  • ✅ A composite lifecycle test (create → present → update → delete → absent) from the EntityLifecycle template

What the analyzer says should exist for that endpoint:

  • ✅ All of the above, plus:
  • ⚠️ A 401 test — currently needs-abox: spec-gap (spec under-declares auth)
  • ⚠️ A 403 test — needs-abox: RBAC (no slice yet)
  • ⚠️ A 404 test against a fake tenant ID — computable ✓ (no emitter for it yet either, but the spec gives us enough)
  • ⚠️ A 409 test for duplicate-create — needs-abox: duplicatePolicy (Josh's 8.8 design, not landed)
  • ⚠️ Pagination + filter behaviour tests on search (needs-abox: filter/sort semantics)

The ✅ items: generator already produces them.
The ⚠️ items: generator produces zero of them.

Why this matters: the analyzer is the checklist the generator gets graded against. Without it, "is the generator producing enough tests?" gets hand-counted against another suite. With it, every missing test is named, scoped, and tagged with exactly what's blocking it — usually a specific ABox slice or a specific emitter plugin.
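A minimal sketch of that grading step, assuming both sides can be reduced to (operationId, rule) pairs. The `grade` helper and the `createTenant` operation ID are hypothetical; neither tool currently exposes its output in this form.

```python
def grade(planned: set[tuple[str, str]], emitted: set[tuple[str, str]]) -> dict:
    """Compare (operationId, rule) pairs the plan expects with what was emitted."""
    return {
        "covered": sorted(planned & emitted),
        "missing": sorted(planned - emitted),
        "unplanned": sorted(emitted - planned),
    }


# Hypothetical data: what the analyzer plans vs. what the generator emits.
planned = {
    ("createTenant", "happy-path"),
    ("createTenant", "403-forbidden"),
    ("createTenant", "409-conflict"),
}
emitted = {("createTenant", "happy-path")}

report = grade(planned, emitted)
print(report["missing"])
```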

Snapshot against the OCA spec

| | count |
|---|---|
| Operations | 190 |
| Plan items total | 1817 |
| Computable from spec alone | 1027 (56%) |
| Needs ABox / domain knowledge | 790 (44%) |

Top missing ABox facts (by plan-item load):

| missing fact | plan items |
|---|---|
| RBAC: permissions required per endpoint | 190 |
| spec-gap: which endpoints actually require auth | 189 |
| creation chain per identifier semantic-type | 120 |
| filter-field-semantics + sort-field-allowlist per entity | 106 |
| duplicatePolicy per endpoint (idempotent / conflict / replace) | 59 |
| consistency window per entity | 43 |
| scale thresholds + expected response time per entity | 43 |
| lifecycle state machine for this entity | 40 |
| cross-field validation rules | low (heuristic only catches paired `*Before`/`*After`) |

Spec-gap finding

The OCA spec declares securitySchemes (BearerAuth, basicAuth) but only applies them on getAuthentication. The analyzer flags 189 ops with a 401-unauthorized needs-ABox item (spec-gap: which endpoints actually require auth, encoded only in deployment, not the spec). That's a real spec/reality drift worth surfacing — relates to camunda/camunda#52511.
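The distinction between a computable 401 item and a spec-gap one can be sketched as follows. This is an illustration under assumptions, not the analyzer's code: `classify_401` is a hypothetical helper, and treating every operation without a declared requirement as a spec-gap is a simplification. It relies only on the OpenAPI rule that an operation-level `security` overrides the top-level one.

```python
def classify_401(spec: dict, op: dict) -> str:
    """Hypothetical sketch: tag the 401 plan item for one operation."""
    # In OpenAPI, an operation-level `security` overrides the top-level one.
    requirements = op.get("security", spec.get("security", []))
    # An empty list, or a list containing an empty requirement object ({}),
    # means the operation can be called without credentials per the spec.
    if requirements and all(requirements):
        return "computable"
    return "needs-abox:spec-gap"


# Schemes are declared, but no top-level `security` applies them.
spec = {"components": {"securitySchemes": {"BearerAuth": {"type": "http", "scheme": "bearer"}}}}
with_auth = {"operationId": "getAuthentication", "security": [{"BearerAuth": []}]}
without_auth = {"operationId": "createTenant"}  # hypothetical operation ID

print(classify_401(spec, with_auth))
print(classify_401(spec, without_auth))
```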

Rule table

See spec-coverage-analyzer/README.md for the full rule table. Brief summary:

  • Computable rules (14 kinds): happy-path, bad-request:{missing-required, type-mismatch, format-invalid, enum-violation, range-violation, additional-property, oneof-violation}, 404-not-found (per path param), 401-unauthorized (when security is declared on the op), pagination-sort:request-shape, filter:request-shape, documented-XXX (per documented non-2xx response).
  • Needs-ABox rules (9 kinds): 401-unauthorized:spec-gap, 403-forbidden, 409-conflict, business-entity-lifecycle, prerequisite-resource, eventual-consistency, scale-large-n, cross-field-range, pagination-sort/filter:behaviour-assertion.
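As an illustration of why the bad-request kinds count as computable, the sketch below derives them from a JSON request schema alone. The `bad_request_items` helper, the item naming, and the sample schema are hypothetical, and mapping length keywords to range-violation is an assumption; the README rule table is the reference for the actual behavior.

```python
def bad_request_items(schema: dict) -> list[str]:
    """Hypothetical sketch: list bad-request plan items for a request schema."""
    items = [f"bad-request:missing-required:{name}" for name in schema.get("required", [])]
    for name, prop in schema.get("properties", {}).items():
        if "type" in prop:
            items.append(f"bad-request:type-mismatch:{name}")
        if "format" in prop:
            items.append(f"bad-request:format-invalid:{name}")
        if "enum" in prop:
            items.append(f"bad-request:enum-violation:{name}")
        # Assumption: length bounds are treated like numeric bounds.
        if any(k in prop for k in ("minimum", "maximum", "minLength", "maxLength")):
            items.append(f"bad-request:range-violation:{name}")
    if schema.get("additionalProperties") is False:
        items.append("bad-request:additional-property")
    return items


# Hypothetical request schema, not copied from the OCA spec.
schema = {
    "type": "object",
    "required": ["tenantId", "name"],
    "properties": {
        "tenantId": {"type": "string", "maxLength": 256},
        "name": {"type": "string"},
    },
    "additionalProperties": False,
}
result = bad_request_items(schema)
for item in result:
    print(item)
```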

Outputs

  • plan.csv — machine-readable, one row per (operationId, plan-item) tuple.
  • plan.md — per-endpoint readable summary, formatted like upstream's coverage_breakdown.md.
  • needs-abox.md — aggregated needs-ABox gaps grouped by missing fact.
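The relationship between plan.csv and needs-abox.md can be sketched as a group-by over the status column. The column names (`operationId`, `rule`, `status`) and the sample rows are assumptions for illustration; the real CSV header may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed plan.csv layout.
SAMPLE = """operationId,rule,status
createTenant,happy-path,computable
createTenant,403-forbidden,needs-abox:RBAC
createTenant,409-conflict,needs-abox:duplicatePolicy
getTenant,403-forbidden,needs-abox:RBAC
"""


def count_missing_facts(csv_text: str) -> Counter:
    """Count needs-abox plan items per missing fact."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        prefix, _, fact = row["status"].partition(":")
        if prefix == "needs-abox":
            counts[fact] += 1
    return counts


counts = count_missing_facts(SAMPLE)
for fact, n in counts.most_common():
    print(fact, n)
```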

Test plan

  • Run python3 spec-coverage-analyzer/build_plan.py and confirm it emits the 3 artifacts.
  • Spot-check a few operations in plan.md — do the computable / needs-ABox tags match what you'd expect?
  • Sanity-check needs-abox.md — are the 9 ABox-fact buckets the right axes? Anything missing?

Open questions for review

  1. Rule-table coverage: is the current 14-computable + 9-needs-ABox set the right shape, or are there obvious categories missing?
  2. ABox slice priority: which of the 8 top ABox facts should land first? My read: duplicatePolicy (smallest, already designed in 8.8) is the cheapest unlock — would move 59 plan items from "needs ABox" to "computable" in a single shot.
  3. Integration design: where does this tool live long-term? Standalone Python (current spike) or rewritten into the TypeScript path-analyser/ stack (consistent with the rest of the generator codebase)?

Next steps (once this spike is signed off)

In priority order, with dependencies:

  1. Land duplicatePolicy as the first ABox slice. Cheapest unblock — Josh has already designed it (8.8); needs to be expressed as a new file under configs/camunda-oca/ontology/duplicatePolicy.json with { operationId → policy } entries for the ~59 create-style endpoints flagged. Independent of this PR's review outcome — could start in parallel.
  2. Wire the analyzer to read the ABox. After (1), update build_plan.py to consume duplicatePolicy and reclassify those 59 plan items from needs-abox to computable. Validates the analyzer ↔ ABox contract on the smallest slice before scaling to the bigger ones (RBAC, filter-semantics).
  3. Ship the 404 fake-ID emitter (#279, "Close coverage gap vs upstream e2e suite (negative-path + search-refinement emitters)"). First proof of the full loop: analyzer flags a plan item as computable → emitter generates the actual test → tests run. No dependency on any ABox slice; ontology/semantics.json already encodes the path-param identifier types. ~1 day's work, closes ~127 upstream-equivalent tests.
  4. Run the analyzer against the Camunda Hub spec. Validates the rule table generalises beyond OCA. Will surface any OCA-specific assumptions baked into the heuristics (e.g. the business-entity-signal patterns).
  5. Repurpose coverage-analysis/ (PR #278, "chore: add coverage analysis for generated tests (#275)") as a verification check. Currently it analyses what the generator emits. Once the analyzer exists, the two can be diffed: "does the generator emit what the analyzer says it should?". Becomes a CI check rather than a static snapshot.
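Steps 1 and 2 together could look roughly like the sketch below: a { operationId → policy } map is loaded and used to move matching items from needs-abox to computable. The `reclassify` helper, the item dictionaries, and the operation IDs are hypothetical; only the file location and the idempotent / conflict / replace policy values come from this description.

```python
import json

# Stand-in for configs/camunda-oca/ontology/duplicatePolicy.json (contents hypothetical).
DUPLICATE_POLICY_JSON = '{"createTenant": "conflict"}'


def reclassify(items: list[dict], policies: dict[str, str]) -> list[dict]:
    """Return a copy of the plan with duplicatePolicy gaps resolved where known."""
    out = []
    for item in items:
        known = item["operationId"] in policies
        if item["status"] == "needs-abox:duplicatePolicy" and known:
            # Copy rather than mutate, so the original plan stays diffable.
            item = {**item, "status": "computable", "policy": policies[item["operationId"]]}
        out.append(item)
    return out


policies = json.loads(DUPLICATE_POLICY_JSON)
items = [
    {"operationId": "createTenant", "rule": "409-conflict", "status": "needs-abox:duplicatePolicy"},
    {"operationId": "createRole", "rule": "409-conflict", "status": "needs-abox:duplicatePolicy"},
]
updated = reclassify(items, policies)
for item in updated:
    print(item["operationId"], item["status"])
```

Operations missing from the map keep their needs-abox tag, so a partially filled slice still produces an accurate plan.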

Deferred (out of scope for the first follow-ups)

  • RBAC ABox slice (190 items) — biggest unlock but also biggest scoping effort. Defer until duplicatePolicy validates the analyzer↔ABox pattern.
  • Filter-semantics ABox slice (106 items) — same reasoning; lots of per-field decisions.
  • Camunda Hub generalisation — wait until OCA-side loop (analyzer + duplicatePolicy + 404 emitter) is stable.

🤖 Generated with Claude Code

Reads an OpenAPI spec and emits a per-endpoint test plan, tagging each
plan item as either:
  - computable    -- derivable from the spec alone
  - needs-abox:X  -- requires domain knowledge; X names the missing fact

Snapshot against the OCA spec:
  190 operations, 1817 plan items
  1027 computable from spec (56%)
   790 needs-abox / domain knowledge (44%)

The needs-ABox load is concentrated in a handful of facts (top 5):
  - RBAC permissions per endpoint           190 items
  - spec-gap: which endpoints require auth  189 items
  - creation chain per identifier semantic  120 items
  - filter-field-semantics + sort-allowlist 106 items
  - duplicatePolicy per endpoint             59 items

Also surfaces a real spec/reality drift: the OCA spec declares
securitySchemes but only applies them on getAuthentication. The analyzer
flags this as a spec-gap so 401 coverage stays visible in the plan.

Outputs (next to the script, committed for diffability):
  - plan.csv        machine-readable, one row per (op, plan-item)
  - plan.md         per-endpoint readable summary
  - needs-abox.md   aggregated needs-ABox gaps grouped by fact

Independent of coverage-analysis/ (which runs in the opposite direction,
analysing what the generator already emits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es + deps

Spells out the 5-step path forward after the spike is signed off:
duplicatePolicy slice → analyzer/ABox wiring → 404 fake-ID emitter →
Camunda Hub generalisation → coverage-analysis/ as verification check.

Also lists what's deferred (RBAC, filter-semantics, Hub generalisation)
so reviewers know what we're explicitly NOT picking up first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>