feat: spec-coverage-analyzer spike (#277) by esraagamal6 · Pull Request #300 · camunda/api-test-generator

esraagamal6 · 2026-05-19T13:20:54Z

Status

Draft / spike for #277. Not for merge as-is — the goal of this PR is to give @jwulf and the team something concrete to react to before deciding on the next step (ABox slice priorities, rule-table refinement, integration design).

Summary

Adds spec-coverage-analyzer/, a Python analyzer that reads an OpenAPI spec and emits a per-endpoint test plan. Each plan item is tagged either computable (derivable from the spec alone) or needs-abox (requires an ABox fact, and the missing fact is named in the output).

Runs by default against spec/camunda-oca/bundled/rest-api.bundle.json; designed to be re-run on Camunda Hub when that spec lands.
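The tagging scheme above can be pictured with a minimal sketch. It covers only three rules from the rule table: happy-path, documented non-2xx responses, and 403-forbidden tagged with the RBAC fact. `PlanItem` and `plan_operation` are hypothetical names, not taken from build_plan.py, and the operation is assumed to be a plain dict parsed from the bundled OpenAPI JSON.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanItem:
    """One test that should exist for an operation (hypothetical model)."""

    operation_id: str
    kind: str                           # e.g. "happy-path", "403-forbidden"
    missing_fact: Optional[str] = None  # None means derivable from the spec alone

    @property
    def tag(self) -> str:
        # Either "computable" or "needs-abox:<missing fact>".
        if self.missing_fact is None:
            return "computable"
        return f"needs-abox:{self.missing_fact}"


def plan_operation(operation_id: str, operation: dict) -> list[PlanItem]:
    """Apply a tiny subset of the rule table to one OpenAPI operation object."""
    items = [PlanItem(operation_id, "happy-path")]
    for status in operation.get("responses", {}):
        # documented-XXX: one computable item per documented non-2xx response.
        if status != "default" and not str(status).startswith("2"):
            items.append(PlanItem(operation_id, f"documented-{status}"))
    # 403-forbidden cannot be written without knowing the required permissions.
    items.append(PlanItem(operation_id, "403-forbidden", missing_fact="RBAC"))
    return items
```

Keeping the missing fact as data rather than baking it into the kind is what lets needs-abox.md group items by fact later.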

How is this different from the generator?

Both tools are meant to read the spec and the ABox. The generator already integrates the ontology under configs/<config>/ontology/ (that's how it knows, e.g., that process-instance needs a deployed process first). The analyzer as shipped in this spike doesn't read the ABox yet; that's deferred to step 2 of "Next steps" below.

The real difference is what each tool produces:

| | Generator | Analyzer (this PR) |
|---|---|---|
| Output | Real `.spec.ts` test files (executable code) | A checklist describing what tests should exist (no code) |
| Categories covered | Only the categories the emitter plugins know how to build (happy-path, request-validation negatives, edge + entity lifecycles, …) | Every category that should exist, including ones no emitter knows how to build yet |
| When a category isn't covered | Silent: those tests just don't exist | Listed in the plan, tagged with what's missing (an ABox slice, a new emitter plugin, or both) |
| ABox integration | Already wired | Not in this spike; deferred to step 2 of "Next steps" |
| Audience | The test runner (CI / dev machine) | Humans deciding what to build next |

Concrete example for POST /tenants:

What the generator currently produces (real .spec.ts files):

✅ A happy-path test
✅ ~15 schema-validation negative tests via the request-validation emitter
✅ A composite lifecycle test (create → present → update → delete → absent) from the EntityLifecycle template

What the analyzer says should exist for that endpoint:

✅ All of the above, plus:
⚠️ A 401 test — currently needs-abox: spec-gap (spec under-declares auth)
⚠️ A 403 test — needs-abox: RBAC (no slice yet)
⚠️ A 404 test against a fake tenant ID — computable ✓ (no emitter for it yet either, but the spec gives us enough)
⚠️ A 409 test for duplicate-create — needs-abox: duplicatePolicy (Josh's 8.8 design, not landed)
⚠️ Pagination + filter behaviour tests on search — needs-abox: filter/sort semantics

The ✅ items: generator already produces them.
The ⚠️ items: generator produces zero of them.

Why this matters: the analyzer is the checklist the generator gets graded against. Without it, "is the generator producing enough tests?" gets hand-counted against another suite. With it, every missing test is named, scoped, and tagged with exactly what's blocking it — usually a specific ABox slice or a specific emitter plugin.
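Grading the generator against the checklist reduces to a set difference, assuming both sides can be normalised to `(operationId, plan-item kind)` pairs. That normalisation is the hard part and is simply assumed here; `grade` is a hypothetical helper, not something this PR ships.

```python
from __future__ import annotations


def grade(plan: set[tuple[str, str]], emitted: set[tuple[str, str]]) -> dict:
    """Compare planned (operationId, kind) pairs with what the generator emitted."""
    return {
        # Planned but never emitted: the named, scoped gaps.
        "missing": sorted(plan - emitted),
        # Emitted but not planned: may point at a gap in the rule table itself.
        "extra": sorted(emitted - plan),
        # Share of planned items that the generator actually covers.
        "coverage": len(plan & emitted) / len(plan) if plan else 1.0,
    }
```

Reporting the `extra` side as well keeps the comparison honest in both directions rather than only counting what the generator lacks.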

Snapshot against the OCA spec

| | count |
|---|---|
| Operations | 190 |
| Plan items total | 1817 |
| Computable from spec alone | 1027 (56%) |
| Needs ABox / domain knowledge | 790 (44%) |

Top missing ABox facts (by plan-item load):

| missing fact | plan items |
|---|---|
| RBAC: permissions required per endpoint | 190 |
| spec-gap: which endpoints actually require auth | 189 |
| creation chain per identifier semantic-type | 120 |
| filter-field-semantics + sort-field-allowlist per entity | 106 |
| `duplicatePolicy` per endpoint (idempotent / conflict / replace) | 59 |
| consistency window per entity | 43 |
| scale thresholds + expected response time per entity | 43 |
| lifecycle state machine for this entity | 40 |
| cross-field validation rules | (low; heuristic only catches paired `Before`/`After`) |
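The snapshot numbers and the per-fact load above can be reproduced from the list of plan-item tags with a `Counter`. This sketch assumes each plan item carries exactly one tag, which is consistent with the eight counted facts summing to the 790 needs-ABox items; `summarize` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter

NEEDS_ABOX = "needs-abox:"


def summarize(tags: list[str]) -> dict:
    """Turn plan-item tags into snapshot counts plus per-fact load."""
    facts = Counter(t[len(NEEDS_ABOX):] for t in tags if t.startswith(NEEDS_ABOX))
    return {
        "total": len(tags),
        "computable": sum(1 for t in tags if t == "computable"),
        "needs_abox": sum(facts.values()),
        # Missing facts ordered by how many plan items each one blocks.
        "top_facts": facts.most_common(),
    }
```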

Spec-gap finding

The OCA spec declares securitySchemes (BearerAuth, basicAuth) but only applies them on getAuthentication. The analyzer flags 189 ops with a 401-unauthorized needs-ABox item (spec-gap: which endpoints actually require auth, encoded only in deployment, not the spec). That's a real spec/reality drift worth surfacing — relates to camunda/camunda#52511.
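In OpenAPI 3, an operation-level `security` array overrides the top-level one, an empty array removes security, and an empty requirement object (`{}`) makes authentication optional. A sketch of how the spec-gap operations could be found under those rules, assuming the bundled spec is plain OpenAPI 3 JSON; the helper names are hypothetical and this is not necessarily how build_plan.py does it.

```python
from __future__ import annotations

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def requires_auth(spec: dict, operation: dict) -> bool:
    # Operation-level `security` overrides the top-level declaration;
    # an explicit empty list removes security for that operation.
    requirements = operation.get("security", spec.get("security", []))
    # An empty requirement object ({}) in the list means auth is optional.
    return bool(requirements) and all(requirements)


def spec_gap_operations(spec: dict) -> list[str]:
    """Operations the spec does not declare as requiring authentication."""
    gaps = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            if not requires_auth(spec, operation):
                gaps.append(operation.get("operationId", f"{method.upper()} {path}"))
    return gaps
```

Rather than silently dropping 401 items for these operations, the analyzer keeps them as needs-ABox items so the gap stays visible.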

Rule table

See spec-coverage-analyzer/README.md for the full rule table. Brief summary:

Computable rules (14 kinds): happy-path, bad-request:{missing-required, type-mismatch, format-invalid, enum-violation, range-violation, additional-property, oneof-violation}, 404-not-found (per path param), 401-unauthorized (when security is declared on the op), pagination-sort:request-shape, filter:request-shape, documented-XXX (per documented non-2xx response).
Needs-ABox rules (9 kinds): 401-unauthorized:spec-gap, 403-forbidden, 409-conflict, business-entity-lifecycle, prerequisite-resource, eventual-consistency, scale-large-n, cross-field-range, pagination-sort/filter:behaviour-assertion.
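Several of the computable rules fall straight out of the request-body schema and the path parameters. A sketch, assuming the schema has already been dereferenced (no `$ref`s left to resolve); the `kind:field` naming is hypothetical and chosen only to keep one item per field.

```python
from __future__ import annotations

RANGE_KEYWORDS = ("minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
                  "minLength", "maxLength", "minItems", "maxItems")


def computable_kinds(operation: dict, body_schema: dict) -> list[str]:
    """Derive computable plan-item kinds from an operation and its body schema."""
    kinds = ["happy-path"]
    for name in body_schema.get("required", []):
        kinds.append(f"bad-request:missing-required:{name}")
    for name, prop in body_schema.get("properties", {}).items():
        if "type" in prop:
            kinds.append(f"bad-request:type-mismatch:{name}")
        if "format" in prop:
            kinds.append(f"bad-request:format-invalid:{name}")
        if "enum" in prop:
            kinds.append(f"bad-request:enum-violation:{name}")
        if any(k in prop for k in RANGE_KEYWORDS):
            kinds.append(f"bad-request:range-violation:{name}")
    if body_schema.get("additionalProperties") is False:
        kinds.append("bad-request:additional-property")
    if "oneOf" in body_schema:
        kinds.append("bad-request:oneof-violation")
    # 404-not-found: one item per path parameter (send an ID that cannot exist).
    for param in operation.get("parameters", []):
        if param.get("in") == "path":
            kinds.append(f"404-not-found:{param['name']}")
    return kinds
```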

Outputs

plan.csv — machine-readable, one row per (operationId, plan-item) tuple.
plan.md — per-endpoint readable summary, formatted as upstream's coverage_breakdown.md.
needs-abox.md — aggregated needs-ABox gaps grouped by missing fact.
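Because the artifacts are committed for diffability, row order matters: sorting before writing keeps re-runs from producing spurious diffs. A sketch of writing plan.csv with the standard `csv` module; the column names are hypothetical.

```python
from __future__ import annotations

import csv
import io

FIELDS = ["operationId", "planItem", "tag"]  # hypothetical column names


def render_plan_csv(rows: list[dict]) -> str:
    """Render plan rows as CSV in a stable order so re-runs diff cleanly."""
    ordered = sorted(rows, key=lambda r: (r["operationId"], r["planItem"]))
    buf = io.StringIO()
    # Use "\n" so the committed file has the same line endings on every OS.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buf.getvalue()
```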

Test plan

Run python3 spec-coverage-analyzer/build_plan.py and confirm it emits the 3 artifacts.
Spot-check a few operations in plan.md — do the computable / needs-ABox tags match what you'd expect?
Sanity-check needs-abox.md — are the 9 ABox-fact buckets the right axes? Anything missing?

Open questions for review

Rule-table coverage: is the current 14-computable + 9-needs-ABox set the right shape, or are there obvious categories missing?
ABox slice priority: which of the 8 top ABox facts should land first? My read: duplicatePolicy (smallest, already designed in 8.8) is the cheapest unlock — would move 59 plan items from "needs ABox" to "computable" in a single shot.
Integration design: where does this tool live long-term? Standalone Python (current spike) or rewritten into the TypeScript path-analyser/ stack (consistent with the rest of the generator codebase)?

Next steps (once this spike is signed off)

In priority order, with dependencies:

1. Land duplicatePolicy as the first ABox slice. Cheapest unblock: Josh has already designed it (8.8); it needs to be expressed as a new file, configs/camunda-oca/ontology/duplicatePolicy.json, with { operationId → policy } entries for the ~59 create-style endpoints flagged. Independent of this PR's review outcome, so it could start in parallel.
2. Wire the analyzer to read the ABox. After (1), update build_plan.py to consume duplicatePolicy and reclassify those 59 plan items from needs-abox to computable. This validates the analyzer ↔ ABox contract on the smallest slice before scaling to the bigger ones (RBAC, filter-semantics).
3. Ship the 404 fake-ID emitter (#279). First proof of the full loop: the analyzer flags a plan item as computable → an emitter generates the actual test → the test runs. No dependency on any ABox slice; ontology/semantics.json already encodes the path-param identifier types. ~1 day's work; closes ~127 upstream-equivalent tests.
4. Run the analyzer against the Camunda Hub spec. Validates that the rule table generalises beyond OCA and will surface any OCA-specific assumptions baked into the heuristics (e.g. the business-entity-signal patterns).
5. Repurpose coverage-analysis/ (PR #278) as a verification check. Currently it analyses what the generator emits. Once the analyzer exists, the two can be diffed ("does the generator emit what the analyzer says it should?"), turning it into a CI check rather than a static snapshot.
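Steps 1 and 2 could meet in a loader like the one sketched here: read a `{ operationId → policy }` mapping and flip matching `needs-abox:duplicatePolicy` items to computable, recording the policy so an emitter knows which outcome to assert. The flat JSON shape, the three policy values (taken from the idempotent / conflict / replace split in the top-facts table) and both function names are assumptions; the real duplicatePolicy.json format isn't defined in this PR.

```python
from __future__ import annotations

import json

# Assumed policy vocabulary, mirroring "idempotent / conflict / replace".
VALID_POLICIES = {"idempotent", "conflict", "replace"}
TAG = "needs-abox:duplicatePolicy"


def load_duplicate_policy(text: str) -> dict[str, str]:
    """Parse a hypothetical flat {operationId: policy} JSON object and validate it."""
    data = json.loads(text)
    unknown = {op: p for op, p in data.items() if p not in VALID_POLICIES}
    if unknown:
        raise ValueError(f"unknown duplicatePolicy values: {unknown}")
    return data


def reclassify(items: list[dict], policy: dict[str, str]) -> list[dict]:
    """Return plan items with duplicatePolicy gaps resolved where a policy exists."""
    result = []
    for item in items:
        op = item["operationId"]
        if item["tag"] == TAG and op in policy:
            # Now computable; keep the policy so the emitter knows what to assert.
            item = {**item, "tag": "computable", "duplicatePolicy": policy[op]}
        result.append(item)
    return result
```

Operations missing from the file keep their needs-abox tag, so a partial slice still reclassifies only what it actually covers.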

Out of scope for follow-up (defer)

RBAC ABox slice (190 items) — biggest unlock but also biggest scoping effort. Defer until duplicatePolicy validates the analyzer↔ABox pattern.
Filter-semantics ABox slice (106 items) — same reasoning; lots of per-field decisions.
Camunda Hub generalisation — wait until OCA-side loop (analyzer + duplicatePolicy + 404 emitter) is stable.

🤖 Generated with Claude Code

Reads an OpenAPI spec and emits a per-endpoint test plan, tagging each plan item as either:
- computable -- derivable from the spec alone
- needs-abox:X -- requires domain knowledge; X names the missing fact

Snapshot against the OCA spec: 190 operations, 1817 plan items
- 1027 computable from spec (56%)
- 790 needs-abox / domain knowledge (44%)

The needs-ABox load is concentrated in a handful of facts (top 5):
- RBAC permissions per endpoint: 190 items
- spec-gap: which endpoints require auth: 189 items
- creation chain per identifier semantic: 120 items
- filter-field-semantics + sort-allowlist: 106 items
- duplicatePolicy per endpoint: 59 items

Also surfaces a real spec/reality drift: the OCA spec declares securitySchemes but only applies them on getAuthentication. The analyzer flags this as a spec-gap so 401 coverage stays visible in the plan.

Outputs (next to the script, committed for diffability):
- plan.csv: machine-readable, one row per (op, plan-item)
- plan.md: per-endpoint readable summary
- needs-abox.md: aggregated needs-ABox gaps grouped by fact

Independent of coverage-analysis/ (which runs in the opposite direction, analysing what the generator already emits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…es + deps

Spells out the 5-step path forward after the spike is signed off: duplicatePolicy slice → analyzer/ABox wiring → 404 fake-ID emitter → Camunda Hub generalisation → coverage-analysis/ as verification check. Also lists what's deferred (RBAC, filter-semantics, Hub generalisation) so reviewers know what we're explicitly NOT picking up first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 19, 2026:
- How do we calculate the test coverage of an API? #277 (Open)
- Close coverage gap vs upstream e2e suite (negative-path + search-refinement emitters) #279 (Open)

esraagamal6 wants to merge 2 commits into main from spike/spec-coverage-analyzer-277
