From b9d5b9655786611aa2f4510235af84479a04a8fe Mon Sep 17 00:00:00 2001 From: Josh Wulf Date: Wed, 29 Apr 2026 15:17:07 +1200 Subject: [PATCH 1/3] =?UTF-8?q?chore(spike-rdf):=20phase=201=20=E2=80=94?= =?UTF-8?q?=20draft=20core/Camunda=20ontology=20+=20SHACL=20invariants?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Lays the modelling foundation for issue #60. Three artefacts: - docs/spikes/rdf/ontology/core.ttl — API-agnostic vocabulary (Operation, SemanticType, RuntimeState, Capability, FieldPath, ValueBinding, ArtifactKind, Identifier, Disjunction, Scenario) - docs/spikes/rdf/ontology/camunda.ttl — per-API instances and subclasses; demonstrates the per-API <-> core boundary - docs/spikes/rdf/shapes/invariants.shapes.ttl — SHACL shapes pinning invariants the codebase enforces procedurally today (every required semantic type / runtime state has a producer; every value binding resolves to a known FieldPath and a parameter declared by its target state) Adds devDependencies n3 (Turtle parser), oxigraph (SPARQL 1.1) and rdf-validate-shacl (SHACL) — scoped to the spike under docs/spikes/rdf/, no production-pipeline change.
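The "every required runtime state has a producer" invariant reduces to a set difference over triples. Below is a minimal plain-TypeScript sketch of the rule that `core:RequiredRuntimeStateHasProducer` expresses. It could serve as a reference when cross-checking validator output during parity work. The `Triple` tuple type, the `statesWithoutProducer` helper and the `op:` operation names are all hypothetical; none of them are part of the spike's adapters.

```typescript
// Hypothetical triple shape: CURIE strings stand in for full IRIs.
type Triple = readonly [subject: string, predicate: string, object: string];

// Returns [operation, state] pairs where an operation requires a runtime
// state that nothing produces via core:producesState or core:implicitlyAdds
// (the same rule as the SHACL-SPARQL shape).
function statesWithoutProducer(triples: readonly Triple[]): Array<[string, string]> {
  const produced = new Set<string>();
  for (const [, predicate, object] of triples) {
    if (predicate === "core:producesState" || predicate === "core:implicitlyAdds") {
      produced.add(object);
    }
  }
  const violations: Array<[string, string]> = [];
  for (const [subject, predicate, object] of triples) {
    if (predicate === "core:requiresState" && !produced.has(object)) {
      violations.push([subject, object]);
    }
  }
  return violations;
}

// Hypothetical operations; nothing here produces JobAvailableForActivation.
const model: Triple[] = [
  ["op:deployResources", "core:producesState", "camunda:ProcessDefinitionDeployed"],
  ["op:createProcessInstance", "core:requiresState", "camunda:ProcessDefinitionDeployed"],
  ["op:createProcessInstance", "core:producesState", "camunda:ProcessInstanceExists"],
  ["op:activateJobs", "core:requiresState", "camunda:JobAvailableForActivation"],
];

// Prints the single violation: op:activateJobs -> camunda:JobAvailableForActivation
for (const [op, state] of statesWithoutProducer(model)) {
  console.log(`${op} -> ${state}`);
}
```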
Refs #60 --- docs/spikes/rdf/README.md | 44 ++ docs/spikes/rdf/ontology/camunda.ttl | 78 ++++ docs/spikes/rdf/ontology/core.ttl | 277 ++++++++++++ docs/spikes/rdf/shapes/invariants.shapes.ttl | 156 +++++++ package-lock.json | 426 +++++++++++++++++++ package.json | 3 + 6 files changed, 984 insertions(+) create mode 100644 docs/spikes/rdf/README.md create mode 100644 docs/spikes/rdf/ontology/camunda.ttl create mode 100644 docs/spikes/rdf/ontology/core.ttl create mode 100644 docs/spikes/rdf/shapes/invariants.shapes.ttl diff --git a/docs/spikes/rdf/README.md b/docs/spikes/rdf/README.md new file mode 100644 index 0000000..ed3efb9 --- /dev/null +++ b/docs/spikes/rdf/README.md @@ -0,0 +1,44 @@ +# Spike: RDF / SPARQL as a unifying query layer + +Tracking issue: [#60](https://github.com/camunda/api-test-generator/issues/60) + +This directory is the spike workspace. **Throwaway code; clarity over polish.** +The recommendation in [`RECOMMENDATION.md`](./RECOMMENDATION.md) is the +deliverable; everything else is the working that backs it. + +## Layout + +| Path | What lives here | +|---|---| +| [`ontology/`](./ontology/) | Turtle ontology (core API-agnostic + Camunda vocabulary) | +| [`shapes/`](./shapes/) | SHACL shapes pinning structural invariants the codebase enforces procedurally today | +| [`adapters/`](./adapters/) | One-shot scripts: existing JSON / OpenAPI → triples | +| [`parity/`](./parity/) | Index-parity tests: façade-over-triple-store output vs current `graphLoader.ts` output | +| [`queries/`](./queries/) | The two declarative re-expressions (value-binding drift; minimal scenario-chain candidates) | +| [`second-api-sketch.md`](./second-api-sketch.md) | Paper sketch of a second API's vocabulary against the core ontology | +| [`RECOMMENDATION.md`](./RECOMMENDATION.md) | Adopt / adopt-modelling-only / reject | + +## Triple store + +`oxigraph` (npm) for SPARQL 1.1 incl. property paths. `rdf-validate-shacl` +for SHACL (oxigraph does not ship SHACL). 
Both run in-process under Node (oxigraph as a WebAssembly build) and work +offline. The triple store is **derived state**: rebuilt on every run from +the bundled spec + sidecars; never persisted. + +## How to run the spike artefacts + +```bash +# Materialise triples from current data sources (writes spike/out/*.ttl): +npx tsx docs/spikes/rdf/adapters/run-all.ts + +# Index-parity check (the go/no-go checkpoint): +npx tsx docs/spikes/rdf/parity/index-parity.ts + +# Declarative re-expressions: +npx tsx docs/spikes/rdf/queries/value-binding-drift.ts +npx tsx docs/spikes/rdf/queries/minimal-scenario-chain.ts +``` + +The adapters and queries are **not** wired into the production pipeline. +They read the same input files (`path-analyser/dist/...` / regenerated +artefacts) and assert the model is faithful. diff --git a/docs/spikes/rdf/ontology/camunda.ttl b/docs/spikes/rdf/ontology/camunda.ttl new file mode 100644 index 0000000..76320de --- /dev/null +++ b/docs/spikes/rdf/ontology/camunda.ttl @@ -0,0 +1,78 @@ +# Camunda vocabulary — instances + per-API extensions to the core ontology. +# +# This file names Camunda-specific concepts (BPMN processes, DMN +# decisions, the Zeebe state machine) using the core ontology. Adding a +# second API means writing a sibling .ttl file using its own namespace +# (acmeApi:, github:, …) and loading it into the same store. +# +# Spike note: only enough instances to demonstrate the per-API <-> core +# boundary. The adapters (../adapters/) emit the rest from +# domain-semantics.json mechanically. + +@prefix camunda: . +@prefix core: . +@prefix rdf: . +@prefix rdfs: . +@prefix owl: . + +camunda: a owl:Ontology ; + rdfs:label "Camunda 8 Orchestration Cluster — vocabulary" ; + rdfs:comment "Per-API extensions and individuals layered onto the core ontology." . + +# +# === Artifact kinds (instances of core:ArtifactKind) ====================== +# + +camunda:bpmnProcess a core:ArtifactKind ; rdfs:label "BPMN process" . +camunda:dmnDecision a core:ArtifactKind ; rdfs:label "DMN decision" .
+camunda:form a core:ArtifactKind ; rdfs:label "Form" . + +# +# === Identifiers (instances of core:Identifier) =========================== +# +# Concrete identifiers from domain-semantics.json. Adapters emit the rest; +# these few are pinned in TTL so the modelling boundary is visible. + +camunda:ProcessDefinitionId a core:Identifier ; + rdfs:label "Process definition ID" ; + core:validityState camunda:ProcessDefinitionDeployed . + +camunda:ProcessInstanceKey a core:Identifier ; + rdfs:label "Process instance key" ; + core:validityState camunda:ProcessInstanceExists . + +# +# === Runtime states (instances of core:RuntimeState) ====================== +# + +camunda:ProcessDefinitionDeployed a core:RuntimeState ; + rdfs:label "Process definition deployed" ; + core:hasParameter "processDefinitionId" . + +camunda:ProcessInstanceExists a core:RuntimeState ; + rdfs:label "Process instance exists" ; + core:hasParameter "processInstanceKey" ; + core:dependsOn camunda:ProcessDefinitionDeployed . + +camunda:JobAvailableForActivation a core:RuntimeState ; + rdfs:label "Job available for activation" ; + core:dependsOn camunda:ProcessInstanceExists . + +# +# === Capabilities (instances of core:Capability, a subclass of core:RuntimeState) === +# + +camunda:ModelHasServiceTaskType a core:Capability ; + rdfs:label "Deployed model has a service task of a given job type" ; + core:hasParameter "jobType" ; + core:dependsOn camunda:ProcessDefinitionDeployed . + +# +# === Vocabulary boundary check (informational) ============================ +# +# Honest test for the abstraction (per the spike brief): the planner +# must be writable referring only to terms in core:. Every term above is +# either an instance of a core: class or a subclass of one. Nothing in +# this file extends the core schema with new properties. If a future +# requirement needs a new property here, that's the signal that the +# core ontology is missing an abstraction.
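Prerequisite lookups over these instances amount to the closure of `core:dependsOn`, which is what the SPARQL property path `core:dependsOn+` returns. Below is a minimal plain-TypeScript sketch of that closure over the three edges declared above. It may be handy as a parity reference for `gatherDomainPrerequisites()`. The `prerequisites` helper is hypothetical. The deepest-first ordering is an addition: a SPARQL property path returns a set with no guaranteed order.

```typescript
// Direct core:dependsOn edges, copied from the camunda.ttl instances above.
const dependsOn: ReadonlyMap<string, readonly string[]> = new Map([
  ["camunda:ProcessInstanceExists", ["camunda:ProcessDefinitionDeployed"]],
  ["camunda:JobAvailableForActivation", ["camunda:ProcessInstanceExists"]],
  ["camunda:ModelHasServiceTaskType", ["camunda:ProcessDefinitionDeployed"]],
]);

// Hypothetical helper: transitive prerequisites of `target`, deepest first
// (post-order DFS), excluding the target itself as `core:dependsOn+` does.
// Throws on a cycle, which a "depends on" relation should never contain.
function prerequisites(
  target: string,
  edges: ReadonlyMap<string, readonly string[]>,
): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  const onPath = new Set<string>();
  const visit = (state: string): void => {
    if (done.has(state)) return;
    if (onPath.has(state)) throw new Error(`core:dependsOn cycle through ${state}`);
    onPath.add(state);
    for (const dep of edges.get(state) ?? []) visit(dep);
    onPath.delete(state);
    done.add(state);
    order.push(state);
  };
  visit(target);
  return order.slice(0, -1); // the target finishes last; drop it
}

// Returns ["camunda:ProcessDefinitionDeployed", "camunda:ProcessInstanceExists"]
console.log(prerequisites("camunda:JobAvailableForActivation", dependsOn));
```

Post-order DFS yields an order a planner could execute directly: each state appears only after everything it depends on.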
diff --git a/docs/spikes/rdf/ontology/core.ttl b/docs/spikes/rdf/ontology/core.ttl new file mode 100644 index 0000000..8eb94fd --- /dev/null +++ b/docs/spikes/rdf/ontology/core.ttl @@ -0,0 +1,277 @@ +# Core ontology — API-agnostic +# +# This vocabulary names the entities and relations the api-test-generator +# currently encodes as conventions inside procedural TS. It does NOT +# mention any Camunda-specific term. The honest test of the abstraction +# is that the planner can be written referring only to terms in this file. +# +# Spike: see ../README.md and ../RECOMMENDATION.md + +@prefix core: . +@prefix rdf: . +@prefix rdfs: . +@prefix owl: . +@prefix xsd: . + +core: a owl:Ontology ; + rdfs:label "API test generator — core vocabulary" ; + rdfs:comment "API-agnostic entities for modelling REST APIs with runtime state dependencies." . + +# +# === Operations ============================================================ +# + +core:Operation a rdfs:Class ; + rdfs:label "Operation" ; + rdfs:comment "A single REST operation (operationId + method + path)." . + +core:operationId a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range xsd:string ; + rdfs:comment "Stable identifier from the OpenAPI spec." . + +core:method a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range xsd:string . + +core:path a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range xsd:string . + +core:eventuallyConsistent a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range xsd:boolean ; + rdfs:comment "True when the operation's effects are not immediately observable." . + +core:successStatus a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range xsd:integer . + +# +# === Semantic types ======================================================== +# +# A SemanticType is a logical value class (e.g. ProcessInstanceKey) that +# operations consume or produce.
Distinct from the underlying scalar +# (string/integer): two semantic types may share a primitive type but +# differ in meaning. + +core:SemanticType a rdfs:Class ; + rdfs:label "Semantic type" . + +# Production / consumption edges between Operation and SemanticType. + +core:produces a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:SemanticType ; + rdfs:comment "Operation emits a value of this semantic type in its response." . + +core:requires a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:SemanticType ; + rdfs:comment "Operation needs a value of this semantic type as input (REQUIRED)." . + +core:requiresOptional a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:SemanticType ; + rdfs:comment "Operation accepts a value of this semantic type as input (OPTIONAL)." . + +core:authoritativeProducer a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:SemanticType ; + rdfs:comment """Marks this operation as the canonical producer for the given + semantic type (corresponds to OpenAPI x-semantic-provider: true). + Used by the planner to pick a preferred bootstrap step.""" . + +# +# === Field paths =========================================================== +# +# A FieldPath localises a SemanticType to a concrete location in a +# request or response payload. This is what makes value bindings +# checkable: today the binding is a free-form string; here it points +# to a node that either exists in the canonical schema or doesn't. + +core:FieldPath a rdfs:Class . + +core:fieldPath a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range xsd:string ; + rdfs:comment "Dot/[]-notation path, e.g. deployments[].processDefinition.processDefinitionId" . + +core:jsonPointer a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range xsd:string ; + rdfs:comment "JSON Pointer form, e.g. /deployments/0/processDefinition/processDefinitionId" . 
+ +core:scalarType a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range xsd:string ; + rdfs:comment "string | integer | object | array | boolean | number" . + +core:isRequiredField a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range xsd:boolean . + +core:locatesSemanticType a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range core:SemanticType ; + rdfs:comment "The FieldPath carries a value of this semantic type at runtime." . + +core:onResponseOf a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range core:Operation ; + rdfs:comment "FieldPath belongs to the response of this operation." . + +core:onRequestOf a rdf:Property ; + rdfs:domain core:FieldPath ; + rdfs:range core:Operation . + +# +# === Runtime states & capabilities ========================================= +# +# A RuntimeState is a fact about the system that becomes true after some +# operation succeeds (e.g. ProcessDefinitionDeployed). A Capability is a +# state-shaped requirement carried by an artifact (e.g. +# ModelHasServiceTaskType). + +core:RuntimeState a rdfs:Class . + +core:Capability a rdfs:Class ; + rdfs:subClassOf core:RuntimeState . + +core:producesState a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:RuntimeState . + +core:requiresState a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:RuntimeState . + +core:implicitlyAdds a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:RuntimeState ; + rdfs:comment "State that becomes true on success even if not declared as 'produces'." . + +# Prerequisite edge between states, walked with the SPARQL property path +# core:dependsOn+ (oxigraph does no OWL reasoning, so owl:TransitiveProperty +# is documentation only). Replaces gatherDomainPrerequisites() in scenarioGenerator.ts. +core:dependsOn a rdf:Property ; + rdfs:domain core:RuntimeState ; + rdfs:range core:RuntimeState ; + a owl:TransitiveProperty .
+ +core:hasParameter a rdf:Property ; + rdfs:domain core:RuntimeState ; + rdfs:range xsd:string ; + rdfs:comment "Name of the parameter this state exposes (e.g. processDefinitionId)." . + +# +# === Disjunctions ========================================================== +# +# operationRequirements.*.disjunctions: an operation may have multiple +# alternative requirement sets ("either A and B, or C"). Reified as a +# Disjunction node so SPARQL can ask 'which alternatives satisfy op X?'. + +core:Disjunction a rdfs:Class . + +core:disjunctionOf a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:Disjunction . + +core:hasAlternative a rdf:Property ; + rdfs:domain core:Disjunction ; + rdfs:range core:RuntimeState . + +# +# === Value bindings ======================================================== +# +# Today: a string-keyed map { "request.processDefinitionId": +# "ProcessDefinitionDeployed.processDefinitionId" } resolved by string +# splitting at scenario-bind time. Silent miss on typo. +# +# Here: a first-class node connecting a FieldPath to a (RuntimeState, +# parameter) pair. SHACL shapes (../shapes/invariants.shapes.ttl) +# enforce that the binding has a FieldPath and a parameter its target state declares. + +core:ValueBinding a rdfs:Class . + +core:bindingOf a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:ValueBinding . + +core:bindingDirection a rdf:Property ; + rdfs:domain core:ValueBinding ; + rdfs:range xsd:string ; + rdfs:comment "request | response" . + +core:bindsFromFieldPath a rdf:Property ; + rdfs:domain core:ValueBinding ; + rdfs:range core:FieldPath . + +core:bindsToState a rdf:Property ; + rdfs:domain core:ValueBinding ; + rdfs:range core:RuntimeState . + +core:bindsToParameter a rdf:Property ; + rdfs:domain core:ValueBinding ; + rdfs:range xsd:string . + +# +# === Artifacts ============================================================= +# +# Some operations consume artifact content (e.g. a BPMN file) which itself +# produces states once deployed.
ArtifactKind is a class of artifacts; the +# concrete identifiers are per-API (camunda:bpmnProcess etc.). + +core:ArtifactKind a rdfs:Class . + +core:producesStateViaArtifact a rdf:Property ; + rdfs:domain core:ArtifactKind ; + rdfs:range core:RuntimeState . + +core:producesSemanticViaArtifact a rdf:Property ; + rdfs:domain core:ArtifactKind ; + rdfs:range core:SemanticType . + +core:hasArtifactRule a rdf:Property ; + rdfs:domain core:Operation ; + rdfs:range core:ArtifactKind . + +# +# === Identifiers =========================================================== +# +# An Identifier is a SemanticType subclass that uniquely names an entity +# whose validity is governed by a RuntimeState (e.g. ProcessDefinitionId +# is valid only while ProcessDefinitionDeployed holds). + +core:Identifier a rdfs:Class ; + rdfs:subClassOf core:SemanticType . + +core:validityState a rdf:Property ; + rdfs:domain core:Identifier ; + rdfs:range core:RuntimeState . + +# +# === Scenarios (planner output, included for cross-cutting queries) ======= +# +# Reified for the cross-tool coverage join discussed in the brief, even +# though the planner itself stays in TS. Triples can be emitted at the +# end of a generator run for downstream coverage queries. + +core:Scenario a rdfs:Class . + +core:satisfiedBy a rdf:Property ; + rdfs:domain core:Scenario ; + rdfs:range core:SemanticType . + +core:reachesState a rdf:Property ; + rdfs:domain core:Scenario ; + rdfs:range core:RuntimeState . + +core:targetsOperation a rdf:Property ; + rdfs:domain core:Scenario ; + rdfs:range core:Operation . + +core:hasStep a rdf:Property ; + rdfs:domain core:Scenario ; + rdfs:range core:Operation ; + rdfs:comment "Multi-valued; ordering carried separately if needed (rdf:List or step index)." . 
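The example values on `core:fieldPath` and `core:jsonPointer` suggest the pointer form can be derived mechanically from the dot/[] form. Below is a minimal sketch of that derivation. It assumes that `[]` always selects the first array element (index 0, as in the example pair) and that property names never contain `.` or `[]`. `toJsonPointer` is a hypothetical helper, not an existing adapter function. Tokens are escaped per RFC 6901.

```typescript
// Hypothetical helper: dot/[]-notation FieldPath -> RFC 6901 JSON Pointer.
// Assumes "[]" means "first array element" (index 0) and that property
// names contain neither "." nor "[]".
function toJsonPointer(fieldPath: string): string {
  const tokens: string[] = [];
  for (const segment of fieldPath.split(".")) {
    let name = segment;
    let arrayDepth = 0;
    // "items[][]" -> name "items" followed by two array steps
    while (name.endsWith("[]")) {
      name = name.slice(0, -2);
      arrayDepth++;
    }
    if (name !== "") tokens.push(name);
    for (let i = 0; i < arrayDepth; i++) tokens.push("0");
  }
  // RFC 6901 escaping: "~" must be escaped before "/".
  const escape = (token: string): string => token.replace(/~/g, "~0").replace(/\//g, "~1");
  return tokens.map((token) => `/${escape(token)}`).join("");
}

// Returns "/deployments/0/processDefinition/processDefinitionId"
console.log(toJsonPointer("deployments[].processDefinition.processDefinitionId"));
```

The escape order matters. Replacing `/` first would turn it into `~1`, and the later `~` pass would then corrupt that into `~01`.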
diff --git a/docs/spikes/rdf/shapes/invariants.shapes.ttl b/docs/spikes/rdf/shapes/invariants.shapes.ttl new file mode 100644 index 0000000..ebe845c --- /dev/null +++ b/docs/spikes/rdf/shapes/invariants.shapes.ttl @@ -0,0 +1,156 @@ +# SHACL shapes — structural invariants +# +# These shapes pin invariants the codebase currently enforces with +# procedural checks (or, in some cases, fails to enforce at all and +# silently mis-generates tests). Loading the model through these shapes +# at adapter time turns silent gaps into typed, queryable errors. +# +# Validate with rdf-validate-shacl (npm) — see ../adapters/validate.ts. + +@prefix core: . +@prefix sh: . +@prefix rdf: . +@prefix rdfs: . +@prefix xsd: . + +# +# === Operation must have stable identity ================================== +# + +core:OperationShape a sh:NodeShape ; + sh:targetClass core:Operation ; + sh:property [ + sh:path core:operationId ; + sh:minCount 1 ; sh:maxCount 1 ; + sh:datatype xsd:string ; + sh:message "Every Operation needs exactly one operationId." + ] ; + sh:property [ + sh:path core:method ; + sh:minCount 1 ; sh:maxCount 1 ; + sh:in ( "GET" "POST" "PUT" "PATCH" "DELETE" "HEAD" "OPTIONS" ) + ] ; + sh:property [ + sh:path core:path ; + sh:minCount 1 ; sh:maxCount 1 ; + sh:datatype xsd:string + ] . + +# +# === Every required SemanticType must have at least one producer ========= +# +# Today: graphLoader builds bySemanticProducer but does not error when a +# required type has no producer; the planner just fails to find a chain. +# Here: SHACL turns it into a load-time violation. + +core:RequiredSemanticTypeHasProducer a sh:NodeShape ; + sh:targetSubjectsOf core:requires ; + sh:sparql [ + sh:message "An operation requires a semantic type with no declared producer." ; + sh:select """ + PREFIX core: + SELECT $this ?type WHERE { + $this core:requires ?type . + FILTER NOT EXISTS { ?producer core:produces ?type } + } + """ + ] . 
+ +# +# === Every required RuntimeState must have at least one producer ========= +# + +core:RequiredRuntimeStateHasProducer a sh:NodeShape ; + sh:targetSubjectsOf core:requiresState ; + sh:sparql [ + sh:message "An operation requires a runtime state with no declared producer." ; + sh:select """ + PREFIX core: + SELECT $this ?state WHERE { + $this core:requiresState ?state . + FILTER NOT EXISTS { + { ?producer core:producesState ?state } + UNION + { ?producer core:implicitlyAdds ?state } + } + } + """ + ] . + +# +# === Value bindings must resolve to a known FieldPath ==================== +# +# This is the silent-miss class the brief flags repeatedly: a typo in +# valueBindings is a runtime no-op today. Under SHACL it's a load-time +# error. + +core:ValueBindingResolvesShape a sh:NodeShape ; + sh:targetClass core:ValueBinding ; + sh:property [ + sh:path core:bindsFromFieldPath ; + sh:minCount 1 ; sh:class core:FieldPath ; + sh:message "Value binding must reference a declared core:FieldPath; otherwise the string-path lookup silently misses at runtime." + ] ; + sh:property [ + sh:path core:bindsToState ; + sh:minCount 1 + ] ; + sh:property [ + sh:path core:bindsToParameter ; + sh:minCount 1 ; + sh:datatype xsd:string + ] ; + # Stronger: the named parameter must actually be exposed by the target state. + sh:sparql [ + sh:message "Value binding references a parameter not declared by its target RuntimeState." ; + sh:select """ + PREFIX core: + SELECT $this ?param ?state WHERE { + $this core:bindsToState ?state ; + core:bindsToParameter ?param . + FILTER NOT EXISTS { ?state core:hasParameter ?param } + } + """ + ] . + +# +# === Disjunctions must have at least two alternatives ==================== +# + +core:DisjunctionShape a sh:NodeShape ; + sh:targetClass core:Disjunction ; + sh:property [ + sh:path core:hasAlternative ; + sh:minCount 2 ; + sh:message "A disjunction with fewer than two alternatives should be a plain requiresState." + ] .
+ +# +# === Identifier must declare its validity state ========================== +# + +core:IdentifierShape a sh:NodeShape ; + sh:targetClass core:Identifier ; + sh:property [ + sh:path core:validityState ; + sh:minCount 1 ; sh:maxCount 1 ; + sh:class core:RuntimeState + ] . + +# +# === FieldPath must locate a SemanticType ================================ +# +# Justification: a FieldPath without a SemanticType is dead weight in +# the model — it can't participate in value-binding resolution. + +core:FieldPathShape a sh:NodeShape ; + sh:targetClass core:FieldPath ; + sh:property [ + sh:path core:fieldPath ; + sh:minCount 1 ; sh:maxCount 1 ; + sh:datatype xsd:string + ] ; + sh:property [ + sh:path core:locatesSemanticType ; + sh:minCount 1 + ] . diff --git a/package-lock.json b/package-lock.json index 84f137a..ea2cff8 100644 --- a/package-lock.json +++ b/package-lock.json @@ -21,7 +21,10 @@ "@types/node": "^24.0.0", "camunda-schema-bundler": "^2.1.0", "js-yaml": "^4.1.0", + "n3": "^2.0.3", + "oxigraph": "^0.5.8", "prettier": "^3.2.5", + "rdf-validate-shacl": "^0.6.5", "rimraf": "^6.0.0", "tsx": "^4.7.0", "typescript": "^5.5.4", @@ -793,6 +796,80 @@ "node": ">=18" } }, + "node_modules/@rdfjs/data-model": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/@rdfjs/data-model/-/data-model-2.1.1.tgz", + "integrity": "sha512-6mcOI4DjIPS6MOZw23H8oAdujHCk5gippVNQ7mKwliYTvTNh+uqRM91B9OLqhoAoNcQ3t49Dx2ooIMRG9/6ooA==", + "dev": true, + "license": "MIT", + "bin": { + "rdfjs-data-model-test": "bin/test.js" + } + }, + "node_modules/@rdfjs/dataset": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/@rdfjs/dataset/-/dataset-2.0.2.tgz", + "integrity": "sha512-6YJx+5n5Uxzq9dd9I0GGcIo6eopZOPfcsAfxSGX5d+YBzDgVa1cbtEBFnaPyPKiQsOm4+Cr3nwypjpg02YKPlA==", + "dev": true, + "license": "MIT", + "bin": { + "rdfjs-dataset-test": "bin/test.js" + } + }, + "node_modules/@rdfjs/environment": { + "version": "1.0.0", + "resolved": 
"https://registry.npmjs.org/@rdfjs/environment/-/environment-1.0.0.tgz", + "integrity": "sha512-+S5YjSvfoQR5r7YQCRCCVHvIEyrWia7FJv2gqM3s5EDfotoAQmFeBagApa9c/eQFi5EiNhmBECE5nB8LIxTaHg==", + "dev": true, + "license": "MIT" + }, + "node_modules/@rdfjs/namespace": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/@rdfjs/namespace/-/namespace-2.0.1.tgz", + "integrity": "sha512-U85NWVGnL3gWvOZ4eXwUcv3/bom7PAcutSBQqmVWvOaslPy+kDzAJCH1WYBLpdQd4yMmJ+bpJcDl9rcHtXeixg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/data-model": "^2.0.1" + } + }, + "node_modules/@rdfjs/term-map": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/@rdfjs/term-map/-/term-map-2.0.2.tgz", + "integrity": "sha512-EJ2FmmdEUsSR/tU1nrizRLWzH24YzhuvesrbUWxC3Fs0ilYNdtTbg0RaFJDUnJF3HkbNBQe8Zrt/uvU/hcKnHg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/to-ntriples": "^3.0.1" + } + }, + "node_modules/@rdfjs/term-set": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/@rdfjs/term-set/-/term-set-2.0.3.tgz", + "integrity": "sha512-DyXrKWEx+mtAFUZVU7bc3Va6/KZ8PsIp0RVdyWT9jfDgI/HCvNisZaBtAcm+SYTC45o+7WLkbudkk1bfaKVB0A==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/to-ntriples": "^3.0.1" + } + }, + "node_modules/@rdfjs/to-ntriples": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/@rdfjs/to-ntriples/-/to-ntriples-3.0.1.tgz", + "integrity": "sha512-gjoPAvh4j7AbGMjcDn/8R4cW+d/FPtbfbMM0uQXkyfBFtNUW2iVgrqsgJ65roLc54Y9A2TTFaeeTGSvY9a0HCQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/@rdfjs/types": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/@rdfjs/types/-/types-2.0.1.tgz", + "integrity": "sha512-uyAzpugX7KekAXAHq26m3JlUIZJOC0uSBhpnefGV5i15bevDyyejoB7I+9MKeUrzXD8OOUI3+4FeV1wwQr5ihA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@types/node": "*" + } + }, "node_modules/@rolldown/binding-android-arm64": { "version": "1.0.0-rc.16", "resolved": 
"https://registry.npmjs.org/@rolldown/binding-android-arm64/-/binding-android-arm64-1.0.0-rc.16.tgz", @@ -1082,6 +1159,20 @@ "dev": true, "license": "MIT" }, + "node_modules/@tpluscode/rdf-ns-builders": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/@tpluscode/rdf-ns-builders/-/rdf-ns-builders-5.0.0.tgz", + "integrity": "sha512-rtMFbArdief+s0z2A3TOb/gNe5O5xn9LDiEpilCf6lGYCUIfyqoOvZY80fS/eILwcF2Mj6cUQN1WBQ+1neJmaw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/data-model": "^2.1.0", + "@rdfjs/namespace": "^2.0.1", + "@rdfjs/types": "^2", + "@types/rdfjs__namespace": "^2.0.10", + "@zazuko/prefixes": "^2.3.0" + } + }, "node_modules/@tybys/wasm-util": { "version": "0.10.1", "resolved": "https://registry.npmjs.org/@tybys/wasm-util/-/wasm-util-0.10.1.tgz", @@ -1142,6 +1233,16 @@ "undici-types": "~7.16.0" } }, + "node_modules/@types/rdfjs__namespace": { + "version": "2.0.10", + "resolved": "https://registry.npmjs.org/@types/rdfjs__namespace/-/rdfjs__namespace-2.0.10.tgz", + "integrity": "sha512-xoVzEIOxcpyteEmzaj94MSBbrBFs+vqv05joMhzLEiPRwsBBDnhkdBCaaDxR1Tf7wOW0kB2R1IYe4C3vEBFPgA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/types": "*" + } + }, "node_modules/@vitest/expect": { "version": "4.1.5", "resolved": "https://registry.npmjs.org/@vitest/expect/-/expect-4.1.5.tgz", @@ -1255,6 +1356,36 @@ "url": "https://opencollective.com/vitest" } }, + "node_modules/@vocabulary/sh": { + "version": "1.1.6", + "resolved": "https://registry.npmjs.org/@vocabulary/sh/-/sh-1.1.6.tgz", + "integrity": "sha512-8IfAQoKh57THz8LA2+n1jaY/VC2XaqMNSsJgzBKSSrj20y5PSMAawb6dMsxoLxqDIPBDs1TFRl/9CijUnwbBUA==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "@rdfjs/types": "^2.0.0" + } + }, + "node_modules/@zazuko/prefixes": { + "version": "2.6.1", + "resolved": "https://registry.npmjs.org/@zazuko/prefixes/-/prefixes-2.6.1.tgz", + "integrity": 
"sha512-fbOadP7twxt0ZYT9mgIC+xQMk6f3pYYLI5a/2UJ/mc/ygqb/NoVv2ryK3lTtoi74xwkdpUeDwIuFQSosowzUgg==", + "dev": true, + "license": "MIT" + }, + "node_modules/abort-controller": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz", + "integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==", + "dev": true, + "license": "MIT", + "dependencies": { + "event-target-shim": "^5.0.0" + }, + "engines": { + "node": ">=6.5" + } + }, "node_modules/ajv": { "version": "8.18.0", "resolved": "https://registry.npmjs.org/ajv/-/ajv-8.18.0.tgz", @@ -1313,6 +1444,27 @@ "node": "18 || 20 || >=22" } }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, "node_modules/brace-expansion": { "version": "5.0.5", "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz", @@ -1326,6 +1478,31 @@ "node": "18 || 20 || >=22" } }, + "node_modules/buffer": { + "version": "6.0.3", + "resolved": "https://registry.npmjs.org/buffer/-/buffer-6.0.3.tgz", + "integrity": "sha512-FTiCpNxtwiZZHEZbcbTIcZjERVICn9yq/pDFkTl95/AxzD1naBctN7YO68riM/gLSDY7sdrMby8hofADYuuqOA==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "base64-js": "^1.3.1", + "ieee754": "^1.2.1" + } + }, 
"node_modules/call-me-maybe": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/call-me-maybe/-/call-me-maybe-1.0.2.tgz", @@ -1397,6 +1574,18 @@ "node": ">=18" } }, + "node_modules/clownface": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/clownface/-/clownface-2.0.3.tgz", + "integrity": "sha512-E76TBJ7CgU9+/5paSAvuNdMO+fzFThnvRVtidosktYppYkXM8V7tid8Ezzo8S1OmoWxKUam3yfkZlfCid4OiJQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/data-model": "^2.0.1", + "@rdfjs/environment": "0 - 1", + "@rdfjs/namespace": "^2.0.0" + } + }, "node_modules/convert-source-map": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/convert-source-map/-/convert-source-map-2.0.0.tgz", @@ -1404,6 +1593,24 @@ "dev": true, "license": "MIT" }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "dev": true, + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, "node_modules/detect-libc": { "version": "2.1.2", "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz", @@ -1473,6 +1680,26 @@ "@types/estree": "^1.0.0" } }, + "node_modules/event-target-shim": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz", + "integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/events": { + "version": "3.3.0", + "resolved": "https://registry.npmjs.org/events/-/events-3.3.0.tgz", + "integrity": "sha512-mQw+2fkQbALzQ7V0MY0IqdnXNOeTtP4r0lN9z7AAawCXgqea7bDii20AYrIBrFd/Hx0M2Ocz6S111CaFkUcb0Q==", + "dev": true, + "license": 
"MIT", + "engines": { + "node": ">=0.8.x" + } + }, "node_modules/expect-type": { "version": "1.3.0", "resolved": "https://registry.npmjs.org/expect-type/-/expect-type-1.3.0.tgz", @@ -1570,6 +1797,27 @@ "url": "https://github.com/sponsors/isaacs" } }, + "node_modules/ieee754": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/ieee754/-/ieee754-1.2.1.tgz", + "integrity": "sha512-dcyqhDvX1C46lXZcVqCpK+FtMRQVdIMN6/Df5js2zouUsqG7I6sFxitIC+7KYK29KdXOLHdu9zL4sFnoVQnqaA==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "BSD-3-Clause" + }, "node_modules/js-yaml": { "version": "4.1.1", "resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.1.tgz", @@ -1908,6 +2156,27 @@ "node": ">=16 || 14 >=14.17" } }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "dev": true, + "license": "MIT" + }, + "node_modules/n3": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/n3/-/n3-2.0.3.tgz", + "integrity": "sha512-um/toGVENTarHBYIK2TdH6ByBhW75WpdKpv8iTYt9wF2QfBk8s8a16iaWZFUAAC1BKfGdb99kfgx6pltdDwfKA==", + "dev": true, + "license": "MIT", + "dependencies": { + "buffer": "^6.0.3", + "readable-stream": "^4.0.0" + }, + "engines": { + "node": ">=12.0" + } + }, "node_modules/nanoid": { "version": "3.3.11", "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.11.tgz", @@ -1950,6 +2219,13 @@ "resolved": "optional-responses", "link": true }, + "node_modules/oxigraph": { + "version": "0.5.8", + "resolved": "https://registry.npmjs.org/oxigraph/-/oxigraph-0.5.8.tgz", + "integrity": 
"sha512-ZwAVUf/Vh8OCaIrnmsYie/A/hVPSvrIi+CFWe6YbaAp4llTb6ozPIHYqBvUQgGju0D4gcQAAyXjPV8pGNsPCbQ==", + "dev": true, + "license": "MIT OR Apache-2.0" + }, "node_modules/package-json-from-dist": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/package-json-from-dist/-/package-json-from-dist-1.0.1.tgz", @@ -2080,6 +2356,118 @@ "url": "https://github.com/prettier/prettier?sponsor=1" } }, + "node_modules/process": { + "version": "0.11.10", + "resolved": "https://registry.npmjs.org/process/-/process-0.11.10.tgz", + "integrity": "sha512-cdGef/drWFoydD1JsMzuFf8100nZl+GT+yacc2bEced5f9Rjk4z+WtFUTBu9PhOi9j/jfmBPu0mMEY4wIdAF8A==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 0.6.0" + } + }, + "node_modules/rdf-canonize": { + "version": "3.4.0", + "resolved": "https://registry.npmjs.org/rdf-canonize/-/rdf-canonize-3.4.0.tgz", + "integrity": "sha512-fUeWjrkOO0t1rg7B2fdyDTvngj+9RlUyL92vOdiB7c0FPguWVsniIMjEtHH+meLBO9rzkUlUzBVXgWrjI8P9LA==", + "dev": true, + "license": "BSD-3-Clause", + "dependencies": { + "setimmediate": "^1.0.5" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/rdf-data-factory": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/rdf-data-factory/-/rdf-data-factory-2.0.2.tgz", + "integrity": "sha512-WzPoYHwQYWvIP9k+7IBLY1b4nIDitzAK4mA37WumAF/Cjvu/KOtYJH9IPZnUTWNSd5K2+pq4vrcE9WZC4sRHhg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/types": "^2.0.0" + }, + "funding": { + "type": "individual", + "url": "https://github.com/sponsors/rubensworks/" + } + }, + "node_modules/rdf-dataset-ext": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/rdf-dataset-ext/-/rdf-dataset-ext-1.1.0.tgz", + "integrity": "sha512-CH85RfRKN9aSlbju8T7aM8hgCSWMBsh2eh/tGxUUtWMN+waxi6iFDt8/r4PAEmKaEA82guimZJ4ISbmJ2rvWQg==", + "deprecated": "rdf-dataset-ext is deprecated. 
Switching to rdf-ext is recommended.", + "dev": true, + "license": "MIT", + "dependencies": { + "rdf-canonize": "^3.0.0", + "readable-stream": "3 - 4" + } + }, + "node_modules/rdf-literal": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/rdf-literal/-/rdf-literal-2.0.0.tgz", + "integrity": "sha512-jlQ+h7EvnXmncmk8OzOYR8T3gNfd4g0LQXbflHkEkancic8dh0Tdt5RiRq8vUFndjIeNHt1RWeA5TAj6rgrtng==", + "dev": true, + "license": "MIT", + "dependencies": { + "rdf-data-factory": "^2.0.0" + }, + "funding": { + "type": "individual", + "url": "https://github.com/sponsors/rubensworks/" + } + }, + "node_modules/rdf-validate-datatype": { + "version": "0.2.2", + "resolved": "https://registry.npmjs.org/rdf-validate-datatype/-/rdf-validate-datatype-0.2.2.tgz", + "integrity": "sha512-mH9qL8i0WBbZ6HJCA26BB6V+WV2MraKvitez3SV0QegBWVQ4wYO49CgfFBzoAYg6tlnhFXl9MkrOAQ07X2N1FA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/term-map": "^2.0.0", + "@tpluscode/rdf-ns-builders": "3 - 5" + } + }, + "node_modules/rdf-validate-shacl": { + "version": "0.6.5", + "resolved": "https://registry.npmjs.org/rdf-validate-shacl/-/rdf-validate-shacl-0.6.5.tgz", + "integrity": "sha512-rwIibopSixDE8ecA9x0c7oTVxdMWxGiJh7h3uJ+WS2h4lq2nx3DZVO7rJvwa5kZpDq9QEFPoyZINAUyfaaoN4Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rdfjs/data-model": "^2.1.0", + "@rdfjs/dataset": "^2.0.2", + "@rdfjs/environment": "^1.0.0", + "@rdfjs/namespace": "^2.0.1", + "@rdfjs/term-set": "^2.0.3", + "@rdfjs/types": "1 - 2", + "@vocabulary/sh": "^1.1.6", + "clownface": "^2.0.3", + "debug": "^4.3.2", + "rdf-dataset-ext": "^1.1.0", + "rdf-literal": "^2.0.0", + "rdf-validate-datatype": "^0.2.2" + } + }, + "node_modules/readable-stream": { + "version": "4.7.0", + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-4.7.0.tgz", + "integrity": "sha512-oIGGmcpTLwPga8Bn6/Z75SVaH1z5dUut2ibSyAMVhmUggWpmDn2dapB0n7f8nwaSiRtepAsfJyfXIO5DCVAODg==", + "dev": true, + "license": "MIT", + 
"dependencies": { + "abort-controller": "^3.0.0", + "buffer": "^6.0.3", + "events": "^3.3.0", + "process": "^0.11.10", + "string_decoder": "^1.3.0" + }, + "engines": { + "node": "^12.22.0 || ^14.17.0 || >=16.0.0" + } + }, "node_modules/request-validation-generator": { "resolved": "request-validation", "link": true @@ -2158,10 +2546,38 @@ "@rolldown/binding-win32-x64-msvc": "1.0.0-rc.16" } }, + "node_modules/safe-buffer": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.2.1.tgz", + "integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, "node_modules/semantic-graph-extractor": { "resolved": "semantic-graph-extractor", "link": true }, + "node_modules/setimmediate": { + "version": "1.0.5", + "resolved": "https://registry.npmjs.org/setimmediate/-/setimmediate-1.0.5.tgz", + "integrity": "sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==", + "dev": true, + "license": "MIT" + }, "node_modules/siginfo": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/siginfo/-/siginfo-2.0.0.tgz", @@ -2193,6 +2609,16 @@ "dev": true, "license": "MIT" }, + "node_modules/string_decoder": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", + "integrity": "sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA==", + "dev": true, + "license": "MIT", + "dependencies": { + "safe-buffer": "~5.2.0" + } + }, "node_modules/tinybench": { "version": "2.9.0", "resolved": "https://registry.npmjs.org/tinybench/-/tinybench-2.9.0.tgz", diff --git a/package.json b/package.json index 
20cd7bd..7c40dfa 100644 --- a/package.json +++ b/package.json @@ -42,7 +42,10 @@ "@types/node": "^24.0.0", "camunda-schema-bundler": "^2.1.0", "js-yaml": "^4.1.0", + "n3": "^2.0.3", + "oxigraph": "^0.5.8", "prettier": "^3.2.5", + "rdf-validate-shacl": "^0.6.5", "rimraf": "^6.0.0", "tsx": "^4.7.0", "typescript": "^5.5.4", From 34524b55a1ba3a9109153a0bfb6060638ea952ab Mon Sep 17 00:00:00 2001 From: Josh Wulf Date: Wed, 29 Apr 2026 15:20:47 +1200 Subject: [PATCH 2/3] =?UTF-8?q?chore(spike-rdf):=20phase=202+3=20=E2=80=94?= =?UTF-8?q?=20adapters=20+=20index-parity=20checkpoint=20PASS?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 2 (adapters): - docs/spikes/rdf/adapters/build-store.ts materialises an in-memory Oxigraph store from the normalised OperationGraph + DomainSemantics. Emits 3,497 quads from current pipeline state (183 operations, 37 semantic types, 482 canonical field paths, plus runtime states / capabilities / identifiers / artifact kinds / value bindings / disjunctions). Phase 3 (index parity — go/no-go checkpoint): - docs/spikes/rdf/parity/index-parity.ts re-derives the loader's three reverse indexes (bySemanticProducer, domainProducers, providerMap) from SPARQL queries over the store and asserts parity against graphLoader.ts output. Result: PARITY PASS for every well-formed key. Side finding (recorded for RECOMMENDATION.md): - The JobTypeValue identifier in domain-semantics.json has no validityState, which causes the loader to write domainProducers["undefined"] = ["createDeployment"] — a silent data-quality issue. The SHACL IdentifierShape (validityState minCount 1) catches this at load time. Surfaced explicitly in the parity output as "LOADER-ONLY ARTIFACTS THE ONTOLOGY REJECTS". The brief's checkpoint is satisfied: structural equivalence at planner-output level falls out for free without modifying any planner algorithm code. 
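A minimal sketch of how a missing validityState can silently become the literal key "undefined", and of a minCount-1 style guard that reports the entry instead of indexing it. The IdentifierSpec shape, the producedBy field and the sample data are assumptions for illustration only; this is not the graphLoader.ts code and not the SHACL validator.

```typescript
// Hypothetical identifier entry; the real domain-semantics.json schema may differ.
interface IdentifierSpec {
  validityState?: string;
  producedBy?: string[]; // assumed field, for illustration only
}

type Index = Record<string, string[]>;

// Naive indexing: String(undefined) === "undefined", so an identifier without
// a validityState lands under the key "undefined" with no error.
function naiveDomainProducers(ids: Record<string, IdentifierSpec>): Index {
  const index: Index = {};
  for (const spec of Object.values(ids)) {
    const key = String(spec.validityState);
    for (const opId of spec.producedBy ?? []) (index[key] ||= []).push(opId);
  }
  return index;
}

// Guarded indexing: mirrors the intent of `sh:minCount 1` on validityState by
// collecting violations instead of writing a bogus key.
function checkedDomainProducers(ids: Record<string, IdentifierSpec>): {
  index: Index;
  violations: string[];
} {
  const index: Index = {};
  const violations: string[] = [];
  for (const [name, spec] of Object.entries(ids)) {
    const state = spec.validityState;
    if (!state) {
      violations.push(name);
      continue;
    }
    for (const opId of spec.producedBy ?? []) (index[state] ||= []).push(opId);
  }
  return { index, violations };
}

// Illustrative input: JobTypeValue has no validityState, as in the finding above.
const ids: Record<string, IdentifierSpec> = {
  ProcessDefinitionId: { validityState: 'ProcessDefinitionDeployed', producedBy: ['createDeployment'] },
  JobTypeValue: { producedBy: ['createDeployment'] },
};

console.log(JSON.stringify(naiveDomainProducers(ids)));
// {"ProcessDefinitionDeployed":["createDeployment"],"undefined":["createDeployment"]}
console.log(JSON.stringify(checkedDomainProducers(ids).violations));
// ["JobTypeValue"]
```

In the spike itself this check lives in the SHACL IdentifierShape; the sketch only shows why the failure stays silent without one.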
Refs #60 --- docs/spikes/rdf/adapters/build-store.ts | 373 ++++++++++++++++++++++++ docs/spikes/rdf/parity/index-parity.ts | 228 +++++++++++++++ 2 files changed, 601 insertions(+) create mode 100644 docs/spikes/rdf/adapters/build-store.ts create mode 100644 docs/spikes/rdf/parity/index-parity.ts diff --git a/docs/spikes/rdf/adapters/build-store.ts b/docs/spikes/rdf/adapters/build-store.ts new file mode 100644 index 0000000..d7ef725 --- /dev/null +++ b/docs/spikes/rdf/adapters/build-store.ts @@ -0,0 +1,373 @@ +// Spike adapter: turn the normalised OperationGraph + DomainSemantics +// into RDF triples loaded into an in-process Oxigraph store. +// +// Consumes the *output* of path-analyser/src/graphLoader.ts rather than +// re-parsing the raw JSON. This is a deliberate spike shortcut: +// +// - The loader's tolerance for 6+ key permutations is documented and +// tested; reproducing that parsing isn't novel research. +// - The spike claim is "the ontology faithfully captures the model", +// not "we have ported the loader". A production move would either +// port the loader to emit triples directly, or keep it as a +// pre-normalisation step. +// +// Spike: docs/spikes/rdf/README.md / issue #60. + +// biome-ignore lint/correctness/noNodejsModules: spike-only; not shipped to consumers. +import path from 'node:path'; +// biome-ignore lint/style/useImportType: oxigraph exports values + types together. 
+import oxigraph from 'oxigraph'; +import { loadGraph } from '../../../../path-analyser/src/graphLoader.js'; +import type { OperationGraph } from '../../../../path-analyser/src/types.js'; + +export const NS = { + core: 'https://camunda.io/api-test-generator/core#', + camunda: 'https://camunda.io/api-test-generator/camunda#', + op: 'https://camunda.io/api-test-generator/operation#', + type: 'https://camunda.io/api-test-generator/type#', + state: 'https://camunda.io/api-test-generator/state#', + cap: 'https://camunda.io/api-test-generator/capability#', + field: 'https://camunda.io/api-test-generator/field#', + binding: 'https://camunda.io/api-test-generator/binding#', + artifact: 'https://camunda.io/api-test-generator/artifact#', + ident: 'https://camunda.io/api-test-generator/identifier#', + rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', + xsd: 'http://www.w3.org/2001/XMLSchema#', +} as const; + +const RDF_TYPE = `${NS.rdf}type`; + +// IRIs for runtime states are built from the state name. domain-semantics +// has runtimeStates and capabilities under separate keys but they live in +// the same conceptual space (a Capability is a subclass of RuntimeState +// in core.ttl), so we mint the same IRI shape for both. 
+const stateIri = (name: string): string => `${NS.state}${encodeURIComponent(name)}`; +const opIri = (id: string): string => `${NS.op}${encodeURIComponent(id)}`; +const typeIri = (name: string): string => `${NS.type}${encodeURIComponent(name)}`; +const fieldIri = (opId: string, p: string): string => + `${NS.field}${encodeURIComponent(opId)}/${encodeURIComponent(p)}`; +const bindingIri = (opId: string, key: string): string => + `${NS.binding}${encodeURIComponent(opId)}/${encodeURIComponent(key)}`; +const identIri = (name: string): string => `${NS.ident}${encodeURIComponent(name)}`; +const artifactIri = (kind: string): string => `${NS.artifact}${encodeURIComponent(kind)}`; + +interface AdapterStats { + operations: number; + semanticTypes: number; + produces: number; + requires: number; + authoritativeProducers: number; + runtimeStates: number; + capabilities: number; + producesState: number; + requiresState: number; + implicitlyAdds: number; + disjunctions: number; + valueBindings: number; + identifiers: number; + artifactKinds: number; + fieldPaths: number; +} + +export interface AdapterResult { + store: oxigraph.Store; + stats: AdapterStats; +} + +/** Materialise the unified graph from the current pipeline state. 
*/ +export async function buildStore(baseDir: string): Promise<AdapterResult> { + const graph = await loadGraph(baseDir); + const store = new oxigraph.Store(); + const stats: AdapterStats = { + operations: 0, + semanticTypes: 0, + produces: 0, + requires: 0, + authoritativeProducers: 0, + runtimeStates: 0, + capabilities: 0, + producesState: 0, + requiresState: 0, + implicitlyAdds: 0, + disjunctions: 0, + valueBindings: 0, + identifiers: 0, + artifactKinds: 0, + fieldPaths: 0, + }; + + emitOperations(store, graph, stats); + emitDomain(store, graph, stats); + return { store, stats }; +} + +// --------------------------------------------------------------------------- +// Operations + semantic-type production / consumption + canonical field paths +// --------------------------------------------------------------------------- + +function emitOperations(store: oxigraph.Store, graph: OperationGraph, stats: AdapterStats): void { + const semanticTypesSeen = new Set<string>(); + + for (const op of Object.values(graph.operations)) { + const subj = oxigraph.namedNode(opIri(op.operationId)); + addType(store, subj, `${NS.core}Operation`); + addLit(store, subj, `${NS.core}operationId`, op.operationId); + addLit(store, subj, `${NS.core}method`, op.method); + if (op.path) addLit(store, subj, `${NS.core}path`, op.path); + if (op.eventuallyConsistent === true) { + addBool(store, subj, `${NS.core}eventuallyConsistent`, true); + } + stats.operations++; + + for (const t of op.produces) { + const tNode = oxigraph.namedNode(typeIri(t)); + ensureSemanticType(store, tNode, t, semanticTypesSeen, stats); + addRel(store, subj, `${NS.core}produces`, tNode); + stats.produces++; + if (op.providerMap?.[t]) { + addRel(store, subj, `${NS.core}authoritativeProducer`, tNode); + stats.authoritativeProducers++; + } + } + for (const t of op.requires.required) { + const tNode = oxigraph.namedNode(typeIri(t)); + ensureSemanticType(store, tNode, t, semanticTypesSeen, stats); + addRel(store, subj, `${NS.core}requires`, tNode); +
stats.requires++; + } + for (const t of op.requires.optional) { + const tNode = oxigraph.namedNode(typeIri(t)); + ensureSemanticType(store, tNode, t, semanticTypesSeen, stats); + addRel(store, subj, `${NS.core}requiresOptional`, tNode); + } + + // Canonical response field paths: each entry under + // op.responseSemanticTypes[status][i] becomes a FieldPath node located + // on this Operation that locates the named SemanticType. + if (op.responseSemanticTypes) { + for (const entries of Object.values(op.responseSemanticTypes)) { + for (const e of entries) { + const fp = oxigraph.namedNode(fieldIri(op.operationId, e.fieldPath)); + addType(store, fp, `${NS.core}FieldPath`); + addLit(store, fp, `${NS.core}fieldPath`, e.fieldPath); + addRel(store, fp, `${NS.core}onResponseOf`, subj); + const tNode = oxigraph.namedNode(typeIri(e.semanticType)); + ensureSemanticType(store, tNode, e.semanticType, semanticTypesSeen, stats); + addRel(store, fp, `${NS.core}locatesSemanticType`, tNode); + if (e.required) addBool(store, fp, `${NS.core}isRequiredField`, true); + stats.fieldPaths++; + } + } + } + } +} + +// --------------------------------------------------------------------------- +// Domain semantics: runtime states, capabilities, identifiers, value bindings, +// artifact kinds, disjunctions. +// --------------------------------------------------------------------------- + +function emitDomain(store: oxigraph.Store, graph: OperationGraph, stats: AdapterStats): void { + const dom = graph.domain; + if (!dom) return; + + // Runtime states --------------------------------------------------------- + for (const [name, spec] of Object.entries(dom.runtimeStates ?? 
{})) { + const subj = oxigraph.namedNode(stateIri(name)); + addType(store, subj, `${NS.core}RuntimeState`); + if (spec.parameter) addLit(store, subj, `${NS.core}hasParameter`, spec.parameter); + if (spec.requires) { + for (const r of spec.requires) addRel(store, subj, `${NS.core}dependsOn`, oxigraph.namedNode(stateIri(r))); + } + if (spec.producedBy) { + for (const opId of spec.producedBy) { + addRel(store, oxigraph.namedNode(opIri(opId)), `${NS.core}producesState`, subj); + stats.producesState++; + } + } + stats.runtimeStates++; + } + + // Capabilities (subclass of RuntimeState in core.ttl) ------------------- + for (const [name, spec] of Object.entries(dom.capabilities ?? {})) { + const subj = oxigraph.namedNode(stateIri(name)); + addType(store, subj, `${NS.core}Capability`); + addType(store, subj, `${NS.core}RuntimeState`); // explicit; oxigraph doesn't reason subclass + if (spec.parameter) addLit(store, subj, `${NS.core}hasParameter`, spec.parameter); + if (spec.dependsOn) { + for (const d of spec.dependsOn) addRel(store, subj, `${NS.core}dependsOn`, oxigraph.namedNode(stateIri(d))); + } + if (spec.producedBy) { + for (const opId of spec.producedBy) { + addRel(store, oxigraph.namedNode(opIri(opId)), `${NS.core}producesState`, subj); + stats.producesState++; + } + } + stats.capabilities++; + } + + // Identifiers ------------------------------------------------------------ + for (const [name, spec] of Object.entries(dom.identifiers ?? {})) { + const subj = oxigraph.namedNode(identIri(name)); + addType(store, subj, `${NS.core}Identifier`); + addType(store, subj, `${NS.core}SemanticType`); + if (spec.validityState) { + addRel(store, subj, `${NS.core}validityState`, oxigraph.namedNode(stateIri(spec.validityState))); + } + stats.identifiers++; + } + + // Artifact kinds -------------------------------------------------------- + for (const [name, spec] of Object.entries(dom.artifactKinds ?? 
{})) { + const subj = oxigraph.namedNode(artifactIri(name)); + addType(store, subj, `${NS.core}ArtifactKind`); + if (spec.producesStates) { + for (const s of spec.producesStates) + addRel(store, subj, `${NS.core}producesStateViaArtifact`, oxigraph.namedNode(stateIri(s))); + } + if (spec.producesSemantics) { + for (const t of spec.producesSemantics) + addRel(store, subj, `${NS.core}producesSemanticViaArtifact`, oxigraph.namedNode(typeIri(t))); + } + stats.artifactKinds++; + } + + // Operation requirements: requires, disjunctions, implicitAdds, valueBindings. + for (const [opId, req] of Object.entries(dom.operationRequirements ?? {})) { + const opNode = oxigraph.namedNode(opIri(opId)); + if (req.requires) { + for (const s of req.requires) { + addRel(store, opNode, `${NS.core}requiresState`, oxigraph.namedNode(stateIri(s))); + stats.requiresState++; + } + } + if (req.implicitAdds) { + for (const s of req.implicitAdds) { + addRel(store, opNode, `${NS.core}implicitlyAdds`, oxigraph.namedNode(stateIri(s))); + stats.implicitlyAdds++; + } + } + if (req.produces) { + for (const s of req.produces) { + addRel(store, opNode, `${NS.core}producesState`, oxigraph.namedNode(stateIri(s))); + stats.producesState++; + } + } + if (req.disjunctions) { + for (let i = 0; i < req.disjunctions.length; i++) { + const d = req.disjunctions[i]; + if (!d) continue; + const dNode = oxigraph.blankNode(`${opId}_disj_${i}`); + addType(store, dNode, `${NS.core}Disjunction`); + addRel(store, opNode, `${NS.core}disjunctionOf`, dNode); + for (const s of d) addRel(store, dNode, `${NS.core}hasAlternative`, oxigraph.namedNode(stateIri(s))); + stats.disjunctions++; + } + } + if (req.valueBindings) { + for (const [key, value] of Object.entries(req.valueBindings)) { + const vb = oxigraph.namedNode(bindingIri(opId, key)); + addType(store, vb, `${NS.core}ValueBinding`); + addRel(store, opNode, `${NS.core}bindingOf`, vb); + // key looks like "request.processDefinitionId" or + // 
"response.deployments[].processDefinition.processDefinitionId" + const direction = key.startsWith('request.') + ? 'request' + : key.startsWith('response.') + ? 'response' + : 'unknown'; + addLit(store, vb, `${NS.core}bindingDirection`, direction); + const fpStr = key.replace(/^request\.|^response\./, ''); + // Best-effort link to a known FieldPath node. May resolve to nothing + // (this is exactly the silent-miss class SHACL surfaces — see + // queries/value-binding-drift.ts). + addRel(store, vb, `${NS.core}bindsFromFieldPath`, oxigraph.namedNode(fieldIri(opId, fpStr))); + // value looks like "ProcessDefinitionDeployed.processDefinitionId" + const dot = value.indexOf('.'); + if (dot >= 0) { + const stateName = value.slice(0, dot); + const param = value.slice(dot + 1); + addRel(store, vb, `${NS.core}bindsToState`, oxigraph.namedNode(stateIri(stateName))); + addLit(store, vb, `${NS.core}bindsToParameter`, param); + } + stats.valueBindings++; + } + } + } + + // semanticTypeToArtifactKind: emits producesSemanticViaArtifact in + // reverse direction so SPARQL doesn't need a join. + for (const [type, kind] of Object.entries(dom.semanticTypeToArtifactKind ?? 
{})) { + addRel( + store, + oxigraph.namedNode(artifactIri(kind)), + `${NS.core}producesSemanticViaArtifact`, + oxigraph.namedNode(typeIri(type)), + ); + } +} + +// --------------------------------------------------------------------------- +// Triple helpers +// --------------------------------------------------------------------------- + +function ensureSemanticType( + store: oxigraph.Store, + node: oxigraph.NamedNode, + name: string, + seen: Set<string>, + stats: AdapterStats, +): void { + if (seen.has(name)) return; + seen.add(name); + addType(store, node, `${NS.core}SemanticType`); + addLit(store, node, `${NS.rdf}label`, name); + stats.semanticTypes++; +} + +function addType(store: oxigraph.Store, subj: oxigraph.NamedNode | oxigraph.BlankNode, cls: string): void { + store.add(oxigraph.triple(subj, oxigraph.namedNode(RDF_TYPE), oxigraph.namedNode(cls))); +} +function addRel( + store: oxigraph.Store, + subj: oxigraph.NamedNode | oxigraph.BlankNode, + pred: string, + obj: oxigraph.NamedNode | oxigraph.BlankNode, +): void { + store.add(oxigraph.triple(subj, oxigraph.namedNode(pred), obj)); +} +function addLit( + store: oxigraph.Store, + subj: oxigraph.NamedNode | oxigraph.BlankNode, + pred: string, + v: string, +): void { + store.add(oxigraph.triple(subj, oxigraph.namedNode(pred), oxigraph.literal(v))); +} +function addBool( + store: oxigraph.Store, + subj: oxigraph.NamedNode | oxigraph.BlankNode, + pred: string, + v: boolean, +): void { + store.add( + oxigraph.triple( + subj, + oxigraph.namedNode(pred), + oxigraph.literal(String(v), oxigraph.namedNode(`${NS.xsd}boolean`)), + ), + ); +} + +// --------------------------------------------------------------------------- +// CLI: print stats so a human can sanity-check the materialisation.
+// --------------------------------------------------------------------------- + +if (import.meta.url === `file://${process.argv[1]}`) { + const baseDir = path.resolve(import.meta.dirname, '../../../../path-analyser'); + buildStore(baseDir).then((res) => { + console.log('Materialised triples from current pipeline state.'); + console.log('Stats:', JSON.stringify(res.stats, null, 2)); + console.log(`Total quads in store: ${res.store.size}`); + }); +} diff --git a/docs/spikes/rdf/parity/index-parity.ts b/docs/spikes/rdf/parity/index-parity.ts new file mode 100644 index 0000000..408f168 --- /dev/null +++ b/docs/spikes/rdf/parity/index-parity.ts @@ -0,0 +1,228 @@ +// Phase-3 checkpoint: re-materialise the loader's reverse indexes from +// SPARQL queries against the spike's triple store, and assert per-key set +// parity with graphLoader's output. +// +// This is the disqualifying-friction check the brief calls out: +// +// "Don't proceed past this checkpoint without parity." +// +// If parity passes, structural equivalence at planner-output level +// follows for free without modifying any algorithm code. If parity +// cannot pass, the spike has surfaced a modelling gap before we touch +// the planner. +// +// Spike: docs/spikes/rdf/README.md / issue #60. + +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import path from 'node:path'; +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import { exit } from 'node:process'; +import oxigraph from 'oxigraph'; +import { loadGraph } from '../../../../path-analyser/src/graphLoader.js'; +import { buildStore } from '../adapters/build-store.js'; + +interface Indexes { + bySemanticProducer: Record<string, string[]>; + domainProducers: Record<string, string[]>; + providerMap: Record<string, Record<string, true>>; // opId -> { type: true } +} + +/** Pull the indexes from SPARQL queries over the materialised store.
*/ +function indexesFromStore(store: oxigraph.Store): Indexes { + const bySemanticProducer: Record<string, string[]> = {}; + for (const b of store.query(` + PREFIX core: <https://camunda.io/api-test-generator/core#> + SELECT ?opId ?typeLabel WHERE { + ?op core:operationId ?opId ; + core:produces ?t . + ?t <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ?typeLabel . + } + `) as Iterable<Map<string, { value: string }>>) { + const opId = b.get('opId')?.value; + const type = b.get('typeLabel')?.value; + if (!opId || !type) continue; + (bySemanticProducer[type] ||= []).push(opId); + } + + const domainProducers: Record<string, string[]> = {}; + for (const b of store.query(` + PREFIX core: <https://camunda.io/api-test-generator/core#> + SELECT ?opId ?stateUri WHERE { + ?op core:operationId ?opId ; + core:producesState ?stateUri . + } + `) as Iterable<Map<string, { value: string }>>) { + const opId = b.get('opId')?.value; + const stateUri = b.get('stateUri')?.value; + if (!opId || !stateUri) continue; + const stateName = decodeURIComponent(stateUri.split('#').pop() ?? ''); + (domainProducers[stateName] ||= []).push(opId); + } + + const providerMap: Record<string, Record<string, true>> = {}; + for (const b of store.query(` + PREFIX core: <https://camunda.io/api-test-generator/core#> + SELECT ?opId ?typeLabel WHERE { + ?op core:operationId ?opId ; + core:authoritativeProducer ?t . + ?t <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> ?typeLabel . + } + `) as Iterable<Map<string, { value: string }>>) { + const opId = b.get('opId')?.value; + const type = b.get('typeLabel')?.value; + if (!opId || !type) continue; + (providerMap[opId] ||= {})[type] = true; + } + + // Sort for stable comparison. + for (const k of Object.keys(bySemanticProducer)) bySemanticProducer[k]?.sort(); + for (const k of Object.keys(domainProducers)) domainProducers[k]?.sort(); + return { bySemanticProducer, domainProducers, providerMap }; +} + +/** Pull the same indexes from the live loader's output. */ +function indexesFromLoader(graph: Awaited<ReturnType<typeof loadGraph>>): Indexes { + const bySemanticProducer: Record<string, string[]> = {}; + for (const [k, v] of Object.entries(graph.bySemanticProducer)) { + bySemanticProducer[k] = [...v].sort(); + } + const domainProducers: Record<string, string[]> = {}; + for (const [k, v] of Object.entries(graph.domainProducers ??
{})) { + domainProducers[k] = [...v].sort(); + } + const providerMap: Record<string, Record<string, true>> = {}; + for (const op of Object.values(graph.operations)) { + if (!op.providerMap) continue; + const entries: Record<string, true> = {}; + for (const [t, isAuth] of Object.entries(op.providerMap)) { + if (isAuth) entries[t] = true; + } + if (Object.keys(entries).length > 0) providerMap[op.operationId] = entries; + } + return { bySemanticProducer, domainProducers, providerMap }; +} + +interface Diff { + index: string; + key: string; + loaderOnly?: string[]; + storeOnly?: string[]; + loaderValue?: unknown; + storeValue?: unknown; +} + +function diff(a: Indexes, b: Indexes): Diff[] { + const diffs: Diff[] = []; + + // bySemanticProducer + const allTypes = new Set([ + ...Object.keys(a.bySemanticProducer), + ...Object.keys(b.bySemanticProducer), + ]); + for (const t of allTypes) { + const aOps = new Set(a.bySemanticProducer[t] ?? []); + const bOps = new Set(b.bySemanticProducer[t] ?? []); + const onlyA = [...aOps].filter((x) => !bOps.has(x)); + const onlyB = [...bOps].filter((x) => !aOps.has(x)); + if (onlyA.length || onlyB.length) { + diffs.push({ index: 'bySemanticProducer', key: t, loaderOnly: onlyA, storeOnly: onlyB }); + } + } + + // domainProducers + const allStates = new Set([ + ...Object.keys(a.domainProducers), + ...Object.keys(b.domainProducers), + ]); + for (const s of allStates) { + const aOps = new Set(a.domainProducers[s] ?? []); + const bOps = new Set(b.domainProducers[s] ?? []); + const onlyA = [...aOps].filter((x) => !bOps.has(x)); + const onlyB = [...bOps].filter((x) => !aOps.has(x)); + if (onlyA.length || onlyB.length) { + diffs.push({ index: 'domainProducers', key: s, loaderOnly: onlyA, storeOnly: onlyB }); + } + } + + // providerMap (authoritative-only view) + const allOps = new Set([ + ...Object.keys(a.providerMap), + ...Object.keys(b.providerMap), + ]); + for (const opId of allOps) { + const aTypes = new Set(Object.keys(a.providerMap[opId] ??
{})); + const bTypes = new Set(Object.keys(b.providerMap[opId] ?? {})); + const onlyA = [...aTypes].filter((x) => !bTypes.has(x)); + const onlyB = [...bTypes].filter((x) => !aTypes.has(x)); + if (onlyA.length || onlyB.length) { + diffs.push({ index: 'providerMap', key: opId, loaderOnly: onlyA, storeOnly: onlyB }); + } + } + + return diffs; +} + +async function main(): Promise<number> { + const baseDir = path.resolve(import.meta.dirname, '../../../../path-analyser'); + const loaderGraph = await loadGraph(baseDir); + const { store, stats } = await buildStore(baseDir); + + const fromLoader = indexesFromLoader(loaderGraph); + const fromStore = indexesFromStore(store); + + console.log('--- adapter materialisation stats ---'); + console.log(JSON.stringify(stats, null, 2)); + + const diffs = diff(fromLoader, fromStore); + console.log('\n--- index parity ---'); + console.log(`bySemanticProducer keys (loader=${Object.keys(fromLoader.bySemanticProducer).length}, store=${Object.keys(fromStore.bySemanticProducer).length})`); + console.log(`domainProducers keys (loader=${Object.keys(fromLoader.domainProducers).length}, store=${Object.keys(fromStore.domainProducers).length})`); + console.log(`providerMap ops (loader=${Object.keys(fromLoader.providerMap).length}, store=${Object.keys(fromStore.providerMap).length})`); + + // Some divergences are loader artifacts the ontology rejects (e.g. an + // identifier with no validityState ends up writing to + // domainProducers["undefined"] in the loader; the SHACL IdentifierShape + // catches this at load time). Surface these separately so the parity + // checkpoint distinguishes "ontology can't represent X" from + // "ontology rejects an existing data-quality issue".
+ const ontologyRejects: Diff[] = []; + const real: Diff[] = []; + for (const d of diffs) { + if (d.key === 'undefined' || d.key === '' || d.key === 'null') ontologyRejects.push(d); + else real.push(d); + } + + if (ontologyRejects.length > 0) { + console.log(`\nLOADER-ONLY ARTIFACTS THE ONTOLOGY REJECTS (${ontologyRejects.length}):`); + for (const d of ontologyRejects) { + const parts: string[] = []; + if (d.loaderOnly?.length) parts.push(`loader-only=[${d.loaderOnly.join(', ')}]`); + if (d.storeOnly?.length) parts.push(`store-only=[${d.storeOnly.join(', ')}]`); + console.log(` [${d.index}] key=<${d.key}>: ${parts.join(' | ')}`); + } + console.log(' -> See SHACL shapes in ../shapes/invariants.shapes.ttl.'); + console.log(' -> These would surface as load-time validation errors under the model.'); + } + + if (real.length === 0) { + console.log('\nPARITY: PASS — store-derived indexes match loader output for every well-formed key.'); + return 0; + } + + console.log(`\nPARITY: FAIL — ${real.length} unexplained divergence(s):`); + for (const d of real.slice(0, 50)) { + const parts: string[] = []; + if (d.loaderOnly?.length) parts.push(`loader-only=[${d.loaderOnly.join(', ')}]`); + if (d.storeOnly?.length) parts.push(`store-only=[${d.storeOnly.join(', ')}]`); + console.log(` [${d.index}] ${d.key}: ${parts.join(' | ')}`); + } + if (real.length > 50) console.log(` ... 
and ${real.length - 50} more`); + return 1; +} + +main() + .then((code) => exit(code)) + .catch((err) => { + console.error(err); + exit(2); + }); From de821ff534132bb1799b4f0e45a73af9caa6ed91 Mon Sep 17 00:00:00 2001 From: Josh Wulf Date: Wed, 29 Apr 2026 15:27:28 +1200 Subject: [PATCH 3/3] =?UTF-8?q?chore(spike-rdf):=20phase=204+5=20=E2=80=94?= =?UTF-8?q?=20declarative=20re-expressions,=20second-API=20sketch,=20RECOM?= =?UTF-8?q?MENDATION?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 4a (value-binding drift detector): - docs/spikes/rdf/queries/value-binding-drift.ts expresses value-binding resolution as a SPARQL query. Surfaces FOUR latent domain-semantics defects that are silent runtime no-ops today: 1. createDeployment -> FormDeployed.formKey: state does not exist. 2. createDeployment -> ProcessDefinitionKey.processDefinitionKey: ProcessDefinitionKey is a semantic type, not a state. 3. createProcessInstance -> ProcessInstanceExists.processInstanceKey: state declares parameter=processDefinitionId, not processInstanceKey. (Single-parameter modelling gap.) 4. createProcessInstance -> ProcessDefinitionKey.processDefinitionKey: same type-confusion as #2. Phase 4b (minimal scenario-chain candidate query): - docs/spikes/rdf/queries/minimal-scenario-chain.ts replaces gatherDomainPrerequisites() (the single hand-rolled multi-hop traversal in scenarioGenerator.ts ~L1254-L1273) with a core:dependsOn+ SPARQL property path. Also demonstrates required- producer candidate selection (replaces bySemanticProducer + providerMap reads in the planner) and a coverage-ranked picker. Phase 5 (second-API paper sketch): - docs/spikes/rdf/second-api-sketch.md sketches a github: vocabulary for GitHub Issues + Pull Requests. The core ontology accommodates it without invasive changes — two SHACL relaxations and one optional new property (core:invalidates), all genuinely API-agnostic. 
Per-API vocabulary introduces zero new properties: the abstraction line (per-API adapters; API-agnostic core) holds. RECOMMENDATION: - docs/spikes/rdf/RECOMMENDATION.md: ADOPT THE MODELLING; DEFER RDF. The named entities are the right ones whether or not the carrier is RDF; reify them as TS types now. RDF adoption is a separable, lower-priority decision that becomes worthwhile when multi-API generalisation moves from aspirational to concrete. Spike artifacts (adapters, parity test, queries) are reusable as the migration starting point if/when that happens. Includes a 7-step concrete follow-up plan sized for normal PRs. Pre-push checks: - npm run lint: clean - tsc --noEmit per workspace: clean - npm run testsuite:generate + generate:request-validation: clean - npm test: 89 passed (15 files) Closes #60 --- docs/spikes/rdf/RECOMMENDATION.md | 215 ++++++++++++++++++ .../rdf/queries/minimal-scenario-chain.ts | 167 ++++++++++++++ .../spikes/rdf/queries/value-binding-drift.ts | 114 ++++++++++ docs/spikes/rdf/second-api-sketch.md | 167 ++++++++++++++ 4 files changed, 663 insertions(+) create mode 100644 docs/spikes/rdf/RECOMMENDATION.md create mode 100644 docs/spikes/rdf/queries/minimal-scenario-chain.ts create mode 100644 docs/spikes/rdf/queries/value-binding-drift.ts create mode 100644 docs/spikes/rdf/second-api-sketch.md diff --git a/docs/spikes/rdf/RECOMMENDATION.md b/docs/spikes/rdf/RECOMMENDATION.md new file mode 100644 index 0000000..6e56b4f --- /dev/null +++ b/docs/spikes/rdf/RECOMMENDATION.md @@ -0,0 +1,215 @@ +# Recommendation — RDF / SPARQL spike + +> Issue [#60](https://github.com/camunda/api-test-generator/issues/60). +> The brief offers three outcomes: **adopt RDF/SPARQL**, **adopt the +> modelling but reject RDF**, or **reject**. + +## Recommendation: **Adopt the modelling. Defer RDF.** + +The named entities and relations in +[`ontology/core.ttl`](./ontology/core.ttl) are the right abstractions +for this codebase whether or not the carrier is RDF. 
They should be +reified as first-class TS types in the production code now. The +question of whether to load them into a triple store and query them +with SPARQL is a separate, lower-stakes decision that can be deferred +until the multi-API generalisation is concretely on the roadmap. + +This is "Option 2" in the brief's framing, with the explicit caveat +that the spike's findings are strong enough that Option 1 (full +adoption) becomes a low-risk follow-up, not a parallel track. + +## What the spike actually found + +### 1. Index parity passes — the modelling is faithful + +[`parity/index-parity.ts`](./parity/index-parity.ts) re-derives all +three of the loader's reverse indexes (`bySemanticProducer`, +`domainProducers`, `providerMap`) from SPARQL queries and matches the +loader output for every well-formed key: + +``` +bySemanticProducer keys (loader=34, store=34) +domainProducers keys (loader=5, store=4) ← see finding #3 +providerMap ops (loader=61, store=61) +PARITY: PASS +``` + +The brief's Phase-3 checkpoint is satisfied. There is no disqualifying +friction at the data-layer boundary. The model is faithful enough that +any planner code reading these indexes today would behave identically +against the SPARQL-derived ones for every well-formed key. + +### 2. The named entities are the right ones, independent of RDF + +The honest test the brief specifies — *"can the planner be written +referring only to terms in `core:`?"* — passes.
Tracing every call +site that consumes the data layer +([second-api-sketch.md §"Honest test for the abstraction"](./second-api-sketch.md)): + +- `bySemanticProducer[type]` → `core:produces`, + `core:authoritativeProducer`, `core:operationId` +- `domainProducers[state]` → `core:producesState`, + `core:operationId` +- `gatherDomainPrerequisites(seeds)` → `core:dependsOn+` +- value-binding resolution → `core:ValueBinding`, + `core:bindsFromFieldPath`, `core:bindsToState`, + `core:bindsToParameter`, `core:hasParameter` + +None of these require Camunda-specific terms. The +[GitHub Issues + PRs sketch](./second-api-sketch.md) maps cleanly onto +the same core vocabulary without invasive changes (two SHACL +relaxations and one optional property addition for state invalidation +— all genuinely API-agnostic, none GitHub-specific). + +This is the finding to act on first. The TS production code today +treats `bySemanticProducer`, `domainProducers`, and `providerMap` as +distinct hand-built records. Reifying them as queries over a single +typed `OperationGraph` (with named accessors corresponding to +`core:produces`, `core:producesState`, etc.) collapses the same +duplication that the brief identifies, **using TS** as the carrier. +The win is the abstraction, not RDF. + +### 3. Declarative re-expressions surface latent silent-miss defects + +[`queries/value-binding-drift.ts`](./queries/value-binding-drift.ts) +expresses the value-binding resolution as a SPARQL query. Running it +against the current pipeline state surfaces **four real +domain-semantics defects** that are silent today: + +1. `createDeployment.response.deployments[].form.formKey` → + `FormDeployed.formKey` — `FormDeployed` does not exist as a runtime + state in `domain-semantics.json`. +2. `createDeployment.response.deployments[].processDefinition.processDefinitionKey` + → `ProcessDefinitionKey.processDefinitionKey` — + `ProcessDefinitionKey` is a semantic type, not a runtime state. 
+ Type-confusion in the binding RHS. +3. `createProcessInstance.response.processInstanceKey` → + `ProcessInstanceExists.processInstanceKey` — `ProcessInstanceExists` + declares `parameter: processDefinitionId`, not `processInstanceKey`. + The state schema needs multi-parameter support, OR the binding is + wrong. +4. `createProcessInstance.request.processDefinitionKey` → + `ProcessDefinitionKey.processDefinitionKey` — same + type-confusion as #2. + +A fifth defect surfaces from the parity checkpoint: +the loader silently writes `domainProducers["undefined"] = ["createDeployment"]` +because the `JobTypeValue` identifier in `domain-semantics.json` has +no `validityState`. The SHACL `IdentifierShape` +(`validityState minCount 1`) catches this at load time. The parity +script reports it under "LOADER-ONLY ARTIFACTS THE ONTOLOGY REJECTS". + +These findings stand on the modelling alone. They do not require +running SPARQL in production — a TS-native rewrite of the loader that +emits the same shape would catch all five. + +### 4. SPARQL property paths cleanly replace one hand-rolled traversal + +[`queries/minimal-scenario-chain.ts`](./queries/minimal-scenario-chain.ts) +replaces `gatherDomainPrerequisites()` (the only multi-hop traversal in +the codebase, ~20 lines of hand-rolled DFS in +`scenarioGenerator.ts:1254`) with a single `core:dependsOn+` SPARQL +property path. This is the strongest single argument for the SPARQL +half of the proposal — but it's a one-call-site benefit. Every other +candidate-selection query the planner needs is satisfied by simple +joins that a typed TS index gives equally well. + +## Why "modelling yes, RDF defer" + +The brief is explicit that **multi-API generalisation is the +load-bearing argument for RDF specifically**: + +> "RDF's namespacing and open-world composition are load-bearing for +> [the multi-API generalisation use case], not incidental." + +That is true.
If the multi-API roadmap firms up, RDF's URI namespacing +and graph-union semantics are genuine wins over a hand-rolled TS rule +DSL. But the multi-API target is still aspirational; the planner is +not yet shaped for a second API. + +By contrast, the modelling findings (sections 2 and 3 above) are +valuable **today**, against the single Camunda API: + +- Reifying `core:Operation`, `core:SemanticType`, `core:RuntimeState`, + `core:ValueBinding`, `core:FieldPath` as TS types collapses the + duplicated "what does this operation produce?" code paths between + `graphLoader.ts`, `scenarioGenerator.ts`, and `index.ts` into one + source of truth. +- Reifying `core:ValueBinding` with a typed `bindsFromFieldPath` and a + multi-parameter `bindsToState`/`bindsToParameter` pair (validated + against the canonical response shape at load time) eliminates the + silent-miss class entirely. The four value-binding findings above would + become four load-time errors, in TS, without a triple store. +- Replacing `gatherDomainPrerequisites` with a typed `dependsOn` + closure helper is a 5-line refactor. + +The cost of the TS-native modelling is a short series of normal-sized +refactor PRs. The cost +of full RDF adoption is a runtime dependency on `oxigraph` (a WASM +binding), a build-time dependency on `rdf-validate-shacl`, an authoring +shift from JSON sidecars to Turtle, and the team carrying a second +query language alongside TypeScript — for a benefit that is currently +hypothetical (the second API). + +The right move is to land the modelling now, monitor whether the +multi-API roadmap progresses, and revisit RDF once a concrete second +API is in flight. At that point the spike's adapters and queries +([`adapters/build-store.ts`](./adapters/build-store.ts), +[`parity/index-parity.ts`](./parity/index-parity.ts), the two query +files) become the starting point for the migration: every artifact +in this directory is reusable.
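The typed closure helper mentioned above (called `dependsOnClosure` in the follow-up plan) could look roughly like the sketch below. It is a minimal sketch, not production code. It assumes the loader has already merged `runtimeStates.requires` and `capabilities.dependsOn` into one map from each state to its direct prerequisites; the `DependsOnIndex` name is hypothetical. The function returns every state reachable in one or more hops, which is what the `core:dependsOn+` property path computes. The usage data is the GitHub chain from `second-api-sketch.md`.

```typescript
// Hypothetical index: state name -> states it directly depends on.
// Assumed to be built by the loader from runtimeStates.requires and
// capabilities.dependsOn; not an existing type in the codebase.
type DependsOnIndex = ReadonlyMap<string, readonly string[]>;

// Every state reachable from `state` in one or more dependsOn hops,
// i.e. the TS counterpart of the SPARQL property path core:dependsOn+.
// `state` itself is included only if a cycle leads back to it.
function dependsOnClosure(state: string, index: DependsOnIndex): Set<string> {
  const visited = new Set<string>();
  const stack: string[] = [...(index.get(state) ?? [])];
  while (stack.length > 0) {
    const next = stack.pop();
    if (next === undefined || visited.has(next)) continue;
    visited.add(next);
    stack.push(...(index.get(next) ?? []));
  }
  return visited;
}

// Usage with the GitHub chain from the second-API sketch.
const githubDependsOn: DependsOnIndex = new Map([
  ['PullRequestMergeable', ['PullRequestReadyForReview']],
  ['PullRequestReadyForReview', ['PullRequestExists']],
  ['PullRequestExists', ['RepositoryExists']],
  ['IssueExists', ['RepositoryExists']],
]);
console.log([...dependsOnClosure('PullRequestMergeable', githubDependsOn)].join(', '));
// prints "PullRequestReadyForReview, PullRequestExists, RepositoryExists"
```

The explicit stack plus visited set keeps the walk iterative and cycle-safe. The order of the result is incidental, so a caller that needs deterministic output should sort it.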
+ +## Concrete follow-up plan (if the recommendation is accepted) + +These are sized for normal PRs, not a spike rewrite. + +1. **Reify `core:` entities as TS types.** Lift `Operation`, + `SemanticType`, `RuntimeState`, `Capability`, `ValueBinding`, + `FieldPath`, `Disjunction`, `Identifier`, `ArtifactKind` from the + ontology into [`path-analyser/src/types.ts`](../../../path-analyser/src/types.ts) + alongside the existing `OperationNode`. Keep the existing types as + structural aliases initially. +2. **Multi-parameter `RuntimeState`.** Change + `RuntimeStateSpec.parameter: string` to `parameters: string[]` (value-binding + drift finding #3 and the GitHub `IssueExists` example + both demand this). Mechanical migration in `domain-semantics.json`. +3. **Typed `ValueBinding`.** Replace the + `Record<string, string>` in `OperationDomainRequirements.valueBindings` + with `ValueBinding[]` carrying parsed + `{ direction, fieldPath, targetState, targetParameter }`. The + parsing logic moves out of `index.ts:320-340` into the loader. +4. **Load-time validation.** Add the SHACL invariants from + [`shapes/invariants.shapes.ttl`](./shapes/invariants.shapes.ttl) as + TS assertions in the loader. Each one is one short function. The + five findings above become test fixtures. +5. **Fix the five surfaced defects.** `FormDeployed` (add the state), + `ProcessDefinitionKey.*` (correct the binding RHS to refer to a + real state), `ProcessInstanceExists.processInstanceKey` (multi-parameter + support from step 2), `JobTypeValue` (add `validityState`). +6. **Replace `gatherDomainPrerequisites` with a typed + `dependsOnClosure(state)` helper** that walks the same edges + `core:dependsOn+` would. +7. **Optional, separate decision: full RDF adoption.** Defer until a + concrete second API enters the roadmap. The spike artifacts in this + directory are the migration starting point. + +## What the brief asked us to compare + +| Dimension | Outcome | +|---|---| +| De-duplication: how many distinct code paths collapse?
| **5 → 1** (loader index-build, planner reverse-index reads, value-binding parsing, prerequisite traversal, identifier resolution). All collapsible in TS without RDF; RDF is incidental. | +| Are the named entities ones we'd want even without RDF? | **Yes, unambiguously.** This is the spike's strongest finding and the basis for the recommendation. | +| Authoring experience for non-RDF-fluent contributors? | TTL is reasonable for vocabulary; SHACL shapes are harder than the equivalent TS validators; SPARQL is a real second language. **TS-native modelling avoids all three costs.** Defer until the multi-API roadmap makes them worthwhile. | +| Does the per-API ↔ core abstraction line hold? | **Yes.** [`camunda.ttl`](./ontology/camunda.ttl) and the [GitHub sketch](./second-api-sketch.md) introduce zero new properties. Per-API vocabulary = list of instances; core = list of relations. | +| Was index parity achievable? | **Yes**, plus the parity script surfaced one latent loader bug (`domainProducers["undefined"]`) the SHACL `IdentifierShape` would catch. | + +## Decision + +**Adopt the modelling. Defer RDF.** The spike has produced everything +needed for a follow-up modelling PR; the RDF adoption decision is +separable and lower-priority until multi-API is concrete. + +If the team prefers a different read of the trade-off (e.g. "the +multi-API roadmap is firmer than the recommendation assumes; adopt +RDF now"), the spike artifacts support that path too — adapters, +queries, and parity test would feed directly into a production +migration. 
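Follow-up steps 3 and 4 can be sketched together in TypeScript. This illustration makes stated assumptions and is not the planned loader code. The names `ValueBinding`, `StateParameters`, `parseValueBinding` and `checkBinding` are hypothetical. The sketch assumes the string form `"<request|response>.<fieldPath>": "<State>.<parameter>"` shown in `value-binding-drift.ts`, and it assumes states carry the multi-parameter list from step 2. It covers only the state/parameter half of the drift check. Resolving `fieldPath` against the canonical response shape would need that shape as an extra input.

```typescript
// Hypothetical typed form of one valueBindings entry (follow-up step 3).
interface ValueBinding {
  direction: 'request' | 'response';
  fieldPath: string;
  targetState: string;
  targetParameter: string;
}

// Hypothetical: runtime state -> parameters it declares (multi-parameter, step 2).
type StateParameters = ReadonlyMap<string, readonly string[]>;

// Parse the string-keyed form, e.g.
//   "request.processDefinitionId": "ProcessDefinitionDeployed.processDefinitionId"
function parseValueBinding(key: string, target: string): ValueBinding {
  const keyDot = key.indexOf('.');
  if (keyDot <= 0) throw new Error(`binding key has no direction prefix: ${key}`);
  const direction = key.slice(0, keyDot);
  if (direction !== 'request' && direction !== 'response') {
    throw new Error(`binding key must start with request. or response.: ${key}`);
  }
  const targetDot = target.indexOf('.');
  if (targetDot <= 0 || targetDot === target.length - 1) {
    throw new Error(`binding target must be <State>.<parameter>: ${target}`);
  }
  return {
    direction,
    fieldPath: key.slice(keyDot + 1),
    targetState: target.slice(0, targetDot),
    targetParameter: target.slice(targetDot + 1),
  };
}

// Load-time check (follow-up step 4): undefined when the binding resolves,
// otherwise a message describing the drift.
function checkBinding(b: ValueBinding, states: StateParameters): string | undefined {
  const params = states.get(b.targetState);
  if (params === undefined) return `${b.targetState} is not a known runtime state`;
  if (!params.includes(b.targetParameter)) {
    return `${b.targetState} does not declare parameter ${b.targetParameter}`;
  }
  return undefined;
}

// Usage: one clean binding and two of the surfaced defects. The state table
// is illustrative and mirrors the single-parameter declarations described above.
const states: StateParameters = new Map([
  ['ProcessDefinitionDeployed', ['processDefinitionId']],
  ['ProcessInstanceExists', ['processDefinitionId']],
]);
const raw: [string, string][] = [
  ['request.processDefinitionId', 'ProcessDefinitionDeployed.processDefinitionId'],
  ['response.deployments[].form.formKey', 'FormDeployed.formKey'],
  ['response.processInstanceKey', 'ProcessInstanceExists.processInstanceKey'],
];
for (const [key, target] of raw) {
  const problem = checkBinding(parseValueBinding(key, target), states);
  console.log(`${key} -> ${target}: ${problem ?? 'ok'}`);
}
```

Parsing throws on a malformed key or target, while a well-formed binding that points nowhere yields a message. The first case is a syntax error in `domain-semantics.json`; the second is the drift class the spike surfaced.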
diff --git a/docs/spikes/rdf/queries/minimal-scenario-chain.ts b/docs/spikes/rdf/queries/minimal-scenario-chain.ts new file mode 100644 index 0000000..5c6d94d --- /dev/null +++ b/docs/spikes/rdf/queries/minimal-scenario-chain.ts @@ -0,0 +1,167 @@ +// Phase 4b: minimal scenario-chain candidate query +// +// The brief asks: +// +// "For each operation, list the minimal scenario chains that satisfy +// all REQUIRED semantic types and reach a target runtime state, +// ranked by chain length." +// +// Today this is BFS in path-analyser/src/scenarioGenerator.ts. The +// brief explicitly carves out the planner: scenario synthesis (BFS, +// optional-pair combinations, artifact rule variants, duplicate +// policies) stays in TS. What SPARQL replaces is the *candidate +// selection* portion: "for this required semantic type, which +// authoritative producer can supply it?" +// +// This file demonstrates three queries that together provide the inputs +// the planner needs for candidate selection. The planner then orders +// and combines them combinatorially. + +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import path from 'node:path'; +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import { exit } from 'node:process'; +import oxigraph from 'oxigraph'; +import { buildStore } from '../adapters/build-store.js'; + +// Query 1: for a given operation, what required SemanticType producers exist? +// This is what bySemanticProducer + providerMap give the planner today, +// joined into one shape the planner can consume directly. +const Q_REQUIRED_PRODUCERS = ` +PREFIX core: +SELECT ?targetOp ?reqType ?producerOp ?authoritative WHERE { + ?target core:operationId ?targetOp ; + core:requires ?t . + ?t ?reqType . + OPTIONAL { + ?producer core:produces ?t ; + core:operationId ?producerOp .
+ BIND(EXISTS { ?producer core:authoritativeProducer ?t } AS ?authoritative) + } +} +ORDER BY ?targetOp ?reqType DESC(?authoritative) +`; + +// Query 2: transitive runtime-state prerequisites — replaces +// gatherDomainPrerequisites() in scenarioGenerator.ts (~L1254-L1273). +// Today: a hand-rolled iterative DFS over runtimeStates.requires + +// capabilities.dependsOn with a visited set. +// Here: SPARQL property path "dependsOn+" walks the closure in one +// statement. The planner consumes the result; ordering / ties stay in +// TS. +const Q_TRANSITIVE_PREREQS = ` +PREFIX core: +SELECT ?targetOp ?goalState ?prereq WHERE { + ?op core:operationId ?targetOp ; + core:requiresState ?goalState . + # Property path: every state ?goalState transitively depends on. + ?goalState core:dependsOn+ ?prereq . +} +ORDER BY ?targetOp ?goalState +`; + +// Query 3: candidate input for the brief's query. For a target operation, +// rank each authoritative producer by how many of the target's REQUIRED +// semantic types it covers. Coverage is only a proxy for chain length; +// choosing the minimal covering set and the real ordering stay in the planner. +const Q_MINIMAL_CHAIN_CANDIDATES = ` +PREFIX core: +SELECT ?targetOp ?producerOp (COUNT(DISTINCT ?reqType) AS ?coverage) WHERE { + ?target core:operationId ?targetOp ; + core:requires ?t . + ?t ?reqType . + ?producer core:produces ?t ; + core:authoritativeProducer ?t ; + core:operationId ?producerOp . +} +GROUP BY ?targetOp ?producerOp +ORDER BY ?targetOp DESC(?coverage) +`; + +interface RequiredProducerRow { + reqType: string; + producers: { opId: string; authoritative: boolean }[]; +} + +async function main(): Promise<number> { + const baseDir = path.resolve(import.meta.dirname, '../../../../path-analyser'); + const { store } = await buildStore(baseDir); + + // Group required-producers by (targetOp, reqType) for readability. + const byOp = new Map<string, Map<string, RequiredProducerRow>>(); + for (const b of store.query(Q_REQUIRED_PRODUCERS) as Iterable) { + const targetOp = b.get('targetOp')?.value ??
''; + const reqType = b.get('reqType')?.value ?? ''; + const producerOp = b.get('producerOp')?.value; + const auth = b.get('authoritative')?.value === 'true'; + if (!byOp.has(targetOp)) byOp.set(targetOp, new Map()); + const inner = byOp.get(targetOp); + if (!inner) continue; + if (!inner.has(reqType)) inner.set(reqType, { reqType, producers: [] }); + if (producerOp) inner.get(reqType)?.producers.push({ opId: producerOp, authoritative: auth }); + } + + // Print a sample so the result is human-checkable. createProcessInstance + // and activateJobs are good demonstrators because they have multiple + // required types each. + console.log('=== Required-producer candidates (sample: createProcessInstance, activateJobs) ==='); + for (const opId of ['createProcessInstance', 'activateJobs']) { + console.log(`\n${opId}:`); + const inner = byOp.get(opId); + if (!inner || inner.size === 0) { + console.log(' (no required semantic types)'); + continue; + } + for (const row of inner.values()) { + console.log(` requires ${row.reqType}:`); + if (row.producers.length === 0) { + console.log(' (NO PRODUCER — would fail planner)'); + continue; + } + for (const p of row.producers) { + const tag = p.authoritative ? '[authoritative]' : ''; + console.log(` ${p.opId} ${tag}`); + } + } + } + + // Transitive runtime-state prerequisites — the gatherDomainPrerequisites + // replacement. + console.log('\n=== Transitive runtime-state prerequisites (dependsOn+) ==='); + const prereqByOp = new Map<string, Map<string, Set<string>>>(); + for (const b of store.query(Q_TRANSITIVE_PREREQS) as Iterable) { + const targetOp = b.get('targetOp')?.value ?? ''; + const goal = decodeURIComponent(b.get('goalState')?.value.split('#').pop() ?? ''); + const prereq = decodeURIComponent(b.get('prereq')?.value.split('#').pop() ??
''); + if (!prereqByOp.has(targetOp)) prereqByOp.set(targetOp, new Map()); + const inner = prereqByOp.get(targetOp); + if (!inner) continue; + if (!inner.has(goal)) inner.set(goal, new Set()); + inner.get(goal)?.add(prereq); + } + for (const [opId, goals] of prereqByOp) { + console.log(`\n${opId}:`); + for (const [goal, prereqs] of goals) { + console.log(` ${goal} requires: { ${[...prereqs].join(', ')} }`); + } + } + + // Coverage-ranked candidates — toy ordering; planner does the real + // combinatorial work. + console.log('\n=== Coverage-ranked authoritative producers (top per op) ==='); + const seenOps = new Set(); + for (const b of store.query(Q_MINIMAL_CHAIN_CANDIDATES) as Iterable) { + const targetOp = b.get('targetOp')?.value ?? ''; + if (seenOps.has(targetOp)) continue; + seenOps.add(targetOp); + const producerOp = b.get('producerOp')?.value; + const cov = b.get('coverage')?.value; + if (seenOps.size <= 12) console.log(` ${targetOp.padEnd(35)} <- ${producerOp} (covers ${cov} required types)`); + } + console.log(` ... (${seenOps.size} target operations total)`); + console.log('\nNote: planner combines these candidates with disjunctions, optional-pair coverage,'); + console.log('artifact rules, duplicate policies. SPARQL only supplies inputs.'); + return 0; +} + +main().then(exit).catch((e) => { console.error(e); exit(2); }); diff --git a/docs/spikes/rdf/queries/value-binding-drift.ts b/docs/spikes/rdf/queries/value-binding-drift.ts new file mode 100644 index 0000000..6691b8f --- /dev/null +++ b/docs/spikes/rdf/queries/value-binding-drift.ts @@ -0,0 +1,114 @@ +// Phase 4a: value-binding drift detector +// +// Today: domain-semantics.json valueBindings are string-keyed, e.g. 
+// +// "valueBindings": { +// "request.processDefinitionId": +// "ProcessDefinitionDeployed.processDefinitionId", +// "response.deployments[].processDefinition.processDefinitionId": +// "ProcessDefinitionKey.processDefinitionKey" +// } +// +// path-analyser/src/index.ts (~L320-L340) string-splits these at +// scenario-bind time. A typo in the path or a renamed response field +// silently no-ops — the planner still runs, the scenario still emits, +// the resulting test just doesn't extract the variable it thought it +// extracted. The brief calls this the "silent-miss class". +// +// Under the model: a value binding is a ValueBinding node whose +// bindsFromFieldPath property points at a FieldPath node. A typo means +// that pointer resolves to a node with no matching FieldPath in the +// response shape — a one-line SPARQL query. +// +// We additionally check that the bindsToParameter is one the target +// RuntimeState actually declares — catches the second-half rename +// problem (renaming a state's parameter without updating its bindings). + +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import path from 'node:path'; +// biome-ignore lint/correctness/noNodejsModules: spike-only. +import { exit } from 'node:process'; +import oxigraph from 'oxigraph'; +import { buildStore } from '../adapters/build-store.js'; + +interface DriftRow { + binding: string; + fieldPath: string; + state?: string; + parameter?: string; + reason: string; +} + +const Q_UNRESOLVED_PATH = ` +PREFIX core: +SELECT ?binding ?fp WHERE { + ?binding a core:ValueBinding ; + core:bindingDirection "response" ; + core:bindsFromFieldPath ?fp . + # Drift: the binding points at a FieldPath IRI that has no + # corresponding canonical FieldPath emitted from the response shape. + # NB: ?anyLit must be a free variable inside NOT EXISTS — using a + # variable bound by BIND would check for a specific literal value + # instead of the existence of any literal. 
+ FILTER NOT EXISTS { ?fp core:fieldPath ?anyLit } +} +`; + +const Q_PARAMETER_NOT_DECLARED = ` +PREFIX core: +SELECT ?binding ?stateUri ?param WHERE { + ?binding a core:ValueBinding ; + core:bindsToState ?stateUri ; + core:bindsToParameter ?param . + # Drift: the value-binding refers to a parameter the target + # RuntimeState does not expose. Catches state.parameter renames that + # didn't propagate into bindings. + FILTER NOT EXISTS { ?stateUri core:hasParameter ?param } +} +`; + +async function main(): Promise<number> { + const baseDir = path.resolve(import.meta.dirname, '../../../../path-analyser'); + const { store } = await buildStore(baseDir); + + const drift: DriftRow[] = []; + + for (const b of store.query(Q_UNRESOLVED_PATH) as Iterable) { + const bindingIri = b.get('binding')?.value ?? ''; + const fpIri = b.get('fp')?.value ?? ''; + const fpStr = decodeURIComponent(fpIri.split('/').pop() ?? ''); + drift.push({ + binding: decodeURIComponent(bindingIri.split('#').pop() ?? bindingIri), + fieldPath: fpStr, + reason: 'response field-path does not resolve to any canonical FieldPath', + }); + } + + for (const b of store.query(Q_PARAMETER_NOT_DECLARED) as Iterable) { + const bindingIri = b.get('binding')?.value ?? ''; + const stateUri = b.get('stateUri')?.value ?? ''; + const param = b.get('param')?.value ?? ''; + drift.push({ + binding: decodeURIComponent(bindingIri.split('#').pop() ?? bindingIri), + fieldPath: '(parameter check)', + state: decodeURIComponent(stateUri.split('#').pop() ??
stateUri), + parameter: param, + reason: 'bindsToParameter is not declared by bindsToState (hasParameter)', + }); + } + + console.log(`Value-binding drift detector — ${drift.length} finding(s).`); + for (const d of drift) { + console.log(` - ${d.binding}`); + console.log(` ${d.reason}`); + if (d.fieldPath !== '(parameter check)') console.log(` fieldPath: ${d.fieldPath}`); + if (d.state) console.log(` target: ${d.state}.${d.parameter}`); + } + if (drift.length === 0) { + console.log(' (none — all value bindings resolve cleanly against the model.)'); + } + console.log('\nNote: today these would be silent runtime no-ops; under the model they are queryable load-time errors.'); + return 0; +} + +main().then(exit).catch((e) => { console.error(e); exit(2); }); diff --git a/docs/spikes/rdf/second-api-sketch.md b/docs/spikes/rdf/second-api-sketch.md new file mode 100644 index 0000000..55be51c --- /dev/null +++ b/docs/spikes/rdf/second-api-sketch.md @@ -0,0 +1,167 @@ +# Second-API sketch — GitHub Issues + Pull Requests + +> Paper exercise per the spike brief. Goal: verify that the core +> ontology in [`ontology/core.ttl`](./ontology/core.ttl) accommodates a +> second API's concepts without invasive changes. **Not implemented — +> not loaded into the store, not queried.** A genuine sketch. + +## Why this API + +GitHub's Issues + Pull Requests REST API is a useful second test +because: + +- It has runtime state dependencies (a PR cannot be merged before it's + opened; a comment cannot be added to a locked conversation; a review + cannot be requested on a draft PR until it's marked ready). +- It has identifiers whose validity depends on state (a `pull_number` + is only valid for the lifetime of a PR; a `review_id` is only valid + while the PR exists). +- It has artifact-shaped inputs (the diff content of a commit, similar + in role to BPMN content for `createDeployment`).
+- It is publicly documented and we already use it in three other repos + in the workspace, so we can be honest about whether the abstraction + fits. + +## Vocabulary file (would-be `github.ttl`) + +```turtle +@prefix github: . +@prefix core: . +@prefix rdfs: . + +# --- Identifiers (instances of core:Identifier) ------------------------- +github:RepoFullName a core:Identifier ; + rdfs:label "Repository full name (owner/repo)" ; + core:validityState github:RepositoryExists . + +github:IssueNumber a core:Identifier ; + core:validityState github:IssueExists . + +github:PullNumber a core:Identifier ; + core:validityState github:PullRequestExists . + +github:CommitSha a core:Identifier ; + core:validityState github:CommitExists . + +# --- Runtime states ---------------------------------------------------- +github:RepositoryExists a core:RuntimeState ; + core:hasParameter "fullName" . + +github:IssueExists a core:RuntimeState ; + core:hasParameter "issueNumber" ; + core:dependsOn github:RepositoryExists . + +github:PullRequestExists a core:RuntimeState ; + core:hasParameter "pullNumber" ; + core:dependsOn github:RepositoryExists . + +github:PullRequestReadyForReview a core:RuntimeState ; + core:dependsOn github:PullRequestExists . + +github:PullRequestMergeable a core:RuntimeState ; + core:dependsOn github:PullRequestReadyForReview . + +# --- Capabilities (subclass of core:RuntimeState) ---------------------- +github:RepoHasBranch a core:Capability ; + core:hasParameter "branchName" ; + core:dependsOn github:RepositoryExists . + +github:RepoHasIssueLabel a core:Capability ; + core:hasParameter "labelName" ; + core:dependsOn github:RepositoryExists . + +# --- Artifact kinds ---------------------------------------------------- +github:DiffArtifact a core:ArtifactKind . # commit diff content +github:Markdown a core:ArtifactKind . 
# issue body, PR description +``` + +## Mechanical observations + +### What fitted cleanly + +| Concept | Mapped to | Notes | +|---|---|---| +| `pull_number`, `issue_number` | `core:Identifier` + `core:validityState` | Same shape as `ProcessInstanceKey → ProcessInstanceExists` | +| "Cannot merge a draft PR" | `core:RuntimeState` chain via `core:dependsOn` | `PullRequestMergeable → PullRequestReadyForReview → PullRequestExists → RepositoryExists` | +| Branch-existence as a precondition for `createPullRequest` | `core:Capability` | Same role as `ModelHasServiceTaskType` | +| Diff content as input | `core:ArtifactKind` | Same role as `bpmnProcess` | +| Repo full-name as a stable identifier | `core:Identifier` | Validity state pattern works | + +### What surfaced a question (not necessarily a gap) + +1. **Multi-parameter states.** GitHub's `IssueExists` is keyed by + `(owner, repo, issue_number)`, not a single parameter. The current + `core:hasParameter` schema is single-valued. This is the same gap + the value-binding drift detector already surfaced for Camunda + (`ProcessInstanceExists` legitimately needs both `processDefinitionId` + and `processInstanceKey`). Recommended fix: allow + `core:hasParameter` to be multi-valued (it already is in RDF; the + single-value constraint lives only in our SHACL). Tracked as step 2 of + the follow-up plan in [`RECOMMENDATION.md`](./RECOMMENDATION.md); it + does NOT require a core schema change, only a SHACL relaxation. + +2. **State transitions vs. terminal states.** Closing an issue or + merging a PR transitions the resource to a different RuntimeState + (`IssueClosed`, `PullRequestMerged`) that *consumes* the prior state + rather than being additive. The current core ontology has + `core:produces`, `core:implicitlyAdds`, and `core:dependsOn` but no + notion of "this operation invalidates state X". On the Camunda side, + `cancelProcessInstance` has the same shape and is currently + modelled as a no-op (the produced state is just absent).
This is a + real abstraction gap, but the spike's existing model already lives + with it for Camunda — so it is API-agnostic, not GitHub-specific. + Recommended core extension: `core:invalidates` property; out of + spike scope. + +3. **Conditional branching on response state.** GitHub's + `getPullRequest` returns a `mergeable` field that may be + `null | true | false`. Some operations (auto-merge) only succeed + when `mergeable: true`. The current ontology models scenario + prerequisites as state-existence; "the response says X" is not + currently first-class. This is also Camunda-relevant + (`getJob.state == ACTIVATABLE` gates further work), so again + API-agnostic — not a gap exposed by the second-API exercise alone. + +### What did NOT fit + +Nothing. The exercise produced no concept that demanded an invasive +change to the core ontology. Two SHACL relaxations and one optional +new property (`core:invalidates`) cover everything; all three are +genuinely API-agnostic findings rather than GitHub-specific holes. + +## Honest test for the abstraction (per the brief) + +> "Can the planner be written referring only to terms in `core:`?" + +Tracing the planner's actual call sites against the +[façade-derived indexes](./parity/index-parity.ts) and the +[scenario-chain candidate query](./queries/minimal-scenario-chain.ts): + +- `bySemanticProducer[type]` — uses only `core:produces`, + `core:authoritativeProducer`, `core:operationId`. No Camunda terms. +- `domainProducers[state]` — uses only `core:producesState`, + `core:operationId`. No Camunda terms. +- `gatherDomainPrerequisites(seeds)` — replaceable by `core:dependsOn+` + property path. No Camunda terms. +- Value-binding resolution — uses only `core:ValueBinding`, + `core:bindsFromFieldPath`, `core:bindsToState`, + `core:bindsToParameter`, `core:hasParameter`. No Camunda terms. 
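To make the first bullet concrete, the reverse index can be derived from a record that carries nothing but the three `core:` terms it names. This is a minimal sketch with hypothetical names (`CoreOperation`, its `authoritativeFor` field, `buildBySemanticProducer`). The operations in the usage example are illustrative, and their `produces` and authoritative flags are assumptions. One Camunda-style operation and one GitHub-style operation appear side by side to show that nothing in the function depends on which API the data came from.

```typescript
// Hypothetical core-only operation record: the fields correspond to
// core:operationId, core:produces and core:authoritativeProducer.
interface CoreOperation {
  operationId: string;
  produces: readonly string[];
  authoritativeFor: readonly string[];
}

// Reverse index: semantic type -> producing operationIds, with
// authoritative producers listed first (input order otherwise kept).
function buildBySemanticProducer(ops: readonly CoreOperation[]): Map<string, string[]> {
  const grouped = new Map<string, { id: string; authoritative: boolean }[]>();
  for (const op of ops) {
    for (const type of op.produces) {
      const entries = grouped.get(type) ?? [];
      entries.push({ id: op.operationId, authoritative: op.authoritativeFor.includes(type) });
      grouped.set(type, entries);
    }
  }
  const index = new Map<string, string[]>();
  for (const [type, entries] of grouped) {
    // Array.prototype.sort is stable (ES2019+), so ties keep input order.
    entries.sort((a, b) => Number(b.authoritative) - Number(a.authoritative));
    index.set(type, entries.map((e) => e.id));
  }
  return index;
}

// Illustrative data only: produces/authoritative flags are assumptions.
const ops: CoreOperation[] = [
  { operationId: 'searchProcessDefinitions', produces: ['ProcessDefinitionKey'], authoritativeFor: [] },
  { operationId: 'createDeployment', produces: ['ProcessDefinitionKey'], authoritativeFor: ['ProcessDefinitionKey'] },
  { operationId: 'createPullRequest', produces: ['PullNumber'], authoritativeFor: ['PullNumber'] },
];
console.log(buildBySemanticProducer(ops));
```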
+ +**Verdict: yes, the planner is API-agnostic in the proposed shape.** +The abstraction line (per-API adapters; API-agnostic graph store and +planner) is structurally achievable. This is a finding worth recording +independently of whether RDF specifically is the carrier — the +named entities are the right ones. + +## Boundary-clarity finding + +The `camunda:` vocabulary file +([`ontology/camunda.ttl`](./ontology/camunda.ttl)) introduces only +*instances*, plus one `rdfs:subClassOf` statement +(`core:Capability rdfs:subClassOf core:RuntimeState`) that merely restates a +relationship already declared in core. It introduces zero new properties. The +`github.ttl` sketch above does the same. That is the strongest signal +the boundary is in the right place: a per-API vocabulary is a list of +*what exists in this API*, not *new ways APIs can be shaped*. + +If a per-API vocabulary ever needs a new property, that is the signal +that the core ontology is missing an abstraction.
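The boundary signal in the last paragraph can be checked mechanically. A minimal sketch follows. It assumes the per-API vocabulary has already been parsed into `[subject, predicate, object]` triples of prefixed names; parsing Turtle is out of scope here. `CORE_PROPERTIES` is a hypothetical, illustrative list of the `core:` properties mentioned in this spike, not the real contents of `core.ttl`. `github:requiresScope` is a made-up predicate that shows what a violation looks like.

```typescript
type Triple = readonly [subject: string, predicate: string, object: string];

// Illustrative subset of properties declared in core.ttl (not exhaustive).
const CORE_PROPERTIES: ReadonlySet<string> = new Set([
  'core:validityState', 'core:hasParameter', 'core:dependsOn',
  'core:produces', 'core:producesState', 'core:authoritativeProducer',
  'core:operationId', 'core:requires', 'core:requiresState', 'core:implicitlyAdds',
]);

// Standard RDF/RDFS predicates a per-API vocabulary may use freely.
const STANDARD_PREDICATES: ReadonlySet<string> = new Set(['a', 'rdfs:label', 'rdfs:subClassOf']);

// Predicates the vocabulary uses that neither core nor RDF(S) defines:
// each one is a candidate missing abstraction in the core ontology.
function newProperties(vocabulary: readonly Triple[]): string[] {
  const found = new Set<string>();
  for (const [, predicate] of vocabulary) {
    if (!CORE_PROPERTIES.has(predicate) && !STANDARD_PREDICATES.has(predicate)) {
      found.add(predicate);
    }
  }
  return [...found].sort();
}

// A few triples transcribed from the github.ttl sketch above.
const githubTriples: Triple[] = [
  ['github:PullNumber', 'a', 'core:Identifier'],
  ['github:PullNumber', 'core:validityState', 'github:PullRequestExists'],
  ['github:PullRequestMergeable', 'core:dependsOn', 'github:PullRequestReadyForReview'],
  ['github:RepoHasBranch', 'core:hasParameter', '"branchName"'],
];
console.log(newProperties(githubTriples));
console.log(newProperties([...githubTriples, ['github:PullRequestExists', 'github:requiresScope', '"repo"']]));
```

An empty result is the "boundary holds" case. Any non-empty result names exactly the properties that would have to move into `core:` or be justified.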