spike: evaluate RDF/SPARQL as a unifying query layer (#60) #63
Draft
jwulf wants to merge 3 commits into
…iants
Lays the modelling foundation for issue #60. Three artefacts:
- docs/spikes/rdf/ontology/core.ttl: API-agnostic vocabulary
  (Operation, SemanticType, RuntimeState, Capability, FieldPath,
  ValueBinding, ArtifactKind, Identifier, Disjunction, Scenario)
- docs/spikes/rdf/ontology/camunda.ttl: per-API instances and
  subclasses; demonstrates the per-API <-> core boundary
- docs/spikes/rdf/shapes/invariants.shapes.ttl: SHACL shapes pinning
  invariants the codebase enforces procedurally today (every required
  semantic type / runtime state has a producer; every value binding
  resolves to a known FieldPath and a parameter declared by its
  target state)
Adds devDependencies oxigraph (SPARQL 1.1) and rdf-validate-shacl
(SHACL), scoped to the spike under docs/spikes/rdf/, with no
production-pipeline change.
Refs #60
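As an illustration of the first invariant named above (every required semantic type has a producer), here is a minimal plain-TypeScript sketch of the same check. It is not the SHACL shape from the PR: the `Operation` shape, its field names, and the `completeJob` / `JobKey` sample row are assumptions made up for this example.

```typescript
// Hypothetical, simplified operation shape; the real OperationGraph types
// are not shown in this PR.
interface Operation {
  id: string;
  requires: string[]; // semantic types the operation needs as input
  produces: string[]; // semantic types the operation emits
}

interface UnproducedRequirement {
  operation: string;
  semanticType: string;
}

// Returns every (operation, semanticType) pair whose required type has
// no producer anywhere in the supplied operations.
function findUnproducedRequirements(ops: Operation[]): UnproducedRequirement[] {
  const produced = new Set<string>();
  for (const op of ops) {
    for (const t of op.produces) produced.add(t);
  }
  const violations: UnproducedRequirement[] = [];
  for (const op of ops) {
    for (const t of op.requires) {
      if (!produced.has(t)) {
        violations.push({ operation: op.id, semanticType: t });
      }
    }
  }
  return violations;
}

// Illustrative data only: the third row is invented to trigger a violation.
const sampleOps: Operation[] = [
  { id: "createDeployment", requires: [], produces: ["ProcessDefinitionKey"] },
  { id: "createProcessInstance", requires: ["ProcessDefinitionKey"], produces: ["ProcessInstanceKey"] },
  { id: "completeJob", requires: ["JobKey"], produces: [] },
];
const invariantViolations = findUnproducedRequirements(sampleOps);
```

A declarative shape expresses the same rule as data rather than as a loop, which is the trade-off the spike is evaluating.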
Phase 2 (adapters):
- docs/spikes/rdf/adapters/build-store.ts materialises an in-memory
  Oxigraph store from the normalised OperationGraph + DomainSemantics.
  Emits 3,497 quads from current pipeline state (183 operations,
  37 semantic types, 482 canonical field paths, plus runtime states /
  capabilities / identifiers / artifact kinds / value bindings /
  disjunctions).
Phase 3 (index parity, the go/no-go checkpoint):
- docs/spikes/rdf/parity/index-parity.ts re-derives the loader's three
  reverse indexes (bySemanticProducer, domainProducers, providerMap)
  from SPARQL queries over the store and asserts parity against
  graphLoader.ts output. Result: PARITY PASS for every well-formed key.
Side finding (recorded for RECOMMENDATION.md):
- The JobTypeValue identifier in domain-semantics.json has no
  validityState, which causes the loader to write
  domainProducers["undefined"] = ["createDeployment"], a silent
  data-quality issue. The SHACL IdentifierShape (validityState
  minCount 1) catches this at load time. Surfaced explicitly in the
  parity output as "LOADER-ONLY ARTIFACTS THE ONTOLOGY REJECTS".
The brief's checkpoint is satisfied: structural equivalence at
planner-output level falls out for free without modifying any planner
algorithm code.
Refs #60
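The side finding above can be reproduced in a few lines. The sketch below is not the code in graphLoader.ts (which this PR does not show). It is a hypothetical reconstruction of how a missing validityState can become the literal key "undefined", next to a guarded variant that rejects such identifiers the way a minCount 1 constraint would. The `IdentifierEntry` shape, its field names, and the second sample row are assumptions.

```typescript
// Hypothetical shape of an identifier entry; field names are assumptions.
interface IdentifierEntry {
  identifier: string;
  validityState?: string; // optional in the data, which lets the bug through
  producers: string[];
}

// Naive build: a missing validityState is coerced to the string "undefined".
function buildDomainProducersNaive(entries: IdentifierEntry[]): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const entry of entries) {
    const key = `${entry.validityState}`; // undefined becomes "undefined"
    index[key] = (index[key] ?? []).concat(entry.producers);
  }
  return index;
}

// Guarded build: entries without a validityState are reported, not indexed.
function buildDomainProducersChecked(entries: IdentifierEntry[]): {
  index: Record<string, string[]>;
  rejected: string[];
} {
  const index: Record<string, string[]> = {};
  const rejected: string[] = [];
  for (const entry of entries) {
    if (entry.validityState === undefined) {
      rejected.push(entry.identifier);
      continue;
    }
    const key = entry.validityState;
    index[key] = (index[key] ?? []).concat(entry.producers);
  }
  return { index, rejected };
}

// First row mirrors the defect described above; the second row is invented.
const sampleIdentifiers: IdentifierEntry[] = [
  { identifier: "JobTypeValue", producers: ["createDeployment"] },
  { identifier: "ExampleKey", validityState: "ExampleState", producers: ["exampleOperation"] },
];
const naiveIndex = buildDomainProducersNaive(sampleIdentifiers);
const checkedResult = buildDomainProducersChecked(sampleIdentifiers);
```

With the sample data, the naive index gains an "undefined" key while the guarded build lists JobTypeValue as rejected.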
…sketch, RECOMMENDATION
Phase 4a (value-binding drift detector):
- docs/spikes/rdf/queries/value-binding-drift.ts expresses
value-binding resolution as a SPARQL query. Surfaces FOUR latent
domain-semantics defects that are silent runtime no-ops today:
1. createDeployment -> FormDeployed.formKey: state does not exist.
2. createDeployment -> ProcessDefinitionKey.processDefinitionKey:
ProcessDefinitionKey is a semantic type, not a state.
3. createProcessInstance -> ProcessInstanceExists.processInstanceKey:
state declares parameter=processDefinitionId, not
processInstanceKey. (Single-parameter modelling gap.)
4. createProcessInstance -> ProcessDefinitionKey.processDefinitionKey:
same type-confusion as #2.
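The three kinds of drift listed above (unknown state, semantic type used as a state, parameter mismatch) can be sketched as a plain-TypeScript classifier. This is not the SPARQL query in value-binding-drift.ts. The `ValueBinding` and `RuntimeState` shapes and the reason labels are assumptions, and the sample data only restates the four cases from the list.

```typescript
// Hypothetical shapes; the real domain-semantics.json schema is not shown here.
interface RuntimeState {
  stateName: string;
  parameter?: string; // single declared parameter, per the modelling gap noted above
}

interface ValueBinding {
  operation: string;
  target: string;    // intended to name a runtime state
  parameter: string; // parameter the binding expects on that state
}

type DriftReason = "unknown-state" | "semantic-type-not-state" | "parameter-mismatch";

interface Drift {
  binding: ValueBinding;
  reason: DriftReason;
}

// Classifies each binding that does not resolve to a declared state parameter.
function detectDrift(
  bindings: ValueBinding[],
  states: RuntimeState[],
  semanticTypes: Set<string>,
): Drift[] {
  const byName = new Map<string, RuntimeState>();
  for (const s of states) byName.set(s.stateName, s);

  const drifts: Drift[] = [];
  for (const binding of bindings) {
    const state = byName.get(binding.target);
    if (state === undefined) {
      const reason: DriftReason = semanticTypes.has(binding.target)
        ? "semantic-type-not-state"
        : "unknown-state";
      drifts.push({ binding, reason });
    } else if (state.parameter !== binding.parameter) {
      drifts.push({ binding, reason: "parameter-mismatch" });
    }
  }
  return drifts;
}

// Sample data restating the four cases listed above.
const sampleStates: RuntimeState[] = [
  { stateName: "ProcessInstanceExists", parameter: "processDefinitionId" },
];
const sampleSemanticTypes = new Set<string>(["ProcessDefinitionKey"]);
const sampleBindings: ValueBinding[] = [
  { operation: "createDeployment", target: "FormDeployed", parameter: "formKey" },
  { operation: "createDeployment", target: "ProcessDefinitionKey", parameter: "processDefinitionKey" },
  { operation: "createProcessInstance", target: "ProcessInstanceExists", parameter: "processInstanceKey" },
  { operation: "createProcessInstance", target: "ProcessDefinitionKey", parameter: "processDefinitionKey" },
];
const driftReport = detectDrift(sampleBindings, sampleStates, sampleSemanticTypes);
```

Each entry in `driftReport` carries the offending binding and a reason.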
Phase 4b (minimal scenario-chain candidate query):
- docs/spikes/rdf/queries/minimal-scenario-chain.ts replaces
gatherDomainPrerequisites() (the single hand-rolled multi-hop
traversal in scenarioGenerator.ts ~L1254-L1273) with a
core:dependsOn+ SPARQL property path. Also demonstrates required-
producer candidate selection (replaces bySemanticProducer +
providerMap reads in the planner) and a coverage-ranked picker.
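For readers less familiar with SPARQL property paths: a `+` path matches one or more hops, so `core:dependsOn+` yields the transitive closure of the dependency relation. The sketch below computes that closure in plain TypeScript over an adjacency map. It is neither the spike's query nor the existing gatherDomainPrerequisites() code, and every state name in the sample except ProcessInstanceExists is invented.

```typescript
// SPARQL shape being mirrored (variable names are assumptions):
//   SELECT ?dep WHERE { ?state core:dependsOn+ ?dep }

// Returns every node reachable from `start` via one or more dependsOn hops.
// `start` itself is included only if a cycle leads back to it, matching
// the one-or-more semantics of the `+` path operator.
function transitivePrerequisites(
  start: string,
  dependsOn: Map<string, string[]>,
): Set<string> {
  const reached = new Set<string>();
  const pending: string[] = (dependsOn.get(start) ?? []).slice();
  while (pending.length > 0) {
    const next = pending.pop() as string;
    if (reached.has(next)) continue; // already expanded; also guards against cycles
    reached.add(next);
    for (const dep of dependsOn.get(next) ?? []) pending.push(dep);
  }
  return reached;
}

// Illustrative dependency edges; only ProcessInstanceExists appears in the PR text.
const sampleDependsOn = new Map<string, string[]>([
  ["ProcessInstanceExists", ["ProcessDefinitionDeployed"]],
  ["ProcessDefinitionDeployed", ["ResourceUploaded"]],
  ["ResourceUploaded", []],
]);
const prerequisites = transitivePrerequisites("ProcessInstanceExists", sampleDependsOn);
```

The visited set is what keeps the traversal terminating on cyclic data. A SPARQL engine handles that internally when evaluating the path.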
Phase 5 (second-API paper sketch):
- docs/spikes/rdf/second-api-sketch.md sketches a github: vocabulary
for GitHub Issues + Pull Requests. The core ontology accommodates
it without invasive changes — two SHACL relaxations and one
optional new property (core:invalidates), all genuinely
API-agnostic. Per-API vocabulary introduces zero new properties:
the abstraction line (per-API adapters; API-agnostic core) holds.
RECOMMENDATION:
- docs/spikes/rdf/RECOMMENDATION.md: ADOPT THE MODELLING; DEFER RDF.
The named entities are the right ones whether or not the carrier
is RDF; reify them as TS types now. RDF adoption is a separable,
lower-priority decision that becomes worthwhile when multi-API
generalisation moves from aspirational to concrete. Spike artifacts
(adapters, parity test, queries) are reusable as the migration
starting point if/when that happens. Includes a 7-step concrete
follow-up plan sized for normal PRs.
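To make "reify them as TS types now" concrete, here is one possible shape for a few of the core entities as a discriminated union. The entity names come from the core.ttl vocabulary listed earlier; every field name, the `kind` tag, and the `entityLabel` helper are assumptions and may differ from what RECOMMENDATION.md proposes.

```typescript
// Hypothetical TS reification of a subset of the core vocabulary.
interface SemanticType {
  kind: "SemanticType";
  name: string;
}

interface RuntimeState {
  kind: "RuntimeState";
  name: string;
  parameter?: string;
}

interface Identifier {
  kind: "Identifier";
  name: string;
  validityState: string; // required here, mirroring the SHACL minCount 1 constraint
}

interface Operation {
  kind: "Operation";
  id: string;
  requires: string[];
  produces: string[];
}

type CoreEntity = SemanticType | RuntimeState | Identifier | Operation;

// Exhaustive switch: adding a new member to CoreEntity without handling it
// here becomes a compile-time error through the `never` assignment.
function entityLabel(entity: CoreEntity): string {
  switch (entity.kind) {
    case "Operation":
      return `Operation:${entity.id}`;
    case "SemanticType":
    case "RuntimeState":
    case "Identifier":
      return `${entity.kind}:${entity.name}`;
    default: {
      const unreachable: never = entity;
      return unreachable;
    }
  }
}

const exampleLabel = entityLabel({
  kind: "Operation",
  id: "createDeployment",
  requires: [],
  produces: ["ProcessDefinitionKey"],
});
```

Making `validityState` non-optional moves the JobTypeValue class of defect from load time to compile time for any data declared in TypeScript. JSON input would still need a runtime check.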
Pre-push checks:
- npm run lint: clean
- tsc --noEmit per workspace: clean
- npm run testsuite:generate + generate:request-validation: clean
- npm test: 89 passed (15 files)
Closes #60
Spike for #60. Recommendation: adopt the modelling, defer RDF.
See docs/spikes/rdf/RECOMMENDATION.md for the full reasoning. TL;DR:
- … docs/spikes/rdf/ontology/core.ttl are the right abstractions
  whether or not the carrier is RDF.
- … (bySemanticProducer, domainProducers, providerMap all re-derive
  cleanly from SPARQL queries).
- … core: (no Camunda-specific terms), so the per-API ↔ core
  abstraction line is structurally achievable.
- … domain-semantics.json and the loader (4 value-binding drift cases
  + 1 loader bug writing domainProducers["undefined"]).
Adopt RDF later if/when multi-API generalisation moves from
aspirational to concrete. The spike artifacts (adapters, parity test,
queries) are reusable as the migration starting point.
Deliverables (all under docs/spikes/rdf/):
- ontology/core.ttl, ontology/camunda.ttl, shapes/invariants.shapes.ttl
- adapters/build-store.ts: emits 3,497 quads from current pipeline state
- parity/index-parity.ts: PASS for every well-formed key
- queries/value-binding-drift.ts: surfaces 4 real defects
- queries/minimal-scenario-chain.ts: dependsOn+ property path replaces
  gatherDomainPrerequisites
- second-api-sketch.md: paper sketch
- RECOMMENDATION.md: adopt modelling, defer RDF, includes 7-step
  follow-up plan
Pre-push checks:
- npm run lint: clean
- tsc --noEmit per workspace: clean
- npm run testsuite:generate + generate:request-validation: clean
- npm test: 89 passed (15 files)
Adds devDependencies oxigraph + rdf-validate-shacl + n3, scoped to the
spike under docs/spikes/rdf/. No production-pipeline change.
Closes #60