diff --git a/.github/workflows/build-docs.yml b/.github/workflows/build-docs.yml index 1b891b58..61631f44 100644 --- a/.github/workflows/build-docs.yml +++ b/.github/workflows/build-docs.yml @@ -35,7 +35,16 @@ jobs: version: "0.9.28" - name: Install dependencies run: | - uv sync --extra docs + uv sync --extra docs --extra dev + + - name: Generate ontology visualization + run: | + uv run python docs/scripts/build_ontology_viz.py + uv run pre-commit run --files docs/assets/graflo-ontology-viz/*.json || true + + - name: Verify committed viz assets are fresh + run: | + git diff --exit-code docs/assets/graflo-ontology-viz/ - name: Build site run: | diff --git a/CHANGELOG.md b/CHANGELOG.md index b1b4e2f9..2bacfab2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.8.0] + +### Added + +- **GraFlo meta-ontology** — OWL vocabulary at `https://ontology.growgraph.dev/graflo` (`owl:versionIRI` `…/1.0.0`, `owl:versionInfo` `1.0.0`) describing `GraphManifest`, `Schema`, `IngestionModel`, `ProtoTransform`, pipeline actor steps, bindings, and related enumerations. Shipped as `graflo/rdf/ontology/graflo.ttl` plus JSON-LD context `graflo-context.jsonld`. +- **`graflo.rdf`** — `ManifestRdfSerializer` / `ManifestRdfDeserializer` for bidirectional conversion between `GraphManifest` (YAML/Pydantic) and RDF (Turtle, JSON-LD, N-Triples, RDF/XML). +- **CLI** — `manifest-to-rdf` and `rdf-to-manifest` console scripts (`graflo.rdf.cli`). + +### Documentation + +- **[GraFlo ontology](docs/model/graflo_ontology.md)** — meta-model vs user-domain RDF (`RdfInferenceManager`), versioning, URI layout, CLI, and round-trip semantics. 
+- **Interactive ontology visualization** — custom hierarchical class graph (rectangular nodes, subClassOf and optional property edges, pan/zoom) embedded on the GraFlo ontology page; built via `docs/scripts/build_ontology_viz.py` with committed assets under `docs/assets/graflo-ontology-viz/`. +- **README** and **docs index** — feature overview and quick links for manifest ↔ RDF workflows. + ## [1.7.33] ### Added diff --git a/CITATION.cff b/CITATION.cff index afdf0aa9..22f3e7a0 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -1,5 +1,8 @@ -abstract:

A framework for transforming tabular data (CSV, SQL) and hierarchical data (JSON, XML) into - property graphs and ingesting them into graph databases (ArangoDB, Neo4j).

+abstract: >- + Manifest-driven graph schema and ingestion for labeled property graphs: + define schemas in GraphManifest (YAML/Python), ingest from CSV/JSON/Parquet/SQL/RDF/SPARQL/API, + infer from PostgreSQL 3NF or OWL/RDFS, apply schema migrations, and project to ArangoDB, Neo4j, + TigerGraph, FalkorDB, Memgraph, or NebulaGraph. authors: - affiliation: GrowGraph family-names: Belikov @@ -11,5 +14,5 @@ doi: 10.5281/zenodo.15446131 license: [] license-url: https://github.com/growgraph/graflo/blob/main/LICENSE message: If you use this software, please cite it using the metadata from this file. -title: graflo +title: GraFlo — Graph Schema & Transformation Language (GSTL) type: software diff --git a/README.md b/README.md index 897051fc..06e60099 100644 --- a/README.md +++ b/README.md @@ -8,13 +8,19 @@ [![pre-commit](https://github.com/growgraph/graflo/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/growgraph/graflo/actions/workflows/pre-commit.yml) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15446131.svg)]( https://doi.org/10.5281/zenodo.15446131) -**GraFlo** is a manifest-driven toolkit for **labeled property graphs (LPGs)**: describe vertices, edges, and ingestion (`GraphManifest` — YAML or Python), then project and load into a target graph database. +**GraFlo** is a manifest-driven schema and ingestion layer for **labeled property graphs (LPGs)**. +Write a `GraphManifest` (YAML or Python) once — it defines vertices, edges, typed properties, +identities, and DB profile — then infer, validate, migrate, and load into any supported graph engine. ### What you get - **One pipeline, several graph databases** — The same manifest targets ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, or NebulaGraph; `DatabaseProfile` and DB-aware types absorb naming, defaults, and indexing differences. - **Explicit identities** — Vertex identity fields and indexes back upserts so reloads merge on keys instead of blindly duplicating nodes. 
- **Reusable ingestion** — `Resource` actor pipelines (including **vertex_router** / **edge** steps) bind to files, SQL, SPARQL/RDF, APIs, or in-memory batches via `Bindings` and the `DataSourceRegistry`. +- **Schema as the contract** — `GraphManifest` is the single source of truth: vertex/edge definitions, + typed properties, identity fields, and DB profile are validated at `finish_init` time, not at + write time. Schema migrations are first-class (`graflo migrate_schema`). +- **Manifest as linked data** — The [GraFlo ontology](https://growgraph.github.io/graflo/model/graflo_ontology/) (`gf:` at `ontology.growgraph.dev`) lets you export manifests to RDF and round-trip them for tooling, provenance, and SPARQL-facing catalogs. ### What’s in the manifest @@ -56,7 +62,8 @@ The graph engines listed in **What you get** are the supported **output** `DBTyp ## More capabilities -- **SPARQL & RDF** — Endpoints and RDF files (`.ttl`, `.rdf`, `.n3`, …); optional OWL/RDFS schema inference (`rdflib`, `SPARQLWrapper` in the default install). +- **GraFlo ontology (manifest RDF)** — Serialize any `GraphManifest` to RDF (Turtle, JSON-LD) using the published vocabulary at [`https://ontology.growgraph.dev/graflo`](https://ontology.growgraph.dev/graflo) (`owl:versionInfo` **1.0.0**). Covers schema, ingestion (resources, transforms, pipeline actors), and bindings. Round-trip via `graflo.rdf` or the `manifest-to-rdf` / `rdf-to-manifest` CLI. This is the **meta-model** of GraFlo itself — distinct from importing a **domain** OWL ontology into an LPG schema (`RdfInferenceManager`). Details: [docs — GraFlo ontology](https://growgraph.github.io/graflo/model/graflo_ontology/). +- **SPARQL & RDF** — Endpoints and RDF files (`.ttl`, `.rdf`, `.n3`, …); optional OWL/RDFS **domain** schema inference (`rdflib`, `SPARQLWrapper` in the default install). 
- **Schema inference** — From PostgreSQL-style 3NF layouts (PK/FK heuristics) or from OWL/RDFS (`owl:Class` → vertices, `owl:ObjectProperty` → edges, `owl:DatatypeProperty` → vertex fields). - **Schema migrations** — Plan and apply guarded schema deltas (`migrate_schema` console script → `graflo.cli.migrate_schema`; library in `graflo.migrate`; see docs). - **Typed `properties`** — Optional field types (`INT`, `FLOAT`, `STRING`, `DATETIME`, `BOOL`) on vertices and edges. @@ -212,7 +219,35 @@ caster = Caster(schema=schema, ingestion_model=ingestion_model) # ... continue with ingestion ``` -### RDF / SPARQL Ingestion +### Manifest ↔ RDF (GraFlo ontology) + +```bash +# Serialize manifest YAML to Turtle (embeds gf: vocabulary when --include-ontology is default) +uv run manifest-to-rdf manifest.yaml \ + --base-uri https://growgraph.dev/manifests/mygraph/v1 \ + --format turtle \ + --output mygraph.ttl + +# Restore YAML from RDF +uv run rdf-to-manifest mygraph.ttl \ + --manifest-uri https://growgraph.dev/manifests/mygraph/v1 \ + --output manifest.restored.yaml +``` + +```python +from graflo import GraphManifest +from graflo.rdf import ManifestRdfDeserializer, ManifestRdfSerializer + +manifest = GraphManifest.from_yaml("manifest.yaml") +base = "https://growgraph.dev/manifests/mygraph/v1" + +ttl = ManifestRdfSerializer().to_turtle(manifest, base) +restored = ManifestRdfDeserializer().from_turtle(ttl, base.rstrip("/")) +``` + +Ontology source: `graflo/rdf/ontology/graflo.ttl`. See [GraFlo ontology](https://growgraph.github.io/graflo/model/graflo_ontology/). + +### RDF / SPARQL Ingestion (domain ontology → LPG) ```python from pathlib import Path diff --git a/docs/assets/graflo-ontology-viz/embed.html b/docs/assets/graflo-ontology-viz/embed.html new file mode 100644 index 00000000..da83e1b4 --- /dev/null +++ b/docs/assets/graflo-ontology-viz/embed.html @@ -0,0 +1,884 @@ + + + + + GraFlo Ontology (v1.0.0) + + + +
+ Drag · scroll zoom · click class
+ + + + diff --git a/docs/assets/graflo-ontology-viz/graph-data.json b/docs/assets/graflo-ontology-viz/graph-data.json new file mode 100644 index 00000000..e75addef --- /dev/null +++ b/docs/assets/graflo-ontology-viz/graph-data.json @@ -0,0 +1,857 @@ +{ + "edges": [ + { + "id": "sub:https://ontology.growgraph.dev/graflo/Actor->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Actor", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Bindings->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Bindings", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/BoundConnector->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/BoundConnector", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/ConnectorConnectionBinding->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/ConnectorConnectionBinding", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/CoreSchema->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/CoreSchema", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/DatabaseProfile->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": 
"subClassOf", + "source": "https://ontology.growgraph.dev/graflo/DatabaseProfile", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/DescendActor->https://ontology.growgraph.dev/graflo/Actor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/DescendActor", + "target": "https://ontology.growgraph.dev/graflo/Actor" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/DressConfig->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/DressConfig", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Edge->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Edge", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/EdgeActor->https://ontology.growgraph.dev/graflo/Actor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/EdgeActor", + "target": "https://ontology.growgraph.dev/graflo/Actor" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/EdgeConfig->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/EdgeConfig", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/EdgeInferSpec->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/EdgeInferSpec", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": 
"sub:https://ontology.growgraph.dev/graflo/EdgePhysicalSpec->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/EdgePhysicalSpec", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Field->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Field", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/FileConnector->https://ontology.growgraph.dev/graflo/BoundConnector", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/FileConnector", + "target": "https://ontology.growgraph.dev/graflo/BoundConnector" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/GraphManifest->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/GraphManifest", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/GraphManifest->http://www.w3.org/ns/prov#Entity", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/GraphManifest", + "target": "http://www.w3.org/ns/prov#Entity" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/GraphMetadata->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/GraphMetadata", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Identity->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": 
"https://ontology.growgraph.dev/graflo/Identity", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Index->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Index", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/IngestionModel->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/IngestionModel", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/KeySelectionConfig->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/KeySelectionConfig", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/ProtoTransform->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/ProtoTransform->http://www.w3.org/ns/prov#Activity", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "target": "http://www.w3.org/ns/prov#Activity" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Resource->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Resource", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": 
"sub:https://ontology.growgraph.dev/graflo/ResourceConnectorBinding->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/ResourceConnectorBinding", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Schema->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Schema", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/SparqlConnector->https://ontology.growgraph.dev/graflo/BoundConnector", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/SparqlConnector", + "target": "https://ontology.growgraph.dev/graflo/BoundConnector" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/StagingProxyBinding->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/StagingProxyBinding", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/TableConnector->https://ontology.growgraph.dev/graflo/BoundConnector", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/TableConnector", + "target": "https://ontology.growgraph.dev/graflo/BoundConnector" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Transform->https://ontology.growgraph.dev/graflo/ProtoTransform", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Transform", + "target": "https://ontology.growgraph.dev/graflo/ProtoTransform" + }, + { + "id": 
"sub:https://ontology.growgraph.dev/graflo/TransformActor->https://ontology.growgraph.dev/graflo/Actor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/TransformActor", + "target": "https://ontology.growgraph.dev/graflo/Actor" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/Vertex->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/Vertex", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/VertexActor->https://ontology.growgraph.dev/graflo/VertexProducingActor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/VertexActor", + "target": "https://ontology.growgraph.dev/graflo/VertexProducingActor" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/VertexConfig->https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/VertexConfig", + "target": "https://ontology.growgraph.dev/graflo/GrafloArtifact" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/VertexProducingActor->https://ontology.growgraph.dev/graflo/Actor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/VertexProducingActor", + "target": "https://ontology.growgraph.dev/graflo/Actor" + }, + { + "id": "sub:https://ontology.growgraph.dev/graflo/VertexRouterActor->https://ontology.growgraph.dev/graflo/VertexProducingActor", + "kind": "subClassOf", + "label": "subClassOf", + "source": "https://ontology.growgraph.dev/graflo/VertexRouterActor", + "target": "https://ontology.growgraph.dev/graflo/VertexProducingActor" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasSchema", + "kind": "objectProperty", + "label": "hasSchema", + "source": 
"https://ontology.growgraph.dev/graflo/GraphManifest", + "target": "https://ontology.growgraph.dev/graflo/Schema" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasIngestionModel", + "kind": "objectProperty", + "label": "hasIngestionModel", + "source": "https://ontology.growgraph.dev/graflo/GraphManifest", + "target": "https://ontology.growgraph.dev/graflo/IngestionModel" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasBindings", + "kind": "objectProperty", + "label": "hasBindings", + "source": "https://ontology.growgraph.dev/graflo/GraphManifest", + "target": "https://ontology.growgraph.dev/graflo/Bindings" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasCoreSchema", + "kind": "objectProperty", + "label": "hasCoreSchema", + "source": "https://ontology.growgraph.dev/graflo/Schema", + "target": "https://ontology.growgraph.dev/graflo/CoreSchema" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasVertexConfig", + "kind": "objectProperty", + "label": "hasVertexConfig", + "source": "https://ontology.growgraph.dev/graflo/CoreSchema", + "target": "https://ontology.growgraph.dev/graflo/VertexConfig" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasEdgeConfig", + "kind": "objectProperty", + "label": "hasEdgeConfig", + "source": "https://ontology.growgraph.dev/graflo/CoreSchema", + "target": "https://ontology.growgraph.dev/graflo/EdgeConfig" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasMetadata", + "kind": "objectProperty", + "label": "hasMetadata", + "source": "https://ontology.growgraph.dev/graflo/Schema", + "target": "https://ontology.growgraph.dev/graflo/GraphMetadata" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasDatabaseProfile", + "kind": "objectProperty", + "label": "hasDatabaseProfile", + "source": "https://ontology.growgraph.dev/graflo/Schema", + "target": "https://ontology.growgraph.dev/graflo/DatabaseProfile" + }, + { + "id": 
"prop:https://ontology.growgraph.dev/graflo/hasVertex", + "kind": "objectProperty", + "label": "hasVertex", + "source": "https://ontology.growgraph.dev/graflo/VertexConfig", + "target": "https://ontology.growgraph.dev/graflo/Vertex" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasEdge", + "kind": "objectProperty", + "label": "hasEdge", + "source": "https://ontology.growgraph.dev/graflo/EdgeConfig", + "target": "https://ontology.growgraph.dev/graflo/Edge" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasField", + "kind": "objectProperty", + "label": "hasField", + "source": "https://ontology.growgraph.dev/graflo/Vertex", + "target": "https://ontology.growgraph.dev/graflo/Field" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasIdentity", + "kind": "objectProperty", + "label": "hasIdentity", + "source": "https://ontology.growgraph.dev/graflo/Vertex", + "target": "https://ontology.growgraph.dev/graflo/Identity" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/edgeSource", + "kind": "objectProperty", + "label": "edgeSource", + "source": "https://ontology.growgraph.dev/graflo/Edge", + "target": "https://ontology.growgraph.dev/graflo/Vertex" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/edgeTarget", + "kind": "objectProperty", + "label": "edgeTarget", + "source": "https://ontology.growgraph.dev/graflo/Edge", + "target": "https://ontology.growgraph.dev/graflo/Vertex" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasResource", + "kind": "objectProperty", + "label": "hasResource", + "source": "https://ontology.growgraph.dev/graflo/IngestionModel", + "target": "https://ontology.growgraph.dev/graflo/Resource" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasTransform", + "kind": "objectProperty", + "label": "hasTransform", + "source": "https://ontology.growgraph.dev/graflo/IngestionModel", + "target": "https://ontology.growgraph.dev/graflo/ProtoTransform" + }, + { + "id": 
"prop:https://ontology.growgraph.dev/graflo/hasActor", + "kind": "objectProperty", + "label": "hasActor", + "source": "https://ontology.growgraph.dev/graflo/Resource", + "target": "https://ontology.growgraph.dev/graflo/Actor" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/targetsVertex", + "kind": "objectProperty", + "label": "targetsVertex", + "source": "https://ontology.growgraph.dev/graflo/VertexProducingActor", + "target": "https://ontology.growgraph.dev/graflo/Vertex" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/targetsEdge", + "kind": "objectProperty", + "label": "targetsEdge", + "source": "https://ontology.growgraph.dev/graflo/EdgeActor", + "target": "https://ontology.growgraph.dev/graflo/Edge" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/executesTransform", + "kind": "objectProperty", + "label": "executesTransform", + "source": "https://ontology.growgraph.dev/graflo/TransformActor", + "target": "https://ontology.growgraph.dev/graflo/ProtoTransform" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasDress", + "kind": "objectProperty", + "label": "hasDress", + "source": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "target": "https://ontology.growgraph.dev/graflo/DressConfig" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasKeySelection", + "kind": "objectProperty", + "label": "hasKeySelection", + "source": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "target": "https://ontology.growgraph.dev/graflo/KeySelectionConfig" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasConnector", + "kind": "objectProperty", + "label": "hasConnector", + "source": "https://ontology.growgraph.dev/graflo/Bindings", + "target": "https://ontology.growgraph.dev/graflo/BoundConnector" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/bindsResourceToConnector", + "kind": "objectProperty", + "label": "bindsResourceToConnector", + "source": 
"https://ontology.growgraph.dev/graflo/Bindings", + "target": "https://ontology.growgraph.dev/graflo/ResourceConnectorBinding" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/bindsConnectorToConnProxy", + "kind": "objectProperty", + "label": "bindsConnectorToConnProxy", + "source": "https://ontology.growgraph.dev/graflo/Bindings", + "target": "https://ontology.growgraph.dev/graflo/ConnectorConnectionBinding" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasStagingProxy", + "kind": "objectProperty", + "label": "hasStagingProxy", + "source": "https://ontology.growgraph.dev/graflo/Bindings", + "target": "https://ontology.growgraph.dev/graflo/StagingProxyBinding" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasVertexIndex", + "kind": "objectProperty", + "label": "hasVertexIndex", + "source": "https://ontology.growgraph.dev/graflo/DatabaseProfile", + "target": "https://ontology.growgraph.dev/graflo/Index" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasEdgeSpec", + "kind": "objectProperty", + "label": "hasEdgeSpec", + "source": "https://ontology.growgraph.dev/graflo/DatabaseProfile", + "target": "https://ontology.growgraph.dev/graflo/EdgePhysicalSpec" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/refinesEdge", + "kind": "objectProperty", + "label": "refinesEdge", + "source": "https://ontology.growgraph.dev/graflo/EdgePhysicalSpec", + "target": "https://ontology.growgraph.dev/graflo/Edge" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasIndex", + "kind": "objectProperty", + "label": "hasIndex", + "source": "https://ontology.growgraph.dev/graflo/EdgePhysicalSpec", + "target": "https://ontology.growgraph.dev/graflo/Index" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasEdgeInferOnly", + "kind": "objectProperty", + "label": "hasEdgeInferOnly", + "source": "https://ontology.growgraph.dev/graflo/Resource", + "target": "https://ontology.growgraph.dev/graflo/EdgeInferSpec" 
+ }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/hasEdgeInferExcept", + "kind": "objectProperty", + "label": "hasEdgeInferExcept", + "source": "https://ontology.growgraph.dev/graflo/Resource", + "target": "https://ontology.growgraph.dev/graflo/EdgeInferSpec" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/fieldType", + "kind": "objectProperty", + "label": "fieldType", + "source": "https://ontology.growgraph.dev/graflo/Field", + "target": "https://ontology.growgraph.dev/graflo/FieldType" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/dbFlavor", + "kind": "objectProperty", + "label": "dbFlavor", + "source": "https://ontology.growgraph.dev/graflo/DatabaseProfile", + "target": "https://ontology.growgraph.dev/graflo/DBType" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/edgesOnDuplicate", + "kind": "objectProperty", + "label": "edgesOnDuplicate", + "source": "https://ontology.growgraph.dev/graflo/IngestionModel", + "target": "https://ontology.growgraph.dev/graflo/EdgeDuplicatePolicy" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/transformTarget", + "kind": "objectProperty", + "label": "transformTarget", + "source": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "target": "https://ontology.growgraph.dev/graflo/TransformTarget" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/transformStrategy", + "kind": "objectProperty", + "label": "transformStrategy", + "source": "https://ontology.growgraph.dev/graflo/Transform", + "target": "https://ontology.growgraph.dev/graflo/TransformStrategy" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/keySelectionMode", + "kind": "objectProperty", + "label": "keySelectionMode", + "source": "https://ontology.growgraph.dev/graflo/KeySelectionConfig", + "target": "https://ontology.growgraph.dev/graflo/KeySelectionMode" + }, + { + "id": "prop:https://ontology.growgraph.dev/graflo/boundSourceKind", + "kind": "objectProperty", + "label": 
"boundSourceKind", + "source": "https://ontology.growgraph.dev/graflo/BoundConnector", + "target": "https://ontology.growgraph.dev/graflo/BoundSourceKind" + } + ], + "nodeHeight": 40, + "nodeWidth": 168, + "nodes": [ + { + "comment": null, + "id": "http://www.w3.org/ns/prov#Activity", + "kind": "external", + "label": "Activity", + "local": "Activity" + }, + { + "comment": null, + "id": "http://www.w3.org/ns/prov#Entity", + "kind": "external", + "label": "Entity", + "local": "Entity" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Actor", + "kind": "gf", + "label": "Actor", + "local": "Actor" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Bindings", + "kind": "gf", + "label": "Bindings", + "local": "Bindings" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/BoundConnector", + "kind": "gf", + "label": "BoundConnector", + "local": "BoundConnector" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/BoundSourceKind", + "kind": "gf", + "label": "BoundSourceKind", + "local": "BoundSourceKind" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/ConnectorConnectionBinding", + "kind": "gf", + "label": "ConnectorConnectionBinding", + "local": "ConnectorConnectionBinding" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/CoreSchema", + "kind": "gf", + "label": "CoreSchema", + "local": "CoreSchema" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/DBType", + "kind": "enum", + "label": "DBType", + "local": "DBType" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/DatabaseProfile", + "kind": "gf", + "label": "DatabaseProfile", + "local": "DatabaseProfile" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/DescendActor", + "kind": "gf", + "label": "DescendActor", + "local": "DescendActor" + }, + { + "comment": null, + "id": 
"https://ontology.growgraph.dev/graflo/DressConfig", + "kind": "gf", + "label": "DressConfig", + "local": "DressConfig" + }, + { + "comment": "Manifest-facing edge item in edge_config.edges.", + "id": "https://ontology.growgraph.dev/graflo/Edge", + "kind": "gf", + "label": "Edge", + "local": "Edge" + }, + { + "comment": "Manifest-facing actor name (edge).", + "id": "https://ontology.growgraph.dev/graflo/EdgeActor", + "kind": "gf", + "label": "EdgeActor", + "local": "EdgeActor" + }, + { + "comment": "Manifest-facing edge_config block containing edge declarations.", + "id": "https://ontology.growgraph.dev/graflo/EdgeConfig", + "kind": "gf", + "label": "EdgeConfig", + "local": "EdgeConfig" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/EdgeDuplicatePolicy", + "kind": "enum", + "label": "EdgeDuplicatePolicy", + "local": "EdgeDuplicatePolicy" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/EdgeInferSpec", + "kind": "gf", + "label": "EdgeInferSpec", + "local": "EdgeInferSpec" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/EdgePhysicalSpec", + "kind": "gf", + "label": "EdgePhysicalSpec", + "local": "EdgePhysicalSpec" + }, + { + "comment": "Manifest-facing field item used in vertex.properties and edge.properties.", + "id": "https://ontology.growgraph.dev/graflo/Field", + "kind": "gf", + "label": "Field", + "local": "Field" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/FieldType", + "kind": "enum", + "label": "FieldType", + "local": "FieldType" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/FileConnector", + "kind": "gf", + "label": "FileConnector", + "local": "FileConnector" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/GrafloArtifact", + "kind": "gf", + "label": "Artifact", + "local": "GrafloArtifact" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/GraphManifest", + "kind": "gf", 
+ "label": "GraphManifest", + "local": "GraphManifest" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/GraphMetadata", + "kind": "gf", + "label": "Metadata", + "local": "GraphMetadata" + }, + { + "comment": "Manifest-facing identity element (string) promoted to a first-class ontology node for traceability.", + "id": "https://ontology.growgraph.dev/graflo/Identity", + "kind": "gf", + "label": "Identity", + "local": "Identity" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Index", + "kind": "gf", + "label": "Index", + "local": "Index" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/IngestionModel", + "kind": "gf", + "label": "IngestionModel", + "local": "IngestionModel" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/KeySelectionConfig", + "kind": "gf", + "label": "KeySelectionConfig", + "local": "KeySelectionConfig" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/KeySelectionMode", + "kind": "enum", + "label": "KeySelectionMode", + "local": "KeySelectionMode" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/ProtoTransform", + "kind": "gf", + "label": "ProtoTransform", + "local": "ProtoTransform" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Resource", + "kind": "gf", + "label": "Resource", + "local": "Resource" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/ResourceConnectorBinding", + "kind": "gf", + "label": "ResourceConnectorBinding", + "local": "ResourceConnectorBinding" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Schema", + "kind": "gf", + "label": "Schema", + "local": "Schema" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/SparqlConnector", + "kind": "gf", + "label": "SparqlConnector", + "local": "SparqlConnector" + }, + { + "comment": null, + "id": 
"https://ontology.growgraph.dev/graflo/StagingProxyBinding", + "kind": "gf", + "label": "StagingProxyBinding", + "local": "StagingProxyBinding" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/TableConnector", + "kind": "gf", + "label": "TableConnector", + "local": "TableConnector" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/Transform", + "kind": "gf", + "label": "Transform", + "local": "Transform" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/TransformActor", + "kind": "gf", + "label": "TransformActor", + "local": "TransformActor" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/TransformStrategy", + "kind": "gf", + "label": "TransformStrategy", + "local": "TransformStrategy" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/TransformTarget", + "kind": "gf", + "label": "TransformTarget", + "local": "TransformTarget" + }, + { + "comment": "Manifest-facing vertex item in vertex_config.vertices.", + "id": "https://ontology.growgraph.dev/graflo/Vertex", + "kind": "gf", + "label": "Vertex", + "local": "Vertex" + }, + { + "comment": "Manifest-facing actor name (vertex).", + "id": "https://ontology.growgraph.dev/graflo/VertexActor", + "kind": "gf", + "label": "VertexActor", + "local": "VertexActor" + }, + { + "comment": "Manifest-facing vertex_config block containing vertex declarations and identity policy.", + "id": "https://ontology.growgraph.dev/graflo/VertexConfig", + "kind": "gf", + "label": "VertexConfig", + "local": "VertexConfig" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/VertexProducingActor", + "kind": "gf", + "label": "VertexProducingActor", + "local": "VertexProducingActor" + }, + { + "comment": null, + "id": "https://ontology.growgraph.dev/graflo/VertexRouterActor", + "kind": "gf", + "label": "VertexRouterActor", + "local": "VertexRouterActor" + } + ], + "ontology": 
"https://ontology.growgraph.dev/graflo", + "version": "1.0.0" +} diff --git a/docs/assets/graflo-ontology-viz/graph-view.css b/docs/assets/graflo-ontology-viz/graph-view.css new file mode 100644 index 00000000..95a32feb --- /dev/null +++ b/docs/assets/graflo-ontology-viz/graph-view.css @@ -0,0 +1,322 @@ +:root { + --gf-node: #2e7d32; + --gf-node-border: #1b5e20; + --enum-node: #ef6c00; + --enum-node-border: #e65100; + --external-node: #5e35b1; + --external-node-border: #4527a0; + --edge-subclass: #546e7a; + --edge-object: #1565c0; + --edge-datatype: #6d4c41; + --panel-bg: #fafafa; + --panel-border: #ddd; + --text: #212121; + --muted: #616161; +} + +* { + box-sizing: border-box; +} + +html, +body { + margin: 0; + height: 100%; + font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; + color: var(--text); + background: #fff; +} + +body.embed { + overflow: hidden; +} + +.layout { + display: grid; + grid-template-columns: 280px 1fr; + height: 100vh; +} + +body.embed .layout { + grid-template-columns: 1fr; +} + +.sidebar { + border-right: 1px solid var(--panel-border); + background: var(--panel-bg); + padding: 12px 14px; + overflow: auto; +} + +body.embed .sidebar { + display: none; +} + +.sidebar h1 { + font-size: 1rem; + margin: 0 0 8px; +} + +.sidebar p { + margin: 0 0 12px; + color: var(--muted); + font-size: 0.85rem; + line-height: 1.4; +} + +.control { + margin-bottom: 12px; +} + +.control label { + display: block; + font-size: 0.78rem; + font-weight: 600; + margin-bottom: 4px; + color: var(--muted); + text-transform: uppercase; + letter-spacing: 0.03em; +} + +.control input[type="search"], +.control select { + width: 100%; + padding: 8px 10px; + border: 1px solid var(--panel-border); + border-radius: 6px; + font-size: 0.9rem; +} + +.control-row { + display: flex; + gap: 8px; + flex-wrap: wrap; +} + +button { + border: 1px solid var(--panel-border); + background: #fff; + border-radius: 6px; + padding: 7px 10px; + font-size: 0.85rem; + cursor: pointer; +} + 
+button:hover { + background: #f0f0f0; +} + +.toggle { + display: flex; + align-items: center; + gap: 8px; + font-size: 0.88rem; +} + +.legend { + margin-top: 16px; + font-size: 0.82rem; +} + +.legend-item { + display: flex; + align-items: center; + gap: 8px; + margin-bottom: 6px; +} + +.swatch { + width: 14px; + height: 14px; + border-radius: 4px; + border: 1px solid rgba(0, 0, 0, 0.2); +} + +.details { + margin-top: 16px; + padding-top: 12px; + border-top: 1px solid var(--panel-border); + font-size: 0.85rem; +} + +.details h2 { + font-size: 0.95rem; + margin: 0 0 6px; +} + +.details .uri { + word-break: break-all; + color: var(--muted); + font-size: 0.78rem; +} + +.details .comment { + margin-top: 8px; + line-height: 1.4; +} + +.graph-shell { + position: relative; + min-height: 0; + height: 100%; + overflow: hidden; + background: + linear-gradient(#eceff1 1px, transparent 1px) 0 0 / 24px 24px, + linear-gradient(90deg, #eceff1 1px, transparent 1px) 0 0 / 24px 24px, + #fafafa; +} + +body.embed .graph-shell { + height: 100vh; +} + +.toolbar { + position: absolute; + top: 10px; + right: 10px; + left: 10px; + z-index: 2; + display: flex; + gap: 8px; + align-items: center; + flex-wrap: wrap; +} + +.embed-toolbar input[type="search"] { + min-width: 180px; + flex: 1; + max-width: 280px; + padding: 7px 10px; + border: 1px solid var(--panel-border); + border-radius: 6px; + font-size: 0.85rem; +} + +.embed-toolbar select { + min-width: 140px; + padding: 7px 10px; + border: 1px solid var(--panel-border); + border-radius: 6px; + font-size: 0.85rem; +} + +.embed-toolbar .toggle { + font-size: 0.82rem; + white-space: nowrap; +} + +#graph { + display: block; + width: 100%; + height: 100%; + cursor: grab; +} + +#graph.dragging { + cursor: grabbing; +} + +#graph.dragging-node { + cursor: move; +} + +.node rect { + stroke-width: 1.5px; +} + +.node text { + font-size: 12px; + pointer-events: none; + fill: #fff; + font-weight: 600; +} + +.node.kind-gf rect { + fill: var(--gf-node); + 
stroke: var(--gf-node-border); +} + +.node.kind-enum rect { + fill: var(--enum-node); + stroke: var(--enum-node-border); +} + +.node.kind-external rect { + fill: var(--external-node); + stroke: var(--external-node-border); +} + +.node.dimmed rect { + opacity: 0.1; +} + +.node.dimmed text { + opacity: 0.2; +} + +.node.selected rect { + stroke-width: 3px; + filter: drop-shadow(0 0 4px rgba(0, 0, 0, 0.25)); +} + +.edge { + fill: none; +} + +.edge.kind-subClassOf { + stroke: var(--edge-subclass); + stroke-width: 3.4px; +} + +.edge.kind-subClassOfReverse { + stroke: #7b5ea7; + stroke-width: 1.6px; + stroke-dasharray: 5 4; + opacity: 0.82; +} + +.edge.kind-equivalentClass { + stroke: #009688; + stroke-width: 2.1px; + stroke-dasharray: 7 3; +} + +.edge.kind-objectProperty { + stroke: var(--edge-object); + stroke-width: 1.25px; + stroke-dasharray: 6 4; + opacity: 0.9; +} + +.edge.kind-datatypeProperty { + stroke: var(--edge-datatype); + stroke-width: 1.25px; + stroke-dasharray: 2 4; + opacity: 0.9; +} + +.edge.dimmed { + opacity: 0.08; +} + +.edge-label.dimmed { + opacity: 0.1; +} + +.edge-label { + font-size: 10px; + fill: var(--muted); + pointer-events: none; +} + +.hint { + position: absolute; + left: 12px; + bottom: 10px; + z-index: 2; + font-size: 0.78rem; + color: var(--muted); + background: rgba(255, 255, 255, 0.85); + padding: 4px 8px; + border-radius: 4px; +} diff --git a/docs/assets/graflo-ontology-viz/graph-view.js b/docs/assets/graflo-ontology-viz/graph-view.js new file mode 100644 index 00000000..245f025b --- /dev/null +++ b/docs/assets/graflo-ontology-viz/graph-view.js @@ -0,0 +1,764 @@ +(function () { + "use strict"; + + const data = window.GRAFLO_ONTOLOGY_GRAPH; + if (!data) { + return; + } + + const LAYOUT = { + isoPad: 20, + isoGap: 14, + hierarchyGapX: 100, + vGap: 118, + linkDistance: 100, + maxTicks: 420, + alphaMin: 0.001, + minGapX: 24, + minGapY: 20, + resolvePasses: 70, + }; + + // Ignore sub-pixel jitter so tap/click can select nodes instead of 
being treated as drag. + const DRAG_THRESHOLD_PX = 5; + + const svg = document.getElementById("graph"); + const viewport = document.createElementNS("http://www.w3.org/2000/svg", "g"); + svg.appendChild(viewport); + + const defs = document.createElementNS("http://www.w3.org/2000/svg", "defs"); + [ + ["subClassOf", "#546e7a", 8], + ["subClassOfReverse", "#7b5ea7", 7], + ["equivalentClass", "#009688", 7], + ["objectProperty", "#1565c0", 6], + ["datatypeProperty", "#6d4c41", 6], + ].forEach(function (entry) { + const kind = entry[0]; + const color = entry[1]; + const size = entry[2]; + const marker = document.createElementNS("http://www.w3.org/2000/svg", "marker"); + marker.setAttribute("id", "arrow-" + kind); + marker.setAttribute("viewBox", "0 -4 8 8"); + marker.setAttribute("refX", 7); + marker.setAttribute("refY", 0); + marker.setAttribute("markerWidth", size); + marker.setAttribute("markerHeight", size); + marker.setAttribute("orient", "auto"); + const path = document.createElementNS("http://www.w3.org/2000/svg", "path"); + path.setAttribute("d", "M0,-4L8,0L0,4"); + path.setAttribute("fill", color); + marker.appendChild(path); + defs.appendChild(marker); + }); + svg.insertBefore(defs, viewport); + + const nodeById = new Map( + data.nodes.map(function (node) { + return [node.id, node]; + }), + ); + const subclassEdges = data.edges.filter(function (edge) { + return edge.kind === "subClassOf"; + }); + + const state = { + scale: 1, + tx: 40, + ty: 40, + selectedId: null, + search: "", + relationMode: "all", + draggingViewport: false, + draggingNodeId: null, + dragMoved: false, + pointerDownX: 0, + pointerDownY: 0, + lastX: 0, + lastY: 0, + }; + + function pointerTravelPx(event) { + const dx = event.clientX - state.pointerDownX; + const dy = event.clientY - state.pointerDownY; + return Math.hypot(dx, dy); + } + + function markDragIfNeeded(event) { + if (!state.dragMoved && pointerTravelPx(event) >= DRAG_THRESHOLD_PX) { + state.dragMoved = true; + } + } + + function 
truncate(text, max) { + if (text.length <= max) { + return text; + } + return text.slice(0, max - 1) + "…"; + } + + function isOverlapping(a, b) { + const minDx = data.nodeWidth + LAYOUT.minGapX; + const minDy = data.nodeHeight + LAYOUT.minGapY; + return Math.abs(a.x - b.x) < minDx && Math.abs(a.y - b.y) < minDy; + } + + function resolveOverlaps(nodes) { + for (let pass = 0; pass < LAYOUT.resolvePasses; pass += 1) { + let moved = false; + for (let i = 0; i < nodes.length; i += 1) { + for (let j = i + 1; j < nodes.length; j += 1) { + const a = nodes[i]; + const b = nodes[j]; + if (!isOverlapping(a, b)) { + continue; + } + + const minDx = data.nodeWidth + LAYOUT.minGapX; + const minDy = data.nodeHeight + LAYOUT.minGapY; + const dx = (b.x - a.x) || 0.1; + const dy = (b.y - a.y) || 0.1; + const overlapX = minDx - Math.abs(dx); + const overlapY = minDy - Math.abs(dy); + if (overlapX <= 0 || overlapY <= 0) { + continue; + } + + if (overlapX < overlapY) { + const push = overlapX / 2; + const dir = dx >= 0 ? 1 : -1; + a.x -= push * dir; + b.x += push * dir; + } else { + const push = overlapY / 2; + const dir = dy >= 0 ? 
1 : -1; + a.y -= push * dir; + b.y += push * dir; + } + moved = true; + } + } + if (!moved) { + break; + } + } + } + + function computeLevels(nodeIds, edges) { + const children = new Map(); + nodeIds.forEach(function (id) { + children.set(id, []); + }); + edges.forEach(function (edge) { + if (!children.has(edge.target)) { + children.set(edge.target, []); + } + children.get(edge.target).push(edge.source); + }); + + const isChild = new Set(edges.map(function (edge) { + return edge.source; + })); + const roots = nodeIds.filter(function (id) { + return !isChild.has(id); + }); + + const level = new Map(); + const queue = roots.map(function (id) { + return [id, 0]; + }); + while (queue.length) { + const item = queue.shift(); + const id = item[0]; + const lv = item[1]; + if (level.has(id)) { + continue; + } + level.set(id, lv); + (children.get(id) || []).forEach(function (child) { + queue.push([child, lv + 1]); + }); + } + nodeIds.forEach(function (id) { + if (!level.has(id)) { + level.set(id, 0); + } + }); + return level; + } + + function placeIsolatedNodes(isolated, offsetX, offsetY) { + isolated.forEach(function (node, index) { + node.layoutGroup = "isolated"; + node.x = offsetX; + node.y = offsetY + index * (data.nodeHeight + LAYOUT.isoGap); + }); + if (!isolated.length) { + return { width: 0, height: 0 }; + } + return { + width: data.nodeWidth, + height: isolated.length * (data.nodeHeight + LAYOUT.isoGap) - LAYOUT.isoGap, + }; + } + + function runSubclassForceLayout(hierarchyNodes, edges, originX, originY, levels) { + const links = edges + .map(function (edge) { + return { + source: nodeById.get(edge.source), + target: nodeById.get(edge.target), + }; + }) + .filter(function (link) { + return link.source && link.target; + }); + + hierarchyNodes.forEach(function (node, index) { + node.layoutGroup = "hierarchy"; + node.level = levels.get(node.id) || 0; + node.x = originX + (index % 2) * 36 + (Math.random() - 0.5) * 4; + node.y = originY + node.level * LAYOUT.vGap; + 
node.vx = 0; + node.vy = 0; + }); + + let alpha = 1; + const centerX = originX + data.nodeWidth * 0.4; + for (let tick = 0; tick < LAYOUT.maxTicks && alpha > LAYOUT.alphaMin; tick += 1) { + hierarchyNodes.forEach(function (node) { + node.vx = 0; + node.vy = 0; + }); + + for (let i = 0; i < hierarchyNodes.length; i += 1) { + for (let j = i + 1; j < hierarchyNodes.length; j += 1) { + const a = hierarchyNodes[i]; + const b = hierarchyNodes[j]; + let dx = b.x - a.x; + let dy = b.y - a.y; + const dist = Math.sqrt(dx * dx + dy * dy) || 1; + const repulse = (620 * alpha) / dist; + dx = (dx / dist) * repulse; + dy = (dy / dist) * repulse; + a.vx -= dx; + a.vy -= dy; + b.vx += dx; + b.vy += dy; + } + } + + links.forEach(function (link) { + let dx = link.target.x - link.source.x; + let dy = link.target.y - link.source.y; + const dist = Math.sqrt(dx * dx + dy * dy) || 1; + const strength = 0.55 * alpha; + const delta = ((dist - LAYOUT.linkDistance) / dist) * strength; + dx *= delta; + dy *= delta; + link.source.vx += dx; + link.source.vy += dy; + link.target.vx -= dx; + link.target.vy -= dy; + }); + + hierarchyNodes.forEach(function (node) { + const targetY = originY + node.level * LAYOUT.vGap; + node.vy += (targetY - node.y) * 0.42 * alpha; + node.vx += (centerX - node.x) * 0.08 * alpha; + }); + + hierarchyNodes.forEach(function (node) { + node.x += node.vx * 0.18; + node.y += node.vy * 0.18; + }); + + alpha *= 0.965; + } + } + + function computeLayout() { + const hierarchyIds = new Set(); + subclassEdges.forEach(function (edge) { + hierarchyIds.add(edge.source); + hierarchyIds.add(edge.target); + }); + + const isolated = data.nodes.filter(function (node) { + return !hierarchyIds.has(node.id); + }); + const hierarchy = data.nodes.filter(function (node) { + return hierarchyIds.has(node.id); + }); + + const isoBox = placeIsolatedNodes(isolated, LAYOUT.isoPad, LAYOUT.isoPad); + const hierarchyOriginX = LAYOUT.isoPad + isoBox.width + LAYOUT.hierarchyGapX; + const hierarchyOriginY
= LAYOUT.isoPad; + const levels = computeLevels( + hierarchy.map(function (node) { + return node.id; + }), + subclassEdges, + ); + runSubclassForceLayout(hierarchy, subclassEdges, hierarchyOriginX, hierarchyOriginY, levels); + resolveOverlaps(hierarchy); + resolveOverlaps(isolated); + resolveOverlaps(data.nodes); + data.bounds = computeBounds(data.nodes); + } + + function computeBounds(nodes) { + const xs = nodes.map(function (node) { + return node.x; + }); + const ys = nodes.map(function (node) { + return node.y; + }); + if (!xs.length) { + return { minX: 0, minY: 0, maxX: data.nodeWidth, maxY: data.nodeHeight }; + } + return { + minX: Math.min.apply(null, xs), + minY: Math.min.apply(null, ys), + maxX: Math.max.apply(null, xs) + data.nodeWidth, + maxY: Math.max.apply(null, ys) + data.nodeHeight, + }; + } + + function nodeAnchor(node, toward) { + const cx = node.x + data.nodeWidth / 2; + const cy = node.y + data.nodeHeight / 2; + const tx = toward.x + data.nodeWidth / 2; + const ty = toward.y + data.nodeHeight / 2; + const dx = tx - cx; + const dy = ty - cy; + if (!dx && !dy) { + return { x: cx, y: cy }; + } + const hw = data.nodeWidth / 2; + const hh = data.nodeHeight / 2; + const scale = Math.min(hw / (Math.abs(dx) || 1e-6), hh / (Math.abs(dy) || 1e-6)); + return { x: cx + dx * scale, y: cy + dy * scale }; + } + + function edgePath(source, target) { + const start = nodeAnchor(source, target); + const end = nodeAnchor(target, source); + const mx = (start.x + end.x) / 2; + const my = (start.y + end.y) / 2; + return "M" + start.x + "," + start.y + " Q" + mx + "," + my + " " + end.x + "," + end.y; + } + + function edgeVisibleByMode(edge) { + if (state.relationMode === "all") { + return true; + } + if (state.relationMode === "taxonomy") { + return edge.kind === "subClassOf" || edge.kind === "equivalentClass"; + } + if (state.relationMode === "has") { + return edge.label.toLowerCase().startsWith("has"); + } + return true; + } + + function getVisibleEdges() { + return 
data.edges.filter(edgeVisibleByMode); + } + + function renderEdges() { + viewport.querySelectorAll(".edge-layer").forEach(function (el) { + el.remove(); + }); + const layer = document.createElementNS("http://www.w3.org/2000/svg", "g"); + layer.setAttribute("class", "edge-layer"); + + getVisibleEdges().forEach(function (edge) { + const source = nodeById.get(edge.source); + const target = nodeById.get(edge.target); + if (!source || !target) { + return; + } + + const path = document.createElementNS("http://www.w3.org/2000/svg", "path"); + path.setAttribute("d", edgePath(source, target)); + path.setAttribute("class", "edge kind-" + edge.kind); + path.setAttribute("marker-end", "url(#arrow-" + edge.kind + ")"); + path.dataset.edgeId = edge.id; + layer.appendChild(path); + + if (edge.kind === "subClassOf") { + const reverse = document.createElementNS("http://www.w3.org/2000/svg", "path"); + reverse.setAttribute("d", edgePath(target, source)); + reverse.setAttribute("class", "edge edge-reverse kind-subClassOfReverse"); + reverse.setAttribute("marker-end", "url(#arrow-subClassOfReverse)"); + reverse.dataset.edgeId = edge.id + ":reverse"; + layer.appendChild(reverse); + } else if (edge.kind === "equivalentClass") { + const reverseEq = document.createElementNS("http://www.w3.org/2000/svg", "path"); + reverseEq.setAttribute("d", edgePath(target, source)); + reverseEq.setAttribute("class", "edge kind-equivalentClass"); + reverseEq.setAttribute("marker-end", "url(#arrow-equivalentClass)"); + reverseEq.dataset.edgeId = edge.id + ":reverse"; + layer.appendChild(reverseEq); + } else { + const label = document.createElementNS("http://www.w3.org/2000/svg", "text"); + label.setAttribute("class", "edge-label"); + label.setAttribute("x", (source.x + target.x + data.nodeWidth) / 2); + label.setAttribute("y", (source.y + target.y + data.nodeHeight) / 2); + label.setAttribute("text-anchor", "middle"); + label.dataset.edgeId = edge.id; + label.textContent = edge.label; + 
layer.appendChild(label); + } + }); + viewport.insertBefore(layer, viewport.firstChild); + } + + function renderNodes() { + viewport.querySelectorAll(".node").forEach(function (el) { + el.remove(); + }); + data.nodes.forEach(function (node) { + const group = document.createElementNS("http://www.w3.org/2000/svg", "g"); + group.setAttribute("class", "node kind-" + node.kind + " group-" + node.layoutGroup); + group.setAttribute("transform", "translate(" + node.x + "," + node.y + ")"); + group.dataset.nodeId = node.id; + + const rect = document.createElementNS("http://www.w3.org/2000/svg", "rect"); + rect.setAttribute("width", data.nodeWidth); + rect.setAttribute("height", data.nodeHeight); + rect.setAttribute("rx", 8); + rect.setAttribute("ry", 8); + group.appendChild(rect); + + const text = document.createElementNS("http://www.w3.org/2000/svg", "text"); + text.setAttribute("x", data.nodeWidth / 2); + text.setAttribute("y", data.nodeHeight / 2 + 4); + text.setAttribute("text-anchor", "middle"); + text.textContent = truncate(node.label, 18); + group.appendChild(text); + + const title = document.createElementNS("http://www.w3.org/2000/svg", "title"); + title.textContent = node.label; + group.appendChild(title); + + viewport.appendChild(group); + }); + } + + function reRenderGraph() { + data.bounds = computeBounds(data.nodes); + renderNodes(); + renderEdges(); + applyHighlight(); + } + + function applyTransform() { + viewport.setAttribute( + "transform", + "translate(" + state.tx + "," + state.ty + ") scale(" + state.scale + ")", + ); + } + + function fitToScreen() { + const shell = document.querySelector(".graph-shell"); + const pad = 48; + const bounds = data.bounds; + const graphW = bounds.maxX - bounds.minX; + const graphH = bounds.maxY - bounds.minY; + const viewW = shell.clientWidth - pad * 2; + const viewH = shell.clientHeight - pad * 2; + if (viewW <= 0 || viewH <= 0 || graphW <= 0 || graphH <= 0) { + state.scale = 1; + state.tx = pad - bounds.minX; + state.ty = 
pad - bounds.minY; + applyTransform(); + return; + } + state.scale = Math.min(viewW / graphW, viewH / graphH, 1.3); + state.tx = pad - bounds.minX * state.scale + (viewW - graphW * state.scale) / 2; + state.ty = pad - bounds.minY * state.scale + (viewH - graphH * state.scale) / 2; + applyTransform(); + } + + function matchesSearch(node) { + if (!state.search) { + return true; + } + const q = state.search.toLowerCase(); + return node.local.toLowerCase().includes(q) || node.label.toLowerCase().includes(q); + } + + function neighborhood(nodeId) { + const related = new Set([nodeId]); + getVisibleEdges().forEach(function (edge) { + if (edge.source === nodeId) { + related.add(edge.target); + } + if (edge.target === nodeId) { + related.add(edge.source); + } + }); + return related; + } + + function edgeIsIncidentToSelected(edge, selectedId) { + return edge.source === selectedId || edge.target === selectedId; + } + + function findVisibleEdge(edgeId) { + return getVisibleEdges().find(function (item) { + return item.id === edgeId; + }); + } + + function applyHighlight() { + const hasSelection = Boolean(state.selectedId); + const hasSearch = Boolean(state.search); + const focus = hasSelection ? 
neighborhood(state.selectedId) : null; + + viewport.querySelectorAll(".node").forEach(function (group) { + const nodeId = group.dataset.nodeId; + const node = nodeById.get(nodeId); + let dim = false; + if (hasSearch && node && !matchesSearch(node)) { + dim = true; + } + if (hasSelection && !focus.has(nodeId)) { + dim = true; + } + group.classList.toggle("dimmed", dim); + group.classList.toggle("selected", nodeId === state.selectedId); + }); + + viewport.querySelectorAll(".edge").forEach(function (edgeEl) { + const edgeId = edgeEl.dataset.edgeId.replace(/:reverse$/, ""); + const edge = findVisibleEdge(edgeId); + let dim = false; + if (hasSelection && edge && !edgeIsIncidentToSelected(edge, state.selectedId)) { + dim = true; + } + edgeEl.classList.toggle("dimmed", dim); + }); + + viewport.querySelectorAll(".edge-label").forEach(function (labelEl) { + const edge = findVisibleEdge(labelEl.dataset.edgeId); + let dim = false; + if (hasSelection && edge && !edgeIsIncidentToSelected(edge, state.selectedId)) { + dim = true; + } + labelEl.classList.toggle("dimmed", dim); + }); + } + + function selectNode(nodeId) { + state.selectedId = state.selectedId === nodeId ? null : nodeId; + updateDetails(); + applyHighlight(); + } + + function updateDetails() { + const panel = document.getElementById("details"); + if (!panel) { + return; + } + if (!state.selectedId) { + panel.innerHTML = "

Select a class to inspect its IRI and description.

"; + return; + } + const node = nodeById.get(state.selectedId); + if (!node) { + return; + } + const props = data.edges.filter(function (edge) { + return edge.kind !== "subClassOf" && (edge.source === node.id || edge.target === node.id); + }); + const subclasses = data.edges + .filter(function (edge) { + return edge.kind === "subClassOf" && edge.target === node.id; + }) + .map(function (edge) { + return nodeById.get(edge.source); + }) + .filter(Boolean); + const parents = data.edges + .filter(function (edge) { + return edge.kind === "subClassOf" && edge.source === node.id; + }) + .map(function (edge) { + return nodeById.get(edge.target); + }) + .filter(Boolean); + + let html = "

" + node.label + "

"; + html += "
" + node.id + "
"; + if (node.comment) { + html += "
" + node.comment + "
"; + } + if (node.layoutGroup === "isolated") { + html += "

Layout: standalone class (no subClassOf links)

"; + } + if (parents.length) { + html += "

Parents: " + parents.map(function (p) { + return p.local; + }).join(", ") + "

"; + } + if (subclasses.length) { + html += "

Subclasses: " + subclasses.map(function (p) { + return p.local; + }).join(", ") + "

"; + } + if (props.length) { + html += "

Properties:

"; + } + panel.innerHTML = html; + } + + function nodeFromEventTarget(target) { + const group = target.closest ? target.closest(".node") : null; + if (!group) { + return null; + } + const nodeId = group.dataset.nodeId; + return nodeById.get(nodeId) || null; + } + + function bindControls() { + const search = document.getElementById("search"); + const relationFilter = document.getElementById("relation-filter"); + if (search) { + search.addEventListener("input", function (event) { + state.search = event.target.value.trim(); + applyHighlight(); + }); + } + if (relationFilter) { + relationFilter.addEventListener("change", function (event) { + state.relationMode = event.target.value; + renderEdges(); + applyHighlight(); + }); + } + + document.getElementById("fit-button").addEventListener("click", fitToScreen); + document.getElementById("reset-button").addEventListener("click", function () { + state.selectedId = null; + state.search = ""; + if (search) { + search.value = ""; + } + if (relationFilter) { + relationFilter.value = "all"; + } + state.relationMode = "all"; + updateDetails(); + renderEdges(); + applyHighlight(); + fitToScreen(); + }); + + svg.addEventListener("wheel", function (event) { + event.preventDefault(); + const delta = event.deltaY > 0 ? 
0.92 : 1.08; + const rect = svg.getBoundingClientRect(); + const px = event.clientX - rect.left; + const py = event.clientY - rect.top; + state.tx = px - (px - state.tx) * delta; + state.ty = py - (py - state.ty) * delta; + state.scale *= delta; + applyTransform(); + }, { passive: false }); + + svg.addEventListener("mousedown", function (event) { + state.dragMoved = false; + state.pointerDownX = event.clientX; + state.pointerDownY = event.clientY; + const node = nodeFromEventTarget(event.target); + state.lastX = event.clientX; + state.lastY = event.clientY; + if (node) { + state.draggingNodeId = node.id; + svg.classList.add("dragging-node"); + return; + } + state.draggingViewport = true; + svg.classList.add("dragging"); + }); + + window.addEventListener("mouseup", function (event) { + const pendingNodeId = state.draggingNodeId; + const draggedNode = pendingNodeId ? nodeById.get(pendingNodeId) : null; + if (draggedNode && state.dragMoved) { + resolveOverlaps(data.nodes); + reRenderGraph(); + } else if (pendingNodeId && !state.dragMoved) { + selectNode(pendingNodeId); + } + state.draggingNodeId = null; + state.draggingViewport = false; + svg.classList.remove("dragging"); + svg.classList.remove("dragging-node"); + }); + + window.addEventListener("mousemove", function (event) { + markDragIfNeeded(event); + const dx = event.clientX - state.lastX; + const dy = event.clientY - state.lastY; + state.lastX = event.clientX; + state.lastY = event.clientY; + + if (state.draggingNodeId) { + if (!state.dragMoved) { + return; + } + const node = nodeById.get(state.draggingNodeId); + if (!node) { + return; + } + node.x += dx / state.scale; + node.y += dy / state.scale; + reRenderGraph(); + return; + } + + if (!state.draggingViewport) { + return; + } + markDragIfNeeded(event); + state.tx += dx; + state.ty += dy; + applyTransform(); + }); + + svg.addEventListener("click", function (event) { + const wasDrag = state.dragMoved; + state.dragMoved = false; + if (wasDrag) { + return; + } + 
if (nodeFromEventTarget(event.target)) { + return; + } + if (state.selectedId) { + state.selectedId = null; + updateDetails(); + applyHighlight(); + } + }); + } + + computeLayout(); + reRenderGraph(); + bindControls(); + updateDetails(); + fitToScreen(); + window.addEventListener("resize", fitToScreen); +})(); diff --git a/docs/assets/graflo-ontology-viz/index.html b/docs/assets/graflo-ontology-viz/index.html new file mode 100644 index 00000000..3b1cc55d --- /dev/null +++ b/docs/assets/graflo-ontology-viz/index.html @@ -0,0 +1,907 @@ + + + + + GraFlo Ontology (v1.0.0) + + + +
+ +
+
Drag · scroll zoom · click class
+ +
+
+ + + + diff --git a/docs/concepts/index.md b/docs/concepts/index.md index b478068a..d8ab476d 100644 --- a/docs/concepts/index.md +++ b/docs/concepts/index.md @@ -121,4 +121,4 @@ The overview above is continued in dedicated pages (formerly a single long docum - [Core components](core_components.md) — schema, ingestion, edges, DataSources, resources, actors, location scoping, transforms - [Features, migration, and practices](features_and_practices.md) — product features, `migrate_schema` CLI, performance notes, best practices -Focused topics: [Transforms](transforms.md), [Table connector views](table_connector_views.md) (SQL **`filters`** / **`view.where`** with logical operators `AND` / `OR` / `NOT` / `IF_THEN`), [Runtime connector updates](runtime_connector_updates.md) (patches, **`time_filter`** / **`ColumnTimeFilter`**, pushdown **`filters`**), [Backend indexes](backend_indexes.md), [Ingestion doc errors](ingestion_doc_errors.md), [Object storage (S3 staging)](object_storage.md), [Manifest evolution](manifest_evolution.md). +Focused topics: [Transforms](transforms.md), [Table connector views](table_connector_views.md) (SQL **`filters`** / **`view.where`** with logical operators `AND` / `OR` / `NOT` / `IF_THEN`), [Runtime connector updates](runtime_connector_updates.md) (patches, **`time_filter`** / **`ColumnTimeFilter`**, pushdown **`filters`**), [Backend indexes](backend_indexes.md), [Ingestion doc errors](ingestion_doc_errors.md), [Object storage (S3 staging)](object_storage.md), [Manifest evolution](manifest_evolution.md), [GraFlo ontology (manifest ↔ RDF)](../model/graflo_ontology.md). diff --git a/docs/contributing.md b/docs/contributing.md index 4a86e86b..17bc613d 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -55,6 +55,23 @@ We welcome contributions to GraFlo! 
This document provides guidelines and instru - Include examples in docstrings where appropriate - Update the changelog for significant changes +To build and preview the docs site locally: + +```bash +uv sync --extra docs +uv run mkdocs serve +``` + +If you edit the GraFlo meta-ontology (`graflo/rdf/ontology/graflo.ttl`), regenerate the interactive visualization and commit the updated assets: + +```bash +uv run python docs/scripts/build_ontology_viz.py +``` + +Visual tweaks and the graph viewer live in repo-owned files under `docs/scripts/ontology_viz/` and are copied into `docs/assets/graflo-ontology-viz/` at build time. **Do not edit packages inside `.venv`.** + +CI runs the same script and fails if `docs/assets/graflo-ontology-viz/` is out of date with the committed ontology. + ## Testing - Write tests for all new features diff --git a/docs/index.md b/docs/index.md index 860c7639..3645f7fd 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,15 +7,21 @@ [![pre-commit](https://github.com/growgraph/graflo/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/growgraph/graflo/actions/workflows/pre-commit.yml) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15446131.svg)]( https://doi.org/10.5281/zenodo.15446131) -**GraFlo** is a manifest-driven toolkit for **labeled property graphs (LPGs)**: describe vertices, edges, and ingestion (`GraphManifest` — YAML or Python), then project and load into a target graph database. +**GraFlo** is a manifest-driven schema and ingestion layer for **labeled property graphs (LPGs)**. +Write a `GraphManifest` (YAML or Python) once — it defines vertices, edges, typed properties, +identities, and DB profile — then infer, validate, migrate, and load into any supported graph engine. -It is a **Python package** and **Graph Schema & Transformation Language (GSTL)**. **`GraphEngine`** covers inference, DDL, and ingest; **`Caster`** focuses on batching records into a **`GraphContainer`** and **`DBWriter`**. 
+It is a **Python package** and **Graph Schema & Transformation Language (GSTL)**. **`GraphEngine`** covers schema inference, migrations, DDL, and ingest; **`Caster`** focuses on batching records into a **`GraphContainer`** and **`DBWriter`**. ### What you get - **One pipeline, several graph databases** — The same manifest targets ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, or NebulaGraph; `DatabaseProfile` and DB-aware types absorb naming, defaults, and indexing differences. - **Explicit identities** — Vertex identity fields and indexes back upserts so reloads merge on keys instead of blindly duplicating nodes. - **Reusable ingestion** — `ResourceConfig` actor pipelines (including **vertex** / **vertex_router** / **edge** steps) bind to files, SQL, SPARQL/RDF, APIs, or in-memory batches via `Bindings` and the `DataSourceRegistry`. A single flat row can populate multiple same-type vertices in distinct named slots (`role`) and emit multiple edges in one `edge: links` step. Per-resource **`tolerate_transform_errors`** (default on) keeps ingestion moving when an individual transform step fails. +- **Schema as the contract** — `GraphManifest` is the single source of truth: vertex/edge definitions, + typed properties, identity fields, and DB profile are validated at `finish_init` time, not at + write time. Schema migrations are first-class (`graflo migrate_schema`). +- **Manifest as linked data** — Export and restore manifests as RDF via the [GraFlo ontology](model/graflo_ontology.md) (`manifest-to-rdf` / `rdf-to-manifest` CLI, `graflo.rdf` API). - **Manifest-first sanitization** — `Sanitizer` (backed by `graflo.architecture.evolution` **`SanitizeOp`**) normalizes schema identifiers (reserved words, TigerGraph relation/index constraints) and synchronizes related ingestion mappings via `sanitize_manifest(GraphManifest)`. 
`GraphEngine.infer_manifest(...)` applies it automatically; lower-level `SQLInferenceManager` does not—sanitize the manifest yourself when assembling contracts outside the engine. ### What’s in the manifest @@ -119,7 +125,8 @@ For PostgreSQL workflows, `infer_manifest(...)` returns a full manifest contract ## More capabilities -- **SPARQL & RDF** — Endpoints and RDF files; optional OWL/RDFS schema inference (`rdflib`, `SPARQLWrapper` in the default install). +- **GraFlo ontology (manifest RDF)** — Publish and query manifests as linked data: OWL vocabulary at `https://ontology.growgraph.dev/graflo` (v1.0.0), plus `manifest-to-rdf` / `rdf-to-manifest` CLI and `graflo.rdf` serializers. See [GraFlo ontology](model/graflo_ontology.md). +- **SPARQL & RDF** — Endpoints and RDF files; optional OWL/RDFS **domain** schema inference (`rdflib`, `SPARQLWrapper` in the default install). - **Schema inference** — From PostgreSQL-style 3NF layouts (PK/FK heuristics) or from OWL/RDFS (`owl:Class` → vertices, `owl:ObjectProperty` → edges, `owl:DatatypeProperty` → vertex fields). See [Example 5](examples/example-5.md). - **Schema migrations** — Plan and apply guarded schema deltas (`migrate_schema` console script → `graflo.cli.migrate_schema`; library in `graflo.migrate`). Compare `from` / `to` schemas before execution to preview deltas and blocked high-risk operations. See [Concepts — Schema Migration](concepts/features_and_practices.md#schema-migration-v1). - **Typed `properties`** — Optional field types (`INT`, `FLOAT`, `STRING`, `DATETIME`, `BOOL`) on vertices and edges. 
@@ -132,6 +139,7 @@ For PostgreSQL workflows, `infer_manifest(...)` returns a full manifest contract - [Installation](getting_started/installation.md) - [Quick Start Guide](getting_started/quickstart.md) - [Concepts (architecture diagrams)](concepts/index.md) +- [GraFlo ontology — manifest ↔ RDF](model/graflo_ontology.md) - [Concepts — Schema Migration](concepts/features_and_practices.md#schema-migration-v1) - [Concepts — Comparing Two Schemas](concepts/features_and_practices.md#comparing-two-schemas) - [API Reference](reference/index.md) diff --git a/docs/model/graflo_ontology.md b/docs/model/graflo_ontology.md new file mode 100644 index 00000000..57455ac6 --- /dev/null +++ b/docs/model/graflo_ontology.md @@ -0,0 +1,159 @@ +# GraFlo ontology (meta-model RDF) + +GraFlo ships an **OWL ontology** that describes GraFlo’s own configuration language — not your domain knowledge graph, but the **manifest contract**: `GraphManifest`, `Schema`, `IngestionModel`, `Resource` pipelines (YAML `ResourceConfig`), `ProtoTransform` definitions, and `Bindings`. + +This is separate from **user-domain RDF** ingestion, where `RdfInferenceManager` reads an external OWL/RDFS TBox (`ex:Person`, `ex:publication`, …) and produces a GraFlo `Schema` + ingestion wiring. + +```mermaid +flowchart TB + subgraph domain ["User domain (existing)"] + UOWL["OWL/RDFS ontology
ex:Person, ex:Publication"] + UOWL --> RIM["RdfInferenceManager"] + RIM --> LPG["GraFlo Schema + IngestionModel"] + end + + subgraph meta ["GraFlo meta-model (new)"] + YAML["GraphManifest YAML"] + YAML --> SER["ManifestRdfSerializer"] + SER --> GFOWL["gf: GraphManifest RDF"] + GFOWL --> DES["ManifestRdfDeserializer"] + DES --> YAML + end + + LPG -. "same Pydantic types" .- YAML +``` + +## Ontology identifiers + +| Role | IRI | +|------|-----| +| Ontology document | `https://ontology.growgraph.dev/graflo` | +| Version IRI | `https://ontology.growgraph.dev/graflo/1.0.0` | +| Version info | `1.0.0` | +| Vocabulary prefix `gf:` | `https://ontology.growgraph.dev/graflo/` | + +The Turtle source lives in the package at `graflo/rdf/ontology/graflo.ttl`. Constants are also exposed in Python as `graflo.rdf.namespace` (`GF_ONTOLOGY_IRI`, `GF_VERSION`, `GF_VERSION_IRI`, `GF_BASE`). + +## Interactive visualization + +The explorer below is a **class graph** from `graflo.ttl`: `subClassOf` drives a stable force layout (vertical hierarchy, narrow levels); classes without subclass links sit in the **top-left**. **All edges** are drawn — thick arrows = `subClassOf`, dashed = properties. Drag, scroll, click to focus. Regenerate with `uv run python docs/scripts/build_ontology_viz.py` after ontology edits. + +If the embedded viewer is blank in an IDE browser preview, use **Open full screen** in a normal browser tab. + + + +

Open full screen

+ +## What the vocabulary covers + +**Schema block** + +- `gf:GraphManifest`, `gf:Schema`, `gf:CoreSchema`, `gf:GraphMetadata`, `gf:DatabaseProfile` +- `gf:VertexConfig`, `gf:EdgeConfig`, `gf:Vertex`, `gf:Edge`, `gf:Field`, `gf:Identity` +- `gf:FieldType` individuals (`gf:INT`, `gf:STRING`, …) + +**Ingestion block** + +- `gf:IngestionModel`, `gf:Resource`, `gf:ProtoTransform`, `gf:Transform` +- `gf:DressConfig`, `gf:KeySelectionConfig`, `gf:EdgeInferSpec` +- Pipeline actor steps (blank nodes): `gf:VertexActor`, `gf:EdgeActor`, `gf:TransformActor`, `gf:DescendActor`, `gf:VertexRouterActor` (Python aliases `*ActorStep` in `graflo.rdf.namespace`) + +**Bindings block** + +- `gf:Bindings`, `gf:FileConnector`, `gf:TableConnector`, `gf:SparqlConnector` +- `gf:ResourceConnectorBinding`, `gf:ConnectorConnectionBinding`, `gf:StagingProxyBinding` + +**Enumerations** (named individuals): `gf:DBType` (ArangoDB, Neo4j, …), transform target/strategy, key-selection mode, edge duplicate policy, bound source kind. + +**PROV-O hooks**: `gf:GraphManifest` ⊑ `prov:Entity`, `gf:ProtoTransform` ⊑ `prov:Activity` (subclasses such as `gf:Transform` inherit this; for lineage tooling). + +## Manifest instance URIs + +When you serialize a manifest, you pass a **`base_uri`** that identifies *that* manifest document (not the ontology). 
The serializer mints stable paths under it, for example: + +| Path under `base_uri` | RDF type | +|-----------------------|----------| +| `(base_uri)` | `gf:GraphManifest` | +| `schema/` | `gf:Schema` | +| `schema/core/vertex-config` | `gf:VertexConfig` | +| `schema/core/edge-config` | `gf:EdgeConfig` | +| `schema/core/vertex/Person` | `gf:Vertex` | +| `schema/core/edge/Person_knows_Person` | `gf:Edge` | +| `ingestion/` | `gf:IngestionModel` | +| `ingestion/resource/my_resource` | `gf:Resource` | +| `ingestion/transform/my_transform` | `gf:ProtoTransform` | +| `bindings/` | `gf:Bindings` | +| `bindings/connector/` | `gf:FileConnector` / `TableConnector` / `SparqlConnector` | + +Pipeline steps are **blank nodes** typed with the appropriate `gf:*Actor` class (and `gf:Actor`); the full step dict is stored in `gf:stepPayload` as JSON so round-trip preserves shorthand YAML shapes (`vertex: person`, nested `descend`, `transform.call`, …). + +List order (resources, transforms, connectors, vertices, fields, pipeline steps) is preserved via `gf:artifactIndex`. + +## Python API + +```python +from graflo import GraphManifest +from graflo.rdf import ManifestRdfDeserializer, ManifestRdfSerializer + +manifest = GraphManifest.from_yaml("manifest.yaml") +base = "https://growgraph.dev/manifests/academic/v1" + +serializer = ManifestRdfSerializer(include_ontology=True) +ttl = serializer.to_turtle(manifest, base) + +restored = ManifestRdfDeserializer().from_turtle( + ttl, + manifest_uri=base.rstrip("/"), +) +``` + +- **`include_ontology=True`** (default) embeds `graflo.ttl` triples in the output graph — useful for self-contained Turtle files. +- **`to_json_ld`**, **`to_graph`** — same graph, other serializations. 
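The instance URI layout listed under "Manifest instance URIs" can be sketched with the standard library alone. This is a minimal illustration, not the serializer's actual code: the helper name `mint_instance_uri` and the percent-encoding of segments are assumptions. Only the path shapes (`schema/core/vertex/Person`, `ingestion/resource/my_resource`) come from the table, and container nodes that the table lists with a trailing slash (`schema/`, `ingestion/`, `bindings/`) are not modeled here.

```python
from urllib.parse import quote


def mint_instance_uri(base_uri: str, *segments: str) -> str:
    """Join path segments under a manifest base URI (hypothetical helper).

    Assumption: each segment is percent-encoded so that names containing
    spaces or slashes cannot change the path structure.
    """
    base = base_uri.rstrip("/")
    if not segments:
        return base  # the manifest node itself
    return base + "/" + "/".join(quote(segment, safe="") for segment in segments)


base = "https://growgraph.dev/manifests/academic/v1"
person_uri = mint_instance_uri(base, "schema", "core", "vertex", "Person")
resource_uri = mint_instance_uri(base, "ingestion", "resource", "my_resource")
print(person_uri)
print(resource_uri)
```

Stripping the trailing slash from the base mirrors the `base.rstrip("/")` call in the API example, so the same manifest resolves to the same URIs whether or not the caller includes the slash.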
+ +## CLI + +After `pip install graflo` (or `uv sync` in the repo): + +```bash +# Manifest → RDF +uv run manifest-to-rdf manifest.yaml \ + --base-uri https://growgraph.dev/manifests/academic/v1 \ + --format turtle \ + --output academic.ttl + +# RDF → manifest YAML +uv run rdf-to-manifest academic.ttl \ + --manifest-uri https://growgraph.dev/manifests/academic/v1 \ + --output manifest.restored.yaml +``` + +Formats: `turtle` (default), `json-ld`, `nt`, `xml`. + +## Round-trip fidelity + +| Area | Behavior | +|------|----------| +| Scalars, enums, transforms | Full via literals and `gf` individuals | +| `pipeline` actor steps | Full via `gf:stepPayload` JSON | +| `params`, connector extras | JSON literals on payload properties | +| YAML aliases (`schema` / `graph`, `pipeline` / `apply`) | Canonical names only in restored YAML | +| Runtime `PrivateAttr` state | Not serialized; call `finish_init()` after load | +| Vertex `filters` | Serialized in `gf:vertexPayload` JSON; not decomposed into filter AST | + +The guaranteed invariant matches the rest of GraFlo config: **semantic canonical round-trip** (`parse → RDF → parse` equals minimal canonical dict), not byte-identical YAML. + +## JSON-LD + +`graflo/rdf/ontology/graflo-context.jsonld` maps common JSON keys to `gf:` IRIs for tools that consume JSON-LD directly. The serializer’s `to_json_ld()` output can be combined with this context in downstream pipelines. 
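Combining a context with a JSON-LD document can be sketched as follows. The context terms and the sample node are hypothetical stand-ins, since the actual keys of `graflo-context.jsonld` are not reproduced here; only the `gf:` prefix IRI is taken from the identifiers table. The helper `attach_context` is likewise an assumption, not part of `graflo.rdf`.

```python
import json


def attach_context(document: dict, context: dict) -> dict:
    """Return a copy of `document` whose `@context` is taken from `context`.

    Accepts either a bare term mapping or a full context document that
    wraps the mapping under an "@context" key.
    """
    merged = dict(document)
    merged["@context"] = context.get("@context", context)
    return merged


# Hypothetical context; the shipped file may define different terms.
context_doc = {
    "@context": {
        "gf": "https://ontology.growgraph.dev/graflo/",
        "Schema": "gf:Schema",
    }
}
# Hypothetical node, shaped like one entry of a JSON-LD export.
node = {
    "@id": "https://growgraph.dev/manifests/academic/v1/schema/",
    "@type": "Schema",
}

linked = attach_context(node, context_doc)
print(json.dumps(linked, indent=2))
```

In a real pipeline, `context_doc` would more likely be loaded with `json.load` from the packaged context file, and `node` would come from the serializer's JSON-LD output rather than being written by hand.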
+ +## Related + +- [Example 6 — RDF / Turtle ingestion](../examples/example-6.md) — **domain** OWL → GraFlo manifest (`RdfInferenceManager`) +- [API — `graflo.rdf`](../reference/rdf/index.md) +- [API — `RdfInferenceManager`](../reference/hq/rdf_inferencer.md) diff --git a/docs/reference/index.md b/docs/reference/index.md index a99a405c..7b70b126 100644 --- a/docs/reference/index.md +++ b/docs/reference/index.md @@ -44,6 +44,13 @@ Database connection and management components: - [Resource Mapping](db/postgres/resource_mapping.md): Mapping PostgreSQL tables to graflo Resources - [Type Mapping](db/postgres/types.md): PostgreSQL to graflo type conversion +## RDF + +Manifest serialization and domain ontology inference: + +- **[Manifest ↔ RDF (`graflo.rdf`)](rdf/index.md)**: `ManifestRdfSerializer`, `ManifestRdfDeserializer`, GraFlo meta-ontology +- **[RDF inference (`RdfInferenceManager`)](hq/rdf_inferencer.md)**: User OWL/RDFS TBox → GraFlo `Schema` + ingestion + ## Core Components Main graflo functionality: @@ -79,6 +86,7 @@ Graph visualization and plotting: CLI tools for graflo operations: - **[Ingest](cli/ingest.md)**: Data ingestion commands +- **Manifest ↔ RDF**: `manifest-to-rdf`, `rdf-to-manifest` (see [GraFlo ontology](../model/graflo_ontology.md#cli)) - **[Database Management](cli/manage_dbs.md)**: Database administration commands - **[Schema Visualization](cli/plot_manifest.md)**: Schema visualization commands - **[XML to JSON](cli/xml2json.md)**: XML data conversion utilities diff --git a/docs/reference/rdf/index.md b/docs/reference/rdf/index.md new file mode 100644 index 00000000..a75b67d3 --- /dev/null +++ b/docs/reference/rdf/index.md @@ -0,0 +1,34 @@ +# `graflo.rdf` — manifest ↔ RDF bridge + +Bidirectional conversion between **`GraphManifest`** and RDF using the [GraFlo meta-ontology](../../model/graflo_ontology.md). 
+ +## Modules + +::: graflo.rdf + +## Serializer + +::: graflo.rdf.serializer.ManifestRdfSerializer + +## Deserializer + +::: graflo.rdf.deserializer.ManifestRdfDeserializer + +## Namespace constants + +::: graflo.rdf.namespace + +## Utilities + +::: graflo.rdf.utils + +## CLI + +Console entry points (see `pyproject.toml` → `[project.scripts]`): + +- **`manifest-to-rdf`** — `graflo.rdf.cli:manifest_to_rdf` +- **`rdf-to-manifest`** — `graflo.rdf.cli:rdf_to_manifest` + +## See also + +- [`graflo.hq.rdf_inferencer`](../hq/rdf_inferencer.md) — import **user** OWL/RDFS ontologies into a GraFlo schema (opposite direction of concern) diff --git a/docs/scripts/build_ontology_viz.py b/docs/scripts/build_ontology_viz.py new file mode 100644 index 00000000..ff7bef15 --- /dev/null +++ b/docs/scripts/build_ontology_viz.py @@ -0,0 +1,83 @@ +#!/usr/bin/env python +"""Generate interactive HTML visualization for the GraFlo OWL ontology.""" + +from __future__ import annotations + +import importlib.util +import shutil +import sys +from pathlib import Path + +from graflo.rdf.namespace import GF_ONTOLOGY_IRI, GF_VERSION + +REPO_ROOT = Path(__file__).resolve().parents[2] +OUTPUT_DIR = REPO_ROOT / "docs" / "assets" / "graflo-ontology-viz" +VIZ_DIR = Path(__file__).resolve().parent / "ontology_viz" + +ASSET_FILES = ( + "graph-view.css", + "graph-view.js", +) + + +def _load_extract_module(): + spec = importlib.util.spec_from_file_location( + "graflo_ontology_viz_extract", + VIZ_DIR / "extract.py", + ) + if spec is None or spec.loader is None: + msg = "Could not load ontology graph extractor" + raise RuntimeError(msg) + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) + return module + + +def _render_template(template_name: str, *, title: str, graph_json: str) -> str: + extract = _load_extract_module() + template = (VIZ_DIR / template_name).read_text(encoding="utf-8") + return template.replace("{{TITLE}}", title).replace( + "{{GRAPH_JSON}}", 
extract.escape_json_for_html(graph_json) + ) + + +def build_ontology_viz(*, output_dir: Path = OUTPUT_DIR) -> str: + """Build ontology visualization HTML. Returns the viz kind identifier.""" + extract = _load_extract_module() + graph_data = extract.extract_ontology_graph() + graph_json = extract.graph_to_json(graph_data) + title = f"GraFlo Ontology (v{GF_VERSION})" + + if output_dir.exists(): + shutil.rmtree(output_dir) + output_dir.mkdir(parents=True) + + for name in ASSET_FILES: + shutil.copy2(VIZ_DIR / name, output_dir / name) + + (output_dir / "graph-data.json").write_text(graph_json + "\n", encoding="utf-8") + (output_dir / "index.html").write_text( + _render_template("page.template.html", title=title, graph_json=graph_json), + encoding="utf-8", + ) + (output_dir / "embed.html").write_text( + _render_template("embed.template.html", title=title, graph_json=graph_json), + encoding="utf-8", + ) + return "hierarchical-graph" + + +def main() -> int: + try: + viz_id = build_ontology_viz() + except Exception as exc: # noqa: BLE001 — CLI entrypoint + print(f"error: {exc}", file=sys.stderr) + return 1 + + print(f"Built {OUTPUT_DIR} using GraFlo viz '{viz_id}'") + print(f"Ontology: {GF_ONTOLOGY_IRI} (v{GF_VERSION})") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/docs/scripts/ontology_viz/embed.template.html b/docs/scripts/ontology_viz/embed.template.html new file mode 100644 index 00000000..028a929f --- /dev/null +++ b/docs/scripts/ontology_viz/embed.template.html @@ -0,0 +1,28 @@ + + + + + {{TITLE}} + + + +
+
+
+ + + + +
+
Drag · scroll zoom · click class
+ +
+
+ + + + diff --git a/docs/scripts/ontology_viz/extract.py b/docs/scripts/ontology_viz/extract.py new file mode 100644 index 00000000..778c9300 --- /dev/null +++ b/docs/scripts/ontology_viz/extract.py @@ -0,0 +1,187 @@ +"""Extract class and property graph data from the GraFlo OWL ontology.""" + +from __future__ import annotations + +import json +from typing import Any + +from rdflib import Graph, URIRef +from rdflib.namespace import OWL, RDF, RDFS, SKOS + +from graflo.rdf.namespace import GF_BASE, GF_ONTOLOGY_IRI, GF_VERSION +from graflo.rdf.utils import load_ontology_graph + +PROV = URIRef("http://www.w3.org/ns/prov#") +EXTERNAL_PREFIXES = ( + "http://www.w3.org/ns/prov#", + "http://www.w3.org/2002/07/owl#", + "http://www.w3.org/2000/01/rdf-schema#", + "http://www.w3.org/2001/XMLSchema#", +) + +NODE_W = 168 +NODE_H = 40 + + +def local_name(uri: str) -> str: + if uri.startswith(GF_BASE): + return uri[len(GF_BASE) :] + if "#" in uri: + return uri.rsplit("#", 1)[-1] + return uri.rsplit("/", 1)[-1] + + +def _label(graph: Graph, uri: URIRef) -> str: + for candidate in graph.objects(uri, SKOS.prefLabel): + return str(candidate) + for candidate in graph.objects(uri, RDFS.label): + return str(candidate) + return local_name(str(uri)) + + +def _comment(graph: Graph, uri: URIRef) -> str | None: + for candidate in graph.objects(uri, RDFS.comment): + return str(candidate) + return None + + +def _node_kind(uri: str) -> str: + if uri.startswith(GF_BASE): + name = local_name(uri) + if name.endswith("Type") or name.endswith("Mode") or name.endswith("Policy"): + return "enum" + return "gf" + if uri.startswith(str(PROV)): + return "external" + return "external" + + +def _include_class_uri(uri: str) -> bool: + if uri.startswith(GF_BASE): + return True + return uri.startswith(EXTERNAL_PREFIXES) + + +def extract_ontology_graph(graph: Graph | None = None) -> dict[str, Any]: + """Build nodes and edges for the ontology viewer (layout runs in the browser).""" + g = graph or 
load_ontology_graph() + class_uris: set[str] = { + str(subject) + for subject in g.subjects(RDF.type, OWL.Class) + if _include_class_uri(str(subject)) + } + + for child in list(class_uris): + for parent in g.objects(URIRef(child), RDFS.subClassOf): + if not isinstance(parent, URIRef): + continue + parent_uri = str(parent) + if parent_uri.startswith(GF_BASE) or parent_uri.startswith(str(PROV)): + class_uris.add(parent_uri) + + nodes: dict[str, dict[str, Any]] = {} + for uri in sorted(class_uris): + ref = URIRef(uri) + nodes[uri] = { + "id": uri, + "label": _label(g, ref), + "local": local_name(uri), + "kind": _node_kind(uri), + "comment": _comment(g, ref), + } + + edges: list[dict[str, str]] = [] + for child in sorted(class_uris): + child_ref = URIRef(child) + for parent in g.objects(child_ref, RDFS.subClassOf): + if not isinstance(parent, URIRef): + continue + parent_uri = str(parent) + if parent_uri not in class_uris: + continue + edges.append( + { + "id": f"sub:{child}->{parent_uri}", + "source": child, + "target": parent_uri, + "kind": "subClassOf", + "label": "subClassOf", + } + ) + + # Explicit equivalent class relations (both directions represented once here; + # renderer can visualize with directional styles as needed). 
+ seen_equiv: set[tuple[str, str]] = set() + for left in sorted(class_uris): + left_ref = URIRef(left) + for right in g.objects(left_ref, OWL.equivalentClass): + if not isinstance(right, URIRef): + continue + right_uri = str(right) + if right_uri not in class_uris: + continue + key: tuple[str, str] + if left < right_uri: + key = (left, right_uri) + else: + key = (right_uri, left) + if left == right_uri or key in seen_equiv: + continue + seen_equiv.add(key) + edges.append( + { + "id": f"equiv:{left}<->{right_uri}", + "source": left, + "target": right_uri, + "kind": "equivalentClass", + "label": "equivalentClass", + } + ) + + for prop_type in (OWL.ObjectProperty, OWL.DatatypeProperty): + for prop in g.subjects(RDF.type, prop_type): + if not isinstance(prop, URIRef): + continue + prop_uri = str(prop) + if not prop_uri.startswith(GF_BASE): + continue + prop_label = _label(g, prop) + domain = g.value(prop, RDFS.domain) + range_ = g.value(prop, RDFS.range) + if not isinstance(domain, URIRef) or not isinstance(range_, URIRef): + continue + domain_uri = str(domain) + range_uri = str(range_) + if domain_uri not in nodes or range_uri not in nodes: + continue + kind = ( + "objectProperty" + if prop_type == OWL.ObjectProperty + else "datatypeProperty" + ) + edges.append( + { + "id": f"prop:{prop_uri}", + "source": domain_uri, + "target": range_uri, + "kind": kind, + "label": prop_label, + } + ) + + return { + "ontology": GF_ONTOLOGY_IRI, + "version": GF_VERSION, + "nodeWidth": NODE_W, + "nodeHeight": NODE_H, + "nodes": list(nodes.values()), + "edges": edges, + } + + +def graph_to_json(graph_data: dict[str, Any]) -> str: + return json.dumps(graph_data, indent=2, sort_keys=True) + + +def escape_json_for_html(json_text: str) -> str: + return json_text.replace("= DRAG_THRESHOLD_PX) { + state.dragMoved = true; + } + } + + function truncate(text, max) { + if (text.length <= max) { + return text; + } + return text.slice(0, max - 1) + "…"; + } + + function isOverlapping(a, b) { + 
const minDx = data.nodeWidth + LAYOUT.minGapX; + const minDy = data.nodeHeight + LAYOUT.minGapY; + return Math.abs(a.x - b.x) < minDx && Math.abs(a.y - b.y) < minDy; + } + + function resolveOverlaps(nodes) { + for (let pass = 0; pass < LAYOUT.resolvePasses; pass += 1) { + let moved = false; + for (let i = 0; i < nodes.length; i += 1) { + for (let j = i + 1; j < nodes.length; j += 1) { + const a = nodes[i]; + const b = nodes[j]; + if (!isOverlapping(a, b)) { + continue; + } + + const minDx = data.nodeWidth + LAYOUT.minGapX; + const minDy = data.nodeHeight + LAYOUT.minGapY; + const dx = (b.x - a.x) || 0.1; + const dy = (b.y - a.y) || 0.1; + const overlapX = minDx - Math.abs(dx); + const overlapY = minDy - Math.abs(dy); + if (overlapX <= 0 || overlapY <= 0) { + continue; + } + + if (overlapX < overlapY) { + const push = overlapX / 2; + const dir = dx >= 0 ? 1 : -1; + a.x -= push * dir; + b.x += push * dir; + } else { + const push = overlapY / 2; + const dir = dy >= 0 ? 1 : -1; + a.y -= push * dir; + b.y += push * dir; + } + moved = true; + } + } + if (!moved) { + break; + } + } + } + + function computeLevels(nodeIds, edges) { + const children = new Map(); + nodeIds.forEach(function (id) { + children.set(id, []); + }); + edges.forEach(function (edge) { + if (!children.has(edge.target)) { + children.set(edge.target, []); + } + children.get(edge.target).push(edge.source); + }); + + const isChild = new Set(edges.map(function (edge) { + return edge.source; + })); + const roots = nodeIds.filter(function (id) { + return !isChild.has(id); + }); + + const level = new Map(); + const queue = roots.map(function (id) { + return [id, 0]; + }); + while (queue.length) { + const item = queue.shift(); + const id = item[0]; + const lv = item[1]; + if (level.has(id)) { + continue; + } + level.set(id, lv); + (children.get(id) || []).forEach(function (child) { + queue.push([child, lv + 1]); + }); + } + nodeIds.forEach(function (id) { + if (!level.has(id)) { + level.set(id, 0); + } + }); + 
return level; + } + + function placeIsolatedNodes(isolated, offsetX, offsetY) { + isolated.forEach(function (node, index) { + node.layoutGroup = "isolated"; + node.x = offsetX; + node.y = offsetY + index * (data.nodeHeight + LAYOUT.isoGap); + }); + if (!isolated.length) { + return { width: 0, height: 0 }; + } + return { + width: data.nodeWidth, + height: isolated.length * (data.nodeHeight + LAYOUT.isoGap) - LAYOUT.isoGap, + }; + } + + function runSubclassForceLayout(hierarchyNodes, edges, originX, originY, levels) { + const links = edges + .map(function (edge) { + return { + source: nodeById.get(edge.source), + target: nodeById.get(edge.target), + }; + }) + .filter(function (link) { + return link.source && link.target; + }); + + hierarchyNodes.forEach(function (node, index) { + node.layoutGroup = "hierarchy"; + node.level = levels.get(node.id) || 0; + node.x = originX + (index % 2) * 36 + (Math.random() - 0.5) * 4; + node.y = originY + node.level * LAYOUT.vGap; + node.vx = 0; + node.vy = 0; + }); + + let alpha = 1; + const centerX = originX + data.nodeWidth * 0.4; + for (let tick = 0; tick < LAYOUT.maxTicks && alpha > LAYOUT.alphaMin; tick += 1) { + hierarchyNodes.forEach(function (node) { + node.vx = 0; + node.vy = 0; + }); + + for (let i = 0; i < hierarchyNodes.length; i += 1) { + for (let j = i + 1; j < hierarchyNodes.length; j += 1) { + const a = hierarchyNodes[i]; + const b = hierarchyNodes[j]; + let dx = b.x - a.x; + let dy = b.y - a.y; + const dist = Math.sqrt(dx * dx + dy * dy) || 1; + const repulse = (620 * alpha) / dist; + dx = (dx / dist) * repulse; + dy = (dy / dist) * repulse; + a.vx -= dx; + a.vy -= dy; + b.vx += dx; + b.vy += dy; + } + } + + links.forEach(function (link) { + let dx = link.target.x - link.source.x; + let dy = link.target.y - link.source.y; + const dist = Math.sqrt(dx * dx + dy * dy) || 1; + const strength = 0.55 * alpha; + const delta = ((dist - LAYOUT.linkDistance) / dist) * strength; + dx *= delta; + dy *= delta; + link.source.vx += 
dx; + link.source.vy += dy; + link.target.vx -= dx; + link.target.vy -= dy; + }); + + hierarchyNodes.forEach(function (node) { + const targetY = originY + node.level * LAYOUT.vGap; + node.vy += (targetY - node.y) * 0.42 * alpha; + node.vx += (centerX - node.x) * 0.08 * alpha; + }); + + hierarchyNodes.forEach(function (node) { + node.x += node.vx * 0.18; + node.y += node.vy * 0.18; + }); + + alpha *= 0.965 + } + } + + function computeLayout() { + const hierarchyIds = new Set(); + subclassEdges.forEach(function (edge) { + hierarchyIds.add(edge.source); + hierarchyIds.add(edge.target); + }); + + const isolated = data.nodes.filter(function (node) { + return !hierarchyIds.has(node.id); + }); + const hierarchy = data.nodes.filter(function (node) { + return hierarchyIds.has(node.id); + }); + + const isoBox = placeIsolatedNodes(isolated, LAYOUT.isoPad, LAYOUT.isoPad); + const hierarchyOriginX = LAYOUT.isoPad + isoBox.width + LAYOUT.hierarchyGapX; + const hierarchyOriginY = LAYOUT.isoPad; + const levels = computeLevels( + hierarchy.map(function (node) { + return node.id; + }), + subclassEdges, + ); + runSubclassForceLayout(hierarchy, subclassEdges, hierarchyOriginX, hierarchyOriginY, levels); + resolveOverlaps(hierarchy); + resolveOverlaps(isolated); + resolveOverlaps(data.nodes); + data.bounds = computeBounds(data.nodes); + } + + function computeBounds(nodes) { + const xs = nodes.map(function (node) { + return node.x; + }); + const ys = nodes.map(function (node) { + return node.y; + }); + if (!xs.length) { + return { minX: 0, minY: 0, maxX: data.nodeWidth, maxY: data.nodeHeight }; + } + return { + minX: Math.min.apply(null, xs), + minY: Math.min.apply(null, ys), + maxX: Math.max.apply(null, xs) + data.nodeWidth, + maxY: Math.max.apply(null, ys) + data.nodeHeight, + }; + } + + function nodeAnchor(node, toward) { + const cx = node.x + data.nodeWidth / 2; + const cy = node.y + data.nodeHeight / 2; + const tx = toward.x + data.nodeWidth / 2; + const ty = toward.y + 
data.nodeHeight / 2; + const dx = tx - cx; + const dy = ty - cy; + if (!dx && !dy) { + return { x: cx, y: cy }; + } + const hw = data.nodeWidth / 2; + const hh = data.nodeHeight / 2; + const scale = Math.min(hw / (Math.abs(dx) || 1e-6), hh / (Math.abs(dy) || 1e-6)); + return { x: cx + dx * scale, y: cy + dy * scale }; + } + + function edgePath(source, target) { + const start = nodeAnchor(source, target); + const end = nodeAnchor(target, source); + const mx = (start.x + end.x) / 2; + const my = (start.y + end.y) / 2; + return "M" + start.x + "," + start.y + " Q" + mx + "," + my + " " + end.x + "," + end.y; + } + + function edgeVisibleByMode(edge) { + if (state.relationMode === "all") { + return true; + } + if (state.relationMode === "taxonomy") { + return edge.kind === "subClassOf" || edge.kind === "equivalentClass"; + } + if (state.relationMode === "has") { + return edge.label.toLowerCase().startsWith("has"); + } + return true; + } + + function getVisibleEdges() { + return data.edges.filter(edgeVisibleByMode); + } + + function renderEdges() { + viewport.querySelectorAll(".edge-layer").forEach(function (el) { + el.remove(); + }); + const layer = document.createElementNS("http://www.w3.org/2000/svg", "g"); + layer.setAttribute("class", "edge-layer"); + + getVisibleEdges().forEach(function (edge) { + const source = nodeById.get(edge.source); + const target = nodeById.get(edge.target); + if (!source || !target) { + return; + } + + const path = document.createElementNS("http://www.w3.org/2000/svg", "path"); + path.setAttribute("d", edgePath(source, target)); + path.setAttribute("class", "edge kind-" + edge.kind); + path.setAttribute("marker-end", "url(#arrow-" + edge.kind + ")"); + path.dataset.edgeId = edge.id; + layer.appendChild(path); + + if (edge.kind === "subClassOf") { + const reverse = document.createElementNS("http://www.w3.org/2000/svg", "path"); + reverse.setAttribute("d", edgePath(target, source)); + reverse.setAttribute("class", "edge edge-reverse 
kind-subClassOfReverse"); + reverse.setAttribute("marker-end", "url(#arrow-subClassOfReverse)"); + reverse.dataset.edgeId = edge.id + ":reverse"; + layer.appendChild(reverse); + } else if (edge.kind === "equivalentClass") { + const reverseEq = document.createElementNS("http://www.w3.org/2000/svg", "path"); + reverseEq.setAttribute("d", edgePath(target, source)); + reverseEq.setAttribute("class", "edge kind-equivalentClass"); + reverseEq.setAttribute("marker-end", "url(#arrow-equivalentClass)"); + reverseEq.dataset.edgeId = edge.id + ":reverse"; + layer.appendChild(reverseEq); + } else { + const label = document.createElementNS("http://www.w3.org/2000/svg", "text"); + label.setAttribute("class", "edge-label"); + label.setAttribute("x", (source.x + target.x + data.nodeWidth) / 2); + label.setAttribute("y", (source.y + target.y + data.nodeHeight) / 2); + label.setAttribute("text-anchor", "middle"); + label.dataset.edgeId = edge.id; + label.textContent = edge.label; + layer.appendChild(label); + } + }); + viewport.insertBefore(layer, viewport.firstChild); + } + + function renderNodes() { + viewport.querySelectorAll(".node").forEach(function (el) { + el.remove(); + }); + data.nodes.forEach(function (node) { + const group = document.createElementNS("http://www.w3.org/2000/svg", "g"); + group.setAttribute("class", "node kind-" + node.kind + " group-" + node.layoutGroup); + group.setAttribute("transform", "translate(" + node.x + "," + node.y + ")"); + group.dataset.nodeId = node.id; + + const rect = document.createElementNS("http://www.w3.org/2000/svg", "rect"); + rect.setAttribute("width", data.nodeWidth); + rect.setAttribute("height", data.nodeHeight); + rect.setAttribute("rx", 8); + rect.setAttribute("ry", 8); + group.appendChild(rect); + + const text = document.createElementNS("http://www.w3.org/2000/svg", "text"); + text.setAttribute("x", data.nodeWidth / 2); + text.setAttribute("y", data.nodeHeight / 2 + 4); + text.setAttribute("text-anchor", "middle"); + 
text.textContent = truncate(node.label, 18); + group.appendChild(text); + + const title = document.createElementNS("http://www.w3.org/2000/svg", "title"); + title.textContent = node.label; + group.appendChild(title); + + viewport.appendChild(group); + }); + } + + function reRenderGraph() { + data.bounds = computeBounds(data.nodes); + renderNodes(); + renderEdges(); + applyHighlight(); + } + + function applyTransform() { + viewport.setAttribute( + "transform", + "translate(" + state.tx + "," + state.ty + ") scale(" + state.scale + ")", + ); + } + + function fitToScreen() { + const shell = document.querySelector(".graph-shell"); + const pad = 48; + const bounds = data.bounds; + const graphW = bounds.maxX - bounds.minX; + const graphH = bounds.maxY - bounds.minY; + const viewW = shell.clientWidth - pad * 2; + const viewH = shell.clientHeight - pad * 2; + if (viewW <= 0 || viewH <= 0 || graphW <= 0 || graphH <= 0) { + state.scale = 1; + state.tx = pad - bounds.minX; + state.ty = pad - bounds.minY; + applyTransform(); + return; + } + state.scale = Math.min(viewW / graphW, viewH / graphH, 1.3); + state.tx = pad - bounds.minX * state.scale + (viewW - graphW * state.scale) / 2; + state.ty = pad - bounds.minY * state.scale + (viewH - graphH * state.scale) / 2; + applyTransform(); + } + + function matchesSearch(node) { + if (!state.search) { + return true; + } + const q = state.search.toLowerCase(); + return node.local.toLowerCase().includes(q) || node.label.toLowerCase().includes(q); + } + + function neighborhood(nodeId) { + const related = new Set([nodeId]); + getVisibleEdges().forEach(function (edge) { + if (edge.source === nodeId) { + related.add(edge.target); + } + if (edge.target === nodeId) { + related.add(edge.source); + } + }); + return related; + } + + function edgeIsIncidentToSelected(edge, selectedId) { + return edge.source === selectedId || edge.target === selectedId; + } + + function findVisibleEdge(edgeId) { + return getVisibleEdges().find(function (item) { + 
return item.id === edgeId; + }); + } + + function applyHighlight() { + const hasSelection = Boolean(state.selectedId); + const hasSearch = Boolean(state.search); + const focus = hasSelection ? neighborhood(state.selectedId) : null; + + viewport.querySelectorAll(".node").forEach(function (group) { + const nodeId = group.dataset.nodeId; + const node = nodeById.get(nodeId); + let dim = false; + if (hasSearch && node && !matchesSearch(node)) { + dim = true; + } + if (hasSelection && !focus.has(nodeId)) { + dim = true; + } + group.classList.toggle("dimmed", dim); + group.classList.toggle("selected", nodeId === state.selectedId); + }); + + viewport.querySelectorAll(".edge").forEach(function (edgeEl) { + const edgeId = edgeEl.dataset.edgeId.replace(/:reverse$/, ""); + const edge = findVisibleEdge(edgeId); + let dim = false; + if (hasSelection && edge && !edgeIsIncidentToSelected(edge, state.selectedId)) { + dim = true; + } + edgeEl.classList.toggle("dimmed", dim); + }); + + viewport.querySelectorAll(".edge-label").forEach(function (labelEl) { + const edge = findVisibleEdge(labelEl.dataset.edgeId); + let dim = false; + if (hasSelection && edge && !edgeIsIncidentToSelected(edge, state.selectedId)) { + dim = true; + } + labelEl.classList.toggle("dimmed", dim); + }); + } + + function selectNode(nodeId) { + state.selectedId = state.selectedId === nodeId ? null : nodeId; + updateDetails(); + applyHighlight(); + } + + function updateDetails() { + const panel = document.getElementById("details"); + if (!panel) { + return; + } + if (!state.selectedId) { + panel.innerHTML = "

Select a class to inspect its IRI and description.

"; + return; + } + const node = nodeById.get(state.selectedId); + if (!node) { + return; + } + const props = data.edges.filter(function (edge) { + return edge.kind !== "subClassOf" && (edge.source === node.id || edge.target === node.id); + }); + const subclasses = data.edges + .filter(function (edge) { + return edge.kind === "subClassOf" && edge.target === node.id; + }) + .map(function (edge) { + return nodeById.get(edge.source); + }) + .filter(Boolean); + const parents = data.edges + .filter(function (edge) { + return edge.kind === "subClassOf" && edge.source === node.id; + }) + .map(function (edge) { + return nodeById.get(edge.target); + }) + .filter(Boolean); + + let html = "

" + node.label + "

"; + html += "
" + node.id + "
"; + if (node.comment) { + html += "
" + node.comment + "
"; + } + if (node.layoutGroup === "isolated") { + html += "

Layout: standalone class (no subClassOf links)

"; + } + if (parents.length) { + html += "

Parents: " + parents.map(function (p) { + return p.local; + }).join(", ") + "

"; + } + if (subclasses.length) { + html += "

Subclasses: " + subclasses.map(function (p) { + return p.local; + }).join(", ") + "

"; + } + if (props.length) { + html += "

Properties:

"; + } + panel.innerHTML = html; + } + + function nodeFromEventTarget(target) { + const group = target.closest ? target.closest(".node") : null; + if (!group) { + return null; + } + const nodeId = group.dataset.nodeId; + return nodeById.get(nodeId) || null; + } + + function bindControls() { + const search = document.getElementById("search"); + const relationFilter = document.getElementById("relation-filter"); + if (search) { + search.addEventListener("input", function (event) { + state.search = event.target.value.trim(); + applyHighlight(); + }); + } + if (relationFilter) { + relationFilter.addEventListener("change", function (event) { + state.relationMode = event.target.value; + renderEdges(); + applyHighlight(); + }); + } + + document.getElementById("fit-button").addEventListener("click", fitToScreen); + document.getElementById("reset-button").addEventListener("click", function () { + state.selectedId = null; + state.search = ""; + if (search) { + search.value = ""; + } + if (relationFilter) { + relationFilter.value = "all"; + } + state.relationMode = "all"; + updateDetails(); + renderEdges(); + applyHighlight(); + fitToScreen(); + }); + + svg.addEventListener("wheel", function (event) { + event.preventDefault(); + const delta = event.deltaY > 0 ? 
0.92 : 1.08; + const rect = svg.getBoundingClientRect(); + const px = event.clientX - rect.left; + const py = event.clientY - rect.top; + state.tx = px - (px - state.tx) * delta; + state.ty = py - (py - state.ty) * delta; + state.scale *= delta; + applyTransform(); + }, { passive: false }); + + svg.addEventListener("mousedown", function (event) { + state.dragMoved = false; + state.pointerDownX = event.clientX; + state.pointerDownY = event.clientY; + const node = nodeFromEventTarget(event.target); + state.lastX = event.clientX; + state.lastY = event.clientY; + if (node) { + state.draggingNodeId = node.id; + svg.classList.add("dragging-node"); + return; + } + state.draggingViewport = true; + svg.classList.add("dragging"); + }); + + window.addEventListener("mouseup", function (event) { + const pendingNodeId = state.draggingNodeId; + const draggedNode = pendingNodeId ? nodeById.get(pendingNodeId) : null; + if (draggedNode && state.dragMoved) { + resolveOverlaps(data.nodes); + reRenderGraph(); + } else if (pendingNodeId && !state.dragMoved) { + selectNode(pendingNodeId); + } + state.draggingNodeId = null; + state.draggingViewport = false; + svg.classList.remove("dragging"); + svg.classList.remove("dragging-node"); + }); + + window.addEventListener("mousemove", function (event) { + markDragIfNeeded(event); + const dx = event.clientX - state.lastX; + const dy = event.clientY - state.lastY; + state.lastX = event.clientX; + state.lastY = event.clientY; + + if (state.draggingNodeId) { + if (!state.dragMoved) { + return; + } + const node = nodeById.get(state.draggingNodeId); + if (!node) { + return; + } + node.x += dx / state.scale; + node.y += dy / state.scale; + reRenderGraph(); + return; + } + + if (!state.draggingViewport) { + return; + } + markDragIfNeeded(event); + state.tx += dx; + state.ty += dy; + applyTransform(); + }); + + svg.addEventListener("click", function (event) { + const wasDrag = state.dragMoved; + state.dragMoved = false; + if (wasDrag) { + return; + } + 
if (nodeFromEventTarget(event.target)) { + return; + } + if (state.selectedId) { + state.selectedId = null; + updateDetails(); + applyHighlight(); + } + }); + } + + computeLayout(); + reRenderGraph(); + bindControls(); + updateDetails(); + fitToScreen(); + window.addEventListener("resize", fitToScreen); +})(); diff --git a/docs/scripts/ontology_viz/page.template.html b/docs/scripts/ontology_viz/page.template.html new file mode 100644 index 00000000..29769066 --- /dev/null +++ b/docs/scripts/ontology_viz/page.template.html @@ -0,0 +1,51 @@ + + + + + {{TITLE}} + + + +
+ +
+
Drag · scroll zoom · click class
+ +
+
+ + + + diff --git a/graflo/architecture/database_features.py b/graflo/architecture/database_features.py index 9a24d93a..f30cad97 100644 --- a/graflo/architecture/database_features.py +++ b/graflo/architecture/database_features.py @@ -5,24 +5,37 @@ from __future__ import annotations -from typing import Any, Literal +from typing import TYPE_CHECKING, Any, Literal from pydantic import AliasChoices, Field as PydanticField, model_validator from graflo.architecture.base import ConfigBaseModel -from graflo.architecture.graph_types import EdgeId, Index +from graflo.architecture.graph_types import EdgeId, EdgePhysicalKey, Index +from graflo.architecture.schema.vertex import VertexName from graflo.onto import DBType +if TYPE_CHECKING: + from graflo.architecture.schema.edge import EdgeConfig -class EdgeVariantSpec(ConfigBaseModel): - """Unified edge physical variant spec keyed by edge identity + purpose.""" - source: str = PydanticField(..., description="Edge source vertex name.") - target: str = PydanticField(..., description="Edge target vertex name.") +class EdgeRef(ConfigBaseModel): + """Reference to a logical edge identity.""" + + source: VertexName = PydanticField(..., description="Edge source vertex name.") + target: VertexName = PydanticField(..., description="Edge target vertex name.") relation: str | None = PydanticField( default=None, description="Logical relation for edge identity (source, target, relation).", ) + + @property + def edge_id(self) -> EdgeId: + return (self.source, self.target, self.relation) + + +class EdgePhysicalSpec(EdgeRef): + """Unified edge physical spec keyed by edge identity + purpose.""" + purpose: str | None = PydanticField( default=None, description="DB-only purpose identifier for physical edge variant.", @@ -44,19 +57,13 @@ class EdgeVariantSpec(ConfigBaseModel): ) @property - def edge_id(self) -> EdgeId: - return (self.source, self.target, self.relation) + def physical_key(self) -> EdgePhysicalKey: + return (self.source, self.target, 
self.relation, self.purpose) -class EdgePropertyDefaults(ConfigBaseModel): +class EdgePropertyDefaults(EdgeRef): """Per logical edge type: optional GSQL ``DEFAULT`` values for edge attributes.""" - source: str = PydanticField(..., description="Logical source vertex name.") - target: str = PydanticField(..., description="Logical target vertex name.") - relation: str | None = PydanticField( - default=None, - description="Logical relation; must match edge identity (use null for default relation).", - ) values: dict[str, Any] = PydanticField( default_factory=dict, description="Edge attribute name to default value (YAML/JSON literals).", @@ -98,15 +105,15 @@ class DatabaseProfile(ConfigBaseModel): "GraphEngine uses this before falling back to schema.metadata.name." ), ) - vertex_storage_names: dict[str, str] = PydanticField( + vertex_storage_names: dict[VertexName, str] = PydanticField( default_factory=dict, description="Physical vertex collection/label names keyed by logical vertex name.", ) - vertex_indexes: dict[str, list[Index]] = PydanticField( + vertex_indexes: dict[VertexName, list[Index]] = PydanticField( default_factory=dict, description="Secondary indexes per vertex name (identity excluded).", ) - edge_specs: list[EdgeVariantSpec] = PydanticField( + edge_specs: list[EdgePhysicalSpec] = PydanticField( default_factory=list, description="Unified edge physical specs keyed by edge identity + purpose.", ) @@ -122,26 +129,21 @@ class DatabaseProfile(ConfigBaseModel): @model_validator(mode="after") def _normalize_edge_specs(self) -> "DatabaseProfile": def _variant_key( - spec: EdgeVariantSpec, - ) -> tuple[str, str, str | None, str | None]: - return ( - spec.source, - spec.target, - spec.relation, - spec.purpose, - ) + spec: EdgePhysicalSpec, + ) -> EdgePhysicalKey: + return spec.physical_key def _ensure_variant( - merged: dict[tuple[str, str, str | None, str | None], EdgeVariantSpec], + merged: dict[EdgePhysicalKey, EdgePhysicalSpec], *, source: str, target: str, 
relation: str | None, purpose: str | None, - ) -> EdgeVariantSpec: + ) -> EdgePhysicalSpec: key = (source, target, relation, purpose) if key not in merged: - merged[key] = EdgeVariantSpec( + merged[key] = EdgePhysicalSpec( source=source, target=target, relation=relation, @@ -149,7 +151,7 @@ def _ensure_variant( ) return merged[key] - merged: dict[tuple[str, str, str | None, str | None], EdgeVariantSpec] = {} + merged: dict[EdgePhysicalKey, EdgePhysicalSpec] = {} for item in self.edge_specs: variant = _ensure_variant( @@ -171,6 +173,15 @@ def _ensure_variant( object.__setattr__(self, "edge_specs", list(merged.values())) return self + def validate_against_schema(self, edge_config: "EdgeConfig") -> None: + """Assert all edge specs reference declared logical edges.""" + for spec in self.edge_specs: + if spec.edge_id not in edge_config: + raise ValueError( + f"EdgePhysicalSpec {spec.physical_key!r} references undeclared " + f"edge {spec.edge_id!r}" + ) + def vertex_property_default( self, vertex_name: str, property_name: str ) -> Any | None: @@ -224,7 +235,7 @@ def _edge_variant_spec( self, edge_id: EdgeId, purpose: str | None = None, - ) -> EdgeVariantSpec | None: + ) -> EdgePhysicalSpec | None: for item in self.edge_specs: if item.edge_id != edge_id: continue @@ -344,7 +355,7 @@ def edge_index_spec( self, edge_id: EdgeId, purpose: str | None = None, - ) -> EdgeVariantSpec | None: + ) -> EdgePhysicalSpec | None: spec = self._edge_variant_spec(edge_id=edge_id, purpose=purpose) if spec is None and purpose is not None: spec = self._edge_variant_spec(edge_id=edge_id, purpose=None) @@ -360,7 +371,7 @@ def add_edge_index( spec = self._edge_variant_spec(edge_id=edge_id, purpose=purpose) if spec is None: source, target, relation = edge_id - spec = EdgeVariantSpec( + spec = EdgePhysicalSpec( source=source, target=target, relation=relation, @@ -377,7 +388,7 @@ def edge_name_spec( self, edge_id: EdgeId, purpose: str | None = None, - ) -> EdgeVariantSpec | None: + ) -> 
EdgePhysicalSpec | None: spec = self._edge_variant_spec(edge_id=edge_id, purpose=purpose) if spec is None and purpose is not None: spec = self._edge_variant_spec(edge_id=edge_id, purpose=None) @@ -393,7 +404,7 @@ def set_edge_name_spec( spec = self._edge_variant_spec(edge_id=edge_id, purpose=purpose) if spec is None: source, target, relation = edge_id - spec = EdgeVariantSpec( + spec = EdgePhysicalSpec( source=source, target=target, relation=relation, diff --git a/graflo/architecture/evolution/apply.py b/graflo/architecture/evolution/apply.py index 8302ea5d..e4ddb879 100644 --- a/graflo/architecture/evolution/apply.py +++ b/graflo/architecture/evolution/apply.py @@ -23,6 +23,7 @@ apply_storage_name_sanitization_to_db_profile, apply_vertex_merge_to_db_profile, apply_vertex_removal_to_db_profile, + apply_vertex_rename_to_db_profile, merge_relation_entries_in_db_profile, ) from .merge_core import ( @@ -636,6 +637,17 @@ def _apply_rename_entities( mapping["resource"], ) + schema = manifest.graph_schema + if schema is not None and (vertex_map or edge_map): + if vertex_map: + apply_vertex_rename_to_db_profile(schema.db_profile, vertex_map) + if edge_map: + apply_relation_rename_to_db_profile(schema.db_profile, edge_map) + if isinstance(schema_payload, dict): + schema_payload["db_profile"] = schema.db_profile.to_dict( + skip_defaults=False + ) + updated = GraphManifest.from_dict(payload) manifest.graph_schema = updated.graph_schema manifest.ingestion_model = updated.ingestion_model @@ -672,6 +684,8 @@ def apply_remove_edges(manifest: GraphManifest, op: RemoveEdgesOp) -> None: schema = manifest.graph_schema if schema is None: raise ValueError("remove_edges requires graph_schema") + apply_relation_removal_to_db_profile(schema.db_profile, removed) + schema.db_profile = _revalidate_db_profile(schema.db_profile) schema.core_schema = CoreSchema( vertex_config=schema.core_schema.vertex_config, edge_config=EdgeConfig( @@ -682,8 +696,6 @@ def apply_remove_edges(manifest: 
GraphManifest, op: RemoveEdgesOp) -> None: ] ), ) - apply_relation_removal_to_db_profile(schema.db_profile, removed) - schema.db_profile = _revalidate_db_profile(schema.db_profile) schema.finish_init() if manifest.ingestion_model is not None: diff --git a/graflo/architecture/evolution/db_profile.py b/graflo/architecture/evolution/db_profile.py index 96e7660a..0db8c3e7 100644 --- a/graflo/architecture/evolution/db_profile.py +++ b/graflo/architecture/evolution/db_profile.py @@ -7,10 +7,10 @@ from graflo.architecture.database_features import ( DatabaseProfile, + EdgePhysicalSpec, EdgePropertyDefaults, - EdgeVariantSpec, ) -from graflo.architecture.graph_types import EdgeId, Index +from graflo.architecture.graph_types import EdgeId, EdgePhysicalKey, Index from graflo.architecture.schema import Schema logger = logging.getLogger(__name__) @@ -57,6 +57,72 @@ def apply_vertex_removal_to_db_profile( ] +def apply_vertex_rename_to_db_profile( + profile: DatabaseProfile, vertex_renames: dict[str, str] +) -> None: + """Rename logical vertex keys in *profile*.""" + if not vertex_renames: + return + + new_vs: dict[str, str] = {} + for k, v in profile.vertex_storage_names.items(): + nk = vertex_renames.get(k, k) + if nk in new_vs and new_vs[nk] != v: + raise ValueError( + f"Conflicting vertex_storage_names for logical {nk!r}: " + f"{new_vs[nk]!r} vs {v!r}" + ) + new_vs[nk] = v + profile.vertex_storage_names = new_vs + + new_vi: dict[str, list[Any]] = {} + for k, vlist in profile.vertex_indexes.items(): + nk = vertex_renames.get(k, k) + new_vi.setdefault(nk, []).extend(list(vlist)) + profile.vertex_indexes = new_vi + + new_specs: list[EdgePhysicalSpec] = [] + for s in profile.edge_specs: + new_specs.append( + EdgePhysicalSpec( + source=vertex_renames.get(s.source, s.source), + target=vertex_renames.get(s.target, s.target), + relation=s.relation, + purpose=s.purpose, + relation_name=s.relation_name, + indexes=list(s.indexes), + indexes_mode=s.indexes_mode, + ) + ) + 
profile.edge_specs = new_specs + + dpv = profile.default_property_values + if dpv is None: + return + + new_vertices: dict[str, dict[str, Any]] = {} + for k, props in dpv.vertices.items(): + nk = vertex_renames.get(k, k) + if nk in new_vertices and new_vertices[nk] != props: + raise ValueError( + f"Conflicting default_property_values.vertices for logical {nk!r}" + ) + new_vertices[nk] = dict(props) + object.__setattr__(dpv, "vertices", new_vertices) + + new_edges: list[EdgePropertyDefaults] = [] + for e in dpv.edges: + new_edges.append( + EdgePropertyDefaults( + source=vertex_renames.get(e.source, e.source), + target=vertex_renames.get(e.target, e.target), + relation=e.relation, + values=dict(e.values), + ) + ) + object.__setattr__(dpv, "edges", new_edges) + + def apply_vertex_merge_to_db_profile( profile: DatabaseProfile, from_vertices: set[str], @@ -86,12 +152,12 @@ def apply_vertex_merge_to_db_profile( new_vi.setdefault(nk, []).extend(list(vlist)) profile.vertex_indexes = new_vi - new_specs: list[EdgeVariantSpec] = [] + new_specs: list[EdgePhysicalSpec] = [] for s in profile.edge_specs: src = m.get(s.source, s.source) tgt = m.get(s.target, s.target) new_specs.append( - EdgeVariantSpec( + EdgePhysicalSpec( source=src, target=tgt, relation=s.relation, @@ -243,7 +309,7 @@ def apply_field_rename_to_db_profile( ) profile.vertex_indexes = new_vertex_indexes - new_specs: list[EdgeVariantSpec] = [] + new_specs: list[EdgePhysicalSpec] = [] for spec in profile.edge_specs: if edge_vertex_lookup is not None: source_name, target_name = edge_vertex_lookup.get( @@ -255,7 +321,7 @@ def apply_field_rename_to_db_profile( merged.update(renames.get(source_name) or {}) merged.update(renames.get(target_name) or {}) new_specs.append( - EdgeVariantSpec( + EdgePhysicalSpec( source=spec.source, target=spec.target, relation=spec.relation, @@ -287,7 +353,7 @@ def apply_relation_rename_to_db_profile( """Rename logical edge relation keys in edge specs/default edge values.""" if not 
relation_renames: return - new_specs: list[EdgeVariantSpec] = [] + new_specs: list[EdgePhysicalSpec] = [] for spec in profile.edge_specs: new_relation = ( relation_renames.get(spec.relation, spec.relation) @@ -345,9 +411,9 @@ def apply_relation_removal_to_db_profile( def merge_relation_entries_in_db_profile(profile: DatabaseProfile) -> None: """Merge duplicate edge-spec/default entries created by relation remaps.""" - merged_specs: dict[tuple[str, str, str | None, str | None], EdgeVariantSpec] = {} + merged_specs: dict[EdgePhysicalKey, EdgePhysicalSpec] = {} for spec in profile.edge_specs: - key = (spec.source, spec.target, spec.relation, spec.purpose) + key = spec.physical_key current = merged_specs.get(key) if current is None: merged_specs[key] = spec @@ -373,17 +439,16 @@ def merge_relation_entries_in_db_profile(profile: DatabaseProfile) -> None: return merged_defaults: dict[EdgeId, EdgePropertyDefaults] = {} for edge in dpv.edges: - edge_id: EdgeId = (edge.source, edge.target, edge.relation) - current = merged_defaults.get(edge_id) + current = merged_defaults.get(edge.edge_id) if current is None: - merged_defaults[edge_id] = edge + merged_defaults[edge.edge_id] = edge continue merged_values = _merge_vertex_default_maps( dict(current.values), dict(edge.values), label="default_property_values.edges", ) - merged_defaults[edge_id] = current.model_copy( + merged_defaults[edge.edge_id] = current.model_copy( update={"values": merged_values}, deep=True ) object.__setattr__(dpv, "edges", list(merged_defaults.values())) @@ -430,7 +495,7 @@ def apply_edge_property_removal_to_db_profile( """Remove edge property references from edge indexes/default values.""" if not removals_by_relation: return - updated_specs: list[EdgeVariantSpec] = [] + updated_specs: list[EdgePhysicalSpec] = [] for spec in profile.edge_specs: removals = ( removals_by_relation.get(spec.relation, set()) diff --git a/graflo/architecture/graph_types.py b/graflo/architecture/graph_types.py index 
fec2f389..36e7bfef 100644 --- a/graflo/architecture/graph_types.py +++ b/graflo/architecture/graph_types.py @@ -66,9 +66,14 @@ def make_hashable(obj: object) -> object: return list(seen.values()) +# Identifier aliases (kept here to avoid importing schema, which creates cycles). +VertexName: TypeAlias = str + # Edge identifier layers: # - EdgeId: schema-level edge definition key (source, target, relation) EdgeId: TypeAlias = tuple[str, str, str | None] +# - EdgePhysicalKey: physical edge specification key (source, target, relation, purpose) +EdgePhysicalKey: TypeAlias = tuple[str, str, str | None, str | None] GraphEntity: TypeAlias = str | EdgeId logger = logging.getLogger(__name__) @@ -257,7 +262,7 @@ class GraphContainer(ConfigBaseModel): linear: List of default dictionaries containing linear data """ - vertices: dict[str, list] = Field(default_factory=dict) + vertices: dict[VertexName, list] = Field(default_factory=dict) edges: dict[tuple[str, str, str | None], list] = Field(default_factory=dict) linear: list[defaultdict[str | tuple[str, str, str | None], list[Any]]] = Field( default_factory=list diff --git a/graflo/architecture/pipeline/runtime/actor/base.py b/graflo/architecture/pipeline/runtime/actor/base.py index 7eeb4b8c..af1b9086 100644 --- a/graflo/architecture/pipeline/runtime/actor/base.py +++ b/graflo/architecture/pipeline/runtime/actor/base.py @@ -11,7 +11,7 @@ from graflo.architecture.schema.edge import EdgeConfig from graflo.architecture.graph_types import EdgeId, ExtractionContext, LocationIndex from graflo.architecture.contract.ingestion.transform import ProtoTransform -from graflo.architecture.schema.vertex import VertexConfig +from graflo.architecture.schema.vertex import VertexConfig, VertexName from graflo.onto import DBType @@ -32,7 +32,7 @@ class ActorInitContext: edge_derivation: EdgeDerivationRegistry = field( default_factory=EdgeDerivationRegistry ) - allowed_vertex_names: set[str] | None = None + allowed_vertex_names: set[VertexName] | 
None = None infer_edges: bool = True infer_edge_only: set[EdgeId] = field(default_factory=set) infer_edge_except: set[EdgeId] = field(default_factory=set) @@ -68,7 +68,7 @@ def count(self) -> int: """Get the count of items processed by this actor.""" return 1 - def references_vertices(self) -> set[str]: + def references_vertices(self) -> set[VertexName]: """Return vertex names this actor references.""" return set() @@ -101,3 +101,14 @@ def __str__(self) -> str: def fetch_actors(self, level: int, edges: list) -> tuple[int, type, str, list]: """Fetch actor information for tree representation.""" return level, type(self), str(self), edges + + +class VertexProducingActor(Actor, ABC): + """Abstract base for actors that produce vertex observations.""" + + vertex_config: VertexConfig + + @abstractmethod + def references_vertices(self) -> set[VertexName]: + """Return vertex names this actor can emit.""" + raise NotImplementedError diff --git a/graflo/architecture/pipeline/runtime/actor/config/models.py b/graflo/architecture/pipeline/runtime/actor/config/models.py index a60c3eb7..36ef2c03 100644 --- a/graflo/architecture/pipeline/runtime/actor/config/models.py +++ b/graflo/architecture/pipeline/runtime/actor/config/models.py @@ -9,6 +9,7 @@ from graflo.architecture.base import ConfigBaseModel from graflo.architecture.contract.ingestion.transform import DressConfig from graflo.architecture.edge_derivation import EdgeDerivation +from graflo.architecture.schema.vertex import VertexName from .normalize import normalize_actor_step @@ -49,7 +50,9 @@ class VertexActorConfig(VertexExtractionOptionsConfig): type: Literal["vertex"] = PydanticField( default="vertex", description="Actor type discriminator" ) - vertex: str = PydanticField(..., description="Name of the vertex type to create") + vertex: VertexName = PydanticField( + ..., description="Name of the vertex type to create" + ) @model_validator(mode="before") @classmethod diff --git 
a/graflo/architecture/pipeline/runtime/actor/edge.py b/graflo/architecture/pipeline/runtime/actor/edge.py index 7b61fa8d..eb521d2c 100644 --- a/graflo/architecture/pipeline/runtime/actor/edge.py +++ b/graflo/architecture/pipeline/runtime/actor/edge.py @@ -15,7 +15,7 @@ Weight, merge_observation_with_transform_buffer, ) -from graflo.architecture.schema.vertex import VertexConfig +from graflo.architecture.schema.vertex import VertexConfig, VertexName logger = logging.getLogger(__name__) @@ -87,7 +87,7 @@ def __init__(self, config: EdgeActorConfig): self.edge: Edge | None = None self.vertex_config: VertexConfig | None = None self.edge_config: EdgeConfig | None = None - self.allowed_vertex_names: set[str] | None = None + self.allowed_vertex_names: set[VertexName] | None = None return self._link_actors = [] @@ -134,7 +134,7 @@ def __init__(self, config: EdgeActorConfig): self.vertex_config: VertexConfig | None = None self.edge_config: EdgeConfig | None = None - self.allowed_vertex_names: set[str] | None = None + self.allowed_vertex_names: set[VertexName] | None = None @property def relation_field(self) -> str | None: @@ -380,7 +380,7 @@ def _call_dynamic( ctx.record_edge_intent(edge=edge, location=lindex, derivation=derivation) return ctx - def references_vertices(self) -> set[str]: + def references_vertices(self) -> set[VertexName]: if self._link_actors: result: set[str] = set() for la in self._link_actors: diff --git a/graflo/architecture/pipeline/runtime/actor/vertex.py b/graflo/architecture/pipeline/runtime/actor/vertex.py index ce9e8844..90210bf3 100644 --- a/graflo/architecture/pipeline/runtime/actor/vertex.py +++ b/graflo/architecture/pipeline/runtime/actor/vertex.py @@ -4,7 +4,7 @@ from typing import Any, Literal -from .base import Actor, ActorConstants, ActorInitContext +from .base import ActorConstants, ActorInitContext, VertexProducingActor from .config import VertexActorConfig from graflo.architecture.graph_types import ( ExtractionContext, @@ -13,12 +13,12 
@@ VertexRep, merge_observation_with_transform_buffer, ) -from graflo.architecture.schema.vertex import VertexConfig +from graflo.architecture.schema.vertex import VertexConfig, VertexName from graflo.onto import ExpressionFlavor from graflo.util.merge import merge_doc_basis -class VertexActor(Actor): +class VertexActor(VertexProducingActor): """Actor for processing vertex data.""" def __init__(self, config: VertexActorConfig): @@ -30,7 +30,7 @@ def __init__(self, config: VertexActorConfig): self.extraction_scope: Literal["full", "mapped_only"] = config.extraction_scope self.role: str | None = config.role self.vertex_config: VertexConfig - self.allowed_vertex_names: set[str] | None = None + self.allowed_vertex_names: set[VertexName] | None = None @classmethod def from_config(cls, config: VertexActorConfig) -> VertexActor: @@ -224,5 +224,5 @@ def __call__( ) return ctx - def references_vertices(self) -> set[str]: + def references_vertices(self) -> set[VertexName]: return {self.name} diff --git a/graflo/architecture/pipeline/runtime/actor/vertex_router.py b/graflo/architecture/pipeline/runtime/actor/vertex_router.py index 28702231..4494b32a 100644 --- a/graflo/architecture/pipeline/runtime/actor/vertex_router.py +++ b/graflo/architecture/pipeline/runtime/actor/vertex_router.py @@ -5,7 +5,7 @@ import logging from typing import TYPE_CHECKING, Any, Literal -from .base import Actor, ActorInitContext +from .base import ActorInitContext, VertexProducingActor from .config import ( VertexActorConfig, VertexRouterActorConfig, @@ -15,7 +15,7 @@ LocationIndex, merge_observation_with_transform_buffer, ) -from graflo.architecture.schema.vertex import VertexConfig +from graflo.architecture.schema.vertex import VertexConfig, VertexName if TYPE_CHECKING: from .wrapper import ActorWrapper @@ -23,7 +23,7 @@ logger = logging.getLogger(__name__) -class VertexRouterActor(Actor): +class VertexRouterActor(VertexProducingActor): """Routes documents to the correct VertexActor based on a type 
field. The merged observation (document + same-location transform buffer) is passed @@ -112,6 +112,9 @@ def _get_or_create_wrapper(self, vertex_type: str) -> "ActorWrapper | None": def count(self) -> int: return 1 + sum(w.count() for w in self._vertex_actors.values()) + def references_vertices(self) -> set[VertexName]: + return set(self._vertex_actors.keys()) + def __call__( self, ctx: ExtractionContext, lindex: LocationIndex, *nargs: Any, **kwargs: Any ) -> ExtractionContext: diff --git a/graflo/architecture/schema/document.py b/graflo/architecture/schema/document.py index 097d9ed9..b5a57dc8 100644 --- a/graflo/architecture/schema/document.py +++ b/graflo/architecture/schema/document.py @@ -46,6 +46,7 @@ def _init_schema(self) -> Schema: def finish_init(self) -> None: self.core_schema.finish_init() + self.db_profile.validate_against_schema(self.core_schema.edge_config) def remove_disconnected_vertices(self) -> set[str]: return self.core_schema.remove_disconnected_vertices() diff --git a/graflo/architecture/schema/edge.py b/graflo/architecture/schema/edge.py index aff939c5..575afc55 100644 --- a/graflo/architecture/schema/edge.py +++ b/graflo/architecture/schema/edge.py @@ -32,7 +32,7 @@ EdgeId, EdgeType, ) -from graflo.architecture.schema.vertex import Field, VertexConfig +from graflo.architecture.schema.vertex import Field, VertexConfig, VertexName # Default relation name for TigerGraph edges when relation is not specified @@ -68,11 +68,11 @@ class Edge(ConfigBaseModel): in pipeline edge steps, not on this model. """ - source: str = PydanticField( + source: VertexName = PydanticField( ..., description="Source vertex type name (e.g. user, company).", ) - target: str = PydanticField( + target: VertexName = PydanticField( ..., description="Target vertex type name (e.g. 
post, company).", ) diff --git a/graflo/architecture/schema/vertex.py b/graflo/architecture/schema/vertex.py index 5d6a72f4..2ef2c3f7 100644 --- a/graflo/architecture/schema/vertex.py +++ b/graflo/architecture/schema/vertex.py @@ -20,7 +20,7 @@ import ast import json import logging -from typing import Any +from typing import Any, TypeAlias from pydantic import ( ConfigDict, @@ -38,6 +38,7 @@ # Type accepted for vertex properties before normalization (for use by Edge/WeightConfig) PropertiesInputType = list[str] | list["Field"] | list[dict[str, Any]] +VertexName: TypeAlias = str class FieldType(BaseEnum): @@ -82,7 +83,7 @@ class Field(ConfigBaseModel): model_config = ConfigDict(extra="forbid") - name: str = PydanticField( + name: VertexName = PydanticField( ..., description="Name of the field (e.g. column or attribute name).", ) @@ -403,8 +404,10 @@ class VertexConfig(ConfigBaseModel): "When false, explicit identity is required except for blank vertices." ), ) - _vertices_map: dict[str, Vertex] | None = PrivateAttr(default=None) - _vertex_numeric_fields_map: dict[str, object] | None = PrivateAttr(default=None) + _vertices_map: dict[VertexName, Vertex] | None = PrivateAttr(default=None) + _vertex_numeric_fields_map: dict[VertexName, object] | None = PrivateAttr( + default=None + ) @model_validator(mode="after") def build_vertices_map(self) -> "VertexConfig": @@ -441,7 +444,7 @@ def _normalize_vertex_identities( for field_name in missing: vertex.properties.append(Field(name=field_name, type=None)) - def _get_vertices_map(self) -> dict[str, Vertex]: + def _get_vertices_map(self) -> dict[VertexName, Vertex]: """Return the vertices map (set by model validator).""" if self._vertices_map is None: raise RuntimeError("VertexConfig not fully initialized") @@ -465,7 +468,7 @@ def vertex_list(self): """ return list(self._get_vertices_map().values()) - def _get_vertex_by_name(self, identifier: str) -> Vertex: + def _get_vertex_by_name(self, identifier: VertexName) -> Vertex: 
"""Get vertex by logical vertex name.""" m = self._get_vertices_map() if identifier in m: @@ -476,11 +479,11 @@ def _get_vertex_by_name(self, identifier: str) -> Vertex: f"Available names: {available_names}" ) - def identity_fields(self, vertex_name: str) -> list[str]: + def identity_fields(self, vertex_name: VertexName) -> list[str]: """Get identity fields for a vertex.""" return list(self._get_vertices_map()[vertex_name].identity) - def properties(self, vertex_name: str) -> list[Field]: + def properties(self, vertex_name: VertexName) -> list[Field]: """Vertex properties as Field objects.""" vertex = self._get_vertex_by_name(vertex_name) @@ -489,7 +492,7 @@ def properties(self, vertex_name: str) -> list[Field]: def property_names( self, - vertex_name: str, + vertex_name: VertexName, ) -> list[str]: """Vertex property names as strings.""" diff --git a/graflo/rdf/__init__.py b/graflo/rdf/__init__.py new file mode 100644 index 00000000..b1f3112e --- /dev/null +++ b/graflo/rdf/__init__.py @@ -0,0 +1,11 @@ +"""GraFlo RDF bridge: manifest serialization and deserialization.""" + +from __future__ import annotations + +from graflo.rdf.deserializer import ManifestRdfDeserializer +from graflo.rdf.serializer import ManifestRdfSerializer + +__all__ = [ + "ManifestRdfDeserializer", + "ManifestRdfSerializer", +] diff --git a/graflo/rdf/cli.py b/graflo/rdf/cli.py new file mode 100644 index 00000000..8cf358d3 --- /dev/null +++ b/graflo/rdf/cli.py @@ -0,0 +1,125 @@ +"""CLI commands for GraphManifest RDF conversion.""" + +from __future__ import annotations + +import pathlib + +import click +import yaml + +from graflo import GraphManifest +from graflo.rdf.deserializer import ManifestRdfDeserializer +from graflo.rdf.serializer import ManifestRdfSerializer + + +def _load_manifest(path: pathlib.Path) -> GraphManifest: + with path.open(encoding="utf-8") as handle: + data = yaml.safe_load(handle) + return GraphManifest.from_dict(data) + + +def _write_manifest(path: pathlib.Path, manifest: 
GraphManifest) -> None: + payload = manifest.to_minimal_canonical_dict() + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text( + yaml.safe_dump(payload, default_flow_style=False, sort_keys=False), + encoding="utf-8", + ) + + +@click.command("manifest-to-rdf") +@click.argument( + "manifest_path", + type=click.Path(exists=True, dir_okay=False, path_type=pathlib.Path), +) +@click.option( + "--format", + "output_format", + type=click.Choice(["turtle", "json-ld", "nt", "xml"], case_sensitive=False), + default="turtle", + show_default=True, + help="RDF output format.", +) +@click.option( + "--base-uri", + required=True, + help="Base URI for manifest resource IRIs (e.g. https://mygraph.dev/manifests/v1).", +) +@click.option( + "--output", + "output_path", + type=click.Path(dir_okay=False, path_type=pathlib.Path), + default=None, + help="Output file path. Prints to stdout when omitted.", +) +@click.option( + "--include-ontology/--no-include-ontology", + default=True, + show_default=True, + help="Embed GraFlo meta-ontology triples in output.", +) +def manifest_to_rdf( + manifest_path: pathlib.Path, + output_format: str, + base_uri: str, + output_path: pathlib.Path | None, + include_ontology: bool, +) -> None: + """Convert a GraphManifest YAML file to RDF.""" + manifest = _load_manifest(manifest_path) + serializer = ManifestRdfSerializer(include_ontology=include_ontology) + graph = serializer.to_graph(manifest, base_uri) + serialized = graph.serialize(format=output_format.lower()) + + if output_path is None: + click.echo(serialized) + return + + output_path.parent.mkdir(parents=True, exist_ok=True) + output_path.write_text(serialized, encoding="utf-8") + + +@click.command("rdf-to-manifest") +@click.argument( + "rdf_path", + type=click.Path(exists=True, dir_okay=False, path_type=pathlib.Path), +) +@click.option( + "--manifest-uri", + required=True, + help="GraphManifest subject URI inside the RDF document.", +) +@click.option( + "--input-format", + 
type=click.Choice(["turtle", "json-ld", "nt", "xml", "n3"], case_sensitive=False), + default="turtle", + show_default=True, + help="RDF input format.", +) +@click.option( + "--output", + "output_path", + type=click.Path(dir_okay=False, path_type=pathlib.Path), + default=None, + help="Output manifest YAML path. Prints to stdout when omitted.", +) +def rdf_to_manifest( + rdf_path: pathlib.Path, + manifest_uri: str, + input_format: str, + output_path: pathlib.Path | None, +) -> None: + """Convert RDF (GraFlo meta-ontology) back to GraphManifest YAML.""" + from rdflib import Graph + + graph = Graph() + graph.parse(str(rdf_path), format=input_format.lower()) + manifest = ManifestRdfDeserializer().from_graph(graph, manifest_uri) + + if output_path is None: + click.echo( + yaml.safe_dump(manifest.to_minimal_canonical_dict(), sort_keys=False) + ) + return + + _write_manifest(output_path, manifest) diff --git a/graflo/rdf/deserializer.py b/graflo/rdf/deserializer.py new file mode 100644 index 00000000..59a5fd4d --- /dev/null +++ b/graflo/rdf/deserializer.py @@ -0,0 +1,644 @@ +"""Deserialize RDF graphs into GraphManifest instances.""" + +from __future__ import annotations + +from typing import Any + +from rdflib import BNode, Graph, Literal, URIRef +from rdflib.namespace import RDF + +from graflo.architecture.contract.manifest import GraphManifest +from graflo.rdf import namespace as ns +from graflo.rdf.utils import parse_json_literal, reverse_enum + + +class ManifestRdfDeserializer: + """Reconstruct a :class:`GraphManifest` from RDF using the GraFlo meta-ontology.""" + + def from_turtle(self, ttl: str, manifest_uri: str) -> GraphManifest: + """Load Turtle and deserialize.""" + graph = Graph() + graph.parse(data=ttl, format="turtle") + return self.from_graph(graph, manifest_uri) + + def from_graph(self, graph: Graph, manifest_uri: str) -> GraphManifest: + """Deserialize manifest from an rdflib graph.""" + manifest_ref = URIRef(manifest_uri.rstrip("/")) + payload: dict[str, 
Any] = {} + + schema_uri = self._object(graph, manifest_ref, ns.hasSchema) + if schema_uri is not None: + payload["schema"] = self._parse_schema(graph, schema_uri) + + ingestion_uri = self._object(graph, manifest_ref, ns.hasIngestionModel) + if ingestion_uri is not None: + payload["ingestion_model"] = self._parse_ingestion_model( + graph, ingestion_uri + ) + + bindings_uri = self._object(graph, manifest_ref, ns.hasBindings) + if bindings_uri is not None: + payload["bindings"] = self._parse_bindings(graph, bindings_uri) + + return GraphManifest.from_dict(payload) + + def _parse_schema(self, graph: Graph, schema_uri: URIRef | BNode) -> dict[str, Any]: + metadata_uri = self._object(graph, schema_uri, ns.hasMetadata) + core_uri = self._object(graph, schema_uri, ns.hasCoreSchema) + profile_uri = self._object(graph, schema_uri, ns.hasDatabaseProfile) + + schema: dict[str, Any] = {} + if metadata_uri is not None: + schema["metadata"] = { + "name": self._literal(graph, metadata_uri, ns.name), + "version": self._literal(graph, metadata_uri, ns.version), + "description": self._literal(graph, metadata_uri, ns.description), + } + schema["metadata"] = { + key: value + for key, value in schema["metadata"].items() + if value is not None + } + + if core_uri is not None: + schema["core_schema"] = self._parse_core_schema(graph, core_uri) + + if profile_uri is not None: + schema["db_profile"] = self._parse_database_profile(graph, profile_uri) + + return schema + + def _parse_core_schema( + self, graph: Graph, core_uri: URIRef | BNode + ) -> dict[str, Any]: + vertex_config_uri = self._object(graph, core_uri, ns.hasVertexConfig) + edge_config_uri = self._object(graph, core_uri, ns.hasEdgeConfig) + vertices = self._ordered_nodes( + graph, + vertex_config_uri, + ns.hasVertex, + self._parse_vertex, + ) + edges = self._ordered_nodes( + graph, + edge_config_uri, + ns.hasEdge, + self._parse_edge, + ) + return { + "vertex_config": self._parse_vertex_config( + graph, vertex_config_uri, 
vertices + ), + "edge_config": {"edges": edges}, + } + + def _parse_vertex_config( + self, + graph: Graph, + vertex_config_uri: URIRef | BNode | None, + vertices: list[dict[str, Any]], + ) -> dict[str, Any]: + vertex_config: dict[str, Any] = {"vertices": vertices} + force_types = parse_json_literal( + self._literal(graph, vertex_config_uri, ns.forceTypes) + ) + if isinstance(force_types, dict): + vertex_config["force_types"] = force_types + + identity_from_all_properties = self._literal( + graph, vertex_config_uri, ns.identityFromAllProperties + ) + if identity_from_all_properties is not None: + vertex_config["identity_from_all_properties"] = ( + identity_from_all_properties.lower() == "true" + ) + return vertex_config + + def _parse_vertex(self, graph: Graph, vertex_uri: URIRef | BNode) -> dict[str, Any]: + identities = [] + for identity_node in self._related_nodes(graph, vertex_uri, ns.hasIdentity): + identity = self._literal(graph, identity_node, ns.identityName) + if identity is not None: + identities.append(identity) + vertex: dict[str, Any] = { + "name": self._literal(graph, vertex_uri, ns.name), + "identity": identities, + "properties": self._ordered_nodes( + graph, + vertex_uri, + ns.hasField, + self._parse_field, + ), + } + description = self._literal(graph, vertex_uri, ns.description) + if description is not None: + vertex["description"] = description + blank_value = self._literal(graph, vertex_uri, ns.blank) + if blank_value is not None: + vertex["blank"] = blank_value.lower() == "true" + + payload = parse_json_literal(self._literal(graph, vertex_uri, ns.vertexPayload)) + if isinstance(payload, dict): + vertex.update(payload) + + return vertex + + def _parse_field( + self, graph: Graph, field_uri: URIRef | BNode + ) -> dict[str, Any] | str: + name = self._literal(graph, field_uri, ns.name) + if name is None: + return {} + field_type_uri = self._object(graph, field_uri, ns.fieldType) + description = self._literal(graph, field_uri, ns.description) + if 
field_type_uri is None and description is None: + return name + field: dict[str, Any] = {"name": name} + if field_type_uri is not None: + field_type = reverse_enum(ns.ENUM_REGISTRIES["field_type"], field_type_uri) + if field_type is not None: + field["type"] = field_type + if description is not None: + field["description"] = description + return field + + def _parse_edge(self, graph: Graph, edge_uri: URIRef | BNode) -> dict[str, Any]: + source_uri = self._object(graph, edge_uri, ns.edgeSource) + target_uri = self._object(graph, edge_uri, ns.edgeTarget) + edge: dict[str, Any] = { + "source": self._literal(graph, source_uri, ns.name) if source_uri else None, + "target": self._literal(graph, target_uri, ns.name) if target_uri else None, + } + relation = self._literal(graph, edge_uri, ns.relation) + if relation is not None: + edge["relation"] = relation + description = self._literal(graph, edge_uri, ns.description) + if description is not None: + edge["description"] = description + + payload = parse_json_literal(self._literal(graph, edge_uri, ns.edgePayload)) + if isinstance(payload, dict): + edge.update(payload) + identities = parse_json_literal( + self._literal(graph, edge_uri, ns.edgeIdentities) + ) + if isinstance(identities, list): + edge["identities"] = identities + edge_type = self._literal(graph, edge_uri, ns.edgeType) + if edge_type is not None: + edge["type"] = edge_type + edge_by = self._literal(graph, edge_uri, ns.edgeBy) + if edge_by is not None: + edge["by"] = edge_by + + properties = self._ordered_nodes( + graph, + edge_uri, + ns.hasField, + self._parse_field, + ) + if properties: + edge["properties"] = properties + + return edge + + def _parse_database_profile( + self, graph: Graph, profile_uri: URIRef | BNode + ) -> dict[str, Any]: + profile: dict[str, Any] = {} + db_flavor_uri = self._object(graph, profile_uri, ns.dbFlavor) + if db_flavor_uri is not None: + db_flavor = reverse_enum(ns.ENUM_REGISTRIES["db_type"], db_flavor_uri) + if db_flavor is not 
None: + profile["db_flavor"] = db_flavor + target_namespace = self._literal(graph, profile_uri, ns.targetNamespace) + if target_namespace is not None: + profile["target_namespace"] = target_namespace + self._parse_profile_indexes(graph, profile_uri, profile) + + payload = parse_json_literal( + self._literal(graph, profile_uri, ns.profilePayload) + ) + if isinstance(payload, dict): + profile.update(payload) + return profile + + def _parse_profile_indexes( + self, + graph: Graph, + profile_uri: URIRef | BNode, + profile: dict[str, Any], + ) -> None: + vertex_indexes: dict[str, list[dict[str, Any]]] = {} + for index_node in self._related_nodes(graph, profile_uri, ns.hasVertexIndex): + index_payload = self._parse_index(graph, index_node) + vertex_name = self._literal(graph, index_node, ns.profileVertexName) + if vertex_name is None: + continue + vertex_indexes.setdefault(vertex_name, []).append(index_payload) + if vertex_indexes: + profile["vertex_indexes"] = vertex_indexes + + edge_specs: list[dict[str, Any]] = [] + for spec_node in self._related_nodes(graph, profile_uri, ns.hasEdgeSpec): + spec_payload: dict[str, Any] = { + "source": self._literal(graph, spec_node, ns.specSource), + "target": self._literal(graph, spec_node, ns.specTarget), + } + relation = self._literal(graph, spec_node, ns.specRelation) + if relation is not None: + spec_payload["relation"] = relation + purpose = self._literal(graph, spec_node, ns.specPurpose) + if purpose is not None: + spec_payload["purpose"] = purpose + relation_name = self._literal(graph, spec_node, ns.specRelationName) + if relation_name is not None: + spec_payload["relation_name"] = relation_name + indexes_mode = self._literal(graph, spec_node, ns.specIndexesMode) + if indexes_mode is not None: + spec_payload["indexes_mode"] = indexes_mode + indexes = [ + self._parse_index(graph, index_node) + for index_node in self._related_nodes(graph, spec_node, ns.hasIndex) + ] + if indexes: + spec_payload["indexes"] = indexes + 
edge_specs.append(spec_payload) + if edge_specs: + profile["edge_specs"] = edge_specs + + # Backward compatibility with legacy JSON payload predicates. + if "vertex_indexes" not in profile: + legacy_vertex_indexes = parse_json_literal( + self._literal(graph, profile_uri, ns.vertexIndexes) + ) + if isinstance(legacy_vertex_indexes, dict): + profile["vertex_indexes"] = legacy_vertex_indexes + if "edge_specs" not in profile: + legacy_edge_specs = parse_json_literal( + self._literal(graph, profile_uri, ns.edgeSpecs) + ) + if isinstance(legacy_edge_specs, list): + profile["edge_specs"] = legacy_edge_specs + + def _parse_index(self, graph: Graph, index_node: URIRef | BNode) -> dict[str, Any]: + index: dict[str, Any] = { + "fields": self._literals(graph, index_node, ns.indexField) + } + name = self._literal(graph, index_node, ns.indexName) + if name is not None: + index["name"] = name + unique = self._literal(graph, index_node, ns.indexUnique) + if unique is not None: + index["unique"] = unique.lower() == "true" + index_type = self._literal(graph, index_node, ns.indexType) + if index_type is not None: + index["type"] = index_type + deduplicate = self._literal(graph, index_node, ns.indexDeduplicate) + if deduplicate is not None: + index["deduplicate"] = deduplicate.lower() == "true" + sparse = self._literal(graph, index_node, ns.indexSparse) + if sparse is not None: + index["sparse"] = sparse.lower() == "true" + exclude_edge_endpoints = self._literal( + graph, index_node, ns.indexExcludeEdgeEndpoints + ) + if exclude_edge_endpoints is not None: + index["exclude_edge_endpoints"] = exclude_edge_endpoints.lower() == "true" + return index + + def _parse_ingestion_model( + self, graph: Graph, ingestion_uri: URIRef | BNode + ) -> dict[str, Any]: + model: dict[str, Any] = {} + duplicate_uri = self._object(graph, ingestion_uri, ns.edgesOnDuplicate) + if duplicate_uri is not None: + duplicate = reverse_enum( + ns.ENUM_REGISTRIES["edge_duplicate_policy"], duplicate_uri + ) + if 
duplicate is not None: + model["edges_on_duplicate"] = duplicate + + transforms = self._ordered_nodes( + graph, + ingestion_uri, + ns.hasTransform, + self._parse_proto_transform, + ) + if transforms: + model["transforms"] = transforms + + resources = self._ordered_nodes( + graph, + ingestion_uri, + ns.hasResource, + self._parse_resource, + ) + if resources: + model["resources"] = resources + return model + + def _parse_proto_transform( + self, graph: Graph, transform_uri: URIRef | BNode + ) -> dict[str, Any]: + transform: dict[str, Any] = { + "name": self._literal(graph, transform_uri, ns.name), + "module": self._literal(graph, transform_uri, ns.transformModule), + "foo": self._literal(graph, transform_uri, ns.transformFunction), + "input": self._literals(graph, transform_uri, ns.transformInput), + "output": self._literals(graph, transform_uri, ns.transformOutput), + } + + params = parse_json_literal( + self._literal(graph, transform_uri, ns.transformParams) + ) + if isinstance(params, dict): + if "params" in params: + transform["params"] = params["params"] + if params.get("input_groups") is not None: + transform["input_groups"] = params["input_groups"] + if params.get("output_groups") is not None: + transform["output_groups"] = params["output_groups"] + if ( + "params" not in params + and "input_groups" not in params + and "output_groups" not in params + ): + transform["params"] = params + + target_uri = self._object(graph, transform_uri, ns.transformTarget) + if target_uri is not None: + target = reverse_enum(ns.ENUM_REGISTRIES["transform_target"], target_uri) + if target is not None: + transform["target"] = target + + dress_uri = self._object(graph, transform_uri, ns.hasDress) + if dress_uri is not None: + transform["dress"] = { + "key": self._literal(graph, dress_uri, ns.dressKey), + "value": self._literal(graph, dress_uri, ns.dressValue), + } + + keys_uri = self._object(graph, transform_uri, ns.hasKeySelection) + if keys_uri is not None: + mode_uri = 
self._object(graph, keys_uri, ns.keySelectionMode) + mode = ( + reverse_enum(ns.ENUM_REGISTRIES["key_selection_mode"], mode_uri) + if mode_uri + else "all" + ) + transform["keys"] = { + "mode": mode or "all", + "names": self._literals(graph, keys_uri, ns.keySelectionName), + } + + return { + key: value + for key, value in transform.items() + if value not in (None, [], {}) + } + + def _parse_resource( + self, graph: Graph, resource_uri: URIRef | BNode + ) -> dict[str, Any]: + resource: dict[str, Any] = {"name": self._literal(graph, resource_uri, ns.name)} + + payload = parse_json_literal( + self._literal(graph, resource_uri, ns.resourcePayload) + ) + if isinstance(payload, dict): + resource.update(payload) + + steps = self._parse_pipeline_steps(graph, resource_uri) + if steps: + resource["pipeline"] = steps + + infer_only = [ + self._parse_edge_infer_spec(graph, spec_node) + for spec_node in self._related_nodes( + graph, resource_uri, ns.hasEdgeInferOnly + ) + ] + if infer_only: + resource["infer_edge_only"] = infer_only + + infer_except = [ + self._parse_edge_infer_spec(graph, spec_node) + for spec_node in self._related_nodes( + graph, resource_uri, ns.hasEdgeInferExcept + ) + ] + if infer_except: + resource["infer_edge_except"] = infer_except + + return resource + + def _parse_pipeline_steps( + self, + graph: Graph, + resource_uri: URIRef | BNode, + ) -> list[dict[str, Any]]: + step_nodes = self._related_nodes(graph, resource_uri, ns.hasActor) + indexed_steps: list[tuple[int, dict[str, Any]]] = [] + for step_node in step_nodes: + index_literal = self._literal(graph, step_node, ns.stepIndex) + index = int(index_literal) if index_literal is not None else 0 + payload = self._parse_actor_step(graph, step_node) + if payload: + indexed_steps.append((index, payload)) + indexed_steps.sort(key=lambda item: item[0]) + return [step for _, step in indexed_steps] + + def _parse_actor_step( + self, graph: Graph, step_node: URIRef | BNode + ) -> dict[str, Any]: + payload = 
parse_json_literal(self._literal(graph, step_node, ns.stepPayload)) + if not isinstance(payload, dict): + return {} + + actor_type = self._literal(graph, step_node, ns.actorType) + if actor_type == "descend": + nested_nodes = self._related_nodes(graph, step_node, ns.hasActor) + nested_indexed: list[tuple[int, dict[str, Any]]] = [] + for nested_node in nested_nodes: + nested_index_literal = self._literal(graph, nested_node, ns.stepIndex) + nested_index = ( + int(nested_index_literal) if nested_index_literal is not None else 0 + ) + nested_payload = self._parse_actor_step(graph, nested_node) + if nested_payload: + nested_indexed.append((nested_index, nested_payload)) + nested_indexed.sort(key=lambda item: item[0]) + payload["pipeline"] = [item for _, item in nested_indexed] + return payload + + def _parse_edge_infer_spec( + self, + graph: Graph, + spec_node: URIRef | BNode, + ) -> dict[str, Any]: + payload = parse_json_literal(self._literal(graph, spec_node, ns.stepPayload)) + if isinstance(payload, dict): + return payload + return {} + + def _parse_bindings( + self, graph: Graph, bindings_uri: URIRef | BNode + ) -> dict[str, Any]: + bindings: dict[str, Any] = {} + connectors = self._ordered_nodes( + graph, + bindings_uri, + ns.hasConnector, + self._parse_connector, + ) + if connectors: + bindings["connectors"] = connectors + + resource_connector = [] + for binding_node in self._related_nodes( + graph, bindings_uri, ns.bindsResourceToConnector + ): + resource_connector.append( + { + "resource": self._literal(graph, binding_node, ns.resourceName), + "connector": self._literal(graph, binding_node, ns.connectorName), + } + ) + if resource_connector: + bindings["resource_connector"] = resource_connector + + connector_connection = [] + for binding_node in self._related_nodes( + graph, bindings_uri, ns.bindsConnectorToConnProxy + ): + connector_connection.append( + { + "connector": self._literal(graph, binding_node, ns.connectorName), + "conn_proxy": self._literal(graph, 
binding_node, ns.connProxy), + } + ) + if connector_connection: + bindings["connector_connection"] = connector_connection + + staging_proxy = [] + for binding_node in self._related_nodes( + graph, bindings_uri, ns.hasStagingProxy + ): + staging_proxy.append( + { + "name": self._literal(graph, binding_node, ns.name), + "conn_proxy": self._literal(graph, binding_node, ns.connProxy), + } + ) + if staging_proxy: + bindings["staging_proxy"] = staging_proxy + + return bindings + + def _parse_connector( + self, graph: Graph, connector_uri: URIRef | BNode + ) -> dict[str, Any]: + rdf_types = {str(value) for value in graph.objects(connector_uri, RDF.type)} + connector_model = "FileConnector" + for rdf_type, model_name in ns.CONNECTOR_CLASS_BY_RDF_TYPE.items(): + if str(rdf_type) in rdf_types: + connector_model = model_name + break + + connector: dict[str, Any] = {} + name = self._literal(graph, connector_uri, ns.name) + if name is not None: + connector["name"] = name + resource_name = self._literal(graph, connector_uri, ns.resourceName) + if resource_name is not None: + connector["resource_name"] = resource_name + + payload = parse_json_literal( + self._literal(graph, connector_uri, ns.connectorPayload) + ) + if isinstance(payload, dict): + connector.update(payload) + + connector_cls = ns.CONNECTOR_MODELS[connector_model] + validated = connector_cls.model_validate(connector) + return validated.model_dump( + mode="json", by_alias=True, exclude={"hash"}, exclude_none=True + ) + + def _ordered_nodes( + self, + graph: Graph, + subject: URIRef | BNode | None, + predicate: URIRef, + parser: Any, + ) -> list[Any]: + if subject is None: + return [] + indexed: list[tuple[int, Any]] = [] + for node in self._related_nodes(graph, subject, predicate): + index_literal = self._literal(graph, node, ns.artifactIndex) + index = int(index_literal) if index_literal is not None else 0 + indexed.append((index, parser(graph, node))) + indexed.sort(key=lambda item: item[0]) + return [value for _, 
value in indexed] + + @staticmethod + def _related_nodes( + graph: Graph, + subject: URIRef | BNode, + predicate: URIRef, + ) -> list[URIRef | BNode]: + return [ + obj + for obj in graph.objects(subject, predicate) + if isinstance(obj, (URIRef, BNode)) + ] + + @staticmethod + def _object( + graph: Graph, + subject: URIRef | BNode | None, + predicate: URIRef, + ) -> URIRef | BNode | None: + if subject is None: + return None + for obj in graph.objects(subject, predicate): + if isinstance(obj, (URIRef, BNode)): + return obj + return None + + @staticmethod + def _objects( + graph: Graph, subject: URIRef | BNode, predicate: URIRef + ) -> list[URIRef]: + return [ + obj for obj in graph.objects(subject, predicate) if isinstance(obj, URIRef) + ] + + @staticmethod + def _literal( + graph: Graph, + subject: URIRef | BNode | None, + predicate: URIRef, + ) -> str | None: + if subject is None: + return None + for obj in graph.objects(subject, predicate): + if isinstance(obj, Literal): + return str(obj) + return None + + @staticmethod + def _literals( + graph: Graph, + subject: URIRef | BNode | None, + predicate: URIRef, + ) -> list[str]: + if subject is None: + return [] + return [ + str(obj) + for obj in graph.objects(subject, predicate) + if isinstance(obj, Literal) + ] diff --git a/graflo/rdf/namespace.py b/graflo/rdf/namespace.py new file mode 100644 index 00000000..cdc110f2 --- /dev/null +++ b/graflo/rdf/namespace.py @@ -0,0 +1,249 @@ +"""GraFlo RDF namespace and vocabulary term constants.""" + +from __future__ import annotations + +from rdflib import Namespace +from rdflib.term import URIRef + +from graflo.architecture.contract.bindings.connectors import ( + FileConnector as FileConnectorModel, + SparqlConnector as SparqlConnectorModel, + TableConnector as TableConnectorModel, +) + +GF_ONTOLOGY_IRI = "https://ontology.growgraph.dev/graflo" +GF_VERSION = "1.0.0" +GF_VERSION_IRI = f"{GF_ONTOLOGY_IRI}/{GF_VERSION}" +GF_BASE = "https://ontology.growgraph.dev/graflo/" +GF = 
Namespace(GF_BASE) + +# Classes +GraphManifest = GF.GraphManifest +Schema = GF.Schema +CoreSchema = GF.CoreSchema +VertexConfig = GF.VertexConfig +EdgeConfig = GF.EdgeConfig +GraphMetadata = GF.GraphMetadata +DatabaseProfile = GF.DatabaseProfile +Vertex = GF.Vertex +Edge = GF.Edge +Field = GF.Field +Identity = GF.Identity +IngestionModel = GF.IngestionModel +Resource = GF.Resource +EdgeInferSpec = GF.EdgeInferSpec +ProtoTransform = GF.ProtoTransform +Transform = GF.Transform +DressConfig = GF.DressConfig +KeySelectionConfig = GF.KeySelectionConfig +Actor = GF.Actor +VertexProducingActor = GF.VertexProducingActor +VertexActorStep = GF.VertexActor +EdgeActorStep = GF.EdgeActor +TransformActorStep = GF.TransformActor +DescendActorStep = GF.DescendActor +VertexRouterActorStep = GF.VertexRouterActor +Bindings = GF.Bindings +BoundConnector = GF.BoundConnector +FileConnector = GF.FileConnector +TableConnector = GF.TableConnector +SparqlConnector = GF.SparqlConnector +ResourceConnectorBinding = GF.ResourceConnectorBinding +ConnectorConnectionBinding = GF.ConnectorConnectionBinding +StagingProxyBinding = GF.StagingProxyBinding +EdgePhysicalSpec = GF.EdgePhysicalSpec +Index = GF.Index + +# Object properties +hasSchema = GF.hasSchema +hasIngestionModel = GF.hasIngestionModel +hasBindings = GF.hasBindings +hasCoreSchema = GF.hasCoreSchema +hasVertexConfig = GF.hasVertexConfig +hasEdgeConfig = GF.hasEdgeConfig +hasMetadata = GF.hasMetadata +hasDatabaseProfile = GF.hasDatabaseProfile +hasVertex = GF.hasVertex +hasEdge = GF.hasEdge +hasField = GF.hasField +hasIdentity = GF.hasIdentity +edgeSource = GF.edgeSource +edgeTarget = GF.edgeTarget +hasResource = GF.hasResource +hasTransform = GF.hasTransform +hasActor = GF.hasActor +targetsVertex = GF.targetsVertex +targetsEdge = GF.targetsEdge +executesTransform = GF.executesTransform +hasDress = GF.hasDress +hasKeySelection = GF.hasKeySelection +hasConnector = GF.hasConnector +bindsResourceToConnector = GF.bindsResourceToConnector 
+bindsConnectorToConnProxy = GF.bindsConnectorToConnProxy +hasStagingProxy = GF.hasStagingProxy +hasVertexIndex = GF.hasVertexIndex +hasEdgeSpec = GF.hasEdgeSpec +refinesEdge = GF.refinesEdge +hasIndex = GF.hasIndex +hasEdgeInferOnly = GF.hasEdgeInferOnly +hasEdgeInferExcept = GF.hasEdgeInferExcept + +# Datatype properties +name = GF.name +version = GF.version +description = GF.description +enumValue = GF.enumValue +fieldType = GF.fieldType +dbFlavor = GF.dbFlavor +targetNamespace = GF.targetNamespace +identityName = GF.identityName +relation = GF.relation +blank = GF.blank +edgesOnDuplicate = GF.edgesOnDuplicate +resourceName = GF.resourceName +connectorName = GF.connectorName +connProxy = GF.connProxy +actorType = GF.actorType +stepIndex = GF.stepIndex +stepPayload = GF.stepPayload +transformModule = GF.transformModule +transformFunction = GF.transformFunction +transformInput = GF.transformInput +transformOutput = GF.transformOutput +transformParams = GF.transformParams +transformTarget = GF.transformTarget +transformStrategy = GF.transformStrategy +renameMap = GF.renameMap +dressKey = GF.dressKey +dressValue = GF.dressValue +keySelectionMode = GF.keySelectionMode +keySelectionName = GF.keySelectionName +boundSourceKind = GF.boundSourceKind +connectorPayload = GF.connectorPayload +resourcePayload = GF.resourcePayload +profilePayload = GF.profilePayload +edgePayload = GF.edgePayload +vertexPayload = GF.vertexPayload +artifactIndex = GF.artifactIndex +forceTypes = GF.forceTypes +identityFromAllProperties = GF.identityFromAllProperties +edgeIdentities = GF.edgeIdentities +edgeType = GF.edgeType +edgeBy = GF.edgeBy +vertexIndexes = GF.vertexIndexes +edgeSpecs = GF.edgeSpecs +profileVertexName = GF.profileVertexName +indexName = GF.indexName +indexField = GF.indexField +indexUnique = GF.indexUnique +indexType = GF.indexType +indexDeduplicate = GF.indexDeduplicate +indexSparse = GF.indexSparse +indexExcludeEdgeEndpoints = GF.indexExcludeEdgeEndpoints +specSource = 
GF.specSource +specTarget = GF.specTarget +specRelation = GF.specRelation +specPurpose = GF.specPurpose +specRelationName = GF.specRelationName +specIndexesMode = GF.specIndexesMode + +# Actor step type mapping +ACTOR_STEP_CLASSES: dict[str, object] = { + "vertex": VertexActorStep, + "edge": EdgeActorStep, + "transform": TransformActorStep, + "descend": DescendActorStep, + "vertex_router": VertexRouterActorStep, +} + +# DBType enum value -> named individual +DB_TYPE_INDIVIDUALS: dict[str, object] = { + "arango": GF.ArangoDB, + "neo4j": GF.Neo4j, + "tigergraph": GF.TigerGraph, + "falkordb": GF.FalkorDB, + "memgraph": GF.Memgraph, + "nebula": GF.Nebula, + "postgres": GF.Postgres, + "mysql": GF.MySQL, + "mongodb": GF.MongoDB, + "sqlite": GF.SQLite, + "sparql": GF.SparqlEndpoint, +} + +FIELD_TYPE_INDIVIDUALS: dict[str, object] = { + "INT": GF.INT, + "UINT": GF.UINT, + "FLOAT": GF.FLOAT, + "DOUBLE": GF.DOUBLE, + "BOOL": GF.BOOL, + "STRING": GF.STRING, + "DATETIME": GF.DATETIME, +} + +BOUND_SOURCE_KIND_INDIVIDUALS: dict[str, object] = { + "file": GF.FileSource, + "sql_table": GF.SqlTableSource, + "sparql": GF.SparqlSource, +} + +TRANSFORM_TARGET_INDIVIDUALS: dict[str, object] = { + "values": GF.ValuesTarget, + "keys": GF.KeysTarget, +} + +TRANSFORM_STRATEGY_INDIVIDUALS: dict[str, object] = { + "single": GF.SingleStrategy, + "each": GF.EachStrategy, + "all": GF.AllStrategy, +} + +KEY_SELECTION_MODE_INDIVIDUALS: dict[str, object] = { + "all": GF.AllKeysMode, + "include": GF.IncludeKeysMode, + "exclude": GF.ExcludeKeysMode, +} + +EDGE_DUPLICATE_POLICY_INDIVIDUALS: dict[str, object] = { + "ignore": GF.IgnoreDuplicate, + "upsert": GF.UpsertDuplicate, +} + +CONNECTOR_CLASSES: dict[str, object] = { + "FileConnector": FileConnector, + "TableConnector": TableConnector, + "SparqlConnector": SparqlConnector, +} + +CONNECTOR_CLASS_BY_RDF_TYPE: dict[URIRef, str] = { + FileConnector: "FileConnector", + TableConnector: "TableConnector", + SparqlConnector: "SparqlConnector", +} + 
+CONNECTOR_MODELS = { + "FileConnector": FileConnectorModel, + "TableConnector": TableConnectorModel, + "SparqlConnector": SparqlConnectorModel, +} + +ENUM_REGISTRIES: dict[str, dict[str, object]] = { + "db_type": DB_TYPE_INDIVIDUALS, + "field_type": FIELD_TYPE_INDIVIDUALS, + "bound_source_kind": BOUND_SOURCE_KIND_INDIVIDUALS, + "transform_target": TRANSFORM_TARGET_INDIVIDUALS, + "transform_strategy": TRANSFORM_STRATEGY_INDIVIDUALS, + "key_selection_mode": KEY_SELECTION_MODE_INDIVIDUALS, + "edge_duplicate_policy": EDGE_DUPLICATE_POLICY_INDIVIDUALS, +} + +MODEL_PAYLOAD_EXCLUDES: dict[str, set[str]] = { + "database_profile": { + "db_flavor", + "target_namespace", + "vertex_indexes", + "edge_specs", + }, + "resource": {"name", "pipeline"}, + "connector": {"hash", "name", "resource_name"}, +} diff --git a/graflo/rdf/ontology/graflo-context.jsonld b/graflo/rdf/ontology/graflo-context.jsonld new file mode 100644 index 00000000..cbf107ee --- /dev/null +++ b/graflo/rdf/ontology/graflo-context.jsonld @@ -0,0 +1,252 @@ +{ + "@context": { + "gf": "https://ontology.growgraph.dev/graflo/", + "skos": "http://www.w3.org/2004/02/skos/core#", + "xsd": "http://www.w3.org/2001/XMLSchema#", + "GraphManifest": "gf:GraphManifest", + "Schema": "gf:Schema", + "CoreSchema": "gf:CoreSchema", + "VertexConfig": "gf:VertexConfig", + "EdgeConfig": "gf:EdgeConfig", + "GraphMetadata": "gf:GraphMetadata", + "DatabaseProfile": "gf:DatabaseProfile", + "Vertex": "gf:Vertex", + "Edge": "gf:Edge", + "Field": "gf:Field", + "Identity": "gf:Identity", + "IngestionModel": "gf:IngestionModel", + "Resource": "gf:Resource", + "ProtoTransform": "gf:ProtoTransform", + "Transform": "gf:Transform", + "DressConfig": "gf:DressConfig", + "KeySelectionConfig": "gf:KeySelectionConfig", + "Bindings": "gf:Bindings", + "Actor": "gf:Actor", + "VertexActor": "gf:VertexActor", + "EdgeActor": "gf:EdgeActor", + "TransformActor": "gf:TransformActor", + "DescendActor": "gf:DescendActor", + "VertexRouterActor": 
"gf:VertexRouterActor", + "FileConnector": "gf:FileConnector", + "TableConnector": "gf:TableConnector", + "SparqlConnector": "gf:SparqlConnector", + "BoundConnector": "gf:BoundConnector", + "EdgeInferSpec": "gf:EdgeInferSpec", + "ResourceConnectorBinding": "gf:ResourceConnectorBinding", + "ConnectorConnectionBinding": "gf:ConnectorConnectionBinding", + "StagingProxyBinding": "gf:StagingProxyBinding", + "EdgePhysicalSpec": "gf:EdgePhysicalSpec", + "Index": "gf:Index", + "hasSchema": { + "@id": "gf:hasSchema", + "@type": "@id" + }, + "hasIngestionModel": { + "@id": "gf:hasIngestionModel", + "@type": "@id" + }, + "hasBindings": { + "@id": "gf:hasBindings", + "@type": "@id" + }, + "hasCoreSchema": { + "@id": "gf:hasCoreSchema", + "@type": "@id" + }, + "hasVertexConfig": { + "@id": "gf:hasVertexConfig", + "@type": "@id" + }, + "hasEdgeConfig": { + "@id": "gf:hasEdgeConfig", + "@type": "@id" + }, + "hasMetadata": { + "@id": "gf:hasMetadata", + "@type": "@id" + }, + "hasDatabaseProfile": { + "@id": "gf:hasDatabaseProfile", + "@type": "@id" + }, + "hasVertex": { + "@id": "gf:hasVertex", + "@type": "@id" + }, + "hasEdge": { + "@id": "gf:hasEdge", + "@type": "@id" + }, + "hasField": { + "@id": "gf:hasField", + "@type": "@id" + }, + "hasIdentity": { + "@id": "gf:hasIdentity", + "@type": "@id" + }, + "edgeSource": { + "@id": "gf:edgeSource", + "@type": "@id" + }, + "edgeTarget": { + "@id": "gf:edgeTarget", + "@type": "@id" + }, + "hasResource": { + "@id": "gf:hasResource", + "@type": "@id" + }, + "hasTransform": { + "@id": "gf:hasTransform", + "@type": "@id" + }, + "hasActor": { + "@id": "gf:hasActor", + "@type": "@id" + }, + "executesTransform": { + "@id": "gf:executesTransform", + "@type": "@id" + }, + "hasDress": { + "@id": "gf:hasDress", + "@type": "@id" + }, + "hasKeySelection": { + "@id": "gf:hasKeySelection", + "@type": "@id" + }, + "hasConnector": { + "@id": "gf:hasConnector", + "@type": "@id" + }, + "bindsResourceToConnector": { + "@id": "gf:bindsResourceToConnector", 
+ "@type": "@id" + }, + "bindsConnectorToConnProxy": { + "@id": "gf:bindsConnectorToConnProxy", + "@type": "@id" + }, + "hasStagingProxy": { + "@id": "gf:hasStagingProxy", + "@type": "@id" + }, + "hasVertexIndex": { + "@id": "gf:hasVertexIndex", + "@type": "@id" + }, + "hasEdgeSpec": { + "@id": "gf:hasEdgeSpec", + "@type": "@id" + }, + "hasIndex": { + "@id": "gf:hasIndex", + "@type": "@id" + }, + "hasEdgeInferOnly": { + "@id": "gf:hasEdgeInferOnly", + "@type": "@id" + }, + "hasEdgeInferExcept": { + "@id": "gf:hasEdgeInferExcept", + "@type": "@id" + }, + "name": "gf:name", + "version": "gf:version", + "description": "gf:description", + "enumValue": "gf:enumValue", + "fieldType": { + "@id": "gf:fieldType", + "@type": "@id" + }, + "dbFlavor": { + "@id": "gf:dbFlavor", + "@type": "@id" + }, + "targetNamespace": "gf:targetNamespace", + "identityName": "gf:identityName", + "relation": "gf:relation", + "edgeIdentities": "gf:edgeIdentities", + "edgeType": "gf:edgeType", + "edgeBy": "gf:edgeBy", + "profileVertexName": "gf:profileVertexName", + "indexName": "gf:indexName", + "indexField": "gf:indexField", + "indexUnique": { + "@id": "gf:indexUnique", + "@type": "xsd:boolean" + }, + "indexType": "gf:indexType", + "indexDeduplicate": { + "@id": "gf:indexDeduplicate", + "@type": "xsd:boolean" + }, + "indexSparse": { + "@id": "gf:indexSparse", + "@type": "xsd:boolean" + }, + "indexExcludeEdgeEndpoints": { + "@id": "gf:indexExcludeEdgeEndpoints", + "@type": "xsd:boolean" + }, + "specSource": "gf:specSource", + "specTarget": "gf:specTarget", + "specRelation": "gf:specRelation", + "specPurpose": "gf:specPurpose", + "specRelationName": "gf:specRelationName", + "specIndexesMode": "gf:specIndexesMode", + "blank": { + "@id": "gf:blank", + "@type": "xsd:boolean" + }, + "edgesOnDuplicate": { + "@id": "gf:edgesOnDuplicate", + "@type": "@id" + }, + "resourceName": "gf:resourceName", + "connectorName": "gf:connectorName", + "connProxy": "gf:connProxy", + "stepIndex": { + "@id": 
"gf:stepIndex", + "@type": "xsd:integer" + }, + "stepPayload": "gf:stepPayload", + "actorType": "gf:actorType", + "transformModule": "gf:transformModule", + "transformFunction": "gf:transformFunction", + "transformInput": "gf:transformInput", + "transformOutput": "gf:transformOutput", + "transformParams": "gf:transformParams", + "transformTarget": { + "@id": "gf:transformTarget", + "@type": "@id" + }, + "dressKey": "gf:dressKey", + "dressValue": "gf:dressValue", + "keySelectionMode": { + "@id": "gf:keySelectionMode", + "@type": "@id" + }, + "keySelectionName": "gf:keySelectionName", + "boundSourceKind": { + "@id": "gf:boundSourceKind", + "@type": "@id" + }, + "artifactIndex": { + "@id": "gf:artifactIndex", + "@type": "xsd:integer" + }, + "vertexIndexes": "gf:vertexIndexes", + "edgeSpecs": "gf:edgeSpecs", + "forceTypes": "gf:forceTypes", + "identityFromAllProperties": "gf:identityFromAllProperties", + "resourcePayload": "gf:resourcePayload", + "connectorPayload": "gf:connectorPayload", + "profilePayload": "gf:profilePayload", + "edgePayload": "gf:edgePayload", + "vertexPayload": "gf:vertexPayload", + "prefLabel": "skos:prefLabel" + } +} diff --git a/graflo/rdf/ontology/graflo.ttl b/graflo/rdf/ontology/graflo.ttl new file mode 100644 index 00000000..558aed5a --- /dev/null +++ b/graflo/rdf/ontology/graflo.ttl @@ -0,0 +1,685 @@ +@prefix gf: <https://ontology.growgraph.dev/graflo/> . +@prefix owl: <http://www.w3.org/2002/07/owl#> . +@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . +@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . +@prefix skos: <http://www.w3.org/2004/02/skos/core#> . +@prefix xsd: <http://www.w3.org/2001/XMLSchema#> . +@prefix prov: <http://www.w3.org/ns/prov#> . + +<https://ontology.growgraph.dev/graflo> a owl:Ontology ; + owl:versionIRI <https://ontology.growgraph.dev/graflo/1.0.0> ; + owl:versionInfo "1.0.0" ; + rdfs:label "GraFlo Ontology"@en ; + skos:prefLabel "GraFlo Ontology"@en ; + rdfs:comment "OWL vocabulary describing GraFlo GraphManifest contracts: schema, ingestion, bindings, transforms, and pipeline actors."@en . + +################################################################# +# Classes +################################################################# + +gf:GrafloArtifact a owl:Class ; + rdfs:label "GraFlo artifact"@en ; + skos:prefLabel "Artifact"@en .
+ +gf:GraphManifest a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact , prov:Entity ; + rdfs:label "Graph manifest"@en ; + skos:prefLabel "GraphManifest"@en . + +gf:Schema a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Graph schema document"@en ; + skos:prefLabel "Schema"@en . + +gf:CoreSchema a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Core schema"@en ; + skos:prefLabel "CoreSchema"@en . + +gf:VertexConfig a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Vertex config"@en ; + skos:prefLabel "VertexConfig"@en ; + rdfs:comment "Manifest-facing vertex_config block containing vertex declarations and identity policy."@en . + +gf:EdgeConfig a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Edge config"@en ; + skos:prefLabel "EdgeConfig"@en ; + rdfs:comment "Manifest-facing edge_config block containing edge declarations."@en . + +gf:GraphMetadata a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Graph metadata"@en ; + skos:prefLabel "Metadata"@en . + +gf:DatabaseProfile a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Database profile"@en ; + skos:prefLabel "DatabaseProfile"@en . + +gf:Vertex a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Vertex"@en ; + skos:prefLabel "Vertex"@en ; + rdfs:comment "Manifest-facing vertex item in vertex_config.vertices."@en . + +gf:Edge a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Edge"@en ; + skos:prefLabel "Edge"@en ; + rdfs:comment "Manifest-facing edge item in edge_config.edges."@en . + +gf:Field a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Field"@en ; + skos:prefLabel "Field"@en ; + rdfs:comment "Manifest-facing field item used in vertex.properties and edge.properties."@en . 
+ +gf:Identity a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Identity field"@en ; + skos:prefLabel "Identity"@en ; + rdfs:comment "Manifest-facing identity element (string) promoted to a first-class ontology node for traceability."@en . + +gf:IngestionModel a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Ingestion model"@en ; + skos:prefLabel "IngestionModel"@en . + +gf:Resource a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Resource"@en ; + skos:prefLabel "Resource"@en . + +gf:EdgeInferSpec a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Edge inference selector"@en ; + skos:prefLabel "EdgeInferSpec"@en . + +gf:ProtoTransform a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact , prov:Activity ; + rdfs:label "Proto transform"@en ; + skos:prefLabel "ProtoTransform"@en . + +gf:Transform a owl:Class ; + rdfs:subClassOf gf:ProtoTransform ; + rdfs:label "Transform"@en ; + skos:prefLabel "Transform"@en . + +gf:DressConfig a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Dress configuration"@en ; + skos:prefLabel "DressConfig"@en . + +gf:KeySelectionConfig a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Key selection configuration"@en ; + skos:prefLabel "KeySelectionConfig"@en . + +gf:Actor a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Actor"@en ; + skos:prefLabel "Actor"@en . + +gf:VertexProducingActor a owl:Class ; + rdfs:subClassOf gf:Actor ; + rdfs:label "Vertex-producing actor"@en ; + skos:prefLabel "VertexProducingActor"@en . + +gf:VertexActor a owl:Class ; + rdfs:subClassOf gf:VertexProducingActor ; + rdfs:label "Vertex actor"@en ; + skos:prefLabel "VertexActor"@en ; + rdfs:comment "Manifest-facing actor name (vertex)."@en . + +gf:EdgeActor a owl:Class ; + rdfs:subClassOf gf:Actor ; + rdfs:label "Edge actor"@en ; + skos:prefLabel "EdgeActor"@en ; + rdfs:comment "Manifest-facing actor name (edge)."@en . 
+ +gf:TransformActor a owl:Class ; + rdfs:subClassOf gf:Actor ; + rdfs:label "Transform actor"@en ; + skos:prefLabel "TransformActor"@en . + +gf:DescendActor a owl:Class ; + rdfs:subClassOf gf:Actor ; + rdfs:label "Descend actor"@en ; + skos:prefLabel "DescendActor"@en . + +gf:VertexRouterActor a owl:Class ; + rdfs:subClassOf gf:VertexProducingActor ; + rdfs:label "Vertex router actor"@en ; + skos:prefLabel "VertexRouterActor"@en . + +gf:Bindings a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Bindings"@en ; + skos:prefLabel "Bindings"@en . + +gf:BoundConnector a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Bound connector"@en ; + skos:prefLabel "BoundConnector"@en . + +gf:FileConnector a owl:Class ; + rdfs:subClassOf gf:BoundConnector ; + rdfs:label "File connector"@en ; + skos:prefLabel "FileConnector"@en . + +gf:TableConnector a owl:Class ; + rdfs:subClassOf gf:BoundConnector ; + rdfs:label "Table connector"@en ; + skos:prefLabel "TableConnector"@en . + +gf:SparqlConnector a owl:Class ; + rdfs:subClassOf gf:BoundConnector ; + rdfs:label "SPARQL connector"@en ; + skos:prefLabel "SparqlConnector"@en . + +gf:ResourceConnectorBinding a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Resource connector binding"@en ; + skos:prefLabel "ResourceConnectorBinding"@en . + +gf:ConnectorConnectionBinding a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Connector connection binding"@en ; + skos:prefLabel "ConnectorConnectionBinding"@en . + +gf:StagingProxyBinding a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Staging proxy binding"@en ; + skos:prefLabel "StagingProxyBinding"@en . + +gf:EdgePhysicalSpec a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Edge physical spec"@en ; + skos:prefLabel "EdgePhysicalSpec"@en . + +gf:Index a owl:Class ; + rdfs:subClassOf gf:GrafloArtifact ; + rdfs:label "Database index"@en ; + skos:prefLabel "Index"@en . 
+ +gf:FieldType a owl:Class ; + rdfs:label "Field type enumeration"@en ; + skos:prefLabel "FieldType"@en . + +gf:DBType a owl:Class ; + rdfs:label "Database type enumeration"@en ; + skos:prefLabel "DBType"@en . + +gf:BoundSourceKind a owl:Class ; + rdfs:label "Bound source kind enumeration"@en ; + skos:prefLabel "BoundSourceKind"@en . + +gf:TransformTarget a owl:Class ; + rdfs:label "Transform target enumeration"@en ; + skos:prefLabel "TransformTarget"@en . + +gf:TransformStrategy a owl:Class ; + rdfs:label "Transform strategy enumeration"@en ; + skos:prefLabel "TransformStrategy"@en . + +gf:KeySelectionMode a owl:Class ; + rdfs:label "Key selection mode enumeration"@en ; + skos:prefLabel "KeySelectionMode"@en . + +gf:EdgeDuplicatePolicy a owl:Class ; + rdfs:label "Edge duplicate policy enumeration"@en ; + skos:prefLabel "EdgeDuplicatePolicy"@en . + +################################################################# +# Object properties +################################################################# + +gf:hasSchema a owl:ObjectProperty ; + rdfs:domain gf:GraphManifest ; + rdfs:range gf:Schema . + +gf:hasIngestionModel a owl:ObjectProperty ; + rdfs:domain gf:GraphManifest ; + rdfs:range gf:IngestionModel . + +gf:hasBindings a owl:ObjectProperty ; + rdfs:domain gf:GraphManifest ; + rdfs:range gf:Bindings . + +gf:hasCoreSchema a owl:ObjectProperty ; + rdfs:domain gf:Schema ; + rdfs:range gf:CoreSchema . + +gf:hasVertexConfig a owl:ObjectProperty ; + rdfs:domain gf:CoreSchema ; + rdfs:range gf:VertexConfig ; + rdfs:comment "Manifest path: core_schema.vertex_config."@en . + +gf:hasEdgeConfig a owl:ObjectProperty ; + rdfs:domain gf:CoreSchema ; + rdfs:range gf:EdgeConfig ; + rdfs:comment "Manifest path: core_schema.edge_config."@en . + +gf:hasMetadata a owl:ObjectProperty ; + rdfs:domain gf:Schema ; + rdfs:range gf:GraphMetadata . + +gf:hasDatabaseProfile a owl:ObjectProperty ; + rdfs:domain gf:Schema ; + rdfs:range gf:DatabaseProfile . 
+ +gf:hasVertex a owl:ObjectProperty ; + rdfs:domain gf:VertexConfig ; + rdfs:range gf:Vertex ; + rdfs:comment "Manifest-facing edge from VertexConfig to Vertex items."@en . + +gf:hasEdge a owl:ObjectProperty ; + rdfs:domain gf:EdgeConfig ; + rdfs:range gf:Edge ; + rdfs:comment "Manifest-facing edge from EdgeConfig to Edge items."@en . + +gf:hasField a owl:ObjectProperty ; + rdfs:domain gf:Vertex , gf:Edge ; + rdfs:range gf:Field . + +gf:hasIdentity a owl:ObjectProperty ; + rdfs:domain gf:Vertex , gf:Edge ; + rdfs:range gf:Identity ; + rdfs:comment "Manifest-facing identity chain node for vertex and edge identities."@en . + +gf:edgeSource a owl:ObjectProperty ; + rdfs:domain gf:Edge ; + rdfs:range gf:Vertex . + +gf:edgeTarget a owl:ObjectProperty ; + rdfs:domain gf:Edge ; + rdfs:range gf:Vertex . + +gf:hasResource a owl:ObjectProperty ; + rdfs:domain gf:IngestionModel ; + rdfs:range gf:Resource . + +gf:hasTransform a owl:ObjectProperty ; + rdfs:domain gf:IngestionModel ; + rdfs:range gf:ProtoTransform . + +gf:hasActor a owl:ObjectProperty ; + rdfs:domain gf:Resource , gf:DescendActor ; + rdfs:range gf:Actor . + +gf:targetsVertex a owl:ObjectProperty ; + rdfs:domain gf:VertexProducingActor ; + rdfs:range gf:Vertex ; + rdfs:comment "Vertex type(s) this actor produces observations for."@en . + +gf:targetsEdge a owl:ObjectProperty ; + rdfs:domain gf:EdgeActor ; + rdfs:range gf:Edge ; + rdfs:comment "Edge type(s) this actor may produce."@en . + +gf:executesTransform a owl:ObjectProperty ; + rdfs:domain gf:TransformActor ; + rdfs:range gf:ProtoTransform ; + rdfs:comment "Connects a transform actor step to the declared transform definition it executes."@en . + +gf:hasDress a owl:ObjectProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range gf:DressConfig . + +gf:hasKeySelection a owl:ObjectProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range gf:KeySelectionConfig . 
+ +gf:hasConnector a owl:ObjectProperty ; + rdfs:domain gf:Bindings ; + rdfs:range gf:BoundConnector . + +gf:bindsResourceToConnector a owl:ObjectProperty ; + rdfs:domain gf:Bindings ; + rdfs:range gf:ResourceConnectorBinding . + +gf:bindsConnectorToConnProxy a owl:ObjectProperty ; + rdfs:domain gf:Bindings ; + rdfs:range gf:ConnectorConnectionBinding . + +gf:hasStagingProxy a owl:ObjectProperty ; + rdfs:domain gf:Bindings ; + rdfs:range gf:StagingProxyBinding . + +gf:hasVertexIndex a owl:ObjectProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range gf:Index ; + rdfs:comment "Database profile secondary index attached to a logical vertex."@en . + +gf:hasEdgeSpec a owl:ObjectProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range gf:EdgePhysicalSpec ; + rdfs:comment "Database profile edge physical specification."@en . + +gf:refinesEdge a owl:ObjectProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range gf:Edge ; + rdfs:comment "Logical Edge this physical spec is a DB-layer refinement of."@en . + +gf:hasIndex a owl:ObjectProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range gf:Index ; + rdfs:comment "Secondary index attached to an edge physical specification."@en . + +gf:hasEdgeInferOnly a owl:ObjectProperty ; + rdfs:domain gf:Resource ; + rdfs:range gf:EdgeInferSpec . + +gf:hasEdgeInferExcept a owl:ObjectProperty ; + rdfs:domain gf:Resource ; + rdfs:range gf:EdgeInferSpec . + +################################################################# +# Datatype properties +################################################################# + +gf:name a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:version a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:description a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:enumValue a owl:DatatypeProperty ; + rdfs:range xsd:string ; + rdfs:comment "Canonical string value for a named enumeration individual." . 
+ +gf:fieldType a owl:ObjectProperty ; + rdfs:domain gf:Field ; + rdfs:range gf:FieldType . + +gf:dbFlavor a owl:ObjectProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range gf:DBType . + +gf:targetNamespace a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:identityName a owl:DatatypeProperty ; + rdfs:domain gf:Identity ; + rdfs:range xsd:string ; + rdfs:comment "Identity token value as shown in manifest identity arrays."@en . + +gf:relation a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:blank a owl:DatatypeProperty ; + rdfs:domain gf:Vertex ; + rdfs:range xsd:boolean . + +gf:edgesOnDuplicate a owl:ObjectProperty ; + rdfs:domain gf:IngestionModel ; + rdfs:range gf:EdgeDuplicatePolicy . + +gf:resourceName a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:connectorName a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:connProxy a owl:DatatypeProperty ; + rdfs:range xsd:string . + +gf:actorType a owl:DatatypeProperty ; + rdfs:domain gf:Actor ; + rdfs:range xsd:string . + +gf:stepIndex a owl:DatatypeProperty ; + rdfs:domain gf:Actor ; + rdfs:range xsd:integer . + +gf:stepPayload a owl:DatatypeProperty ; + rdfs:domain gf:Actor ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded pipeline step dict for lossless round-trip." . + +gf:transformModule a owl:DatatypeProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range xsd:string . + +gf:transformFunction a owl:DatatypeProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range xsd:string . + +gf:transformInput a owl:DatatypeProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range xsd:string . + +gf:transformOutput a owl:DatatypeProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range xsd:string . + +gf:transformParams a owl:DatatypeProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded transform params dict." . + +gf:transformTarget a owl:ObjectProperty ; + rdfs:domain gf:ProtoTransform ; + rdfs:range gf:TransformTarget . 
+ +gf:transformStrategy a owl:ObjectProperty ; + rdfs:domain gf:Transform ; + rdfs:range gf:TransformStrategy . + +gf:renameMap a owl:DatatypeProperty ; + rdfs:domain gf:Transform ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded rename dict." . + +gf:dressKey a owl:DatatypeProperty ; + rdfs:domain gf:DressConfig ; + rdfs:range xsd:string . + +gf:dressValue a owl:DatatypeProperty ; + rdfs:domain gf:DressConfig ; + rdfs:range xsd:string . + +gf:keySelectionMode a owl:ObjectProperty ; + rdfs:domain gf:KeySelectionConfig ; + rdfs:range gf:KeySelectionMode . + +gf:keySelectionName a owl:DatatypeProperty ; + rdfs:domain gf:KeySelectionConfig ; + rdfs:range xsd:string . + +gf:boundSourceKind a owl:ObjectProperty ; + rdfs:domain gf:BoundConnector ; + rdfs:range gf:BoundSourceKind . + +gf:connectorPayload a owl:DatatypeProperty ; + rdfs:domain gf:BoundConnector ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded connector configuration excluding hash." . + +gf:resourcePayload a owl:DatatypeProperty ; + rdfs:domain gf:Resource ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded resource scalar/list fields excluding pipeline and name." . + +gf:profilePayload a owl:DatatypeProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded database profile extension fields." . + +gf:edgePayload a owl:DatatypeProperty ; + rdfs:domain gf:Edge ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded edge extension fields (identities, type, by, etc.)." . + +gf:vertexPayload a owl:DatatypeProperty ; + rdfs:domain gf:Vertex ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded vertex extension fields (filters, etc.)." . + +gf:artifactIndex a owl:DatatypeProperty ; + rdfs:range xsd:integer ; + rdfs:comment "Stable ordering index for list-valued manifest children." . 
+ +gf:edgeIdentities a owl:DatatypeProperty ; + rdfs:domain gf:Edge ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded edge identity token groups from edge.identities." . + +gf:edgeType a owl:DatatypeProperty ; + rdfs:domain gf:Edge ; + rdfs:range xsd:string ; + rdfs:comment "Edge materialization mode such as direct or indirect." . + +gf:edgeBy a owl:DatatypeProperty ; + rdfs:domain gf:Edge ; + rdfs:range xsd:string ; + rdfs:comment "For indirect edges, the vertex type that defines the edge." . + +gf:profileVertexName a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:string ; + rdfs:comment "Logical vertex name this database profile index belongs to." . + +gf:indexName a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:string . + +gf:indexField a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:string ; + rdfs:comment "Ordered field list for this index; one triple per field." . + +gf:indexUnique a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:boolean . + +gf:indexType a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:string . + +gf:indexDeduplicate a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:boolean . + +gf:indexSparse a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:boolean . + +gf:indexExcludeEdgeEndpoints a owl:DatatypeProperty ; + rdfs:domain gf:Index ; + rdfs:range xsd:boolean . + +gf:specSource a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . + +gf:specTarget a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . + +gf:specRelation a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . + +gf:specPurpose a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . + +gf:specRelationName a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . 
+ +gf:specIndexesMode a owl:DatatypeProperty ; + rdfs:domain gf:EdgePhysicalSpec ; + rdfs:range xsd:string . + +gf:forceTypes a owl:DatatypeProperty ; + rdfs:domain gf:VertexConfig ; + rdfs:range xsd:string ; + rdfs:comment "JSON-encoded mapping of vertex names to forced inferred types." . + +gf:identityFromAllProperties a owl:DatatypeProperty ; + rdfs:domain gf:VertexConfig ; + rdfs:range xsd:boolean ; + rdfs:comment "Whether vertices without explicit identity default to all property names." . + +gf:vertexIndexes a owl:DatatypeProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range xsd:string ; + rdfs:comment "Legacy JSON-encoded database profile vertex indexes (superseded by gf:hasVertexIndex)." . + +gf:edgeSpecs a owl:DatatypeProperty ; + rdfs:domain gf:DatabaseProfile ; + rdfs:range xsd:string ; + rdfs:comment "Legacy JSON-encoded database profile edge specs (superseded by gf:hasEdgeSpec)." . + +################################################################# +# FieldType individuals +################################################################# + +gf:INT a gf:FieldType ; gf:enumValue "INT" ; rdfs:label "INT" ; skos:prefLabel "INT" . +gf:UINT a gf:FieldType ; gf:enumValue "UINT" ; rdfs:label "UINT" ; skos:prefLabel "UINT" . +gf:FLOAT a gf:FieldType ; gf:enumValue "FLOAT" ; rdfs:label "FLOAT" ; skos:prefLabel "FLOAT" . +gf:DOUBLE a gf:FieldType ; gf:enumValue "DOUBLE" ; rdfs:label "DOUBLE" ; skos:prefLabel "DOUBLE" . +gf:BOOL a gf:FieldType ; gf:enumValue "BOOL" ; rdfs:label "BOOL" ; skos:prefLabel "BOOL" . +gf:STRING a gf:FieldType ; gf:enumValue "STRING" ; rdfs:label "STRING" ; skos:prefLabel "STRING" . +gf:DATETIME a gf:FieldType ; gf:enumValue "DATETIME" ; rdfs:label "DATETIME" ; skos:prefLabel "DATETIME" . 
+ +################################################################# +# DBType individuals +################################################################# + +gf:ArangoDB a gf:DBType ; gf:enumValue "arango" ; rdfs:label "ArangoDB" ; skos:prefLabel "ArangoDB" . +gf:Neo4j a gf:DBType ; gf:enumValue "neo4j" ; rdfs:label "Neo4j" ; skos:prefLabel "Neo4j" . +gf:TigerGraph a gf:DBType ; gf:enumValue "tigergraph" ; rdfs:label "TigerGraph" ; skos:prefLabel "TigerGraph" . +gf:FalkorDB a gf:DBType ; gf:enumValue "falkordb" ; rdfs:label "FalkorDB" ; skos:prefLabel "FalkorDB" . +gf:Memgraph a gf:DBType ; gf:enumValue "memgraph" ; rdfs:label "Memgraph" ; skos:prefLabel "Memgraph" . +gf:Nebula a gf:DBType ; gf:enumValue "nebula" ; rdfs:label "Nebula" ; skos:prefLabel "Nebula" . +gf:Postgres a gf:DBType ; gf:enumValue "postgres" ; rdfs:label "Postgres" ; skos:prefLabel "Postgres" . +gf:MySQL a gf:DBType ; gf:enumValue "mysql" ; rdfs:label "MySQL" ; skos:prefLabel "MySQL" . +gf:MongoDB a gf:DBType ; gf:enumValue "mongodb" ; rdfs:label "MongoDB" ; skos:prefLabel "MongoDB" . +gf:SQLite a gf:DBType ; gf:enumValue "sqlite" ; rdfs:label "SQLite" ; skos:prefLabel "SQLite" . +gf:SparqlEndpoint a gf:DBType ; gf:enumValue "sparql" ; rdfs:label "SPARQL endpoint" ; skos:prefLabel "SparqlEndpoint" . + +################################################################# +# BoundSourceKind individuals +################################################################# + +gf:FileSource a gf:BoundSourceKind ; gf:enumValue "file" ; rdfs:label "File source" ; skos:prefLabel "FileSource" . +gf:SqlTableSource a gf:BoundSourceKind ; gf:enumValue "sql_table" ; rdfs:label "SQL table source" ; skos:prefLabel "SqlTableSource" . +gf:SparqlSource a gf:BoundSourceKind ; gf:enumValue "sparql" ; rdfs:label "SPARQL source" ; skos:prefLabel "SparqlSource" . 
+ +################################################################# +# TransformTarget individuals +################################################################# + +gf:ValuesTarget a gf:TransformTarget ; gf:enumValue "values" ; rdfs:label "Values target" ; skos:prefLabel "ValuesTarget" . +gf:KeysTarget a gf:TransformTarget ; gf:enumValue "keys" ; rdfs:label "Keys target" ; skos:prefLabel "KeysTarget" . + +################################################################# +# TransformStrategy individuals +################################################################# + +gf:SingleStrategy a gf:TransformStrategy ; gf:enumValue "single" ; rdfs:label "Single strategy" ; skos:prefLabel "SingleStrategy" . +gf:EachStrategy a gf:TransformStrategy ; gf:enumValue "each" ; rdfs:label "Each strategy" ; skos:prefLabel "EachStrategy" . +gf:AllStrategy a gf:TransformStrategy ; gf:enumValue "all" ; rdfs:label "All strategy" ; skos:prefLabel "AllStrategy" . + +################################################################# +# KeySelectionMode individuals +################################################################# + +gf:AllKeysMode a gf:KeySelectionMode ; gf:enumValue "all" ; rdfs:label "All keys" ; skos:prefLabel "AllKeysMode" . +gf:IncludeKeysMode a gf:KeySelectionMode ; gf:enumValue "include" ; rdfs:label "Include keys" ; skos:prefLabel "IncludeKeysMode" . +gf:ExcludeKeysMode a gf:KeySelectionMode ; gf:enumValue "exclude" ; rdfs:label "Exclude keys" ; skos:prefLabel "ExcludeKeysMode" . + +################################################################# +# EdgeDuplicatePolicy individuals +################################################################# + +gf:IgnoreDuplicate a gf:EdgeDuplicatePolicy ; gf:enumValue "ignore" ; rdfs:label "Ignore duplicate" ; skos:prefLabel "IgnoreDuplicate" . +gf:UpsertDuplicate a gf:EdgeDuplicatePolicy ; gf:enumValue "upsert" ; rdfs:label "Upsert duplicate" ; skos:prefLabel "UpsertDuplicate" . 
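Review note on the enumeration individuals above: each one pairs an IRI with a `gf:enumValue` string, mirroring the `*_INDIVIDUALS` registries in `namespace.py`. Some enum values repeat across enumerations (`"sparql"` is both a `DBType` and a `BoundSourceKind` value, `"all"` is both a `TransformStrategy` and a `KeySelectionMode` value), so any reverse lookup from IRI to manifest value probably has to stay scoped per registry. The sketch below illustrates that idea with plain strings in place of rdflib `URIRef` objects. The helper names `invert_registries` and `enum_value_for` are hypothetical and are not part of this patch; the registry contents are a small subset copied from the diff.

```python
from __future__ import annotations

# gf: prefix as declared in graflo-context.jsonld.
GF_BASE = "https://ontology.growgraph.dev/graflo/"

# Subset of ENUM_REGISTRIES from namespace.py, using plain strings
# instead of rdflib URIRef values (an assumption made for this sketch).
ENUM_REGISTRIES: dict[str, dict[str, str]] = {
    "db_type": {
        "arango": GF_BASE + "ArangoDB",
        "sparql": GF_BASE + "SparqlEndpoint",
    },
    "bound_source_kind": {
        "file": GF_BASE + "FileSource",
        "sparql": GF_BASE + "SparqlSource",
    },
}


def invert_registries(
    registries: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Build IRI -> enum value maps, one per registry (hypothetical helper)."""
    inverted: dict[str, dict[str, str]] = {}
    for kind, by_value in registries.items():
        by_iri = {iri: value for value, iri in by_value.items()}
        # Two values pointing at one IRI would make the round trip lossy.
        if len(by_iri) != len(by_value):
            raise ValueError(f"registry {kind!r} maps two values to one IRI")
        inverted[kind] = by_iri
    return inverted


def enum_value_for(kind: str, iri: str, inverted: dict[str, dict[str, str]]) -> str:
    """Resolve an individual's IRI back to its manifest string (hypothetical helper)."""
    try:
        return inverted[kind][iri]
    except KeyError as exc:
        raise ValueError(f"{iri!r} is not a known {kind!r} individual") from exc


INVERTED = invert_registries(ENUM_REGISTRIES)
db_value = enum_value_for("db_type", GF_BASE + "SparqlEndpoint", INVERTED)
source_value = enum_value_for("bound_source_kind", GF_BASE + "SparqlSource", INVERTED)
```

Both lookups resolve to `"sparql"`, but from different IRIs. A single flat value-to-IRI map would let one entry overwrite the other, which is one reason to keep the per-enumeration dictionaries that the patch already uses.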
diff --git a/graflo/rdf/serializer.py b/graflo/rdf/serializer.py new file mode 100644 index 00000000..381def93 --- /dev/null +++ b/graflo/rdf/serializer.py @@ -0,0 +1,733 @@ +"""Serialize GraphManifest instances to RDF.""" + +from __future__ import annotations + +from typing import Any, cast + +from rdflib import BNode, Graph, RDF, URIRef +from rdflib.namespace import XSD + +from graflo.architecture.contract.bindings.connectors import ( + FileConnector, + SparqlConnector, + TableConnector, +) +from graflo.architecture.contract.manifest import GraphManifest +from graflo.architecture.contract.ingestion.transform import ( + DressConfig, + KeySelectionConfig, + ProtoTransform, +) +from graflo.architecture.graph_types import EdgeId +from graflo.architecture.schema.edge import Edge +from graflo.architecture.schema.vertex import Field, Vertex +from graflo.rdf import namespace as ns +from graflo.rdf.utils import ( + add_enum_individual, + add_literal, + actor_step_class, + actor_step_type, + join_uri, + json_literal, + load_ontology_graph, +) + + +class ManifestRdfSerializer: + """Convert a :class:`GraphManifest` into RDF using the GraFlo meta-ontology.""" + + def __init__(self, *, include_ontology: bool = True) -> None: + self._include_ontology = include_ontology + + def to_graph(self, manifest: GraphManifest, base_uri: str) -> Graph: + """Serialize manifest to an rdflib graph.""" + graph = Graph() + graph.bind("gf", ns.GF) + graph.bind("xsd", XSD) + if self._include_ontology: + graph += load_ontology_graph() + + manifest_uri = URIRef(base_uri.rstrip("/")) + graph.add((manifest_uri, RDF.type, ns.GraphManifest)) + vertex_uri_by_name: dict[str, URIRef] | None = None + edge_uri_by_id: dict[EdgeId, URIRef] | None = None + + if manifest.graph_schema is not None: + schema_uri = URIRef(join_uri(base_uri, "schema")) + graph.add((manifest_uri, ns.hasSchema, schema_uri)) + self._emit_schema(graph, schema_uri, manifest.graph_schema) + core_uri = URIRef(join_uri(str(schema_uri), 
"core")) + vertex_uri_by_name = { + vertex.name: URIRef(join_uri(str(core_uri), "vertex", vertex.name)) + for vertex in manifest.graph_schema.core_schema.vertex_config.vertices + } + edge_uri_by_id = { + edge.edge_id: URIRef( + join_uri(str(core_uri), "edge", self._edge_key(edge)) + ) + for edge in manifest.graph_schema.core_schema.edge_config.edges + } + + if manifest.ingestion_model is not None: + ingestion_uri = URIRef(join_uri(base_uri, "ingestion")) + graph.add((manifest_uri, ns.hasIngestionModel, ingestion_uri)) + self._emit_ingestion_model( + graph, + base_uri, + ingestion_uri, + manifest.ingestion_model, + vertex_uri_by_name=vertex_uri_by_name, + edge_uri_by_id=edge_uri_by_id, + ) + + if manifest.bindings is not None: + bindings_uri = URIRef(join_uri(base_uri, "bindings")) + graph.add((manifest_uri, ns.hasBindings, bindings_uri)) + self._emit_bindings(graph, base_uri, bindings_uri, manifest.bindings) + + return graph + + def to_turtle(self, manifest: GraphManifest, base_uri: str) -> str: + """Serialize manifest to Turtle.""" + return self.to_graph(manifest, base_uri).serialize(format="turtle") + + def to_json_ld(self, manifest: GraphManifest, base_uri: str) -> str: + """Serialize manifest to JSON-LD.""" + return self.to_graph(manifest, base_uri).serialize(format="json-ld") + + def _emit_schema(self, graph: Graph, schema_uri: URIRef, schema: Any) -> None: + graph.add((schema_uri, RDF.type, ns.Schema)) + + metadata_uri = URIRef(join_uri(str(schema_uri), "metadata")) + graph.add((schema_uri, ns.hasMetadata, metadata_uri)) + graph.add((metadata_uri, RDF.type, ns.GraphMetadata)) + add_literal(graph, metadata_uri, ns.name, schema.metadata.name) + add_literal(graph, metadata_uri, ns.version, schema.metadata.version) + add_literal(graph, metadata_uri, ns.description, schema.metadata.description) + + core_uri = URIRef(join_uri(str(schema_uri), "core")) + graph.add((schema_uri, ns.hasCoreSchema, core_uri)) + graph.add((core_uri, RDF.type, ns.CoreSchema)) + + 
vertex_config_uri = URIRef(join_uri(str(core_uri), "vertex-config")) + graph.add((core_uri, ns.hasVertexConfig, vertex_config_uri)) + graph.add((vertex_config_uri, RDF.type, ns.VertexConfig)) + self._emit_vertex_config( + graph, vertex_config_uri, schema.core_schema.vertex_config + ) + + edge_config_uri = URIRef(join_uri(str(core_uri), "edge-config")) + graph.add((core_uri, ns.hasEdgeConfig, edge_config_uri)) + graph.add((edge_config_uri, RDF.type, ns.EdgeConfig)) + + vertex_uri_by_name: dict[str, URIRef] = {} + for index, vertex in enumerate(schema.core_schema.vertex_config.vertices): + vertex_uri = URIRef(join_uri(str(core_uri), "vertex", vertex.name)) + vertex_uri_by_name[vertex.name] = vertex_uri + graph.add((vertex_config_uri, ns.hasVertex, vertex_uri)) + add_literal(graph, vertex_uri, ns.artifactIndex, index) + self._emit_vertex(graph, vertex_uri, vertex) + + edge_uri_by_id: dict[EdgeId, URIRef] = {} + for index, edge in enumerate(schema.core_schema.edge_config.edges): + edge_key = self._edge_key(edge) + edge_uri = URIRef(join_uri(str(core_uri), "edge", edge_key)) + edge_uri_by_id[edge.edge_id] = edge_uri + graph.add((edge_config_uri, ns.hasEdge, edge_uri)) + add_literal(graph, edge_uri, ns.artifactIndex, index) + self._emit_edge(graph, edge_uri, edge, vertex_uri_by_name) + + profile_uri = URIRef(join_uri(str(schema_uri), "db-profile")) + graph.add((schema_uri, ns.hasDatabaseProfile, profile_uri)) + self._emit_database_profile( + graph, profile_uri, schema.db_profile, edge_uri_by_id=edge_uri_by_id + ) + + def _emit_vertex_config( + self, + graph: Graph, + vertex_config_uri: URIRef, + vertex_config: Any, + ) -> None: + if vertex_config.force_types: + graph.add( + ( + vertex_config_uri, + ns.forceTypes, + json_literal(vertex_config.force_types), + ) + ) + add_literal( + graph, + vertex_config_uri, + ns.identityFromAllProperties, + vertex_config.identity_from_all_properties, + ) + + def _emit_vertex(self, graph: Graph, vertex_uri: URIRef, vertex: Vertex) -> 
None: + graph.add((vertex_uri, RDF.type, ns.Vertex)) + add_literal(graph, vertex_uri, ns.name, vertex.name) + add_literal(graph, vertex_uri, ns.description, vertex.description) + add_literal(graph, vertex_uri, ns.blank, vertex.blank) + + for identity in vertex.identity: + identity_node = BNode() + graph.add((vertex_uri, ns.hasIdentity, identity_node)) + graph.add((identity_node, RDF.type, ns.Identity)) + add_literal(graph, identity_node, ns.identityName, identity) + + payload = {} + if vertex.filters: + payload["filters"] = [ + f.model_dump(mode="json", by_alias=True) + if hasattr(f, "model_dump") + else f + for f in vertex.filters + ] + if payload: + graph.add((vertex_uri, ns.vertexPayload, json_literal(payload))) + + for index, field in enumerate(vertex.properties): + self._emit_field(graph, vertex_uri, field, index) + + def _emit_field( + self, + graph: Graph, + owner_uri: URIRef, + field: Field | str, + index: int, + ) -> None: + if isinstance(field, str): + field_obj = Field(name=field) + else: + field_obj = field + field_uri = URIRef(join_uri(str(owner_uri), "field", field_obj.name)) + graph.add((owner_uri, ns.hasField, field_uri)) + graph.add((field_uri, RDF.type, ns.Field)) + add_literal(graph, field_uri, ns.artifactIndex, index) + add_literal(graph, field_uri, ns.name, field_obj.name) + add_literal(graph, field_uri, ns.description, field_obj.description) + if field_obj.type is not None: + add_enum_individual( + graph, + field_uri, + ns.fieldType, + str(field_obj.type), + ns.ENUM_REGISTRIES["field_type"], + ) + + def _emit_edge( + self, + graph: Graph, + edge_uri: URIRef, + edge: Edge, + vertex_uri_by_name: dict[str, URIRef], + ) -> None: + graph.add((edge_uri, RDF.type, ns.Edge)) + add_literal(graph, edge_uri, ns.relation, edge.relation) + add_literal(graph, edge_uri, ns.description, edge.description) + + source_uri = vertex_uri_by_name.get(edge.source) + target_uri = vertex_uri_by_name.get(edge.target) + if source_uri is not None: + graph.add((edge_uri, 
ns.edgeSource, source_uri)) + if target_uri is not None: + graph.add((edge_uri, ns.edgeTarget, target_uri)) + + payload: dict[str, Any] = {} + if edge.identities: + graph.add((edge_uri, ns.edgeIdentities, json_literal(edge.identities))) + if edge.type is not None: + add_literal(graph, edge_uri, ns.edgeType, str(edge.type)) + if edge.by is not None: + add_literal(graph, edge_uri, ns.edgeBy, edge.by) + if payload: + graph.add((edge_uri, ns.edgePayload, json_literal(payload))) + + for index, field in enumerate(edge.properties): + self._emit_field(graph, edge_uri, field, index) + + def _emit_database_profile( + self, + graph: Graph, + profile_uri: URIRef, + profile: Any, + *, + edge_uri_by_id: dict[EdgeId, URIRef] | None = None, + ) -> None: + graph.add((profile_uri, RDF.type, ns.DatabaseProfile)) + add_enum_individual( + graph, + profile_uri, + ns.dbFlavor, + str(profile.db_flavor), + ns.ENUM_REGISTRIES["db_type"], + ) + add_literal(graph, profile_uri, ns.targetNamespace, profile.target_namespace) + self._emit_profile_indexes( + graph, profile_uri, profile, edge_uri_by_id=edge_uri_by_id + ) + + payload = self._model_payload( + profile, ns.MODEL_PAYLOAD_EXCLUDES["database_profile"] + ) + if payload: + graph.add((profile_uri, ns.profilePayload, json_literal(payload))) + + def _emit_profile_indexes( + self, + graph: Graph, + profile_uri: URIRef, + profile: Any, + *, + edge_uri_by_id: dict[EdgeId, URIRef] | None = None, + ) -> None: + for vertex_name, indexes in profile.vertex_indexes.items(): + for index_position, index in enumerate(indexes): + index_uri = URIRef( + join_uri( + str(profile_uri), + "vertex-index", + vertex_name, + str(index_position), + ) + ) + graph.add((profile_uri, ns.hasVertexIndex, index_uri)) + self._emit_index( + graph, + index_uri, + index, + vertex_name=vertex_name, + ) + + for spec_position, edge_spec in enumerate(profile.edge_specs): + spec_uri = URIRef( + join_uri(str(profile_uri), "edge-spec", str(spec_position)) + ) + graph.add((profile_uri, 
ns.hasEdgeSpec, spec_uri)) + graph.add((spec_uri, RDF.type, ns.EdgePhysicalSpec)) + add_literal(graph, spec_uri, ns.specSource, edge_spec.source) + add_literal(graph, spec_uri, ns.specTarget, edge_spec.target) + add_literal(graph, spec_uri, ns.specRelation, edge_spec.relation) + add_literal(graph, spec_uri, ns.specPurpose, edge_spec.purpose) + add_literal(graph, spec_uri, ns.specRelationName, edge_spec.relation_name) + add_literal(graph, spec_uri, ns.specIndexesMode, edge_spec.indexes_mode) + if edge_uri_by_id is not None: + edge_uri = edge_uri_by_id.get( + (edge_spec.source, edge_spec.target, edge_spec.relation) + ) + if edge_uri is not None: + graph.add((spec_uri, ns.refinesEdge, edge_uri)) + + for index_position, index in enumerate(edge_spec.indexes): + index_uri = URIRef( + join_uri(str(spec_uri), "index", str(index_position)) + ) + graph.add((spec_uri, ns.hasIndex, index_uri)) + self._emit_index(graph, index_uri, index) + + def _emit_index( + self, + graph: Graph, + index_uri: URIRef, + index: Any, + *, + vertex_name: str | None = None, + ) -> None: + graph.add((index_uri, RDF.type, ns.Index)) + add_literal(graph, index_uri, ns.profileVertexName, vertex_name) + add_literal(graph, index_uri, ns.indexName, index.name) + add_literal(graph, index_uri, ns.indexUnique, index.unique) + index_type = getattr(index.type, "value", str(index.type)) + add_literal(graph, index_uri, ns.indexType, index_type) + add_literal(graph, index_uri, ns.indexDeduplicate, index.deduplicate) + add_literal(graph, index_uri, ns.indexSparse, index.sparse) + add_literal( + graph, + index_uri, + ns.indexExcludeEdgeEndpoints, + index.exclude_edge_endpoints, + ) + for field in index.fields: + add_literal(graph, index_uri, ns.indexField, field) + + def _emit_ingestion_model( + self, + graph: Graph, + base_uri: str, + ingestion_uri: URIRef, + ingestion_model: Any, + *, + vertex_uri_by_name: dict[str, URIRef] | None = None, + edge_uri_by_id: dict[EdgeId, URIRef] | None = None, + ) -> None: + 
graph.add((ingestion_uri, RDF.type, ns.IngestionModel)) + add_enum_individual( + graph, + ingestion_uri, + ns.edgesOnDuplicate, + ingestion_model.edges_on_duplicate, + ns.ENUM_REGISTRIES["edge_duplicate_policy"], + ) + + transform_uri_by_name: dict[str, URIRef] = {} + for index, transform in enumerate(ingestion_model.transforms): + transform_uri = URIRef( + join_uri( + base_uri, "ingestion", "transform", transform.name or "unnamed" + ) + ) + if transform.name: + transform_uri_by_name[transform.name] = transform_uri + graph.add((ingestion_uri, ns.hasTransform, transform_uri)) + add_literal(graph, transform_uri, ns.artifactIndex, index) + self._emit_proto_transform(graph, transform_uri, transform) + + for index, resource in enumerate(ingestion_model.resources): + resource_uri = URIRef( + join_uri(base_uri, "ingestion", "resource", resource.name) + ) + graph.add((ingestion_uri, ns.hasResource, resource_uri)) + add_literal(graph, resource_uri, ns.artifactIndex, index) + self._emit_resource( + graph, + resource_uri, + resource, + transform_uri_by_name=transform_uri_by_name, + vertex_uri_by_name=vertex_uri_by_name, + edge_uri_by_id=edge_uri_by_id, + ) + + def _emit_proto_transform( + self, + graph: Graph, + transform_uri: URIRef, + transform: ProtoTransform, + ) -> None: + graph.add((transform_uri, RDF.type, ns.ProtoTransform)) + add_literal(graph, transform_uri, ns.name, transform.name) + add_literal(graph, transform_uri, ns.transformModule, transform.module) + add_literal(graph, transform_uri, ns.transformFunction, transform.foo) + + for item in transform.input: + add_literal(graph, transform_uri, ns.transformInput, item) + for item in transform.output: + add_literal(graph, transform_uri, ns.transformOutput, item) + + extra_payload: dict[str, Any] = {} + if transform.params: + extra_payload["params"] = transform.params + if transform.input_groups: + extra_payload["input_groups"] = [ + list(group) for group in transform.input_groups + ] + if transform.output_groups: + 
extra_payload["output_groups"] = [ + list(group) for group in transform.output_groups + ] + if extra_payload: + graph.add((transform_uri, ns.transformParams, json_literal(extra_payload))) + + add_enum_individual( + graph, + transform_uri, + ns.transformTarget, + transform.target, + ns.ENUM_REGISTRIES["transform_target"], + ) + + if transform.dress is not None: + dress_uri = BNode() + graph.add((transform_uri, ns.hasDress, dress_uri)) + self._emit_dress_config(graph, dress_uri, transform.dress) + + if transform.keys.mode != "all" or transform.keys.names: + keys_uri = BNode() + graph.add((transform_uri, ns.hasKeySelection, keys_uri)) + self._emit_key_selection(graph, keys_uri, transform.keys) + + def _emit_dress_config( + self, graph: Graph, dress_uri: BNode, dress: DressConfig + ) -> None: + graph.add((dress_uri, RDF.type, ns.DressConfig)) + add_literal(graph, dress_uri, ns.dressKey, dress.key) + add_literal(graph, dress_uri, ns.dressValue, dress.value) + + def _emit_key_selection( + self, + graph: Graph, + keys_uri: BNode, + keys: KeySelectionConfig, + ) -> None: + graph.add((keys_uri, RDF.type, ns.KeySelectionConfig)) + add_enum_individual( + graph, + keys_uri, + ns.keySelectionMode, + keys.mode, + ns.ENUM_REGISTRIES["key_selection_mode"], + ) + for key_name in keys.names: + add_literal(graph, keys_uri, ns.keySelectionName, key_name) + + def _emit_resource( + self, + graph: Graph, + resource_uri: URIRef, + resource: Any, + *, + transform_uri_by_name: dict[str, URIRef] | None = None, + vertex_uri_by_name: dict[str, URIRef] | None = None, + edge_uri_by_id: dict[EdgeId, URIRef] | None = None, + ) -> None: + graph.add((resource_uri, RDF.type, ns.Resource)) + add_literal(graph, resource_uri, ns.name, resource.name) + + payload = self._model_payload(resource, ns.MODEL_PAYLOAD_EXCLUDES["resource"]) + if payload: + graph.add((resource_uri, ns.resourcePayload, json_literal(payload))) + + for index, step in enumerate(resource.pipeline): + step_node = self._emit_actor_step( 
+ graph, + step, + index=index, + transform_uri_by_name=transform_uri_by_name, + vertex_uri_by_name=vertex_uri_by_name, + edge_uri_by_id=edge_uri_by_id, + ) + graph.add((resource_uri, ns.hasActor, step_node)) + + for spec in resource.infer_edge_only: + spec_uri = BNode() + graph.add((resource_uri, ns.hasEdgeInferOnly, spec_uri)) + self._emit_edge_infer_spec(graph, spec_uri, spec) + for spec in resource.infer_edge_except: + spec_uri = BNode() + graph.add((resource_uri, ns.hasEdgeInferExcept, spec_uri)) + self._emit_edge_infer_spec(graph, spec_uri, spec) + + def _emit_edge_infer_spec(self, graph: Graph, spec_uri: BNode, spec: Any) -> None: + graph.add((spec_uri, RDF.type, ns.EdgeInferSpec)) + graph.add( + ( + spec_uri, + ns.stepPayload, + json_literal( + { + "source": spec.source, + "target": spec.target, + "relation": spec.relation, + } + ), + ) + ) + + def _emit_actor_step( + self, + graph: Graph, + step: dict[str, Any], + *, + index: int, + transform_uri_by_name: dict[str, URIRef] | None = None, + vertex_uri_by_name: dict[str, URIRef] | None = None, + edge_uri_by_id: dict[EdgeId, URIRef] | None = None, + ) -> BNode: + step_node = BNode() + step_type = actor_step_type(step) + graph.add((step_node, RDF.type, URIRef(str(actor_step_class(step_type))))) + graph.add((step_node, RDF.type, ns.Actor)) + add_literal(graph, step_node, ns.actorType, step_type) + add_literal(graph, step_node, ns.stepIndex, index) + graph.add((step_node, ns.stepPayload, json_literal(step))) + if step_type == "vertex": + vertex_name = step.get("vertex") + if ( + isinstance(vertex_name, str) + and vertex_uri_by_name is not None + and vertex_name in vertex_uri_by_name + ): + graph.add( + (step_node, ns.targetsVertex, vertex_uri_by_name[vertex_name]) + ) + if step_type == "vertex_router" and vertex_uri_by_name is not None: + type_map = step.get("type_map") + if isinstance(type_map, dict): + for mapped in type_map.values(): + if isinstance(mapped, str) and mapped in vertex_uri_by_name: + graph.add( 
+ (step_node, ns.targetsVertex, vertex_uri_by_name[mapped]) + ) + if step_type == "edge" and edge_uri_by_id is not None: + + def _link_edge(source: str, target: str, relation: str | None) -> None: + edge_uri = edge_uri_by_id.get((source, target, relation)) + if edge_uri is not None: + graph.add((step_node, ns.targetsEdge, edge_uri)) + + source = step.get("from") + target = step.get("to") + relation = step.get("relation") + if isinstance(source, str) and isinstance(target, str): + _link_edge( + source, target, relation if isinstance(relation, str) else None + ) + links = step.get("links") + if isinstance(links, list): + for link in links: + if not isinstance(link, dict): + continue + link_source = link.get("from") + link_target = link.get("to") + link_relation = link.get("relation") + if isinstance(link_source, str) and isinstance(link_target, str): + _link_edge( + link_source, + link_target, + link_relation if isinstance(link_relation, str) else None, + ) + if step_type == "transform": + transform_name = step.get("name") + if not isinstance(transform_name, str): + call_spec = step.get("call") + if isinstance(call_spec, dict): + use_name = call_spec.get("use") + if isinstance(use_name, str): + transform_name = use_name + if not isinstance(transform_name, str): + transform_step = step.get("transform") + if isinstance(transform_step, dict): + direct_name = transform_step.get("name") + if isinstance(direct_name, str): + transform_name = direct_name + nested_call = transform_step.get("call") + if not isinstance(transform_name, str) and isinstance( + nested_call, dict + ): + nested_use = nested_call.get("use") + if isinstance(nested_use, str): + transform_name = nested_use + if isinstance(transform_name, str) and transform_uri_by_name is not None: + transform_uri = transform_uri_by_name.get(transform_name) + if transform_uri is not None: + graph.add((step_node, ns.executesTransform, transform_uri)) + + if step_type == "descend": + nested_steps = step.get("pipeline") + if 
isinstance(nested_steps, list): + for nested_index, nested_step in enumerate(nested_steps): + if isinstance(nested_step, dict): + nested_node = self._emit_actor_step( + graph, + cast(dict[str, Any], nested_step), + index=nested_index, + transform_uri_by_name=transform_uri_by_name, + vertex_uri_by_name=vertex_uri_by_name, + edge_uri_by_id=edge_uri_by_id, + ) + graph.add((step_node, ns.hasActor, nested_node)) + return step_node + + def _emit_bindings( + self, + graph: Graph, + base_uri: str, + bindings_uri: URIRef, + bindings: Any, + ) -> None: + graph.add((bindings_uri, RDF.type, ns.Bindings)) + + for index, connector in enumerate(bindings.connectors): + connector_hash = connector.hash or connector.__class__.__name__ + connector_uri = URIRef( + join_uri(base_uri, "bindings", "connector", connector_hash) + ) + graph.add((bindings_uri, ns.hasConnector, connector_uri)) + add_literal(graph, connector_uri, ns.artifactIndex, index) + self._emit_connector(graph, connector_uri, connector) + + for mapping in bindings.resource_connector: + binding_uri = BNode() + graph.add((bindings_uri, ns.bindsResourceToConnector, binding_uri)) + graph.add((binding_uri, RDF.type, ns.ResourceConnectorBinding)) + resource = ( + mapping.resource + if hasattr(mapping, "resource") + else mapping["resource"] + ) + connector = ( + mapping.connector + if hasattr(mapping, "connector") + else mapping["connector"] + ) + add_literal(graph, binding_uri, ns.resourceName, resource) + add_literal(graph, binding_uri, ns.connectorName, connector) + + for mapping in bindings.connector_connection: + binding_uri = BNode() + graph.add((bindings_uri, ns.bindsConnectorToConnProxy, binding_uri)) + graph.add((binding_uri, RDF.type, ns.ConnectorConnectionBinding)) + connector = ( + mapping.connector + if hasattr(mapping, "connector") + else mapping["connector"] + ) + proxy = ( + mapping.conn_proxy + if hasattr(mapping, "conn_proxy") + else mapping["conn_proxy"] + ) + add_literal(graph, binding_uri, ns.connectorName, 
connector) + add_literal(graph, binding_uri, ns.connProxy, proxy) + + for mapping in bindings.staging_proxy: + binding_uri = BNode() + graph.add((bindings_uri, ns.hasStagingProxy, binding_uri)) + graph.add((binding_uri, RDF.type, ns.StagingProxyBinding)) + name = mapping.name if hasattr(mapping, "name") else mapping["name"] + proxy = ( + mapping.conn_proxy + if hasattr(mapping, "conn_proxy") + else mapping["conn_proxy"] + ) + add_literal(graph, binding_uri, ns.name, name) + add_literal(graph, binding_uri, ns.connProxy, proxy) + + def _emit_connector( + self, + graph: Graph, + connector_uri: URIRef, + connector: FileConnector | TableConnector | SparqlConnector, + ) -> None: + connector_class = ns.CONNECTOR_CLASSES[type(connector).__name__] + graph.add((connector_uri, RDF.type, URIRef(str(connector_class)))) + graph.add((connector_uri, RDF.type, ns.BoundConnector)) + add_literal(graph, connector_uri, ns.name, connector.name) + add_literal(graph, connector_uri, ns.resourceName, connector.resource_name) + add_enum_individual( + graph, + connector_uri, + ns.boundSourceKind, + connector.bound_source_kind().value, + ns.ENUM_REGISTRIES["bound_source_kind"], + ) + + payload = self._model_payload(connector, ns.MODEL_PAYLOAD_EXCLUDES["connector"]) + if payload: + graph.add((connector_uri, ns.connectorPayload, json_literal(payload))) + + @staticmethod + def _model_payload(model: Any, exclude_fields: set[str]) -> dict[str, Any]: + payload = model.model_dump( + mode="json", + by_alias=True, + exclude=exclude_fields, + exclude_none=True, + exclude_defaults=True, + ) + return payload if isinstance(payload, dict) else {} + + @staticmethod + def _edge_key(edge: Edge) -> str: + relation = edge.relation or "relates" + return f"{edge.source}_{relation}_{edge.target}" diff --git a/graflo/rdf/utils.py b/graflo/rdf/utils.py new file mode 100644 index 00000000..90a4ba84 --- /dev/null +++ b/graflo/rdf/utils.py @@ -0,0 +1,137 @@ +"""Shared helpers for GraFlo RDF serialization.""" + +from 
__future__ import annotations + +import json +import re +from pathlib import Path +from typing import Any +from urllib.parse import quote + +from rdflib import BNode, Graph, Literal, RDF, URIRef +from rdflib.namespace import XSD + +from graflo.architecture.pipeline.runtime.actor.config.normalize import ( + normalize_actor_step, +) +from graflo.rdf import namespace as ns + + +def ontology_path() -> Path: + """Return the packaged GraFlo meta-ontology Turtle file path.""" + return Path(__file__).resolve().parent / "ontology" / "graflo.ttl" + + +def load_ontology_graph() -> Graph: + """Load the GraFlo meta-ontology into an rdflib graph.""" + graph = Graph() + graph.parse(str(ontology_path()), format="turtle") + return graph + + +def slug_token(value: str) -> str: + """Normalize arbitrary text into a URI path segment.""" + cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", value.strip()) + return quote(cleaned.strip("-") or "item", safe="-._~") + + +def join_uri(base_uri: str, *parts: str) -> str: + """Join base URI with path segments.""" + base = base_uri.rstrip("/") + "/" + tail = "/".join(slug_token(part) for part in parts if part) + return base + tail if tail else base.rstrip("/") + + +def json_literal(value: Any) -> Literal: + """Encode a Python value as an xsd:string JSON literal.""" + return Literal(json.dumps(value, sort_keys=True), datatype=XSD.string) + + +def parse_json_literal(value: Literal | str | None) -> Any: + """Decode a JSON literal back to Python.""" + if value is None: + return None + text = str(value) + if not text: + return None + return json.loads(text) + + +def add_literal( + graph: Graph, + subject: URIRef | BNode, + predicate: URIRef, + value: Any, +) -> None: + """Add a literal triple when value is not None.""" + if value is None: + return + if isinstance(value, bool): + graph.add((subject, predicate, Literal(value, datatype=XSD.boolean))) + elif isinstance(value, int): + graph.add((subject, predicate, Literal(value, datatype=XSD.integer))) + elif 
isinstance(value, float): + graph.add((subject, predicate, Literal(value, datatype=XSD.decimal))) + else: + graph.add((subject, predicate, Literal(str(value)))) + + +def add_enum_individual( + graph: Graph, + subject: URIRef | BNode, + predicate: URIRef, + value: str | None, + mapping: dict[str, object], +) -> None: + """Link subject to a named enumeration individual.""" + if value is None: + return + individual = mapping.get(str(value)) + if individual is not None: + graph.add((subject, predicate, URIRef(str(individual)))) + + +def add_rdf_list( + graph: Graph, + values: list[str], +) -> BNode | None: + """Build an RDF collection for string values.""" + if not values: + return None + head = BNode() + current = head + for index, item in enumerate(values): + graph.add((current, RDF.first, Literal(item))) + if index == len(values) - 1: + graph.add((current, RDF.rest, RDF.nil)) + else: + nxt = BNode() + graph.add((current, RDF.rest, nxt)) + current = nxt + return head + + +def actor_step_type(step: dict[str, Any]) -> str: + """Return normalized actor step type.""" + normalized = normalize_actor_step(dict(step)) + step_type = normalized.get("type") + if not isinstance(step_type, str): + raise ValueError(f"Unsupported pipeline step: {step!r}") + return step_type + + +def actor_step_class(step_type: str) -> object: + """Map actor step type to ontology class.""" + cls = ns.ACTOR_STEP_CLASSES.get(step_type) + if cls is None: + raise ValueError(f"Unknown actor step type: {step_type!r}") + return cls + + +def reverse_enum(mapping: dict[str, object], individual: URIRef | BNode) -> str | None: + """Resolve named individual IRI back to enum string.""" + individual_str = str(individual) + for value, term in mapping.items(): + if str(term) == individual_str: + return value + return None diff --git a/mkdocs.yml b/mkdocs.yml index 29da77fe..d2f07903 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -66,6 +66,8 @@ nav: - Runtime connector updates: concepts/runtime_connector_updates.md - 
Backend Index Behavior: concepts/backend_indexes.md - Document cast errors and doc error sink: concepts/ingestion_doc_errors.md +- Model: + - GraFlo ontology (manifest RDF): model/graflo_ontology.md - Guides: - TigerGraph bulk load (CSV + LOADING JOB): guides/tigergraph_bulk_load.md - Examples: diff --git a/pyproject.toml b/pyproject.toml index 67a9eaee..3a016b3a 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -36,11 +36,11 @@ dependencies = [ "urllib3>=2.0.0", "xmltodict>=0.14.2,<0.15" ] -description = "A framework for transforming tabular (CSV, SQL) and hierarchical data (JSON, XML) into property graphs and ingesting them into graph databases (ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, NebulaGraph). Features automatic PostgreSQL schema inference." +description = "Manifest-driven Graph Schema & Transformation Language (GSTL): define labeled property graph schemas, ingest from CSV/JSON/Parquet/SQL/RDF/SPARQL/API sources, infer schemas from PostgreSQL 3NF or OWL/RDFS ontologies, apply schema migrations, and project to ArangoDB, Neo4j, TigerGraph, FalkorDB, Memgraph, or NebulaGraph." 
 name = "graflo"
 readme = "README.md"
 requires-python = ">=3.11"
-version = "1.7.34"
+version = "1.8.0"
 
 [project.optional-dependencies]
 dev = [
@@ -66,8 +66,10 @@ plot = [
 ingest = "graflo.cli.ingest:ingest"
 install_tigergraph_queries = "graflo.cli.install_tigergraph_queries:install_tigergraph_queries"
 manage_dbs = "graflo.cli.manage_dbs:manage_dbs"
+manifest_to_rdf = "graflo.rdf.cli:manifest_to_rdf"
 migrate_schema = "graflo.cli.migrate_schema:migrate_schema"
 plot_manifest = "graflo.cli.plot_manifest:plot_manifest"
+rdf_to_manifest = "graflo.rdf.cli:rdf_to_manifest"
 run_tigergraph_queries = "graflo.cli.run_tigergraph_queries:run_tigergraph_queries"
 xml2json = "graflo.cli.plot_schema:xml2json"
 
diff --git a/test/docs/test_build_ontology_viz.py b/test/docs/test_build_ontology_viz.py
new file mode 100644
index 00000000..41aa5808
--- /dev/null
+++ b/test/docs/test_build_ontology_viz.py
@@ -0,0 +1,92 @@
+"""Smoke tests for the GraFlo ontology visualization build script."""
+
+from __future__ import annotations
+
+import importlib.util
+import json
+from pathlib import Path
+
+from rdflib import Graph, Literal, URIRef
+from rdflib.namespace import OWL, RDF, RDFS, SKOS
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+BUILD_SCRIPT = REPO_ROOT / "docs" / "scripts" / "build_ontology_viz.py"
+EXTRACT_SCRIPT = REPO_ROOT / "docs" / "scripts" / "ontology_viz" / "extract.py"
+OUTPUT_DIR = REPO_ROOT / "docs" / "assets" / "graflo-ontology-viz"
+INDEX_HTML = OUTPUT_DIR / "index.html"
+EMBED_HTML = OUTPUT_DIR / "embed.html"
+GRAPH_JSON = OUTPUT_DIR / "graph-data.json"
+ONTOLOGY_IRI = "https://ontology.growgraph.dev/graflo"
+
+
+def _load_module(path: Path, name: str):
+    spec = importlib.util.spec_from_file_location(name, path)
+    assert spec is not None and spec.loader is not None
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module
+
+
+def test_committed_ontology_viz_assets_exist() -> None:
+    assert INDEX_HTML.is_file(), (
+        "Run docs/scripts/build_ontology_viz.py and commit assets"
+    )
+    assert EMBED_HTML.is_file()
+    assert GRAPH_JSON.is_file()
+    assert (OUTPUT_DIR / "graph-view.js").is_file()
+
+
+def test_committed_ontology_viz_contains_metadata() -> None:
+    html = INDEX_HTML.read_text(encoding="utf-8")
+    assert "GraFlo Ontology" in html
+    assert "GRAFLO_ONTOLOGY_GRAPH" in html
+    payload = json.loads(GRAPH_JSON.read_text(encoding="utf-8"))
+    assert payload["ontology"] == ONTOLOGY_IRI
+    assert payload["nodes"]
+    assert any(edge["kind"] == "subClassOf" for edge in payload["edges"])
+
+
+def test_build_ontology_viz_script_runs() -> None:
+    module = _load_module(BUILD_SCRIPT, "build_ontology_viz")
+    viz_id = module.build_ontology_viz()
+    assert viz_id == "hierarchical-graph"
+    assert INDEX_HTML.is_file()
+    assert EMBED_HTML.is_file()
+
+
+def test_extract_graph_has_subclass_and_property_edges() -> None:
+    extract = _load_module(EXTRACT_SCRIPT, "ontology_viz_extract")
+    payload = extract.extract_ontology_graph()
+    kinds = {edge["kind"] for edge in payload["edges"]}
+    assert "subClassOf" in kinds
+    assert "objectProperty" in kinds or "datatypeProperty" in kinds
+    assert payload["nodeWidth"] > 0
+    assert payload["nodeHeight"] > 0
+
+
+def test_extract_prefers_skos_pref_label_for_nodes() -> None:
+    extract = _load_module(EXTRACT_SCRIPT, "ontology_viz_extract")
+    graph = Graph()
+    node_uri = URIRef("https://ontology.growgraph.dev/graflo/LabelNode")
+    graph.add((node_uri, RDF.type, OWL.Class))
+    graph.add((node_uri, RDFS.label, Literal("Technical Label")))
+    graph.add((node_uri, SKOS.prefLabel, Literal("User Label")))
+
+    payload = extract.extract_ontology_graph(graph)
+    node = next(item for item in payload["nodes"] if item["id"] == str(node_uri))
+    assert node["label"] == "User Label"
+
+
+def test_extract_label_fallback_is_rdfs_then_local_name() -> None:
+    extract = _load_module(EXTRACT_SCRIPT, "ontology_viz_extract")
+    graph = Graph()
+    rdfs_node = URIRef("https://ontology.growgraph.dev/graflo/RdfsNode")
+    local_node = URIRef("https://ontology.growgraph.dev/graflo/LocalNode")
+    graph.add((rdfs_node, RDF.type, OWL.Class))
+    graph.add((rdfs_node, RDFS.label, Literal("From RDFS")))
+    graph.add((local_node, RDF.type, OWL.Class))
+
+    payload = extract.extract_ontology_graph(graph)
+    labels = {item["id"]: item["label"] for item in payload["nodes"]}
+    assert labels[str(rdfs_node)] == "From RDFS"
+    assert labels[str(local_node)] == "LocalNode"
diff --git a/test/rdf/test_manifest_rdf.py b/test/rdf/test_manifest_rdf.py
new file mode 100644
index 00000000..65d46dff
--- /dev/null
+++ b/test/rdf/test_manifest_rdf.py
@@ -0,0 +1,180 @@
+"""Tests for GraphManifest RDF round-trip conversion."""
+
+from __future__ import annotations
+
+import pathlib
+
+import yaml
+from rdflib import Graph, URIRef
+from rdflib.namespace import OWL, RDF
+
+from graflo import GraphManifest
+from graflo.architecture.database_features import EdgePhysicalSpec
+from graflo.architecture.graph_types import Index
+from graflo.rdf import namespace as ns
+from graflo.rdf.deserializer import ManifestRdfDeserializer
+from graflo.rdf.serializer import ManifestRdfSerializer
+from graflo.rdf.utils import load_ontology_graph, ontology_path
+
+
+EXAMPLES_DIR = pathlib.Path(__file__).resolve().parents[2] / "examples"
+BASE_URI = "https://growgraph.dev/manifests/test/"
+
+
+def _load_example_manifest(name: str) -> GraphManifest:
+    path = EXAMPLES_DIR / name / "manifest.yaml"
+    with path.open(encoding="utf-8") as handle:
+        data = yaml.safe_load(handle)
+    return GraphManifest.from_dict(data)
+
+
+def _canonical(manifest: GraphManifest) -> dict:
+    return manifest.to_minimal_canonical_dict()
+
+
+def test_ontology_file_exists_and_loads() -> None:
+    path = ontology_path()
+    assert path.is_file()
+    graph = load_ontology_graph()
+    ontology_uri = URIRef(ns.GF_ONTOLOGY_IRI)
+    assert (ontology_uri, RDF.type, OWL.Ontology) in graph
+    assert (ontology_uri, OWL.versionIRI, URIRef(ns.GF_VERSION_IRI)) in graph
+    version_info = next(graph.objects(ontology_uri, OWL.versionInfo), None)
+    assert version_info is not None
+    assert str(version_info) == ns.GF_VERSION
+
+
+def test_manifest_to_rdf_contains_core_triples() -> None:
+    manifest = _load_example_manifest("1-ingest-csv")
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    graph = serializer.to_graph(manifest, BASE_URI)
+
+    manifest_uri = URIRef(BASE_URI.rstrip("/"))
+    assert (manifest_uri, RDF.type, ns.GraphManifest) in graph
+    assert (manifest_uri, ns.hasSchema, None) in graph
+    assert len(list(graph.objects(manifest_uri, ns.hasSchema))) == 1
+    assert len(list(graph.objects(manifest_uri, ns.hasIngestionModel))) == 1
+    assert len(list(graph.objects(manifest_uri, ns.hasBindings))) == 1
+
+
+def test_round_trip_example_1_ingest_csv() -> None:
+    original = _load_example_manifest("1-ingest-csv")
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    deserializer = ManifestRdfDeserializer()
+
+    ttl = serializer.to_turtle(original, BASE_URI)
+    restored = deserializer.from_turtle(ttl, BASE_URI.rstrip("/"))
+
+    assert _canonical(restored) == _canonical(original)
+
+
+def test_round_trip_example_2_with_transforms() -> None:
+    original = _load_example_manifest("2-ingest-self-references")
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    deserializer = ManifestRdfDeserializer()
+
+    graph = serializer.to_graph(original, BASE_URI)
+    restored = deserializer.from_graph(graph, BASE_URI.rstrip("/"))
+
+    assert _canonical(restored) == _canonical(original)
+
+
+def test_round_trip_example_3_edge_weights() -> None:
+    original = _load_example_manifest("3-ingest-csv-edge-weights")
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    deserializer = ManifestRdfDeserializer()
+
+    ttl = serializer.to_turtle(original, BASE_URI)
+    restored = deserializer.from_turtle(ttl, BASE_URI.rstrip("/"))
+
+    assert _canonical(restored) == _canonical(original)
+
+
+def test_turtle_output_serializes_with_ontology() -> None:
+    manifest = _load_example_manifest("1-ingest-csv")
+    serializer = ManifestRdfSerializer(include_ontology=True)
+    ttl = serializer.to_turtle(manifest, BASE_URI)
+
+    graph = Graph()
+    graph.parse(data=ttl, format="turtle")
+    manifest_uri = URIRef(BASE_URI.rstrip("/"))
+    assert (manifest_uri, RDF.type, ns.GraphManifest) in graph
+
+
+def test_json_ld_output_is_parseable() -> None:
+    manifest = _load_example_manifest("1-ingest-csv")
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    payload = serializer.to_json_ld(manifest, BASE_URI)
+
+    graph = Graph()
+    graph.parse(data=payload, format="json-ld")
+    restored = ManifestRdfDeserializer().from_graph(graph, BASE_URI.rstrip("/"))
+    assert _canonical(restored) == _canonical(manifest)
+
+
+def test_round_trip_preserves_vertex_config_policy_fields() -> None:
+    original = _load_example_manifest("1-ingest-csv")
+    assert original.graph_schema is not None
+    original.graph_schema.core_schema.vertex_config.force_types = {
+        "Person": ["STRING", "INT"]
+    }
+    original.graph_schema.core_schema.vertex_config.identity_from_all_properties = False
+
+    serializer = ManifestRdfSerializer(include_ontology=False)
+    deserializer = ManifestRdfDeserializer()
+    restored = deserializer.from_graph(
+        serializer.to_graph(original, BASE_URI), BASE_URI.rstrip("/")
+    )
+
+    assert restored.graph_schema is not None
+    restored_vertex_cfg = restored.graph_schema.core_schema.vertex_config
+    assert restored_vertex_cfg.force_types == {"Person": ["STRING", "INT"]}
+    assert restored_vertex_cfg.identity_from_all_properties is False
+
+
+def test_context_has_new_vertex_config_and_label_terms() -> None:
+    context_path = (
+        pathlib.Path(__file__).resolve().parents[2]
+        / "graflo"
+        / "rdf"
+        / "ontology"
+        / "graflo-context.jsonld"
+    )
+    payload = context_path.read_text(encoding="utf-8")
+    assert '"forceTypes": "gf:forceTypes"' in payload
+    assert '"identityFromAllProperties": "gf:identityFromAllProperties"' in payload
+    assert '"prefLabel": "skos:prefLabel"' in payload
+
+
+def test_profile_and_transform_actor_semantic_links_are_emitted() -> None:
+    manifest = _load_example_manifest("2-ingest-self-references")
+    assert manifest.graph_schema is not None
+    manifest.graph_schema.db_profile.vertex_indexes = {
+        "Person": [Index(fields=["name"])]
+    }
+    manifest.graph_schema.db_profile.edge_specs = [
+        EdgePhysicalSpec(
+            source="Person",
+            target="Person",
+            relation="follows",
+            indexes=[Index(fields=["created_at"])],
+        )
+    ]
+
+    graph = ManifestRdfSerializer(include_ontology=False).to_graph(manifest, BASE_URI)
+
+    profile_nodes = list(graph.subjects(RDF.type, ns.DatabaseProfile))
+    assert profile_nodes
+    profile_node = profile_nodes[0]
+    vertex_index_nodes = list(graph.objects(profile_node, ns.hasVertexIndex))
+    edge_spec_nodes = list(graph.objects(profile_node, ns.hasEdgeSpec))
+    assert vertex_index_nodes
+    assert edge_spec_nodes
+    assert any(graph.objects(vertex_index_nodes[0], ns.indexField))
+    assert any(graph.objects(edge_spec_nodes[0], ns.hasIndex))
+
+    transform_actor_nodes = list(graph.subjects(RDF.type, ns.TransformActorStep))
+    assert transform_actor_nodes
+    assert any(
+        any(graph.objects(node, ns.executesTransform)) for node in transform_actor_nodes
+    )
diff --git a/uv.lock b/uv.lock
index 5f0f2efb..91a03445 100644
--- a/uv.lock
+++ b/uv.lock
@@ -434,7 +434,7 @@ dependencies = [
 ]
 name = "graflo"
 source = {editable = "."}
-version = "1.7.34"
+version = "1.8.0"
 
 [package.metadata]
 provides-extras = ["dev", "docs", "plot"]