From 0450557be248da7c29d426fe1ab265486c28c021 Mon Sep 17 00:00:00 2001 From: Luke Sy Date: Mon, 23 Feb 2026 05:25:29 +1100 Subject: [PATCH 1/5] Add design documents: MANIFESTO, ROSGRAPH proposal, and FAQ Three founding documents for the rosgraph project: - MANIFESTO.md: project direction (why, what, how) - ROSGRAPH.md: full technical proposal (schema, architecture, phasing) - FAQ.md: audience-organized FAQ covering 9 perspectives Signed-off-by: Luke Sy --- docs/FAQ.md | 874 +++++++++++++++++++++++ docs/MANIFESTO.md | 15 + docs/ROSGRAPH.md | 1724 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 2613 insertions(+) create mode 100644 docs/FAQ.md create mode 100644 docs/MANIFESTO.md create mode 100644 docs/ROSGRAPH.md diff --git a/docs/FAQ.md b/docs/FAQ.md new file mode 100644 index 0000000..b01d3c6 --- /dev/null +++ b/docs/FAQ.md @@ -0,0 +1,874 @@ +# rosgraph — Frequently Asked Questions + +> **Parent:** [ROSGRAPH.md](ROSGRAPH.md) (technical proposal) + +Organized by who's asking. Find your perspective, jump to the +questions that matter to you. + +--- + +## Table of Contents + +1. [New ROS Developer](#1-new-ros-developer) +2. [AI-Assisted Developer](#2-ai-assisted-developer) +3. [Engineering Lead / System Integrator / DevOps](#3-engineering-lead--system-integrator--devops) +4. [Safety-Critical Engineer](#4-safety-critical-engineer) +5. [MoveIt / nav2 / Popular Module User](#5-moveit--nav2--popular-module-user) +6. [The Skeptic](#6-the-skeptic) +7. [Package Maintainer / ROS Governance](#7-package-maintainer--ros-governance) +8. [Educator / University Researcher](#8-educator--university-researcher) +9. [Embedded / Resource-Constrained Developer](#9-embedded--resource-constrained-developer) + +--- + +## 1. New ROS Developer + +### What problem does rosgraph solve? + +ROS 2 doesn't verify that your nodes are wired correctly until +runtime — and often not even then. Type mismatches between publishers +and subscribers fail silently. 
QoS incompatibilities drop connections +with no error. Parameter renames break launch files with no build +error. + +rosgraph catches these at build time. See [PROPOSAL.md §1, "The +Problem, Concretely"](PROPOSAL.md#the-problem-concretely) for four +real-world examples. + +### How much do I need to learn? + +Write one `interface.yaml` per node (~15 lines for a basic pub/sub +node). Run three commands: + +```bash +rosgraph generate . # generates code +rosgraph lint . # checks for issues +rosgraph monitor # watches the running system +``` + +The YAML schema has IDE autocompletion via JSON Schema. See the [Quick +Start](PROPOSAL.md#quick-start-what-it-looks-like) for a complete +minimal example. + +### What do I stop doing when I adopt rosgraph? + +- **Stop writing pub/sub boilerplate.** Publisher creation, subscriber + setup, parameter declaration — all generated from `interface.yaml`. +- **Stop manually syncing parameters between code and launch files.** + `interface.yaml` is the single source of truth for parameter names, + types, defaults, and validation ranges. +- **Stop debugging silent QoS mismatches.** `rosgraph lint` catches + incompatible QoS profiles before you launch. +- **Stop wondering if your launch files reference the right nodes.** + `rosgraph lint` validates node refs, remappings, and parameter + overrides. + +### Will error messages actually be helpful? + +Error quality is a design requirement, not an afterthought. The +architecture follows Ruff's model ([PROPOSAL.md +§10.3](PROPOSAL.md#103-static-analysis-architecture)): + +- Every diagnostic includes a rule code (`TOP001`), the location in + `interface.yaml`, and what's wrong. +- Safe fixes can be auto-applied. Unsafe fixes are flagged but not + auto-applied. +- SARIF output enables inline annotations in GitHub PRs. +- `--add-noqa` generates suppression comments for existing issues, + so you can adopt gradually without noise. + +### Do I need to learn YAML schema syntax? + +Not really. 
If your editor has the YAML Language Server (most do), +you get autocompletion, inline validation, and hover docs from the +JSON Schema ([PROPOSAL.md §6, G2](PROPOSAL.md#6-feature-list)). Write +a few fields, let the editor fill in the structure. + +--- + +## 2. AI-Assisted Developer + +### How does rosgraph work with AI coding tools? + +`interface.yaml` is a machine-readable contract — exactly what LLMs +are good at consuming and generating. The `InterfaceDescriptor` IR +([PROPOSAL.md §3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)) is a +JSON blob containing a node's complete API: topics, types, QoS, +parameters, lifecycle state. An AI agent reads this to understand what +a node does, generate implementation code, write tests, or suggest +fixes — without parsing C++ or Python source. + +See [PROPOSAL.md §3.13](PROPOSAL.md#313-ai--tooling-integration) for +the full AI integration design. + +### Can I use `rosgraph generate` as an agent tool? + +Yes. An AI agent writing a ROS node can: +1. Generate `interface.yaml` from a natural language description +2. Run `rosgraph generate .` as a tool call to get type-safe + scaffolding +3. Write only the business logic into the generated skeleton +4. Run `rosgraph lint .` to verify the graph is correct + +This avoids the common failure mode of LLMs hallucinating ROS +boilerplate (wrong QoS defaults, missing component registration, +incorrect parameter declaration). + +### Will there be an MCP server? + +It's architecturally planned ([PROPOSAL.md +§3.13](PROPOSAL.md#313-ai--tooling-integration)). An MCP server +would expose: +- Graph state (which nodes exist, what they publish/subscribe) +- Lint results (current issues in the workspace) +- Interface schemas (what a specific node expects) +- Resolved topic names (after remapping/namespacing) + +This lets Claude Code, Cursor, or Copilot answer "what topics does +the perception pipeline publish?" from structured data, not grep. 
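As a rough illustration of answering that kind of question from structured data, the sketch below resolves topic names and lists what one namespace publishes. The `GRAPH` layout, the node names, and both helper functions are hypothetical and are not rosgraph's API. The resolution rule is deliberately simplified (remap first, then expand relative and `~` names) and does not reproduce the full ROS 2 name-resolution rules.

```python
# Hypothetical graph snapshot; a real MCP server would expose rosgraph's own model.
GRAPH = [
    {"name": "lidar_driver", "namespace": "/perception",
     "publishes": ["points_raw"], "remaps": {}},
    {"name": "cloud_filter", "namespace": "/perception",
     "publishes": ["points", "~/debug"],
     "remaps": {"points": "/perception/points_filtered"}},
    {"name": "planner", "namespace": "/nav",
     "publishes": ["path"], "remaps": {}},
]


def resolve_topic(name, namespace, node_name, remaps=None):
    """Simplified resolution: apply a remap, then expand to an absolute name."""
    name = (remaps or {}).get(name, name)
    stripped = namespace.strip("/")
    ns = "/" + stripped if stripped else ""
    if name.startswith("/"):
        return name                          # already absolute
    if name.startswith("~"):
        return f"{ns}/{node_name}{name[1:]}"  # private name, scoped to the node
    return f"{ns}/{name}"                    # relative name, scoped to the namespace


def published_topics(graph, namespace):
    """Return the sorted, resolved topics published by nodes in one namespace."""
    topics = set()
    for node in graph:
        if node["namespace"].strip("/") == namespace.strip("/"):
            for topic in node["publishes"]:
                topics.add(resolve_topic(topic, node["namespace"],
                                         node["name"], node["remaps"]))
    return sorted(topics)


if __name__ == "__main__":
    for topic in published_topics(GRAPH, "perception"):
        print(topic)
```

The point of the sketch is that the query becomes a lookup over declared data rather than a text search over source files.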
+ +### Can an AI generate `interface.yaml` from a description? + +Yes — the constrained schema makes this tractable. The schema has +~10 top-level keys with well-defined types. "I need a node that +subscribes to a lidar point cloud, filters it, and publishes the +result" produces a valid `interface.yaml` that `rosgraph generate` +immediately scaffolds. + +`rosgraph discover` ([PROPOSAL.md +§3.10](PROPOSAL.md#310-rosgraph-discover--runtime-to-spec-generation)) +can also generate `interface.yaml` from a running node, which an LLM +can then refine — adding descriptions, suggesting QoS rationale, and +grouping related interfaces. + +### What about IDE / LSP integration? + +Phase 1 delivers JSON Schema validation (IDE autocompletion for +`interface.yaml`). A dedicated LSP server would add: +- Hover for message type definitions +- Go-to-definition for `$ref` targets +- Inline diagnostics from `rosgraph lint` +- Cross-file rename support + +This benefits both human developers and AI agents operating within +IDE contexts. See [PROPOSAL.md +§3.13](PROPOSAL.md#313-ai--tooling-integration). + +--- + +## 3. Engineering Lead / System Integrator / DevOps + +### Who owns an `interface.yaml`? + +The node author defines it. Downstream consumers depend on the +installed version in `share//interfaces/`. Changes are +coordinated via: + +- `rosgraph breaking` ([PROPOSAL.md + §3.9](PROPOSAL.md#39-rosgraph-breaking--breaking-change-detection)) + — automated detection of breaking changes in CI, blocking merges + that break downstream consumers. +- Installed interfaces — downstream teams depend on published + interfaces without pulling source code. +- Semantic versioning alignment — breaking = major, dangerous = minor, + safe = patch. + +See [PROPOSAL.md §3.14](PROPOSAL.md#314-scale--fleet-considerations). + +### How does this scale to hundreds of packages? + +- **Lint performance target:** 100 packages in under 5 seconds + (Design Principle 7). 
Analysis is single-pass over the graph model + with parallel per-package processing and content caching. +- **Multi-workspace analysis:** Installed `interface.yaml` files in + underlays serve as cached facts. Only your workspace is analyzed, + not the entire underlay. See [PROPOSAL.md + §3.12](PROPOSAL.md#312-multi-workspace-analysis). +- **Differential analysis:** `--new-only` reports only issues + introduced since the base branch. No noise from existing code. +- **Per-package configuration:** Override lint rules per package via + `rosgraph.toml`. + +### I compose nodes from multiple vendors. How does rosgraph help? + +`system.yaml` (Layer 2 schema, [PROPOSAL.md +§3.2](PROPOSAL.md#32-schema-layers)) declares the intended system +composition — which nodes, which namespaces, which parameter overrides, +which remappings. `rosgraph lint` validates the composed graph: + +- **Type mismatches** across package boundaries (Node A publishes + `Twist`, Node B subscribes expecting `TwistStamped`) +- **QoS incompatibilities** between a vendor's publisher and your + subscriber +- **Disconnected subgraphs** — nodes that should be connected but + aren't due to a namespace or remapping error +- **Invalid remappings** — remaps pointing to nonexistent topics + +If a vendor doesn't ship `interface.yaml`, use `rosgraph discover` +([PROPOSAL.md +§3.10](PROPOSAL.md#310-rosgraph-discover--runtime-to-spec-generation)) +to generate one from a running instance of the vendor's node. The +discovered spec becomes your integration contract. + +### How does rosgraph fit into CI? + +rosgraph is CI-first by design (Design Principle 8): + +```yaml +# GitHub Actions example +- name: Lint graph + run: rosgraph lint . 
--output-format sarif --new-only --base main + # SARIF output → GitHub Security tab, PR annotations + +- name: Check breaking changes + run: rosgraph breaking --base main + # Exit code 1 if breaking changes detected + +- name: Run contract tests + run: rosgraph test + # Schema-driven interface conformance tests +``` + +Output formats: `text`, `json`, `sarif` (GitHub Security tab), +`github` (Actions annotations), `junit` (test reports). All +configurable via `rosgraph.toml` or `--output-format`. See +[PROPOSAL.md §3.11](PROPOSAL.md#311-configuration). + +For brownfield adoption, `--add-noqa` generates inline suppression +comments for all existing issues, creating a clean baseline. You +don't get 500 warnings on your first PR. + +### What about the colcon build workflow? + +`colcon-rosgraph` (Phase 2) is a thin colcon verb plugin that delegates +to the standalone `rosgraph` binary. It adds `colcon lint`, +`colcon docs`, `colcon discover`, and `colcon breaking` — iterating +packages in dependency order with parallel execution. See [PROPOSAL.md +§3.15](PROPOSAL.md#315-colcon-integration). + +Phase 1 works standalone: `rosgraph lint .` in any directory. No +colcon dependency required. + +### What about fleet-level monitoring? + +`rosgraph monitor` runs per-robot. For fleet-scale observability: + +- The Prometheus `/metrics` exporter (M7) enables standard Grafana + dashboards aggregated across the fleet. +- The `/rosgraph/diff` topic on each robot can be bridged to a + central system for aggregated drift analysis. +- The architecture uses standard observability patterns (Prometheus, + structured logs, `/diagnostics`) rather than inventing fleet-specific + infrastructure. + +Runtime performance targets: reconciliation < 500ms for 200 nodes, +< 50MB memory, < 5% CPU at steady state. See [PROPOSAL.md +§3.14](PROPOSAL.md#314-scale--fleet-considerations). + +### Can we enforce org-specific conventions? + +Yes. 
`rosgraph.toml` supports per-package rule overrides, custom +naming patterns, and rule selection. The Spectral-inspired YAML-native +rule system ([PROPOSAL.md +§10.3](PROPOSAL.md#103-static-analysis-architecture)) means a +robotics engineer can write custom rules without knowing Rust or C++. + +### Does rosgraph handle launch file complexity? + +Three strategies, phased by tractability ([PROPOSAL.md +§3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis)): + +1. **YAML launch files** — fully parseable, Phase 1 +2. **`system.yaml`** — static composition schema, fully analyzable, + Phase 1 +3. **Python launch AST** — pattern matching for `Node()`, + `LaunchConfiguration()`, etc., Phase 2 + +Python launch files with complex conditionals, loops, or dynamically +computed node sets can't be fully statically analyzed. `system.yaml` +is the escape hatch for systems that need full analyzability. + +--- + +## 4. Safety-Critical Engineer + +### Does rosgraph help with certification? + +rosgraph is not a safety tool — it's a development and verification +tool that produces artifacts useful in safety cases. See [PROPOSAL.md +§11](PROPOSAL.md#11-safety--certification) for the full mapping. + +Key artifacts: + +| rosgraph artifact | Evidence type | +|---|---| +| `interface.yaml` | Software architecture description | +| `rosgraph lint` SARIF output | Static analysis results | +| `rosgraph monitor` logs | Runtime verification evidence | +| `rosgraph test` results | Interface conformance evidence | +| `rosgraph breaking` output | Change impact analysis | + +### Which safety standards does this map to? + +IEC 61508 (general functional safety), ISO 26262 (automotive), +IEC 62304 (medical), DO-178C (aerospace), ISO 13482 (service robots), +and ISO 21448 / SOTIF. See [PROPOSAL.md +§11.1](PROPOSAL.md#111-relevant-standards) for how rosgraph maps to +each. + +### What about behavioral properties? 
+ +Phase 1-2 covers structural properties: type matches, QoS +compatibility, graph connectivity. This is a necessary precondition +for behavioral safety — you can't reason about message timing if the +messages aren't connected correctly. + +Behavioral analysis (Phase 3+) adds temporal and causal properties, +inspired by HAROS HPL: + +``` +globally: /emergency_stop causes /motor_disable within 100ms +globally: /heartbeat absent_for 500ms causes /safe_stop +``` + +See [PROPOSAL.md §11.4](PROPOSAL.md#114-behavioral-properties-future). + +### Are monitor alert thresholds configurable? + +Yes. The defaults (10s for `NodeMissing`, 30s for `UnexpectedNode`) +are tuned for general robotics. Safety-critical deployments override +them via `rosgraph.toml`: + +```toml +[monitor.alerts] +NodeMissing = { grace_period_ms = 1000, severity = "critical" } +TopicMissing = { grace_period_ms = 500, severity = "critical" } +``` + +See [PROPOSAL.md §11.3](PROPOSAL.md#113-configurable-safety-levels). + +### Are there safety-specific lint rules? + +Planned for Phase 2-3: + +| Rule | Description | +|---|---| +| `SAF001` | Critical subscriber has < N publishers (no redundancy) | +| `SAF002` | Single point of failure in graph topology | +| `SAF003` | Safety-critical node is not lifecycle-managed | +| `TF001` | Declared `frame_id` not published by any node | +| `TF002` | Broken frame chain (no transform path) | + +The analyzer architecture supports adding these without changes. +See [PROPOSAL.md §11.5](PROPOSAL.md#115-safety-relevant-lint-rules-future). + +### What about determinism and real-time guarantees? + +`rosgraph monitor` is an observation tool, not a safety-critical +component. It runs in its own process, does not interfere with the +monitored system, and its failure does not affect the system under +observation. It is not designed to be real-time safe. 
+ +For hard real-time requirements, the monitor's output (Prometheus +metrics, diagnostics topics) can be consumed by a separate real-time +safety monitor. rosgraph provides the graph model; the real-time +enforcement layer is a separate concern. + +### What about audit trails? + +`rosgraph lint` produces SARIF output with timestamps, tool version, +rule versions, and results. This can be stored as CI artifacts for +audit purposes. A dedicated audit log format for `rosgraph monitor` +(continuous verification evidence) is not in Phase 1 but the +structured output (JSON, SARIF) makes it straightforward to add. + +--- + +## 5. MoveIt / nav2 / Popular Module User + +### Does rosgraph work with nav2's plugin system? + +Yes, via the mixin system ([PROPOSAL.md +§3.2](PROPOSAL.md#32-schema-layers)). Plugins that inject interfaces +into a host node are declared as mixins: + +```yaml +# nodes/follow_path/interface.yaml +node: + name: follow_path + package: nav2_controller + +parameters: + controller_plugin: + type: string + default_value: "dwb_core::DWBLocalPlanner" + +mixins: + - ref: dwb_core/dwb_local_planner # brings in max_vel_x, etc. + - ref: nav2_costmap_2d/costmap # brings in costmap params +``` + +The host's effective interface = its own declaration + all mixin +interfaces merged. This gives `rosgraph lint` and `rosgraph monitor` +the complete picture. + +Mixins are Phase 2 (G15). Phase 1 works for nodes without plugins. + +### What happens when I switch plugins (e.g., DWB → MPPI)? + +You update the mixin reference in `interface.yaml`. The effective +interface changes at build time, and `rosgraph generate` produces new +scaffolding. This is a build-time concern — `rosgraph lint` validates +the graph with the new plugin's interface. + +If the plugin is selected at runtime via parameter, this falls under +"dynamic interfaces" (Design Principle 12) — rosgraph declares the +static portion and `rosgraph monitor` flags unexpected interfaces. 
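The "own declaration + all mixin interfaces merged" rule can be sketched in a few lines. This is a minimal illustration, not rosgraph's implementation: the `MIXINS` registry, the `resolution` parameter, and the choice to reject duplicate keys are assumptions made for the example. Only `follow_path`, `controller_plugin`, `max_vel_x`, and the two mixin refs come from the FAQ text.

```python
from copy import deepcopy

# Hypothetical mixin fragments, keyed by the ref used in interface.yaml.
MIXINS = {
    "dwb_core/dwb_local_planner": {
        "parameters": {"max_vel_x": {"type": "double"}},
    },
    "nav2_costmap_2d/costmap": {
        "parameters": {"resolution": {"type": "double"}},  # assumed parameter
    },
}

# The host declaration from the FAQ, expressed as a plain dict.
HOST = {
    "node": {"name": "follow_path", "package": "nav2_controller"},
    "parameters": {
        "controller_plugin": {
            "type": "string",
            "default_value": "dwb_core::DWBLocalPlanner",
        },
    },
    "mixins": [
        {"ref": "dwb_core/dwb_local_planner"},
        {"ref": "nav2_costmap_2d/costmap"},
    ],
}


def effective_interface(host, registry):
    """Merge every referenced mixin fragment into a copy of the host interface."""
    merged = deepcopy(host)
    refs = [mixin["ref"] for mixin in merged.pop("mixins", [])]
    for ref in refs:
        for section, entries in registry[ref].items():
            target = merged.setdefault(section, {})
            for key, value in entries.items():
                if key in target:
                    # Assumed policy: a mixin may not silently override a declaration.
                    raise ValueError(f"{ref} redeclares {section}.{key}")
                target[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    effective = effective_interface(HOST, MIXINS)
    print(sorted(effective["parameters"]))
```

Switching plugins then amounts to changing one `ref` and recomputing the merged interface, which is the build-time step the answer above describes.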
+ +### Does rosgraph validate TF frames? + +Planned for Phase 2-3. `TF001` checks that declared `frame_id` values +are published by some node in the graph. `TF002` checks that frame +chains are connected (no broken transform paths). See [PROPOSAL.md +§11.5](PROPOSAL.md#115-safety-relevant-lint-rules-future). + +TF is the #1 source of silent bugs in ROS 2 navigation and +manipulation. This is high-value but requires the graph model to +include TF publisher information, which depends on `interface.yaml` +having a `frame_id` annotation. + +### What about `generate_parameter_library` compatibility? + +Full compatibility is a non-negotiable design principle ([PROPOSAL.md +§2, DP9](PROPOSAL.md#2-design-principles)). The `parameters:` section +of `interface.yaml` IS the `generate_parameter_library` format. A +standalone gen_param_lib YAML file works as-is when placed in +`interface.yaml`. rosgraph delegates to gen_param_lib at build time. +See [PROPOSAL.md §9.2](PROPOSAL.md#92-tool-assessments). + +### Can rosgraph lint my existing launch files? + +Phase 1 supports YAML launch files (direct parse) and `system.yaml` +(Layer 2 schema). Phase 2 adds Python launch file AST analysis for +standard `launch_ros` patterns — `Node()`, `LaunchConfiguration()`, +`DeclareLaunchArgument()`. + +Limitations: Python launch files that use conditionals, loops, or +dynamically computed node sets cannot be fully statically analyzed. +`system.yaml` is the escape hatch for systems that need full static +analyzability. See [PROPOSAL.md +§3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis). + +### Does this work with Gazebo / Isaac Sim? + +Simulators expose ROS interfaces that look identical to real hardware. +`rosgraph discover` can introspect a simulated system and generate +`interface.yaml`. `rosgraph monitor` can verify that a simulated +system matches the declared graph. `rosgraph lint` doesn't +distinguish between real and simulated — it validates the graph model. 
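Checking a simulated system against the declared graph comes down to a set difference, sketched below. The function name, the input shape (node name mapped to a set of topic names), and the example nodes are assumptions for illustration. The finding names mirror the alert names used elsewhere in this FAQ, and grace periods are left out for brevity.

```python
def diff_graph(declared, observed):
    """Compare two graphs given as {node_name: set_of_topic_names}."""
    findings = []
    for node in sorted(declared.keys() - observed.keys()):
        findings.append(("NodeMissing", node))
    for node in sorted(observed.keys() - declared.keys()):
        findings.append(("UnexpectedNode", node))
    for node in sorted(declared.keys() & observed.keys()):
        for topic in sorted(declared[node] - observed[node]):
            findings.append(("TopicMissing", f"{node}:{topic}"))
        for topic in sorted(observed[node] - declared[node]):
            findings.append(("UnexpectedTopic", f"{node}:{topic}"))
    return findings


if __name__ == "__main__":
    # Declared graph, as it might be assembled from interface.yaml files.
    declared = {
        "/camera_driver": {"/image_raw"},
        "/planner": {"/path"},
    }
    # Observed graph, hard-coded here; in practice it would come from discovery.
    observed = {
        "/camera_driver": {"/image_raw", "/sim_debug"},
        "/sim_bridge": {"/clock"},
    }
    for kind, subject in diff_graph(declared, observed):
        print(f"{kind}: {subject}")
```

The same comparison applies whether the observed side comes from hardware or from a simulator, since both expose the same ROS interfaces.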
+ +### What about message type changes across ROS distros? + +`interface.yaml` references message types by name (e.g., +`geometry_msgs/msg/Twist`). Message type compatibility across distros +is a ROS infrastructure concern, not a rosgraph concern. rosgraph +validates that publishers and subscribers on the same topic agree on +type — it doesn't validate that the type definition itself is +compatible across distros. + +`rosgraph breaking` can detect when a type reference changes between +versions of an `interface.yaml`. + +--- + +## 6. The Skeptic + +### I write good tests. Why do I need another YAML file? + +Tests catch type mismatches and QoS issues at launch time — after you +wait 30 seconds for the stack to start, watch it fail, read the logs, +and figure out which of 40 nodes has the wrong type. Then you fix it, +rebuild, relaunch, and wait again. + +`rosgraph lint` catches the same bugs in under 5 seconds, before +launch, in CI, before anyone else has to debug it. It's the difference +between "tests catch bugs" and "bugs never reach the test phase." + +### What's the overhead? + +Per node: one `interface.yaml` file (~15-30 lines). Most of it is +information you're already specifying in code (topic names, message +types, QoS settings, parameter names) — `interface.yaml` centralizes +it. + +What you get back: +- No pub/sub boilerplate (generated) +- No parameter declaration boilerplate (generated via + `generate_parameter_library`) +- Pre-launch graph validation +- Runtime graph monitoring +- Auto-generated API documentation + +The net line-count change is typically negative for nodes with +parameters. + +### What if rosgraph can't express what I need? + +Escape hatches: +- **`# rosgraph: noqa: TOP001`** — suppress specific lint rules per + line. +- **Per-package ignores** — exclude entire packages from specific + rules via `rosgraph.toml`. +- **Undeclared interfaces** — if your code creates publishers that + aren't in `interface.yaml`, the code still works. 
`rosgraph monitor` + flags them as `UnexpectedTopic` (a warning, not an error). +- **Composition pattern** — generated code holds a `rclcpp::Node` + (has-a), not inherits from it. You always have access to the + underlying node for anything the schema can't express. + +See [PROPOSAL.md §12](PROPOSAL.md#12-scope--limitations) for the full +limitations discussion. + +### Does code generation add runtime overhead? + +The composition pattern (has-a Node) adds one level of indirection +compared to direct inheritance. This is a pointer dereference — single +nanoseconds. The generated pub/sub wrappers are thin forwarding calls. +No virtual dispatch is added that wouldn't already exist in the ROS +client library. + +The parameter validation code (from `generate_parameter_library`) runs +at parameter-set time, not in the hot path. + +### What happens when only my package has an `interface.yaml`? + +You still get: +- **Code generation** — less boilerplate in your node +- **Parameter validation** — runtime type and range checking +- **Self-documentation** — your node's API is machine-readable + +Cross-package value (type mismatch detection, QoS compatibility +checking, contract testing) grows with adoption. `rosgraph discover` +lets you generate specs for neighboring packages from a running +system, bootstrapping the cross-package graph incrementally. + +### This proposal has 51 features. Is this realistic? + +Phase 1 ([PROPOSAL.md §4](PROPOSAL.md#4-phasing)) is the commitment: +~12 features covering core schema, basic code generation, and +highest-value lint and monitor rules. Later phases are contingent on +adoption. + +The tool builds on existing work — cake for code generation, +`generate_parameter_library` for parameters, `graph-monitor` message +definitions for runtime. Phase 1 is stabilizing and unifying existing +pieces, not building from scratch. + +### Won't the spec just drift from reality like NoDL? 
+ +NoDL died because it was a pure description format — no code +generation. Maintaining a spec that doesn't produce anything is +thankless work. + +`interface.yaml` generates code. If you change the spec, the generated +code changes. If you change the code without changing the spec, +`rosgraph monitor` flags the discrepancy at runtime. The two-way +binding (codegen + runtime monitoring) is what prevents the drift +that killed NoDL. + +The honest limitation: business logic is hand-written. If a developer +adds an undeclared publisher inside a callback, `rosgraph lint` won't +catch it at build time. `rosgraph monitor` catches it at runtime as +`UnexpectedTopic`. See [PROPOSAL.md +§12](PROPOSAL.md#12-scope--limitations). + +### When should I NOT use rosgraph? + +- **Quick prototyping** — single throwaway node, not worth the file. +- **Single-node packages** — minimal lint value, though codegen may + still save boilerplate. +- **Highly dynamic interfaces** — nodes that create/destroy publishers + at runtime based on conditions can't be fully declared. + +See [PROPOSAL.md §12, "When Not to Use +rosgraph"](PROPOSAL.md#when-not-to-use-rosgraph). + +--- + +## 7. Package Maintainer / ROS Governance + +### What does rosgraph mean for my package? + +If you maintain a ROS 2 package, `interface.yaml` is a machine-readable +contract for your node's public API — topics, services, actions, +parameters, QoS. It replaces the informal contract currently scattered +across READMEs, launch file comments, and source code. + +For consumers of your package, this means: +- **API discoverability.** `rosgraph docs` auto-generates browsable API + reference from your `interface.yaml`. No more stale READMEs. +- **Breaking change visibility.** `rosgraph breaking` classifies + interface changes as breaking/dangerous/safe, giving downstream users + clear upgrade guidance. See [PROPOSAL.md + §3.9](PROPOSAL.md#39-rosgraph-breaking--breaking-change-detection). 
+- **Contract testing.** Downstream packages can run `rosgraph test` + against your declared interface to verify compatibility. See + [PROPOSAL.md §3.7](PROPOSAL.md#37-rosgraph-test--contract-testing). + +### Do I have to adopt rosgraph to be compatible with it? + +No. Packages without `interface.yaml` are skipped, not errored (Design +Principle 6). Downstream users can run `rosgraph discover` against your +running node to generate a spec for their own use. Your package doesn't +need to ship `interface.yaml` for others to benefit — though shipping +one is much better, since discovered specs require human review and may +miss QoS details. + +### How does this affect my release process? + +`rosgraph breaking` runs in CI comparing the current `interface.yaml` +against the previous release. Breaking changes block the merge unless +explicitly acknowledged. This is opt-in per package via `rosgraph.toml` +and maps to semantic versioning: breaking = major, dangerous = minor, +safe = patch. See [PROPOSAL.md +§3.14](PROPOSAL.md#314-scale--fleet-considerations). + +### What about packages with plugin systems? + +If your package exposes a plugin API (like nav2's controller plugins), +the mixin system (Phase 2, G15) lets plugin authors declare the +interfaces they inject into the host node. The host's effective +interface is the merge of its own declaration plus all mixin fragments. +See [PROPOSAL.md §3.2](PROPOSAL.md#32-schema-layers). + +Until mixins ship in Phase 2, the host node's `interface.yaml` covers +its own direct interfaces. Plugins that add extra topics/parameters +are flagged by `rosgraph monitor` as unexpected — visible but not +validated. + +### What's the adoption path toward `ros_core`? + +Deliberately incremental ([PROPOSAL.md §4, "Adoption +Path"](PROPOSAL.md#adoption-path)): + +1. **`ros-tooling` organization** — institutional backing, CI + infrastructure, release process. graph-monitor already lives here. +2. 
**REP for `interface.yaml` schema** — formalizes the declaration + format as a community standard, independent of the rosgraph tool. +3. **docs.ros.org tutorial integration** — if "write your first node" + uses `interface.yaml`, every new ROS developer learns it from day + one. This is the highest-leverage adoption path. +4. **`ros_core` proposal** — after demonstrated adoption across + multiple distros. + +### Why not extend existing tools instead? + +Each existing tool covers one capability but none covers the full +scope. The gap analysis ([PROPOSAL.md +§9.3](PROPOSAL.md#93-gap-analysis)) shows five major gaps: graph diff, +graph linting, QoS static analysis, behavioral properties, and CI graph +validation. No single existing tool can be extended to fill all five. + +rosgraph builds on existing work where possible: +- `generate_parameter_library` for parameters (used as-is) +- `rosgraph_monitor_msgs` for runtime message definitions (adopted) +- cake's design decisions for code generation (validated) +- HAROS's metamodel for the graph model (adapted) + +### What's the maintenance burden? + +Phase 1 is ~12 features covering core schema, basic codegen, and +highest-value lint/monitor rules. The design minimizes ongoing +maintenance: + +- **Schema versioning** (G14) — `schema_version` field with migration + tooling prevents breaking changes to `interface.yaml` format. +- **IR-based plugin protocol** — code generation plugins are standalone + executables, independently maintained. +- **Analyzer DAG** — lint rules are isolated, independently testable + values (not subclasses). Adding or removing a rule doesn't affect + others. + +The risk factor: this is a new tool, not an extension of something with +existing momentum. It requires sustained contributor commitment. + +### How does this interact with the ROS 2 type system? + +rosgraph references existing `.msg`, `.srv`, and `.action` types — it +doesn't replace them (Design Principle 9). 
`interface.yaml` declares +which types a node uses; `rosidl` still defines the types themselves. + +The graph model ([PROPOSAL.md §3.1](PROPOSAL.md#31-the-graph-model)) +includes a `MessageTypeDB` that resolves type references to their +definitions for compatibility checking. This uses the existing +`rosidl` output — rosgraph doesn't parse `.msg` files directly. + +### What about governance and community standards? + +The REP process is the standard mechanism for formalizing ROS community +standards. A REP for the `interface.yaml` schema would: + +- Define the YAML schema specification independent of the rosgraph tool +- Allow alternative implementations (someone could build a different + tool that consumes the same schema) +- Provide a formal review process for schema changes +- Signal community endorsement + +The REP is Step 2 of the adoption path — after the tool has proven +itself in `ros-tooling` with real users. + +### What's the risk if this doesn't get adopted? + +The worst case: rosgraph becomes another single-maintainer tool in the +ecosystem (like cake and breadcrumb today). The mitigation strategy: + +- **`ros-tooling` hosting** — institutional backing reduces bus factor +- **REP-based schema** — the schema outlives the tool if it becomes a + standard +- **`generate_parameter_library` compatibility** — the parameters + portion works with the most mature tool in the space, regardless of + rosgraph's fate +- **Standalone value** — even without ecosystem adoption, a single team + gets code generation and parameter validation from day one + +--- + +## 8. Educator / University Researcher + +### Can I use rosgraph for teaching ROS 2? + +Yes, and this is one of the highest-leverage adoption paths. The Quick +Start ([PROPOSAL.md §1](PROPOSAL.md#quick-start-what-it-looks-like)) +shows a complete workflow in 3 commands: + +```bash +rosgraph generate . # generates node scaffolding +rosgraph lint . 
# checks for issues +rosgraph monitor # watches the running system +``` + +For teaching, `interface.yaml` forces students to think about their +node's API before writing implementation code — topics, types, QoS, +parameters. This is better pedagogy than the current approach of +copy-pasting publisher boilerplate and tweaking it. + +### Does this lower the barrier for students? + +Significantly. A student writes ~15 lines of YAML declaring what their +node does, runs `rosgraph generate`, and gets a working scaffold with +type-safe publishers, subscribers, and validated parameters. They write +only the business logic. No boilerplate, no silent type mismatches, no +mysterious QoS failures. + +Error messages are designed to be helpful — rule codes, file locations, +clear descriptions of what's wrong and how to fix it. See [PROPOSAL.md +§10.3](PROPOSAL.md#103-static-analysis-architecture). + +### How does rosgraph relate to HAROS? + +HAROS ([PROPOSAL.md §10.6](PROPOSAL.md#106-ros-domain-prior-art-haros)) +was the prior art for graph analysis in ROS — built at the University +of Minho (2016–2021). rosgraph borrows HAROS's metamodel and HPL +property language concepts, but differs fundamentally: + +- **HAROS extracted interfaces from source code.** rosgraph uses + explicit declarations (`interface.yaml`). Declarations are simpler, + more reliable, and enable code generation. +- **HAROS was ROS 1 only.** rosgraph is built for ROS 2 concepts: + QoS, lifecycle, components, actions, DDS discovery. +- **HAROS died because extraction broke.** catkin → ament, rospack → + colcon, XML launch → Python launch. Declaration-based tools don't + break when the build system changes. + +### Can I use rosgraph for research on ROS system verification? + +The graph model ([PROPOSAL.md §3.1](PROPOSAL.md#31-the-graph-model)) +is a structured representation of the ROS computation graph — nodes, +topics, services, actions, parameters, QoS, connections. 
It's +exportable as JSON via the `InterfaceDescriptor` IR ([PROPOSAL.md +§3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)). + +Research opportunities: +- **Formal verification.** The graph model is a natural input for model + checkers. Behavioral properties (Phase 3+, [PROPOSAL.md + §11.4](PROPOSAL.md#114-behavioral-properties-future)) enable temporal + logic specifications. +- **Static analysis.** The analyzer DAG architecture ([PROPOSAL.md + §3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis)) supports custom + analysis passes without modifying core code. +- **Runtime monitoring.** The declared-vs-observed diff ([PROPOSAL.md + §3.6](PROPOSAL.md#36-rosgraph-monitor--runtime-reconciliation)) is a + rich data source for anomaly detection research. +- **ROS ecosystem studies.** Interface coverage, graph topology + patterns, common QoS configurations — all extractable from + `interface.yaml` files across the ecosystem. + +### What about publishing results that use rosgraph? + +The tool is open-source (planned for `ros-tooling` organization). The +SARIF and JSON output formats produce structured, reproducible results +suitable for academic publication. The graph model provides a formal +vocabulary for describing ROS system architectures. + +--- + +## 9. Embedded / Resource-Constrained Developer + +### Does rosgraph add runtime overhead to my nodes? + +The generated code uses a composition pattern (has-a `Node`, not is-a +`Node`). This adds one pointer indirection — single nanoseconds. The +generated pub/sub wrappers are thin forwarding calls. No virtual +dispatch is added beyond what the ROS client library already uses. + +Parameter validation (via `generate_parameter_library`) runs at +parameter-set time, not in the hot path. See [PROPOSAL.md §3.4, +"Design decisions"](PROPOSAL.md#34-rosgraph-generate--code-generation). + +### Does `rosgraph monitor` run on the robot? + +Yes, but it's optional. 
`rosgraph monitor` is a separate process — it +doesn't instrument or modify your nodes. If your platform can't spare +the resources, don't run it. You still get full value from build-time +tools (`rosgraph generate`, `rosgraph lint`). + +Runtime targets for `rosgraph monitor` ([PROPOSAL.md +§3.14](PROPOSAL.md#314-scale--fleet-considerations)): +- Memory: < 50MB resident +- CPU: < 5% of one core at steady-state (5s scrape interval) +- No additional DDS traffic beyond standard discovery + +For very constrained platforms, run `rosgraph monitor` off-board +(e.g., on a companion computer) observing the same DDS domain. + +### Does rosgraph work with micro-ROS? + +micro-ROS nodes communicate via the standard DDS/XRCE-DDS bridge. +`rosgraph discover` and `rosgraph monitor` observe them through the +bridge like any other node. `interface.yaml` declarations work for +micro-ROS nodes — the schema is language-agnostic. + +Code generation for micro-ROS C is not in Phase 1. The IR-based plugin +architecture ([PROPOSAL.md +§3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)) supports adding a +micro-ROS code generation plugin without changes to the core tool. + +### What about real-time constraints? + +`rosgraph monitor` is not real-time safe — it's an observation tool +running in its own process. It does not interfere with the monitored +system, and its failure does not affect the system under observation. + +For hard real-time requirements, the monitor's Prometheus metrics and +diagnostics topics can be consumed by a separate real-time safety +monitor. rosgraph provides the graph model; real-time enforcement is a +separate concern. See [PROPOSAL.md +§11](PROPOSAL.md#11-safety--certification). + +### Does the build toolchain add cross-compilation complexity? + +`rosgraph generate` runs at build time on the host, producing standard +C++ and Python source files. These are compiled by the normal +cross-compilation toolchain (`colcon build --cmake-args +-DCMAKE_TOOLCHAIN_FILE=...`). 
rosgraph itself doesn't need to run on +the target — it's a host-side tool, like `cmake` or `protoc`. diff --git a/docs/MANIFESTO.md b/docs/MANIFESTO.md new file mode 100644 index 0000000..70faf6b --- /dev/null +++ b/docs/MANIFESTO.md @@ -0,0 +1,15 @@ +# ROSGraph — Direction + +## Why + +Robotics engineers spend too much time on ROS plumbing — writing boilerplate, debugging invisible wiring, and keeping launch files in sync with code — instead of building their application. + +## What + +A declarative, observable ROS graph. Engineers declare what their system should be; tooling generates the code and verifies the running system matches the spec. + +## How + +1. **Language** — a formal spec to describe node interfaces and system graphs. +2. **Tooling** — translate declarations into working code. +3. **Verification** — compare spec against reality, both at runtime and statically before launch. diff --git a/docs/ROSGRAPH.md b/docs/ROSGRAPH.md new file mode 100644 index 0000000..8469ec5 --- /dev/null +++ b/docs/ROSGRAPH.md @@ -0,0 +1,1724 @@ +# rosgraph — Technical Proposal + +> **Status:** Proposal +> **Date:** 2026-02-22 +> **Parent:** [MANIFESTO.md](MANIFESTO.md) (direction) + +--- + +## Table of Contents + +1. [Executive Summary](#1-executive-summary) +2. [Design Principles](#2-design-principles) +3. [Architecture](#3-architecture) +4. [Phasing](#4-phasing) +5. [Language Choice](#5-language-choice) +6. [Feature List](#6-feature-list) +7. [Lint Rule Codes](#7-lint-rule-codes) +8. [Monitor Alert Rules](#8-monitor-alert-rules) +9. [Existing ROS 2 Ecosystem](#9-existing-ros-2-ecosystem) +10. [Prior Art](#10-prior-art) +11. [Safety & Certification](#11-safety--certification) +12. [Scope & Limitations](#12-scope--limitations) +13. [Resolved Questions](#13-resolved-questions) + +--- + +## 1. 
Executive Summary + +ROS 2 has no production-ready tool for verifying that a running system +matches its declared architecture, no standard schema for declaring node +interfaces, and no unified CLI for graph analysis. The ecosystem is +fragmented across single-purpose tools with overlapping scope and bus +factors of one. + +| Category | Capability | Current tool | Status | +|---|---|---|---| +| **Schema** | Node interface declaration | cake / nodl / gen_param_lib | cake early; nodl dead; gpl params-only | +| **Codegen** | Static graph from launch files | breadcrumb + clingwrap | Early-stage, solo dev | +| **Runtime** | Runtime graph monitoring | graph-monitor | Mid-stage, institutional | +| **Runtime** | Runtime tracing | ros2_tracing | Mature, production | +| **Runtime** | Latency analysis | CARET | Mature, Tier IV | +| **Runtime** | Graph visualisation | Foxglove, Dear RosNodeViewer | Mature but live-only | +| **Runtime** | **Graph diff (expected vs. actual)** | **Nothing** | **Major gap** | +| **Static** | **Graph linting (pre-launch)** | **Nothing** | **Major gap** | +| **Static** | **QoS static analysis** | breadcrumb (partial) | Early-stage | +| **Static** | **CI graph validation** | **Nothing** | **Major gap** | +| **Docs** | **Node API documentation** | **Nothing** (hand-written only) | **Major gap** | +| — | **Behavioural properties** | **Nothing** (HPL was ROS 1) | **Major gap** | + +### The Problem, Concretely + +Today in ROS 2: + +- Node A publishes `/cmd_vel` as `Twist`. Node B subscribes to + `/cmd_vel` as `String`. You discover this at runtime — or don't, + because the subscriber silently receives nothing. +- A publisher uses `BEST_EFFORT` QoS, a subscriber uses `RELIABLE`. + DDS refuses the connection. A warning is logged but easy to miss in + a busy console. The subscriber just never gets messages. +- A node crashes mid-deployment. The rest of the system keeps running. + Nobody knows until a customer reports a failure 20 minutes later. 
+- You rename a parameter. Three launch files reference the old name. + `colcon build` succeeds. The system launches. The parameter silently + takes its default value. + +These are real, common bugs in production ROS 2 systems. rosgraph +catches all four — the first two at build time (`rosgraph lint`), the +third at runtime (`rosgraph monitor`), the fourth at lint time. + +This document proposes **`rosgraph`** — a single tool with subcommands +covering the four goals of the ROSGraph Working Group: + +``` +rosgraph +├── rosgraph generate (Goal 2: spec → code) +├── rosgraph lint (Goal 4: static graph analysis) +├── rosgraph monitor (Goal 3: runtime reconciliation) +├── rosgraph test (Goal 3: contract testing) +├── rosgraph docs (documentation generation) +├── rosgraph breaking (breaking change detection) +└── rosgraph discover (runtime → spec, brownfield adoption) +``` + +Three key insights drive the design: + +1. **The ROS computation graph is not source code — it is a typed, + directed graph with QoS-annotated edges.** Analysis tools should + operate on a graph model, not on ASTs. Source code parsing is a + loader that feeds the model, not the analysis target. + +2. **Goals 3–4 are schema conformance problems** ("does reality match + the spec?"), not traditional program analysis. Once you have a + machine-readable spec (`interface.yaml`), verification falls out + naturally — the same pattern as `buf lint`, Pact contract tests, + and Kubernetes reconciliation. + +3. **A declaration without code generation is a non-starter.** NoDL + proved this. The schema must generate code, documentation, and + validation to stay in sync with reality. `interface.yaml` is + simultaneously the source for code generation, the lint target for + static analysis, the contract for runtime verification, and the + reference for documentation. 
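The second bug in the list above (a `BEST_EFFORT` publisher paired with a `RELIABLE` subscriber) is the kind of check that becomes a few lines once interfaces are declared data. A minimal sketch, assuming a hypothetical `Endpoint` record and covering only reliability and durability; none of these names are part of the proposed schema or of any rosgraph API:

```python
from dataclasses import dataclass

# DDS matches endpoints on a request-vs-offered basis: the publisher's
# offered level must be at least as strong as the subscriber's request.
RELIABILITY = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1}


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical record for one declared publisher or subscriber."""
    node: str
    topic: str
    reliability: str = "RELIABLE"
    durability: str = "VOLATILE"


def qos_problems(pub: Endpoint, sub: Endpoint) -> list[str]:
    """Return the reasons (if any) why pub and sub would not connect."""
    problems = []
    if RELIABILITY[pub.reliability] < RELIABILITY[sub.reliability]:
        problems.append(
            f"{pub.topic}: {pub.node} offers {pub.reliability}, "
            f"{sub.node} requests {sub.reliability}"
        )
    if DURABILITY[pub.durability] < DURABILITY[sub.durability]:
        problems.append(
            f"{pub.topic}: {pub.node} offers {pub.durability}, "
            f"{sub.node} requests {sub.durability}"
        )
    return problems


# Example: the mismatch described in the problem list.
driver = Endpoint("driver", "/scan", reliability="BEST_EFFORT")
mapper = Endpoint("mapper", "/scan", reliability="RELIABLE")
for line in qos_problems(driver, mapper):
    print(line)
```

The reverse pairing (a `RELIABLE` publisher with a `BEST_EFFORT` subscriber) yields no problems, which is why the check has to be directional rather than a simple equality test.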
+ +### Quick Start (What It Looks Like) + +A minimal `interface.yaml`: + +```yaml +schema_version: "1.0" +node: + name: talker + package: demo_pkg + +publishers: + - topic: ~/chatter + type: std_msgs/msg/String + qos: { reliability: RELIABLE, depth: 10 } + +parameters: + publish_rate: + type: double + default_value: 1.0 + description: "Publishing rate in Hz" + validation: + bounds<>: [0.1, 100.0] +``` + +What you get: + +```bash +rosgraph generate . # → C++ header, Python module, parameter validation +rosgraph lint . # → "no issues" or "TOP001: type mismatch on /cmd_vel" +rosgraph monitor # → live diff: declared graph vs. running system +``` + +The generated code gives you a typed context struct with publishers, +subscribers, and validated parameters — no boilerplate. You write +business logic; rosgraph generates the wiring. + +--- + +## 2. Design Principles + +### Core Philosophy + +1. **The graph is the program.** Analysis operates on the typed, + QoS-annotated computation graph — not source code ASTs. Source + parsing is a loader that feeds the model, not the analysis target. + +2. **Declare first, verify always.** `interface.yaml` is the single + source of truth. Code generation, static analysis, and runtime + monitoring all verify against the declaration. + +3. **One schema, many consumers.** The same `interface.yaml` drives + code generation, documentation, linting, monitoring, contract + testing, and security policy generation. + +4. **One tool, not ten.** `rosgraph` with subcommands replaces + fragmented single-purpose tools. One CLI, one config, one output + format. + +### Developer Experience + +5. **Zero-config value, progressive disclosure.** Given + `interface.yaml` files, the default rules catch real bugs (type + mismatches, QoS incompatibilities) with no additional configuration. + A minimal 10-line `interface.yaml` produces a working node; + lifecycle, mixins, and parameterized QoS are opt-in. + +6. 
**Brownfield first, gradual adoption.** `rosgraph discover` + generates specs from running nodes. `--add-noqa` suppresses existing + issues. Packages without `interface.yaml` are skipped, not errored. + +7. **Speed is a feature.** An architectural property, not an + afterthought. Target: lint a 100-package workspace in under 5 + seconds. + +8. **Backward compatibility is non-negotiable.** Existing + `generate_parameter_library` YAML works as-is inside `parameters:`. + Existing `.msg`/`.srv`/`.action` files are referenced, not replaced. + +### Verification & CI + +9. **CI-first.** SARIF output, GitHub annotations, exit codes, and + differential analysis are primary design targets. + +10. **Validation at every stage.** Author time: JSON Schema. Build + time: structural + semantic. Launch time: declared vs. configured. + Runtime: declared vs. observed. + +11. **Correctness rules are errors; style rules are warnings.** Type + mismatches and QoS incompatibilities fail CI. Naming conventions + warn. + +### Scope + +12. **Declared interfaces are the primary target.** The schema + describes the *intended* interface — the same boundary drawn by + Protobuf, AsyncAPI, Smithy, and OpenAPI. For partially dynamic + nodes (e.g., nav2 plugin hosts), worst-case bounds can be declared + with `optional: true`; `rosgraph monitor` validates these at + runtime and flags truly undeclared interfaces as `UnexpectedTopic`. + +13. **Structural first, behavioural later.** Phase 1–2: type matches, + QoS compatibility, graph connectivity — the foundation that + safety-critical systems (ISO 26262, IEC 61508) require as evidence. + Behavioural properties (temporal/causal, e.g. "/e_stop causes + /motor_disable within 100ms") are Phase 3+, drawing on prior art + from HAROS HPL and runtime verification tools like STL/MTL + monitors. The structural graph model is designed to extend to + behavioural annotations without schema redesign. + +--- + +## 3. Architecture + +One tool. One graph model. 
Four capabilities. + +``` + ┌──────────────────────┐ + │ Graph Model │ + │ (shared library) │ + │ │ + │ Nodes, Topics, │ + │ Services, Actions, │ + │ Parameters, QoS, │ + │ Connections │ + └───────┬──────┬────────┘ + │ │ + ┌────────────┘ └───────────────┐ + │ │ + ┌─────────▼──────────┐ ┌──────────▼───────────┐ + │ Build-time tools │ │ Runtime tools │ + │ │ │ │ + │ rosgraph generate │ │ rosgraph monitor │ + │ rosgraph lint │ │ rosgraph test │ + │ rosgraph docs │ │ rosgraph discover │ + │ rosgraph breaking │ │ │ + └────────────────────┘ └───────────────────────┘ +``` + +### 3.1 The Graph Model + +A language-agnostic representation of the ROS computation graph. Every +loader produces it; every analyzer consumes it. + +``` +ComputationGraph +├── nodes: [NodeInterface] +│ ├── name, namespace, package, executable +│ ├── publishers: [{topic, msg_type, qos}] +│ ├── subscribers: [{topic, msg_type, qos}] +│ ├── services: [{name, srv_type}] +│ ├── clients: [{name, srv_type}] +│ ├── action_servers: [{name, action_type}] +│ ├── action_clients: [{name, action_type}] +│ ├── parameters: [{name, type, default, validators}] +│ └── lifecycle_state: str | None +├── topics: [TopicInfo] +│ ├── name, msg_type +│ ├── publishers: [NodeRef] +│ ├── subscribers: [NodeRef] +│ └── qos_profiles: [QoSProfile] +├── services: [ServiceInfo] +├── actions: [ActionInfo] +└── connections: [Connection] + ├── source: NodeRef + ├── target: NodeRef + ├── channel: TopicRef | ServiceRef | ActionRef + └── qos_compatible: bool +``` + +### 3.2 Schema Layers + +Three schema levels, each building on the previous: + +**Layer 1 — Node Interface Schema** (per-node declaration) + +```yaml +# interface.yaml +schema_version: "1.0" + +node: + name: lidar_processor + package: perception_pkg + lifecycle: managed # managed | unmanaged (default) + +parameters: + # Exact generate_parameter_library format (backward-compatible) + voxel_size: + type: double + default_value: 0.05 + description: "Voxel grid filter leaf size 
(meters)" + validation: + bounds<>: [0.01, 1.0] + read_only: false + robot_frame: + type: string + default_value: "base_link" + read_only: true + +publishers: + - topic: ~/filtered_points + type: sensor_msgs/msg/PointCloud2 + qos: + history: 5 + reliability: RELIABLE + durability: TRANSIENT_LOCAL + description: "Filtered and downsampled point cloud" + +subscribers: + - topic: ~/raw_points + type: sensor_msgs/msg/PointCloud2 + qos: + history: 1 + reliability: BEST_EFFORT + description: "Raw point cloud from lidar driver" + +services: + - name: ~/set_filter_params + type: perception_msgs/srv/SetFilterParams + +actions: + - name: ~/process_scan + type: perception_msgs/action/ProcessScan + +timers: + - name: process_timer + period_ms: 100 + description: "Main processing loop" +``` + +**Layer 2 — Composed System Schema** (launch-level declaration) + +```yaml +# system.yaml +schema_version: "1.0" +name: perception_pipeline + +nodes: + - ref: perception_pkg/lidar_processor + namespace: /robot1 + parameters: + voxel_size: 0.1 + remappings: + ~/raw_points: /lidar/points + + - ref: perception_pkg/object_detector + namespace: /robot1 + +connections: # Explicit wiring (optional, for validation) + - from: lidar_processor/~/filtered_points + to: object_detector/~/input_cloud +``` + +**Mixins — Composable Interface Fragments** (G15, Phase 2) + +Plugins that inject interfaces into a host node (e.g., nav2 controller +plugins adding parameters and topics via the node handle) are declared +via `mixins:`. Each mixin is itself an `interface.yaml` fragment +declaring the topics, parameters, and services it adds. The host node's +effective interface is the merge of its own declaration plus all mixins. + +```yaml +# nodes/follow_path/interface.yaml +node: + name: follow_path + package: nav2_controller + +parameters: + controller_plugin: + type: string + default_value: "dwb_core::DWBLocalPlanner" + +mixins: + - ref: dwb_core/dwb_local_planner # brings in max_vel_x, min_vel_y, etc. 
+ - ref: nav2_costmap_2d/costmap # brings in costmap params + topics +``` + +This pattern (borrowed from Smithy's mixin concept) gives `rosgraph +lint` and `rosgraph monitor` the complete interface picture without +requiring the host node to redeclare everything its plugins add. +Requires the `$ref` / fragment system (G15) as a prerequisite. + +**Layer 3 — Observation Schema** (runtime-observed state) + +```yaml +# observed.yaml (auto-generated from running system) +node: + name: lidar_processor + package: perception_pkg + pid: 12345 + state: active # lifecycle state if managed + +publishers: + - topic: /robot1/lidar_processor/filtered_points + type: sensor_msgs/msg/PointCloud2 + qos: + reliability: RELIABLE + durability: TRANSIENT_LOCAL + depth: 5 + stats: + message_count: 14523 + frequency_hz: 9.98 + subscribers_matched: 2 + +# ... subscribers, services, actions, parameters with actual values +``` + +### 3.3 The InterfaceDescriptor (IR) + +The parsed, validated, fully-resolved representation of a node's +interface. Serializable as JSON for plugin communication: + +```json +{ + "schema_version": "1.0", + "node": { + "name": "lidar_processor", + "package": "perception_pkg", + "lifecycle": "managed" + }, + "parameters": [ + { + "name": "voxel_size", + "type": "double", + "default_value": 0.05, + "description": "Voxel grid filter leaf size (meters)", + "validation": { "bounds": [0.01, 1.0] }, + "read_only": false + } + ], + "publishers": [ + { + "topic": "~/filtered_points", + "resolved_topic": "/robot1/lidar_processor/filtered_points", + "message_type": "sensor_msgs/msg/PointCloud2", + "qos": { "history": 5, "reliability": "RELIABLE", "durability": "TRANSIENT_LOCAL" }, + "description": "Filtered and downsampled point cloud" + } + ] +} +``` + +Plugins receive this IR via stdin (or file path) and produce generated +files. + +### 3.4 `rosgraph generate` — Code Generation + +Translates `interface.yaml` into working node implementations. 
+ +``` +┌─────────────────────────────────────────┐ +│ interface.yaml (per node) │ +└────────────────┬────────────────────────┘ + │ +┌────────────────▼────────────────────────┐ +│ Parser / Validator │ +│ 1. YAML parse │ +│ 2. JSON Schema validation (structural) │ +│ 3. Semantic validation (type refs, QoS)│ +│ 4. Produce InterfaceDescriptor (IR) │ +└────────────────┬────────────────────────┘ + │ + ┌──────────┼──────────────────┐ + │ │ │ +┌─────▼─────┐ ┌─▼──────────┐ ┌────▼──────┐ +│ C++ Plugin│ │Python Plugin│ │Docs Plugin│ +│ │ │ │ │ │ +│ - header │ │ - module │ │ - API ref │ +│ - reg.cpp │ │ - params │ │ - graph │ +│ - params │ │ - __init__ │ │ fragment│ +└───────────┘ └─────────────┘ └───────────┘ +``` + +**Build integration:** + +```cmake +cmake_minimum_required(VERSION 3.22) +project(perception_pkg) +find_package(rosgraph REQUIRED) +rosgraph_auto_package() +``` + +Under the hood, `rosgraph_auto_package()`: +1. Scans `nodes/` for subdirectories with `interface.yaml` +2. Validates each `interface.yaml` (structural + semantic) +3. Invokes C++ plugin → header, registration, params YAML +4. Delegates to `generate_parameter_library()` for parameters +5. Compiles and links +6. Installs interface YAMLs to `share//interfaces/` + +**Design decisions:** +- **Composition over inheritance.** Generated code holds a + `rclcpp::Node` (has-a), not inherits from it. Context struct is a + flat aggregation of generated components plus user state. +- **`generate_parameter_library` as backend.** Uses the existing, + widely-adopted parameter library rather than reimplementing. +- **Convention-over-configuration.** Directory layout (`nodes/`, + `interfaces/`, `launch/`, `config/`) determines behavior. + +### 3.5 `rosgraph lint` — Static Analysis + +Pre-launch verification of the ROS graph. + +``` +┌────────────────────────────────────┐ +│ Loaders │ +│ ┌───────────┐ ┌───────────────┐ │ +│ │interface. 
│ │ launch files │ │ +│ │yaml parser │ │ (clingwrap/ │ │ +│ │ │ │ native) │ │ +│ └─────┬─────┘ └──────┬────────┘ │ +│ └───────┬───────┘ │ +│ ▼ │ +│ ┌──────────────────────────┐ │ +│ │ Graph Model │ │ +│ └────────────┬─────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────┐ │ +│ │ Analyzer DAG │ │ +│ │ (parallel execution) │ │ +│ │ │ │ +│ │ [topic_resolver] │ │ +│ │ ↓ │ │ +│ │ [type_mismatch_checker] │ │ +│ │ [qos_compat_checker] │ │ +│ │ [naming_convention] │ │ +│ │ [disconnected_subgraph] │ │ +│ │ [unused_node] │ │ +│ │ [launch_linter] │ │ +│ │ ... │ │ +│ └────────────┬─────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────┐ │ +│ │ Post-Processing │ │ +│ │ - suppression filter │ │ +│ │ - severity assignment │ │ +│ │ - deduplication │ │ +│ │ - differential (new │ │ +│ │ issues only for CI) │ │ +│ └────────────┬─────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────┐ │ +│ │ Output Formatters │ │ +│ │ text, JSON, SARIF, │ │ +│ │ GitHub, JUnit │ │ +│ └──────────────────────────┘ │ +└────────────────────────────────────┘ +``` + +**Analyzer definition pattern (from Go analysis framework):** + +```python +# Each analyzer is a value, not a subclass +topic_resolver = GraphAnalyzer( + name="topic_resolver", + doc="Resolves topic names to their message types across the graph", + requires=[], + result_type=TopicTypeMap, + run=resolve_topics, +) + +type_mismatch = GraphAnalyzer( + name="type_mismatch", + doc="Checks that all pub/sub on a topic agree on message type", + requires=[topic_resolver], + result_type=None, + run=check_type_mismatches, +) +``` + +See [§7 Lint Rule Codes](#7-lint-rule-codes) for the full rule system. 
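One way the analyzer-as-value pattern above could be executed is sketched below. `GraphAnalyzer` here is a simplified, hypothetical stand-in: dependencies are referenced by name, the graph is a plain list of tuples, and analyzers run sequentially in dependency order using the standard-library `graphlib` rather than in parallel. It illustrates the idea and is not the proposal's implementation.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass(frozen=True)
class GraphAnalyzer:
    name: str
    run: Callable[[list, dict], Any]  # (graph, results of required analyzers)
    requires: tuple = ()              # names of analyzers this one depends on


def resolve_topics(graph, deps):
    """Map each topic to the set of message types declared on it."""
    types = {}
    for _node, _role, topic, msg_type in graph:
        types.setdefault(topic, set()).add(msg_type)
    return types


def check_type_mismatches(graph, deps):
    """Emit one TOP001 diagnostic per topic that has more than one type."""
    return [
        f"TOP001: type mismatch on {topic}: {sorted(types)}"
        for topic, types in sorted(deps["topic_resolver"].items())
        if len(types) > 1
    ]


def run_analyzers(analyzers, graph):
    """Run every analyzer after the analyzers it requires."""
    by_name = {a.name: a for a in analyzers}
    order = TopologicalSorter(
        {a.name: set(a.requires) for a in analyzers}
    ).static_order()
    results = {}
    for name in order:
        analyzer = by_name[name]
        results[name] = analyzer.run(graph, {r: results[r] for r in analyzer.requires})
    return results


# Registration order does not matter; the sorter puts the resolver first.
ANALYZERS = [
    GraphAnalyzer("type_mismatch", check_type_mismatches, requires=("topic_resolver",)),
    GraphAnalyzer("topic_resolver", resolve_topics),
]
GRAPH = [  # (node, role, topic, message type)
    ("node_a", "pub", "/cmd_vel", "geometry_msgs/msg/Twist"),
    ("node_b", "sub", "/cmd_vel", "std_msgs/msg/String"),
    ("talker", "pub", "/chatter", "std_msgs/msg/String"),
]
diagnostics = run_analyzers(ANALYZERS, GRAPH)["type_mismatch"]
```

Because each analyzer only sees the results it declared in `requires`, independent analyzers share no hidden state, which is the property that would make parallel execution safe.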
+ +**Launch file loading strategy:** + +Three loader paths, not mutually exclusive, phased by tractability: + +| Loader | Launch format | Extraction method | Phase | Limitations | +|---|---|---|---|---| +| YAML launch | YAML launch files | Direct parse | 1 | Limited expressiveness | +| `system.yaml` | Layer 2 schema | Direct parse | 1 | Requires manual authoring | +| Python launch AST | Standard `launch_ros` | AST pattern matching | 2 | Cannot handle dynamic logic (conditionals, loops) | + +- **YAML launch files** are statically parseable — `rosgraph lint` can + extract node declarations, remappings, and parameter overrides + directly. +- **Python launch files** are imperative and Turing-complete, but most + are declarative-in-spirit. AST-level pattern matching for common + patterns (`Node()`, `LaunchConfiguration()`, + `DeclareLaunchArgument()`) captures ~80% of real launch files without + execution. +- **Layer 2 `system.yaml`** (§3.2) sidesteps the problem entirely — + a static YAML file declaring the intended system composition. Launch + files still run the system, but `system.yaml` is the lint/monitor + source of truth for graph analysis. + +The lint diagram's "launch files" loader encompasses all three paths. + +### 3.6 `rosgraph monitor` — Runtime Reconciliation + +Kubernetes-style reconciliation loop comparing declared vs. observed +graph state. 
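The declared-vs-observed comparison can be sketched with set operations. This is a minimal illustration, assuming a hypothetical flattened view in which each graph maps a node name to the set of `(topic, type)` pairs it publishes. The result keys are illustrative labels, apart from the "unexpected topic" notion, which mirrors the `UnexpectedTopic` flag mentioned in §2.

```python
def compute_graph_diff(declared: dict, observed: dict) -> dict:
    """Compare two {node: {(topic, type), ...}} mappings.

    The function is pure: it looks only at the two full snapshots, so
    calling it repeatedly on unchanged state gives the same answer.
    """
    missing_nodes = sorted(declared.keys() - observed.keys())
    extra_nodes = sorted(observed.keys() - declared.keys())

    missing_topics, unexpected_topics = [], []
    for node in sorted(declared.keys() & observed.keys()):
        # Declared but not seen on the wire.
        missing_topics += [(node, *t) for t in sorted(declared[node] - observed[node])]
        # Seen on the wire but never declared.
        unexpected_topics += [(node, *t) for t in sorted(observed[node] - declared[node])]

    return {
        "missing_nodes": missing_nodes,
        "extra_nodes": extra_nodes,
        "missing_topics": missing_topics,
        "unexpected_topics": unexpected_topics,
    }


declared = {
    "lidar_processor": {("/robot1/filtered_points", "sensor_msgs/msg/PointCloud2")},
    "object_detector": {("/robot1/objects", "perception_msgs/msg/Objects")},
}
observed = {
    "lidar_processor": {
        ("/robot1/filtered_points", "sensor_msgs/msg/PointCloud2"),
        ("/robot1/debug", "std_msgs/msg/String"),
    },
}
diff = compute_graph_diff(declared, observed)
```

Working from whole snapshots rather than from change events is what makes the comparison level-triggered: a missed event cannot leave the diff permanently stale.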
+ +``` +┌─────────────────────────────────────────────────┐ +│ rosgraph monitor │ +│ │ +│ ┌───────────────┐ ┌──────────────────────┐ │ +│ │ Declared State │ │ Observed State │ │ +│ │ (from YAML / │ │ (from DDS │ │ +│ │ interface │ │ discovery) │ │ +│ │ files) │ │ │ │ +│ └───────┬───────┘ └──────────┬───────────┘ │ +│ │ │ │ +│ └──────────┬─────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────────────┐ │ +│ │ Reconciliation Engine │ │ +│ │ │ │ +│ │ Level-triggered (not edge) │ │ +│ │ Idempotent │ │ +│ │ Requeue with backoff │ │ +│ └──────────────┬───────────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────────────┐ │ +│ │ Diff Computation │ │ +│ │ │ │ +│ │ - Missing/extra nodes │ │ +│ │ - Missing/extra topics │ │ +│ │ - QoS mismatches │ │ +│ │ - Type mismatches │ │ +│ │ - Parameter drift │ │ +│ └──────────────┬───────────────────┘ │ +│ ▼ │ +│ ┌──────────────────────────────────┐ │ +│ │ Exporters │ │ +│ │ │ │ +│ │ - ROS topics (graph_diff msg) │ │ +│ │ - Prometheus /metrics endpoint │ │ +│ │ - Structured log output │ │ +│ │ - Alerting (via diagnostics) │ │ +│ └──────────────────────────────────┘ │ +└─────────────────────────────────────────────────┘ +``` + +**Reconciliation loop:** + +```python +while running: + declared = load_declared_graph(interface_files, launch_files) + observed = scrape_live_graph(dds_discovery) + + diff = compute_graph_diff(declared, observed) + + if diff.has_issues(): + publish_diff(diff) # ROS topic: /rosgraph/diff + update_metrics(diff) # Prometheus: rosgraph_missing_nodes, etc. + emit_diagnostics(diff) # /diagnostics for standard tooling + + publish_status(observed) # ROS topic: /rosgraph/status + + # Adaptive interval: faster when drifting, slower when stable + if diff.has_critical(): + sleep(1s) + else: + sleep(5s) +``` + +See [§8 Monitor Alert Rules](#8-monitor-alert-rules) for the alert +system. + +**Relationship to graph-monitor:** `rosgraph monitor` is a new +implementation, not an extension of the existing graph-monitor package. 
+graph-monitor's value is its `rmw_stats_shim` and +`rosgraph_monitor_msgs` message definitions — these are reusable +regardless of architecture. However, graph-monitor lacks the +reconciliation engine (declared vs. observed diff) that is the core of +`rosgraph monitor`, and retrofitting it would constrain the design. + +The integration path: adopt or align with graph-monitor's message +definitions (`rosgraph_monitor_msgs`), reimplement the graph scraping +and reconciliation, and offer to upstream the reconciliation capability +back to graph-monitor if its maintainers are interested. + +### 3.7 `rosgraph test` — Contract Testing + +Schema-driven verification of running nodes against their declarations. + +Three testing modes (modelled on Schemathesis, Dredd, and Pact): + +**Interface conformance** (Dredd model): Run a node, then +systematically verify its actual interface matches its +`interface.yaml`. Check every declared publisher is active, call every +declared service, verify every parameter exists with the declared type +and default. + +**Fuzz testing** (Schemathesis model): Auto-generate messages matching +declared subscriber types, publish them, verify the node produces +outputs on declared publisher topics with correct types. + +**Cross-node contract testing** (Pact model): Node A's +`interface.yaml` declares it subscribes to `/cmd_vel` (Twist). Node +B's `interface.yaml` declares it publishes `/cmd_vel` (Twist). The +contract test verifies they agree on type and QoS compatibility. + +### 3.8 `rosgraph docs` — Documentation Generation + +Auto-generated "Swagger UI for ROS nodes" — browsable API reference +docs from `interface.yaml`. Covers topics, services, actions, +parameters, QoS settings, and message type definitions. + +Output formats: Markdown (for GitHub Pages / docs.ros.org), HTML +(standalone), JSON (for embedding in other tools). 
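As a rough illustration of the Markdown output, the sketch below renders a small API reference from a dictionary shaped like the `InterfaceDescriptor` JSON in §3.3. The function name and the table layout are assumptions; the proposal does not specify what the generated pages look like.

```python
def render_markdown(ir: dict) -> str:
    """Render a minimal Markdown reference page for one node."""
    node = ir["node"]
    lines = [f"# {node['package']}/{node['name']}", ""]

    if ir.get("publishers"):
        lines += ["## Publishers", "", "| Topic | Type | Description |", "|---|---|---|"]
        for pub in ir["publishers"]:
            lines.append(
                f"| `{pub['topic']}` | `{pub['message_type']}` | {pub.get('description', '')} |"
            )
        lines.append("")

    if ir.get("parameters"):
        lines += ["## Parameters", "", "| Name | Type | Default |", "|---|---|---|"]
        for param in ir["parameters"]:
            lines.append(f"| `{param['name']}` | {param['type']} | {param['default_value']} |")
        lines.append("")

    return "\n".join(lines)


# A trimmed copy of the IR example from §3.3.
ir = {
    "node": {"name": "lidar_processor", "package": "perception_pkg"},
    "parameters": [{"name": "voxel_size", "type": "double", "default_value": 0.05}],
    "publishers": [
        {
            "topic": "~/filtered_points",
            "message_type": "sensor_msgs/msg/PointCloud2",
            "description": "Filtered and downsampled point cloud",
        }
    ],
}
page = render_markdown(ir)
```

Rendering from the IR rather than from the raw YAML means a docs generator would see the same validated, resolved data as the code generators.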
+ +### 3.9 `rosgraph breaking` — Breaking Change Detection + +Compares two versions of `interface.yaml` and classifies changes: + +| Classification | Examples | +|---|---| +| **Breaking** | Removed topic, changed message type, removed parameter, incompatible QoS change | +| **Dangerous** | Changed QoS (may affect connectivity), narrowed parameter range | +| **Safe** | Added optional parameter, added new publisher, widened parameter range | + +Modelled on `buf breaking` and `graphql-inspector`. + +### 3.10 `rosgraph discover` — Runtime-to-Spec Generation + +Introspects a running node via DDS discovery and generates an +`interface.yaml` from the observed interface. The "slice of cake" +brownfield adoption path. + +```bash +# Generate interface.yaml from a running node +rosgraph discover /lidar_processor -o nodes/lidar_processor/interface.yaml +``` + +Modelled on Terraform's `import` command. + +### 3.11 Configuration + +**`rosgraph.toml`** — single configuration file for all subcommands: + +```toml +[lint] +select = ["TOP", "SRV", "QOS", "GRF"] # enable these rule families +ignore = ["NME001"] # except this specific rule + +[lint.per-package-ignores] +"generated_*" = ["ALL"] # skip generated packages +"*_test" = ["GRF002"] # allow unused nodes in tests + +[generate] +plugins = ["cpp", "python"] +out_dir = "generated" + +[output] +format = "text" # text | json | sarif | github + +[ci] +new-only = true # only new issues (differential) +base-branch = "main" +``` + +### 3.12 Multi-Workspace Analysis + +ROS 2 workspaces overlay each other (e.g., `ros_base` underlay + your +packages + a vendor overlay). When `rosgraph lint` analyzes your +workspace, it needs interface information from packages in the underlay. + +The solution follows the Go analysis framework's per-package fact +caching pattern: installed `interface.yaml` files in +`share//interfaces/` (placed there by `rosgraph_auto_package()` +at install time) serve as cached analysis artifacts. 
`rosgraph lint` +reads these from the underlay without re-analyzing underlay packages, +only analyzing packages in the current workspace. + +This is a Phase 2 concern. Phase 1 assumes a single workspace. + +### 3.13 AI & Tooling Integration + +`interface.yaml` and the `InterfaceDescriptor` IR (§3.3) are +machine-readable contracts describing a node's complete API. This +makes them natural integration points for AI-assisted development +tools and IDE infrastructure. + +**AI as IR consumer.** The JSON-serialized `InterfaceDescriptor` +contains everything an LLM needs to understand a node's interface: +topics, types, QoS, parameters, lifecycle state. An AI agent can read +this to generate implementation code, write tests, suggest fixes, or +answer questions about the system — without parsing source code. + +**MCP server.** A Model Context Protocol server exposing graph state, +lint results, and interface schemas enables AI coding tools (Claude +Code, Cursor, Copilot) to query the ROS graph as structured context. +"What topics does the perception pipeline publish?" answered from +the graph model, not from grep. + +**AI-assisted discovery.** `rosgraph discover` (§3.10) generates +`interface.yaml` from a running system. The raw output from DDS +discovery is complete but lacks descriptions, rationale, and grouping. +An LLM can refine the generated spec — inferring descriptions from +topic names and message types, suggesting QoS profiles based on +message patterns, and grouping related interfaces. + +**Language Server Protocol (LSP).** An LSP server for `interface.yaml` +enables IDE features beyond JSON Schema validation: hover for message +type definitions, go-to-definition for `$ref` targets, inline +diagnostics from `rosgraph lint`, and cross-file rename support. This +benefits both human developers and AI agents operating within IDE +contexts. + +**Natural language to spec.** The constrained schema makes +`interface.yaml` a tractable generation target for LLMs. 
"I need a +node that subscribes to a lidar point cloud, filters it, and publishes +the result" produces a valid `interface.yaml` that `rosgraph generate` +can immediately scaffold into working code. + +These are not Phase 1 deliverables, but the architecture should not +preclude them. The IR-based plugin protocol (§3.3) and structured +output formats (JSON, SARIF) are the key enablers — they exist for +code generation and CI, but AI consumers are a natural extension. + +### 3.14 Scale & Fleet Considerations + +§3.12 covers multi-workspace analysis. This section addresses +concerns beyond a single developer's workstation. + +**Interface ownership.** In multi-team organizations, `interface.yaml` +files are shared contracts. The owner is typically the node author +(they define the interface), but downstream consumers depend on it. +Changes require coordination. rosgraph supports this via: +- `rosgraph breaking` (§3.9) — automated detection of breaking + changes in CI, blocking merges that break downstream consumers. +- Installed interfaces in `share//interfaces/` — downstream + teams depend on published interfaces without pulling source code. +- Semantic versioning alignment — the breaking/dangerous/safe + classification maps to semver: breaking = major, dangerous = minor + (review required), safe = patch. + +**Multi-robot systems.** The `system.yaml` (Layer 2, §3.2) supports +namespaced node instances (`namespace: /robot1`). For multi-robot +systems, each robot's graph is a namespaced instance of the same +`system.yaml`. Fleet-level analysis — "which robots are running +interface version X?" — is out of scope for Phase 1–2 but the +architecture supports it: `rosgraph monitor` on each robot publishes +graph snapshots that a fleet-level aggregator can collect. + +**Fleet monitoring.** `rosgraph monitor` (§3.6) runs per-robot. For +fleet-scale observability, the monitor's Prometheus exporter (M7) +enables standard fleet dashboards via Grafana. 
The `/rosgraph/diff` +topic on each robot can be bridged to a central system for aggregated +drift analysis. The architecture deliberately uses standard +observability patterns (Prometheus metrics, structured logs, +diagnostics topics) rather than inventing fleet-specific +infrastructure. + +**Performance targets.** Build-time targets are stated in DP7 (100 +packages in 5 seconds). Runtime targets for `rosgraph monitor`: +- Reconciliation cycle: < 500ms for a 200-node system +- Memory overhead: < 50MB resident for graph state +- CPU: < 5% of one core at steady-state (5s scrape interval) + +These are design targets, not commitments — they guide architectural +decisions (e.g., choosing Rust for the diff engine). + +### 3.15 colcon Integration + +`colcon` uses a `VerbExtensionPoint` plugin system — any Python package +can register new verbs via `setup.cfg` entry points. Existing examples: +`colcon-clean` adds `colcon clean`, `colcon-cache` adds `colcon cache`. + +The architecture is **`rosgraph` as standalone tool, `colcon-rosgraph` +as thin workspace wrapper**: + +``` +colcon-rosgraph (Python, verb plugin) + └── delegates to → rosgraph (standalone binary) +``` + +This mirrors how `colcon-cmake` shells out to `cmake` — the colcon verb +handles workspace iteration, package ordering, and parallel execution; +the core tool handles single-package analysis. 
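The delegation pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's design: it assumes a `rosgraph` binary on `PATH` that accepts `lint <package-path>`, the names `build_lint_command`, `run_subprocess`, and `lint_workspace` are hypothetical, and a real colcon verb would receive package ordering from colcon rather than from a plain list.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Sequence

# Hypothetical: name of the standalone binary the wrapper delegates to.
ROSGRAPH_BIN = "rosgraph"


def build_lint_command(package_dir: Path) -> list[str]:
    """Build the argv for linting one package (single-package analysis)."""
    return [ROSGRAPH_BIN, "lint", str(package_dir)]


def run_subprocess(argv: Sequence[str]) -> int:
    """Default runner: shell out to the binary and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def lint_workspace(
    package_dirs: Iterable[Path],
    runner: Callable[[Sequence[str]], int] = run_subprocess,
) -> dict[str, int]:
    """Workspace iteration lives in the wrapper; analysis lives in the binary.

    Packages are visited in the order given, which is assumed to already be
    dependency order (colcon would supply that ordering).
    """
    results: dict[str, int] = {}
    for package_dir in package_dirs:
        results[package_dir.name] = runner(build_lint_command(package_dir))
    return results
```

Injecting `runner` keeps the workspace logic testable without the binary installed, and it makes the point of the split visible: nothing in the wrapper depends on the language the core tool is written in.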
+ +**What maps naturally to colcon verbs:** + +| Command | colcon verb | Notes | +|---|---|---| +| `rosgraph generate` | — | Already runs via `rosgraph_auto_package()` in `colcon build` | +| `rosgraph test` | — | Already runs via CTest in `colcon test` | +| `rosgraph lint` | `colcon lint` | Iterates packages in dependency order, parallel per-package lint | +| `rosgraph docs` | `colcon docs` | Generates docs per package, aggregates into workspace docs | +| `rosgraph discover` | `colcon discover` | Generates `interface.yaml` for all running nodes | +| `rosgraph breaking` | `colcon breaking` | Checks all packages against their previous interface versions | + +**What doesn't fit:** + +`rosgraph monitor` is a long-running daemon, not a build-and-exit verb. +It stays as a standalone command (or a `ros2 launch` node). + +**Why both CLIs:** + +- `colcon lint` for the workspace workflow — lint all packages, respect + dependency order, parallel execution, workspace-level reporting. +- `rosgraph lint path/to/interface.yaml` for single-file use, CI + pipelines, and environments without colcon. + +**Language independence.** The colcon plugin is always Python (colcon +requires it), but it delegates to `rosgraph` via subprocess — so the +core tool's language is unconstrained. Rust, Python, or hybrid all work +identically. The colcon integration does not factor into the language +choice (§5). + +The colcon plugin is a Phase 2 deliverable — Phase 1 focuses on the +standalone `rosgraph` tool. The plugin is trivial once the core tool +exists. + +--- + +## 4. Phasing + +### Phase 1 — Foundation + +Deliver the core schema, basic code generation, and highest-value +static + runtime checks. 
+ +**Schema & generate:** +- G1-G10 (existing cake features — stabilize and adopt) +- G11 (lifecycle nodes — blocks nav2/ros2_control adoption) +- G14 (schema versioning — needed before v1.0) + +**Lint (P0 rules):** +- L1 (topic type mismatch), L2 (QoS compatibility), L3 (disconnected + subgraph) +- L5 (SARIF output), L6 (differential analysis) + +**Monitor (P0 features):** +- M1 (declared-vs-observed diff), M2 (missing node alerting), + M5 (graph snapshots) + +### Phase 2 — Adoption Enablers + +Lower barriers for existing codebases. Fill out the rule set. + +**Schema & generate:** +- G12 (timers), G13 (nested parameters), G15 (mixins) +- O1 (`rosgraph docs`), O2 (`rosgraph discover`) + +**Lint (P1 rules + infrastructure):** +- L4 (launch validation), L7 (naming), L8 (unused node), + L9 (parameter validation), L10 (circular deps) +- L11 (inline suppression), L12 (per-package config), + L13 (`--add-noqa`), L14 (semantic validation) + +**Monitor (P1 features):** +- M3 (QoS drift), M4 (runtime type mismatch), M6 (topic stats), + M8 (unexpected node), M9 (health diagnostics) + +### Phase 3 — Scale the Toolchain + +Enable community extension and advanced analysis. + +**Schema & generate:** +- G16 (plugin architecture), G17 (callback groups), + G19 (system composition schema) +- O3 (`rosgraph breaking`), O4 (`rosgraph test`) + +**Lint:** +- L15 (interface coverage) + +**Monitor:** +- M7 (Prometheus endpoint), M10 (adaptive scrape), + M11 (lifecycle state) + +### Phase 4 — Ecosystem Integration + +Future-proofing and niche use cases. + +- G18 (middleware bindings) +- O5 (`rosgraph policy` — SROS 2 security policies) +- M12 (runtime interface coverage) + +### Adoption Path + +rosgraph is unlikely to reach `ros_core` initially — that requires +broad consensus and a high stability bar. A more realistic progression: + +1. **`ros-tooling` organization** (where graph-monitor already lives) — + institutional backing, CI infrastructure, release process. +2. 
**REP (ROS Enhancement Proposal)** for the `interface.yaml` schema — + formalizes the declaration format as a community standard. +3. **docs.ros.org tutorial integration** — if the "write your first + node" tutorial uses `interface.yaml`, every new ROS developer learns + it from day one. This is the highest-leverage adoption path. +4. **`ros_core` proposal** — after demonstrated adoption across multiple + distros, propose for inclusion in a future distribution. + +--- + +## 5. Language Choice + +The implementation language is an open decision for the WG. The +trade-offs are structural, not preferential. + +### Option A: Rust + +Follows Ruff's model. Speed as an architectural property. + +| Axis | Assessment | +|---|---| +| Performance | Best. Single-pass analysis, zero-cost abstractions, no GC pauses. Achieves the "100 packages in 5s" target. | +| Contribution barrier | Highest. Most ROS contributors know C++/Python, not Rust. | +| Ecosystem fit | Moderate. `rclrs` exists but is not tier-1. CLI tools don't need ROS client library integration. | +| Deployment | Single static binary. No runtime dependencies. | +| Plugin story | WASM plugins (Extism) or process-based (protoc model). | + +### Option B: Python + +Follows the ROS 2 ecosystem convention. + +| Axis | Assessment | +|---|---| +| Performance | Weakest. 10-100x slower than Rust for analysis workloads. May not meet performance targets. | +| Contribution barrier | Lowest. Every ROS developer knows Python. | +| Ecosystem fit | Best. cake is Python. `launch_ros` is Python. Direct reuse of existing parsing libraries. | +| Deployment | Requires Python runtime. `pip install` or ROS package. | +| Plugin story | Native Python plugins. Trivial to write. | + +### Option C: Rust core + Python bindings + +Hybrid via PyO3. Performance-critical core (parsing, graph model, diff +engine, lint rules) in Rust; Python CLI and plugin layer on top. 
+ +| Axis | Assessment | +|---|---| +| Performance | Near-Rust for analysis; Python overhead for CLI/plugin dispatch only. | +| Contribution barrier | Moderate. Core contributors need Rust; plugin authors use Python. | +| Ecosystem fit | Good. Python-facing API integrates with ROS ecosystem. | +| Deployment | Python package with native extension. Requires build toolchain for distribution. | +| Plugin story | Python plugins (native) + WASM plugins (for sandboxing). | + +### Decision factors + +The choice depends on which constraint the WG prioritizes: +- If **speed** is the binding constraint → Rust or hybrid +- If **community contribution** is the binding constraint → Python +- If **both matter** → hybrid, accepting the build complexity + +Note: the colcon integration (§3.15) does not constrain this choice. +The `colcon-rosgraph` plugin is always Python but delegates to the +`rosgraph` binary via subprocess, so the core tool can be any language. + +--- + +## 6. Feature List + +### Schema & Code Generation (`rosgraph generate`) + +| # | Feature | Priority | Description | +|---|---------|----------|-------------| +| G1 | YAML interface declaration | P0 | Single `interface.yaml` per node declaring all ROS 2 entities | +| G2 | JSON Schema validation | P0 | Structural validation with IDE autocompletion via YAML Language Server | +| G3 | C++ code generation | P0 | Typed context, pub/sub/srv/action wrappers, component registration | +| G4 | Python code generation | P0 | Dataclass context, pub/sub/srv/action wrappers | +| G5 | Parameter generation | P0 | Delegates to `generate_parameter_library` (backward-compatible) | +| G6 | QoS declaration | P0 | Required for pub/sub, supports all DDS QoS policies | +| G7 | Parameterized QoS | P0 | `${param:name}` references in QoS fields | +| G8 | Dynamic topic names | P0 | `${param:name}` and `${for_each_param:name}` | +| G9 | Composition pattern | P0 | Has-a `Node`, not is-a `Node` | +| G10 | Zero-boilerplate build | P0 | 
`rosgraph_auto_package()` CMake macro | +| G11 | Lifecycle node support | P0 | `lifecycle: managed` in node spec | +| G12 | Timer declarations | P1 | `timers:` section with period, callback name | +| G13 | Nested parameters | P1 | Hierarchical parameter structures (parity with gen_param_lib) | +| G14 | Schema versioning | P1 | `schema_version` field with migration tooling | +| G15 | Mixins / shared fragments | P1 | `$ref` to common interface fragments | +| G16 | Plugin architecture | P2 | IR-based pipeline, standalone plugins per language | +| G17 | Callback group declarations | P2 | `callback_groups:` with entity assignment | +| G18 | Middleware bindings | P3 | Protocol-specific config (DDS, Zenoh) | +| G19 | System composition schema | P2 | Multi-node graph declaration (`system.yaml`, Layer 2) | + +### Static Analysis (`rosgraph lint`) + +| # | Feature | Priority | Description | +|---|---------|----------|-------------| +| L1 | Topic type mismatch detection | P0 | Flag when pub and sub on same topic disagree on message type | +| L2 | QoS compatibility checking | P0 | Flag incompatible QoS profiles (reliability, durability, deadline) | +| L3 | Disconnected subgraph detection | P0 | Flag nodes/topics with no connections | +| L4 | Launch file validation | P0 | Detect undefined node refs, invalid remaps, unresolved substitutions | +| L5 | SARIF / CI output | P0 | Structured output for GitHub Security tab, PR annotations | +| L6 | Differential analysis | P0 | `--new-only` reports only issues introduced since base branch | +| L7 | Naming convention enforcement | P1 | Check names against configurable patterns | +| L8 | Unused node detection | P1 | Flag nodes declared but not in any launch config | +| L9 | Parameter validation | P1 | Check values against declared types, ranges, validators | +| L10 | Circular dependency detection | P1 | Flag service/action chains that could deadlock | +| L11 | Inline suppression | P1 | `# rosgraph: noqa: TOP001` in launch/YAML files | +| 
L12 | Per-package configuration | P1 | Override rules per package via `rosgraph.toml` | +| L13 | `--add-noqa` for adoption | P1 | Generate suppression comments for all existing issues | +| L14 | Semantic validation | P1 | Full type resolution, QoS compatibility checks | +| L15 | Interface coverage reporting | P2 | Which declared topics/services are exercised in tests | + +### Runtime Monitoring (`rosgraph monitor`) + +| # | Feature | Priority | Description | +|---|---------|----------|-------------| +| M1 | Declared-vs-observed graph diff | P0 | Compare declared interfaces against live DDS discovery | +| M2 | Missing node alerting | P0 | Alert when a declared node is not present | +| M3 | QoS drift detection | P0 | Alert when observed QoS differs from declared | +| M4 | Type mismatch detection (runtime) | P0 | Alert when observed types differ from declaration | +| M5 | Graph snapshot publishing | P0 | Periodic `rosgraph_monitor_msgs/Graph` snapshots | +| M6 | Topic statistics | P1 | Message rate, latency, queue depth per topic | +| M7 | Prometheus /metrics endpoint | P1 | Export graph metrics for Grafana dashboards | +| M8 | Unexpected node detection | P1 | Alert on nodes present but not declared | +| M9 | Health diagnostics integration | P1 | Publish to `/diagnostics` for standard ROS tooling | +| M10 | Adaptive scrape interval | P2 | Faster scraping when drift detected, slower when stable | +| M11 | Lifecycle state monitoring | P2 | Track lifecycle transitions against expectations | +| M12 | Interface coverage tracking | P2 | Which declared interfaces are exercised at runtime | + +### Other Subcommands + +| # | Feature | Subcommand | Priority | Description | +|---|---------|------------|----------|-------------| +| O1 | Documentation generation | `rosgraph docs` | P1 | Auto-generated API reference from schema | +| O2 | Runtime-to-spec discovery | `rosgraph discover` | P1 | Introspect running nodes → `interface.yaml` | +| O3 | Breaking change detection | `rosgraph 
breaking` | P2 | Detect breaking interface changes across releases | +| O4 | Contract testing | `rosgraph test` | P2 | Schema-driven verification of running nodes | +| O5 | Security policy generation | `rosgraph policy` | P3 | Auto-generate SROS 2 policies from schema | + +--- + +## 7. Lint Rule Codes + +Rule codes use a hierarchical prefix system (modelled on Ruff). Rules +can be selected at any granularity: `TOP` (all topic rules), +`TOP001` (specific rule). + +| Prefix | Category | Example rules | +|--------|----------|---------------| +| `TOP` | Topic rules | `TOP001` type mismatch, `TOP002` no subscribers, `TOP003` naming convention | +| `SRV` | Service rules | `SRV001` unmatched client, `SRV002` type mismatch | +| `ACT` | Action rules | `ACT001` unmatched client, `ACT002` type mismatch | +| `PRM` | Parameter rules | `PRM001` missing default, `PRM002` type violation, `PRM003` undeclared | +| `QOS` | QoS rules | `QOS001` reliability mismatch, `QOS002` durability incompatible, `QOS003` deadline violation | +| `LCH` | Launch rules | `LCH001` undefined node ref, `LCH002` invalid remap, `LCH003` unresolved substitution | +| `GRF` | Graph-level rules | `GRF001` disconnected subgraph, `GRF002` unused node, `GRF003` circular dependency | +| `NME` | Naming rules | `NME001` topic naming convention, `NME002` node naming convention | +| `SAF` | Safety rules | `SAF001` insufficient redundancy, `SAF002` single point of failure, `SAF003` unmanaged safety node | +| `TF` | TF frame rules | `TF001` undeclared frame_id, `TF002` broken frame chain | + +**Rule lifecycle:** preview → stable → deprecated → removed. New rules +always enter as preview. + +**Fix applicability:** Safe (preserves semantics), unsafe (may alter +behaviour), display-only (suggestion). Per-rule override via config. + +--- + +## 8.
Monitor Alert Rules + +| Alert | Condition | Grace period | Severity | +|---|---|---|---| +| `NodeMissing` | Declared node not observed | 10s | critical | +| `UnexpectedNode` | Observed node not declared | 30s | warning | +| `TopicMissing` | Declared topic not present | 5s | critical | +| `QoSMismatch` | Declared QoS ≠ observed QoS | 0s | error | +| `TypeMismatch` | Declared msg type ≠ observed | 0s | critical | +| `ThroughputDrop` | Rate < expected minimum | 30s | warning | + +Grace periods prevent flapping during startup and transient states. +All thresholds (grace period, severity) are configurable via +`rosgraph.toml` — see §11.3 for safety-critical overrides. + +--- + +## 9. Existing ROS 2 Ecosystem + +### 9.1 Maturity Matrix + +| Tool | Stars | Contributors | Last active | Maturity | Bus factor | +|---|---|---|---|---|---| +| **generate_parameter_library** | 353 | 41 | 2026-02 | Production | Healthy | +| **ros2_tracing** | 237 | 30 | 2026-02 | Production (QL1) | Healthy | +| **topic_tools** | 126 | 25 | 2025-08 | Mature | Healthy | +| **launch_ros** | 78 | 71 | 2026-02 | Core infrastructure | Healthy | +| **cake** | 36 | 1 | 2026-02 | Early-stage | 1 (risk) | +| **graph-monitor** | 31 | 3 | 2025-11 | Mid-stage | Low | +| **nodl** | 10 | 7 | 2022-11 | Dormant | N/A | +| **clingwrap** | 9 | 1 | 2026-02 | Early-stage | 1 (risk) | +| **breadcrumb** | 6 | 1 | 2026-02 | Early-stage | 1 (risk) | +| **HAROS** (ROS 1) | 197 | — | 2021-09 | Abandoned | N/A | +| **CARET** | 97 | 18 | active | Mature (Tier IV) | Healthy | + +### 9.2 Tool Assessments + +**cake** — Declarative code generation. `interface.yaml` → C++ and +Python node scaffolding. Functional pattern (has-a Node, not is-a +Node). The fundamental bet is correct: making the interface declaration +the source of truth for code generation is the only way to prevent +schema-code drift. Core design decisions (YAML-driven, +composition-based, schema-validated, codegen-first) are sound. 
+cake's author is a WG member; rosgraph's Layer 1 schema builds +directly on cake's format, and G1–G10 represent stabilizing cake's +capabilities under the rosgraph umbrella — addressing the bus-factor +risk while preserving the design. Gaps: no lifecycle support, no +timers, no nested parameters, no formal IR, no plugin architecture, +no runtime-to-spec generation. + +**generate_parameter_library** — The most mature tool in the space. +Production-proven in MoveIt2 and ros2_control. Rich validation. The +unification path: the `parameters:` section of `interface.yaml` IS the +`generate_parameter_library` format (already demonstrated in cake). +rosgraph delegates to `generate_parameter_library` at build time rather +than reimplementing parameter generation. The key invariant: a +standalone gen_param_lib YAML file works as-is when placed in the +`parameters:` block of `interface.yaml`. Ownership transfer to +`ros-tooling` would be ideal but is not required — schema compatibility +is sufficient. + +**graph-monitor** — Official ROSGraph WG backing. Publishes structured +graph messages. The `rmw_stats_shim` approach is architecturally sound. +Gap: can report *what exists* but not *what's wrong* — no comparison +against a declared spec. + +**breadcrumb + clingwrap** — Proves the concept of static graph +extraction from launch files. The tight coupling to clingwrap's +non-standard launch API is the primary concern. Static analysis should +work with standard `launch_ros` patterns. + +**nodl** — Dormant since 2022. Correct problem identification but +fatal flaw: no code generation. Superseded by cake's YAML approach. +Key lesson: **a description format without code generation is a +non-starter.** + +**ros2_tracing + CARET** — The most mature dynamic analysis tools. +QL1 certification, production-proven at Tier IV. Complementary to +rosgraph: tracing provides instrumentation, CARET provides latency +analysis, rosgraph provides graph structure analysis. 
+ +### 9.3 Gap Analysis + +| Category | Capability | Current tool | Status | +|---|---|---|---| +| **Schema** | Node interface declaration | cake / nodl / gen_param_lib | cake early; nodl dead; gpl params-only | +| **Codegen** | Static graph from launch files | breadcrumb + clingwrap | Early-stage, solo dev | +| **Runtime** | Runtime graph monitoring | graph-monitor | Mid-stage, institutional | +| **Runtime** | Runtime tracing | ros2_tracing | Mature, production | +| **Runtime** | Latency analysis | CARET | Mature, Tier IV | +| **Runtime** | Graph visualisation | Foxglove, Dear RosNodeViewer | Mature but live-only | +| **Runtime** | **Graph diff (expected vs. actual)** | **Nothing** | **Major gap** | +| **Static** | **Graph linting (pre-launch)** | **Nothing** | **Major gap** | +| **Static** | **QoS static analysis** | breadcrumb (partial) | Early-stage | +| **Static** | **CI graph validation** | **Nothing** | **Major gap** | +| **Docs** | **Node API documentation** | **Nothing** (hand-written only) | **Major gap** | +| — | **Behavioural properties** | **Nothing** (HPL was ROS 1) | **Major gap** | + +--- + +## 10. Prior Art + +Organized by what we borrow, not by framework. Each framework appears +once at its primary contribution. + +### 10.1 Schema Design + +#### AsyncAPI + +The closest structural match to ROS topics. Version 3 cleanly separates +channels, messages, operations, and components at the top level. + +**What to borrow:** +- **Structural separation.** `publishers`, `subscribers`, `services`, + `actions`, `parameters` as peer top-level sections. +- **`components` + `$ref` pattern.** Define QoS profiles or common + parameter sets once, reference everywhere. +- **Trait system.** Define a `reliable_sensor` trait with QoS settings, + apply to multiple publishers. Traits merge via JSON Merge Patch + (RFC 7386). +- **Protocol bindings.** Core schema stays middleware-agnostic; + DDS-specific QoS, Zenoh settings, or shared-memory config in a + `bindings:` block. 
+- **Parameterized addresses.** Topic name templates + (`sensors/{robot_name}/lidar`) map to ROS 2 namespace/remapping and + `${param:name}` syntax. + +**Gaps:** No services (as typed req/res pair), no actions, no +parameters, no lifecycle, no timers, no TF frames. Single-application +scope (which is actually the right scope for a node interface). + +#### Smithy (AWS) + +Protocol-agnostic interface definition language. Shapes decorated with +traits. + +**What to borrow:** +- **Typed, composable traits** for extensible metadata — the most + powerful metadata mechanism surveyed: + ``` + @qos(reliability: "reliable", depth: 10) + @lifecycle(managed: true) + @parameter_range(min: 0.0, max: 10.0) + @frame_id("base_link") + ``` +- **Mixins** for shared structure. A `lifecycle_diagnostics` mixin adds + a diagnostics publisher and period parameter to any node that + includes it. +- **Resource lifecycle operations** — maps to ROS 2 lifecycle node + transitions. + +#### CUE + +Constraint-based configuration language where types and values are the +same thing. Not a codegen tool — a validation tool. + +**What to borrow:** +- **Constraints as types.** `voxel_size: float & >=0.01 & <=1.0`. The + JSON Schema equivalent (`minimum`, `maximum`, `enum`) is already + used by the existing `interface.schema.yaml`. +- **Incremental constraints.** Base schema + deployment-specific + overlays (e.g., production QoS profiles layered onto a base + `interface.yaml`). +- **Configuration validation.** Validate that launch parameter + overrides are compatible with a node's declared interface. + +### 10.2 Pipeline & Code Generation + +#### Protocol Buffers / Buf CLI + +The single most important architectural lesson: **an intermediate +representation (IR) between parsing and generation**. 
+ +``` +interface.yaml ──> [Parser/Validator] ──> InterfaceDescriptor (IR) + ├──> [Plugin: C++] ──> scaffolding + ├──> [Plugin: Python] ──> scaffolding + ├──> [Plugin: Docs] ──> API reference + └──> [Plugin: Launch] ──> templates +``` + +**What to borrow:** +- **IR-based plugin protocol.** Standalone executables consuming a + serialized `InterfaceDescriptor` via stdin/file. Community members + write `rosgraph-gen-rust` without touching the core codebase. +- **Config-driven generation** (`buf.gen.yaml` pattern): + ```yaml + version: 1 + plugins: + - name: cpp + out: generated/cpp + options: { lifecycle: managed } + - name: python + out: generated/python + ``` +- **Validation as separate layers.** Structural (does the YAML parse?) + → semantic (do referenced types exist?) → breaking (did the interface + change incompatibly?). Maps to `rosgraph lint`, `rosgraph validate`, + `rosgraph breaking`. +- **Deterministic, reproducible output.** Same inputs → byte-identical + output. CI can verify generated code is up to date. + +**What to borrow from Buf CLI specifically:** +- `buf lint` — configurable schema linting with ~50 rules by category. + Config-driven rule selection. +- `buf breaking` — breaking change detection between schema versions. +- Integrated toolchain: `buf generate`, `buf lint`, `buf breaking`, + `buf format` as subcommands of one tool. + +#### TypeSpec (Microsoft) + +**What to borrow:** +- **Multi-emitter architecture.** One spec, many outputs: + ``` + interface.yaml ──> C++ emitter ──> node_interface.hpp + ──> Python emitter ──> interface.py + ──> Docs emitter ──> node_api_reference.md + ──> Launch emitter ──> default_launch.py + ──> Graph emitter ──> rosgraph_monitor_msgs/NodeInterface + ``` +- **Emitter-specific validation.** Each emitter adds its own checks + (e.g., C++ emitter warns about names that produce invalid C++ + identifiers). 
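The multi-emitter idea with emitter-specific checks can be sketched as follows. This is an illustration only: the dictionary shape used for the descriptor, the function names, and the warning text are assumptions, not the actual `InterfaceDescriptor` format, and the reserved-word list is deliberately incomplete.

```python
import keyword
import re

# Simplified stand-in for the IR: a node name plus its publisher names.
descriptor = {
    "node": "lidar_processor",
    "publishers": ["filtered_points", "class", "2d-scan"],
}

# Small subset of C++ reserved words, enough for the sketch.
CPP_RESERVED = {"class", "namespace", "template", "new", "delete", "int"}
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cpp_warnings(desc):
    """C++ emitter check: names that cannot become C++ identifiers."""
    return [
        f"cpp: '{name}' is not a valid C++ identifier"
        for name in desc["publishers"]
        if not IDENTIFIER.match(name) or name in CPP_RESERVED
    ]


def python_warnings(desc):
    """Python emitter check: names that cannot become Python attributes."""
    return [
        f"python: '{name}' is not a valid Python identifier"
        for name in desc["publishers"]
        if not name.isidentifier() or keyword.iskeyword(name)
    ]


# One spec, many emitters: each emitter contributes its own diagnostics.
EMITTER_CHECKS = {"cpp": cpp_warnings, "python": python_warnings}


def validate_for(desc, emitters):
    """Run only the checks that belong to the requested emitters."""
    warnings = []
    for emitter in emitters:
        warnings.extend(EMITTER_CHECKS[emitter](desc))
    return warnings
```

Calling `validate_for(descriptor, ["cpp"])` reports only the C++ problems, so a Python-only package is not warned about names that are merely awkward in C++.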
+ +#### OpenAPI + +**What to borrow:** +- **The "Swagger UI" experience.** Auto-generated interactive + documentation from a schema. A "Swagger UI for ROS nodes" where every + node has browsable API docs showing topics, services, actions, + parameters, QoS, and message type definitions — generated from + `interface.yaml`. +- **JSON Schema integration.** OpenAPI 3.1 aligned fully with JSON + Schema. The existing `interface.schema.yaml` (JSON Schema Draft + 2020-12) is the right foundation. + +### 10.3 Static Analysis Architecture + +#### Ruff + +A Python linter written in Rust. Relevant not for Python linting but as +the **best-in-class architecture for building a rule-based analysis +tool**. + +**What to borrow:** + +| Ruff pattern | rosgraph equivalent | +|---|---| +| Rule enum + compile-time registry | `Rule` enum: `TOP001`, `SRV001`, `QOS001`, `GRF001` | +| Hierarchical prefix codes | `TOP` (topic), `SRV` (service), `ACT` (action), `QOS`, `GRF` (graph) | +| Single-pass traversal | Build graph model once, run all rules in one walk | +| Safe/unsafe fix classification | Safe: add missing QoS. Unsafe: rename topic. Display-only: suggest restructure | +| Preview → stable lifecycle | Same graduation for new rules | +| Per-file-ignores | Per-package-ignores, per-launch-file-ignores | +| Inline suppression | `# rosgraph: noqa: TOP001` | +| SARIF output | GitHub Security tab integration | +| Monolithic, no plugins initially | All rules built-in. WASM plugins later | +| Zero-config defaults | Small, high-confidence default rule set | +| `--add-noqa` for gradual adoption | Essential for existing ROS workspaces | + +**Key architectural lesson:** Speed is an architectural property, not an +optimisation. Rust + hand-written parser + single-pass + parallel +package processing + content caching + compile-time codegen. + +#### Go Analysis Framework + +The gold standard for pluggable static analysis architecture. Used by +`go vet`, gopls, and golangci-lint. 
+ +**What to borrow:** + +``` +GraphAnalyzer { + name: str + doc: str + requires: [GraphAnalyzer] # horizontal deps + result_type: Type | None # typed output for dependent analyzers + fact_types: [Fact] # cross-package facts + run: (GraphPass) → (result, [Diagnostic]) +} + +GraphPass { + graph: ComputationGraph # the full graph model + node: NodeInterface # current node under analysis + types: MessageTypeDB # all known msg/srv/action types + qos: QoSProfileDB # QoS profiles in the graph + result_of: {Analyzer: Any} # results from required analyzers + report: (Diagnostic) → void + import_fact: (scope, Fact) → bool + export_fact: (scope, Fact) → void +} +``` + +Key patterns: +1. **Analyzers as values, not subclasses** — trivially composable +2. **Pass as abstraction barrier** — same analyzer in CLI, IDE, CI +3. **Horizontal dependencies** via `Requires`/`ResultOf` — typed data + flow between analyzers +4. **Vertical facts** for cross-package analysis — cached per-package + results enabling separate modular analysis +5. **Action graph** — 2D grid (analyzer x package), independent actions + execute in parallel + +#### golangci-lint + +**What to borrow:** +- **Meta-linter pattern.** One CLI, one config, one output format + wrapping many analyzers. +- **Shared parse.** All analyzers share one AST/model parse. +- **Post-processing pipeline.** `noqa` filter → exclusion rules → + severity assignment → deduplication → output formatting. +- **Differential analysis.** `new-from-merge-base: main` reports only + issues in code changed since the base branch. Critical for CI + adoption in large codebases. + +#### Spectral + +**What to borrow:** +- **YAML-native lint rules** that work directly on `interface.yaml` + without language-specific parsing. Custom rulesets in YAML — a + robotics engineer can author a rule without knowing Rust or C++. + Low barrier to writing new rules. 
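To make the analyzer patterns in this section concrete, here is a minimal Python sketch of analyzers as plain values with `requires`/`result_of` data flow. The class names reuse terms from the `GraphAnalyzer`/`GraphPass` pseudocode, but everything else is an assumption chosen for illustration: the graph shape (a dict mapping each topic to its publisher and subscriber types), the two example analyzers, and keying results by analyzer name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class GraphPass:
    graph: dict  # assumed shape: topic -> {"pub": [types], "sub": [types]}
    result_of: dict = field(default_factory=dict)
    diagnostics: list = field(default_factory=list)

    def report(self, code: str, message: str) -> None:
        self.diagnostics.append((code, message))


@dataclass(frozen=True)
class GraphAnalyzer:
    name: str
    run: Callable[[GraphPass], Any]
    requires: tuple = ()  # horizontal dependencies on other analyzers


def collect_types(p: GraphPass) -> dict:
    """Shared step: every message type seen on each topic."""
    return {t: set(e["pub"]) | set(e["sub"]) for t, e in p.graph.items()}


TYPES = GraphAnalyzer("types", collect_types)


def check_mismatch(p: GraphPass) -> None:
    """Consumes the result of TYPES instead of re-walking the graph."""
    for topic, types in sorted(p.result_of["types"].items()):
        if len(types) > 1:
            p.report("TOP001", f"{topic}: conflicting types {sorted(types)}")


MISMATCH = GraphAnalyzer("type_mismatch", check_mismatch, requires=(TYPES,))


def run_analyzers(graph: dict, analyzers) -> list:
    """Run each analyzer once, dependencies first, sharing one pass."""
    p = GraphPass(graph)

    def visit(analyzer: GraphAnalyzer) -> None:
        if analyzer.name in p.result_of:
            return
        for dep in analyzer.requires:
            visit(dep)
        p.result_of[analyzer.name] = analyzer.run(p)

    for analyzer in analyzers:
        visit(analyzer)
    return p.diagnostics
```

Because analyzers are values rather than subclasses, the same `MISMATCH` object could be handed to a CLI driver, an LSP server, or a CI job without modification; only the driver that builds the `GraphPass` differs.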
+ +### 10.4 Runtime Monitoring Architecture + +#### OpenTelemetry + +Collector pipeline: Receiver → Processor → Exporter. Connectors join +pipelines and enable signal type conversion. + +**What to borrow:** +- **Pipeline architecture** for `rosgraph monitor`. +- **Auto-instrumentation.** Two complementary paths: + - *Runtime observation* (zero-code): DDS discovery provides the graph + without modifying any node. + - *Code-generated instrumentation*: rosgraph-generated code embeds + topic stats, heartbeats, structured logging. + - The **three-way comparison** (declared vs. runtime-observed vs. + self-reported) catches issues that any two-way comparison misses. + +#### Prometheus + +**What to borrow:** +- **Pull model.** Periodic scraping produces consistent point-in-time + snapshots. Absence of data is itself a signal (node is down). +- **Alerting rules** with `for` durations to prevent flapping. +- **Metric types mapping:** + + | Prometheus type | ROS topic statistics equivalent | + |---|---| + | Counter | Messages published (total), dropped messages | + | Gauge | Active subscribers, queue depth, alive nodes | + | Histogram | Inter-arrival times, message sizes, latency distribution | + +#### Kubernetes Controllers + +**What to borrow:** +- **Level-triggered reconciliation** (not edge-triggered). React to the + *current difference* between desired and actual state, not to + individual change events. If an event is missed, the next + reconciliation still catches the drift. +- **Idempotent.** Running reconciliation twice with the same state + produces the same diff and alerts. +- **Requeue with backoff.** After detecting drift, recheck sooner (1s). + If drift persists, escalate. +- **Status reporting.** Maintained separately from the declared spec, + enabling external tools to query current state independently. 
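Level-triggered, idempotent reconciliation can be sketched in a few lines. The function name, the set-of-node-names representation, and the alert tuples below are assumptions for illustration; the alert names reuse `NodeMissing` and `UnexpectedNode` from §8, and grace periods and requeue backoff are omitted.

```python
def reconcile(declared: set, observed: set) -> list:
    """Compare the current desired state with the current actual state.

    Only the two snapshots matter, not the events that led to them, so a
    missed event is caught on the next cycle, and repeated calls with the
    same inputs yield the same alerts.
    """
    alerts = [("NodeMissing", name) for name in sorted(declared - observed)]
    alerts += [("UnexpectedNode", name) for name in sorted(observed - declared)]
    return alerts


declared_nodes = {"/lidar_processor", "/planner", "/controller"}
observed_nodes = {"/lidar_processor", "/controller", "/debug_tool"}

first = reconcile(declared_nodes, observed_nodes)
second = reconcile(declared_nodes, observed_nodes)  # same inputs, same alerts
```

An edge-triggered design would instead keep state about "node joined" and "node left" events, and a single dropped discovery message could leave that state permanently wrong.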
+ +### 10.5 Contract Testing & Verification + +| Framework | What it does | What to borrow for `rosgraph test` | +|---|---|---| +| **Schemathesis** | Fuzz a live API against its OpenAPI spec. Auto-generates test cases from schema. | Fuzz a running node against `interface.yaml` — auto-generate messages matching declared types, verify outputs. | +| **Dredd** | Start a live server, send requests matching the spec, validate responses. The spec IS the test plan. | Run a node, systematically verify its interface matches declaration. Call every service, check every publisher. | +| **Pact** | Consumer-driven contract testing. Consumer declares expectations; provider verifies. | Cross-node contract verification: Node A subscribes to `/cmd_vel` (Twist), Node B publishes it. Verify they agree on type. | +| **gRPC health + reflection** | Standardized health checking + runtime introspection of services/methods. | Health reporting interface that rosgraph-generated nodes expose automatically. Runtime introspection vs. declared interface. | +| **graphql-inspector** | Schema diff (breaking/dangerous/safe). Coverage: which fields are actually queried. | Interface coverage: "which declared topics are exercised in tests?" Schema diff between interface versions. | + +### 10.6 ROS Domain Prior Art: HAROS + +The High-Assurance ROS framework (University of Minho, 2016–2021). The +only tool that accomplished Goals 3–4 for ROS, but only for ROS 1. + +**Pipeline:** Package discovery → CMake parsing → launch file parsing → +source code parsing (libclang for C++, limited Python AST) → +computation graph assembly → plugin-based analysis → JSON export. + +**The metamodel.** Formal classes for the ROS graph: `Node`, +`NodeInstance`, `Topic`, `Service`, `Parameter`, plus typed link classes +(`PublishLink`, `SubscribeLink`, etc.) carrying source conditions and +dependency sets. This metamodel is HAROS's most transferable +contribution. 
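
To make the shape of such a metamodel concrete, here is a rough Python sketch using plain data classes. The class names follow the list above, but every field name (`condition`, `depends_on`, `namespace`) and the `unmatched_subscriptions` helper are illustrative assumptions, not HAROS's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    package: str
    name: str


@dataclass(frozen=True)
class NodeInstance:
    node: Node
    namespace: str = "/"


@dataclass(frozen=True)
class Topic:
    name: str
    msg_type: str


@dataclass(frozen=True)
class PublishLink:
    instance: NodeInstance
    topic: Topic
    condition: str = ""               # hypothetical: source condition guarding the call
    depends_on: frozenset = frozenset()  # hypothetical: launch args / params it depends on


@dataclass(frozen=True)
class SubscribeLink:
    instance: NodeInstance
    topic: Topic
    condition: str = ""
    depends_on: frozenset = frozenset()


def unmatched_subscriptions(pubs, subs) -> list:
    """Subscriptions with no publisher on the same topic name and type."""
    published = {link.topic for link in pubs}
    return [link for link in subs if link.topic not in published]


TELEOP = NodeInstance(Node("teleop_pkg", "teleop"))
BASE = NodeInstance(Node("base_pkg", "base_controller"))
PUBS = [PublishLink(TELEOP, Topic("/cmd_vel", "geometry_msgs/msg/Twist"))]
SUBS = [SubscribeLink(BASE, Topic("/cmd_vel", "geometry_msgs/msg/TwistStamped"))]
UNMATCHED = unmatched_subscriptions(PUBS, SUBS)
```

Typed link objects, rather than bare name pairs, are what let a model-level check like this one run without re-parsing source code.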
+ +**HPL (HAROS Property Language).** Behavioural properties for +message-passing systems: +``` +globally: no /cmd_vel {linear.x > 1.0} # speed limit +globally: /bumper causes /stop_cmd # response +globally: /cmd_vel requires /trajectory within 5s # precedence +``` + +HPL drove three verification paths from a single spec: model checking +(Electrum/Alloy), runtime monitors (generated), and property-based +testing (Hypothesis strategies). + +**Why it died for ROS 2.** The extraction pipeline assumes catkin, +`rospack`, XML launch files, `ros::NodeHandle`. ROS 2 changed +everything. The maintainer closed ROS 2 support as *wontfix*. + +**What to borrow:** Metamodel, HPL's scope+pattern+event structure, +plugin separation (source-level vs. model-level), one spec → multiple +verification modes. + +**What to do differently:** Use declarations (`interface.yaml`) as +primary source of truth (not source code parsing); support ROS 2 +concepts HAROS never had (QoS, lifecycle, components, actions, DDS +discovery). + +--- + +## 11. Safety & Certification + +rosgraph is not a safety tool — it is a development and verification +tool that produces artifacts useful in safety cases. This section maps +rosgraph capabilities to the evidence types required by safety +standards. 
+ +### 11.1 Relevant Standards + +| Standard | Domain | How rosgraph helps | +|---|---|---| +| **IEC 61508** | General functional safety | Design verification evidence (graph analysis), runtime monitoring | +| **ISO 26262** | Automotive | Interface specification (`interface.yaml` as design artifact), static verification | +| **IEC 62304** | Medical device software | Software architecture documentation, traceability | +| **DO-178C** | Aerospace | Requirements traceability, structural coverage analysis | +| **ISO 13482** | Service robots | Interface documentation, runtime monitoring | +| **ISO 21448 (SOTIF)** | Safety of intended functionality | Graph analysis for identifying missing/unexpected interfaces | + +### 11.2 Artifact-to-Evidence Mapping + +| rosgraph artifact | Evidence type | Useful for | +|---|---|---| +| `interface.yaml` | Software architecture description | Design phase documentation | +| `rosgraph lint` SARIF output | Static analysis results | Verification evidence | +| `rosgraph monitor` logs | Runtime verification evidence | Validation phase | +| `rosgraph test` results | Interface conformance evidence | Integration testing | +| `rosgraph breaking` output | Change impact analysis | Change management | +| `rosgraph docs` output | API documentation | Design review | + +### 11.3 Configurable Safety Levels + +Monitor alert grace periods (§8) and severity levels must be +configurable for safety-critical deployments: + +```toml +[monitor.alerts] +NodeMissing = { grace_period_ms = 1000, severity = "critical" } # 1s for surgical robot +UnexpectedNode = { grace_period_ms = 5000, severity = "error" } +TopicMissing = { grace_period_ms = 500, severity = "critical" } +``` + +The defaults in §8 are tuned for general robotics. Safety-critical +deployments override them via `rosgraph.toml`. + +### 11.4 Behavioral Properties (Future) + +Structural analysis (Phase 1–2) proves the graph is correctly wired — +a necessary precondition for behavioral safety. 
Behavioral analysis +(Phase 3+) proves temporal and causal properties: + +``` +globally: /emergency_stop causes /motor_disable within 100ms +globally: no /cmd_vel {linear.x > max_speed} +globally: /heartbeat absent_for 500ms causes /safe_stop +``` + +This capability, inspired by HAROS HPL (§10.6), is where the deeper +safety value lies. The structural graph model (§3.1) is designed to +be extensible to behavioral annotations without schema redesign. + +### 11.5 Safety-Relevant Lint Rules (Future) + +| Rule | Description | Phase | +|---|---|---| +| `SAF001` | Critical subscriber has < N publishers (no redundancy) | 2 | +| `SAF002` | Single point of failure in graph topology | 2 | +| `SAF003` | Safety-critical node is not lifecycle-managed | 2 | +| `TF001` | Declared `frame_id` not published by any node in graph | 2 | +| `TF002` | Frame chain broken (no transform path between declared frames) | 3 | + +These rules are not in Phase 1 but the analyzer architecture (§3.5) +supports adding them without architectural changes. + +--- + +## 12. Scope & Limitations + +### When Not to Use rosgraph + +rosgraph adds value when the cost of interface bugs exceeds the cost +of maintaining declarations. This trade-off favors rosgraph in +multi-node systems, team environments, and production deployments. +It does not favor rosgraph in every context: + +- **Quick prototyping.** If you're experimenting with a single node + and will throw it away next week, `interface.yaml` is overhead. + Use standard `rclcpp` / `rclpy` directly. +- **Single-node packages.** A package with one node and no + cross-package interfaces gets minimal lint value. The code + generation may still be worthwhile for parameter validation. +- **Highly dynamic interfaces.** Nodes that create publishers and + subscribers at runtime based on dynamic conditions (e.g., a + plugin host that discovers its interface at startup) are outside + scope (DP12). 
rosgraph can declare the static portion and flag + the dynamic portion as unexpected, but it cannot generate code + for interfaces it doesn't know about at build time. + +### Known Limitations + +**Spec-code drift for business logic.** Code generation covers the +structural skeleton (pub/sub creation, parameter declaration, lifecycle +transitions). Business logic is hand-written. If a developer adds an +undeclared publisher inside a callback, `rosgraph lint` won't catch it +at build time — only `rosgraph monitor` flags it at runtime as +`UnexpectedTopic`. This is a fundamental limitation of any +declaration-based approach: the declaration describes the intended +interface, not the implementation. + +**Launch file coverage.** Python launch files are Turing-complete. +AST pattern matching (§3.5) handles common declarative patterns but +cannot resolve dynamic logic (conditionals based on environment +variables, loops generating node sets). `system.yaml` (Layer 2) is +the escape hatch for systems that need full static analyzability. + +**Ecosystem bootstrapping.** rosgraph's cross-package analysis (type +mismatch detection, contract testing) requires multiple packages to +have `interface.yaml`. The single-package value proposition is code +generation and parameter validation. Cross-package value grows with +adoption. `rosgraph discover` (§3.10) lowers the barrier by generating +specs from running systems, but the generated specs require human +review and refinement. + +**Scope of this proposal.** This document covers 51 features across +7 subcommands. Not all will be built. Phase 1 (§4) is the commitment +— the minimum viable tool that delivers value. Later phases are +contingent on adoption and contributor capacity. + +--- + +## 13. Resolved Questions + +The following questions were raised during the proposal drafting process +and have been resolved. Answers are integrated into the relevant +sections of this document. 
+ +| # | Question | Resolution | Section | +|---|----------|------------|---------| +| 1 | Dynamic interfaces | Out of scope — rosgraph covers declared interfaces only (Design Principle 12). Undeclared runtime interfaces are flagged as `UnexpectedTopic` by monitor. | §2 | +| 2 | Launch substitution evaluation | Three-path loader strategy: YAML launch (direct parse), `system.yaml` (static), Python launch AST (pattern matching). | §3.5 | +| 3 | Behavioural properties | Structural first (Phase 1–2), behavioural later (Phase 3+) if adoption warrants it (Design Principle 13). | §2 | +| 4 | `generate_parameter_library` unification | Keep as standalone, maintain schema compatibility. rosgraph delegates to gen_param_lib at build time. | §9.2 | +| 5 | Multi-workspace analysis | Per-package fact caching via installed `interface.yaml` files. Phase 2 concern. | §3.12 | +| 6 | Launch file extraction without clingwrap | Partial AST extraction for standard `launch_ros` patterns, with `system.yaml` as fully-static alternative. | §3.5 | +| 7 | Relationship to graph-monitor | New implementation. Adopt graph-monitor's message definitions, reimplement scraping + reconciliation. | §3.6 | +| 8 | Mixin pattern | `mixins:` section referencing interface fragments. Host's effective interface = own declaration + all mixins merged. | §3.2 | +| 9 | Adoption path | `ros-tooling` org → REP for schema → docs.ros.org tutorials → `ros_core` (long-term). | §4 | +| 10 | Declaration scope | Structural (node interfaces) only for Phase 1–2. Behavioural scope deferred to Phase 3+. 
| §2 | From fa0478043d63101a44f929905e498cc46cf47e6f Mon Sep 17 00:00:00 2001 From: Luke Sy Date: Mon, 23 Feb 2026 22:23:44 +1100 Subject: [PATCH 2/5] Trim FAQ, add system.yaml convergence note MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - FAQ: reduce from 880 to ~360 lines, add General section with cross-cutting questions, keep 2-3 essential per audience - FAQ: fix all PROPOSAL.md cross-references to ROSGRAPH.md - FAQ: add launch file / param config convergence question - ROSGRAPH §3.2: add "Toward a single source of truth" note on system.yaml replacing launch files and parameter configs - ROSGRAPH §12: remove resolved questions section (internal notes) Signed-off-by: Luke Sy --- docs/FAQ.md | 877 ++++++++++++----------------------------------- docs/ROSGRAPH.md | 33 +- 2 files changed, 222 insertions(+), 688 deletions(-) diff --git a/docs/FAQ.md b/docs/FAQ.md index b01d3c6..583cd5c 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -9,171 +9,139 @@ questions that matter to you. ## Table of Contents +0. [General](#0-general) 1. [New ROS Developer](#1-new-ros-developer) -2. [AI-Assisted Developer](#2-ai-assisted-developer) -3. [Engineering Lead / System Integrator / DevOps](#3-engineering-lead--system-integrator--devops) -4. [Safety-Critical Engineer](#4-safety-critical-engineer) -5. [MoveIt / nav2 / Popular Module User](#5-moveit--nav2--popular-module-user) -6. [The Skeptic](#6-the-skeptic) -7. [Package Maintainer / ROS Governance](#7-package-maintainer--ros-governance) -8. [Educator / University Researcher](#8-educator--university-researcher) -9. [Embedded / Resource-Constrained Developer](#9-embedded--resource-constrained-developer) +2. [Engineering Lead / System Integrator / DevOps](#2-engineering-lead--system-integrator--devops) +3. [MoveIt / nav2 / Popular Module User](#3-moveit--nav2--popular-module-user) +4. [AI-Assisted Developer](#4-ai-assisted-developer) +5. 
[Package Maintainer / ROS Governance](#5-package-maintainer--ros-governance) +6. [Educator / University Researcher](#6-educator--university-researcher) +7. [Embedded / Resource-Constrained Developer](#7-embedded--resource-constrained-developer) +8. [The Skeptic](#8-the-skeptic) +9. [Safety-Critical Engineer](#9-safety-critical-engineer) --- -## 1. New ROS Developer +## 0. General ### What problem does rosgraph solve? -ROS 2 doesn't verify that your nodes are wired correctly until -runtime — and often not even then. Type mismatches between publishers -and subscribers fail silently. QoS incompatibilities drop connections -with no error. Parameter renames break launch files with no build -error. +When you connect ROS 2 nodes together, mistakes are invisible. If one +node sends a `Twist` message but another node expects a +`TwistStamped`, nothing warns you — the subscriber just never receives +data. If you misspell a topic name in a launch file, the node launches +fine but sits there doing nothing. You end up staring at +`ros2 topic list` wondering why nothing is connected. + +rosgraph catches these wiring mistakes before you even launch your +system. You describe what each node publishes, subscribes to, and what +settings it needs in a short YAML file. Then `rosgraph lint` checks +that everything fits together — like a spell checker, but for your +ROS graph. -rosgraph catches these at build time. See [PROPOSAL.md §1, "The -Problem, Concretely"](PROPOSAL.md#the-problem-concretely) for four -real-world examples. +See [ROSGRAPH.md §1, "The Problem, +Concretely"](ROSGRAPH.md#the-problem-concretely) for four real-world +examples. ### How much do I need to learn? -Write one `interface.yaml` per node (~15 lines for a basic pub/sub -node). Run three commands: +One file per node (`interface.yaml`, about 15 lines) and three +commands: ```bash -rosgraph generate . # generates code -rosgraph lint . 
# checks for issues -rosgraph monitor # watches the running system +rosgraph generate . # creates starter code from your YAML +rosgraph lint . # checks for wiring mistakes +rosgraph monitor # watches the running system for problems ``` -The YAML schema has IDE autocompletion via JSON Schema. See the [Quick -Start](PROPOSAL.md#quick-start-what-it-looks-like) for a complete -minimal example. - -### What do I stop doing when I adopt rosgraph? - -- **Stop writing pub/sub boilerplate.** Publisher creation, subscriber - setup, parameter declaration — all generated from `interface.yaml`. -- **Stop manually syncing parameters between code and launch files.** - `interface.yaml` is the single source of truth for parameter names, - types, defaults, and validation ranges. -- **Stop debugging silent QoS mismatches.** `rosgraph lint` catches - incompatible QoS profiles before you launch. -- **Stop wondering if your launch files reference the right nodes.** - `rosgraph lint` validates node refs, remappings, and parameter - overrides. - -### Will error messages actually be helpful? +Your editor will autocomplete the YAML fields for you — no need to +memorize the format. See the [Quick +Start](ROSGRAPH.md#quick-start-what-it-looks-like) for a complete +example. -Error quality is a design requirement, not an afterthought. The -architecture follows Ruff's model ([PROPOSAL.md -§10.3](PROPOSAL.md#103-static-analysis-architecture)): - -- Every diagnostic includes a rule code (`TOP001`), the location in - `interface.yaml`, and what's wrong. -- Safe fixes can be auto-applied. Unsafe fixes are flagged but not - auto-applied. -- SARIF output enables inline annotations in GitHub PRs. -- `--add-noqa` generates suppression comments for existing issues, - so you can adopt gradually without noise. - -### Do I need to learn YAML schema syntax? +### What's the overhead? -Not really. 
If your editor has the YAML Language Server (most do), -you get autocompletion, inline validation, and hover docs from the -JSON Schema ([PROPOSAL.md §6, G2](PROPOSAL.md#6-feature-list)). Write -a few fields, let the editor fill in the structure. +Per node: one `interface.yaml` file (~15-30 lines). Most of it is +information you're already specifying in code (topic names, message +types, QoS settings, parameter names) — `interface.yaml` centralizes +it. ---- +What you get back: +- No pub/sub boilerplate (generated) +- No parameter declaration boilerplate (generated via + `generate_parameter_library`) +- Pre-launch graph validation +- Runtime graph monitoring +- Auto-generated API documentation -## 2. AI-Assisted Developer +The net line-count change is typically negative for nodes with +parameters. -### How does rosgraph work with AI coding tools? +### What about my launch files and parameter configs? -`interface.yaml` is a machine-readable contract — exactly what LLMs -are good at consuming and generating. The `InterfaceDescriptor` IR -([PROPOSAL.md §3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)) is a -JSON blob containing a node's complete API: topics, types, QoS, -parameters, lifecycle state. An AI agent reads this to understand what -a node does, generate implementation code, write tests, or suggest -fixes — without parsing C++ or Python source. +`system.yaml` (Layer 2) overlaps heavily with both — all three +describe which nodes run, with what parameters, and with what +remappings. The long-term direction is convergence: `system.yaml` +becomes the graph spec, the parameter config, *and* the launch +description in one file. `rosgraph generate` emits a runnable launch +file from the same spec that `rosgraph lint` validates — no drift +between what you analyze and what you run. -See [PROPOSAL.md §3.13](PROPOSAL.md#313-ai--tooling-integration) for -the full AI integration design. 
+For projects with multiple deployment configurations (sim, real, test), +each gets its own `system.yaml`, replacing both the per-config launch +file and the per-config parameter YAML. See [ROSGRAPH.md +§3.2](ROSGRAPH.md#32-schema-layers). -### Can I use `rosgraph generate` as an agent tool? +### Won't the spec just drift from reality like NoDL? -Yes. An AI agent writing a ROS node can: -1. Generate `interface.yaml` from a natural language description -2. Run `rosgraph generate .` as a tool call to get type-safe - scaffolding -3. Write only the business logic into the generated skeleton -4. Run `rosgraph lint .` to verify the graph is correct +NoDL died because it was a pure description format — no code +generation. Maintaining a spec that doesn't produce anything is +thankless work. -This avoids the common failure mode of LLMs hallucinating ROS -boilerplate (wrong QoS defaults, missing component registration, -incorrect parameter declaration). +`interface.yaml` generates code. If you change the spec, the generated +code changes. If you change the code without changing the spec, +`rosgraph monitor` flags the discrepancy at runtime. The two-way +binding (codegen + runtime monitoring) is what prevents the drift +that killed NoDL. -### Will there be an MCP server? +The honest limitation: business logic is hand-written. If a developer +adds an undeclared publisher inside a callback, `rosgraph lint` won't +catch it at build time. `rosgraph monitor` catches it at runtime as +`UnexpectedTopic`. See [ROSGRAPH.md +§12](ROSGRAPH.md#12-scope--limitations). -It's architecturally planned ([PROPOSAL.md -§3.13](PROPOSAL.md#313-ai--tooling-integration)). 
An MCP server -would expose: -- Graph state (which nodes exist, what they publish/subscribe) -- Lint results (current issues in the workspace) -- Interface schemas (what a specific node expects) -- Resolved topic names (after remapping/namespacing) +--- -This lets Claude Code, Cursor, or Copilot answer "what topics does -the perception pipeline publish?" from structured data, not grep. +## 1. New ROS Developer -### Can an AI generate `interface.yaml` from a description? +### What does rosgraph do for me? -Yes — the constrained schema makes this tractable. The schema has -~10 top-level keys with well-defined types. "I need a node that -subscribes to a lidar point cloud, filters it, and publishes the -result" produces a valid `interface.yaml` that `rosgraph generate` -immediately scaffolds. +- **Writes the repetitive code.** Creating publishers, subscribers, + and declaring parameters — `rosgraph generate` handles this from + your YAML file. You write only the interesting part (what your node + actually *does*). +- **Catches mistakes early.** Mismatched message types, misspelled + topic names, incompatible connection settings — found in seconds, + not after a 30-second launch-debug-relaunch cycle. +- **Keeps settings in one place.** Parameter names, types, and default + values live in `interface.yaml` instead of scattered across your + code, launch files, and README. -`rosgraph discover` ([PROPOSAL.md -§3.10](PROPOSAL.md#310-rosgraph-discover--runtime-to-spec-generation)) -can also generate `interface.yaml` from a running node, which an LLM -can then refine — adding descriptions, suggesting QoS rationale, and -grouping related interfaces. +### Will error messages make sense? -### What about IDE / LSP integration? +Yes — this is a design priority. Each error tells you: -Phase 1 delivers JSON Schema validation (IDE autocompletion for -`interface.yaml`). 
A dedicated LSP server would add: -- Hover for message type definitions -- Go-to-definition for `$ref` targets -- Inline diagnostics from `rosgraph lint` -- Cross-file rename support +- **Where:** which file and line has the problem +- **What:** a plain description of what's wrong +- **How to fix it:** a suggested correction, auto-applied when safe -This benefits both human developers and AI agents operating within -IDE contexts. See [PROPOSAL.md -§3.13](PROPOSAL.md#313-ai--tooling-integration). +No cryptic stack traces. No silent failures. See [ROSGRAPH.md +§10.3](ROSGRAPH.md#103-static-analysis-architecture) for the error +design. --- -## 3. Engineering Lead / System Integrator / DevOps - -### Who owns an `interface.yaml`? - -The node author defines it. Downstream consumers depend on the -installed version in `share//interfaces/`. Changes are -coordinated via: - -- `rosgraph breaking` ([PROPOSAL.md - §3.9](PROPOSAL.md#39-rosgraph-breaking--breaking-change-detection)) - — automated detection of breaking changes in CI, blocking merges - that break downstream consumers. -- Installed interfaces — downstream teams depend on published - interfaces without pulling source code. -- Semantic versioning alignment — breaking = major, dangerous = minor, - safe = patch. - -See [PROPOSAL.md §3.14](PROPOSAL.md#314-scale--fleet-considerations). +## 2. Engineering Lead / System Integrator / DevOps ### How does this scale to hundreds of packages? @@ -182,33 +150,28 @@ See [PROPOSAL.md §3.14](PROPOSAL.md#314-scale--fleet-considerations). with parallel per-package processing and content caching. - **Multi-workspace analysis:** Installed `interface.yaml` files in underlays serve as cached facts. Only your workspace is analyzed, - not the entire underlay. See [PROPOSAL.md - §3.12](PROPOSAL.md#312-multi-workspace-analysis). + not the entire underlay. See [ROSGRAPH.md + §3.12](ROSGRAPH.md#312-multi-workspace-analysis). 
- **Differential analysis:** `--new-only` reports only issues introduced since the base branch. No noise from existing code. -- **Per-package configuration:** Override lint rules per package via - `rosgraph.toml`. ### I compose nodes from multiple vendors. How does rosgraph help? -`system.yaml` (Layer 2 schema, [PROPOSAL.md -§3.2](PROPOSAL.md#32-schema-layers)) declares the intended system +`system.yaml` (Layer 2 schema, [ROSGRAPH.md +§3.2](ROSGRAPH.md#32-schema-layers)) declares the intended system composition — which nodes, which namespaces, which parameter overrides, which remappings. `rosgraph lint` validates the composed graph: -- **Type mismatches** across package boundaries (Node A publishes - `Twist`, Node B subscribes expecting `TwistStamped`) +- **Type mismatches** across package boundaries - **QoS incompatibilities** between a vendor's publisher and your subscriber - **Disconnected subgraphs** — nodes that should be connected but aren't due to a namespace or remapping error -- **Invalid remappings** — remaps pointing to nonexistent topics If a vendor doesn't ship `interface.yaml`, use `rosgraph discover` -([PROPOSAL.md -§3.10](PROPOSAL.md#310-rosgraph-discover--runtime-to-spec-generation)) -to generate one from a running instance of the vendor's node. The -discovered spec becomes your integration contract. +([ROSGRAPH.md +§3.10](ROSGRAPH.md#310-rosgraph-discover--runtime-to-spec-generation)) +to generate one from a running instance of the vendor's node. ### How does rosgraph fit into CI? @@ -218,178 +181,26 @@ rosgraph is CI-first by design (Design Principle 8): # GitHub Actions example - name: Lint graph run: rosgraph lint . 
--output-format sarif --new-only --base main - # SARIF output → GitHub Security tab, PR annotations - name: Check breaking changes run: rosgraph breaking --base main - # Exit code 1 if breaking changes detected - name: Run contract tests run: rosgraph test - # Schema-driven interface conformance tests ``` Output formats: `text`, `json`, `sarif` (GitHub Security tab), -`github` (Actions annotations), `junit` (test reports). All -configurable via `rosgraph.toml` or `--output-format`. See -[PROPOSAL.md §3.11](PROPOSAL.md#311-configuration). - -For brownfield adoption, `--add-noqa` generates inline suppression -comments for all existing issues, creating a clean baseline. You -don't get 500 warnings on your first PR. - -### What about the colcon build workflow? - -`colcon-rosgraph` (Phase 2) is a thin colcon verb plugin that delegates -to the standalone `rosgraph` binary. It adds `colcon lint`, -`colcon docs`, `colcon discover`, and `colcon breaking` — iterating -packages in dependency order with parallel execution. See [PROPOSAL.md -§3.15](PROPOSAL.md#315-colcon-integration). - -Phase 1 works standalone: `rosgraph lint .` in any directory. No -colcon dependency required. - -### What about fleet-level monitoring? - -`rosgraph monitor` runs per-robot. For fleet-scale observability: - -- The Prometheus `/metrics` exporter (M7) enables standard Grafana - dashboards aggregated across the fleet. -- The `/rosgraph/diff` topic on each robot can be bridged to a - central system for aggregated drift analysis. -- The architecture uses standard observability patterns (Prometheus, - structured logs, `/diagnostics`) rather than inventing fleet-specific - infrastructure. - -Runtime performance targets: reconciliation < 500ms for 200 nodes, -< 50MB memory, < 5% CPU at steady state. See [PROPOSAL.md -§3.14](PROPOSAL.md#314-scale--fleet-considerations). - -### Can we enforce org-specific conventions? - -Yes. 
`rosgraph.toml` supports per-package rule overrides, custom -naming patterns, and rule selection. The Spectral-inspired YAML-native -rule system ([PROPOSAL.md -§10.3](PROPOSAL.md#103-static-analysis-architecture)) means a -robotics engineer can write custom rules without knowing Rust or C++. - -### Does rosgraph handle launch file complexity? - -Three strategies, phased by tractability ([PROPOSAL.md -§3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis)): - -1. **YAML launch files** — fully parseable, Phase 1 -2. **`system.yaml`** — static composition schema, fully analyzable, - Phase 1 -3. **Python launch AST** — pattern matching for `Node()`, - `LaunchConfiguration()`, etc., Phase 2 - -Python launch files with complex conditionals, loops, or dynamically -computed node sets can't be fully statically analyzed. `system.yaml` -is the escape hatch for systems that need full analyzability. - ---- - -## 4. Safety-Critical Engineer - -### Does rosgraph help with certification? - -rosgraph is not a safety tool — it's a development and verification -tool that produces artifacts useful in safety cases. See [PROPOSAL.md -§11](PROPOSAL.md#11-safety--certification) for the full mapping. - -Key artifacts: - -| rosgraph artifact | Evidence type | -|---|---| -| `interface.yaml` | Software architecture description | -| `rosgraph lint` SARIF output | Static analysis results | -| `rosgraph monitor` logs | Runtime verification evidence | -| `rosgraph test` results | Interface conformance evidence | -| `rosgraph breaking` output | Change impact analysis | - -### Which safety standards does this map to? - -IEC 61508 (general functional safety), ISO 26262 (automotive), -IEC 62304 (medical), DO-178C (aerospace), ISO 13482 (service robots), -and ISO 21448 / SOTIF. See [PROPOSAL.md -§11.1](PROPOSAL.md#111-relevant-standards) for how rosgraph maps to -each. - -### What about behavioral properties? 
- -Phase 1-2 covers structural properties: type matches, QoS -compatibility, graph connectivity. This is a necessary precondition -for behavioral safety — you can't reason about message timing if the -messages aren't connected correctly. - -Behavioral analysis (Phase 3+) adds temporal and causal properties, -inspired by HAROS HPL: - -``` -globally: /emergency_stop causes /motor_disable within 100ms -globally: /heartbeat absent_for 500ms causes /safe_stop -``` - -See [PROPOSAL.md §11.4](PROPOSAL.md#114-behavioral-properties-future). - -### Are monitor alert thresholds configurable? - -Yes. The defaults (10s for `NodeMissing`, 30s for `UnexpectedNode`) -are tuned for general robotics. Safety-critical deployments override -them via `rosgraph.toml`: - -```toml -[monitor.alerts] -NodeMissing = { grace_period_ms = 1000, severity = "critical" } -TopicMissing = { grace_period_ms = 500, severity = "critical" } -``` - -See [PROPOSAL.md §11.3](PROPOSAL.md#113-configurable-safety-levels). - -### Are there safety-specific lint rules? - -Planned for Phase 2-3: - -| Rule | Description | -|---|---| -| `SAF001` | Critical subscriber has < N publishers (no redundancy) | -| `SAF002` | Single point of failure in graph topology | -| `SAF003` | Safety-critical node is not lifecycle-managed | -| `TF001` | Declared `frame_id` not published by any node | -| `TF002` | Broken frame chain (no transform path) | - -The analyzer architecture supports adding these without changes. -See [PROPOSAL.md §11.5](PROPOSAL.md#115-safety-relevant-lint-rules-future). - -### What about determinism and real-time guarantees? - -`rosgraph monitor` is an observation tool, not a safety-critical -component. It runs in its own process, does not interfere with the -monitored system, and its failure does not affect the system under -observation. It is not designed to be real-time safe. 
- -For hard real-time requirements, the monitor's output (Prometheus -metrics, diagnostics topics) can be consumed by a separate real-time -safety monitor. rosgraph provides the graph model; the real-time -enforcement layer is a separate concern. - -### What about audit trails? - -`rosgraph lint` produces SARIF output with timestamps, tool version, -rule versions, and results. This can be stored as CI artifacts for -audit purposes. A dedicated audit log format for `rosgraph monitor` -(continuous verification evidence) is not in Phase 1 but the -structured output (JSON, SARIF) makes it straightforward to add. +`github` (Actions annotations), `junit` (test reports). See +[ROSGRAPH.md §3.11](ROSGRAPH.md#311-configuration). --- -## 5. MoveIt / nav2 / Popular Module User +## 3. MoveIt / nav2 / Popular Module User ### Does rosgraph work with nav2's plugin system? -Yes, via the mixin system ([PROPOSAL.md -§3.2](PROPOSAL.md#32-schema-layers)). Plugins that inject interfaces +Yes, via the mixin system ([ROSGRAPH.md +§3.2](ROSGRAPH.md#32-schema-layers)). Plugins that inject interfaces into a host node are declared as mixins: ```yaml @@ -398,222 +209,57 @@ node: name: follow_path package: nav2_controller -parameters: - controller_plugin: - type: string - default_value: "dwb_core::DWBLocalPlanner" - mixins: - - ref: dwb_core/dwb_local_planner # brings in max_vel_x, etc. - - ref: nav2_costmap_2d/costmap # brings in costmap params + - ref: dwb_core/dwb_local_planner + - ref: nav2_costmap_2d/costmap ``` The host's effective interface = its own declaration + all mixin -interfaces merged. This gives `rosgraph lint` and `rosgraph monitor` -the complete picture. - -Mixins are Phase 2 (G15). Phase 1 works for nodes without plugins. - -### What happens when I switch plugins (e.g., DWB → MPPI)? - -You update the mixin reference in `interface.yaml`. The effective -interface changes at build time, and `rosgraph generate` produces new -scaffolding. 
This is a build-time concern — `rosgraph lint` validates -the graph with the new plugin's interface. - -If the plugin is selected at runtime via parameter, this falls under -"dynamic interfaces" (Design Principle 12) — rosgraph declares the -static portion and `rosgraph monitor` flags unexpected interfaces. - -### Does rosgraph validate TF frames? - -Planned for Phase 2-3. `TF001` checks that declared `frame_id` values -are published by some node in the graph. `TF002` checks that frame -chains are connected (no broken transform paths). See [PROPOSAL.md -§11.5](PROPOSAL.md#115-safety-relevant-lint-rules-future). - -TF is the #1 source of silent bugs in ROS 2 navigation and -manipulation. This is high-value but requires the graph model to -include TF publisher information, which depends on `interface.yaml` -having a `frame_id` annotation. +interfaces merged. Mixins are Phase 2 (G15). Phase 1 works for nodes +without plugins. ### What about `generate_parameter_library` compatibility? -Full compatibility is a non-negotiable design principle ([PROPOSAL.md -§2, DP9](PROPOSAL.md#2-design-principles)). The `parameters:` section +Full compatibility is a non-negotiable design principle ([ROSGRAPH.md +§2, DP9](ROSGRAPH.md#2-design-principles)). The `parameters:` section of `interface.yaml` IS the `generate_parameter_library` format. A standalone gen_param_lib YAML file works as-is when placed in `interface.yaml`. rosgraph delegates to gen_param_lib at build time. -See [PROPOSAL.md §9.2](PROPOSAL.md#92-tool-assessments). - -### Can rosgraph lint my existing launch files? - -Phase 1 supports YAML launch files (direct parse) and `system.yaml` -(Layer 2 schema). Phase 2 adds Python launch file AST analysis for -standard `launch_ros` patterns — `Node()`, `LaunchConfiguration()`, -`DeclareLaunchArgument()`. - -Limitations: Python launch files that use conditionals, loops, or -dynamically computed node sets cannot be fully statically analyzed. 
-`system.yaml` is the escape hatch for systems that need full static -analyzability. See [PROPOSAL.md -§3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis). - -### Does this work with Gazebo / Isaac Sim? - -Simulators expose ROS interfaces that look identical to real hardware. -`rosgraph discover` can introspect a simulated system and generate -`interface.yaml`. `rosgraph monitor` can verify that a simulated -system matches the declared graph. `rosgraph lint` doesn't -distinguish between real and simulated — it validates the graph model. - -### What about message type changes across ROS distros? - -`interface.yaml` references message types by name (e.g., -`geometry_msgs/msg/Twist`). Message type compatibility across distros -is a ROS infrastructure concern, not a rosgraph concern. rosgraph -validates that publishers and subscribers on the same topic agree on -type — it doesn't validate that the type definition itself is -compatible across distros. - -`rosgraph breaking` can detect when a type reference changes between -versions of an `interface.yaml`. +See [ROSGRAPH.md §9.2](ROSGRAPH.md#92-tool-assessments). --- -## 6. The Skeptic - -### I write good tests. Why do I need another YAML file? - -Tests catch type mismatches and QoS issues at launch time — after you -wait 30 seconds for the stack to start, watch it fail, read the logs, -and figure out which of 40 nodes has the wrong type. Then you fix it, -rebuild, relaunch, and wait again. - -`rosgraph lint` catches the same bugs in under 5 seconds, before -launch, in CI, before anyone else has to debug it. It's the difference -between "tests catch bugs" and "bugs never reach the test phase." - -### What's the overhead? - -Per node: one `interface.yaml` file (~15-30 lines). Most of it is -information you're already specifying in code (topic names, message -types, QoS settings, parameter names) — `interface.yaml` centralizes -it. 
- -What you get back: -- No pub/sub boilerplate (generated) -- No parameter declaration boilerplate (generated via - `generate_parameter_library`) -- Pre-launch graph validation -- Runtime graph monitoring -- Auto-generated API documentation - -The net line-count change is typically negative for nodes with -parameters. - -### What if rosgraph can't express what I need? - -Escape hatches: -- **`# rosgraph: noqa: TOP001`** — suppress specific lint rules per - line. -- **Per-package ignores** — exclude entire packages from specific - rules via `rosgraph.toml`. -- **Undeclared interfaces** — if your code creates publishers that - aren't in `interface.yaml`, the code still works. `rosgraph monitor` - flags them as `UnexpectedTopic` (a warning, not an error). -- **Composition pattern** — generated code holds a `rclcpp::Node` - (has-a), not inherits from it. You always have access to the - underlying node for anything the schema can't express. - -See [PROPOSAL.md §12](PROPOSAL.md#12-scope--limitations) for the full -limitations discussion. - -### Does code generation add runtime overhead? - -The composition pattern (has-a Node) adds one level of indirection -compared to direct inheritance. This is a pointer dereference — single -nanoseconds. The generated pub/sub wrappers are thin forwarding calls. -No virtual dispatch is added that wouldn't already exist in the ROS -client library. +## 4. AI-Assisted Developer -The parameter validation code (from `generate_parameter_library`) runs -at parameter-set time, not in the hot path. - -### What happens when only my package has an `interface.yaml`? - -You still get: -- **Code generation** — less boilerplate in your node -- **Parameter validation** — runtime type and range checking -- **Self-documentation** — your node's API is machine-readable - -Cross-package value (type mismatch detection, QoS compatibility -checking, contract testing) grows with adoption. 
`rosgraph discover` -lets you generate specs for neighboring packages from a running -system, bootstrapping the cross-package graph incrementally. - -### This proposal has 51 features. Is this realistic? - -Phase 1 ([PROPOSAL.md §4](PROPOSAL.md#4-phasing)) is the commitment: -~12 features covering core schema, basic code generation, and -highest-value lint and monitor rules. Later phases are contingent on -adoption. - -The tool builds on existing work — cake for code generation, -`generate_parameter_library` for parameters, `graph-monitor` message -definitions for runtime. Phase 1 is stabilizing and unifying existing -pieces, not building from scratch. - -### Won't the spec just drift from reality like NoDL? - -NoDL died because it was a pure description format — no code -generation. Maintaining a spec that doesn't produce anything is -thankless work. +### How does rosgraph work with AI coding tools? -`interface.yaml` generates code. If you change the spec, the generated -code changes. If you change the code without changing the spec, -`rosgraph monitor` flags the discrepancy at runtime. The two-way -binding (codegen + runtime monitoring) is what prevents the drift -that killed NoDL. +`interface.yaml` is a machine-readable contract — exactly what LLMs +are good at consuming and generating. The `InterfaceDescriptor` IR +([ROSGRAPH.md §3.3](ROSGRAPH.md#33-the-interfacedescriptor-ir)) is a +JSON blob containing a node's complete API: topics, types, QoS, +parameters, lifecycle state. An AI agent reads this to understand what +a node does, generate implementation code, write tests, or suggest +fixes — without parsing C++ or Python source. -The honest limitation: business logic is hand-written. If a developer -adds an undeclared publisher inside a callback, `rosgraph lint` won't -catch it at build time. `rosgraph monitor` catches it at runtime as -`UnexpectedTopic`. See [PROPOSAL.md -§12](PROPOSAL.md#12-scope--limitations). 
+See [ROSGRAPH.md §3.13](ROSGRAPH.md#313-ai--tooling-integration) for +the full AI integration design. -### When should I NOT use rosgraph? +### Can I use `rosgraph generate` as an agent tool? -- **Quick prototyping** — single throwaway node, not worth the file. -- **Single-node packages** — minimal lint value, though codegen may - still save boilerplate. -- **Highly dynamic interfaces** — nodes that create/destroy publishers - at runtime based on conditions can't be fully declared. +Yes. An AI agent writing a ROS node can: +1. Generate `interface.yaml` from a natural language description +2. Run `rosgraph generate .` as a tool call to get type-safe + scaffolding +3. Write only the business logic into the generated skeleton +4. Run `rosgraph lint .` to verify the graph is correct -See [PROPOSAL.md §12, "When Not to Use -rosgraph"](PROPOSAL.md#when-not-to-use-rosgraph). +This avoids the common failure mode of LLMs hallucinating ROS +boilerplate (wrong QoS defaults, missing component registration, +incorrect parameter declaration). --- -## 7. Package Maintainer / ROS Governance - -### What does rosgraph mean for my package? - -If you maintain a ROS 2 package, `interface.yaml` is a machine-readable -contract for your node's public API — topics, services, actions, -parameters, QoS. It replaces the informal contract currently scattered -across READMEs, launch file comments, and source code. - -For consumers of your package, this means: -- **API discoverability.** `rosgraph docs` auto-generates browsable API - reference from your `interface.yaml`. No more stale READMEs. -- **Breaking change visibility.** `rosgraph breaking` classifies - interface changes as breaking/dangerous/safe, giving downstream users - clear upgrade guidance. See [PROPOSAL.md - §3.9](PROPOSAL.md#39-rosgraph-breaking--breaking-change-detection). -- **Contract testing.** Downstream packages can run `rosgraph test` - against your declared interface to verify compatibility. 
See - [PROPOSAL.md §3.7](PROPOSAL.md#37-rosgraph-test--contract-testing). +## 5. Package Maintainer / ROS Governance ### Do I have to adopt rosgraph to be compatible with it? @@ -624,48 +270,26 @@ need to ship `interface.yaml` for others to benefit — though shipping one is much better, since discovered specs require human review and may miss QoS details. -### How does this affect my release process? - -`rosgraph breaking` runs in CI comparing the current `interface.yaml` -against the previous release. Breaking changes block the merge unless -explicitly acknowledged. This is opt-in per package via `rosgraph.toml` -and maps to semantic versioning: breaking = major, dangerous = minor, -safe = patch. See [PROPOSAL.md -§3.14](PROPOSAL.md#314-scale--fleet-considerations). - -### What about packages with plugin systems? - -If your package exposes a plugin API (like nav2's controller plugins), -the mixin system (Phase 2, G15) lets plugin authors declare the -interfaces they inject into the host node. The host's effective -interface is the merge of its own declaration plus all mixin fragments. -See [PROPOSAL.md §3.2](PROPOSAL.md#32-schema-layers). - -Until mixins ship in Phase 2, the host node's `interface.yaml` covers -its own direct interfaces. Plugins that add extra topics/parameters -are flagged by `rosgraph monitor` as unexpected — visible but not -validated. - ### What's the adoption path toward `ros_core`? -Deliberately incremental ([PROPOSAL.md §4, "Adoption -Path"](PROPOSAL.md#adoption-path)): +Deliberately incremental ([ROSGRAPH.md §4, "Adoption +Path"](ROSGRAPH.md#adoption-path)): 1. **`ros-tooling` organization** — institutional backing, CI - infrastructure, release process. graph-monitor already lives here. + infrastructure, release process. 2. **REP for `interface.yaml` schema** — formalizes the declaration format as a community standard, independent of the rosgraph tool. 3. 
**docs.ros.org tutorial integration** — if "write your first node" uses `interface.yaml`, every new ROS developer learns it from day - one. This is the highest-leverage adoption path. + one. 4. **`ros_core` proposal** — after demonstrated adoption across multiple distros. ### Why not extend existing tools instead? Each existing tool covers one capability but none covers the full -scope. The gap analysis ([PROPOSAL.md -§9.3](PROPOSAL.md#93-gap-analysis)) shows five major gaps: graph diff, +scope. The gap analysis ([ROSGRAPH.md +§9.3](ROSGRAPH.md#93-gap-analysis)) shows five major gaps: graph diff, graph linting, QoS static analysis, behavioral properties, and CI graph validation. No single existing tool can be extended to fill all five. @@ -675,144 +299,38 @@ rosgraph builds on existing work where possible: - cake's design decisions for code generation (validated) - HAROS's metamodel for the graph model (adapted) -### What's the maintenance burden? - -Phase 1 is ~12 features covering core schema, basic codegen, and -highest-value lint/monitor rules. The design minimizes ongoing -maintenance: - -- **Schema versioning** (G14) — `schema_version` field with migration - tooling prevents breaking changes to `interface.yaml` format. -- **IR-based plugin protocol** — code generation plugins are standalone - executables, independently maintained. -- **Analyzer DAG** — lint rules are isolated, independently testable - values (not subclasses). Adding or removing a rule doesn't affect - others. - -The risk factor: this is a new tool, not an extension of something with -existing momentum. It requires sustained contributor commitment. - -### How does this interact with the ROS 2 type system? - -rosgraph references existing `.msg`, `.srv`, and `.action` types — it -doesn't replace them (Design Principle 9). `interface.yaml` declares -which types a node uses; `rosidl` still defines the types themselves. 
- -The graph model ([PROPOSAL.md §3.1](PROPOSAL.md#31-the-graph-model)) -includes a `MessageTypeDB` that resolves type references to their -definitions for compatibility checking. This uses the existing -`rosidl` output — rosgraph doesn't parse `.msg` files directly. - -### What about governance and community standards? - -The REP process is the standard mechanism for formalizing ROS community -standards. A REP for the `interface.yaml` schema would: - -- Define the YAML schema specification independent of the rosgraph tool -- Allow alternative implementations (someone could build a different - tool that consumes the same schema) -- Provide a formal review process for schema changes -- Signal community endorsement - -The REP is Step 2 of the adoption path — after the tool has proven -itself in `ros-tooling` with real users. - -### What's the risk if this doesn't get adopted? - -The worst case: rosgraph becomes another single-maintainer tool in the -ecosystem (like cake and breadcrumb today). The mitigation strategy: - -- **`ros-tooling` hosting** — institutional backing reduces bus factor -- **REP-based schema** — the schema outlives the tool if it becomes a - standard -- **`generate_parameter_library` compatibility** — the parameters - portion works with the most mature tool in the space, regardless of - rosgraph's fate -- **Standalone value** — even without ecosystem adoption, a single team - gets code generation and parameter validation from day one - --- -## 8. Educator / University Researcher +## 6. Educator / University Researcher ### Can I use rosgraph for teaching ROS 2? -Yes, and this is one of the highest-leverage adoption paths. The Quick -Start ([PROPOSAL.md §1](PROPOSAL.md#quick-start-what-it-looks-like)) -shows a complete workflow in 3 commands: - -```bash -rosgraph generate . # generates node scaffolding -rosgraph lint . 
# checks for issues -rosgraph monitor # watches the running system -``` - -For teaching, `interface.yaml` forces students to think about their -node's API before writing implementation code — topics, types, QoS, -parameters. This is better pedagogy than the current approach of -copy-pasting publisher boilerplate and tweaking it. - -### Does this lower the barrier for students? - -Significantly. A student writes ~15 lines of YAML declaring what their -node does, runs `rosgraph generate`, and gets a working scaffold with -type-safe publishers, subscribers, and validated parameters. They write -only the business logic. No boilerplate, no silent type mismatches, no -mysterious QoS failures. - -Error messages are designed to be helpful — rule codes, file locations, -clear descriptions of what's wrong and how to fix it. See [PROPOSAL.md -§10.3](PROPOSAL.md#103-static-analysis-architecture). +Yes. The Quick Start +([ROSGRAPH.md §1](ROSGRAPH.md#quick-start-what-it-looks-like)) +shows a complete workflow in 3 commands. For teaching, +`interface.yaml` forces students to think about their node's API +before writing implementation code — topics, types, QoS, parameters. +This is better pedagogy than copy-pasting publisher boilerplate and +tweaking it. ### How does rosgraph relate to HAROS? -HAROS ([PROPOSAL.md §10.6](PROPOSAL.md#106-ros-domain-prior-art-haros)) +HAROS ([ROSGRAPH.md §10.6](ROSGRAPH.md#106-ros-domain-prior-art-haros)) was the prior art for graph analysis in ROS — built at the University of Minho (2016–2021). rosgraph borrows HAROS's metamodel and HPL property language concepts, but differs fundamentally: - **HAROS extracted interfaces from source code.** rosgraph uses - explicit declarations (`interface.yaml`). Declarations are simpler, - more reliable, and enable code generation. + explicit declarations (`interface.yaml`). - **HAROS was ROS 1 only.** rosgraph is built for ROS 2 concepts: QoS, lifecycle, components, actions, DDS discovery. 
- **HAROS died because extraction broke.** catkin → ament, rospack → colcon, XML launch → Python launch. Declaration-based tools don't break when the build system changes. -### Can I use rosgraph for research on ROS system verification? - -The graph model ([PROPOSAL.md §3.1](PROPOSAL.md#31-the-graph-model)) -is a structured representation of the ROS computation graph — nodes, -topics, services, actions, parameters, QoS, connections. It's -exportable as JSON via the `InterfaceDescriptor` IR ([PROPOSAL.md -§3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)). - -Research opportunities: -- **Formal verification.** The graph model is a natural input for model - checkers. Behavioral properties (Phase 3+, [PROPOSAL.md - §11.4](PROPOSAL.md#114-behavioral-properties-future)) enable temporal - logic specifications. -- **Static analysis.** The analyzer DAG architecture ([PROPOSAL.md - §3.5](PROPOSAL.md#35-rosgraph-lint--static-analysis)) supports custom - analysis passes without modifying core code. -- **Runtime monitoring.** The declared-vs-observed diff ([PROPOSAL.md - §3.6](PROPOSAL.md#36-rosgraph-monitor--runtime-reconciliation)) is a - rich data source for anomaly detection research. -- **ROS ecosystem studies.** Interface coverage, graph topology - patterns, common QoS configurations — all extractable from - `interface.yaml` files across the ecosystem. - -### What about publishing results that use rosgraph? - -The tool is open-source (planned for `ros-tooling` organization). The -SARIF and JSON output formats produce structured, reproducible results -suitable for academic publication. The graph model provides a formal -vocabulary for describing ROS system architectures. - --- -## 9. Embedded / Resource-Constrained Developer +## 7. Embedded / Resource-Constrained Developer ### Does rosgraph add runtime overhead to my nodes? @@ -822,8 +340,8 @@ generated pub/sub wrappers are thin forwarding calls. No virtual dispatch is added beyond what the ROS client library already uses. 
Parameter validation (via `generate_parameter_library`) runs at -parameter-set time, not in the hot path. See [PROPOSAL.md §3.4, -"Design decisions"](PROPOSAL.md#34-rosgraph-generate--code-generation). +parameter-set time, not in the hot path. See [ROSGRAPH.md §3.4, +"Design decisions"](ROSGRAPH.md#34-rosgraph-generate--code-generation). ### Does `rosgraph monitor` run on the robot? @@ -832,43 +350,68 @@ doesn't instrument or modify your nodes. If your platform can't spare the resources, don't run it. You still get full value from build-time tools (`rosgraph generate`, `rosgraph lint`). -Runtime targets for `rosgraph monitor` ([PROPOSAL.md -§3.14](PROPOSAL.md#314-scale--fleet-considerations)): +Runtime targets ([ROSGRAPH.md +§3.14](ROSGRAPH.md#314-scale--fleet-considerations)): - Memory: < 50MB resident - CPU: < 5% of one core at steady-state (5s scrape interval) - No additional DDS traffic beyond standard discovery -For very constrained platforms, run `rosgraph monitor` off-board -(e.g., on a companion computer) observing the same DDS domain. +--- + +## 8. The Skeptic + +### This proposal has 51 features. Is this realistic? + +Phase 1 ([ROSGRAPH.md §4](ROSGRAPH.md#4-phasing)) is the commitment: +~12 features covering core schema, basic code generation, and +highest-value lint and monitor rules. Later phases are contingent on +adoption. + +The tool builds on existing work — cake for code generation, +`generate_parameter_library` for parameters, `graph-monitor` message +definitions for runtime. Phase 1 is stabilizing and unifying existing +pieces, not building from scratch. + +### When should I NOT use rosgraph? -### Does rosgraph work with micro-ROS? +- **Quick prototyping** — single throwaway node, not worth the file. +- **Single-node packages** — minimal lint value, though codegen may + still save boilerplate. +- **Highly dynamic interfaces** — nodes that create/destroy publishers + at runtime based on conditions can't be fully declared. 
-micro-ROS nodes communicate via the standard DDS/XRCE-DDS bridge. -`rosgraph discover` and `rosgraph monitor` observe them through the -bridge like any other node. `interface.yaml` declarations work for -micro-ROS nodes — the schema is language-agnostic. +See [ROSGRAPH.md §12, "When Not to Use +rosgraph"](ROSGRAPH.md#when-not-to-use-rosgraph). -Code generation for micro-ROS C is not in Phase 1. The IR-based plugin -architecture ([PROPOSAL.md -§3.3](PROPOSAL.md#33-the-interfacedescriptor-ir)) supports adding a -micro-ROS code generation plugin without changes to the core tool. +--- -### What about real-time constraints? +## 9. Safety-Critical Engineer -`rosgraph monitor` is not real-time safe — it's an observation tool -running in its own process. It does not interfere with the monitored -system, and its failure does not affect the system under observation. +### Does rosgraph help with certification? -For hard real-time requirements, the monitor's Prometheus metrics and -diagnostics topics can be consumed by a separate real-time safety -monitor. rosgraph provides the graph model; real-time enforcement is a -separate concern. See [PROPOSAL.md -§11](PROPOSAL.md#11-safety--certification). +rosgraph is not a safety tool — it's a development and verification +tool that produces artifacts useful in safety cases. See [ROSGRAPH.md +§11](ROSGRAPH.md#11-safety--certification). -### Does the build toolchain add cross-compilation complexity? +Key artifacts: + +| rosgraph artifact | Evidence type | +|---|---| +| `interface.yaml` | Software architecture description | +| `rosgraph lint` SARIF output | Static analysis results | +| `rosgraph monitor` logs | Runtime verification evidence | +| `rosgraph test` results | Interface conformance evidence | +| `rosgraph breaking` output | Change impact analysis | + +### What about behavioral properties? + +Phase 1-2 covers structural properties: type matches, QoS +compatibility, graph connectivity. 
Behavioral analysis (Phase 3+) adds +temporal and causal properties, inspired by HAROS HPL: + +``` +globally: /emergency_stop causes /motor_disable within 100ms +globally: /heartbeat absent_for 500ms causes /safe_stop +``` -`rosgraph generate` runs at build time on the host, producing standard -C++ and Python source files. These are compiled by the normal -cross-compilation toolchain (`colcon build --cmake-args --DCMAKE_TOOLCHAIN_FILE=...`). rosgraph itself doesn't need to run on -the target — it's a host-side tool, like `cmake` or `protoc`. +See [ROSGRAPH.md §11.4](ROSGRAPH.md#114-behavioral-properties-future). diff --git a/docs/ROSGRAPH.md b/docs/ROSGRAPH.md index 8469ec5..24e7308 100644 --- a/docs/ROSGRAPH.md +++ b/docs/ROSGRAPH.md @@ -20,7 +20,6 @@ 10. [Prior Art](#10-prior-art) 11. [Safety & Certification](#11-safety--certification) 12. [Scope & Limitations](#12-scope--limitations) -13. [Resolved Questions](#13-resolved-questions) --- @@ -353,6 +352,18 @@ connections: # Explicit wiring (optional, for validation) to: object_detector/~/input_cloud ``` +**Toward a single source of truth.** `system.yaml` overlaps +significantly with YAML launch files and parameter config files — all +three describe which nodes run, with what parameters, and with what +remappings. The long-term direction is convergence: `system.yaml` +becomes the graph spec, the parameter config, *and* the launch +description. `rosgraph generate` (or a thin `rosgraph launch` shim) +emits a runnable launch file from the same `system.yaml` that +`rosgraph lint` validates. One file, no drift between what you analyze +and what you run. For projects with multiple deployment configurations +(sim, real, test), each gets its own `system.yaml` — replacing both +the per-config launch file and the per-config parameter YAML. + **Mixins — Composable Interface Fragments** (G15, Phase 2) Plugins that inject interfaces into a host node (e.g., nav2 controller @@ -1702,23 +1713,3 @@ review and refinement. 
— the minimum viable tool that delivers value. Later phases are contingent on adoption and contributor capacity. ---- - -## 13. Resolved Questions - -The following questions were raised during the proposal drafting process -and have been resolved. Answers are integrated into the relevant -sections of this document. - -| # | Question | Resolution | Section | -|---|----------|------------|---------| -| 1 | Dynamic interfaces | Out of scope — rosgraph covers declared interfaces only (Design Principle 12). Undeclared runtime interfaces are flagged as `UnexpectedTopic` by monitor. | §2 | -| 2 | Launch substitution evaluation | Three-path loader strategy: YAML launch (direct parse), `system.yaml` (static), Python launch AST (pattern matching). | §3.5 | -| 3 | Behavioural properties | Structural first (Phase 1–2), behavioural later (Phase 3+) if adoption warrants it (Design Principle 13). | §2 | -| 4 | `generate_parameter_library` unification | Keep as standalone, maintain schema compatibility. rosgraph delegates to gen_param_lib at build time. | §9.2 | -| 5 | Multi-workspace analysis | Per-package fact caching via installed `interface.yaml` files. Phase 2 concern. | §3.12 | -| 6 | Launch file extraction without clingwrap | Partial AST extraction for standard `launch_ros` patterns, with `system.yaml` as fully-static alternative. | §3.5 | -| 7 | Relationship to graph-monitor | New implementation. Adopt graph-monitor's message definitions, reimplement scraping + reconciliation. | §3.6 | -| 8 | Mixin pattern | `mixins:` section referencing interface fragments. Host's effective interface = own declaration + all mixins merged. | §3.2 | -| 9 | Adoption path | `ros-tooling` org → REP for schema → docs.ros.org tutorials → `ros_core` (long-term). | §4 | -| 10 | Declaration scope | Structural (node interfaces) only for Phase 1–2. Behavioural scope deferred to Phase 3+. 
| §2 | From fc02a8ce0071a348d5a35648f82428b7fd9d2bf4 Mon Sep 17 00:00:00 2001 From: Luke Sy Date: Tue, 10 Mar 2026 03:14:37 +1100 Subject: [PATCH 3/5] Address PR feedback: summarise ROSGRAPH, update MANIFESTO - MANIFESTO: add undocumented interfaces problem, update wording - ROSGRAPH: greatly summarise from 1715 to 133 lines - Remove sections 3-12 (architecture, phasing, language choice, etc.) - Remove design principles (redundant with key insights) - Replace CLI subcommand tree with prioritised components - Add codegen requirements (plugin arch, distro-installable) - Condense gap table to bullet list - Clarify discovery vs monitoring distinction Signed-off-by: Luke Sy --- docs/MANIFESTO.md | 4 +- docs/ROSGRAPH.md | 1705 ++------------------------------------------- 2 files changed, 64 insertions(+), 1645 deletions(-) diff --git a/docs/MANIFESTO.md b/docs/MANIFESTO.md index 70faf6b..e22ed9a 100644 --- a/docs/MANIFESTO.md +++ b/docs/MANIFESTO.md @@ -4,9 +4,11 @@ Robotics engineers spend too much time on ROS plumbing — writing boilerplate, debugging invisible wiring, and keeping launch files in sync with code — instead of building their application. +The main interfaces of ROS systems (topics, parameters, services, actions) are undocumented by default. As systems grow larger they become harder to reason about, and the lack of well-defined interface contracts blocks automated tooling from helping. + ## What -A declarative, observable ROS graph. Engineers declare what their system should be; tooling generates the code and verifies the running system matches the spec. +A declarative, observable ROS graph. Engineers declare what their system should be; tooling generates the code and entities as needed, and verifies the running system matches the spec. ## How diff --git a/docs/ROSGRAPH.md b/docs/ROSGRAPH.md index 24e7308..9f452e1 100644 --- a/docs/ROSGRAPH.md +++ b/docs/ROSGRAPH.md @@ -6,45 +6,20 @@ --- -## Table of Contents +## Executive Summary -1. 
[Executive Summary](#1-executive-summary) -2. [Design Principles](#2-design-principles) -3. [Architecture](#3-architecture) -4. [Phasing](#4-phasing) -5. [Language Choice](#5-language-choice) -6. [Feature List](#6-feature-list) -7. [Lint Rule Codes](#7-lint-rule-codes) -8. [Monitor Alert Rules](#8-monitor-alert-rules) -9. [Existing ROS 2 Ecosystem](#9-existing-ros-2-ecosystem) -10. [Prior Art](#10-prior-art) -11. [Safety & Certification](#11-safety--certification) -12. [Scope & Limitations](#12-scope--limitations) +ROS 2 has no standard schema for declaring node interfaces and no +production-ready tooling for verifying that a running system matches its +declared architecture. The ecosystem is fragmented across single-purpose +tools with overlapping scope and bus factors of one. ---- - -## 1. Executive Summary - -ROS 2 has no production-ready tool for verifying that a running system -matches its declared architecture, no standard schema for declaring node -interfaces, and no unified CLI for graph analysis. The ecosystem is -fragmented across single-purpose tools with overlapping scope and bus -factors of one. +Key gaps — no existing tooling: -| Category | Capability | Current tool | Status | -|---|---|---|---| -| **Schema** | Node interface declaration | cake / nodl / gen_param_lib | cake early; nodl dead; gpl params-only | -| **Codegen** | Static graph from launch files | breadcrumb + clingwrap | Early-stage, solo dev | -| **Runtime** | Runtime graph monitoring | graph-monitor | Mid-stage, institutional | -| **Runtime** | Runtime tracing | ros2_tracing | Mature, production | -| **Runtime** | Latency analysis | CARET | Mature, Tier IV | -| **Runtime** | Graph visualisation | Foxglove, Dear RosNodeViewer | Mature but live-only | -| **Runtime** | **Graph diff (expected vs. 
actual)** | **Nothing** | **Major gap** | -| **Static** | **Graph linting (pre-launch)** | **Nothing** | **Major gap** | -| **Static** | **QoS static analysis** | breadcrumb (partial) | Early-stage | -| **Static** | **CI graph validation** | **Nothing** | **Major gap** | -| **Docs** | **Node API documentation** | **Nothing** (hand-written only) | **Major gap** | -| — | **Behavioural properties** | **Nothing** (HPL was ROS 1) | **Major gap** | +- **Graph diff** (expected vs. actual) +- **Graph linting** (pre-launch static analysis) +- **CI graph validation** +- **Node API documentation** (hand-written only today) +- **QoS static analysis** (breadcrumb is early-stage/partial) ### The Problem, Concretely @@ -62,23 +37,46 @@ Today in ROS 2: `colcon build` succeeds. The system launches. The parameter silently takes its default value. -These are real, common bugs in production ROS 2 systems. rosgraph -catches all four — the first two at build time (`rosgraph lint`), the -third at runtime (`rosgraph monitor`), the fourth at lint time. +These are real, common bugs in production ROS 2 systems. -This document proposes **`rosgraph`** — a single tool with subcommands -covering the four goals of the ROSGraph Working Group: +### Components -``` -rosgraph -├── rosgraph generate (Goal 2: spec → code) -├── rosgraph lint (Goal 4: static graph analysis) -├── rosgraph monitor (Goal 3: runtime reconciliation) -├── rosgraph test (Goal 3: contract testing) -├── rosgraph docs (documentation generation) -├── rosgraph breaking (breaking change detection) -└── rosgraph discover (runtime → spec, brownfield adoption) -``` +rosgraph is composed of the following components, ordered by priority. +These components may be wrapped by user interfaces (e.g. a CLI), but +are designed as independent, composable libraries. + +1. **Node Spec (NoDL)** — a formal, machine-readable schema for + declaring node interfaces (`interface.yaml`). 
This is the most core + part of the project; everything else builds on it. + +2. **Code Generation** — `nodl-generator` takes NoDL input and outputs + code for ROS client libraries (rclcpp, rclpy, rclrs). Must be + installable as part of a ROS distro (`apt-get install`). Requires a + plugin/sidechannel architecture so additional client libraries + (e.g. rcljava) can be supported without modifying the core generator. + +3. **Runtime Discovery** — introspect a running system and produce NoDL + specs from observed nodes. Enables brownfield adoption: point at an + existing system, generate `interface.yaml` files for every node, then + iteratively refine them. Unlike runtime monitoring (component 5), + discovery is a one-time migration tool, not a continuous process. + +4. **Node-level Unit Testing** — verify a single node conforms to its + declared spec in isolation. + +5. **Graph Analysis & Comparison** — integration-level verification. + Static analysis checks the full graph for type mismatches, QoS + incompatibilities, and missing connections before launch. Runtime + monitoring continuously diffs the declared graph against the live + system, flagging drift (crashed nodes, unexpected topics, QoS + changes) as it happens. + +6. **Documentation Generation** — produce API documentation directly + from NoDL specs. + +> **Open question:** implementation language for the generator tooling. + +### Key Insights Three key insights drive the design: @@ -87,11 +85,11 @@ Three key insights drive the design: operate on a graph model, not on ASTs. Source code parsing is a loader that feeds the model, not the analysis target. -2. **Goals 3–4 are schema conformance problems** ("does reality match - the spec?"), not traditional program analysis. Once you have a - machine-readable spec (`interface.yaml`), verification falls out - naturally — the same pattern as `buf lint`, Pact contract tests, - and Kubernetes reconciliation. +2. 
**Verification and analysis are schema conformance problems** + ("does reality match the spec?"), not traditional program analysis. + Once you have a machine-readable spec (`interface.yaml`), + verification falls out naturally — the same pattern as `buf lint`, + Pact contract tests, and Kubernetes reconciliation. 3. **A declaration without code generation is a non-starter.** NoDL proved this. The schema must generate code, documentation, and @@ -100,7 +98,7 @@ Three key insights drive the design: static analysis, the contract for runtime verification, and the reference for documentation. -### Quick Start (What It Looks Like) +### Example A minimal `interface.yaml`: @@ -124,1592 +122,11 @@ parameters: bounds<>: [0.1, 100.0] ``` -What you get: - -```bash -rosgraph generate . # → C++ header, Python module, parameter validation -rosgraph lint . # → "no issues" or "TOP001: type mismatch on /cmd_vel" -rosgraph monitor # → live diff: declared graph vs. running system -``` - -The generated code gives you a typed context struct with publishers, -subscribers, and validated parameters — no boilerplate. You write -business logic; rosgraph generates the wiring. +From this single file, the tooling can: +- **Generate** a typed C++/Python node context with publishers and validated parameters — no boilerplate +- **Lint** the full workspace graph for type mismatches and QoS incompatibilities before launch +- **Monitor** the running system and flag drift from the declared spec +- **Discover** a running system's interfaces and produce draft specs for brownfield adoption +- **Document** the node's API automatically --- - -## 2. Design Principles - -### Core Philosophy - -1. **The graph is the program.** Analysis operates on the typed, - QoS-annotated computation graph — not source code ASTs. Source - parsing is a loader that feeds the model, not the analysis target. - -2. **Declare first, verify always.** `interface.yaml` is the single - source of truth. 
Code generation, static analysis, and runtime - monitoring all verify against the declaration. - -3. **One schema, many consumers.** The same `interface.yaml` drives - code generation, documentation, linting, monitoring, contract - testing, and security policy generation. - -4. **One tool, not ten.** `rosgraph` with subcommands replaces - fragmented single-purpose tools. One CLI, one config, one output - format. - -### Developer Experience - -5. **Zero-config value, progressive disclosure.** Given - `interface.yaml` files, the default rules catch real bugs (type - mismatches, QoS incompatibilities) with no additional configuration. - A minimal 10-line `interface.yaml` produces a working node; - lifecycle, mixins, and parameterized QoS are opt-in. - -6. **Brownfield first, gradual adoption.** `rosgraph discover` - generates specs from running nodes. `--add-noqa` suppresses existing - issues. Packages without `interface.yaml` are skipped, not errored. - -7. **Speed is a feature.** An architectural property, not an - afterthought. Target: lint a 100-package workspace in under 5 - seconds. - -8. **Backward compatibility is non-negotiable.** Existing - `generate_parameter_library` YAML works as-is inside `parameters:`. - Existing `.msg`/`.srv`/`.action` files are referenced, not replaced. - -### Verification & CI - -9. **CI-first.** SARIF output, GitHub annotations, exit codes, and - differential analysis are primary design targets. - -10. **Validation at every stage.** Author time: JSON Schema. Build - time: structural + semantic. Launch time: declared vs. configured. - Runtime: declared vs. observed. - -11. **Correctness rules are errors; style rules are warnings.** Type - mismatches and QoS incompatibilities fail CI. Naming conventions - warn. - -### Scope - -12. **Declared interfaces are the primary target.** The schema - describes the *intended* interface — the same boundary drawn by - Protobuf, AsyncAPI, Smithy, and OpenAPI. 
For partially dynamic - nodes (e.g., nav2 plugin hosts), worst-case bounds can be declared - with `optional: true`; `rosgraph monitor` validates these at - runtime and flags truly undeclared interfaces as `UnexpectedTopic`. - -13. **Structural first, behavioural later.** Phase 1–2: type matches, - QoS compatibility, graph connectivity — the foundation that - safety-critical systems (ISO 26262, IEC 61508) require as evidence. - Behavioural properties (temporal/causal, e.g. "/e_stop causes - /motor_disable within 100ms") are Phase 3+, drawing on prior art - from HAROS HPL and runtime verification tools like STL/MTL - monitors. The structural graph model is designed to extend to - behavioural annotations without schema redesign. - ---- - -## 3. Architecture - -One tool. One graph model. Four capabilities. - -``` - ┌──────────────────────┐ - │ Graph Model │ - │ (shared library) │ - │ │ - │ Nodes, Topics, │ - │ Services, Actions, │ - │ Parameters, QoS, │ - │ Connections │ - └───────┬──────┬────────┘ - │ │ - ┌────────────┘ └───────────────┐ - │ │ - ┌─────────▼──────────┐ ┌──────────▼───────────┐ - │ Build-time tools │ │ Runtime tools │ - │ │ │ │ - │ rosgraph generate │ │ rosgraph monitor │ - │ rosgraph lint │ │ rosgraph test │ - │ rosgraph docs │ │ rosgraph discover │ - │ rosgraph breaking │ │ │ - └────────────────────┘ └───────────────────────┘ -``` - -### 3.1 The Graph Model - -A language-agnostic representation of the ROS computation graph. Every -loader produces it; every analyzer consumes it. 
- -``` -ComputationGraph -├── nodes: [NodeInterface] -│ ├── name, namespace, package, executable -│ ├── publishers: [{topic, msg_type, qos}] -│ ├── subscribers: [{topic, msg_type, qos}] -│ ├── services: [{name, srv_type}] -│ ├── clients: [{name, srv_type}] -│ ├── action_servers: [{name, action_type}] -│ ├── action_clients: [{name, action_type}] -│ ├── parameters: [{name, type, default, validators}] -│ └── lifecycle_state: str | None -├── topics: [TopicInfo] -│ ├── name, msg_type -│ ├── publishers: [NodeRef] -│ ├── subscribers: [NodeRef] -│ └── qos_profiles: [QoSProfile] -├── services: [ServiceInfo] -├── actions: [ActionInfo] -└── connections: [Connection] - ├── source: NodeRef - ├── target: NodeRef - ├── channel: TopicRef | ServiceRef | ActionRef - └── qos_compatible: bool -``` - -### 3.2 Schema Layers - -Three schema levels, each building on the previous: - -**Layer 1 — Node Interface Schema** (per-node declaration) - -```yaml -# interface.yaml -schema_version: "1.0" - -node: - name: lidar_processor - package: perception_pkg - lifecycle: managed # managed | unmanaged (default) - -parameters: - # Exact generate_parameter_library format (backward-compatible) - voxel_size: - type: double - default_value: 0.05 - description: "Voxel grid filter leaf size (meters)" - validation: - bounds<>: [0.01, 1.0] - read_only: false - robot_frame: - type: string - default_value: "base_link" - read_only: true - -publishers: - - topic: ~/filtered_points - type: sensor_msgs/msg/PointCloud2 - qos: - history: 5 - reliability: RELIABLE - durability: TRANSIENT_LOCAL - description: "Filtered and downsampled point cloud" - -subscribers: - - topic: ~/raw_points - type: sensor_msgs/msg/PointCloud2 - qos: - history: 1 - reliability: BEST_EFFORT - description: "Raw point cloud from lidar driver" - -services: - - name: ~/set_filter_params - type: perception_msgs/srv/SetFilterParams - -actions: - - name: ~/process_scan - type: perception_msgs/action/ProcessScan - -timers: - - name: process_timer 
- period_ms: 100 - description: "Main processing loop" -``` - -**Layer 2 — Composed System Schema** (launch-level declaration) - -```yaml -# system.yaml -schema_version: "1.0" -name: perception_pipeline - -nodes: - - ref: perception_pkg/lidar_processor - namespace: /robot1 - parameters: - voxel_size: 0.1 - remappings: - ~/raw_points: /lidar/points - - - ref: perception_pkg/object_detector - namespace: /robot1 - -connections: # Explicit wiring (optional, for validation) - - from: lidar_processor/~/filtered_points - to: object_detector/~/input_cloud -``` - -**Toward a single source of truth.** `system.yaml` overlaps -significantly with YAML launch files and parameter config files — all -three describe which nodes run, with what parameters, and with what -remappings. The long-term direction is convergence: `system.yaml` -becomes the graph spec, the parameter config, *and* the launch -description. `rosgraph generate` (or a thin `rosgraph launch` shim) -emits a runnable launch file from the same `system.yaml` that -`rosgraph lint` validates. One file, no drift between what you analyze -and what you run. For projects with multiple deployment configurations -(sim, real, test), each gets its own `system.yaml` — replacing both -the per-config launch file and the per-config parameter YAML. - -**Mixins — Composable Interface Fragments** (G15, Phase 2) - -Plugins that inject interfaces into a host node (e.g., nav2 controller -plugins adding parameters and topics via the node handle) are declared -via `mixins:`. Each mixin is itself an `interface.yaml` fragment -declaring the topics, parameters, and services it adds. The host node's -effective interface is the merge of its own declaration plus all mixins. 
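The merge described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the proposal's implementation: the function name `merge_interface`, the dict layout, the `~/local_plan` topic, and the rule that a duplicate parameter name is an error are all hypothetical. The proposal does not specify how conflicts between a host and its mixins are resolved.

```python
from copy import deepcopy

# Hypothetical list-valued sections; the schema names these entry kinds,
# but the merge algorithm itself is an assumption made for illustration.
LIST_SECTIONS = ("publishers", "subscribers", "services", "actions")


def merge_interface(host: dict, mixins: list[dict]) -> dict:
    """Return the host interface with every mixin fragment folded in."""
    effective = deepcopy(host)
    effective.pop("mixins", None)  # refs are assumed to be resolved already
    effective.setdefault("parameters", {})
    for fragment in mixins:
        for name, spec in fragment.get("parameters", {}).items():
            # Assumption: a name clash is an error rather than an override.
            if name in effective["parameters"]:
                raise ValueError(f"parameter declared twice: {name}")
            effective["parameters"][name] = deepcopy(spec)
        for section in LIST_SECTIONS:
            entries = fragment.get(section, [])
            if entries:
                effective.setdefault(section, []).extend(deepcopy(entries))
    return effective


# Illustrative host declaration and one already-loaded mixin fragment.
host = {
    "node": {"name": "follow_path", "package": "nav2_controller"},
    "parameters": {"controller_plugin": {"type": "string"}},
}
planner = {
    "parameters": {"max_vel_x": {"type": "double"}},
    "publishers": [{"topic": "~/local_plan", "type": "nav_msgs/msg/Path"}],
}
effective = merge_interface(host, [planner])
print(sorted(effective["parameters"]))
```

Treating a clash as an error keeps the effective interface unambiguous for `rosgraph lint`; an override-based policy would be an equally plausible choice and would only change the branch that raises.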
- -```yaml -# nodes/follow_path/interface.yaml -node: - name: follow_path - package: nav2_controller - -parameters: - controller_plugin: - type: string - default_value: "dwb_core::DWBLocalPlanner" - -mixins: - - ref: dwb_core/dwb_local_planner # brings in max_vel_x, min_vel_y, etc. - - ref: nav2_costmap_2d/costmap # brings in costmap params + topics -``` - -This pattern (borrowed from Smithy's mixin concept) gives `rosgraph -lint` and `rosgraph monitor` the complete interface picture without -requiring the host node to redeclare everything its plugins add. -Requires the `$ref` / fragment system (G15) as a prerequisite. - -**Layer 3 — Observation Schema** (runtime-observed state) - -```yaml -# observed.yaml (auto-generated from running system) -node: - name: lidar_processor - package: perception_pkg - pid: 12345 - state: active # lifecycle state if managed - -publishers: - - topic: /robot1/lidar_processor/filtered_points - type: sensor_msgs/msg/PointCloud2 - qos: - reliability: RELIABLE - durability: TRANSIENT_LOCAL - depth: 5 - stats: - message_count: 14523 - frequency_hz: 9.98 - subscribers_matched: 2 - -# ... subscribers, services, actions, parameters with actual values -``` - -### 3.3 The InterfaceDescriptor (IR) - -The parsed, validated, fully-resolved representation of a node's -interface. 
Serializable as JSON for plugin communication: - -```json -{ - "schema_version": "1.0", - "node": { - "name": "lidar_processor", - "package": "perception_pkg", - "lifecycle": "managed" - }, - "parameters": [ - { - "name": "voxel_size", - "type": "double", - "default_value": 0.05, - "description": "Voxel grid filter leaf size (meters)", - "validation": { "bounds": [0.01, 1.0] }, - "read_only": false - } - ], - "publishers": [ - { - "topic": "~/filtered_points", - "resolved_topic": "/robot1/lidar_processor/filtered_points", - "message_type": "sensor_msgs/msg/PointCloud2", - "qos": { "history": 5, "reliability": "RELIABLE", "durability": "TRANSIENT_LOCAL" }, - "description": "Filtered and downsampled point cloud" - } - ] -} -``` - -Plugins receive this IR via stdin (or file path) and produce generated -files. - -### 3.4 `rosgraph generate` — Code Generation - -Translates `interface.yaml` into working node implementations. - -``` -┌─────────────────────────────────────────┐ -│ interface.yaml (per node) │ -└────────────────┬────────────────────────┘ - │ -┌────────────────▼────────────────────────┐ -│ Parser / Validator │ -│ 1. YAML parse │ -│ 2. JSON Schema validation (structural) │ -│ 3. Semantic validation (type refs, QoS)│ -│ 4. Produce InterfaceDescriptor (IR) │ -└────────────────┬────────────────────────┘ - │ - ┌──────────┼──────────────────┐ - │ │ │ -┌─────▼─────┐ ┌─▼──────────┐ ┌────▼──────┐ -│ C++ Plugin│ │Python Plugin│ │Docs Plugin│ -│ │ │ │ │ │ -│ - header │ │ - module │ │ - API ref │ -│ - reg.cpp │ │ - params │ │ - graph │ -│ - params │ │ - __init__ │ │ fragment│ -└───────────┘ └─────────────┘ └───────────┘ -``` - -**Build integration:** - -```cmake -cmake_minimum_required(VERSION 3.22) -project(perception_pkg) -find_package(rosgraph REQUIRED) -rosgraph_auto_package() -``` - -Under the hood, `rosgraph_auto_package()`: -1. Scans `nodes/` for subdirectories with `interface.yaml` -2. Validates each `interface.yaml` (structural + semantic) -3. 
Invokes C++ plugin → header, registration, params YAML -4. Delegates to `generate_parameter_library()` for parameters -5. Compiles and links -6. Installs interface YAMLs to `share//interfaces/` - -**Design decisions:** -- **Composition over inheritance.** Generated code holds a - `rclcpp::Node` (has-a), not inherits from it. Context struct is a - flat aggregation of generated components plus user state. -- **`generate_parameter_library` as backend.** Uses the existing, - widely-adopted parameter library rather than reimplementing. -- **Convention-over-configuration.** Directory layout (`nodes/`, - `interfaces/`, `launch/`, `config/`) determines behavior. - -### 3.5 `rosgraph lint` — Static Analysis - -Pre-launch verification of the ROS graph. - -``` -┌────────────────────────────────────┐ -│ Loaders │ -│ ┌───────────┐ ┌───────────────┐ │ -│ │interface. │ │ launch files │ │ -│ │yaml parser │ │ (clingwrap/ │ │ -│ │ │ │ native) │ │ -│ └─────┬─────┘ └──────┬────────┘ │ -│ └───────┬───────┘ │ -│ ▼ │ -│ ┌──────────────────────────┐ │ -│ │ Graph Model │ │ -│ └────────────┬─────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────┐ │ -│ │ Analyzer DAG │ │ -│ │ (parallel execution) │ │ -│ │ │ │ -│ │ [topic_resolver] │ │ -│ │ ↓ │ │ -│ │ [type_mismatch_checker] │ │ -│ │ [qos_compat_checker] │ │ -│ │ [naming_convention] │ │ -│ │ [disconnected_subgraph] │ │ -│ │ [unused_node] │ │ -│ │ [launch_linter] │ │ -│ │ ... 
│ │ -│ └────────────┬─────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────┐ │ -│ │ Post-Processing │ │ -│ │ - suppression filter │ │ -│ │ - severity assignment │ │ -│ │ - deduplication │ │ -│ │ - differential (new │ │ -│ │ issues only for CI) │ │ -│ └────────────┬─────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────┐ │ -│ │ Output Formatters │ │ -│ │ text, JSON, SARIF, │ │ -│ │ GitHub, JUnit │ │ -│ └──────────────────────────┘ │ -└────────────────────────────────────┘ -``` - -**Analyzer definition pattern (from Go analysis framework):** - -```python -# Each analyzer is a value, not a subclass -topic_resolver = GraphAnalyzer( - name="topic_resolver", - doc="Resolves topic names to their message types across the graph", - requires=[], - result_type=TopicTypeMap, - run=resolve_topics, -) - -type_mismatch = GraphAnalyzer( - name="type_mismatch", - doc="Checks that all pub/sub on a topic agree on message type", - requires=[topic_resolver], - result_type=None, - run=check_type_mismatches, -) -``` - -See [§7 Lint Rule Codes](#7-lint-rule-codes) for the full rule system. - -**Launch file loading strategy:** - -Three loader paths, not mutually exclusive, phased by tractability: - -| Loader | Launch format | Extraction method | Phase | Limitations | -|---|---|---|---|---| -| YAML launch | YAML launch files | Direct parse | 1 | Limited expressiveness | -| `system.yaml` | Layer 2 schema | Direct parse | 1 | Requires manual authoring | -| Python launch AST | Standard `launch_ros` | AST pattern matching | 2 | Cannot handle dynamic logic (conditionals, loops) | - -- **YAML launch files** are statically parseable — `rosgraph lint` can - extract node declarations, remappings, and parameter overrides - directly. -- **Python launch files** are imperative and Turing-complete, but most - are declarative-in-spirit. AST-level pattern matching for common - patterns (`Node()`, `LaunchConfiguration()`, - `DeclareLaunchArgument()`) captures ~80% of real launch files without - execution. 
-- **Layer 2 `system.yaml`** (§3.2) sidesteps the problem entirely — - a static YAML file declaring the intended system composition. Launch - files still run the system, but `system.yaml` is the lint/monitor - source of truth for graph analysis. - -The lint diagram's "launch files" loader encompasses all three paths. - -### 3.6 `rosgraph monitor` — Runtime Reconciliation - -Kubernetes-style reconciliation loop comparing declared vs. observed -graph state. - -``` -┌─────────────────────────────────────────────────┐ -│ rosgraph monitor │ -│ │ -│ ┌───────────────┐ ┌──────────────────────┐ │ -│ │ Declared State │ │ Observed State │ │ -│ │ (from YAML / │ │ (from DDS │ │ -│ │ interface │ │ discovery) │ │ -│ │ files) │ │ │ │ -│ └───────┬───────┘ └──────────┬───────────┘ │ -│ │ │ │ -│ └──────────┬─────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────────────┐ │ -│ │ Reconciliation Engine │ │ -│ │ │ │ -│ │ Level-triggered (not edge) │ │ -│ │ Idempotent │ │ -│ │ Requeue with backoff │ │ -│ └──────────────┬───────────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────────────┐ │ -│ │ Diff Computation │ │ -│ │ │ │ -│ │ - Missing/extra nodes │ │ -│ │ - Missing/extra topics │ │ -│ │ - QoS mismatches │ │ -│ │ - Type mismatches │ │ -│ │ - Parameter drift │ │ -│ └──────────────┬───────────────────┘ │ -│ ▼ │ -│ ┌──────────────────────────────────┐ │ -│ │ Exporters │ │ -│ │ │ │ -│ │ - ROS topics (graph_diff msg) │ │ -│ │ - Prometheus /metrics endpoint │ │ -│ │ - Structured log output │ │ -│ │ - Alerting (via diagnostics) │ │ -│ └──────────────────────────────────┘ │ -└─────────────────────────────────────────────────┘ -``` - -**Reconciliation loop:** - -```python -while running: - declared = load_declared_graph(interface_files, launch_files) - observed = scrape_live_graph(dds_discovery) - - diff = compute_graph_diff(declared, observed) - - if diff.has_issues(): - publish_diff(diff) # ROS topic: /rosgraph/diff - update_metrics(diff) # Prometheus: rosgraph_missing_nodes, etc. 
- emit_diagnostics(diff) # /diagnostics for standard tooling - - publish_status(observed) # ROS topic: /rosgraph/status - - # Adaptive interval: faster when drifting, slower when stable - if diff.has_critical(): - sleep(1s) - else: - sleep(5s) -``` - -See [§8 Monitor Alert Rules](#8-monitor-alert-rules) for the alert -system. - -**Relationship to graph-monitor:** `rosgraph monitor` is a new -implementation, not an extension of the existing graph-monitor package. -graph-monitor's value is its `rmw_stats_shim` and -`rosgraph_monitor_msgs` message definitions — these are reusable -regardless of architecture. However, graph-monitor lacks the -reconciliation engine (declared vs. observed diff) that is the core of -`rosgraph monitor`, and retrofitting it would constrain the design. - -The integration path: adopt or align with graph-monitor's message -definitions (`rosgraph_monitor_msgs`), reimplement the graph scraping -and reconciliation, and offer to upstream the reconciliation capability -back to graph-monitor if its maintainers are interested. - -### 3.7 `rosgraph test` — Contract Testing - -Schema-driven verification of running nodes against their declarations. - -Three testing modes (modelled on Schemathesis, Dredd, and Pact): - -**Interface conformance** (Dredd model): Run a node, then -systematically verify its actual interface matches its -`interface.yaml`. Check every declared publisher is active, call every -declared service, verify every parameter exists with the declared type -and default. - -**Fuzz testing** (Schemathesis model): Auto-generate messages matching -declared subscriber types, publish them, verify the node produces -outputs on declared publisher topics with correct types. - -**Cross-node contract testing** (Pact model): Node A's -`interface.yaml` declares it subscribes to `/cmd_vel` (Twist). Node -B's `interface.yaml` declares it publishes `/cmd_vel` (Twist). The -contract test verifies they agree on type and QoS compatibility. 
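The cross-node check can be sketched as a pure comparison of two declared endpoints. This is a minimal sketch, assuming a hypothetical `Endpoint` record and `check_contract` function that are not part of the proposal. Only reliability and durability are compared, using the request-versus-offered rule that a subscriber may not request a stronger policy than the publisher offers; a real checker would cover the remaining QoS policies and would resolve topic names first.

```python
from dataclasses import dataclass

# Strength ordering for the two policies checked here (higher = stronger).
RELIABILITY = {"BEST_EFFORT": 0, "RELIABLE": 1}
DURABILITY = {"VOLATILE": 0, "TRANSIENT_LOCAL": 1}


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical record for one declared publisher or subscriber."""
    topic: str
    msg_type: str
    reliability: str = "RELIABLE"  # defaults are illustrative only
    durability: str = "VOLATILE"


def check_contract(pub: Endpoint, sub: Endpoint) -> list[str]:
    """Return human-readable problems; an empty list means compatible."""
    problems = []
    if pub.msg_type != sub.msg_type:
        problems.append(
            f"type mismatch on {pub.topic}: {pub.msg_type} vs {sub.msg_type}"
        )
    # A subscriber may not request more than the publisher offers.
    if RELIABILITY[sub.reliability] > RELIABILITY[pub.reliability]:
        problems.append(
            f"reliability: {sub.reliability} requested, {pub.reliability} offered"
        )
    if DURABILITY[sub.durability] > DURABILITY[pub.durability]:
        problems.append(
            f"durability: {sub.durability} requested, {pub.durability} offered"
        )
    return problems


# Node B publishes /cmd_vel, Node A subscribes; the types agree but QoS does not.
node_b_pub = Endpoint("/cmd_vel", "geometry_msgs/msg/Twist", reliability="BEST_EFFORT")
node_a_sub = Endpoint("/cmd_vel", "geometry_msgs/msg/Twist", reliability="RELIABLE")
for problem in check_contract(node_b_pub, node_a_sub):
    print(problem)
```

Because the check reads only the two declarations, it can run without launching either node, which is what separates this mode from the conformance and fuzz modes above.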
- -### 3.8 `rosgraph docs` — Documentation Generation - -Auto-generated "Swagger UI for ROS nodes" — browsable API reference -docs from `interface.yaml`. Covers topics, services, actions, -parameters, QoS settings, and message type definitions. - -Output formats: Markdown (for GitHub Pages / docs.ros.org), HTML -(standalone), JSON (for embedding in other tools). - -### 3.9 `rosgraph breaking` — Breaking Change Detection - -Compares two versions of `interface.yaml` and classifies changes: - -| Classification | Examples | -|---|---| -| **Breaking** | Removed topic, changed message type, removed parameter, incompatible QoS change | -| **Dangerous** | Changed QoS (may affect connectivity), narrowed parameter range | -| **Safe** | Added optional parameter, added new publisher, widened parameter range | - -Modelled on `buf breaking` and `graphql-inspector`. - -### 3.10 `rosgraph discover` — Runtime-to-Spec Generation - -Introspects a running node via DDS discovery and generates an -`interface.yaml` from the observed interface. The "slice of cake" -brownfield adoption path. - -```bash -# Generate interface.yaml from a running node -rosgraph discover /lidar_processor -o nodes/lidar_processor/interface.yaml -``` - -Modelled on Terraform's `import` command. - -### 3.11 Configuration - -**`rosgraph.toml`** — single configuration file for all subcommands: - -```toml -[lint] -select = ["TOP", "SRV", "QOS", "GRF"] # enable these rule families -ignore = ["NME001"] # except this specific rule - -[lint.per-package-ignores] -"generated_*" = ["ALL"] # skip generated packages -"*_test" = ["GRF002"] # allow unused nodes in tests - -[generate] -plugins = ["cpp", "python"] -out_dir = "generated" - -[output] -format = "text" # text | json | sarif | github - -[ci] -new-only = true # only new issues (differential) -base-branch = "main" -``` - -### 3.12 Multi-Workspace Analysis - -ROS 2 workspaces overlay each other (e.g., `ros_base` underlay + your -packages + a vendor overlay). 
When `rosgraph lint` analyzes your -workspace, it needs interface information from packages in the underlay. - -The solution follows the Go analysis framework's per-package fact -caching pattern: installed `interface.yaml` files in -`share//interfaces/` (placed there by `rosgraph_auto_package()` -at install time) serve as cached analysis artifacts. `rosgraph lint` -reads these from the underlay without re-analyzing underlay packages, -only analyzing packages in the current workspace. - -This is a Phase 2 concern. Phase 1 assumes a single workspace. - -### 3.13 AI & Tooling Integration - -`interface.yaml` and the `InterfaceDescriptor` IR (§3.3) are -machine-readable contracts describing a node's complete API. This -makes them natural integration points for AI-assisted development -tools and IDE infrastructure. - -**AI as IR consumer.** The JSON-serialized `InterfaceDescriptor` -contains everything an LLM needs to understand a node's interface: -topics, types, QoS, parameters, lifecycle state. An AI agent can read -this to generate implementation code, write tests, suggest fixes, or -answer questions about the system — without parsing source code. - -**MCP server.** A Model Context Protocol server exposing graph state, -lint results, and interface schemas enables AI coding tools (Claude -Code, Cursor, Copilot) to query the ROS graph as structured context. -"What topics does the perception pipeline publish?" answered from -the graph model, not from grep. - -**AI-assisted discovery.** `rosgraph discover` (§3.10) generates -`interface.yaml` from a running system. The raw output from DDS -discovery is complete but lacks descriptions, rationale, and grouping. -An LLM can refine the generated spec — inferring descriptions from -topic names and message types, suggesting QoS profiles based on -message patterns, and grouping related interfaces. 
- -**Language Server Protocol (LSP).** An LSP server for `interface.yaml` -enables IDE features beyond JSON Schema validation: hover for message -type definitions, go-to-definition for `$ref` targets, inline -diagnostics from `rosgraph lint`, and cross-file rename support. This -benefits both human developers and AI agents operating within IDE -contexts. - -**Natural language to spec.** The constrained schema makes -`interface.yaml` a tractable generation target for LLMs. "I need a -node that subscribes to a lidar point cloud, filters it, and publishes -the result" produces a valid `interface.yaml` that `rosgraph generate` -can immediately scaffold into working code. - -These are not Phase 1 deliverables, but the architecture should not -preclude them. The IR-based plugin protocol (§3.3) and structured -output formats (JSON, SARIF) are the key enablers — they exist for -code generation and CI, but AI consumers are a natural extension. - -### 3.14 Scale & Fleet Considerations - -§3.12 covers multi-workspace analysis. This section addresses -concerns beyond a single developer's workstation. - -**Interface ownership.** In multi-team organizations, `interface.yaml` -files are shared contracts. The owner is typically the node author -(they define the interface), but downstream consumers depend on it. -Changes require coordination. rosgraph supports this via: -- `rosgraph breaking` (§3.9) — automated detection of breaking - changes in CI, blocking merges that break downstream consumers. -- Installed interfaces in `share//interfaces/` — downstream - teams depend on published interfaces without pulling source code. -- Semantic versioning alignment — the breaking/dangerous/safe - classification maps to semver: breaking = major, dangerous = minor - (review required), safe = patch. - -**Multi-robot systems.** The `system.yaml` (Layer 2, §3.2) supports -namespaced node instances (`namespace: /robot1`). 
For multi-robot -systems, each robot's graph is a namespaced instance of the same -`system.yaml`. Fleet-level analysis — "which robots are running -interface version X?" — is out of scope for Phase 1–2 but the -architecture supports it: `rosgraph monitor` on each robot publishes -graph snapshots that a fleet-level aggregator can collect. - -**Fleet monitoring.** `rosgraph monitor` (§3.6) runs per-robot. For -fleet-scale observability, the monitor's Prometheus exporter (M7) -enables standard fleet dashboards via Grafana. The `/rosgraph/diff` -topic on each robot can be bridged to a central system for aggregated -drift analysis. The architecture deliberately uses standard -observability patterns (Prometheus metrics, structured logs, -diagnostics topics) rather than inventing fleet-specific -infrastructure. - -**Performance targets.** Build-time targets are stated in DP7 (100 -packages in 5 seconds). Runtime targets for `rosgraph monitor`: -- Reconciliation cycle: < 500ms for a 200-node system -- Memory overhead: < 50MB resident for graph state -- CPU: < 5% of one core at steady-state (5s scrape interval) - -These are design targets, not commitments — they guide architectural -decisions (e.g., choosing Rust for the diff engine). - -### 3.15 colcon Integration - -`colcon` uses a `VerbExtensionPoint` plugin system — any Python package -can register new verbs via `setup.cfg` entry points. Existing examples: -`colcon-clean` adds `colcon clean`, `colcon-cache` adds `colcon cache`. - -The architecture is **`rosgraph` as standalone tool, `colcon-rosgraph` -as thin workspace wrapper**: - -``` -colcon-rosgraph (Python, verb plugin) - └── delegates to → rosgraph (standalone binary) -``` - -This mirrors how `colcon-cmake` shells out to `cmake` — the colcon verb -handles workspace iteration, package ordering, and parallel execution; -the core tool handles single-package analysis. 
- -**What maps naturally to colcon verbs:** - -| Command | colcon verb | Notes | -|---|---|---| -| `rosgraph generate` | — | Already runs via `rosgraph_auto_package()` in `colcon build` | -| `rosgraph test` | — | Already runs via CTest in `colcon test` | -| `rosgraph lint` | `colcon lint` | Iterates packages in dependency order, parallel per-package lint | -| `rosgraph docs` | `colcon docs` | Generates docs per package, aggregates into workspace docs | -| `rosgraph discover` | `colcon discover` | Generates `interface.yaml` for all running nodes | -| `rosgraph breaking` | `colcon breaking` | Checks all packages against their previous interface versions | - -**What doesn't fit:** - -`rosgraph monitor` is a long-running daemon, not a build-and-exit verb. -It stays as a standalone command (or a `ros2 launch` node). - -**Why both CLIs:** - -- `colcon lint` for the workspace workflow — lint all packages, respect - dependency order, parallel execution, workspace-level reporting. -- `rosgraph lint path/to/interface.yaml` for single-file use, CI - pipelines, and environments without colcon. - -**Language independence.** The colcon plugin is always Python (colcon -requires it), but it delegates to `rosgraph` via subprocess — so the -core tool's language is unconstrained. Rust, Python, or hybrid all work -identically. The colcon integration does not factor into the language -choice (§5). - -The colcon plugin is a Phase 2 deliverable — Phase 1 focuses on the -standalone `rosgraph` tool. The plugin is trivial once the core tool -exists. - ---- - -## 4. Phasing - -### Phase 1 — Foundation - -Deliver the core schema, basic code generation, and highest-value -static + runtime checks. 
- -**Schema & generate:** -- G1-G10 (existing cake features — stabilize and adopt) -- G11 (lifecycle nodes — blocks nav2/ros2_control adoption) -- G14 (schema versioning — needed before v1.0) - -**Lint (P0 rules):** -- L1 (topic type mismatch), L2 (QoS compatibility), L3 (disconnected - subgraph) -- L5 (SARIF output), L6 (differential analysis) - -**Monitor (P0 features):** -- M1 (declared-vs-observed diff), M2 (missing node alerting), - M5 (graph snapshots) - -### Phase 2 — Adoption Enablers - -Lower barriers for existing codebases. Fill out the rule set. - -**Schema & generate:** -- G12 (timers), G13 (nested parameters), G15 (mixins) -- O1 (`rosgraph docs`), O2 (`rosgraph discover`) - -**Lint (P1 rules + infrastructure):** -- L4 (launch validation), L7 (naming), L8 (unused node), - L9 (parameter validation), L10 (circular deps) -- L11 (inline suppression), L12 (per-package config), - L13 (`--add-noqa`), L14 (semantic validation) - -**Monitor (P1 features):** -- M3 (QoS drift), M4 (runtime type mismatch), M6 (topic stats), - M8 (unexpected node), M9 (health diagnostics) - -### Phase 3 — Scale the Toolchain - -Enable community extension and advanced analysis. - -**Schema & generate:** -- G16 (plugin architecture), G17 (callback groups), - G19 (system composition schema) -- O3 (`rosgraph breaking`), O4 (`rosgraph test`) - -**Lint:** -- L15 (interface coverage) - -**Monitor:** -- M7 (Prometheus endpoint), M10 (adaptive scrape), - M11 (lifecycle state) - -### Phase 4 — Ecosystem Integration - -Future-proofing and niche use cases. - -- G18 (middleware bindings) -- O5 (`rosgraph policy` — SROS 2 security policies) -- M12 (runtime interface coverage) - -### Adoption Path - -rosgraph is unlikely to reach `ros_core` initially — that requires -broad consensus and a high stability bar. A more realistic progression: - -1. **`ros-tooling` organization** (where graph-monitor already lives) — - institutional backing, CI infrastructure, release process. -2. 
**REP (ROS Enhancement Proposal)** for the `interface.yaml` schema — - formalizes the declaration format as a community standard. -3. **docs.ros.org tutorial integration** — if the "write your first - node" tutorial uses `interface.yaml`, every new ROS developer learns - it from day one. This is the highest-leverage adoption path. -4. **`ros_core` proposal** — after demonstrated adoption across multiple - distros, propose for inclusion in a future distribution. - ---- - -## 5. Language Choice - -The implementation language is an open decision for the WG. The -trade-offs are structural, not preferential. - -### Option A: Rust - -Follows Ruff's model. Speed as an architectural property. - -| Axis | Assessment | -|---|---| -| Performance | Best. Single-pass analysis, zero-cost abstractions, no GC pauses. Achieves the "100 packages in 5s" target. | -| Contribution barrier | Highest. Most ROS contributors know C++/Python, not Rust. | -| Ecosystem fit | Moderate. `rclrs` exists but is not tier-1. CLI tools don't need ROS client library integration. | -| Deployment | Single static binary. No runtime dependencies. | -| Plugin story | WASM plugins (Extism) or process-based (protoc model). | - -### Option B: Python - -Follows the ROS 2 ecosystem convention. - -| Axis | Assessment | -|---|---| -| Performance | Weakest. 10-100x slower than Rust for analysis workloads. May not meet performance targets. | -| Contribution barrier | Lowest. Every ROS developer knows Python. | -| Ecosystem fit | Best. cake is Python. `launch_ros` is Python. Direct reuse of existing parsing libraries. | -| Deployment | Requires Python runtime. `pip install` or ROS package. | -| Plugin story | Native Python plugins. Trivial to write. | - -### Option C: Rust core + Python bindings - -Hybrid via PyO3. Performance-critical core (parsing, graph model, diff -engine, lint rules) in Rust; Python CLI and plugin layer on top. 
- -| Axis | Assessment | -|---|---| -| Performance | Near-Rust for analysis; Python overhead for CLI/plugin dispatch only. | -| Contribution barrier | Moderate. Core contributors need Rust; plugin authors use Python. | -| Ecosystem fit | Good. Python-facing API integrates with ROS ecosystem. | -| Deployment | Python package with native extension. Requires build toolchain for distribution. | -| Plugin story | Python plugins (native) + WASM plugins (for sandboxing). | - -### Decision factors - -The choice depends on which constraint the WG prioritizes: -- If **speed** is the binding constraint → Rust or hybrid -- If **community contribution** is the binding constraint → Python -- If **both matter** → hybrid, accepting the build complexity - -Note: the colcon integration (§3.15) does not constrain this choice. -The `colcon-rosgraph` plugin is always Python but delegates to the -`rosgraph` binary via subprocess, so the core tool can be any language. - ---- - -## 6. Feature List - -### Schema & Code Generation (`rosgraph generate`) - -| # | Feature | Priority | Description | -|---|---------|----------|-------------| -| G1 | YAML interface declaration | P0 | Single `interface.yaml` per node declaring all ROS 2 entities | -| G2 | JSON Schema validation | P0 | Structural validation with IDE autocompletion via YAML Language Server | -| G3 | C++ code generation | P0 | Typed context, pub/sub/srv/action wrappers, component registration | -| G4 | Python code generation | P0 | Dataclass context, pub/sub/srv/action wrappers | -| G5 | Parameter generation | P0 | Delegates to `generate_parameter_library` (backward-compatible) | -| G6 | QoS declaration | P0 | Required for pub/sub, supports all DDS QoS policies | -| G7 | Parameterized QoS | P0 | `${param:name}` references in QoS fields | -| G8 | Dynamic topic names | P0 | `${param:name}` and `${for_each_param:name}` | -| G9 | Composition pattern | P0 | Has-a `Node`, not is-a `Node` | -| G10 | Zero-boilerplate build | P0 | 
`rosgraph_auto_package()` CMake macro | -| G11 | Lifecycle node support | P0 | `lifecycle: managed` in node spec | -| G12 | Timer declarations | P1 | `timers:` section with period, callback name | -| G13 | Nested parameters | P1 | Hierarchical parameter structures (parity with gen_param_lib) | -| G14 | Schema versioning | P1 | `schema_version` field with migration tooling | -| G15 | Mixins / shared fragments | P1 | `$ref` to common interface fragments | -| G16 | Plugin architecture | P2 | IR-based pipeline, standalone plugins per language | -| G17 | Callback group declarations | P2 | `callback_groups:` with entity assignment | -| G18 | Middleware bindings | P3 | Protocol-specific config (DDS, Zenoh) | -| G19 | System composition schema | P2 | Multi-node graph declaration (`system.yaml`, Layer 2) | - -### Static Analysis (`rosgraph lint`) - -| # | Feature | Priority | Description | -|---|---------|----------|-------------| -| L1 | Topic type mismatch detection | P0 | Flag when pub and sub on same topic disagree on message type | -| L2 | QoS compatibility checking | P0 | Flag incompatible QoS profiles (reliability, durability, deadline) | -| L3 | Disconnected subgraph detection | P0 | Flag nodes/topics with no connections | -| L4 | Launch file validation | P0 | Detect undefined node refs, invalid remaps, unresolved substitutions | -| L5 | SARIF / CI output | P0 | Structured output for GitHub Security tab, PR annotations | -| L6 | Differential analysis | P0 | `--new-only` reports only issues introduced since base branch | -| L7 | Naming convention enforcement | P1 | Check names against configurable patterns | -| L8 | Unused node detection | P1 | Flag nodes declared but not in any launch config | -| L9 | Parameter validation | P1 | Check values against declared types, ranges, validators | -| L10 | Circular dependency detection | P1 | Flag service/action chains that could deadlock | -| L11 | Inline suppression | P1 | `# rosgraph: noqa: TOP001` in launch/YAML files | -| 
L12 | Per-package configuration | P1 | Override rules per package via `rosgraph.toml` | -| L13 | `--add-noqa` for adoption | P1 | Generate suppression comments for all existing issues | -| L14 | Semantic validation | P1 | Full type resolution, QoS compatibility checks | -| L15 | Interface coverage reporting | P2 | Which declared topics/services are exercised in tests | - -### Runtime Monitoring (`rosgraph monitor`) - -| # | Feature | Priority | Description | -|---|---------|----------|-------------| -| M1 | Declared-vs-observed graph diff | P0 | Compare declared interfaces against live DDS discovery | -| M2 | Missing node alerting | P0 | Alert when a declared node is not present | -| M3 | QoS drift detection | P0 | Alert when observed QoS differs from declared | -| M4 | Type mismatch detection (runtime) | P0 | Alert when observed types differ from declaration | -| M5 | Graph snapshot publishing | P0 | Periodic `rosgraph_monitor_msgs/Graph` snapshots | -| M6 | Topic statistics | P1 | Message rate, latency, queue depth per topic | -| M7 | Prometheus /metrics endpoint | P1 | Export graph metrics for Grafana dashboards | -| M8 | Unexpected node detection | P1 | Alert on nodes present but not declared | -| M9 | Health diagnostics integration | P1 | Publish to `/diagnostics` for standard ROS tooling | -| M10 | Adaptive scrape interval | P2 | Faster scraping when drift detected, slower when stable | -| M11 | Lifecycle state monitoring | P2 | Track lifecycle transitions against expectations | -| M12 | Interface coverage tracking | P2 | Which declared interfaces are exercised at runtime | - -### Other Subcommands - -| # | Feature | Subcommand | Priority | Description | -|---|---------|------------|----------|-------------| -| O1 | Documentation generation | `rosgraph docs` | P1 | Auto-generated API reference from schema | -| O2 | Runtime-to-spec discovery | `rosgraph discover` | P1 | Introspect running nodes → `interface.yaml` | -| O3 | Breaking change detection | `rosgraph 
breaking` | P2 | Detect breaking interface changes across releases | -| O4 | Contract testing | `rosgraph test` | P2 | Schema-driven verification of running nodes | -| O5 | Security policy generation | `rosgraph policy` | P3 | Auto-generate SROS 2 policies from schema | - ---- - -## 7. Lint Rule Codes - -Rule codes use a hierarchical prefix system (modelled on Ruff). Rules -can be selected at any granularity: `TOP` (all topic rules), -`TOP001` (specific rule). - -| Prefix | Category | Example rules | -|--------|----------|---------------| -| `TOP` | Topic rules | `TOP001` type mismatch, `TOP002` no subscribers, `TOP003` naming convention | -| `SRV` | Service rules | `SRV001` unmatched client, `SRV002` type mismatch | -| `ACT` | Action rules | `ACT001` unmatched client, `ACT002` type mismatch | -| `PRM` | Parameter rules | `PRM001` missing default, `PRM002` type violation, `PRM003` undeclared | -| `QOS` | QoS rules | `QOS001` reliability mismatch, `QOS002` durability incompatible, `QOS003` deadline violation | -| `LCH` | Launch rules | `LCH001` undefined node ref, `LCH002` invalid remap, `LCH003` unresolved substitution | -| `GRF` | Graph-level rules | `GRF001` disconnected subgraph, `GRF002` unused node, `GRF003` circular dependency | -| `NME` | Naming rules | `NME001` topic naming convention, `NME002` node naming convention | -| `SAF` | Safety rules | `SAF001` insufficient redundancy, `SAF002` single point of failure, `SAF003` unmanaged safety node | -| `TF` | TF frame rules | `TF001` undeclared frame_id, `TF002` broken frame chain | - -**Rule lifecycle:** preview → stable → deprecated → removed. New rules -always enter as preview. - -**Fix applicability:** Safe (preserves semantics), unsafe (may alter -behaviour), display-only (suggestion). Per-rule override via config. - ---- - -## 8.
Monitor Alert Rules - -| Alert | Condition | Grace period | Severity | -|---|---|---|---| -| `NodeMissing` | Declared node not observed | 10s | critical | -| `UnexpectedNode` | Observed node not declared | 30s | warning | -| `TopicMissing` | Declared topic not present | 5s | critical | -| `QoSMismatch` | Declared QoS ≠ observed QoS | 0s | error | -| `TypeMismatch` | Declared msg type ≠ observed | 0s | critical | -| `ThroughputDrop` | Rate < expected minimum | 30s | warning | - -Grace periods prevent flapping during startup and transient states. -All thresholds (grace period, severity) are configurable via -`rosgraph.toml` — see §11.3 for safety-critical overrides. - ---- - -## 9. Existing ROS 2 Ecosystem - -### 9.1 Maturity Matrix - -| Tool | Stars | Contributors | Last active | Maturity | Bus factor | -|---|---|---|---|---|---| -| **generate_parameter_library** | 353 | 41 | 2026-02 | Production | Healthy | -| **ros2_tracing** | 237 | 30 | 2026-02 | Production (QL1) | Healthy | -| **topic_tools** | 126 | 25 | 2025-08 | Mature | Healthy | -| **launch_ros** | 78 | 71 | 2026-02 | Core infrastructure | Healthy | -| **cake** | 36 | 1 | 2026-02 | Early-stage | 1 (risk) | -| **graph-monitor** | 31 | 3 | 2025-11 | Mid-stage | Low | -| **nodl** | 10 | 7 | 2022-11 | Dormant | N/A | -| **clingwrap** | 9 | 1 | 2026-02 | Early-stage | 1 (risk) | -| **breadcrumb** | 6 | 1 | 2026-02 | Early-stage | 1 (risk) | -| **HAROS** (ROS 1) | 197 | — | 2021-09 | Abandoned | N/A | -| **CARET** | 97 | 18 | active | Mature (Tier IV) | Healthy | - -### 9.2 Tool Assessments - -**cake** — Declarative code generation. `interface.yaml` → C++ and -Python node scaffolding. Functional pattern (has-a Node, not is-a -Node). The fundamental bet is correct: making the interface declaration -the source of truth for code generation is the only way to prevent -schema-code drift. Core design decisions (YAML-driven, -composition-based, schema-validated, codegen-first) are sound. 
-cake's author is a WG member; rosgraph's Layer 1 schema builds -directly on cake's format, and G1–G10 represent stabilizing cake's -capabilities under the rosgraph umbrella — addressing the bus-factor -risk while preserving the design. Gaps: no lifecycle support, no -timers, no nested parameters, no formal IR, no plugin architecture, -no runtime-to-spec generation. - -**generate_parameter_library** — The most mature tool in the space. -Production-proven in MoveIt2 and ros2_control. Rich validation. The -unification path: the `parameters:` section of `interface.yaml` IS the -`generate_parameter_library` format (already demonstrated in cake). -rosgraph delegates to `generate_parameter_library` at build time rather -than reimplementing parameter generation. The key invariant: a -standalone gen_param_lib YAML file works as-is when placed in the -`parameters:` block of `interface.yaml`. Ownership transfer to -`ros-tooling` would be ideal but is not required — schema compatibility -is sufficient. - -**graph-monitor** — Official ROSGraph WG backing. Publishes structured -graph messages. The `rmw_stats_shim` approach is architecturally sound. -Gap: can report *what exists* but not *what's wrong* — no comparison -against a declared spec. - -**breadcrumb + clingwrap** — Proves the concept of static graph -extraction from launch files. The tight coupling to clingwrap's -non-standard launch API is the primary concern. Static analysis should -work with standard `launch_ros` patterns. - -**nodl** — Dormant since 2022. Correct problem identification but -fatal flaw: no code generation. Superseded by cake's YAML approach. -Key lesson: **a description format without code generation is a -non-starter.** - -**ros2_tracing + CARET** — The most mature dynamic analysis tools. -QL1 certification, production-proven at Tier IV. Complementary to -rosgraph: tracing provides instrumentation, CARET provides latency -analysis, rosgraph provides graph structure analysis. 
- -### 9.3 Gap Analysis - -| Category | Capability | Current tool | Status | -|---|---|---|---| -| **Schema** | Node interface declaration | cake / nodl / gen_param_lib | cake early; nodl dead; gpl params-only | -| **Codegen** | Static graph from launch files | breadcrumb + clingwrap | Early-stage, solo dev | -| **Runtime** | Runtime graph monitoring | graph-monitor | Mid-stage, institutional | -| **Runtime** | Runtime tracing | ros2_tracing | Mature, production | -| **Runtime** | Latency analysis | CARET | Mature, Tier IV | -| **Runtime** | Graph visualisation | Foxglove, Dear RosNodeViewer | Mature but live-only | -| **Runtime** | **Graph diff (expected vs. actual)** | **Nothing** | **Major gap** | -| **Static** | **Graph linting (pre-launch)** | **Nothing** | **Major gap** | -| **Static** | **QoS static analysis** | breadcrumb (partial) | Early-stage | -| **Static** | **CI graph validation** | **Nothing** | **Major gap** | -| **Docs** | **Node API documentation** | **Nothing** (hand-written only) | **Major gap** | -| — | **Behavioural properties** | **Nothing** (HPL was ROS 1) | **Major gap** | - ---- - -## 10. Prior Art - -Organized by what we borrow, not by framework. Each framework appears -once at its primary contribution. - -### 10.1 Schema Design - -#### AsyncAPI - -The closest structural match to ROS topics. Version 3 cleanly separates -channels, messages, operations, and components at the top level. - -**What to borrow:** -- **Structural separation.** `publishers`, `subscribers`, `services`, - `actions`, `parameters` as peer top-level sections. -- **`components` + `$ref` pattern.** Define QoS profiles or common - parameter sets once, reference everywhere. -- **Trait system.** Define a `reliable_sensor` trait with QoS settings, - apply to multiple publishers. Traits merge via JSON Merge Patch - (RFC 7386). -- **Protocol bindings.** Core schema stays middleware-agnostic; - DDS-specific QoS, Zenoh settings, or shared-memory config in a - `bindings:` block. 
-- **Parameterized addresses.** Topic name templates - (`sensors/{robot_name}/lidar`) map to ROS 2 namespace/remapping and - `${param:name}` syntax. - -**Gaps:** No services (as typed req/res pair), no actions, no -parameters, no lifecycle, no timers, no TF frames. Single-application -scope (which is actually the right scope for a node interface). - -#### Smithy (AWS) - -Protocol-agnostic interface definition language. Shapes decorated with -traits. - -**What to borrow:** -- **Typed, composable traits** for extensible metadata — the most - powerful metadata mechanism surveyed: - ``` - @qos(reliability: "reliable", depth: 10) - @lifecycle(managed: true) - @parameter_range(min: 0.0, max: 10.0) - @frame_id("base_link") - ``` -- **Mixins** for shared structure. A `lifecycle_diagnostics` mixin adds - a diagnostics publisher and period parameter to any node that - includes it. -- **Resource lifecycle operations** — maps to ROS 2 lifecycle node - transitions. - -#### CUE - -Constraint-based configuration language where types and values are the -same thing. Not a codegen tool — a validation tool. - -**What to borrow:** -- **Constraints as types.** `voxel_size: float & >=0.01 & <=1.0`. The - JSON Schema equivalent (`minimum`, `maximum`, `enum`) is already - used by the existing `interface.schema.yaml`. -- **Incremental constraints.** Base schema + deployment-specific - overlays (e.g., production QoS profiles layered onto a base - `interface.yaml`). -- **Configuration validation.** Validate that launch parameter - overrides are compatible with a node's declared interface. - -### 10.2 Pipeline & Code Generation - -#### Protocol Buffers / Buf CLI - -The single most important architectural lesson: **an intermediate -representation (IR) between parsing and generation**. 
- -``` -interface.yaml ──> [Parser/Validator] ──> InterfaceDescriptor (IR) - ├──> [Plugin: C++] ──> scaffolding - ├──> [Plugin: Python] ──> scaffolding - ├──> [Plugin: Docs] ──> API reference - └──> [Plugin: Launch] ──> templates -``` - -**What to borrow:** -- **IR-based plugin protocol.** Standalone executables consuming a - serialized `InterfaceDescriptor` via stdin/file. Community members - write `rosgraph-gen-rust` without touching the core codebase. -- **Config-driven generation** (`buf.gen.yaml` pattern): - ```yaml - version: 1 - plugins: - - name: cpp - out: generated/cpp - options: { lifecycle: managed } - - name: python - out: generated/python - ``` -- **Validation as separate layers.** Structural (does the YAML parse?) - → semantic (do referenced types exist?) → breaking (did the interface - change incompatibly?). Maps to `rosgraph lint`, `rosgraph validate`, - `rosgraph breaking`. -- **Deterministic, reproducible output.** Same inputs → byte-identical - output. CI can verify generated code is up to date. - -**What to borrow from Buf CLI specifically:** -- `buf lint` — configurable schema linting with ~50 rules by category. - Config-driven rule selection. -- `buf breaking` — breaking change detection between schema versions. -- Integrated toolchain: `buf generate`, `buf lint`, `buf breaking`, - `buf format` as subcommands of one tool. - -#### TypeSpec (Microsoft) - -**What to borrow:** -- **Multi-emitter architecture.** One spec, many outputs: - ``` - interface.yaml ──> C++ emitter ──> node_interface.hpp - ──> Python emitter ──> interface.py - ──> Docs emitter ──> node_api_reference.md - ──> Launch emitter ──> default_launch.py - ──> Graph emitter ──> rosgraph_monitor_msgs/NodeInterface - ``` -- **Emitter-specific validation.** Each emitter adds its own checks - (e.g., C++ emitter warns about names that produce invalid C++ - identifiers). 
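The emitter-specific validation idea above can be illustrated with a minimal Python sketch. It assumes a hypothetical C++ emitter that turns ROS names into member names by stripping and replacing slashes; `to_cpp_identifier`, `check_cpp_names`, and the abbreviated reserved-word list are illustrative names only, not part of rosgraph or cake.

```python
import re

# Illustrative subset of C++ reserved words; a real emitter would carry the full list.
CPP_RESERVED = frozenset(
    {"class", "default", "delete", "namespace", "new", "operator", "switch", "template"}
)
_CPP_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_cpp_identifier(ros_name):
    """Hypothetical mapping from a ROS name ('/scan/front') to a C++ member name."""
    return ros_name.strip("/").replace("/", "_")


def check_cpp_names(ros_names):
    """Return one warning string per name the C++ emitter could not use as-is."""
    warnings = []
    for name in ros_names:
        ident = to_cpp_identifier(name)
        if not _CPP_IDENTIFIER.fullmatch(ident):
            warnings.append(f"{name!r}: {ident!r} is not a valid C++ identifier")
        elif ident in CPP_RESERVED:
            warnings.append(f"{name!r}: {ident!r} is a reserved word in C++")
    return warnings


if __name__ == "__main__":
    for warning in check_cpp_names(["cmd_vel", "/scan/front", "class", "3d_points"]):
        print(warning)
```

A check like this belongs to the emitter rather than the core schema validator because the same name can be acceptable for one target language and unusable for another.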
- -#### OpenAPI - -**What to borrow:** -- **The "Swagger UI" experience.** Auto-generated interactive - documentation from a schema. A "Swagger UI for ROS nodes" where every - node has browsable API docs showing topics, services, actions, - parameters, QoS, and message type definitions — generated from - `interface.yaml`. -- **JSON Schema integration.** OpenAPI 3.1 aligned fully with JSON - Schema. The existing `interface.schema.yaml` (JSON Schema Draft - 2020-12) is the right foundation. - -### 10.3 Static Analysis Architecture - -#### Ruff - -A Python linter written in Rust. Relevant not for Python linting but as -the **best-in-class architecture for building a rule-based analysis -tool**. - -**What to borrow:** - -| Ruff pattern | rosgraph equivalent | -|---|---| -| Rule enum + compile-time registry | `Rule` enum: `TOP001`, `SRV001`, `QOS001`, `GRF001` | -| Hierarchical prefix codes | `TOP` (topic), `SRV` (service), `ACT` (action), `QOS`, `GRF` (graph) | -| Single-pass traversal | Build graph model once, run all rules in one walk | -| Safe/unsafe fix classification | Safe: add missing QoS. Unsafe: rename topic. Display-only: suggest restructure | -| Preview → stable lifecycle | Same graduation for new rules | -| Per-file-ignores | Per-package-ignores, per-launch-file-ignores | -| Inline suppression | `# rosgraph: noqa: TOP001` | -| SARIF output | GitHub Security tab integration | -| Monolithic, no plugins initially | All rules built-in. WASM plugins later | -| Zero-config defaults | Small, high-confidence default rule set | -| `--add-noqa` for gradual adoption | Essential for existing ROS workspaces | - -**Key architectural lesson:** Speed is an architectural property, not an -optimisation. Rust + hand-written parser + single-pass + parallel -package processing + content caching + compile-time codegen. - -#### Go Analysis Framework - -The gold standard for pluggable static analysis architecture. Used by -`go vet`, gopls, and golangci-lint. 
- -**What to borrow:** - -``` -GraphAnalyzer { - name: str - doc: str - requires: [GraphAnalyzer] # horizontal deps - result_type: Type | None # typed output for dependent analyzers - fact_types: [Fact] # cross-package facts - run: (GraphPass) → (result, [Diagnostic]) -} - -GraphPass { - graph: ComputationGraph # the full graph model - node: NodeInterface # current node under analysis - types: MessageTypeDB # all known msg/srv/action types - qos: QoSProfileDB # QoS profiles in the graph - result_of: {Analyzer: Any} # results from required analyzers - report: (Diagnostic) → void - import_fact: (scope, Fact) → bool - export_fact: (scope, Fact) → void -} -``` - -Key patterns: -1. **Analyzers as values, not subclasses** — trivially composable -2. **Pass as abstraction barrier** — same analyzer in CLI, IDE, CI -3. **Horizontal dependencies** via `Requires`/`ResultOf` — typed data - flow between analyzers -4. **Vertical facts** for cross-package analysis — cached per-package - results enabling separate modular analysis -5. **Action graph** — 2D grid (analyzer x package), independent actions - execute in parallel - -#### golangci-lint - -**What to borrow:** -- **Meta-linter pattern.** One CLI, one config, one output format - wrapping many analyzers. -- **Shared parse.** All analyzers share one AST/model parse. -- **Post-processing pipeline.** `noqa` filter → exclusion rules → - severity assignment → deduplication → output formatting. -- **Differential analysis.** `new-from-merge-base: main` reports only - issues in code changed since the base branch. Critical for CI - adoption in large codebases. - -#### Spectral - -**What to borrow:** -- **YAML-native lint rules** that work directly on `interface.yaml` - without language-specific parsing. Custom rulesets in YAML — a - robotics engineer can author a rule without knowing Rust or C++. - Low barrier to writing new rules. 
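As a rough illustration of the golangci-lint-style post-processing pipeline combined with Ruff-style prefix selection, the following Python sketch filters a list of diagnostics. The `Diagnostic` fields, the `post_process` function, and its arguments are assumptions made for this example; the proposal does not define these types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Diagnostic:
    code: str                  # rule code, e.g. "TOP001"
    location: str              # e.g. "pkg_a/interface.yaml:12"
    message: str
    suppressed: bool = False   # True when the source line carries a matching noqa comment
    severity: str = "warning"


def matches(code, selectors):
    """Prefix selection: 'TOP' selects every TOP rule, 'TOP001' exactly one."""
    return any(code.startswith(selector) for selector in selectors)


def post_process(diagnostics, select, ignore=(), severities=None):
    """noqa filter -> exclusion rules -> severity assignment -> deduplication."""
    severities = severities or {}
    seen = set()
    kept = []
    for diag in diagnostics:
        if diag.suppressed:                      # 1. noqa filter
            continue
        if not matches(diag.code, select) or matches(diag.code, ignore):
            continue                             # 2. exclusion rules
        diag = replace(diag, severity=severities.get(diag.code, diag.severity))  # 3. severity
        key = (diag.code, diag.location)
        if key in seen:                          # 4. deduplication
            continue
        seen.add(key)
        kept.append(diag)
    # Stable ordering keeps the output deterministic for CI diffs.
    return sorted(kept, key=lambda d: (d.location, d.code))
```

Keeping these stages outside the individual rules means a rule only has to report what it finds; selection, suppression, and severity stay configurable in one place.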
- -### 10.4 Runtime Monitoring Architecture - -#### OpenTelemetry - -Collector pipeline: Receiver → Processor → Exporter. Connectors join -pipelines and enable signal type conversion. - -**What to borrow:** -- **Pipeline architecture** for `rosgraph monitor`. -- **Auto-instrumentation.** Two complementary paths: - - *Runtime observation* (zero-code): DDS discovery provides the graph - without modifying any node. - - *Code-generated instrumentation*: rosgraph-generated code embeds - topic stats, heartbeats, structured logging. - - The **three-way comparison** (declared vs. runtime-observed vs. - self-reported) catches issues that any two-way comparison misses. - -#### Prometheus - -**What to borrow:** -- **Pull model.** Periodic scraping produces consistent point-in-time - snapshots. Absence of data is itself a signal (node is down). -- **Alerting rules** with `for` durations to prevent flapping. -- **Metric types mapping:** - - | Prometheus type | ROS topic statistics equivalent | - |---|---| - | Counter | Messages published (total), dropped messages | - | Gauge | Active subscribers, queue depth, alive nodes | - | Histogram | Inter-arrival times, message sizes, latency distribution | - -#### Kubernetes Controllers - -**What to borrow:** -- **Level-triggered reconciliation** (not edge-triggered). React to the - *current difference* between desired and actual state, not to - individual change events. If an event is missed, the next - reconciliation still catches the drift. -- **Idempotent.** Running reconciliation twice with the same state - produces the same diff and alerts. -- **Requeue with backoff.** After detecting drift, recheck sooner (1s). - If drift persists, escalate. -- **Status reporting.** Maintained separately from the declared spec, - enabling external tools to query current state independently. 
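A minimal Python sketch of level-triggered reconciliation with Prometheus-style `for` durations follows. It is an illustration under assumptions: the `Reconciler` class and its method are hypothetical, only node-level drift is modelled, and the grace periods are the `NodeMissing` and `UnexpectedNode` defaults from the alert table in §8.

```python
from dataclasses import dataclass, field

# Default grace periods in seconds, taken from the alert table in section 8.
GRACE_S = {"NodeMissing": 10.0, "UnexpectedNode": 30.0}


@dataclass
class Reconciler:
    declared: frozenset                               # node names from the declared spec
    first_seen: dict = field(default_factory=dict)    # (alert, node) -> time drift was first seen

    def reconcile(self, observed, now):
        """Compare the full declared and observed sets; return alerts past their grace period."""
        observed = set(observed)
        drift = {("NodeMissing", n) for n in self.declared - observed}
        drift |= {("UnexpectedNode", n) for n in observed - self.declared}
        # Forget drift that has healed, so a later recurrence restarts its grace period.
        self.first_seen = {k: t for k, t in self.first_seen.items() if k in drift}
        for key in drift:
            self.first_seen.setdefault(key, now)
        return sorted(k for k, t in self.first_seen.items() if now - t >= GRACE_S[k[0]])


if __name__ == "__main__":
    r = Reconciler(frozenset({"/planner", "/controller"}))
    print(r.reconcile({"/planner"}, now=0.0))    # still inside the grace period
    print(r.reconcile({"/planner"}, now=10.0))   # NodeMissing fires for /controller
```

Because each call works from the complete current state rather than from individual change events, a missed discovery event does not hide drift, and calling `reconcile` twice with the same inputs yields the same alerts.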
- -### 10.5 Contract Testing & Verification - -| Framework | What it does | What to borrow for `rosgraph test` | -|---|---|---| -| **Schemathesis** | Fuzz a live API against its OpenAPI spec. Auto-generates test cases from schema. | Fuzz a running node against `interface.yaml` — auto-generate messages matching declared types, verify outputs. | -| **Dredd** | Start a live server, send requests matching the spec, validate responses. The spec IS the test plan. | Run a node, systematically verify its interface matches declaration. Call every service, check every publisher. | -| **Pact** | Consumer-driven contract testing. Consumer declares expectations; provider verifies. | Cross-node contract verification: Node A subscribes to `/cmd_vel` (Twist), Node B publishes it. Verify they agree on type. | -| **gRPC health + reflection** | Standardized health checking + runtime introspection of services/methods. | Health reporting interface that rosgraph-generated nodes expose automatically. Runtime introspection vs. declared interface. | -| **graphql-inspector** | Schema diff (breaking/dangerous/safe). Coverage: which fields are actually queried. | Interface coverage: "which declared topics are exercised in tests?" Schema diff between interface versions. | - -### 10.6 ROS Domain Prior Art: HAROS - -The High-Assurance ROS framework (University of Minho, 2016–2021). The -only tool that accomplished Goals 3–4 for ROS, but only for ROS 1. - -**Pipeline:** Package discovery → CMake parsing → launch file parsing → -source code parsing (libclang for C++, limited Python AST) → -computation graph assembly → plugin-based analysis → JSON export. - -**The metamodel.** Formal classes for the ROS graph: `Node`, -`NodeInstance`, `Topic`, `Service`, `Parameter`, plus typed link classes -(`PublishLink`, `SubscribeLink`, etc.) carrying source conditions and -dependency sets. This metamodel is HAROS's most transferable -contribution. 
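To make the metamodel idea concrete, here is a small Python sketch of typed link classes and one model-level check built on them. The field names and the `topic_type_mismatches` helper are assumptions for illustration; they do not reproduce HAROS's actual class definitions, which also carry source conditions and dependency sets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishLink:
    node: str       # publishing node, e.g. "/teleop"
    topic: str      # fully resolved topic name
    msg_type: str   # e.g. "geometry_msgs/msg/Twist"


@dataclass(frozen=True)
class SubscribeLink:
    node: str
    topic: str
    msg_type: str


def topic_type_mismatches(links):
    """Return {topic: sorted message types} for topics whose links disagree on type."""
    types_by_topic = defaultdict(set)
    for link in links:
        types_by_topic[link.topic].add(link.msg_type)
    return {
        topic: sorted(types)
        for topic, types in types_by_topic.items()
        if len(types) > 1
    }


if __name__ == "__main__":
    graph = [
        PublishLink("/teleop", "/cmd_vel", "geometry_msgs/msg/Twist"),
        SubscribeLink("/base", "/cmd_vel", "geometry_msgs/msg/TwistStamped"),
        PublishLink("/lidar", "/scan", "sensor_msgs/msg/LaserScan"),
    ]
    print(topic_type_mismatches(graph))
```

Once links are plain typed values, a check such as the topic type mismatch rule becomes a short function over the model, independent of how the links were extracted.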
- -**HPL (HAROS Property Language).** Behavioural properties for -message-passing systems: -``` -globally: no /cmd_vel {linear.x > 1.0} # speed limit -globally: /bumper causes /stop_cmd # response -globally: /cmd_vel requires /trajectory within 5s # precedence -``` - -HPL drove three verification paths from a single spec: model checking -(Electrum/Alloy), runtime monitors (generated), and property-based -testing (Hypothesis strategies). - -**Why it died for ROS 2.** The extraction pipeline assumes catkin, -`rospack`, XML launch files, `ros::NodeHandle`. ROS 2 changed -everything. The maintainer closed ROS 2 support as *wontfix*. - -**What to borrow:** Metamodel, HPL's scope+pattern+event structure, -plugin separation (source-level vs. model-level), one spec → multiple -verification modes. - -**What to do differently:** Use declarations (`interface.yaml`) as -primary source of truth (not source code parsing); support ROS 2 -concepts HAROS never had (QoS, lifecycle, components, actions, DDS -discovery). - ---- - -## 11. Safety & Certification - -rosgraph is not a safety tool — it is a development and verification -tool that produces artifacts useful in safety cases. This section maps -rosgraph capabilities to the evidence types required by safety -standards. 
- -### 11.1 Relevant Standards - -| Standard | Domain | How rosgraph helps | -|---|---|---| -| **IEC 61508** | General functional safety | Design verification evidence (graph analysis), runtime monitoring | -| **ISO 26262** | Automotive | Interface specification (`interface.yaml` as design artifact), static verification | -| **IEC 62304** | Medical device software | Software architecture documentation, traceability | -| **DO-178C** | Aerospace | Requirements traceability, structural coverage analysis | -| **ISO 13482** | Service robots | Interface documentation, runtime monitoring | -| **ISO 21448 (SOTIF)** | Safety of intended functionality | Graph analysis for identifying missing/unexpected interfaces | - -### 11.2 Artifact-to-Evidence Mapping - -| rosgraph artifact | Evidence type | Useful for | -|---|---|---| -| `interface.yaml` | Software architecture description | Design phase documentation | -| `rosgraph lint` SARIF output | Static analysis results | Verification evidence | -| `rosgraph monitor` logs | Runtime verification evidence | Validation phase | -| `rosgraph test` results | Interface conformance evidence | Integration testing | -| `rosgraph breaking` output | Change impact analysis | Change management | -| `rosgraph docs` output | API documentation | Design review | - -### 11.3 Configurable Safety Levels - -Monitor alert grace periods (§8) and severity levels must be -configurable for safety-critical deployments: - -```toml -[monitor.alerts] -NodeMissing = { grace_period_ms = 1000, severity = "critical" } # 1s for surgical robot -UnexpectedNode = { grace_period_ms = 5000, severity = "error" } -TopicMissing = { grace_period_ms = 500, severity = "critical" } -``` - -The defaults in §8 are tuned for general robotics. Safety-critical -deployments override them via `rosgraph.toml`. - -### 11.4 Behavioral Properties (Future) - -Structural analysis (Phase 1–2) proves the graph is correctly wired — -a necessary precondition for behavioral safety. 
Behavioral analysis -(Phase 3+) proves temporal and causal properties: - -``` -globally: /emergency_stop causes /motor_disable within 100ms -globally: no /cmd_vel {linear.x > max_speed} -globally: /heartbeat absent_for 500ms causes /safe_stop -``` - -This capability, inspired by HAROS HPL (§10.6), is where the deeper -safety value lies. The structural graph model (§3.1) is designed to -be extensible to behavioral annotations without schema redesign. - -### 11.5 Safety-Relevant Lint Rules (Future) - -| Rule | Description | Phase | -|---|---|---| -| `SAF001` | Critical subscriber has < N publishers (no redundancy) | 2 | -| `SAF002` | Single point of failure in graph topology | 2 | -| `SAF003` | Safety-critical node is not lifecycle-managed | 2 | -| `TF001` | Declared `frame_id` not published by any node in graph | 2 | -| `TF002` | Frame chain broken (no transform path between declared frames) | 3 | - -These rules are not in Phase 1 but the analyzer architecture (§3.5) -supports adding them without architectural changes. - ---- - -## 12. Scope & Limitations - -### When Not to Use rosgraph - -rosgraph adds value when the cost of interface bugs exceeds the cost -of maintaining declarations. This trade-off favors rosgraph in -multi-node systems, team environments, and production deployments. -It does not favor rosgraph in every context: - -- **Quick prototyping.** If you're experimenting with a single node - and will throw it away next week, `interface.yaml` is overhead. - Use standard `rclcpp` / `rclpy` directly. -- **Single-node packages.** A package with one node and no - cross-package interfaces gets minimal lint value. The code - generation may still be worthwhile for parameter validation. -- **Highly dynamic interfaces.** Nodes that create publishers and - subscribers at runtime based on dynamic conditions (e.g., a - plugin host that discovers its interface at startup) are outside - scope (DP12). 
rosgraph can declare the static portion and flag
-  the dynamic portion as unexpected, but it cannot generate code
-  for interfaces it doesn't know about at build time.
-
-### Known Limitations
-
-**Spec-code drift for business logic.** Code generation covers the
-structural skeleton (pub/sub creation, parameter declaration, lifecycle
-transitions). Business logic is hand-written. If a developer adds an
-undeclared publisher inside a callback, `rosgraph lint` won't catch it
-at build time — only `rosgraph monitor` flags it at runtime as
-`UnexpectedTopic`. This is a fundamental limitation of any
-declaration-based approach: the declaration describes the intended
-interface, not the implementation.
-
-**Launch file coverage.** Python launch files are Turing-complete.
-AST pattern matching (§3.5) handles common declarative patterns but
-cannot resolve dynamic logic (conditionals based on environment
-variables, loops generating node sets). `system.yaml` (Layer 2) is
-the escape hatch for systems that need full static analyzability.
-
-**Ecosystem bootstrapping.** rosgraph's cross-package analysis (type
-mismatch detection, contract testing) requires multiple packages to
-have `interface.yaml`. The single-package value proposition is code
-generation and parameter validation. Cross-package value grows with
-adoption. `rosgraph discover` (§3.10) lowers the barrier by generating
-specs from running systems, but the generated specs require human
-review and refinement.
-
-**Scope of this proposal.** This document covers 51 features across
-7 subcommands. Not all will be built. Phase 1 (§4) is the commitment
-— the minimum viable tool that delivers value. Later phases are
-contingent on adoption and contributor capacity.
-

From f27121fee8fc4078f5c2cc928ffde14bd9d21f84 Mon Sep 17 00:00:00 2001
From: Luke Sy
Date: Fri, 13 Mar 2026 03:47:30 +1100
Subject: [PATCH 4/5] Remove FAQ.md to simplify CLI reference maintenance

Signed-off-by: Luke Sy
---
 docs/FAQ.md | 417 ----------------------------------------------------
 1 file changed, 417 deletions(-)
 delete mode 100644 docs/FAQ.md

diff --git a/docs/FAQ.md b/docs/FAQ.md
deleted file mode 100644
index 583cd5c..0000000
--- a/docs/FAQ.md
+++ /dev/null
@@ -1,417 +0,0 @@
-# rosgraph — Frequently Asked Questions
-
-> **Parent:** [ROSGRAPH.md](ROSGRAPH.md) (technical proposal)
-
-Organized by who's asking. Find your perspective, jump to the
-questions that matter to you.
-
----
-
-## Table of Contents
-
-0. [General](#0-general)
-1. [New ROS Developer](#1-new-ros-developer)
-2. [Engineering Lead / System Integrator / DevOps](#2-engineering-lead--system-integrator--devops)
-3. [MoveIt / nav2 / Popular Module User](#3-moveit--nav2--popular-module-user)
-4. [AI-Assisted Developer](#4-ai-assisted-developer)
-5. [Package Maintainer / ROS Governance](#5-package-maintainer--ros-governance)
-6. [Educator / University Researcher](#6-educator--university-researcher)
-7. [Embedded / Resource-Constrained Developer](#7-embedded--resource-constrained-developer)
-8. [The Skeptic](#8-the-skeptic)
-9. [Safety-Critical Engineer](#9-safety-critical-engineer)
-
----
-
-## 0. General
-
-### What problem does rosgraph solve?
-
-When you connect ROS 2 nodes together, mistakes are invisible. If one
-node sends a `Twist` message but another node expects a
-`TwistStamped`, nothing warns you — the subscriber just never receives
-data. If you misspell a topic name in a launch file, the node launches
-fine but sits there doing nothing. You end up staring at
-`ros2 topic list` wondering why nothing is connected.
-
-rosgraph catches these wiring mistakes before you even launch your
-system.
You describe what each node publishes, subscribes to, and what
-settings it needs in a short YAML file. Then `rosgraph lint` checks
-that everything fits together — like a spell checker, but for your
-ROS graph.
-
-See [ROSGRAPH.md §1, "The Problem,
-Concretely"](ROSGRAPH.md#the-problem-concretely) for four real-world
-examples.
-
-### How much do I need to learn?
-
-One file per node (`interface.yaml`, about 15 lines) and three
-commands:
-
-```bash
-rosgraph generate . # creates starter code from your YAML
-rosgraph lint . # checks for wiring mistakes
-rosgraph monitor # watches the running system for problems
-```
-
-Your editor will autocomplete the YAML fields for you — no need to
-memorize the format. See the [Quick
-Start](ROSGRAPH.md#quick-start-what-it-looks-like) for a complete
-example.
-
-### What's the overhead?
-
-Per node: one `interface.yaml` file (~15-30 lines). Most of it is
-information you're already specifying in code (topic names, message
-types, QoS settings, parameter names) — `interface.yaml` centralizes
-it.
-
-What you get back:
-- No pub/sub boilerplate (generated)
-- No parameter declaration boilerplate (generated via
-  `generate_parameter_library`)
-- Pre-launch graph validation
-- Runtime graph monitoring
-- Auto-generated API documentation
-
-The net line-count change is typically negative for nodes with
-parameters.
-
-### What about my launch files and parameter configs?
-
-`system.yaml` (Layer 2) overlaps heavily with both — all three
-describe which nodes run, with what parameters, and with what
-remappings. The long-term direction is convergence: `system.yaml`
-becomes the graph spec, the parameter config, *and* the launch
-description in one file. `rosgraph generate` emits a runnable launch
-file from the same spec that `rosgraph lint` validates — no drift
-between what you analyze and what you run.
-
-For projects with multiple deployment configurations (sim, real, test),
-each gets its own `system.yaml`, replacing both the per-config launch
-file and the per-config parameter YAML. See [ROSGRAPH.md
-§3.2](ROSGRAPH.md#32-schema-layers).
-
-### Won't the spec just drift from reality like NoDL?
-
-NoDL died because it was a pure description format — no code
-generation. Maintaining a spec that doesn't produce anything is
-thankless work.
-
-`interface.yaml` generates code. If you change the spec, the generated
-code changes. If you change the code without changing the spec,
-`rosgraph monitor` flags the discrepancy at runtime. The two-way
-binding (codegen + runtime monitoring) is what prevents the drift
-that killed NoDL.
-
-The honest limitation: business logic is hand-written. If a developer
-adds an undeclared publisher inside a callback, `rosgraph lint` won't
-catch it at build time. `rosgraph monitor` catches it at runtime as
-`UnexpectedTopic`. See [ROSGRAPH.md
-§12](ROSGRAPH.md#12-scope--limitations).
-
----
-
-## 1. New ROS Developer
-
-### What does rosgraph do for me?
-
-- **Writes the repetitive code.** Creating publishers, subscribers,
-  and declaring parameters — `rosgraph generate` handles this from
-  your YAML file. You write only the interesting part (what your node
-  actually *does*).
-- **Catches mistakes early.** Mismatched message types, misspelled
-  topic names, incompatible connection settings — found in seconds,
-  not after a 30-second launch-debug-relaunch cycle.
-- **Keeps settings in one place.** Parameter names, types, and default
-  values live in `interface.yaml` instead of scattered across your
-  code, launch files, and README.
-
-### Will error messages make sense?
-
-Yes — this is a design priority. Each error tells you:
-
-- **Where:** which file and line has the problem
-- **What:** a plain description of what's wrong
-- **How to fix it:** a suggested correction, auto-applied when safe
-
-No cryptic stack traces.
No silent failures. See [ROSGRAPH.md
-§10.3](ROSGRAPH.md#103-static-analysis-architecture) for the error
-design.
-
----
-
-## 2. Engineering Lead / System Integrator / DevOps
-
-### How does this scale to hundreds of packages?
-
-- **Lint performance target:** 100 packages in under 5 seconds
-  (Design Principle 7). Analysis is single-pass over the graph model
-  with parallel per-package processing and content caching.
-- **Multi-workspace analysis:** Installed `interface.yaml` files in
-  underlays serve as cached facts. Only your workspace is analyzed,
-  not the entire underlay. See [ROSGRAPH.md
-  §3.12](ROSGRAPH.md#312-multi-workspace-analysis).
-- **Differential analysis:** `--new-only` reports only issues
-  introduced since the base branch. No noise from existing code.
-
-### I compose nodes from multiple vendors. How does rosgraph help?
-
-`system.yaml` (Layer 2 schema, [ROSGRAPH.md
-§3.2](ROSGRAPH.md#32-schema-layers)) declares the intended system
-composition — which nodes, which namespaces, which parameter overrides,
-which remappings. `rosgraph lint` validates the composed graph:
-
-- **Type mismatches** across package boundaries
-- **QoS incompatibilities** between a vendor's publisher and your
-  subscriber
-- **Disconnected subgraphs** — nodes that should be connected but
-  aren't due to a namespace or remapping error
-
-If a vendor doesn't ship `interface.yaml`, use `rosgraph discover`
-([ROSGRAPH.md
-§3.10](ROSGRAPH.md#310-rosgraph-discover--runtime-to-spec-generation))
-to generate one from a running instance of the vendor's node.
-
-### How does rosgraph fit into CI?
-
-rosgraph is CI-first by design (Design Principle 8):
-
-```yaml
-# GitHub Actions example
-- name: Lint graph
-  run: rosgraph lint .
--output-format sarif --new-only --base main
-
-- name: Check breaking changes
-  run: rosgraph breaking --base main
-
-- name: Run contract tests
-  run: rosgraph test
-```
-
-Output formats: `text`, `json`, `sarif` (GitHub Security tab),
-`github` (Actions annotations), `junit` (test reports). See
-[ROSGRAPH.md §3.11](ROSGRAPH.md#311-configuration).
-
----
-
-## 3. MoveIt / nav2 / Popular Module User
-
-### Does rosgraph work with nav2's plugin system?
-
-Yes, via the mixin system ([ROSGRAPH.md
-§3.2](ROSGRAPH.md#32-schema-layers)). Plugins that inject interfaces
-into a host node are declared as mixins:
-
-```yaml
-# nodes/follow_path/interface.yaml
-node:
-  name: follow_path
-  package: nav2_controller
-
-mixins:
-  - ref: dwb_core/dwb_local_planner
-  - ref: nav2_costmap_2d/costmap
-```
-
-The host's effective interface = its own declaration + all mixin
-interfaces merged. Mixins are Phase 2 (G15). Phase 1 works for nodes
-without plugins.
-
-### What about `generate_parameter_library` compatibility?
-
-Full compatibility is a non-negotiable design principle ([ROSGRAPH.md
-§2, DP9](ROSGRAPH.md#2-design-principles)). The `parameters:` section
-of `interface.yaml` IS the `generate_parameter_library` format. A
-standalone gen_param_lib YAML file works as-is when placed in
-`interface.yaml`. rosgraph delegates to gen_param_lib at build time.
-See [ROSGRAPH.md §9.2](ROSGRAPH.md#92-tool-assessments).
-
----
-
-## 4. AI-Assisted Developer
-
-### How does rosgraph work with AI coding tools?
-
-`interface.yaml` is a machine-readable contract — exactly what LLMs
-are good at consuming and generating. The `InterfaceDescriptor` IR
-([ROSGRAPH.md §3.3](ROSGRAPH.md#33-the-interfacedescriptor-ir)) is a
-JSON blob containing a node's complete API: topics, types, QoS,
-parameters, lifecycle state. An AI agent reads this to understand what
-a node does, generate implementation code, write tests, or suggest
-fixes — without parsing C++ or Python source.
-
-See [ROSGRAPH.md §3.13](ROSGRAPH.md#313-ai--tooling-integration) for
-the full AI integration design.
-
-### Can I use `rosgraph generate` as an agent tool?
-
-Yes. An AI agent writing a ROS node can:
-1. Generate `interface.yaml` from a natural language description
-2. Run `rosgraph generate .` as a tool call to get type-safe
-   scaffolding
-3. Write only the business logic into the generated skeleton
-4. Run `rosgraph lint .` to verify the graph is correct
-
-This avoids the common failure mode of LLMs hallucinating ROS
-boilerplate (wrong QoS defaults, missing component registration,
-incorrect parameter declaration).
-
----
-
-## 5. Package Maintainer / ROS Governance
-
-### Do I have to adopt rosgraph to be compatible with it?
-
-No. Packages without `interface.yaml` are skipped, not errored (Design
-Principle 6). Downstream users can run `rosgraph discover` against your
-running node to generate a spec for their own use. Your package doesn't
-need to ship `interface.yaml` for others to benefit — though shipping
-one is much better, since discovered specs require human review and may
-miss QoS details.
-
-### What's the adoption path toward `ros_core`?
-
-Deliberately incremental ([ROSGRAPH.md §4, "Adoption
-Path"](ROSGRAPH.md#adoption-path)):
-
-1. **`ros-tooling` organization** — institutional backing, CI
-   infrastructure, release process.
-2. **REP for `interface.yaml` schema** — formalizes the declaration
-   format as a community standard, independent of the rosgraph tool.
-3. **docs.ros.org tutorial integration** — if "write your first node"
-   uses `interface.yaml`, every new ROS developer learns it from day
-   one.
-4. **`ros_core` proposal** — after demonstrated adoption across
-   multiple distros.
-
-### Why not extend existing tools instead?
-
-Each existing tool covers one capability but none covers the full
-scope.
The gap analysis ([ROSGRAPH.md
-§9.3](ROSGRAPH.md#93-gap-analysis)) shows five major gaps: graph diff,
-graph linting, QoS static analysis, behavioral properties, and CI graph
-validation. No single existing tool can be extended to fill all five.
-
-rosgraph builds on existing work where possible:
-- `generate_parameter_library` for parameters (used as-is)
-- `rosgraph_monitor_msgs` for runtime message definitions (adopted)
-- cake's design decisions for code generation (validated)
-- HAROS's metamodel for the graph model (adapted)
-
----
-
-## 6. Educator / University Researcher
-
-### Can I use rosgraph for teaching ROS 2?
-
-Yes. The Quick Start
-([ROSGRAPH.md §1](ROSGRAPH.md#quick-start-what-it-looks-like))
-shows a complete workflow in 3 commands. For teaching,
-`interface.yaml` forces students to think about their node's API
-before writing implementation code — topics, types, QoS, parameters.
-This is better pedagogy than copy-pasting publisher boilerplate and
-tweaking it.
-
-### How does rosgraph relate to HAROS?
-
-HAROS ([ROSGRAPH.md §10.6](ROSGRAPH.md#106-ros-domain-prior-art-haros))
-was the prior art for graph analysis in ROS — built at the University
-of Minho (2016–2021). rosgraph borrows HAROS's metamodel and HPL
-property language concepts, but differs fundamentally:
-
-- **HAROS extracted interfaces from source code.** rosgraph uses
-  explicit declarations (`interface.yaml`).
-- **HAROS was ROS 1 only.** rosgraph is built for ROS 2 concepts:
-  QoS, lifecycle, components, actions, DDS discovery.
-- **HAROS died because extraction broke.** catkin → ament, rospack →
-  colcon, XML launch → Python launch. Declaration-based tools don't
-  break when the build system changes.
-
----
-
-## 7. Embedded / Resource-Constrained Developer
-
-### Does rosgraph add runtime overhead to my nodes?
-
-The generated code uses a composition pattern (has-a `Node`, not is-a
-`Node`). This adds one pointer indirection — single nanoseconds.
The
-generated pub/sub wrappers are thin forwarding calls. No virtual
-dispatch is added beyond what the ROS client library already uses.
-
-Parameter validation (via `generate_parameter_library`) runs at
-parameter-set time, not in the hot path. See [ROSGRAPH.md §3.4,
-"Design decisions"](ROSGRAPH.md#34-rosgraph-generate--code-generation).
-
-### Does `rosgraph monitor` run on the robot?
-
-Yes, but it's optional. `rosgraph monitor` is a separate process — it
-doesn't instrument or modify your nodes. If your platform can't spare
-the resources, don't run it. You still get full value from build-time
-tools (`rosgraph generate`, `rosgraph lint`).
-
-Runtime targets ([ROSGRAPH.md
-§3.14](ROSGRAPH.md#314-scale--fleet-considerations)):
-- Memory: < 50MB resident
-- CPU: < 5% of one core at steady-state (5s scrape interval)
-- No additional DDS traffic beyond standard discovery
-
----
-
-## 8. The Skeptic
-
-### This proposal has 51 features. Is this realistic?
-
-Phase 1 ([ROSGRAPH.md §4](ROSGRAPH.md#4-phasing)) is the commitment:
-~12 features covering core schema, basic code generation, and
-highest-value lint and monitor rules. Later phases are contingent on
-adoption.
-
-The tool builds on existing work — cake for code generation,
-`generate_parameter_library` for parameters, `graph-monitor` message
-definitions for runtime. Phase 1 is stabilizing and unifying existing
-pieces, not building from scratch.
-
-### When should I NOT use rosgraph?
-
-- **Quick prototyping** — single throwaway node, not worth the file.
-- **Single-node packages** — minimal lint value, though codegen may
-  still save boilerplate.
-- **Highly dynamic interfaces** — nodes that create/destroy publishers
-  at runtime based on conditions can't be fully declared.
-
-See [ROSGRAPH.md §12, "When Not to Use
-rosgraph"](ROSGRAPH.md#when-not-to-use-rosgraph).
-
----
-
-## 9. Safety-Critical Engineer
-
-### Does rosgraph help with certification?
-
-rosgraph is not a safety tool — it's a development and verification
-tool that produces artifacts useful in safety cases. See [ROSGRAPH.md
-§11](ROSGRAPH.md#11-safety--certification).
-
-Key artifacts:
-
-| rosgraph artifact | Evidence type |
-|---|---|
-| `interface.yaml` | Software architecture description |
-| `rosgraph lint` SARIF output | Static analysis results |
-| `rosgraph monitor` logs | Runtime verification evidence |
-| `rosgraph test` results | Interface conformance evidence |
-| `rosgraph breaking` output | Change impact analysis |
-
-### What about behavioral properties?
-
-Phase 1-2 covers structural properties: type matches, QoS
-compatibility, graph connectivity. Behavioral analysis (Phase 3+) adds
-temporal and causal properties, inspired by HAROS HPL:
-
-```
-globally: /emergency_stop causes /motor_disable within 100ms
-globally: /heartbeat absent_for 500ms causes /safe_stop
-```
-
-See [ROSGRAPH.md §11.4](ROSGRAPH.md#114-behavioral-properties-future).

From 309c4046de2d61290dcb17d49a4b3513f440eb31 Mon Sep 17 00:00:00 2001
From: Luke Sy
Date: Fri, 20 Mar 2026 02:09:03 +1100
Subject: [PATCH 5/5] Address alistair-english PR feedback on ROSGRAPH.md

- Add topic rename as a concrete problem example
- Reframe discovery vs monitoring as same mechanism, different cadence
- Collapse Key Insights to the one load-bearing point (codegen)
- Remove node/package keys from interface.yaml example
---
 docs/ROSGRAPH.md | 39 ++++++++++++++-------------------------
 1 file changed, 14 insertions(+), 25 deletions(-)

diff --git a/docs/ROSGRAPH.md b/docs/ROSGRAPH.md
index 9f452e1..9bdfb95 100644
--- a/docs/ROSGRAPH.md
+++ b/docs/ROSGRAPH.md
@@ -36,6 +36,9 @@ Today in ROS 2:
 - You rename a parameter. Three launch files reference the old name.
   `colcon build` succeeds. The system launches. The parameter silently
   takes its default value.
+- You rename a topic from `/cmd_vel` to `/cmd`. Several downstream
+  nodes subscribed to the old name silently receive nothing.
There is
+  no static analysis to tell you what depended on it.

 These are real, common bugs in production ROS 2 systems.

@@ -58,8 +61,10 @@ are designed as independent, composable libraries.
 3. **Runtime Discovery** — introspect a running system and produce NoDL
    specs from observed nodes. Enables brownfield adoption: point at an
    existing system, generate `interface.yaml` files for every node, then
-   iteratively refine them. Unlike runtime monitoring (component 5),
-   discovery is a one-time migration tool, not a continuous process.
+   iteratively refine them. Discovery and runtime monitoring (component 5)
+   share the same mechanism — observe the live graph, produce a spec,
+   diff against declared. The distinction is cadence: one-time migration
+   vs. continuous verification.

 4. **Node-level Unit Testing** — verify a single node conforms to its
    declared spec in isolation.
@@ -76,27 +81,14 @@ are designed as independent, composable libraries.

 > **Open question:** implementation language for the generator tooling.

-### Key Insights
+### Key Insight

-Three key insights drive the design:
-
-1. **The ROS computation graph is not source code — it is a typed,
-   directed graph with QoS-annotated edges.** Analysis tools should
-   operate on a graph model, not on ASTs. Source code parsing is a
-   loader that feeds the model, not the analysis target.
-
-2. **Verification and analysis are schema conformance problems**
-   ("does reality match the spec?"), not traditional program analysis.
-   Once you have a machine-readable spec (`interface.yaml`),
-   verification falls out naturally — the same pattern as `buf lint`,
-   Pact contract tests, and Kubernetes reconciliation.
-
-3. **A declaration without code generation is a non-starter.** NoDL
-   proved this. The schema must generate code, documentation, and
-   validation to stay in sync with reality.
`interface.yaml` is
-   simultaneously the source for code generation, the lint target for
-   static analysis, the contract for runtime verification, and the
-   reference for documentation.
+**A declaration without code generation is a non-starter.** NoDL
+proved this. The schema must generate code, documentation, and
+validation to stay in sync with reality. `interface.yaml` is
+simultaneously the source for code generation, the lint target for
+static analysis, the contract for runtime verification, and the
+reference for documentation.

 ### Example

@@ -104,9 +96,6 @@ A minimal `interface.yaml`:

 ```yaml
 schema_version: "1.0"
-node:
-  name: talker
-  package: demo_pkg

 publishers:
   - topic: ~/chatter