
feat(ros-z-dds): add ros-z-bridge-dds — native DDS↔Zenoh bridge#176

Open
YuanYuYuan wants to merge 48 commits into main from dev/ros2dds-bridge



YuanYuYuan (Collaborator) commented May 8, 2026

Summary

  • Add ros-z-dds library crate exposing a DDS↔Zenoh bridge as a first-class ros-z component
  • Add ros-z-bridge-dds binary (thin CLI shell over ros-z-dds) connecting existing DDS-based ROS 2 nodes to a Zenoh/ros-z network without recompilation or code changes on either side
  • Full pub/sub, service, and action bridging with correct CDR framing, QoS mapping, and TRANSIENT_LOCAL support
  • Bridge-to-bridge federation across DDS domains via Zenoh liveliness tokens
  • ros_discovery_info publishing so ros2 topic/node/service list works on both sides
  • Documentation chapter and runnable examples: https://zettascalelabs.github.io/ros-z/pr-preview/pr-176/user-guide/dds-bridge/

Key Changes

  • crates/ros-z-dds/ — new library crate with ZDdsBridge (auto-discovery), ZDdsPubBridge, ZDdsSubBridge, ZDdsServiceBridge, ZDdsClientBridge, DdsBridgeExt typed-bridge trait
  • crates/ros-z-bridge-dds/ — thin CLI binary wiring ros-z-dds to clap args
  • docs/user-guide/dds-bridge.md — full documentation chapter including migration guide from zenoh-plugin-ros2dds
  • crates/ros-z-dds/examples/auto_bridge.rs and custom_bridge.rs runnable examples
  • 42 protocol-level unit tests and 6 rmw_cyclonedds_cpp interop tests

Fixes zenoh-plugin-ros2dds issues

#697 — Publications not delivered to remote_api WebSocket subscribers
Fixed by architecture. The old plugin ran embedded inside zenohd alongside remote_api; its internal publishers did not propagate to co-loaded plugin subscribers. This bridge is a standalone process with its own Zenoh client session. It publishes through the router like any other Zenoh client — WebSocket subscribers on the router receive the data normally.

#690 — Route drops when many DDS participants join simultaneously
Likely fixed. The old plugin had a transient-local route renegotiation step that could tear down an existing route under simultaneous participant joins and fail to re-establish it. This bridge has no renegotiation step: each DDS endpoint is independently tracked by GID. Adding new endpoints never disturbs existing routes. TRANSIENT_LOCAL caching is handled by AdvancedPublisher independently of route lifecycle.

#647 — Service reply dropped due to wrong client_guid in CDR framing
Fixed. The old plugin embedded an incorrect client_guid in the CDR request body, so the DDS server's reply was addressed to an unrecognized GUID and dropped. This bridge derives client_guid from req_writer.instance_handle() — the actual DDS writer GID — so the server addresses its reply correctly. In-flight queries are also tracked by a locally-generated sequence number, making response matching robust across DDS vendor differences.

#642 — Services silently fail in bridge → router → bridge topology
Fixed. The old plugin ran embedded in zenohd, creating ambiguity in how query replies were routed back through the same session. This bridge runs in client mode against an external router — queries travel client→router→client with clean routing symmetry. Queryables are declared .complete(true) so the router propagates them as capable service endpoints. Default service timeout is 10 s (vs the old 5 s); action get_result has its own 300 s timeout.

#576 — Namespace doesn't filter subscriptions; other users' traffic bleeds through
Fixed. The old plugin's --namespace only scoped outbound Zenoh publications; the bridge still subscribed to the full keyspace and re-injected all traffic into local DDS. In this bridge the namespace is part of every Zenoh key expression — both publications and subscriptions. A bridge with namespace /botA subscribes only to 0/botA/** keys and never receives publications from /botB.
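
As a rough illustration of the namespace scoping described for #576, the sketch below builds a `{domain}/{namespace}/**` pattern and checks keys against it. The function names, the signature, and the simplified matcher are assumptions made for this example; the real bridge relies on Zenoh's own key-expression matching, which supports far more than a trailing `/**`.

```rust
/// Build the subscription pattern for a namespaced bridge.
/// The `{domain}/{namespace}/**` layout follows the `0/botA/**` example in the
/// PR text; the function name and signature are hypothetical.
fn subscription_pattern(domain_id: u32, namespace: &str) -> String {
    let ns = namespace.trim_matches('/');
    if ns.is_empty() {
        format!("{}/**", domain_id)
    } else {
        format!("{}/{}/**", domain_id, ns)
    }
}

/// Simplified stand-in for Zenoh key-expression matching.
/// Only handles patterns that end in `/**`; anything else must match exactly.
fn matches(pattern: &str, key: &str) -> bool {
    match pattern.strip_suffix("/**") {
        Some(prefix) => {
            key == prefix
                || key
                    .strip_prefix(prefix)
                    .map_or(false, |rest| rest.starts_with('/'))
        }
        None => pattern == key,
    }
}

fn main() {
    let bot_a = subscription_pattern(0, "/botA");
    println!("pattern: {}", bot_a);
    println!("botA key matches: {}", matches(&bot_a, "0/botA/chatter"));
    println!("botB key matches: {}", matches(&bot_a, "0/botB/chatter"));
}
```

Because the namespace is part of the pattern itself, traffic published under `/botB` never reaches a bridge scoped to `/botA`.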

#570 — Duplicate messages accumulate on each bridge restart
Fixed. The old plugin left stale route/liveliness entries on disconnect; each reconnect added a new route without removing the old ones. This bridge cleans up both sides: CycloneDDS undiscovery events trigger GID-based route removal; Zenoh liveliness tokens are tied to the session and automatically retracted on disconnect, causing the peer bridge to retire federation routes via SampleKind::Delete handling. A restarted bridge starts clean with a new ZID.

Reproducible Demos

Three self-contained Docker/Podman gists, each consisting of two files (a Dockerfile and run.sh), that build everything from source and run an automated end-to-end PASS/FAIL check:

| Gist | What it demonstrates |
| --- | --- |
| sub-bridge-wildcard | ros-z talker → bridge → DDS listener; before/after comparison of the ZDdsSubBridge wildcard-key fix |
| service-bridge-wildcard | ros-z client → bridge → DDS server; before/after comparison of the ZDdsServiceBridge wildcard-queryable fix |
| bridge-federation | DDS talker on domain 10 → Bridge A → zenohd → Bridge B → DDS listener on domain 20 |

gh gist clone 0796c93f2facb5afd6a45096bbc8b97c sub-bridge-wildcard
cd sub-bridge-wildcard
podman build -t sub-wildcard-demo .
podman run --rm sub-wildcard-demo

Breaking Changes

None — ros-z-dds and ros-z-bridge-dds are new crates. Existing ros-z public APIs are unchanged.

github-actions Bot commented May 8, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://ZettaScaleLabs.github.io/ros-z/pr-preview/pr-176/

Built to branch gh-pages at 2026-05-08 23:02 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

YuanYuYuan added 29 commits May 8, 2026 15:07
Bridges DDS-based ROS 2 nodes (rmw_cyclonedds_cpp) to a Zenoh/ros-z
network without requiring any ROS 2 packages on the bridge host.

Uses cyclors 0.2.7 to build CycloneDDS from source and forward raw CDR
bytes between DDS and Zenoh, no deserialization needed.

Key design points:
- RAII DdsEntity wrapper ensures dds_delete on drop (fixes ghost subscribers)
- Stable client_guid derived from participant instance handle (not per-writer)
- Push-based discovery via DDS builtin topic listeners
- Four route types: DdsToZenoh, ZenohToDds, ServiceRoute, ServiceCliRoute
- Namespace isolation via --namespace flag (ros2_name_to_zenoh_key)
- Allow/deny regex filtering for topic bridging

Integration tests in ros-z-tests cover all six scenarios from the
zenoh-plugin-ros2dds test suite: pub/sub (both directions), services
(both directions), and actions (both directions). CI job added for
Jazzy + CycloneDDS container.
QoS propagation:
- DdsToZenohRoute uses CongestionControl::Block for RELIABLE DDS writers
  when --reliable-routes-blocking (default true)
- DdsToZenohRoute uses AdvancedPublisher with history cache for
  TRANSIENT_LOCAL DDS writers; cache size = history.depth × 10
- Add is_reliable / is_transient_local helpers to qos.rs
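
The QoS propagation rules above can be sketched in a few lines. This is only an illustration: `CongestionControl` here is a local stand-in for the Zenoh enum of the same name, and the signatures of `congestion_for` and `compute_cache_size` are assumptions (the multiplier of 10 comes from the commit message).

```rust
/// Local stand-in for Zenoh's congestion-control setting (illustration only).
#[derive(Debug, PartialEq, Clone, Copy)]
enum CongestionControl {
    Block,
    Drop,
}

/// Block only when the DDS writer is RELIABLE and --reliable-routes-blocking is on.
/// Hypothetical helper; the name is not taken from the PR.
fn congestion_for(writer_is_reliable: bool, reliable_routes_blocking: bool) -> CongestionControl {
    if writer_is_reliable && reliable_routes_blocking {
        CongestionControl::Block
    } else {
        CongestionControl::Drop
    }
}

/// History cache size for TRANSIENT_LOCAL writers: history.depth × multiplier.
/// Saturating multiplication avoids overflow on very large depths.
fn compute_cache_size(history_depth: usize, multiplier: usize) -> usize {
    history_depth.saturating_mul(multiplier)
}

fn main() {
    println!("{:?}", congestion_for(true, true));
    println!("{:?}", congestion_for(false, true));
    println!("cache size: {}", compute_cache_size(5, 10));
}
```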

Action get_result timeout:
- ServiceCliRoute uses 300 s querier timeout for _action/get_result topics
  (up from 10 s) to match zenoh-plugin-ros2dds DEFAULT_ACTION_GET_RESULT_TIMEOUT
- Add is_action_get_result_topic() to names.rs
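
A minimal sketch of the timeout selection, assuming the check is a substring test on `/_action/get_result` (as a later commit in this PR describes for federated services). The constant names and `querier_timeout` are hypothetical; only `is_action_get_result_topic` is named in the commit.

```rust
use std::time::Duration;

// Values taken from the commit message; constant names are illustrative.
const DEFAULT_SERVICE_TIMEOUT: Duration = Duration::from_secs(10);
const ACTION_GET_RESULT_TIMEOUT: Duration = Duration::from_secs(300);

/// True when the name refers to an action's get_result service.
/// A substring test also covers DDS names that carry a prefix or a
/// Request/Reply suffix around the ROS 2 name.
fn is_action_get_result_topic(name: &str) -> bool {
    name.contains("/_action/get_result")
}

/// Pick the querier timeout for a service route (hypothetical helper).
fn querier_timeout(name: &str) -> Duration {
    if is_action_get_result_topic(name) {
        ACTION_GET_RESULT_TIMEOUT
    } else {
        DEFAULT_SERVICE_TIMEOUT
    }
}

fn main() {
    println!("{:?}", querier_timeout("/fibonacci/_action/get_result"));
    println!("{:?}", querier_timeout("/add_two_ints"));
}
```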

Config improvements:
- --domain-id now defaults to ROS_DOMAIN_ID env var, falling back to 0
- --reliable-routes-blocking flag (default true)
- Per-type allow/deny regexes: --allow-pub, --deny-pub, --allow-sub,
  --deny-sub, --allow-service-srv, --deny-service-srv, --allow-service-cli,
  --deny-service-cli; global --allow/--deny as fallback

Unit tests (29 total across qos, names, pubsub, bridge, config)
…ality::Remote

Fixes for three issues (four changes in total) identified from zenoh-plugin-ros2dds issue analysis:

- (#647) Derive client_guid from req_writer instance handle instead of
  participant handle. CycloneDDS echoes the writer handle back as
  client_guid in the reply, so using the participant handle caused
  reply routing mismatches.

- (#647) Strip the 16-byte CddsRequestHeader (guid + seq_num) from DDS
  replies before forwarding to Zenoh. The Zenoh querier sent
  [CDR_HDR + payload] and expects the same shape back.

- (#642) Add .complete(true) to the service queryable so Zenoh routers
  recognise this bridge as a complete service provider, enabling
  cross-router service calls to succeed.

- (#542) Add .allowed_destination(Locality::Remote) to all DDS→Zenoh
  publishers so the bridge does not re-deliver its own publications to
  Zenoh subscribers on the same session, preventing routing loops when
  two bridge instances share a session and DDS domain.

Adds 5 unit tests covering sequence number extraction, reply header
stripping, and request payload construction.
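
The header handling described in the #647 bullets can be sketched as follows. The byte layout is an assumption for illustration: a 4-byte CDR encapsulation header, then the 16-byte request header (8-byte guid followed by 8-byte sequence number), then the payload, with the low bit of the second header byte selecting little-endian encoding. Function and constant names are hypothetical.

```rust
const CDR_HEADER_LEN: usize = 4;
const REQUEST_HEADER_LEN: usize = 16; // 8-byte guid + 8-byte seq_num

/// Read the sequence number from a DDS reply, honouring the CDR endianness flag.
/// Returns None when the buffer is too short to hold both headers.
fn extract_seq_num(reply: &[u8]) -> Option<u64> {
    if reply.len() < CDR_HEADER_LEN + REQUEST_HEADER_LEN {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&reply[12..20]);
    let little_endian = reply[1] & 0x01 == 1;
    Some(if little_endian {
        u64::from_le_bytes(raw)
    } else {
        u64::from_be_bytes(raw)
    })
}

/// Drop the 16-byte request header so the result is [CDR_HDR + payload],
/// the shape the Zenoh querier originally sent.
fn strip_request_header(reply: &[u8]) -> Option<Vec<u8>> {
    if reply.len() < CDR_HEADER_LEN + REQUEST_HEADER_LEN {
        return None;
    }
    let mut out = Vec::with_capacity(reply.len() - REQUEST_HEADER_LEN);
    out.extend_from_slice(&reply[..CDR_HEADER_LEN]);
    out.extend_from_slice(&reply[CDR_HEADER_LEN + REQUEST_HEADER_LEN..]);
    Some(out)
}

fn main() {
    // Little-endian reply: CDR header, guid, seq_num = 7, two payload bytes.
    let mut reply = vec![0x00, 0x01, 0x00, 0x00];
    reply.extend_from_slice(&[0x11; 8]);
    reply.extend_from_slice(&7u64.to_le_bytes());
    reply.extend_from_slice(&[0xAA, 0xBB]);

    println!("seq_num = {:?}", extract_seq_num(&reply));
    println!("stripped = {:?}", strip_request_header(&reply));
}
```

Since the removed header is 16 bytes long, the remaining payload keeps its 8-byte alignment relative to the start of the CDR body, so no re-encoding should be needed under this layout.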
G1 (TRANSIENT_LOCAL churn, #690): split DDS→Zenoh route into a shared
TopicPublisherSlot (Arc, keyed by (domain_id, topic_name)) and a
per-writer DdsToZenohRoute. Undiscovery starts a 5 s grace period before
evicting the slot so AdvancedPublisher history cache survives rapid DDS
participant restarts.
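
One way to picture the grace period is a table of slots with an optional eviction deadline, as in the sketch below. The `SlotTable` type, its method names, and the explicit `now` parameter are assumptions for illustration; the PR's `TopicPublisherSlot` is Arc-based and its eviction mechanism is not shown in the commit message.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const GRACE_PERIOD: Duration = Duration::from_secs(5);

/// Slots are keyed by (domain_id, topic_name), as in the commit message.
type SlotKey = (u32, String);

#[derive(Default)]
struct Slot {
    writers: usize,
    evict_at: Option<Instant>, // set while the slot has no writers
}

#[derive(Default)]
struct SlotTable {
    slots: HashMap<SlotKey, Slot>,
}

impl SlotTable {
    /// A DDS writer appeared: keep (or create) the slot and cancel any pending eviction.
    fn writer_discovered(&mut self, key: SlotKey) {
        let slot = self.slots.entry(key).or_default();
        slot.writers += 1;
        slot.evict_at = None;
    }

    /// A DDS writer disappeared: start the grace period once no writers remain.
    fn writer_undiscovered(&mut self, key: &SlotKey, now: Instant) {
        if let Some(slot) = self.slots.get_mut(key) {
            slot.writers = slot.writers.saturating_sub(1);
            if slot.writers == 0 {
                slot.evict_at = Some(now + GRACE_PERIOD);
            }
        }
    }

    /// Evict slots whose grace period has elapsed.
    fn sweep(&mut self, now: Instant) {
        self.slots
            .retain(|_, slot| slot.evict_at.map_or(true, |deadline| now < deadline));
    }

    fn contains(&self, key: &SlotKey) -> bool {
        self.slots.contains_key(key)
    }
}

fn main() {
    let key: SlotKey = (0, "chatter".to_string());
    let t0 = Instant::now();
    let mut table = SlotTable::default();

    table.writer_discovered(key.clone());
    table.writer_undiscovered(&key, t0);
    table.sweep(t0 + Duration::from_secs(1));
    println!("after 1 s: present = {}", table.contains(&key));

    table.sweep(t0 + Duration::from_secs(6));
    println!("after 6 s: present = {}", table.contains(&key));
}
```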

G2 (action filtering): add --allow-action / --deny-action CLI flags;
bridge dispatcher calls is_action_component() first and applies the
action filter before the per-type pub/sub/service filters.
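
A hedged sketch of what an `is_action_component()` check might look like, assuming the five ROS 2 action sub-topics (`send_goal`, `cancel_goal`, `get_result`, `feedback`, `status`) live under an `/_action/` segment. The prefix test on the remainder is an assumption chosen so that DDS names with Request/Reply suffixes are also recognised; the PR's actual implementation may differ.

```rust
/// The five action sub-topics defined by ROS 2 actions.
const ACTION_COMPONENTS: [&str; 5] = ["send_goal", "cancel_goal", "get_result", "feedback", "status"];

/// True when `name` is one of the five components of an action.
/// Works on ROS 2 names ("/fibonacci/_action/feedback") and, under the
/// assumption stated above, on DDS names ("rq/fibonacci/_action/send_goalRequest").
fn is_action_component(name: &str) -> bool {
    match name.find("/_action/") {
        Some(idx) => {
            let rest = &name[idx + "/_action/".len()..];
            ACTION_COMPONENTS.iter().any(|c| rest.starts_with(*c))
        }
        None => false,
    }
}

fn main() {
    for name in &[
        "/fibonacci/_action/feedback",
        "rq/fibonacci/_action/send_goalRequest",
        "/chatter",
    ] {
        println!("{} -> {}", name, is_action_component(name));
    }
}
```

Checking this first lets the dispatcher apply the action filter before falling back to the per-type pub/sub/service filters.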

G3 (QoS mismatch reporting): add qos_mismatch_reason(writer, reader)
to dds/qos.rs; log WARN at route creation when BEST_EFFORT writer meets
RELIABLE reader, or VOLATILE writer meets TRANSIENT_LOCAL reader.
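
The two mismatch rules named in G3 could be expressed roughly like this. The `Qos` struct and the enums are simplified local types for the example, and the returned strings are illustrative; only the function name `qos_mismatch_reason(writer, reader)` comes from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reliability {
    BestEffort,
    Reliable,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Durability {
    Volatile,
    TransientLocal,
}

/// Simplified QoS pair for illustration; the bridge's real QoS type has more fields.
#[derive(Debug, Clone, Copy)]
struct Qos {
    reliability: Reliability,
    durability: Durability,
}

/// Return a human-readable reason when the writer offers less than the reader requests.
fn qos_mismatch_reason(writer: &Qos, reader: &Qos) -> Option<&'static str> {
    if writer.reliability == Reliability::BestEffort && reader.reliability == Reliability::Reliable {
        return Some("BEST_EFFORT writer cannot satisfy RELIABLE reader");
    }
    if writer.durability == Durability::Volatile && reader.durability == Durability::TransientLocal {
        return Some("VOLATILE writer cannot satisfy TRANSIENT_LOCAL reader");
    }
    None
}

fn main() {
    let writer = Qos { reliability: Reliability::BestEffort, durability: Durability::Volatile };
    let reader = Qos { reliability: Reliability::Reliable, durability: Durability::Volatile };
    if let Some(reason) = qos_mismatch_reason(&writer, &reader) {
        // The bridge logs this at WARN level when the route is created.
        println!("WARN: {}", reason);
    }
}
```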

G4 (type name attachment): attach the ROS 2 type name (UTF-8 bytes) as
a Zenoh publication attachment on every DDS→Zenoh message via
BridgePublisherInner::put_wait(bytes, Some(ros2_type)).

G5 (liveliness key builder): add liveliness.rs with build_node_lv_key /
build_entity_lv_key following the @ros2_lv/... format; mod declared in
main.rs for future token publication.

G6 (multi-domain bridging): --domain-id may be repeated; bridge creates
one CycloneDDS participant per domain, merges discovery channels into one
tagged stream, and keys all route maps by (domain_id, Gid) to prevent
GID collisions across domains.

G7 (TLS transport): add --zenoh-config-file <PATH> flag; main.rs loads
the JSON5 file before overlaying the --zenoh-endpoint; Cargo.toml adds
the transport_tls Zenoh feature.

All 61 unit tests pass.
G3: add QoS mismatch warning to service and service_cli routes — both
now call qos_mismatch_reason() after service_default_qos() so
RELIABLE/TRANSIENT_LOCAL incompatibilities surface in logs.

G5: fix liveliness key format in liveliness.rs — entity kind codes
corrected to MP/MS/SS/SC (were pub/sub/srv/cli); node key format now
uses /{node_id}/{node_id}/NN/%/ matching the ros-z-protocol spec;
enclave segment hardcoded to '%' and removed from build_entity_lv_key
signature; tests updated to assert correct format.

TRANSIENT_LOCAL subscriber: ZenohToDdsRoute now uses an AdvancedSubscriber
with HistoryConfig::default().detect_late_publishers() when the discovered
DDS reader has TRANSIENT_LOCAL durability, so late-joining DDS readers
receive historical samples from Zenoh publishers.
Declares one node-level liveliness token per bridged domain on startup
so ros2 node list shows the bridge as a virtual "ros_z_bridge" node.

Declares per-entity liveliness tokens (MP/MS/SS/SC) whenever a route is
created and drops them on undiscovery, keeping the ROS 2 graph consistent
with the current set of active bridged routes.

declare_entity_token() is non-fatal: a warning is logged and the route is
still stored if Zenoh liveliness is unavailable, so the bridge continues
to forward data even when graph visibility fails.

Also adds 10 unit tests covering qos_str selection, type name dispatch
(pub/sub vs service), counter monotonicity, and key format assertions for
all four entity kinds.
Replace the ros-z-protocol liveliness key format with the exact
key expression format used by zenoh-plugin-ros2dds so that
ros-z-bridge-ros2dds tokens are visible to the existing plugin
ecosystem:

  @/{zid}/@ros2_lv/{MP|MS|SS|SC}/{ke}/{type}[/{qos}]

Where ke and type use § (U+00A7) as the slash replacement.
QoS is encoded as integer discriminants matching the plugin's
qos_to_key_expr format. Service tokens carry no QoS suffix.
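
To make the key layout concrete, here is a small sketch that assembles a token key from its parts. The `EntityKind` enum and the function signature are assumptions; the QoS segment is passed through as an opaque, already-encoded string because the integer encoding is not spelled out in this commit message.

```rust
/// Entity kinds and their two-letter codes (pub/sub/srv/cli → MP/MS/SS/SC).
#[derive(Debug, Clone, Copy)]
enum EntityKind {
    Publisher,
    Subscriber,
    ServiceServer,
    ServiceClient,
}

impl EntityKind {
    fn code(self) -> &'static str {
        match self {
            EntityKind::Publisher => "MP",
            EntityKind::Subscriber => "MS",
            EntityKind::ServiceServer => "SS",
            EntityKind::ServiceClient => "SC",
        }
    }
}

/// Build `@/{zid}/@ros2_lv/{kind}/{ke}/{type}[/{qos}]`.
/// Slashes inside `ke` and `ros2_type` are replaced by § (U+00A7) so each
/// stays a single key-expression chunk. `qos` is an opaque pre-encoded string
/// (assumption); service tokens pass `None`.
fn build_entity_lv_key(
    zid: &str,
    kind: EntityKind,
    ke: &str,
    ros2_type: &str,
    qos: Option<&str>,
) -> String {
    let mut key = format!(
        "@/{}/@ros2_lv/{}/{}/{}",
        zid,
        kind.code(),
        ke.replace('/', "§"),
        ros2_type.replace('/', "§")
    );
    if let Some(q) = qos {
        key.push('/');
        key.push_str(q);
    }
    key
}

fn main() {
    let key = build_entity_lv_key(
        "abc123",
        EntityKind::Publisher,
        "robot/chatter",
        "std_msgs/msg/String",
        Some("QOS"), // placeholder, not a real encoding
    );
    println!("{}", key);
}
```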

Remove node-level liveliness tokens (not part of the plugin
protocol) and the entity_counter / node_id fields they required.
Add dds_type_to_ros2_action_type, mirroring the plugin's function.
It strips all five action-specific DDS type suffixes before the
standard dds_type_to_ros2_type conversion:
  _SendGoal_{Request,Response}_, _GetResult_{Request,Response}_,
  _FeedbackMessage_

Without this, action topics would produce malformed type strings
in liveliness tokens (e.g. Fibonacci/SendGoal instead of Fibonacci).
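
The suffix stripping can be sketched as below. The five suffixes are the ones listed in the commit message; the body of `dds_type_to_ros2_type` is an assumption (drop the `dds_` module segment, turn `::` into `/`, trim one trailing underscore) that reflects the usual DDS-to-ROS 2 type name convention rather than the PR's exact code.

```rust
/// Action-specific DDS type suffixes listed in the commit message.
const ACTION_SUFFIXES: [&str; 5] = [
    "_SendGoal_Request_",
    "_SendGoal_Response_",
    "_GetResult_Request_",
    "_GetResult_Response_",
    "_FeedbackMessage_",
];

/// Assumed standard conversion: "pkg::msg::dds_::Name_" → "pkg/msg/Name".
fn dds_type_to_ros2_type(dds_type: &str) -> String {
    let converted = dds_type.replace("::dds_::", "::").replace("::", "/");
    converted.strip_suffix('_').unwrap_or(&converted).to_string()
}

/// Strip an action suffix (if any) before the standard conversion.
fn dds_type_to_ros2_action_type(dds_type: &str) -> String {
    let base = ACTION_SUFFIXES
        .iter()
        .find_map(|suffix| dds_type.strip_suffix(*suffix))
        .unwrap_or(dds_type);
    dds_type_to_ros2_type(base)
}

fn main() {
    let dds = "example_interfaces::action::dds_::Fibonacci_SendGoal_Request_";
    println!("{}", dds_type_to_ros2_action_type(dds));
}
```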

Wire it into declare_entity_token: action components (detected via
is_action_component on the ros2_name) use dds_type_to_ros2_action_type
for all four entity kinds instead of the service/message variants.
Introduce a backend-neutral abstraction layer for DDS:
- dds/backend.rs: DdsParticipant + DdsWriter traits, BridgeQos and all
  QoS types with wire_discriminant() for liveliness encoding
- dds/cyclors/: CyclorsParticipant (CycloneDDS impl), entity/reader/
  writer/qos/discovery — all operating on BridgeQos
- DiscoveredEndpoint.qos is now BridgeQos, no cyclors types on trait boundary
- Bridge<P: DdsParticipant>, all routes generic over P
- liveliness.rs uses &BridgeQos throughout
- 4 raw cyclors files deleted; replaced by cyclors/ submodule

76 unit tests pass.
…in-ros2dds

Cover the previously untested paths:
- service_cli.rs: full CDR envelope protocol (DDS↔Zenoh) — request
  parsing (16-byte header extraction), Zenoh payload construction,
  reply reconstruction with header injection, BE endianness path,
  too-short rejection, and an end-to-end LE round-trip
- service.rs: BE CDR header path, BE reply endianness preservation,
  BE request payload construction, u64::MAX seq_num extraction
- backend.rs: wire discriminants for all ReliabilityKind, DurabilityKind,
  HistoryKind values (must match zenoh-plugin-ros2dds integer encoding)
- action.rs: is_action_component() for all five action sub-topics,
  plain topic/service negative cases, namespaced action, DDS prefix
- pubsub.rs: priority_from_u8() for all 7 valid values plus OOB fallback
- names.rs: is_request/reply/pubsub_topic(), ros2↔dds type roundtrip,
  namespace handling edge cases

Total: 76 → 118 tests (42 new, 0 failures)
…r_format

Add four public accessors on ZNode needed by ros-z-dds bridge types:
- session() → &Arc<Session>
- keyexpr_format() → &KeyExprFormat
- next_entity_id() → usize
- node_entity() → &NodeEntity

Add ZContextBuilder::with_key_expr_format() alias so callers can select
RmwZenoh or Ros2Dds key expression format at context creation time.
Add ros-z-dds as a first-class library crate exposing:
- DdsParticipant trait + CyclorsParticipant impl (from ros-z-bridge-ros2dds)
- ZDdsPubBridge  — DDS reader → Zenoh publisher
- ZDdsSubBridge  — Zenoh subscriber → DDS writer
- ZDdsServiceBridge — DDS service server → Zenoh queryable (complete(true))
- ZDdsClientBridge  — Zenoh querier → DDS service client
- ZDdsBridge     — auto-discovery orchestrator with allow/deny filters
- DdsBridgeExt   — typed extension trait on ZNode

Bridge types use node.keyexpr_format().topic_key_expr() for key expression
construction and node.session() for Zenoh entity declaration; no intermediate
abstractions are needed in ros-z core. TRANSIENT_LOCAL topics use
AdvancedPublisher/AdvancedSubscriber from zenoh-ext.
Replace the monolithic ros-z-bridge-ros2dds crate with a minimal
zenoh-bridge-dds binary (ros-z-bridge-dds) that delegates all bridge
logic to the ros-z-dds library:

main.rs (~40 lines):
  ZContextBuilder + ZNode + ZDdsBridge::new(node, participant).run()

config.rs: simplified CLI — single --domain-id, --allow, --deny,
  --node-name, timeout and cache multiplier flags.

Delete ros-z-bridge-ros2dds/src/{bridge,liveliness,wire_format,routes/,dds/}
— all logic now lives in ros-z-dds.

Update ros-z-tests: rename ros2dds_bridge.rs → bridge_dds.rs, update
spawn_bridge() to use --domain-id (singular).
Add RosDiscoveryPublisher that writes ParticipantEntitiesInfo CDR-LE
to the ros_discovery_info DDS topic (~1 Hz), making bridge endpoints
visible to ros2 topic list / node list / service list.

- ros_discovery.rs: CDR structs + background-task publisher
- gid.rs: add serde Serialize/Deserialize to Gid (needed for CDR)
- pubsub.rs/service.rs: expose reader_guid()/writer_guid() per route
- bridge.rs: create publisher on startup, register/unregister GIDs
  as routes are created and removed via gid_to_name undiscovery
- Cargo.toml: add cdr = 0.2.4 and serde/derive
…rage

- Add --wire-format CLI flag to zenoh-bridge-dds (rmw-zenoh default, ros2dds legacy)
- Fix tests 5–6 to use spawn_bridge_ros2dds (bare keys require ros2dds format)
- Add tests 7–10: rmw-zenoh primary tests (pub/sub and service in both directions)
- Add tests 11–13: API-level construction tests for ZDdsPubBridge, ZDdsSubBridge,
  and DdsBridgeExt typed helpers — no binary spawn required
- Enable ros-z-dds as optional dep in ros-z-tests under dds-bridge-interop feature
Subscribe to @ros2_lv/** liveliness and create complementary DDS routes
when a remote bridge announces endpoints:
- Remote Publisher → local ZDdsSubBridge (Zenoh→DDS relay)
- Remote Subscription → local ZDdsPubBridge (DDS→Zenoh relay)

Routes are reference-counted via RouteEntry<R> which tracks both local
DDS GIDs and remote bridge ZIDs. A route persists until all sources
retire, so federation routes survive local DDS endpoint churn.
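
A minimal sketch of the reference-counting idea, assuming a route is kept alive by two sets of sources. The field names, the 16-byte GID representation, and the string ZIDs are illustrative choices, not the PR's actual `RouteEntry<R>` definition.

```rust
use std::collections::HashSet;

/// A route plus the sources that keep it alive (illustrative layout).
struct RouteEntry<R> {
    route: R,
    local_gids: HashSet<[u8; 16]>, // local DDS endpoints using the route
    remote_zids: HashSet<String>,  // remote bridges that announced a matching endpoint
}

impl<R> RouteEntry<R> {
    fn new(route: R) -> Self {
        RouteEntry {
            route,
            local_gids: HashSet::new(),
            remote_zids: HashSet::new(),
        }
    }

    fn add_local(&mut self, gid: [u8; 16]) {
        self.local_gids.insert(gid);
    }

    fn add_remote(&mut self, zid: &str) {
        self.remote_zids.insert(zid.to_string());
    }

    /// Each remove returns true when no source is left and the route can be dropped.
    fn remove_local(&mut self, gid: &[u8; 16]) -> bool {
        self.local_gids.remove(gid);
        self.is_unused()
    }

    fn remove_remote(&mut self, zid: &str) -> bool {
        self.remote_zids.remove(zid);
        self.is_unused()
    }

    fn is_unused(&self) -> bool {
        self.local_gids.is_empty() && self.remote_zids.is_empty()
    }
}

fn main() {
    let mut entry = RouteEntry::new("chatter route");
    entry.add_local([1u8; 16]);
    entry.add_remote("bridge-b");

    // Local DDS endpoint churn does not retire a route a remote bridge still needs.
    println!("drop after local removal: {}", entry.remove_local(&[1u8; 16]));
    println!("drop after remote removal: {}", entry.remove_remote("bridge-b"));
    println!("route was: {}", entry.route);
}
```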

Also fix liveliness key expressions to always include the type name:
when type_hash is None, use TypeHash::zero() as placeholder so remote
bridges can reconstruct DDS entities for the correct message type.

Add qos_profile_to_bridge_qos() inverse of bridge_qos_to_qos_profile()
for converting parsed QosProfile back to BridgeQos when creating
federated DDS endpoints.
Remote Service server → local ZDdsClientBridge (DDS clients can call it).
Remote Client → local ZDdsServiceBridge (DDS servers can respond to it).

Also migrates srv_routes and cli_routes from plain HashMap to RouteEntry<R>
so service routes are reference-counted by local DDS GIDs and remote bridge
ZIDs, matching the pub/sub route lifecycle introduced for federation.

Action get_result timeout is applied when the federated service name
contains /_action/get_result.
- New dds-bridge.md chapter covering architecture, quick start, CLI
  reference, wire format, topic filtering, federation, DDS discovery
  scope, programmatic API, and troubleshooting
- auto_bridge.rs: runnable example wrapping ZDdsBridge auto-discovery
- custom_bridge.rs: typed bridge using DdsBridgeExt with RosString
- Minimal README.md for ros-z-dds and ros-z-bridge-dds crates
- DDS Bridge added to mkdocs.yml Guides section
- DDS bridge examples section added to examples.md
- Dev-dependencies for examples: tracing-subscriber, ros-z-msgs, tokio/signal
- Remove with_key_expr_format alias from ZContextBuilder; callers use
  keyexpr_format() directly (main.rs updated)
- Remove bridge_dds_service/bridge_dds_client from DdsBridgeExt: generated
  request types use ::msg::dds_:: naming which doesn't match the _Request_
  suffix expected by dds_type_to_ros2_service_type
- Remove Transient/Persistent variants from DurabilityKind; ROS 2 nodes
  only produce Volatile/TransientLocal, and the CDurabilityKind conversion
  now maps the unused values to Volatile
ZDdsServiceBridge::new no longer takes a timeout argument — the DDS reply
wait is driven by the Zenoh querier timeout on the client side; the service
bridge side has no reply deadline to enforce.

liveliness_subscription_pattern() drops its KeyExprFormat parameter — both
wire formats use the same @ros2_lv/** admin prefix, so the argument served
no purpose.
… code

- Scope internal items: compute_cache_size → fn, reader_guid/writer_guid/
  wire_discriminant → pub(crate), reader/writer RAII field → _route
- Move wire_discriminant impls into #[cfg(test)] blocks; remove empty public impls
- Mark ZenohSubHandle #[allow(dead_code)] (RAII variant payloads)
- Fix compiler warnings: unused dp binding, redundant nested unsafe,
  dead iov_len_to_usize function, #[allow(async_fn_in_trait)] on DdsBridgeExt
- Change internal modules to pub(crate): cyclors, discovery, gid, names,
  qos, ros_discovery
- Narrow crate-root re-exports: remove RosDiscoveryPublisher, DdsReader,
  DdsWriter (no external callers)
- Remove dead code: service_default_bridge_qos, qos_mismatch_reason,
  is_reply_topic, ros2_name_to_zenoh_key, is_reliable, is_transient_local,
  and delete types.rs entirely (DDSRawSample + ddsrt_iov_len_to_usize)
@YuanYuYuan YuanYuYuan force-pushed the dev/ros2dds-bridge branch from a86ca5f to aceb954 Compare May 8, 2026 15:08
@YuanYuYuan YuanYuYuan changed the title feat(ros-z-dds): DDS↔Zenoh bridge — new ros-z-dds library and zenoh-bridge-dds binary feat(ros-z-dds): add DDS↔Zenoh bridge — ros-z-dds library and zenoh-bridge-dds binary May 8, 2026
codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 89.74359% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/ros-z/src/node.rs | 89.74% | 4 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| crates/ros-z/src/node.rs | 76.35% <89.74%> (+3.21%) ⬆️ |

... and 3 files with indirect coverage changes


@YuanYuYuan YuanYuYuan changed the title feat(ros-z-dds): add DDS↔Zenoh bridge — ros-z-dds library and zenoh-bridge-dds binary feat(ros-z-dds): add ros-z-bridge-dds — native DDS↔Zenoh bridge May 8, 2026
YuanYuYuan added 16 commits May 8, 2026 15:19
…elf, field_reassign_with_default, unnecessary casts)
…idge tests

Three failing bridge integration tests (tests 4, 8, 9):

- ZDdsServiceBridge req_writer used adapt_writer_qos_for_reader (→ BestEffort)
  instead of adapt_reader_qos_for_writer (→ Reliable).  A BestEffort DDS writer
  cannot communicate with a ROS 2 service server's Reliable request reader, so
  the DDS write was silently dropped and the Zenoh query timed out.

- ZDdsSubBridge declared its Zenoh subscriber on the exact EMPTY_TOPIC_TYPE/
  EMPTY_TOPIC_HASH key derived from the DDS endpoint.  When the DDS listener
  carried a real type hash in user_data the key was specific, so Zenoh publishers
  using the EMPTY key (test 8) were not routed to the bridge.  Fix: subscribe
  with a wildcard suffix (topic/**) so any publisher on the same ROS topic
  is forwarded regardless of type/hash.

- ZDdsServiceBridge declared its Zenoh queryable on the same specific key, so
  test 9 (querying EMPTY key) found no complete queryable.  Fix: use topic/**
  wildcard queryable to accept queries from any client type/hash.

Also redirect bridge stderr to the log file (was Stdio::null()) so bridge
errors are visible when tests fail.