feat(ros-z-dds): add ros-z-bridge-dds — native DDS↔Zenoh bridge #176
Open
YuanYuYuan wants to merge 48 commits into
Bridges DDS-based ROS 2 nodes (rmw_cyclonedds_cpp) to a Zenoh/ros-z network without requiring any ROS 2 packages on the bridge host. Uses cyclors 0.2.7 to build CycloneDDS from source and forwards raw CDR bytes between DDS and Zenoh, so no deserialization is needed.
Key design points:
- RAII DdsEntity wrapper ensures dds_delete on drop (fixes ghost subscribers)
- Stable client_guid derived from participant instance handle (not per-writer)
- Push-based discovery via DDS builtin topic listeners
- Four route types: DdsToZenoh, ZenohToDds, ServiceRoute, ServiceCliRoute
- Namespace isolation via --namespace flag (ros2_name_to_zenoh_key)
- Allow/deny regex filtering for topic bridging
Integration tests in ros-z-tests cover all six scenarios from the zenoh-plugin-ros2dds test suite: pub/sub (both directions), services (both directions), and actions (both directions). A CI job is added for a Jazzy + CycloneDDS container.
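The RAII wrapper mentioned above can be sketched in plain Rust. This is a minimal, std-only illustration, not the PR's code: DdsEntity here holds an i32 handle (CycloneDDS's dds_entity_t is an int32) and a hypothetical deleter closure standing in for dds_delete, so the example runs without CycloneDDS.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a CycloneDDS entity (dds_entity_t is an i32).
pub struct DdsEntity {
    handle: i32,
    // Callback standing in for `dds_delete`, injected so the sketch runs without CycloneDDS.
    deleter: Box<dyn Fn(i32)>,
}

impl DdsEntity {
    pub fn new(handle: i32, deleter: Box<dyn Fn(i32)>) -> Self {
        DdsEntity { handle, deleter }
    }

    pub fn handle(&self) -> i32 {
        self.handle
    }
}

impl Drop for DdsEntity {
    fn drop(&mut self) {
        // Runs exactly once when the owner is dropped, so a reader or writer
        // cannot outlive its route and linger as a "ghost" endpoint.
        (self.deleter)(self.handle);
    }
}

/// Demo deleter that records which handles were deleted.
pub fn recording_deleter(log: Rc<RefCell<Vec<i32>>>) -> Box<dyn Fn(i32)> {
    Box::new(move |h| log.borrow_mut().push(h))
}
```

Because deletion lives in Drop, removing a route from its map is enough to delete the underlying reader or writer; there is no separate cleanup path to forget.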
QoS propagation:
- DdsToZenohRoute uses CongestionControl::Block for RELIABLE DDS writers when --reliable-routes-blocking is set (default true)
- DdsToZenohRoute uses AdvancedPublisher with a history cache for TRANSIENT_LOCAL DDS writers; cache size = history.depth x 10
- Add is_reliable / is_transient_local helpers to qos.rs
Action get_result timeout:
- ServiceCliRoute uses a 300 s querier timeout for _action/get_result topics (up from 10 s) to match zenoh-plugin-ros2dds DEFAULT_ACTION_GET_RESULT_TIMEOUT
- Add is_action_get_result_topic() to names.rs
Config improvements:
- --domain-id now defaults to the ROS_DOMAIN_ID env var, falling back to 0
- --reliable-routes-blocking flag (default true)
- Per-type allow/deny regexes: --allow-pub, --deny-pub, --allow-sub, --deny-sub, --allow-service-srv, --deny-service-srv, --allow-service-cli, --deny-service-cli; global --allow/--deny as fallback
Unit tests: 29 total across qos, names, pubsub, bridge, config.
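Three rules from this commit can be sketched as pure functions: the TRANSIENT_LOCAL cache size (history.depth x 10), the choice of blocking congestion control for RELIABLE writers gated by --reliable-routes-blocking, and the ROS_DOMAIN_ID fallback. The names are hypothetical, Congestion is a local stand-in for Zenoh's CongestionControl, and treating an unparsable ROS_DOMAIN_ID as 0 is an assumption.

```rust
/// Local mirror of the two congestion-control modes (hypothetical; stands in for Zenoh's type).
#[derive(Debug, PartialEq, Eq)]
pub enum Congestion {
    Drop,
    Block,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reliability {
    BestEffort,
    Reliable,
}

/// History cache size for a TRANSIENT_LOCAL writer: depth x multiplier (10 in this commit).
pub fn cache_size(history_depth: usize, multiplier: usize) -> usize {
    history_depth * multiplier
}

/// RELIABLE DDS writers map to blocking publication only when the flag is on.
pub fn congestion_for(rel: Reliability, reliable_routes_blocking: bool) -> Congestion {
    match (rel, reliable_routes_blocking) {
        (Reliability::Reliable, true) => Congestion::Block,
        _ => Congestion::Drop,
    }
}

/// Domain ID default: use ROS_DOMAIN_ID if set and numeric, otherwise fall back to 0.
pub fn default_domain_id(env_value: Option<&str>) -> u32 {
    env_value
        .and_then(|v| v.trim().parse::<u32>().ok())
        .unwrap_or(0)
}
```

At startup the domain ID could then be resolved with default_domain_id(std::env::var("ROS_DOMAIN_ID").ok().as_deref()).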
…ality::Remote
Three bug fixes identified from zenoh-plugin-ros2dds issue analysis:
- (#647) Derive client_guid from the req_writer instance handle instead of the participant handle. CycloneDDS echoes the writer handle back as client_guid in the reply, so using the participant handle caused reply routing mismatches.
- (#647) Strip the 16-byte CddsRequestHeader (guid + seq_num) from DDS replies before forwarding to Zenoh. The Zenoh querier sent [CDR_HDR + payload] and expects the same shape back.
- (#642) Add .complete(true) to the service queryable so Zenoh routers recognise this bridge as a complete service provider, enabling cross-router service calls to succeed.
- (#542) Add .allowed_destination(Locality::Remote) to all DDS→Zenoh publishers so the bridge does not re-deliver its own publications to Zenoh subscribers on the same session, preventing routing loops when two bridge instances share a session and DDS domain.
Adds 5 unit tests covering sequence number extraction, reply header stripping, and request payload construction.
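The #647 reply fix can be illustrated on raw bytes. This sketch assumes the layout described above, [4-byte CDR encapsulation header | 8-byte guid | 8-byte seq_num | payload], and reads the byte order from the encapsulation header (0x00 0x01 is CDR little-endian, 0x00 0x00 is CDR big-endian). The function names are hypothetical.

```rust
/// Size of the CDR encapsulation header (representation id + options).
pub const CDR_HDR_LEN: usize = 4;
/// Size of the request header: 8-byte client guid + 8-byte sequence number.
pub const REQUEST_HDR_LEN: usize = 16;

/// True if the encapsulation header announces little-endian CDR.
fn is_little_endian(buf: &[u8]) -> bool {
    buf[1] & 0x01 == 0x01
}

/// Extract seq_num from [CDR_HDR | guid | seq | payload]; None if the buffer is too short.
pub fn extract_seq_num(buf: &[u8]) -> Option<u64> {
    if buf.len() < CDR_HDR_LEN + REQUEST_HDR_LEN {
        return None;
    }
    let start = CDR_HDR_LEN + 8; // skip encapsulation header and guid
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[start..start + 8]);
    Some(if is_little_endian(buf) {
        u64::from_le_bytes(bytes)
    } else {
        u64::from_be_bytes(bytes)
    })
}

/// Drop the 16-byte request header so Zenoh receives [CDR_HDR | payload].
pub fn strip_request_header(buf: &[u8]) -> Option<Vec<u8>> {
    if buf.len() < CDR_HDR_LEN + REQUEST_HDR_LEN {
        return None;
    }
    let mut out = Vec::with_capacity(buf.len() - REQUEST_HDR_LEN);
    out.extend_from_slice(&buf[..CDR_HDR_LEN]); // keep the encapsulation header
    out.extend_from_slice(&buf[CDR_HDR_LEN + REQUEST_HDR_LEN..]); // then the bare payload
    Some(out)
}
```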
G1 (TRANSIENT_LOCAL churn, #690): split the DDS→Zenoh route into a shared TopicPublisherSlot (Arc, keyed by (domain_id, topic_name)) and a per-writer DdsToZenohRoute. Undiscovery starts a 5 s grace period before evicting the slot, so the AdvancedPublisher history cache survives rapid DDS participant restarts.
G2 (action filtering): add --allow-action / --deny-action CLI flags; the bridge dispatcher calls is_action_component() first and applies the action filter before the per-type pub/sub/service filters.
G3 (QoS mismatch reporting): add qos_mismatch_reason(writer, reader) to dds/qos.rs; log WARN at route creation when a BEST_EFFORT writer meets a RELIABLE reader, or a VOLATILE writer meets a TRANSIENT_LOCAL reader.
G4 (type name attachment): attach the ROS 2 type name (UTF-8 bytes) as a Zenoh publication attachment on every DDS→Zenoh message via BridgePublisherInner::put_wait(bytes, Some(ros2_type)).
G5 (liveliness key builder): add liveliness.rs with build_node_lv_key / build_entity_lv_key following the @ros2_lv/... format; mod declared in main.rs for future token publication.
G6 (multi-domain bridging): --domain-id may be repeated; the bridge creates one CycloneDDS participant per domain, merges discovery channels into one tagged stream, and keys all route maps by (domain_id, Gid) to prevent GID collisions across domains.
G7 (TLS transport): add --zenoh-config-file <PATH> flag; main.rs loads the JSON5 file before overlaying --zenoh-endpoint; Cargo.toml adds the transport_tls Zenoh feature.
All 61 unit tests pass.
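G1's grace period can be modelled as a table keyed by (domain_id, topic_name). In this hypothetical sketch a slot only counts attached DDS writers; in the bridge the slot would also own the AdvancedPublisher whose history cache is what survives. Timestamps are passed in explicitly so the logic is deterministic, and a periodic sweep() stands in for whatever timer the real code uses.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Grace period before an unused slot is evicted (5 s per G1).
pub const GRACE: Duration = Duration::from_secs(5);

/// Per-topic slot state (hypothetical; the real slot owns the publisher).
#[derive(Debug, Default)]
pub struct Slot {
    writers: usize,              // DDS writers currently routed through this slot
    idle_since: Option<Instant>, // set when the last writer is undiscovered
}

/// Slots keyed by (domain_id, topic_name).
#[derive(Default)]
pub struct SlotTable {
    slots: HashMap<(u32, String), Slot>,
}

impl SlotTable {
    /// A DDS writer was discovered: reuse the slot if present and cancel any pending eviction.
    pub fn acquire(&mut self, domain: u32, topic: &str) {
        let slot = self.slots.entry((domain, topic.to_string())).or_default();
        slot.writers += 1;
        slot.idle_since = None;
    }

    /// A DDS writer was undiscovered: start the grace period when the last one leaves.
    pub fn release(&mut self, domain: u32, topic: &str, now: Instant) {
        if let Some(slot) = self.slots.get_mut(&(domain, topic.to_string())) {
            slot.writers = slot.writers.saturating_sub(1);
            if slot.writers == 0 {
                slot.idle_since = Some(now);
            }
        }
    }

    /// Evict slots that have been idle for at least GRACE.
    pub fn sweep(&mut self, now: Instant) {
        self.slots.retain(|_, s| match s.idle_since {
            Some(t) => now.duration_since(t) < GRACE,
            None => true,
        });
    }

    pub fn contains(&self, domain: u32, topic: &str) -> bool {
        self.slots.contains_key(&(domain, topic.to_string()))
    }
}
```

A participant that restarts within 5 s re-acquires the same slot, so late-joining Zenoh subscribers can still be served from the cached history.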
G3: add QoS mismatch warning to service and service_cli routes — both
now call qos_mismatch_reason() after service_default_qos() so
RELIABLE/TRANSIENT_LOCAL incompatibilities surface in logs.
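The two mismatch cases follow the DDS requested/offered rule: a writer must offer at least the reliability and durability its reader requests. A minimal sketch, with the kinds declared in increasing order so the derived Ord encodes the rule; the types and the String return value are assumptions, not the bridge's actual signature.

```rust
/// Kinds declared weakest-first so derived ordering matches DDS "offered >= requested".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reliability {
    BestEffort,
    Reliable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Durability {
    Volatile,
    TransientLocal,
}

#[derive(Debug, Clone, Copy)]
pub struct Qos {
    pub reliability: Reliability,
    pub durability: Durability,
}

/// Returns a human-readable reason when the writer offers less than the reader requests.
pub fn qos_mismatch_reason(writer: &Qos, reader: &Qos) -> Option<String> {
    if writer.reliability < reader.reliability {
        return Some("BEST_EFFORT writer cannot satisfy RELIABLE reader".to_string());
    }
    if writer.durability < reader.durability {
        return Some("VOLATILE writer cannot satisfy TRANSIENT_LOCAL reader".to_string());
    }
    None
}
```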
G5: fix liveliness key format in liveliness.rs — entity kind codes
corrected to MP/MS/SS/SC (were pub/sub/srv/cli); node key format now
uses /{node_id}/{node_id}/NN/%/ matching the ros-z-protocol spec;
enclave segment hardcoded to '%' and removed from build_entity_lv_key
signature; tests updated to assert correct format.
TRANSIENT_LOCAL subscriber: ZenohToDdsRoute now uses an AdvancedSubscriber
with HistoryConfig::default().detect_late_publishers() when the discovered
DDS reader has TRANSIENT_LOCAL durability, so late-joining DDS readers
receive historical samples from Zenoh publishers.
Declares one node-level liveliness token per bridged domain on startup so ros2 node list shows the bridge as a virtual "ros_z_bridge" node. Declares per-entity liveliness tokens (MP/MS/SS/SC) whenever a route is created and drops them on undiscovery, keeping the ROS 2 graph consistent with the current set of active bridged routes. declare_entity_token() is non-fatal: a warning is logged and the route is still stored if Zenoh liveliness is unavailable, so the bridge continues to forward data even when graph visibility fails. Also adds 10 unit tests covering qos_str selection, type name dispatch (pub/sub vs service), counter monotonicity, and key format assertions for all four entity kinds.
Replace the ros-z-protocol liveliness key format with the exact
key expression format used by zenoh-plugin-ros2dds so that
ros-z-bridge-ros2dds tokens are visible to the existing plugin
ecosystem:
@/{zid}/@ros2_lv/{MP|MS|SS|SC}/{ke}/{type}[/{qos}]
Where ke and type use § (U+00A7) as the slash replacement.
QoS is encoded as integer discriminants matching the plugin's
qos_to_key_expr format. Service tokens carry no QoS suffix.
Remove node-level liveliness tokens (not part of the plugin
protocol) and the entity_counter / node_id fields they required.
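A sketch of the key layout above. It assumes ke is the Zenoh key expression without a leading slash and treats the QoS suffix as an already-encoded opaque string; the plugin's integer-discriminant QoS encoding is not reproduced. EntityKind and this function signature are hypothetical.

```rust
/// Entity kind codes used in the @ros2_lv key.
#[derive(Debug, Clone, Copy)]
pub enum EntityKind {
    MessagePublisher,    // MP
    MessageSubscription, // MS
    ServiceServer,       // SS
    ServiceClient,       // SC
}

impl EntityKind {
    fn code(self) -> &'static str {
        match self {
            EntityKind::MessagePublisher => "MP",
            EntityKind::MessageSubscription => "MS",
            EntityKind::ServiceServer => "SS",
            EntityKind::ServiceClient => "SC",
        }
    }
}

/// Replace '/' with '§' (U+00A7) so a multi-segment name fits in one key chunk.
fn escape_slashes(s: &str) -> String {
    s.replace('/', "§")
}

/// Builds @/{zid}/@ros2_lv/{kind}/{ke}/{type}[/{qos}]; service kinds pass qos = None.
pub fn build_entity_lv_key(
    zid: &str,
    kind: EntityKind,
    ke: &str,
    ros2_type: &str,
    qos: Option<&str>,
) -> String {
    let mut key = format!(
        "@/{}/@ros2_lv/{}/{}/{}",
        zid,
        kind.code(),
        escape_slashes(ke),
        escape_slashes(ros2_type)
    );
    if let Some(q) = qos {
        key.push('/');
        key.push_str(q);
    }
    key
}
```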
Add dds_type_to_ros2_action_type, mirroring the plugin's function.
It strips all five action-specific DDS type suffixes before the
standard dds_type_to_ros2_type conversion:
_SendGoal_{Request,Response}_, _GetResult_{Request,Response}_,
_FeedbackMessage_
Without this, action topics would produce malformed type strings
in liveliness tokens (e.g. Fibonacci/SendGoal instead of Fibonacci).
Wire it into declare_entity_token: action components (detected via
is_action_component on the ros2_name) use dds_type_to_ros2_action_type
for all four entity kinds instead of the service/message variants.
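A sketch of the suffix stripping described above. The base conversion here (drop ::dds_::, turn :: into /, remove the trailing underscore of the DDS-mangled name) is an assumed simplification of dds_type_to_ros2_type, not a copy of it.

```rust
/// Simplified DDS → ROS 2 type conversion: "pkg::msg::dds_::Name_" → "pkg/msg/Name".
pub fn dds_type_to_ros2_type(dds_type: &str) -> String {
    let mut out = dds_type.replace("::dds_::", "::").replace("::", "/");
    if out.ends_with('_') {
        out.pop(); // trailing underscore of the DDS-mangled type name
    }
    out
}

/// The five action-specific suffixes listed in the commit message.
const ACTION_SUFFIXES: [&str; 5] = [
    "_SendGoal_Request_",
    "_SendGoal_Response_",
    "_GetResult_Request_",
    "_GetResult_Response_",
    "_FeedbackMessage_",
];

/// Strip an action suffix (if present), then apply the standard conversion.
pub fn dds_type_to_ros2_action_type(dds_type: &str) -> String {
    let base = ACTION_SUFFIXES
        .iter()
        .find_map(|suffix| dds_type.strip_suffix(*suffix))
        .unwrap_or(dds_type);
    dds_type_to_ros2_type(base)
}
```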
Introduce a backend-neutral abstraction layer for DDS:
- dds/backend.rs: DdsParticipant + DdsWriter traits, BridgeQos and all QoS types with wire_discriminant() for liveliness encoding
- dds/cyclors/: CyclorsParticipant (CycloneDDS impl), entity/reader/writer/qos/discovery, all operating on BridgeQos
- DiscoveredEndpoint.qos is now BridgeQos; no cyclors types cross the trait boundary
- Bridge<P: DdsParticipant>, all routes generic over P
- liveliness.rs uses &BridgeQos throughout
- 4 raw cyclors files deleted; replaced by the cyclors/ submodule
76 unit tests pass.
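The shape of this abstraction can be sketched with an in-memory mock backend. Only the names DdsParticipant, DdsWriter, BridgeQos and the Bridge<P: DdsParticipant> pattern come from this commit; the method names, the trimmed BridgeQos and MockParticipant are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Backend-neutral QoS, trimmed to one field for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReliabilityKind {
    BestEffort,
    Reliable,
}

#[derive(Debug, Clone, Copy)]
pub struct BridgeQos {
    pub reliability: ReliabilityKind,
}

/// Writer side: forwards raw CDR bytes.
pub trait DdsWriter {
    fn write_cdr(&self, bytes: &[u8]) -> Result<(), String>;
}

/// Participant side: creates writers without exposing backend types.
pub trait DdsParticipant {
    type Writer: DdsWriter;
    fn create_writer(&self, topic: &str, type_name: &str, qos: &BridgeQos) -> Result<Self::Writer, String>;
}

/// Bridge logic is generic over the backend.
pub struct Bridge<P: DdsParticipant> {
    participant: P,
}

impl<P: DdsParticipant> Bridge<P> {
    pub fn new(participant: P) -> Self {
        Bridge { participant }
    }

    /// Zenoh → DDS path: create a writer and push one sample through it.
    pub fn forward_once(&self, topic: &str, type_name: &str, qos: &BridgeQos, cdr: &[u8]) -> Result<(), String> {
        let writer = self.participant.create_writer(topic, type_name, qos)?;
        writer.write_cdr(cdr)
    }
}

/// In-memory mock backend that records every write as (topic, bytes).
pub struct MockParticipant {
    pub log: Rc<RefCell<Vec<(String, Vec<u8>)>>>,
}

pub struct MockWriter {
    topic: String,
    log: Rc<RefCell<Vec<(String, Vec<u8>)>>>,
}

impl DdsWriter for MockWriter {
    fn write_cdr(&self, bytes: &[u8]) -> Result<(), String> {
        self.log.borrow_mut().push((self.topic.clone(), bytes.to_vec()));
        Ok(())
    }
}

impl DdsParticipant for MockParticipant {
    type Writer = MockWriter;
    fn create_writer(&self, topic: &str, _type_name: &str, _qos: &BridgeQos) -> Result<MockWriter, String> {
        Ok(MockWriter { topic: topic.to_string(), log: Rc::clone(&self.log) })
    }
}
```

Because Bridge only sees the traits, a CycloneDDS-backed participant and a test double are interchangeable, which is what keeps backend types off the trait boundary.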
…in-ros2dds
Cover the previously untested paths:
- service_cli.rs: full CDR envelope protocol (DDS↔Zenoh): request parsing (16-byte header extraction), Zenoh payload construction, reply reconstruction with header injection, BE endianness path, too-short rejection, and an end-to-end LE round-trip
- service.rs: BE CDR header path, BE reply endianness preservation, BE request payload construction, u64::MAX seq_num extraction
- backend.rs: wire discriminants for all ReliabilityKind, DurabilityKind, HistoryKind values (must match zenoh-plugin-ros2dds integer encoding)
- action.rs: is_action_component() for all five action sub-topics, plain topic/service negative cases, namespaced action, DDS prefix
- pubsub.rs: priority_from_u8() for all 7 valid values plus OOB fallback
- names.rs: is_request/reply/pubsub_topic(), ros2↔dds type roundtrip, namespace handling edge cases
Total: 76 → 118 tests (42 new, 0 failures)
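The request/reply envelope exercised by these tests can be sketched as two functions: split a DDS request into the saved request header plus the Zenoh payload, then inject that header back into the Zenoh reply. The layout follows the 16-byte header described earlier; encoding the injected fields in the reply's own byte order is an assumption consistent with the "BE reply endianness preservation" test, and all names are hypothetical.

```rust
pub const CDR_HDR_LEN: usize = 4;

/// Client identity captured from the incoming DDS request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RequestHeader {
    pub guid: u64,
    pub seq: u64,
}

fn is_little_endian(cdr: &[u8]) -> bool {
    cdr[1] & 0x01 == 0x01
}

fn read_u64(bytes: &[u8], le: bool) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&bytes[..8]);
    if le { u64::from_le_bytes(b) } else { u64::from_be_bytes(b) }
}

/// DDS request [HDR | guid | seq | body] → (header, Zenoh payload [HDR | body]).
pub fn split_request(dds_req: &[u8]) -> Option<(RequestHeader, Vec<u8>)> {
    if dds_req.len() < CDR_HDR_LEN + 16 {
        return None; // too short: reject rather than forward garbage
    }
    let le = is_little_endian(dds_req);
    let header = RequestHeader {
        guid: read_u64(&dds_req[4..12], le),
        seq: read_u64(&dds_req[12..20], le),
    };
    let mut payload = dds_req[..CDR_HDR_LEN].to_vec();
    payload.extend_from_slice(&dds_req[CDR_HDR_LEN + 16..]);
    Some((header, payload))
}

/// Zenoh reply [HDR | body] → DDS reply [HDR | guid | seq | body], in the reply's byte order.
pub fn build_dds_reply(zenoh_reply: &[u8], header: RequestHeader) -> Option<Vec<u8>> {
    if zenoh_reply.len() < CDR_HDR_LEN {
        return None;
    }
    let le = is_little_endian(zenoh_reply);
    let enc = |v: u64| if le { v.to_le_bytes() } else { v.to_be_bytes() };
    let mut out = zenoh_reply[..CDR_HDR_LEN].to_vec();
    out.extend_from_slice(&enc(header.guid));
    out.extend_from_slice(&enc(header.seq));
    out.extend_from_slice(&zenoh_reply[CDR_HDR_LEN..]);
    Some(out)
}
```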
…r_format
Add four public accessors on ZNode needed by ros-z-dds bridge types:
- session() → &Arc<Session>
- keyexpr_format() → &KeyExprFormat
- next_entity_id() → usize
- node_entity() → &NodeEntity
Add ZContextBuilder::with_key_expr_format() alias so callers can select the RmwZenoh or Ros2Dds key expression format at context creation time.
Add ros-z-dds as a first-class library crate exposing:
- DdsParticipant trait + CyclorsParticipant impl (from ros-z-bridge-ros2dds)
- ZDdsPubBridge: DDS reader → Zenoh publisher
- ZDdsSubBridge: Zenoh subscriber → DDS writer
- ZDdsServiceBridge: DDS service server → Zenoh queryable (complete(true))
- ZDdsClientBridge: Zenoh querier → DDS service client
- ZDdsBridge: auto-discovery orchestrator with allow/deny filters
- DdsBridgeExt: typed extension trait on ZNode
Bridge types use node.keyexpr_format().topic_key_expr() for key expression construction and node.session() for Zenoh entity declaration; no intermediate abstractions are needed in ros-z core. TRANSIENT_LOCAL topics use AdvancedPublisher/AdvancedSubscriber from zenoh-ext.
Replace the monolithic ros-z-bridge-ros2dds crate with a minimal
zenoh-bridge-dds binary (ros-z-bridge-dds) that delegates all bridge
logic to the ros-z-dds library:
main.rs (~40 lines):
ZContextBuilder + ZNode + ZDdsBridge::new(node, participant).run()
config.rs: simplified CLI — single --domain-id, --allow, --deny,
--node-name, timeout and cache multiplier flags.
Delete ros-z-bridge-ros2dds/src/{bridge,liveliness,wire_format,routes/,dds/}
— all logic now lives in ros-z-dds.
Update ros-z-tests: rename ros2dds_bridge.rs → bridge_dds.rs, update
spawn_bridge() to use --domain-id (singular).
Add RosDiscoveryPublisher, which writes ParticipantEntitiesInfo (CDR-LE) to the ros_discovery_info DDS topic at ~1 Hz, making bridge endpoints visible to ros2 topic list / node list / service list.
- ros_discovery.rs: CDR structs + background-task publisher
- gid.rs: add serde Serialize/Deserialize to Gid (needed for CDR)
- pubsub.rs/service.rs: expose reader_guid()/writer_guid() per route
- bridge.rs: create the publisher on startup; register/unregister GIDs as routes are created and removed (removal resolved via gid_to_name on undiscovery)
- Cargo.toml: add cdr = 0.2.4 and serde/derive
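To show what "CDR-LE" means for these messages, here is a hand-rolled encoder for one NodeEntitiesInfo body (node_namespace, node_name, reader_gid_seq, writer_gid_seq, as defined in rmw_dds_common). It illustrates the wire layout only; the commit itself serializes through the cdr crate and serde. GID_LEN = 16 is an assumption (check rmw_dds_common's Gid.msg for the target distribution), and alignment is counted from the start of the body, after the 4-byte encapsulation header.

```rust
/// GID length in bytes: an assumption for this sketch.
pub const GID_LEN: usize = 16;

/// Minimal little-endian CDR writer; alignment is relative to the body start.
pub struct CdrLe {
    pub buf: Vec<u8>,
}

impl CdrLe {
    pub fn new() -> Self {
        CdrLe { buf: Vec::new() }
    }

    fn align(&mut self, n: usize) {
        while self.buf.len() % n != 0 {
            self.buf.push(0); // padding bytes
        }
    }

    pub fn u32(&mut self, v: u32) {
        self.align(4);
        self.buf.extend_from_slice(&v.to_le_bytes());
    }

    /// CDR string: u32 length including the NUL terminator, the bytes, then NUL.
    pub fn string(&mut self, s: &str) {
        self.u32(s.len() as u32 + 1);
        self.buf.extend_from_slice(s.as_bytes());
        self.buf.push(0);
    }

    /// Sequence of fixed-size GIDs: u32 count, then the raw bytes of each GID.
    pub fn gid_seq(&mut self, gids: &[[u8; GID_LEN]]) {
        self.u32(gids.len() as u32);
        for g in gids {
            self.buf.extend_from_slice(g);
        }
    }
}

/// Encode one NodeEntitiesInfo body (without the encapsulation header).
pub fn encode_node_entities_info(
    ns: &str,
    name: &str,
    readers: &[[u8; GID_LEN]],
    writers: &[[u8; GID_LEN]],
) -> Vec<u8> {
    let mut w = CdrLe::new();
    w.string(ns);
    w.string(name);
    w.gid_seq(readers);
    w.gid_seq(writers);
    w.buf
}
```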
…rage
- Add --wire-format CLI flag to zenoh-bridge-dds (rmw-zenoh default, ros2dds legacy)
- Fix tests 5–6 to use spawn_bridge_ros2dds (bare keys require the ros2dds format)
- Add tests 7–10: rmw-zenoh primary tests (pub/sub and service in both directions)
- Add tests 11–13: API-level construction tests for ZDdsPubBridge, ZDdsSubBridge, and DdsBridgeExt typed helpers (no binary spawn required)
- Enable ros-z-dds as an optional dep in ros-z-tests under the dds-bridge-interop feature
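The --wire-format flag maps naturally onto an enum implementing FromStr. WireFormat is a hypothetical name; the two accepted strings and the rmw-zenoh default come from the commit message.

```rust
use std::str::FromStr;

/// Key-expression flavour selected by --wire-format (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WireFormat {
    #[default]
    RmwZenoh, // "rmw-zenoh": default
    Ros2Dds,  // "ros2dds": legacy zenoh-plugin-ros2dds keys
}

impl FromStr for WireFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "rmw-zenoh" => Ok(WireFormat::RmwZenoh),
            "ros2dds" => Ok(WireFormat::Ros2Dds),
            other => Err(format!("unknown wire format '{other}' (expected rmw-zenoh or ros2dds)")),
        }
    }
}
```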
Subscribe to @ros2_lv/** liveliness and create complementary DDS routes when a remote bridge announces endpoints:
- Remote Publisher → local ZDdsSubBridge (Zenoh→DDS relay)
- Remote Subscription → local ZDdsPubBridge (DDS→Zenoh relay)
Routes are reference-counted via RouteEntry<R>, which tracks both local DDS GIDs and remote bridge ZIDs. A route persists until all sources retire, so federation routes survive local DDS endpoint churn.
Also fix liveliness key expressions to always include the type name: when type_hash is None, use TypeHash::zero() as a placeholder so remote bridges can reconstruct DDS entities for the correct message type.
Add qos_profile_to_bridge_qos(), the inverse of bridge_qos_to_qos_profile(), for converting a parsed QosProfile back to BridgeQos when creating federated DDS endpoints.
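The reference counting in RouteEntry<R> can be sketched as two source sets. Only the RouteEntry<R> name and the GID/ZID idea come from this commit; the field names, the [u8; 16] GID type and the method names are assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical GID / ZID representations for the sketch.
pub type Gid = [u8; 16];
pub type Zid = String;

/// A route plus the set of sources currently keeping it alive.
pub struct RouteEntry<R> {
    pub route: R,
    local_gids: HashSet<Gid>,  // local DDS endpoints
    remote_zids: HashSet<Zid>, // remote bridges announcing via liveliness
}

impl<R> RouteEntry<R> {
    pub fn new(route: R) -> Self {
        RouteEntry { route, local_gids: HashSet::new(), remote_zids: HashSet::new() }
    }

    pub fn add_local(&mut self, gid: Gid) {
        self.local_gids.insert(gid);
    }

    pub fn add_remote(&mut self, zid: &str) {
        self.remote_zids.insert(zid.to_string());
    }

    /// Returns true when no source is left, i.e. the caller should drop the route.
    pub fn remove_local(&mut self, gid: &Gid) -> bool {
        self.local_gids.remove(gid);
        self.is_orphaned()
    }

    /// Returns true when no source is left (e.g. after a liveliness Delete sample).
    pub fn remove_remote(&mut self, zid: &str) -> bool {
        self.remote_zids.remove(zid);
        self.is_orphaned()
    }

    pub fn is_orphaned(&self) -> bool {
        self.local_gids.is_empty() && self.remote_zids.is_empty()
    }
}
```

A federation route created for a remote bridge therefore survives a local DDS writer disappearing, and vice versa.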
Remote Service server → local ZDdsClientBridge (DDS clients can call it). Remote Client → local ZDdsServiceBridge (DDS servers can respond to it). Also migrates srv_routes and cli_routes from plain HashMap to RouteEntry<R> so service routes are reference-counted by local DDS GIDs and remote bridge ZIDs, matching the pub/sub route lifecycle introduced for federation. Action get_result timeout is applied when the federated service name contains /_action/get_result.
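The timeout choice is a simple name test. A sketch assuming the defaults quoted in this PR (10 s for services, 300 s for action get_result); in the bridge these values come from CLI flags, and the constant and function names here are hypothetical.

```rust
use std::time::Duration;

/// Default querier timeout for ordinary services.
pub const SERVICE_TIMEOUT: Duration = Duration::from_secs(10);
/// Longer timeout for action get_result, which can block until the goal finishes.
pub const ACTION_GET_RESULT_TIMEOUT: Duration = Duration::from_secs(300);

/// Pick the querier timeout from the ROS 2 service name.
pub fn querier_timeout(service_name: &str) -> Duration {
    if service_name.contains("/_action/get_result") {
        ACTION_GET_RESULT_TIMEOUT
    } else {
        SERVICE_TIMEOUT
    }
}
```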
- New dds-bridge.md chapter covering architecture, quick start, CLI reference, wire format, topic filtering, federation, DDS discovery scope, programmatic API, and troubleshooting
- auto_bridge.rs: runnable example wrapping ZDdsBridge auto-discovery
- custom_bridge.rs: typed bridge using DdsBridgeExt with RosString
- Minimal README.md for the ros-z-dds and ros-z-bridge-dds crates
- DDS Bridge added to the mkdocs.yml Guides section
- DDS bridge examples section added to examples.md
- Dev-dependencies for examples: tracing-subscriber, ros-z-msgs, tokio/signal
- Remove with_key_expr_format alias from ZContextBuilder; callers use keyexpr_format() directly (main.rs updated)
- Remove bridge_dds_service/bridge_dds_client from DdsBridgeExt: generated request types use ::msg::dds_:: naming, which doesn't match the _Request_ suffix expected by dds_type_to_ros2_service_type
- Remove Transient/Persistent variants from DurabilityKind; ROS 2 nodes only produce Volatile/TransientLocal, and the CDurabilityKind conversion now maps the unused values to Volatile
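The narrowed conversion can be sketched as an exhaustive match. RawDurability is a hypothetical mirror of the four DDS durability kinds a backend can report; collapsing Transient and Persistent to Volatile is the behaviour described above.

```rust
/// Mirror of the four DDS durability kinds a backend may report (hypothetical name).
#[derive(Debug, Clone, Copy)]
pub enum RawDurability {
    Volatile,
    TransientLocal,
    Transient,
    Persistent,
}

/// Bridge-side durability: only the two kinds ROS 2 nodes produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DurabilityKind {
    Volatile,
    TransientLocal,
}

impl From<RawDurability> for DurabilityKind {
    fn from(raw: RawDurability) -> Self {
        match raw {
            RawDurability::TransientLocal => DurabilityKind::TransientLocal,
            // Volatile, plus the kinds ROS 2 does not produce, collapse to Volatile.
            RawDurability::Volatile | RawDurability::Transient | RawDurability::Persistent => {
                DurabilityKind::Volatile
            }
        }
    }
}
```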
ZDdsServiceBridge::new no longer takes a timeout argument — the DDS reply wait is driven by the Zenoh querier timeout on the client side; the service bridge side has no reply deadline to enforce. liveliness_subscription_pattern() drops its KeyExprFormat parameter — both wire formats use the same @ros2_lv/** admin prefix, so the argument served no purpose.
… code
- Scope internal items: compute_cache_size → fn, reader_guid/writer_guid/wire_discriminant → pub(crate), reader/writer RAII field → _route
- Move wire_discriminant impls into #[cfg(test)] blocks; remove empty public impls
- Mark ZenohSubHandle #[allow(dead_code)] (RAII variant payloads)
- Fix compiler warnings: unused dp binding, redundant nested unsafe, dead iov_len_to_usize function, #[allow(async_fn_in_trait)] on DdsBridgeExt
- Change internal modules to pub(crate): cyclors, discovery, gid, names, qos, ros_discovery
- Narrow crate-root re-exports: remove RosDiscoveryPublisher, DdsReader, DdsWriter (no external callers)
- Remove dead code: service_default_bridge_qos, qos_mismatch_reason, is_reply_topic, ros2_name_to_zenoh_key, is_reliable, is_transient_local, and delete types.rs entirely (DDSRawSample + ddsrt_iov_len_to_usize)
…_to_zenoh_key and is_reply_topic
…elf, field_reassign_with_default, unnecessary casts)
…idge tests
Three failing bridge integration tests (tests 4, 8, 9):
- ZDdsServiceBridge req_writer used adapt_writer_qos_for_reader (→ BestEffort) instead of adapt_reader_qos_for_writer (→ Reliable). A BestEffort DDS writer cannot communicate with a ROS 2 service server's Reliable request reader, so the DDS write was silently dropped and the Zenoh query timed out.
- ZDdsSubBridge declared its Zenoh subscriber on the exact EMPTY_TOPIC_TYPE/EMPTY_TOPIC_HASH key derived from the DDS endpoint. When the DDS listener carried a real type hash in user_data, the key was specific, so Zenoh publishers using the EMPTY key (test 8) were not routed to the bridge. Fix: subscribe with a wildcard suffix (topic/**) so any publisher on the same ROS topic is forwarded regardless of type/hash.
- ZDdsServiceBridge declared its Zenoh queryable on the same specific key, so test 9 (querying the EMPTY key) found no complete queryable. Fix: use a topic/** wildcard queryable to accept queries from any client type/hash.
Also redirect bridge stderr to the log file (was Stdio::null()) so bridge errors are visible when tests fail.
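Why the topic/** suffix fixes tests 8 and 9 is easiest to see with a toy matcher. This is a simplified, hypothetical model of Zenoh key-expression matching ('*' is exactly one chunk, '**' is zero or more chunks, everything else is literal); real key expressions have more rules. The EMPTY_TYPE/EMPTY_HASH chunks in the usage are placeholders, not the bridge's actual constant values.

```rust
/// Simplified key-expression matcher (illustrative only, not Zenoh's implementation).
pub fn ke_matches(pattern: &str, key: &str) -> bool {
    let p: Vec<&str> = pattern.split('/').collect();
    let k: Vec<&str> = key.split('/').collect();
    match_chunks(&p, &k)
}

fn match_chunks(p: &[&str], k: &[&str]) -> bool {
    match (p.first(), k.first()) {
        (None, None) => true,
        // '**' either consumes nothing, or consumes one key chunk and stays in place.
        (Some(&"**"), _) => match_chunks(&p[1..], k) || (!k.is_empty() && match_chunks(p, &k[1..])),
        // '*' consumes exactly one key chunk.
        (Some(&"*"), Some(_)) => match_chunks(&p[1..], &k[1..]),
        // Literal chunks must be equal.
        (Some(a), Some(b)) if a == b => match_chunks(&p[1..], &k[1..]),
        _ => false,
    }
}
```

An exact subscription on a type-specific key misses a publisher that used the empty type/hash, while 0/chatter/** accepts both.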
Summary
- ros-z-dds library crate exposing a DDS↔Zenoh bridge as a first-class ros-z component
- ros-z-bridge-dds binary (thin CLI shell over ros-z-dds) connecting existing DDS-based ROS 2 nodes to a Zenoh/ros-z network without recompilation or code changes on either side
- ros_discovery_info publishing so ros2 topic/node/service list works on both sides
Key Changes
- crates/ros-z-dds/: new library crate with ZDdsBridge (auto-discovery), ZDdsPubBridge, ZDdsSubBridge, ZDdsServiceBridge, ZDdsClientBridge, and the DdsBridgeExt typed-bridge trait
- crates/ros-z-bridge-dds/: thin CLI binary wiring ros-z-dds to clap args
- docs/user-guide/dds-bridge.md: full documentation chapter, including a migration guide from zenoh-plugin-ros2dds
- crates/ros-z-dds/examples/: auto_bridge.rs and custom_bridge.rs runnable examples
Fixes zenoh-plugin-ros2dds issues
#697: Publications not delivered to remote_api WebSocket subscribers
Fixed by architecture. The old plugin ran embedded inside zenohd alongside remote_api; its internal publishers did not propagate to co-loaded plugin subscribers. This bridge is a standalone process with its own Zenoh client session. It publishes through the router like any other Zenoh client, so WebSocket subscribers on the router receive the data normally.
#690: Route drops when many DDS participants join simultaneously
Likely fixed. The old plugin had a transient-local route renegotiation step that could tear down an existing route under simultaneous participant joins and fail to re-establish it. This bridge has no renegotiation step: each DDS endpoint is independently tracked by GID, and adding new endpoints never disturbs existing routes. TRANSIENT_LOCAL caching is handled by AdvancedPublisher independently of the route lifecycle.
#647: Service reply dropped due to wrong client_guid in CDR framing
Fixed. The old plugin embedded an incorrect client_guid in the CDR request body, so the DDS server's reply was addressed to an unrecognized GUID and dropped. This bridge derives client_guid from req_writer.instance_handle() (the actual DDS writer GID), so the server addresses its reply correctly. In-flight queries are also tracked by a locally generated sequence number, making response matching robust across DDS vendor differences.
#642: Services silently fail in bridge → router → bridge topology
Fixed. The old plugin ran embedded in zenohd, creating ambiguity in how query replies were routed back through the same session. This bridge runs in client mode against an external router, so queries travel client→router→client with clean routing symmetry. Queryables are declared .complete(true) so the router propagates them as capable service endpoints. The default service timeout is 10 s (vs the old 5 s); action get_result has its own 300 s timeout.
#576: Namespace doesn't filter subscriptions; other users' traffic bleeds through
Fixed. The old plugin's --namespace only scoped outbound Zenoh publications; the bridge still subscribed to the full keyspace and re-injected all traffic into local DDS. In this bridge the namespace is part of every Zenoh key expression, for both publications and subscriptions. A bridge with namespace /botA subscribes only to 0/botA/** keys and never receives publications from /botB.
#570: Duplicate messages accumulate on each bridge restart
Fixed. The old plugin left stale route/liveliness entries on disconnect; each reconnect added a new route without removing the old ones. This bridge cleans up both sides: CycloneDDS undiscovery events trigger GID-based route removal, and Zenoh liveliness tokens are tied to the session and automatically retracted on disconnect, causing the peer bridge to retire federation routes via SampleKind::Delete handling. A restarted bridge starts clean with a new ZID.
Reproducible Demos
Three self-contained Docker/Podman gists (each a two-file Dockerfile + run.sh) build everything from source and run an automated PASS/FAIL check end-to-end:
- ZDdsSubBridge wildcard-key fix
- ZDdsServiceBridge wildcard-queryable fix
Breaking Changes
None: ros-z-dds and ros-z-bridge-dds are new crates. Existing ros-z public APIs are unchanged.