Skip to content

fix(RDKEMW-16441): gateway connect fails on numeric loopback IP when no external network#83

Open
brendanobra wants to merge 32 commits intomainfrom
topic/RDKEMW-16441
Open

fix(RDKEMW-16441): gateway connect fails on numeric loopback IP when no external network#83
brendanobra wants to merge 32 commits intomainfrom
topic/RDKEMW-16441

Conversation

@brendanobra
Copy link
Copy Markdown
Contributor

Problem

On RDK STBs where eth0/wifi are not active at boot (e.g. early app startup),
connecting to ws://127.0.0.1:3474 fails with "Underlying Transport Error".

Root cause: websocketpp's tcp::resolver::query uses AI_ADDRCONFIG (the POSIX
default) which causes getaddrinfo("127.0.0.1") to return HOST_NOT_FOUND when
no non-loopback interface is configured — even though the address is numeric and
loopback is always reachable.

Fix

Patched src/vendor/websocketpp/transport/asio/endpoint.hpp:

  • Detect numeric IP hosts via asio::ip::address::from_string() before calling
    async_resolve
  • Use resolver::query::numeric_host (AI_NUMERICHOST) for numeric IPs — bypasses
    AI_ADDRCONFIG, no DNS query is sent, resolution is in-kernel only
  • Hostname strings (e.g. "localhost") continue to use address_configured as before

Tests

  • TransportNumericIPUTest::ConnectViaNumericLoopbackIP — connect to ws://127.0.0.1
  • TransportNumericIPUTest::SendAndReceiveViaNumericIP — full data round-trip over numeric IP
  • TransportNumericIPResolverTest::ConnectFailureViaNumericIP — error path with no server listening
  • TransportIPv6UTest::ConnectViaIPv6LoopbackIP — same fix path exercised for IPv6 ::1

All four run under --network none (Docker, loopback only) to precisely reproduce
the STB environment. CI has a dedicated network_isolation_tests job that runs
these in docker run --network none.

Other changes

  • build.sh: default SYSROOT_PATH to / for dev-box builds; --docker flag
    to build/run inside the CI image without local deps; stale CMakeCache wipe when
    switching between host and Docker builds
  • cmake/version.cmake + .version: explicit 1.1.8 version pin (git describe
    was resolving ambiguously to v1.1.7)

bobra200 added 5 commits April 13, 2026 07:36
websocketpp constructs tcp::resolver::query(host, port) using the
Boost ASIO default flags, which include AI_ADDRCONFIG.  That flag
tells getaddrinfo(3) to return results only when a local interface
for the address family is configured.  On certain embedded devices
(RDK STBs) the loopback interface is not brought up independently --
it comes up as part of the general networking init.  When eth0/wifi
are down, lo is also down, so AI_ADDRCONFIG causes getaddrinfo to
return HOST_NOT_FOUND even for 127.0.0.1, producing:

  asio async_resolve error: asio.netdb:1 (Host not found (authoritative))
  WebSocket Connection Unknown ... websocketpp.transport:2 Underlying Transport Error

Fix: vendor websocketpp/transport/asio/endpoint.hpp (the one file
that constructs the resolver query).  For hosts that parse as a
numeric IP address via asio::ip::address::from_string(), use the
numeric_host flag (AI_NUMERICHOST) instead of address_configured
(AI_ADDRCONFIG).  This bypasses the DNS resolver and the
AI_ADDRCONFIG interface-presence check entirely.  For genuine
hostnames the existing address_configured behaviour is preserved.

The vendor/ directory is placed before system websocketpp headers in
target_include_directories so the patched file takes priority.
@brendanobra brendanobra requested review from Copilot and removed request for Copilot April 13, 2026 17:38
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Copyright scan failure
Commit: 498a789
Report detail: https://gist.github.com/rdkcmf-jenkins/dc687cd43780021c597c3f95072f5e28'

@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 498a789

Report detail: gist'

Copilot AI review requested due to automatic review settings April 13, 2026 17:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Firebolt gateway connection failures on numeric loopback IPs (e.g., ws://127.0.0.1) in “no external network” environments by patching websocketpp resolver behavior, and adds targeted network-isolation tests and CI coverage to prevent regressions.

Changes:

  • Patch vendored websocketpp Asio transport to treat numeric IP hosts as numeric_host (AI_NUMERICHOST), bypassing AI_ADDRCONFIG.
  • Add new unit tests and a CI job that runs them under docker --network none, plus a helper integration script to reproduce locally.
  • Introduce initial gateway connection retry settings/config and implement a retry loop in GatewayImpl::connect().

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Resolver change to detect numeric IPs and force numeric_host during async resolve.
src/CMakeLists.txt Ensure src/vendor headers take precedence over system websocketpp headers.
test/unit/transportTest.cpp Add regression tests for numeric IPv4/IPv6 loopback and numeric-IP failure path.
test/integration/test-no-network.sh Local reproduction script for loopback-only / no-network environment.
.github/workflows/ci.yml Add network_isolation_tests job running selected tests with --network none.
src/gateway.cpp Add initial connection retry loop + watchdog wakeups; adjust disconnect logging/behavior.
include/firebolt/config.h.in Add reconnect_max_attempts and reconnect_delay_ms config knobs.
cmake/version.cmake Prefer .version file over git describe when present.
build.sh Add --docker mode and SYSROOT defaulting; attempt to wipe stale build dirs.

Comment thread src/gateway.cpp
Comment thread src/gateway.cpp
Comment thread src/gateway.cpp Outdated
Comment thread build.sh
Comment thread test/unit/transportTest.cpp Outdated
Comment thread src/gateway.cpp
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Copyright scan failure
Commit: 96ee593
Report detail: https://gist.github.com/rdkcmf-jenkins/c6ab54f55ab791fbe84636d2ca839063'

@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 96ee593

Report detail: gist'

src/vendor/websocketpp is a patched copy of websocketpp 0.8.2
(BSD-3-Clause, Copyright Peter Thorson). It is documented in
NOTICE and LICENSE. Exclude from BlackDuck scan to avoid
component re-identification.
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Copyright scan failure
Commit: b359a31
Report detail: https://gist.github.com/rdkcmf-jenkins/983a69d584719cb0f607b5f37c54a7a3'

@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: b359a31

Report detail: gist'

@brendanobra
Copy link
Copy Markdown
Contributor Author

@rdkcmf-jenkins re-scan

- gateway.cpp: reset connectResultError to None at the start of each
  retry attempt so a stale error from attempt N is never reported as
  the final error of attempt N+1 (Copilot line 537)
- gateway.cpp: add connectionListenerMtx to serialize reads/writes of
  connectionChangeListener between the websocketpp IO thread and the
  main thread (Copilot line 542)
- gateway.cpp: on 10-second connect timeout (connectResultReady not
  set), call transport.disconnect() to abort the in-flight async
  attempt before retrying (Copilot line 529 - stale attempt race)
- gatewayTest: add GatewayRetryUTest with three tests covering retry
  behaviour: server-delayed connect, no-retry fast-fail, and
  disconnect-aborts-retry-delay (Copilot line 529 - missing tests)
- transportTest: catch std::exception in the echo server message
  handler and report via ADD_FAILURE() instead of swallowing silently
  (Copilot line 740)
- build.sh: fix --docker incremental build wipe -- compare
  CMAKE_HOME_DIRECTORY against /workspace (container path) rather than
  $SCRIPT_DIR (host path) so Docker-built caches survive subsequent
  --docker invocations (Copilot build.sh:41)
Copilot AI review requested due to automatic review settings April 13, 2026 21:58
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Copyright scan failure
Commit: 43a6ee8
Report detail: https://gist.github.com/rdkcmf-jenkins/caa8345cfe6f8c3cdef1923831545c7c'

@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 43a6ee8

Report detail: gist'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a websocketpp DNS resolution edge case where connecting to a numeric loopback IP (e.g., 127.0.0.1) can fail on embedded devices with no configured non-loopback interfaces due to AI_ADDRCONFIG. It vendors a patched websocketpp ASIO endpoint resolver behavior and adds coverage (including CI network-isolated runs) plus a new initial-connect retry policy in the Gateway.

Changes:

  • Patch vendored websocketpp ASIO resolver to use numeric_host for numeric IP literals (bypassing AI_ADDRCONFIG).
  • Add unit + integration/CI “no network” coverage for numeric IPv4/IPv6 loopback connections and numeric-IP failure behavior.
  • Add Gateway initial connection retry policy (reconnect_max_attempts / reconnect_delay_ms) and supporting tests; enhance shutdown/watchdog responsiveness.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Resolver flag change for numeric IP hosts to avoid AI_ADDRCONFIG failures.
src/CMakeLists.txt Ensures vendored websocketpp headers take precedence over system headers.
test/unit/transportTest.cpp Adds numeric IPv4/IPv6 loopback regression tests + numeric-IP failure test.
test/unit/gatewayTest.cpp Adds retry-policy tests for Gateway connect behavior.
include/firebolt/config.h.in Introduces public config knobs for initial connect retry policy.
src/gateway.cpp Implements connect retry loop and adjusts watchdog/shutdown behavior.
test/integration/test-no-network.sh Adds a reproducible “loopback-only” test runner (Docker/unshare).
.github/workflows/ci.yml Adds a CI job to run selected tests under docker --network none.
build.sh Adds --docker convenience and improves stale CMakeCache handling.
cmake/version.cmake Prefers .version pin before git describe for deterministic versioning.
.bdignore Excludes vendored third-party source from BlackDuck scans.

Comment thread src/gateway.cpp Outdated
Comment thread src/gateway.cpp Outdated
Comment thread src/gateway.cpp
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 13, 2026 22:08
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Copyright scan failure
Commit: 4e39799
Report detail: https://gist.github.com/rdkcmf-jenkins/aa68825cd8d552c0a8cb61013c45ce41'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Firebolt gateway connection failures when using numeric loopback IPs on devices without external network interfaces, and adds test/CI coverage for this scenario (including network-isolated execution). It also introduces an initial-connect retry policy and some developer/CI build improvements.

Changes:

  • Patch vendored WebSocket++ ASIO resolver behavior to use numeric_host for numeric IPs (bypassing AI_ADDRCONFIG).
  • Add unit + network-isolation tests for numeric IPv4/IPv6 loopback connections and resolver error paths; add gateway initial-connect retry tests.
  • Update gateway connect logic to support retries and improve disconnect/watchdog behavior; add CI job and build tooling updates.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Detect numeric IPs and resolve with numeric_host to avoid AI_ADDRCONFIG failures.
src/CMakeLists.txt Ensure src/vendor include path precedes system WebSocket++ headers so the patch takes effect.
src/gateway.cpp Add initial-connect retry loop, callback synchronization, and watchdog/disconnect responsiveness changes.
include/firebolt/config.h.in Add reconnect_max_attempts / reconnect_delay_ms config fields.
include/firebolt/gateway.h Document callback threading semantics for connect() vs subsequent callbacks.
test/unit/transportTest.cpp Add numeric IPv4/IPv6 loopback connection + round-trip + failure-path tests.
test/unit/gatewayTest.cpp Add retry behavior unit tests for the gateway connect logic.
test/integration/test-no-network.sh Add a script to run targeted tests under network isolation (Docker/unshare).
.github/workflows/ci.yml Add network_isolation_tests CI job running the numeric-IP tests with --network none.
build.sh Add --docker mode, default SYSROOT_PATH, and stale cache wiping between environments.
cmake/version.cmake Prefer .version pin before git describe to avoid ambiguous tag resolution.
.version Pin project version to 1.1.8.
LICENSE Append BSD-3 license text (currently with placeholders).
.bdignore Exclude vendored sources from BlackDuck scans.

Comment thread src/gateway.cpp
Comment thread LICENSE
Comment thread src/gateway.cpp Outdated
…ht attempts promptly

Transport::disconnect() only called client_->close() when connectionStatus_
was Connected.  During DNS/TCP/WebSocket handshake the status stays
Disconnected, so stop_perpetual() alone did not cancel the pending async
operation and connectionThread_->join() could block for the full OS-level
TCP timeout rather than returning promptly.

Introduce TransportState::Connecting (set in connect() after client_->connect()
is queued, cleared by onOpen/onFail/onClose):

* disconnect() detects Connecting and calls client_->stop(), which forces the
  asio io_service to exit run() immediately so the connection thread can be
  joined without delay.  onFail will not fire after stop(), which is safe
  because the gateway condvar is already woken by disconnectRequested_.
* All other state transitions are unchanged: onOpen -> Connected,
  onFail/onClose -> Disconnected.
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 4d41cc0

Report detail: gist'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Firebolt gateway connection failures on numeric loopback IPs (e.g. ws://127.0.0.1) in “no external network” environments by patching the vendored WebSocket++ resolver behavior, and adds CI coverage for the regression scenario.

Changes:

  • Patch WebSocket++ Asio resolver to use numeric_host for numeric IP hosts, bypassing AI_ADDRCONFIG.
  • Add unit + network-isolation integration coverage for numeric IPv4/IPv6 loopback connections and gateway initial-connect retry behavior.
  • Add build/CI/versioning updates (Docker build option, .version pin, CI job for --network none tests).

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Uses numeric_host for numeric IP resolution to avoid AI_ADDRCONFIG failures.
src/CMakeLists.txt Ensures vendored WebSocket++ headers take precedence over system headers.
src/transport.cpp Adds Connecting state and aborts in-flight connects on disconnect() via client_->stop().
src/gateway.cpp Implements initial-connect retry loop, early-abort on disconnect, and watchdog thread wake-up.
include/firebolt/config.h.in Adds reconnect_max_attempts / reconnect_delay_ms configuration fields.
include/firebolt/gateway.h Documents connect callback threading behavior (connect thread vs IO thread).
test/unit/transportTest.cpp Adds numeric-IP regression tests (IPv4 loopback, IPv6 loopback, failure path).
test/unit/gatewayTest.cpp Adds retry-policy unit tests for delayed server, no-retry, disconnect abort, already-connected behavior.
test/integration/test-no-network.sh Adds a network-isolation runner (Docker --network none, fallback to unshare --net).
.github/workflows/ci.yml Adds a dedicated network_isolation_tests job to run the new tests under --network none.
build.sh Adds --docker mode, SYSROOT defaulting, and stale CMake cache wiping when switching environments.
cmake/version.cmake Prefers .version pin before attempting git describe.
.version Pins project version to 1.1.8.
LICENSE Appends a BSD-3 template block intended for third-party licensing.
.bdignore Excludes src/vendor/ from BlackDuck scanning.

Comment thread LICENSE
Comment thread src/gateway.cpp
Comment thread include/firebolt/config.h.in Outdated
Comment thread build.sh Outdated
gateway.cpp: clamp reconnect_max_attempts to 100 before 1+N to prevent
  unsigned overflow when caller passes UINT_MAX or similar large value.

gateway.cpp + config.h.in: replace hard-coded 10s kConnectTimeout with
  cfg.connect_attempt_timeout_ms (default 10000 ms, backward compatible).
  Update reconnect_max_attempts docstring to describe the per-attempt
  timeout, the effective maximum blocking duration formula, and the 100-
  retry cap.  Add connect_attempt_timeout_ms field with full docstring.

build.sh: replace unsafe '-n "$@"' condition (undefined behaviour with
  multiple args inside [[ ]]) with '$# -gt 0'.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Firebolt gateway connection failures on RDK devices when connecting to numeric loopback IPs (e.g., 127.0.0.1) in “no external network” boot scenarios, and adds an initial-connect retry policy plus CI coverage for loopback-only environments.

Changes:

  • Patch vendored WebSocket++ Asio resolver logic to treat numeric IP hosts as numeric_host (AI_NUMERICHOST) to bypass AI_ADDRCONFIG.
  • Add initial connection retry/timeout behavior in the gateway + transport disconnect improvements for in-flight connects.
  • Add unit/integration tests and a CI job that runs selected tests under docker --network none.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Detect numeric IP hosts and force numeric_host resolver flags to avoid AI_ADDRCONFIG failures.
src/CMakeLists.txt Ensures the vendored WebSocket++ header shadowing takes priority during library compilation.
src/transport.cpp Adds a Connecting state and abort logic to stop an in-progress connect on disconnect.
src/gateway.cpp Implements initial-connect retry loop with per-attempt timeout and early-abort on disconnect; adds synchronization and watchdog wake-up.
include/firebolt/config.h.in Documents and introduces retry-related config knobs for initial connect.
include/firebolt/gateway.h Documents callback threading behavior for connect() vs subsequent connection events.
test/unit/transportTest.cpp Adds numeric-IP and IPv6 loopback regression/unit tests (including failure path).
test/unit/gatewayTest.cpp Adds unit tests covering retry behavior, disconnect-abort, and AlreadyConnected callback semantics.
test/integration/test-no-network.sh Adds a loopback-only runner using Docker --network none (or unshare --net).
.github/workflows/ci.yml Adds a dedicated network_isolation_tests job running the loopback-only test filter.
build.sh Adds --docker support, defaults SYSROOT_PATH, and wipes stale CMake caches when switching environments.
cmake/version.cmake Prefers .version over git describe to make version selection deterministic.
.version Pins version to 1.1.8.
LICENSE Appends BSD-3 license text (but currently with placeholders).
.bdignore Excludes src/vendor/ from BlackDuck scanning.

Comment thread LICENSE
Comment thread src/gateway.cpp
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 3d62c00

Report detail: gist'

The condvar wrapper is installed in connectionChangeListener before
transport.connect() is called so it is in place when onOpen/onFail fires.
If transport.connect() returns AlreadyConnected, the wrapper is restored
to previousListener — but in the intervening window an IO-thread event
(e.g. the live connection dropping) could hit the wrapper and be silently
discarded instead of reaching the previously-registered callback.

Fix: capture previousListener and a shared skipPrevForward flag inside
the wrapper.  Events arriving before skipPrevForward is set (i.e. during
the AlreadyConnected window) are forwarded to previousListener so they are
not dropped.  skipPrevForward is set to true immediately after
transport.connect() returns Firebolt::Error::None (a new attempt was
actually queued), disabling the forward path so genuine new-attempt
events are not spuriously delivered to the old callback.
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 4cbe731

Report detail: gist'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses connection failures when using numeric loopback IPs (e.g. ws://127.0.0.1) on devices with no external network configured at boot by forcing numeric-host resolution to bypass AI_ADDRCONFIG. It also adds an initial-connection retry policy to the gateway and introduces network-isolation coverage in CI.

Changes:

  • Patch websocketpp’s ASIO resolver path to use numeric_host for numeric IPs (IPv4/IPv6), avoiding AI_ADDRCONFIG-triggered failures.
  • Add initial connection retry/timeout behavior to GatewayImpl::connect() plus watchdog shutdown improvements.
  • Add unit/integration tests (including --network none) and CI job wiring; update build/version tooling.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/vendor/websocketpp/transport/asio/endpoint.hpp Vendored override of websocketpp endpoint.hpp to apply numeric-host resolver flags for numeric IPs.
src/CMakeLists.txt Ensures src/vendor include path is preferred so the patched websocketpp header takes effect.
src/transport.cpp Adds Connecting state and aborts in-progress connects via client_->stop() during disconnect.
src/gateway.cpp Implements initial connect retry loop with per-attempt timeout and disconnect-abort behavior; adds synchronization around connection listener and watchdog wakeups.
include/firebolt/config.h.in Adds config fields for retry policy and per-attempt connect timeout (with documentation).
include/firebolt/gateway.h Documents callback threading behavior for connect() vs subsequent connection events.
test/unit/transportTest.cpp Adds regression/unit tests for numeric IPv4/IPv6 loopback behavior and numeric-IP failure path.
test/unit/gatewayTest.cpp Adds unit tests validating retry policy behaviors (delayed server, no-retry fail fast, disconnect abort, AlreadyConnected behavior).
test/integration/test-no-network.sh Adds a local runner for loopback-only / no-external-network reproduction via Docker or unshare.
.github/workflows/ci.yml Adds a dedicated network_isolation_tests job running selected tests under --network none.
build.sh Adds --docker helper mode, defaults SYSROOT_PATH to /, and wipes stale CMake caches when switching environments.
cmake/version.cmake Prefers .version file over git describe when present.
.version Pins project version string (1.1.8).
LICENSE Appends a BSD-3 license template block (currently with placeholders).
.bdignore Excludes vendored third-party source from BlackDuck scanning.

Comment thread src/transport.cpp
Comment thread LICENSE
connectionStatus_ is already std::atomic<TransportState> but all reads
and writes used implicit conversion operators, obscuring the atomic
semantics and leaving memory ordering unspecified.  Replace with explicit
.load()/.store() (default seq_cst) throughout connect(), disconnect(),
send(), start(), and the three websocketpp callbacks.
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: 87093da

Report detail: gist'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Firebolt gateway connection failures in “no external network” environments (e.g., early boot on RDK STBs) by ensuring numeric loopback IPs (like 127.0.0.1 / ::1) resolve without AI_ADDRCONFIG interference, and adds an initial-connect retry policy plus CI coverage for network-isolated scenarios.

Changes:

  • Patch websocketpp resolution to use numeric_host for numeric IPs to bypass AI_ADDRCONFIG.
  • Add initial connection retry logic in GatewayImpl::connect() (bounded attempts, delay slicing, per-attempt timeout) and Transport “Connecting” state handling.
  • Add numeric-IP + IPv6 regression tests and a dedicated CI job running tests under docker --network none, plus supporting build/version script updates.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/unit/transportTest.cpp Adds numeric-IP + IPv6 loopback regression unit tests and a numeric-IP failure-path test.
test/unit/gatewayTest.cpp Adds unit tests validating initial-connect retry behavior and edge cases (disconnect abort, already-connected).
test/integration/test-no-network.sh Adds a helper script to run selected tests under network isolation (Docker preferred; unshare --net fallback).
src/vendor/websocketpp/transport/asio/endpoint.hpp Vendored websocketpp header with resolver-flag change for numeric hosts.
src/transport.cpp Introduces Connecting state and aborts in-flight connect attempts during disconnect() via client_->stop().
src/gateway.cpp Implements initial-connect retry loop, connection-result synchronization, and watchdog wake-up via condition variable.
src/CMakeLists.txt Ensures src/vendor/ precedes system websocketpp headers for the transport target.
include/firebolt/gateway.h Documents callback threading behavior for connect() vs subsequent events.
include/firebolt/config.h.in Adds public config fields for retry policy and per-attempt connect timeout with documentation.
cmake/version.cmake Prefers .version file over git describe when present.
build.sh Adds --docker convenience mode, improves stale-cache handling, and defaults SYSROOT_PATH to /.
LICENSE Appends BSD-3 license text intended for vendored websocketpp attribution.
.version Pins project version to 1.1.8.
.github/workflows/ci.yml Adds network_isolation_tests job executing unit tests under docker --network none.
.bdignore Excludes src/vendor/ from BlackDuck scanning.
Comments suppressed due to low confidence (2)

src/gateway.cpp:431

  • GatewayImpl::~GatewayImpl() stops/join watchdog but doesn’t call disconnect(). Because members like connectionListenerMtx/connectResultCv are destroyed before the Transport member (due to declaration order), any late Transport onClose/onFail callback during Transport’s destructor can re-enter GatewayImpl::onConnectionChange() and attempt to lock already-destroyed mutexes (UB/crash). Call disconnect() (or otherwise fully tear down the Transport/IO thread and ensure callbacks can’t fire) from ~GatewayImpl() while all synchronization primitives are still alive.
    ~GatewayImpl()
    {
        if (watchdogRunning)
        {
            watchdogRunning = false;
            watchdogCv.notify_one();
            if (watchdogThread.joinable())
            {
                watchdogThread.join();
            }
        }
    }

src/CMakeLists.txt:94

  • The patched websocketpp header is only added as a PRIVATE include directory. Unit tests (and any other targets) compile websocketpp headers via system includes and won’t see src/vendor/, which can lead to different definitions of the same websocketpp templates across translation units (ODR violation / UB) if some TUs include the system endpoint.hpp and others include the patched one. Ensure all targets that include websocketpp (especially test/utApp, which directly uses websocketpp::server) see the same patched header, e.g. by propagating the vendor include path to those targets or making it an INTERFACE include for internal builds.
target_include_directories(${TARGET}
    PRIVATE
        # vendor/ must come before the system websocketpp headers so that
        # our patched endpoint.hpp (numeric-IP resolver fix) takes priority.
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/vendor>
        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>
        $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
    PUBLIC
        $<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
        $<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
        $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
)

Comment thread src/gateway.cpp Outdated
Comment thread LICENSE
The skipPrevForward flag was set after transport.connect() returned, but
onOpen/onFail can fire on the IO thread between client_->connect() being
queued inside transport.connect() and skipPrevForward->store(true)
executing in gateway.cpp.  In that window the wrapper incorrectly
forwards the new-attempt callback to previousListener (the old session's
callback), triggering a spurious user-visible event.

Remove the forwarding mechanism entirely.  The wrapper now only signals
the condvar and never touches previousListener.  For the AlreadyConnected
window the theoretical one-event loss is negligible (synchronous return,
listener restored before connect() returns) and correct behaviour is
preferable to the data race.
@brendanobra brendanobra requested a review from Copilot April 14, 2026 22:30
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## Blackduck scan failure details

Summary: 0 violations, 0 files pending approval, 1 file pending identification.

  • Protex Server Path: /home/blackduck/github/firebolt-cpp-transport/83/rdkcentral/firebolt-cpp-transport

  • Commit: f1ea7a9

Report detail: gist'

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses connection failures to local numeric loopback WebSocket endpoints (e.g. ws://127.0.0.1) on devices with no configured external network interfaces at boot, and adds an initial-connect retry policy plus CI coverage for network-isolated scenarios.

Changes:

  • Patch vendored WebSocket++ resolver behavior to treat numeric IP hosts as AI_NUMERICHOST to bypass AI_ADDRCONFIG.
  • Add initial connection retry/timeout behavior in GatewayImpl::connect() and improve disconnect/watchdog responsiveness.
  • Add unit + network-isolated CI tests for numeric IP / IPv6 loopback connectivity and retry scenarios; add .version pin + build script enhancements.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/unit/transportTest.cpp Adds regression/unit tests for numeric IPv4/IPv6 loopback connect + send/receive + failure path.
test/unit/gatewayTest.cpp Adds retry-policy unit tests for delayed server, no-retry fast-fail, disconnect abort, already-connected behavior.
test/integration/test-no-network.sh Adds a helper script to run selected tests under Docker --network none (or unshare --net).
src/vendor/websocketpp/transport/asio/endpoint.hpp Applies numeric-host detection and uses numeric_host resolver flag for numeric IPs.
src/transport.cpp Adds Connecting state and adjusts disconnect behavior to abort in-flight connects; converts state ops to atomic loads/stores.
src/gateway.cpp Implements initial-connect retry loop with per-attempt timeout, early-abort on disconnect, listener synchronization, watchdog wakeups.
src/CMakeLists.txt Ensures vendored websocketpp headers take precedence over system headers.
include/firebolt/gateway.h Documents callback threading behavior for connect() vs subsequent connection events.
include/firebolt/config.h.in Introduces and documents reconnect_max_attempts, reconnect_delay_ms, connect_attempt_timeout_ms.
cmake/version.cmake Prefers .version over git describe to avoid ambiguous tag resolution.
build.sh Adds --docker mode, defaults SYSROOT_PATH to /, and wipes stale CMake caches when switching environments.
LICENSE Appends BSD-3 license text (currently with placeholders).
.version Pins project version to 1.1.8.
.github/workflows/ci.yml Adds a network_isolation_tests job running selected tests under --network none.
.bdignore Excludes src/vendor/ from BlackDuck scans.

Comment thread src/transport.cpp
Comment thread LICENSE
@brendanobra brendanobra requested a review from mhughesacn April 14, 2026 22:48
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

b'## WARNING: A Blackduck scan failure has been waived

A prior failure has been upvoted

  • Upvote reason: Thank you!

  • Commit: f1ea7a9
    '

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants