Add ORBTransportIdleTimeout to purge idle transports #2523

jwillemsen wants to merge 38 commits into DOCGroup:master.

Conversation
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tao/Transport_Idle_Timer.h:
Added.
* TAO/tao/Cache_Entries_T.inl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/tao.mpc:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport_Timer.cpp:
* TAO/tao/Transport_Timer.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/default_resource.cpp:
* TAO/tao/default_resource.h:
* TAO/tao/Resource_Factory.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport.h:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Connector.cpp:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/server.cpp:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
* TAO/tao/Transport.cpp:
* TAO/bin/tao_orb_tests.lst:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Idle_Timer.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/docs/Options.html:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tao/Transport_Idle_Timer.h:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
Deleted.
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
Added.
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/test.idl:
…d multiple gets too complicated for a unit test
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
Added.
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/svc.conf:
* TAO/tests/Transport_Idle_Timeout/svc_disabled.conf:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.h:
* TAO/tests/Transport_Idle_Timeout/client.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tests/Transport_Idle_Timeout/test.idl:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport.cpp:
* TAO/tests/Transport_Idle_Timeout/Echo_i.cpp:
* TAO/tests/Transport_Idle_Timeout/client_multiple.cpp:
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
* TAO/tao/Transport_Connector.cpp:
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
Walkthrough

Adds an idle-transport timeout feature: a new Transport_Idle_Timer, transport idle-timer scheduling/cancellation and a timeout handler, a purge-when-purgable path and idle checks in the cache manager, a Resource_Factory accessor and option parsing, build updates, and a regression test suite exercising the timeout behaviors.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/App
    participant Cache as Transport Cache Manager
    participant Trans as TAO_Transport
    participant Timer as Transport Idle Timer
    participant Reactor as ORB Reactor
    Client->>Cache: find_transport()
    activate Cache
    Cache->>Trans: (if found) transport()->cancel_idle_timer()
    activate Trans
    Trans->>Reactor: cancel scheduled idle timer
    deactivate Trans
    Cache-->>Client: return transport
    deactivate Cache
    Client->>Trans: make_idle()
    activate Trans
    Trans->>Cache: notify entry idle
    activate Cache
    Cache-->>Trans: entry idle accepted
    deactivate Cache
    Trans->>Reactor: schedule idle timer for timeout_seconds
    deactivate Trans
    Note over Reactor: time passes
    Reactor->>Timer: handle_timeout(current_time, act)
    activate Timer
    Timer->>Trans: add_reference()
    Timer->>Trans: handle_idle_timeout(current_time, act)
    activate Trans
    Trans->>Cache: purge_entry_when_purgable(entry)
    activate Cache
    Cache-->>Trans: purged / not purgable
    deactivate Cache
    Trans->>Trans: close_connection() if purged
    Trans->>Timer: remove_reference()
    deactivate Trans
    deactivate Timer
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
* TAO/tao/Cache_Entries_T.inl:
@Mergifyio backport ace6tao2

🟠 Waiting for conditions to match
Actionable comments posted: 6
🧹 Nitpick comments (3)
TAO/docs/Options.html (1)

1391-1396: Consider making default/disable behavior explicit. This new option entry is good; adding explicit wording for the default value and the disable behavior (for example, 0) would make configuration intent unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/docs/Options.html` around lines 1391 - 1396, update the -ORBTransportIdleTimeout option documentation to explicitly state the default behavior and how to disable it: mention the actual default value (e.g., "default: connections are kept until explicitly closed or purged by cache pressure") and specify the sentinel/disable value (e.g., "set to 0 to disable idle timeout and purging"); reference the option name -ORBTransportIdleTimeout so maintainers can find and change the <tr> entry accordingly.

TAO/tao/Resource_Factory.h (1)

257: Please document the new virtual's contract inline. At line 257, add a short note for units and disable semantics (for example, whether 0 disables) so all resource-factory implementations stay consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tao/Resource_Factory.h` at line 257, add an inline doc comment for the pure virtual transport_idle_timeout() in Resource_Factory explaining the contract: specify the units (e.g., seconds), what the return value represents (idle timeout for transports), and the disable semantics (e.g., return 0 to disable timeout, negative values invalid), and any caller expectations (e.g., the caller will treat the returned value as an unsigned integer). Update the comment immediately above the virtual int transport_idle_timeout() const = 0; declaration so all Resource_Factory implementations follow the same behavior.

TAO/tests/Transport_Idle_Timeout/client.cpp (1)

70-93: Consider a shared wait_for_cache_size() helper for these assertions. These checks sample current_size() once around timer-driven purges/reconnects, which makes the regression more timing-sensitive than it needs to be. Polling until the expected state (or a timeout) in one shared helper would make TC-1/TC-2/TC-3 less flaky and remove the duplicated cache-inspection logic that also exists in Echo_i.cpp.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/client.cpp` around lines 70 - 93, the test samples the transport cache size once, making assertions flaky; add a shared helper wait_for_cache_size(ORB_ptr orb, size_t expected, std::chrono::milliseconds timeout) that polls cache_size(orb) until it equals expected or the timeout elapses and returns bool, then replace the direct check() + cache_size() usages in Transport_Idle_Timeout/client.cpp (and the duplicate inspection in Echo_i.cpp) with calls to wait_for_cache_size and use its result for PASS/FAIL reporting; keep the existing cache_size() and check() signatures intact or refactor check() to accept a boolean, and ensure the polling interval is small (e.g., 50–100ms) to avoid long sleeps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/tao.mpc`:
- Line 637: The Header_Files list in TAO/tao/tao.mpc incorrectly includes a
source file name "Transport_Idle_Timer.cpp"; change this entry to the header
filename "Transport_Idle_Timer.h" in the Header_Files section so it correctly
lists header files (locate the "Header_Files" block and replace the
Transport_Idle_Timer.cpp token with Transport_Idle_Timer.h).
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 149-191: The failure branches after spawning servers and
waiting/transferring IORs currently kill only the server that failed ($SV1 or
$SV2) leaving the other running; replace the per-branch calls like "$SV1->Kill
(); $SV1->TimedWait (1);" and "$SV2->Kill (); $SV2->TimedWait (1);" with a call
to the existing cleanup helper "$cleanup_servers->()" in the WaitForFileTimed,
GetFile, and PutFile failure branches so both servers are stopped consistently
(refer to symbols $server1, $server2, $SV1, $SV2 and the helper
$cleanup_servers->()).
- Around line 229-248: The calls to run_scenario() and run_multiple_scenario()
ignore their return values so setup failures (they return 1 on
spawn/IOR-transfer failures) don't affect the script exit; capture each call's
return and propagate into the existing $status (e.g., $status ||=
run_scenario(...); and similarly for run_multiple_scenario(...)) so any non-zero
result sets $status and causes the script to exit non-zero.
In `@TAO/tests/Transport_Idle_Timeout/svc.conf`:
- Around line 1-4: The inline doc and the configured timeout disagree: the
comment says "2 seconds" but the Resource_Factory sets "-ORBTransportIdleTimeout
3"; update the configuration to match the comment by changing the static
Resource_Factory option to "-ORBTransportIdleTimeout 2" (or alternatively update
the comment to "3 seconds" if 3s was intended) so the comment and the unique
symbol static Resource_Factory "-ORBTransportIdleTimeout ..." are consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7f3e036d-62ff-48bd-9da7-156999992eed
📒 Files selected for processing (27)
- TAO/bin/tao_orb_tests.lst
- TAO/docs/Options.html
- TAO/tao/Cache_Entries_T.inl
- TAO/tao/Resource_Factory.h
- TAO/tao/Transport.cpp
- TAO/tao/Transport.h
- TAO/tao/Transport_Cache_Manager_T.cpp
- TAO/tao/Transport_Cache_Manager_T.h
- TAO/tao/Transport_Cache_Manager_T.inl
- TAO/tao/Transport_Connector.cpp
- TAO/tao/Transport_Idle_Timer.cpp
- TAO/tao/Transport_Idle_Timer.h
- TAO/tao/Transport_Timer.cpp
- TAO/tao/Transport_Timer.h
- TAO/tao/default_resource.cpp
- TAO/tao/default_resource.h
- TAO/tao/tao.mpc
- TAO/tests/Transport_Idle_Timeout/Echo_i.cpp
- TAO/tests/Transport_Idle_Timeout/Echo_i.h
- TAO/tests/Transport_Idle_Timeout/Idle_Transport_Timeout.mpc
- TAO/tests/Transport_Idle_Timeout/client.cpp
- TAO/tests/Transport_Idle_Timeout/client_multiple.cpp
- TAO/tests/Transport_Idle_Timeout/run_test.pl
- TAO/tests/Transport_Idle_Timeout/server.cpp
- TAO/tests/Transport_Idle_Timeout/svc.conf
- TAO/tests/Transport_Idle_Timeout/svc_disabled.conf
- TAO/tests/Transport_Idle_Timeout/test.idl
```cpp
int
TAO_Transport::handle_idle_timeout (const ACE_Time_Value & /* current_time */, const void * /*act*/)
{
  if (TAO_debug_level > 6)
    {
      TAOLIB_DEBUG ((LM_DEBUG,
                     ACE_TEXT ("TAO (%P|%t) - Transport[%d]::handle_idle_timeout, ")
                     ACE_TEXT ("idle timer expired, closing transport\n"),
                     this->id ()));
    }

  // Timer has expired, so setting the idle timer id back to -1
  this->idle_timer_id_ = -1;

  if (this->transport_cache_manager ().purge_entry_when_purgable (this->cache_map_entry_) == -1)
    {
      if (TAO_debug_level > 6)
        TAOLIB_DEBUG ((LM_DEBUG,
                       ACE_TEXT ("TAO (%P|%t) - Transport[%d]::handle_idle_timeout, ")
                       ACE_TEXT ("idle_timeout, transport is not purgable, don't close it, reschedule it\n"),
                       this->id ()));

      this->schedule_idle_timer ();
    }
  else
    {
      if (TAO_debug_level > 6)
        TAOLIB_DEBUG ((LM_DEBUG,
                       ACE_TEXT ("TAO (%P|%t) - Transport[%d]::handle_idle_timeout, ")
                       ACE_TEXT ("idle_timeout, transport purged due to idle timeout\n"),
                       this->id ()));

      // Close the underlying socket.
      // close_connection() is safe to call from the reactor thread.
      (void) this->close_connection ();
    }

  return 0;
}
```
Version the idle-timer state; idle_timer_id_ alone is racy.
handle_idle_timeout() currently ignores act, while schedule_idle_timer(), cancel_idle_timer(), and the reactor callback all read/write idle_timer_id_ from different threads. If an old timeout is already dequeued when reuse cancels/reschedules the transport, a stale callback or canceller can overwrite the newer ID and leave the replacement timer registered while the transport thinks no timer is pending. Because transport_idle_timer_ is embedded in TAO_Transport, that orphaned reactor entry can then outlive the transport and dispatch through a dangling handler. Please serialize the timer state and gate callbacks with a generation token so stale firings become no-ops.
Also applies to: 2942-2983
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TAO/tao/Transport.cpp` around lines 989 - 1027, handle_idle_timeout is racy
because idle_timer_id_ alone can be overwritten by concurrent schedule/cancel
operations; add a generation token (e.g. idle_timer_gen_ as an atomic/integer)
and include that token in the reactor act when scheduling timers in
schedule_idle_timer and cancel_idle_timer so the reactor callback and cancel
compare generation before mutating state. On schedule: increment
idle_timer_gen_, store the new generation with idle_timer_id_ and pass the
generation as act; on cancel: only clear idle_timer_id_ if the passed/observed
generation matches and otherwise leave newer timers alone; in
handle_idle_timeout(const ACE_Time_Value&, const void *act) extract and compare
the generation in act to the current idle_timer_gen_ and return no-op if they
differ, otherwise proceed with the existing
purge_entry_when_purgable/close_connection logic; protect updates to
idle_timer_id_ and idle_timer_gen_ with the existing transport lock or use
atomics to serialize access.
```cpp
ACE_DEBUG ((LM_INFO, ACE_TEXT ("\n=== TC-1: Call again ping ===\n")));

// Make another call, but let wait sleep_sec after the ping to the second server,
// which means the call takes longer as the idle time.
// at that moment the cache should be 1, this means the call takes longer as the idle time,
ok &= echo->ping (0, 2, echo2, sleep_sec, 1);

ok &= check ("TC-1 after ping longer as timeout (expect 1)", cache_size(orb), 1);
```
Keep TC-1 ending in the idle-closed state.
This extra ping() recreates the transport before tc2_reconnect() runs, so TC-2 no longer exercises reconnect-after-idle-close; it just reuses the live connection left by TC-1. Move this long-running-call check into its own scenario, or rerun the idle-close wait before TC-2.
💡 Smallest fix

```diff
-  ACE_DEBUG ((LM_INFO, ACE_TEXT ("\n=== TC-1: Call again ping ===\n")));
-
-  // Make another call, but let wait sleep_sec after the ping to the second server,
-  // which means the call takes longer as the idle time.
-  // at that moment the cache should be 1, this means the call takes longer as the idle time,
-  ok &= echo->ping (0, 2, echo2, sleep_sec, 1);
-
-  ok &= check ("TC-1 after ping longer as timeout (expect 1)", cache_size(orb), 1);
-
   return ok;
 }
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TAO/tests/Transport_Idle_Timeout/client_multiple.cpp` around lines 158 - 165,
The extra call to echo->ping in TC-1 recreates the transport and prevents TC-2
(tc2_reconnect) from exercising reconnect-after-idle-close; to fix, ensure TC-1
truly ends in the idle-closed state by either moving the long-running call/check
(the echo->ping(..., sleep_sec, ... ) and check("TC-1 ...", cache_size(orb), 1))
into its own scenario, or by re-applying the idle-close wait (the same wait used
earlier in this test) after that ping and before tc2_reconnect() so the
connection is closed again; update the test flow so tc2_reconnect() always runs
against an idle-closed transport.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…into jwi-idletransport
♻️ Duplicate comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
149-153: ⚠️ Potential issue | 🟠 Major: Stop both servers on every multi-server setup failure path.
Early returns still leave one server running in several branches, which can poison subsequent scenarios and hang CI. Use a shared cleanup helper and call it for all spawn/IOR wait/transfer failures.
Suggested cleanup pattern

```diff
 sub run_multiple_scenario {
   my ($label, $svc_conf, $extra_client_args) = @_;
+  my ($sv1_started, $sv2_started) = (0, 0);
@@
   my $SV1 = $server1->CreateProcess ( ... );
   my $SV2 = $server2->CreateProcess ( ... );
+  my $cleanup_servers = sub {
+    if ($sv1_started) { $SV1->Kill (); $SV1->TimedWait (1); }
+    if ($sv2_started) { $SV2->Kill (); $SV2->TimedWait (1); }
+  };
@@
   if ($server1_status != 0) {
     print STDERR "ERROR: server1 Spawn returned $server1_status\n";
     return 1;
   }
+  $sv1_started = 1;
@@
   if ($server2_status != 0) {
     print STDERR "ERROR: server2 Spawn returned $server2_status\n";
+    $cleanup_servers->();
     return 1;
   }
+  $sv2_started = 1;
@@
-  $SV1->Kill (); $SV1->TimedWait (1);
+  $cleanup_servers->();
   return 1;
@@
-  $SV2->Kill (); $SV2->TimedWait (1);
+  $cleanup_servers->();
   return 1;
```

Also applies to: 157-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 153, The test currently returns early on failures (e.g., after $SV2->Spawn, IOR wait or transfer failures) leaving the other server running; add a shared cleanup helper (e.g., cleanup_servers or stop_all_servers) that performs $SV1->Kill, $SV2->Kill (and any IOR/file cleanup) and call it before every early return in the multi-server setup and the sections handling spawn/IOR wait/transfer failures (references: $SV1, $SV2, $SV2->Spawn, $SV1->Spawn, $SV1->Kill, $SV2->Kill). Ensure every branch that currently returns non-zero invokes this helper so no server is left running after a failure.
🧹 Nitpick comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
91-107: Decouple helper return values from the global $status. Both helpers mutate the global $status and then return it. Prefer a local scenario status and let the caller aggregate. This reduces hidden cross-scenario coupling and makes the helpers reusable.

Also applies to: 200-224
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 91 - 107, Currently helper routines mutate the global $status and return it (see occurrences of $status, $client_status, $server_exit and the final "return $status" in this file); change each helper to track and return a local scenario status (e.g., my $scenario_status) instead of writing to the global $status, and update callers to aggregate those returned values into the global $status (e.g., $status ||= helper_call()). Ensure no helper writes to the global $status variable directly and that return values are used by the caller to set the global $status.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 959ea46a-e5aa-43d4-b824-882f43a52684
📒 Files selected for processing (4)
- TAO/tao/Cache_Entries_T.inl
- TAO/tao/tao.mpc
- TAO/tests/Transport_Idle_Timeout/run_test.pl
- TAO/tests/Transport_Idle_Timeout/svc.conf
✅ Files skipped from review due to trivial changes (3)
- TAO/tests/Transport_Idle_Timeout/svc.conf
- TAO/tao/Cache_Entries_T.inl
- TAO/tao/tao.mpc
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
Actionable comments posted: 2
♻️ Duplicate comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
149-215: ⚠️ Potential issue | 🟠 Major: Clean up every live server on multi-server early returns.

Several branches here stop only the process that directly failed, and some stop none at all. If server2 fails after server1 starts, or the client exits non-zero, the sibling server remains alive and can poison the next scenario.

💡 Cleanup pattern

```diff
+  my ($sv1_live, $sv2_live) = (0, 0);
+  my $cleanup_servers = sub {
+    if ($sv1_live) { $SV1->Kill (); $SV1->TimedWait (1); $sv1_live = 0; }
+    if ($sv2_live) { $SV2->Kill (); $SV2->TimedWait (1); $sv2_live = 0; }
+  };
+
   # Start server 1
   my $server1_status = $SV1->Spawn ();
   if ($server1_status != 0) {
     print STDERR "ERROR: server1 Spawn returned $server1_status\n";
     return 1;
   }
+  $sv1_live = 1;

   # Start server 2
   my $server2_status = $SV2->Spawn ();
   if ($server2_status != 0) {
     print STDERR "ERROR: server2 Spawn returned $server2_status\n";
+    $cleanup_servers->();
     return 1;
   }
+  $sv2_live = 1;
   ...
-  $SV1->Kill (); $SV1->TimedWait (1);
+  $cleanup_servers->();
   return 1;
   ...
   if ($client_status != 0) {
     print STDERR "ERROR: client returned $client_status\n";
+    $cleanup_servers->();
     return 1;
   }
   my $server_exit1 = $SV1->WaitKill ($server1->ProcessStopWaitInterval ());
+  $sv1_live = 0;
   if ($server_exit1 != 0) {
     print STDERR "ERROR: server1 returned $server_exit1\n";
+    $cleanup_servers->();
     return 1;
   }
   my $server_exit2 = $SV2->WaitKill ($server2->ProcessStopWaitInterval ());
+  $sv2_live = 0;
```
Verify each finding against the current code and only fix it if needed. In `@TAO/tests/Transport_Idle_Timeout/run_test.pl` around lines 149 - 215, Several error branches only kill the directly-failing process, leaving the sibling server running and contaminating later tests; update all early-return branches (the Spawn failure handling, IOR wait failures, GetFile/PutFile failures, and the client non-zero return handling around $SV1, $SV2, $CL, $SV1->Spawn/SpawnWait/Kill, $SV2->Spawn/SpawnWait/Kill, $CL->SpawnWaitKill) to always attempt to clean up both servers (call $SV1->Kill(); $SV1->TimedWait(1); $SV2->Kill(); $SV2->TimedWait(1)) and the client ($CL->Kill(); $CL->TimedWait(1)) before returning so no server remains alive after any early return.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 103-106: Change the two scenario helper functions that currently
end with "return 1;" to return 0 on the happy path so callers treat success as
zero; locate the trailing cleanup block (the lines that call
$server->DeleteFile($ior_file1); $client->DeleteFile($ior_file1); followed by
"return 1;") in each helper and replace the "return 1;" with "return 0;". Also
apply the same change to the other identical helper block around the 218–223
region so all successful paths return 0.
- Around line 89-93: When client_status != 0 you currently return without
stopping the server ($SV), leaving a stray server; before returning on client
failure, explicitly stop/kill the server $SV (e.g., call the server object's
kill/stop method and wait for it to exit — use the same test harness method
family as SpawnWaitKill for $CL, e.g. $SV->Kill or
$SV->SpawnWaitKill/$SV->KillWaitKill as available) so the server is cleaned up
prior to returning the error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0fc3076f-b0b9-4d87-b438-7fda32d2ab6c
📒 Files selected for processing (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl
```perl
my $client_status = $CL->SpawnWaitKill ($client->ProcessStartWaitInterval ()
                                        + $client_budget);
if ($client_status != 0) {
    print STDERR "ERROR: client returned $client_status\n";
    return 1;
```
Kill $SV before returning on client failure.
SpawnWaitKill() only reaps the client. This branch returns without stopping the server, so a failed client run can leave a stray server behind for the next scenario.
💡 Minimal fix

```diff
 if ($client_status != 0) {
     print STDERR "ERROR: client returned $client_status\n";
+    $SV->Kill (); $SV->TimedWait (1);
     return 1;
 }
```
* TAO/tests/Transport_Idle_Timeout/run_test.pl:
♻️ Duplicate comments (1)
TAO/tests/Transport_Idle_Timeout/run_test.pl (1)
150-191: ⚠️ Potential issue | 🟠 Major — Clean up both servers on every multi-server setup failure.

Once line 143 has successfully spawned $SV1, the early-return branches from line 151 through line 191 only stop the process that directly failed. A server2 spawn or IOR handoff failure can therefore leave the other server running and leak it into later test runs.

🧹 Minimal fix

```diff
@@
   my $SV2 = $server2->CreateProcess ("server",
       "-ORBdebuglevel $debug_level -ORBSvcConf $svc_conf -o $server2_ior");
+  my ($sv1_started, $sv2_started) = (0, 0);
+  my $cleanup_servers = sub {
+    if ($sv1_started) { $SV1->Kill (); $SV1->TimedWait (1); }
+    if ($sv2_started) { $SV2->Kill (); $SV2->TimedWait (1); }
+  };
@@
   if ($server1_status != 0) {
       print STDERR "ERROR: server1 Spawn returned $server1_status\n";
       return 1;
   }
+  $sv1_started = 1;
@@
   if ($server2_status != 0) {
       print STDERR "ERROR: server2 Spawn returned $server2_status\n";
+      $cleanup_servers->();
       return 1;
   }
+  $sv2_started = 1;
@@
-      $SV1->Kill (); $SV1->TimedWait (1);
+      $cleanup_servers->();
       return 1;
@@
-      $SV2->Kill (); $SV2->TimedWait (1);
+      $cleanup_servers->();
       return 1;
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@TAO/tests/Transport_Idle_Timeout/run_test.pl`:
- Around line 150-191: The early-return error branches only kill the directly
failing process ($SV1 or $SV2) and can leave the other server running; update
every failure path after $SV1 and $SV2 are spawned (e.g., the branches after
Spawn, WaitForFileTimed, GetFile, PutFile) to ensure both servers are cleaned
up: call $SV1->Kill(); $SV1->TimedWait(1); and $SV2->Kill(); $SV2->TimedWait(1)
as appropriate (guarded if needed by a spawn-status flag or by checking the
process objects) before returning so neither $SV1 nor $SV2 is leaked into later
tests.
* TAO/tao/Transport.cpp:
* TAO/tao/Transport_Cache_Manager_T.h:
* TAO/tao/Transport_Cache_Manager_T.inl:
Actionable comments posted: 1
🧹 Nitpick comments (1)
TAO/tao/Transport.cpp (1)
2941-2960: Consider null-checking the reactor before scheduling.

The `reactor` pointer retrieved from `orb_core_->reactor()` is used directly without a null check. While unlikely in normal operation, this could cause a crash if called during shutdown when the reactor might be unavailable. Compare to `cancel_idle_timer()` at line 2968, which does check `if (reactor)`.

🛡️ Suggested defensive null check

```diff
 void
 TAO_Transport::schedule_idle_timer ()
 {
   int const timeout_sec =
     this->orb_core_->resource_factory ()->transport_idle_timeout ();
   if (timeout_sec > 0)
     {
       ACE_Reactor *reactor = this->orb_core_->reactor ();
+      if (reactor == nullptr)
+        {
+          return;
+        }
       const ACE_Time_Value tv (static_cast<time_t> (timeout_sec));
       this->idle_timer_id_ =
         reactor->schedule_timer (std::addressof (this->transport_idle_timer_),
                                  nullptr, tv);
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TAO/tao/Transport.cpp`:
- Around line 340-342: The comment above the call to this->cancel_idle_timer()
contains a typo ("Cancel any pending time"); update the comment to read "Cancel
any pending timer" so it correctly describes the action performed by the
cancel_idle_timer() method in Transport.cpp.
---
Nitpick comments:
In `@TAO/tao/Transport.cpp`:
- Around line 2941-2960: In TAO_Transport::schedule_idle_timer(), null-check the
ACE_Reactor* returned by this->orb_core_->reactor() before calling
reactor->schedule_timer and before using it in the debug logging; if reactor is
null simply skip scheduling (and avoid accessing reactor or
this->idle_timer_id_), mirroring the defensive check used in
cancel_idle_timer(), so schedule_timer and TAOLIB_DEBUG only run when reactor !=
nullptr.
* TAO/tao/Transport.cpp:
Add new `-ORBTransportIdleTimeout` to purge idle connections after the specified number of seconds

Summary by CodeRabbit
New Features
Documentation
Tests