
TL/UCP: add support for onesided dynamic segments #1149

Open

wfaderhold21 wants to merge 4 commits into openucx:master from wfaderhold21:topic/mem_ext

Conversation

@wfaderhold21 (Collaborator)

What

This PR adds support in TL/UCP for implicit memory handle creation for onesided algorithms by introducing three internal interfaces: ucc_tl_ucp_dynamic_segment_{init | exchange | finalize}. These initialize a dynamic (implicit) segment by registering the source and destination buffers at initialization time to create memory handles, exchange the memory handles during collective start, and finalize (unmap) the memory on collective completion. Existing and future algorithms can leverage or ignore these interfaces. If the user has already allocated and mapped memory, either through (1) context creation or (2) memory handles, the interfaces perform no action. This is an update to PR #909.

Why?

Even with the inclusion of PR #1070, mapping memory for message-passing programming models such as MPI would still require modifying current implementations. Dynamic segments remove that requirement, which is especially useful when the messages exchanged in the collective are large.

@swx-jenkins5

Can one of the admins verify this patch?

@manjugv (Contributor) commented Aug 15, 2025

@wfaderhold21 Let's target this for v1.6 release. @janjust @Sergei-Lebedev can we review this in upcoming weeks?

@wfaderhold21 (Collaborator, Author)

@janjust @Sergei-Lebedev @nsarka @ikryukov I believe I've addressed the feedback thus far. Please let me know if you have any further comments/concerns.

@wfaderhold21 (Collaborator, Author)

@janjust @nsarka @ikryukov @Sergei-Lebedev any comments on this?

@wfaderhold21 force-pushed the topic/mem_ext branch 3 times, most recently from 715356b to 5c5253e on October 11, 2025 at 14:12
@wfaderhold21 (Collaborator, Author)

@janjust @Sergei-Lebedev @nsarka @ikryukov any feedback would be appreciated.

@janjust force-pushed the topic/mem_ext branch 2 times, most recently from b789f8f to c6b020a on October 16, 2025 at 15:06
@greptile-apps (Bot) left a comment

Greptile Overview

Greptile Summary

This PR introduces dynamic memory segment support for TL/UCP onesided collectives, enabling implicit memory handle creation at collective initialization rather than requiring pre-mapped memory regions. Three new internal interfaces (ucc_tl_ucp_dynamic_segment_{init|exchange|finalize}) handle on-demand registration of source/destination buffers, exchange of memory handles via service collectives (allgather), and cleanup on completion. The implementation removes static memory segment arrays (va_base, base_length) from team structures and adds dynamic segment metadata to task structures, allowing algorithms to defer registration until collective invocation. When memory is already mapped through context creation or explicit ucc_mem_map calls, the dynamic segment logic becomes a no-op, preserving backward compatibility. The changes primarily affect alltoall/alltoallv onesided algorithms, with corresponding test infrastructure updates to validate both pre-mapped and dynamic segment code paths.

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| src/components/tl/ucp/tl_ucp_coll.c | 2/5 | Adds core dynamic segment implementation with memory handle creation/exchange/cleanup; contains critical memory leaks and use-after-free bugs in error paths |
| src/components/tl/ucp/alltoall/alltoall_onesided.c | 2/5 | Integrates dynamic segments into alltoall algorithm; missing memory handle swap for GET operations and unclear finalization status handling |
| src/components/tl/ucp/alltoallv/alltoallv_onesided.c | 2/5 | Adds global source memory handle support and enforces global destination handles; accesses flags field without mask validation |
| src/components/tl/ucp/tl_ucp_context.c | 2/5 | Fixes memory cleanup in unmap functions; introduces use-after-free bug at lines 827-830 where freed data is accessed |
| src/components/tl/ucp/tl_ucp_sendrecv.h | 4/5 | Refactors one-sided memory resolution with new segment resolution helper; thread-safety of concurrent rkey unpacking needs verification |
| src/components/tl/ucp/tl_ucp.h | 4/5 | Removes static segment arrays from team structure and adds public memory mapping API; architectural change affects all onesided algorithms |
| src/components/tl/ucp/tl_ucp_task.h | 4/5 | Adds task flag and embedded dynamic segment structure for per-collective memory management; well-designed integration with existing infrastructure |
| src/components/tl/ucp/tl_ucp_coll.h | 3/5 | Declares new dynamic segment interfaces with inline test helper; hard-coded threshold and alltoall-specific parameter may need refinement |
| src/components/tl/ucp/tl_ucp_team.c | 3/5 | Removes static memory info initialization from team creation; onesided algorithms must now use dynamic segments or context-level handles |
| test/mpi/test_alltoallv.cc | 4/5 | Restructures buffer allocation to use pre-mapped buffers for onesided operations and dynamic allocation otherwise; correctly implements dynamic segment support |
| test/gtest/coll/test_alltoall.cc | 2/5 | Adds dynamic segment test case but overrides flags after initialization, potentially creating inconsistent state |
| test/mpi/test_mpi.h | 5/5 | Adds memory handle tracking fields and dynamic segment flag to test infrastructure; clean and well-integrated additions |
| test/mpi/test_case.cc | 5/5 | Adds "DynSeg" label to test output for dynamic segment visibility; correct implementation following existing pattern |
| test/gtest/common/test_ucc.cc | 5/5 | Adds zero-initialization for memory map array to prevent undefined behavior; defensive programming improvement |
| test/mpi/test_mpi.cc | 5/5 | Test infrastructure changes supporting dynamic segment feature (review not detailed in file summaries) |
| src/components/tl/ucp/tl_ucp.c | 5/5 | Updates EXPORTED_MEMORY_HANDLE config from string to numeric boolean; cosmetic consistency improvement |

Confidence score: 2/5

  • This PR contains critical memory management bugs including use-after-free errors and memory leaks that will cause crashes or undefined behavior in production code
  • Score reflects multiple severe issues: use-after-free in tl_ucp_context.c lines 827-830, memory leaks in tl_ucp_coll.c error paths (lines 228-229), use-after-free during cleanup loops (lines 594, 620-627), and missing memory handle swap for GET operations in alltoall_onesided.c. Additional concerns include unchecked flag field access in alltoallv_onesided.c line 35, hard-coded magic numbers in exchange loops, and unclear state handling during synchronous dynamic segment completion
  • Pay close attention to src/components/tl/ucp/tl_ucp_coll.c (memory management in error paths), src/components/tl/ucp/tl_ucp_context.c (use-after-free at lines 827-830), src/components/tl/ucp/alltoall/alltoall_onesided.c (GET operation memory handle logic), and src/components/tl/ucp/alltoallv/alltoallv_onesided.c (flags field validation)

Sequence Diagram

sequenceDiagram
    participant User
    participant UCC_Core
    participant TL_UCP_Coll
    participant Dynamic_Segment
    participant Service_Coll
    participant UCP_Worker

    User->>UCC_Core: ucc_collective_init(alltoall_onesided)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_init()
    
    alt Memory handles NOT provided
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_init()
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(src)
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(dst)
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(EXPORT)
        Dynamic_Segment-->>TL_UCP_Coll: Set UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG
    end
    
    TL_UCP_Coll-->>UCC_Core: Return task handle
    
    User->>UCC_Core: ucc_collective_post(task)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_start()
    
    alt Dynamic segments enabled
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_exchange_nb()
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_pack_memory_handles()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_memh_pack()
        
        loop Exchange memory handle sizes
            Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_start()
            Service_Coll->>Service_Coll: ucc_service_allgather(sizes)
            Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_test()
            Service_Coll-->>Dynamic_Segment: Sizes exchanged
        end
        
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_allocate_buffers()
        
        loop Exchange packed memory handles
            Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_start()
            Service_Coll->>Service_Coll: ucc_service_allgather(packed_handles)
            Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_test()
            Service_Coll-->>Dynamic_Segment: Handles exchanged
        end
        
        Dynamic_Segment->>Dynamic_Segment: dynamic_segment_import_memory_handles()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(IMPORT)
        Dynamic_Segment-->>TL_UCP_Coll: Memory handles ready
    end
    
    TL_UCP_Coll->>TL_UCP_Coll: Enqueue to progress queue
    
    loop Progress collective
        User->>UCC_Core: ucc_context_progress()
        UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_get/put_progress()
        
        alt Dynamic segment check
            TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_test_dynamic_segment()
            Dynamic_Segment-->>TL_UCP_Coll: Exchange complete/in-progress
        end
        
        loop Post onesided operations
            TL_UCP_Coll->>UCP_Worker: ucc_tl_ucp_get_nb/put_nb()
            UCP_Worker->>UCP_Worker: ucp_get_nbx/ucp_put_nbx()
            TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_handle_completion()
        end
        
        TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_wait_completion()
        UCP_Worker-->>TL_UCP_Coll: Operations complete
    end
    
    TL_UCP_Coll-->>UCC_Core: UCC_OK
    
    User->>UCC_Core: ucc_collective_finalize(task)
    UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_finalize()
    
    alt Dynamic segments used
        TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_finalize()
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(IMPORT, global)
        Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(EXPORT, local)
        Dynamic_Segment->>Dynamic_Segment: Free buffers
        Dynamic_Segment-->>TL_UCP_Coll: Cleanup complete
    end
    
    TL_UCP_Coll-->>UCC_Core: Finalized

16 files reviewed, 18 comments


Comment on lines +282 to +285
    for (auto i = 0; i < ctxs.size(); i++) {
        ctxs[i]->args->mask  = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
        ctxs[i]->args->flags = 0; // No special flags for dynamic segments
    }

logic: overriding the mask and flags set by data_init creates inconsistent state. The function already set these based on coll_mask and coll_flags members (lines 38, 60-62). Consider setting coll_mask and coll_flags members before calling data_init instead. Should the test initialize coll_mask and coll_flags before calling data_init, or is there a specific reason to override after initialization?

Comment thread: test/gtest/coll/test_alltoall.cc (outdated)

    data_init(size, dtype, count, ctxs, team, false);
    for (auto i = 0; i < ctxs.size(); i++) {
        ctxs[i]->args->mask  = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
        ctxs[i]->args->flags = 0; // No special flags for dynamic segments

logic: setting flags = 0 removes the UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS flag that was set on line 62 for onesided operations. This may break the onesided algorithm's assumptions.

Comment thread: src/components/tl/ucp/tl_ucp_context.c (outdated)

Comment on lines 827 to 830

    /* Free the TL data structure */
    if (data) {
        ucc_free(data);
        memh->tl_data = NULL;
    }

logic: data is already freed in both branches above (export at line 805, import at line 820), so this check will always access freed memory. This is a use-after-free bug that could cause crashes.

Comment thread: src/components/tl/ucp/tl_ucp_coll.h (outdated)

        return UCC_OK;
    }

    if (task->dynamic_segments.exchange_step < 5) {

style: magic number 5 is not self-documenting; define a constant like UCC_TL_UCP_DYN_SEG_EXCHANGE_STEPS to clarify the intended number of exchange phases

Comment on lines +279 to +281
    ucc_status_t ucc_tl_ucp_coll_dynamic_segment_init(ucc_coll_args_t *coll_args,
                                                      ucc_tl_ucp_alltoall_onesided_alg_t alg,
                                                      ucc_tl_ucp_task_t *task);

style: parameter alg is typed as ucc_tl_ucp_alltoall_onesided_alg_t but dynamic segments may be used by other collective types (alltoallv, etc.); consider a more generic type or documenting the restriction. Is the alltoall-specific alg type intentional, or should this accept a generic algorithm enum?

Comment thread src/components/tl/ucp/alltoall/alltoall_onesided.c
Comment on lines +139 to +142
    if (task->super.status != UCC_INPROGRESS &&
        (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
        task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
    }

logic: dynamic segment finalization happens after completion/error but the finalize call's error status overwrites the original task status. If the task completed successfully (UCC_OK) but finalization fails, the error is propagated; however if the task already failed, the finalization error replaces it, losing the original failure reason. Should finalization errors be logged separately while preserving the original task failure status?

Comment on lines +196 to +199
    if (task->super.status != UCC_INPROGRESS &&
        (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
        task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
    }

logic: same issue as GET progress: finalization status overwrites task status

Comment on lines +209 to +220
    if (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG) {
        status = ucc_tl_ucp_coll_dynamic_segment_exchange_nb(task);
        if (status == UCC_INPROGRESS) {
            return ucc_progress_queue_enqueue(UCC_TL_CORE_CTX(team)->pq,
                                              &task->super);
        }
        if (UCC_OK != status) {
            task->super.status = status;
            tl_error(UCC_TL_TEAM_LIB(team),
                     "failed to exchange dynamic segments");
            return task->super.status;
        }
    }

logic: when dynamic segment exchange completes synchronously (returns UCC_OK), the function continues to line 223 and enqueues the task. However, the progress functions (GET/PUT) will call ucc_tl_ucp_test_dynamic_segment again at lines 106 and 164, which may not handle the already-completed case correctly. Does ucc_tl_ucp_test_dynamic_segment return UCC_OK immediately when called again after successful completion?

Comment on lines +306 to +311
    status = ucc_tl_ucp_coll_dynamic_segment_init(&coll_args->args, alg, task);
    if (UCC_OK != status) {
        tl_error(UCC_TL_TEAM_LIB(tl_team),
                 "failed to initialize dynamic segments");
        goto out;
    }

logic: if dynamic segment initialization fails, the code jumps to out: but the task object has already been allocated (line 296) and is never cleaned up, causing a memory leak

@greptile-apps (Bot) left a comment

Greptile Overview

Greptile Summary

Adds dynamic segment infrastructure for onesided collectives, enabling implicit memory handle creation and exchange when buffers aren't pre-mapped. Three new interfaces (init, exchange_nb, finalize) manage the lifecycle of memory registration, handle exchange via service collectives, and cleanup.

Key Changes:

  • New tl_ucp_coll.c functions (771 lines) implement multi-step exchange protocol with state machine
  • alltoall_onesided.c integrates dynamic segments with conditional memory handle usage
  • alltoallv_onesided.c adds validation but doesn't support dynamic segments yet (not supported per line 185)
  • Task structure extended with dynamic_segments field tracking exchange state and buffers

Critical Issues Found:

  • Heap corruption bug: Lines 574 and 606 in tl_ucp_coll.c incorrectly call ucc_free() on pointers that point into a contiguous global_buffer, not separately allocated objects
  • Memory leak: Line 213 fails to free src_memh->tl_h->tl_data when dst mapping fails
  • Dead code: Line 559 checks i == UCC_RANK_INVALID inside a loop where this can never be true

Other Concerns:

  • Finalization errors overwrite original task status (alltoall lines 141, 198)
  • No validation that handles are global when provided by user in some paths

Confidence Score: 1/5

  • Critical heap corruption and memory leak bugs make this PR unsafe to merge without fixes
  • Multiple critical memory management bugs will cause crashes or corruption: freeing pointers into contiguous buffer allocation (lines 574, 606) and leaking allocated data on error path (line 213). These are deterministic bugs that will trigger in error scenarios
  • src/components/tl/ucp/tl_ucp_coll.c requires immediate attention - contains heap corruption bug at lines 574 and 606, plus memory leak at line 213. Must be fixed before merge

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| src/components/tl/ucp/tl_ucp_coll.c | 1/5 | added 771 lines for dynamic segment infrastructure. Critical bugs: heap corruption from freeing pointers into contiguous buffer (lines 574, 606), memory leak when dst mapping fails (line 213) |
| src/components/tl/ucp/alltoall/alltoall_onesided.c | 3/5 | integrated dynamic segment support for alltoall. Memory handle swap logic preserved for GET but applied conditionally. Finalization errors may overwrite original task status |
| src/components/tl/ucp/alltoallv/alltoallv_onesided.c | 4/5 | minor formatting and validation improvements. Added check for global dst memory handle flag. No dynamic segment integration yet (alltoallv not supported per line 185) |
| src/components/tl/ucp/tl_ucp_coll.h | 4/5 | added enum for exchange steps, three new function declarations for dynamic segment lifecycle. Clean interface design with test helper function |
| src/components/tl/ucp/tl_ucp_task.h | 4/5 | added UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG flag and dynamic_segments struct with exchange state, buffers, and service collective requests. Well-structured additions |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Alg as Alltoall Algorithm
    participant Init as dynamic_segment_init
    participant Exch as dynamic_segment_exchange_nb
    participant Fin as dynamic_segment_finalize
    participant Svc as Service Collective
    
    App->>Alg: ucc_tl_ucp_alltoall_onesided_init
    Alg->>Init: ucc_tl_ucp_coll_dynamic_segment_init
    Note over Init: Check if buffers already mapped
    alt Buffers not mapped
        Init->>Init: map src_memh via ucc_tl_ucp_mem_map
        Init->>Init: map dst_memh via ucc_tl_ucp_mem_map
        Init->>Alg: Set USE_DYN_SEG flag
    else Buffers pre-mapped
        Init->>Alg: Return UCC_OK (no-op)
    end
    
    App->>Alg: ucc_tl_ucp_alltoall_onesided_start
    Alg->>Exch: ucc_tl_ucp_coll_dynamic_segment_exchange_nb
    
    Note over Exch: STEP_INIT: Pack memory handles
    Exch->>Exch: dynamic_segment_pack_memory_handles
    Exch->>Alg: Return UCC_INPROGRESS
    
    App->>Alg: progress (via task queue)
    Alg->>Exch: ucc_tl_ucp_test_dynamic_segment
    Note over Exch: STEP_SIZE_TEST: Exchange sizes
    Exch->>Svc: Start allgather for pack sizes
    Exch->>Alg: Return UCC_INPROGRESS
    
    App->>Alg: progress
    Alg->>Exch: continue exchange
    Exch->>Svc: Test allgather completion
    Svc-->>Exch: Complete with global sizes
    
    Note over Exch: STEP_DATA_ALLOC: Allocate buffers
    Exch->>Exch: dynamic_segment_allocate_buffers
    
    Note over Exch: STEP_DATA_START: Exchange handles
    Exch->>Svc: Start allgather for packed handles
    Exch->>Alg: Return UCC_INPROGRESS
    
    App->>Alg: progress
    Alg->>Exch: continue exchange
    Exch->>Svc: Test allgather completion
    Svc-->>Exch: Complete with all handles
    
    Note over Exch: STEP_DATA_TEST: Import handles
    Exch->>Exch: dynamic_segment_import_memory_handles
    loop For each rank
        Exch->>Exch: ucc_tl_ucp_mem_map(IMPORT)
    end
    Exch->>Exch: Set exchange_step = COMPLETE
    Exch->>Alg: Return UCC_OK
    
    Note over Alg: Execute onesided operations
    Alg->>Alg: ucc_tl_ucp_get_nb / ucc_tl_ucp_put_nb
    
    App->>Alg: progress until complete
    
    App->>Alg: Task completion/finalize
    Alg->>Fin: ucc_tl_ucp_coll_dynamic_segment_finalize
    loop For each global handle
        Fin->>Fin: ucc_tl_ucp_mem_unmap(IMPORT)
    end
    Fin->>Fin: unmap src_local and dst_local
    Fin->>Alg: Return UCC_OK

9 files reviewed, 4 comments


@wfaderhold21 (Collaborator, Author)

Based on WG discussions, a2av support will be removed from here. Will clean up and, if OK, merge.

@greptile-apps (Bot) commented Dec 18, 2025

Greptile Summary

This PR introduces ucc_tl_ucp_coll_dynamic_segment_{init,exchange_nb,finalize,destroy} to automatically register and exchange RDMA memory handles for one-sided alltoall without pre-mapped buffers. Several memory-management issues from the prior draft have been addressed, but three new P1 logic bugs were found in tl_ucp_coll.c and one in alltoallv_onesided.c.

Confidence Score: 2/5

Not safe to merge — multiple P1 logic bugs in the dynamic-segment infrastructure remain unresolved

Three new P1 findings on top of several open P1s from prior rounds; ceiling for multiple P1s is below 4

src/components/tl/ucp/tl_ucp_coll.c and src/components/tl/ucp/alltoallv/alltoallv_onesided.c

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/components/tl/ucp/tl_ucp_coll.c | Core implementation: early-return nesting bug, error-status overwrite, and dead guard with latent double-free |
| src/components/tl/ucp/alltoallv/alltoallv_onesided.c | P1: flags field read without FIELD_FLAGS mask guard in new src_memh selection |
| src/components/tl/ucp/alltoall/alltoall_onesided.c | Dynamic-segment lifecycle integration looks sound for the happy path |
| src/components/tl/ucp/tl_ucp_sendrecv.h | resolve_p2p refactored to use ctx->remote_info directly; logic correct |
| src/components/tl/ucp/tl_ucp_task.h | New flag, struct, and embedded dynamic_segments field added consistently |
| src/components/tl/ucp/tl_ucp_context.c | mem_unmap per-branch free refactor and stale-buffer release in memh_pack look correct |
| test/gtest/coll/test_alltoall.cc | New dynamic-segment test; mask/flags override is intentional |

Reviews (15): Last reviewed commit: "REVIEW: address feedback"

@greptile-apps (Bot) left a comment

Additional Comments (6)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 597 (link)

    logic: global[j] is not a separately allocated object—it points into the contiguous global_buffer allocated at line 567. Calling ucc_free(global[j]) will corrupt the heap. Only free global[j]->tl_h (already done at line 597) and the global array itself, not individual entries.

  2. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 117-118 (link)

    logic: memory handle swap missing for GET with dynamic segments. The original code (lines 95-98) swaps src_memh and dst_memh for GET because remote reads need the destination buffer's rkey. Without this swap for dynamic segments, remote address resolution will fail.

  3. test/gtest/coll/test_alltoall.cc, line 283-284 (link)

    logic: overriding mask and flags after data_init discards values set by the function (lines 38, 60-62). The function already configured these fields based on coll_mask and coll_flags members. Set this->coll_mask and this->coll_flags before calling data_init instead of overriding afterward.

  4. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 138-141 (link)

    style: finalization errors overwrite original task status. If the task completed successfully but finalization fails, the error propagates correctly. However, if the task already failed, finalization errors replace the original failure reason. Consider logging finalization errors separately while preserving the original status when task->super.status != UCC_OK.

  5. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 306-310 (link)

    logic: if dynamic segment initialization fails (line 306), the code jumps to out: but task was already allocated at line 296 and is never freed, causing a memory leak. Add cleanup for task before the out: label or use a different error path.

  6. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 209-223 (link)

    style: when exchange completes synchronously (returns UCC_OK at line 209), execution continues to line 223 and enqueues the task. Progress functions will call ucc_tl_ucp_test_dynamic_segment again (lines 106, 164). Verify that ucc_tl_ucp_test_dynamic_segment returns UCC_OK immediately when exchange_step == COMPLETE.

13 files reviewed, 6 comments


@greptile-apps (Bot) left a comment

Additional Comments (1)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 198-201 (link)

    logic: condition should also check that the memory handles are present in the args, not just that the mask fields are set. Both src and dst memh could be NULL even if the mask is set.

13 files reviewed, 1 comment


@greptile-apps (Bot) left a comment

Additional Comments (1)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 770-776 (link)

    logic: potential service collective request leak: if an error occurs after starting scoll_req_sizes or scoll_req_data but before they complete, these requests are not finalized in the cleanup path

13 files reviewed, 1 comment


@greptile-apps (Bot) left a comment

13 files reviewed, 1 comment


Comment on lines +283 to +286
    for (auto i = 0; i < ctxs.size(); i++) {
        ctxs[i]->args->mask  = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
        ctxs[i]->args->flags = 0; // No special flags for dynamic segments
    }

logic: Overriding mask and flags after data_init discards configuration already applied by the initialization function. On line 282, data_init is called with the onesided path (line 56-67), which sets mask |= UCC_COLL_ARGS_FIELD_FLAGS | UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER and flags |= UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS. Then lines 284-285 overwrite these with mask = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER and flags = 0, removing the memory mapped flag that was intentionally set.

Instead, set coll_mask and coll_flags members before calling data_init:

Suggested change — remove:

    for (auto i = 0; i < ctxs.size(); i++) {
        ctxs[i]->args->mask  = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
        ctxs[i]->args->flags = 0; // No special flags for dynamic segments
    }

and instead, before the data_init call:

    this->coll_mask  = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
    this->coll_flags = 0;
    data_init(size, dtype, count, ctxs, team, false);

@wfaderhold21 (Collaborator, Author)

/build

1 similar comment

@wfaderhold21 (Collaborator, Author)

@janjust it looks like one of the tests is failing but I cannot see where it is failing. Can you confirm this?

@janjust (Collaborator) commented Jan 30, 2026

> @janjust it looks like one of the tests is failing but I cannot see where it is failing. Can you confirm this?

[2026-01-28T17:22:39.232Z] ++ eval 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null dlcluster.nvidia.com salloc --account=blackwell-ci -N 2 -p gb300nvl72_ci --job-name=ucc-ci-test-nvls-63 --immediate=600 --time=00:30:00 --no-shell'

[2026-01-28T17:32:45.788Z] + SALLOC_OUTPUT='+++ ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null dlcluster.nvidia.com salloc --account=blackwell-ci -N 2 -p gb300nvl72_ci --job-name=ucc-ci-test-nvls-63 --immediate=600 --time=00:30:00 --no-shell

[2026-01-28T17:32:45.788Z] Warning: Permanently added '\''dlcluster.nvidia.com'\'' (ED25519) to the list of known hosts.

[2026-01-28T17:32:45.788Z] limit: stacksize: Can'\''t set limit

[2026-01-28T17:32:45.788Z] salloc: lua: Setting job to exclusive mode as no gpu count was specified

[2026-01-28T17:32:45.788Z] salloc: Pending job allocation 202281

[2026-01-28T17:32:45.788Z] salloc: job 202281 queued and waiting for resources

[2026-01-28T17:32:45.788Z] salloc: error: Unable to allocate resources: Connection timed out'

Yeah, you have to click on the "green" tab before the red one to see the error; the error occurs in the pipeline prior.

[screenshot: CI pipeline stage tabs]

@janjust
Collaborator

janjust commented Jan 30, 2026

Looks like it failed to allocate the resources for the NVLS test.

@wfaderhold21
Collaborator Author

/build

Comment on lines +575 to +578
if (status != UCC_OK) {
tl_error(UCC_TASK_LIB(args->task),
"failed to import dst memory handle for rank %d ", i);
goto out;
Contributor

Memory leak: global[i]->tl_h not freed on failed import

When ucc_tl_ucp_mem_map fails for rank i, control jumps to out:. At that point global[i]->tl_h has been allocated (line 562–563) but the cleanup loop only iterates j < i, so the freshly-allocated tl_h for index i is never freed.

        status = ucc_tl_ucp_mem_map(
            &ctx->super.super, UCC_MEM_MAP_MODE_IMPORT,
            global[i],
            global[i]->tl_h);
        if (status != UCC_OK) {
            tl_error(UCC_TASK_LIB(args->task),
                     "failed to import dst memory handle for rank %d ", i);
            ucc_free(global[i]->tl_h);   /* add this before goto */
            global[i]->tl_h = NULL;
            goto out;
        }

The out: cleanup loop (for (j = 0; j < i; j++)) only unmap+frees handles for ranks 0..i-1. The allocation at rank i must be freed explicitly before jumping to out:, otherwise every failed import leaks one ucc_mem_map_tl_t.
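The leak-free shape of this pattern can be sketched in isolation (all names here are invented stand-ins, not the actual TL/UCP symbols): free the partially-initialized current index before the goto, and let the unwind loop cover only the fully-initialized entries 0..i-1.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hedged sketch: `fail_at` simulates an import failure at that index.
 * Returns 0 on success, -1 on failure with no allocations leaked. */
static int alloc_handles(void **h, size_t n, size_t fail_at)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        h[i] = calloc(1, 64);           /* per-rank handle */
        if (!h[i] || i == fail_at) {    /* simulated import failure */
            free(h[i]);                 /* free the current index explicitly */
            h[i] = NULL;
            goto out;
        }
    }
    return 0;
out:
    for (j = 0; j < i; j++) {           /* unwind fully-initialized entries */
        free(h[j]);
        h[j] = NULL;
    }
    return -1;
}
```

The key point is the explicit `free(h[i])` before the jump; the unwind loop alone can never see index i.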

Comment on lines +189 to +196
(coll_args->coll_type == UCC_COLL_TYPE_SCATTERV)) {
tl_debug(UCC_TASK_LIB(task), "dynamic segments are not supported for %s",
ucc_coll_type_str(coll_args->coll_type));
return UCC_ERR_NOT_SUPPORTED;
}
if ((coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS)) {
if ((coll_args->flags & UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS)) {
return UCC_OK;
Contributor

Early-exit condition misses case where FLAGS is set but both memh are also provided

The two early-return conditions are structured as if (FLAGS in mask) { ... } else if (both memh in mask) { ... }. This means: when UCC_COLL_ARGS_FIELD_FLAGS is present in the mask but UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS is NOT set, the else if branch is never reached — so dynamic segments are created even when the caller has already supplied explicit src and dst memory handles.

The memh check should be independent of the FLAGS check:

if ((coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS) &&
    (coll_args->flags & UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS)) {
    return UCC_OK;
}
if ((coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH) &&
    (coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_DST_MEMH)) {
    return UCC_OK;
}

This ensures that callers who provide both memh but also happen to set the FLAGS field without MEM_MAPPED_BUFFERS are not penalised with unnecessary dynamic-segment overhead.
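A minimal, self-contained model of the corrected guard logic (the bit values are invented for illustration; the real constants live in ucc.h):

```c
#include <stdint.h>
#include <stdbool.h>

/* Invented stand-in constants, not the real UCC values. */
enum {
    SK_FIELD_FLAGS    = 1u << 0,
    SK_FIELD_SRC_MEMH = 1u << 1,
    SK_FIELD_DST_MEMH = 1u << 2,
};
enum { SK_FLAG_MEM_MAPPED = 1u << 0 };

static bool need_dynamic_segments(uint32_t mask, uint32_t flags)
{
    /* Guard 1: caller declared its buffers already mapped. */
    if ((mask & SK_FIELD_FLAGS) && (flags & SK_FLAG_MEM_MAPPED)) {
        return false;
    }
    /* Guard 2: checked independently, so supplying both memory handles
     * skips dynamic segments even when FLAGS is set without MEM_MAPPED. */
    if ((mask & SK_FIELD_SRC_MEMH) && (mask & SK_FIELD_DST_MEMH)) {
        return false;
    }
    return true;
}
```

With the original `else if` nesting, the second check would be unreachable whenever FLAGS was present in the mask; making the guards sequential restores it.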

Comment on lines +725 to +739
task->dynamic_segments.exchange_step = UCC_TL_UCP_DYN_SEG_EXCHANGE_STEP_DATA_START;
return UCC_INPROGRESS;

case UCC_TL_UCP_DYN_SEG_EXCHANGE_STEP_DATA_START:
if (task->dynamic_segments.scoll_req_data == NULL) {
/* First call - start the allgather */
status = dynamic_segment_pack_and_exchange_data_start(
task->dynamic_segments.exchange_args,
&task->dynamic_segments.scoll_req_data);
if (status != UCC_OK) {
tl_error(UCC_TASK_LIB(task), "failed to start data exchange %s",
ucc_status_string(status));
goto err_cleanup_global;
}
return UCC_INPROGRESS;
Contributor

Memory leak of tl_h in dynamic_segment_allocate_buffers failure paths

dynamic_segment_alloc_seg allocates two heap objects: the outer ucc_mem_map_memh_t struct and an inner lmemh->tl_h. All failure-cleanup branches in this function call ucc_free(args->src_memh_pack) (or dst_memh_pack) directly, which frees only the outer struct and silently leaks the inner tl_h.

This affects three error paths:

  1. dst_memh_pack alloc failure (line 725):
ucc_free(args->src_memh_pack);         // leaks src_memh_pack->tl_h
  2. exchange_buffer alloc failure (lines 735–738):
ucc_free(args->src_memh_pack);         // leaks tl_h
ucc_free(args->dst_memh_pack);         // leaks tl_h
  3. global_buffer alloc failure (lines 747–750, same pattern).

The fix in each location is to free the inner tl_h before freeing the struct, e.g.:

ucc_free(args->src_memh_pack->tl_h);
args->src_memh_pack->tl_h = NULL;
ucc_free(args->src_memh_pack);
args->src_memh_pack = NULL;

Note that dynamic_segment_cleanup_buffers already does this correctly — the same pattern should be applied inline in the failure paths here, or the allocation cleanup should be delegated to that helper.
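The deep-free order can be captured in a small helper. This is a sketch with stand-in types (not the real UCC definitions): inner tl_h first, then the outer struct, NULL-ing each pointer so a repeated call is a harmless no-op.

```c
#include <stdlib.h>

/* Stand-in types for illustration only. */
typedef struct { void *tl_data; } tl_handle_t;
typedef struct { tl_handle_t *tl_h; } memh_pack_t;

static void memh_pack_destroy(memh_pack_t **p)
{
    if (p && *p) {
        free((*p)->tl_h);   /* inner handle first */
        (*p)->tl_h = NULL;
        free(*p);           /* then the outer struct */
        *p = NULL;
    }
}
```

Routing every failure path through one such helper makes it impossible to free the outer struct while forgetting the inner allocation.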

Comment on lines +501 to +510
return status;
}
return UCC_OK;
}

static ucc_status_t
dynamic_segment_pack_and_exchange_data_test(ucc_tl_ucp_dyn_seg_args_t *args,
ucc_service_coll_req_t **scoll_req)
{
ucc_tl_ucp_team_t *tl_team = UCC_TL_UCP_TASK_TEAM(args->task);
Contributor

Incomplete cleanup when dst_memh mapping fails in ucc_tl_ucp_coll_dynamic_segment_init

When dynamic_segment_map_memh for dst_memh fails, the code correctly calls ucc_tl_ucp_mem_unmap on src_memh->tl_h, frees src_memh->tl_h, and frees src_memh. However, ucc_tl_ucp_mem_unmap frees tl_h->tl_data (the inner UCP data structure) but the error status from unmap is assigned back to status, which overwrites the original failure status from dynamic_segment_map_memh. If the unmap also fails, the caller will only see the unmap error, not the original mapping failure that triggered this path:

status = ucc_tl_ucp_mem_unmap(&ctx->super.super, UCC_MEM_MAP_MODE_EXPORT,
                              src_memh->tl_h);
if (status != UCC_OK) {
    tl_error(UCC_TASK_LIB(task), "failed to unmap src memory handle");
}
ucc_free(src_memh->tl_h);
ucc_free(src_memh);
return status;   // returns unmap error, original error is lost

The original status (the dst_memh mapping error) should be preserved and returned. Consider using a separate local variable for the unmap return value.
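The shape of that fix, sketched with invented types and names: keep the original mapping error in its own variable and use a separate local for the unmap result, so the first failure is what the caller ultimately sees.

```c
/* Invented status codes for illustration. */
typedef int sk_status_t;
enum { SK_OK = 0, SK_ERR_MAP = -1, SK_ERR_UNMAP = -2 };

static sk_status_t cleanup_after_map_failure(sk_status_t map_status,
                                             sk_status_t unmap_result)
{
    sk_status_t unmap_status = unmap_result;    /* separate local variable */

    if (unmap_status != SK_OK) {
        /* log the secondary failure here; do not let it mask map_status */
    }
    return map_status;                          /* original error wins */
}
```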

Comment on lines +693 to +698
task->dynamic_segments.exchange_args,
&task->dynamic_segments.scoll_req_sizes);
if (status != UCC_OK) {
tl_error(UCC_TASK_LIB(task), "failed to start allgather %s",
ucc_status_string(status));
goto err_cleanup;
Contributor

Dead-code check: exchange_size can never be zero

The guard if (args->exchange_size == 0) can never be true because:

args->exchange_size = sizeof(ucc_mem_map_memh_t) + UCC_MEM_MAP_TL_NAME_LEN +
                      sizeof(size_t) + args->max_individual_pack_size;

The first three terms alone sum to a non-zero constant, so the total is always positive regardless of max_individual_pack_size. More importantly, *scoll_req has already been finalized (it is passed through dynamic_segment_calculate_sizes_test) so calling ucc_service_coll_finalize(*scoll_req) here would double-finalize an already-freed request if this branch were ever reached. The guard and its body should be removed to avoid confusing future readers and potential double-free bugs if the invariant ever changes.

Comment on lines 125 to 126
task = ucc_tl_ucp_init_task(coll_args, team);
*task_h = &task->super;
Contributor

P0 Missing NULL check for ucc_tl_ucp_init_task

ucc_tl_ucp_init_task can return NULL on OOM, but the next line immediately dereferences the result via *task_h = &task->super. This would cause a NULL pointer dereference and crash.

Notably, alltoall_onesided_init was updated in this same PR to include the NULL guard (see the alltoall init where if (ucc_unlikely(!task)) { status = UCC_ERR_NO_MEMORY; goto out; } was added), but the same fix was not applied here.

Suggested change (remove):

task = ucc_tl_ucp_init_task(coll_args, team);
*task_h = &task->super;

Suggested change (add):

task = ucc_tl_ucp_init_task(coll_args, team);
if (ucc_unlikely(!task)) {
    status = UCC_ERR_NO_MEMORY;
    goto out;
}
*task_h = &task->super;

Comment on lines +207 to +215
}
}
/* Register both local buffers for RDMA. Only the remotely-accessed side
* (src for GET, dst for PUT) is exported to peers during exchange; the
* other side is kept as the local UCP memh for the RDMA operation. */
status = dynamic_segment_map_memh(&src_memh, coll_args, DYN_SEG_SRC, task);
if (UCC_OK != status) {
return status;
}
Contributor

P1 Early-return skips dynamic segments when local (non-global) memh provided

The early-return conditions skip dynamic segment creation when the relevant memory handle field is present in the mask — but they do not verify that the handle is also global (i.e. that UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL / UCC_COLL_ARGS_FLAG_DST_MEMH_GLOBAL is set and the FLAGS field is present in the mask).

GET path: if UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH is set but the caller only provided a local handle (UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL is absent), dynamic_segment_init returns UCC_OK without setting UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG. Then in get_progress, remote_rkeys is initialized from TASK_ARGS(task).src_memh.global_memh — which is NULL when only a local handle was supplied — and is subsequently used as a per-rank array, causing a NULL dereference.

PUT path: the symmetric issue exists with dst_memh.global_memh.

The fix is to gate the early-return on the global flag being set:

if (alg == UCC_TL_UCP_ALLTOALL_ONESIDED_GET) {
    if ((coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH) &&
        (coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS) &&
        (coll_args->flags & UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL)) {
        return UCC_OK;
    }
} else {
    if ((coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_DST_MEMH) &&
        (coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS) &&
        (coll_args->flags & UCC_COLL_ARGS_FLAG_DST_MEMH_GLOBAL)) {
        return UCC_OK;
    }
}

Comment on lines +656 to +670
task->dynamic_segments.exchange_args =
ucc_calloc(1, sizeof(ucc_tl_ucp_dyn_seg_args_t), "exchange_args");
if (!task->dynamic_segments.exchange_args) {
tl_error(UCC_TASK_LIB(task), "failed to allocate exchange_args");
return UCC_ERR_NO_MEMORY;
}
task->dynamic_segments.exchange_args->task = task;
/* Only expose the exchanged side for packing; the other side is used
* locally by the RDMA operation and does not need to be shared. */
if (task->dynamic_segments.alg == UCC_TL_UCP_ALLTOALL_ONESIDED_GET) {
task->dynamic_segments.exchange_args->src_memh_local =
task->dynamic_segments.src_local;
task->dynamic_segments.exchange_args->dst_memh_local = NULL;
} else {
task->dynamic_segments.exchange_args->src_memh_local = NULL;
Contributor

P1 dynamic_segment_memh_pack copies tl_h by value — shallow pointer copy

memcpy(memh->tl_h, args->dst_memh_local->tl_h, sizeof(ucc_mem_map_tl_t)) (lines 662–664) does a shallow copy of the ucc_mem_map_tl_t struct, which contains pointers (e.g., tl_data, packed_size). The src_memh_pack/dst_memh_pack tl_h now shares raw-pointer fields with src_local->tl_h / dst_local->tl_h.

Later, dynamic_segment_cleanup_buffers frees src_memh_pack->tl_h but does not null-out the aliased pointers in the local handles. While cleanup_buffers is called before destroy in the normal success path, if destroy is called while exchange_args is still live (e.g., a task cancelled mid-flight), both cleanup_buffers (via destroy → not called) and destroy itself would attempt to free the same tl_data pointer, causing a double-free.

The tl_h copy in dynamic_segment_memh_pack is used only to carry tl_name / mode into the packed struct; the pointer fields should not be copied. Only tl_name (and perhaps mode) are meaningful on the receiving side:
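The snippet after the colon was lost in this export. As a hedged stand-in (the field names and layout are assumed for illustration, not the real ucc_mem_map_tl_t), a pointer-free pack copy would look like:

```c
#include <string.h>
#include <stddef.h>

/* Assumed layout for illustration only. */
typedef struct {
    char   tl_name[8];
    int    mode;
    void  *tl_data;      /* live heap pointer: must NOT be aliased */
    size_t packed_size;
} tl_h_sketch_t;

/* Copy only the identifying metadata; pointer fields stay NULL so the
 * packed copy never shares heap state with the local handle. */
static void pack_tl_h(tl_h_sketch_t *dst, const tl_h_sketch_t *src)
{
    memset(dst, 0, sizeof(*dst));
    memcpy(dst->tl_name, src->tl_name, sizeof(dst->tl_name));
    dst->mode = src->mode;
}
```

Because `tl_data` in the packed copy is always NULL, freeing either copy can never double-free the other's heap state.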

Comment on lines +548 to +576
for (i = 0; i < UCC_TL_TEAM_SIZE(tl_team); i++) {
/* Each rank's data in global buffer contains only:
- dst_memh_pack: sizeof(ucc_mem_map_memh_t) + max_individual_pack_size + sizeof(size_t) * 2
*/
offset = i * args->exchange_size;
global[i] =
(ucc_mem_map_memh_t *)PTR_OFFSET(
args->task->dynamic_segments.global_buffer, offset);
global[i]->tl_h =
ucc_calloc(1, sizeof(ucc_mem_map_tl_t), "global tl_h");
if (!global[i]->tl_h) {
tl_error(UCC_TASK_LIB(args->task),
"failed to allocate global tl handles");
status = UCC_ERR_NO_MEMORY;
goto out;
}

status = ucc_tl_ucp_mem_map(
&ctx->super.super, UCC_MEM_MAP_MODE_IMPORT,
global[i],
global[i]->tl_h);
if (status != UCC_OK) {
tl_error(UCC_TASK_LIB(args->task),
"failed to import dst memory handle for rank %d ", i);
ucc_free(global[i]->tl_h);
global[i]->tl_h = NULL;
goto out;
}
}
Contributor

P1 global[i]->tl_h write corrupts received allgather data

global[i] is a pointer directly into global_buffer at offset i * exchange_size. Writing global[i]->tl_h = ucc_calloc(...) overwrites the tl_h field that was placed there by the allgather (the serialised remote pointer, now meaningless locally — this part is intentional). However, the ucc_mem_map_tl_t struct layout packs the tl_name field and other metadata adjacent to tl_data; if the compiler places tl_h at the very beginning of ucc_mem_map_memh_t, this write may partially overwrite the pack_buffer data that ucc_tl_ucp_mem_map(IMPORT) needs to read for the rkey.

Verify (ideally with a static assert) that offsetof(ucc_mem_map_memh_t, tl_h) does not overlap with the layout assumed by ucc_tl_ucp_mem_map(IMPORT) when reading memh->pack_buffer.
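A compile-time form of that check could look like the following; the struct here is a stand-in, and the real assertion would target ucc_mem_map_memh_t.

```c
#include <stddef.h>

/* Stand-in layout for illustration only. */
typedef struct {
    void  *tl_h;
    size_t pack_size;
    char   pack_buffer[64];
} memh_sketch_t;

/* Overwriting tl_h must never reach into pack_buffer (C11 static assert). */
_Static_assert(offsetof(memh_sketch_t, tl_h) + sizeof(void *) <=
                   offsetof(memh_sketch_t, pack_buffer),
               "tl_h write overlaps pack_buffer");
```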

Comment on lines 446 to +463
return UCC_OK;
}

static inline int resolve_segment(const void *va, size_t *key_sizes,
ptrdiff_t *key_offset, size_t nr_segments,
ucc_tl_ucp_remote_info_t *rinfo)
{
int i;
uint64_t base;
uint64_t end;

for (i = 0; i < nr_segments; i++) {
base = (uint64_t)rinfo[i].va_base;
end = base + rinfo[i].len;
if ((uint64_t)va >= base && (uint64_t)va < end) {
return i;
}
*key_offset += key_sizes[i];
Contributor

P1 resolve_segment receives uninitialized rinfo when ctx->remote_info is NULL

The new resolve_segment helper dereferences rinfo[i].va_base inside the loop. In ucc_tl_ucp_resolve_p2p_by_va, it is called as:

*segment = resolve_segment(va, key_sizes, &key_offset, ctx->n_rinfo_segs, ctx->remote_info);

If ctx->remote_info is NULL but ctx->n_rinfo_segs is somehow non-zero (e.g. a context that registered segments but whose remote_info pointer was later freed), this results in a NULL pointer dereference inside resolve_segment. The old code had an explicit if (ctx->remote_info) guard in ucc_tl_ucp_team_create_test before copying to team->va_base; that guard was removed along with the fields, but no equivalent null-check was added here.

Consider adding a guard before the call:

if (ctx->remote_info && ctx->n_rinfo_segs > 0) {
    *segment = resolve_segment(va, key_sizes, &key_offset, ctx->n_rinfo_segs, ctx->remote_info);
} else {
    *segment = -1;
}

TL/UCP: enable nonblocking dynamic segments

TL/UCP: alltoall onesided convert to dyn seg
TEST: add mpi dyn seg alltoallv test
Comment on lines 35 to +38
ucc_tl_ucp_task_reset(task, UCC_INPROGRESS);

if (TASK_ARGS(task).mask & UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH) {
if (TASK_ARGS(task).flags & UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL) {
Contributor

P1 FIELD_FLAGS mask not checked before reading flags

TASK_ARGS(task).flags is read without first verifying that UCC_COLL_ARGS_FIELD_FLAGS is present in TASK_ARGS(task).mask. If a caller sets UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH without setting UCC_COLL_ARGS_FIELD_FLAGS, the flags field is indeterminate, so the UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL test may fire spuriously (or silently pass), leading to a NULL dereference or incorrect memh selection at global_memh[grank].
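A minimal sketch of the missing guard (the constant is invented for illustration): never read the flags word unless the FLAGS field is marked valid in the mask; otherwise treat it as unset.

```c
#include <stdint.h>

/* Invented stand-in for UCC_COLL_ARGS_FIELD_FLAGS. */
#define DS_FIELD_FLAGS (1ull << 0)

/* Returns the flags only when the mask says they are valid;
 * an indeterminate flags word is treated as all-clear. */
static uint64_t effective_flags(uint64_t mask, uint64_t flags)
{
    return (mask & DS_FIELD_FLAGS) ? flags : 0;
}
```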

Comment on lines +199 to +215
* (peers push into it). The non-DYN_SEG progress path will use it. */
if (alg == UCC_TL_UCP_ALLTOALL_ONESIDED_GET) {
if ((coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_SRC_MEMH) &&
(coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS) &&
(coll_args->flags & UCC_COLL_ARGS_FLAG_SRC_MEMH_GLOBAL)) {
return UCC_OK;
}
} else {
if ((coll_args->mask & UCC_COLL_ARGS_FIELD_MEM_MAP_DST_MEMH) &&
(coll_args->mask & UCC_COLL_ARGS_FIELD_FLAGS) &&
(coll_args->flags & UCC_COLL_ARGS_FLAG_DST_MEMH_GLOBAL)) {
return UCC_OK;
}
}
if (coll_args->src.info.count == 0) {
return UCC_OK;
}
Contributor

P1 Early-return for per-algorithm memh only checks FIELD_FLAGS first, hiding the else if

The two bail-out conditions are nested: if (FIELD_FLAGS) { … } else if (both memh) { … }. When FIELD_FLAGS is present in the mask but MEM_MAPPED_BUFFERS is not set, the else if is never reached. A caller who already supplied both src_memh and dst_memh (global) but whose flags field doesn't carry MEM_MAPPED_BUFFERS will still end up in dynamic-segment registration unnecessarily. The two guards should be made independent.

Comment on lines +483 to +495
dynamic_segment_memh_pack((ucc_context_h)&ctx->super.super, args, DYN_SEG_SRC);
copy_size += args->src_pack_size;
memcpy(args->exchange_buffer, args->src_memh_pack, copy_size);
} else {
/* PUT: each rank shares its dst handle so peers can PUT into it */
dynamic_segment_memh_pack((ucc_context_h)&ctx->super.super, args, DYN_SEG_DST);
copy_size += args->dst_pack_size;
memcpy(args->exchange_buffer, args->dst_memh_pack, copy_size);
}
/* Allgather the packed memory handles */
status = ucc_service_allgather(core_team, args->exchange_buffer,
args->task->dynamic_segments.global_buffer,
args->exchange_size, subset, scoll_req);
Contributor

P1 Original mapping error overwritten by unmap status

When dynamic_segment_map_memh for dst_memh fails, status holds the original error. The subsequent ucc_tl_ucp_mem_unmap call then unconditionally overwrites status. If the unmap itself fails the caller receives the unmap error, losing the original mapping failure that triggered this path. Save status before the unmap and preserve it on return.
