[TENT] Fix RDMA task memory corruption #1812
alogfans wants to merge 2 commits into kvcache-ai:main from
Conversation
Code Review
This pull request transitions RdmaTask to a pointer-based lifecycle managed by a reference counting mechanism and a Slab allocator. Key changes include updating RdmaSubBatch to store task pointers and implementing ref/deref logic. Review feedback identifies a compilation error regarding the undefined slice_dev_ids variable and critical flaws in the reference counting implementation, where a mismatch between slice-level increments and batch-level decrements results in memory leaks.
```cpp
slice->next = nullptr;
slice->enqueue_ts = enqueue_ts;
task.num_slices++;
slice->source_dev_id = slice_dev_ids[slice_idx];
```
```cpp
for (auto task : rdma_batch->task_list) {
    task->deref();
}
```
There is a mismatch in the reference counting logic that will lead to memory leaks. In submitTransferTasks, task->ref() is called for every slice created (line 327), so a task with N slices ends up with ref_count == N. In freeSubBatch, however, task->deref() is only called once per task, and RdmaSlice does not currently call deref() when it is deallocated, so any task with more than one slice never reaches ref_count == 0, preventing the RdmaTask from ever being freed. Furthermore, if a task has 0 slices, ref_count starts at 0 and deref() will decrement it to -1, also failing to trigger deallocation.
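To make the mismatch concrete, here is a minimal sketch of a balanced scheme in which every slice-level ref() is paired with a slice-level deref(). Names mirror the PR (RdmaTask, ref/deref, submitTransferTasks, freeSubBatch); the struct body and the `freed` flag are simplified stand-ins for the real Mooncake code and its slab deallocation.

```cpp
#include <atomic>

// Simplified stand-in for the PR's RdmaTask; `freed` marks where the real
// code would return the task to the slab allocator.
struct RdmaTask {
    std::atomic<int> ref_count{0};
    bool freed = false;

    void ref() { ref_count.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the last reference is dropped.
    bool deref() {
        if (ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            freed = true;  // stand-in for slab deallocation
            return true;
        }
        return false;
    }
};

// The counts only balance if every slice that called task->ref()
// eventually calls task->deref(); one batch-level deref is not enough.
bool slices_balance_refs(int num_slices) {
    RdmaTask task;
    for (int i = 0; i < num_slices; ++i) task.ref();    // submitTransferTasks
    for (int i = 0; i < num_slices; ++i) task.deref();  // per-slice release
    return task.freed;
}
```

Note that with this scheme a zero-slice task is still never freed (no slice ever derefs it), which is why the later suggestion of starting ref_count at 1 for the batch's own reference is attractive.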
```cpp
task->success_slices = 0;
task->resolved_slices = 0;
task->first_error = PENDING;
task->ref_count = 0;
```
Setting task->ref_count = 0 here is redundant as the RdmaTask struct already initializes it to 0. However, to fix the lifecycle logic, the batch itself should probably hold a reference to the task while it is in the task_list. Consider initializing ref_count to 1 to represent the batch's ownership, and then having each slice increment it further.
```cpp
task->ref_count = 1;
```
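A minimal sketch of the suggested ownership model: the batch holds one reference (ref_count starts at 1) and each slice adds one more, so deallocation triggers exactly once whether the task has zero slices or many. Names follow the PR; the `deallocated` flag is a stand-in for returning the task to the slab.

```cpp
#include <atomic>

// Simplified stand-in for the PR's RdmaTask under the suggested scheme.
struct RdmaTask {
    std::atomic<int> ref_count{1};  // the batch's own reference
    bool deallocated = false;
    void ref() { ref_count.fetch_add(1, std::memory_order_relaxed); }
    void deref() {
        if (ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            deallocated = true;  // stand-in for slab deallocation
    }
};

// freeSubBatch() drops the batch's reference; each slice drops its own
// when it is released. Exactly one of these derefs hits zero.
bool lifecycle_ok(int num_slices) {
    RdmaTask task;
    for (int i = 0; i < num_slices; ++i) task.ref();    // slice creation
    task.deref();                                       // freeSubBatch()
    for (int i = 0; i < num_slices; ++i) task.deref();  // slice release
    return task.deallocated && task.ref_count.load() == 0;
}
```

With this model the zero-slice edge case from the earlier comment disappears: the batch's single deref takes ref_count from 1 to 0 and frees the task.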
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Description
When a large request is split into multiple slices, a race condition occurs between the worker thread's `acknowledge()` and the user thread's `lazyFreeBatch()`:

1. `task->status_word = COMPLETED` is set
2. `lazyFreeBatch()` sees COMPLETED and immediately frees `rdma_batch`
3. `acknowledge()` is still executing and accesses `slice->task`
4. Since `slice->task` points to memory inside `rdma_batch->task_list`, this becomes a use-after-free, causing memory corruption
The issue is that `RdmaTask` has a lifecycle dependent on `RdmaSubBatch`, but slices may outlive the batch.
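The lifetime coupling can be seen in a minimal sketch: when tasks are allocated independently of the batch, `slice->task` stays valid after the batch is freed. Types here are simplified stand-ins for the PR's `RdmaSubBatch`/`RdmaTask`/`RdmaSlice`, and `shared_ptr` is used only to keep the sketch self-contained (the PR uses a slab allocator plus an explicit ref count instead).

```cpp
#include <memory>
#include <vector>

// Simplified stand-ins for the PR's types.
struct RdmaTask { int status_word = 0; };
struct RdmaSlice { RdmaTask* task = nullptr; };

// Pointer-based task_list: the batch references tasks but does not
// contain them, so destroying the batch does not invalidate slice->task.
struct RdmaSubBatch {
    std::vector<std::shared_ptr<RdmaTask>> task_list;
};

bool slice_survives_batch_free() {
    auto task = std::make_shared<RdmaTask>();
    RdmaSlice slice{task.get()};
    {
        RdmaSubBatch batch;
        batch.task_list.push_back(task);
    }  // batch destroyed here, as in lazyFreeBatch()
    // slice.task still points at a live task because its lifetime is
    // independent of the batch that referenced it.
    slice.task->status_word = 1;
    return slice.task->status_word == 1;
}
```

With the old `vector<RdmaTask>` layout, the same access after the batch's destruction would read freed memory, which is the corruption this PR fixes.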
Solution:

- `RdmaTask` is independently allocated from a Slab with reference counting
- `RdmaTask` is only deallocated when all its slices are released
- `slice->task` remains valid even after `rdma_batch` is freed

Changes:
- Add `std::atomic<int> ref_count` and `ref()`/`deref()` methods to `RdmaTask`
- Change `RdmaSubBatch::task_list` from `vector<RdmaTask>` to `vector<RdmaTask*>`
- Allocate tasks via `RdmaTaskStorage::Get().allocate()` and call `task->ref()` when created
- `freeSubBatch()` calls `task->deref()` when releasing

This ensures `slice->task` has an independent lifecycle and is always valid during `acknowledge()` execution, eliminating the race condition.

Module
- mooncake-transfer-engine
- mooncake-store
- mooncake-ep
- mooncake-integration
- mooncake-p2p-store
- mooncake-wheel
- mooncake-pg
- mooncake-rl

Type of Change
How Has This Been Tested?
Checklist
- Run `./scripts/code_format.sh` before submitting.