Skip to content

git commit -m "PDP: release locks and remove change on domain-id mism…#6408

Open
lobmoo wants to merge 1 commit into
eProsima:masterfrom
lobmoo:fix/spdp-cache-change-leak-on-early-return
Open

git commit -m "PDP: release locks and remove change on domain-id mism…#6408
lobmoo wants to merge 1 commit into
eProsima:masterfrom
lobmoo:fix/spdp-cache-change-leak-on-early-return

Conversation

@lobmoo
Copy link
Copy Markdown

@lobmoo lobmoo commented May 21, 2026

Title: PDP: release locks and remove change on domain-id mismatch

Description

  • Problem: When processing received Participant DATA, if the participant_data.domain_id differs from the local domain, the current return path exits without releasing previously acquired locks (the reader mutex and the PDP mutex) and without removing the change from the PDP reader history. This can lead to deadlocks or history leaks.

  • Change: In src/cpp/rtps/builtin/discovery/participant/PDPListener.cpp, inside on_new_cache_change_added, when check_discovery_conditions(temp_participant_data_) returns false (domain-id mismatch), the code now performs the following before returning:

    • reader->getMutex().unlock();
    • lock.unlock();
    • parent_pdp_->builtin_endpoints_->remove_from_pdp_reader_history(change);
  • Rationale: Ensure all early return paths that hold locks properly release them and clean up resources to avoid potential deadlocks and leftover history entries.

  • Scope: This affects only PDP discovery early-return logic for domain-id mismatch cases. It does not change normal participant processing flow or public APIs.

  • Testing recommendations:

    • In a multithreaded scenario, simulate Participant announcements from different domains and verify no deadlocks occur.
    • Verify that when domain-id mismatches, the corresponding change is removed from the reader history (e.g., by checking history size or using log assertions).

Files changed

  • Modified file: src/cpp/rtps/builtin/discovery/participant/PDPListener.cpp (around lines 121-128)

Patch snippet

  • Before:
if (!check_discovery_conditions(temp_participant_data_))
{
    return;
}
  • After:
if (!check_discovery_conditions(temp_participant_data_))
{
    // Release locks and remove the change from history before returning
    reader->getMutex().unlock();
    lock.unlock();
    parent_pdp_->builtin_endpoints_->remove_from_pdp_reader_history(change);
    return;
}

Contributor Checklist

  • Commit messages follow project guidelines.
  • Code adheres to the project's style guidelines.
  • Add/update regression tests covering this fix if feasible.
  • Validate the change in a local multithreaded scenario to ensure no deadlocks or resource leaks.
  • Update Doxygen documentation if needed to mention early-return resource cleanup.

Reviewer Checklist

  • PR title and description accurately reflect the change.
  • All early return paths that hold locks release them and clean up resources.
  • CI passes and relevant tests are green.
  • Consider backporting this critical fix to supported branches if applicable.

Backports (optional)

Notes

If you want, I can create a local branch, apply this patch, commit, and push the branch to your remote and prepare a draft PR.

image

…atch

When a received Participant DATA has a different domain_id, return path left
reader and PDP mutexes held and did not remove the change from history.
Ensure locks are released and change removed before returning."
@lobmoo
Copy link
Copy Markdown
Author

lobmoo commented May 21, 2026

When the same multicast address and port are configured for different domain IDs, and they communicate within the same physical network, the early return path causes a memory leak. Without setting a participant's pool size, this will continue until system memory is exhausted.

@lobmoo
Copy link
Copy Markdown
Author

lobmoo commented May 21, 2026

第一次收到这个 writer 的 seq=1,last_notified 还是默认 0,所以 thereIsUpperRecordOf(...) 返回 false,流程继续往下走。
进入 change_received() 之后,会调用 update_last_notified(..., 1),所以这个 writer 的最后通知序列号会变成 1。
但 PDPListener 在 check_discovery_conditions() 失败后,没有执行 remove_from_pdp_reader_history(change),所以这条 change 还留在 reader history 里,日志里才会报 leak。
所以后果是:

同一个 writer 再发 seq=1 的重复包,会被挡住,因为 last_notified=1,thereIsUpperRecordOf(1) 为 true。
如果它后面发 seq=2、3...,都会再次进入处理流程,然后又在 check_discovery_conditions() 失败处被丢掉。
于是就形成你看到的现象:每来一次新的公告,就处理一次、失败一次、history 里多留一条,看起来像“一直被放行”,实际上是“序列号在前进,所以旧包挡住了,新包又继续进来”。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant