[Darwin] mDNS resolve cancel/restart drops in-flight results, causing ~1s NodeID resolve retries #72273

Open
woody-apple wants to merge 1 commit into project-chip:master from woody-apple:dev/woody/176263876

Conversation

@woody-apple woody-apple commented May 30, 2026

Summary

  • User-visible failure: On Darwin, every reconnect to a Matter node after ChipDnssdResolveNoLongerNeeded shows ~1s of extra NodeID-resolve latency. Inbound mDNS resolve answers that were already on the dnssd socket get discarded, so the next resolve has to start from scratch.
  • Root cause: When the consumer counter drops to zero we immediately call Finalize -> DNSServiceRefDeallocate. Per the dnssd contract, DNSServiceRefDeallocate discards any events queued on that connection but not yet read off the socket. A second observation from the mDNS owner is that "starting and stopping queries doesn't query harder" — a tight cancel-then-restart for the same instance name is strictly worse than letting the existing query run.
  • Fix shape: Introduce a per-ResolveContext deferred-teardown window (default 500ms) before the actual DNSServiceRefDeallocate. Inside the window: a queued read indicator dispatches the result through DispatchSuccess (which cancels the timer); a new ChipDnssdResolve for the same instance name reuses the existing context and bumps the counter back to 1, skipping DNSServiceCreateConnection / DNSServiceResolve entirely; otherwise the timer fires OnResolveDeferredTeardown -> Finalize(CHIP_ERROR_CANCELLED), preserving the existing failure-path contract upper layers rely on. Both DispatchSuccess and DispatchFailure cancel any pending teardown timer so it can never fire against a freed context.

Blast radius is confined to the Darwin dnssd platform layer; Linux, ESP32, OpenThread, and Android paths are untouched. The change is behaviorally inert for callers that never re-Resolve after ResolveNoLongerNeeded (the cancel still propagates, just after the window). Rollback is a single-commit revert.
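
The fix shape described above can be modeled as a small state machine. The following is a minimal sketch, not the PR's implementation: `ModelResolveContext`, `ModelResolver`, and the integer millisecond clock are hypothetical names invented for illustration, and the real code uses the platform timer layer and `DNSServiceRefDeallocate` rather than an `Outcome` field.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified model of the deferred-teardown window.
// None of these names are the real Matter/Darwin types.
enum class Outcome { Pending, Success, Cancelled };

struct ModelResolveContext
{
    std::string instanceName;
    int consumerCounter = 1;
    std::optional<uint32_t> teardownDeadlineMs; // set only while the window is open
    Outcome outcome = Outcome::Pending;
};

class ModelResolver
{
public:
    explicit ModelResolver(uint32_t delayMs) : mDelayMs(delayMs) {}

    // The last consumer left: open the window instead of tearing down at once.
    void ResolveNoLongerNeeded(ModelResolveContext & ctx, uint32_t nowMs)
    {
        if (ctx.consumerCounter > 0 && --ctx.consumerCounter == 0)
        {
            ctx.teardownDeadlineMs = nowMs + mDelayMs;
        }
    }

    // A new resolve for the same name inside the window reuses the context.
    bool TryReuse(ModelResolveContext & ctx, const std::string & name, uint32_t nowMs)
    {
        bool windowOpen = ctx.teardownDeadlineMs.has_value() && nowMs < *ctx.teardownDeadlineMs;
        if (!windowOpen || ctx.instanceName != name || ctx.outcome != Outcome::Pending)
        {
            return false;
        }
        ctx.teardownDeadlineMs.reset(); // models cancelling the teardown timer
        ctx.consumerCounter = 1;
        return true;
    }

    // A result arrived: dispatching it also cancels any pending teardown.
    void DispatchSuccess(ModelResolveContext & ctx)
    {
        ctx.teardownDeadlineMs.reset();
        ctx.outcome = Outcome::Success;
    }

    // Advance the clock: an expired window ends in the cancelled outcome.
    void Tick(ModelResolveContext & ctx, uint32_t nowMs)
    {
        if (ctx.teardownDeadlineMs.has_value() && nowMs >= *ctx.teardownDeadlineMs)
        {
            ctx.teardownDeadlineMs.reset();
            ctx.outcome = Outcome::Cancelled;
        }
    }

private:
    uint32_t mDelayMs;
};
```

With a 500 ms delay, a context released at t=0 can be reused at t=100 (counter back to 1, no deadline), while one left alone stays pending at t=499 and is cancelled at t=500.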

Testing

New TestDarwinDnssdResolveCoalesce tests in src/platform/tests/TestDnssd.cpp exercise the state machine in test time via SetResolveDeferredTeardownDelay:

  • ReusesContextWithinDeferredWindow: Resolve -> ResolveNoLongerNeeded -> Resolve within the window reuses the same ResolveContext (counter 1 -> 0 -> 1).
  • ReusesContextWithinDeferredWindowDelegate: same as above through the delegate-style API surface.
  • CancelStillPropagatesIfNoInFlightResult: waiting longer than the teardown delay with no follow-up Resolve fires the deferred teardown exactly once, dispatching CHIP_ERROR_CANCELLED.

Note: a fourth planned test that injects a synthetic OnResolve callback inside the deferred window is deferred — OnResolve lives in an anonymous namespace and is not addressable from the test target. The equivalent behavior is observable via the DispatchSuccess timer-cancellation paths exercised by the existing tests plus integration coverage in the discovery layer.
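
One way to reason about "exactly once" and "cancelled before firing" without real sleeps is a manually advanced clock. The sketch below is an assumption-laden illustration, not the test harness in this PR: `FakeTimerLayer` and its methods are hypothetical, whereas the PR's tests shorten the real delay through `SetResolveDeferredTeardownDelay`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical manual clock: timers fire only when the test calls Advance().
class FakeTimerLayer
{
public:
    using Callback = std::function<void()>;

    // Schedules cb to run delayMs after the current fake time; returns a timer id.
    int StartTimer(uint32_t delayMs, Callback cb)
    {
        int id = mNextId++;
        mTimers.emplace(id, Entry{ mNowMs + delayMs, std::move(cb) });
        return id;
    }

    // Returns true if a still-pending timer was removed.
    bool CancelTimer(int id) { return mTimers.erase(id) > 0; }

    void Advance(uint32_t ms)
    {
        mNowMs += ms;
        // Collect due timers first so callbacks can touch the timer map
        // without invalidating this loop.
        std::vector<Callback> due;
        for (auto it = mTimers.begin(); it != mTimers.end();)
        {
            if (it->second.deadlineMs <= mNowMs)
            {
                due.push_back(std::move(it->second.cb));
                it = mTimers.erase(it);
            }
            else
            {
                ++it;
            }
        }
        for (auto & cb : due)
        {
            cb();
        }
    }

private:
    struct Entry
    {
        uint64_t deadlineMs;
        Callback cb;
    };
    uint64_t mNowMs = 0;
    int mNextId     = 1;
    std::map<int, Entry> mTimers;
};
```

A test can start a 500 ms timer, advance 499 ms and observe no callback, advance 1 ms and observe one callback, then advance further and confirm the count does not change.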

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a deferred teardown mechanism for ResolveContext on Darwin platforms to prevent dropping in-flight resolve results when the consumer count drops to zero, and adds corresponding unit tests. The reviewer identified critical security issues where reusing a deferred-teardown context without updating its callback, context, or delegate pointers could lead to Use-After-Free (UAF) vulnerabilities. Additionally, a potential UAF was flagged in the test helper RunOnMatterEventLoopSync during timeouts, with a suggestion to use thread-safe reference counting.

Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
Comment on lines +417 to +431
        // Detect (and log) the case where a different upper-layer caller is
        // re-binding to the same instance name with a different callback or
        // context. We intentionally do not rebind, because doing so could lose
        // an in-flight result delivery to the original caller. The upper layer
        // (Discovery) routes results back through the same callback path keyed
        // off instance name, so a multiplexed dispatcher is expected; but if
        // the dispatch tuple differs, surface that as an error log so any
        // breakage of that assumption is observable.
        if (!reused && (existingCtx->callback != callback || existingCtx->context != context))
        {
            ChipLogError(Discovery,
                         "Mdns: Reusing deferred-teardown ResolveContext for %s with mismatched callback/context; "
                         "new caller will not receive results until the next Resolve",
                         StringOrNullMarker(name));
        }

security-critical critical

Reusing a deferred-teardown ResolveContext without updating its callback and context pointers is a critical correctness and security issue.

Since the previous consumer's counter dropped to 0, that consumer has cancelled and may have been destroyed. If we reuse the context but keep the old callback and context pointers:

  1. The new caller will never receive the resolve results.
  2. When the resolve completes, the old callback will be invoked with the old context pointer, which is highly likely to be a dangling pointer, leading to a Use-After-Free (UAF) vulnerability or crash.

Since the consumer counter is 0, there are no other active consumers, so it is completely safe to overwrite the callback and context with the new caller's values.

        if (existingCtx->callback != callback || existingCtx->context != context)
        {
            ChipLogDetail(Discovery, "Mdns: Reusing deferred-teardown ResolveContext for %s with new callback/context", StringOrNullMarker(name));
            existingCtx->callback = callback;
            existingCtx->context = context;
        }
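
The rebinding argument can be checked in isolation. This is a minimal sketch under stated assumptions: `SketchResolveContext`, `RescueAndRebind`, `Dispatch`, and `Consumer` are hypothetical stand-ins, and the real `ResolveContext` carries more state and cancels a platform timer at the marked step.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the real ResolveContext has more fields.
using ResolveCallback = void (*)(void * context, const std::string & result);

struct SketchResolveContext
{
    ResolveCallback callback       = nullptr;
    void * context                 = nullptr;
    int consumerCounter            = 0;
    bool deferredTeardownScheduled = false;
};

// Rescue a context that is inside its deferred-teardown window. Overwriting
// the dispatch pair is safe only because the counter is zero, so no live
// consumer still expects delivery through the old pointers.
bool RescueAndRebind(SketchResolveContext & ctx, ResolveCallback callback, void * context)
{
    if (!ctx.deferredTeardownScheduled || ctx.consumerCounter != 0)
    {
        return false;
    }
    ctx.callback                  = callback;
    ctx.context                   = context;
    ctx.deferredTeardownScheduled = false; // real code would also cancel the timer here
    ctx.consumerCounter           = 1;
    return true;
}

// Deliver a result through whatever callback/context pair is currently bound.
void Dispatch(const SketchResolveContext & ctx, const std::string & result)
{
    if (ctx.callback != nullptr)
    {
        ctx.callback(ctx.context, result);
    }
}

// Example consumer that records the last result it was handed.
struct Consumer
{
    std::string lastResult;
};

void OnResolved(void * context, const std::string & result)
{
    static_cast<Consumer *>(context)->lastResult = result;
}
```

After a rescue with a new `Consumer`, a dispatched result lands in the new consumer and the old one is never touched, which is the property the review comment asks for.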

Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
Comment on lines +470 to +476
        if (!reused && existingCtx->context != delegate)
        {
            ChipLogError(Discovery,
                         "Mdns: Reusing deferred-teardown ResolveContext for %s with a different delegate; "
                         "new delegate will not receive results until the next Resolve",
                         StringOrNullMarker(name));
        }

security-critical critical

Similar to the callback overload above, reusing the deferred-teardown ResolveContext without updating its context (which holds the DiscoverNodeDelegate pointer) can lead to a Use-After-Free (UAF) or crash if the old delegate has been destroyed.

We should update the context pointer to the new delegate when reusing the context.

        if (existingCtx->context != delegate)
        {
            ChipLogDetail(Discovery, "Mdns: Reusing deferred-teardown ResolveContext for %s with a new delegate", StringOrNullMarker(name));
            existingCtx->context = delegate;
        }

Comment thread src/platform/tests/TestDnssd.cpp Outdated
Comment on lines +355 to +386
void RunOnMatterEventLoopSync(std::function<void(void)> fn)
{
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;

    struct Args
    {
        std::function<void(void)> * fn;
        std::mutex * mutex;
        std::condition_variable * cv;
        bool * done;
    };
    auto * args = new Args{ &fn, &mutex, &cv, &done };

    chip::DeviceLayer::PlatformMgr().ScheduleWork(
        [](intptr_t arg) {
            auto * a = reinterpret_cast<Args *>(arg);
            (*a->fn)();
            {
                std::lock_guard<std::mutex> lk(*a->mutex);
                *a->done = true;
            }
            a->cv->notify_all();
        },
        reinterpret_cast<intptr_t>(args));

    std::unique_lock<std::mutex> lk(mutex);
    bool finished = cv.wait_for(lk, std::chrono::seconds(2), [&done] { return done; });
    EXPECT_TRUE(finished);
    delete args;
}

high

In RunOnMatterEventLoopSync, if cv.wait_for times out (e.g., due to CI slowness or deadlock), delete args is called and the function returns.

However, the scheduled work on the Matter event loop might still execute later. When it does, it will dereference args (which has been deleted) and fn (which was a reference to a local variable on the stack that has gone out of scope). This results in a Use-After-Free (UAF) and will crash the test runner.

We can solve this robustly by using a simple thread-safe reference count inside Args and storing fn by value, so that whichever thread finishes last safely deletes the heap-allocated Args structure.

void RunOnMatterEventLoopSync(std::function<void(void)> fn)
{
    struct Args
    {
        std::atomic<int> refCount{ 2 }; // one reference each for the caller and the event loop
        std::function<void(void)> fn;   // held by value so it outlives the caller's stack frame
        std::mutex mutex;
        std::condition_variable cv;
        bool done = false;
    };
    auto * args = new Args;
    args->fn    = std::move(fn);

    chip::DeviceLayer::PlatformMgr().ScheduleWork(
        [](intptr_t arg) {
            auto * a = reinterpret_cast<Args *>(arg);
            a->fn();
            {
                std::lock_guard<std::mutex> lk(a->mutex);
                a->done = true;
            }
            a->cv.notify_all();
            if (a->refCount.fetch_sub(1) == 1)
            {
                delete a;
            }
        },
        reinterpret_cast<intptr_t>(args));

    bool finished;
    {
        // Scope the lock so it is released before this thread drops its
        // reference; Args must never be deleted while its mutex is held.
        std::unique_lock<std::mutex> lk(args->mutex);
        finished = args->cv.wait_for(lk, std::chrono::seconds(2), [args] { return args->done; });
    }
    EXPECT_TRUE(finished);
    if (args->refCount.fetch_sub(1) == 1)
    {
        delete args;
    }
}
References
  1. When a stack-allocated object is passed by reference to a function that stores the reference, ensure the object's lifetime is static to prevent dangling references and undefined behavior.
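
The same lifetime guarantee can be expressed with `std::shared_ptr` instead of a hand-rolled reference count. The sketch below is an alternative under stated assumptions, not code from this PR: `SyncState` and `RunSyncWithTimeout` are hypothetical, and the `schedule` parameter stands in for posting work to the Matter event loop, whose `ScheduleWork` takes a plain function pointer and an `intptr_t`, so a real adaptation would still need a heap-allocated handle to carry the shared pointer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread> // used only by the usage example, not by the helper itself
#include <utility>

// Shared state that outlives whichever side finishes last.
struct SyncState
{
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
};

// `schedule` is a hypothetical stand-in for posting work to an event loop.
// Returns true if fn completed before the timeout expired.
bool RunSyncWithTimeout(const std::function<void(std::function<void()>)> & schedule, std::function<void()> fn,
                        std::chrono::milliseconds timeout)
{
    auto state = std::make_shared<SyncState>();

    // The work item owns a copy of the shared_ptr and holds fn by value, so
    // it stays valid even if it runs after this function has returned.
    schedule([state, fn = std::move(fn)] {
        fn();
        {
            std::lock_guard<std::mutex> lk(state->mutex);
            state->done = true;
        }
        state->cv.notify_all();
    });

    // lk is destroyed before state (reverse declaration order), so the mutex
    // is unlocked before this thread drops its reference.
    std::unique_lock<std::mutex> lk(state->mutex);
    return state->cv.wait_for(lk, timeout, [&state] { return state->done; });
}
```

A caller can pass a scheduler that runs the work on a `std::thread` to see the success path, or one that drops the work entirely to see the timeout path return `false` without touching freed memory.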

Copilot AI left a comment

Pull request overview

This PR changes the Darwin DNS-SD implementation to defer mDNS resolve teardown briefly after the consumer count drops to zero, avoiding dropped in-flight resolver results and enabling immediate re-resolve calls to reuse existing contexts.

Changes:

  • Adds deferred-teardown state and timer APIs for Darwin ResolveContext.
  • Reuses deferred resolve contexts in both callback and delegate resolve paths.
  • Adds Darwin-only tests covering coalescing and delayed cancellation behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/platform/Darwin/dnssd/DnssdImpl.h | Adds deferred teardown state and timer helper declarations.
src/platform/Darwin/dnssd/DnssdImpl.cpp | Implements deferred teardown scheduling, cancellation, and resolve-context reuse.
src/platform/Darwin/dnssd/DnssdContexts.cpp | Cancels pending teardown timers during resolve success/failure dispatch.
src/platform/tests/TestDnssd.cpp | Adds Darwin tests for resolve teardown coalescing and cancellation propagation.

Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
}
chip::DeviceLayer::SystemLayer().CancelTimer(OnResolveDeferredTeardown, existingCtx);
existingCtx->deferredTeardownScheduled = false;
(*existingCtx->consumerCounter)++;
Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
}
chip::DeviceLayer::SystemLayer().CancelTimer(OnResolveDeferredTeardown, existingCtx);
existingCtx->deferredTeardownScheduled = false;
(*existingCtx->consumerCounter)++;
codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.52%. Comparing base (8a162c6) to head (6508f97).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #72273   +/-   ##
=======================================
  Coverage   55.52%   55.52%           
=======================================
  Files        1630     1630           
  Lines      111127   111127           
  Branches    13418    13418           
=======================================
+ Hits        61706    61707    +1     
+ Misses      49421    49420    -1     


@github-actions

PR #72273: Size comparison from adba29b to e7af012

Full report (35 builds for bl602, bl616, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, esp32, nrfconnect, psoc6, qpg, realtek, stm32, telink)
platform target config section adba29b e7af012 change % change
bl602 lighting-app bl602+mfd+littlefs+rpc FLASH 1094216 1094216 0 0.0
RAM 144882 144882 0 0.0
bl616 lighting-app bl616+thread FLASH 1105524 1105524 0 0.0
RAM 104280 104280 0 0.0
bl616+wifi+shell FLASH 1593296 1593296 0 0.0
RAM 98176 98176 0 0.0
bl702 lighting-app bl702+eth FLASH 1057224 1057224 0 0.0
RAM 108509 108509 0 0.0
bl702l contact-sensor-app bl702l+mfd+littlefs FLASH 896020 896020 0 0.0
RAM 105884 105884 0 0.0
cc13x4_26x4 lighting-app LP_EM_CC1354P10_6 FLASH 776944 776944 0 0.0
RAM 103388 103388 0 0.0
lock-ftd LP_EM_CC1354P10_6 FLASH 789640 789640 0 0.0
RAM 108676 108676 0 0.0
pump-app LP_EM_CC1354P10_6 FLASH 738772 738772 0 0.0
RAM 97596 97596 0 0.0
pump-controller-app LP_EM_CC1354P10_6 FLASH 718928 718928 0 0.0
RAM 97636 97636 0 0.0
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 568690 568690 0 0.0
RAM 205056 205056 0 0.0
lock CC3235SF_LAUNCHXL FLASH 596178 596178 0 0.0
RAM 205256 205256 0 0.0
efr32 lock-app BRD4187C FLASH 994060 994060 0 0.0
RAM 131288 131288 0 0.0
window-app BRD4187C FLASH 1100480 1100480 0 0.0
RAM 130360 130360 0 0.0
lock-app BRD4338a FLASH 798629 798629 0 0.0
RAM 243424 243424 0 0.0
esp32 all-clusters-app c3devkit DRAM 99716 99716 0 0.0
FLASH 1621508 1621508 0 0.0
IRAM 94776 94776 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 FLASH 834248 834248 0 0.0
RAM 157540 157540 0 0.0
psoc6 all-clusters cy8ckit_062s2_43012 FLASH 1733916 1733916 0 0.0
RAM 215260 215260 0 0.0
all-clusters-minimal cy8ckit_062s2_43012 FLASH 1622708 1622708 0 0.0
RAM 211548 211548 0 0.0
light cy8ckit_062s2_43012 FLASH 1470196 1470196 0 0.0
RAM 197420 197420 0 0.0
lock cy8ckit_062s2_43012 FLASH 1503332 1503332 0 0.0
RAM 225252 225252 0 0.0
qpg lighting-app qpg6200+debug FLASH 845080 845080 0 0.0
RAM 127964 127964 0 0.0
lock-app qpg6200+debug FLASH 783812 783812 0 0.0
RAM 118912 118912 0 0.0
realtek light-switch-app rtl8777g FLASH 688624 688624 0 0.0
RAM 101764 101764 0 0.0
lighting-app rtl8777g FLASH 729680 729680 0 0.0
RAM 102044 102044 0 0.0
stm32 light STM32WB5MM-DK FLASH 478384 478384 0 0.0
RAM 141476 141476 0 0.0
telink light-app-ota-compress-lzma-shell-factory-data tl3218x FLASH 851748 851748 0 0.0
RAM 44332 44332 0 0.0
light-switch-app-ota-factory-data tl3218x_retention FLASH 731812 731812 0 0.0
RAM 33468 33468 0 0.0
all-devices-app tl7218x FLASH 813028 813028 0 0.0
RAM 97196 97196 0 0.0
bridge-app tl7218x FLASH 731392 731392 0 0.0
RAM 95864 95864 0 0.0
light-app-ota-compress-lzma-shell-factory-data tl7218x FLASH 843076 843076 0 0.0
RAM 99656 99656 0 0.0
light-switch-app-ota-compress-lzma-factory-data tl7218x_retention FLASH 731882 731882 0 0.0
RAM 55980 55980 0 0.0
all-devices-app tlsr9118bdk40d FLASH 606480 606480 0 0.0
RAM 120152 120152 0 0.0
lighting-app-ota-factory-data tlsr9118bdk40d FLASH 614528 614528 0 0.0
RAM 118496 118496 0 0.0
lighting-app-ota-rpc-factory-data-4mb tlsr9518adk80d FLASH 841190 841194 4 0.0
RAM 97364 97364 0 0.0
light-switch-app-ota-compress-lzma-shell-factory-data tlsr9528a FLASH 795108 795108 0 0.0
RAM 75164 75164 0 0.0

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from e7af012 to da6e50a Compare May 30, 2026 05:36
Copilot AI review requested due to automatic review settings May 30, 2026 06:08
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from da6e50a to 1d9c0cf Compare May 30, 2026 06:08
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
Comment on lines +413 to +434
        if (!existingCtx->deferredTeardownScheduled || *existingCtx->consumerCounter != 0)
        {
            continue;
        }
        // Detect (and log) the case where a different upper-layer caller is
        // re-binding to the same instance name with a different callback or
        // context. We intentionally do not rebind, because doing so could lose
        // an in-flight result delivery to the original caller. The upper layer
        // (Discovery) routes results back through the same callback path keyed
        // off instance name, so a multiplexed dispatcher is expected; but if
        // the dispatch tuple differs, surface that as an error log so any
        // breakage of that assumption is observable.
        if (!reused && (existingCtx->callback != callback || existingCtx->context != context))
        {
            ChipLogError(Discovery,
                         "Mdns: Reusing deferred-teardown ResolveContext for %s with mismatched callback/context; "
                         "new caller will not receive results until the next Resolve",
                         StringOrNullMarker(name));
        }
        chip::DeviceLayer::SystemLayer().CancelTimer(OnResolveDeferredTeardown, existingCtx);
        existingCtx->deferredTeardownScheduled = false;
        (*existingCtx->consumerCounter)++;
Comment thread src/platform/Darwin/dnssd/DnssdImpl.cpp Outdated
Comment on lines +466 to +479
        if (!existingCtx->deferredTeardownScheduled || *existingCtx->consumerCounter != 0)
        {
            continue;
        }
        if (!reused && existingCtx->context != delegate)
        {
            ChipLogError(Discovery,
                         "Mdns: Reusing deferred-teardown ResolveContext for %s with a different delegate; "
                         "new delegate will not receive results until the next Resolve",
                         StringOrNullMarker(name));
        }
        chip::DeviceLayer::SystemLayer().CancelTimer(OnResolveDeferredTeardown, existingCtx);
        existingCtx->deferredTeardownScheduled = false;
        (*existingCtx->consumerCounter)++;
Comment on lines +403 to +407
    std::vector<GenericContext *> existingResolves;
    MdnsContexts::GetInstance().FindAllMatchingPredicate(
        [name](GenericContext * item) {
            return item->type == ContextType::Resolve && static_cast<ResolveContext *>(item)->Matches(name);
        },
Contributor Author

(FYI: This is Claude) False positive. The lambda at L466-468 only gathers name-matching candidates; the for-loop body filters on the full request shape before rescuing. See L493-508: browseThatCausedResolve != nullptr, requestedInterfaceId != interfaceId, protocol != newProtocol, and requestedDomain != newDomain all continue, falling through to fresh-context allocation at L557-562. callback/context are rebound at L531-537 before the counter bump. type (service type) is fixed by the protocol family in DNS-SD-on-Darwin and is implied by the ResolveContext kind.

Comment on lines +456 to +460
    std::vector<GenericContext *> existingResolves;
    MdnsContexts::GetInstance().FindAllMatchingPredicate(
        [name](GenericContext * item) {
            return item->type == ContextType::Resolve && static_cast<ResolveContext *>(item)->Matches(name);
        },
Contributor Author

(FYI: This is Claude) False positive. Delegate overload's loop at L581-626 enforces the same scope filters: kind (L589 — only callback==nullptr siblings), browseThatCausedResolve (L594), requestedInterfaceId (L598), protocol (L602), and non-empty requestedDomain (L612). A different delegate, interface, or address family will fail one of these filters and fall through to fresh allocation at L644-648. Delegate is rebound at L630-635 before the counter bump.

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 3fb25d8 to e4ccef3 Compare May 30, 2026 06:25
Copilot AI review requested due to automatic review settings May 30, 2026 06:26
@woody-apple woody-apple requested review from Copilot and removed request for Copilot May 30, 2026 06:26
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from f9cd4ed to 036f9bf Compare May 30, 2026 06:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines +489 to +494
{
ChipLogDetail(Discovery, "Mdns: Reusing deferred-teardown ResolveContext for %s with a new delegate",
StringOrNullMarker(name));
primaryCtx->context = delegate;
}
(*primaryCtx->consumerCounter)++;
Contributor Author

(FYI: This is Claude) False positive. The delegate overload's kind-mismatch guard at L588-592 (if (existingCtx->callback != nullptr) continue;) ensures we only ever rebind onto a context that was already constructed via the delegate ctor — and the delegate ctor leaves callback as nullptr. So when we hit L630 the existing callback is already null; there is nothing to clear. A callback-overload context can never be rescued by this path.

Comment on lines +411 to +412
[name](GenericContext * item) {
return item->type == ContextType::Resolve && static_cast<ResolveContext *>(item)->Matches(name);
Contributor Author

(FYI: This is Claude) False positive. The lambda at L466-468 is a name-only gather, but the loop body at L474-523 enforces the full scope before rescuing: kind (L482), browseThatCausedResolve (L493), requestedInterfaceId (L497), protocol from addressType (L501), requestedDomain (L505). Mismatch on any of these falls through to a fresh Platform::New<ResolveContext> at L558. The IPv4-then-IPv6 cancel/restart case you describe correctly skips the rescue.
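
The two-stage shape described in this reply (a name-only gather followed by a full-scope filter) might look roughly like the sketch below. It is an illustration only: `SketchContext`, `SketchRequest`, and `FindRescuable` are hypothetical, the fields are flattened stand-ins for the ones named in the reply, and the exact domain comparison in the real loop may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, flattened view of the fields the reply mentions.
struct SketchContext
{
    std::string instanceName;
    bool isDelegateKind;  // true when built through the delegate-style overload
    bool startedByBrowse; // stands in for browseThatCausedResolve != nullptr
    uint32_t interfaceId;
    int protocol;         // address family of the request
    std::string domain;
    bool deferredTeardownScheduled;
    int consumerCounter;
};

struct SketchRequest
{
    std::string instanceName;
    bool isDelegateKind;
    uint32_t interfaceId;
    int protocol;
    std::string domain;
};

// Returns a context that may be rescued for this request, or nullptr when the
// caller should allocate a fresh context instead.
SketchContext * FindRescuable(std::vector<SketchContext> & all, const SketchRequest & req)
{
    for (auto & ctx : all)
    {
        // Stage 1: name-only gather.
        if (ctx.instanceName != req.instanceName)
            continue;
        // Stage 2: only contexts inside their deferred-teardown window qualify...
        if (!ctx.deferredTeardownScheduled || ctx.consumerCounter != 0)
            continue;
        // ...and every part of the request scope has to match.
        if (ctx.isDelegateKind != req.isDelegateKind)
            continue;
        if (ctx.startedByBrowse)
            continue;
        if (ctx.interfaceId != req.interfaceId)
            continue;
        if (ctx.protocol != req.protocol)
            continue;
        if (ctx.domain != req.domain)
            continue;
        return &ctx;
    }
    return nullptr;
}
```

With one IPv4-scoped context waiting in its window, an IPv6 request for the same instance name gets `nullptr` and takes the fresh-allocation path, while an identical IPv4 request gets the existing context back.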

Comment on lines +467 to +468
return item->type == ContextType::Resolve && static_cast<ResolveContext *>(item)->Matches(name);
},
Contributor Author

(FYI: This is Claude) Duplicate of the comment above on the same line — same false positive. The loop at L581-626 filters by kind, browse origin, interface, protocol, and domain; mismatches fall through to fresh allocation at L644-648.

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 9130cd2 to 98f9f11 Compare May 30, 2026 07:29
Copilot AI review requested due to automatic review settings May 30, 2026 08:00
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 98f9f11 to 0957a4f Compare May 30, 2026 08:00
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 30, 2026 08:15
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 8854af0 to d2bf7c2 Compare May 30, 2026 08:15
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 941f38c to 43a6933 Compare May 30, 2026 08:46
@mergify mergify Bot removed the conflict label May 30, 2026
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 43a6933 to eee4b77 Compare May 30, 2026 08:48
Copilot AI review requested due to automatic review settings May 30, 2026 08:48
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from c9ff5c1 to a4a4ba6 Compare May 30, 2026 09:11
Copilot AI review requested due to automatic review settings May 30, 2026 09:14
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from a4a4ba6 to 69b5133 Compare May 30, 2026 09:14
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions Bot commented May 30, 2026

PR #72273: Size comparison from 8a162c6 to 3d06e81

Full report (35 builds for bl602, bl616, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, esp32, nrfconnect, psoc6, qpg, realtek, stm32, telink)
platform target config section 8a162c6 3d06e81 change % change
bl602 lighting-app bl602+mfd+littlefs+rpc FLASH 1094216 1094216 0 0.0
RAM 144882 144882 0 0.0
bl616 lighting-app bl616+thread FLASH 1105524 1105524 0 0.0
RAM 104280 104280 0 0.0
bl616+wifi+shell FLASH 1593296 1593296 0 0.0
RAM 98176 98176 0 0.0
bl702 lighting-app bl702+eth FLASH 1057224 1057224 0 0.0
RAM 108509 108509 0 0.0
bl702l contact-sensor-app bl702l+mfd+littlefs FLASH 896020 896020 0 0.0
RAM 105884 105884 0 0.0
cc13x4_26x4 lighting-app LP_EM_CC1354P10_6 FLASH 776944 776944 0 0.0
RAM 103388 103388 0 0.0
lock-ftd LP_EM_CC1354P10_6 FLASH 789640 789640 0 0.0
RAM 108676 108676 0 0.0
pump-app LP_EM_CC1354P10_6 FLASH 738772 738772 0 0.0
RAM 97596 97596 0 0.0
pump-controller-app LP_EM_CC1354P10_6 FLASH 718928 718928 0 0.0
RAM 97636 97636 0 0.0
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 568690 568690 0 0.0
RAM 205056 205056 0 0.0
lock CC3235SF_LAUNCHXL FLASH 596178 596178 0 0.0
RAM 205256 205256 0 0.0
efr32 lock-app BRD4187C FLASH 994060 994060 0 0.0
RAM 131288 131288 0 0.0
BRD4338a FLASH 798629 798629 0 0.0
RAM 243424 243424 0 0.0
window-app BRD4187C FLASH 1100480 1100480 0 0.0
RAM 130360 130360 0 0.0
esp32 all-clusters-app c3devkit DRAM 99716 99716 0 0.0
FLASH 1621508 1621508 0 0.0
IRAM 94776 94776 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 FLASH 834248 834248 0 0.0
RAM 157540 157540 0 0.0
psoc6 all-clusters cy8ckit_062s2_43012 FLASH 1733916 1733916 0 0.0
RAM 215260 215260 0 0.0
all-clusters-minimal cy8ckit_062s2_43012 FLASH 1622708 1622708 0 0.0
RAM 211548 211548 0 0.0
light cy8ckit_062s2_43012 FLASH 1470196 1470196 0 0.0
RAM 197420 197420 0 0.0
lock cy8ckit_062s2_43012 FLASH 1503332 1503332 0 0.0
RAM 225252 225252 0 0.0
qpg lighting-app qpg6200+debug FLASH 845080 845080 0 0.0
RAM 127964 127964 0 0.0
lock-app qpg6200+debug FLASH 783812 783812 0 0.0
RAM 118912 118912 0 0.0
realtek light-switch-app rtl8777g FLASH 688624 688624 0 0.0
RAM 101764 101764 0 0.0
lighting-app rtl8777g FLASH 729680 729680 0 0.0
RAM 102044 102044 0 0.0
stm32 light STM32WB5MM-DK FLASH 478384 478384 0 0.0
RAM 141476 141476 0 0.0
telink all-devices-app tl7218x FLASH 813028 813028 0 0.0
RAM 97196 97196 0 0.0
tlsr9118bdk40d FLASH 606480 606480 0 0.0
RAM 120152 120152 0 0.0
bridge-app tl7218x FLASH 731392 731392 0 0.0
RAM 95864 95864 0 0.0
light-app-ota-compress-lzma-shell-factory-data tl3218x FLASH 851748 851748 0 0.0
RAM 44332 44332 0 0.0
tl7218x FLASH 843076 843076 0 0.0
RAM 99656 99656 0 0.0
light-switch-app-ota-compress-lzma-factory-data tl7218x_retention FLASH 731882 731882 0 0.0
RAM 55980 55980 0 0.0
light-switch-app-ota-compress-lzma-shell-factory-data tlsr9528a FLASH 795108 795108 0 0.0
RAM 75164 75164 0 0.0
light-switch-app-ota-factory-data tl3218x_retention FLASH 731812 731812 0 0.0
RAM 33468 33468 0 0.0
lighting-app-ota-factory-data tlsr9118bdk40d FLASH 614528 614528 0 0.0
RAM 118496 118496 0 0.0
lighting-app-ota-rpc-factory-data-4mb tlsr9518adk80d FLASH 841190 841194 4 0.0
RAM 97364 97364 0 0.0

Copilot AI review requested due to automatic review settings May 30, 2026 18:17
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 2867bd3 to b1cd0bb Compare May 30, 2026 18:22
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 30, 2026 18:43
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 9bf56a9 to 9b317d8 Compare May 30, 2026 18:43
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions Bot commented May 30, 2026

PR #72273: Size comparison from 8a162c6 to aca125d

Full report (35 builds for bl602, bl616, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, esp32, nrfconnect, psoc6, qpg, realtek, stm32, telink)
| platform | target | config | section | 8a162c6 | aca125d | change | % change |
|---|---|---|---|---|---|---|---|
| bl602 | lighting-app | bl602+mfd+littlefs+rpc | FLASH | 1094216 | 1094216 | 0 | 0.0 |
| | | | RAM | 144882 | 144882 | 0 | 0.0 |
| bl616 | lighting-app | bl616+thread | FLASH | 1105524 | 1105524 | 0 | 0.0 |
| | | | RAM | 104280 | 104280 | 0 | 0.0 |
| | | bl616+wifi+shell | FLASH | 1593296 | 1593296 | 0 | 0.0 |
| | | | RAM | 98176 | 98176 | 0 | 0.0 |
| bl702 | lighting-app | bl702+eth | FLASH | 1057224 | 1057224 | 0 | 0.0 |
| | | | RAM | 108509 | 108509 | 0 | 0.0 |
| bl702l | contact-sensor-app | bl702l+mfd+littlefs | FLASH | 896020 | 896020 | 0 | 0.0 |
| | | | RAM | 105884 | 105884 | 0 | 0.0 |
| cc13x4_26x4 | lighting-app | LP_EM_CC1354P10_6 | FLASH | 776944 | 776944 | 0 | 0.0 |
| | | | RAM | 103388 | 103388 | 0 | 0.0 |
| | lock-ftd | LP_EM_CC1354P10_6 | FLASH | 789640 | 789640 | 0 | 0.0 |
| | | | RAM | 108676 | 108676 | 0 | 0.0 |
| | pump-app | LP_EM_CC1354P10_6 | FLASH | 738772 | 738772 | 0 | 0.0 |
| | | | RAM | 97596 | 97596 | 0 | 0.0 |
| | pump-controller-app | LP_EM_CC1354P10_6 | FLASH | 718928 | 718928 | 0 | 0.0 |
| | | | RAM | 97636 | 97636 | 0 | 0.0 |
| cc32xx | air-purifier | CC3235SF_LAUNCHXL | FLASH | 568690 | 568690 | 0 | 0.0 |
| | | | RAM | 205056 | 205056 | 0 | 0.0 |
| | lock | CC3235SF_LAUNCHXL | FLASH | 596178 | 596178 | 0 | 0.0 |
| | | | RAM | 205256 | 205256 | 0 | 0.0 |
| efr32 | lock-app | BRD4187C | FLASH | 994060 | 994060 | 0 | 0.0 |
| | | | RAM | 131288 | 131288 | 0 | 0.0 |
| | | BRD4338a | FLASH | 798629 | 798629 | 0 | 0.0 |
| | | | RAM | 243424 | 243424 | 0 | 0.0 |
| | window-app | BRD4187C | FLASH | 1100480 | 1100480 | 0 | 0.0 |
| | | | RAM | 130360 | 130360 | 0 | 0.0 |
| esp32 | all-clusters-app | c3devkit | DRAM | 99716 | 99716 | 0 | 0.0 |
| | | | FLASH | 1621508 | 1621508 | 0 | 0.0 |
| | | | IRAM | 94776 | 94776 | 0 | 0.0 |
| nrfconnect | all-clusters-app | nrf52840dk_nrf52840 | FLASH | 834248 | 834248 | 0 | 0.0 |
| | | | RAM | 157540 | 157540 | 0 | 0.0 |
| psoc6 | all-clusters | cy8ckit_062s2_43012 | FLASH | 1733916 | 1733916 | 0 | 0.0 |
| | | | RAM | 215260 | 215260 | 0 | 0.0 |
| | all-clusters-minimal | cy8ckit_062s2_43012 | FLASH | 1622708 | 1622708 | 0 | 0.0 |
| | | | RAM | 211548 | 211548 | 0 | 0.0 |
| | light | cy8ckit_062s2_43012 | FLASH | 1470196 | 1470196 | 0 | 0.0 |
| | | | RAM | 197420 | 197420 | 0 | 0.0 |
| | lock | cy8ckit_062s2_43012 | FLASH | 1503332 | 1503332 | 0 | 0.0 |
| | | | RAM | 225252 | 225252 | 0 | 0.0 |
| qpg | lighting-app | qpg6200+debug | FLASH | 845080 | 845080 | 0 | 0.0 |
| | | | RAM | 127964 | 127964 | 0 | 0.0 |
| | lock-app | qpg6200+debug | FLASH | 783812 | 783812 | 0 | 0.0 |
| | | | RAM | 118912 | 118912 | 0 | 0.0 |
| realtek | light-switch-app | rtl8777g | FLASH | 688624 | 688624 | 0 | 0.0 |
| | | | RAM | 101764 | 101764 | 0 | 0.0 |
| | lighting-app | rtl8777g | FLASH | 729680 | 729680 | 0 | 0.0 |
| | | | RAM | 102044 | 102044 | 0 | 0.0 |
| stm32 | light | STM32WB5MM-DK | FLASH | 478384 | 478384 | 0 | 0.0 |
| | | | RAM | 141476 | 141476 | 0 | 0.0 |
| telink | all-devices-app | tl7218x | FLASH | 813028 | 813028 | 0 | 0.0 |
| | | | RAM | 97196 | 97196 | 0 | 0.0 |
| | | tlsr9118bdk40d | FLASH | 606480 | 606480 | 0 | 0.0 |
| | | | RAM | 120152 | 120152 | 0 | 0.0 |
| | bridge-app | tl7218x | FLASH | 731392 | 731392 | 0 | 0.0 |
| | | | RAM | 95864 | 95864 | 0 | 0.0 |
| | light-app-ota-compress-lzma-shell-factory-data | tl3218x | FLASH | 851748 | 851748 | 0 | 0.0 |
| | | | RAM | 44332 | 44332 | 0 | 0.0 |
| | | tl7218x | FLASH | 843076 | 843076 | 0 | 0.0 |
| | | | RAM | 99656 | 99656 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-factory-data | tl7218x_retention | FLASH | 731882 | 731882 | 0 | 0.0 |
| | | | RAM | 55980 | 55980 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-shell-factory-data | tlsr9528a | FLASH | 795108 | 795108 | 0 | 0.0 |
| | | | RAM | 75164 | 75164 | 0 | 0.0 |
| | light-switch-app-ota-factory-data | tl3218x_retention | FLASH | 731812 | 731812 | 0 | 0.0 |
| | | | RAM | 33468 | 33468 | 0 | 0.0 |
| | lighting-app-ota-factory-data | tlsr9118bdk40d | FLASH | 614528 | 614528 | 0 | 0.0 |
| | | | RAM | 118496 | 118496 | 0 | 0.0 |
| | lighting-app-ota-rpc-factory-data-4mb | tlsr9518adk80d | FLASH | 841190 | 841194 | 4 | 0.0 |
| | | | RAM | 97364 | 97364 | 0 | 0.0 |

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch 2 times, most recently from 42c1b92 to d2399e1 Compare May 31, 2026 18:11

github-actions Bot commented May 31, 2026

PR #72273: Size comparison from 8a162c6 to d2399e1

Full report (31 builds for bl602, bl616, bl702, bl702l, cc13x4_26x4, cc32xx, nrfconnect, psoc6, qpg, realtek, stm32, telink)
| platform | target | config | section | 8a162c6 | d2399e1 | change | % change |
|---|---|---|---|---|---|---|---|
| bl602 | lighting-app | bl602+mfd+littlefs+rpc | FLASH | 1094216 | 1094216 | 0 | 0.0 |
| | | | RAM | 144882 | 144882 | 0 | 0.0 |
| bl616 | lighting-app | bl616+thread | FLASH | 1105524 | 1105524 | 0 | 0.0 |
| | | | RAM | 104280 | 104280 | 0 | 0.0 |
| | | bl616+wifi+shell | FLASH | 1593296 | 1593296 | 0 | 0.0 |
| | | | RAM | 98176 | 98176 | 0 | 0.0 |
| bl702 | lighting-app | bl702+eth | FLASH | 1057224 | 1057224 | 0 | 0.0 |
| | | | RAM | 108509 | 108509 | 0 | 0.0 |
| bl702l | contact-sensor-app | bl702l+mfd+littlefs | FLASH | 896020 | 896020 | 0 | 0.0 |
| | | | RAM | 105884 | 105884 | 0 | 0.0 |
| cc13x4_26x4 | lighting-app | LP_EM_CC1354P10_6 | FLASH | 776944 | 776944 | 0 | 0.0 |
| | | | RAM | 103388 | 103388 | 0 | 0.0 |
| | lock-ftd | LP_EM_CC1354P10_6 | FLASH | 789640 | 789640 | 0 | 0.0 |
| | | | RAM | 108676 | 108676 | 0 | 0.0 |
| | pump-app | LP_EM_CC1354P10_6 | FLASH | 738772 | 738772 | 0 | 0.0 |
| | | | RAM | 97596 | 97596 | 0 | 0.0 |
| | pump-controller-app | LP_EM_CC1354P10_6 | FLASH | 718928 | 718928 | 0 | 0.0 |
| | | | RAM | 97636 | 97636 | 0 | 0.0 |
| cc32xx | air-purifier | CC3235SF_LAUNCHXL | FLASH | 568690 | 568690 | 0 | 0.0 |
| | | | RAM | 205056 | 205056 | 0 | 0.0 |
| | lock | CC3235SF_LAUNCHXL | FLASH | 596178 | 596178 | 0 | 0.0 |
| | | | RAM | 205256 | 205256 | 0 | 0.0 |
| nrfconnect | all-clusters-app | nrf52840dk_nrf52840 | FLASH | 834248 | 834248 | 0 | 0.0 |
| | | | RAM | 157540 | 157540 | 0 | 0.0 |
| psoc6 | all-clusters | cy8ckit_062s2_43012 | FLASH | 1733916 | 1733916 | 0 | 0.0 |
| | | | RAM | 215260 | 215260 | 0 | 0.0 |
| | all-clusters-minimal | cy8ckit_062s2_43012 | FLASH | 1622708 | 1622708 | 0 | 0.0 |
| | | | RAM | 211548 | 211548 | 0 | 0.0 |
| | light | cy8ckit_062s2_43012 | FLASH | 1470196 | 1470196 | 0 | 0.0 |
| | | | RAM | 197420 | 197420 | 0 | 0.0 |
| | lock | cy8ckit_062s2_43012 | FLASH | 1503332 | 1503332 | 0 | 0.0 |
| | | | RAM | 225252 | 225252 | 0 | 0.0 |
| qpg | lighting-app | qpg6200+debug | FLASH | 845080 | 845080 | 0 | 0.0 |
| | | | RAM | 127964 | 127964 | 0 | 0.0 |
| | lock-app | qpg6200+debug | FLASH | 783812 | 783812 | 0 | 0.0 |
| | | | RAM | 118912 | 118912 | 0 | 0.0 |
| realtek | light-switch-app | rtl8777g | FLASH | 688624 | 688624 | 0 | 0.0 |
| | | | RAM | 101764 | 101764 | 0 | 0.0 |
| | lighting-app | rtl8777g | FLASH | 729680 | 729680 | 0 | 0.0 |
| | | | RAM | 102044 | 102044 | 0 | 0.0 |
| stm32 | light | STM32WB5MM-DK | FLASH | 478384 | 478384 | 0 | 0.0 |
| | | | RAM | 141476 | 141476 | 0 | 0.0 |
| telink | all-devices-app | tl7218x | FLASH | 813028 | 813028 | 0 | 0.0 |
| | | | RAM | 97196 | 97196 | 0 | 0.0 |
| | | tlsr9118bdk40d | FLASH | 606480 | 606480 | 0 | 0.0 |
| | | | RAM | 120152 | 120152 | 0 | 0.0 |
| | bridge-app | tl7218x | FLASH | 731392 | 731392 | 0 | 0.0 |
| | | | RAM | 95864 | 95864 | 0 | 0.0 |
| | light-app-ota-compress-lzma-shell-factory-data | tl3218x | FLASH | 851748 | 851748 | 0 | 0.0 |
| | | | RAM | 44332 | 44332 | 0 | 0.0 |
| | | tl7218x | FLASH | 843076 | 843076 | 0 | 0.0 |
| | | | RAM | 99656 | 99656 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-factory-data | tl7218x_retention | FLASH | 731882 | 731882 | 0 | 0.0 |
| | | | RAM | 55980 | 55980 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-shell-factory-data | tlsr9528a | FLASH | 795108 | 795108 | 0 | 0.0 |
| | | | RAM | 75164 | 75164 | 0 | 0.0 |
| | light-switch-app-ota-factory-data | tl3218x_retention | FLASH | 731812 | 731812 | 0 | 0.0 |
| | | | RAM | 33468 | 33468 | 0 | 0.0 |
| | lighting-app-ota-factory-data | tlsr9118bdk40d | FLASH | 614528 | 614528 | 0 | 0.0 |
| | | | RAM | 118496 | 118496 | 0 | 0.0 |
| | lighting-app-ota-rpc-factory-data-4mb | tlsr9518adk80d | FLASH | 841190 | 841194 | 4 | 0.0 |
| | | | RAM | 97364 | 97364 | 0 | 0.0 |

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from d2399e1 to 42a21f2 Compare May 31, 2026 19:52

github-actions Bot commented May 31, 2026

PR #72273: Size comparison from 8a162c6 to 42a21f2

Full report (35 builds for bl602, bl616, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, esp32, nrfconnect, psoc6, qpg, realtek, stm32, telink)
| platform | target | config | section | 8a162c6 | 42a21f2 | change | % change |
|---|---|---|---|---|---|---|---|
| bl602 | lighting-app | bl602+mfd+littlefs+rpc | FLASH | 1094216 | 1094216 | 0 | 0.0 |
| | | | RAM | 144882 | 144882 | 0 | 0.0 |
| bl616 | lighting-app | bl616+thread | FLASH | 1105524 | 1105524 | 0 | 0.0 |
| | | | RAM | 104280 | 104280 | 0 | 0.0 |
| | | bl616+wifi+shell | FLASH | 1593296 | 1593296 | 0 | 0.0 |
| | | | RAM | 98176 | 98176 | 0 | 0.0 |
| bl702 | lighting-app | bl702+eth | FLASH | 1057224 | 1057224 | 0 | 0.0 |
| | | | RAM | 108509 | 108509 | 0 | 0.0 |
| bl702l | contact-sensor-app | bl702l+mfd+littlefs | FLASH | 896020 | 896020 | 0 | 0.0 |
| | | | RAM | 105884 | 105884 | 0 | 0.0 |
| cc13x4_26x4 | lighting-app | LP_EM_CC1354P10_6 | FLASH | 776944 | 776944 | 0 | 0.0 |
| | | | RAM | 103388 | 103388 | 0 | 0.0 |
| | lock-ftd | LP_EM_CC1354P10_6 | FLASH | 789640 | 789640 | 0 | 0.0 |
| | | | RAM | 108676 | 108676 | 0 | 0.0 |
| | pump-app | LP_EM_CC1354P10_6 | FLASH | 738772 | 738772 | 0 | 0.0 |
| | | | RAM | 97596 | 97596 | 0 | 0.0 |
| | pump-controller-app | LP_EM_CC1354P10_6 | FLASH | 718928 | 718928 | 0 | 0.0 |
| | | | RAM | 97636 | 97636 | 0 | 0.0 |
| cc32xx | air-purifier | CC3235SF_LAUNCHXL | FLASH | 568690 | 568690 | 0 | 0.0 |
| | | | RAM | 205056 | 205056 | 0 | 0.0 |
| | lock | CC3235SF_LAUNCHXL | FLASH | 596178 | 596178 | 0 | 0.0 |
| | | | RAM | 205256 | 205256 | 0 | 0.0 |
| efr32 | lock-app | BRD4187C | FLASH | 994060 | 994060 | 0 | 0.0 |
| | | | RAM | 131288 | 131288 | 0 | 0.0 |
| | | BRD4338a | FLASH | 798629 | 798629 | 0 | 0.0 |
| | | | RAM | 243424 | 243424 | 0 | 0.0 |
| | window-app | BRD4187C | FLASH | 1100480 | 1100480 | 0 | 0.0 |
| | | | RAM | 130360 | 130360 | 0 | 0.0 |
| esp32 | all-clusters-app | c3devkit | DRAM | 99716 | 99716 | 0 | 0.0 |
| | | | FLASH | 1621508 | 1621508 | 0 | 0.0 |
| | | | IRAM | 94776 | 94776 | 0 | 0.0 |
| nrfconnect | all-clusters-app | nrf52840dk_nrf52840 | FLASH | 834248 | 834248 | 0 | 0.0 |
| | | | RAM | 157540 | 157540 | 0 | 0.0 |
| psoc6 | all-clusters | cy8ckit_062s2_43012 | FLASH | 1733916 | 1733916 | 0 | 0.0 |
| | | | RAM | 215260 | 215260 | 0 | 0.0 |
| | all-clusters-minimal | cy8ckit_062s2_43012 | FLASH | 1622708 | 1622708 | 0 | 0.0 |
| | | | RAM | 211548 | 211548 | 0 | 0.0 |
| | light | cy8ckit_062s2_43012 | FLASH | 1470196 | 1470196 | 0 | 0.0 |
| | | | RAM | 197420 | 197420 | 0 | 0.0 |
| | lock | cy8ckit_062s2_43012 | FLASH | 1503332 | 1503332 | 0 | 0.0 |
| | | | RAM | 225252 | 225252 | 0 | 0.0 |
| qpg | lighting-app | qpg6200+debug | FLASH | 845080 | 845080 | 0 | 0.0 |
| | | | RAM | 127964 | 127964 | 0 | 0.0 |
| | lock-app | qpg6200+debug | FLASH | 783812 | 783812 | 0 | 0.0 |
| | | | RAM | 118912 | 118912 | 0 | 0.0 |
| realtek | light-switch-app | rtl8777g | FLASH | 688624 | 688624 | 0 | 0.0 |
| | | | RAM | 101764 | 101764 | 0 | 0.0 |
| | lighting-app | rtl8777g | FLASH | 729680 | 729680 | 0 | 0.0 |
| | | | RAM | 102044 | 102044 | 0 | 0.0 |
| stm32 | light | STM32WB5MM-DK | FLASH | 478384 | 478384 | 0 | 0.0 |
| | | | RAM | 141476 | 141476 | 0 | 0.0 |
| telink | all-devices-app | tl7218x | FLASH | 813028 | 813028 | 0 | 0.0 |
| | | | RAM | 97196 | 97196 | 0 | 0.0 |
| | | tlsr9118bdk40d | FLASH | 606480 | 606480 | 0 | 0.0 |
| | | | RAM | 120152 | 120152 | 0 | 0.0 |
| | bridge-app | tl7218x | FLASH | 731392 | 731392 | 0 | 0.0 |
| | | | RAM | 95864 | 95864 | 0 | 0.0 |
| | light-app-ota-compress-lzma-shell-factory-data | tl3218x | FLASH | 851748 | 851748 | 0 | 0.0 |
| | | | RAM | 44332 | 44332 | 0 | 0.0 |
| | | tl7218x | FLASH | 843076 | 843076 | 0 | 0.0 |
| | | | RAM | 99656 | 99656 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-factory-data | tl7218x_retention | FLASH | 731882 | 731882 | 0 | 0.0 |
| | | | RAM | 55980 | 55980 | 0 | 0.0 |
| | light-switch-app-ota-compress-lzma-shell-factory-data | tlsr9528a | FLASH | 795108 | 795108 | 0 | 0.0 |
| | | | RAM | 75164 | 75164 | 0 | 0.0 |
| | light-switch-app-ota-factory-data | tl3218x_retention | FLASH | 731812 | 731812 | 0 | 0.0 |
| | | | RAM | 33468 | 33468 | 0 | 0.0 |
| | lighting-app-ota-factory-data | tlsr9118bdk40d | FLASH | 614528 | 614528 | 0 | 0.0 |
| | | | RAM | 118496 | 118496 | 0 | 0.0 |
| | lighting-app-ota-rpc-factory-data-4mb | tlsr9518adk80d | FLASH | 841190 | 841194 | 4 | 0.0 |
| | | | RAM | 97364 | 97364 | 0 | 0.0 |

@woody-apple woody-apple force-pushed the dev/woody/176263876 branch 2 times, most recently from 87cf9ee to 0078837 Compare June 1, 2026 00:38
…en't dropped

User-visible failure: on Darwin, every reconnect to a Matter node after
ChipDnssdResolveNoLongerNeeded shows ~1s of extra NodeID-resolve latency.
Inbound mDNS resolve answers already queued on the dnssd socket get discarded,
so the next resolve has to start from scratch.

Root cause: when the consumer counter drops to zero we immediately call
Finalize -> DNSServiceRefDeallocate. Per the dnssd contract,
DNSServiceRefDeallocate discards any events queued on that connection but not
yet read. A second observation from the mDNS owner is that "starting and
stopping queries doesn't query harder" -- a tight cancel-then-restart for the
same instance name is strictly worse than letting the existing query run.

Fix: introduce a per-ResolveContext deferred-teardown window (default 500ms)
before the actual DNSServiceRefDeallocate. Inside the window: a queued read
indicator dispatches the result through DispatchSuccess (which cancels the
timer); a new ChipDnssdResolve for the same instance name reuses the existing
context and bumps the counter back to 1, skipping DNSServiceCreateConnection /
DNSServiceResolve entirely; otherwise the timer fires
OnResolveDeferredTeardown -> Finalize(CHIP_ERROR_CANCELLED), preserving the
existing failure-path contract upper layers rely on.
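
The window logic described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the PR's code: `ToyResolver`, `ToyResolveContext`, `Tick`, and `QueriesStarted` are hypothetical names, time is passed in explicitly instead of using a real timer, and no dnssd API is called. It covers the reuse path and the timer-expiry path; delivery of an in-flight result through DispatchSuccess is not modeled. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <optional>
#include <string>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a ResolveContext; not the real Darwin type.
struct ToyResolveContext
{
    int consumers = 0;
    std::optional<Millis> teardownDeadline; // engaged while teardown is pending
};

class ToyResolver
{
public:
    // 500ms mirrors the default window named in the commit message.
    explicit ToyResolver(Millis window = Millis(500)) : mWindow(window) {}

    // Returns true when a new underlying query had to be started, false when
    // an existing context (possibly pending teardown) was reused.
    bool Resolve(const std::string & instance)
    {
        auto it = mContexts.find(instance);
        if (it != mContexts.end())
        {
            it->second.teardownDeadline.reset(); // cancel the pending teardown
            it->second.consumers++;
            return false;
        }
        ToyResolveContext ctx;
        ctx.consumers = 1;
        mContexts.emplace(instance, ctx);
        mQueriesStarted++;
        return true;
    }

    // Drops one consumer; at zero, arms the timer instead of tearing down.
    void ResolveNoLongerNeeded(const std::string & instance, Millis now)
    {
        auto it = mContexts.find(instance);
        if (it == mContexts.end() || it->second.consumers == 0)
            return;
        if (--it->second.consumers == 0)
            it->second.teardownDeadline = now + mWindow;
    }

    // Fires every expired timer; returns how many contexts were torn down
    // (the stand-in for Finalize(CHIP_ERROR_CANCELLED)).
    int Tick(Millis now)
    {
        int cancelled = 0;
        for (auto it = mContexts.begin(); it != mContexts.end();)
        {
            const auto & deadline = it->second.teardownDeadline;
            if (deadline.has_value() && *deadline <= now)
            {
                it = mContexts.erase(it);
                cancelled++;
            }
            else
            {
                ++it;
            }
        }
        return cancelled;
    }

    int QueriesStarted() const { return mQueriesStarted; }

private:
    Millis mWindow;
    std::map<std::string, ToyResolveContext> mContexts;
    int mQueriesStarted = 0;
};
```

In this model, a `Resolve` that lands inside the window returns false and leaves `QueriesStarted()` unchanged, which corresponds to skipping DNSServiceCreateConnection / DNSServiceResolve. A release that is never followed up is torn down by `Tick` once the deadline passes.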

Carve-out: delegate-based ResolveContexts (callback == nullptr, used by
MTRCommissionableBrowser) are NOT subject to deferred teardown. The browser
churns OnBrowseAdd/OnBrowseRemove for the same instance name on the order of
microseconds while a device is being discovered; holding the underlying
DNSServiceRef alive across that churn starves DNSServiceGetAddrInfo of a
chance to deliver before the next remove arrives, which manifested as
MTRCommissionableBrowserTests/test005 timing out under TSAN. The NodeID
reconnect bug this PR fixes is on the callback-based path.
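
The carve-out amounts to a branch on whether the context has a callback. The sketch below is an illustration only: `ToyContext`, `ToyResolveCallback`, `TeardownMode`, and `OnConsumerReleased` are hypothetical names, and the real ResolveContext carries much more state than this.

```cpp
#include <cassert>

// Hypothetical callback type; the real resolve callback has a different signature.
using ToyResolveCallback = void (*)(void * context);

// Hypothetical stand-in for a ResolveContext.
struct ToyContext
{
    ToyResolveCallback callback = nullptr; // nullptr means delegate-based
    int consumers               = 0;
};

enum class TeardownMode
{
    kKeepAlive, // other consumers still hold the context
    kImmediate, // delegate-based: tear down synchronously, as before
    kDeferred   // callback-based: arm the deferred-teardown timer
};

// Releases one consumer and reports what should happen to the context.
TeardownMode OnConsumerReleased(ToyContext & ctx)
{
    if (ctx.consumers > 0)
    {
        ctx.consumers--;
    }
    if (ctx.consumers > 0)
    {
        return TeardownMode::kKeepAlive;
    }
    return ctx.callback == nullptr ? TeardownMode::kImmediate : TeardownMode::kDeferred;
}
```

A delegate-based context with one consumer yields `kImmediate` on release, while a callback-based context yields `kDeferred` only when its last consumer goes away.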

Blast radius is confined to the Darwin dnssd platform layer.

Tests in src/platform/tests/TestDnssd.cpp pin:
- ReusesContextWithinDeferredWindow (callback-based coalescing)
- DelegateBasedResolveIsNotDeferred (delegate-based synchronous teardown)
- CancelStillPropagatesIfNoInFlightResult (timer fires once if no follow-up)
- Multi-sibling rescue, scope mismatch refusal, mismatched-callback rebind,
  shared-counter invariants, repeated toggle within window, etc.

rdar://176263876
@woody-apple woody-apple force-pushed the dev/woody/176263876 branch from 0078837 to 6508f97 Compare June 1, 2026 08:43

Projects

Status: Todo

2 participants