fakeip: persist metadata on every save interval, not just on Close #4140
Open
arthur109 wants to merge 93 commits into
`SecTrustEvaluateWithError` is serial
This reverts commit 62cb06c.
Signed-off-by: macronut <4027187+macronut@users.noreply.github.com>
FakeIPSaveMetadataAsync had two bugs that compound:

1. The 10s timer is debounced: every call invokes .Reset(), pushing the deadline another 10s. Under any active workload (every new domain triggers an allocation, which triggers a save call) the timer never fires, so the only path that ever writes metadata is Close().
2. Even if the timer fired, the closure captures the metadata pointer from the FIRST call. Subsequent calls only Reset() the timer; the closure is never updated. A delayed fire would persist a stale snapshot from the start of the session.

Combined, the on-disk fakeip counter only advances on clean shutdown. On mobile that almost never happens (process kill, OOM, phone reboot), so the counter stays at whatever the last clean Close() wrote while the buckets accumulate well past it.

Because Store.Create() doesn't check whether the next IP is already allocated (it just calls FakeIPStore, which silently overwrites), the next start loads the stale counter and silently clobbers the reverse-map entries between (counter, actual_max]. Forward-map (fakeip_domain*) still points to those IPs for the old domains, so any app that cached the previous DNS answer (real-world: Instagram, hours of TTL) hits a fake IP that now reverse-maps to a different domain → router dials the wrong outbound → TLS cert mismatch → those hosts break while everything else looks fine.

Fix: track the latest metadata in a mutex-protected field on CacheFile, let the timer fire on its own schedule (no Reset), and on fire snapshot the latest pointer and clear the timer so the next allocation reschedules. Metadata now tracks reality within one FakeIPMetadataSaveInterval (10s) of any allocation activity.

Verified by reproducing on Android: pre-patch, counter on disk stuck at .6 while buckets held .2–.40. Post-patch, counter advances within ~10s of new allocations and survives force-kill / restart cycles with the saved value matching the bucket max, so the next allocation picks an unused IP instead of overwriting an existing one.
abac453 to
bf9ea6d
Compare
Summary
`(*CacheFile).FakeIPSaveMetadataAsync` has two bugs that compound to make the on-disk fakeip allocation counter advance only when `Close()` runs. On mobile (Android `BoxService` killed by OOM / `am force-stop` / phone reboot before clean teardown) the counter on disk falls behind reality. The next start loads the stale counter and the allocator in `Store.Create()` silently overwrites existing reverse-map entries because it doesn't check for collisions before storing.

The two bugs

Bug A: the timer never fires under load. `Create()` calls `FakeIPSaveMetadataAsync` on every allocation. Every call after the first goes through `.Reset()`, pushing the deadline another 10 s. Any active workload (continuously resolving new domains) keeps postponing the timer, so it never fires.

Bug B: even if it fires, it persists the wrong data. The first call's `metadata` pointer is captured by the closure. Subsequent calls construct fresh `FakeIPMetadata{...}` values and pass them in, but only the timer is reset; the closure is never replaced. A delayed fire would write the very first snapshot of the session.

Together, the only code path that ever writes correct metadata is `Store.Close() → CacheFile.FakeIPSaveMetadata` (synchronous, with current values).

Why it matters

`Store.Create()` does not check whether the proposed next IP is already in `fakeip_address`.

So when `Start()` restores `inet4Current` from stale metadata, the first ~N allocations of the new session overwrite the reverse-map entries for IPs `[stale_counter+1, actual_bucket_max]`. The forward map (`fakeip_domain*`) still points to those IPs for the old domains, so any app that cached the prior DNS answer keeps connecting to fake IPs whose reverse-map now resolves to a different domain. The router dials the wrong outbound → TLS handshake fails with a cert mismatch → the affected hosts break, while everything else (newly-allocated or unaffected) keeps working.

Reproduced this in production on Android with Instagram: the file's `fakeip_metadata` had `Inet4Current = 198.18.0.6` while `fakeip_address` held entries up to `198.18.0.40`. After restart, ~34 allocations clobbered existing entries before the counter caught up. Profile pictures and chats kept working (those endpoints didn't get clobbered); reels, stories, posts, and profile pages failed (their endpoints had their reverse-map entries rewritten).

Fix
Track the latest metadata in a mutex-protected field on `CacheFile`, let the timer fire on its own schedule (no `Reset`), and on fire snapshot the latest pointer and clear the timer so the next allocation reschedules.

Two new fields on `CacheFile`: `saveMetadataAccess sync.Mutex` and `latestFakeIPMetadata *adapter.FakeIPMetadata`. Total: 12 added lines, 3 removed.

Behaviour after the patch
- Allocations within one `FakeIPMetadataSaveInterval` window all see their latest metadata captured. The timer triggers at most one save per interval; subsequent allocations reschedule a fresh interval.
- On-disk staleness is bounded by `FakeIPMetadataSaveInterval` (10 s by default) under continuous load, instead of being unbounded.
- `Close()` semantics are unchanged.

Verification
Tested end-to-end on a Samsung Android device running an embedded libbox:
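[Results table flattened in extraction; row and column labels are partly lost. Surviving cells, in source order: … `am force-stop` cycles | 198.18.0.6 | 198.18.0.40 | 198.18.0.12 | 198.18.0.12 | `.13` | `am force-stop` | 198.18.0.12 | 198.18.0.12 | 198.18.0.33 | 198.18.0.33]

Test plan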
- … (`go build ./experimental/cachefile/`)
- … `am force-stop` …
- … `bucket_max + 1`, no collisions

🤖 Generated with Claude Code