fakeip: persist metadata on every save interval, not just on Close by arthur109 · Pull Request #4140 · SagerNet/sing-box

arthur109 · 2026-05-14T21:42:47Z

Summary

(*CacheFile).FakeIPSaveMetadataAsync has two bugs that compound so that the on-disk fakeip allocation counter advances only when Close() runs. On mobile, where the Android BoxService is often killed (OOM, am force-stop, or a phone reboot) before a clean teardown, the counter on disk falls behind the allocations actually made. The next start loads the stale counter, and the allocator in Store.Create() silently overwrites existing reverse-map entries because it doesn't check for collisions before storing.

The two bugs

// experimental/cachefile/fakeip.go (before)
func (c *CacheFile) FakeIPSaveMetadataAsync(metadata *adapter.FakeIPMetadata) {
    if c.saveMetadataTimer == nil {
        c.saveMetadataTimer = time.AfterFunc(C.FakeIPMetadataSaveInterval, func() {
            _ = c.FakeIPSaveMetadata(metadata)   // captures FIRST metadata
        })
    } else {
        c.saveMetadataTimer.Reset(C.FakeIPMetadataSaveInterval)   // never updates the closure
    }
}

Bug A: the timer never fires under load. Create() calls this on every allocation. Every call after the first goes through .Reset(), which moves the deadline to 10 s after that call. Under any active workload (continuously resolving new domains) the deadline keeps moving and the callback never runs.

Bug B: even if it fires, it persists the wrong data. The first call's metadata pointer is captured by the closure. Subsequent calls construct fresh FakeIPMetadata{...} values and pass them in, but only the timer is reset; the closure is never replaced. A delayed fire would write the very first snapshot of the session.

Together, the only code path that ever writes correct metadata is Store.Close() → CacheFile.FakeIPSaveMetadata (synchronous, with current values).
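The interaction of the two bugs can be reproduced outside sing-box. The following is a minimal standalone sketch, not the project's code: debouncedSaver, saveAsync, and runDebounceDemo are hypothetical names, an int stands in for *adapter.FakeIPMetadata, a 100 ms interval stands in for the 10 s constant, and a mutex is added only so that the demo itself is race-free.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// debouncedSaver (hypothetical) mirrors the shape of the pre-patch code:
// the first call creates the timer and captures its argument, and every
// later call only resets the timer.
type debouncedSaver struct {
	mu       sync.Mutex
	interval time.Duration
	timer    *time.Timer
	saved    []int // values the timer callback "persisted"
}

func (d *debouncedSaver) saveAsync(counter int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer == nil {
		d.timer = time.AfterFunc(d.interval, func() {
			d.mu.Lock()
			defer d.mu.Unlock()
			// counter is the value from the FIRST call (Bug B).
			d.saved = append(d.saved, counter)
		})
	} else {
		// Moves the deadline; the closure is left unchanged (Bug A).
		d.timer.Reset(d.interval)
	}
}

func (d *debouncedSaver) snapshot() []int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return append([]int(nil), d.saved...)
}

// runDebounceDemo issues 30 calls 10 ms apart (longer than one interval),
// then stays idle long enough for the timer to finally fire.
func runDebounceDemo() (duringLoad, afterLoad []int) {
	d := &debouncedSaver{interval: 100 * time.Millisecond}
	for counter := 1; counter <= 30; counter++ {
		d.saveAsync(counter)
		time.Sleep(10 * time.Millisecond)
	}
	duringLoad = d.snapshot()
	time.Sleep(300 * time.Millisecond)
	afterLoad = d.snapshot()
	return duringLoad, afterLoad
}

func main() {
	during, after := runDebounceDemo()
	fmt.Println("saved during load:", during)
	fmt.Println("saved after load: ", after)
}
```

Under these assumptions nothing is saved while calls keep arriving, and once the load stops the single save carries the counter from the first call rather than the last.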

Why it matters

Store.Create() does not check whether the proposed next IP is already in fakeip_address:

nextAddress := s.inet4Current.Next()
// ... range / wrap check ...
s.inet4Current = nextAddress
err := s.storage.FakeIPStore(address, domain)   // overwrites silently

So when Start() restores inet4Current from stale metadata, roughly the first N allocations of the new session (N = actual_bucket_max - stale_counter) overwrite the reverse-map entries for IPs [stale_counter+1, actual_bucket_max]. The forward map (fakeip_domain*) still points to those IPs for the old domains, so any app that cached the prior DNS answer keeps connecting to fake IPs whose reverse-map entry now resolves to a different domain. The router dials the wrong outbound → the TLS handshake fails with a cert mismatch → the affected hosts break, while everything else (newly allocated or unaffected) keeps working.

Reproduced this in production on Android with Instagram: the file's fakeip_metadata had Inet4Current = 198.18.0.6 while fakeip_address held entries up to 198.18.0.40. After restart, ~34 allocations clobbered existing entries before the counter caught up. Profile pictures and chats kept working (those endpoints didn't get clobbered); reels, stories, posts, profile pages failed (their endpoints got reverse-map rewritten).
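The overwrite count follows directly from the two numbers above. Here is a small simulation as a sketch under stated assumptions: addresses are reduced to the last octet of 198.18.0.x, the reverse map is a plain Go map rather than the cache file's bucket, the starting address .2 is taken from the commit message, and seedReverseMap and countOverwrites are hypothetical helpers.

```go
package main

import "fmt"

// seedReverseMap fills a reverse map (last octet -> domain) for the
// inclusive range [first, last], standing in for a previous session.
func seedReverseMap(first, last int) map[int]string {
	reverse := make(map[int]string)
	for octet := first; octet <= last; octet++ {
		reverse[octet] = fmt.Sprintf("old-%d.example", octet)
	}
	return reverse
}

// countOverwrites replays allocations starting from a stale counter and
// counts how many land on an address that already has an entry. The store
// is unconditional, like the FakeIPStore call described above.
func countOverwrites(reverse map[int]string, staleCounter, allocations int) int {
	overwrites := 0
	current := staleCounter
	for i := 0; i < allocations; i++ {
		current++ // stands in for inet4Current.Next()
		if _, exists := reverse[current]; exists {
			overwrites++
		}
		reverse[current] = fmt.Sprintf("new-%d.example", i)
	}
	return overwrites
}

func main() {
	reverse := seedReverseMap(2, 40) // entries up to 198.18.0.40
	// Counter restored as 198.18.0.6, followed by 50 new allocations.
	fmt.Println(countOverwrites(reverse, 6, 50)) // prints 34
}
```

Allocations .7 through .40 collide with existing entries (40 - 6 = 34); from .41 onward the addresses are fresh.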

Fix

Track the latest metadata in a mutex-protected field on CacheFile, let the timer fire on its own schedule (no Reset), and on fire snapshot the latest pointer and clear the timer so the next allocation reschedules.

// after
func (c *CacheFile) FakeIPSaveMetadataAsync(metadata *adapter.FakeIPMetadata) {
    c.saveMetadataAccess.Lock()
    c.latestFakeIPMetadata = metadata
    if c.saveMetadataTimer == nil {
        c.saveMetadataTimer = time.AfterFunc(C.FakeIPMetadataSaveInterval, func() {
            c.saveMetadataAccess.Lock()
            m := c.latestFakeIPMetadata
            c.saveMetadataTimer = nil
            c.saveMetadataAccess.Unlock()
            if m != nil {
                _ = c.FakeIPSaveMetadata(m)
            }
        })
    }
    c.saveMetadataAccess.Unlock()
}

Two new fields on CacheFile: saveMetadataAccess sync.Mutex and latestFakeIPMetadata *adapter.FakeIPMetadata. Total: 12 added lines, 3 removed.
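The same pattern can be exercised in isolation. This is a sketch of the idea only, not the patched CacheFile: intervalSaver, saveAsync, and runIntervalDemo are hypothetical names, an int replaces the metadata struct, the interval is 100 ms, and the "save" is an append performed under the lock, whereas the patch calls FakeIPSaveMetadata after unlocking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// intervalSaver (hypothetical) stands in for the new CacheFile fields.
type intervalSaver struct {
	mu       sync.Mutex
	interval time.Duration
	latest   *int        // most recent value handed to saveAsync
	timer    *time.Timer // nil when no save is scheduled
	saved    []int       // values "persisted" by the timer callback
}

func (s *intervalSaver) saveAsync(counter int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.latest = &counter
	if s.timer != nil {
		return // a save is already scheduled; it will pick up s.latest
	}
	s.timer = time.AfterFunc(s.interval, func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		latest := s.latest
		s.timer = nil // the next call schedules a fresh interval
		if latest != nil {
			s.saved = append(s.saved, *latest)
		}
	})
}

// runIntervalDemo issues 30 calls 10 ms apart, waits for the last
// scheduled save, and returns every value that was saved.
func runIntervalDemo() []int {
	s := &intervalSaver{interval: 100 * time.Millisecond}
	for counter := 1; counter <= 30; counter++ {
		s.saveAsync(counter)
		time.Sleep(10 * time.Millisecond)
	}
	time.Sleep(300 * time.Millisecond)
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]int(nil), s.saved...)
}

func main() {
	fmt.Println("saved values:", runIntervalDemo())
}
```

Because the timer is never reset, saves happen while calls are still arriving, each one carries the newest value seen so far, and the final save carries the last counter (30 in this run). The exact intermediate values depend on scheduling.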

Behaviour after the patch

All allocations within a FakeIPMetadataSaveInterval window are covered by a single save of the most recent metadata. The timer performs at most one save per interval; the first allocation after a save schedules a fresh interval.
Metadata on disk now lags the in-memory state by at most FakeIPMetadataSaveInterval (10 s by default) under continuous load, instead of by an unbounded amount.
Close() semantics are unchanged.
No new goroutines, no busy-looping, no behavioural regression for users on platforms with reliable clean shutdown.

Verification

Tested end-to-end on a Samsung Android device running an embedded libbox:

| Scenario | Counter on disk | Bucket max | Result |
| --- | --- | --- | --- |
| Pre-patch, after `am force-stop` cycles | `198.18.0.6` | `198.18.0.40` | next start: ~34 silent overwrites → IG breaks |
| Post-patch, idle session | `198.18.0.12` | `198.18.0.12` | matches; next allocation lands at `.13` |
| Post-patch, after `am force-stop` | `198.18.0.12` | `198.18.0.12` | survives unclean shutdown |
| Post-patch, after real phone reboot + new IG session | `198.18.0.33` | `198.18.0.33` | survives reboot; IG loads reels/stories/profiles |

Test plan

Compiles (go build ./experimental/cachefile/)
Reproduced bug pre-patch on Android (Instagram broke after phone reboot)
Post-patch: metadata advances within ~10 s of allocations
Post-patch: metadata survives am force-stop
Post-patch: metadata survives full phone reboot
Post-patch: new allocations after restart land at bucket_max + 1, no collisions

🤖 Generated with Claude Code


FakeIPSaveMetadataAsync had two bugs that compound:

1. The 10s timer is debounced: every call invokes .Reset(), pushing the deadline another 10s. Under any active workload (every new domain triggers an allocation, which triggers a save call) the timer never fires; the only path that ever writes metadata is Close().
2. Even if the timer fired, the closure captures the metadata pointer from the FIRST call. Subsequent calls only Reset() the timer; the closure is never updated. A delayed fire would persist a stale snapshot from the start of the session.

Combined, the on-disk fakeip counter only advances on clean shutdown. On mobile that almost never happens (process kill, OOM, phone reboot), so the counter stays at whatever the last clean Close() wrote while the buckets accumulate well past it.

Because Store.Create() doesn't check whether the next IP is already allocated (it just calls FakeIPStore, which silently overwrites), the next start loads the stale counter and silently clobbers the reverse-map entries between (counter, actual_max]. Forward-map (fakeip_domain*) still points to those IPs for the old domains, so any app that cached the previous DNS answer (real-world: Instagram, hours of TTL) hits a fake IP that now reverse-maps to a different domain → router dials the wrong outbound → TLS cert mismatch → those hosts break while everything else looks fine.

Fix: track the latest metadata in a mutex-protected field on CacheFile, let the timer fire on its own schedule (no Reset), and on fire snapshot the latest pointer and clear the timer so the next allocation reschedules. Metadata now tracks reality within one FakeIPMetadataSaveInterval (10s) of any allocation activity.

Verified by reproducing on Android: pre-patch, counter on disk stuck at .6 while buckets held .2–.40. Post-patch, counter advances within ~10s of new allocations and survives force-kill / restart cycles with the saved value matching the bucket max, so the next allocation picks an unused IP instead of overwriting an existing one.

nekohasekai added 30 commits April 28, 2026 08:04

Add MAC and hostname rule items

d3575cc

Add Android support for MAC and hostname rule items

c57e864

Add macOS support for MAC and hostname rule items

1c02d7e

documentation: Update descriptions for neighbor rules

4f6d0ff

Refactor ACME support to certificate provider

83fa58f

Add BBR profile and hop interval randomization for Hysteria2

0155352

platform: Add OOM Report & Crash Report

26ddb92

Also enable certificate store by default on Apple platforms

ff94634

`SecTrustEvaluateWithError` is serial

Add evaluate DNS rule action and related rule items

e1a7ab3

platform: Fix set local

9e8f13c

Fix deprecated warning double-formatting on localized clients

0f6d110

oom-killer: Free memory on pressure notification and use gradual interval backoff

5ee373c

tools: Network Quality & STUN

0b8f380

platform: Fix darwin signal handler

ccc2742

tools: Tailscale status

ce6d683

Revert "Also enable certificate store by default on Apple platforms"

a7b02f9

This reverts commit 62cb06c.

Fix rules lock

ec75e5e

Fix darwin local DNS transport

eade677

tools: Tailscale status

524578a

Un-deprecate ip_accept_any DNS rule item

e75e1c9

documentation: Fixes

e714efa

Add package_name_regex route, DNS and headless rule item

684ba79

platform: Wrap command RPC error returns with E.Cause

012db0e

Fix lint errors

5779b46

Add cloudflared inbound

08260fa

documentation: Fix missing update for ip_version and query_type

3828b4f

Fix stun test

104c7ae

Fix darwin cgo DNS again

8fb0191

Fix tailscale error

e6f5c24

Add optimistic DNS cache

0319b22

macronut and others added 22 commits May 2, 2026 23:07

Add more spoof method

48c65a2

Signed-off-by: macronut <4027187+macronut@users.noreply.github.com>

Bump version

9ee56ae

release: Add replace_macos_standalone make target

0ee7592

Fix cronet close and crash

31252a7

dns: Fix deadline

228eb2d

Bump version

4807ee9

Fix reset network

3e59917

Fix missing deadline for naive

e0c137e

Skip kickWriteHandshake for server first protocols

18f1056

Add hysteria2 realm service and support

34c90e6

Update hysteria2 realm

2e1a7a5

dns: Fix conn pool leak

7b3a1de

cronet: Fix start cleanup

f059203

dns: Refactor reordered pool

8875b52

Bump version

056c45c

Fix naive inbound close

6af341a

Fix TLS server close

96edb9a

Fix tailscale crash at start

f201569

realm: Add stun retry and lazy server start

cad8997

Fix hysteria2 realm server

44033f9

Bump version

b4d2d89

nekohasekai force-pushed the testing branch 8 times, most recently from abac453 to bf9ea6d Compare May 21, 2026 07:34

arthur109 wants to merge 93 commits into SagerNet:testing from arthur109:fakeip-metadata-async-save
