refactor: LND SubscribeState EOF-based shutdown and optimize Neutrino peer selection #3964

Open
ajaysehwal wants to merge 13 commits into ZeusLN:master from ajaysehwal:refactor-lnd-process

Contributor

@ajaysehwal ajaysehwal commented Apr 10, 2026

Description

Please enter a description and screenshots, if appropriate, of the work covered in this PR

This PR makes embedded LND startup and shutdown event-driven instead of poll-and-sleep. After stopLnd, we wait for the SubscribeState gRPC stream to end (EOF), so we know the daemon is fully down before starting again. Native code on iOS and Android starts SubscribeState as soon as LND reports it has started, before the JS startLnd promise resolves, so the first state update is not lost.
It also tightens RPC error parsing on both platforms, hardens sync/recovery against transient RPC errors, and speeds up Neutrino peer selection with batched pings and single-pass tiered filtering.

What changed

  1. Shutdown: SubscribeState EOF instead of status polling and retrying
    JS stopLnd registers a listener for SubscribeState, calls stop/kill, then awaits EOF (stream closed) as confirmation that LND has stopped.
    Removes the previous retry loop around checkStatus() and related fixed post-stop sleeps on Android/iOS.
  2. Startup: native SubscribeState before JS promise resolves
    Android (LndMobileService): after Lndmobile.start succeeds, invoke SubscribeState via streamMethods and LndStateStreamCallback, then send MSG_START_LND_RESULT. On stream error/close, remove SubscribeState from streamsStarted so a later start can subscribe again.
    iOS (Lnd / LndMobile): subscribeToStateChanges tracks activeStreams so duplicate subscriptions are skipped; cleanup on error/EOF matches Android behavior.
    JS (LndMobileUtils): register the SubscribeState listener before calling startLnd; rely on native stream start and drop the extra subscribeState() + settle sleeps from the old path.
  3. “Already running” recovery
    On LND_ALREADY_RUNNING, stopLnd and wait for EOF, then retry start without the long wallet-recovery polling loop and platform-specific cleanup delays that existed before.
  4. RPC errors and sync robustness
    Android / iOS: central regex parsing for gRPC-style messages (code = … desc = …) into error_code / error_desc bundles/events (aligned across platforms).
    LndMobile.swift: use self.lndGrpcErrorCodeAndDesc inside escaping closures so the project builds with strict capture rules.
    LndMobileErrors: treat WALLET_LOCKED as transient where appropriate; tests extended for RPC_NOT_READY / wallet-locked style messages.
    SyncStore / NodeInfoStore: handle RPC_NOT_READY and RPC_CONNECTION_CLOSED, and avoid unhandled promise rejections during sync/recovery and reactive getNodeInfo.
  5. Neutrino peer selection
    Ping candidates in batches (concurrency limited to 3).
    Single-pass tiered selection (optimal → lax → threshold) with deduplication via a set, preserving the same latency thresholds and target peer count.
  6. Android: SubscribeState registration guard
    Only add SubscribeState to streamsStarted when the reflected Method is non-null and we are about to invoke, so a missing registration does not block a later JS streamOnlyOnce subscription.
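
To make item 5 concrete, here is a minimal sketch of single-pass tiered selection with set-based deduplication. It is not the code from this PR: the class PeerSelector, the PingResult type, and the convention that a negative latency means a failed ping are all hypothetical, and the real thresholds and target count are not stated in this thread, so they are passed in as parameters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of tiered peer selection; not the PR's implementation. */
final class PeerSelector {
    /** One ping outcome. A negative latency marks a failed ping (assumption). */
    static final class PingResult {
        final String host;
        final long latencyMs;
        PingResult(String host, long latencyMs) {
            this.host = host;
            this.latencyMs = latencyMs;
        }
    }

    /**
     * Buckets every result into a tier in one pass, then fills the result
     * from the best tier down until the target count is reached.
     */
    static List<String> select(List<PingResult> results, long optimalMs,
                               long laxMs, long thresholdMs, int target) {
        List<String> optimal = new ArrayList<>();
        List<String> lax = new ArrayList<>();
        List<String> threshold = new ArrayList<>();
        Set<String> seen = new HashSet<>();

        for (PingResult r : results) {
            // Skip failed pings and hosts that were already bucketed.
            if (r.latencyMs < 0 || !seen.add(r.host)) continue;
            if (r.latencyMs <= optimalMs) optimal.add(r.host);
            else if (r.latencyMs <= laxMs) lax.add(r.host);
            else if (r.latencyMs <= thresholdMs) threshold.add(r.host);
            // Anything slower than thresholdMs is dropped.
        }

        List<List<String>> tiers = new ArrayList<>();
        tiers.add(optimal);
        tiers.add(lax);
        tiers.add(threshold);

        List<String> selected = new ArrayList<>();
        for (List<String> tier : tiers) {
            for (String host : tier) {
                if (selected.size() == target) return selected;
                selected.add(host);
            }
        }
        return selected;
    }
}
```

Bucketing first and draining tiers afterwards keeps the preference order (optimal, then lax, then threshold) without re-scanning the ping results once per tier.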

How to test

  • Cold open wallet: LND starts, unlock/sync behave as before; no missing first SubscribeState state.
  • Switch wallet or restart LND: no stuck “already running”; shutdown completes before next start.
  • Flaky RPC: sync/recovery back off or stop cleanly without unhandled rejections.
  • Settings or flows that optimize Neutrino peers: completes in reasonable time; peer list still respects thresholds.
  • iOS: clean build (Swift closure / self rules).
  • Android: log shows no spurious SubscribeState stuck in streamsStarted if reflection were ever missing (should not happen in normal builds).

This pull request is categorized as a:

  • New feature
  • Bug fix
  • Code refactor
  • Configuration change
  • Locales update
  • Quality assurance
  • Other

Checklist

  • I’ve run yarn run tsc and made sure my code compiles correctly
  • I’ve run yarn run lint and made sure my code didn’t contain any problematic patterns
  • I’ve run yarn run prettier and made sure my code is formatted correctly
  • I’ve run yarn run test and made sure all of the tests pass

Testing

If you modified or added a utility file, did you add new unit tests?

  • No, I’m a fool
  • Yes
  • N/A

I have tested this PR on the following platforms (please specify OS version and phone model/VM):

  • Android
  • iOS

I have tested this PR with the following types of nodes (please specify node version and API version where appropriate):

On-device

  • LDK Node
  • Embedded LND

Remote

  • LND (REST)
  • LND (Lightning Node Connect)
  • Core Lightning (CLNRest)
  • Nostr Wallet Connect
  • LndHub

Locales

  • I’ve added new locale text that requires translations
  • I’m aware that new translations should be made on the ZEUS Transfix page and not directly to this repo

Third Party Dependencies and Packages

  • Contributors will need to run yarn after this PR is merged in
  • 3rd party dependencies have been modified:
    • verify that package.json and yarn.lock have been properly updated
    • verify that dependencies are installed for both iOS and Android platforms

Other:

  • Changes were made that require an update to the README
  • Changes were made that require an update to onboarding

@ajaysehwal ajaysehwal marked this pull request as draft April 10, 2026 08:38
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request transitions the LND lifecycle management to an event-driven approach by utilizing the SubscribeState gRPC stream for both Android and iOS. Key changes include replacing polling-based shutdown detection with EOF signals from the state stream, registering state listeners before starting LND to prevent missing initial events, and refactoring the Neutrino peer optimization logic. Additionally, several asynchronous calls across the stores and views were updated with proper error handling to avoid unhandled promise rejections. Feedback was provided regarding the robustness of error message parsing in both Java and Swift, suggesting the use of regular expressions over manual string manipulation to handle potential format changes.
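
For reference, the regex approach suggested above might look like the following sketch. It assumes messages follow the usual gRPC-Go shape "rpc error: code = X desc = Y"; the class name GrpcErrorParser and the Parsed holder are hypothetical and are not taken from LndMobileService.java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a gRPC-style message into code and description. */
final class GrpcErrorParser {
    // Matches "code = <Code> desc = <text>"; DOTALL lets the description span lines.
    private static final Pattern GRPC_ERROR =
        Pattern.compile("code\\s*=\\s*(\\w+)\\s+desc\\s*=\\s*(.*)", Pattern.DOTALL);

    /** Simple value holder for the two extracted fields. */
    static final class Parsed {
        final String code;
        final String desc;
        Parsed(String code, String desc) {
            this.code = code;
            this.desc = desc;
        }
    }

    /** Returns empty when the message is null or not in the expected format. */
    static Optional<Parsed> parse(String message) {
        if (message == null) return Optional.empty();
        Matcher m = GRPC_ERROR.matcher(message);
        if (!m.find()) return Optional.empty();
        return Optional.of(new Parsed(m.group(1), m.group(2).trim()));
    }

    public static void main(String[] args) {
        Optional<Parsed> p = parse("rpc error: code = Unknown desc = wallet locked");
        System.out.println(p.get().code + " / " + p.get().desc);
    }
}
```

Returning an empty Optional for unrecognized input leaves the caller free to fall back to the raw message instead of failing on a format change.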

Comment thread android/app/src/main/java/com/zeus/LndMobileService.java Outdated
Comment thread ios/LndMobile/LndMobile.swift Outdated
@ajaysehwal ajaysehwal changed the title refactor: LND SubscribeState EOF-based shutdown and optimized Neutrino peer selection refactor: LND SubscribeState EOF-based shutdown and optimize Neutrino peer selection Apr 10, 2026
@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch from a14f54a to 00f71e8 Compare April 13, 2026 15:42
@ajaysehwal
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly improves the robustness of LND lifecycle management by transitioning from polling-based mechanisms to event-driven ones. Key changes include the native-side automatic initiation of the SubscribeState stream on both Android and iOS, ensuring that state events are captured immediately upon startup. The stopLnd function has been refactored to wait for a gRPC EOF signal rather than polling status, and error handling has been unified with regex-based parsing of gRPC status messages. Additionally, the PR optimizes Neutrino peer selection through concurrent pings and adds comprehensive promise rejection handling across the TypeScript stores and views. I have no feedback to provide as the implementation follows best practices for asynchronous coordination and native-to-JS communication.
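
The EOF-based wait described in this review can be illustrated with a small sketch. It shows only the coordination pattern (register a listener, request the stop, block until the stream closes or a timeout passes). ShutdownWaiter and StateStreamListener are hypothetical names, and the PR does this wait in JS, not in Java.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: completes a future when the state stream reports EOF. */
final class ShutdownWaiter {
    /** Minimal stand-in for a stream callback; not the LndMobile interface. */
    interface StateStreamListener {
        void onState(String state);
        void onEof();
    }

    private final CompletableFuture<Void> eof = new CompletableFuture<>();

    /** Obtain and register this listener BEFORE asking the daemon to stop. */
    StateStreamListener listener() {
        return new StateStreamListener() {
            @Override public void onState(String state) { /* ignored while stopping */ }
            @Override public void onEof() { eof.complete(null); }
        };
    }

    /** Blocks until EOF; returns false on timeout or interruption. */
    boolean awaitStopped(long timeout, TimeUnit unit) {
        try {
            eof.get(timeout, unit);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        ShutdownWaiter waiter = new ShutdownWaiter();
        StateStreamListener l = waiter.listener();
        // Simulate the daemon closing the stream after a stop request.
        new Thread(l::onEof).start();
        System.out.println(waiter.awaitStopped(5, TimeUnit.SECONDS));
    }
}
```

Registering the listener before the stop request matters: if the stream closed first, a late listener would never see the EOF and the wait would only end by timeout.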

@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch from 3581eb2 to e5ecd21 Compare April 14, 2026 09:49
@ajaysehwal ajaysehwal marked this pull request as ready for review April 14, 2026 10:10
@myxmaster
Collaborator

I tested this and wasn't able to start the embedded LND. I didn't look into the details myself, but I tried a quick AI-assisted analysis. This is the outcome; I hope it helps:

Observed regression: normal startup hangs indefinitely, restart loop

Tested on Android (Samsung Galaxy S20+, Android 13). When trying to start the embedded LND, "ZEUS is starting your node." is displayed for ~60s, then LND_START_FAILED timeout fires, restartNeeded(true) is called (restart modal displayed), and after I click the RESTART button, the cycle repeats.

From adb logcat: Go starts fine, reaches Waiting for wallet encryption password (LOCKED state), but no SubscribeState event ever arrives in JS — not even the initial LOCKED event. The 60s LND_READY_TIMEOUT fires and the loop begins.

Likely cause

In LndMobileService.java, inside the startLnd onResponse callback:

if (!streamsStarted.contains("SubscribeState")) {
    streamsStarted.add("SubscribeState");   // added before the null-check
    Method m = streamMethods.get("SubscribeState");
    if (m != null) {
        m.invoke(null, new byte[0], new LndStateStreamCallback(recipient));
    }
    // if m == null: stream is silently never started,
    // but "SubscribeState" stays locked in streamsStarted
}

If streamMethods.get("SubscribeState") returns null (no log, no exception, no fallback), the stream is never started. Because streamsStarted already contains "SubscribeState", every subsequent startLnd call skips the stream as well. JS waits forever for events that will never arrive.

Note: this probably only manifests on the normal startup path. The LND_ALREADY_RUNNING retry path should work because waitForLndReady's stateHandler survives the EOF and picks up events from the restarted LND via a different flow.

Suggested fix

Move streamsStarted.add(...) to after the m != null guard so a silent null can't poison the set:

Method m = streamMethods.get("SubscribeState");
if (m != null) {
    streamsStarted.add("SubscribeState");
    try {
        m.invoke(null, new byte[0], new LndStateStreamCallback(recipient));
    } catch (Exception e) {
        Log.e(TAG, "Failed to start native SubscribeState stream", e);
        streamsStarted.remove("SubscribeState");
    }
} else {
    Log.e(TAG, "SubscribeState method not found in streamMethods");
    // JS subscribeState() call will handle subscription as fallback
}
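
A self-contained way to exercise this ordering outside the service is sketched below. It is a simplification under stated assumptions: StreamRegistry is a hypothetical class, a Runnable stands in for the reflected Method, and the "already started" check from the original code is kept alongside the reordered add.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the service's stream bookkeeping. */
final class StreamRegistry {
    private final Map<String, Runnable> streamMethods = new HashMap<>();
    private final Set<String> streamsStarted = new HashSet<>();

    void register(String name, Runnable starter) {
        streamMethods.put(name, starter);
    }

    /** Returns true only if this call actually started the stream. */
    boolean startOnce(String name) {
        if (streamsStarted.contains(name)) return false; // already running
        Runnable starter = streamMethods.get(name);
        if (starter == null) return false;               // nothing recorded, so a retry stays possible
        streamsStarted.add(name);                        // mark only once a start is certain
        try {
            starter.run();
            return true;
        } catch (RuntimeException e) {
            streamsStarted.remove(name);                 // a failed start must not block retries
            return false;
        }
    }

    /** Call on stream error or EOF so a later start can subscribe again. */
    void onStreamClosed(String name) {
        streamsStarted.remove(name);
    }
}
```

With this shape, a missing registration leaves streamsStarted untouched, so a later subscription attempt is not skipped.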

@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch from e5ecd21 to b4c4ea1 Compare April 16, 2026 05:06
@kaloudis kaloudis added this to the v13.1.0 milestone Apr 16, 2026
@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch 15 times, most recently from 696c5e8 to 64aab0f Compare April 26, 2026 07:59
@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch from 64aab0f to 30a756f Compare May 3, 2026 10:01
@kaloudis kaloudis requested a review from shubhamkmr04 May 7, 2026 05:19
@kaloudis kaloudis requested a review from myxmaster May 7, 2026 05:19
@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch 2 times, most recently from 2a9bee6 to 55f97a9 Compare May 18, 2026 13:57
@ajaysehwal ajaysehwal force-pushed the refactor-lnd-process branch from 701e592 to 2e5237a Compare May 25, 2026 05:20
3 participants