Skip to content

Don't persist inbound committed onions in prod#4599

Open
valentinewallace wants to merge 1 commit intolightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet
Open

Don't persist inbound committed onions in prod#4599
valentinewallace wants to merge 1 commit intolightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet

Conversation

@valentinewallace
Copy link
Copy Markdown
Contributor

A few PRs ago, we started persisting inbound committed HTLC onions in Channels. These onions were persisted to lay groundwork for reconstructing the ChannelManager's pending HTLC maps from them in a future version. However, we've since discovered a different direction where we can instead reconstruct these same maps using persistent monitor events, which may mean that we don't need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment that we're not fully confident in yet, switch it off for now until we either confirm the monitor events direction and can delete all this onion persisting code OR realize that we definitely do need it.

@valentinewallace valentinewallace added this to the 0.3 milestone May 5, 2026
@valentinewallace valentinewallace self-assigned this May 5, 2026
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label May 5, 2026
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning/src/ln/channel.rs
@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

ldk-claude-review-bot commented May 5, 2026

After thorough re-examination, I now realize my prior critical bug report was incorrect. Here's why:

The variable inbound_committed_update_adds_opt is initialized to None at line 16372 in the manual ReadableArgs impl for FundedChannel. This is different from the auto-generated macro path (_init_tlv_field_var) which would initialize it to Some(Vec::new()). When TLV 75 is absent (production case after this PR), the variable stays None, and the if let Some(update_adds) guard at line 16554 correctly skips the block. Committed inbound HTLCs keep their default InboundUpdateAdd::Legacy state, which is the intended behavior.

Review Summary

Correction to prior review: The critical deserialization bug I flagged previously is a false positive. The FundedChannel ReadableArgs impl manually initializes inbound_committed_update_adds_opt to None (line 16372), not Some(Vec::new()) as the auto-generated optional_vec macro would. When TLV 75 is absent, the if let Some(...) guard at line 16554 is correctly not entered.

No new issues found. The PR correctly:

  1. Gates the write of TLV 75 data behind #[cfg(test)] so production nodes stop persisting onion data
  2. Uses #[cfg_attr(not(test), allow(unused_mut))] to suppress the unused_mut warning in production
  3. Uses _-prefixed variable name to suppress unused variable warnings in production
  4. Maintains test coverage by still writing the data in test mode
  5. Has consistent read-side behavior: absent TLV 75 means None means all committed HTLCs stay Legacy
  6. Is safe for the reconstruct_manager_from_monitors path since that's gated on version >= 2 in production

@ldk-reviews-bot ldk-reviews-bot requested a review from wpaulino May 5, 2026 22:08
A few PRs ago, we started persisting inbound committed HTLC onions in Channels.
These onions were persisted to lay groundwork for reconstructing the
ChannelManager's pending HTLC maps from them in a future version. However,
we've since discovered a different direction where we can instead reconstruct
these same maps using persistent monitor events, which may mean that we don't
need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment
that we're not fully confident in yet, switch it off for now until we either
confirm the monitor events direction and can delete all this onion persisting
code OR realize that we definitely do need it.
@valentinewallace valentinewallace force-pushed the 2026-05-no-persist-onions-yet branch from c50518a to d68dbf8 Compare May 5, 2026 22:11
@valentinewallace valentinewallace removed the request for review from wpaulino May 5, 2026 22:16
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.28%. Comparing base (b64efcd) to head (d68dbf8).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4599      +/-   ##
==========================================
+ Coverage   86.22%   86.28%   +0.05%     
==========================================
  Files         159      159              
  Lines      109170   109269      +99     
  Branches   109170   109269      +99     
==========================================
+ Hits        94136    94280     +144     
+ Misses      12424    12379      -45     
  Partials     2610     2610              
Flag Coverage Δ
tests 86.28% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider rolling back all the behavioral parts of #4227 rather than only gating writing in prod. With this PR, prod stops writing the inbound onion data, but the version/reconstruction machinery remains. That leaves SERIALIZATION_VERSION == 2 effectively reserved for “reconstruct HTLC maps from channel data.” If any released version already interprets version 2 that way, we cannot safely reuse version 2 for an unrelated serialization bump later. Older readers would start reconstructing.

The standalone cleanups/refactors from the original PR that are useful on their own can be kept ofc. If we decide to pick up the inbound-onion reconstruction path later, reverting the revert should be straightforward, and in the meantime we avoid carrying extra confusing dormant code in an already subtle area.

@valentinewallace
Copy link
Copy Markdown
Contributor Author

I think we should consider rolling back all the behavioral parts of #4227 rather than only gating writing in prod. With this PR, prod stops writing the inbound onion data, but the version/reconstruction machinery remains. That leaves SERIALIZATION_VERSION == 2 effectively reserved for “reconstruct HTLC maps from channel data.” If any released version already interprets version 2 that way, we cannot safely reuse version 2 for an unrelated serialization bump later. Older readers would start reconstructing.

The standalone cleanups/refactors from the original PR that are useful on their own can be kept ofc. If we decide to pick up the inbound-onion reconstruction path later, reverting the revert should be straightforward, and in the meantime we avoid carrying extra confusing dormant code in an already subtle area.

As mentioned in the PR description and offline, the monitor events approach is still in early enough stages that I'm not really confident at all that we don't need the dormant code. As long as we don't persist anything in prod, we always retain the option to cleanly remove the it later. So my preference would be to make at least a bit more progress on monitor events first, and I'm happy to prioritize removing the dormant code if/as soon as it makes more sense. Also, even if we do decide to go in that direction ~immediately, I still would prefer to land this PR first to minimize 0.3 blockage.

W/r/t reserving the serialization version, similar to previous discussions I don't see it as a huge issue since we've never bumped it in the past, but I'm also fine with hardcoding it to u8::max for now to sidestep those concerns. Or maybe go with the TLV flag approach, I forget if there was some reason not to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

weekly goal Someone wants to land this this week

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants