
Persistent MonitorEvents#4491

Open
valentinewallace wants to merge 19 commits into lightningdevkit:main from valentinewallace:2026-03-persistent-mon-evs

Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Mar 17, 2026

As part of #4482, we're looking into changing our architecture -- currently, the Channel{Manager} is responsible for managing the resolution of off-chain HTLCs, and the ChannelMonitor is responsible for them once they're on-chain. See the issue description for the complexity that results from this design.

Quoting the issue, "Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates)."

Here we don't add any new MonitorEvents but add support for making the ones we do have persistent across restarts until they are explicitly ack'd, via a new chain::Watch API.

  • update commit messages

@ldk-reviews-bot

ldk-reviews-bot commented Mar 17, 2026

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

(17, in_flight_monitor_updates, option),
(19, peer_storage_dir, optional_vec),
(21, async_receive_offer_cache, (default_value, async_receive_offer_cache)),
(23, persistent_monitor_events, (default_value, false)),
Collaborator

Given the logic isn't complete, should we write this as an even tlv on the write side and deliberately not implement read for it yet?

Contributor Author

Do you mean use an even tlv and conditionally write it only if it's set to true? That makes sense to me

Contributor Author

Is this the plan for upgrades/downgrades?

  • Until ~everything's in place, use even TLVs for things but don't write them in prod, and error in prod if they are set on read
  • Then have a version or two that allows the even TLVs to be set on read, but doesn't write them, to allow downgrades
  • Then have a version that allows the even TLVs to be missing, runs the legacy reconstruction logic once, then sets persistent_monitor_events in the manager/monitors at the end of manager read so even TLVs will be henceforth written
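The upgrade/downgrade plan above leans on BOLT 1's "it's okay to be odd" TLV rule: a reader must reject an unknown *even* type (the writer required it to be understood) but may skip an unknown *odd* one. A minimal standalone sketch of that rule (illustrative only, not LDK's actual TLV reader):

```rust
// Sketch of the "it's okay to be odd" rule the even-TLV plan relies on.
// An unknown even type is a hard error; an unknown odd type is skippable.
fn handle_unknown_tlv(tlv_type: u64) -> Result<(), &'static str> {
    if tlv_type % 2 == 0 {
        // Unknown even TLV: the writer required us to understand it,
        // so an older reader must refuse to deserialize (no silent downgrade).
        Err("unknown required (even) TLV type")
    } else {
        // Unknown odd TLV: safe to ignore; older readers keep working.
        Ok(())
    }
}

fn main() {
    assert!(handle_unknown_tlv(23).is_ok()); // odd: ignorable by old readers
    assert!(handle_unknown_tlv(24).is_err()); // even: old readers must error
}
```

This is why writing the flag as an even TLV (and only when set) makes downgrades fail loudly rather than silently losing the persistent-event state.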

Collaborator

Yea, that's basically what I was thinking.

@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 2a22b3e to 6930ccd Compare March 19, 2026 17:11
@valentinewallace valentinewallace self-assigned this Mar 19, 2026
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Mar 19, 2026
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 6930ccd to 2eba975 Compare March 19, 2026 20:24
@valentinewallace valentinewallace marked this pull request as ready for review March 19, 2026 20:27
@valentinewallace valentinewallace requested review from TheBlueMatt and removed request for wpaulino March 19, 2026 20:28
@codecov

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 86.11111% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.20%. Comparing base (d830f10) to head (bbf1786).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 74.67% 29 Missing and 10 partials ⚠️
lightning/src/chain/channelmonitor.rs 92.20% 6 Missing and 6 partials ⚠️
lightning/src/ln/channel.rs 94.73% 1 Missing and 2 partials ⚠️
lightning/src/chain/chainmonitor.rs 94.73% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4491      +/-   ##
==========================================
+ Coverage   86.18%   86.20%   +0.01%     
==========================================
  Files         160      160              
  Lines      107536   107789     +253     
  Branches   107536   107789     +253     
==========================================
+ Hits        92680    92916     +236     
- Misses      12231    12235       +4     
- Partials     2625     2638      +13     
Flag Coverage Δ
tests 86.20% <86.11%> (+0.01%) ⬆️


Collaborator

@TheBlueMatt TheBlueMatt left a comment

Basically LGTM!

}

impl_writeable_tlv_based!(MonitorEventSource, {
(0, event_id, required),
Collaborator

nit: odds for new fields so we can drop them later in future versions if we dont want them.

/// When set, monitor events are retained until explicitly acked rather than cleared on read.
///
/// Allows the ChannelManager to reconstruct pending HTLC state by replaying monitor events on
/// startup, rather than from complexly polling and reconciling Channel{Monitor} APIs, as well as
Collaborator

complexly lol.

};

// Only write `persistent_events_enabled` if it's set to true, as it's an even TLV.
let persistent_events_enabled = channel_monitor.persistent_events_enabled.then_some(true);
Collaborator

nit should this write () instead of bool?
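The nit is that presence alone carries the information, so the TLV value can be empty. A tiny sketch of the difference (names hypothetical; in LDK's TLV serialization a present `Option<()>` presumably encodes a zero-length value, while `Option<bool>` carries a payload byte):

```rust
// `bool::then_some` yields Some only when the flag is set, so the TLV is
// written only in the enabled case either way; `()` just drops the payload.
fn main() {
    let enabled = true;
    let as_bool: Option<bool> = enabled.then_some(true); // payload carries `true`
    let as_unit: Option<()> = enabled.then_some(());     // presence is the signal
    assert_eq!(as_bool, Some(true));
    assert_eq!(as_unit, Some(()));
    assert_eq!(false.then_some(()), None); // absent entirely when disabled
}
```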

}

/// Removes a [`MonitorEvent`] by its event ID, acknowledging that it has been processed.
pub(super) fn ack_monitor_event(&self, event_id: u64) {
Collaborator

Should be pub, no?

fn fail_htlc<L: Logger, E: FailHTLCContents + Clone>(
&mut self, htlc_id_arg: u64, err_contents: E, mut force_holding_cell: bool,
logger: &L
monitor_event_source: Option<MonitorEventSource>, logger: &L
Collaborator

Do we want to go ahead and update the logic here to push the event source if there isn't one and the fail is still pending?

HTLCFailureMsg::Relay(fail_htlc) => HTLCForwardInfo::FailHTLC {
htlc_id: fail_htlc.htlc_id,
err_packet: fail_htlc.into(),
monitor_event_source: None,
Collaborator

Hmm we're failing to forward, presumably there's a monitor event from the inbound edge we have to complete when we fail the HTLC? Bit funny to fail an HTLC back with a monitor event completion for the same channel but makes sense I think?

// inbound channel durably has the preimage, the outbound
// monitor event can be acked. The preimage remains in the
// outbound monitor's storage and will be replayed to all
// remaining inbound hops on restart via pending_claims_to_replay.
Collaborator

Do we want to note that we want to move away from this eventually? We might eventually move to a world where we skip the replay and move to pushing "temporary" monitor events such that we just have to replay monitor events to do the multi-claim.

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

Helps when debugging to know which variants failed.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart, instead of examining the set of HTLCs in
monitors, we can simply replay all the pending MonitorEvents.

As a first step towards this, here we persist a flag in the ChannelManager and
ChannelMonitors indicating whether this new feature is enabled. It will be used
in upcoming commits to maintain compatibility and create an upgrade/downgrade
path between LDK versions.
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 2eba975 to b8644fb Compare March 20, 2026 21:34
@valentinewallace
Contributor Author

Rebased with no changes due to conflicts

}
let duplicate_event = self.pending_monitor_events.iter().any(
|update| if let &MonitorEvent::HTLCEvent(ref upd) = update {
|(_, update)| if let &MonitorEvent::HTLCEvent(ref upd) = update {
Collaborator

Bug (latent, currently test-only): When persistent_events_enabled is true, get_and_clear_pending_monitor_events moves events from pending_monitor_events to provided_monitor_events. This duplicate check only scans pending_monitor_events, so it will miss events that have been provided but not yet acked. A subsequent block connection could then push a duplicate HTLCEvent.

The same issue exists in the check_connected_output duplicate checks (around lines 6398 and 6420).

Since persistent_events_enabled is gated behind _test_utils/test today, this isn't a production issue yet — but it will be once the feature is enabled. These checks should also scan provided_monitor_events.
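A minimal model of the two-queue design this comment describes (all names illustrative, not LDK's actual types): `get_and_clear` moves events from a pending queue to a provided queue until they're acked, so a correct duplicate check has to scan both queues.

```rust
// Hypothetical sketch: events move pending -> provided on get_and_clear,
// are dropped on ack, and the duplicate check must look at both queues.
#[derive(Clone, PartialEq, Debug)]
struct MonitorEvent { id: u64, htlc_source: u32 } // htlc_source is a stand-in

#[derive(Default)]
struct EventQueues {
    pending: Vec<MonitorEvent>,
    provided: Vec<MonitorEvent>,
}

impl EventQueues {
    fn get_and_clear(&mut self) -> Vec<MonitorEvent> {
        let evs: Vec<MonitorEvent> = self.pending.drain(..).collect();
        // Retain provided-but-unacked events so they survive a restart.
        self.provided.extend(evs.iter().cloned());
        evs
    }
    fn ack(&mut self, id: u64) {
        self.provided.retain(|e| e.id != id);
    }
    // The fixed duplicate check: scan pending AND provided.
    fn has_event_for(&self, source: u32) -> bool {
        self.pending.iter().chain(self.provided.iter())
            .any(|e| e.htlc_source == source)
    }
}

fn main() {
    let mut q = EventQueues::default();
    q.pending.push(MonitorEvent { id: 1, htlc_source: 42 });
    q.get_and_clear(); // moved to `provided`, not yet acked
    assert!(q.has_event_for(42)); // scanning only `pending` would miss this
    q.ack(1);
    assert!(!q.has_event_for(42));
}
```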

if accepted_preimage_claim {
if !self.pending_monitor_events.iter().any(
|update| if let &MonitorEvent::HTLCEvent(ref upd) = update { upd.source == source } else { false }) {
|(_, update)| if let &MonitorEvent::HTLCEvent(ref upd) = update { upd.source == source } else { false }) {
Collaborator

Same latent issue as the check_spend_height_closed duplicate check: when persistent_events_enabled is true, this only scans pending_monitor_events but events may have already been moved to provided_monitor_events, allowing duplicate event generation.

@ldk-claude-review-bot
Collaborator

Cleans up the next commit
This field will be deprecated in upcoming commits when we start persisting
MonitorEvent ids alongside the MonitorEvents.

Here we add an as-yet-unused API to chain::Watch to allow the ChannelManager to
tell a ChannelMonitor that a MonitorEvent has been irrevocably processed
and can be deleted.

To allow the ChannelManager to ack specific monitor events once they are
resolved in upcoming commits, here we give each MonitorEvent a corresponding
unique id. It's implemented in such a way that we can delete legacy monitor
event code in the future when the new persistent monitor events flag is enabled
by default.

Here we add a MonitorUpdateCompletionAction that will allow the ChannelManager
to tell a monitor that an event is complete, upon completion of a particular
ChannelMonitorUpdate. For example, this will be used when an HTLC is
irrevocably failed backwards post-channel-close, to delete the corresponding
MonitorEvent::HTLCEvent.

Here we take care of the monitor events that are trivial to ACK, since all
except HTLCEvents can be ACK'd immediately after the event is handled and don't
need to be re-processed on startup.

If an outbound payment resolved on-chain and we get a monitor event for it, we
want to mark that monitor event as resolved once the user has processed a
PaymentSent or a PaymentFailed event.

If the outbound edge of a forwarded payment is resolved on-chain via preimage,
and we get a monitor event for it, we want to mark that monitor event resolved
once the preimage is durably persisted in the inbound edge. Hence, we add the
monitor event resolution to the completion of the inbound edge's preimage
monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline, starting from when we
first handle the monitor event and call fail_htlc_backwards. To get it from
there to the next stage, here we store the monitor event id in the manager's
HTLCForwardInfo::Fail*. In upcoming commits we will eventually get to the point
of acking the monitor event when the HTLC is irrevocably removed from the
inbound edge via monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline. We started this process
in the previous commit, and here we continue by storing the monitor event id
in the Channel's holding cell HTLC failures. In upcoming commits we will
eventually get to the point of acking the monitor event when the HTLC is
irrevocably removed from the inbound edge via monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline. Here we pipe the monitor
event source from the manager to the channel, so it can be put in the channel's
holding cell.

In upcoming commits we will eventually get to the point of acking the monitor
event when the HTLC is irrevocably removed from the inbound edge via monitor
update.

Here we build on recent commits by ACK'ing monitor events for forward failures
once the monitor update that marks them as failed on the inbound edge is
complete.

Here we build on recent commits by ACK'ing monitor events for forward failures
once the monitor update that marks them as failed on the inbound edge is
complete, when the HTLC was irrevocably failed via revoke_and_ack.

Here we build on recent commits by ACK'ing monitor events for forward failures
if we try to fail backwards and discover the inbound edge is closed. If that's the
case, there's nothing for us to do as the HTLC will be resolved via timeout
by the inbound edge counterparty.

Here we build on recent commits by ACK'ing monitor events for forward failures
if we go to fail on the inbound edge and get an error when queueing the
backwards failure. If we get an error in this case it means the failure is
duplicate or the channel is closed, and in either case it's fine to mark the
event as resolved.

Here we complete work that was built on recent prior commits and actually start
re-providing monitor events on startup if they went un-acked during runtime.
This isn't actually supported in prod yet, so this new code will run randomly
in tests, to ensure we still support the old paths.
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from b8644fb to bbf1786 Compare March 20, 2026 22:11
Comment on lines 8898 to 8905
@@ -8845,15 +8904,21 @@ where
.push(PendingChannelMonitorUpdate { update: monitor_update });
return Ok(($htlcs_to_fail, static_invoices, None));
Collaborator

Bug: When the monitor update is blocked (!release_monitor), the monitor_events_to_ack populated from the holding cell drain at line 8921 is silently dropped — PendingChannelMonitorUpdate only stores the ChannelMonitorUpdate, not the associated events to ack. When the blocked update is eventually released via unblock_next_blocked_monitor_update, the ChannelManager never learns about these events and cannot add an AckMonitorEvents completion action.

With persistent_events_enabled, this means the outbound monitor events that triggered the HTLC failures will never be acked and will replay on every restart.

The fix would likely involve either:

  1. Adding a monitor_events_to_ack: Vec<MonitorEventSource> field to PendingChannelMonitorUpdate and returning it alongside the update when unblocking.
  2. Or always returning the events to ack from revoke_and_ack (even when the monitor update itself is blocked), so the ChannelManager can store them independently.
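Fix option 1 can be sketched as follows (all type and field names are illustrative stand-ins, not LDK's actual definitions): the blocked-update holder carries the events-to-ack alongside the update, and unblocking returns both so the manager can attach an ack completion action.

```rust
// Hypothetical sketch of fix option 1: a blocked monitor update carries
// the monitor events that must be acked once it completes.
#[derive(Clone, Debug, PartialEq)]
struct MonitorEventSource { event_id: u64 }

struct ChannelMonitorUpdate { update_id: u64 }

struct PendingChannelMonitorUpdate {
    update: ChannelMonitorUpdate,
    // New field: events to ack when this update is released and completes.
    monitor_events_to_ack: Vec<MonitorEventSource>,
}

// On unblocking, return the acks with the update so the ChannelManager can
// register an AckMonitorEvents-style completion action instead of losing them.
fn unblock(pending: PendingChannelMonitorUpdate)
    -> (ChannelMonitorUpdate, Vec<MonitorEventSource>)
{
    (pending.update, pending.monitor_events_to_ack)
}

fn main() {
    let blocked = PendingChannelMonitorUpdate {
        update: ChannelMonitorUpdate { update_id: 7 },
        monitor_events_to_ack: vec![MonitorEventSource { event_id: 3 }],
    };
    let (update, acks) = unblock(blocked);
    assert_eq!(update.update_id, 7);
    assert_eq!(acks, vec![MonitorEventSource { event_id: 3 }]);
}
```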

Comment on lines 9331 to +9363
@@ -9245,6 +9360,7 @@ impl<
&incoming_trampoline_shared_secret,
&None,
*htlc_id,
if i == 0 { trampoline_monitor_event_source } else { None },
Collaborator

Nit/concern: Unlike the claim path (line 10217-10221), there's no comment explaining why only the first HTLC gets the monitor_event_source for the failure case. The reasoning is different here — for claims, the preimage is stored in the monitor and will be replayed to remaining hops via pending_claims_to_replay. For failures, there's no equivalent replay mechanism for the remaining inbound HTLCs.

If the outbound monitor event is acked (because the first inbound failure was committed and the outbound monitor happened to persist), but the node crashes before the remaining inbound HTLCs are failed, those remaining failures would not be re-triggered on restart.

This is acknowledged by the TODO at line 9327, but it might be worth a comment here explaining why the risk is acceptable today (e.g., the ack is not durable since the outbound monitor doesn't persist on ack, so the event will replay in practice).
