Allow merge integrity checks to be aborted sooner by tlrx · Pull Request #16086 · apache/lucene

tlrx · 2026-05-19T15:29:20Z

This change introduces a merge AbortChecker that is threaded through MergeState into StoredFieldsReader.checkIntegrity and CodecUtil.checksumEntireFile, allowing long-running integrity checks like the one described in #13354 to be aborted sooner after a merge is aborted.

It is enabled via IndexWriterConfig.setMergeAbortCheckIntervalBytes, which indicates a number of bytes to read during checksumming before checking if the corresponding merge has been aborted.

Only stored fields are wired up, but other files like postings and doc values could benefit from the same treatment in a follow-up. Having MergeState expose the AbortChecker.checkAborted method could also be useful to other merging logic, possibly HNSW graph building.

romseygeek

Thanks @tlrx, this looks great. I left a couple of comments.

romseygeek · 2026-05-19T15:40:34Z

        }
      }

+      final MergePolicy.AbortChecker mergeAbortChecker;


Maybe add a factory method that takes a OneMerge and a check interval and returns NO_OP if the interval is 0?

Good suggestion, I added MergePolicy.AbortChecker.create() in 227c021.

romseygeek · 2026-05-19T15:41:43Z

+
+    /** Checks if the merge should be aborted, throwing MergeAbortedException if so. */
+    public void checkAborted() throws MergeAbortedException {
+      oneMerge.checkAborted();


This will throw an NPE if it's called on NO_OP which I think is trappy.

Oops, I pushed a68f87c.

romseygeek · 2026-05-19T15:44:18Z

Related: #13354

romseygeek · 2026-05-19T16:37:36Z

+    private final OneMerge oneMerge;
+    private final int abortCheckIntervalBytes;
+
+    public AbortChecker(OneMerge oneMerge, int abortCheckIntervalBytes) {


Let's make this private and do all construction via the factory method.
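The three review comments above (a factory that returns NO_OP for an interval of 0, a NO_OP that cannot throw an NPE, and a private constructor) can be sketched together. This is only an illustration under stated assumptions, not the code from 227c021 or a68f87c: the `create` signature, the use of a `BooleanSupplier` in place of `OneMerge`, and the unchecked `MergeAbortedException` stand-in are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for Lucene's MergePolicy.MergeAbortedException (unchecked here for brevity). */
final class MergeAbortedException extends RuntimeException {
  MergeAbortedException(String message) {
    super(message);
  }
}

/** Hypothetical sketch of an abort checker; not the class from the PR. */
class AbortChecker {
  /** Shared no-op instance: it never consults a merge, so it cannot throw an NPE. */
  static final AbortChecker NO_OP =
      new AbortChecker(() -> false, 0L) {
        @Override
        void checkAborted() {
          // intentionally empty
        }
      };

  private final BooleanSupplier aborted; // stands in for the OneMerge abort flag
  private final long intervalBytes;

  // Private: all construction goes through create().
  private AbortChecker(BooleanSupplier aborted, long intervalBytes) {
    this.aborted = aborted;
    this.intervalBytes = intervalBytes;
  }

  /** Returns NO_OP when there is nothing to check or checking is disabled (interval 0). */
  static AbortChecker create(BooleanSupplier aborted, long intervalBytes) {
    if (intervalBytes < 0) {
      throw new IllegalArgumentException("intervalBytes must be >= 0, got " + intervalBytes);
    }
    if (aborted == null || intervalBytes == 0) {
      return NO_OP;
    }
    return new AbortChecker(aborted, intervalBytes);
  }

  long intervalBytes() {
    return intervalBytes;
  }

  /** Throws if the merge has been aborted. */
  void checkAborted() {
    if (aborted.getAsBoolean()) {
      throw new MergeAbortedException("merge is aborted");
    }
  }
}
```

With this shape, callers never need a null check: they hold either a real checker or NO_OP, and both accept `checkAborted()`.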

mikemccand

I don't like that we need a whole new class/interface, touching so many existing classes, for this. If it's just a "check every N bytes" why not implement that more directly -- store that int/long somewhere and fix checkIntegrity.

I'm also not a fan of only doing stored fields -- other Lucene index files can be ginormous (vectors) -- can we impl for those as well? Or if it really must only be stored fields, let's confess so in the docs for this?

Finally, I'm not convinced we understand the root cause here, so I think we may be fixing a "not actually a problem". At Amazon we've also seen some evidence that merge abort was insanely slow, but somehow never followed up on it / got to root cause. Can we first spend some time back on the issue doing that? For example, is there maybe a bug that's turned off the entire abort checking best effort system? (It is similar to Thread.interrupt best effort checking, but hopefully better!).

It's hard for me to believe checkIntegrity can take minutes on any Lucene index file unless the storage system holding the index is insanely slow. Or, the index is insanely massive. Both are possible :)

mikemccand · 2026-05-19T17:07:06Z

+   *
+   * @param intervalBytes the interval in bytes, must be positive
+   */
+  public IndexWriterConfig setMergeAbortCheckIntervalBytes(int intervalBytes) {


Let's make this long? In general if something is measuring bytes let's try to use long -- we do math on such things that may lead to overflow?
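The overflow concern is easy to reproduce. The following is a minimal illustration with hypothetical method names; it is not taken from the PR.

```java
/** Hypothetical illustration of why byte counts are safer as long than as int. */
final class ByteMathSketch {
  /** int arithmetic silently wraps around once the product exceeds Integer.MAX_VALUE. */
  static int totalAsInt(int intervalBytes, int intervals) {
    return intervalBytes * intervals;
  }

  /** long arithmetic has far more headroom; multiplyExact throws instead of wrapping. */
  static long totalAsLong(long intervalBytes, long intervals) {
    return Math.multiplyExact(intervalBytes, intervals);
  }
}
```

For example, two intervals of 1 GiB each wrap to a negative value as `int` (`Integer.MIN_VALUE`) but give 2147483648 as `long`.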

This does not need a configurable value at all, just a reasonable one (e.g. 1MB). If someone is doing s3, even setting this to 1 could take "forever" due to network latency. We shouldn't overengineer here.

If we can simplify how we do it (see idea on CodecUtil.java), i'd suggest overloading the implementing function with a parameter, to allow unit tests to use a small value, but it doesn't need to be public.

mikemccand · 2026-05-19T17:08:45Z

+   * Returns the interval in bytes between abort checks during merge integrity verification. See
+   * {@link IndexWriterConfig#setMergeAbortCheckIntervalBytes(int)}.
+   */
+  public int getMergeAbortCheckIntervalBytes() {


long instead?

mikemccand · 2026-05-19T17:11:42Z

+   * <p>The check is performed in {@link #readBytes} since that is what {@link
+   * ChecksumIndexInput#seek} calls in a loop to compute the checksum.
+   */
+  private static final class AbortableBufferedChecksumIndexInput


Hmm, you're adding another level of buffering / copying, when someone turns on this new setting? I think? Can we subclass FilterIndexInput instead, so it's simply/only the added accounting (tracking total bytes written) and not more buffering / copying?
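A delegating wrapper of the kind suggested here only counts bytes and forwards reads, with no buffer of its own. The sketch below uses `java.io.FilterInputStream` as a stand-in for Lucene's FilterIndexInput so that it stays self-contained; the class name, the `Runnable` callback, and the constructor are assumptions for illustration only.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical delegator: adds byte accounting only, no extra buffering or copying. */
final class AbortCheckingInputStream extends FilterInputStream {
  private final long intervalBytes;
  private final Runnable abortCheck; // stands in for a call to OneMerge.checkAborted()
  private long bytesSinceCheck;

  AbortCheckingInputStream(InputStream in, long intervalBytes, Runnable abortCheck) {
    super(in);
    if (intervalBytes <= 0) {
      throw new IllegalArgumentException("intervalBytes must be positive");
    }
    this.intervalBytes = intervalBytes;
    this.abortCheck = abortCheck;
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    if (b >= 0) {
      account(1);
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    // Reads straight into the caller's buffer: the wrapper never copies bytes itself.
    int n = in.read(buf, off, len);
    if (n > 0) {
      account(n);
    }
    return n;
  }

  // Runs the abort check once at least intervalBytes have been read since the last check.
  private void account(int n) {
    bytesSinceCheck += n;
    if (bytesSinceCheck >= intervalBytes) {
      bytesSinceCheck = 0;
      abortCheck.run();
    }
  }
}
```

Bytes passed over with `skip()` are not counted in this sketch; a real implementation would have to decide how to account for seeks.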

mikemccand · 2026-05-19T17:12:25Z

  protected IndexWriterEventListener eventListener;

+  /** Interval in bytes between abort checks during merge integrity verification. */
+  protected volatile int mergeAbortCheckIntervalBytes;


Is it only checkIntegrity that we are newly instrumenting here?

mikemccand · 2026-05-19T17:13:53Z

  abstract void checkIntegrity() throws IOException;

+  /** Check the integrity of the index, with periodic merge abort checking. */
+  public abstract void checkIntegrity(MergePolicy.AbortChecker abortChecker) throws IOException;


Since we are only adding "check every N bytes", can we rename AbortChecker to something more specific? Is this a public API, or will Lucene users only interact with the setter/getter on IWC (taking just int or long)?

mikemccand · 2026-05-19T17:34:01Z

Also, since things can vary drastically depending on the env, let's use time/seconds as the "check every", not bytes? We can hardwire this to something reasonable that's surely in the noise of overall merge cost.

For merges, IndexWriter already wraps the provided Directory to track which files are created for Codec file-tracking purposes, I think? If so, we could in theory wrap that to also instrument every IndexInput with this "check every N seconds", maybe.

We should be able to make this fix (once we really understand root cause -- do any of our unit tests check for prompt merge aborts?) without having to expose another knob for Lucene users to tune ...
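A time-based "check every N seconds" throttle could look roughly like the following. This is a sketch under assumptions: the class name, the injectable clock (which a caller would normally bind to `System::nanoTime`), and the `Runnable` callback are hypothetical and do not come from the PR.

```java
import java.util.function.LongSupplier;

/** Hypothetical throttle: runs the abort check at most once per time interval. */
final class TimedAbortCheck {
  private final long intervalNanos;
  private final LongSupplier nanoClock; // e.g. System::nanoTime; injectable for tests
  private final Runnable abortCheck; // stands in for a call to OneMerge.checkAborted()
  private long lastCheckNanos;

  TimedAbortCheck(long intervalNanos, LongSupplier nanoClock, Runnable abortCheck) {
    if (intervalNanos <= 0) {
      throw new IllegalArgumentException("intervalNanos must be positive");
    }
    this.intervalNanos = intervalNanos;
    this.nanoClock = nanoClock;
    this.abortCheck = abortCheck;
    this.lastCheckNanos = nanoClock.getAsLong();
  }

  /** Call after each read; returns true if the abort check actually ran. */
  boolean maybeCheck() {
    long now = nanoClock.getAsLong();
    if (now - lastCheckNanos < intervalNanos) {
      return false;
    }
    lastCheckNanos = now;
    abortCheck.run();
    return true;
  }
}
```

The trade-off is that the clock is read on every call; a hard-wired interval keeps that in the noise without exposing a new setting.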

tlrx · 2026-05-20T10:25:34Z

Thanks @romseygeek and @mikemccand for your feedback!

Before addressing code review comments, I'll try to reply to the higher-level questions:

I'm not convinced we understand the root cause here, so I think we may be fixing a "not actually a problem". At Amazon we've also seen some evidence that merge abort was insanely slow, but somehow never followed up on it / got to root cause. Can we first spend some time back on the issue doing that?

It's hard for me to believe checkIntegrity can take minutes on any Lucene index file unless the storage system holding the index is insanely slow. Or, the index is insanely massive. Both are possible :)

Yes, the storage systems were indeed insanely slow when it happened. I've seen this in two situations: disks saturated with IO (caused, I think, by many concurrent force merges running on non-top-notch SSDs) and storage backed by object storage like S3 with high read latency hiccups, as David describes in #13354 (comment). In those situations, the Lucene index needed to be closed for system maintenance purposes or to be reopened on a more performant machine.

For example, is there maybe a bug that's turned off the entire abort checking best effort system?

I don't think there is a bug in the existing abort checking system itself. The existing merge.checkAborted() calls work correctly at the points where they are checked (there are 2 or 3 such calls before and in IndexWriter.mergeMiddle()), but once the merge thread starts the integrity checks on the segment readers to merge, it is committed to finishing reading all the files associated with all the readers to merge before it can observe the abort flag again.

Note that this change adds the checkAborted() method to MergeState, so we could maybe add additional checks on the aborted flag before checking the integrity of a reader. That would be less intrusive and already an improvement in my opinion, since the merge will be dropped later anyway.

do any of our unit tests check for prompt merge aborts?

I found some tests that abort merges but they don't check that the abort happens promptly. Specifically, there is no test that verifies how "quickly" a merge reacts to the abort signal after it has entered checkIntegrity / checksumEntireFile.

I'm also not a fan of only doing stored fields -- other Lucene index files can be ginormous (vectors) -- can we impl for those as well?

Yes I mentioned in the PR description that other files could benefit from this but wanted to keep the radius of changes low at first.

For vectors, I wonder if more checkAborted() calls could be added too when building large graphs during merges.

For merges, IndexWriter already wraps the provided Directory to track which files are created for Codec file-tracking purposes, I think? If so, we could in theory wrap that to also instrument every IndexInput with this "check every N seconds", maybe.

I explored the MergeScheduler#wrapForMerge path but noticed that many readers were not reopening the files/inputs for integrity checks but were instead using clones of already opened inputs, making it hard to wrap the index inputs. The typical pattern seems to be reader -> reader merge instance -> clone inputs -> seek to zero (then skip by reading to compute the checksum).

I also explored wrapping the readers (StoredFieldsReader, DocValuesProducer etc) using MergePolicy.OneMerge#wrapForMerge(CodecReader) to override the checkIntegrity methods but noticed that it would disable some bulk merge optimizations down the road.

Can we subclass FilterIndexInput instead [...]
since things can vary drastically depending on the env, let's use time/seconds as the "check every", not bytes? We can hardwire this to something reasonable that's surely in the noise of overall merge cost.

If we're only accounting for time spent reading bytes for checksumming purposes without tying it to the OneMerge aborted flag, then I agree we can subclass FilterIndexInput directly in checksumEntireFile.

So I think this leaves us with different options:

1. Implement a FilterIndexInput in CodecUtil.checksumEntireFile() that periodically checks elapsed time.
2. Use the checkAborted() method added to MergeState in this pull request before each reader.checkIntegrity() call in the various writer/consumer merge methods.
3. Explore wrapping the merge Directory via MergeScheduler.wrapForMerge to return wrapped IndexInputs that periodically call checkAborted (may require reopening files instead of relying on clones).
4. Continue with the current proposal as-is.

I think we could start with option 2 as a simple first improvement in this PR (doesn't interrupt in-progress checksums but avoids starting new ones on an already-aborted merge) and follow up with option 1 or 3 to interrupt long-running checksums.

Does that sound reasonable? If so I can rework the PR.
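Option 2 amounts to consulting the abort flag before each reader's integrity check. A minimal sketch follows, assuming a hypothetical `IntegrityCheckable` interface in place of the real reader classes and a `BooleanSupplier` in place of the merge's abort flag.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of option 2: check the abort flag before each integrity check. */
final class PreCheckSketch {
  /** Stands in for readers such as StoredFieldsReader or DocValuesProducer. */
  interface IntegrityCheckable {
    void checkIntegrity() throws IOException;
  }

  static void checkAll(List<? extends IntegrityCheckable> readers, BooleanSupplier aborted)
      throws IOException {
    for (IntegrityCheckable reader : readers) {
      // An in-progress checksum is not interrupted, but no new one is started
      // once the merge has been aborted.
      if (aborted.getAsBoolean()) {
        throw new IOException("merge is aborted");
      }
      reader.checkIntegrity();
    }
  }
}
```

The worst-case delay is then bounded by the largest single file being checksummed rather than by the sum of all files in the merge.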

rmuir · 2026-05-22T01:10:53Z

              + footerLength(),
          input);
    }
    in.seek(in.length() - footerLength());


This is the call that will cause your i/o. I wonder if, instead of wrapping in a subclass, we could just change it to a small loop that only seek()s, say, 1MB at once, and calls checkabort() after each iteration.

Thanks @rmuir, checking every ~1MB seek makes sense and the public configurable value on IndexWriterConfig would become unnecessary.

We'd still need to wire the OneMerge::checkAborted call down to CodecUtil.checksumEntireFile though, so the checkIntegrity() signature on the abstract reader classes would need to change (or be overloaded) to accept the abort callback, something that @mikemccand was concerned about.

I understand his point too. This was just a brainstorm to try to minimize it practically. I feel like wrapping all the reads might be overkill: if we could contain the solution here, I think it would make the merge abort a lot better.

Otherwise, after we checksum, for the most part merge is no longer just "reading" but also doing writes at the same time, so the existing checks should work there.
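The chunked loop discussed in this thread can be illustrated outside Lucene. The sketch below uses a plain `InputStream` and `java.util.zip.CRC32` rather than ChecksumIndexInput.seek, so it shows the idea only and not the actual change to CodecUtil.checksumEntireFile; the class name, the `chunkBytes` parameter, and the `Runnable` callback are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch: checksum a stream in chunks, checking for abort between chunks. */
final class ChunkedChecksumSketch {
  /** Roughly the 1MB granularity suggested in the review. */
  static final int DEFAULT_CHUNK_BYTES = 1 << 20;

  static long checksum(InputStream in, int chunkBytes, Runnable abortCheck) throws IOException {
    if (chunkBytes <= 0) {
      throw new IllegalArgumentException("chunkBytes must be positive");
    }
    CRC32 crc = new CRC32();
    byte[] buf = new byte[Math.min(chunkBytes, 8192)];
    long bytesSinceCheck = 0;
    int n;
    while ((n = in.read(buf, 0, buf.length)) != -1) {
      crc.update(buf, 0, n);
      bytesSinceCheck += n;
      if (bytesSinceCheck >= chunkBytes) {
        bytesSinceCheck = 0;
        abortCheck.run(); // stands in for OneMerge.checkAborted(); may throw to stop early
      }
    }
    return crc.getValue();
  }
}
```

Taking the chunk size as a parameter lets a unit test use a small value while production code passes a fixed default, in line with the suggestion that the value does not need to be a public setting.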

mikemccand · 2026-05-27T11:17:13Z

For vectors, I wonder if more checkAborted() calls could be added too when building large graphs during merges.

Actually does vectors merging today ever check for aborted, e.g. during the concurrent HNSW graph building, except at the very start/end?

I'm also not a fan of only doing stored fields -- other Lucene index files can be ginormous (vectors) -- can we impl for those as well?

Yes I mentioned in the PR description that other files could benefit from this but wanted to keep the radius of changes low at first.

Got it, +1 for PnP ("progress not perfection") for sure.

I think a lower level approach (wrapping all IndexOutput created during merge, since we already wrap Directory to track which files exist, and checking every N seconds) might apply broadly (vectors, stored fields, massive postings files or doc values) with smaller blast radius on API surface.

For merges, IndexWriter already wraps the provided Directory to track which files are created for Codec file-tracking purposes, I think? If so, we could in theory wrap that to also instrument every IndexInput with this "check every N seconds", maybe.

I explored the MergeScheduler#wrapForMerge path but noticed that many readers were not reopening the files/inputs for integrity checks but were instead using clones of already opened inputs, making it hard to wrap the index inputs. The typical pattern seems to be reader -> reader merge instance -> clone inputs -> seek to zero (then skip by reading to compute the checksum).

Hmm I see, that is a wrinkle. I was thinking we could wrap the output path, but you're right, checkIntegrity is doing tons of reading (from existing open thingy) and no writing.

I also explored wrapping the readers (StoredFieldsReader, DocValuesProducer etc) using MergePolicy.OneMerge#wrapForMerge(CodecReader) to override the checkIntegrity methods but noticed that it would disable some bulk merge optimizations down the road.

That is so tricky -- how did you catch that bulk optos broke? I hope we have tests that get angry?

I think we could start with option 2 as a simple first improvement in this PR (doesn't interrupt in-progress checksums but avoids starting new ones on an already-aborted merge) and follow up with option 1 or 3 to interrupt long-running checksums.

+1 to at least start with #2 -- that's an easy win!

I also ... dare say ... is there any chance Thread.interrupt is usable again? The wildly unexpected "I close your file handle because you interrupted me while reading some bytes from it" side effect of Thread.interrupt is maybe less of a big deal now (memory-mapped segments don't have this problem I think)? Thread.interrupt should work for threads stuck deep in IO for checkIntegrity, except, now you need to know which thread to interrupt.

mikemccand · 2026-05-30T14:37:14Z

I think during merging we wrap all IndexOutput in order to possibly rate-limit the IO (off by default), and to check for aborting every X bytes (on). Maybe the problem here is we only instrumented that write path, not the read path, and checkIntegrity obviously is all about reading. Could we just instrument the read path for abort checking, symmetric to the write path, and hopefully as delegator so we don't add extra buffering/copying on read?

tlrx · 2026-06-01T16:04:52Z

Could we just instrument the read path for abort checking, symmetric to the write path, and hopefully as delegator so we don't add extra buffering/copying on read?

The symmetry with the read path is appealing! But I think it would require wrapping (or reopening) the IndexInputs used by the merge instances through the wrapped merge directory. And those merge instances are obtained via getMergeInstance(), so we'd need a way to pass the merge directory to them... I suspect the required changes would be as invasive as what this PR proposes. Unless you were thinking of a different solution?

Overall I think the simplest path is a checkIntegrity(OneMerge) overload that propagates the abort check down to checksumEntireFile, where it checks every ~1MB seek as Robert suggested.

mikemccand · 2026-06-12T13:53:43Z

Overall I think the simplest path is a checkIntegrity(OneMerge) overload that propagates the abort check down to checksumEntireFile, where it checks every ~1MB seek as Robert suggested.

+1, that's a delightfully simple solution.

Could we just instrument the read path for abort checking, symmetric to the write path, and hopefully as delegator so we don't add extra buffering/copying on read?

The symmetry with the read path is appealing! But I think it would require wrapping (or reopening) the IndexInputs used by the merge instances through the wrapped merge directory. And those merge instances are obtained via getMergeInstance(), so we'd need a way to pass the merge directory to them... I suspect the required changes would be as invasive as what this PR proposes. Unless you were thinking of a different solution?

Ahh, hrpmph, yes. And these merges will often re-use an already opened pooled reader for each segment, and those were opened well before any merging plans (so no chance to wrap the IndexInputs). I agree, not simple...

tlrx · 2026-06-16T12:24:40Z

Thanks @mikemccand for the feedback.

I opened #16264 to add additional merge abort checks before executing file integrity checksums (the point 2 that I suggested in #16086 (comment)). Hopefully it's a less controversial change. It also threads OneMerge down to MergeState so that once it is merged I can follow up with the other proposed change:

Overall I think the simplest path is a checkIntegrity(OneMerge) overload that propagates the abort check down to checksumEntireFile, where it checks every ~1MB seek as Robert suggested.

+1, that's a delightfully simple solution.

tlrx · 2026-06-22T08:08:42Z

Closed in favor of #16281

romseygeek requested changes May 19, 2026

tlrx added 2 commits May 19, 2026 17:52

check null

a68f87c

factory method

227c021

tlrx requested a review from romseygeek May 19, 2026 16:20

romseygeek reviewed May 19, 2026

github-actions Bot added module:core/index module:core/codecs module:test-framework labels May 19, 2026

mikemccand mentioned this pull request May 19, 2026

Merges sometimes do lots of work even after being aborted #13354

Open

mikemccand reviewed May 19, 2026

rmuir reviewed May 22, 2026

tlrx mentioned this pull request Jun 16, 2026

Check if merge is aborted before executing file integrity checks #16264

Merged

tlrx mentioned this pull request Jun 22, 2026

Check if merge is aborted during file integrity checksums #16281

Open

tlrx closed this Jun 22, 2026
