Service: Use iterator to avoid high space complexity#3415

Open

flyrain wants to merge 3 commits into apache:main from flyrain:batch-size-pushdown

Conversation

@flyrain
Contributor

@flyrain flyrain commented Jan 11, 2026

Fix #2365 (comment)

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Comment on lines +223 to 224
.filter(mf -> seenPaths.add(mf.path()))
.filter(mf -> TaskUtils.exists(mf.path(), fileIO))
Contributor

  Set<String> uniquePaths = tableMetadata.snapshots().stream()
      .flatMap(sn -> sn.allManifests(fileIO).stream())
      .map(ManifestFile::path)
      .collect(Collectors.toSet());

  return uniquePaths.parallelStream()  // Parallel here!
      .filter(path -> TaskUtils.exists(path, fileIO))
      .map(path -> createManifestTask(...));

Contributor Author

Once we call .collect(Collectors.toSet()), the stream is fully materialized, which loses the benefit of lazy execution. Here we are trying to lower the memory footprint by relying on lazy execution.
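
To illustrate the contrast, here is a minimal sketch reusing the names from the snippets above (snapshots, fileIO, TaskUtils, createManifestTask); it mirrors the shape of the PR code rather than quoting it verbatim:

```java
// Eager: collect(...) materializes every unique path before any downstream
// work starts, so peak memory scales with the number of unique manifests.
Set<String> uniquePaths = snapshots.stream()
    .flatMap(sn -> sn.allManifests(fileIO).stream())
    .map(ManifestFile::path)
    .collect(Collectors.toSet());

// Lazy: a stateful filter dedupes elements as they flow through, so each
// manifest is deduped, existence-checked, and mapped to a task one at a time.
// (Note that seenPaths itself still grows with the number of unique paths.)
Set<String> seenPaths = new HashSet<>();
Stream<TaskEntity> tasks = snapshots.stream()
    .flatMap(sn -> sn.allManifests(fileIO).stream())
    .filter(mf -> seenPaths.add(mf.path()))
    .filter(mf -> TaskUtils.exists(mf.path(), fileIO))
    .map(mf -> createManifestTask(mf));
```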

createAndRegisterTasks(batch, metaStoreManager, polarisCallContext, tableEntity);
totalCount += batch.size();
}

Contributor

Can we explicitly call batch.clear()?

Contributor Author

We could, but we don't have to, as this is the last batch.
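
For reference, a sketch of the loop shape under discussion (taskIterator and batchSize are hypothetical names; createAndRegisterTasks is from the diff above). The trailing flush handles the final partial batch, and the list goes out of scope right afterwards:

```java
List<TaskEntity> batch = new ArrayList<>(batchSize);
while (taskIterator.hasNext()) {
  batch.add(taskIterator.next());
  if (batch.size() >= batchSize) {
    createAndRegisterTasks(batch, metaStoreManager, polarisCallContext, tableEntity);
    totalCount += batch.size();
    batch.clear(); // reuse the list for the next batch
  }
}
if (!batch.isEmpty()) {
  // Final partial batch: an explicit clear() here would be the method's last
  // effective statement, so it is redundant.
  createAndRegisterTasks(batch, metaStoreManager, polarisCallContext, tableEntity);
  totalCount += batch.size();
}
```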

singhpk234 previously approved these changes Jan 15, 2026

Contributor

@singhpk234 singhpk234 left a comment

LGTM, this seems like a nice improvement, thanks @flyrain!

@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 15, 2026
@dimas-b
Contributor

dimas-b commented Jan 15, 2026

@pingtimeout : what is your take on this PR?

@dimas-b dimas-b requested a review from pingtimeout January 15, 2026 01:55
@pingtimeout
Contributor

@dimas-b This PR is very confusing to me; after reviewing it, I do not think it fixes anything at all...

Contributor

@pingtimeout pingtimeout left a comment

Thanks @flyrain for the attempt at fixing the high space complexity issue. This is a good start, but I don't think we are quite there yet.

As far as I can tell, the space complexity of table cleanup was O(UM + PM + S + ST + PST + T). And with this change, it is O(UM + PM + S + ST + PST + batchSize) where:

  • PM = number of previous metadata files
  • S = number of snapshots
  • ST = number of statistics files
  • PST = number of partition statistics files
  • UM = number of unique manifest files across all snapshots
  • T = total number of created TaskEntities

You can see this by running the code with a large number of files under constrained memory: with the current code, there is always a number of files that results in an OOME, proving that the space complexity issue has not been solved by this change. You may want to use realistic (longer) paths to surface the issue faster.

I want to emphasize one critical point that must be addressed before this PR is merged. In #3256, you said the following:

please take a look to see if that solves the problem. It'd be really nice to run this with the same setup we used to validate the current PR which is this PR fixed the issue

Which contradicts the box that you checked in the description of this PR: Added/updated tests with good coverage, or manually tested (and explained how). Were you able to reproduce the issue before attempting to write a fix?

To summarize: based on my review of the code, I am convinced that this does not solve the underlying issue. And based on the lack of testing, I do not think this PR is ready. I appreciate the desire to provide an alternative to #3256. But I think #3256 is the best option we have, all things considered.

}

@Test
public void testMetadataFileBatchingWithManyFiles() throws IOException {
Contributor

This test is named testMetadataFileBatchingWithManyFiles but only creates 24 files in total. Unfortunately that does not prove that the code is better at handling large tables.

Contributor Author

The intent of this unit test is not to simulate a truly large table, but to validate the batching behavior and correctness when metadata files are processed incrementally. As is common practice, we avoid stress or scale tests in unit tests, since they would significantly slow down CI and are better suited to a dedicated benchmark.

Contributor

I understand that unit tests should be quick to avoid slowing down CI. My main concern here is whether this code change has been tested at scale. And if so, how?

Contributor Author

Thanks for bringing it up. I think it's a good idea to have a benchmark; more details are in #3256 (comment).

.stream()
// distinct by manifest path, since multiple snapshots will contain the same manifest
// Use stateful filter to dedupe while streaming
.filter(mf -> seenPaths.add(mf.path()))
Contributor

This line adds all unique manifest files across all snapshots to a set that is maintained in memory. Even though the stream is lazy, all unique manifest paths are materialized on the heap. This means that the space complexity does not change.

Contributor Author

Thanks for the detailed analysis. I agree that the only remaining unbounded structure here is the in-memory set used to dedupe manifest paths. I don't think this is a practical concern, though.

To put concrete numbers on it: in an extreme case of 1 million file paths, at an estimated 50 to 100 bytes per path including object and set overhead, the memory footprint would be roughly 40 MB to 95 MB, which is acceptable. That is already a very large table-cleanup scenario. At that scale, the real question becomes whether we even want the Polaris server itself to handle such a task synchronously in memory; a delegation service would fit better in that case.
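
A back-of-envelope version of that estimate (the per-entry byte counts are assumptions; real overhead depends on path length, JVM pointer compression, and HashSet load factor):

```java
long n = 1_000_000L;            // unique manifest paths
long lowBytesPerEntry = 50;     // short paths, compressed oops
long highBytesPerEntry = 100;   // realistic object-store paths plus HashSet node overhead
System.out.printf("~%d MB to ~%d MB%n",
    n * lowBytesPerEntry / (1024 * 1024),
    n * highBytesPerEntry / (1024 * 1024));
// prints: ~47 MB to ~95 MB
```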

int batchSize = callContext.getRealmConfig().getConfig(BATCH_SIZE_CONFIG_KEY, 10);
return getMetadataFileBatches(tableMetadata, batchSize).stream()

// Stream all metadata files without materializing them all at once
Contributor

The only thing this change does is postpone the call to the .map(...) methods; afaict the memory consumption stays identical.

Contributor Author

The main change is that stream().toList() has been removed to avoid fully materializing the results in memory. Instead, an iterator is used together with a configurable batch size (taskPersistenceBatchSize) to read and process items incrementally. This bounds memory usage, as shown in lines 169 to 175.
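
One compact way to express that pattern, assuming Guava is on the classpath (taskStream stands in for the real lazy task stream): Iterators.partition yields bounded sub-lists from a lazy iterator, so only one batch is resident at a time.

```java
import com.google.common.collect.Iterators;

Iterator<List<TaskEntity>> batches =
    Iterators.partition(taskStream.iterator(), taskPersistenceBatchSize);
while (batches.hasNext()) {
  createAndRegisterTasks(batches.next(), metaStoreManager, polarisCallContext, tableEntity);
}
```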

Contributor

The parameters of the stream construction are eager, so I am afraid the only thing lazily evaluated here is the call to .flatMap(Function.identity())

Contributor Author

Good catch, the comment was misleading, so I removed it. That said, all of the file paths here are part of the metadata.json file, and we've already loaded metadata.json into memory as the table metadata, so lazy evaluation doesn't buy anything at this step.
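
A small runnable demo of the eagerness point raised above (names are illustrative): the arguments to Stream.of(...) are ordinary Java method arguments and execute during stream construction; only the elements of the inner streams are pulled lazily.

```java
import java.util.function.Function;
import java.util.stream.Stream;

public class EagerArgsDemo {
  static Stream<String> part(String name) {
    System.out.println("building " + name); // runs during stream construction
    return Stream.of(name + "-1", name + "-2");
  }

  public static void main(String[] args) {
    // Both part(...) calls execute here, eagerly.
    Stream<String> s = Stream.of(part("a"), part("b")).flatMap(Function.identity());
    System.out.println("stream constructed, nothing consumed yet");
    s.forEach(System.out::println); // elements flow lazily from here
  }
}
```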

@github-project-automation github-project-automation Bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Jan 15, 2026
@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions Bot added the stale label Feb 28, 2026
@github-actions github-actions Bot closed this Mar 11, 2026
@github-project-automation github-project-automation Bot moved this from PRs In Progress to Done in Basic Kanban Board Mar 11, 2026
@flyrain flyrain reopened this Apr 5, 2026
@github-project-automation github-project-automation Bot moved this from Done to PRs In Progress in Basic Kanban Board Apr 5, 2026
@github-actions github-actions Bot removed the stale label Apr 6, 2026

Development

Successfully merging this pull request may close these issues.

Purge table task implementation prone to OOMs
