@janniklinde janniklinde commented Dec 17, 2025

This PR depends on #2368.

This patch is a major rework of the OOCEvictionManager that separates cache scheduling logic from I/O handling. The rework is needed

  1. to support (multi-)tile pinning: even with the LRU policy, NullPointerExceptions can occur under memory pressure, especially when multiple blocks must be resident in memory simultaneously
  2. to give better cache limit guarantees when evicted blocks are read back in parallel, preventing OOM
  3. to improve I/O performance
  4. to run I/O tasks in their own thread pool so that reading and evicting do not block compute tasks.
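To illustrate why pinning matters for point 1, here is a minimal sketch of an LRU cache that skips pinned entries during eviction. It is not the code in this PR: the class `PinnedLruCacheSketch`, its methods, and the `evictionSink` callback are hypothetical names chosen for the example, and blocks are reduced to a key plus a size in bytes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of pin-aware LRU eviction; not the actual OOCEvictionManager. */
class PinnedLruCacheSketch {
    private static final class Entry {
        final long size;
        int pins; // number of active pins; > 0 means "must stay resident"
        Entry(long size) { this.size = size; }
    }

    private final long limit;
    private long used;
    // accessOrder=true: iteration runs from least to most recently used
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    // Receives the key of every evicted block (e.g., to schedule a write)
    private final Consumer<String> evictionSink;

    PinnedLruCacheSketch(long limit, Consumer<String> evictionSink) {
        this.limit = limit;
        this.evictionSink = evictionSink;
    }

    /** Adds a block, evicting unpinned LRU blocks if needed. Returns false if it cannot fit. */
    synchronized boolean put(String key, long size) {
        if (entries.containsKey(key)) return true;
        if (!makeRoom(size)) return false;
        entries.put(key, new Entry(size));
        used += size;
        return true;
    }

    /** Pins a resident block; note that the lookup also counts as an LRU access. */
    synchronized boolean pin(String key) {
        Entry e = entries.get(key);
        if (e == null) return false;
        e.pins++;
        return true;
    }

    synchronized void unpin(String key) {
        Entry e = entries.get(key);
        if (e != null && e.pins > 0) e.pins--;
    }

    synchronized long usedBytes() { return used; }

    private boolean makeRoom(long size) {
        // Check feasibility first so that a failing put evicts nothing
        long evictable = 0;
        for (Entry e : entries.values())
            if (e.pins == 0) evictable += e.size;
        if (used - evictable + size > limit) return false;

        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (used + size > limit && it.hasNext()) {
            Map.Entry<String, Entry> me = it.next();
            String key = me.getKey();
            Entry e = me.getValue();
            if (e.pins > 0) continue; // pinned blocks are never evicted
            it.remove();
            used -= e.size;
            evictionSink.accept(key);
        }
        return true;
    }
}
```

With a sink such as `key -> ioPool.submit(() -> writeToDisk(key))` (both names hypothetical), the cache decides what to evict while a separate thread pool performs the write, which is the kind of separation between scheduling and I/O that the list above describes.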

Further, we introduce detailed out-of-core statistics and a fine-grained event log that can be exported to CSV using the CLI options -oocStats [topNHeavyHitters] and -oocLogEvents [savedir]. The event log can be visualized to identify bottlenecks (see image below; performance of PCA on a 1M x 1000 input matrix). Detailed information about the experiment can be found in this repo. The bottom graph shows compute tasks and idle times of the fixed-size thread pool. The y-axis of the bottom three graphs shows the thread ID of the worker performing the read/write/compute tasks.

[Image: monitor]
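As a rough illustration of what a per-thread event log of this kind could look like, the sketch below records read/write/compute intervals and renders them as CSV. The class `OocEventLogSketch` and the column names `thread_id,type,start_ns,end_ns` are assumptions made for this example; the actual file layout written by -oocLogEvents is not described here and may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an event log with CSV export; column layout is assumed. */
class OocEventLogSketch {
    enum Type { READ, WRITE, COMPUTE }

    private static final class Event {
        final long threadId;
        final Type type;
        final long startNanos;
        final long endNanos;
        Event(long threadId, Type type, long startNanos, long endNanos) {
            this.threadId = threadId;
            this.type = type;
            this.startNanos = startNanos;
            this.endNanos = endNanos;
        }
    }

    private final List<Event> events = new ArrayList<>();

    /** Records one finished task; synchronized because workers log concurrently. */
    synchronized void record(long threadId, Type type, long startNanos, long endNanos) {
        events.add(new Event(threadId, type, startNanos, endNanos));
    }

    /** Renders all events as CSV: one header line, then one line per event. */
    synchronized String toCsv() {
        StringBuilder sb = new StringBuilder("thread_id,type,start_ns,end_ns\n");
        for (Event e : events) {
            sb.append(e.threadId).append(',')
              .append(e.type).append(',')
              .append(e.startNanos).append(',')
              .append(e.endNanos).append('\n');
        }
        return sb.toString();
    }
}
```

Plotting each row as a horizontal bar from `start_ns` to `end_ns` at height `thread_id` gives a timeline of the kind described above, where gaps between bars on a compute thread correspond to idle time.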

Currently, it is still possible to exceed hard limits of the cache because of uncontrolled producers that are not yet unified with the cache system (e.g., ReblockOOCInstruction).
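For context, one common way to enforce a hard limit is to have every producer reserve bytes from a shared budget before allocating a block. The sketch below shows that idea only; it is an assumption about how such producers could be brought under control, not a description of this PR or of planned follow-up work, and `ByteBudgetSketch` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a shared byte budget that producers reserve from. */
class ByteBudgetSketch {
    private final long limit;
    private final AtomicLong reserved = new AtomicLong();

    ByteBudgetSketch(long limit) { this.limit = limit; }

    /** Atomically reserves bytes; returns false if the limit would be exceeded. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) return false;
        while (true) {
            long cur = reserved.get();
            if (cur + bytes > limit) return false;
            // CAS retry loop: only one of several racing producers wins each round
            if (reserved.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    /** Returns previously reserved bytes to the budget. */
    void release(long bytes) { reserved.addAndGet(-bytes); }

    long reservedBytes() { return reserved.get(); }
}
```

A producer that bypasses `tryReserve` can still push total memory use past the limit, which is the situation the paragraph above describes for producers that are not yet integrated with the cache.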
