Optimize eviction candidate selection at scale #11

@DarlingtonDeveloper

Problem

RetentionEngine::select_eviction_candidates() loads ALL nodes into memory, deserializes them, sorts by (importance ASC, created_at ASC), then takes the first N (the least important, oldest nodes).

At 100k+ nodes this becomes expensive — full table scan + deserialize + sort.

Current code

crates/cortex-core/src/policies/retention.rs: select_eviction_candidates()

Fix options

  1. Secondary index — maintain a NODES_BY_IMPORTANCE multimap table (bucketed f32 → node IDs), query lowest bucket first
  2. Materialized priority queue — maintain a sorted eviction queue in a dedicated redb table, updated on put_node
  3. Streaming sort with early exit — iterate by created_at (already ordered in redb), keep a bounded heap of size N by importance
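To make option 1 more concrete, here is a rough sketch of a bucketed importance index. It is an in-memory stand-in built on `BTreeMap`/`BTreeSet`, not redb code; `ImportanceIndex`, `bucket_of`, the bucket count of 100, `u64` node IDs, and the assumption that importance lies in [0.0, 1.0] are all hypothetical choices for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Number of importance buckets; an arbitrary choice for this sketch.
const BUCKETS: u32 = 100;

/// Map an importance value to a bucket.
/// Assumes importance is meant to lie in [0.0, 1.0]; NaN is treated as 0.0.
fn bucket_of(importance: f32) -> u32 {
    let clamped = if importance.is_nan() {
        0.0
    } else {
        importance.clamp(0.0, 1.0)
    };
    ((clamped * BUCKETS as f32) as u32).min(BUCKETS - 1)
}

/// In-memory stand-in for a NODES_BY_IMPORTANCE multimap (bucket -> node IDs).
#[derive(Default)]
pub struct ImportanceIndex {
    by_bucket: BTreeMap<u32, BTreeSet<u64>>,
}

impl ImportanceIndex {
    /// Would be called from put_node; `old` is the previous importance, if any.
    pub fn upsert(&mut self, id: u64, old: Option<f32>, new: f32) {
        if let Some(old) = old {
            self.remove(id, old);
        }
        self.by_bucket.entry(bucket_of(new)).or_default().insert(id);
    }

    /// Drop a node from the bucket it was filed under.
    pub fn remove(&mut self, id: u64, importance: f32) {
        let bucket = bucket_of(importance);
        if let Some(ids) = self.by_bucket.get_mut(&bucket) {
            ids.remove(&id);
            if ids.is_empty() {
                self.by_bucket.remove(&bucket);
            }
        }
    }

    /// Walk buckets from lowest importance upward and stop after `n` IDs.
    pub fn lowest(&self, n: usize) -> Vec<u64> {
        self.by_bucket.values().flatten().copied().take(n).collect()
    }
}
```

Within a bucket this sketch orders by node ID rather than by exact importance or created_at, so the result only approximates the (importance ASC, created_at ASC) order unless the caller re-sorts the small candidate set after loading it.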

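Option 3 is simplest and avoids new indexes, though in the worst case it still reads every node on each eviction pass. Options 1/2 add index-maintenance work to every write (put_node) but let eviction read only the lowest-ranked entries instead of scanning the whole table.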
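For reference, a minimal sketch of the bounded-heap idea behind option 3. `NodeMeta` is a hypothetical, stripped-down record (the real node type in cortex-core will differ), and the input is any iterator of nodes rather than an actual redb table scan.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical minimal node record, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeMeta {
    pub id: u64,
    pub importance: f32,
    pub created_at: u64,
}

/// Heap entry ordered by (importance, created_at, id).
/// f32::total_cmp supplies the total order that f32 lacks by default.
struct Candidate(NodeMeta);

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0
            .importance
            .total_cmp(&other.0.importance)
            .then(self.0.created_at.cmp(&other.0.created_at))
            .then(self.0.id.cmp(&other.0.id))
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Candidate {}

/// Keep only the `n` best eviction candidates while streaming over nodes.
pub fn select_eviction_candidates<I>(nodes: I, n: usize) -> Vec<NodeMeta>
where
    I: IntoIterator<Item = NodeMeta>,
{
    if n == 0 {
        return Vec::new();
    }
    // Max-heap: the top is the least evictable of the current candidates.
    let mut heap: BinaryHeap<Candidate> = BinaryHeap::new();
    for node in nodes {
        let cand = Candidate(node);
        if heap.len() < n {
            heap.push(cand);
        } else if let Some(mut top) = heap.peek_mut() {
            // Replace the worst kept candidate if this node ranks lower.
            if cand < *top {
                *top = cand;
            }
        }
    }
    // Ascending order: lowest importance first, then oldest.
    heap.into_sorted_vec().into_iter().map(|c| c.0).collect()
}
```

This keeps at most N entries in memory and replaces the full sort with heap updates, but as written it still looks at every node once. An early exit would need an extra condition, for example stopping once every kept candidate already has the minimum possible importance while iterating in created_at order.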

Priority

Low — only matters at scale (100k+ nodes). Current usage is well below this.

Labels

enhancement (New feature or request)
