2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,8 @@

## v0.15.0 (TBD)

- Fixed `FifoCache::push` creating a ghost eviction entry when called with a key that already exists in the cache. The duplicate queue entry inflated the eviction queue length beyond the number of live map entries, consumed an eviction slot without a corresponding map value, and caused a still-live entry to be prematurely dropped when the ghost surfaced as the oldest key. Existing keys are now updated in-place without modifying the eviction queue.

- Added `ca-certificates` to the node Docker runtime image so outbound `https` connections work in containerized deployments ([#1661](https://github.com/0xMiden/node/issues/1661)).
- Reworked `SyncNotes` store queries to fetch multiple matching blocks within one database transaction while preserving the response payload cap ([#2027](https://github.com/0xMiden/node/pull/2027)).
- Added composite index `idx_transactions_account_block_txid` on `transactions(account_id, block_num, transaction_id)` to speed up `select_transactions_records` queries used by `SyncTransactions` ([#1965](https://github.com/0xMiden/node/issues/1965)).
12 changes: 12 additions & 0 deletions crates/utils/src/fifo_cache.rs
@@ -39,8 +39,20 @@ where
    }

    /// Inserts a key-value pair, evicting the oldest entry if the cache is at capacity.
    /// Inserts or updates `key` with `value`.
    ///
    /// If `key` already exists, its value is updated in-place without touching the eviction
    /// queue. Re-enqueueing an existing key would create a ghost entry: the queue would
    /// grow beyond the number of live map entries, consume an eviction slot for a key that
    /// may no longer exist, and prematurely evict a valid entry when the ghost surfaces as
    /// the oldest key.
    pub fn push(&self, key: K, value: V) {
        let mut inner = self.0.lock().expect("fifo cache lock poisoned");
        // Key already in cache: update value in-place, leave eviction queue unchanged.
        if inner.map.contains_key(&key) {
            inner.map.insert(key, value);
            return;
        }
Collaborator

Thanks for identifying the bug. However, the FIFO behaviour is still not quite right with this fix.

Every time an entry is pushed into the FIFO cache, it should be placed at (or moved to) the back of the eviction queue. This still doesn't happen after this fix.

I think we have two options as to how to implement this (without an O(n) scan of the eviction queue):

  1. Use a linked hash map (requires external crate) instead of the map and vecdeque; or
  2. A tombstone mechanism which prevents prior entries in the eviction queue from removing entries that were pushed multiple times.

@Mirko-von-Leipzig any thoughts / preference?
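
For illustration, option 2 could look roughly like the sketch below. It is not code from this PR: the `TombstoneFifo` name, the per-push generation counter, and the field layout are all assumptions made for the example. Each push stamps the entry with a fresh generation, and a queue entry only evicts a key if its generation still matches the live one, so older queue entries for a re-pushed key are skipped instead of acting as ghosts.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical tombstone-style FIFO cache; names and layout are illustrative only.
struct TombstoneFifo<K, V> {
    /// Live entries, each tagged with the generation of its most recent push.
    map: HashMap<K, (u64, V)>,
    /// Eviction order; entries whose generation no longer matches the map are stale.
    queue: VecDeque<(K, u64)>,
    next_gen: u64,
    capacity: usize,
}

impl<K: Eq + Hash + Clone, V> TombstoneFifo<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { map: HashMap::new(), queue: VecDeque::new(), next_gen: 0, capacity }
    }

    /// Inserts or updates `key` and moves it to the back of the eviction order.
    fn push(&mut self, key: K, value: V) {
        let generation = self.next_gen;
        self.next_gen += 1;
        // Overwriting the map entry makes any earlier queue entry for `key` stale.
        self.map.insert(key.clone(), (generation, value));
        self.queue.push_back((key, generation));

        while self.map.len() > self.capacity {
            let Some((oldest, queued_gen)) = self.queue.pop_front() else { break };
            // Evict only if this queue entry is the live one; stale entries are dropped.
            if self.map.get(&oldest).is_some_and(|(live_gen, _)| *live_gen == queued_gen) {
                self.map.remove(&oldest);
            }
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key).map(|(_, value)| value)
    }
}
```

One caveat with this particular sketch: stale queue entries are only discarded when they reach the front during an eviction, so repeatedly pushing the same key while the cache is below capacity grows the queue. A real implementation would probably need some form of compaction.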

Collaborator

iirc this is used for the block/proofs caching for the subscriptions?

Can we not just use a VecDeque since they should always be sequential by block number?

Collaborator

The caches are used for reads from arbitrarily connected streams. Only the starting block is "arbitrary" though. If we use just a vecdeque, we would have to peek (front()/back()) to determine whether the cache helps with the range of blocks/proofs we need.

Instead of fetch_block() it would maybe be fetch_block_range(), at least until we catch up to the tip (which should be after getting the initial range).

Do you think we should refactor it this way instead? No need for FifoCache then.

Collaborator

What I was thinking is something like:

struct FifoCache<T> {
    inner: Arc<RwLock<VecDeque<(BlockNumber, Arc<T>)>>>,
    capacity: usize,
}

impl<T> FifoCache<T> {
    async fn push(&self, number: BlockNumber, value: Arc<T>) {
        let mut fifo = self.inner.wr_lock().await;

        // Blocks must arrive strictly in order.
        if let Some((youngest, _)) = fifo.back() {
            assert_eq!(youngest.child(), number);
        }

        // Evict the oldest block once the cache is full.
        if fifo.len() == self.capacity {
            fifo.pop_front();
        }

        fifo.push_back((number, value));
    }

    async fn get(&self, number: BlockNumber) -> Option<Arc<T>> {
        let fifo = self.inner.rd_lock().await;

        let (oldest, _) = fifo.front()?;

        // Entries are sequential, so the distance from the oldest block is the index.
        let offset = number.checked_sub(oldest)?;
        fifo.get(offset).map(|(_, value)| Arc::clone(value))
    }
}

for additional safety we could even separate them into a cloneable Reader, and a single Writer on construction.

Author

Thanks both for the detailed feedback, this is really helpful tbh

To make sure I understand the direction before I push anything:

The plan is to drop the HashMap + VecDeque design entirely and replace FifoCache with a VecDeque<(BlockNumber, Arc<T>)> that relies on block numbers being strictly sequential. push asserts that the incoming block is the child of the current back (or the queue is empty), pops the front when at capacity, and pushes to the back. get(number) computes the offset from the front (number - oldest) and indexes into the deque in O(1), returning None if the number is outside the cached range.

That keeps everything O(1), avoids the ghost-entry class of bugs entirely (each block can only be inserted once, in order), and removes the need for either a linked hash map crate or a tombstone scheme.
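
As a concrete reference for the description above, here is a minimal synchronous sketch. It makes several assumptions that are not in the thread: `BlockNumber` is replaced by a plain `u32` alias, `child()` becomes `+ 1`, locking is left out entirely, and the `SequentialCache` name is a placeholder.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Stand-in for the node's `BlockNumber`; a plain `u32` is an assumption of this sketch.
type BlockNumber = u32;

/// Placeholder name; caches values keyed by strictly sequential block numbers.
struct SequentialCache<T> {
    entries: VecDeque<(BlockNumber, Arc<T>)>,
    capacity: usize,
}

impl<T> SequentialCache<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { entries: VecDeque::with_capacity(capacity), capacity }
    }

    /// Appends the next block, evicting the oldest one when at capacity.
    ///
    /// Panics if `number` is not the direct child of the newest cached block.
    fn push(&mut self, number: BlockNumber, value: Arc<T>) {
        if let Some((youngest, _)) = self.entries.back() {
            assert_eq!(youngest + 1, number, "blocks must be pushed in order");
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back((number, value));
    }

    /// O(1) lookup: the distance from the oldest cached block is the deque index.
    fn get(&self, number: BlockNumber) -> Option<Arc<T>> {
        let (oldest, _) = self.entries.front()?;
        // `checked_sub` returns `None` for blocks older than the cached window.
        let offset = number.checked_sub(*oldest)? as usize;
        // `VecDeque::get` returns `None` for blocks newer than the cached window.
        self.entries.get(offset).map(|(_, value)| Arc::clone(value))
    }
}
```

With a capacity of 2, pushing blocks 5, 6, and 7 leaves only 6 and 7 cached, so `get(5)` and `get(8)` both return `None` without any scan.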

A few clarifying points before I start:

  1. Should this new type live in crates/utils/src/fifo_cache.rs and keep the FifoCache name (now strictly block-keyed), or should it be renamed to something more specific like BlockCache / SequentialCache since it's no longer a general-purpose FIFO?
  2. The current FifoCache is generic over K, V. The redesign hardcodes BlockNumber as the key. Is it fine to make it non-generic, or do you want it generic over a key type that exposes a child() and checked_sub() (so proofs and any future sequential cache can share it)?
  3. For the assertion assert_eq!(youngest.child(), number) — do you want a hard assert! (panic on misuse) or a soft debug_assert! plus a returned Result / silent no-op in release? Given the writer-side discipline implied by the design, I'd lean toward assert! so violations surface immediately in tests.
  4. Re: the Reader/Writer split — happy to add it. Should that go in this PR, or land the core type first and split in a follow-up?

@sergerad regarding your earlier comment about refactoring to fetch_block_range() — should I also touch the call sites in this PR, or keep this PR focused on the cache type and do the call-site changes separately? Doing both in one PR is fine with me, just want to keep the review surface manageable.

Happy to push the change as soon as you confirm the above. I'll also drop the original ghost-entry fix from this branch since the redesign makes it moot, and rewrite the CHANGELOG entry accordingly.

Collaborator

  1. I think we only need this in the store, so let's add it there, maybe with FifoBlockCache as the name.
  2. I think making it generic will be a pain, so let's keep it BlockNumber-focused for now.
  3. assert! please, just ensure it's documented.
  4. Yeah, let's add the split, so that construction returns a tuple, similar to a channel.
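
The channel-like construction in point 4 might be sketched as follows. Everything here is hypothetical: the `fifo_block_cache`, `CacheWriter`, and `CacheReader` names, the `u32` stand-in for `BlockNumber`, and the use of `std::sync::RwLock` in place of the async lock from the earlier sketch. The writer is deliberately not `Clone`, so only one handle can ever push.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, RwLock};

/// Stand-in for the node's `BlockNumber`; a plain `u32` is an assumption of this sketch.
type BlockNumber = u32;
type Shared<T> = Arc<RwLock<VecDeque<(BlockNumber, Arc<T>)>>>;

/// The single handle allowed to insert blocks; intentionally not `Clone`.
struct CacheWriter<T> {
    shared: Shared<T>,
    capacity: usize,
}

/// Read-only handle; can be cloned freely.
struct CacheReader<T> {
    shared: Shared<T>,
}

impl<T> Clone for CacheReader<T> {
    fn clone(&self) -> Self {
        Self { shared: Arc::clone(&self.shared) }
    }
}

/// Returns a `(writer, reader)` pair, similar to how a channel constructor returns both ends.
fn fifo_block_cache<T>(capacity: usize) -> (CacheWriter<T>, CacheReader<T>) {
    assert!(capacity > 0, "capacity must be non-zero");
    let shared: Shared<T> = Arc::new(RwLock::new(VecDeque::with_capacity(capacity)));
    (CacheWriter { shared: Arc::clone(&shared), capacity }, CacheReader { shared })
}

impl<T> CacheWriter<T> {
    /// Panics if `number` is not the direct child of the newest cached block.
    fn push(&self, number: BlockNumber, value: Arc<T>) {
        let mut fifo = self.shared.write().expect("cache lock poisoned");
        if let Some((youngest, _)) = fifo.back() {
            assert_eq!(youngest + 1, number, "blocks must be pushed in order");
        }
        if fifo.len() == self.capacity {
            fifo.pop_front();
        }
        fifo.push_back((number, value));
    }
}

impl<T> CacheReader<T> {
    fn get(&self, number: BlockNumber) -> Option<Arc<T>> {
        let fifo = self.shared.read().expect("cache lock poisoned");
        let (oldest, _) = fifo.front()?;
        let offset = number.checked_sub(*oldest)? as usize;
        fifo.get(offset).map(|(_, value)| Arc::clone(value))
    }
}
```

Note that in this sketch a failed ordering assertion panics while the write lock is held, which poisons the lock for every reader. Whether that is acceptable is a design decision the sketch does not try to settle.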

Collaborator

> regarding your earlier comment about refactoring to fetch_block_range() — should I also touch the call sites in this PR, or keep this PR focused on the cache type and do the call-site changes separately? Doing both in one PR is fine with me, just want to keep the review surface manageable.

I think we should make this change in this PR. I wouldn't want us to be performing the pop + offset logic for every get when the caller could just request a range of [arbitrary starting block, highest cached block] via something like fetch_block_range(from: BlockNumber) -> Vec<T>.
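
A range lookup along these lines could be sketched as a free function over the deque. The signature is an assumption: it returns `Option<Vec<Arc<T>>>` rather than the `Vec<T>` mentioned above so that a cache miss is distinguishable from an empty result, and `BlockNumber` is again a `u32` stand-in.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Stand-in for the node's `BlockNumber`; a plain `u32` is an assumption of this sketch.
type BlockNumber = u32;

/// Returns every cached value from `from` up to the newest cached block,
/// or `None` if `from` lies outside the cached window.
fn fetch_block_range<T>(
    fifo: &VecDeque<(BlockNumber, Arc<T>)>,
    from: BlockNumber,
) -> Option<Vec<Arc<T>>> {
    let (oldest, _) = fifo.front()?;
    // Blocks older than the window cannot be served from the cache.
    let offset = from.checked_sub(*oldest)? as usize;
    // Blocks newer than the window are not cached yet.
    if offset >= fifo.len() {
        return None;
    }
    // The offset is computed once; the rest is a straight walk to the back.
    Some(fifo.iter().skip(offset).map(|(_, value)| Arc::clone(value)).collect())
}
```

The caller would fall back to the database when this returns `None`, and would otherwise get the whole tail of the cache in a single call.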

        if inner.eviction.len() >= inner.capacity.get() {
            if let Some(oldest) = inner.eviction.pop_front() {
                inner.map.remove(&oldest);