feat(ring): redesign seeding logic with LRU byte-budget cache #2232
Conversation
Introduce `SeedingCache`, a new data structure for contract caching that replaces the ad-hoc count-based and distance-threshold logic in `SeedingManager`.

Key design principles:

- **Resource-aware:** Uses a byte budget, not contract count
- **Demand-driven:** Evicts based on LRU access patterns
- **Manipulation-resistant:** Only GET/PUT/SUBSCRIBE refresh position, not UPDATE (which contract creators control)

This is the foundation for issue #2231. The cache is not yet integrated with the rest of the system - that will come in subsequent commits.

Includes comprehensive unit tests for:

- Basic add/remove/contains operations
- LRU eviction order
- Byte budget enforcement
- Large contracts displacing multiple small ones
- Access refreshing LRU position
- Edge cases (zero budget, oversized contracts)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude Please do a critical review of this PR
Claude finished @sanity's task.

**Critical Review of PR #2232**

**Overview**

This PR introduces a principled LRU byte-budget cache to replace the current ad-hoc seeding logic. The design is well-motivated and the implementation is solid as a foundation. However, there are several critical issues that need addressing before integration.

🔴 **Critical Issues**

**1. Incorrect LRU Eviction Budget Check** (crates/core/src/ring/seeding_cache.rs:117)

The eviction loop has a subtle but serious bug:

```rust
while self.current_bytes + size_bytes > self.budget_bytes && !self.lru_order.is_empty()
```

Problem: This allows the cache to exceed its budget when adding contracts larger than the budget. Scenario:
This is intentional per the test at line 418 (…).

**Recommendation:** Option 3 is best. Add a (…)

**2. Zero-Budget Edge Case is Broken** (crates/core/src/ring/seeding_cache.rs:390-402)

The test asserts:

```rust
assert!(cache.is_empty() || cache.current_bytes() == 100);
```

This test accepts either outcome, which is a red flag. With zero budget:
**Fix:** Add an early return when the contract is larger than the entire budget:

```rust
pub fn record_access(...) -> Vec<ContractKey> {
    // Early reject for oversized contracts
    if size_bytes > self.budget_bytes {
        return vec![]; // Or return an error type
    }
    // ... rest of implementation
}
```

**3. Missing Integration with Access Operations**

The PR description states "GET/PUT/SUBSCRIBE refresh position", but there's no integration code showing where these operations call `record_access`.

**Recommendation:** Add integration points, or at least stub comments showing where the hooks will go:

```rust
// In get_op.rs (example):
// cache.record_access(key, state.len() as u64, AccessType::Get);
```

**4. No Eviction Callback Mechanism**

When contracts are evicted, the caller receives the evicted keys as a return value; nothing forces the caller to act on them.

**Recommendation:** Add an eviction callback:

```rust
pub struct SeedingCache<T: TimeSource, F: FnMut(ContractKey)> {
    // ...
    on_evict: Option<F>,
}

// In eviction:
if let Some(ref mut callback) = self.on_evict {
    callback(oldest_key);
}
```

Or document that the caller MUST process the returned eviction list.
Pull request overview
This PR introduces a new LRU-based seeding cache with byte-budget awareness as a foundation for replacing the current ad-hoc seeding logic. The design shifts from magic-number thresholds to a principled, demand-driven approach where storage limits are byte-based and contracts are retained based on actual access patterns (GET/PUT/SUBSCRIBE).
Key Changes
- Implements `SeedingCache<T: TimeSource>` with byte-budget aware LRU eviction
- Introduces `AccessType` enum to distinguish manipulation-resistant operations (GET/PUT/SUBSCRIBE) from UPDATE operations
- Provides comprehensive test coverage (12 unit tests) for cache operations, eviction logic, and edge cases
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| crates/core/src/ring/seeding_cache.rs | New module implementing byte-budget LRU cache with access tracking, eviction logic, and comprehensive unit tests |
| crates/core/src/ring/mod.rs | Adds seeding_cache module to the ring module |
```rust
if let Some(existing) = self.contracts.get_mut(&key) {
    // Already cached - refresh position
    existing.last_accessed = now;
    existing.last_access_type = access_type;

    // Move to back of LRU (most recently used)
    self.lru_order.retain(|k| k != &key);
    self.lru_order.push_back(key);
```
Copilot (AI), Dec 6, 2025
When refreshing an existing contract (line 106), the code doesn't handle the case where size_bytes has changed. Contract state can grow or shrink over time (e.g., through PUT operations), but the existing contract's size_bytes field and current_bytes tracking are not updated to reflect the new size.
This will cause incorrect byte budget accounting. For example:
- Contract initially added with 100 bytes
- Contract state grows to 200 bytes
- `record_access(key, 200, AccessType::Put)` is called
- The cached `size_bytes` remains 100, and `current_bytes` is not adjusted
- The cache now underestimates its memory usage by 100 bytes
Recommendation: Update the existing contract's size and adjust current_bytes accordingly:
```rust
if let Some(existing) = self.contracts.get_mut(&key) {
    // Update byte accounting if size changed
    let size_delta = size_bytes as i64 - existing.size_bytes as i64;
    self.current_bytes = if size_delta >= 0 {
        self.current_bytes + size_delta as u64
    } else {
        self.current_bytes.saturating_sub((-size_delta) as u64)
    };
    existing.size_bytes = size_bytes;
    existing.last_accessed = now;
    existing.last_access_type = access_type;

    // Move to back of LRU (most recently used)
    self.lru_order.retain(|k| k != &key);
    self.lru_order.push_back(key);
}
```

Additionally, consider whether size increases should trigger eviction if the new size would exceed the budget.
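The signed-delta accounting suggested above can be checked in isolation; in this sketch `apply_size_change` mirrors the delta logic, with the surrounding struct omitted for brevity:

```rust
// Standalone check of the signed size-delta accounting: given the cache's
// current total, the entry's old size, and its new size, return the updated
// total. Mirrors the suggested snippet, not the PR's actual code.
fn apply_size_change(current_bytes: u64, old_size: u64, new_size: u64) -> u64 {
    let size_delta = new_size as i64 - old_size as i64;
    if size_delta >= 0 {
        current_bytes + size_delta as u64
    } else {
        current_bytes.saturating_sub((-size_delta) as u64)
    }
}

fn main() {
    // Grow: a 100-byte entry becomes 200 bytes while the cache holds 300 total
    assert_eq!(apply_size_change(300, 100, 200), 400);
    // Shrink: a 200-byte entry becomes 150 bytes
    assert_eq!(apply_size_change(400, 200, 150), 350);
    // Unchanged size is a no-op
    assert_eq!(apply_size_change(350, 150, 150), 350);
    println!("ok");
}
```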
```rust
// With zero budget, any contract should be immediately evicted
// Actually, this means we can never cache anything
let evicted = cache.record_access(key, 100, AccessType::Get);

// With zero budget, the contract is added but we can't evict anything
// (nothing to evict). This is a pathological edge case.
```
The test comment is confusing and contradicts the assertion. The comment says "With zero budget, any contract should be immediately evicted" and "this means we can never cache anything", but then the assertion allows for `cache.current_bytes() == 100`, meaning the contract was actually cached.

Looking at the `record_access` implementation, with a zero budget the while loop condition `self.current_bytes + size_bytes > self.budget_bytes` (0 + 100 > 0) is true, but `!self.lru_order.is_empty()` is false (nothing to evict), so the loop doesn't execute and the contract is added.
Recommendation: Update the comment to accurately describe the actual behavior:

```rust
// With zero budget, the contract is added anyway because there's nothing
// to evict. This is a pathological edge case - the cache allows exceeding
// the budget when there are no contracts to evict.
let evicted = cache.record_access(key, 100, AccessType::Get);
```
```rust
// With zero budget, the contract is added but we can't evict anything
// (nothing to evict). This is a pathological edge case.
```
The comment on lines 398-399 is also confusing and doesn't accurately describe the code behavior. The comment says "we can't evict anything (nothing to evict)" but this applies to any empty cache, not specifically the zero budget case. The more significant issue is that the cache allows adding a contract that exceeds the budget when there's nothing to evict.
Recommendation: Remove or update this comment to focus on the actual edge case behavior (contracts added despite exceeding budget when cache is empty).
Suggested replacement:

```rust
// Edge case: When the cache is empty (including zero budget), a contract larger than the budget
// can still be added because there is nothing to evict. This test documents that behavior.
```
```rust
#[test]
fn test_refresh_existing_contract() {
    let time_source = MockTimeSource::new(Instant::now());
    let mut cache = SeedingCache::new(1000, time_source.clone());
    let key = make_key(1);

    // First access
    cache.record_access(key, 100, AccessType::Get);
    let first_access = cache.get(&key).unwrap().last_accessed;

    // Advance time and access again
    let mut new_time_source = time_source.clone();
    new_time_source.advance_time(Duration::from_secs(10));
    cache.time_source = new_time_source;

    cache.record_access(key, 100, AccessType::Put);

    // Should still be one contract, but updated
    assert_eq!(cache.len(), 1);
    assert_eq!(cache.current_bytes(), 100); // Size unchanged

    let info = cache.get(&key).unwrap();
    assert_eq!(info.last_access_type, AccessType::Put);
    assert!(info.last_accessed > first_access);
}
```
Missing test coverage for contract size changes. The `test_refresh_existing_contract` test passes the same size (100 bytes) on both accesses, but doesn't test what happens when a contract's size changes between accesses.
This is important because:
- Contract state can grow/shrink through PUT operations
- The current implementation has a bug where size changes aren't reflected in byte accounting (see related bug comment)
Recommendation: Add a test case like:

```rust
#[test]
fn test_contract_size_change() {
    let mut cache = make_cache(1000);
    let key = make_key(1);

    // Add contract with initial size
    cache.record_access(key, 100, AccessType::Get);
    assert_eq!(cache.current_bytes(), 100);

    // Update with larger size
    cache.record_access(key, 200, AccessType::Put);
    assert_eq!(cache.current_bytes(), 200);

    // Update with smaller size
    cache.record_access(key, 150, AccessType::Put);
    assert_eq!(cache.current_bytes(), 150);
}
```

```rust
self.lru_order.retain(|k| k != &key);
self.lru_order.push_back(key);
```
Using `VecDeque::retain` to move an element to the back of the LRU is O(n), where n is the number of cached contracts. This operation happens on every access to an already-cached contract, which could be a hot path.

For better performance with large caches, consider using a doubly-linked list or maintaining a secondary index (e.g., `HashMap<ContractKey, usize>` mapping to position in `lru_order`). Alternatively, consider a crate like `lru`, which implements this efficiently.

That said, this may be acceptable depending on expected cache sizes. If the cache typically holds dozens of contracts, this is fine. If it could hold thousands, this becomes a bottleneck.

Recommendation: Either:

- Document the performance characteristics and expected cache size
- Consider a more efficient data structure if cache sizes are expected to be large
- Add a TODO to optimize this if it becomes a bottleneck during integration
```rust
};
self.contracts.insert(key, contract);
self.lru_order.push_back(key);
self.current_bytes += size_bytes;
```
Potential integer overflow when adding to `current_bytes` on line 136. If `self.current_bytes + size_bytes` overflows u64, it will wrap around, causing incorrect byte accounting.

While this is unlikely in practice (it would require petabytes of data), defensive programming suggests using checked or saturating arithmetic.

Recommendation: Use saturating arithmetic and handle the pathological case gracefully:

```rust
self.current_bytes = self.current_bytes.saturating_add(size_bytes);
```

This ensures the cache gracefully handles pathological cases rather than silently wrapping around.
```diff
- self.current_bytes += size_bytes;
+ self.current_bytes = self.current_bytes.saturating_add(size_bytes);
```
Addresses review feedback from Claude and Copilot on PR #2232:

1. **Contract size changes now update byte accounting** - When a contract's size changes (e.g., state grows via PUT), the cache now correctly updates `current_bytes` to reflect the new size. Previously, size changes were ignored, causing incorrect accounting.
2. **Use saturating arithmetic for safety** - `current_bytes` now uses `saturating_add` to prevent theoretical overflow in pathological cases (petabytes of data).
3. **Document O(n) LRU performance** - Added a comment explaining that `VecDeque::retain` is O(n); acceptable for typical cache sizes (dozens to hundreds), with a note to consider the `lru` crate if scaling to thousands.
4. **Clarify edge case behavior in tests** - Zero-budget test: clearly document that the budget is a soft limit when the cache is empty (nothing to evict). Over-budget test: explain the design decision to accept oversized contracts rather than making them unfindable.
5. **Add tests for contract size changes** - `test_contract_size_change_increases`: verify byte accounting on grow. `test_contract_size_change_decreases`: verify byte accounting on shrink. `test_contract_size_change_triggers_no_eviction`: document that size changes don't trigger eviction (only new contracts do).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Complete integration of the byte-budget aware LRU seeding cache:

- Replace score-based seeding with LRU access tracking
- Update GET operations to call `record_get_access` with state size
- Update PUT operations to call `seed_contract` with state size
- Update test utilities and test infrastructure
- Remove old `should_seed` logic and magic distance thresholds
- Add eviction handling: clear subscribers when contracts are evicted
- Update tests to use the new API

The seeding decision is now simple: cache contracts accessed via GET/PUT/SUBSCRIBE, and let LRU eviction handle limits when the byte budget is exceeded (default 100MB).

Full upstream unsubscription on eviction (sending Unsubscribed messages) is deferred to #2164 (subscription tree branch pruning), as it requires tracking upstream->downstream relationships.

Also fixes pre-existing clippy warnings in the transport_perf benchmark.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolve conflicts:

- Keep the LRU byte-budget `seed_contract(key, size)` API
- Add `announce_contract_cached()` calls from main
- Remove dropped contract notification code (handled internally by SeedingManager)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolve conflicts:

- Keep simplified subscription handling (eviction handled internally)
- Keep comment in the transport_perf benchmark

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The subscription request in the `BroadcastTo` state was incorrectly using `start_subscription_request_internal` with `track_parent=false`, which meant the parent operation wasn't being tracked. This caused subscribe operations to have `state=None` when receiving `ReturnSub` messages, triggering invalid_transition errors.

Changed to use `start_subscription_request`, which properly tracks the parent-child relationship.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Problem
The current seeding logic in `SeedingManager::should_seed()` uses ad-hoc thresholds that don't reflect principled design. The three-tier decision tree treats all contracts equally regardless of size, doesn't consider actual demand, and conflates storage limits with proximity requirements.
Why This Matters
Design Discussion
This PR emerged from a design discussion (see issue #2231) about how seeding should work from first principles:
Core insight: Proximity emerges naturally from routing - peers near a contract's location see more GETs, keeping it fresh in their caches. We don't need explicit distance thresholds.
Key properties of the new design:
This PR
What's Included
- `SeedingCache<T: TimeSource>` struct in `crates/core/src/ring/seeding_cache.rs`
- `AccessType` enum (Get, Put, Subscribe) for manipulation-resistant refresh
- Removal of the old `should_seed` logic and magic thresholds

Integration Changes
- GET operations call `record_get_access(key, size_bytes)` instead of `should_seed` + `seed_contract`
- PUT operations call `seed_contract(key, size_bytes)` with state size
- Updated `MockRing` to match the new API
- `SeedingCache` instead of `DashMap<ContractKey, Score>`

What's Deferred to #2164
- Sending `Unsubscribed` to upstream peers requires tracking upstream->downstream relationships per contract. This is documented with a TODO comment.

Testing
All 28 seeding-related tests pass:
- `ring::seeding_cache::*` - LRU cache behavior
- `ring::seeding::*` - SeedingManager integration
- `operations::test_utils::*` - Mock ring behavior

Related
[AI-assisted - Claude]