Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces Cache Groups functionality to Cripts, providing infrastructure for managing associations between cache entries using custom identifiers. This implementation follows an emerging RFC draft for cache group invalidation patterns in HTTP caching.
Key changes include:
- New Cache::Group class with disk persistence, rotating hash maps, and configurable aging policies
- Thread-safe operations using shared_mutex with automatic periodic syncing to disk
- Example implementation and comprehensive documentation for using Cache Groups in Cripts
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/cripts/CacheGroup.cc | Core implementation with hash-based storage, disk I/O, transaction logging, and Manager singleton for lifecycle management |
| include/cripts/CacheGroup.hpp | Public API defining the Group class with Insert/Lookup methods and nested Manager class |
| src/cripts/CMakeLists.txt | Adds CacheGroup.hpp to the list of public headers |
| include/cripts/Matcher.hpp | Includes algorithm header (duplicate include) |
| example/cripts/cache_groups.cc | Working example demonstrating Cache Groups for cache invalidation workflows |
| doc/developer-guide/cripts/cripts-misc.en.rst | Documentation explaining Cache Groups concept and usage patterns |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bryancall
left a comment
There was a problem hiding this comment.
Thanks for adding Cache Groups to Cripts! This is a useful feature for implementing cache group invalidation patterns. I have a few observations:
Critical Bug: Iterator increment after erase()
In _cripts_cache_group_sync (line ~50-60), there's an iterator invalidation bug:
for (auto it = groups.begin(); it != groups.end() && processed < max_to_process; ++it) {
if (auto group = it->second.lock()) {
// ...
} else {
it = groups.erase(it); // Returns next iterator, then loop does ++it, skipping an element
}
}When erase() returns the next valid iterator and then the loop increments again, an element gets skipped. Consider:
for (auto it = groups.begin(); it != groups.end() && processed < max_to_process; ) {
if (auto group = it->second.lock()) {
// ...
++it;
} else {
it = groups.erase(it); // Don't increment here
}
}Error Handling
-
Missing file read error checking in
LoadFromDisk(): Thefile.read()calls (lines ~229-232 and ~241) don't check if reads succeeded. If the file is corrupted or truncated, uninitialized data gets used. -
clearLog() called unconditionally in
WriteToDisk(): IfsyncMap()fails, the transaction log is still cleared, which could lead to data loss. Consider only clearing the log after all syncs succeed. -
Inconsistent error reporting: Line 318 uses
std::cerrwhile the rest of the code usesTSWarning. Should be consistent. -
Missing filesystem error handling in
Initialize()(lines ~85-86):create_directoriesandpermissionscan throw or fail silently. TheclearLog()method (line 363) shows the correct pattern using error_code overloads.
Documentation & Style
- Spelling: "hodling" → "holding" (line 289), "assosication" → "association" (docs line 421)
- Duplicate
#include <algorithm>inMatcher.hpp - The magic number
63072000(2 years in seconds) appears multiple times - consider a named constant with documentation explaining the choice
Testing
This is a significant feature with complex persistence logic (disk serialization, transaction log replay, crash recovery). Would be good to have automated test coverage for these code paths.
Minor API Note
The Factory() returning void* requiring manual delete in do_delete_instance() is a bit awkward, but I understand this may be constrained by the Cripts plugin interface.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- cleans up the notion around cached URLs and headers, and cache keys. - adds APIs to set the lookup status as well
bryancall
left a comment
There was a problem hiding this comment.
Thanks for the updates here. After another pass, I still see two blocking concerns that were previously raised.
- Durability and data-loss risk in write path
- WriteToDisk updates last_sync before syncMap confirms success, and clearLog is called unconditionally at the end.
- If map sync or rename fails, this can both suppress retries and drop transaction-log recovery state.
- Please gate last_sync advancement and log truncation on successful persistence of all required slots.
- Missing automated coverage for persistence and recovery semantics
- This feature adds significant on-disk behavior (serialization, map rotation, transaction replay, crash and restart recovery), but there is still no targeted test coverage for these paths.
- Please add tests for truncated or corrupt map files, sync or rename failure handling, and log replay correctness across restart.
The iterator-erase issue in the periodic sync path looks fixed; thanks for addressing that.
Duplicate malformed review; superseded by latest review.
WriteToDisk previously updated last_sync and cleared the transaction log before confirming syncMap succeeded. On rename failure the log would be lost even though the map was never written to disk. syncMap now returns bool; WriteToDisk reverts last_sync on failure and only calls clearLog when all dirty slots have been successfully synced. Adds Catch2 unit tests covering basic insert/lookup, persist-and-reload, transaction log replay across restarts, corrupt/truncated/wrong-version map file handling, sync failure log preservation, and map rotation. More unit tests will be done in a future PR.
Add missing standard headers in CacheGroup.cc. Use std::error_code overload for filesystem::remove in syncMap error paths. Add <filesystem> and <system_error> to CacheGroup.hpp. Use fixed-width types in _MapHeader for a stable on-disk format. Fix max_to_process calculation in the sync continuation to spread groups evenly across the sync window.
bryancall
left a comment
There was a problem hiding this comment.
Everything else looks good -- persistence tests are solid, error handling is thorough, and all 8 existing tests pass clean with ASAN on Fedora 43. Just one bug in the revert logic that needs fixing.
Other minor observations (non-blocking, could be follow-up):
_Entry::timestamp(time_point) and_Entry::length(size_t) are written directly to disk but aren't fixed-width types like_MapHeaderuses. The VERSION field provides a migration path, so this is fine for now.- The Cache Groups RST section is missing a
.. _cripts-misc-cache-groups:anchor label for consistency with other sections.
|
[approve ci freebsd] |
Added the docs link, ignoring the first one (these files are transient, and most certainly won't go between hosts or architectures or ATS versions. |
Fix sync retry logic: revert last_sync to its previous value on syncMap failure instead of setting it to last_write. The old code made the slot appear clean, preventing retries until a new Insert bumped last_write again.
This is a second version, since half of the original patch was merged in a separate PR.