perf: use hash() as sort key in stable repr and canonicalize JSON keys #12
DS-Review / DS-Review
failed
May 6, 2026 in 0s
DS-Review completed
PR Review
PR: This PR replaces repr()-based sorting with hash()-based sorting in stable repr routine and canonicalizes JSON output in the SQLite persistence layer.
| Severity | Issue |
|---|---|
P0 |
Sorting by hash() breaks cross-process determinism — different processes may produce different orderings for identical collections due to per-process hash seed (PYTHONHASHSEED). This undermines content-based fingerprint guarantees. — src/cashet/hashing.py:347 |
P1 |
When keys hash to the same value, Python's stable sort preserves insertion order among them, making canonical representation insertion-order-dependent even within a single process. Need a fallback tie-breaking comparator (e.g., repr()) for total order. — src/cashet/hashing.py:347 |
P2 |
The change from repr() to hash() alters the contract: canonical representation is now only deterministic per process. This must be documented in the docstring and README to avoid surprising users with multi-process setups (e.g., Redis, multi-worker). — src/cashet/hashing.py:347 |
P3 |
sort_keys=True in json.dumps() on a list is a no-op; it only affects dictionary keys. Remove it to avoid confusion. — src/cashet/store.py:315 |
Notes
- The switch to
hash()breaks cross-process cache sharing and content-based fingerprint guarantees. Use a deterministic total order (e.g.,repr()or a combination with fallback) to maintain stability across interpreter invocations. - The
sort_keys=Trueon list serialization is harmless but misleading; consider removing for clarity.
Verdict
Request changes — The P0 cross-process determinism issue must be resolved before merging; the PR's performance gains do not outweigh the loss of content-addressability across processes.
Loading